Some CI fails when it installs Apptainer 1.3.0 #517

Closed
trz42 opened this issue Mar 26, 2024 · 3 comments · Fixed by #528
Labels
bug Something isn't working

Comments

@trz42
Collaborator

trz42 commented Mar 26, 2024

We've seen these CI failures in various PRs (#488, #496, #507, #512, #514).

The last successful run was using Apptainer 1.2.5 (https://github.com/EESSI/software-layer/actions/runs/8284497353).

As a short-term workaround, we stick to Apptainer < 1.3.0 (#516).

We should revert that change either when a newer Apptainer release works again or once we have found another solution to the issue (e.g., running apptainer with additional arguments such as --underlay).
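
To make the pin easy to spot (and later to revert), a CI step could check the installed version explicitly. The following is a minimal sketch, not the change made in #516; the function names `version_lt` and `check_apptainer_pin` are hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is strictly lower than version $2.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical guard: reports whether an Apptainer version is below the pinned limit.
check_apptainer_pin() {
    local installed="$1" limit="1.3.0"
    if version_lt "$installed" "$limit"; then
        echo "ok: Apptainer $installed is below $limit"
    else
        echo "warning: Apptainer $installed is not below $limit"
        return 1
    fi
}

check_apptainer_pin "1.2.5"
```

In a real workflow the argument would come from the installed binary, for example by parsing the output of `apptainer --version` (the exact output format is an assumption to verify before relying on it).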

@trz42 trz42 added the bug Something isn't working label Mar 26, 2024
@boegel
Contributor

boegel commented Mar 26, 2024

It seems like there's a breaking change in behavior, or perhaps a regression, in Apptainer 1.3.0 related to FUSE mounting. Quite a few things were changed there; see the Apptainer 1.3.0 release notes.

So ideally we figure out an easy way to reproduce the problem with Apptainer 1.3.0, so we can report this upstream...

@casparvl
Collaborator

casparvl commented Apr 2, 2024

Well, one thing I found out is that it is not actually 'fixed' with Apptainer 1.2.5. Instead, we get:

ERROR: Not running in Gentoo Prefix environment, run '/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/startprefix' first!

This happens in the CI run, which then quits (with exit status 1), and somehow that causes a 'pass' (green checkmark) on the CI run :\

See e.g. the 'success' in this run https://github.com/EESSI/software-layer/actions/runs/8442084868/job/23122717808

Whereas a 'failed' run actually proceeds much further: https://github.com/EESSI/software-layer/actions/runs/8521814031/job/23340720129
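
The thread does not establish why an exit status of 1 ended up as a green checkmark. As an illustration only (an assumption, not a diagnosis of this workflow), one common way a failure gets masked in shell-based CI steps is a pipeline run without `pipefail`: the pipeline then reports the status of its last command. The names `failing_step` and `run_pipeline` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a script that prints an error and exits with status 1.
failing_step() {
    echo "ERROR: Not running in Gentoo Prefix environment" >&2
    return 1
}

# Runs the failing step in a pipeline inside a subshell.
# $1 is "on" or "off": whether pipefail is enabled for that pipeline.
run_pipeline() {
    (
        if [ "$1" = "on" ]; then set -o pipefail; else set +o pipefail; fi
        failing_step 2>&1 | tee /dev/null > /dev/null
    )
}

# Without pipefail, the pipeline's status is that of tee, so the failure is hidden.
status_off=0
run_pipeline off || status_off=$?
echo "without pipefail: exit status $status_off"

# With pipefail, the failing command's non-zero status is propagated.
status_on=0
run_pipeline on || status_on=$?
echo "with pipefail:    exit status $status_on"
```

If something like this turns out to be the cause, enabling `set -o pipefail` (or checking `${PIPESTATUS[0]}`) in the relevant step would make the failure visible.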

@casparvl
Collaborator

casparvl commented Apr 2, 2024

Ok, the issue indeed was the installation of a full CUDA SDK, which was too much for the CI (in terms of storage, I believe). A bit weird that we didn't get a very clear error here, but a 'hanging' CI instead (though Bob somewhere, at some point, mentioned having seen a disk space error?).

In any case, #528 adds an option to EESSI-install-software.sh so that the installation of the CUDA SDK in the host_injections prefix can be skipped. It means we are not testing that part, but... fine. It's the best we can do given the resources we have in CI.
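
Since the failure showed up as a hang rather than a clear error, an explicit free-space check before a large installation can make the cause visible. This is a minimal sketch and not what #528 implements; the function name `has_free_space_kb` and the 10 GiB threshold are hypothetical (the actual size of the CUDA SDK is not stated in this thread).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the filesystem holding directory $1
# has at least $2 kilobytes available.
has_free_space_kb() {
    local dir="$1" required_kb="$2" available_kb
    # df -P selects the POSIX output format; -k reports 1024-byte blocks.
    # The fourth column of the second line is the available space.
    available_kb="$(df -Pk "$dir" | awk 'NR == 2 {print $4}')"
    [ "$available_kb" -ge "$required_kb" ]
}

# Hypothetical threshold of 10 GiB, expressed in kilobytes.
required_kb=$((10 * 1024 * 1024))
if has_free_space_kb "${TMPDIR:-/tmp}" "$required_kb"; then
    echo "enough space: proceeding with the large installation"
else
    echo "not enough space: skipping the large installation"
fi
```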
