
prefer using Singularity's support for scratch directories over bind mounting for /var/lib/cvmfs and /var/run/cvmfs #40

Open · wants to merge 1 commit into main

Conversation

@boegel (Contributor) commented Sep 24, 2020

It turns out that bind mounting /var/lib/cvmfs and /var/run/cvmfs to temporary directories can, in some situations, lead to "Failed to initialize loader socket" errors when mounting the /cvmfs repositories.

This has proven to be an issue in multiple different situations:

Using the support that Singularity has for scratch directories (singularity --scratch or equivalently $SINGULARITY_SCRATCH) circumvents these issues.
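For illustration, the two approaches look roughly like this (the $TMP location is a placeholder; the scratch variant is the one this PR proposes):

# bind-mount approach: cache persists in $TMP across container runs
mkdir -p $TMP/var-run-cvmfs $TMP/var-lib-cvmfs
export SINGULARITY_BIND="$TMP/var-run-cvmfs:/var/run/cvmfs,$TMP/var-lib-cvmfs:/var/lib/cvmfs"

# scratch-directory approach: no host directories needed, cache is discarded when the container exits
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"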

…mounting for /var/lib/cvmfs and /var/run/cvmfs
@boegel boegel requested a review from ocaisa September 24, 2020 08:11
@ocaisa (Member) commented Sep 24, 2020

The downside of this is that you are removing your cache every time you shut down the container. That was OK for me because I was using a pre-populated alien cache.

@boegel (Contributor, Author) commented Sep 24, 2020

@ocaisa So should we have separate sections? One for bind mounting (persistent cache), one using --scratch (no persistent cache), one for alien cache + --scratch?

That's going to complicate the pilot instructions quite a bit...

@ocaisa (Member) commented Sep 24, 2020

I think we don't really have much choice but to cover various scenarios. Every option is going to have to be able to run MPI workloads, and I believe bind mounting is not going to tick that box.

After that, there is the question of whether we have an internet connection or not. This is actually not that complicated if we advise the use of an alien cache (though an alien cache is unmanaged, which means it can grow arbitrarily large). Whether we need to pre-populate it depends on whether or not we have internet access from where the alien cache is being used (and if we don't pre-populate, anyone who uses the cache will need write permissions there).

That's my "current" understanding of things.
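For reference, a minimal sketch of the client-side alien cache configuration (e.g. in /etc/cvmfs/default.local inside the container; the cache path is a placeholder):

# use a (possibly pre-populated) alien cache on a shared filesystem
CVMFS_ALIEN_CACHE=/path/to/shared/alien-cache
# the alien cache is unmanaged, so no quota is enforced on it
CVMFS_QUOTA_LIMIT=-1
CVMFS_SHARED_CACHE=no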

@ocaisa (Member) commented Sep 24, 2020

I've been continuously updating the script in EESSI/filesystem-layer#37, and that is now using two layers of alien cache: one shared (read-only) and one local. The shared layer is pre-populated only with the requested stack (not the whole repository), and this could be restricted even further if I use archspec (from inside the container) to generate a smarter dirTab file.
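As a rough illustration of that kind of setup (not the exact configuration from the referenced script; instance names and paths are placeholders, parameter names as in the CernVM-FS tiered cache documentation):

CVMFS_CACHE_PRIMARY=hpc

CVMFS_CACHE_hpc_TYPE=tiered
CVMFS_CACHE_hpc_UPPER=local
CVMFS_CACHE_hpc_LOWER=shared
# the shared, pre-populated layer is never written to
CVMFS_CACHE_hpc_LOWER_READONLY=yes

# local, per-node cache layer
CVMFS_CACHE_local_TYPE=posix
CVMFS_CACHE_local_BASE=/tmp/cvmfs-cache

# shared, read-only alien cache layer on the cluster filesystem
CVMFS_CACHE_shared_TYPE=posix
CVMFS_CACHE_shared_ALIEN=/path/to/shared/alien-cache
CVMFS_CACHE_shared_SHARED=no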

@ocaisa (Member) commented Sep 24, 2020

In general, I think it is really important that we get this advice right.

Another option is to create a Squid proxy on the login node, if the login node is connected to the outside world. As an unprivileged user you'd have to be able to keep that running for the entire time the job is queued and executing (I imagine).

@rptaylor made some suggestions about how to do this as well. I think it's worth having a serious discussion about this (and integrating some testing for it).
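On the client side, pointing CernVM-FS at such a proxy is a single setting (host name and port are placeholders):

CVMFS_HTTP_PROXY="http://login-node.example.org:3128"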

@ocaisa (Member) left a review comment


This will always work as long as the place where you are trying things out has an internet connection (but hey, you need that to pull the image anyway). We should warn people, though, that the cache gets blown away, and that if you want to do more than just try things out you will need additional (cache) configuration.

@bedroge (Collaborator) commented Oct 2, 2020

I had some issues with that -S option this week, and @ocaisa just ran into the same issue. It turns out that with -S you get a tmpfs with only 16MB of space (by default; this can be configured in the site's singularity.conf)... The way to solve this is to also use -W / --workdir or $SINGULARITY_WORKDIR to specify where the scratch directory should be created.
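A sketch of what that combination could look like (the workdir location is a placeholder; the fusemount spec and image are the ones used elsewhere in this thread):

export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"
# let the scratch dirs be created under a directory on disk instead of the 16MB tmpfs
export SINGULARITY_WORKDIR=$(mktemp -d -p /tmp)
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"
singularity shell --fusemount "$EESSI_PILOT" docker://ghcr.io/eessi/client-pilot:centos7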

@ocaisa (Member) commented Oct 2, 2020

Unfortunately I'm pretty sure there is a "but" here, and that is that using --workdir will re-introduce the problem with running MPI codes...if you are exclusively using the alien caches, you should be fine with the 16MB...I think...

@ocaisa (Member) commented Oct 2, 2020

Our docs will contain a lot of if/then/else

@bedroge (Collaborator) commented Oct 2, 2020

Unfortunately I'm pretty sure there is a "but" here, and that is that using --workdir will re-introduce the problem with running MPI codes...if you are exclusively using the alien caches, you should be fine with the 16MB...I think...

Ah, yeah, you're right. It doesn't seem to create a unique dir inside the specified directory. So maybe we should then instruct to do something like mktemp -d -p /scratch/location, and use that for the --workdir? Or:

export SINGULARITY_WORKDIR=$(mktemp -d -p /some/scratch/dir)

@boegel (Contributor, Author) commented Oct 2, 2020

@bedroge That won't help? You basically need a unique scratch dir per MPI process, if I understand correctly...

@bedroge (Collaborator) commented Oct 2, 2020

Whoops, right... then it has to be in the singularity command that gets invoked by srun.

@bedroge (Collaborator) commented Oct 2, 2020

Or let srun call some wrapper script that creates a unique dir for that process/task?
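A minimal sketch of such a wrapper (the script name and scratch location are hypothetical; it reuses the fusemount spec and image from this thread):

#!/bin/bash
# cvmfs-task-wrapper.sh: give each task its own Singularity workdir, then run the wrapped command
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"
export SINGULARITY_WORKDIR=$(mktemp -d -p /tmp)
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"
exec singularity exec --fusemount "$EESSI_PILOT" docker://ghcr.io/eessi/client-pilot:centos7 "$@"

so that something like srun ./cvmfs-task-wrapper.sh <mpi-application> gives every task a unique workdir.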

@mulderij (Contributor) commented Sep 14, 2022

The solution with export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs" instead of SINGULARITY_BIND="$TMP/var-run-cvmfs:/var/run/cvmfs,$TMP/var-lib-cvmfs:/var/lib/cvmfs" wasn't enough in my situation. It resulted in this error:

Singularity> source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2021.12!
/cvmfs/pilot.eessi-hpc.org/versions/2021.12/compat/linux/x86_64/usr/lib/python-exec/python3.9/python3: error while loading shared libraries: libpython3.9.so.1.0: cannot open shared object file: Input/output error
ERROR: no value set for $EESSI_SOFTWARE_SUBDIR

This was fixed by defining a singularity workdir:

mkdir -p $TMP/{home,workdir}
export SINGULARITY_WORKDIR="$TMP/workdir"

The result would be:

export TMP="/tmp/$USER/eessi"
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"
mkdir -p $TMP/{home,workdir}
export SINGULARITY_WORKDIR="$TMP/workdir"
export SINGULARITY_HOME="$TMP/home:/home/$USER"
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"
singularity shell --fusemount "$EESSI_PILOT" docker://ghcr.io/eessi/client-pilot:centos7

This works well with a workdir on a local disk, but when the workdir is redirected to a shared filesystem (BeeGFS) it still does not solve the problem.
