
prefer using Singularity's support for scratch directories over bind mounting for /var/lib/cvmfs and /var/run/cvmfs #40

Open · wants to merge 1 commit into main
Conversation

@boegel (Contributor) commented Sep 24, 2020

It turns out that bind mounting /var/lib/cvmfs and /var/run/cvmfs to temporary directories can, in some situations, lead to "Failed to initialize loader socket" errors when mounting the /cvmfs repositories.

This has proven to be an issue in multiple different situations.

Using Singularity's support for scratch directories (singularity --scratch, or equivalently $SINGULARITY_SCRATCH) circumvents these issues.
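
Concretely, the difference between the two approaches is roughly this (paths are just examples):

# bind-mount approach: point /var/lib/cvmfs and /var/run/cvmfs to temporary host directories
mkdir -p /tmp/$USER/{var-lib-cvmfs,var-run-cvmfs}
export SINGULARITY_BIND="/tmp/$USER/var-run-cvmfs:/var/run/cvmfs,/tmp/$USER/var-lib-cvmfs:/var/lib/cvmfs"

# scratch approach: let Singularity create those directories as scratch space inside the container
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"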

Commit: prefer using Singularity's support for scratch directories over bind mounting for /var/lib/cvmfs and /var/run/cvmfs
@boegel requested a review from @ocaisa on September 24, 2020 08:11
@ocaisa (Member) commented Sep 24, 2020

The downside of this is that you are removing your cache every time you shut down the container. That was OK for me because I was using a pre-populated alien cache.

@boegel (Contributor, author) commented Sep 24, 2020

@ocaisa So should we have separate sections? One for bind mounting (persistent cache), one using --scratch (no persistent cache), one for alien cache + --scratch?

That's going to complicate the pilot instructions quite a bit...

@ocaisa (Member) commented Sep 24, 2020

I don't think we really have much choice but to cover the various scenarios. Every option is going to have to be able to run MPI workloads, and I believe bind mounting is not going to tick that box.

After that, there is the question of whether we have an internet connection or not. This is actually not that complicated if we advise the use of an alien cache (but an alien cache is unmanaged, which means it can grow arbitrarily large). Whether we need to pre-populate it depends on whether or not we have internet access from where the alien cache is being used (and if we don't pre-populate, anyone who uses the cache will need write permissions there).

That's my "current" understanding of things.
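
For reference, the client-side part of an alien cache setup is roughly the following CVMFS configuration (the cache path is a placeholder; this follows my reading of the CVMFS docs on alien caches):

# point the client at an unmanaged alien cache on a shared filesystem (illustrative path)
CVMFS_ALIEN_CACHE=/shared/fs/cvmfs-alien-cache
# the alien cache is unmanaged, so the quota manager has to be disabled
CVMFS_QUOTA_LIMIT=-1
CVMFS_SHARED_CACHE=no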

@ocaisa (Member) commented Sep 24, 2020

I've been continuously updating the script in EESSI/filesystem-layer#37, and that is now using two layers of alien cache: one shared (read-only) and one local. The shared layer is pre-populated only with the requested stack (not the whole repo), and this could be restricted even further if I use archspec (from inside the container) to generate a smarter dirTab file.
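
For context, a rough sketch of that preloading step (URL, cache path, and dirtab contents are placeholders, and I'm assuming cvmfs_preload's -u/-r/-d options here):

# dirtab lists the repository paths to preload, one (glob) pattern per line, e.g.
#   /versions/<version>/compat/linux/x86_64/*
cvmfs_preload -u http://<stratum-url>/cvmfs/pilot.eessi-hpc.org \
    -r /shared/fs/cvmfs-alien-cache \
    -d ./dirtab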

@ocaisa (Member) commented Sep 24, 2020

In general, I think it is really important that we get this advice right.

Another option is to create a Squid proxy on the login node, if the login node is connected to the outside world. As an unprivileged user you'd have to be able to keep that running for the entire job queuing and execution time (I imagine).

@rptaylor made some suggestions about how to do this as well. I think it's worth having a serious discussion about (and integrating some testing for).
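
For the client side of that proxy option, the CVMFS configuration used in the container would just need to point at the Squid instance, something like (hostname and port are placeholders):

# route CVMFS HTTP traffic through a Squid proxy running on the login node (illustrative)
CVMFS_HTTP_PROXY="http://login1.example.org:3128"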

@ocaisa (Member) left a comment


This will always work if the machine where you are trying things out has an internet connection (but you need that to pull the image anyway). We should warn people, though, that the cache gets blown away, and that doing more than just trying things out will require additional (cache) configuration.

@bedroge (Collaborator) commented Oct 2, 2020

I had some issues with that -S option this week, and @ocaisa just ran into the same issue. It turns out that with -S you get a tmpfs with only 16 MB of space (by default; this can be configured in the site's singularity.conf)... The way to solve this is by also using -W / --workdir or $SINGULARITY_WORKDIR to specify where the scratch directory should be created.
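
For example, something like this (the scratch path is just an illustration):

export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"
export SINGULARITY_WORKDIR="/local/scratch/$USER/singularity-workdir"   # example path on local disk
mkdir -p "$SINGULARITY_WORKDIR"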

@ocaisa (Member) commented Oct 2, 2020

Unfortunately I'm pretty sure there is a "but" here, and that is that using --workdir will re-introduce the problem with running MPI codes...if you are exclusively using the alien caches, you should be fine with the 16MB...I think...

@ocaisa (Member) commented Oct 2, 2020

Our docs will contain a lot of if/then/else

@bedroge (Collaborator) commented Oct 2, 2020

> Unfortunately I'm pretty sure there is a "but" here, and that is that using --workdir will re-introduce the problem with running MPI codes...if you are exclusively using the alien caches, you should be fine with the 16MB...I think...

Ah, yeah, you're right. It doesn't seem to create a unique dir inside the specified directory. So maybe we should instruct people to do something like mktemp -d -p /scratch/location, and use that for the --workdir? Or:

export SINGULARITY_WORKDIR=$(mktemp -d -p /some/scratch/dir)

@boegel (Contributor, author) commented Oct 2, 2020

@bedroge That won't help? You basically need a unique scratch dir per MPI process, if I understand correctly...

@bedroge (Collaborator) commented Oct 2, 2020

Whoops, right... then it has to be in the singularity command that gets invoked by srun.

@bedroge (Collaborator) commented Oct 2, 2020

Or let srun call some wrapper script that creates a unique dir for that process/task?
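
Something along these lines, maybe (untested sketch; the wrapper name and temp location are made up):

#!/bin/bash
# singularity-wrapper.sh: give each MPI task its own scratch/workdir
mkdir -p /tmp/$USER
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"
export SINGULARITY_WORKDIR=$(mktemp -d -p /tmp/$USER)   # unique per task
singularity exec --fusemount "container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org" \
    docker://ghcr.io/eessi/client-pilot:centos7 "$@"

so that srun singularity-wrapper.sh <application> gives every task its own workdir.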

@mulderij (Contributor) commented Sep 14, 2022

The solution with export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs" instead of SINGULARITY_BIND="$TMP/var-run-cvmfs:/var/run/cvmfs,$TMP/var-lib-cvmfs:/var/lib/cvmfs" wasn't enough in my situation. It resulted in this error:

Singularity> source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2021.12!
/cvmfs/pilot.eessi-hpc.org/versions/2021.12/compat/linux/x86_64/usr/lib/python-exec/python3.9/python3: error while loading shared libraries: libpython3.9.so.1.0: cannot open shared object file: Input/output error
ERROR: no value set for $EESSI_SOFTWARE_SUBDIR

This was fixed by defining a singularity workdir:

mkdir -p $TMP/{home,workdir}
export SINGULARITY_WORKDIR="$TMP/workdir"

The result would be:

export TMP="/tmp/$USER/eessi"
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"
mkdir -p $TMP/{home,workdir}
export SINGULARITY_WORKDIR="$TMP/workdir"
export SINGULARITY_HOME="$TMP/home:/home/$USER"
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"
singularity shell --fusemount "$EESSI_PILOT" docker://ghcr.io/eessi/client-pilot:centos7

This works well with a workdir on a local disk, but when the workdir is redirected to a shared filesystem (BeeGFS) it is still not a solution.
