Problems with using a CVMFS MPI application inside a Singularity container #2606
I think you're spot on. In order to make the separation between the local space (/var/lib/cvmfs) and the shared space (the alien cache directory) clearer, I'd suggest using the newer configuration syntax in /etc/cvmfs/default.local, as described in the advanced cache configuration section:
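For reference, a minimal sketch of what that newer syntax can look like (the instance name and the paths below are placeholders, not taken from this thread):

```
# /etc/cvmfs/default.local -- sketch: shared, pre-populated alien cache as a
# named cache instance, with the local workspace kept under /var/lib/cvmfs
CVMFS_CACHE_PRIMARY=alien

CVMFS_CACHE_alien_TYPE=posix
CVMFS_CACHE_alien_ALIEN=/shared/path/to/alien/cache
CVMFS_CACHE_alien_SHARED=no
CVMFS_CACHE_alien_QUOTA_LIMIT=-1

CVMFS_WORKSPACE=/var/lib/cvmfs
```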
I tried that but unfortunately it doesn't help. The problem, I think, is that when I use something like
the issue is that rather than mounting CVMFS once on a node, what I'm doing is telling it to do 4 different mounts, one for each MPI task on the node. Each of these mounts shares the var and lib directories because I don't (currently) have a way to tell them to use a unique one. I'll see if I can figure out a way around that.
Ah, I see. Yes, they need to be local and unique per mount. The cvmfs configuration files can make (limited) use of bash scripting, so in the default.local file you can specify subdirectories, e.g. per PID or MPI rank or based on other useful environment variables. (Such subdirectories are not automatically removed, though.)
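As an illustration only (SLURM_PROCID is just one example of a rank variable and depends on your launcher; $$ is the PID of the shell that sources the file and is used here as a fallback):

```
# /etc/cvmfs/default.local -- sketch: give every mount its own local directories
CVMFS_WORKSPACE=/var/lib/cvmfs/mount-${SLURM_PROCID:-$$}
CVMFS_CACHE_BASE=/var/lib/cvmfs/mount-${SLURM_PROCID:-$$}
```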
Ok, great, that's what I was hoping, if I can use
That should work in practice. Notice, though, that you'd be taking the PID of the temporary shell process that parses the config file, not of the cvmfs2 fuse module, so there is a slight chance of a PID being reused, although that is probably a rather theoretical concern.
Ok, I got a little further. So the idea of
I also tried using the
I should say that I am not actually getting an error any more; the process just seems to hang:
(note that
This is solved by using the
Ah, I wish I had recognized what the issue was. My singcvmfs command also adds -S /var/run/cvmfs at the end of its script and maps /var/lib/cvmfs to a given directory (including a default). The problem with using -S on /var/lib/cvmfs is that the cache gets thrown away after each run. I wonder if using singcvmfs would make what you're trying to do easier, perhaps with modifications if needed.
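To illustrate the distinction (a sketch only, not the actual singcvmfs script; the image name, host path, and command are placeholders): -S gives the container an ephemeral scratch directory that is discarded when the run ends, while -B bind-mounts a persistent host directory.

```
# Sketch: ephemeral scratch space for /var/run/cvmfs, persistent cache via a bind mount
singularity exec \
  -S /var/run/cvmfs \
  -B /scratch/$USER/cvmfs-cache:/var/lib/cvmfs \
  client.sif my_command
```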
Yeah, for my case that's ok because I'm using a (pre-populated) alien cache, so there's really nothing to throw away in
Depending on the application and filesystem, I worry that using a shared alien cache can cause havoc on a filesystem's metadata server due to large numbers of requests for small files. Have you tried it yet at large scales? If so, I'd be interested in hearing about how many files the application accesses at startup time, the number of nodes, and what filesystem the alien cache is on.
No, I haven't got that far yet, but I should say that the way things are set up right now is for a user (or group), not a system-wide alien cache, so I wouldn't expect the metadata workload to be any worse than starting up a self-compiled set of software. I was considering using the tiered cache mentioned in the docs, which should help, but the first task was to get it working in the first place.
Actually I just tried this out. I realised I probably had to do a custom tiered configuration:
where
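For context, a sketch of what such a tiered configuration can look like, following the pattern in the CVMFS advanced cache configuration docs (the instance names and paths below are placeholders, not the configuration actually used here, and parameter names should be checked against your CVMFS version):

```
# /etc/cvmfs/default.local -- sketch of a tiered cache: a node-local upper
# layer (a posix cache on a ram disk) backed by the shared, read-only,
# pre-populated alien cache as the lower layer.
CVMFS_CACHE_PRIMARY=tiered

CVMFS_CACHE_tiered_TYPE=tiered
CVMFS_CACHE_tiered_UPPER=nodelocal
CVMFS_CACHE_tiered_LOWER=alien
CVMFS_CACHE_tiered_LOWER_READONLY=yes

CVMFS_CACHE_nodelocal_TYPE=posix
CVMFS_CACHE_nodelocal_BASE=/dev/shm/cvmfs              # placeholder ram-disk path

CVMFS_CACHE_alien_TYPE=posix
CVMFS_CACHE_alien_ALIEN=/shared/path/to/alien/cache    # placeholder
CVMFS_CACHE_alien_SHARED=no
CVMFS_CACHE_alien_QUOTA_LIMIT=-1
```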
In order to test that the setup is effective, you can check that the cache directory on the ram disk actually gets populated. The fact that the cache on the ram disk is unbounded is likely to become a problem, though. I'd rather advise using the RAM cache plugin. You can configure the plugin so that it listens on a TCP socket and connect several mount points to it; the cache will then be shared. You might need some extra logic to start the plugin process before the containers start on the nodes.
I guess that a UNIX domain socket that gets bind mounted into the containers should work just as well. |
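A sketch of what that could look like, assuming the plugin is started independently of the client (parameter names are taken from my reading of the cache plugin section of the CVMFS docs and should be double-checked; socket path, size, and file names are placeholders):

```
# /etc/cvmfs/cache-ram.options -- the plugin's own configuration (sketch)
CVMFS_CACHE_PLUGIN_LOCATOR=unix=/var/run/cvmfs-cache/cache.socket
CVMFS_CACHE_PLUGIN_SIZE=2000          # in-memory cache size in MB

# /etc/cvmfs/default.local -- client side: connect to the already-running plugin
CVMFS_CACHE_PRIMARY=ramplugin
CVMFS_CACHE_ramplugin_TYPE=external
CVMFS_CACHE_ramplugin_LOCATOR=unix=/var/run/cvmfs-cache/cache.socket
# No CVMFS_CACHE_ramplugin_CMDLINE here, so the client does not try to start
# the plugin itself but connects to the existing socket instead.
```

The directory holding the socket would then have to be bind-mounted into every container so that all mount points on a node talk to the same plugin.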
It does indeed get populated. I'll take a deeper look into the RAM cache plugin another day.
I don't think I have a good understanding of how the cache plugin works in this scenario; for me it is failing with:
I tried just mounting a directory and assuming it would create the socket itself. I also tried creating a socket before launching Singularity and binding it in, but that led to the error above. I really don't know much about sockets, so I'm shooting in the dark at this point.
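For what it's worth, the ordering I would expect to work, under the same assumptions as the sketch above (the plugin runs on the host, creates the socket itself, and only the directory containing the socket is bind-mounted into the containers; paths, image, and command are placeholders):

```
# On each compute node, before any container starts:
mkdir -p /var/run/cvmfs-cache
/usr/libexec/cvmfs/cache/cvmfs_cache_ram /etc/cvmfs/cache-ram.options &

# Every MPI task's container then gets the same socket directory bound in:
singularity exec -B /var/run/cvmfs-cache:/var/run/cvmfs-cache client.sif my_command
```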
In EESSI/filesystem-layer#37 I've created a script for an HPC system to create an alien cache in a shared space that can be used by execution nodes that are not connected to the internet (all done inside a Singularity container).
Everything works fine until I try to execute an application in parallel; the problems arise when I try to run an application using MPI (see EESSI/filesystem-layer#38 for details). The basic error is
which leads to a Fatal error. Is there anything I can do to get around this? I suspect it might be due to this envvar
and some kind of race condition (it did work for 2 MPI processes, worked sometimes for 4 and never for 6).