squashfuse Performance #665
Thanks so much for this report and the details on your benchmark! Indeed I was able to reproduce the issue and run many of my own measurements using your benchmark. I have access to 16-core nodes that have both local disk and lustre. They have dual 2.6 GHz Intel E5-2650v2 CPUs and 128G of RAM. Because I had 16 cores I ran my tests with 32 events instead of 200, and I always pre-converted the container to the format being tested. I don't have access to a setuid-root installation on the machine, so I couldn't measure using kernel squashfs (hopefully I can get a sysadmin to cooperate for a test later). I included testing the image from cvmfs at
These are the timings I found in minutes and seconds:
Clearly the time for squashfuse in that last measurement is unacceptable. However, the very good news is that there is an existing squashfuse pull request that adds multithreading support to the squashfuse_ll command. I measured the following with squashfuse_ll (after removing -o uid=NN,gid=NN because that's not supported):
So that makes a huge difference, and I plan to include the patched squashfuse_ll in apptainer packaging for now until the new feature is distributed.
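To illustrate why the multithreading patch helps (this is a rough sketch, not squashfuse's actual code): squashfs data blocks are compressed independently, so a filesystem server with several worker threads can decompress blocks for concurrent readers in parallel, while a single-threaded server must do them one at a time. In Python, zlib releases the GIL on large buffers, so even threads gain real parallelism here:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for independent squashfs data blocks:
# compress 8 chunks, then decompress them serially vs. in a pool.
blocks = [zlib.compress(bytes([i]) * 1_000_000) for i in range(8)]

def decompress(block):
    return zlib.decompress(block)

# One "FUSE thread" servicing everything (the old behavior)...
serial = [decompress(b) for b in blocks]

# ...versus a small worker pool (what the multithreaded patch enables).
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(decompress, blocks))

assert serial == parallel  # same data either way; only wall time differs
```

The results are identical; the point is that independent blocks impose no ordering constraint, so the work parallelizes cleanly across cores.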
Thanks @DrDaveD! Let me know when there is a new Apptainer 1.1.0rc EL7 RPM with the patched squashfuse_ll included and I will be happy to test. However, I would be somewhat concerned about including a patched/development release of squashfuse in a production Apptainer release, as it may lead to stability or other issues. If you decide to proceed that way, could you also please include an Apptainer option to disable the use of squashfuse and revert to the old automatic temporary-sandbox creation behavior, for example as implemented in my PR #668?
Normally I would also be concerned with using an unreleased patch in production code, but this has such a huge impact on the user experience with default apptainer 1.1.0 that I'm willing to risk it and work on fixing any problems that are discovered.
The fix is in #673; it would be great if you could compile it from source and run your benchmark on it. Follow the updated instructions in INSTALL.md for including the enhanced-performance squashfuse_ll in an rpm.
Oh, I forgot: instead of compiling it yourself you can download an rpm (for now, until it gets cleaned up) from this fedora koji scratch build. |
Thanks @DrDaveD. I've installed the RPM from Koji, verified it contains squashfuse_ll, and have started some tests. |
Just an update: I can confirm the patched/multithreaded squashfuse_ll performance is considerably better, and runtimes for the above container are on par with unpacked SIF. Thanks!
I redid the measurements using the same benchmark on a single node (instead of spreading them across comparable nodes), this time also measuring kernel squashfs with setuid; the rest were non-setuid, all with apptainer-1.1.0-rc.3. These are the results, averaged over two runs (no pair of runs differed from each other by more than 1%):
The first 4 are nearly identical, and cvmfs is not far behind.
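The "no run varied by more than 1%" check above can be expressed as a small helper that averages the runs and reports the worst relative deviation from the mean (the timings below are hypothetical, not the actual measurements):

```python
def mean_and_spread(runs):
    """Return the mean of the run times and the maximum relative
    deviation of any single run from that mean."""
    avg = sum(runs) / len(runs)
    spread = max(abs(r - avg) / avg for r in runs)
    return avg, spread

# Hypothetical pair of runs, in seconds:
avg, spread = mean_and_spread([1452.0, 1460.0])
print(avg, spread <= 0.01)  # mean time, and whether the runs agree to 1%
```

Averaging only two runs is coarse, but when the spread is this small it is enough to compare configurations that differ by large factors.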
@hollowec I also tried the cms-gen-sim-bmk with the same parameters and the differences are not as dramatic. I ran a subset of the tests one time each and got the following results:
So the most dramatic change was from standard squashfuse to standard squashfuse_ll; even the multithreading patch didn't make that much difference. My question for you is: are there other benchmarks that I should be trying? Or is atlas-gen-bmk the most stressful of the benchmarks on code storage?
Hi @DrDaveD. lhcb-gen-sim-bmk:v2.1 (options --threads 1 and --events 5) was another container which appeared to be largely affected by the squashfuse performance issue. |
FYI I've run tests against the complete HEPscoreBeta benchmark set (https://gitlab.cern.ch/hep-benchmarks/hep-score - atlas-gen-bmk, cms-gen-sim-bmk, and lhcb-gen-sim-bmk are part of this set), and since the introduction of the patched squashfuse_ll binary in the 1.1.0rc3 release, runtimes are very similar to temporary unpacked SIF. |
Thanks for that additional info. I ran
The multithreaded squashfuse_ll does have a clear advantage over standard squashfuse_ll with this one.
Version of Apptainer
Expected behavior
When using SIF images with unprivileged Apptainer, execution time should be similar to unprivileged Singularity.
Actual behavior
Apptainer's move to squashfuse for unprivileged (user namespace) mounts of SIF images has significantly increased the execution time of some containers, compared to automatically unpacking SIF images to a temporary sandbox as unprivileged Singularity did. I believe this is primarily a concern for containers running multiple processes/threads, as it seems there is a single squashfuse process to handle all of the parallel I/O requests and decompression.
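A toy model of that bottleneck (purely illustrative, not how squashfuse is implemented): when many container processes issue requests but a single server thread drains the queue, total time scales with the number of requests rather than with the core count.

```python
import queue
import threading
import time

requests = queue.Queue()
done = []

def server():
    """Single server thread: services one request at a time,
    like a single-threaded FUSE daemon handling all readers."""
    while True:
        item = requests.get()
        if item is None:
            break
        time.sleep(0.01)  # stand-in for read + decompress work
        done.append(item)

t = threading.Thread(target=server)
t.start()
for i in range(16):   # 16 "parallel" clients, but only one server
    requests.put(i)
requests.put(None)    # sentinel: shut the server down
t.join()

assert len(done) == 16  # everything served, strictly one at a time
```

With 16 clients the serial server takes 16x the per-request cost; adding client-side parallelism cannot help until the server itself is multithreaded, which is what the squashfuse_ll patch addresses.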
Steps to reproduce this behavior
```shell
apptainer run -i -c -e -B /tmp/atlasgen:/results -B /tmp docker://gitlab-registry.cern.ch/hep-benchmarks/hep-workloads/atlas-gen-bmk:v2.1 -W --threads 1 --events 200
```
This is an ATLAS event generation benchmark container that will run a process per logical core on the host. Execution times on a system with 2x AMD EPYC 7351 CPUs (64 logical cores total):
- Singularity with user namespaces (unpack to sandbox): ~24 min
- Apptainer with setuid (squashfs privileged mount): ~25 min
- Apptainer with user namespaces (squashfuse mount): ~2 hours 50 minutes
During execution, I see the squashfuse process using 100% of a single CPU core during most of the run.
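That "100% of a single core" observation can be reproduced without a profiler by sampling the process's cumulative CPU time from Linux /proc (this sketch samples the current process and busy-loops so it has something to measure; to watch squashfuse you would substitute its PID):

```python
import os
import time

def cpu_seconds(pid):
    """Total user+system CPU seconds consumed by a PID (Linux /proc)."""
    with open(f"/proc/{pid}/stat") as f:
        # Split after the ")" closing the comm field, which may contain spaces.
        fields = f.read().rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    return (utime + stime) / os.sysconf("SC_CLK_TCK")

pid = os.getpid()  # replace with the squashfuse PID to monitor it
before = cpu_seconds(pid)
t0 = time.monotonic()
while time.monotonic() - t0 < 0.2:
    pass  # busy-loop so this demo consumes CPU to measure

usage = (cpu_seconds(pid) - before) / (time.monotonic() - t0)
print(f"~{usage:.1f} cores busy")
```

A value pinned near 1.0 over the life of the run, as seen here for squashfuse, means the daemon is saturating exactly one core regardless of how many clients are waiting on it.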
Ideally the default behavior would be to revert to automatically unpacking SIF images when used unprivileged.
What OS/distro are you running
Scientific Linux 7
How did you install Apptainer
RPM from EPEL testing repo.