Clarify intended usage of --container-name #30
Hello @sfeltman, the intent was to save a container state across job steps, for example within an sbatch script or a salloc. In our cluster we had a Slurm epilog to manually clean up the named containers at the end of the job, and the commit above was part of a change to move this cleanup logic into Pyxis directly. We didn't want to allow named containers to be shared across different jobs, since it's usually challenging to make sure you land on the same nodes across jobs. I need to look more into what happens when job arrays are involved; I haven't tested this use case yet. Perhaps there is an unexpected interaction with the SPANK API. By the way, I don't quite understand what you mean by "PID sharing", could you explain?
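For context, the epilog-based cleanup described above could look roughly like the sketch below. This is an assumption-laden illustration, not the actual epilog used on that cluster: the `pyxis_${SLURM_JOB_ID}_` naming convention is inferred from commit a35027c mentioned later in this thread, and the `enroot list` / `enroot remove` commands are standard enroot CLI.

```bash
#!/bin/bash
# Slurm job epilog sketch: remove named enroot containers left behind by
# this job. Assumes pyxis prefixes container names with "pyxis_<jobid>_".
for container in $(enroot list | grep "^pyxis_${SLURM_JOB_ID}_"); do
    enroot remove -f "$container"
done
```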
@sfeltman I see that the Slurm epilog is called for each job of the job array. So how were you planning to clean up the named containers for this use case? I don't see any way to know when the job array is entirely finished on one node.
Hi @flx42, Thanks for the explanation. I think some of my confusion stems from the command-line help, which makes it seem like the feature is more general-purpose. With regards to "PID sharing", I meant container PID re-use from a running container with the same name. In terms of job arrays and cleanup, the idea is to use a new job that depends on the array job's completion to do cleanup on any node the array may have run on.
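The dependency-based cleanup idea above could be submitted along these lines (a sketch; the script names are hypothetical, but `--parsable`, `--array`, and `--dependency=afterany` are standard sbatch options, and `afterany` on an array job ID waits for every task in the array):

```bash
# Submit the array job, capturing its job ID.
ARRAY_JOBID=$(sbatch --parsable --array=0-99 train.sh)

# Run cleanup once the whole array has finished, whether it
# succeeded or failed.
sbatch --dependency=afterany:${ARRAY_JOBID} cleanup.sh
```

As noted in the thread, the hard part is making the cleanup job land on exactly the nodes the array used; the dependency only handles the timing.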
That seems tricky, making sure the follow-up job runs on exactly the same nodes. But at the same time that use case seems similar to #28 |
@sfeltman I just pushed 5a7d900. You should be able to get the previous behavior with a config flag like the following:
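Based on the `container_scope=global` flag mentioned later in this thread, the plugstack entry presumably looks something like the line below. The plugin path is an assumption and varies per install:

```
required /usr/local/lib/slurm/spank_pyxis.so container_scope=global
```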
There will still be a […]
Could you also describe the kind of problems you've seen with containers reusing existing PIDs? It just means it will share the container namespaces, is that an issue? I'm wondering if there is a bug lurking here.
Hi Felix, Thanks for the update. Below I've pasted some records of the errors we were running into. This was with Pyxis version 0.8.1 and enroot 3.1.0. I played with adding a --no-container-pid-reuse option, which fixed the issue. However, this was on top of the master branch, so it may have also been conflated with other changes since 0.8.1...
Ok, it's probably a race condition between the different jobs here, for instance if the job being joined terminates while the new one is starting up.
With regards to sharing the container namespace, does this mean the cgroups resources are actually shared, or are the limit values just copied? With array jobs, each job in the array is an independent job using the same limit values, but each would have its own CPU/GPU/memory allocation.
The cgroups should still be per-job, but it will get a bit weird for the jobs reusing the initial container, since they will join the cgroup namespace while being under a cgroup outside of this namespace. |
I just confirmed that the current HEAD, 5a7d900, without any of my changes, still exhibits the problems I mentioned when sharing the container name between array jobs (using the container_scope=global option).
Yes, this aspect is more tricky and for now I'm not too keen on adding another command-line argument for this, since the main intended use case is to have named containers with a job-level scope. |
Is it possible to just specify a path to the sqsh files? I just want to run srun with an option to pyxis/enroot to use that sqsh file.
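For reference, pyxis accepts a squashfs file path directly via `--container-image`, which is what the reply below refers to. A minimal sketch, with hypothetical paths and image names:

```bash
# Import the image once with enroot, then reuse the .sqsh file across jobs.
enroot import -o /shared/images/ubuntu.sqsh docker://ubuntu:22.04
srun --container-image=/shared/images/ubuntu.sqsh cat /etc/os-release
```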
Oops, sorry. That's simple. I overlooked that part in the docs. Thank you. |
I think this is solved now, closing. |
I mean that we're probably not going to add a knob for disabling PID sharing when a container exists, at least not right now. |
We had been attempting to use --container-name to share enroot containers across Slurm job arrays. This ended up having a lot of issues due to PID sharing between the array jobs running on the same machine (we didn't know it did this until reading the Pyxis code). While this could be fixed with some sort of option to disable PID sharing, commit a35027c added a prefix of "pyxis_$JOBID" to the container name, which would then break the idea.
Please clarify the intended usage of --container-name. We had been hoping to use it for speeding up array jobs that use big containers on the same machine and manually managing the enroot container import directory before/after the job array.
Thanks