Felix Abecassis edited this page Jul 27, 2020 · 21 revisions

Overview

Pyxis is a SPANK plugin, so the command-line arguments it introduces are added directly to srun.

$ srun --help
...
      --container-image=[USER@][REGISTRY#]IMAGE[:TAG]|PATH
                              [pyxis] the image to use for the container
                              filesystem. Can be either a docker image given as
                              an enroot URI, or a path to a squashfs file on the
                              remote host filesystem.

      --container-mounts=SRC:DST[:FLAGS][,SRC:DST...]
                              [pyxis] bind mount[s] inside the container. Mount
                              flags are separated with "+", e.g. "ro+rprivate"

      --container-workdir=PATH
                              [pyxis] working directory inside the container
      --container-name=NAME   [pyxis] name to use for saving and loading the
                              container on the host. Unnamed containers are
                              removed after the slurm task is complete; named
                              containers are not. If a container with this name
                              already exists, the existing container is used and
                              the import is skipped.
      --container-save=PATH   [pyxis] Save the container state to a squashfs
                              file on the remote host filesystem.
      --container-mount-home  [pyxis] bind mount the user's home directory.
                              System-level enroot settings might cause this
                              directory to be already-mounted.

      --no-container-mount-home
                              [pyxis] do not bind mount the user's home
                              directory
      --container-remap-root  [pyxis] ask to be remapped to root inside the
                              container. Does not grant elevated system
                              permissions, despite appearances.

      --no-container-remap-root
                              [pyxis] do not remap to root inside the container
      --container-entrypoint  [pyxis] execute the entrypoint from the container
                              image

      --no-container-entrypoint
                              [pyxis] do not execute the entrypoint from the
                              container image

--container-image

This argument activates the Pyxis plugin and containerizes the submitted job. If no container registry is specified, the image will be pulled from Docker Hub:

$ srun --container-image=centos grep PRETTY /etc/os-release
PRETTY_NAME="CentOS Linux 8 (Core)"

You can pull the container image from any container registry, as you would with the docker CLI:

$ srun --container-image nvcr.io/nvidia/pytorch:20.03-py3
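The srun help text describes the image argument as [USER@][REGISTRY#]IMAGE[:TAG]. As a minimal sketch of how such a string is assembled, the function below builds one from its parts. The name image_uri is hypothetical and not part of Pyxis or enroot; it only formats a string and does not contact any registry.

```shell
# image_uri: hypothetical helper, not part of Pyxis or enroot.
# Usage: image_uri REGISTRY IMAGE [TAG]
# Pass "" as REGISTRY to omit it (the image is then pulled from Docker Hub).
image_uri() {
    registry=$1
    image=$2
    tag=${3:-}
    uri=$image
    if [ -n "$registry" ]; then
        uri="$registry#$uri"      # REGISTRY and IMAGE are separated by "#"
    fi
    if [ -n "$tag" ]; then
        uri="$uri:$tag"           # the optional TAG follows a ":"
    fi
    printf '%s\n' "$uri"
}

# Preview the resulting command without running it.
echo "srun --container-image=$(image_uri nvcr.io nvidia/pytorch 20.03-py3)"
```

The example above this block uses the docker-style form with a slash after the registry; the "#" form shown here is the one spelled out in the help text.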

You can use a squashfs file (from --container-save or enroot export) by passing its path as the argument:

$ srun --container-image ~/ubuntu.sqsh

If this file is on a shared filesystem, this avoids pulling the same image on every node of your cluster.
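One way to take advantage of a shared squashfs file is to prefer it when it exists and fall back to the registry otherwise. The sketch below assumes a hypothetical helper named image_source and a hypothetical path /shared/images/ubuntu.sqsh; neither comes from Pyxis, and the final line only prints the srun command rather than running it.

```shell
# image_source: hypothetical helper, not part of Pyxis or enroot.
# Prints the squashfs path if that file exists, otherwise the registry image.
image_source() {
    sqsh_path=$1
    registry_image=$2
    if [ -f "$sqsh_path" ]; then
        printf '%s\n' "$sqsh_path"
    else
        printf '%s\n' "$registry_image"
    fi
}

# Assumed location on a shared filesystem; adjust for your cluster.
image=$(image_source /shared/images/ubuntu.sqsh ubuntu)
echo "srun --container-image $image grep PRETTY /etc/os-release"
```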

--container-mounts

This argument can be used to expose folders or files from the host system to the container. It is similar to the -v (or --mount type=bind) argument of docker run.

For instance, to bind-mount the /mnt folder from the host as /data inside the container:

$ srun --container-image ubuntu --container-mounts /mnt:/data ls /data

Using the same syntax, you can also mount files:

$ srun --container-image ubuntu --container-mounts /etc/os-release:/host/os-release cat /host/os-release

If the source and destination are identical, you can use the short form with a single path:

$ srun --container-image ubuntu --container-mounts /mnt ls /mnt

You can also use relative paths (using the job's current working directory):

$ srun --container-image ubuntu --container-mounts ./config:/root/config cat /root/config

Finally, you can use additional mount flags such as ro (read-only), to prevent the container from unintentionally modifying the content from the host:

$ srun --container-image ubuntu --container-mounts /tmp/config:/root/config:ro sh -c 'echo oops > /root/config'
/usr/bin/sh: 1: cannot create /root/config: Read-only file system
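According to the help text, several mounts can be passed in one argument, separated by commas, and several flags for one mount are joined with "+". A minimal sketch for assembling such a value from individual mount specifications follows; join_mounts is a hypothetical helper, not something Pyxis provides, and the last line only prints the command.

```shell
# join_mounts: hypothetical helper, not part of Pyxis.
# Joins its arguments with commas, matching SRC:DST[:FLAGS][,SRC:DST...].
# The function body runs in a subshell so the IFS change does not leak out.
join_mounts() (
    IFS=,
    printf '%s\n' "$*"
)

mounts=$(join_mounts /mnt:/data /tmp/config:/root/config:ro+rprivate)
echo "srun --container-image ubuntu --container-mounts $mounts ls /data"
```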

--container-name

This argument saves the state of the container filesystem so that it can be reused across srun commands. It is similar to docker run --name, and is useful for running or installing additional tools required by the application.

# The file utility is not installed by default.
$ srun --container-image=ubuntu:20.04 which file
srun: error: luna-0173: task 0: Exited with exit code 1

# The following command creates a named container with the name "myubuntu", starting from the ubuntu 20.04 image.
$ srun --container-image=ubuntu:20.04 --container-name=myubuntu sh -c 'apt-get update && apt-get install -y file'

# Use the container filesystem created above; you no longer need to specify --container-image.
$ srun --container-name=myubuntu which file
/usr/bin/file
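The two steps above could be collected in a script. The sketch below is one possible arrangement, assuming both steps run on the same node so that the named container is found again. The run wrapper and the install_and_check function are hypothetical names; with DRY_RUN left at its default of 1, the commands are only printed, and quoting of the sh -c argument is not preserved in that preview.

```shell
# run: hypothetical wrapper. Prints the command when DRY_RUN=1 (the default
# here) and executes it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_and_check: hypothetical name for the two-step workflow.
install_and_check() {
    # Step 1: create the named container and install the file utility in it.
    run srun --container-image=ubuntu:20.04 --container-name=myubuntu \
        sh -c 'apt-get update && apt-get install -y file'
    # Step 2: reuse the saved filesystem; --container-image is not needed.
    run srun --container-name=myubuntu which file
}

install_and_check
```

Set DRY_RUN=0 on a cluster with Pyxis installed to execute the commands instead of printing them.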

If the container is running, --container-name will behave like docker exec. This is particularly useful on the login node of the cluster combined with --jobid, to join a running container without having to ssh to the compute node:

# From a compute node, or inside an sbatch script
$ srun --container-name=myapp --container-mounts /mnt:/data ./myapp

# From the login node
$ srun --jobid=432788 --container-name=myapp findmnt /data
TARGET SOURCE               FSTYPE OPTIONS
/data  /dev/nvme2n1p2[/mnt] ext4   rw,relatime,errors=remount-ro

As you will land in the same container, this approach can be used to debug or profile your app with gdb, perf_events, strace, etc.
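If you join running containers often, the pattern above can be wrapped in a small function. This is only a sketch: container_exec_cmd is a hypothetical name, and the function prints the srun command instead of executing it so that you can inspect it first.

```shell
# container_exec_cmd: hypothetical helper, not part of Pyxis or Slurm.
# Usage: container_exec_cmd JOBID NAME COMMAND [ARGS...]
# Prints the srun command that would join the named container of a running job.
container_exec_cmd() {
    jobid=$1
    name=$2
    shift 2
    echo "srun --jobid=$jobid --container-name=$name $*"
}

# Job ID and container name taken from the example above.
container_exec_cmd 432788 myapp findmnt /data
```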
