Commit 8f16334

Merge remote-tracking branch 'NERSCSSH/rtd' into rtd

scanon committed Aug 29, 2021
2 parents 64b94b9 + 53e1fa8
Showing 4 changed files with 101 additions and 63 deletions.
53 changes: 31 additions & 22 deletions doc/command/shifter.rst
@@ -1,57 +1,66 @@
.. _shifter-command:

``shifter`` command
===================

Synopsis
--------
*shifter* [options] *command* [command options]

Description
-----------
The ``shifter`` command generates or attaches to an existing Shifter container environment
and launches a process within that container environment. This is done with
minimal overhead to ensure that container creation and process execution are
done as quickly as possible in support of High Performance Computing needs.

Options
-------
``-i`` \| ``--image``
    Image selection specification
``-V`` \| ``--volume``
    Volume bind mount
``-h`` \| ``--help``
    This help text
``-v`` \| ``--verbose``
    Increased logging output
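
For example, a minimal invocation might look like the following (the image name
and bind-mount paths are illustrative, not defaults):

.. code-block:: bash

   shifter --image=docker:ubuntu:14.04 --volume=/scratch/$USER:/data /bin/bash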

Image Selection
---------------
Shifter identifies the desired image by examining its environment and command
line options. In order of precedence, Shifter selects the image by looking at the
following sources:

- the ``SHIFTER`` environment variable, containing both the image type and image specifier
- the ``SHIFTER_IMAGE`` and ``SHIFTER_IMAGETYPE`` environment variables
- the ``SLURM_SPANK_SHIFTER_IMAGE`` and ``SLURM_SPANK_SHIFTER_IMAGETYPE`` environment variables
- the ``--image`` command line option

Thus, the batch system can set effective defaults for image selection by
manipulating the job environment; however, the user can always override them by
specifying the ``--image`` command line argument.

The format of ``--image`` and the ``SHIFTER`` environment variable is the same::

    imageType:imageSpecifier

where ``imageType`` is typically ``docker`` but could be another, site-defined type.
``imageSpecifier`` depends somewhat on the ``imageType``; for ``docker``, the
image gateway typically assigns the sha256 hash of the image manifest as
the specifier.
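
For example, the following two invocations select the same image (the image
name is illustrative); if both are set, the command line option wins:

.. code-block:: bash

   # via the combined environment variable
   export SHIFTER=docker:ubuntu:14.04
   shifter /bin/hostname

   # via the command line option, which overrides the environment
   shifter --image=docker:ubuntu:14.04 /bin/hostname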

Shifter will check whether the global environment already has a Shifter
image configured matching the user's arguments. If a compatible image is already
set up on the system, the existing environment will be used to launch the
requested process. If not, Shifter will generate a new mount namespace and
set up a new Shifter environment. This ensures that multiple Shifter instances
can be used simultaneously on the same node. Note that each Shifter instance
will consume at least one loop device; thus, it is recommended that sites allow
for at least two available loop devices per Shifter instance that might
reasonably be started on a compute node. At NERSC, we allow up to 128 loop
devices per compute node.
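
How a site raises the loop device limit is system-specific; on hosts that build
``loop`` as a kernel module, one hedged sketch is:

.. code-block:: bash

   # ask the loop module for 128 devices at load time (file name and
   # mechanism vary by distribution and kernel version)
   echo "options loop max_loop=128" > /etc/modprobe.d/loop.conf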

User-Specified Volume Mounts
----------------------------

.. todo:: Add documentation for user-specified volume mounts.
4 changes: 2 additions & 2 deletions doc/conf.py
@@ -28,7 +28,7 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.todo']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -101,7 +101,7 @@
#keep_warnings = False

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True


# -- Options for HTML output ----------------------------------------------
2 changes: 2 additions & 0 deletions doc/modules.rst
@@ -1,3 +1,5 @@
.. _shifter-modules:

Shifter Modules
===============

105 changes: 66 additions & 39 deletions doc/mpi/mpich_abi.rst
@@ -1,22 +1,29 @@
MPI Support in Shifter: MPICH ABI
=================================

MPICH and its many variants agreed in 2014 to retain ABI compatibility to
help improve development practices. However, this ABI compatibility also
provides a clear path for almost transparently supporting MPI within Shifter
container environments.

The basic idea is that the container developer will use a fairly vanilla
version of MPICH and dynamically link their application against it. The
Shifter-hosting site then configures Shifter to inject its site-specific
version of MPICH (perhaps a Cray, Intel, or IBM variant) linked to the
interconnect and workload manager driver libraries. The site-specific version
of ``libmpi.so`` then overrides the version in the container, and the application
automatically uses it instead of the generic version originally included in the
container.


--------------------------------
Container Developer Instructions
--------------------------------
Here is an example Dockerfile:

.. code-block:: docker

   FROM ubuntu:14.04
   RUN apt-get update && apt-get install -y autoconf automake gcc g++ make gfortran

@@ -36,35 +43,44 @@

Going through the above:

#. base from a common distribution, *e.g.*, ``ubuntu:14.04``,
#. install compiler tools to get a minimal dev environment,
#. get and install MPICH 3.2 (see the sketch after this list),
#. add and compile your application,
#. set up the environment to easily access your application.
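
The body of the Dockerfile is collapsed in the diff above; purely as an
illustration of steps 3 and 4 (not the hidden lines themselves), a from-source
MPICH 3.2 build typically looks something like:

.. code-block:: bash

   # fetch, build, and install a vanilla MPICH to link against
   wget https://www.mpich.org/static/downloads/3.2/mpich-3.2.tar.gz
   tar xf mpich-3.2.tar.gz
   cd mpich-3.2 && ./configure && make && make install
   # then build the application with the resulting wrapper, e.g.:
   # mpicc -o /app/hello hello.c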

To construct the above container, one would do something like:

.. code-block:: bash

   docker build -t dmjacobsen/mpitest:latest .

(setting the tag appropriately, of course).
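
Before the site's image gateway can pull the image (as shown in the next
section), it must be pushed to a registry the gateway can reach; for Docker Hub
that is simply (tag illustrative):

.. code-block:: bash

   docker push dmjacobsen/mpitest:latest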


-----------------------
Slurm User Instructions
-----------------------
If the MPICH ABI environment is configured correctly (see below), it should be
very easy to run the application. Building from the example above:

.. code-block:: bash

   dmj@cori11:~> shifterimg pull dmjacobsen/mpitest:latest
   2016-08-05T01:14:59 Pulling Image: docker:dmjacobsen/mpitest:latest, status: READY
   dmj@cori11:~> salloc --image=dmjacobsen/mpitest:latest -N 4 --exclusive
   salloc: Granted job allocation 2813140
   salloc: Waiting for resource configuration
   salloc: Nodes nid0[2256-2259] are ready for job
   dmj@nid02256:~> srun shifter hello
   hello from 2 of 4 on nid02258
   hello from 0 of 4 on nid02256
   hello from 1 of 4 on nid02257
   hello from 3 of 4 on nid02259
   dmj@nid02256:~> srun -n 128 shifter hello
   hello from 32 of 128 on nid02257
   hello from 46 of 128 on nid02257

@@ -79,66 +95,77 @@

   hello from 29 of 128 on nid02256
   hello from 30 of 128 on nid02256
   hello from 31 of 128 on nid02256
   dmj@nid02256:~> exit
   salloc: Relinquishing job allocation 2813140
   salloc: Job allocation 2813140 has been revoked.
   dmj@cori11:~>

-------------------------------------------------------
System Administrator Instructions: Configuring Shifter
-------------------------------------------------------

The basic plan is to gather the ``libmpi.so*`` libraries and symlinks and copy them
into the container at runtime. This may require some dependencies to also be
copied, but hopefully only the most limited set possible. The current
recommendation is to copy these libraries into ``/opt/udiImage/<type>/lib64`` and
all the dependencies into ``/opt/udiImage/<type>/lib64/dep``.

We then use ``patchelf`` to rewrite the RPATH of all copied libraries to point to
``/opt/udiImage/<type>/lib64/dep``.
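
For a single copied library, that rewrite might look like the sketch below
(the library name is illustrative, and ``<type>`` is a placeholder as above):

.. code-block:: bash

   # point the copied MPI library at the bundled copies of its dependencies
   patchelf --set-rpath /opt/udiImage/<type>/lib64/dep \
       /opt/udiImage/<type>/lib64/libmpi.so.12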

The source libraries must be prepared ahead of time using one of the helper
scripts provided in the extras directory, or a variant of same. As we get
access to different types of systems, we will post more helper scripts and
system-type-specific instructions.

Finally, we need to force ``LD_LIBRARY_PATH`` in the container to include
``/opt/udiImage/<type>/lib64``.


Cray
++++
Run the ``prep_cray_mpi_libs.py`` script to prepare the libraries:

.. code-block:: bash

   login$ python /path/to/shifterSource/extra/prep_cray_mpi_libs.py /tmp/craylibs

.. note::

   In CLE5.2 this should be done on an internal login node; in CLE6 an
   internal or external login node should work. You'll need to install
   `PatchELF <https://github.com/NixOS/patchelf>`_ into your ``PATH`` prior to running.

Next, copy ``/tmp/craylibs/mpich-<version>`` to your Shifter module path (see :ref:`shifter-modules`),
*e.g.*, ``/usr/lib/shifter/opt/mpich-<version>``.
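
That copy might look like the following (``<version>`` is a placeholder):

.. code-block:: bash

   # -a preserves the symlinks gathered by prep_cray_mpi_libs.py
   cp -a /tmp/craylibs/mpich-<version> /usr/lib/shifter/opt/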

Finally, a few modifications need to be made to ``udiRoot.conf`` (a consolidated
sketch follows the list):

#. add ``module_mpich_siteEnvPrepend = LD_LIBRARY_PATH=/opt/udiImage/modules/mpich/lib64``
#. add ``module_mpich_copyPath = /usr/lib/shifter/opt/mpich-<version>``
#. add ``/var/opt/cray/alps:/var/opt/cray/alps:rec`` to ``siteFs``
#. if CLE6, add ``/etc/opt/cray/wlm_detect:/etc/opt/cray/wlm_detect`` to ``siteFs``
#. add ``defaultModules = mpich`` to load ``cray-mpich`` support by default in all containers
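
Assembled, the relevant ``udiRoot.conf`` fragment might look like the sketch
below (``<version>`` is a placeholder; this assumes space-separated ``siteFs``
entries and elides any entries your site already defines)::

    module_mpich_siteEnvPrepend = LD_LIBRARY_PATH=/opt/udiImage/modules/mpich/lib64
    module_mpich_copyPath = /usr/lib/shifter/opt/mpich-<version>
    defaultModules = mpich
    siteFs = /var/opt/cray/alps:/var/opt/cray/alps:rec /etc/opt/cray/wlm_detect:/etc/opt/cray/wlm_detect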

.. note::

   You may need to modify your ``sitePreMountHook`` script to create
   ``/var/opt/cray`` and ``/etc/opt/cray`` prior to the mounts.

Instead of setting up ``module_mpich_copyPath``, you could use ``siteFs`` to bind-mount
the content into the container instead, which may have performance benefits in some
environments, *e.g.*, set ``module_mpich_siteFs = /usr/lib/shifter/modules/mpich:/shifter/mpich``.
In that case you'll need to adjust the ``module_mpich_siteEnvPrepend`` paths and pre-create
the ``/shifter`` directory using the ``sitePreMountHook``.

------

Other MPICH variants/vendors coming soon. If you have something not listed
here, please contact shifter-hpc@googlegroups.com!
