Merge pull request #188 from casparvl/host_injections
Host injections, lmod hooks and moving gpu-related host-injection instructions
xinan1911 committed Jul 2, 2024
2 parents 25d5d8f + 085c890 commit 7cc0b4a
Showing 10 changed files with 268 additions and 52 deletions.
2 changes: 1 addition & 1 deletion docs/adding_software/debugging_failed_builds.md
@@ -61,7 +61,7 @@ If you want to install NVIDIA GPU software, make sure to also add the `--nvidia
While the above works perfectly well, you might not be able to complete your debugging session in one go. With the above approach, several steps will just be repeated every time you start a debugging session:

- Downloading the container
- Installing `CUDA` in your [host injections](../gpu.md#host_injections) directory (only if you use the `EESSI-install-software.sh` script, see below)
- Installing `CUDA` in your [host injections](../site_specific_config/host_injections.md) directory (only if you use the `EESSI-install-software.sh` script, see below)
- Installing all dependencies (before you get to the package that actually fails to build)

To avoid this, we create two directories. One holds the container & `host_injections`, which are (typically) common between multiple PRs and thus you don't have to redownload the container / reinstall the `host_injections` if you start working on another PR. The other will hold the PR-specific data: a tarball storing the software you'll build in your interactive debugging session. The paths we pick here are just examples, you can pick any persistent, writeable location for this:
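
A minimal sketch of what this could look like (all paths and the PR number below are purely illustrative):

```bash
# Shared between PRs: the container image and the host_injections directory
mkdir -p /path/to/shared/eessi-debug/containers
mkdir -p /path/to/shared/eessi-debug/host_injections
# Specific to the PR you are debugging: tarball(s) with the software you build
mkdir -p /path/to/shared/eessi-debug/pr-1234
```
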
2 changes: 1 addition & 1 deletion docs/adding_software/opening_pr.md
@@ -97,7 +97,7 @@ git push koala example_branch
If all goes well, one or more bots :robot: should almost instantly create a comment in your pull request
with an overview of how it is configured - you will need this information when providing build instructions.

### Rebuilding software
### Rebuilding software {: #rebuilding_software }
We typically do not rebuild software, since (strictly speaking) this breaks reproducibility for anyone using the software. However, there are certain situations in which it is difficult or impossible to avoid.

To do a rebuild, you add the software you want to rebuild to a dedicated easystack file in the `rebuilds` directory. Use the following naming convention: `YYYYMMDD-eb-<EB_VERSION>-<APPLICATION_NAME>-<APPLICATION_VERSION>-<SHORT_DESCRIPTION>.yml`, where `YYYYMMDD` is the opening date of your PR. E.g. `2024.05.06-eb-4.9.1-CUDA-12.1.1-ship-full-runtime.yml` was added in a PR on the 6th of May 2024 and used to rebuild CUDA-12.1.1 using EasyBuild 4.9.1 to resolve an issue with some runtime libraries missing from the initial CUDA 12.1.1 installation.
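
As a sketch of what such an easystack file could contain (the directory layout and the listed easyconfig are illustrative, not the exact contents used in that PR):

```bash
# Illustrative only: add a rebuild easystack file following the naming convention
mkdir -p rebuilds
cat > rebuilds/2024.05.06-eb-4.9.1-CUDA-12.1.1-ship-full-runtime.yml << 'EOF'
# Rebuild CUDA 12.1.1 to resolve an issue with missing runtime libraries
easyconfigs:
  - CUDA-12.1.1.eb
EOF
```
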
38 changes: 6 additions & 32 deletions docs/gpu.md → docs/site_specific_config/gpu.md
@@ -3,7 +3,7 @@
More information on the actions that must be performed to ensure that GPU software included in EESSI
can use the GPU in your system is available below.

[Please open a support issue](support.md) if you need help or have questions regarding GPU support.
[Please open a support issue](../support.md) if you need help or have questions regarding GPU support.

!!! tip "Make sure the `${EESSI_VERSION}` version placeholder is defined!"
In this page, we use `${EESSI_VERSION}` as a placeholder for the version of the EESSI repository,
@@ -39,33 +39,7 @@ An additional requirement is necessary if you want to be able to compile CUDA-en

Below, we describe how to make sure that the EESSI software stack can find your NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK.

### `host_injections` variant symlink {: #host_injections }

In the EESSI repository, a special directory has been prepared where system administrators can install files that can be picked up by
software installations included in EESSI. This gives administrators the ability to influence the behaviour (and capabilities) of the EESSI software stack.

This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*:
a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)).

!!! info "Default target for `host_injections` variant symlink"

Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system:
```
$ ls -l /cvmfs/software.eessi.io/host_injections
lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi
```

As an example, let's imagine that we want to use an architecture-specific location on a shared filesystem as the target for the symlink. This has the advantage that one can make changes under `host_injections` that affect all nodes which share that CernVM-FS configuration. Configuring this in your CernVM-FS configuration would mean adding the following line in the client configuration file:

```{ .ini .copy }
EESSI_HOST_INJECTIONS=/shared_fs/path
```

!!! note "Don't forget to reload the CernVM-FS configuration"
After making a change to a CernVM-FS configuration file, you also need to reload the configuration:
```{ .bash .copy }
sudo cvmfs_config reload
```
### Configuring CUDA driver location {: #driver_location }

All CUDA-enabled software in EESSI expects the CUDA drivers to be available in a specific subdirectory of this `host_injections` directory.
In addition, installations of the CUDA SDK included in EESSI are stripped down to the files that we are allowed to redistribute;
@@ -80,7 +54,7 @@ If the corresponding full installation of the CUDA SDK is available there, the C

### Using NVIDIA GPUs via a native EESSI installation {: #nvidia_eessi_native }

Here, we describe the steps to enable GPU support when you have a [native EESSI installation](getting_access/native_installation.md) on your system.
Here, we describe the steps to enable GPU support when you have a [native EESSI installation](../getting_access/native_installation.md) on your system.

!!! warning "Required permissions"
To enable GPU support for EESSI on your system, you will typically need to have system administration rights, since you need write permissions on the target directory of the `host_injections` symlink.
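
As a sketch, assuming the default `/opt/eessi` target and an example account that will run the installation scripts:

```bash
# Create the default host_injections target and make it writeable
# for the (example) account that runs the installation scripts
sudo mkdir -p /opt/eessi
sudo chown -R eessi-admin:eessi-admin /opt/eessi
```
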
@@ -108,14 +82,14 @@ To install a full CUDA SDK under `host_injections`, use the `install_cuda_host_i
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
```

For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](#host_injections) points to,
For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](host_injections.md) points to,
using `/tmp/$USER/EESSI` as the directory to store temporary files:
```
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --cuda-version 12.1.1 --temp-dir /tmp/$USER/EESSI --accept-cuda-eula
```
You should choose the CUDA version you wish to install according to what CUDA versions are included in EESSI;
see the output of `module avail CUDA/` after [setting up your environment for using
EESSI](using_eessi/setting_up_environment.md).
EESSI](../using_eessi/setting_up_environment.md).
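
For example, a quick way to check this (a sketch, assuming a bash shell) is:

```bash
# Initialise the EESSI environment, then list the CUDA versions it provides
source /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/init/bash
module avail CUDA/
```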

You can run `/cvmfs/software.eessi.io/scripts/install_cuda_host_injections.sh --help` to check all of the options.

@@ -139,7 +113,7 @@ We focus here on the [Apptainer](https://apptainer.org/)/[Singularity](https://s
and have only tested the [`--nv` option](https://apptainer.org/docs/user/latest/gpu.html#nvidia-gpus-cuda-standard)
to enable access to GPUs from within the container.
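
A minimal sketch of what that looks like (the image name is purely illustrative):

```bash
# The --nv option exposes the host's NVIDIA drivers and GPU devices inside the container
apptainer shell --nv my_container.sif
```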

If you are using the [EESSI container](getting_access/eessi_container.md) to access the EESSI software,
If you are using the [EESSI container](../getting_access/eessi_container.md) to access the EESSI software,
the procedure for enabling GPU support is slightly different and will be documented here eventually.

#### Exposing NVIDIA GPU drivers
48 changes: 48 additions & 0 deletions docs/site_specific_config/host_injections.md
@@ -0,0 +1,48 @@
# How to configure EESSI

## Why configuration is necessary

Just [installing EESSI](../getting_access/native_installation.md) is enough to get started with the EESSI software stack on a CPU-based system. However, additional configuration is necessary in many other cases, such as:

- enabling GPU support on GPU-based systems
- site-specific configuration / tuning of the MPI libraries provided by EESSI
- overriding EESSI's MPI library with an ABI compatible host MPI

## The `host_injections` variant symlink

To allow such site-specific configuration, the EESSI repository includes a special directory where system administrators can install files that can be picked up by the software installations included in EESSI. This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*:
a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)).

!!! info "Default target for `host_injections` variant symlink"

Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system:
```
$ ls -l /cvmfs/software.eessi.io/host_injections
lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi
```

The target for this symlink can be controlled by setting the `EESSI_HOST_INJECTIONS` variable in your local CVMFS configuration for EESSI. E.g.
```{bash}
sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > /etc/cvmfs/domain.d/eessi.io.local"
```

!!! note "Don't forget to reload the CernVM-FS configuration"
After making a change to a CernVM-FS configuration file, you also need to reload the configuration:
```{ .bash .copy }
sudo cvmfs_config reload
```
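
After reloading, you can verify that the variant symlink points to the configured target, e.g.:

```bash
# The symlink target should now match the EESSI_HOST_INJECTIONS value you set
ls -l /cvmfs/software.eessi.io/host_injections
# lrwxrwxrwx ... /cvmfs/software.eessi.io/host_injections -> /shared_fs/path/to/host/injections
```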

On a heterogeneous system, you may want to use different targets for the variant symlink for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location, or not of the same version. Since those are both things we configure under `host_injections`, you'll need separate `host_injections` directories for each node type. That can easily be achieved by putting e.g.

```{bash}
sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu1/' > /etc/cvmfs/domain.d/eessi.io.local"
```

in the CVMFS config on the `gpu1` nodes, and

```{bash}
sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu2/' > /etc/cvmfs/domain.d/eessi.io.local"
```
in the CVMFS config on the `gpu2` nodes.
192 changes: 192 additions & 0 deletions docs/site_specific_config/lmod_hooks.md
@@ -0,0 +1,192 @@
# Configuring site-specific Lmod hooks
You may want to customize what happens when certain modules are loaded; for example, you may want to set additional environment variables. This is possible with [Lmod hooks](https://lmod.readthedocs.io/en/latest/170_hooks.html). A typical example would be when you want to tune the OpenMPI module for your system by setting additional environment variables when an OpenMPI module is loaded.


## Location of the hooks
The EESSI software stack provides its own set of hooks in `$LMOD_PACKAGE_PATH/SitePackage.lua`. This `SitePackage.lua` also searches for site-specific hooks in two additional locations:

- `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua`
- `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/$EESSI_OS_TYPE/$EESSI_SOFTWARE_SUBDIR/.lmod/SitePackage.lua`

The first allows for hooks that need to be executed for that system, irrespective of the CPU architecture. The second allows for hooks specific to a certain architecture.
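
To make these locations concrete, here is a sketch of how the variables could expand on a client (the values are examples for an EESSI 2023.06 installation on an AMD Zen3 node):

```bash
echo $EESSI_CVMFS_REPO        # e.g. /cvmfs/software.eessi.io
echo $EESSI_VERSION           # e.g. 2023.06
echo $EESSI_OS_TYPE           # e.g. linux
echo $EESSI_SOFTWARE_SUBDIR   # e.g. x86_64/amd/zen3
# With those values, the architecture-specific hook file would be:
# /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
```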

## Architecture-independent hooks
Hooks are written in Lua and can use any of the standard Lmod functionality as described in the [Lmod documentation](https://lmod.readthedocs.io/en/latest/170_hooks.html). While there are many types of hooks, you most likely want to specify a load or unload hook. Note that the EESSI hooks provide a nice example of what you can do with hooks. Here, as an example, we will define a `load` hook that sets the environment variable `MY_ENV_VAR` to `1` whenever an `OpenMPI` module is loaded.

First, you typically want to load the necessary Lua packages:
```lua
-- $EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua

-- The Strict package checks for the use of undeclared variables:
require("strict")

-- Load the Lmod Hook package
local hook=require("Hook")
```

Next, we define a function that we want to use as a hook. Unfortunately, registering multiple hooks of the same type (e.g. multiple `load` hooks) is only supported in Lmod 8.7.35+. EESSI version 2023.06 uses Lmod 8.7.30. Thus, we define our function without the `local` keyword, so that we can still add to it later in an architecture-specific hook (if we wanted to):

```lua
-- Define a function for the hook
-- Note that we define this without 'local' keyword
-- That way we can still add to this function in an architecture-specific hook
function set_my_env_var_openmpi(t)
local simpleName = string.match(t.modFullName, "(.-)/")
if simpleName == 'OpenMPI' then
setenv('MY_ENV_VAR', '1')
end
end
```

For the same reason that multiple hooks cannot be registered, we need to combine this site-specific (architecture-independent) hook function with the function that implements the EESSI `load` hook. Note that all EESSI hooks are called `eessi_<hook_type>_hook` by convention.

```lua
-- Registering multiple hook functions, e.g. multiple load hooks is only supported in Lmod 8.7.35+
-- EESSI version 2023.06 uses lmod 8.7.30. Thus, we first have to combine all functions into a single one,
-- before registering it as a hook
local function combined_load_hook(t)
-- Call the EESSI load hook (if it exists)
-- Note that if you wanted to overwrite the EESSI hooks (not recommended!), you would omit this
if eessi_load_hook ~= nil then
eessi_load_hook(t)
end
-- Call the site-specific load hook
set_my_env_var_openmpi(t)
end
```

Then, we can finally register this function as an Lmod hook:

```lua
hook.register("load", combined_load_hook)
```

Thus, our complete `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua` now looks like this (omitting the comments):

```lua
require("strict")
local hook=require("Hook")

function set_my_env_var_openmpi(t)
local simpleName = string.match(t.modFullName, "(.-)/")
if simpleName == 'OpenMPI' then
setenv('MY_ENV_VAR', '1')
end
end

local function combined_load_hook(t)
if eessi_load_hook ~= nil then
eessi_load_hook(t)
end
set_my_env_var_openmpi(t)
end

hook.register("load", combined_load_hook)
```

Note that for future EESSI versions, if they use Lmod 8.7.35+, this would be simplified to:

```lua
require("strict")
local hook=require("Hook")

local function set_my_env_var_openmpi(t)
local simpleName = string.match(t.modFullName, "(.-)/")
if simpleName == 'OpenMPI' then
setenv('MY_ENV_VAR', '1')
end
end

hook.register("load", set_my_env_var_openmpi, "append")
```
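
Once this file is in place, a quick sanity check (a sketch; the available OpenMPI module version will differ per system) is:

```bash
module load OpenMPI
echo $MY_ENV_VAR    # should print 1 if the load hook ran
```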

## Architecture-dependent hooks
Now, assume that in addition we want to set an environment variable `MY_SECOND_ENV_VAR` to `5`, but only for nodes that have the `zen3` architecture. First, again, you typically want to load the necessary Lua packages:

```lua
-- $EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua

-- The Strict package checks for the use of undeclared variables:
require("strict")

-- Load the Lmod Hook package
local hook=require("Hook")
```

Next, we define the function for the hook itself:

```lua
-- Define a function for the hook
-- This time, we can define it as a local function, as there are no hooks more specific than this
local function set_my_second_env_var_openmpi(t)
local simpleName = string.match(t.modFullName, "(.-)/")
if simpleName == 'OpenMPI' then
setenv('MY_SECOND_ENV_VAR', '5')
end
end
```

Then, we combine the functions into one

```lua
local function combined_load_hook(t)
-- Call the EESSI load hook first
if eessi_load_hook ~= nil then
eessi_load_hook(t)
end
-- Then call the architecture-independent load hook
if set_my_env_var_openmpi ~= nil then
set_my_env_var_openmpi(t)
end
-- And finally the architecture-dependent load hook we just defined
set_my_second_env_var_openmpi(t)
end
```

before finally registering it as an Lmod hook

```lua
hook.register("load", combined_load_hook)
```

Thus, our full `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua` now looks like this (omitting the comments):

```lua
require("strict")
local hook=require("Hook")

local function set_my_second_env_var_openmpi(t)
local simpleName = string.match(t.modFullName, "(.-)/")
if simpleName == 'OpenMPI' then
setenv('MY_SECOND_ENV_VAR', '5')
end
end

local function combined_load_hook(t)
if eessi_load_hook ~= nil then
eessi_load_hook(t)
end
if set_my_env_var_openmpi ~= nil then
set_my_env_var_openmpi(t)
end
set_my_second_env_var_openmpi(t)
end

hook.register("load", combined_load_hook)
```

Again, note that for future EESSI versions, if they use Lmod 8.7.35+, this would simplify to

```lua
require("strict")
local hook=require("Hook")

local function set_my_second_env_var_openmpi(t)
local simpleName = string.match(t.modFullName, "(.-)/")
if simpleName == 'OpenMPI' then
setenv('MY_SECOND_ENV_VAR', '5')
end
end

hook.register("load", set_my_second_var_openmpi, "append")
```
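
On a `zen3` node, a quick sanity check (again a sketch) is that both hooks fire when an OpenMPI module is loaded:

```bash
module load OpenMPI
echo $MY_ENV_VAR $MY_SECOND_ENV_VAR    # expected output: 1 5
```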