diff --git a/docs/alps/index.md b/docs/alps/index.md
index 9b95316f7..2df21a571 100644
--- a/docs/alps/index.md
+++ b/docs/alps/index.md
@@ -1,43 +1,26 @@
-!!! construction "Page under construction - last update: 2024-09-06"
-
-    Information in this page is not yet complete nor final. It will be updated following the progress of
-
-    - the Alps system deployment at CSCS
-    - C2SM's adaptation to this new system
-
 # The Alps System
 
 Alps is a distributed HPC infrastructure managed by CSCS. Contrary to traditional HPC, it is composed of several logical units called vClusters (versatile clusters). From the user's perspective, each vCluster plays the role of a traditional HPC machine, tailored to the needs of a specific community. This setup also enables the geographical distribution of vClusters, which facilitates geo-redundancy.
 
 The main physical piece of Alps is hosted at CSCS in Lugano and a detailed description can be found on [their website :material-open-in-new:](https://www.cscs.ch/computers/alps){:target="_blank"}.
 
 ## vClusters
 
-The following table shows the current plan for the final vClusters distribution on Alps at CSCS (not the current situation).
+The following table shows the current vCluster distribution on Alps at CSCS (only C2SM-relevant vClusters are shown).
 
 | vCluster | Activity          | Share            |
 |----------|-------------------|------------------|
-| Daint    | User Lab          | ~ 800 GH nodes   |
+| Santis   | Weather & Climate | ~ 500 GH nodes   |
+| Daint    | User Lab          | ~ 600 GH nodes   |
 | Clariden | Machine Learning  | ~ 800 GH nodes   |
-| Santis   | Weather & Climate | ~ 400 GH nodes   |
-| Tödi     | Testing           | few GH nodes     |
 | Eiger    |                   | multi-core nodes |
 
 *GH = Grace Hopper*
 
-## Early Access
-
-For getting access to the vCluster dedicated to testing ([Tödi](vclusters.md/#todi){:target="_blank"}), CSCS offers [Preparatory Projects :material-open-in-new:](https://www.cscs.ch/user-lab/allocation-schemes/preparatory-projects){:target="_blank"}.
-
 ## Support by CSCS
 
-To contact CSCS staff directly, users can join their dedicated [Slack channel :material-open-in-new:](https://cscs-users.slack.com){:target="_blank"}.
-
-## File Systems
+General information about access, file systems, vClusters, user environments and much more can be found in the [CSCS Knowledge Base :material-open-in-new:](https://confluence.cscs.ch/display/KB){:target="_blank"}.
 
-!!! note "TODO"
+To contact CSCS staff directly, users can join their dedicated [Slack workspace :material-open-in-new:](https://cscs-users.slack.com){:target="_blank"}, with dedicated channels for each vCluster.
 
-    - [ ] `/users`, `/store` and `/scratch`
-    - [ ] reserved space per vClsuter vs shared space
-    - [ ] ...
 
 ## Introductory Workshop Material
diff --git a/docs/alps/uenvs.md b/docs/alps/uenvs.md
index b9106dd46..99f33320d 100644
--- a/docs/alps/uenvs.md
+++ b/docs/alps/uenvs.md
@@ -1,10 +1,3 @@
-!!! construction "Page under construction - last update: 2024-09-06"
-
-    Information in this page is not yet complete nor final. It will be updated following the progress of
-
-    - the Alps system deployment at CSCS
-    - C2SM's adaptation to this new system
-
 # User Environments
 
 Software stacks at CSCS are now accessible through the so-called User Environments (uenv). They replace the previous monolithic software stack, which contained everything and from which one could load any module, with all the potential conflicts that involves. User environments contain the minimal software stack required for a certain activity, say, building and running ICON. They are generated by `spack`, packed into a single `squashfs` file and then mounted by the user. In a way, they can be considered poor man's containers.
@@ -29,7 +22,7 @@ The user environments provided by CSCS are registered in a central database. In
 
 !!! warning
 
-    Old software stack images didn't have a mount point in the metadata which is now required for the new versions of the `uenv` tool and its `--uenv` slurm plugin counterpart. If you have images in your local repository that are older than roughly September 5th, please pull them again. It will only update the metadata
+    Old software stack images didn't have a mount point in the metadata, which is now required for the new versions of the `uenv` tool and its `--uenv` slurm plugin counterpart. If you have images in your local repository that are older than roughly September 5th, please pull them again - it will only update the metadata.
 
 ## The `uenv` command line tool
diff --git a/docs/alps/vclusters.md b/docs/alps/vclusters.md
index 366c9f329..c7c64570b 100644
--- a/docs/alps/vclusters.md
+++ b/docs/alps/vclusters.md
@@ -1,10 +1,3 @@
-!!! construction "Page under construction - last update: 2024-09-06"
-
-    Information in this page is not yet complete nor final. It will be updated following the progress of
-
-    - the Alps system deployment at CSCS
-    - C2SM's adaptation to this new system
-
 # Supported vClusters
 
 This page hosts information about C2SM-supported vClusters (not all CSCS vClusters).
 
@@ -27,56 +20,76 @@ Host balfrin* daint* santis* todi*
     ProxyJump ela
 ```
 
-This would allow standard connections like `ssh santis` but also specifying the login node like `ssh santis-ln002` if needed. Replace `cscsusername` with your actual user name.
+This allows standard connections like `ssh santis`, but you can also specify a login node if needed, e.g., `ssh santis-ln002`. Replace `cscsusername` with your actual username.
 
-## Daint
-
-Daint (Alps) is the vCluster dedicated to the User Lab. It is currently accessible at `daint.alps.cscs.ch` (until the current Piz Daint gets decommissioned), so connect with `ssh daint.alps` with the `ssh` settings above.
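For reference, a complete `~/.ssh/config` along these lines might look as follows. This is only a sketch: the `ela` jump-host entry and the `%h.cscs.ch` host-name pattern are assumptions, not taken from the docs above.

```
# Hypothetical complete ~/.ssh/config; replace cscsusername with your account
Host ela
    HostName ela.cscs.ch
    User cscsusername

Host balfrin* daint* santis* todi*
    HostName %h.cscs.ch
    User cscsusername
    ProxyJump ela
```

With such a configuration, `ssh santis` first hops through `ela` and then reaches the vCluster's login nodes.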
+## Santis
 
-Even though Weather and Climate also has the dedicated vCluster Santis (see [below](#santis)), traditional projects might also land on Daint.
+The vCluster `santis` is dedicated to **Climate and Weather** and may initially host only [EXCLAIM :material-open-in-new:](https://c2sm.ethz.ch/research/exclaim.html){:target="_blank"} and related projects.
 
-### Uenvs
+### Deployment Status
 
-List of currently supported Uenvs on Daint:
+Currently, the deployment is approximately 95% complete.
 
-| uenv                     | activity                       | Remark              |
-|--------------------------|--------------------------------|---------------------|
-| icon-vx:rcy              | build and run ICON             | Not deployed (yet?) |
-| netcdf-tools/2024:v1-rc1 | pre- and post-processing tools |                     |
+### Differences to the Environment on `todi`
 
-### Storage
+- `$HOME` is now on a new NFS file system.
+    - Your folder `/users/$USER` will initially be mostly empty.
+    - The NFS system still requires fine-tuning, and file system performance may be low.
+    - We recommend running tasks, especially heavy ones, on `$SCRATCH`.
+- `todi`'s `$HOME` is mounted as `/users.OLD/$USER`.
+    - ⚠️ The mount is read-only!
+    - You are responsible for copying your data from `/users.OLD/$USER` to `/users/$USER/...`.
+    - The mount is temporary and will be removed by the end of January 2025.
 
-!!! note "TODO"
+!!! info
 
-    - [ ] Storage
+    Although deployment work will continue over the upcoming days, users are invited to access the system already, start familiarising themselves with it, and begin migrating the data from their old home.
 
-## Santis
+    The activities on the CSCS side should not require any reboot; however, some services might need to be restarted, e.g., SLURM. This could lead to short interruptions or even failing jobs. CSCS will provide more information in the upcoming days and will try to minimise the risk of interference by consolidating changes.
 
-!!! warning "Santis has not been deployed yet."
+### Uenvs
 
-Santis is dedicated to Weather and Climate. It might, at the beginning, only host [EXCLAIM :material-open-in-new:](https://c2sm.ethz.ch/research/exclaim.html){:target="_blank"} and related projects.
+To find and use the already existing uenvs from `todi`, you need to set the `CLUSTER_NAME` environment variable:
 
-### Uenvs
+```shell
+export CLUSTER_NAME=todi
+uenv image find
+```
 
-| uenv                     | activity                       |
-|--------------------------|--------------------------------|
-| icon-vx:rcy              | build and run ICON             |
-| netcdf-tools/tag:version | pre- and post-processing tools |
+| uenv                       | activity                       |
+|----------------------------|--------------------------------|
+| `icon-wcp/v1:rc4`          | build and run ICON             |
+| `netcdf-tools/2024:v1-rc1` | pre- and post-processing tools |
 
 ### Storage
 
 !!! note "TODO"
 
-    - [ ] Storage
+    The migration of the previous storage is not yet finished. Once there is an update from CSCS, we will inform you here. Also note that the environment variables `$STORE` and `$PROJECT` are not yet set.
 
+## Daint
 
-## Tödi
+Daint (Alps) is the vCluster dedicated to the **User Lab**. It is currently accessible at `daint.alps.cscs.ch` (until the current Piz Daint gets decommissioned), so connect with `ssh daint.alps` using the `ssh` settings above.
 
-Tödi is the testing vCluster and is currently deployed on the most of the Alps system.
+Even though Climate and Weather also has the dedicated vCluster `santis` (see [above](#santis)), traditional projects might also land on Daint.
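Once a suitable image shows up in `uenv image find`, the typical next steps are to pull it and start a shell inside it. A sketch, assuming the `uenv image pull` and `uenv start` subcommands of the CSCS `uenv` tool (to be run on the cluster itself, not locally):

```shell
# Make the uenvs registered for todi visible, as described above
export CLUSTER_NAME=todi

# Download the image into the local repository, then start a shell
# with the environment mounted
uenv image pull icon-wcp/v1:rc4
uenv start icon-wcp/v1:rc4
```

In batch jobs, the same image can be passed to Slurm via the `--uenv` plugin mentioned in the uenv documentation above.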
 ### Uenvs
 
-| uenv                     | activity                       |
-|--------------------------|--------------------------------|
-| icon-wcp/v1:rc4          | build and run ICON             |
-| netcdf-tools/2024:v1-rc1 | pre- and post-processing tools |
+As on `santis`, you can access the uenvs from `todi`:
+
+```shell
+export CLUSTER_NAME=todi
+uenv image find
+```
+
+| uenv                       | activity                       |
+|----------------------------|--------------------------------|
+| `icon-wcp/v1:rc4`          | build and run ICON             |
+| `netcdf-tools/2024:v1-rc1` | pre- and post-processing tools |
+
+### Storage
+
+!!! note "TODO"
+
+    The migration of the previous storage is not yet finished. Once there is an update from CSCS, we will inform you here.
+
diff --git a/docs/models/icon/usage.md b/docs/models/icon/usage.md
index 4963f088c..09699c45a 100644
--- a/docs/models/icon/usage.md
+++ b/docs/models/icon/usage.md
@@ -13,77 +13,31 @@ Once you have access, clone the repository from GitHub using the SSH protocol:
 
 ## Configure and compile
 
-### Piz Daint
-Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.
-
-**1. Set up a Spack instance**
-
-To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/cscs/SPACK_TAG_C2SM :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/cscs/SPACK_TAG_C2SM){:target="_blank"} and load it into your command line.
-
-**2. Build ICON**
-
-Refer to the official spack-c2sm documentation for [installing ICON using Spack :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#icon){:target="_blank"}.
-
-After the first compilation, you need to create a `setting` file (the following example is for Piz Daint, please adapt the lines according to the machine you are using):
-
-=== "daint_gpu_nvhpc"
-    ```shell
-    # Get SPACK_TAG used on machine
-    SPACK_TAG=$(cat "config/cscs/SPACK_TAG_C2SM")
-    # Set the name of the environment, which should be equal to the builder
-    ENV_NAME=daint_gpu_nvhpc
-    # Load probtest environment (only needed if you want to run check files)
-    source /project/g110/icon/probtest/conda/miniconda/bin/activate probtest
-    # Ensure CDO is loaded on your machine
-    module load daint-gpu CDO
-    # Remove and create setting file with the following two commands
-    rm -f setting
-    ./config/cscs/create_sh_env $SPACK_TAG $ENV_NAME
-    ```
-
-### Euler
-Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.
-
-**1. Set up a Spack instance**
-
-To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/ethz/SPACK_TAG_EULER :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/ethz/SPACK_TAG_EULER){:target="_blank"} and load it into your command line.
+### Säntis
+!!! construction "Under construction - last update: 2024-12-18"
+
+    Information on this section is not yet complete nor final. It will be updated following the progress of the Alps system deployment at CSCS and C2SM's adaptation to this new system. Please use the [C2SM support forum :material-open-in-new:](https://github.com/C2SM/Tasks-Support/discussions){:target="_blank"} in case of questions regarding building ICON on Alps.
-Euler Support recommends to compile code on compute-nodes. Unfortunately [internet-access on Euler compute-nodes is restricted :material-open-in-new:](https://scicomp.ethz.ch/wiki/Accessing_the_clusters#Internet_Security){:target="_blank"}.
-Therefore a two-step install needs to be performed:
+Currently, the same ICON user environment as on `todi` is used. Since the environment is still linked to `todi`, you need to set `CLUSTER_NAME` to `todi` for now:
 
 ```bash
-# fetch and install cosmo-eccodes-definitions on login-node
-spack install cosmo-eccodes-definitions
-
-# compile ICON on compute-nodes
-srun -N 1 -c 12 --mem-per-cpu=20G spack install -v -j 12
+export CLUSTER_NAME=todi
 ```
-
-### Todi
-
-!!! construction "Under construction - last update: 2024-09-20"
-
-    Information on this section is not yet complete nor final. It will be updated following the progress of the Alps system deployment at CSCS and C2SM's adaptation to this new system. Please use the [C2SM support forum :material-open-in-new:](https://github.com/C2SM/Tasks-Support/discussions){:target="_blank"} in case of questions regarding building ICON on Alps.
-
-On Todi, Spack is also used to build ICON. However, there is no suitable `spack.yaml` file present for the Spack environment. Therefore, create a `spack.yaml` file and use the software stack upstream provided by the user environment.
+Next, follow the instructions below to build ICON using Spack.
 
 **1. Create a `spack.yaml` file**
 
-Create the following files from the ICON build folder (different to the ICON root folder in case of a out-of-source build).
+Create the following files from the ICON build folder (different from the ICON root folder in the case of an out-of-source build). For that, you will have to create the missing folders first:
+
+```bash
+mkdir -p config/cscs/spack/v0.21.1.3/alps_cpu_nvhpc
+mkdir -p config/cscs/spack/v0.21.1.3/alps_gpu_nvhpc
+```
 
 For CPU compilation:
 
-=== "config/cscs/spack/v0.21.1.3/todi_cpu_nvhpc/spack.yaml"
+=== "config/cscs/spack/v0.21.1.3/alps_cpu_nvhpc/spack.yaml"
 
     ```yaml
     spack:
@@ -101,7 +55,7 @@ For CPU compilation:
 
 For GPU compilation:
 
-=== "config/cscs/spack/v0.21.1.3/todi_gpu_nvhpc/spack.yaml"
+=== "config/cscs/spack/v0.21.1.3/alps_gpu_nvhpc/spack.yaml"
 
     ```yaml
     spack:
@@ -131,18 +85,65 @@ git clone --depth 1 --recurse-submodules --shallow-submodules -b ${SPACK_TAG} ht
 
 # Build ICON
 cd /path/to/icon-build-folder
-spack env activate -d config/cscs/spack/${SPACK_TAG}/todi_gpu_nvhpc
+spack env activate -d config/cscs/spack/${SPACK_TAG}/alps_gpu_nvhpc
 spack install
 ```
 
+### Euler
+Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.
+
+**1. Set up a Spack instance**
+
+To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/ethz/SPACK_TAG_EULER :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/ethz/SPACK_TAG_EULER){:target="_blank"} and load it into your command line.
+
-### Santis
-Please follow the instructions for Todi, but run the following before loading the ICON user-environment:
+**2. Build ICON**
+Activate the Spack environment for Euler:
 
 ```bash
-export CLUSTER_NAME=todi
+SPACK_TAG=$(cat "config/ethz/SPACK_TAG_EULER")
+spack env activate -d config/ethz/spack/$SPACK_TAG/euler_cpu_gcc
 ```
 
+Euler Support recommends compiling code on compute nodes. Unfortunately, [internet access on Euler compute nodes is restricted :material-open-in-new:](https://scicomp.ethz.ch/wiki/Accessing_the_clusters#Internet_Security){:target="_blank"}.
+Therefore, a two-step install needs to be performed:
+
+```bash
+# fetch and install cosmo-eccodes-definitions on the login node
+spack install cosmo-eccodes-definitions
+
+# compile ICON on compute nodes
+srun -N 1 -c 12 --mem-per-cpu=20G spack install -v -j 12
+```
+
+### Piz Daint
+Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.
+
+**1. Set up a Spack instance**
+
+To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/cscs/SPACK_TAG_C2SM :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/cscs/SPACK_TAG_C2SM){:target="_blank"} and load it into your command line.
+
+**2. Build ICON**
+
+Refer to the official spack-c2sm documentation for [installing ICON using Spack :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#icon){:target="_blank"}.
+
+After the first compilation, you need to create a `setting` file (the following example is for Piz Daint; please adapt the lines to the machine you are using):
+
+=== "daint_gpu_nvhpc"
+    ```shell
+    # Get the SPACK_TAG used on the machine
+    SPACK_TAG=$(cat "config/cscs/SPACK_TAG_C2SM")
+    # Set the name of the environment, which should be equal to the builder
+    ENV_NAME=daint_gpu_nvhpc
+    # Load the probtest environment (only needed if you want to run check files)
+    source /project/g110/icon/probtest/conda/miniconda/bin/activate probtest
+    # Ensure CDO is loaded on your machine
+    module load daint-gpu CDO
+    # Remove and recreate the setting file with the following two commands
+    rm -f setting
+    ./config/cscs/create_sh_env $SPACK_TAG $ENV_NAME
+    ```
+
 ## Run test case
 
 In the *run* folder, you find many prepared test cases, which you can convert into run scripts. To generate the runscript of one of the experiment files, e.g.
 *mch_ch_lowres*, you can use the `make_runscripts` function.
diff --git a/docs/posts/2024-12-17_alps_update.md b/docs/posts/2024-12-17_alps_update.md
new file mode 100644
index 000000000..c97c1820f
--- /dev/null
+++ b/docs/posts/2024-12-17_alps_update.md
@@ -0,0 +1,14 @@
+---
+date:
+  created: 2024-12-17
+categories:
+  - Alps
+---
+
+# Update on the current status of Alps
+
+Last week, CSCS deployed the Climate and Weather vCluster `santis`. As some fine-tuning is still ongoing, the [Santis section](../alps/vclusters.md#santis) provides an overview of how to transition from `todi` to `santis`.
+
+
+
+Additionally, all information about the new vClusters, as well as how to use User Environments (uenvs), has been updated.
\ No newline at end of file