Commit

content updates
mjpritchard committed Feb 13, 2024
1 parent 7af7be1 commit 80e4666
Showing 5 changed files with 215 additions and 229 deletions.
57 changes: 29 additions & 28 deletions content/docs/batch-computing/lotus-cluster-specification.md
---
aliases: /article/4932-lotus-cluster-specification
date: 2023-03-13 13:50:34
description: LOTUS cluster specification
slug: lotus-cluster-specification
tags:
title: LOTUS cluster specification
---

## Current cluster specification

LOTUS is a cluster of over 300 nodes/hosts and 19,000 CPU cores. A node/host is
an individual computer in the cluster with more than one processor. Each
node/host belongs to a specific host group. The number of processors (CPUs or
cores) per host is listed in Table 1, together with the corresponding processor
model and the amount of physical memory (RAM) available per node/host.

**Table 1**. LOTUS cluster specification

**Current** host groups

Host group name | Number of nodes/hosts | Processor model | CPUs per host | RAM
---|---|---|---|---
broadwell256G | 37 | Intel Xeon E5-2640-v4 "Broadwell" | 20 | 256 GB
skylake348G | 151 | Intel Xeon Gold-5118 "Skylake" | 24 | 348 GB
epyctwo1024G | 200 | AMD | 48 | 1024 GB
{.table .table-striped}

## Selection of specific processor model

To select a node/host with a specific processor model and memory, add the
following Slurm directive to your job script:

```bash
#SBATCH --constraint="<host-group-name>"
```

For example

```bash
#SBATCH --constraint="skylake348G"
```

{{< alert type="info" >}}
Further notes

`intel` and `amd` node types are defined in the Slurm configuration as a feature:

- For any Intel node type use `#SBATCH --constraint="intel"`
- For a specific Intel CPU model use the host group name (see Table 1), e.g. `#SBATCH --constraint="skylake348G"`
- For AMD use `#SBATCH --constraint="amd"`
- There are 10 nodes of node type `skylake348G` with an SSD disk mounted on `/tmp`
- LOTUS nodes of node type `epyctwo1024G` are not available yet on the `par-multi` queue
{{< /alert >}}

{{< alert type="danger" >}}
If you choose to compile code for specific architectures, do not expect it to run elsewhere in the system.
{{< /alert >}}
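
The snippet below is a sketch only: the job name, queue, wall time and memory
values are illustrative placeholders, and the constraint is one of the host
group names from Table 1.

```bash
#!/bin/bash
#SBATCH --job-name=constraint-example   # illustrative job name
#SBATCH --partition=short-serial        # example queue; choose one suited to your workload
#SBATCH --time=01:00:00                 # illustrative wall-time request
#SBATCH --mem=8G                        # illustrative memory request
#SBATCH --constraint="broadwell256G"    # host group name taken from Table 1

# Print the allocated node name to confirm the constraint was honoured
hostname
```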

## Retired host groups no longer in use

(For reference only)

Host group name | Number of nodes/hosts | Processor model | CPUs per host | RAM
---|---|---|---|---
~~haswell256G~~ | ~~7~~ retired | ~~Intel Xeon E5-2650-v3 "Haswell"~~ | ~~20~~ | ~~256 GB~~
~~ivybridge2000G~~ | ~~3~~ retired | ~~Intel Xeon E7-4860-v2 "Ivy Bridge"~~ | ~~48~~ | ~~2048 GB~~
{.table .table-striped}


54 changes: 25 additions & 29 deletions content/docs/batch-computing/orchid-gpu-cluster.md
This article provides details on JASMIN's GPU
cluster, named **Orchid**.

## GPU cluster spec

The JASMIN GPU cluster is composed of 16 GPU nodes:

- 14 x standard GPU nodes with 4 NVIDIA A100 GPU cards each
- 2 x large GPU nodes with 8 NVIDIA A100 GPU cards each

{{< image src="img/docs/gpu-cluster-orchid/file-NZmhCFPJx9.png" caption="ORCHID GPU cluster" >}}

## Request access to Orchid

Access to the GPU cluster (and the GPU interactive node) is controlled by
membership of the Slurm account `orchid`. Please request access via the link
below, which will take you to the ORCHID service page on the JASMIN accounts
portal:

{{<button href="https://accounts.jasmin.ac.uk/services/additional_services/orchid/" >}}Apply here{{</button>}}

**Note:** In the supporting info on the request form, please provide details
of the software and the workflow that you will use/run on ORCHID.

## Test a GPU job

Testing a job on the JASMIN Orchid GPU cluster can be carried out
interactively by launching a pseudo-shell terminal Slurm job from a JASMIN
scientific server, e.g. `sci2`:

{{<command user="user" host="sci2">}}
srun --gres=gpu:1 --partition=orchid --account=orchid --pty /bin/bash
{{</command>}}
{{<command user="user" host="gpuhost16">}}
## you are now on gpuhost16
{{</command>}}


The GPU node `gpuhost016` is allocated for this interactive session on LOTUS.

Note that for batch mode, a GPU job is submitted using the Slurm command
`sbatch`:

{{<command user="user" host="sci2">}}
sbatch --partition=orchid --account=orchid --gres=gpu:1 myjobscript
{{</command>}}

or by adding the following preamble in the job script file

```bash
#SBATCH --partition=orchid
#SBATCH --account=orchid
#SBATCH --gres=gpu:1
```
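
For completeness, a minimal batch script is sketched below; the job name, wall
time, memory value and the use of `nvidia-smi` as a placeholder workload are
assumptions, while the partition, account and GPU request follow the
directives above.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test        # illustrative job name
#SBATCH --partition=orchid         # ORCHID partition
#SBATCH --account=orchid           # ORCHID Slurm account
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --time=01:00:00            # illustrative wall time, within the 24-hour limit
#SBATCH --mem=16G                  # illustrative memory request

# List the GPU(s) visible to this job as a simple check
nvidia-smi
```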

Note 1: `gpuhost015` and `gpuhost016` are the two largest nodes with 64 CPUs and
8 GPUs.

Note 2: **CUDA Version: 11.6**

Note 3: The Slurm batch partition/queue `orchid` has a maximum runtime of 24
hours and the default runtime is 1 hour. The maximum number of CPU cores per
user is limited to 8. If this limit is exceeded, the job is expected to remain
in a pending state with the reason {{<mark>}}QOSGrpCpuLimit{{</mark>}}.

## GPU interactive node

There is an interactive GPU node `gpuhost001.jc.rl.ac.uk`, with the same spec
as other Orchid nodes, which you can access via a JASMIN login server to
prototype and test your GPU code prior to running it as a batch job.


{{<command user="user" host="login1">}}
ssh -A gpuhost001.jc.rl.ac.uk
{{</command>}}
{{<command user="user" host="gpuhost001">}}
## you are now on gpuhost001
{{</command>}}

Available software includes:
- Singularity 3.7.0 - which supports NVIDIA/GPU containers
- SCL Python 3.6
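
As a sketch of how the container support might be used (the image name is a
placeholder, not a real file), Singularity's `--nv` flag exposes the host GPUs
inside a container:

{{<command user="user" host="gpuhost001">}}
singularity exec --nv my-gpu-image.sif nvidia-smi
{{</command>}}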


82 changes: 42 additions & 40 deletions content/docs/batch-computing/slurm-queues.md

The Slurm queues in the LOTUS cluster are:

- `test`
- `short-serial`
- `long-serial`
- `par-single`
- `par-multi`
- `high-mem`
- `short-serial-4hr`

Each queue has attributes of run-length limits (e.g. short, long) and
resources. A full breakdown of each queue and its associated resources is
shown below in Table 1.

## Queue details

Queues represent a set of pending jobs, lined up in a defined order, and
waiting for their opportunity to use resources. The queue is specified in the
job script file using a Slurm scheduler directive like this:

```bash
#SBATCH -p <queue_name>
```

where `<queue_name>` is the name of the queue/partition (Table 1, column 1).
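
For instance, picking one of the queues from Table 1 (the choice of `high-mem`
here is purely illustrative):

```bash
#SBATCH -p high-mem
```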

Table 1 summarises important specifications for each queue such as run time
limits and the number of CPU core limits. If the queue is not specified, Slurm
will schedule the job to the queue `short-serial` by default.

Table 1. LOTUS/Slurm queues and their specifications

Queue name | Max run time | Default run time | Max CPU cores per job | MaxCpuPerUserLimit | Priority
---|---|---|---|---|---
`test` | 4 hrs | 1hr | 8 | 8 | 30
`short-serial` | 24 hrs | 1hr | 1 | 2000 | 30
`par-single` | 48 hrs | 1hr | 16 | 300 | 25
`par-multi` | 48 hrs | 1hr | 256 | 300 | 20
`long-serial` | 168 hrs | 1hr | 1 | 300 | 10
`high-mem` | 48 hrs | 1hr | 1 | 75 | 30
`short-serial-4hr`<br>(**Note 3**) | 4 hrs | 1hr | 1 | 1000 | 30
{.table .table-striped}

**Note 1**: Resources requested by a job must be within the resource
allocation limits of the selected queue.

**Note 2:** The default value for `--time=[hh:mm:ss]` (predicted maximum wall
time) is 1 hour for all queues. If you do not specify this option
and/or your job exceeds the default maximum run time limit, then it will be
terminated by the Slurm scheduler.

**Note 3**: A user must specify the Slurm job account `--account=short4hr`
when submitting a batch job to the `short-serial-4hr` queue.
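
Combining Notes 2 and 3, a minimal sketch of such a submission might look like
the following; the wall time and the placeholder executable are illustrative
assumptions.

```bash
#!/bin/bash
#SBATCH --partition=short-serial-4hr   # queue from Table 1
#SBATCH --account=short4hr             # account required for this queue (Note 3)
#SBATCH --time=03:30:00                # explicit wall time within the 4-hour limit (Note 2)

# Placeholder workload
./my_analysis_task
```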

## State of queues

The Slurm command `sinfo` reports the state of queues and nodes
managed by Slurm. It has a wide variety of filtering, sorting, and formatting
options.

{{<command shell="bash">}}
sinfo
{{</command>}}

## `sinfo` output field description

By default, the Slurm command `sinfo` displays the following information:

- **PARTITION**: Partition name, followed by **\*** for the default queue/partition
- **AVAIL**: State/availability of a queue/partition: up or down.
- **TIMELIMIT**: The maximum run time limit per job in each queue/partition, shown as days-hours:minutes:seconds, e.g. 2-00:00:00 is a two-day maximum runtime limit
- **NODES**: Count of nodes with this particular configuration, e.g. 48 nodes
- **STATE**: State of the nodes. Possible states include: allocated, down, drained, and idle. For example, the state "idle" means that the node is not allocated to any jobs and is available for use.
- **NODELIST**: List of node names associated with this queue/partition

The `sinfo` example below reports more complete information about the
partition/queue `short-serial`:

{{<command>}}
sinfo --long --partition=short-serial
(out)short-serial* up 1-00:00:00 1-infinite no NO all 48 idle host[146-193]
{{</command>}}

## How to choose a Slurm queue/partition

### Test queue

The `test` queue can be used to test new workflows and also to help new
users to familiarise themselves with the Slurm batch system. Both serial and
parallel code can be tested on the `test` queue. The maximum runtime is 4 hrs
and the maximum number of jobs per user is 8 job slots. The maximum number of
cores for a parallel job (e.g. MPI, OpenMP, or multi-threaded) is limited to 8.
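
For example (the task count, wall time and script name are illustrative), a
quick trial submission to the `test` queue could be:

{{<command>}}
sbatch --partition=test --ntasks=4 --time=00:10:00 < myjobscript
{{</command>}}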
#### par-single

Multi-threaded, shared-memory parallel jobs (e.g. OpenMP) should be
submitted to the `par-single` queue. Each thread should be allocated one CPU
core. Oversubscribing the number of threads to the CPU cores will cause the
job to run very slow. The number of CPU cores should be specified via the
submission command line `sbatch -n <number of CPU cores>` or by adding the
Slurm directive `#SBATCH -n <number of CPU cores>` in the job script file. An
example is shown below:

{{<command>}}
sbatch --ntasks=4 --partition=par-single < myjobscript
{{</command>}}

Note: Jobs submitted with a number of CPU cores greater than 16 will be
terminated (killed) by the Slurm scheduler, with a corresponding statement in
the job output file.
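
As a sketch only (the thread count, wall time and program name are
illustrative assumptions), a `par-single` job script for a multi-threaded
program might look like:

```bash
#!/bin/bash
#SBATCH --partition=par-single
#SBATCH -n 8                        # one CPU core per thread, within the 16-core limit
#SBATCH --time=02:00:00             # illustrative wall time

# Match the thread count to the allocated cores to avoid oversubscription
export OMP_NUM_THREADS=$SLURM_NTASKS

./my_threaded_program               # placeholder executable
```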

#### par-multi

Distributed memory jobs with inter-node communication using the MPI library
should be submitted to the `par-multi` queue. A single MPI process (rank)
should be allocated a single CPU core. The number of CPU cores should be
specified via the Slurm submission command flag `sbatch -n <number of CPU
cores>` or by adding the Slurm directive `#SBATCH -n <number of CPU cores>`
to the job script file. An example is shown below:

{{<command>}}
sbatch --ntasks=4 --partition=par-multi < myjobscript
{{</command>}}

Note 1: The number of CPU cores gets passed from the Slurm submission flag
`-n`. Do not add the `-np` flag to the `mpirun` command.

Note 2: Slurm will reject a job that requires a number of CPU cores greater
than the limit of 256.
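
As a sketch only (the core count, wall time and program name are illustrative
assumptions), a `par-multi` job script consistent with the notes above might
look like:

```bash
#!/bin/bash
#SBATCH --partition=par-multi
#SBATCH -n 48                       # one core per MPI rank, within the 256-core limit
#SBATCH --time=06:00:00             # illustrative wall time

# mpirun takes the task count from Slurm, so no -np flag is given (Note 1)
mpirun ./my_mpi_program             # placeholder executable
```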
