<img src="./workshop/New_accre_logo.png" style="width: 400px;">

# 1.0 Overview of the Class Environment & SLURM

This notebook introduces the basics of using the ACCRE cluster at Vanderbilt. It gives an overview of the class environment, which is configured as an ACCRE compute cluster, and lets you experiment with basic commands of the [SLURM cluster management](https://slurm.schedmd.com/overview.html) system.

### Learning Objectives

The goals of this notebook are to:
* Understand the hardware configuration available for the class
* Understand the basic commands for job submission with SLURM
* Run simple test scripts allocating different GPU resources
* Connect interactively to a compute node and observe available resources

**[1.1 The Hardware Configuration Overview](#1.1-The-Hardware-Configuration-Overview)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[1.1.1 Check The Available CPUs](#1.1.1-Check-The-Available-CPUs)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.1.2 Check the Available GPUs](#1.1.2-Check-The-Available-GPUs)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.1.3 Check The Interconnect Topology](#1.1.3-Check-The-Interconnect-Topology)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.1.4 Bandwidth & Connectivity Tests](#1.1.4-Bandwidth-and-Connectivity-Tests)<br>
**[1.2 Basic SLURM Commands](#1.2-Basic-SLURM-Commands)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[1.2.1 Check the SLURM Configuration](#1.2.1-Check-the-SLURM-Configuration)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.2.2 Submit Jobs Using SRUN Command](#1.2.2-Submit-jobs-using-SRUN-Command)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.2.3 Submit Jobs Using SBATCH Command](#1.2.3-Submit-jobs-using-SBATCH-Command)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.2.4 Exercise: Submit Jobs Using SBATCH Command Requesting More Resources](#1.2.4-Exercise-Submit-jobs-using-SBATCH-Command)<br>
**[1.3 Run Interactive Sessions](#1.3-Run-Interactive-Sessions)<br>**

---
# 1.1 The Hardware Configuration Overview


A modern AI cluster is a type of infrastructure designed for optimal Deep Learning model development. ACCRE has GPU servers suitable for scalable AI development. Click the link to learn more about [ACCRE GPUs](https://help.accre.vanderbilt.edu/index.php?title=GPUs_at_ACCRE).

Different deliveries of this course may have different hardware configurations. For benchmarking purposes, we will use 2 NVLink-connected A100/A6000 GPUs as a reference. NVLink is a flexible and scalable interconnect technology that supports a variety of multi-GPU interconnect topologies and bandwidths.

<img  src="nvlink_configurability-624x289.png" width="1000"/>

The hardware for this class has already been configured as a GPU cluster unit for Deep Learning. The cluster is organized as compute units (nodes) that can be allocated using a cluster manager (for example, SLURM). Its hardware components include CPUs (Central Processing Units), GPUs (Graphics Processing Units), storage, and networking.

Let's look at the GPUs, CPUs and network design available in this class.

## 1.1.1 Check the Available CPUs 

We can check the CPU information of the system using the `lscpu` command. 

This example output shows an Intel `x86_64` processor with 6 cores per socket.
```
Architecture:                    x86_64
Core(s) per socket:              6
Model name:                      Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
```
For a complete description of the CPU processor architecture, check the `/proc/cpuinfo` file.


In [1]:
# Display information CPUs
!lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         52 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  256
  On-line CPU(s) list:   0-255
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 9554 64-Core Processor
    CPU family:          25
    Model:               17
    Thread(s) per core:  2
    Core(s) per socket:  64
    Socket(s):           2
    Stepping:            1
    Frequency boost:     enabled
    CPU(s) scaling MHz:  82%
    CPU max MHz:         3762.9880
    CPU min MHz:         1500.0000
    BogoMIPS:            6191.25
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall n
                         x mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_go
                         od amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfm
                     

In [2]:
# Check the number of CPU cores
!grep 'cpu cores' /proc/cpuinfo | uniq

cpu cores	: 64
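The relationship between the `lscpu` fields above can be checked with a little arithmetic: sockets times cores per socket gives the physical core count, and multiplying by threads per core gives the logical CPU count. The sketch below is illustrative only; the excerpt is copied from the output above, and the helper name `lscpu_field` is an assumption introduced for this example.

```python
import re

# Excerpt of the `lscpu` output shown above (only the fields needed here).
LSCPU_EXCERPT = """\
CPU(s):                  256
    Thread(s) per core:  2
    Core(s) per socket:  64
    Socket(s):           2
"""


def lscpu_field(text: str, name: str) -> int:
    """Return the integer value of one `lscpu` field, e.g. 'Socket(s)'."""
    pattern = rf"^\s*{re.escape(name)}:\s*(\d+)\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


sockets = lscpu_field(LSCPU_EXCERPT, "Socket(s)")
cores_per_socket = lscpu_field(LSCPU_EXCERPT, "Core(s) per socket")
threads_per_core = lscpu_field(LSCPU_EXCERPT, "Thread(s) per core")

physical_cores = sockets * cores_per_socket
logical_cpus = physical_cores * threads_per_core
print(f"physical cores: {physical_cores}, logical CPUs: {logical_cpus}")
```

For this node, the product matches the `CPU(s)` line reported by `lscpu`, while `grep 'cpu cores'` reports the per-socket value.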


## 1.1.2 Check the Available  GPUs 

The NVIDIA System Management Interface `nvidia-smi` is a command for monitoring NVIDIA GPU devices. Its output lists several key details, such as the CUDA and GPU driver versions, the number and type of GPUs available, the memory of each GPU, and the running GPU processes.

In the following example, the `nvidia-smi` command lists the available GPUs, each with approximately 80 GB of memory.

<img  src="Nvidasmi.png" width="600"/>

For more details, refer to the [nvidia-smi documentation](https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf).
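When only a few fields are needed, `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` prints one comma-separated line per GPU, which is easier to process than the full table. The following is a minimal sketch of parsing such output; the sample string is an illustrative assumption (two GPUs with the 81920 MiB capacity of an A100 80GB), and `parse_gpu_csv` is a hypothetical helper name.

```python
def parse_gpu_csv(text: str) -> list[tuple[str, int]]:
    """Parse lines of 'name, memory_in_MiB' into (name, MiB) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so GPU names containing commas survive.
        name, mem = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, int(mem)))
    return gpus


# Illustrative sample in the csv,noheader,nounits format (not captured output).
SAMPLE = "NVIDIA A100 80GB PCIe, 81920\nNVIDIA A100 80GB PCIe, 81920\n"

gpus = parse_gpu_csv(SAMPLE)
for name, mib in gpus:
    print(f"{name}: {mib / 1024:.0f} GiB")
```

In a notebook, the sample string could be replaced by the captured output of the command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.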

In [3]:
# Display information about GPUs
!nvidia-smi

Thu Feb  6 13:14:31 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:01:00.0 Off |                    0 |
| N/A   29C    P0             53W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          On  |   00

## 1.1.3 Check the Available Interconnect Topology 



The multi-GPU system configuration needs a fast and scalable interconnect. [NVIDIA NVLink technology](https://www.nvidia.com/en-us/data-center/nvlink/) is a direct GPU-to-GPU interconnect providing high bandwidth and improving scalability for multi-GPU systems.

To check the available interconnect topology, we can use the `nvidia-smi topo --matrix` command. In this class, NVLink-connected GPU pairs should appear as `NV12`, that is, a bonded set of 12 NVLinks.

```
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      NV12    SYS     SYS     0-23            N/A
GPU1    NV12     X      SYS     SYS     24-47           N/A
GPU2    SYS     SYS      X      NV12    48-71           N/A
GPU3    SYS     SYS     NV12     X      72-95           N/A

Where X= Self and NV# = Connection traversing a bonded set of # NVLinks
```

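In this environment, notice that GPU0 and GPU1 are connected by 12 bonded NVLinks (`NV12`), as are GPU2 and GPU3, while GPUs in different pairs have no NVLink between them and communicate over PCIe (`SYS` in the example above).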

In [4]:
# Check Interconnect Topology 
!nvidia-smi topo --matrix

	GPU0	GPU1	GPU2	GPU3	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV12	NODE	NODE	0-63,128-191	0		N/A
GPU1	NV12	 X 	NODE	NODE	0-63,128-191	0		N/A
GPU2	NODE	NODE	 X 	NV12	0-63,128-191	0		N/A
GPU3	NODE	NODE	NV12	 X 	0-63,128-191	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
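The topology matrix can also be read programmatically. The sketch below extracts the GPU pairs joined by NVLink and the number of bonded links for each pair. It parses the example matrix shown earlier in this section; `nvlink_pairs` is a hypothetical helper name, and the parsing assumes the whitespace-separated layout printed by `nvidia-smi topo --matrix`.

```python
import re

# Example matrix from this section (legend lines omitted).
TOPO_SAMPLE = """\
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      NV12    SYS     SYS     0-23            N/A
GPU1    NV12     X      SYS     SYS     24-47           N/A
GPU2    SYS     SYS      X      NV12    48-71           N/A
GPU3    SYS     SYS     NV12     X      72-95           N/A
"""


def nvlink_pairs(matrix: str) -> dict[tuple[int, int], int]:
    """Map each NVLink-connected GPU pair (i, j), i < j, to its link count."""
    rows = [line.split() for line in matrix.strip().splitlines()]
    # Count only header tokens of the form GPU<n>, not words like "GPU NUMA ID".
    n_gpus = sum(bool(re.fullmatch(r"GPU\d+", tok)) for tok in rows[0])
    pairs = {}
    for i, tokens in enumerate(rows[1:1 + n_gpus]):
        for j, cell in enumerate(tokens[1:1 + n_gpus]):
            match = re.fullmatch(r"NV(\d+)", cell)
            if match and i < j:
                pairs[(i, j)] = int(match.group(1))
    return pairs


pairs = nvlink_pairs(TOPO_SAMPLE)
for (a, b), links in pairs.items():
    print(f"GPU{a} <-> GPU{b}: {links} NVLinks")
```

Only the upper triangle is kept because the matrix is symmetric.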


It is also possible to check the NVLink status and per-link bandwidth using the `nvidia-smi nvlink --status` command. You should see output similar to the following for each device (the number of links depends on the hardware):
```
GPU 0: Graphics Device
	 Link 0: 25 GB/s
	 Link 1: 25 GB/s
	 Link 2: 25 GB/s
	 Link 3: 25 GB/s
```
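To get a per-GPU total from this kind of listing, the reported link rates can simply be summed. The sketch below does so for a sample with 12 links at 25 GB/s each, which is an assumption chosen to match the `NV12` entries in the topology matrix; `links_per_gpu` is a hypothetical helper name. It only adds up the reported numbers and makes no claim about whether they are per-direction or bidirectional figures.

```python
import re
from collections import defaultdict

# Illustrative sample: one GPU with 12 links, in the `nvlink --status` layout.
STATUS_SAMPLE = "GPU 0: NVIDIA A100 80GB PCIe\n" + "".join(
    f"\t Link {i}: 25 GB/s\n" for i in range(12)
)


def links_per_gpu(status: str) -> dict[int, list[float]]:
    """Collect the reported rate (GB/s) of every link, grouped by GPU index."""
    rates = defaultdict(list)
    gpu = None
    for raw in status.splitlines():
        line = raw.strip()
        if m := re.match(r"GPU (\d+):", line):
            gpu = int(m.group(1))
        elif (m := re.match(r"Link \d+: ([\d.]+) GB/s", line)) and gpu is not None:
            rates[gpu].append(float(m.group(1)))
    return dict(rates)


rates = links_per_gpu(STATUS_SAMPLE)
for gpu, values in rates.items():
    print(f"GPU {gpu}: {len(values)} links, {sum(values):.0f} GB/s in total")
```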

In [5]:
# Check nvlink status
!nvidia-smi nvlink --status

GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-6a085fc2-2a1d-ef32-5f72-cb4c1ea69d70)
	 Link 0: 25 GB/s
	 Link 1: 25 GB/s
	 Link 2: 25 GB/s
	 Link 3: 25 GB/s
	 Link 4: 25 GB/s
	 Link 5: 25 GB/s
	 Link 6: 25 GB/s
	 Link 7: 25 GB/s
	 Link 8: 25 GB/s
	 Link 9: 25 GB/s
	 Link 10: 25 GB/s
	 Link 11: 25 GB/s
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-14368e3e-0b7a-c7b3-6369-324815faa132)
	 Link 0: 25 GB/s
	 Link 1: 25 GB/s
	 Link 2: 25 GB/s
	 Link 3: 25 GB/s
	 Link 4: 25 GB/s
	 Link 5: 25 GB/s
	 Link 6: 25 GB/s
	 Link 7: 25 GB/s
	 Link 8: 25 GB/s
	 Link 9: 25 GB/s
	 Link 10: 25 GB/s
	 Link 11: 25 GB/s
GPU 2: NVIDIA A100 80GB PCIe (UUID: GPU-b96be928-c571-9e9b-2c63-69a587a6f306)
	 Link 0: 25 GB/s
	 Link 1: 25 GB/s
	 Link 2: 25 GB/s
	 Link 3: 25 GB/s
	 Link 4: 25 GB/s
	 Link 5: 25 GB/s
	 Link 6: 25 GB/s
	 Link 7: 25 GB/s
	 Link 8: 25 GB/s
	 Link 9: 25 GB/s
	 Link 10: 25 GB/s
	 Link 11: 25 GB/s
GPU 3: NVIDIA A100 80GB PCIe (UUID: GPU-d8717bb1-7cba-4378-ade8-6c3fd2ff0e67)
	 Link 0: 25 GB/s
	 Link 1: 25 GB/

## 1.1.4 Bandwidth & Connectivity Tests


NVIDIA provides an application **p2pBandwidthLatencyTest** that demonstrates CUDA Peer-To-Peer (P2P) data transfers between pairs of GPUs by computing bandwidth and latency while enabling and disabling NVLinks. This tool is part of the code samples for CUDA Developers [cuda-samples](https://github.com/NVIDIA/cuda-samples.git). 

Why does it matter? Models like AlphaFold 3 (AF3) rely on rapid access to tensors in memory:
   -  High-bandwidth memory (HBM) on GPUs allows faster matrix multiplications and reduces time spent waiting for data.
   -  A100 GPUs, for instance, have >1.5 TB/s memory bandwidth, crucial for large protein structures.
   -  NVLink (e.g., in DGX systems) enables fast GPU-GPU communication, reducing transfer delays in parallelized training.

Example outputs are shown below. Notice the Device to Device (D\D) bandwidth differences when enabling and disabling NVLinks (P2P).

```
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3 
     0 1529.61 516.36  20.75  21.54 
     1 517.04 1525.88  20.63  21.33 
     2  20.32  20.17 1532.61 517.23 
     3  20.95  20.83 517.98 1532.61 

Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3 
     0 1532.61  18.09  20.79  21.52 
     1  18.11 1531.11  20.65  21.33 
     2  20.32  20.17 1528.12  28.89 
     3  20.97  20.82  28.36 1531.11 
```
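The effect of NVLink is easiest to see as a ratio between the two matrices. The sketch below parses the example matrices above and compares the GPU0 to GPU1 entry (an NVLink pair) with the GPU0 to GPU2 entry (no NVLink). The helper name `parse_matrix` is an assumption for this example, and the numbers are the example values from this notebook, not new measurements.

```python
# Example matrices copied from the text above (raw strings keep the "D\D" header).
ENABLED = r"""
   D\D     0      1      2      3
     0 1529.61 516.36  20.75  21.54
     1 517.04 1525.88  20.63  21.33
     2  20.32  20.17 1532.61 517.23
     3  20.95  20.83 517.98 1532.61
"""

DISABLED = r"""
   D\D     0      1      2      3
     0 1532.61  18.09  20.79  21.52
     1  18.11 1531.11  20.65  21.33
     2  20.32  20.17 1528.12  28.89
     3  20.97  20.82  28.36 1531.11
"""


def parse_matrix(block: str) -> list[list[float]]:
    """Turn a bandwidth matrix into rows of floats (header and row index dropped)."""
    rows = []
    for line in block.strip().splitlines()[1:]:
        rows.append([float(value) for value in line.split()[1:]])
    return rows


enabled = parse_matrix(ENABLED)
disabled = parse_matrix(DISABLED)

nvlink_speedup = enabled[0][1] / disabled[0][1]   # GPU0 <-> GPU1, NVLink pair
pcie_speedup = enabled[0][2] / disabled[0][2]     # GPU0 <-> GPU2, no NVLink
print(f"GPU0-GPU1 speedup with P2P: {nvlink_speedup:.1f}x")
print(f"GPU0-GPU2 speedup with P2P: {pcie_speedup:.2f}x")
```

With these example values, enabling P2P raises the bandwidth of the NVLink pair by more than an order of magnitude, while the pair without NVLink is essentially unchanged.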


In [6]:
# Tests on GPU pairs using P2P and without P2P 
#'git clone --depth 1 --branch v11.2 https://github.com/NVIDIA/cuda-samples.git'
!cuda-samples/bin/x86_64/linux/release/p2pBandwidthLatencyTest

[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA A100 80GB PCIe, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA A100 80GB PCIe, pciBusID: 21, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA A100 80GB PCIe, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA A100 80GB PCIe, pciBusID: 61, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=3
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=3 CAN Access Peer Device=0
Device=3 CAN Access Peer Device=1
Device=3 CAN Access Peer Device=2

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     

---
# 1.2 Basic SLURM Commands

Now that we've seen how GPUs can communicate with each other over NVLink, let's go over how the hardware resources can be organized into compute nodes. These nodes can be managed by a cluster manager such as the [*Slurm Workload Manager*](https://slurm.schedmd.com/), an open-source cluster management and job scheduling system for large and small Linux clusters.


For this lab, we have configured a SLURM manager where the 4 available GPUs are partitioned into 2 nodes, **slurmnode1** and **slurmnode2**, each with 2 GPUs.

Next, let's see some basic SLURM commands. More SLURM commands can be found in the [SLURM official documentation](https://slurm.schedmd.com/).

<img src="arch.gif" width="500"/>

## 1.2.1 Check the SLURM Configuration

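We can check the available resources in the SLURM cluster by running `sinfo`. In the class configuration described above, the output lists 2 idle nodes, **slurmnode1** and **slurmnode2**. On a shared production cluster such as ACCRE, as in the output below, `sinfo` instead lists each partition with its nodes grouped by state (for example `mix`, `alloc`, or `drain`).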
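On a large cluster the `sinfo` listing is long, so it can help to total the `NODES` column per `STATE`. The sketch below assumes the default six-column layout (`PARTITION AVAIL TIMELIMIT NODES STATE NODELIST`). The sample rows are taken from the `sinfo` output recorded in this notebook, with the two longest node lists abbreviated to `cn[...]`, and `nodes_by_state` is a hypothetical helper name.

```python
from collections import Counter

# Rows from this notebook's `sinfo` output; long node lists abbreviated.
SINFO_SAMPLE = """\
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
production*    up 14-00:00:0      3  down* cn[1413,1611,1622]
production*    up 14-00:00:0      2   unk* cn[1588,1625]
production*    up 14-00:00:0      3   drng cn[1270,1620,1627]
production*    up 14-00:00:0      3  drain cn[1268,1276,1289]
production*    up 14-00:00:0     90    mix cn[...]
production*    up 14-00:00:0    145  alloc cn[...]
"""


def nodes_by_state(sinfo_text: str) -> Counter:
    """Sum the NODES column for each STATE in default-format `sinfo` output."""
    counts = Counter()
    for line in sinfo_text.strip().splitlines():
        fields = line.split()
        if fields[0] == "PARTITION":  # skip the header row
            continue
        counts[fields[4]] += int(fields[3])
    return counts


counts = nodes_by_state(SINFO_SAMPLE)
for state, n in counts.most_common():
    print(f"{state:>6}: {n}")
```

In the rows shown, no node is in the `idle` state, which is typical of a busy shared cluster and means new jobs may wait in the queue.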

In [7]:
# Check available resources in the cluster
!sinfo

PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
production*    up 14-00:00:0      3  down* cn[1413,1611,1622]
production*    up 14-00:00:0      2   unk* cn[1588,1625]
production*    up 14-00:00:0      3   drng cn[1270,1620,1627]
production*    up 14-00:00:0      3  drain cn[1268,1276,1289]
production*    up 14-00:00:0     90    mix cn[1205,1208,1211,1213-1214,1223-1224,1226,1228-1229,1232,1234,1236,1238-1239,1257,1269,1273-1274,1285,1288,1293,1302,1305-1306,1317,1323,1331,1345,1351,1354,1357,1365,1367-1368,1371,1374,1379,1395,1397-1398,1400-1405,1409-1412,1414-1416,1418-1421,1424-1425,1580,1582-1584,1587,1589-1591,1593-1596,1602-1604,1606,1610,1612-1613,1615-1616,1621,1626,1628,1700-1705]
production*    up 14-00:00:0    145  alloc cn[1201-1204,1206-1207,1209-1210,1212,1215-1222,1225,1227,1230,1233,1235,1237,1240-1242,1258-1262,1264-1267,1271-1272,1275,1277-1284,1286-1287,1290-1292,1294-1301,1304,1307-1316,1318,1320-1322,1324,1326-1328,1330,1332-1344,1346-1349,1352-1353,1355-1356,13

##  1.2.2 Submit Jobs Using `srun` Command

The `srun` command runs parallel jobs.

The argument **-N** (or *--nodes*) specifies the number of nodes allocated to a job. It is also possible to allocate a subset of the GPUs available within a node by specifying the argument **-G (or --gpus)**.

Check out the [SLURM official documentation](https://slurm.schedmd.com/) for more arguments.

To test running parallel jobs, let's submit a job that requests 1 node (2 GPUs) and runs a simple command on it: `nvidia-smi`. The output should list the 2 GPUs available in the allocated node.
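As an illustration of how the `-N` and `-G` arguments combine, the sketch below assembles an `srun` command line as an argument list and prints it. It does not contact the scheduler. `srun_command` is a hypothetical helper, and a real cluster may require further options (such as a partition or account) that are not shown here.

```python
import shlex
from typing import Optional


def srun_command(nodes: int, gpus: Optional[int], command: list[str]) -> list[str]:
    """Build an srun argument list: node count, optional GPU count, then the command."""
    argv = ["srun", "-N", str(nodes)]
    if gpus is not None:
        argv += ["-G", str(gpus)]
    return argv + command


argv = srun_command(nodes=1, gpus=2, command=["nvidia-smi"])
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` on a machine where SLURM is available, or the printed string can be pasted after `!` in a notebook cell.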

In [9]:
# run nvidia-smi slurm job with 1 node allocation
!nvidia-smi

Thu Feb  6 13:35:30 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:01:00.0 Off |                    0 |
| N/A   29C    P0             53W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          On  |   00

## 1.2.3 Submit Jobs Using `sbatch` Command 

In the previous examples, we allocated resources to run one single command. For more complex jobs, the `sbatch` command allows submitting batch scripts to SLURM by specifying the resources and all environment variables required for executing the job. `sbatch` will transfer the execution to the SLURM Manager after automatically populating the arguments.

In the batch script below, `#SBATCH ...` is used to specify resources and other options relating to the job to be executed:

```
#!/bin/bash
#SBATCH -N 1                               # Node count to be allocated for the job
#SBATCH --job-name=firstSlurmJob           # Job name
#SBATCH -o /MY/PATH/logs/%j.out            # Output log file
#SBATCH -e /MY/PATH/logs/%j.err            # Error log file

srun -l my_script.sh                       # Script to run under SLURM
```
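To review what a batch script requests before submitting it, the `#SBATCH` lines can be collected into a dictionary. The sketch below handles both the `--name=value` and the `-N value` forms used in the template above. `sbatch_directives` is a hypothetical helper, and its handling of trailing comments is a simplification that would mis-parse a value containing `#`.

```python
# The template from the text above, used as sample input.
SCRIPT = """\
#!/bin/bash
#SBATCH -N 1                               # Node count to be allocated for the job
#SBATCH --job-name=firstSlurmJob           # Job name
#SBATCH -o /MY/PATH/logs/%j.out            # Output log file
#SBATCH -e /MY/PATH/logs/%j.err            # Error log file

srun -l my_script.sh
"""


def sbatch_directives(script: str) -> dict[str, str]:
    """Map each #SBATCH option name (without leading dashes) to its value."""
    options = {}
    for raw in script.splitlines():
        line = raw.strip()
        if not line.startswith("#SBATCH"):
            continue
        # Drop the "#SBATCH" prefix and any trailing "# comment".
        arg = line[len("#SBATCH"):].split("#", 1)[0].strip()
        key, sep, value = arg.partition("=")
        if not sep:  # short form such as "-N 1"
            key, _, value = arg.partition(" ")
        options[key.lstrip("-")] = value.strip()
    return options


options = sbatch_directives(SCRIPT)
for name, value in options.items():
    print(f"{name} = {value}")
```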

In [10]:
#!chmod +x /dli/code/test.sh
# Check the batch script 
!cat /home/soubasbj/alphafold3/specialrun.sh

#!/bin/bash
#SBATCH --job-name=alphafold3_job
#SBATCH --output=/home/soubasbj/alphafold3/af3-fold_pgp_apo_noseed_rep1_job_request.log
#SBATCH --partition=a100
#SBATCH --account=mchaourab_acc
#SBATCH --gres=gpu:2
#SBATCH --time=2-00:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=180
#SBATCH --mem=800G

# Written by Brandon Soubasis

# Load required modules
module load CUDA

# Define paths
BASE_INPUT_DIR="/home/soubasbj/alphafold3/fasta_files/initial_screening"
OUTPUT_DIR="/gpfs51/dors2/mchaourablab/home/soubasbj/AF3_output"
SINGULARITY_IMAGE="/home/soubasbj/alphafold3/alphafold3_latest.sif"
DB_DIR="/gpfs51/dors2/mchaourablab/home/soubasbj/alphafold3_databases"
MODEL_DIR="/gpfs51/dors2/mchaourablab/home/soubasbj/alphafold3_models"

# Specify the single JSON input file
INPUT_JSON="/home/soubasbj/alphafold3/fasta_files/initial_screening/Pgp_Initial_screening_rep/fold_pgp_apo_noseed_rep1_job_request.json"
INPUT_NAME=$(basename "$INPUT_JSON" .json)
SUBDIR=$(dirname "$INPUT_JSON")

echo "Run

To submit this batch script job, let's create an `sbatch` script that specifies the resources to be allocated and submits the `test.sh` job.

The following cell will edit the `test_sbatch.sbatch` script allocating 1 node.

In [None]:
!srun -l /home/soubasbj/alphafold3/specialrun.sh

Now let's submit the `sbatch` job and check the SLURM scheduler. The batch script will be queued and executed when the requested resources are available.

The `squeue` command shows the running or pending jobs. An example output is shown below:

```
Submitted batch job **
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                **  slurmpar soubasbj    admin  R       0:01      1 slurmnode1

```

It shows the SLURM job ID, the partition, the job name, the user, the job status (R = running), the elapsed run time, the number of nodes, and the allocated node name.
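Each `squeue` row can be turned into a dictionary keyed by the column headers, which makes it easy to filter jobs by status. The sketch below assumes the default column layout shown above; the job ID `12345` and the job name `testjob` are hypothetical placeholders, and `parse_squeue` is a helper name introduced for this example.

```python
# Sample in the default squeue layout; job ID and name are placeholders.
SQUEUE_SAMPLE = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             12345  slurmpar  testjob    admin  R       0:01      1 slurmnode1
"""


def parse_squeue(text: str) -> list[dict[str, str]]:
    """Return one dict per job, keyed by the squeue column headers."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Limit the split so a reason containing spaces stays in the last field.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


jobs = parse_squeue(SQUEUE_SAMPLE)
running = [job["JOBID"] for job in jobs if job["ST"] == "R"]
print(running)
```

An empty queue, where only the header line is printed, yields an empty list.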

The following cell submits the `sbatch` job, collects the `JOBID` variable (for querying the logs later), and checks the jobs in the SLURM scheduling queue.
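One way to collect the job ID is to read it from the confirmation line that `sbatch` prints (`Submitted batch job <id>`) and then substitute it for `%j` in the log path pattern. The sketch below shows both steps on a hypothetical job ID, `12345`; the helper names are assumptions for this example. As an alternative, `sbatch --parsable` prints the job ID without the surrounding text.

```python
import re


def job_id_from_sbatch(stdout: str) -> int:
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        raise ValueError(f"no job ID found in: {stdout!r}")
    return int(match.group(1))


def log_path(pattern: str, job_id: int) -> str:
    """Expand the %j placeholder the way SLURM does for -o/-e file names."""
    return pattern.replace("%j", str(job_id))


# Hypothetical confirmation line; a real ID comes from the scheduler.
job_id = job_id_from_sbatch("Submitted batch job 12345\n")
out_file = log_path("/MY/PATH/logs/%j.out", job_id)
print(job_id, out_file)
```

Reading the ID from the submission output avoids matching on job names in `squeue`, which can be ambiguous when several jobs share a name.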

In [14]:
# Submit the job
#!sbatch /home/soubasbj/alphafold3/specialrun.sh

# Get the JOBID variable
#JOBID=!squeue -u soubasbj | grep alphafold | awk '{print $1}'
#slurm_job_output='/dli/nemo/logs/'+JOBID[0]+'.out'

# check the jobs in the SLURM scheduling queue
!squeue -u soubasbj
#!scancel 1894053

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)


The output log file for the executed job (**JOBID.out**) is automatically created to gather the outputs.

In our case, we should see the results of the `nvidia-smi` command that was executed in the `test.sh` script submitted with a 1-node allocation. Let's have a look at the execution logs:


In [15]:
# Check the execution logs 
!cat /home/soubasbj/alphafold3/af3-fold_msba_2mg_2atp_noseed_rep1_job_request.log

Running AlphaFold 3 for: /home/soubasbj/alphafold3/fasta_files/initial_screening/MsbA_Initial_screening_rep/fold_msba_2mg_2atp_noseed_rep1_job_request.json
Running AlphaFold 3 iteration: 1
Seed used for iteration 1: 16117
I0204 14:10:28.463452 22436754387136 folding_input.py:1044] Detected /root/af_input/fold_input.json is an AlphaFold 3 JSON since the top-level is not a list.
I0204 14:10:32.247563 22436754387136 xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I0204 14:10:32.283560 22436754387136 xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
I0204 14:10:36.399066 22436754387136 pipeline.py:404] Skipping MSA and template search for protein chain A because it already has MSAs and templates.
I0204 14:10:36.399350 22436754387136 pipeline.py:404] Skipping MSA and template search for protein chain B becau

## 1.2.4  Exercise: Submit Jobs Using `sbatch` Command  Requesting More Resources (SKIP)


Using what you have learned, submit the previous `test.sh` batch script with the `sbatch` command using a **2-node** allocation.

To do so, you will need to:
1. Modify the `test_sbatch.sbatch` script to allocate 2 nodes
2. Submit the script again using the `sbatch` command
3. Check the execution logs 

---
# 1.3 Run Interactive Sessions 

Interactive sessions let you connect directly to a worker node and interact with it through the terminal.

The SLURM manager can allocate resources for an interactive session using the `--pty` argument, as follows: `srun -N 1 --pty /bin/bash`.
The session is closed when you exit the node or you cancel the interactive session job using the command `scancel JOBID`.


Since this is an interactive session, we first need to launch a terminal window and submit a SLURM job that allocates resources in interactive mode. To do so, follow these 4 steps:
1. Launch a terminal session
2. Check the GPU resources using the command `nvidia-smi`
3. Run an interactive session requesting 1 node by executing `srun -N 1 --pty /bin/bash`
4. Check the GPU resources again using the command `nvidia-smi`

Let's run our first interactive job requesting 1 node and check what GPU resources are at our disposal. 


Notice that while connected to the session, the host name as displayed in the command line changes from "lab" (login node name) to "slurmnode1" indicating that we are now successfully working on a remote worker node.
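Besides the prompt, the environment also indicates whether a shell is running inside a SLURM allocation: SLURM sets the `SLURM_JOB_ID` variable for processes it launches. The sketch below reports the host name and whether that variable is present. `session_info` is a hypothetical helper, and its `env` parameter exists only so the function can be exercised without a cluster.

```python
import os
import socket
from typing import Mapping, Optional


def session_info(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report the host name and whether SLURM_JOB_ID is set in the environment."""
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")
    return {
        "host": socket.gethostname(),
        "in_slurm_job": job_id is not None,
        "job_id": job_id,
    }


info = session_info()
print(info)
```

Running this before and after `srun -N 1 --pty /bin/bash` should show the host name change and `in_slurm_job` switch to `True` inside the interactive session.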

Run the following cell to get a link to open a terminal session and the instructions to run an interactive session.

<pre>
   Step 1: Open a terminal session by following this <a href="" data-commandlinker-command="terminal:create-new">Terminal link</a>
   Step 2: Check the GPU resources: <font color="green">nvidia-smi</font>
   Step 3: Run an interactive session: <font color="green">srun -N 1 --pty /bin/bash</font>
   Step 4: Check the GPU resources again: <font color="green">nvidia-smi</font>
</pre>

---
<h2 style="color:green;">Congratulations!</h2>

You've made it through the first section of the presentation and are ready to begin training models on multiple GPUs. <br>

Before moving on, we need to make sure that no jobs are still running or waiting on the SLURM queue. 
Let's check the SLURM jobs queue by executing the following cell:

In [17]:
# Check the SLURM jobs queue 
!squeue -u $USER

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)


In [16]:
# Cancel all jobs belonging to the current user
!scancel -u $USER

# Check the SLURM jobs queue again (it should be empty, or the ST status column should show CG)
#!squeue -u $USER

Next, we will be running basic model training on different distribution configurations.