# SLING usage tutorial

<sup>This notebook is part of the Natural Language Processing course at the University of Ljubljana, Faculty of Computer and Information Science. Please contact [slavko.zitnik@fri.uni-lj.si](mailto:slavko.zitnik@fri.uni-lj.si) with any comments.</sup>

This document shows how to use the SLING HPC cluster to run tasks that need GPU devices. To get access, **send your e-mail address to Slavko Žitnik** by the end of the week!

More information on SLING is available on its [official website](https://www.sling.si). Documentation and login instructions are available at [https://doc.sling.si/navodila/clusters/](https://doc.sling.si/navodila/clusters/). By default, jobs are submitted and managed with [SLURM](https://slurm.schedmd.com).

FRI users have access to NSC, where 4 GPUs (NVIDIA Tesla K40) are always reserved for them. Overall, the cluster consists of the following main computing centers:

* nsc-login.ijs.si (5x NVIDIA Ampere, 16x NVIDIA K40c)
* trdina-login.fis.unm.si (4x NVIDIA V100)
* rmaister.hpc-rivr.um.si (24x NVIDIA V100)
* Arnes: hpc-login.arnes.si, ARC-only access (48x NVIDIA V100)
* Vega: login.vega.izum.si, currently test users only (240x NVIDIA Ampere)

## Steps to use a GPU with Singularity

### Step 1 (get access)

First, obtain your username and password. Then use those credentials to log in to [https://fido.sling.si/](https://fido.sling.si/) and add your public SSH key (similar to how SSH access works on GitHub).

You will most likely need to copy the contents of your `~/.ssh/id_rsa.pub` file and add it on the Fido website. This enables SSH access to the login node.

![](fido-ssh.png)
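To avoid typing the full host name and username on every login, you can add a host alias to your local `~/.ssh/config`. The sketch below uses a hypothetical helper, `sling_ssh_entry`, to print such an entry; the alias, username and key path are assumptions to adapt, while `Host`, `HostName`, `User` and `IdentityFile` are standard OpenSSH client options.

```bash
# Hypothetical helper: print an OpenSSH client config entry for a SLING login node.
# Arguments: <alias> <hostname> <username>
sling_ssh_entry() {
  local name="$1" host="$2" user="$3"
  # The key path is an assumption (the default RSA key mentioned above);
  # ssh expands "~" in IdentityFile itself, so it is printed literally here.
  printf 'Host %s\n  HostName %s\n  User %s\n  IdentityFile ~/.ssh/id_rsa\n' \
    "$name" "$host" "$user"
}
```

For example, `sling_ssh_entry nsc nsc-login.ijs.si <username> >> ~/.ssh/config` appends an entry for the NSC login node, after which `ssh nsc` should be enough to log in (`<username>` is your SLING username).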

### Step 2 (login and prepare environment)

Log in via SSH and build a Singularity container image.

You can start from a prebuilt Docker image (e.g. [tensorflow](https://hub.docker.com/r/tensorflow/tensorflow) or [pytorch](https://hub.docker.com/r/pytorch/pytorch)) and convert it into a Singularity image:

```bash
mkdir containers
singularity build ./containers/container-tf-2.4.1.sif docker://tensorflow/tensorflow:2.4.1-gpu
```
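Building an image from Docker Hub takes a while, so when you repeat the setup it can be worth skipping the build if the `.sif` file already exists. A minimal sketch, where `build_image_if_missing` is a hypothetical helper name and not part of Singularity:

```bash
# Hypothetical helper: build a Singularity image only if it does not exist yet.
# Arguments: <path of the .sif file> <build source, e.g. docker://...>
build_image_if_missing() {
  local image="$1" src="$2"
  if [ -f "$image" ]; then
    echo "Image $image already exists, skipping build"
    return 0
  fi
  # Create the target folder (e.g. ./containers) if needed, then build
  mkdir -p "$(dirname "$image")"
  singularity build "$image" "$src"
}
```

Called as `build_image_if_missing ./containers/container-tf-2.4.1.sif docker://tensorflow/tensorflow:2.4.1-gpu`, it replaces the two commands above and also creates the `containers` folder if needed.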

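Install any additional libraries your code needs. A `.sif` image is read-only, so `pip` cannot install packages into the container itself; with `--user`, packages go to `~/.local` in your home directory, which Singularity mounts into the container by default.

```bash
singularity exec ./containers/container-tf-2.4.1.sif pip install --user tensorflow-gpu==2.4.1 keras==2.4.3 pandas==1.1.5 numpy==1.19.2
```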

Copy your code from your local machine (e.g. with `scp -r my-project <username>@nsc-login.ijs.si:~/`) or clone a source code repository:

```bash
git clone https://github.com/szitnik/NLP-Course-Tutorials.git
```

### Step 3 (prepare and submit a job)

Create a separate folder for your log files. SLURM does not create missing directories for output files, so do this before submitting the job:

```bash
mkdir logs
```

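Create a job script (e.g. with `nano run-slurm.sh`). The `#SBATCH` lines request one task with 4 CPU cores and one GPU for at most 30 minutes; `srun` then runs the Python script inside the container, and the `--nv` flag makes the host's NVIDIA GPUs available inside it.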

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Request one GPU (newer SLURM versions also accept --gpus=1)
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH --output=logs/sling-nlp-showcase-%J.out
#SBATCH --error=logs/sling-nlp-showcase-%J.err
#SBATCH --job-name="SLING NLP showcase"

# Run the example inside the container; --nv exposes the host's NVIDIA GPUs
srun singularity exec --nv ./containers/container-tf-2.4.1.sif python \
    "NLP-Course-Tutorials/08 - Neural networks examples and hardware/SLING Example/IMDB_Multiple_NN_Example.py"
```
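Before submitting a longer job, you may want a quick check that TensorFlow inside the container can see a GPU. The sketch below assumes that interactive `srun` is allowed on your cluster and that the image from Step 2 exists; `check_gpu` is a hypothetical helper, while `srun --gres`, Singularity's `--nv` flag and TensorFlow's `tf.config.list_physical_devices` are existing options and APIs.

```bash
# Hypothetical helper: run a short one-off GPU visibility check through SLURM.
# Argument (optional): path to the image, defaulting to the one built in Step 2.
check_gpu() {
  local image="${1:-./containers/container-tf-2.4.1.sif}"
  # --nv makes the host's NVIDIA driver and devices available inside the container
  srun --gres=gpu:1 --time=00:05:00 singularity exec --nv "$image" \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
}
```

If the printed list is empty (`[]`), TensorFlow did not find a GPU; a common cause is a missing `--nv` flag.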

Submit the job:

```bash
sbatch run-slurm.sh
```
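If you want to reuse the job ID in further commands (e.g. those in Step 4), `sbatch --parsable` prints only the ID (optionally followed by `;cluster`) instead of the usual `Submitted batch job <id>` message. A minimal sketch with a hypothetical helper name, `submit_job`:

```bash
# Hypothetical helper: submit a job script and print only its numeric job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster".
submit_job() {
  local out
  out=$(sbatch --parsable "$@") || return 1
  echo "${out%%;*}"   # drop the optional ";cluster" suffix
}
```

For example, `jobid=$(submit_job run-slurm.sh)` stores the ID so you can later run `squeue -j "$jobid"` or `scancel "$jobid"`.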

### Step 4 (check job status, logs, and results)

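By default, Singularity mounts your current working directory (as well as your home directory) into the container, so the job can read and write files using paths relative to the directory from which it was submitted. To access other locations, bind them explicitly with the `--bind` option (e.g. `-B /path/on/host:/path/in/container`).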

Some useful commands:

```bash
# Show the whole queue
squeue

# Show only your own jobs
squeue -u "$USER"

# Get a rough estimate of when a pending job will start
squeue -j <jobid> --start

# Get accounting information about a job (also after it has finished)
sacct -j <jobid>

# Get resource usage statistics of a running job (CPU, memory, ...)
sstat -j <jobid>

# Cancel a job
scancel <jobid>
```
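The commands above can be combined into a small loop that waits for a job to leave the queue and then shows its final state. A minimal sketch, assuming a hypothetical helper name `wait_for_job`; the `-h` (no header) option of `squeue` and the `--format` fields of `sacct` are standard SLURM options.

```bash
# Hypothetical helper: poll the queue until a job has left it, then print its accounting summary.
# Arguments: <jobid> [poll interval in seconds, default 60]
wait_for_job() {
  local jobid="$1" interval="${2:-60}"
  # -h suppresses the header, so the output is empty once the job is no longer queued or running
  while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
    sleep "$interval"
  done
  sacct -j "$jobid" --format=JobID,JobName,State,Elapsed,ExitCode
}
```

For example, `wait_for_job <jobid> 120` checks the queue every two minutes. Keep the interval generous to avoid loading the scheduler, and note that a transient `squeue` error also ends the loop, so check the printed `State` before assuming the job has finished.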