# Tutorial 1: First steps on the supercomputer

**Content creators**: Stefan Kesselheim, Jan Ebert, Fabian Emmerich

**Content reviewers / testers**: Jannik Jauch

In this first tutorial, you will take your first steps on **JUWELS**, including **JUWELS Booster** (Booster hereafter). This tutorial assumes that you have at least basic familiarity with the command line.

JUWELS has two different types of nodes:

1. Login Nodes: The entry point to the system.
   - Users here log in to manage their workspaces, move data, and submit jobs that are supposed to be run on the cluster.
   - Login nodes are not designed for computational workloads!
   - JUWELS in total has 16 login nodes (JUWELS: 12, Booster: 4).
1. Compute Nodes: The compute power of the system.
   - Each node has multiple CPUs (JUWELS: 40/48, Booster: 96) and a large amount of RAM (JUWELS: 96/192 GB, Booster: 512 GB).
   - Booster was designed specifically for GPU workloads and is thus equipped with 4 NVIDIA A100 GPUs (4 x 40 GB VRAM) per node.
   - Compute nodes are detached from the internet.
   - JUWELS in total has 2567 nodes, Booster 936.

For detailed overviews of each system see [here for JUWELS](https://apps.fz-juelich.de/jsc/hps/juwels/configuration.html#hardware-configuration-of-the-system-name-cluster-module) and [here for Booster](https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html).

## Exercise 0: Install an SSH client

Before you can start, an SSH client must be installed on your machine. On both macOS and Linux, an SSH client should be installed by default. On Windows, it is recommended to install the Windows Subsystem for Linux (WSL). On older Windows versions without WSL, you have to install a terminal emulator like [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/).

## Exercise 1: SSH connection to JUWELS

JSC has very strict security restrictions to avoid misuse of the compute resources. JSC does not allow logging into the systems solely with a password, but requires key-based authentication via SSH connections from whitelisted IPs.

As a first step, you will create an SSH key pair for public/private key authentication. Then, you will register the public key for access to JUWELS using the JuDoor web page. To do so, you must add a meaningful restriction of the range of IPs or hostnames that are allowed to connect to JUWELS. Finally, you will be able to connect to JUWELS. This exercise guides you through the process that is explained in more detail in the [JUWELS access documentation pages](https://apps.fz-juelich.de/jsc/hps/juwels/access.html).

Execute the following command in the command line to create an ED25519 key pair directly into your `.ssh` directory.

```bash
ssh-keygen -a 100 -t ed25519 -f ~/.ssh/maelstrom-bootcamp
```

where 

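  - `-a` is the number of key derivation function (KDF) rounds used when the private key is saved. Higher numbers slow down passphrase verification and thereby make brute-force attacks on a passphrase-protected key harder.
  - `-t` is the type of key to create, here ED25519.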
  - `-f` is the path and file name in which the SSH keys will be stored.

**Note:** Simply press `ENTER` when asked for a passphrase.

```bash
$ ssh-keygen -a 100 -t ed25519 -f ~/.ssh/maelstrom-bootcamp
Generating public/private ed25519 key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/maelstrom-bootcamp
Your public key has been saved in /home/user/.ssh/maelstrom-bootcamp.pub
The key fingerprint is:
SHA256:ZBAVl31cLkKm+Cmp/IKRgDXMzjA2E2n7UwPwjOumfhA user@host
The key's randomart image is:
+--[ED25519 256]--+
| =+   ooo..oo. ..|
|o*O.   . o.+. o. |
|oO++.   + . ... .|
|.E=  o o o . . . |
| .o.... S o      |
|.. oo. . .       |
| o. .oo          |
|o  .. ..         |
|o..    ..        |
+----[SHA256]-----+
```

On Windows, you must define a different storage location for the key pair, but otherwise the command works. In WSL you can execute the command right away.

The command generated two keys: a public one (`maelstrom-bootcamp.pub`) and a private one (`maelstrom-bootcamp`). 

```bash
$ ls ~/.ssh/maelstrom-bootcamp*
/home/user/.ssh/maelstrom-bootcamp  /home/user/.ssh/maelstrom-bootcamp.pub
```

The public key (ending in `.pub`) is similar to your hand-written signature: you may give it to others who can then use it to confirm your identity. The private key (`maelstrom-bootcamp`) **must not** be shared. Continuing with the hand-written signature analogy, the private key is the way _you_ write your signature. Just as you would not give others the ability to perfectly copy your hand-written signature, you should under no circumstances publicize your private key.

Before you can add your public SSH key to the list of authorized SSH keys for JUWELS, you must create a valid *from-clause* that meaningfully restricts the range of IPs for which an SSH connection with the given key will be permitted (whitelisted). You have several options to do that, e.g. checking the IP range of your internet service provider (ISP). If you know the IP range of your ISP, or if you can connect to a VPN giving you a fixed IP range (FZ Jülich's VPN is an example, but other institutions work as well), this is very easy: you can directly use the IP range as a *from-clause*. For FZ Jülich your *from-clause* would be:

```bash
from="134.94.0.0/16"
```

Note that the `/16` is the prefix length of the subnet: the first 16 bits of the address are fixed, hence all addresses of the form 134.94.\*.\* will be allowed. If you use this option, you can directly jump to the point *Register your public key*.
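
To make the effect of the prefix length concrete, the following sketch checks whether an IPv4 address falls inside a range such as `134.94.0.0/16`, using plain Bash arithmetic. The function names `ip_to_int` and `ip_in_cidr` are made up for this illustration; they are not part of SSH or JuDoor, and the second test address is an arbitrary example.

```bash
#!/bin/bash
# Convert a dotted IPv4 address into a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 lies inside the range $2 (e.g. 134.94.0.0/16).
ip_in_cidr() {
    local ip=$1 network=${2%/*} prefix=${2#*/}
    # Netmask with the first $prefix bits set.
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$network") & mask) ))
}

ip_in_cidr 134.94.12.34 134.94.0.0/16 && echo "134.94.12.34 is allowed"
ip_in_cidr 134.95.12.34 134.94.0.0/16 || echo "134.95.12.34 is rejected"
```

The check only compares the bits covered by the prefix, which is why every address starting with `134.94.` passes.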

We also show here the slightly more difficult steps to create a *from-clause* based on reverse DNS lookup.

1. Visit the [JuDoor page](https://judoor.fz-juelich.de). Prior to this course, you should have visited this page to register and get access to the compute resources. Under the header **Systems**, find `juwels > Manage SSH-keys` and navigate to it.
   <img src="./images/judoor-manage-ssh-keys.png" width="80%" height="80%">
1. On this page, your IP should be visible. Example: *Your current IP address is 37.201.214.241*.
   <img src="./images/judoor-current-ip.png" width="80%" height="80%">
1. Perform a reverse DNS lookup of your IP and extract the DNS name (the field *Name*) associated with your IP. Type into your command line:

   ```bash
   nslookup <your-ip>
   ```

   Example results:
   *Name:    aftr-37-201-214-241.unity-media.net* or *\[...\] name = aftr-37-201-214-241.unity-media.net*
1. Guess a wildcard pattern that will likely apply for all future connections. For example `*.unity-media.net`.
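
Before settling on a pattern, it can help to check it against the host name that `nslookup` returned. The sketch below uses Bash's own `case` globbing, which treats `*` much like the wildcard in a *from-clause*; it is only an approximation, and the matching done by the SSH server remains authoritative. The function name `matches_pattern` and the host `login.example.org` are hypothetical.

```bash
#!/bin/bash
# Succeed if host name $1 matches the wildcard pattern $2.
matches_pattern() {
    local host=$1 pattern=$2
    case "$host" in
        $pattern) return 0 ;;   # unquoted on purpose: treat $pattern as a glob
        *)        return 1 ;;
    esac
}

pattern='*.unity-media.net'
for host in aftr-37-201-214-241.unity-media.net login.example.org; do
    if matches_pattern "$host" "$pattern"; then
        echo "$host matches $pattern"
    else
        echo "$host does not match $pattern"
    fi
done
```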

### Register your public key

Now, you can register your key pair in JuDoor: Go to the [JuDoor page](https://judoor.fz-juelich.de) and navigate to `juwels > Manage SSH-keys`.

<img src="./images/judoor-manage-ssh-keys.png" width="80%" height="80%">

Now, you have two options to add your key to the system:

1. Manually entering the SSH key:
   1. Create a *from-clause* from your wildcard expression or current IP and enter it into the field `Your public key and options string`, but do not confirm yet. 
   1. Open your public key file `~/.ssh/maelstrom-bootcamp.pub` to copy your public key.
      You can, for example, print the file content to the terminal with `cat`:
   
      ```bash
      $ cat ~/.ssh/maelstrom-bootcamp.pub
      ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIc92KEIyXu2l/EpNx6wwgofefsUvUpPCslw25hQENz8 user@host
      ```
      
      **Note the file ending `.pub`!**
   1. Copy the public key into the same field (making sure there is a single space between the *from-clause* and the contents of the file). 
   
      <img src="./images/judoor-add-ssh-key-string.png" width="100%" height="100%">
   
   1. Select `Start upload of SSH-Keys`. 
1. Uploading the file: 
   1. In the `Your public key file` field, you can upload the public key previously generated (`~/.ssh/maelstrom-bootcamp.pub`). 
   **Note the file ending `.pub`!**
   1. In the `Additional public key options` field, create a *from-clause* from your wildcard expression or IP, e.g. `from="91.66.91.125"`.

   <img src="./images/judoor-add-ssh-key-file.png" width="100%" height="100%">
   
   1. Select `Start upload of SSH-Keys`.
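
For the manual option, the string pasted into JuDoor is simply the *from-clause*, a single space, and the content of the public key file. A minimal sketch for assembling that line in the terminal is shown here; the function name `make_key_entry` is made up for illustration, and the pattern is the example used earlier.

```bash
#!/bin/bash
# Print "<from-clause> <public key>" on one line, ready for pasting into JuDoor.
make_key_entry() {
    local pattern=$1 pubkey_file=$2
    printf 'from="%s" %s\n' "$pattern" "$(cat "$pubkey_file")"
}

pubkey_file="$HOME/.ssh/maelstrom-bootcamp.pub"
if [ -r "$pubkey_file" ]; then
    make_key_entry '*.unity-media.net' "$pubkey_file"
else
    echo "Public key $pubkey_file not found; run ssh-keygen first." >&2
fi
```

Generating the line this way avoids accidental line breaks or double spaces that can creep in when copying by hand.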


After a few minutes, your newly added SSH key should be available. Note that JuDoor writes the file `~/.ssh/authorized_keys` in your JUWELS home directory, thus manually added SSH keys will automatically be overwritten.

Finally, you can log into JUWELS using

```bash
ssh -i ~/.ssh/maelstrom-bootcamp <username>@juwels.fz-juelich.de
```

*Note*: To connect to Booster, use `<username>@juwels-booster.fz-juelich.de`.

Your username is identical to the username in the JuDoor website, typically *lastname1*. Note that `ssh` by default only tries keys with standard file names (such as `~/.ssh/id_ed25519`) and keys held by an SSH agent. Because the key created above has a custom file name, the `-i` option is required unless you configure the key as described next.

Alternatively, you can add configuration to your `~/.ssh/config` file to simplify the SSH commands. To do so, edit the `~/.ssh/config` file with an editor of your choice and add the following entries:

```
Host juwels
    HostName juwels.fz-juelich.de
    User <username>
    Port 22
    IdentityFile ~/.ssh/maelstrom-bootcamp

Host juwels-booster
    HostName juwels-booster.fz-juelich.de
    User <username>
    Port 22
    IdentityFile ~/.ssh/maelstrom-bootcamp
```

This enables you to connect by simply typing `ssh juwels`/`ssh juwels-booster` in the terminal.

### Tasks

Once SSH is up and running, you are ready to perform a few tasks.

1. Create a personal directory named after your username in the project folder located in `/p/project1/training2223/`.
   ```bash
   mkdir /p/project1/training2223/${USER}
   ```
1. Navigate to your personal directory.
   ```bash
   cd /p/project1/training2223/${USER}
   ```   
1. Clone the [course material Git repository](https://gitlab.jsc.fz-juelich.de/esde/training/maelstrom_bootcamp) to that folder.
   ```bash
   git clone https://gitlab.jsc.fz-juelich.de/esde/training/maelstrom_bootcamp.git
   ```

## Exercise 2: Your first Slurm job

After logging into JUWELS Booster, you may realize you are not actually connected to a node named `juwels-booster.fz-juelich.de`. The hostname might be `jwlogin23.juwels`, as you have been redirected to one of the many login nodes. To launch jobs on the compute nodes, JSC uses the workload manager Slurm.

Slurm has a set of commands allowing you to interact with the compute nodes. For a full list, see the [Slurm Quickstart Commands Section](https://slurm.schedmd.com/quickstart.html).

To get an overview of what partitions are available, use the `sinfo` command.

In order to start a job on the compute node, please type in the following command:

```bash
srun --pty --nodes=1 -A training2223 --partition booster --gres gpu --time=00:15:00 /bin/bash
```

Notice how the command prompt changes. For example, when writing this tutorial, it changed to `kesselheim1@jwb0012`. 

<img src="./images/prompt.png" width="80%" height="80%">

Now you have started an interactive job on a compute node, where the name of the node within the LAN is `jwb0012`. Execute `nvidia-smi` to check the status of the GPUs installed on the machine you have been assigned to. You can also specify the number of required GPUs with `--gres gpu:<number of GPUs>`, e.g. `--gres gpu:1`.
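
Inside a job, Slurm exports environment variables that describe the allocation, which gives a quick way to confirm that the shell really runs within a job. The sketch below assumes the standard variables `SLURM_JOB_ID`, `SLURM_JOB_PARTITION` and `SLURM_JOB_NODELIST`; the function name `describe_job` is made up for illustration. Outside of a job these variables are unset, so the function says so instead of printing empty fields.

```bash
#!/bin/bash
# Print a short summary of the Slurm job the current shell belongs to.
describe_job() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a Slurm job (running on $(uname -n))."
        return 0
    fi
    echo "Job ID:    ${SLURM_JOB_ID}"
    echo "Partition: ${SLURM_JOB_PARTITION:-unknown}"
    echo "Nodes:     ${SLURM_JOB_NODELIST:-unknown}"
    echo "Host:      $(uname -n)"
}

describe_job
```

Running it once on the login node and once inside the interactive job shows the difference between the two environments.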

Open a second terminal, `ssh` to JUWELS, and use
```bash
squeue
```
to inspect the current status of the queues. Enter
```bash
squeue -u <username>
```
to show only the entries that belong to your user.
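
The default `squeue` table can be narrowed down to the columns of interest with the `--format` option. The following is a small sketch, not an official JSC helper: the function name `show_my_jobs` and the chosen column widths are assumptions, and the function only works where `squeue` is available, i.e. on the JUWELS login and compute nodes.

```bash
#!/bin/bash
# List the current user's jobs with a compact set of columns.
show_my_jobs() {
    if ! command -v squeue >/dev/null 2>&1; then
        echo "squeue not found; run this on a JUWELS login node." >&2
        return 1
    fi
    # %i job ID, %P partition, %j job name, %T state,
    # %M elapsed time, %R pending reason or allocated nodes
    squeue -u "$USER" --format="%.10i %.12P %.20j %.10T %.10M %R"
}

show_my_jobs || true
```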

### Tasks
1. What is the meaning of `training2223` in the `srun` command above?
1. What is the partition and queue that your job was assigned to?
1. What is your job's Slurm job ID?
1. Can you download a file from the internet from the compute node?
   Try for example
   ```
   curl https://gitlab.jsc.fz-juelich.de/esde/training/maelstrom_bootcamp/-/blob/master/README.md
   ```
1. Cancel your job by using `scancel <job_id>`.

## Exercise 3: Batch jobs
In the previous exercise, you learned how to run an interactive Slurm job. In practice, you will often run a longer job as a *batch job* that waits in the queue until compute nodes are allocated. Batch jobs are typically written as scripts, often in the user's favorite shell language, such as Bash. Here is an example `hello_world.sbatch`:

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH -A training2223
#SBATCH --partition booster
#SBATCH --gres gpu
#SBATCH --time=00:02:00

# Create directory if needed and navigate to it
mkdir -p /p/project1/training2223/${USER}
cd /p/project1/training2223/${USER}

srun echo "This message indicates that the job is running."
srun echo "Hello world!" > greeting.txt
```

Notes:

- Lines starting with `#SBATCH` are comments that Slurm interprets as options for where and how to run the job. You can build complex resource requirements, multi-stage jobs, etc. with these. For this exercise, it is enough to know that the lines above run a two-minute job on one of the Booster nodes. It is straightforward to adjust the maximum runtime and the number of compute nodes.
- Within a batch script, you still should use `srun` to launch computational tasks. If your program uses MPI for parallelization, `srun` creates the required MPI processes. If not, `srun` will run your program as many times as specified by the `--ntasks` option (default: `--ntasks=1`).

To start the batch job, run
```bash
sbatch hello_world.sbatch
```

Since you will want to read the output to check for errors or be politely greeted, Slurm automatically creates a file based on the job ID called `slurm-<job-id>.out`. You can give this file your own name with the `--output` flag.
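
As an illustration of the `--output` flag, the sketch below writes a variant of the script to `hello_named.sbatch` (a made-up file name) using a here-document, so that it can be pasted into a terminal. It relies on Slurm's `%j` file name placeholder, which is replaced by the job ID, and additionally sets a job name via `--job-name`; the name `hello` is an arbitrary choice.

```bash
#!/bin/bash
# Write a batch script that names its job and its output file.
cat > hello_named.sbatch << 'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH -A training2223
#SBATCH --partition booster
#SBATCH --gres gpu
#SBATCH --time=00:02:00
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out

srun echo "This message ends up in hello-<job-id>.out."
EOF

# Check the script for shell syntax errors before submitting it.
bash -n hello_named.sbatch && echo "Syntax OK; submit with: sbatch hello_named.sbatch"
```

Keeping the job ID in the file name prevents later runs from overwriting the output of earlier ones.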

### Tasks
1. Use `sacct` and `squeue` to check the status of your submitted job.  

   *Hint:* `squeue -u $USER` will only show your jobs.
1. Check the output file `slurm-<job-id>.out` and `/p/project1/training2223/${USER}/greeting.txt` to see whether the above batch script executed correctly.
1. Modify the script such that it will print the host name of the compute node into a file `hostname.txt`.
1. Add a line `sleep 60` at the end of the script. Run the job again and use `squeue` to determine if and when the job is running.
1. Cancel the job. (Take a look at the previous exercise if you cannot remember how to do this.)
1. Use `sacct -S 2020-01-01` to retrieve information about all jobs you have run since Jan 1st, 2020.

## Exercise 4: Compute environment

A supercomputer is a shared resource, and it is therefore challenging to build a compute environment that satisfies scientific criteria like reproducibility. At JSC, [environment modules](https://modules.readthedocs.io/en/latest/index.html) are used to provide a modularized but consistent compute environment. Software is not installed system-wide but encapsulated in modules. Loading a module corresponds to setting a set of environment variables such that certain software is found. This also allows several versions of the same software to be installed side by side without mutual interference. Providing curated sets of environment modules is a challenging task in the administration of a supercomputer.
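
The core idea, that loading a module mostly means adjusting environment variables such as `PATH`, can be imitated by hand. The following sketch is not how environment modules are implemented; it merely creates a throwaway directory with a hypothetical program `mytool` and makes it findable by prepending that directory to `PATH`.

```bash
#!/bin/bash
# Create a private "installation" directory containing a tiny executable.
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
printf '#!/bin/sh\necho "mytool version 1.0"\n' > "$prefix/bin/mytool"
chmod +x "$prefix/bin/mytool"

# Before "loading": the shell does not know the program.
command -v mytool || echo "mytool not found"

# "Load": prepend the bin directory to PATH, as a module file would.
old_path=$PATH
export PATH="$prefix/bin:$PATH"
mytool

# "Unload": restore the previous PATH.
export PATH=$old_path
command -v mytool || echo "mytool not found again"
```

Because nothing outside the environment of the current shell changes, two users (or two terminals) can work with different versions of the same software at the same time.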

Modules can be loaded and unloaded with the `ml` command. Documentation can be found [here](https://modules.readthedocs.io/en/latest/ml.html).

On top of the environment modules, it is possible to use Python virtual environments. For this course, we will continue with Jupyter notebooks and have hence created a dedicated Jupyter kernel that has all packages installed.


### Tasks

1. Use `ml` to see the list of currently loaded modules.
1. Use `ml avail` to see a list of all modules available on the system.
1. Load a module of your choice via `ml load <module name>`.

## Exercise 5: JupyterJSC

As an alternative to the terminal, JSC provides access to the system via a platform called JupyterJSC. It is especially designed for use of Jupyter notebooks. It allows you to launch JupyterLabs on the system components of your choice. Under the hood, JupyterJSC also uses Slurm to launch interactive jobs that run a JupyterLab instance and connects you to this instance.

1. Go to the [JupyterJSC page](https://jupyter-jsc.fz-juelich.de) and log in with your JuDoor username. 
1. On the top, click <kbd>+ New</kbd> to configure a new JupyterLab. 
   <img src="./images/jsc-new-button.png" width="60%" height="60%">
1. In the popup, give the JupyterLab a meaningful name and leave `Type` unchanged.
   <img src="./images/jsc-configuration-service.png" width="60%" height="60%">
1. Move to the <kbd>Options</kbd> tab and select `Partition > booster`. 
   <img src="./images/jsc-configuration-options.png" width="60%" height="60%">
1. Now, you can move to the <kbd>Resources</kbd> tab to set the resources you want to request. Pick 1 Node, 1 GPU and a runtime of 120 minutes.
   <img src="./images/jsc-configuration-resources.png" width="60%" height="60%">
1. Click <kbd>Start</kbd> and wait until your JupyterLab is launched. JupyterJSC will now request your specified resources via Slurm. You can follow the process on the screen.
   <img src="./images/jsc-configuration-start-button.png" width="60%" height="60%">

Now, you will see the launch progress of the JupyterLab.

<img src="./images/jsc-status.png" width="60%" height="60%">

Once the JupyterLab is ready, you will be connected to it. On the landing page you will see all applications that are accessible from the JupyterLab.

- In the <kbd>Notebook</kbd> section, you can launch interactive notebooks, e.g. for Python.
- The <kbd>Console</kbd> section allows you to launch interactive consoles such as an interactive Python session.
- On the bottom, in the <kbd>Other</kbd> section, you are able to launch a terminal from which you can interact with the shell of the system, e.g. to browse the file system, move files, or the like.

You may have noticed that in the <kbd>Options</kbd> tab, by default `Partition > LoginNode` is selected. In fact, JupyterJSC also allows you to launch a JupyterLab on login nodes without any time limit. You can use these to perform regular tasks on the system (e.g. via terminal) or test simple Python routines. But remember: the login nodes are not designed for heavy computations!

### Tasks

1. Use `sacct` and `squeue -u $USER` to find the currently running interactive job that runs your JupyterLab.
1. Launch a terminal and figure out where you are located on the file system, e.g. via `ls` and `pwd`. Explore the system storage a bit. Take a look at the following paths:
   - `/p`
   - `/p/project` and `/p/project1/training2223`
   - `/p/home/jusers` and `/p/home/jusers/$USER`
   - `/p/scratch` and `/p/scratch1/training2223` 
1. In the top left, navigate to `File > New Launcher` and launch a Python console. Execute a set of commands to write a file to the file system. E.g. print a simple statement.
   1. Just use a file name to store the file.
   1. Use the path `/p/project1/training2223/<user>`
   1. Use the terminal to see where the files were stored.

### Solutions

- Solution to Task 3A/B: `%load ./solutions/tutorial-1/task-5-3.py`
- Solution to Task 3C: `%load ./solutions/tutorial-1/task-5-3.sh`