# Cluster Computing with NERSC - Tutorial

The goal of this notebook is to introduce NERSC and help you learn to work on it. You will get familiar with NERSC's module and queue systems and its storage policies, and you will learn how to set up an environment, work with Jupyter, and submit jobs to the different queues.

This notebook will be brief, but you can look at NERSC's documentation [here](https://docs.nersc.gov).

## Storage systems

NERSC provides different file systems for different purposes. Usually, you will be working in either your ```$HOME``` directory or ```$SCRATCH```. Further information about the storage systems can be found [here](https://docs.nersc.gov/filesystems/).

- **HOME**: a permanent storage system with a small capacity (40 GB), recommended for code, scripts and anything else you want to keep.

- **SCRATCH**: a large-capacity system (20 TB) intended for temporary storage, since files that have not been read for 12 weeks are purged!

For now, let's work in SCRATCH:

```
$ cd $SCRATCH
```
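Because SCRATCH purges files that have not been read for 12 weeks, it can be useful to see which of your files are at risk. The sketch below uses the standard ```find``` command with an access-time test. It assumes the file system keeps access times (atime) up to date, and ```purge_candidates.txt``` is just an illustrative file name; NERSC's actual purge logic may differ in detail, so treat the result as a rough guide only.

```bash
#!/bin/bash
# Directory to inspect: $SCRATCH on NERSC, or the current directory elsewhere
TARGET="${SCRATCH:-$PWD}"

# -atime +84 matches files last accessed more than 84 days (12 weeks) ago;
# find counts whole days. Errors such as unreadable directories are discarded.
find "$TARGET" -type f -atime +84 -print 2>/dev/null > purge_candidates.txt

echo "$(wc -l < purge_candidates.txt) file(s) under $TARGET not read in ~12 weeks"
```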

## Modules in NERSC

Working on a cluster usually means working with a module system, and NERSC is no exception. The main advantage is that you can manage many software packages without breaking anything (or at least not so easily!) and also work with multiple versions of the same software.

Let's see what modules you can use.

Some modules are loaded by default when you log into your account; you can check which ones by typing in your terminal:

```
$ module list
```
 
But what other modules can you use at NERSC? There are lots of them!

You can list all the available modules with:

```
$ module avail
```

How do I use a specific module, and what does it do?
The latter can be checked with the ```module show``` command, e.g.:

```
$ module show python
```

The former can be done with ```module load```.


Let's load the python module:

```
$ module load python
```

You can check with ```module list``` that you now have an additional module.

You can find further information about modules at NERSC [here](https://docs.nersc.gov/environment/modules/).

## Setting up a Python environment at NERSC

Now that the python module is loaded, let's create an environment.

This is recommended because some codes conflict with specific library versions. To avoid this, you can create a separate environment for each specific task. For further information about managing environments, visit [this](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) page.

Let's say that a task needs Python 3.7 and numpy 1.19.0 specifically, plus any version of astropy. You can create such an environment with:

```
$ conda create -n myenv python=3.7 numpy=1.19.0 astropy
```
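An alternative, which keeps the package list in a file you can share or keep under version control, is to describe the same environment in a YAML file and let ```conda env create -f``` build it. A minimal sketch; the file name ```environment.yml``` is an assumption, and the ```conda``` call is guarded so the snippet only builds the environment where conda is available (e.g. after ```module load python``` at NERSC).

```bash
#!/bin/bash
# Describe the environment from the example above in a file
cat > environment.yml <<'EOF'
name: myenv
dependencies:
  - python=3.7
  - numpy=1.19.0
  - astropy
EOF

# Build the environment only where conda is available
if command -v conda >/dev/null 2>&1; then
    conda env create -f environment.yml
fi
```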

Before activating your environment, type ```pip list``` to see the packages in NERSC's default Python environment; this lets you check that your new environment actually contains different packages.

Now do the following:

```
$ source activate myenv
$ pip list
```

Now you are working in your own environment! You can tell by the environment's name shown at the left of your terminal prompt, e.g. ```(myenv)$```.

You can exit this environment by typing ```conda deactivate```.

NOTE: every time you want to work in your environment, you should run the following:

```
$ module load python
$ source activate myenv
```

Sometimes setting up an environment requires several commands, and it can be hard to remember every step. You can avoid this by putting them in a shell script and sourcing it every time you want to work in that environment. For example, let's use the shell file that accompanies this notebook:

```
$ source <path/to/script>/load_myenv.sh
```

This shell file can contain whatever you need: exported paths, definitions, several module loads, etc.
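The contents of the attached ```load_myenv.sh``` are not shown in this notebook, so the following is only a hypothetical sketch of such a file. ```MYPROJECT_DIR``` and the ```py``` subfolder are illustrative names, not something the tutorial defines.

```bash
# load_myenv.sh: hypothetical contents; use it with `source load_myenv.sh`
# (sourcing keeps the loaded modules and the activated environment in your
#  current shell, which running it as `./load_myenv.sh` would not)

module load python        # NERSC python module, which provides conda
source activate myenv     # the environment created earlier

# Task-specific settings; MYPROJECT_DIR and its layout are illustrative only
export MYPROJECT_DIR="${SCRATCH:-$HOME}/myproject"
export PYTHONPATH="$MYPROJECT_DIR/py${PYTHONPATH:+:$PYTHONPATH}"
mkdir -p "$MYPROJECT_DIR"
```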

## Working in Jupyter at NERSC

So what if you want to work from a notebook remotely? Fortunately, NERSC provides Jupyter! Just go to https://jupyter.nersc.gov/hub/home and log in to your account.

But what about a specific or custom kernel? This is a bit tricky, so let's work through it!

First, create a shell file that sets up your environment; an example, ```myenv.sh```, is attached to this notebook. This file must be executable, so simply run:
```
$ chmod u+x myenv.sh
```
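The attached ```myenv.sh``` is not reproduced here, so this is a hypothetical version following a common pattern for custom Jupyter kernels: set up the environment, then replace the shell with the kernel process via ```exec python -m ipykernel_launcher "$@"```. It assumes the helper lives at ```$HOME/myenv.sh``` and that the ```myenv``` environment has the ```ipykernel``` package installed (for example with ```conda install -n myenv ipykernel```), since the kernel cannot start without it.

```bash
#!/bin/bash
# Write the kernel helper script to $HOME/myenv.sh (hypothetical location)
cat > "$HOME/myenv.sh" <<'EOF'
#!/bin/bash
# Set up the same environment used in the terminal...
module load python
source activate myenv
# ...then replace this shell with the IPython kernel; "$@" forwards the
# arguments Jupyter passes in (such as "-f <connection file>")
exec python -m ipykernel_launcher "$@"
EOF

chmod u+x "$HOME/myenv.sh"   # Jupyter must be able to execute it
```

Using ```exec``` means the kernel process takes over the helper's process, so Jupyter can signal (interrupt or restart) the kernel directly.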

Then, create a folder for your environment in Jupyter's kernels directory:

```
$ mkdir -p $HOME/.local/share/jupyter/kernels/myenv
$ cd $HOME/.local/share/jupyter/kernels/myenv
```

Finally, create a JSON file named ```kernel.json``` for your kernel. You can write it yourself (e.g. with ```vi kernel.json```), but you can also just copy the file attached to this notebook. **IMPORTANT:** in this file, the second line must point to the ```myenv.sh``` file you created.

**NOTE:** This is done just once per kernel.
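As a sketch of what ```kernel.json``` can contain, the commands below write a minimal one whose second line points to a helper assumed to be at ```$HOME/myenv.sh```; adjust that path if your file lives elsewhere. The display name ```myenv``` is what you would then pick from the kernel list in Jupyter.

```bash
#!/bin/bash
# Kernel directory for the "myenv" kernel
KERNEL_DIR="$HOME/.local/share/jupyter/kernels/myenv"
mkdir -p "$KERNEL_DIR"

# Unquoted EOF so that $HOME is expanded; {connection_file} is a placeholder
# that Jupyter itself fills in when it starts the kernel
cat > "$KERNEL_DIR/kernel.json" <<EOF
{
  "argv": ["$HOME/myenv.sh", "-f", "{connection_file}"],
  "display_name": "myenv",
  "language": "python"
}
EOF
```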

Let's check that we are using our environment:

```
import numpy
print(numpy.__version__)
```

Output:

```
1.19.0
```


It works!!!

## Using DESICODE modules

To use desicode in a FRESH NEW terminal, type the following:

```
$ source /project/projectdirs/desi/software/desi_environment.sh master
```

The last argument selects the desicode version: ```master``` uses the current, up-to-date state of the libraries, but you can also load a stable version (the latest at the time of writing is ```21.5```).
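If you switch between desicode versions often, you could wrap the ```source``` line above in a tiny script that takes the version as an optional argument and defaults to ```master```. A sketch only: the file name ```load_desi.sh``` is hypothetical, and the setup path is the one given above, which may change over time.

```bash
# load_desi.sh: hypothetical helper; use as `source load_desi.sh [version]`
DESI_VERSION="${1:-master}"   # e.g. master or a stable release such as 21.5
DESI_SETUP=/project/projectdirs/desi/software/desi_environment.sh

if [ -f "$DESI_SETUP" ]; then
    source "$DESI_SETUP" "$DESI_VERSION"
else
    echo "DESI setup script not found: $DESI_SETUP" >&2
fi
```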

To create a Jupyter kernel, simply use the following:

```
$ source /project/projectdirs/desi/software/activate_desi_jupyter.sh master
```
Once again, you can replace ```master``` with any stable version.


**NOTE:** setting up the DESI Jupyter kernel only needs to be done ONCE for each version you want to install. I recommend ```master``` plus any stable version you like.

You may want to create a custom DESI environment; however, this is a rather specific task, so we refer you to the [wiki page](https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/NERSC#CustomizingYourDESIPythonEnvironment) that explains how to do it.

# Submitting Jobs

Now that we have set up the environments where you will most likely be working, it is time to submit a job to NERSC.

## NERSC queue policy

NERSC has different queues, each with its own restrictions, to suit different user requirements. Unless you have special requirements, the queues you will usually work with are:

- **Regular**: the standard queue for jobs. It has a 48-hour time limit and allows up to 5000 submitted jobs, with no limit on running jobs. There is no limit on the number of requested nodes, meaning that you can use up to 1932 Haswell nodes or 9489 KNL nodes. Click [here](https://docs.nersc.gov/systems/cori/) for more information about Haswell and KNL.

- **Interactive**: usually used for code development, testing, debugging and analysis. It has a 4-hour time limit and allows up to 2 simultaneous jobs, which is also its limit on submitted jobs. At most 64 nodes can be requested. Jobs in the interactive queue should be run via ```salloc``` or ```srun``` (we will get to this).

- **Debug**: used for code development, testing and debugging. It has a 30-minute time limit, allows up to 2 simultaneous jobs and 5 submitted jobs, and at most 64 nodes can be requested.

For other queues, check [here](https://docs.nersc.gov/jobs/policy/).

## Submitting jobs with salloc or srun

You can run jobs directly from command line with ```salloc``` or ```srun```. 

The ```salloc``` command allocates resources so you can work interactively; it is commonly used with the interactive queue.

```srun```, on the other hand, is used to launch a single job; it can be used inside any batch script, during an allocation in the interactive queue, or directly from the command line.

In addition to these commands, you should pass options for the job, such as the number of nodes, the queue, the time limit and the computing system to be used. Many more options can be set; they are listed [here](https://docs.nersc.gov/jobs/#commonly-used-options).

An example of an salloc command is the following:

```
$ salloc -A desi -C haswell -q interactive -t 00:10:00 -N 1
```

In this command we are requesting 10 minutes on one Haswell node in the interactive queue, charged to the ```desi``` account (```-A desi```). You can change haswell to knl, and ask for up to 64 nodes and up to 4 hours. This opens an interactive session on a compute node in your terminal; you will notice it because your prompt shows ```@nid``` followed by a node number right after your user name. Once you have allocated resources, you can run a job inside this interactive session with ```srun```:


```
$ srun python test_script.py
```

Note that ```srun``` inherits the options given to ```salloc``` and can add more. For example, if you request 3 nodes with ```salloc -N 3```, you can distribute different jobs among them by launching each one with ```srun -N 1```.
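A sketch of that pattern, run inside a 3-node allocation: each ```srun -N 1``` is started in the background with ```&```, and ```wait``` returns once all of them have finished. Without the ```&```, each ```srun``` would block until the previous one ended, so the jobs would run one after another. The scripts ```task_a.py```, ```task_b.py``` and ```task_c.py``` are hypothetical. Exactly how Slurm places concurrent job steps can depend on its version and on NERSC's configuration, so check the NERSC documentation if the steps do not spread across nodes as expected.

```bash
#!/bin/bash
# Run inside an allocation obtained with, e.g.:
#   salloc -A desi -C haswell -q interactive -t 00:30:00 -N 3
# task_a.py, task_b.py and task_c.py are hypothetical scripts.
pids=()
for task in task_a task_b task_c; do
    # one node and one task per job step; '&' starts it in the background
    srun -N 1 -n 1 python "${task}.py" &
    pids+=("$!")
done

# block until all three job steps have finished
wait "${pids[@]}"
```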

Also, ```salloc``` uses the environment of the terminal you call it from, but you can also load modules or environments inside the allocated node.

If your job finishes before your allocation time runs out just type ```exit```.

Equivalently, we could have run ```srun``` directly from the command line:


```
$ srun -A desi -C haswell -q interactive -t 00:10:00 -N 1 python test_script.py
```

You can replace the interactive queue with whichever queue suits your needs.

## Submitting jobs with SBATCH

The last piece of this tutorial is submitting jobs with ```sbatch```. This is done via a batch script, like the ```example_batch.batch``` file attached to this notebook. The script must include the job options, each on a line starting with ```#SBATCH```, followed by the commands you want to run. Anything can come after the ```#SBATCH``` lines: loading modules, calculations via another script, ```srun``` commands, ```for``` loops, etc.

To submit it, simply type:

```
$ sbatch example_batch.batch
```
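The attached ```example_batch.batch``` is not reproduced here; the following is a minimal sketch of what such a script could look like. It is written from the shell with a heredoc and then submitted with ```sbatch``` only where that command exists. The account, node type, time and node options mirror the ```salloc``` example above (using the debug queue); the job name, the output-file pattern, ```myenv``` and ```test_script.py``` are illustrative assumptions.

```bash
#!/bin/bash
# Write a minimal batch script; the commands after the #SBATCH lines run on
# the allocated compute node(s)
cat > example_batch.batch <<'EOF'
#!/bin/bash
# -A: account, -C: node type (haswell or knl), -q: queue, -t: time limit,
# -N: nodes, -J: job name, -o: output file (%j is replaced by the job ID)
#SBATCH -A desi
#SBATCH -C haswell
#SBATCH -q debug
#SBATCH -t 00:10:00
#SBATCH -N 1
#SBATCH -J test_job
#SBATCH -o test_job_%j.out

module load python
source activate myenv
srun python test_script.py
EOF

# Submit only where Slurm is available (e.g. on a NERSC login node)
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_batch.batch
fi
```

The option descriptions sit on ordinary comment lines rather than at the end of the ```#SBATCH``` lines, which keeps each directive unambiguous.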