# Running SLURM jobs from a notebook

One possible use case of Jupyter Notebooks in an HPC environment is to manage SLURM jobs and to monitor/visualize results from running jobs.  

In this lesson, we will have a look at **SLURM magics** to manage jobs and at **interactive analysis** of running jobs.

## Contents

- [SLURM magics](#SLURM-magics)
    - [Using SLURM magics](#Using-SLURM-magics)
- [Submitting and analyzing jobs](#Submitting-a-job-and-analyzing-results-on-the-fly)
    - [GROMACS as an example](#GROMACS-as-an-example)
    - [<font color="red">Exercise 2.1</font>](#exercise21)

## SLURM magics

- Developed at [NERSC](http://www.nersc.gov/) (sources [here](https://github.com/NERSC/slurm-magic))
- Implements Jupyter magic commands for interacting with the SLURM workload manager
- Commands are spawned via `subprocess`, and their output is captured in the notebook
- Arguments accepted by a SLURM command are also accepted by the corresponding magic command
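The pattern described in these bullets can be sketched as follows. This makes no claim about slurm-magic's real internals. The hypothetical helper `run_slurm_command` splits a magic-style argument string with `shlex`, spawns the command with `subprocess`, and returns the captured output. It uses `stdout=PIPE` and `universal_newlines=True` rather than newer keyword arguments so that it also runs on Python 3.6, the version in the `anaconda/py36` module.

```python
import shlex
import subprocess


def run_slurm_command(command, args=""):
    """Run `command` with a magic-style argument string and return its stdout.

    Hypothetical helper illustrating the idea; not slurm-magic's actual code.
    """
    # Split the argument string the way a shell would, e.g. "-u alice -t R"
    argv = [command] + shlex.split(args)
    # Spawn the process and capture both output streams as text
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        # Surface the command's own error message instead of failing silently
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

On a login node, `run_slurm_command("squeue", "-u <username>")` would return the same text that `squeue -u <username>` prints in a terminal.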

> *"I’ll never have to leave a notebook again, that’s like the ultimate dream"*  
> (Anonymous SLURM-magic user)

### Using SLURM magics

The Python package ``slurm-magic`` is available in the ``prace`` environment:

```bash
$ module load anaconda/py36/5.0.1
$ source activate prace
$ jupyter-notebook --no-browser --port=<port> --ip=<ip>
```

In the notebook, we then need to load the IPython extension: 

```python
%load_ext slurm_magic
```

We can check the newly added magics provided by ``slurm-magic``:

```python
%lsmagic
```

and try them out:

```python
%squeue
```

```python
%sinfo
```
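Sometimes you want to act on the queue from Python, for example to wait until a job is running. In that case it helps to ask plain `squeue` (run through `!` or `subprocess`) for a fixed, machine-friendly layout: `squeue --noheader --format="%i %j %T"` prints the job ID, job name and extended job state. The following minimal parser for that layout uses the hypothetical name `parse_squeue`. It assumes that job names contain no spaces, and the sample text is illustrative rather than real cluster output.

```python
def parse_squeue(text):
    """Parse output of: squeue --noheader --format="%i %j %T".

    Returns one dict per job; assumes job names contain no spaces.
    """
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        job_id, name, state = fields
        jobs.append({"id": job_id, "name": name, "state": state})
    return jobs


# Illustrative input in the requested format (not real cluster output)
sample = "123456 gromacs RUNNING\n123457 hello PENDING\n"
running = [job["id"] for job in parse_squeue(sample) if job["state"] == "RUNNING"]
print(running)  # ['123456']
```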

## Submitting a job and analyzing results on the fly

### GROMACS as an example

[GROMACS](http://www.gromacs.org/) is a molecular dynamics simulation package designed for simulations of biological macromolecules (proteins, lipids, nucleic acids, etc.).

In this exercise we use [lysozyme in water](http://www.mdtutorials.com/gmx/lysozyme/index.html) as a model system to demonstrate how to use a Jupyter notebook to submit jobs and analyze results.

First, go to the ``gromacs_job`` folder

```python
%cd gromacs_job
```

and check that the input files (.mdp, .top and .gro) are in the folder

```python
%ls
```
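You can also script this check, which is handy before submitting from a notebook. The sketch below uses `pathlib` and a hypothetical helper, `missing_inputs`, which reports each expected suffix that has no matching file in a folder:

```python
from pathlib import Path


def missing_inputs(directory=".", suffixes=(".mdp", ".top", ".gro")):
    """Return the suffixes for which `directory` contains no matching file."""
    folder = Path(directory)
    return [suffix for suffix in suffixes if not any(folder.glob("*" + suffix))]
```

`missing_inputs()` returns an empty list when the current folder holds at least one file of each kind.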

Then, use the ``%%sbatch`` cell magic to submit a GROMACS job

```python
%%sbatch
#!/bin/bash -l
#SBATCH -A snic2018-3-161
#SBATCH -N 1
#SBATCH -t 00:15:00
#SBATCH -J gromacs
module load GROMACS/2018.1-nsc2-gcc-2018a-eb
gmx grompp -f npt.mdp -c start.gro -p topol.top
gmx_mpi mdrun -s topol.tpr -deffnm npt
```
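On success, `sbatch` prints a line of the form `Submitted batch job <id>`. If you capture that text (for example by running `sbatch` through `subprocess`), you can extract the job ID. With the ID you can follow just that job with `squeue -j <id>` or cancel it with `scancel <id>`. Here is a sketch using the hypothetical helper `parse_job_id`:

```python
import re


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job ID found in %r" % sbatch_output)
    return int(match.group(1))


print(parse_job_id("Submitted batch job 4242\n"))  # 4242
```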

Monitor your job with the ``%squeue`` line magic (replace `x_thowi` with your own username):

```python
%squeue -u x_thowi
```

As the simulation runs, GROMACS keeps updating its output files, so you can start analyzing them and monitoring the progress of the simulation before the job finishes.
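Files such as `npt.log`, the log that `-deffnm npt` makes `mdrun` write, grow while the job runs. The hypothetical helper `read_new_text` sketched below returns only the text appended since the last call. Rerunning a single cell then shows the latest progress without re-reading the whole file.

```python
import os


def read_new_text(path, offset=0):
    """Return (text appended to `path` since byte `offset`, new offset)."""
    if not os.path.exists(path):
        return "", offset  # the running job may not have created the file yet
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Work in bytes so offsets are exact; decode only for display
    return data.decode("utf-8", errors="replace"), offset + len(data)


# Usage in a cell that you re-run, starting with offset = 0:
# new_text, offset = read_new_text("npt.log", offset)
# print(new_text)
```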

We have prepared a Python module ``gmx_util`` that provides some easy-to-use functions for analysis. To use the module, type

```python
import gmx_util as gu
```

and read the documentation of the module via

```python
help(gu)
```

Now we are ready to do some analysis. Note that the GROMACS module is not loaded in the notebook's environment, so we first need to add the GROMACS executables to the `$PATH` environment variable.  
We first inspect the GROMACS module:

```python
!module show GROMACS/2018.1-nsc2-gcc-2018a-eb
```

We see that the root directory for this GROMACS module is `/software/sse/easybuild/prefix/software/GROMACS/2018.1-foss-2018a-nsc2`. We pass this path to the `load_gmx()` function:

```python
gmx_root = "/software/sse/easybuild/prefix/software/GROMACS/2018.1-foss-2018a-nsc2"
gu.load_gmx(gmx_root)
```
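Running `!module load ...` would not help here. Each `!` command runs in its own short-lived subshell, and any environment changes are lost when that subshell exits. That is presumably why `gu.load_gmx()` modifies the environment from Python instead. We have not checked its actual implementation. The sketch below shows the core idea with a hypothetical `prepend_gmx_to_path`, which puts `<root>/bin` first on `PATH`. The change affects this kernel and every command it spawns afterwards.

```python
import os


def prepend_gmx_to_path(gmx_root):
    """Put <gmx_root>/bin at the front of PATH for this process and its children."""
    bin_dir = os.path.join(gmx_root, "bin")
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = bin_dir + os.pathsep + current if current else bin_dir
    return bin_dir
```

Afterwards, `shutil.which("gmx")` reports which `gmx` executable would be used, or `None` if there is none in that directory.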

Now import matplotlib

```python
import matplotlib.pyplot as plt
```

and examine the evolution of density with respect to simulation time

```python
time, dens = gu.get_prop("Density", "npt")
plt.plot(time, dens)
```

Note that the default units are kg/m<sup>3</sup> for density and ps for simulation time. You can improve the plot by adding axis labels with ``xlabel``, ``ylabel``, etc.

```python
plt.xlabel('Simulation time [ps]')
plt.ylabel('Density [kg/m$^3$]')
plt.plot(time, dens)
```
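We don't know exactly how `gu.get_prop` extracts the data. A common route is `gmx energy -f npt.edr -o density.xvg`, which writes the selected energy terms to an `.xvg` text file. If you want to read such a file yourself, the minimal sketch below uses the hypothetical names `parse_xvg_lines` and `read_xvg`. It skips lines starting with `#` (comments) or `@` (Grace plotting directives), and takes the first column as time and the second as the property value. Malformed lines are skipped, since a running job may have written the last line only partially.

```python
def parse_xvg_lines(lines):
    """Return (time, values) from the lines of a GROMACS .xvg file."""
    time, values = [], []
    for line in lines:
        # Skip comments (#), Grace directives (@) and blank lines
        if line.startswith(("#", "@")) or not line.strip():
            continue
        columns = line.split()
        try:
            t, v = float(columns[0]), float(columns[1])
        except (IndexError, ValueError):
            continue  # e.g. a line the running job has only partly written
        time.append(t)
        values.append(v)
    return time, values


def read_xvg(path):
    """Read an .xvg file from disk (e.g. one written by gmx energy)."""
    with open(path) as handle:
        return parse_xvg_lines(handle)
```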

We can also examine the evolution of pressure with respect to time

```python
time, pres = gu.get_prop("Pressure", "npt")
plt.plot(time, pres)
```
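The instantaneous pressure in an MD simulation fluctuates strongly, so its trend is easier to judge after smoothing. Below is a plain-Python sketch of a simple moving average, using a hypothetical `running_average` helper:

```python
def running_average(values, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


# Possible use in the notebook (each average is plotted at the end of its window):
# window = 10
# plt.plot(time[window - 1:], running_average(pres, window))
```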

Also look at the correlation between density and pressure

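```python
# The two series may differ in length while the job is still writing output
n = min(len(dens), len(pres))
plt.plot(dens[:n], pres[:n], 'b+')
```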
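To put a number on the correlation seen in the scatter plot, you can compute the Pearson correlation coefficient r, which lies between -1 and 1. The stdlib-only sketch below uses a hypothetical `pearson_r` and truncates both series to the shorter one, since data from a running job may differ in length. On Python 3.10 or newer, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two series, truncated to the shorter one."""
    n = min(len(xs), len(ys))
    if n < 2:
        raise ValueError("need at least two paired values")
    xs, ys = list(xs[:n]), list(ys[:n])
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```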

Since we are simulating lysozyme in water, we can monitor the root-mean-square deviation (RMSD) of the protein backbone:

```python
time, rmsd = gu.get_rmsd("Backbone", "npt")
plt.plot(time, rmsd)
```
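For reference, the RMSD of N atoms with positions r_i relative to reference positions r_i^ref is sqrt((1/N) Σ_i |r_i − r_i^ref|²). The sketch below implements that formula with a hypothetical `rmsd_no_fit`, named so that it does not overwrite the `rmsd` values computed above. Unlike `gmx rms`, which normally superimposes each frame onto the reference first, it does no fitting. It therefore only matches GROMACS's numbers for structures that are already aligned.

```python
import math


def rmsd_no_fit(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z), without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared displacement of one atom
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```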

In many cases it is of interest to analyze the coordinates of the protein and the surrounding solvent molecules. To do that, we first convert the binary trajectory file into the Protein Data Bank (PDB) format:

```python
gu.get_pdb("System", "npt")
```

Then we extract the frame at, e.g., 10 ps and print some information about its atoms and residues:

```python
# Read the frame at 10 ps from the PDB file written above
atoms = gu.read_pdb(10, "npt")
print("Number of atoms:", len(atoms))
residues = sorted({a.resname for a in atoms})
print("Number of residue types:", len(residues))
print("Residue types:", residues)
```
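`gu.read_pdb` hides the parsing, but PDB is a fixed-column text format, so a reader needs only string slicing. The sketch below is not `gmx_util`'s code. It uses a hypothetical `Atom` record and `parse_pdb_atoms`, and reads the standard ATOM/HETATM columns: residue name in columns 18-20 and x, y, z in columns 31-54. It leaves the residue name unstripped, which is one way a name such as `" CL"` with a leading space can arise.

```python
from collections import namedtuple

# Hypothetical record type; gmx_util's atom objects may have other fields
Atom = namedtuple("Atom", "name resname resid x y z")


def parse_pdb_atoms(lines):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip CRYST1, TER, MODEL, ... records
        atoms.append(Atom(
            name=line[12:16].strip(),   # columns 13-16
            resname=line[17:20],        # columns 18-20, left unstripped
            resid=int(line[22:26]),     # columns 23-26
            x=float(line[30:38]),       # columns 31-38
            y=float(line[38:46]),       # columns 39-46
            z=float(line[46:54]),       # columns 47-54
        ))
    return atoms
```

With a list of such records, counting residues per type is, for example, `collections.Counter(a.resname for a in atoms)`.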

Below is example code that computes the shortest distance between the protein and the first 30 water molecules (the first 90 water atoms, since each water molecule in this system has three atoms):

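```python
import math

# Split the atoms into protein and water; chloride ions are skipped
pro_atoms = []
wat_atoms = []
for a in atoms:
    resname = a.resname.strip()  # tolerate padded names such as " CL"
    if resname == "SOL":
        wat_atoms.append(a)
    elif resname == "CL":
        continue
    else:
        pro_atoms.append(a)

# Brute-force search over all protein atoms and the first 30 water molecules
# (3 atoms each); plain Cartesian distances, periodic images are ignored
min_r2 = float("inf")
for a in pro_atoms:
    for b in wat_atoms[:90]:
        r2 = (a.x - b.x)**2 + (a.y - b.y)**2 + (a.z - b.z)**2
        min_r2 = min(min_r2, r2)

print("Shortest protein-water distance:", math.sqrt(min_r2))
```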
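The loop above uses plain Cartesian distances, so it ignores periodic boundary conditions. A water molecule near one face of the simulation box can be close to a protein atom near the opposite face through a periodic image. Below is a sketch of the minimum-image distance, using a hypothetical `min_image_distance`. It assumes a rectangular (orthorhombic) box whose edge lengths you know, for example from the `CRYST1` record of the PDB file, in the same unit as the coordinates.

```python
import math


def min_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    `box` holds the (Lx, Ly, Lz) edge lengths in the same unit as the coordinates.
    """
    r2 = 0.0
    for ai, bi, length in zip(a, b, box):
        d = ai - bi
        d -= length * round(d / length)  # shift to the nearest periodic image
        r2 += d * d
    return math.sqrt(r2)
```

In the loop above, you would then replace the explicit `r2` expression with `min_image_distance((a.x, a.y, a.z), (b.x, b.y, b.z), box) ** 2`.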

<a id='exercise21'></a>

### <font color="red">Exercise 2.1</font>

In this exercise, you will compile the hello-world MPI code, submit a batch job and have a look at the output. 

Try to do all the steps below from within this notebook:

1. Start by creating a new directory called `hello-world` under the `jupyter-notebook` directory (you may need `%cd ..` first), and `cd` into it.
2. Copy-paste the hello-world MPI code in C from [the HPC-Intro lesson](https://pdc-support.github.io/hpc-intro/08-compiling/#mpi-parallelized-code) into a code cell (**don't execute it yet**).
3. Add the `%%writefile hello_mpi.c` cell magic command at the top of the cell, and execute it.
4. Check that you have indeed created the file in the right directory (`%pwd` and `%ls` are your friends).
5. Compile the code using `mpicc -o hello_mpi hello_mpi.c`. Check that the executable has been created.
6. Write a new batch script in a cell (or copy-paste the cell from the [GROMACS section above](#GROMACS-as-an-example) using `c` and `v`). It should:
    - request 1 node for 5 minutes using the edu18.prace allocation
    - load the `gcc/7.2.0` and `openmpi/3.0-gcc-7.2` modules 
    - execute your executable using `mpirun -n 24 ./hello_mpi > hello.out`
7. Submit the job using the `%%sbatch` magic, monitor it using `%squeue -u <username>`, and inspect the output file `hello.out`.