## Submitting a Slurm Job

Now that you have created the training data set, you are finally ready to train your network. Training needs a lot of GPU power, so all of the following steps are done on the HPC of the University. This server is managed by a framework called “Slurm” (https://slurm.schedmd.com/documentation.html).

In simplified terms: it manages the different requests for processing time on the CPU as well as the GPU cluster. You submit a job, which is put into a queue. Afterwards you can log out and check later that day or the next day whether your job has finished. You can interact with Slurm through the terminal on the server or through the Jupyter notebook. For the latter option you will need to install slurm-magic (https://github.com/NERSC/slurm-magic).

To do that, connect to the server via PuTTY/Terminal and, within your base environment, type:

`pip install git+https://github.com/NERSC/slurm-magic.git`

Note: For reasons I have not been able to work out, the following steps need to be done within your base environment and NOT within the DEEPLABCUT environment. If you are still in your DEEPLABCUT environment, type: `conda activate base`

Jobs are submitted by executing batch scripts. We will do that from within the Jupyter notebook; the batch script used here is shown further below.

Before you can execute the batch script, you need to transfer the py_scripts folder from the tutorial directory to the server, the same way you did with the project folder (i.e. with WinSCP/Cyberduck).

Important: Go to the py_scripts folder and open training_script.py with a text editor (this can be done within WinSCP/Cyberduck).

It should look like this:

`import deeplabcut
ProjectFolderName = 'OpenField-Wallhorn-2022-08-15'
path_config_file = '/home/wallhorn/DLC_Projects/top_view_08_2022-Wallhorn-2022-08-11/config.yaml'
deeplabcut.train_network(path_config_file, displayiters=1000, saveiters=50000, maxiters=500000)`
 
Change the path of the config file to the path of your own config file.
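A wrong config path is an easy mistake to make here, and you would only notice it after the job has waited in the queue. As a minimal sketch, a small check like the one below could be run before calling `deeplabcut.train_network`. The helper name `check_config_path` is made up for this illustration and is not part of DeepLabCut.

```python
from pathlib import Path


def check_config_path(path_config_file: str) -> Path:
    """Return the config path, or raise if it is not an existing config.yaml.

    Hypothetical helper for illustration; not part of DeepLabCut.
    """
    config = Path(path_config_file)
    # The training script expects the path to end in config.yaml.
    if config.name != "config.yaml":
        raise ValueError(f"Expected a path ending in config.yaml, got: {config}")
    # Fail early if the file is missing instead of failing inside the job.
    if not config.is_file():
        raise FileNotFoundError(f"No config file found at: {config}")
    return config


# Possible use inside training_script.py (path is the tutorial's example path):
# check_config_path('/home/wallhorn/DLC_Projects/top_view_08_2022-Wallhorn-2022-08-11/config.yaml')
```

If the check raises an error, the message ends up in the job report, which makes the cause easier to spot.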

In [None]:
%load_ext slurm_magic

First, check how many other jobs are waiting to be processed.

In [None]:
%squeue
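If you prefer a short summary over the full queue table, a sketch like the following counts the jobs per state. It assumes that the `squeue` command is available on the machine where the notebook runs, and it uses the `-h` (no header) and `-o "%T"` (print only the job state) options of `squeue`. The function names are made up for this illustration.

```python
import subprocess
from collections import Counter


def count_job_states(squeue_output: str) -> Counter:
    """Count how often each job state appears, one state per line of input."""
    return Counter(
        line.strip() for line in squeue_output.splitlines() if line.strip()
    )


def queue_summary() -> Counter:
    """Ask Slurm for the state of every job in the queue and count the states."""
    result = subprocess.run(
        ["squeue", "-h", "-o", "%T"],  # -h: no header, %T: job state only
        capture_output=True,
        text=True,
        check=True,
    )
    return count_job_states(result.stdout)


# On the server you could then run:
# print(queue_summary())
```

Many jobs in the `PENDING` state suggest that your own job may have to wait for a while before it starts.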

Now you are almost ready to run the batch script. But first you have to make sure the script knows where the Python script is located. In the batch script cell below, check that this path is correct:

`cd /home/wallhorn/DLC_Projects/py_scripts`

And check that this line runs the correct script:

`srun python3 training_script.py`


Also look at this line: 

`#SBATCH --output=/home/wallhorn/DLC_Projects/job_reports/%j.out`

This is the path where Slurm saves the job report. Edit it to a path of your choosing. It is important to check these reports to see whether everything worked.
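As far as I know, Slurm does not create the report folder for you, so the folder should exist before you submit the job. The following sketch creates it from the notebook. The helper name `ensure_report_dir` is made up for this illustration, and the path in the usage comment is the example path from this tutorial.

```python
from pathlib import Path


def ensure_report_dir(output_pattern: str) -> Path:
    """Create the folder part of an --output pattern if it does not exist yet.

    Hypothetical helper for illustration. The file name part (e.g. %j.out)
    is left alone, because Slurm fills in the job ID itself.
    """
    report_dir = Path(output_pattern).parent
    report_dir.mkdir(parents=True, exist_ok=True)
    return report_dir


# Possible use before submitting the job:
# ensure_report_dir("/home/wallhorn/DLC_Projects/job_reports/%j.out")
```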

In [None]:
%%sbatch
#!/bin/bash
#
#SBATCH --job-name=DLC_training
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
#SBATCH --output=/home/wallhorn/DLC_Projects/job_reports/%j.out


module load cuda
module load anaconda

source /hpc/opt/apps/Anaconda/3-2021.11/etc/profile.d/conda.sh
conda init bash

conda activate DEEPLABCUT

export LD_LIBRARY_PATH="$CONDA_PREFIX/lib"
cd /home/wallhorn/DLC_Projects/py_scripts

srun python3 training_script.py


Now your job is submitted, and you can check whether it is in the queue by executing the following cell. You can also go to the folder where the job reports are saved and open the job report with a text editor.

In [None]:
%squeue
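Instead of opening the job report in a text editor, you can also read its last lines from within the notebook. The sketch below picks the most recently modified `.out` file in the report folder. The function name `tail_latest_report` is made up for this illustration, and the folder in the usage comment is the example path from this tutorial.

```python
from pathlib import Path


def tail_latest_report(report_dir: str, n_lines: int = 20) -> str:
    """Return the last n_lines of the most recently modified .out file."""
    reports = sorted(
        Path(report_dir).glob("*.out"), key=lambda p: p.stat().st_mtime
    )
    if not reports:
        raise FileNotFoundError(f"No job reports (*.out) found in: {report_dir}")
    latest = reports[-1]  # sorted by modification time, so the newest is last
    lines = latest.read_text(errors="replace").splitlines()
    return "\n".join(lines[-n_lines:])


# Possible use in a notebook cell:
# print(tail_latest_report("/home/wallhorn/DLC_Projects/job_reports"))
```

Error messages from the training script usually appear at the end of the report, so the last lines are a good first place to look.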

When your job is done, continue with DLC_Demo_Notebook_4.