# Example of Slurm job creation and submission (on Levante)

## Structure

0. [Install the following:]()

    - [dask]() (conda)

    - [dask_jobqueue]() (pip)

    - [os, re]() (Python standard library, no installation needed)

1. [Importing the slurm_job module](#1-importing-the-slurm_job-module)

2. [Creating the Slurm Job on LEVANTE](#2-creating-the-slurm-job-on-levante)

    2.1 [Creating the Job with manual selection of cores, memory, and walltime](#21-creating-the-job-with-manual-selection-of-cores-memory-and-walltime)

    2.2 [Creating the Job with exclusive node access](#22-creating-the-job-with-exclusive-node-access)

    2.3 [Creating the Job with maximum available resources per node](#23-creating-the-job-with-maximum-available-resources-per-node)

    2.4 [Redirecting the SLURM output to /any/path/you/want](#24-redirecting-the-slurm-output-to-anypathyouwant)

3. [Creating and Submitting the Job to the SLURM queue on Lumi (Under development)](#3-creating-and-submitting-the-job-to-the-slurm-queue-on-lumi-under-development)

4. [Canceling the Slurm Job](#4-canceling-the-slurm-job)

    4.1 [Canceling all jobs of the user](#41-cancelling-all-jobs-of-user)

    4.2 [Canceling a specific job](#42-canceling-specific-job)

## 1. Importing the slurm_job module

##### The module (imported below as `slurm`) provides the functions `squeue`, `slurm_job`, `max_resources_per_node`, and `scancel`, which allow us to create and manage Slurm Jobs.

In [1]:
from slurm import max_resources_per_node, squeue, slurm_job, scancel, output_dir

## 2. Creating the Slurm Job on LEVANTE

### 2.1 Creating the Job with manual selection of cores, memory, and walltime

In [17]:
slurm_job()

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -p compute
#SBATCH -A bb1153
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH -t 02:30:00
#SBATCH --error=./slurm/logs/dask-worker-%j.err
#SBATCH --output=./slurm/output/dask-worker-%j.out

/home/b/b382267/mambaforge/envs/tropical-rainfall/bin/python -m distributed.cli.dask_worker tcp://136.172.124.6:35899 --nthreads 1 --memory-limit 9.31GiB --name dummy-name --nanny --death-timeout 60



##### We can check the status of the created Job in the queue with the function `squeue()`:

In [18]:
squeue()

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4945327   compute dask-wor  b382267 PD       0:00      1 (Priority)


0
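If the queue status is needed programmatically rather than as printed text, the table can be split into records. The sketch below is not part of the `slurm` module: `parse_squeue` is a hypothetical helper, and it assumes the default column layout shown above, where only the last column may contain spaces.

```python
def parse_squeue(text):
    """Turn the plain-text table printed by `squeue` into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Split at most len(header) - 1 times so the last column keeps its spaces.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


# Sample copied from the output above.
sample = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4945327   compute dask-wor  b382267 PD       0:00      1 (Priority)
"""

for job in parse_squeue(sample):
    print(job["JOBID"], job["ST"], job["NODELIST(REASON)"])
```

In the `ST` column, `PD` means the job is pending and `R` means it is running.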

##### By default, the Job has the following attributes:
 - cores    = 1
 - memory   = "10 GB"
 - queue    = "compute"
 - walltime = "02:30:00"
 - jobs     = 1

 
##### If you want a different number of cores, amount of memory, walltime, number of jobs, or a different queue, pass the values as arguments of the function:

In [None]:
slurm_job(cores=8, memory="50 GB", queue="interactive", walltime="00:30:00", jobs=1)
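To see how these arguments relate to the `#SBATCH` lines printed in section 2.1, here is a minimal sketch that rebuilds the header from the keyword values. `render_sbatch_header` and `format_mem_ceil` are hypothetical helpers, not functions of the `slurm` module. The memory conversion (decimal gigabytes rounded up to whole binary gigabytes) is an assumption that happens to reproduce the `--mem` values shown in this document.

```python
import math


def format_mem_ceil(memory):
    """Convert a string such as "10 GB" into an sbatch value such as "10G"."""
    value, unit = memory.split()
    factors = {"MB": 10**6, "GB": 10**9, "TB": 10**12}  # decimal units (assumption)
    n_bytes = float(value) * factors[unit]
    return f"{math.ceil(n_bytes / 2**30)}G"


def render_sbatch_header(cores=1, memory="10 GB", queue="compute",
                         walltime="02:30:00", account="bb1153",
                         name="dask-worker"):
    """Return the #SBATCH lines that correspond to the given arguments."""
    return [
        f"#SBATCH -J {name}",
        f"#SBATCH -p {queue}",
        f"#SBATCH -A {account}",
        "#SBATCH -n 1",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={format_mem_ceil(memory)}",
        f"#SBATCH -t {walltime}",
    ]


print("\n".join(render_sbatch_header(cores=8, memory="50 GB",
                                     queue="interactive", walltime="00:30:00")))
```

The real script is generated by `slurm_job`; this sketch only illustrates which argument ends up in which directive.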

### 2.2 Creating the Job with exclusive node access

##### The function has an argument `exclusive`, which is `False` by default. If we set it to `True`, we get exclusive access to the node.


In [None]:
slurm_job(exclusive=True)

### Important! 

##### The `exclusive` argument does NOT automatically provide the maximum available memory, number of cores, and walltime! Request them explicitly if you need them:

In [None]:
slurm_job(exclusive=True, cores=256, memory="500 GB", queue="interactive", walltime="12:00:00", jobs=1)

### 2.3 Creating the Job with maximum available resources per node

##### The function has an argument `max_resources`, which is `False` by default. If we set it to `True`, the number of cores, the memory, and the walltime are set to the maximum values available per node in the selected queue.

In [2]:
slurm_job(max_resources=True)

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -p compute
#SBATCH -A bb1153
#SBATCH -n 1
#SBATCH --cpus-per-task=256
#SBATCH --mem=235G
#SBATCH -t 8:00:00
#SBATCH --error=./slurm/logs/dask-worker-%j.err
#SBATCH --output=./slurm/output/dask-worker-%j.out

/home/b/b382267/mambaforge/envs/tropical-rainfall/bin/python -m distributed.cli.dask_worker tcp://136.172.124.6:45169 --nthreads 16 --nworkers 16 --memory-limit 14.63GiB --name dummy-name --nanny --death-timeout 60
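The `--nthreads` and `--memory-limit` values in the worker command appear to follow from splitting the job's resources evenly across the workers. The sketch below reproduces that arithmetic under two assumptions: the requested memory is given in decimal gigabytes (10^9 bytes) while the limit is printed in GiB (2^30 bytes), and the number of workers is taken from the `--nworkers` flag rather than derived. `worker_layout` is a hypothetical helper.

```python
def worker_layout(total_cores, total_memory_gb, n_workers):
    """Return (threads per worker, memory limit per worker in GiB)."""
    threads_per_worker = total_cores // n_workers
    total_bytes = total_memory_gb * 10**9      # decimal gigabytes (assumption)
    limit_gib = total_bytes / n_workers / 2**30
    return threads_per_worker, round(limit_gib, 2)


# Maximum resources of the compute queue, 16 workers as in the command above.
print(worker_layout(256, 251.3671875, 16))

# Default job from section 2.1: one core, "10 GB", a single worker.
print(worker_layout(1, 10, 1))
```

The two calls give 14.63 GiB and 9.31 GiB per worker, matching the `--memory-limit` values in the generated commands.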



##### With the argument `max_resources=True`, the function `max_resources_per_node()` automatically extracts the following information about the resources of the queue:

 - Memory per node in gigabytes: `max_memory`

 - Maximum walltime of a job in the format "days-hours:minutes:seconds": `max_walltime`

 - Number of CPUs per node: `max_cpus`

 - Number of sockets per node: `max_sockets`

 - Number of cores per socket: `max_cores`

 - Number of threads per core: `max_threads`

##### For example:

In [23]:
max_resources_per_node('compute')

('251.3671875 GB', '8:00:00', '256', '8', '16', '2')

In [24]:
max_resources_per_node('interactive')

('502.9296875 GB', '12:00:00', '256', '8', '16', '2')
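Because the function returns every value as a string, it can be convenient to convert the tuple into typed fields before using it. The following is only a sketch: `NodeResources`, `parse_walltime`, and `parse_resources` are hypothetical names, the tuple order is assumed to match the list above, and the walltime parser handles only the `hours:minutes:seconds` form with an optional `days-` prefix.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class NodeResources:
    memory_gb: float
    walltime: timedelta
    cpus: int
    sockets: int
    cores_per_socket: int
    threads_per_core: int


def parse_walltime(text):
    """Parse "[days-]hours:minutes:seconds" into a timedelta."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def parse_resources(raw):
    """Convert the tuple of strings into a NodeResources record."""
    memory, walltime, cpus, sockets, cores, threads = raw
    return NodeResources(
        memory_gb=float(memory.split()[0]),
        walltime=parse_walltime(walltime),
        cpus=int(cpus),
        sockets=int(sockets),
        cores_per_socket=int(cores),
        threads_per_core=int(threads),
    )


compute = parse_resources(('251.3671875 GB', '8:00:00', '256', '8', '16', '2'))
print(compute.walltime)
# Consistency check: sockets x cores per socket x threads per core = CPUs.
print(compute.sockets * compute.cores_per_socket * compute.threads_per_core == compute.cpus)
```

For the `compute` queue the check holds: 8 sockets with 16 cores each and 2 threads per core give the 256 CPUs reported for the node.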

### 2.4 Redirecting the SLURM output to /any/path/you/want

##### By default, the Slurm Job writes

 - the errors into the `./slurm/logs/` directory
 - the output into the `./slurm/output/` directory

##### If the folders `./slurm`, `./slurm/output`, and `./slurm/logs` do not exist, the function creates them automatically.

##### The user can also redirect the Slurm output from the current folder to any other folder with the argument `path_to_output`:

In [None]:
slurm_job(path_to_output='/any/path/you/want')
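The automatic folder creation described above can be sketched with `pathlib`. This is an illustration, not the module's code: `prepare_output_dirs` is a hypothetical helper, and it assumes that the `slurm/logs` and `slurm/output` subfolders are created underneath the path that is passed in.

```python
import tempfile
from pathlib import Path


def prepare_output_dirs(path_to_output="."):
    """Create <path>/slurm/logs and <path>/slurm/output if they are missing."""
    base = Path(path_to_output) / "slurm"
    logs_dir = base / "logs"
    out_dir = base / "output"
    for directory in (logs_dir, out_dir):
        # parents=True creates <path>/slurm as well; exist_ok=True makes reruns safe.
        directory.mkdir(parents=True, exist_ok=True)
    return logs_dir, out_dir


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    logs_dir, out_dir = prepare_output_dirs(tmp)
    print(logs_dir.is_dir(), out_dir.is_dir())
```

Calling the helper a second time on the same path does nothing, which mirrors the "create only if missing" behaviour.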

## 3. Creating and Submitting the Job to the SLURM queue on Lumi (Under development)

##### To create the Slurm Job on Lumi with the function `slurm_job`, you need to change the project name. The current default is `account="bb1153"`.

In [None]:
slurm_job(account='Your_Lumi_account_name')

## 4. Canceling the Slurm Job

### 4.1 Cancelling all jobs of user

In [4]:
scancel()

### 4.2 Canceling specific job

##### If you know the Job_ID, you can cancel that specific Job in the queue. For example, you can find your Job_ID using the function `squeue()`.

In [5]:
Job_ID = 4929434
scancel(Job_ID)
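Both cancellation modes correspond to standard invocations of the Slurm command `scancel`: a job ID cancels one job, and `-u <user>` cancels all jobs of a user. The sketch below only builds the command line; `build_scancel_command` is a hypothetical helper, and how the module's own `scancel()` is implemented is not shown in this document.

```python
import getpass


def build_scancel_command(job_id=None, user=None):
    """Return the scancel command for one job or for all jobs of a user."""
    if job_id is not None:
        return ["scancel", str(job_id)]
    # Without a job ID, fall back to cancelling every job of the (current) user.
    return ["scancel", "-u", user or getpass.getuser()]


print(build_scancel_command(4929434))
print(build_scancel_command(user="b382267"))
```

On a machine with Slurm installed, the returned list could be executed with `subprocess.run(command, check=True)`.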

##### Checking the status of the canceled Job:

In [3]:
squeue()

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4945492   compute dask-wor  b382267  R       0:08      1 l40358


0