# Example of SLURM job creation and submission

## Structure

1. [Importing the Job Module](#1-importing-the-job-module)

2. [Creating the Slurm Job](#2-creating-the-slurm-job)

    2.1 [Submitting Jobs to SLURM Queues on Different Machines](#21-submitting-jobs-to-slurm-queues-on-different-machines)
    
    2.2 [Creating the Job with Manual Selection of Cores, Memory, and Walltime](#22-creating-the-job-with-manual-selection-of-cores-memory-and-walltime)

    2.3 [Creating the Job with Exclusive Node Access](#23-creating-the-job-with-exclusive-node-access)

    2.4 [Creating the Job with Maximum Available Resources per Node](#24-creating-the-job-with-maximum-available-resources-per-node)

    2.5 [Redirecting the SLURM output to /any/path/you/want](#25-redirecting-the-slurm-output-to-anypathyouwant)
    
3. [Canceling the Slurm Job](#3-canceling-the-slurm-job)

    3.1 [Canceling All Jobs for a User](#31-canceling-all-jobs-for-a-user)

    3.2 [Canceling a Specific Job](#32-canceling-a-specific-job)

## 1. Importing the Job Module

The `slurm` module provides functions for creating and managing SLURM jobs:

 - `squeue`: Query and display information about jobs in the SLURM queue.
 - `job`: Submit a job to the SLURM queue with various configurable parameters.
 - `max_resources_per_node`: Retrieve the maximum available resources per node for a specified machine and queue.
 - `scancel`: Cancel one or more jobs in the SLURM queue.

These functions provide a comprehensive toolkit for operating SLURM jobs efficiently.

In [1]:
from aqua.slurm import slurm

## 2. Creating the Slurm Job

### 2.1 Submitting Jobs to SLURM Queues on Different Machines

The `job()` function submits jobs to a SLURM queue on various machines, using configurations predefined in a YAML configuration file (`.aqua/aqua/slurm/config-slurm.yml`). This lets users initialize and manage a SLURM cluster directly within the notebook using `dask_jobqueue.SLURMCluster`.

To use the default setup from the YAML configuration file, users must specify the machine name. When a machine name is provided, `job()` loads the corresponding configuration from the YAML file, which includes parameters such as memory, core allocation, and walltime.

⚠️ Warning: Currently, the pip installation does not copy the YAML configuration file to a user-accessible directory. This functionality will be updated in the future to ensure easier modification of configurations by users.

⚠️ Warning: There is no default machine name. Users need to provide a machine name to use the default setup.

In [2]:
slurm.job(machine_name='lumi')


#SBATCH -J dask-worker
#SBATCH -p small
#SBATCH -A project_465000454
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH -t 02:30:00
#SBATCH --error=./slurm/logs/dask-worker-%j.err
#SBATCH --output=./slurm/output/dask-worker-%j.out

/users/nazarova/mambaforge/aqua/bin/python3.11 -m distributed.cli.dask_worker tcp://193.167.209.163:44587 --name dummy-name --nthreads 1 --memory-limit 9.31GiB --nanny --death-timeout 60


We can check the status of the created job in the queue using the `squeue()` function:

In [3]:
slurm.squeue()

JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY          
7149698    1     1     PD         dask-worker          0:00       N/A                  (null)               small                10G                 


0

To cancel all the jobs:

In [4]:
slurm.scancel()

By default, when submitting a job on `Lumi`, the job has the following attributes:

- `exclusive=False`: The job will not request exclusive access to the node. Set to True to request exclusive access.
- `max_resources=False`: The job will not request the maximum resources available on the node. Set to True to request maximum resources.
- `cores=1`: The number of cores per socket allocated for the job.
- `memory="10 GB"`: The real memory required per node.
- `walltime="02:30:00"`: The duration for which the nodes remain allocated.
- `jobs=1`: The factor of assignment scaling across multiple nodes.
- `path_to_output="."`: The path where log, error, and output files are stored.
- `account="project_465000454"`: The project account under which the job runs.
- `queue="small"`: The name of the queue to which the job is submitted.

The YAML file also contains the default setup for the `Levante` and `Mafalda` machines, ensuring that job submissions on these machines are configured with appropriate settings.
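The interplay between machine defaults and explicit arguments can be pictured as a dictionary merge. The following is a minimal sketch, not the actual `aqua.slurm` implementation: `MACHINE_DEFAULTS` and `resolve_job_options` are hypothetical names, and the default values are copied from the list above rather than read from the YAML file.

```python
# Hypothetical per-machine defaults, mirroring the values listed above for Lumi.
# In the real package these would come from the YAML configuration file.
MACHINE_DEFAULTS = {
    "lumi": {
        "exclusive": False,
        "max_resources": False,
        "cores": 1,
        "memory": "10 GB",
        "walltime": "02:30:00",
        "jobs": 1,
        "path_to_output": ".",
        "account": "project_465000454",
        "queue": "small",
    },
}


def resolve_job_options(machine_name=None, **overrides):
    """Merge explicit keyword arguments over the defaults of a machine (hypothetical helper)."""
    if machine_name is None:
        # No default machine: only explicitly passed options are available.
        defaults = {}
    else:
        key = machine_name.lower()
        if key not in MACHINE_DEFAULTS:
            raise KeyError(f"No default setup for machine '{machine_name}'")
        defaults = MACHINE_DEFAULTS[key]
    # Drop overrides left at None so they do not mask a default value.
    explicit = {k: v for k, v in overrides.items() if v is not None}
    return {**defaults, **explicit}


options = resolve_job_options(machine_name="lumi", cores=8, memory="50 GB")
print(options["cores"], options["memory"], options["queue"])
```

In this sketch the explicit `cores` and `memory` values win, while `queue` falls back to the machine default.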

### 2.2 Creating the Job with Manual Selection of Cores, Memory, and Walltime

For greater flexibility, users can manually specify the number of cores, memory, and walltime when creating a job. This allows customization beyond the default configurations, ensuring the job meets specific resource requirements. By adjusting these parameters, users can optimize resource usage and job performance according to their unique needs.

In [5]:
slurm.job(cores=8, memory="50 GB", queue="interactive", walltime='00:30:00', jobs=1)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 37883 instead

#SBATCH -J dask-worker
#SBATCH -p interactive
#SBATCH -n 1
#SBATCH --cpus-per-task=8
#SBATCH --mem=47G
#SBATCH -t 00:30:00
#SBATCH --error=None/slurm/logs/dask-worker-%j.err
#SBATCH --output=None/slurm/output/dask-worker-%j.out

/users/nazarova/mambaforge/aqua/bin/python3.11 -m distributed.cli.dask_worker tcp://193.167.209.163:40101 --name dummy-name --nthreads 2 --memory-limit 11.64GiB --nworkers 4 --nanny --death-timeout 60
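The `--mem=47G` and `--memory-limit 11.64GiB` values in the output above differ from the requested `"50 GB"`. The numbers appear consistent with a conversion from decimal gigabytes to binary gibibytes, rounded up for the SLURM request and divided evenly among the 4 worker processes. The sketch below reproduces that arithmetic; the helper names are hypothetical and the rounding rule is inferred from the outputs shown in this notebook, not taken from the package source.

```python
import math

GIB = 2**30  # bytes in one gibibyte


def gb_to_gib(gigabytes):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gigabytes * 10**9 / GIB


def slurm_mem_request(gigabytes):
    """Round the GiB value up to a whole number, matching the `--mem` lines shown above."""
    return f"{math.ceil(gb_to_gib(gigabytes))}G"


def per_worker_limit(gigabytes, n_workers):
    """Split the job memory evenly between the worker processes."""
    return f"{gb_to_gib(gigabytes) / n_workers:.2f}GiB"


print(slurm_mem_request(50))    # 47G
print(per_worker_limit(50, 4))  # 11.64GiB
```

The same arithmetic gives `10G` and `9.31GiB` for the default `"10 GB"` request with a single worker.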


In [6]:
slurm.squeue()

JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY          
7149810    8     1     PD         dask-worker          0:00       N/A                  (null)               interactive          47G                 


0

In [7]:
slurm.scancel(loglevel="INFO")

2024-05-17 17:37:15 :: slurm :: INFO     -> Cancelling all user jobs in the queue


### 2.3 Creating the Job with Exclusive Node Access

The `job()` function includes an argument `exclusive`, which is set to `False` by default. Setting it to `True` makes the job request exclusive access to the node: no other jobs will be scheduled on the same node while your job runs. This option is useful for tasks that need to avoid resource contention with other jobs.

In [8]:
slurm.job(exclusive=True, machine_name='lumi')

Perhaps you already have a cluster running?
Hosting the HTTP server on port 42125 instead

#SBATCH -J dask-worker
#SBATCH -p small
#SBATCH -A project_465000454
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH -t 02:30:00
#SBATCH --error=./slurm/logs/dask-worker-%j.err
#SBATCH --output=./slurm/output/dask-worker-%j.out
#SBATCH --get-user-env
#SBATCH --exclusive

/users/nazarova/mambaforge/aqua/bin/python3.11 -m distributed.cli.dask_worker tcp://193.167.209.163:42349 --name dummy-name --nthreads 1 --memory-limit 9.31GiB --nanny --death-timeout 60


In [9]:
slurm.squeue()

JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY          


0

In [10]:
slurm.scancel(Job_ID=7148312, loglevel="INFO")

2024-05-17 17:37:16 :: slurm :: INFO     -> Cancelling the job with ID: 7148312


⚠️ Warning: The `exclusive` argument DOES NOT automatically provide the maximum available memory, number of cores, and walltime!
It only provides exclusive usage of the node, meaning that no other job can run on the same node at the same time.
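To illustrate the warning, the sketch below renders a simplified `#SBATCH` header from a set of options. `render_sbatch_header` is a hypothetical helper written for this example, not part of `aqua.slurm` or `dask_jobqueue`, and it omits the `--get-user-env`, `--error`, and `--output` lines seen in the real output. The point is that turning on `exclusive` only appends a flag and leaves cores, memory, and walltime unchanged.

```python
def render_sbatch_header(queue, cores, memory, walltime, account=None, exclusive=False):
    """Build a simplified SBATCH header (hypothetical helper, for illustration only)."""
    lines = ["#SBATCH -J dask-worker", f"#SBATCH -p {queue}"]
    if account:
        lines.append(f"#SBATCH -A {account}")
    lines += [
        "#SBATCH -n 1",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH -t {walltime}",
    ]
    if exclusive:
        # Exclusive access adds a flag; it does not touch the resource requests.
        lines.append("#SBATCH --exclusive")
    return "\n".join(lines)


common = dict(queue="small", cores=1, memory="10G", walltime="02:30:00",
              account="project_465000454")
shared = render_sbatch_header(**common)
alone = render_sbatch_header(**common, exclusive=True)

# The only difference between the two headers is the exclusive flag.
extra = set(alone.splitlines()) - set(shared.splitlines())
print(extra)
```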

In [11]:
slurm.job(exclusive=True, cores=32, memory="200 GB", queue = "interactive", walltime='8:00:00', jobs=1)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 43051 instead

#SBATCH -J dask-worker
#SBATCH -p interactive
#SBATCH -n 1
#SBATCH --cpus-per-task=32
#SBATCH --mem=187G
#SBATCH -t 8:00:00
#SBATCH --error=None/slurm/logs/dask-worker-%j.err
#SBATCH --output=None/slurm/output/dask-worker-%j.out
#SBATCH --get-user-env
#SBATCH --exclusive

/users/nazarova/mambaforge/aqua/bin/python3.11 -m distributed.cli.dask_worker tcp://193.167.209.163:41007 --name dummy-name --nthreads 4 --memory-limit 23.28GiB --nworkers 8 --nanny --death-timeout 60


In [12]:
slurm.squeue()

JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY          
7149811    1     1     PD         dask-worker          0:00       N/A                  (null)               small                10G                 


0

In [13]:
slurm.scancel()

### 2.4 Creating the Job with Maximum Available Resources per Node

With the argument `max_resources=True`, `job()` uses the function `max_resources_per_node()` to automatically extract the following information about the resources in the selected queue:

 - Size of memory per node in gigabytes: `max_memory`

 - Maximum time for any job in the format "days-hours:minutes:seconds": `max_walltime`

 - Number of CPUs per node: `max_cpus`

 - Number of sockets per node: `max_sockets`

 - Number of cores per socket: `max_cores`

 - Number of threads per core: `max_threads`

The following call reads the maximum available resources for the queue specified in the YAML configuration file for the machine `Lumi`:

In [14]:
slurm.max_resources_per_node(machine_name='lumi')

('99.19921875 GB', '3-00:00:00', '128', '2', '64', '2')

In [15]:
slurm.max_resources_per_node(queue='small')

('99.19921875 GB', '3-00:00:00', '128', '2', '64', '2')

The function can also be used on its own to get information about a specific queue/partition.
For example:

In [16]:
slurm.max_resources_per_node(queue='interactive')

('224.0 GB', '8:00:00', '128', '2', '64', '2')
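The returned values are strings, so they usually need a little parsing before they can be compared or used in calculations. The sketch below unpacks a tuple copied from the output above and converts the walltime into a `datetime.timedelta`. It assumes the tuple order follows the list given earlier in this section; `parse_walltime` is a hypothetical helper, not part of `aqua.slurm`.

```python
from datetime import timedelta


def parse_walltime(text):
    """Parse '[days-]hours:minutes:seconds' into a timedelta (hypothetical helper)."""
    days = 0
    if "-" in text:
        # An optional 'days-' prefix precedes the clock part.
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Tuple copied from the 'interactive' queue output above.
resources = ('224.0 GB', '8:00:00', '128', '2', '64', '2')
(max_memory, max_walltime, max_cpus,
 max_sockets, max_cores, max_threads) = resources

walltime = parse_walltime(max_walltime)
memory_gb = float(max_memory.split()[0])

print(walltime, memory_gb, int(max_cpus))
```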

The `job()` function includes an argument `max_resources`, which is set to `False` by default. Setting it to `True` makes the job request the maximum resources per node allowed by the selected queue or partition: the maximum number of cores, memory, and walltime. This option suits tasks that need the full capacity of a node.

In [17]:
slurm.job(max_resources=True, machine_name='lumi')

Perhaps you already have a cluster running?
Hosting the HTTP server on port 44451 instead

#SBATCH -J dask-worker
#SBATCH -p small
#SBATCH -A project_465000454
#SBATCH -n 1
#SBATCH --cpus-per-task=128
#SBATCH --mem=93G
#SBATCH -t 3-00:00:00
#SBATCH --error=./slurm/logs/dask-worker-%j.err
#SBATCH --output=./slurm/output/dask-worker-%j.out

/users/nazarova/mambaforge/aqua/bin/python3.11 -m distributed.cli.dask_worker tcp://193.167.209.163:36743 --name dummy-name --nthreads 8 --memory-limit 5.77GiB --nworkers 16 --nanny --death-timeout 60


In [18]:
slurm.squeue()

JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY          


0

In [19]:
slurm.scancel()

### 2.5 Redirecting the SLURM output to `/any/path/you/want`

By default, the SLURM job writes:

 - the errors into the `./slurm/logs/` directory
 - the output into the `./slurm/output/` directory

If the folders `./slurm`, `./slurm/output`, and `./slurm/logs` do not exist, the function creates them automatically.

The user can specify a custom path to redirect the SLURM outputs with the `path_to_output` option:

In [None]:
slurm.job(path_to_output='/any/path/you/want', machine_name='lumi')
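The automatic directory creation described above can be sketched with `pathlib`. This is an illustration under the assumption that the `slurm/logs` and `slurm/output` subfolders are created beneath `path_to_output`; `prepare_output_dirs` is a hypothetical name, not the function used inside `aqua.slurm`.

```python
import tempfile
from pathlib import Path


def prepare_output_dirs(path_to_output="."):
    """Create <path>/slurm/logs and <path>/slurm/output if they are missing."""
    base = Path(path_to_output) / "slurm"
    logs = base / "logs"
    output = base / "output"
    for directory in (logs, output):
        # parents=True also creates <path>/slurm; exist_ok=True makes reruns harmless.
        directory.mkdir(parents=True, exist_ok=True)
    return logs, output


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    logs, output = prepare_output_dirs(tmp)
    print(logs.is_dir(), output.is_dir())
```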

## 3. Canceling the Slurm Job

### 3.1 Canceling All Jobs for a User

In [22]:
slurm.scancel()

### 3.2 Canceling a Specific Job

If you know the Job_ID, you can cancel that specific job in the queue. For example, you can find your Job_ID using the `squeue()` function.

In [23]:
Job_ID = 4929434
slurm.scancel(Job_ID)
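When several jobs are queued, picking the right Job_ID by eye becomes error-prone. The sketch below extracts job IDs from a queue table shaped like the ones printed in this notebook. It assumes the table is already available as a string (here copied from an earlier cell output), since `squeue()` as shown here prints the table and returns `0`; `job_ids` is a hypothetical helper.

```python
# Table text copied from an earlier squeue() output in this notebook.
SAMPLE = """\
JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY
7149698    1     1     PD         dask-worker          0:00       N/A                  (null)               small                10G
"""


def job_ids(table_text, name=None):
    """Return the JOBID column as integers, optionally filtered by job name."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    ids = []
    for line in lines[1:]:
        # Fields contain no spaces, so whitespace splitting lines up with the header.
        row = dict(zip(header, line.split()))
        if name is None or row.get("NAME") == name:
            ids.append(int(row["JOBID"]))
    return ids


print(job_ids(SAMPLE, name="dask-worker"))
```

Each ID returned this way could then be passed to `slurm.scancel()` as in the cell above.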

Check the status of the jobs:

In [24]:
slurm.squeue()

JOBID      CPUS  NODES ST         NAME                 TIME       START_TIME           DEPENDENCY           PARTITION            MIN_MEMORY          


0