# Slurm REST API Guide
Slurm provides a REST API alongside its command-line interface, which is useful for integrating Slurm with other software. The API authenticates requests with JSON Web Tokens (JWT), a mechanism that is very popular in web applications and is also supported by cloud providers such as AWS. In this guide I will demonstrate how to interact with this API using Python. 

## JWT authentication
To enable JWT authentication, make sure Slurm was compiled with JWT support, then set the necessary parameters in both slurm.conf and slurmdbd.conf.  
Here we demonstrate the use of an RS256 key pair. 
```
AuthAltTypes=auth/jwt
AuthAltParameters=jwks=/etc/slurm/jwks.pub.json
```
In `AuthAltParameters`, the `jwks` option points to the JSON file containing the public key(s) that Slurm uses to verify token signatures.  
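Slurm only needs the public half of the key set in the file referenced by `jwks`. As a quick sanity check before pointing slurm.conf at a file, the hypothetical helper below (a sketch, not part of Slurm) flags the RSA private-key members defined in RFC 7518 (`d`, `p`, `q`, `dp`, `dq`, `qi`, `oth`) and any key without a `kid`:

```python
import json

# RSA private-key members from RFC 7518; none of them belong in the public set
PRIVATE_RSA_MEMBERS = {"d", "p", "q", "dp", "dq", "qi", "oth"}


def check_public_jwks(jwks):
    """Return a list of problems found in a JWKS dict meant to be given to Slurm."""
    keys = jwks.get("keys", [])
    if not keys:
        return ["no keys in set"]
    problems = []
    for i, key in enumerate(keys):
        leaked = key.keys() & PRIVATE_RSA_MEMBERS
        if leaked:
            problems.append(f"key {i}: contains private members {sorted(leaked)}")
        if "kid" not in key:
            problems.append(f"key {i}: missing 'kid'")
    return problems


# Usage (path from the slurm.conf example above):
# with open("/etc/slurm/jwks.pub.json") as f:
#     print(check_public_jwks(json.load(f)) or "OK")
```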
A valid key pair can be generated with the [json-web-key-generator](https://github.com/bspk/json-web-key-generator) command-line tool, or with the [mkjwk](https://mkjwk.org/) website by the same author. 
```
java -jar /build/target/json-web-key-generator-0.9-SNAPSHOT-jar-with-dependencies.jar --type RSA --size 2048 --algorithm RS256 --idGenerator sha1 --keySet --output /jwks.json --pubKeyOutput /jwks.pub.json
```  
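If Java is not at hand, a similar key set can be produced from Python. This is a minimal sketch assuming the `cryptography` and `PyJWT` packages (PyJWT is already used later in this notebook). The `kid` scheme (SHA-1 of the modulus) and the `alg`/`use` members are my own assumptions, not necessarily what json-web-key-generator emits, and I have not verified that Slurm accepts keys produced this way; the Java tool above is what this lab uses.

```python
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric import rsa
from jwt.algorithms import RSAAlgorithm


def generate_rs256_jwks(key_size=2048):
    """Return (private_jwks, public_jwks) dicts holding one RS256 key pair."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=key_size)
    # RSAAlgorithm.to_jwk returns a JSON string describing the key
    private_jwk = json.loads(RSAAlgorithm.to_jwk(private_key))
    public_jwk = json.loads(RSAAlgorithm.to_jwk(private_key.public_key()))
    # Hypothetical kid scheme: SHA-1 of the base64url-encoded modulus.
    # Any unique string works as long as both files and the JWT header agree.
    kid = hashlib.sha1(public_jwk["n"].encode()).hexdigest()
    for jwk_dict in (private_jwk, public_jwk):
        jwk_dict.pop("key_ops", None)  # drop key_ops if present; "use" is set instead
        jwk_dict.update({"kid": kid, "alg": "RS256", "use": "sig"})
    return {"keys": [private_jwk]}, {"keys": [public_jwk]}


# Usage: write the private set with owner-only permissions, the public set normally.
# private_set, public_set = generate_rs256_jwks()
# fd = os.open("jwks.json", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
# with os.fdopen(fd, "w") as f:
#     json.dump(private_set, f)
# with open("jwks.pub.json", "w") as f:
#     json.dump(public_set, f)
```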
In this slurm-lab setup, the key pair is stored as a JSON Web Key Set in /etc/slurm/jwks.json, and the public key alone is stored in /etc/slurm/jwks.pub.json. The public key is supplied to Slurm, and the private key set is used to sign the tokens. The following cell defines a function that generates the HTTP request headers containing the necessary JWT auth token. 

In [None]:
import time
import json
import jwt
import requests
import getpass
import os
import io
import getent
import pandas as pd
jwks_path = "/etc/slurm/jwks.json"
with open(jwks_path) as f:  # close the file once the key set is loaded
    jwks = json.load(f)

def get_auth_header(username=getpass.getuser(), expire=1800, private_key=jwks["keys"][0]):
    iat = round(time.time())
    exp = iat + expire
    userinfo = dict(getent.passwd(username))
    payload={
        "iat": iat, # (required) token issue time
        "exp": exp, # (required) token expire time
        "username": username, # (required) username or sun
        "homedir": userinfo["dir"],
        "uid": userinfo["uid"],
        "gid": userinfo["gid"]
    }
    return {
        "X-SLURM-USER-NAME": username, # this should match the username/sun attribute in jwt's payload
        "X-SLURM-USER-TOKEN": jwt.encode(
            payload,
            jwt.algorithms.RSAAlgorithm.from_jwk(json.dumps(private_key)),
            algorithm="RS256",
            headers={
                "kid": private_key["kid"] # key id is necessary for slurm daemon to select the corresponding public key for verification.
            }
        )
    }
print(json.dumps(get_auth_header(username="root"), indent=2))
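When slurmctld rejects a token, it helps to check it locally first. The sketch below mimics the general verification steps (select the public key by the `kid` header, then check the RS256 signature and the `exp` claim) using PyJWT. It is an illustration, not Slurm's own verification code, and `verify_slurm_token` is a hypothetical name.

```python
import json

import jwt
from jwt.algorithms import RSAAlgorithm


def verify_slurm_token(token, public_jwks):
    """Verify token against a public JWKS dict and return its decoded payload."""
    # Read the kid from the (not yet verified) header to pick the matching key
    kid = jwt.get_unverified_header(token).get("kid")
    matches = [k for k in public_jwks.get("keys", []) if k.get("kid") == kid]
    if not matches:
        raise ValueError(f"no public key with kid={kid!r}")
    public_key = RSAAlgorithm.from_jwk(json.dumps(matches[0]))
    # jwt.decode checks the signature and, when present, the exp claim
    return jwt.decode(token, public_key, algorithms=["RS256"])


# Usage in this notebook:
# with open("/etc/slurm/jwks.pub.json") as f:
#     public_jwks = json.load(f)
# token = get_auth_header(username="root")["X-SLURM-USER-TOKEN"]
# print(verify_slurm_token(token, public_jwks))
```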

## Test API endpoint connection
By default, the API is served on port 6820 of the host running slurmrestd. In this lab environment, slurmrestd runs on the two Slurm master nodes `slurm-lab-master-[1,2]`, and the nginx instance on this frontend container acts as a reverse proxy for both endpoints, so the API is available on port 80. 

In [None]:
res = requests.get("http://127.0.0.1/slurm/v0.0.40/ping", headers=get_auth_header())
print(json.dumps(res.json(), indent=2))
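The JSON bodies returned by slurmrestd include `errors` and `warnings` arrays next to the actual data. The hypothetical helper below is a minimal sketch, assuming each error entry is an object with an `error` or `description` field; it raises when either the HTTP status or the `errors` array signals a failure, instead of leaving the problem buried in printed JSON:

```python
def check_slurm_response(status_code, body):
    """Print slurmrestd warnings and raise if the request failed; otherwise return body."""
    for warning in body.get("warnings", []):
        print("slurmrestd warning:", warning)
    errors = body.get("errors", [])
    if status_code >= 400 or errors:
        details = "; ".join(
            str(e.get("error") or e.get("description") or e) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise RuntimeError(f"HTTP {status_code}: {details or 'no error details'}")
    return body


# Usage:
# res = requests.get("http://127.0.0.1/slurm/v0.0.40/ping", headers=get_auth_header())
# body = check_slurm_response(res.status_code, res.json())
```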

The API reference is available here: [Slurm REST API](/doc/rest_api.html). If you plan to use an OpenAPI client generator, the specification can be queried from the `/openapi` endpoint:

In [None]:
res = requests.get("http://127.0.0.1/openapi",headers=get_auth_header())
openapi_spec = res.json() # the spec is too big to be shown in this notebook
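The spec is large, but its `paths` object follows the standard OpenAPI layout (path, then HTTP method, then operation), so it is easy to summarise. A minimal sketch with a hypothetical `list_operations` helper that prints one line per method and path:

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec, prefix="/slurm/"):
    """Return sorted (METHOD, path) pairs from an OpenAPI spec, filtered by path prefix."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        if path.startswith(prefix):
            # path items may also hold non-method keys such as "parameters"
            ops.extend((method.upper(), path) for method in item if method.lower() in HTTP_METHODS)
    return sorted(ops)


# for method, path in list_operations(openapi_spec, "/slurm/v0.0.40/"):
#     print(f"{method:7} {path}")
```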

## Job Submission, Query, Delete
* Query jobs status (squeue): [GET /slurm/v0.0.40/jobs](/doc/rest_api.html#slurmV0040GetJobs)  
* Query specific job (squeue): [GET /slurm/v0.0.40/job/{job_id}](/doc/rest_api.html#slurmV0040GetJob)  
* Submit job to cluster (sbatch): [POST /slurm/v0.0.40/job/submit](/doc/rest_api.html#slurmV0040PostJobSubmit)
* Delete jobs (scancel): [DELETE /slurm/v0.0.40/jobs](/doc/rest_api.html#slurmV0040DeleteJobs)
* Delete job by id (scancel): [DELETE /slurm/v0.0.40/job/{job_id}](/doc/rest_api.html#slurmV0040DeleteJob)
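The job endpoints above map naturally onto a thin client. The class below is a hypothetical sketch, not an official Slurm client; it assumes the v0.0.40 paths listed above and a `header_factory` callable such as the `get_auth_header` function defined earlier. The session object is injectable so it can be replaced by a stub when testing.

```python
class SlurmRestClient:
    """Hypothetical thin wrapper around the v0.0.40 job endpoints."""

    def __init__(self, base_url, header_factory, version="v0.0.40", session=None):
        self.base = f"{base_url.rstrip('/')}/slurm/{version}"
        self.header_factory = header_factory  # returns fresh auth headers per call
        if session is None:
            import requests  # only needed when no session is injected
            session = requests.Session()
        self.session = session

    def _call(self, method, path, **kwargs):
        res = self.session.request(method, self.base + path,
                                   headers=self.header_factory(), **kwargs)
        return res.json()

    def list_jobs(self):
        return self._call("GET", "/jobs")

    def get_job(self, job_id):
        return self._call("GET", f"/job/{job_id}")

    def submit(self, job):
        return self._call("POST", "/job/submit", json={"job": job})

    def cancel(self, job_id):
        return self._call("DELETE", f"/job/{job_id}")


# client = SlurmRestClient("http://127.0.0.1", get_auth_header)
# print(client.list_jobs())
```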

### Your first job submitted via API

To submit a job, send a POST request to the endpoint `/slurm/v0.0.40/job/submit`. The following example shows the minimum you need to submit a job via the API.  
1. To submit a single job, put all of its settings under the `"job"` key.
2. `script`, as a string, is required.
3. `current_working_directory` is required. In older versions of slurmrestd, a job submitted without this entry takes the working directory of slurmrestd as its own.
4. `environment` is required, as a list of strings. You can use it to define custom environment variables, like the `sbatch --export` option. Even if you are not defining any custom variables, you still need this field; otherwise the job submission request fails.
  
If the job is submitted successfully, you can get the `job_id` from the result.

In [None]:
# Job submission 
with io.open('./helloworld.sh', 'r', encoding='utf-8') as file:
    req_body = {
        "job": {
            "script": file.read(),
            "current_working_directory": os.getcwd(),
            "environment": [""] # a non-empty list is REQUIRED, even with no custom variables
        }
    }
    res = requests.post("http://127.0.0.1/slurm/v0.0.40/job/submit", json=req_body, headers=get_auth_header())
    submission_result = res.json()
    print(json.dumps(submission_result, indent=2))
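Since every submission needs the same three fields, a small builder avoids repeating them. This is a hypothetical helper, a sketch that keeps the `[""]` placeholder from the cell above when no variables are given, and otherwise turns a dict of variables into the `NAME=value` strings that `environment` takes:

```python
import os


def make_job_request(script, cwd=None, env=None, **options):
    """Build a /job/submit body containing the fields this guide found to be required."""
    # environment entries are "NAME=value" strings, like sbatch --export
    environment = [f"{name}={value}" for name, value in (env or {}).items()]
    job = {
        "script": script,
        "current_working_directory": cwd or os.getcwd(),
        # keep the [""] placeholder when no variables are given
        "environment": environment or [""],
    }
    job.update(options)  # any other job fields, e.g. name="build", minimum_nodes=2
    return {"job": job}


# res = requests.post("http://127.0.0.1/slurm/v0.0.40/job/submit",
#                     json=make_job_request("#!/bin/bash\nhostname", name="hello"),
#                     headers=get_auth_header())
```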

Using the job ID returned by the submission, we can query more details about the job. A GET request to `/slurm/v0.0.40/job/{job_id}` returns the details of the job with that ID.

In [None]:
res = requests.get(f"http://127.0.0.1/slurm/v0.0.40/job/{submission_result['job_id']}", headers=get_auth_header())
print(json.dumps(res.json(), indent=2))
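Scripts often need to wait until a job finishes. A minimal polling sketch, assuming `job_state` is either a single string or a list of state names (newer API versions return a list) and using Slurm's standard terminal job states; `wait_for_job` and the caller-supplied `fetch_job` function are hypothetical. Note that slurmctld forgets finished jobs after `MinJobAge`, so a very slow poll may find the job gone.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY",
                   "NODE_FAIL", "PREEMPTED", "BOOT_FAIL", "DEADLINE"}


def job_states(job):
    """Normalise job_state, which may be a string or a list of state names."""
    state = job.get("job_state", [])
    return {state} if isinstance(state, str) else set(state)


def wait_for_job(fetch_job, job_id, poll=5.0, timeout=600.0):
    """Poll fetch_job(job_id) until the job reaches a terminal state; return the job dict."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job_states(job) & TERMINAL_STATES:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {sorted(job_states(job))} after {timeout}s")
        time.sleep(poll)


# fetch = lambda jid: requests.get(f"http://127.0.0.1/slurm/v0.0.40/job/{jid}",
#                                  headers=get_auth_header()).json()["jobs"][0]
# print(job_states(wait_for_job(fetch, submission_result["job_id"])))
```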

To cancel this job, send a DELETE request to the same path, `/slurm/v0.0.40/job/{job_id}`.

In [None]:
res = requests.delete(f"http://127.0.0.1/slurm/v0.0.40/job/{submission_result['job_id']}", headers=get_auth_header())
print(json.dumps(res.json(), indent=2))

### More complex jobs
Now let's try some more complex jobs, reusing the mpi-pi example. One difference from the MPI Guide notebook is that this notebook runs a Python kernel, so the build step also has to be submitted as a job. 

In [None]:
build_script = """#!/bin/bash -l
module avail
module load mpi
module list
make --directory mpi-pi
"""
build_job = {
    "job":{
        "name": "build-mpi-pi",
        "script": build_script,
        "current_working_directory": os.getcwd(),
        "environment": [""],
    }
}

# submit build_job 
res = requests.post("http://127.0.0.1/slurm/v0.0.40/job/submit", json=build_job, headers=get_auth_header())
build_job_submit_res = res.json()
print(json.dumps(build_job_submit_res, indent=2))

run_script = """#!/bin/bash -l
module load mpi
mpirun mpi-pi/parallel-pi
"""
run_job = {
    "job": {
        "name": "run-mpi-pi",
        "script": run_script,
        "current_working_directory": os.getcwd(),
        "environment": [""],
        "dependency": f"afterany:{build_job_submit_res['job_id']}",
        "minimum_nodes": 2,
        "tasks_per_node": 2
    }
}
# submit 4 copies of run_job
for i in range(4):
    res = requests.post("http://127.0.0.1/slurm/v0.0.40/job/submit", json=run_job, headers=get_auth_header())
    run_job_submit_res = res.json()
    print(json.dumps(run_job_submit_res, indent=2))
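The `dependency` field uses the same syntax as `sbatch --dependency`, so a single dependency type can list several job IDs separated by colons (e.g. `afterany:101:102`). A hypothetical helper for building such strings, for example to add a final job that waits for all four runs:

```python
def dependency(kind, job_ids):
    """Build a Slurm dependency string such as 'afterok:101:102'."""
    ids = [str(job_id) for job_id in job_ids]
    if not ids:
        raise ValueError("need at least one job id")
    return f"{kind}:" + ":".join(ids)


# run_ids = []  # append run_job_submit_res["job_id"] inside the submission loop above
# summary_job = {"job": {"name": "summarize-mpi-pi",
#                        "script": "#!/bin/bash\necho all runs finished",
#                        "current_working_directory": os.getcwd(),
#                        "environment": [""],
#                        "dependency": dependency("afterany", run_ids)}}
```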

The following endpoints return a list of all jobs and their details:
* GET /slurm/v0.0.40/jobs
* GET /slurm/v0.0.40/jobs/state

In [None]:
res = requests.get("http://127.0.0.1/slurm/v0.0.40/jobs/state", headers=get_auth_header())
# print( json.dumps( res.json()["jobs"], indent=2 ) )
pd.DataFrame.from_records(res.json()["jobs"]) 

In [None]:
res = requests.get("http://127.0.0.1/slurm/v0.0.40/jobs", headers=get_auth_header())
# print( json.dumps( res.json()["jobs"], indent=2 ) )
table = [
    dict([
        (key, value)
        for key, value in job.items()
        if key in ["job_id", "dependency", "group_name", "user_name", "job_state", "tres_alloc_req", "tres_alloc_str", "partition", "name", "nodes", "resv_name", "standard_output", "time_limit"]
    ])
    for job in res.json()["jobs"]
]
pd.DataFrame.from_records(table) 
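For a quick overview instead of the full table, jobs can be counted per state. A minimal sketch with a hypothetical `count_by_state` helper; it assumes `job_state` may be a single string or a list of state names, depending on the API version:

```python
from collections import Counter


def count_by_state(jobs):
    """Count jobs per state; a job listing several states is counted once per state."""
    counts = Counter()
    for job in jobs:
        state = job.get("job_state", "UNKNOWN")
        counts.update([state] if isinstance(state, str) else state)
    return counts


# count_by_state(res.json()["jobs"])
```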

## Manage nodes and partitions

### List partitions and nodes
You can list partition details with these two endpoints. Both return responses with the same structure, but the second contains only the partition you specified. 
* [GET /slurm/v0.0.40/partitions](/doc/rest_api.html#slurmV0040GetPartitions)
* [GET /slurm/v0.0.40/partition/{partition_name}](/doc/rest_api.html#slurmV0040GetPartition)
  
To list node details, use these endpoints:
* [GET /slurm/v0.0.40/nodes](/doc/rest_api.html#slurmV0040GetNodes)
* [GET /slurm/v0.0.40/node/{node_name}](/doc/rest_api.html#slurmV0040GetNode)

In [None]:
res = requests.get("http://127.0.0.1/slurm/v0.0.40/partitions", headers=get_auth_header())
# print(json.dumps(res.json(), indent=2))
table = [
    dict([
        (key, value)
        for key, value in partition.items()
        if key in ["tres", "name", "node_sets", "partition"]
    ])
    for partition in res.json()["partitions"]
]
pd.DataFrame.from_records(table) 

In [None]:
res = requests.get("http://127.0.0.1/slurm/v0.0.40/nodes", headers=get_auth_header())
# print(json.dumps(res.json(), indent=2))
table = [
    dict([
        (key, value)
        for key, value in node.items()
        if key in ["hostname", "tres", "reason", "state", "gres", "address", "architecture", "tres_used"]
    ])
    for node in res.json()["nodes"]
]
pd.DataFrame.from_records(table) 
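The node `tres` and `tres_used` columns are TRES strings of the form `type=count`, separated by commas (e.g. `cpu=2,mem=1000M,billing=2`). A small hypothetical parser makes them easy to compare, for instance to see how many CPUs on each node are in use:

```python
def parse_tres(tres):
    """Split a TRES string like 'cpu=2,mem=1000M,billing=2' into a dict of strings."""
    result = {}
    for item in filter(None, (tres or "").split(",")):
        key, _, value = item.partition("=")
        result[key] = value
    return result


# for node in res.json()["nodes"]:
#     total, used = parse_tres(node["tres"]), parse_tres(node["tres_used"])
#     print(node["hostname"], used.get("cpu", "0"), "/", total.get("cpu"))
```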

### Admin Operations
You can also perform admin operations via the API, e.g. draining nodes or cancelling other users' jobs. These requests must be made as root or as a Slurm admin.  
In this container lab setup, the private key used to sign JWTs is world-readable, so from this notebook you can sign a JWT declaring yourself as root. This would be a serious security issue on a production system, so keep your private key properly protected.
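Since the lab's signing key is deliberately world-readable, it is worth checking the permissions on a real system. A minimal sketch using only the standard library (`private_key_exposure` is a hypothetical name); an empty result means no group or other permission bits are set:

```python
import os
import stat

EXPOSURE_BITS = {
    stat.S_IRGRP: "group-readable",
    stat.S_IWGRP: "group-writable",
    stat.S_IROTH: "world-readable",
    stat.S_IWOTH: "world-writable",
}


def private_key_exposure(path):
    """List the group/other permission bits set on a signing-key file."""
    mode = os.stat(path).st_mode
    return [name for bit, name in EXPOSURE_BITS.items() if mode & bit]


# print(private_key_exposure("/etc/slurm/jwks.json") or "owner-only")
```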

#### Admin example - Drain/undrain node
1. Get the list of nodes.
2. Drain the first node.
3. Undrain (resume) the drained node.

In [None]:
# get node list
res = requests.get("http://127.0.0.1/slurm/v0.0.40/nodes", headers=get_auth_header())
node_list = [node["name"] for node in res.json()["nodes"]]  # /node/{node_name} expects the Slurm node name

# drain the node
res = requests.post(
    f"http://127.0.0.1/slurm/v0.0.40/node/{node_list[0]}", 
    json={
        "state": ["DRAIN"],
        "reason": "test api drain"
    },
    headers=get_auth_header(username="root")
)
print(json.dumps(res.json(), indent=2))

# check the state
res = requests.get(f"http://127.0.0.1/slurm/v0.0.40/node/{node_list[0]}", headers=get_auth_header())
print(json.dumps(res.json()["nodes"][0]["state"], indent=2))

# undrain the node
res = requests.post(
    f"http://127.0.0.1/slurm/v0.0.40/node/{node_list[0]}", 
    json={
        "state": ["RESUME"],
    },
    headers=get_auth_header(username="root")
)
print(json.dumps(res.json(), indent=2))
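When draining a node for maintenance, it is easy to forget the resume step if something in between fails. A minimal sketch of a context manager that always sends the RESUME update, reusing the request bodies from the cell above; `drained` and the caller-supplied `update_node` callback are hypothetical names:

```python
from contextlib import contextmanager


@contextmanager
def drained(update_node, node_name, reason):
    """Drain node_name for the duration of the with-block, then always resume it.

    update_node(node_name, body) is supplied by the caller and should POST body
    to /slurm/v0.0.40/node/{node_name} with an admin token.
    """
    update_node(node_name, {"state": ["DRAIN"], "reason": reason})
    try:
        yield node_name
    finally:
        update_node(node_name, {"state": ["RESUME"]})


# def update_node(name, body):
#     return requests.post(f"http://127.0.0.1/slurm/v0.0.40/node/{name}", json=body,
#                          headers=get_auth_header(username="root")).json()
#
# with drained(update_node, node_list[0], "maintenance via API"):
#     pass  # maintenance work goes here
```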