# HPC Cluster - Slurm

This tutorial demonstrates submitting and monitoring a job that runs on an external Slurm system. Jobs can be submitted to a Slurm HPC cluster in three ways: using the Bridge operator, submitting a Kubeflow Pipelines script that uses the Slurm pod, or using the Slurm pod directly. The tutorial covers the setup and deployment for running a test script, and how to use S3 for file upload and download, for all three implementations. See also the [README](https://github.ibm.com/Accelerated-Discovery/bridge-operator/tree/master/pods/slurm).

--------------------------------------------------------------------------------------------------------------------

##  Setup 

#### S3
Create the S3 bucket with input files

- Create a test bucket on S3 called "mybucket" and upload the sample batch script `slurm_batch.sh`


#### Create environment variables

For these tests we need to specify the S3 endpoint, the resource endpoint, and the S3 bucket name. If the job script, parameter file, or metadata file is in S3, then we also need to provide its location as ```bucket:folder/filename```.

In [None]:
%env ENDPOINT=minio-kubeflow.apps.adp-rosa-2.5wcf.p1.openshiftapps.com
%env BUCKET=mybucket
%env RESOURCE_URL=http://ec2-3-139-236-142.us-east-2.compute.amazonaws.com:8082/slurm/v0.0.37/

%env JOBSCRIPT=mybucket:slurm_batch.sh
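
The ```bucket:folder/filename``` convention used for `JOBSCRIPT` can be illustrated with a small parser. This is only a sketch: `S3Ref` and `split_s3_ref` are hypothetical names introduced here, not part of the bridge-operator code, and the pods may parse the value differently.

```python
from typing import NamedTuple


class S3Ref(NamedTuple):
    """A parsed S3 location (hypothetical helper type)."""
    bucket: str
    key: str  # folder/filename inside the bucket


def split_s3_ref(ref: str) -> S3Ref:
    """Split 'bucket:folder/filename' into the bucket name and object key."""
    bucket, sep, key = ref.partition(":")
    if not sep or not bucket or not key:
        raise ValueError(f"expected 'bucket:folder/filename', got {ref!r}")
    return S3Ref(bucket=bucket, key=key.lstrip("/"))


# The value used in this tutorial: a script at the top level of the bucket.
print(split_s3_ref("mybucket:slurm_batch.sh"))
```

A check like this before submitting can catch a missing `:` separator early, rather than when the pod fails to download the script.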

#### Create the S3 and Slurm secrets needed by the pod

Edit the S3 and Slurm secret yaml files with your credentials. Then create these secrets in the namespace where you wish to run jobs, e.g. ```bridge-operator-system```. The cells below first set the secret names as environment variables and then apply the secrets.

NOTE: for Slurm you need to generate a token, e.g. with a 24-hour lifespan:
`scontrol token lifespan=$((3600*24))`
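
As a rough illustration of how such a token is typically used, the sketch below parses the `SLURM_JWT=<token>` line that `scontrol token` prints and builds the request headers that the Slurm REST API accepts for JWT authentication. The function names are hypothetical, and the `ping` endpoint is assumed to be available under the versioned `RESOURCE_URL` used in this tutorial; nothing here is taken from the Slurm pod implementation.

```python
def parse_scontrol_token(output: str) -> str:
    """Extract the JWT from `scontrol token` output of the form 'SLURM_JWT=<token>'."""
    for line in output.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name == "SLURM_JWT" and value:
            return value
    raise ValueError("no SLURM_JWT=<token> line found")


def slurm_auth_headers(user: str, token: str) -> dict:
    """HTTP headers carrying the user name and JWT for the Slurm REST API."""
    return {"X-SLURM-USER-NAME": user, "X-SLURM-USER-TOKEN": token}


def ping_url(resource_url: str) -> str:
    """URL of the assumed 'ping' endpoint under the versioned resource URL."""
    return resource_url.rstrip("/") + "/ping"


# lifespan=$((3600*24)) in the command above is 86400 seconds, i.e. 24 hours.
lifespan_seconds = 3600 * 24

token = parse_scontrol_token("SLURM_JWT=abc.def.ghi")  # placeholder token
print(ping_url("http://example.invalid:8082/slurm/v0.0.37/"))
print(sorted(slurm_auth_headers("someuser", token)))
```

Sending a GET request to the ping URL with these headers is one way to confirm that the token and resource URL work before storing them in the secret.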

In [None]:
# Define env names for secrets to be used for all jobs
%env S3_SECRET=mysecret-s3
%env RESOURCE_SECRET=secret-slurm

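# Note: `sed -i ''` is the BSD/macOS form of in-place editing; with GNU sed (Linux) use `sed -i` without the empty string.
!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../core/secrets/s3secret.yaml 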
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../core/secrets/slurmsecret.yaml 
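
The `sed` substitutions used throughout this notebook replace `{{NAME}}` placeholders with the value of the environment variable of the same name. A platform-independent equivalent can be sketched in Python; `render_placeholders` and `render_file` are hypothetical helpers, not part of the repository.

```python
import re
from pathlib import Path

# Matches {{NAME}} where NAME consists of letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_placeholders(text, values):
    """Replace each {{NAME}} with values[NAME]; unknown names are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)


def render_file(path, values):
    """Rewrite a file in place, like the `sed -i` calls above."""
    file_path = Path(path)
    file_path.write_text(render_placeholders(file_path.read_text(), values))


print(render_placeholders("name: {{S3_SECRET}}", {"S3_SECRET": "mysecret-s3"}))
```

For example, `render_file("../core/secrets/s3secret.yaml", os.environ)` would substitute every placeholder that has a matching environment variable in one pass, which avoids the difference between BSD and GNU `sed -i`.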


In [None]:
!kubectl apply -f ../core/secrets/slurmsecret.yaml -n bridge-operator-system
!kubectl apply -f ../core/secrets/s3secret.yaml -n bridge-operator-system

--------------------------------------------------------------------------------------------------------------------

## 1. Testing the Slurm pod directly

Testing of individual pods can be done directly without invoking the Bridge operator.

For Slurm, ```samples/tests/slurm/pod.yaml``` specifies:
- the pod image to use, ```quay.io/ibmdpdev/slurm-pod:v0.0.1```
- the job name, ```hpcjob```

The secret names for both the resource and S3 must be set using:

In [None]:

!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../test/slurm/pod.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../test/slurm/pod.yaml


The configmap yaml files are in ```samples/tests/slurm/```, where you must configure:
- the Minio endpoint
- the S3 bucket name
- the resource URL

Run the following to substitute the environment variables into the configmap yaml. Then create the configmap and submit the job:

In [None]:
!sed -i '' "s#{{ENDPOINT}}#$ENDPOINT#g" ../test/slurm/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../test/slurm/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../test/slurm/hpcjob-sample0_cm.yaml
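
Before applying the configmap it can help to confirm that no `{{...}}` placeholder was left unsubstituted. For instance, the cell above does not substitute a bucket name, so if the yaml contains a placeholder for it, that placeholder would remain. The following is a minimal sketch with a hypothetical helper name; the sample yaml text is made up for illustration.

```python
import re


def unresolved_placeholders(text):
    """Return (line_number, name) for every {{NAME}} still present in the text."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in re.findall(r"\{\{\s*(\w+)\s*\}\}", line):
            found.append((lineno, name))
    return found


# Illustrative content only; the real configmap keys may differ.
sample = "endpoint: minio.example\nbucket: {{BUCKET}}\n"
print(unresolved_placeholders(sample))
```

Running this on the contents of `../test/slurm/hpcjob-sample0_cm.yaml` and getting an empty list suggests the file is ready for `kubectl apply`.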

In [None]:
!kubectl apply -f ../test/slurm/hpcjob-sample0_cm.yaml
!kubectl apply -f ../test/slurm/pod.yaml 

In [None]:
# Monitor the job
!kubectl logs hpcjob-pod
!kubectl describe pod hpcjob-pod
# Once the job completes the log file will be in the S3 bucket specified in the configmap
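
Instead of re-running the monitoring cell by hand, the pod phase can be polled until it reaches a terminal state. This is a sketch under the assumption that `hpcjob-pod` ends in the `Succeeded` or `Failed` phase once the Slurm job finishes; `kubectl_pod_phase` and `wait_for_pod` are hypothetical helpers.

```python
import subprocess
import time

# Kubernetes pod phases after which no further change is expected.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def kubectl_pod_phase(pod, namespace=None):
    """Ask kubectl for the current phase of a pod (requires cluster access)."""
    cmd = ["kubectl", "get", "pod", pod, "-o", "jsonpath={.status.phase}"]
    if namespace:
        cmd += ["-n", namespace]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def wait_for_pod(get_phase, timeout=3600.0, interval=10.0, sleep=time.sleep):
    """Poll get_phase() until a terminal phase is seen or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod still in phase {phase!r} after {timeout} s")
        sleep(interval)
```

A typical call would be `wait_for_pod(lambda: kubectl_pod_phase("hpcjob-pod"))`, followed by `kubectl logs hpcjob-pod` to inspect the output. Passing the phase lookup as a function keeps the polling loop testable without a cluster.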

## 2. Bridge operator for Slurm

There are two sample yaml files in ```samples/core/operator``` for submitting jobs to a Slurm cluster using the Bridge operator.
Before running either job, edit the files so that:

- S3storage: endpoint: corresponds to your S3 endpoint
- S3upload: bucket: corresponds to your bucket in S3

### Inline script and job parameters example
The ```job0slurm.yaml``` submits a simple job script that is defined inline in the yaml.
To edit the yaml and run the job:

In [None]:
 
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../core/operator/job0slurm.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../core/operator/job0slurm.yaml


In [None]:
!kubectl apply -f ../core/operator/job0slurm.yaml 

### Remote script and inline job parameters example
The ```job1slurm.yaml``` submits a job script that is stored in S3.
To edit the yaml and run the job:

In [None]:
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../core/operator/job1slurm.yaml
!sed -i '' "s#{{ENDPOINT}}#$ENDPOINT#g" ../core/operator/job1slurm.yaml
!sed -i '' "s#{{JOBSCRIPT}}#$JOBSCRIPT#g" ../core/operator/job1slurm.yaml

!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../core/operator/job1slurm.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../core/operator/job1slurm.yaml


In [None]:
!kubectl apply -f ../core/operator/job1slurm.yaml

--------------------------------------------------------------------------------------------------------------------

## 3. Kubeflow Pipelines

These examples assume you have access to a Kubeflow Pipelines (KFP) with Tekton installation where you can submit and run jobs or upload pipelines to the KFP UI. See e.g. ```bridge-operator/kubeflow/```.

The credentials for S3 and the external resource should be saved to the kubeflow namespace:

In [None]:
!kubectl apply -f ../core/secrets/slurmsecret.yaml -n kubeflow
!kubectl apply -f ../core/secrets/s3secret.yaml -n kubeflow

The implementation with Kubeflow Pipelines uses a general ```bridge-pipeline``` defined in ```kubeflow/bridge_pipeline_handler.py```; the Slurm-specific implementation is in ```kubeflow/implementations/slurm_invoker.py```.

1. Compile the bridge pipeline:

``` $ python bridge_pipeline_handler.py ```

2. Upload the generated yaml to the KFP UI > pipelines.


3. Run ```kubeflow/implementations/slurm_invoker.py```, providing:

- a host endpoint for KFP (```kfphost```)
- a ```s3endpoint``` for S3
- a ```s3_secret``` name
- a ```script``` location in the form ```bucket:folder/filename```
- a ```resource_secret``` name

In [None]:
# Submit the job (replace each <...> placeholder with your own value before running)
!python ../../kubeflow/implementations/slurm_invoker.py --kfphost=<KFP_HOST> \
                                                      --s3endpoint=<s3ENDPOINT> --s3_secret=<S3_SECRET> \
                                                      --script=<BUCKET:SCRIPT> --resource_secret=<RESOURCE_SECRET>
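
The placeholders in the command above can also be filled from the environment variables defined earlier in this notebook. The sketch below only builds the argument list; it assumes that `ENDPOINT`, `S3_SECRET`, `JOBSCRIPT`, and `RESOURCE_SECRET` map onto the corresponding flags, and `build_invoker_args` and its `kfp_host` parameter are hypothetical names (the notebook defines no variable for the KFP host).

```python
import sys

# Path of the invoker script relative to this notebook, as used above.
INVOKER = "../../kubeflow/implementations/slurm_invoker.py"


def build_invoker_args(env, kfp_host, invoker=INVOKER):
    """Build the slurm_invoker.py command line from notebook environment variables."""
    flag_to_var = {
        "--s3endpoint": "ENDPOINT",
        "--s3_secret": "S3_SECRET",
        "--script": "JOBSCRIPT",
        "--resource_secret": "RESOURCE_SECRET",
    }
    # Fail early with a clear message if any variable was never set.
    missing = [var for var in flag_to_var.values() if not env.get(var)]
    if missing:
        raise ValueError("unset environment variables: " + ", ".join(missing))
    args = [sys.executable, invoker, f"--kfphost={kfp_host}"]
    args += [f"{flag}={env[var]}" for flag, var in flag_to_var.items()]
    return args
```

The result can be passed to `subprocess.run(build_invoker_args(os.environ, kfp_host), check=True)`. Using an argument list rather than a shell string avoids quoting problems with values that contain `:` or `/`.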

--------------------------------------------------------------------------------------------------------------------