# HPC Cluster - LSF

This tutorial demonstrates submitting and monitoring a job that runs on an external LSF system. Jobs can be submitted to an LSF HPC cluster in three ways: through the Bridge operator, through a Kubeflow Pipelines script that uses the LSF pod, or by using the LSF pod directly. The tutorial covers the setup and deployment needed to run a test script, and shows how to use S3 for file upload and download with all three implementations. See also the [README](https://github.ibm.com/Accelerated-Discovery/bridge-operator/tree/master/pods/lsf).

--------------------------------------------------------------------------------------------------------------------

##  Setup 

#### S3
Create the S3 bucket with input files

- Create a test bucket on S3 called "mybucket" and upload the sample batch script `lsf_batch.sh` and a test file `test.txt`


#### Create environment variables

For these tests we need to specify the S3 endpoint, the resource endpoint, and the S3 bucket name. If the job script, parameter file, and metadata file are in S3, then we also need to provide their locations in the form ```bucket:folder/filename```.

In [8]:
%env ENDPOINT=minio-kubeflow.apps.adp-rosa-2.5wcf.p1.openshiftapps.com
%env BUCKET=mybucket
%env RESOURCE_URL=https://161.156.200.86:8443/platform/

%env JOBSCRIPT=mybucket:lsf_batch.sh
%env ADDITIONALDATA=mybucket:test.txt
    
%env REMOTEJOBSCRIPT=/home/lsfadmin/shared/tests/batch.sh
%env UPLOADFILE=sample.out

%env S3_SECRET=mysecret-s3
%env RESOURCE_SECRET=mysecret

env: ENDPOINT=minio-kubeflow.apps.adp-rosa-2.5wcf.p1.openshiftapps.com
env: BUCKET=mybucket
env: RESOURCE_URL=https://161.156.200.86:8443/platform/
env: JOBSCRIPT=mybucket:lsf_batch.sh
env: ADDITIONALDATA=mybucket:test.txt
env: REMOTEJOBSCRIPT=/home/lsfadmin/shared/tests/batch.sh
env: UPLOADFILE=sample.out
env: S3_SECRET=mysecret-s3
env: RESOURCE_SECRET=mysecret
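
The ```bucket:folder/filename``` form used for `JOBSCRIPT` and `ADDITIONALDATA` can be checked before a job is submitted. The helper below is a minimal sketch, not the pod's own parser: the name `parse_s3_ref` is hypothetical, and it assumes the bucket name is everything before the first colon.

```python
from typing import NamedTuple


class S3Ref(NamedTuple):
    bucket: str
    key: str


def parse_s3_ref(ref: str) -> S3Ref:
    """Split 'bucket:folder/filename' into a bucket name and an object key.

    Assumption: the first ':' separates the bucket from the key.
    """
    bucket, sep, key = ref.partition(":")
    if not sep or not bucket or not key:
        raise ValueError(f"expected 'bucket:folder/filename', got {ref!r}")
    return S3Ref(bucket, key)


# Values taken from the environment variables defined above
print(parse_s3_ref("mybucket:lsf_batch.sh"))
print(parse_s3_ref("mybucket:test.txt"))
```

A malformed value such as `lsf_batch.sh` (no bucket) raises a `ValueError` here, which is easier to diagnose than a failed download inside the pod.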


#### Create the S3 and LSF secrets needed by the pod

Edit the S3 and LSF secret yaml files with your S3 and LSF credentials. Then create these secrets in the namespace you wish to run jobs in. For example, to substitute the secret names from the environment variables and create the secrets in ```bridge-operator-system```, use:

In [3]:
# Define env names for secrets to be used for all jobs

!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../core/secrets/s3secret.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../core/secrets/lsfsecret.yaml


env: S3_SECRET=mysecret-s3
env: RESOURCE_SECRET=mysecret
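
The `sed -i ''` form used throughout this tutorial is the BSD/macOS syntax; GNU sed on Linux expects `-i` without the separate empty argument. Where that is a problem, the same `{{NAME}}` substitution can be done in Python. This is a minimal sketch under that assumption; the function names `render` and `render_file` are hypothetical and not part of the Bridge operator.

```python
import os
import re
from pathlib import Path

# Matches placeholders of the form {{NAME}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(text, values):
    """Replace each {{NAME}} with values[NAME]; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def render_file(path, values=None):
    """Rewrite a file in place, using os.environ when no mapping is given."""
    values = os.environ if values is None else values
    target = Path(path)
    target.write_text(render(target.read_text(), values))


print(render("name: {{S3_SECRET}}", {"S3_SECRET": "mysecret-s3"}))
```

With the environment variables above set, `render_file("../core/secrets/s3secret.yaml")` would play the role of the corresponding `sed` line.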


In [None]:
!kubectl apply -f ../core/secrets/lsfsecret.yaml -n bridge-operator-system
!kubectl apply -f ../core/secrets/s3secret.yaml -n bridge-operator-system

--------------------------------------------------------------------------------------------------------------------

## 1. Testing the LSF pod directly

Testing of individual pods can be done directly without invoking the Bridge operator.

For LSF the ```samples/tests/lsf/pod.yaml``` specifies:
- the pod image to use, ```quay.io/ibmdpdev/lsf-pod:v0.0.1```
- the job name, ```hpcjob```

The secret names for both the resource and S3 must be set using:

In [5]:

!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../test/lsf/pod.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../test/lsf/pod.yaml


The configmap yamls are in ```samples/tests/lsf/```, and there you must configure:
- the Minio endpoint
- the S3 bucket name
- the resource URL

Run the following to substitute the environment variables into the configmap yaml. Then create the configmap and submit the job:

In [6]:
!sed -i '' "s#{{ENDPOINT}}#$ENDPOINT#g" ../test/lsf/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../test/lsf/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../test/lsf/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{REMOTEJOBSCRIPT}}#$REMOTEJOBSCRIPT#g" ../test/lsf/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{ADDITIONALDATA}}#$ADDITIONALDATA#g" ../test/lsf/hpcjob-sample0_cm.yaml
!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../test/lsf/hpcjob-sample0_cm.yaml
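
Before applying a configmap it can help to confirm that no `{{NAME}}` placeholder was missed by the substitutions. The check below is a small sketch; the function name `unresolved_placeholders` and the sample text are hypothetical and do not reproduce the real configmap layout.

```python
import re


def unresolved_placeholders(text):
    """Return the sorted, unique {{NAME}} placeholder names still present."""
    return sorted(set(re.findall(r"\{\{(\w+)\}\}", text)))


# Hypothetical fragment with two placeholders left unsubstituted
sample = "endpoint: minio.example\nbucket: {{BUCKET}}\nfile: {{UPLOADFILE}}\n"
print(unresolved_placeholders(sample))
```

For a real file, pass the result of `open("../test/lsf/hpcjob-sample0_cm.yaml").read()` and apply the configmap only when the returned list is empty.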

In [2]:
!kubectl apply -f ../test/lsf/hpcjob-sample0_cm.yaml
!kubectl apply -f ../test/lsf/pod.yaml

configmap/hpcjob-bridge-cm created
serviceaccount/hpc-cm-viewer unchanged
role.rbac.authorization.k8s.io/hpc-cm-viewer-role unchanged
rolebinding.rbac.authorization.k8s.io/hpc-cm-viewer-rolebinding unchanged
pod/hpcjob-pod created


In [None]:
# Monitor the job
!kubectl logs hpcjob-pod
!kubectl describe pod hpcjob-pod
# Once the job completes the log file will be in the S3 bucket specified in the configmap
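
Instead of re-running the commands above by hand, the pod can be polled until it finishes. This is a minimal sketch that assumes `kubectl` is on the path and that the pod ends in the standard Kubernetes `Succeeded` or `Failed` phase once the LSF job is done; the helper names are hypothetical.

```python
import subprocess
import time


def kubectl_phase(pod):
    """Return the pod phase reported by kubectl (e.g. Pending, Running, Succeeded)."""
    result = subprocess.run(
        ["kubectl", "get", "pod", pod, "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_pod(pod, get_phase=kubectl_phase, interval=10.0,
                 timeout=3600.0, sleep=time.sleep):
    """Poll the pod phase until it is terminal and return that phase."""
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase(pod)
        if phase in ("Succeeded", "Failed"):
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod {pod} still in phase {phase!r}")
        sleep(interval)


# Example call (requires cluster access):
# print(wait_for_pod("hpcjob-pod"))
```

The `get_phase` and `sleep` parameters exist only so the loop can be exercised without a cluster.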

The next test runs an inline script and uploads the results to S3. Note that the results are uploaded to the S3 location ```mybucket:hpcjob/```.

In [9]:
!sed -i '' "s#{{ENDPOINT}}#$ENDPOINT#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{REMOTEJOBSCRIPT}}#$REMOTEJOBSCRIPT#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{ADDITIONALDATA}}#$ADDITIONALDATA#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{BUCKET}}#$BUCKET#g" ../test/lsf/hpcjob-sample1_cm.yaml
!sed -i '' "s#{{UPLOADFILE}}#$UPLOADFILE#g" ../test/lsf/hpcjob-sample1_cm.yaml

In [11]:
!kubectl apply -f ../test/lsf/hpcjob-sample1_cm.yaml
!kubectl apply -f ../test/lsf/pod.yaml

configmap/hpcjob-bridge-cm created
serviceaccount/hpc-cm-viewer unchanged
role.rbac.authorization.k8s.io/hpc-cm-viewer-role unchanged
rolebinding.rbac.authorization.k8s.io/hpc-cm-viewer-rolebinding unchanged
pod/hpcjob-pod created


## 2. Bridge operator for LSF

There are three sample yaml files in ```samples/core/operator``` for running jobs on an LSF cluster using the Bridge operator.
Before running any of the jobs, edit the files so that:

- S3storage: endpoint: corresponds to your S3 endpoint
- S3upload: bucket: corresponds to your bucket in S3

### Run remote script on LSF cluster, download input file from S3 and upload output file to S3 example 
The ```job0lsf.yaml``` submits a simple job script that already resides on the LSF cluster (a 'remote' script).
To edit the yaml and run the job:

In [16]:
 
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{ENDPOINT}}#$ENDPOINT#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{REMOTEJOBSCRIPT}}#$REMOTEJOBSCRIPT#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{ADDITIONALDATA}}#$ADDITIONALDATA#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{BUCKET}}#$BUCKET#g" ../core/operator/job0lsf.yaml
!sed -i '' "s#{{UPLOADFILE}}#$UPLOADFILE#g" ../core/operator/job0lsf.yaml

In [17]:
!kubectl apply -f ../core/operator/job0lsf.yaml 

bridgejob.bridgejob.ibm.com/lsfjob created


### S3 script example
The ```job2lsf.yaml``` submits a job script which is in an S3 bucket. 
To run the job:

In [21]:
!sed -i '' "s#{{RESOURCE_URL}}#$RESOURCE_URL#g" ../core/operator/job2lsf.yaml
!sed -i '' "s#{{RESOURCE_SECRET}}#$RESOURCE_SECRET#g" ../core/operator/job2lsf.yaml
!sed -i '' "s#{{ENDPOINT}}#$ENDPOINT#g" ../core/operator/job2lsf.yaml
!sed -i '' "s#{{S3_SECRET}}#$S3_SECRET#g" ../core/operator/job2lsf.yaml
!sed -i '' "s#{{JOBSCRIPT}}#$JOBSCRIPT#g" ../core/operator/job2lsf.yaml

In [22]:
!kubectl apply -f ../core/operator/job2lsf.yaml

bridgejob.bridgejob.ibm.com/lsfjob created


--------------------------------------------------------------------------------------------------------------------

## 3. KubeFlow Pipelines

These examples assume you have access to a Kubeflow Pipelines (KFP) with Tekton installation where you can submit and run jobs or upload pipelines to the KFP UI. See e.g. ```bridge-operator/kubeflow/```.

The credentials for S3 and the external resource should be saved to the kubeflow namespace:

In [23]:
!kubectl apply -f ../core/secrets/lsfsecret.yaml-COPY -n kubeflow
!kubectl apply -f ../core/secrets/s3secret.yaml-COPY -n kubeflow

secret/mysecret configured
secret/mysecret-s3 configured


The implementation with KubeFlow Pipelines uses a general ```bridge-pipeline``` given in ```kubeflow/bridge_pipeline_handler.py``` and the specific implementation for LSF is in ```kubeflow/implementations/lsf_invoker.py```

1. Compile the bridge pipeline

``` $ python bridge_pipeline_handler.py ```

2. Upload the generated yaml to the KFP UI > pipelines


3. Run ```kubeflow/implementations/lsf_invoker.py``` providing:

- a host endpoint for KFP
- an ```s3endpoint``` for S3 
- an ```s3_secret``` name 
- a ```script``` location in the form ```bucket:script```
- a ```resource_secret``` name 

In [None]:
# submit the job
!python ../../kubeflow/implementations/lsf_invoker.py --kfphost=<KFP_HOST> \
                                                      --s3endpoint=<s3ENDPOINT> --s3_secret=<S3_SECRET> \
                                                      --script=<BUCKET:SCRIPT> --resource_secret=<RESOURCE_SECRET>
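
The placeholders in the command above can be filled from the environment variables defined in the Setup section. The sketch below only builds the argument list. It assumes that `ENDPOINT` maps to `--s3endpoint` and `JOBSCRIPT` to `--script`; `KFP_HOST` is a hypothetical variable name that this tutorial does not define, and `kfp.example.com` is a placeholder value.

```python
import shlex


def build_invoker_cmd(env, script="../../kubeflow/implementations/lsf_invoker.py"):
    """Build the lsf_invoker.py command line from a mapping of settings."""
    return [
        "python", script,
        f"--kfphost={env['KFP_HOST']}",           # hypothetical variable name
        f"--s3endpoint={env['ENDPOINT']}",        # assumed mapping
        f"--s3_secret={env['S3_SECRET']}",
        f"--script={env['JOBSCRIPT']}",           # assumed mapping
        f"--resource_secret={env['RESOURCE_SECRET']}",
    ]


example_env = {
    "KFP_HOST": "kfp.example.com",  # placeholder value
    "ENDPOINT": "minio-kubeflow.apps.adp-rosa-2.5wcf.p1.openshiftapps.com",
    "S3_SECRET": "mysecret-s3",
    "JOBSCRIPT": "mybucket:lsf_batch.sh",
    "RESOURCE_SECRET": "mysecret",
}
print(shlex.join(build_invoker_cmd(example_env)))
```

Passing `os.environ` in place of `example_env` and handing the list to `subprocess.run` would submit the job; a missing variable then fails early with a `KeyError`.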

--------------------------------------------------------------------------------------------------------------------