# Exercise: Train a U-Net for semantic segmentation in Nautilus


## <span style='color:green'>Before you get started, ensure you have successfully completed the [persistent volume notebook](./activities/PersistenVolume.ipynb)!</span>


In [2]:
import os
import sys

from jinja2 import Template
import yaml

# Part 1 Generate yml file for jobs

Instead of manually creating multiple YAML files that differ only slightly from each other, we can generate them automatically from a single template.

In [3]:
template ='''
apiVersion: batch/v1
kind: Job
metadata:
  name: job-unet-train-{{sso}}-{{fold}}
spec:
  template:
    spec:
      automountServiceAccountToken: false
      containers:
      - name: pod-unet-train-{{sso}}-{{fold}}
        image: gitlab-registry.nrp-nautilus.io/aomqc/umc_dsa_8430_sp2022_image:latest
        workingDir: /{{persistentVolume_name}}
        command: ["/bin/sh","-c"]
        args: ["python3 U_Net_training.py {{ fold }} {{ epochs }}; python3 U_Net_prediction.py {{ fold }}; python3 U_Net_evaluation.py {{ fold }} "]
        volumeMounts:
        - name: {{persistentVolume_name}}
          mountPath: /{{persistentVolume_name}}
        resources:
            limits:
              memory: 20Gi
              cpu: 1
              nvidia.com/gpu: 1
            requests:
              memory: 20Gi
              cpu: 1
              nvidia.com/gpu: 1
      volumes:
      - name: {{persistentVolume_name}} 
        persistentVolumeClaim:
            claimName: {{persistentVolume_name}}
      restartPolicy: OnFailure      
  backoffLimit: 


'''

In [4]:
j2_template = Template(template)

# Fill the variables below with the appropriate information

Update `sso` with your SSO ID, as in prior notebooks!

In [5]:
sso = 'scottgs' # as string
persistentVolume_name = 'scottgs-pv' # as string
epochs = 10 # you can choose a different value if you wish

In [6]:
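for fold in range(5):
    # Values substituted into the {{ ... }} placeholders of the template
    data = {'fold': fold, 'sso': sso, 'persistentVolume_name': persistentVolume_name, 'epochs': epochs}
    output_file = j2_template.render(data)
    # One job file per fold; the with-block closes the file even if the write fails
    with open('job-U_Net_training-{}-{}.yml'.format(sso, fold), 'w') as fileout:
        fileout.write(output_file)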

### Before you continue, verify that you have created the 5 YAML files.

![Kubernetes_Exercise_CreatedYAML_FromTemplate.png](../images/Kubernetes_Exercise_CreatedYAML_FromTemplate.png)
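
This check can also be scripted. The following is a minimal sketch, assuming the files were written to the notebook's working directory with the naming pattern used in the generation loop; `missing_job_files` is a hypothetical helper, not part of the workshop materials.

```python
from pathlib import Path


def missing_job_files(sso, folds=range(5), directory="."):
    """Return the expected job YAML files that are absent or empty."""
    missing = []
    for fold in folds:
        # Same naming pattern as the generation loop
        path = Path(directory) / "job-U_Net_training-{}-{}.yml".format(sso, fold)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing


# Hypothetical usage: replace 'scottgs' with your own SSO ID
print(missing_job_files("scottgs"))
```

An empty list means all five files are present and non-empty.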



---
# Part 2 Submit a job for training


You will have to submit each job manually, so you will have to do it 5 times since we have 5 folds.  

### For the sake of the workshop, we will just submit our first YAML file.

Note that the command below is an example: replace `{sso}` with your actual SSO ID, as entered for the variable above, and `{fold}` with the fold number (`0` for the first file).

```BASH
kubectl -n gpn-mizzou-tutorial create -f job-U_Net_training-{sso}-{fold}.yml
```
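
Because the five submissions differ only in the fold number, the command lines can be generated rather than typed by hand. This is a minimal sketch that only prints the commands; `create_commands` is a hypothetical helper, and actually running the output assumes `kubectl` is configured for the `gpn-mizzou-tutorial` namespace.

```python
import shlex

NAMESPACE = "gpn-mizzou-tutorial"  # namespace used throughout this notebook


def create_commands(sso, folds=range(5), namespace=NAMESPACE):
    """Build one `kubectl create` command string per fold."""
    commands = []
    for fold in folds:
        job_file = "job-U_Net_training-{}-{}.yml".format(sso, fold)
        argv = ["kubectl", "-n", namespace, "create", "-f", job_file]
        # shlex.join quotes any argument that needs it for the shell
        commands.append(shlex.join(argv))
    return commands


# Hypothetical usage: replace 'scottgs' with your own SSO ID
for command in create_commands("scottgs"):
    print(command)
```

Each printed line can be pasted into a terminal, one per fold.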


You can monitor the progress of your training by checking on your pods  
```BASH
kubectl -n gpn-mizzou-tutorial get pods
```


Eventually, your pods should all be completed!

```BASH
jovyan@jupyter-scottgs-40missouri-2eedu:~/ParallelProgrammingAnalytics/module5/exercises$ kubectl get pods
NAME                                READY   STATUS      RESTARTS   AGE
job-unet-train-scottgs-0--1-zqs8w   0/1     Completed   0          9m15s
job-unet-train-scottgs-1--1-vlpxz   0/1     Completed   0          16m
job-unet-train-scottgs-2--1-q9c4s   0/1     Completed   0          9m12s
job-unet-train-scottgs-3--1-kmngw   0/1     Completed   0          9m8s
job-unet-train-scottgs-4--1-n659m   0/1     Completed   0          9m5s
```
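
If you would rather not scan the table by eye, the listing can be checked programmatically. The sketch below parses text in the column layout shown above; `pod_statuses` is a hypothetical helper, and it assumes the `NAME` and `STATUS` columns contain no spaces, as in the sample output.

```python
def pod_statuses(kubectl_output):
    """Map pod name -> STATUS from the text printed by `kubectl get pods`."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    statuses = {}
    for line in lines[1:]:
        fields = line.split()
        statuses[fields[name_col]] = fields[status_col]
    return statuses


# Two rows copied from the listing above
sample = """\
NAME                                READY   STATUS      RESTARTS   AGE
job-unet-train-scottgs-0--1-zqs8w   0/1     Completed   0          9m15s
job-unet-train-scottgs-1--1-vlpxz   0/1     Completed   0          16m
"""

statuses = pod_statuses(sample)
unfinished = [name for name, status in statuses.items() if status != "Completed"]
print(unfinished)  # prints [] because both sample pods are Completed
```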


### If any of your jobs do not complete successfully, you may need to resubmit them!

By connecting to your PVC, you can check for the existence of the expected output files.

When you have all your evaluation files, you are good to proceed.

```BASH
root@pod-scottgs:/data# find evaluation/ | sort
evaluation/
evaluation/eval_results_train0.csv
evaluation/eval_results_train1.csv
evaluation/eval_results_train2.csv
evaluation/eval_results_train3.csv
evaluation/eval_results_train4.csv
evaluation/eval_results_validation0.csv
evaluation/eval_results_validation1.csv
evaluation/eval_results_validation2.csv
evaluation/eval_results_validation3.csv
evaluation/eval_results_validation4.csv
```
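
The same check can be expressed as a short script. This is a minimal sketch, assuming it runs from the directory that contains `evaluation/` (for example, `/data` in the listing above) and that Python is available there; `missing_eval_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_eval_files(eval_dir="evaluation", folds=range(5)):
    """Return the expected evaluation CSV names that are not present."""
    expected = []
    for split in ("train", "validation"):
        for fold in folds:
            # Naming pattern taken from the listing above
            expected.append("eval_results_{}{}.csv".format(split, fold))
    root = Path(eval_dir)
    return [name for name in expected if not (root / name).is_file()]


print(missing_eval_files())
```

An empty list means all ten files exist; any names printed point to the folds whose jobs should be resubmitted.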

# Part 3: Clean Up!

Once you have captured all the necessary screenshots, please clean up the pods and jobs.

**Important** Do not clean up your persistent volume storage, as the data you copied in is **required** for the next part of the workshop!

Here are some hints on cleanup; you will need to adapt them slightly.

See the Kubernetes documentation.
 * Cleaning up a job or pod: `kubectl -n gpn-mizzou-tutorial delete -f job-U_Net_training-{sso}-{fold}.yml`


---

# Notebook Complete

You have learned how to
 * Create multiple YAML files using scripting
 * Launch Pods with customized images
 
 
### If you would like a walkthrough of creating a custom container image and using the Nautilus GitLab system, please see this video from Anes.
 * [Creating a custom container image for Nautilus](https://youtu.be/hNFMka10gFA)
