# Set up minikube and use a docker image for torch processes + kale in AWS

We will follow:

* For minikube: [minikube_sipecam/setup](https://github.com/CONABIO/kube_sipecam/tree/master/minikube_sipecam/setup#aws)

* For the torch docker image: [torch/1.4.0_0.5.0/Dockerfile](https://github.com/CONABIO/kube_sipecam/blob/master/dockerfiles/torch/1.4.0_0.5.0/Dockerfile)


We will use the deployments in [minikube_sipecam/deployments/torch/hostpath_pv](https://github.com/CONABIO/kube_sipecam/tree/master/minikube_sipecam/deployments/torch/hostpath_pv).

## Instance


In the AWS account, select the AMI `minikube-sipecam-gpu`, which has the following description:

Based in k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27 - ami-0ab39819e336a3f3f Contains kubectl 1.19.1 minikube 1.13.0 kubeflow 1.0.2  nvidia-docker Docker version 19.03.4, build 9013bf583a nvidia-container-runtime runc version 1.0.0-rc8+dev commit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657 spec: 1.0.1-dev


Use a `p2.xlarge` instance with `100` GB of disk.


Use the following bash script as user data:


```
#!/bin/bash
##variables:
region=us-west-2
name_instance=minikube-gpu-18-09-2020
##System update
apt-get update -yq
##Tag instance
INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)
PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
aws ec2 create-tags --resources "$INSTANCE_ID" --tags "Key=Name,Value=${name_instance}-${PUBLIC_IP}" --region "$region"
```

**SSH to the instance. All commands will be executed as root:**

`sudo su`


**Next, start minikube and create the NVIDIA device plugin and the kubeflow pods:**

```
cd /root && minikube start --driver=none


kubectl create -f /root/nvidia-device-plugin.yml


cd /opt/kf-test && /root/kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
```


Check the minikube status and the pods with:

```
minikube status

minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
```

```
kubectl get pods -n kubeflow

# all pods are Running except:
spark-operatorcrd-cleanup-2p7x2                                0/2     Completed   0          7m6s
```
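Scanning the whole pod list by eye gets tedious, so a small filter can help. The following is a minimal sketch, assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE). The helper name `not_ready_pods` is hypothetical.

```shell
#!/bin/bash
# Sketch: print pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
not_ready_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage on the instance (commented out because it needs kubectl and a cluster):
# kubectl get pods -n kubeflow --no-headers | not_ready_pods
```

An empty result would mean that every pod is either `Running` or `Completed`.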



**To access the kubeflow UI, set:**

```
export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
echo $INGRESS_PORT
```


**And go to:**

```
http://<ipv4 of ec2 instance>:$INGRESS_PORT
```
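To avoid assembling the address by hand, the URL can be printed from the shell. This is a minimal sketch, assuming the public IPv4 is read from the same `instance-data` metadata endpoint used in the user data script. `kubeflow_url` is a hypothetical helper, not part of minikube or kubeflow.

```shell
#!/bin/bash
# Sketch: build the kubeflow UI URL from a public IPv4 and the ingress port.
# kubeflow_url is a hypothetical helper name.
kubeflow_url() {
  local public_ip=$1 port=$2
  printf 'http://%s:%s\n' "$public_ip" "$port"
}

# Usage on the instance (commented out because it needs the EC2 metadata endpoint):
# PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
# kubeflow_url "$PUBLIC_IP" "$INGRESS_PORT"
```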



## Deployments and services 


**Set:**

```
TORCH_PV=hostpath-pv
TORCH_PVC=hostpath-pvc
TORCH_URL=https://raw.githubusercontent.com/CONABIO/kube_sipecam/master/minikube_sipecam/deployments/torch
```

**Create storage:**


```
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_PV.yaml
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_PVC.yaml
```
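The manifest addresses are all built the same way: base URL, then `hostpath_pv`, then the manifest name with a `.yaml` suffix. A hedged sketch of a helper that performs this join and tolerates a trailing slash in the base URL is shown below. `manifest_url` is a hypothetical name introduced only for illustration.

```shell
#!/bin/bash
# Sketch: join the base URL, the hostpath_pv directory and a manifest name.
# ${1%/} strips one trailing slash so the path never contains "//".
manifest_url() {
  local base=${1%/} name=$2
  printf '%s/hostpath_pv/%s.yaml\n' "$base" "$name"
}

# Usage (commented out because it needs kubectl and a running cluster):
# kubectl create -f "$(manifest_url "$TORCH_URL" "$TORCH_PV")"
# kubectl create -f "$(manifest_url "$TORCH_URL" "$TORCH_PVC")"
```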

**Next, for the GPU deployment (for testing notebooks):**

**Create service:**

```
TORCH_LOAD_BALANCER_SERVICE_GPU=loadbalancer-torch-1.4.0_0.5.0-hostpath-pv-gpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_LOAD_BALANCER_SERVICE_GPU.yaml
```

**Create deployment:**

```
TORCH_JUPYTERLAB_SERVICE_GPU=jupyterlab-torch-1.4.0_0.5.0-hostpath-pv-gpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_JUPYTERLAB_SERVICE_GPU.yaml
```

**And go to:**

```
http://<ipv4 of ec2 instance>:30001/torchurl
```

**Next, for the CPU deployment, which is launched via kale:**

**Create service:**

```
TORCH_LOAD_BALANCER_SERVICE_CPU=loadbalancer-torch-1.4.0_0.5.0-hostpath-pv-cpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_LOAD_BALANCER_SERVICE_CPU.yaml
```

**Create deployment:**

```
TORCH_JUPYTERLAB_SERVICE_CPU=jupyterlab-torch-1.4.0_0.5.0-hostpath-pv-cpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_JUPYTERLAB_SERVICE_CPU.yaml
```

**And go to:**

```
http://<ipv4 of ec2 instance>:30002/torchurl
```
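The two JupyterLab addresses differ only in the port: `30001` for the GPU deployment and `30002` for the CPU one. As a minimal sketch, the mapping can be kept in one place. `jupyterlab_url` is a hypothetical helper and the IP passed to it is whatever public IPv4 the instance has.

```shell
#!/bin/bash
# Sketch: map the deployment kind to the node port used in this document
# (gpu -> 30001, cpu -> 30002) and print the JupyterLab URL.
jupyterlab_url() {
  local public_ip=$1 kind=$2 port
  case "$kind" in
    gpu) port=30001 ;;
    cpu) port=30002 ;;
    *) echo "unknown deployment kind: $kind" >&2; return 1 ;;
  esac
  printf 'http://%s:%s/torchurl\n' "$public_ip" "$port"
}

# Usage (replace the placeholder with the public IPv4 of the EC2 instance):
# jupyterlab_url "<ipv4 of ec2 instance>" cpu
```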

## Note

If the disk is full, which can happen when a kubeflow pipeline is uploaded from kale, an error like the following appears:

```
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 01 Sep 2020 18:12:22 GMT', 'Content-Length': '487', 'Content-Type': 'text/plain; charset=utf-8'})
HTTP response body: {"error_message":"Error creating pipeline: Create pipeline failed: InternalServerError: Failed to store b2fa5a70-cab4-4c89-8784-9c0cb118d1b4: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.","error_details":"Error creating pipeline: Create pipeline failed: InternalServerError: Failed to store b2fa5a70-cab4-4c89-8784-9c0cb118d1b4: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed."}
```

Delete kubeflow (along with the MAD-Mex and geonode deployments).

To free space:

```
minikube stop
minikube delete
```

Check docker disk usage and clean up:

```
docker system df
docker system prune --all --volumes
rm -r /root/.minikube/*
rm -r /root/.kube/*
rm -r /opt/kf-test
```
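To confirm that the cleanup actually freed space, the usage of the root filesystem can be read before and after. This is a minimal sketch using POSIX `df -P`. `disk_usage_pct` is a hypothetical helper, and it assumes the filesystem name contains no spaces.

```shell
#!/bin/bash
# Sketch: print the used percentage (without the % sign) of the filesystem
# that holds the given path. With `df -P`, line 2 is the data row and
# column 5 is the capacity, for example "42%".
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

echo "root filesystem usage: $(disk_usage_pct /)%"
```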

Start again (from `/root`):

```
CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"
source ~/.profile
chmod gou+wrx -R /opt/
mkdir -p ${KF_DIR}
# start minikube
cd /root && minikube start --driver=none
# start kubeflow
cd ${KF_DIR}

wget $CONFIG_URI
wget https://codeload.github.com/kubeflow/manifests/tar.gz/v1.0.2 -O v1.0.2.tar.gz
```

Change the `uri` at the end of `kfctl_k8s_istio.v1.0.2.yaml`:

```
# replace this section:
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz
# with:
  repos:
  - name: manifests
    uri: file:///opt/kf-test/v1.0.2.tar.gz
```
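The same change can be made without opening an editor. The following is a hedged sketch, assuming the file contains the `uri` line exactly as shown above. `use_local_manifests` is a hypothetical helper, and it writes through a temporary file so that it does not depend on a particular `sed -i` variant.

```shell
#!/bin/bash
# Sketch: point the manifests repo in a kfctl config at a local tarball.
# $1 = path to the config file, $2 = absolute path of the downloaded tarball.
use_local_manifests() {
  local config=$1 tarball=$2 tmp
  tmp=$(mktemp) || return 1
  # Rewrite only the manifests uri line; every other line is copied unchanged.
  sed "s|uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz|uri: file://${tarball}|" "$config" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the content back so the original file keeps its permissions.
  cat "$tmp" > "$config"
  rm -f "$tmp"
}

# Usage from ${KF_DIR} (commented out because the file only exists on the instance):
# use_local_manifests kfctl_k8s_istio.v1.0.2.yaml /opt/kf-test/v1.0.2.tar.gz
```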

Then:

```
kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
```



ref: https://github.com/aws-samples/eks-workshop/issues/639

If there are problems with geonode (because the docker-compose stack was deleted), clone the repo again and redeploy geonode.