# Set up minikube and use the docker image for torch processes + kale in AWS

This guide follows:

* For minikube: [minikube_sipecam/setup](https://github.com/CONABIO/kube_sipecam/tree/master/minikube_sipecam/setup#aws)

* For the docker image for torch: [torch/1.4.0_0.5.0/Dockerfile](https://github.com/CONABIO/kube_sipecam/blob/master/dockerfiles/torch/1.4.0_0.5.0/Dockerfile)

The deployments below use [minikube_sipecam/deployments/torch/hostpath_pv](https://github.com/CONABIO/kube_sipecam/tree/master/minikube_sipecam/deployments/torch/hostpath_pv)

## Instance

In AWS, select the AMI `k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27 - ami-0ab39819e336a3f3f` and the instance type `p2.xlarge` with `100` GB of disk.

Use the following bash script as user data to install `kubectl` and download `minikube` and `kfctl`:


```
#!/bin/bash
##variables:
region=us-west-2
user=admin
name_instance=minikube-gpu
shared_volume=/shared_volume
##System update
export DEBIAN_FRONTEND=noninteractive
apt-get update -yq
##Install awscli
apt-get install -y python3-pip build-essential && pip3 install --upgrade pip
pip3 install awscli --upgrade
##Tag instance
INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)
PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
aws ec2 create-tags --resources "$INSTANCE_ID" --tags "Key=Name,Value=$name_instance-$PUBLIC_IP" --region="$region"
#Set locales (check that these values are OK for your system):
echo "export LC_ALL=C.UTF-8" >> /root/.profile
echo "export LANG=C.UTF-8" >> /root/.profile
echo "export mount_point=$shared_volume" >> /root/.profile
wget http://us.download.nvidia.com/tesla/418.116.00/NVIDIA-Linux-x86_64-418.116.00.run -O /root/NVIDIA-Linux-x86_64-418.116.00.run
cd /root/ && chmod a+x NVIDIA-Linux-x86_64-418.116.00.run && ./NVIDIA-Linux-x86_64-418.116.00.run --accept-license --silent
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey |   sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |   sudo tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update
apt-get install -y nvidia-docker2 nvidia-container-runtime
echo '{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}' > /etc/docker/daemon.json
systemctl start docker
usermod -aG docker $user
#the docker group membership takes effect at the next login of $user
#Create shared volume
mkdir -p $shared_volume
#kubectl installation
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
mv ./kubectl /usr/local/bin/kubectl
kubectl version --client
#bash completion, needs to exit and enter again to take effect
#echo "source <(kubectl completion bash)" >> /root/.bashrc
#apt-get install -y bash-completion
#minikube download
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
  && chmod +x minikube
install minikube /usr/local/bin/
apt-get install conntrack -y
#kfctl download
cd /root && wget https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz
tar -xvf kfctl_v1.0.2-0-ga476281_linux.tar.gz
echo "export PATH=$PATH:$(pwd)" >> /root/.profile
# Set KF_NAME to the name of your Kubeflow deployment. This also becomes the
# name of the directory containing your configuration.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
echo "export KF_NAME=kf-test" >> ~/.profile
echo "export BASE_DIR=/opt" >> ~/.profile
source ~/.profile
echo "export KF_DIR=${BASE_DIR}/${KF_NAME}" >> ~/.profile
wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.6.0/nvidia-device-plugin.yml -O /root/nvidia-device-plugin.yml
```

Check the installation on the AWS instance with: `tail -n 15 /var/log/cloud-init-output.log`.

Check the installation of the NVIDIA driver with: `nvidia-smi`
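
As a complement to the two checks above, a small helper can list which of the expected tools are not yet on `PATH`. This is only a sketch: `missing_commands` is a hypothetical helper name, not part of any of the tools installed here.

```shell
# Print the name of every listed command that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
    fi
  done
}
```

For example, `missing_commands kubectl minikube kfctl nvidia-smi docker` should print nothing once the user data script has finished and `/root/.profile` has been sourced.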

**SSH into the instance; all commands will be executed as `root`**

```
sudo su
```


**Next, start `minikube` with the `none` driver and prepare the Kubeflow install with `kfctl`:**

```
CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"
source ~/.profile
chmod gou+wrx -R /opt/
mkdir -p ${KF_DIR}
#minikube start
cd /root && minikube start --driver=none

kubectl create -f /root/nvidia-device-plugin.yml

#check: kubectl describe daemonsets -n kube-system

#check: kubectl describe daemonsets.apps -n kube-system

#kubeflow start
cd ${KF_DIR}

wget $CONFIG_URI
wget https://codeload.github.com/kubeflow/manifests/tar.gz/v1.0.2 -O v1.0.2.tar.gz

```

At the end of `kfctl_k8s_istio.v1.0.2.yaml`, change the `uri` of the `manifests` repo so it points to the downloaded tarball:

```
#replace this section:
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz
#with:
  repos:
  - name: manifests
    uri: file:///opt/kf-test/v1.0.2.tar.gz
```
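
If you prefer not to edit the file by hand, the same change can be scripted. The following is a minimal sketch, assuming the file contains the `uri` line exactly as shown above; `use_local_manifests` is a hypothetical helper name, and the tarball path is assumed not to contain `|` or `&` characters.

```shell
# Rewrite the manifests uri in a kfctl config so it points at a local tarball.
# $1: path to the kfctl YAML file, $2: absolute path to the downloaded tarball.
use_local_manifests() {
  config_file="$1"
  tarball="$2"
  tmp_file="$(mktemp)"
  # Use | as the sed delimiter because both strings contain slashes.
  if sed "s|uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz|uri: file://${tarball}|" \
    "$config_file" > "$tmp_file"; then
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp_file" > "$config_file"
  fi
  rm -f "$tmp_file"
}
```

From inside `${KF_DIR}` this could be called as `use_local_manifests kfctl_k8s_istio.v1.0.2.yaml /opt/kf-test/v1.0.2.tar.gz`.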

Then:

```
kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
```



**Check pods and status with:**



`minikube status`

```
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
```

`kubectl get pods -n kubeflow`

```
#all running except:
spark-operatorcrd-cleanup-2p7x2                                0/2     Completed   0          7m6s
```
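
Since the `kubeflow` namespace holds many pods, it can be easier to list only the ones that need attention. A minimal sketch, assuming the default column layout of `kubectl get pods` (name, ready, status, restarts, age); `pods_not_ready` is a hypothetical helper name.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose STATUS column is neither Running nor Completed.
pods_not_ready() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}
```

Usage: `kubectl get pods -n kubeflow | pods_not_ready`. An empty result means every pod is either running or has completed.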



**To access the Kubeflow UI, set:**



```
export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
echo $INGRESS_PORT
```


**And go to:**

```
http://<ipv4 of ec2 instance>:$INGRESS_PORT
```
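
The address can also be assembled on the instance itself. This sketch assumes the instance metadata endpoint used in the user data script (`http://instance-data/latest/meta-data/public-ipv4`) is reachable; `kubeflow_url` is a hypothetical helper name.

```shell
# Build the Kubeflow UI address from a host (the public IPv4 of the instance)
# and the NodePort of the http2 port of istio-ingressgateway.
kubeflow_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}
```

Usage: `kubeflow_url "$(curl -s http://instance-data/latest/meta-data/public-ipv4)" "$INGRESS_PORT"`.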

## Deployments and services 


**Set:**

```
TORCH_PV=hostpath-pv
TORCH_PVC=hostpath-pvc
TORCH_URL=https://raw.githubusercontent.com/CONABIO/kube_sipecam/master/minikube_sipecam/deployments/torch
```

**Create storage:**


```
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_PV.yaml
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_PVC.yaml
```

**Next, for the GPU deployment (for testing notebooks):**

**Create service:**

```
TORCH_LOAD_BALANCER_SERVICE_GPU=loadbalancer-torch-1.4.0_0.5.0-hostpath-pv-gpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_LOAD_BALANCER_SERVICE_GPU.yaml
```

**Create deployment:**

```
TORCH_JUPYTERLAB_SERVICE_GPU=jupyterlab-torch-1.4.0_0.5.0-hostpath-pv-gpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_JUPYTERLAB_SERVICE_GPU.yaml
```

**And go to:**

```
http://<ipv4 of ec2 instance>:30001/torchurl
```

**Next, for the CPU deployment and launching via kale:**

**Create service:**

```
TORCH_LOAD_BALANCER_SERVICE_CPU=loadbalancer-torch-1.4.0_0.5.0-hostpath-pv-cpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_LOAD_BALANCER_SERVICE_CPU.yaml
```

**Create deployment:**

```
TORCH_JUPYTERLAB_SERVICE_CPU=jupyterlab-torch-1.4.0_0.5.0-hostpath-pv-cpu
kubectl create -f $TORCH_URL/hostpath_pv/$TORCH_JUPYTERLAB_SERVICE_CPU.yaml
```

**And go to:**

```
http://<ipv4 of ec2 instance>:30002/torchurl
```
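
The GPU and CPU steps above follow the same naming pattern, so the manifest URLs can be generated per variant. A minimal sketch: `TORCH_BASE_URL` and `torch_manifest_urls` are hypothetical names introduced here, and the base URL is written without a trailing slash so that joined paths contain a single `/`.

```shell
# Base URL of the torch manifests (no trailing slash).
TORCH_BASE_URL=https://raw.githubusercontent.com/CONABIO/kube_sipecam/master/minikube_sipecam/deployments/torch

# Print the service and deployment manifest URLs for a variant: gpu or cpu.
torch_manifest_urls() {
  variant="$1"
  for kind in loadbalancer jupyterlab; do
    echo "${TORCH_BASE_URL}/hostpath_pv/${kind}-torch-1.4.0_0.5.0-hostpath-pv-${variant}.yaml"
  done
}
```

Usage: `for url in $(torch_manifest_urls cpu); do kubectl create -f "$url"; done` creates the service first and then the deployment, in the same order as the manual steps.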

## Note

If the disk is full, uploading a Kubeflow pipeline from kale can fail with an error like this:

```
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 01 Sep 2020 18:12:22 GMT', 'Content-Length': '487', 'Content-Type': 'text/plain; charset=utf-8'})
HTTP response body: {"error_message":"Error creating pipeline: Create pipeline failed: InternalServerError: Failed to store b2fa5a70-cab4-4c89-8784-9c0cb118d1b4: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.","error_details":"Error creating pipeline: Create pipeline failed: InternalServerError: Failed to store b2fa5a70-cab4-4c89-8784-9c0cb118d1b4: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed."}
```
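
Before deleting anything, it can help to confirm that disk usage is really the cause. A minimal sketch using `df`; `disk_usage_percent` is a hypothetical helper name, and the threshold at which the storage backend refuses writes is not stated here, so treat the number only as an indicator.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# that contains the given path; defaults to /.
disk_usage_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Usage: `disk_usage_percent /` prints a number between 0 and 100; a value close to 100 on the root filesystem matches the error above.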

Delete Kubeflow (MAD-Mex and geonode deployments) to free space:

```
minikube stop
minikube delete
```

Check Docker disk usage, then prune it and remove the leftover minikube, kubectl and Kubeflow files:

```
docker system df
docker system prune --all --volumes
rm -r /root/.minikube/*
rm -r /root/.kube/*
rm -r /opt/kf-test
```

Start again (from the `/root` directory):

```
CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"
source ~/.profile
chmod gou+wrx -R /opt/
mkdir -p ${KF_DIR}
#minikube start
cd /root && minikube start --driver=none
#kubeflow start
cd ${KF_DIR}

wget $CONFIG_URI
wget https://codeload.github.com/kubeflow/manifests/tar.gz/v1.0.2 -O v1.0.2.tar.gz

```

At the end of `kfctl_k8s_istio.v1.0.2.yaml`, change the `uri` of the `manifests` repo so it points to the downloaded tarball:

```
#replace this section:
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz
#with:
  repos:
  - name: manifests
    uri: file:///opt/kf-test/v1.0.2.tar.gz
```

Then:

```
kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
```



ref: https://github.com/aws-samples/eks-workshop/issues/639

If there are problems with geonode because its docker-compose stack was deleted, clone the repo again and redeploy geonode.