# Set up minikube and use a docker image for torch processes + kale in AWS

This guide follows:

* For minikube: [minikube_sipecam/setup](https://github.com/CONABIO/kube_sipecam/tree/master/minikube_sipecam/setup#aws)

* docker image for torch: 


The deployment uses [minikube_sipecam/deployments/torch/hostpath_pv].

## Instance

In AWS, select the AMI `k8s-1.16-debian-buster-amd64-hvm-ebs-2020-04-27 - ami-0ab39819e336a3f3f` and the instance type `p2.xlarge` with `100` GB of disk.

Use the following bash script as user data to install `kubectl` and download `minikube` and `kfctl`:


```
#!/bin/bash
##variables:
region=us-west-2
user=admin
name_instance=minikube-gpu
shared_volume=/shared_volume
##System update
export DEBIAN_FRONTEND=noninteractive
apt-get update -yq
##Install awscli
apt-get install -y python3-pip build-essential && pip3 install --upgrade pip
pip3 install awscli --upgrade
##Tag instance
INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)
PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
aws ec2 create-tags --resources $INSTANCE_ID --tag Key=Name,Value=$name_instance-$PUBLIC_IP --region=$region
#check if locales are ok with next lines:
echo "export LC_ALL=C.UTF-8" >> /root/.profile
echo "export LANG=C.UTF-8" >> /root/.profile
echo "export mount_point=$shared_volume" >> /root/.profile
wget http://us.download.nvidia.com/tesla/418.116.00/NVIDIA-Linux-x86_64-418.116.00.run -O /root/NVIDIA-Linux-x86_64-418.116.00.run
cd /root/ && chmod a+x NVIDIA-Linux-x86_64-418.116.00.run && ./NVIDIA-Linux-x86_64-418.116.00.run --accept-license --silent
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey |   sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |   sudo tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update
apt-get install -y nvidia-docker2 nvidia-container-runtime
echo '{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}' > /etc/docker/daemon.json
systemctl start docker
usermod -aG docker $user
newgrp docker
#Create shared volume
mkdir $shared_volume
#kubectl installation
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
mv ./kubectl /usr/local/bin/kubectl
kubectl version --client
#bash completion, needs to exit and enter again to take effect
#echo "source <(kubectl completion bash)" >> /root/.bashrc
#apt-get install -y bash-completion
#minikube download
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
  && chmod +x minikube
install minikube /usr/local/bin/
apt-get install conntrack -y
#kfctl download
cd /root && wget https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz
tar -xvf kfctl_v1.0.2-0-ga476281_linux.tar.gz
echo "export PATH=$PATH:$(pwd)" >> /root/.profile
# Set KF_NAME to the name of your Kubeflow deployment. This also becomes the
# name of the directory containing your configuration.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
echo "export KF_NAME=kf-test" >> /root/.profile
echo "export BASE_DIR=/opt" >> /root/.profile
#single quotes defer expansion until /root/.profile is sourced
echo 'export KF_DIR=${BASE_DIR}/${KF_NAME}' >> /root/.profile
```

Check the installation on the AWS instance with: `tail -n 15  /var/log/cloud-init-output.log`.
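To confirm that the tools installed by the user data script are on `PATH`, a small helper like the one below can be used. This is a minimal sketch: `check_tools` is a hypothetical helper name, and the list of commands to pass to it is an assumption based on what the script installs.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# On the instance (as root, after sourcing /root/.profile so kfctl is on PATH):
#   check_tools kubectl minikube kfctl docker nvidia-smi
```

A `missing:` line suggests that the corresponding step of the user data script failed; the cloud-init log should show why.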

**SSH into the instance. All commands below are executed as `root`:**

```
sudo su
```


**Next, start `minikube` with the `none` driver and deploy kubeflow with `kfctl`:**

```
CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"
source ~/.profile
chmod gou+wrx -R /opt/
mkdir -p ${KF_DIR}
#minikube start
cd /root && minikube start --driver=none

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.6.0/nvidia-device-plugin.yml

#check: kubectl describe daemonsets -n kube-system

#kubeflow start
cd ${KF_DIR}

wget $CONFIG_URI
wget https://codeload.github.com/kubeflow/manifests/tar.gz/v1.0.2 -O v1.0.2.tar.gz

```

At the end of `kfctl_k8s_istio.v1.0.2.yaml`, change the `uri` of the manifests repo:

```
#this section:
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz
#to:
  repos:
  - name: manifests
    uri: file:///opt/kf-test/v1.0.2.tar.gz
```
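The same change can be scripted instead of edited by hand. A minimal sketch, assuming the KfDef file contains the `uri` line exactly as shown above; `use_local_manifests` is a hypothetical helper name:

```shell
# Point the manifests uri of a KfDef file at a local tarball.
# Usage: use_local_manifests <kfdef.yaml> <absolute path to tarball>
use_local_manifests() {
    kfdef_file=$1
    tarball=$2
    tmp_file=$(mktemp)
    # Use | as the sed delimiter because both sides contain slashes.
    sed "s|uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz|uri: file://${tarball}|" \
        "$kfdef_file" > "$tmp_file" \
        && cat "$tmp_file" > "$kfdef_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# In ${KF_DIR}:
#   use_local_manifests kfctl_k8s_istio.v1.0.2.yaml /opt/kf-test/v1.0.2.tar.gz
```

Writing through a temporary file keeps the sketch independent of `sed -i`, whose syntax differs between GNU and BSD sed.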

Then:

```
kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
```



**Check pods and status with:**



`minikube status`

```
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
```

`kubectl get pods -n kubeflow`

```
#all running except:
spark-operatorcrd-cleanup-2p7x2                                0/2     Completed   0          7m6s
```
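With many pods in the namespace, it can help to list only the ones that are neither `Running` nor `Completed`. A minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...); `not_ready_pods` is a hypothetical helper name:

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose STATUS column is neither Running nor Completed.
not_ready_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage on the instance:
#   kubectl get pods -n kubeflow | not_ready_pods
```

Empty output means every pod has reached one of the two expected states.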



**To access the kubeflow UI, set:**



```
export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
echo $INGRESS_PORT
```


**And go to:**

```
http://<ipv4 of ec2 instance>:$INGRESS_PORT
```
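The address can also be assembled in the shell. A minimal sketch: `service_url` is a hypothetical helper name, and reading the public IPv4 from the instance metadata endpoint (as the user data script does) is an assumption about how you want to obtain it.

```shell
# Build an http URL from a host, a port and an optional path.
# Usage: service_url <host> <port> [path]
service_url() {
    host=$1
    port=$2
    path=${3:-}
    # Append the path only when one was given, with exactly one leading slash.
    echo "http://${host}:${port}${path:+/${path#/}}"
}

# On the instance:
#   PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
#   service_url "$PUBLIC_IP" "$INGRESS_PORT"
```

The port must also be open in the security group of the instance for the URL to be reachable from outside.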

## Deployments and services 


**Set:**

```
MAD_MEX_PV=hostpath-pv
MAD_MEX_PVC=hostpath-pvc
MAD_MEX_URL=https://raw.githubusercontent.com/CONABIO/kube_sipecam/master/minikube_sipecam/deployments/MAD_Mex
```

**Create storage:**


```
kubectl create -f $MAD_MEX_URL/hostpath_pv/$MAD_MEX_PV.yaml
kubectl create -f $MAD_MEX_URL/hostpath_pv/$MAD_MEX_PVC.yaml
```

**Create service:**

`loadbalancer-torch-1-4-0-0-5-0-hostpath-pv.yaml`

```

kind: Service
apiVersion: v1
metadata:
        name: loadbalancer-torch-1-4-0-0-5-0-hostpath-pv
        namespace: kubeflow
spec:
        type: LoadBalancer
        ports:
                - port: 8888
                  targetPort: 8888
                  protocol: TCP
                  nodePort: 30001 #select port of your preference
        selector:
                app: jupyterlab-torch-1-4-0-0-5-0-app

```

`kubectl create -f loadbalancer-torch-1-4-0-0-5-0-hostpath-pv.yaml`
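Kubernetes only accepts a `nodePort` inside the API server's node port range, which is 30000-32767 unless the cluster was started with a different `--service-node-port-range`. A minimal sketch for checking a candidate port before editing the YAML, assuming that default range; `valid_node_port` is a hypothetical helper name:

```shell
# Succeed when the argument is an integer inside the default NodePort range.
valid_node_port() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;  # reject empty or non-numeric input
    esac
    [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

if valid_node_port 30001; then
    echo "30001 can be used as nodePort"
fi
```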

**Create deployment:**

Dockerfile

```
#nbclient 0.5.0 requires nbformat>=5.0, but you'll have nbformat 4.4.0 which is incompatible.
#kfp 0.3.0 requires kfp-server-api<0.4.0,>=0.2.5, but you'll have kfp-server-api 0.1.18.3 which is incompatible.
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

ENV TIMEZONE America/Mexico_City
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV DEBIAN_FRONTEND noninteractive
ENV DEB_BUILD_DEPS="sudo nano less git wget curl python3-dev python3-pip python3-setuptools software-properties-common"
ENV DEB_PACKAGES=""
ENV PIP_PACKAGES_COMMON="numpy==1.18.0 scipy==1.4.1 pandas matplotlib seaborn"
ENV PIP_PACKAGES_TORCH="torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html"
ENV PIP_PACKAGES_KALE="setuptools==41.2 click==7.0 six==1.12.0 urllib3==1.24.2 kubeflow-kale==0.5.0"

RUN apt-get update && \
    echo $TIMEZONE > /etc/timezone && apt-get install -y tzdata

RUN apt-get update && apt-get install -y $DEB_BUILD_DEPS $DEB_PACKAGES && pip3 install --upgrade pip

RUN curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash - && apt-get install -y nodejs

RUN pip3 install jupyter "jupyterlab<2.0.0" --upgrade

RUN jupyter notebook --generate-config && sed -i "s/#c.NotebookApp.password = .*/c.NotebookApp.password = u'sha1:115e429a919f:21911277af52f3e7a8b59380804140d9ef3e2380'/" /root/.jupyter/jupyter_notebook_config.py

RUN pip3 install $PIP_PACKAGES_COMMON --upgrade
RUN pip3 install $PIP_PACKAGES_TORCH --upgrade
RUN pip3 install $PIP_PACKAGES_KALE --upgrade

RUN jupyter labextension install kubeflow-kale-launcher

VOLUME ["/shared_volume"]

ENV NB_PREFIX torchurl

ENTRYPOINT ["/usr/local/bin/jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root", "--LabApp.allow_origin='*'", "--LabApp.base_url=torchurl"]

```


```
REPO_URL=sipecam/torch-kale
BUILD_DIR=$(pwd)
TORCH_AND_KALE_VERSION=1.4.0_0.5.0
docker build $BUILD_DIR --force-rm -t $REPO_URL:$TORCH_AND_KALE_VERSION
```

**The `nvidia.com/gpu` entry is not needed when using kale to launch kubeflow pipelines, but it is needed when testing the notebook:**

`jupyterlab-torch-1-4-0-0-5-0-hostpath-pv.yaml`

```
kind: Deployment
apiVersion: apps/v1
metadata:
        name: jupyterlab-torch-1-4-0-0-5-0
        namespace: kubeflow
spec:
        replicas: 1 # Number of pod replicas to deploy.
        selector:
                matchLabels:
                        app: jupyterlab-torch-1-4-0-0-5-0-app
        template:
                metadata:
                        labels:
                                app: jupyterlab-torch-1-4-0-0-5-0-app
                spec:
                        containers: 
                        - name: jupyterlab-torch-1-4-0-0-5-0
                          imagePullPolicy: Always
                          image: sipecam/torch-kale:1.4.0_0.5.0
                          ports:
                                  - containerPort: 8888
                          env:
                                  - name: mount_point
                                    value: /shared_volume
                                  - name: LC_ALL
                                    value: C.UTF-8
                                  - name: LANG
                                    value: C.UTF-8
                          resources:
                                  requests:
                                          cpu: ".5" # This value depends on the type of AWS instance chosen
                                          memory: 15Gi # This value depends on the type of AWS instance chosen
                                  limits:
                                          cpu: ".5" # This value depends on the type of AWS instance chosen
                                          memory: 15Gi # This value depends on the type of AWS instance chosen
                                          nvidia.com/gpu: 1
                          volumeMounts:
                                  - name: hostpath-pv
                                    mountPath: "/shared_volume"
                        volumes:
                        - name: hostpath-pv
                          persistentVolumeClaim:
                                  claimName: hostpath-pvc 

```

`kubectl create -f jupyterlab-torch-1-4-0-0-5-0-hostpath-pv.yaml`

**And go to:**

```
http://<ipv4 of ec2 instance>:30001/torchurl
```

# Note:

If the disk is full, uploading a kubeflow pipeline from kale can fail with an error like:

```
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 01 Sep 2020 18:12:22 GMT', 'Content-Length': '487', 'Content-Type': 'text/plain; charset=utf-8'})
HTTP response body: {"error_message":"Error creating pipeline: Create pipeline failed: InternalServerError: Failed to store b2fa5a70-cab4-4c89-8784-9c0cb118d1b4: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed.","error_details":"Error creating pipeline: Create pipeline failed: InternalServerError: Failed to store b2fa5a70-cab4-4c89-8784-9c0cb118d1b4: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed."}
```
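Before deleting anything, it may be worth confirming that disk usage is the cause. A minimal sketch using `df`; the helper names are hypothetical, and the default threshold of 90 percent is an arbitrary assumption, not the value the storage backend actually enforces:

```shell
# Print the used-space percentage (without the % sign) of the
# filesystem that holds the given path (default: /).
disk_usage_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold (default: 90).
# Usage: disk_nearly_full [path] [threshold]
disk_nearly_full() {
    [ "$(disk_usage_percent "${1:-/}")" -ge "${2:-90}" ]
}

if disk_nearly_full / 90; then
    echo "root filesystem is at $(disk_usage_percent /)% usage"
fi
```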

Delete kubeflow (and the MAD-Mex and geonode deployments) to free space:

```
minikube stop
minikube delete
```

Check disk usage and clean up:

```
docker system df
docker system prune --all --volumes
rm -r /root/.minikube/*
rm -r /root/.kube/*
rm -r /opt/kf-test
```

Start again (from the `/root` directory):

```
CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"
source ~/.profile
chmod gou+wrx -R /opt/
mkdir -p ${KF_DIR}
#minikube start
cd /root && minikube start --driver=none
#kubeflow start
cd ${KF_DIR}

wget $CONFIG_URI
wget https://codeload.github.com/kubeflow/manifests/tar.gz/v1.0.2 -O v1.0.2.tar.gz

```

At the end of `kfctl_k8s_istio.v1.0.2.yaml`, change the `uri` of the manifests repo:

```
#this section:
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz
#to:
  repos:
  - name: manifests
    uri: file:///opt/kf-test/v1.0.2.tar.gz
```

Then:

```
kfctl apply -V -f kfctl_k8s_istio.v1.0.2.yaml
```



ref: https://github.com/aws-samples/eks-workshop/issues/639

If there are problems with geonode (because the docker-compose stack was deleted), clone the repo again and redeploy geonode.