# 3. Neural Style Transfer on AKS

We tested the style transfer script locally in the previous notebook. In this notebook we provision an AKS cluster and verify that the script still works as expected when it runs in parallel across multiple nodes.

1. Build the AKS Docker image
2. Test style transfer in Docker locally
3. Push the Docker image to Docker Hub
4. Provision the AKS cluster
5. Test style transfer in parallel on the AKS cluster

---

### Import packages and load .env

In [None]:
from dotenv import set_key, get_key, find_dotenv, load_dotenv
from pathlib import Path
import json
import os
%load_ext dotenv
%dotenv

In [None]:
env_path = find_dotenv(raise_error_if_not_found=True)
load_dotenv(env_path)
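
Later cells read several settings from `.env` with `get_key`, so a missing entry only surfaces when a command fails. The following is a minimal sketch, not part of the original workflow, of checking for missing keys up front. The helpers `parse_env` and `missing_keys` are hypothetical, and they parse simple `KEY="value"` lines with the standard library rather than with python-dotenv.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as written by set_key.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Illustrative content only; the real values come from your own .env file.
sample = 'RESOURCE_GROUP="my-rg"\nSB_QUEUE="my-queue"\n'
print(missing_keys(sample, ["RESOURCE_GROUP", "SB_QUEUE", "SB_NAMESPACE"]))
# → ['SB_NAMESPACE']
```

In the notebook, the text could be read with `Path(env_path).read_text()`.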

### Define Variables

In [None]:
docker_login = "<docker-login>"
aks_image_repo = "batchscoringdl_aks_app"

In [None]:
aks_cluster = "<your-aks-cluster>"

### Build AKS Docker Image

In [None]:
%%writefile aks/requirements.txt
azure==4.0.0
torch==0.4.1
torchvision==0.2.1

In [None]:
%%writefile aks/Dockerfile

FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        cmake \
        curl \
        git \
        nginx \
        supervisor \
        wget && \
        rm -rf /var/lib/apt/lists/*

ENV PYTHON_VERSION=3.6
RUN curl -o ~/miniconda.sh -O  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    /opt/conda/bin/conda create -y --name py$PYTHON_VERSION python=$PYTHON_VERSION && \
    /opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/py$PYTHON_VERSION/bin:$PATH
ENV LD_LIBRARY_PATH /opt/conda/envs/py$PYTHON_VERSION/lib:/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
ENV PYTHONPATH /code/:$PYTHONPATH

RUN mkdir /app
WORKDIR /app
ADD process_images_from_queue.py /app
ADD style_transfer.py /app
ADD main.py /app
ADD util.py /app
ADD requirements.txt /app

RUN pip install --no-cache-dir -r requirements.txt

CMD ["python", "main.py"]

In [None]:
!sudo docker build -t $aks_image_repo aks

### Test Docker image locally (before deploying on AKS)

Add images to the queue.

In [None]:
input_frames_dir = "orangutan_frames_test"
docker_output_frames_dir = "orangutan_frames_docker_test_processed"

In [None]:
!python aci/add_images_to_queue.py \
    --input-dir $input_frames_dir \
    --output-dir $docker_output_frames_dir \
    --style "mosaic" \
    --queue-limit 10

In [None]:
!sed -e "s/=\"/=/g" -e "s/\"$//g" .env > .env.docker

In [None]:
!cat .env.docker
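
The `sed` call above exists because `docker run --env-file` passes values through literally, quotes included, while `.env` stores them as `KEY="value"`. As a rough Python equivalent of the two substitutions (a sketch only; the helper name `strip_env_quotes` is made up for illustration):

```python
import re


def strip_env_quotes(text):
    """Mirror the sed call: drop the quote after '=' and the quote at end of line."""
    lines = []
    for line in text.splitlines():
        line = line.replace('="', "=")   # sed: s/=\"/=/g
        line = re.sub(r'"$', "", line)   # sed: s/\"$//g
        lines.append(line)
    return "\n".join(lines) + "\n"


# Illustrative content only.
sample = 'SB_QUEUE="my-queue"\nREGION="eastus"\n'
print(strip_env_quotes(sample), end="")
# → SB_QUEUE=my-queue
# → REGION=eastus
```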

Run the Docker image locally.

In [None]:
!sudo docker run --runtime=nvidia --env-file ".env.docker" $aks_image_repo

Check that the queue is now empty.

In [None]:
!az servicebus queue show \
    --name {get_key(env_path, "SB_QUEUE")} \
    --namespace-name {get_key(env_path, "SB_NAMESPACE")} \
    --resource-group {get_key(env_path, "RESOURCE_GROUP")} \
    --query 'countDetails.activeMessageCount'
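
The check above reports the count at a single moment. If the container is still working through messages, it may be more convenient to poll until the count reaches zero. The sketch below assumes a caller-supplied function `get_count` that returns the active message count (for example, by wrapping the `az servicebus queue show` call above). Both `wait_until_empty` and `get_count` are hypothetical names.

```python
import time


def wait_until_empty(get_count, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll get_count() until it returns 0 (True) or the timeout elapses (False)."""
    waited = 0
    while True:
        remaining = int(get_count())
        if remaining == 0:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Stand-in for the CLI call: pretend the queue drains 10 -> 4 -> 0.
counts = iter(["10", "4", "0"])
print(wait_until_empty(lambda: next(counts), sleep=lambda seconds: None))
# → True
```

The `sleep` parameter is injectable only so the example runs instantly; in the notebook the default `time.sleep` would be used.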

Tag and push the Docker image.

In [None]:
!sudo docker tag $aks_image_repo $docker_login/$aks_image_repo

In [None]:
!sudo docker push $docker_login/$aks_image_repo

### Provision AKS cluster

This step may take a while. Note that it also creates a second resource group in your subscription, which contains the actual compute resources of the AKS cluster.

In [None]:
node_count = 10

In [None]:
!az aks create \
    --resource-group {get_key(env_path, "RESOURCE_GROUP")} \
    --name $aks_cluster \
    --node-count $node_count \
    --node-vm-size "Standard_NC6s_v2" \
    --generate-ssh-keys

Install `kubectl`, the command-line tool used to manage the Kubernetes cluster.

In [None]:
!sudo az aks install-cli

In [None]:
!az aks get-credentials \
    --resource-group {get_key(env_path, 'RESOURCE_GROUP')} \
    --name $aks_cluster

In [None]:
!kubectl get nodes
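
With ten nodes, scanning the table by eye is easy to get wrong. As a sketch, the JSON form of the same command (`kubectl get nodes -o json`) can be checked programmatically: each node carries a list of conditions, and a healthy node has the `Ready` condition with status `"True"`. The helper name `not_ready_nodes` and the sample data are illustrative only.

```python
import json


def not_ready_nodes(nodes_json):
    """Return the names of nodes whose 'Ready' condition is not 'True'."""
    not_ready = []
    for node in json.loads(nodes_json).get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            not_ready.append(node["metadata"]["name"])
    return not_ready


# Hand-written sample in the shape of `kubectl get nodes -o json`.
sample = json.dumps({"items": [
    {"metadata": {"name": "node-0"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-1"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]})
print(not_ready_nodes(sample))
# → ['node-1']
```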

In [None]:
!kubectl get pods

### Deploy Docker image to AKS cluster

To deploy our neural style transfer script into our AKS cluster, we need to define what the deployment will look like:

In [None]:
aks_deployment_json = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "aks-app", 
        "labels": {
            "purpose": "dequeue_messages_and_apply_style_transfer"
        }
    },
    "spec": {
        "replicas": node_count,
        "template": {
            "metadata": {
                "labels": {
                    "app": "aks-app"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "aks-app",
                        "image": "{}/{}:latest".format(docker_login, aks_image_repo),
                        "volumeMounts": [
                            {
                                "mountPath": "/usr/local/nvidia", 
                                "name": "nvidia"
                            }
                        ],
                        "resources": {
                            "requests": {
                                "alpha.kubernetes.io/nvidia-gpu": 1
                            },
                            "limits": {
                                "alpha.kubernetes.io/nvidia-gpu": 1
                            },
                        },
                        "ports": [{
                            "containerPort": 433
                        }],
                        "env": [
                            {
                                "name": "LB_LIBRARY_PATH",
                                "value": "$LD_LIBRARY_PATH:/usr/local/nvidia/lib64:/opt/conda/envs/py3.6/lib",
                            },
                            {
                                "name": "DP_DISABLE_HEALTHCHECKS", 
                                "value": "xids"
                            },
                            {
                                "name": "STORAGE_MODEL_DIR",
                                "value": get_key(env_path, "STORAGE_MODEL_DIR")
                            },
                            {
                                "name": "SUBSCRIPTION_ID",
                                "value": get_key(env_path, "SUBSCRIPTION_ID")
                            },
                            {
                                "name": "RESOURCE_GROUP",
                                "value": get_key(env_path, "RESOURCE_GROUP")
                            },
                            {
                                "name": "REGION",
                                "value": get_key(env_path, "REGION")
                            },
                            {
                                "name": "STORAGE_ACCOUNT_NAME", 
                                "value": get_key(env_path, "STORAGE_ACCOUNT_NAME")
                            },
                            {
                                "name": "STORAGE_ACCOUNT_KEY",
                                "value": get_key(env_path, "STORAGE_ACCOUNT_KEY")
                            },
                            {
                                "name": "STORAGE_CONTAINER_NAME",
                                "value": get_key(env_path, "STORAGE_CONTAINER_NAME")
                            },
                            {
                                "name": "SB_SHARED_ACCESS_KEY_NAME",
                                "value": get_key(env_path, "SB_SHARED_ACCESS_KEY_NAME")
                            },
                            {
                                "name": "SB_SHARED_ACCESS_KEY_VALUE",
                                "value": get_key(env_path, "SB_SHARED_ACCESS_KEY_VALUE")
                            },
                            {
                                "name": "SB_NAMESPACE",
                                "value": get_key(env_path, "SB_NAMESPACE")
                            },
                            {
                                "name": "SB_QUEUE", 
                                "value": get_key(env_path, "SB_QUEUE")
                            },
                        ],
                    }
                ],
                "volumes": [
                    {
                        "name": "nvidia", 
                        "hostPath": {
                            "path": "/usr/local/nvidia"
                        }
                    }
                ],
            },
        },
    },
}

In [None]:
with open("aks_deployment.json", "w") as outfile:
    json.dump(aks_deployment_json, outfile, indent=4, sort_keys=True)
    outfile.write('\n\n')
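
Most of the `env` list in the deployment repeats one pattern: a name paired with the value of the same key in `.env`. One possible way to cut the repetition, shown here only as a sketch, is to build those entries from a list of key names. The helper `env_entries` is hypothetical; in the notebook its `lookup` argument could be `lambda key: get_key(env_path, key)`.

```python
def env_entries(names, lookup):
    """Build a Kubernetes 'env' list: one {"name", "value"} dict per key."""
    return [{"name": name, "value": lookup(name)} for name in names]


# Illustrative settings standing in for the .env file.
settings = {"SB_NAMESPACE": "my-namespace", "SB_QUEUE": "my-queue"}
print(env_entries(["SB_NAMESPACE", "SB_QUEUE"], settings.get))
# → [{'name': 'SB_NAMESPACE', 'value': 'my-namespace'}, {'name': 'SB_QUEUE', 'value': 'my-queue'}]
```

Entries with fixed values, such as `DP_DISABLE_HEALTHCHECKS`, would still be written out by hand.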

### Run style transfer on AKS

In [None]:
aks_output_frames_dir = "orangutan_frames_aks_test_processed"

In [None]:
!python aci/add_images_to_queue.py \
    --input-dir $input_frames_dir \
    --output-dir $aks_output_frames_dir \
    --style "mosaic" \
    --queue-limit 12

In [None]:
!kubectl create -f aks_deployment.json

In [None]:
!kubectl get pods

In [None]:
pod_json = !kubectl get pods -o json
pod_dict = json.loads(''.join(pod_json))
!kubectl logs {pod_dict['items'][0]['metadata']['name']}
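
The cell above reads the logs of the first pod only. To get a quick picture of the whole deployment, the same `kubectl get pods -o json` output can be summarized by pod phase (`Pending`, `Running`, and so on). This is a minimal sketch; `pod_phase_counts` is a made-up helper and the sample data is hand-written.

```python
import json
from collections import Counter


def pod_phase_counts(pods_json):
    """Count pods by status.phase from `kubectl get pods -o json` output."""
    items = json.loads(pods_json).get("items", [])
    return Counter(pod.get("status", {}).get("phase", "Unknown") for pod in items)


# Hand-written sample in the shape of `kubectl get pods -o json`.
sample = json.dumps({"items": [
    {"metadata": {"name": "aks-app-a"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "aks-app-b"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "aks-app-c"}, "status": {"phase": "Pending"}},
]})
print(sorted(pod_phase_counts(sample).items()))
# → [('Pending', 1), ('Running', 2)]
```

In the notebook, `pod_phase_counts(''.join(pod_json))` would reuse the output captured above.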

In [None]:
!az servicebus queue show \
    --name {get_key(env_path, "SB_QUEUE")} \
    --namespace-name {get_key(env_path, "SB_NAMESPACE")} \
    --resource-group {get_key(env_path, "RESOURCE_GROUP")} \
    --query 'countDetails.activeMessageCount'

### Monitor in Kubernetes dashboard

In [None]:
!kubectl create -f kube_dashboard_access.yaml

In [None]:
!az aks browse -n $aks_cluster -g {get_key(env_path, "RESOURCE_GROUP")}

### (Optional) Additional commands for AKS

Scale your AKS cluster

In [None]:
!az aks scale --name $aks_cluster --resource-group {get_key(env_path, "RESOURCE_GROUP")} --node-count 5

Scale your deployment

In [None]:
!kubectl scale deployment.apps/aks-app --replicas=10

---

### Conclusion

Since we'll be using these settings throughout this tutorial, we'll also save them to the `.env` file.

In [None]:
set_key(env_path, "DOCKER_LOGIN", docker_login)
set_key(env_path, "AKS_IMAGE", aks_image_repo)
set_key(env_path, "AKS_CLUSTER", aks_cluster)

Check that our `.env` file looks correct.

In [None]:
!cat .env

Continue to the next [notebook](/notebooks/04_deploy_logic_app.ipynb).