#### <center>Intermediate Python and Software Engineering</center>


## <center>Section 05 - Containers 2 - Exercise Solutions</center>


### <center>Innovation Scholars Programme</center>
### <center>King's College London, Medical Research Council and UKRI</center>

### 02 Exercises

This exercise will cover GPU computation with Docker images.

### Installation

We need to install the Nvidia Container Toolkit; instructions are [here](https://github.com/NVIDIA/nvidia-docker).

*NB: At some point in the past on Ubuntu 16.04, with a version of Docker older than 19.03, I had to follow [these instructions](https://cnvrg.io/how-to-setup-docker-and-nvidia-docker-2-0-on-ubuntu-18-04) to get the correct version of Docker and the Nvidia runtime. Check your version with `docker version`; if it is older than 19.03, those instructions might be necessary.*
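The version check in the note above can be sketched in Python. This is a minimal illustration, not part of the original exercise: the function name `docker_at_least` is hypothetical, and it assumes the version string starts with `major.minor`, as in `19.03.8`.

```python
import re


def docker_at_least(version, minimum=(19, 3)):
    """Return True if a Docker version string such as '19.03.8' is >= minimum."""
    # Only the leading "major.minor" part is compared; suffixes are ignored.
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognised Docker version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


print(docker_at_least("19.03.8"))  # True
print(docker_at_least("18.09.7"))  # False
```

The version string itself can be read from the output of `docker version`, for example with `docker version --format '{{.Server.Version}}'`.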


### Exercise 1:

We'll write a simple Pytorch program that prints some information about the environment it is run under, to see what a CUDA-aware container looks like.

**Step 1:** Create a directory called `exercise02` and copy the following into it as `pytorch_test.py`:

In [1]:
%%bash
mkdir -p exercise02

In [2]:
%%writefile exercise02/pytorch_test.py

import os
import torch
import timeit

print(f'Host name: {os.environ["HOSTNAME"]}')
print(f'Pytorch version: {torch.__version__}')

for d in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(d)
    print(f"Device {d} is {p.name} with {p.total_memory/2**30}GiB of memory")
    
test=torch.rand(10,1,64,64)
conv=torch.nn.Conv2d(1,1,3,1,1)

result=timeit.timeit("conv(test)",number=1000,globals=locals())
print(f'CPU time: {result}')

test=test.to('cuda:0')
conv=conv.to('cuda:0')

result=timeit.timeit("conv(test)",number=1000,globals=locals())
print(f'GPU time: {result}')

Writing exercise02/pytorch_test.py


**Step 2:** Now define the Dockerfile based on one of the CUDA [images of your choice](https://hub.docker.com/r/nvidia/cuda/tags). You'll need to install Pytorch, so refer to the notes for how to do that. The Dockerfile should copy over the test file we just created and run it as the command.

In [21]:
%%writefile exercise02/Dockerfile

FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

RUN apt update --fix-missing
RUN apt install -y python3-pip

RUN pip3 install torch==1.5.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
    
COPY pytorch_test.py /

RUN adduser dockeruser --shell /bin/bash
USER dockeruser

CMD ["python3", "pytorch_test.py"]

Overwriting exercise02/Dockerfile


**Step 3:** Build the image, tagging it as `pytorch-test`:

In [22]:
!docker build exercise02 -t pytorch-test

Sending build context to Docker daemon  3.584kB
Step 1/8 : FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
 ---> e135227729c4
Step 2/8 : RUN apt update --fix-missing
 ---> Using cache
 ---> d25e1520b14e
Step 3/8 : RUN apt install -y python3-pip
 ---> Using cache
 ---> 8cf891a12497
Step 4/8 : RUN pip3 install torch==1.5.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
 ---> Using cache
 ---> 3a652ee32b8f
Step 5/8 : COPY pytorch_test.py /
 ---> Using cache
 ---> 92a2710c7629
Step 6/8 : RUN adduser dockeruser --shell /bin/bash
 ---> Running in babe80f0451e
Adding user `dockeruser' ...
Adding new group `dockeruser' (1000) ...
Adding new user `dockeruser' (1000) with group `dockeruser' ...
Creating home directory `/home/dockeruser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password: Retype new UNIX password: passwd: Authentication token manipulation error
passwd: password unchanged
Use of uninitialized value $answer in chop at /usr/sbin/adduser line 5

**Step 4:** Run the image with and without the GPU flag (refer to the notes) to see how it works:

In [23]:
!docker run --gpus all --rm -u $(id -u):$(id -g) pytorch-test

Host name: d7d074a74c25
Pytorch version: 1.5.0+cu101
Device 0 is TITAN X (Pascal) with 11.91021728515625GiB of memory
Device 1 is GeForce GTX 980 with 3.9422607421875GiB of memory
CPU time: 0.33359767915681005
GPU time: 0.07265682704746723


**Step 5:** So far we've been using `CMD` to define the command to run. A related Dockerfile command is `ENTRYPOINT` which provides a command to run but is not replaced when arguments are used with `docker run`. With `CMD` any extra arguments will replace what was provided, but with `ENTRYPOINT` they are appended to the end of the command to act as additional arguments. Both can be used at once, `ENTRYPOINT` defining the command and `CMD` giving default arguments.
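The replace-versus-append rule can be sketched as a small Python function. This is only an illustration of the behaviour described above, assuming the exec (JSON list) form of both instructions and ignoring the `--entrypoint` flag of `docker run`; `resolve_command` and `other.py` are hypothetical names.

```python
def resolve_command(entrypoint, cmd, run_args):
    """Sketch of how the final container command is assembled (exec form only)."""
    # Arguments given to `docker run` after the image name replace CMD...
    args = list(run_args) if run_args else list(cmd)
    # ...while ENTRYPOINT, if defined, always stays at the front.
    return list(entrypoint) + args


# CMD only: extra arguments replace the whole command.
print(resolve_command([], ["python3", "pytorch_test.py"], []))
# ['python3', 'pytorch_test.py']
print(resolve_command([], ["python3", "pytorch_test.py"], ["ls", "/"]))
# ['ls', '/']

# ENTRYPOINT plus CMD: extra arguments replace only the default arguments.
print(resolve_command(["python3"], ["pytorch_test.py"], []))
# ['python3', 'pytorch_test.py']
print(resolve_command(["python3"], ["pytorch_test.py"], ["other.py"]))
# ['python3', 'other.py']
```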

Let's modify our Dockerfile to run the given script as the default behaviour but allow any other arguments to be used as well. We can define a general purpose container this way which can be used to execute any script we provide through a volume directory.

In [24]:
%%writefile exercise02/Dockerfile

FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

RUN apt update --fix-missing
RUN apt install -y python3-pip

RUN pip3 install torch==1.5.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
    
COPY pytorch_test.py /

RUN adduser dockeruser --shell /bin/bash
USER dockeruser

ENTRYPOINT ["python3"]
CMD ["pytorch_test.py"]

Overwriting exercise02/Dockerfile


### Exercise 2:
Make an account on Docker Hub and upload your new image.

### Exercise 3:

Docker Swarm allows us to run instances of Docker images on worker nodes commanded from a manager node. Typically these nodes would be separate server hosts and any incoming requests are distributed amongst them to balance load. Large scale web services are architected this way since single servers wouldn't be able to respond to the amount of traffic they'd receive.

We don't have spare server racks kicking around to demonstrate this on but we can see what it looks like with one manager node at least.
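To make the idea of distributing requests concrete, here is a toy simulation in plain Python. It assumes a strict round-robin rotation purely for illustration; it does not reproduce how Swarm actually routes traffic, and the replica names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, replicas):
    """Assign each request to a replica in strict rotation (round robin)."""
    rotation = cycle(replicas)
    return [(request, next(rotation)) for request in requests]


replicas = ["replica-1", "replica-2"]  # hypothetical replica names
assignments = distribute(range(6), replicas)

# Count how many requests each replica received.
load = Counter(replica for _, replica in assignments)
print(load)  # Counter({'replica-1': 3, 'replica-2': 3})
```

With two replicas each one ends up handling half of the requests, which is the load-balancing effect described above.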

**Step 1:** Create a directory called `hello_swarm` and copy the following into it as `hello_host.py`:

In [None]:
!mkdir -p hello_swarm

In [11]:
%%writefile hello_swarm/hello_host.py

import os
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return f"Hello from host {os.environ['HOSTNAME']}"


Overwriting hello_swarm/hello_host.py
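Note that `os.environ['HOSTNAME']` raises a `KeyError` if the variable is not set, which can happen when trying the script outside a container. A minimal sketch of a more forgiving lookup, assuming that falling back to `socket.gethostname()` is acceptable (the helper name `container_hostname` is hypothetical):

```python
import os
import socket


def container_hostname(environ=os.environ):
    """Return HOSTNAME if it is set, otherwise ask the operating system."""
    return environ.get("HOSTNAME") or socket.gethostname()


print(f"Hello from host {container_hostname()}")
```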


**Step 2:** Define a Dockerfile like those used for Flask applications already but with the `FLASK_APP` value set to `hello_host.py`.

In [None]:
%%writefile hello_swarm/Dockerfile

from python:3.7
ENV FLASK_APP hello_host.py
COPY hello_host.py /
RUN pip install flask

EXPOSE 5000

CMD ["flask","run","--host=0.0.0.0"]


**Step 3:**
Build the image, tagging it as `hello_swarm`.

In [13]:
!docker build hello_swarm -t hello_swarm

Sending build context to Docker daemon  3.072kB
Step 1/6 : from python
 ---> 659f826fabf4
Step 2/6 : ENV FLASK_APP hello_host.py
 ---> Running in f0a9e1742efc
Removing intermediate container f0a9e1742efc
 ---> 37a8592160d9
Step 3/6 : COPY hello_host.py /
 ---> ca856cd544cd
Step 4/6 : RUN pip install flask
 ---> Running in de82183df348
Collecting flask
  Downloading Flask-1.1.2-py2.py3-none-any.whl (94 kB)
Collecting click>=5.1
  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting itsdangerous>=0.24
  Downloading itsdangerous-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting Jinja2>=2.10.1
  Downloading Jinja2-2.11.2-py2.py3-none-any.whl (125 kB)
Collecting Werkzeug>=0.15
  Downloading Werkzeug-1.0.1-py2.py3-none-any.whl (298 kB)
Collecting MarkupSafe>=0.23
  Downloading MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl (32 kB)
Installing collected packages: click, itsdangerous, MarkupSafe, Jinja2, Werkzeug, flask
Successfully installed Jinja2-2.11.2 MarkupSafe-1.1.1 Werkzeug-1.0

**Step 4:**
Initialize Swarm with the command `docker swarm init`.

In [15]:
!docker swarm init

Swarm initialized: current node (st8r37wmqe95npg4robbwmktq) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-0gwlcfgnztev9we5mmkp0ujp9llm0womddzosb45b0drmj9152-1j0zipzqtq3mkyrsd9kwwmdlq 10.246.179.34:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.



**Step 5:**
The join command it prints could be run on other hosts to add them as worker nodes. We could create a VM with VirtualBox or use Docker Machine to do this for us, but for now we can run a service with multiple instances on our single node.

We want to create a service, specifying our image as the one to use (or multiple images with Docker Compose), with other flags to specify the name, the number of replicas to use (i.e. how many copies of the container to run), and port routing. Run the following:

```sh
docker service create --name hello_swarm --publish published=5000,target=5000 --replicas 2 hello_swarm
```

In [16]:
!docker service create --name hello_swarm --publish published=5000,target=5000 --replicas 2 hello_swarm

image hello_swarm:latest could not be accessed on a registry to record
its digest. Each node will access hello_swarm:latest independently,
possibly leading to different nodes running different
versions of the image.

hw5tpukdg4anx9n44y4igr4ra

overall progress: 2 out of 2 tasks
verify: Service converged

**Step 6:**
We now want to query our running application through the IP address mentioned in the `docker swarm join` command suggested when you ran `docker swarm init`. This is the external interface IP address Swarm is listening to for incoming requests. You can open a browser and navigate to that IP and port 5000, or use `curl` if you can't access the machine directly. You should expect to see the server responding with 2 different host names, one for each replica.

In [22]:
!curl 10.246.179.34:5000

Hello from host 9ecf76a3f789

In [23]:
!curl 10.246.179.34:5000

Hello from host eab61d930c73
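The same check can be scripted instead of repeating `curl` by hand. The following is a minimal sketch using only the standard library; the helper names `fetch_text` and `distinct_hosts` are hypothetical, and it assumes the response body has the form `Hello from host <name>` as returned by `hello_host.py`.

```python
from urllib.request import urlopen


def fetch_text(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def distinct_hosts(url, attempts=10, fetch=fetch_text):
    """Query the service several times and collect the host names it reports."""
    prefix = "Hello from host "
    hosts = set()
    for _ in range(attempts):
        body = fetch(url).strip()
        # Ignore any response that does not match the expected greeting.
        if body.startswith(prefix):
            hosts.add(body[len(prefix):])
    return hosts
```

Calling `distinct_hosts("http://10.246.179.34:5000")` while the service is running should return a set containing one host name per replica that answered, so two names here if requests reach both replicas.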

**Step 7:** Inspect what services are running with `docker service ls`. Our `hello_swarm` service is running; we don't need it now, so remove it with `docker service rm hello_swarm`.

In [24]:
!docker service ls

ID                  NAME                MODE                REPLICAS            IMAGE                PORTS
hw5tpukdg4an        hello_swarm         replicated          2/2                 hello_swarm:latest   *:5000->5000/tcp


In [25]:
!docker service rm hello_swarm

hello_swarm


So if we have an application (i.e. a web server running in a container serving pages and other media to clients) running as multiple instances, how do those instances get data, communicate with each other if necessary, communicate with databases and synchronize their contents, and otherwise behave like a single server? One way is with volumes, which can be created through Docker so that containers can access and share file data. Other running services can also be set up to serve multiple workers, e.g. a single container running the database which multiple instances of a web server access.