# Intro to Docker

### Resources
Interactive docker tutorials and guides - https://training.play-with-docker.com/

### Installation

1. Go to https://docs.docker.com/install/
2. Download Docker and follow the installation guide for your OS
3. Test the installation with the following command:
```
docker run hello-world
```
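If you would rather verify the installation from a script, one option is to ask the daemon for its version, since `docker version` has to reach the daemon just like `hello-world` does. This is a minimal Python sketch: the function name `docker_server_version` is our own, and the only Docker feature it relies on is the `--format` flag of `docker version`.

```python
import shutil
import subprocess
from typing import Optional


def docker_server_version() -> Optional[str]:
    """Return the Docker daemon's version, or None if Docker is not usable.

    `docker version` contacts the daemon, so a non-zero exit usually means the
    CLI is installed but the daemon is not running (or you lack permission).
    """
    if shutil.which("docker") is None:
        return None  # docker CLI is not on PATH
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = docker_server_version()
    print(f"Docker daemon version: {version}" if version else "Docker is not reachable")
```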

In [4]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

Digest: sha256:9572f7cdcee8591948c2963463447a53466950b3fc15a247fcad1917ca215a2f
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 h

----
## Overview

1. __What is docker?__

    Docker is a tool to build, deploy, and manage containers: self-contained, isolated environments that package an application together with everything it needs to run (OS user-space files, libraries, configuration) while sharing the host machine's kernel and resources. It is commonly used to run "microservices".

2. __Why use docker?__

    Docker is a useful tool to build, test, and deploy services. It lets developers package an application with its library and system dependencies once, instead of configuring every machine it runs on, which removes a lot of setup overhead. It is useful for things as simple as scripting or as complex as distributed services. For example, Docker Desktop lets you run Linux applications on a Windows machine (behind the scenes it runs them in a lightweight Linux VM). This can also be done with a regular VM, but Docker containers are easier to manage and have less overhead, which makes Docker a great portable, low-cost option.
    
    Docker is also great for scaling. Containers share the host's kernel, so they start quickly and use far fewer resources than VMs. With an orchestrator such as Docker Swarm or Kubernetes, a cluster of machines acts as a pool of resources that containers are scheduled onto. In my opinion, this is the main reason why Docker is really great: there is no need to provision extra VMs to handle load. You can just spin up more containers and, if the pool itself becomes constrained, add more machines to the cluster. Docker by itself does not scale automatically; growing and shrinking the number of containers with demand is handled by the orchestrator.
    
    It is open source and widely used, which means many contributors have published useful Docker images and code. The community is large, and many companies use Docker to build and develop services. In practice, someone has often already done most of the work for you, so you can simply reuse their images.
    
3. __How to use docker?__

    Let's run through the notebook!

----
## Basic commands

Docs - https://docs.docker.com/engine/reference/commandline/docker/

In [41]:
# List local images
!docker images
!echo
# List all containers. Remove the -a flag to only view running containers
!docker container ls -a

REPOSITORY                                       TAG                 IMAGE ID            CREATED              SIZE
demo                                             latest              ebeae554916b        About a minute ago   943MB
<none>                                           <none>              dbb8fd6c75f9        4 minutes ago        943MB
<none>                                           <none>              a601ef92d4b9        5 minutes ago        943MB
procedure_search                                 pytest              92aafc43e9f9        47 hours ago         1.02GB
procedure_search                                 latest              08b729f7f074        47 hours ago         1.02GB
test                                             latest              1cae8c8cf52d        47 hours ago         1.02GB
bert-server                                      latest              09ff3bf77650        2 days ago           4.71GB
python                                           3.7                 
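Both listing commands accept a `--format` flag that takes a Go template, which is handy when you want to use the results from code instead of reading the table. A minimal Python sketch, assuming you only need each container's name and status (the helper names `parse_container_list` and `list_containers` are hypothetical):

```python
import subprocess
from typing import Dict

# Go-template placeholders understood by `docker container ls --format`;
# a literal tab separates the two fields.
LS_FORMAT = "{{.Names}}\t{{.Status}}"


def parse_container_list(output: str) -> Dict[str, str]:
    """Turn 'name<TAB>status' lines into a {name: status} dict."""
    containers: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        containers[name] = status
    return containers


def list_containers(include_stopped: bool = True) -> Dict[str, str]:
    """Run `docker container ls`, adding -a to include stopped containers."""
    cmd = ["docker", "container", "ls", "--format", LS_FORMAT]
    if include_stopped:
        cmd.append("-a")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_container_list(result.stdout)
```

`list_containers()` returns a dict keyed by container name; keeping the parsing in `parse_container_list` means it can be tested without a running daemon.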

In [13]:
# Run image as container
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



In [18]:
# Run a container with a name and a command. If no name is given, Docker generates a random one
!docker run --name eugene_os centos echo "hello world"

hello world


In [19]:
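# Restart the stopped container. This re-runs its command (echo), but the output is only shown when attaching with -a (docker start -a eugene_os)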
!docker start eugene_os

eugene_os


----
## Dockerfile

- __Docker images__ are the blueprints for Docker containers

- __Docker containers__ are running instances of an image: the services that are actually run

- __Docker images__ are built from Dockerfiles

- __Docker Compose__ (`docker-compose`) is a CLI tool to define and run multiple interacting Docker containers

In [44]:
# Print the Dockerfile
!cat demo/Dockerfile

# Start from a base image. In this case, we are using a docker image with Python 3 installed
FROM python:3

# Copy the build context (the directory passed to docker build) into /demo in the image
ADD . /demo

# Set the working directory for the following instructions and for the container. If run in -it mode, we would begin in /demo
WORKDIR /demo

# Run any commands necessary for setup. This is usually for installation or configuration
RUN pip install -q -r requirements.txt

# Document that the app listens on port 5000. This does not publish the port; that needs -p at run time
EXPOSE 5000

# Default command run when the container starts, usually a daemon or web server. Only the last CMD in a Dockerfile takes effect
CMD ["python", "app.py"]


In [36]:
# Build an image tagged demo from demo/Dockerfile. The final . is the build context that ADD copies from
!docker build -t demo -f demo/Dockerfile .

Sending build context to Docker daemon  30.21kB
Step 1/6 : FROM python:3
 ---> efdecc2e377a
Step 2/6 : ADD src /demo
 ---> Using cache
 ---> 10f1388f02e1
Step 3/6 : WORKDIR /demo
 ---> Using cache
 ---> 985deeabed03
Step 4/6 : RUN pip install -q -r requirements.txt
 ---> Using cache
 ---> 489b751bd51d
Step 5/6 : EXPOSE 5000
 ---> Using cache
 ---> a9cf677ae0a5
Step 6/6 : CMD ["python", "app.py"]
 ---> Running in 01d674a19afb
Removing intermediate container 01d674a19afb
 ---> ebeae554916b
Successfully built ebeae554916b
Successfully tagged demo:latest
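The `Using cache` lines above come from Docker's build cache: a step is reused when its parent layer and its instruction are unchanged, and for `ADD`/`COPY` also when the copied files are unchanged. The toy Python model below illustrates that rule; it is not Docker's actual implementation, and `layer_id` and `build` are made-up names.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def layer_id(parent: str, instruction: str, files: Dict[str, bytes]) -> str:
    """Toy cache key: parent layer + instruction text (+ file contents for ADD/COPY)."""
    h = hashlib.sha256()
    h.update(parent.encode())
    h.update(instruction.encode())
    if instruction.split()[0] in ("ADD", "COPY"):
        for path in sorted(files):
            h.update(path.encode())
            h.update(files[path])
    return h.hexdigest()[:12]


def build(instructions: List[str], files: Dict[str, bytes],
          cache: Set[str]) -> List[Tuple[str, bool]]:
    """Return (instruction, reused_from_cache) for each build step."""
    steps = []
    parent = ""
    for instruction in instructions:
        lid = layer_id(parent, instruction, files)
        steps.append((instruction, lid in cache))
        cache.add(lid)
        parent = lid  # each layer is keyed on the one before it
    return steps


DOCKERFILE = [
    "FROM python:3",
    "ADD . /demo",
    "WORKDIR /demo",
    "RUN pip install -q -r requirements.txt",
    "EXPOSE 5000",
    'CMD ["python", "app.py"]',
]

cache: Set[str] = set()
build(DOCKERFILE, {"app.py": b"version 1"}, cache)            # cold cache: every step runs
rebuild = build(DOCKERFILE, {"app.py": b"version 1"}, cache)  # nothing changed: all cached
edited = build(DOCKERFILE, {"app.py": b"version 2"}, cache)   # app.py changed
for instruction, cached in edited:
    print("Using cache" if cached else "Rebuilt    ", instruction)
```

Because a changed file invalidates its `ADD` step and every step after it, a common pattern is to copy `requirements.txt` and run `pip install` before adding the rest of the source, so code edits do not force a reinstall. Note also that `RUN` steps are cached on their command text, so `pip install` is not re-run just because newer package versions were published.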


In [38]:
# Run the service in the background (-d) and publish container port 5000 on host port 5000 (-p HOST:CONTAINER) so it is reachable at localhost:5000
!docker run -p 5000:5000 -d --name demo demo
!docker ps
!curl localhost:5000/healthcheck

1a4961442428fde1638c81ecf3c49c0e71825a426cd870e7672915fd6edc15ab
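`docker run -d` returns as soon as the container has started, which can be before the app inside it is listening, so a `curl` issued immediately afterwards may fail or print nothing. A small retry loop is more forgiving. This is a sketch using only the Python standard library, and `wait_for_http` is a hypothetical helper name.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset or an HTTP error status (HTTPError is an
            # OSError subclass): the app is probably still starting up.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the demo above you would call `wait_for_http("http://localhost:5000/healthcheck")`, assuming the app answers that path with HTTP 200.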


In [46]:
# Shut down service and remove container and image
!docker rm -f demo
!docker rmi demo

Error: No such container: demo
Untagged: demo:latest
Deleted: sha256:ebeae554916bf218cc4c5731217e4e084bc9c3bdd21dfe30ca971b5d7fc99564


----
## Docker-compose

Docker Compose (`docker-compose`) is a CLI tool that builds and deploys multiple integrated services, defined in a single YAML file, with one command. It is handy for building simple services or creating complex architectures.

### Simple example


In [45]:
# Print the docker-compose file
!cat docker-compose.yml

---
version: '3'
services:
  demo:
    build:
      context: demo
    image: demo:latest
    container_name: demo
    ports:
     - "5000:5000"
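To see how this file maps onto the earlier CLI commands, here is a rough Python sketch that turns the `demo` service into the equivalent `docker run` arguments. It is only an illustration: the service is written as a dict to avoid a YAML dependency, `run_args` is a hypothetical helper, and a real `docker-compose up` also builds the image from `build.context` and attaches the container to a project network.

```python
from typing import Dict, List


def run_args(service: Dict) -> List[str]:
    """Approximate `docker run` arguments for one Compose service (illustration only)."""
    args = ["docker", "run", "-d"]
    if "container_name" in service:
        args += ["--name", service["container_name"]]
    for mapping in service.get("ports", []):
        args += ["-p", mapping]  # same HOST:CONTAINER syntax as the CLI
    args.append(service["image"])
    return args


# Same settings as the `demo` service in docker-compose.yml, written as a dict
demo_service = {
    "build": {"context": "demo"},
    "image": "demo:latest",
    "container_name": "demo",
    "ports": ["5000:5000"],
}

print(" ".join(run_args(demo_service)))
# docker run -d --name demo -p 5000:5000 demo:latest
```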


In [51]:
# Build (if needed) and start the same service as before, in the background
!docker-compose up -d

Creating network "intro_to_docker_default" with the default driver
Building demo
Step 1/6 : FROM python:3
 ---> efdecc2e377a
Step 2/6 : ADD . /demo
 ---> 7a70bf03dfc9
Step 3/6 : WORKDIR /demo
 ---> Running in fd46482e61c8
Removing intermediate container fd46482e61c8
 ---> db5dc44b1dcb
Step 4/6 : RUN pip install -q -r requirements.txt
 ---> Running in 07b327b7190a
Removing intermediate container 07b327b7190a
 ---> 9721f1072ce5
Step 5/6 : EXPOSE 5000
 ---> Running in 4e83cc9429be
Removing intermediate container 4e83cc9429be
 ---> 2b2ce206d470
Step 6/6 : CMD ["python", "app.py"]
 ---> Running in db4012465f7d
Removing intermediate container db4012465f7d
 ---> 63605e706d07
Successfully built 63605e706d07
Successfully tagged demo:latest
Creating demo ... done

In [52]:
# Stop and remove the service's container and the network created by up
!docker-compose down

Stopping demo ... done
Removing demo ...
Removing network intro_to_docker_default


### Complex example

For this example, we will start a Hadoop cluster with Hive, using this repo:

https://github.com/big-data-europe/docker-hive

In [55]:
!git clone https://github.com/big-data-europe/docker-hive.git

Cloning into 'docker-hive'...
remote: Enumerating objects: 127, done.
remote: Total 127 (delta 0), reused 0 (delta 0), pack-reused 127
Receiving objects: 100% (127/127), 30.70 KiB | 10.23 MiB/s, done.
Resolving deltas: 100% (66/66), done.


In [56]:
!docker-compose -f docker-hive/docker-compose.yml up -d

Creating network "docker-hive_default" with the default driver
Creating volume "docker-hive_namenode" with default driver
Creating volume "docker-hive_datanode" with default driver
Pulling namenode (bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8)...
2.0.0-hadoop2.7.4-java8: Pulling from bde2020/hadoop-namenode

f87b52c1: Pulling fs layer
b563ad5e: Pulling fs layer
1ffa5626: Pulling fs layer
0e4b5de7: Pulling fs layer
35d29f88: Pulling fs layer
f205e109: Pulling fs layer
864ab017: Pulling fs layer
9b117fd9: Pulling fs layer
f32bfdd6: Pulling fs layer
04c9f21e: Pulling fs layer
93d9ab2a: Pulling fs layer
93d9ab2a: Waiting
5a1f3ed5: Waiting
0e2fb479: Pulling fs layer
04cc05e4: Pulling fs layer
Digest: sha256:54a9482c51d4e701e530f15ef2e01ca2d3a15545760c42ea6ad0e65e8196c335

In [57]:
# Show running containers
!docker ps

CONTAINER ID        IMAGE                                             COMMAND                  CREATED              STATUS                        PORTS                                          NAMES
b0291d9f7e1f        bde2020/hive:2.3.2-postgresql-metastore           "entrypoint.sh /opt/…"   About a minute ago   Up About a minute             10000/tcp, 0.0.0.0:9083->9083/tcp, 10002/tcp   docker-hive_hive-metastore_1
8037e6d85e1f        bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8   "/entrypoint.sh /run…"   About a minute ago   Up About a minute (healthy)   0.0.0.0:50070->50070/tcp                       docker-hive_namenode_1
9ba4fa882931        bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8   "/entrypoint.sh /run…"   About a minute ago   Up About a minute (healthy)   0.0.0.0:50075->50075/tcp                       docker-hive_datanode_1
4a8b3da18ba2        bde2020/hive:2.3.2-postgresql-metastore           "entrypoint.sh /bin/…"   About a minute ago   Up About a minute             0
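The namenode and datanode images define a health check, which is why their STATUS ends in `(healthy)`. Before loading data it can help to wait until they report healthy. A minimal Python sketch, assuming the container names shown above; `health_of` and `wait_until_healthy` are hypothetical helpers built on `docker ps --format`:

```python
import re
import subprocess
import time
from typing import Iterable, Optional

# docker ps appends "(healthy)", "(unhealthy)" or "(health: starting)" to STATUS
# for containers whose image defines a HEALTHCHECK.
HEALTH_RE = re.compile(r"\((healthy|unhealthy|health: starting)\)")


def health_of(status: str) -> Optional[str]:
    """Return the health state in a docker ps STATUS string, or None if there is none."""
    match = HEALTH_RE.search(status)
    return match.group(1) if match else None


def wait_until_healthy(names: Iterable[str], timeout: float = 300, interval: float = 5) -> bool:
    """Poll docker ps until every named container reports healthy, or time out."""
    names = list(names)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
            capture_output=True, text=True, check=True,
        ).stdout
        # One "name<TAB>status" line per running container
        statuses = dict(line.split("\t", 1) for line in out.splitlines() if line)
        if all(health_of(statuses.get(name, "")) == "healthy" for name in names):
            return True
        time.sleep(interval)
    return False
```

For this cluster: `wait_until_healthy(["docker-hive_namenode_1", "docker-hive_datanode_1"])`.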

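### Testing
Load data into Hive. Run the `docker-compose` commands from inside the cloned `docker-hive` directory (or add `-f docker-hive/docker-compose.yml`), since that is where the compose project is defined: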
```
  $ docker-compose exec hive-server bash
  # /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
  > CREATE TABLE pokes (foo INT, bar STRING);
  > LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
```
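The same two statements can be run without an interactive shell, which is convenient from a notebook. A Python sketch that only builds the commands, assuming the compose file path used earlier (`docker-hive/docker-compose.yml`); `beeline_command` is a hypothetical helper, while `exec -T` and beeline's `-e` are standard options of those tools:

```python
import shlex
from typing import List

# Assumed location of the compose file, matching the `docker-compose -f` call above
COMPOSE_FILE = "docker-hive/docker-compose.yml"


def beeline_command(sql: str, compose_file: str = COMPOSE_FILE) -> List[str]:
    """Build a non-interactive version of the beeline session above.

    `exec -T` skips pseudo-TTY allocation (needed when not in a terminal) and
    beeline's `-e` runs the given statement, then exits.
    """
    return [
        "docker-compose", "-f", compose_file,
        "exec", "-T", "hive-server",
        "/opt/hive/bin/beeline", "-u", "jdbc:hive2://localhost:10000",
        "-e", sql,
    ]


STATEMENTS = [
    "CREATE TABLE pokes (foo INT, bar STRING)",
    "LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes",
]

for statement in STATEMENTS:
    # Print a copy-pasteable shell command; pass the list to
    # subprocess.run(..., check=True) to actually execute it
    print(shlex.join(beeline_command(statement)))
```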

Then query it from Presto. You can download the Presto CLI ([presto.jar](https://prestosql.io/docs/current/installation/cli.html)) as follows:
```
  $ wget https://repo1.maven.org/maven2/io/prestosql/presto-cli/308/presto-cli-308-executable.jar
  $ mv presto-cli-308-executable.jar presto.jar
  $ chmod +x presto.jar
  $ ./presto.jar --server localhost:8080 --catalog hive --schema default
  presto> select * from pokes;
```
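The Presto CLI talks to the coordinator over HTTP, so the same query can also be sent from Python with just the standard library. A sketch, assuming the coordinator is reachable at `localhost:8080` as in the CLI command above and that it speaks the Presto client protocol of this era (POST the SQL to `/v1/statement`, then follow `nextUri` until it disappears, with `X-Presto-*` headers; later Trino releases renamed these to `X-Trino-*`). The user name `demo` and both function names are hypothetical:

```python
import json
import urllib.request
from typing import Any, List


def statement_request(server: str, sql: str, user: str = "demo",
                      catalog: str = "hive", schema: str = "default") -> urllib.request.Request:
    """Initial request of the Presto HTTP client protocol: POST the SQL to /v1/statement."""
    return urllib.request.Request(
        f"http://{server}/v1/statement",
        data=sql.encode("utf-8"),
        method="POST",
        headers={
            "X-Presto-User": user,
            "X-Presto-Catalog": catalog,
            "X-Presto-Schema": schema,
        },
    )


def run_query(server: str, sql: str, user: str = "demo") -> List[List[Any]]:
    """Submit `sql`, then follow each response's nextUri until the query finishes."""
    rows: List[List[Any]] = []
    request = statement_request(server, sql, user=user)
    while True:
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if "error" in payload:
            raise RuntimeError(payload["error"].get("message", "query failed"))
        rows.extend(payload.get("data", []))
        next_uri = payload.get("nextUri")
        if not next_uri:
            return rows
        # Later pages are plain GETs against the URI the coordinator handed back
        request = urllib.request.Request(next_uri, headers={"X-Presto-User": user})
```

With the cluster running, `run_query("localhost:8080", "select * from pokes")` would return the rows as lists.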

If you want to run multiple nodes, you can use the following docker-compose file:

https://github.com/big-data-europe/docker-hadoop/blob/e62d698b37905fddffb9e6fe9a83a32515ff0674/docker-compose-3dn.yml

----
## Next steps

Interactive training made by Docker developers

https://training.play-with-docker.com/

Kubernetes is an orchestration framework for deploying, scaling, and managing containers across a cluster of machines

Kubernetes - https://kubernetes.io/docs/home/

Docker Hub is a public registry where contributors can upload their Docker images for others to use

Docker Hub - https://hub.docker.com/

## Some of my favorite docker containers

``` 
# Runs jupyter notebook on localhost:8888 with preinstalled data science libraries
docker run -p 8888:8888 jupyter/datascience-notebook
```

```
# GPU power, baby! --gpus needs an NVIDIA GPU and the NVIDIA Container Toolkit on the host
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
```

bert-as-service: serves BERT models behind a simple client/server API for generating sentence embeddings (do cool Dan stuff).

https://github.com/hanxiao/bert-as-service