# Docker Lunch and Learn

## Installation

### Installing and Configuring Docker

#### Install on Mac 

For Mac hardware from roughly 2016 or newer, install Docker Desktop for Mac from: https://hub.docker.com/editions/community/docker-ce-desktop-mac


#### Install on AWS

Installing Docker on an AWS instance is straightforward. You run an install script obtained from Docker, then add your user to the `docker` group. Both commands are shown below. The first downloads the install script from https://get.docker.com and immediately pipes it into the shell (`| sh`).

There are also newer projects, [Docker for AWS](https://docs.docker.com/docker-for-aws/) and [Docker for Azure](https://docs.docker.com/docker-for-azure/), that do this configuration for you.



$\square$ **Note:** Executing arbitrary code obtained from an unknown or untrusted source is generally considered a significant security vulnerability. For our purposes, the source (https://get.docker.com) is considered trustworthy, and `curl` fetches the script over SSL. If you are security-minded, you may be more comfortable downloading the script to a local file with `curl`, inspecting it, and then running it. **In practice, the method below is the one I use to install Docker.**
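If you prefer the download-and-inspect route described in the note, you could wrap it in a small function. The sketch below makes a few assumptions. The name `review_and_install_docker` is hypothetical. It adds `-f` to the `curl` flags used in the listing, so that an HTTP error fails instead of being saved to the file. It also uses `less` as the pager when `$PAGER` is unset.

```shell
# Hypothetical helper: download Docker's convenience script to a local
# file, page through it, and run it only after explicit confirmation.
review_and_install_docker() {
  local script="${1:-get-docker.sh}" answer
  # -f: fail on HTTP errors; -sS: quiet, but still report errors; -L: follow redirects
  curl -fsSL https://get.docker.com -o "$script" || return 1
  # Read the script before deciding whether to run it.
  "${PAGER:-less}" "$script"
  printf 'Run %s now? [y/N] ' "$script"
  read -r answer || answer=""
  if [ "$answer" = "y" ]; then
    sh "$script"
  else
    echo "Not running $script."
  fi
}
```

Answering anything other than `y` leaves the downloaded file in place, so you can run it later with `sh get-docker.sh`.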

#### Important Note re: Installation on Linux

$\square$ **Note:** Do not install Docker directly with `apt` or `yum` from your distribution's default repositories. Follow Docker's instructions for your system: https://docs.docker.com/install/

<img src="https://www.evernote.com/l/AAFDeHFBjO9P5qVykXyJZ22ycdQJArLe1fUB/image.png" width=600px>

#### Install Docker via a Shell Script

<include type="listing" label="install-docker">
    
```
$ curl -sSL https://get.docker.com | sh
# Executing docker install script, commit: 1d31602
+ sudo -E sh -c apt-get update -qq >/dev/null
...

Client:
 Version:   18.02.0-ce
 API version:   1.36
 Go version:    go1.9.3
 Git commit:    fc4de44
 Built: Wed Feb  7 21:16:33 2018
 OS/Arch:   linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:  18.02.0-ce
  API version:  1.36 (minimum version 1.12)
  Go version:   go1.9.3
  Git commit:   fc4de44
  Built:    Wed Feb  7 21:15:05 2018
  OS/Arch:  linux/amd64
  Experimental: false

...

```

</include>

#### Add the Ubuntu User to the Docker Group

When the script completes, there is one last step: add the `ubuntu` user to the `docker` group. By default, the command-line Docker client requires `sudo` to issue commands to the Docker daemon. Once the `ubuntu` user is in the `docker` group, it can issue Docker commands without `sudo`.

<include type="listing" label="add-to-docker-group">
    
```
$ sudo usermod -aG docker ubuntu
```

</include>

#### Disconnect and Reconnect to Update Group

Finally, to make the group change take effect, disconnect from your remote system and reconnect. To do this, type `exit` or press `ctrl-d`, then reconnect to your EC2 instance via ssh.
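To confirm that reconnecting worked, you can check whether the current session has the `docker` group. Run without a user name, `id -nG` reports the groups of the running session. That is why `docker` only appears in its output after you reconnect. The helper name below is hypothetical.

```shell
# Hypothetical helper: succeed if the *current session* has the docker group.
# `id -nG` without a user name reports the running process's groups.
session_has_docker_group() {
  id -nG | tr ' ' '\n' | grep -qx docker
}

if session_has_docker_group; then
  echo "docker group active: try 'docker run hello-world' without sudo"
else
  echo "docker group not active yet: disconnect and reconnect"
fi
```

As an alternative to reconnecting, `newgrp docker` starts a new shell that already has the group applied.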

### Test Docker Installation

Minimally, using Docker to run your code consists of the following:

1. Pull a prebuilt **image**, or build a new one from a `Dockerfile`.
2. Run the image as a new **container**.

If you have just installed Docker for the first time, you might run a few minimal commands to verify that the Docker client is correctly installed and available on your path. Here, we demonstrate three that work well as a minimal test: `docker version`, `docker help`, and `which docker`.


```
$ docker version
Client: Docker Engine - Community
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        6247962
 Built:             Sun Feb 10 04:12:39 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 04:13:06 2019
  OS/Arch:          linux/amd64
  Experimental:     false
```

```
$ docker help

Usage:	docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
      --config string      Location of client config files (default
                           "/Users/joshuacook/.docker")
  -D, --debug              Enable debug mode
  -H, --host list          Daemon socket(s) to connect to
  -l, --log-level string   Set the logging level
                           ("debug"|"info"|"warn"|"error"|"fatal")
                           (default "info")
      --tls                Use TLS; implied by --tlsverify
      --tlscacert string   Trust certs signed only by this CA (default
                           "/Users/joshuacook/.docker/ca.pem")
      --tlscert string     Path to TLS certificate file (default
                           "/Users/joshuacook/.docker/cert.pem")
      --tlskey string      Path to TLS key file (default
                           "/Users/joshuacook/.docker/key.pem")
      --tlsverify          Use TLS and verify the remote
  -v, --version            
```

```
$ which docker
/usr/local/bin/docker
```
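`which docker` only shows that the client is on your `PATH`. `docker version` also contacts the daemon, and the `Server` section of its output appears only if that succeeds. A small check combining the two might look like the sketch below. The function name and messages are hypothetical, and the sketch assumes that `docker version` exits non-zero when the daemon is unreachable.

```shell
# Hypothetical smoke test for a fresh Docker install.
check_docker() {
  # 1. Is the client on PATH? (the question `which docker` answers)
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker client not found on PATH" >&2
    return 1
  fi
  # 2. Can the client reach the daemon? `docker version` needs the
  #    server in order to report its Server section.
  if ! docker version >/dev/null 2>&1; then
    echo "docker client found, but the daemon is not reachable" >&2
    return 2
  fi
  echo "docker client and daemon OK"
}
```

The distinct return codes let a setup script tell "not installed" (1) apart from "installed but the daemon is not running" (2).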


## Hello, Docker

Having verified that the Docker client is properly installed, you can move on to the canonical “Hello, World!”.

```
$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
```



### What just happened?

When you execute this command, the Docker client sends the `run hello-world` command to the Docker engine. The Docker engine then does the following:

1. Checks for the hello-world image in your local cache of images.
2. If the image does not exist locally, downloads the image from Docker Hub.
3. Creates a new container using the image.
4. Allocates a filesystem and adds a read-write layer to the top of the image.
5. Sets up an IP address for the container.
6. Executes `/hello`, the default command specified in the image's Dockerfile.
7. When `/hello` exits, the container stops. It is not removed, and it still appears in `docker ps -a`.
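The single `docker run` command roughly corresponds to separate client commands, one per group of steps above. The sketch below uses a hypothetical helper name, `run_in_steps`. It adds a final `docker rm`, which `docker run` does not do by default.

```shell
# Hypothetical helper: the separate client commands that `docker run IMAGE`
# roughly corresponds to.
run_in_steps() {
  local image="${1:-hello-world}" cid status
  # Steps 1-2: fetch the image. Unlike `docker run`, `docker pull` always
  # asks the registry for a newer version, even if the image is cached.
  docker pull "$image" || return 1
  # Steps 3-4: create the container and its read-write layer; prints the ID.
  cid=$(docker create "$image") || return 1
  # Step 6: start it, attached so its output reaches this terminal.
  docker start -a "$cid"
  status=$?
  # Not part of `docker run`: remove the stopped container afterwards.
  docker rm "$cid" >/dev/null
  return "$status"
}
```

Usage: `run_in_steps hello-world` should print the same greeting as `docker run hello-world`.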

![](img/image001.png)

## Docker Run Ubuntu

```
$ docker run ubuntu ls -la
total 72
drwxr-xr-x   1 root root 4096 Feb 27 19:06 .
drwxr-xr-x   1 root root 4096 Feb 27 19:06 ..
-rwxr-xr-x   1 root root    0 Feb 27 19:06 .dockerenv
drwxr-xr-x   2 root root 4096 Jan 22 17:50 bin
drwxr-xr-x   2 root root 4096 Apr 24  2018 boot
drwxr-xr-x   5 root root  340 Feb 27 19:06 dev
drwxr-xr-x   1 root root 4096 Feb 27 19:06 etc
drwxr-xr-x   2 root root 4096 Apr 24  2018 home
drwxr-xr-x   8 root root 4096 May 23  2017 lib
drwxr-xr-x   2 root root 4096 Jan 22 17:49 lib64
drwxr-xr-x   2 root root 4096 Jan 22 17:48 media
drwxr-xr-x   2 root root 4096 Jan 22 17:48 mnt
drwxr-xr-x   2 root root 4096 Jan 22 17:48 opt
dr-xr-xr-x 184 root root    0 Feb 27 19:06 proc
drwx------   2 root root 4096 Jan 22 17:50 root
drwxr-xr-x   1 root root 4096 Jan 22 22:41 run
drwxr-xr-x   1 root root 4096 Jan 22 22:41 sbin
drwxr-xr-x   2 root root 4096 Jan 22 17:48 srv
dr-xr-xr-x  13 root root    0 Feb 27 17:15 sys
drwxrwxrwt   2 root root 4096 Jan 22 17:51 tmp
drwxr-xr-x   1 root root 4096 Jan 22 1
```

### What just happened?


Here, we run the latest Ubuntu image (`run ubuntu`). When you execute this command, the Docker client sends the command to the Docker engine.
The Docker engine does the following:

1. Checks for the ubuntu image in your local cache of images.
1. Downloads the image from Docker Hub, unless the image exists locally.
1. Creates a new container using the image.
1. Allocates a filesystem and adds a read-write layer to the top of the image.
1. Sets up an IP address for the container.
1. Executes `ls -la` within the container.

$\square$ **Note:** the image's default command, `/bin/bash`, is set by the `CMD` instruction in the [`Dockerfile`](https://github.com/tianon/docker-brew-ubuntu-core/blob/1cc295b1507b68a66942b2ff5c2dbf395850208a/xenial/Dockerfile#L45) for the Ubuntu image. A command given after the image name, such as `ls -la`, replaces that default rather than being passed to `bash`.
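You can check what an image runs by default with `docker image inspect`. The sketch below assumes the image has already been pulled and uses a hypothetical helper name. For the Ubuntu image, it should report a `Cmd` of `["/bin/bash"]` and no entrypoint.

```shell
# Hypothetical helper: print an image's default command (CMD) and entrypoint.
show_default_command() {
  docker image inspect \
    --format 'Cmd={{json .Config.Cmd}} Entrypoint={{json .Config.Entrypoint}}' \
    "${1:-ubuntu}"
}

# Usage:
#   show_default_command ubuntu     # inspect the image's defaults
#   docker run --rm ubuntu ls -la   # `ls -la` replaces CMD and runs as PID 1
#   docker run --rm -it ubuntu      # no command given: CMD (/bin/bash) runs
```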

## PIDs

```
$ ps
  PID TTY           TIME CMD
32332 ttys000    0:00.06 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp joshuacook
32334 ttys000    0:00.62 -bash
36159 ttys000    0:25.39 /Users/joshuacook/repos/miniconda/bin/python /Users/joshuacook/repos/miniconda/bin/jupyter-notebook
36724 ttys001    0:00.05 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp joshuacook
36727 ttys001    0:01.13 -bash
39517 ttys003    0:00.05 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp joshuacook
39519 ttys003    0:00.45 -bash
47413 ttys003    0:03.46 /Users/joshuacook/repos/miniconda/bin/python /Users/joshuacook/repos/miniconda/bin/jupyter-notebook
47640 ttys004    0:00.04 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp joshuacook
47642 ttys004    0:00.31 -bash
```


```
$ ps aux | wc -l
     397
```


### Docker PIDs

```
$ docker run ubuntu ps
  PID TTY          TIME CMD
    1 ?        00:00:00 ps
```


```
$ docker run ubuntu ps aux | wc -l
       2
```


```
$ docker run ubuntu ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  25940  1580 ?        Rs   19:06   0:00 ps aux
```


### Wait, what?

It is useful to note the state of things while your Ubuntu container was running. It is not unusual for `ls` to show a complete, standard Linux filesystem. It is not unusual for `ps` to return just a few items. It is **highly unusual** for `ps aux` to return one item.

`ps aux` shows (a) processes for all users, (u) showing the owner of the process, and (x) including processes that are not attached to any terminal. In other words, in running `ps aux`, you have effectively shown all of the processes currently running on the system. 

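Again, it is highly unusual that the only process running on the system is the command we just passed. The container has its own PID namespace, so that command runs as PID 1 and cannot see any other process on the host. **THIS IS DOCKER**.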
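Docker's `--pid=host` option makes a container share the host's PID namespace instead of getting its own, so the difference is easy to see side by side. The sketch below uses a hypothetical helper, `compare_pid_namespaces`. On Docker for Mac, the "host" is the Linux VM that runs the Docker daemon, not macOS itself.

```shell
# Hypothetical helper: count the processes `ps aux` sees in a container,
# first in its own PID namespace, then in the host's (--pid=host).
compare_pid_namespaces() {
  local image="${1:-ubuntu}" isolated shared
  # Subtract 1 for the header line that `ps aux` prints.
  isolated=$(( $(docker run --rm "$image" ps aux | wc -l) - 1 ))
  shared=$(( $(docker run --rm --pid=host "$image" ps aux | wc -l) - 1 ))
  echo "own namespace: $isolated process(es); host namespace: $shared process(es)"
}
```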


### Deep Dive: Docker PIDs

- https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
- https://github.com/krallin/tini
- https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html

## [Containers are not virtual machines](https://blog.docker.com/2016/03/containers-are-not-vms/). Containers are virtual processes.

### Docker Conceptual Paradigms

A `Dockerfile` defines a Docker image. At run time, docker uses an image to launch a container.



```
$ docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
databrickseducation/build-tool   latest              f54ba466599b        5 days ago          1.08GB
databrickseducation/build-tool   <none>              cd274fa6c2ca        2 weeks ago         1.08GB
dnd                              latest              0a4de7d1344a        3 weeks ago         5.64GB
dva                              latest              0a4de7d1344a        3 weeks ago         5.64GB
ubuntu                           latest              20bb25d32758        5 weeks ago         87.5MB
jupyter/scipy-notebook           latest              c6eed931aa71        7 weeks ago         4.9GB
hello-world                      latest              fce289e99eb9        8 weeks ago         1.84kB
qqb_this_jupyter                 latest              30c9ffd8009a        2 months ago        5.46GB
qqb_this_scheduler               latest              ddcc3d6a4325        2 months ago        963MB
qqb_
```

```
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
```


```
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                         PORTS               NAMES
170ed87eed97        ubuntu              "ps aux"                 4 minutes ago       Exited (0) 4 minutes ago                           relaxed_antonelli
8cc57fee8ed8        ubuntu              "ps aux"                 4 minutes ago       Exited (0) 4 minutes ago                           cranky_chandrasekhar
7461ea8a0256        ubuntu              "ps aux"                 4 minutes ago       Exited (0) 4 minutes ago                           pensive_shockley
8635788043a2        ubuntu              "ps"                     4 minutes ago       Exited (0) 4 minutes ago                           affectionate_tereshkova
e141d504c445        ubuntu              "ls -la"                 4 minutes ago       Exited (0) 4 minutes ago                           romantic_cartwright
3b469ff50f57        hello-world         "/hello"                 4 minutes ago
```
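Every `docker run` above left a stopped container behind, which is why `docker ps -a` lists them. `docker run --rm` avoids this, and `docker container prune` removes all stopped containers. The sketch below (hypothetical name) removes only the exited ones, using `docker ps -aq --filter status=exited` together with `docker rm`.

```shell
# Hypothetical helper: remove every container whose status is "exited".
remove_exited_containers() {
  local ids
  # -a: include stopped containers; -q: print container IDs only
  ids=$(docker ps -aq --filter status=exited)
  if [ -z "$ids" ]; then
    echo "no exited containers"
    return 0
  fi
  # $ids is unquoted on purpose: each ID becomes a separate argument.
  docker rm $ids
}
```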

### Weak Analogy - Object-Oriented Programming

- Images ~ Classes
- Containers ~ Objects

### Strong Analogy - Binaries

- Images ~ Compiled Binaries
- Containers ~ Running Process of a Binary

That's why we use `docker ps`!

## Defining an Image

A `Dockerfile` is used to define a Docker image.

### The `build-tool` image

```
FROM python:2
RUN touch /etc/in-docker
RUN pip install git+https://github.com/databricks-edu/build-tooling
RUN apt-get update && apt-get install -y less zip unzip vim nano
RUN curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
RUN unzip awscli-bundle.zip
RUN ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
```

It is built on the official [Python](https://hub.docker.com/_/python) image (`python:2`).

### `docker build`

Before this image can be used, it must be built. Brian and I have set up an automated build process to do this for us.

Every time a change is made to the GitHub repo `databricks-edu/build-tooling`, an automated build is triggered on Docker Hub: https://hub.docker.com/r/databrickseducation/build-tool

The same build could be done manually with `docker build`, e.g.:

```
# BUILD_CONTEXT_DIR is the directory that contains the Dockerfile
docker build -t SOME_IMAGE_NAME BUILD_CONTEXT_DIR
```

Because Docker Hub rebuilds the image automatically, all that is necessary to get the latest updates to the build tooling is a `docker pull`.

Brian added this to the script `docker/install.sh`.
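If you want to test a change to the image before the automated build picks it up, a local build can reuse the image name the aliases expect under a different tag. The sketch below uses a hypothetical helper name and a default tag of `local`. Pass it the directory that contains the `Dockerfile`; its location in the repo is not shown here.

```shell
# Hypothetical helper: build the build-tool image locally under a custom tag.
#   $1: build context (the directory containing the Dockerfile)
#   $2: tag to use (default: local)
build_tool_image_locally() {
  if [ -z "$1" ]; then
    echo "usage: build_tool_image_locally BUILD_CONTEXT_DIR [TAG]" >&2
    return 1
  fi
  docker build -t "databrickseducation/build-tool:${2:-local}" "$1"
}
```

The aliases take the tag from `$BUILD_TOOL_DOCKER_TAG`. Assuming nothing in the alias file overrides it, setting `BUILD_TOOL_DOCKER_TAG=local` before the file is sourced should point them at the local build. This is untested here.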

```
$ cd ~/repos/build-tooling/
/Users/joshuacook/repos/build-tooling
```


```
$ git pull
Already up to date.
```


```
$ ./docker/install.sh
Pulling databrickseducation/build-tool:latest ...

latest: Pulling from databrickseducation/build-tool
Digest: sha256:77a8dbb69b52f32090bce773823adbf05d85a8275445397f5f274c785364e502
Status: Image is up to date for databrickseducation/build-tool:latest

Updating aliases ...

Done!

If you haven't already done so, add the following to your .bashrc or .zshrc:

. ~/.build-tools-aliases.sh
```


## Using the Build Tools

### The Aliases

```
$ head -n 14 ~/.build-tools-aliases.sh | tail -n 4
alias bdc="docker run -it --rm -w `pwd` -e DB_SHARD_HOME=$DB_SHARD_HOME -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG bdc"
alias databricks="docker run -it --rm -w `pwd` -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG databricks"
alias gendbc="docker run -it --rm -w `pwd` -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG gendbc"
alias master_parse="docker run -it --rm -w `pwd` -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG master_parse"
```


### What will happen?

When you run 

```
bdc some_course.yaml
``` 

via the alias, you are actually running

```
docker run -it --rm -w `pwd` -e DB_SHARD_HOME=$DB_SHARD_HOME -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG bdc some_course.yaml
```

The Docker client sends the command to the Docker engine. The Docker engine does the following:

1. Locates the `databrickseducation/build-tool` image in your local cache of images.
1. Creates a new container using the image.
1. Allocates a filesystem and adds a read-write layer to the top of the image.
1. Executes the process `bdc some_course.yaml` within the container.

The following flags are used:

- `-it` keeps STDIN open (`-i`) and allocates a pseudo-terminal (`-t`), so the process is interactive
- `--rm` removes the container when the process exits
- `-e` sets each of the listed environment variables inside the container
- `-v` bind-mounts `$HOME` on the host at the same path in the container, where `-e HOME=$HOME` also makes it the home directory
- `-w` sets the container's working directory (the directory from which Docker runs the command) to the host path returned by `pwd`. This works as long as that path lies under `$HOME`, which is mounted at the same path.
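One subtlety: each alias body is double-quoted, so the shell expands `` `pwd` `` and the `$...` variables once, when `~/.build-tools-aliases.sh` is sourced, rather than each time the alias runs. A shell function, which is how the file already defines `course`, defers that expansion to call time. Below is a sketch of `bdc` in that form. The name `bdc_docker` is hypothetical and was chosen so it does not collide with the existing alias.

```shell
# Hypothetical function form of the `bdc` alias. `$(pwd)` and the variables
# are evaluated on every call, so -w always matches your current directory.
bdc_docker() {
  docker run -it --rm \
    -w "$(pwd)" \
    -e DB_SHARD_HOME="$DB_SHARD_HOME" \
    -e HOME="$HOME" \
    -v "$HOME:$HOME" \
    "databrickseducation/build-tool:${BUILD_TOOL_DOCKER_TAG:-latest}" \
    bdc "$@"
}
```

Usage: `bdc_docker some_course.yaml`, run from anywhere under `$HOME`.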

### The `course` tool

The `course` tool reads several additional environment variables. To pass them in, the alias file defines two helper functions:

- `create_course_envfile` - writes the necessary environment variables to a file
- `course` - a wrapper function that creates a temporary env file and passes it to `docker run` via `--env-file`

```
$ tail -n 23 ~/.build-tools-aliases.sh
function create_course_envfile {
  : ${1?'Missing file name'}
  egrep=`echo $COURSE_ENV_VARS | sed 's/ /|/g'`
  env | egrep "$egrep" >$1
}

# The course tool can look at a lot of environment variables. To ease passing
# the entire environment into the tool, this "alias" is defined as a function.

function course {

  : ${BUILD_TOOL_DOCKER_TAG:=latest}

  TMP_ENV=/tmp/course-env.$$

  create_course_envfile $TMP_ENV

  docker run -it --rm -w `pwd` --env-file $TMP_ENV -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG course "$@"

  rm -f $TMP_ENV
}
```
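`create_course_envfile` turns the names in `$COURSE_ENV_VARS` into an unanchored `egrep` pattern. As a result, it also keeps any variable whose name or value merely contains one of those names. The stricter sketch below (hypothetical name `write_course_envfile`) looks up each name exactly and writes `NAME=value` lines, the format that `docker run --env-file` reads.

```shell
# Hypothetical stricter variant of create_course_envfile.
# Writes NAME=value for each exported variable named in $COURSE_ENV_VARS.
write_course_envfile() {
  local out="$1" name value
  : > "$out"   # start from an empty file
  for name in $COURSE_ENV_VARS; do
    # printenv fails for names that are not exported, so those are skipped.
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value" >> "$out"
    fi
  done
}
```

In a `course`-style wrapper, `envfile=$(mktemp)` could replace the fixed `/tmp/course-env.$$` name, followed by `--env-file "$envfile"` and then `rm -f "$envfile"`.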




## What do I need to do?

1. Install Docker.
2. Clone the `build-tooling` repo.
3. `cd build-tooling`
4. (If necessary) update `build-tooling`
   ```
   git pull
   ```
5. Run the install script.

   ```
   ./docker/install.sh
   ```
6. Use the build tools as you normally would 

   ```
   course arg1 arg2 ...
   bdc ... course.yaml
   ...
   ```
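Steps 1 to 5 can be combined into a single helper. This is a sketch under several assumptions:

- The function name is hypothetical.
- The clone URL is the `databricks-edu/build-tooling` GitHub repo named earlier.
- The default path matches the `~/repos/build-tooling` checkout used in this walkthrough.

Using the tools (step 6), and adding `. ~/.build-tools-aliases.sh` to your shell startup file, are left to you.

```shell
# Hypothetical one-shot setup for the Dockerized build tools.
setup_build_tools() {
  local repo_dir="${1:-$HOME/repos/build-tooling}"
  # Step 1: Docker must already be installed.
  if ! command -v docker >/dev/null 2>&1; then
    echo "Install Docker first." >&2
    return 1
  fi
  # Steps 2-4: clone the repo, or update an existing checkout.
  if [ -d "$repo_dir/.git" ]; then
    git -C "$repo_dir" pull || return 1
  else
    git clone https://github.com/databricks-edu/build-tooling "$repo_dir" || return 1
  fi
  # Step 5: run the install script from the repo root (in a subshell,
  # so the caller's working directory is unchanged).
  (cd "$repo_dir" && ./docker/install.sh)
}
```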
 