# Docker Lunch and Learn

## Installation

### Installing and Configuring Docker

#### Install on Mac 

For all contemporary Mac hardware (roughly 2016 or newer), follow this link: https://hub.docker.com/editions/community/docker-ce-desktop-mac


#### Install on AWS

Installing Docker on an AWS EC2 instance takes two steps: run the install script that Docker provides, then add your user to the `docker` group. Below, we run these two commands. The first downloads the install script from https://get.docker.com and immediately pipes it into the shell (`| sh`).

Two newer projects, [Docker for AWS](https://docs.docker.com/docker-for-aws/) and [Docker for Azure](https://docs.docker.com/docker-for-azure/), do this configuration for you.



$\square$ **Note:** Executing arbitrary code obtained from an unknown or untrusted source is generally considered a significant security vulnerability. For our purposes, the source (https://get.docker.com) is considered trustworthy, and `curl` fetches the script over HTTPS. It may make the security-minded more comfortable to `curl` the script to a local file, inspect it, and then run it. **In practice, the method below is the method I use to install Docker.**

#### Important Note re: Installation on Linux

$\square$ **Note:** Do not install via `apt` or `yum` directly. Follow the Docker instructions for your system. https://docs.docker.com/install/

<img src="https://www.evernote.com/l/AAFDeHFBjO9P5qVykXyJZ22ycdQJArLe1fUB/image.png" width=600px>

#### Install Docker via a Shell Script

<include type="listing" label="install-docker">
    
```
$ curl -sSL https://get.docker.com | sh
# Executing docker install script, commit: 1d31602
+ sudo -E sh -c apt-get update -qq >/dev/null
...

Client:
 Version:   18.02.0-ce
 API version:   1.36
 Go version:    go1.9.3
 Git commit:    fc4de44
 Built: Wed Feb  7 21:16:33 2018
 OS/Arch:   linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:  18.02.0-ce
  API version:  1.36 (minimum version 1.12)
  Go version:   go1.9.3
  Git commit:   fc4de44
  Built:    Wed Feb  7 21:15:05 2018
  OS/Arch:  linux/amd64
  Experimental: false

...

```

</include>
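For readers who would rather inspect the script before running it, as the note above suggests, the following is a minimal sketch. The helper name `review_then_install_docker` and the file name `get-docker.sh` are arbitrary choices for illustration and are not part of Docker's tooling.

```shell
# Sketch only: fetch the install script to a file, review it, then run it.
# The helper name and the default file name are illustrative assumptions.
review_then_install_docker() {
    script="${1:-get-docker.sh}"
    # -f: fail on HTTP errors, -sS: quiet but still show errors, -L: follow redirects
    curl -fsSL https://get.docker.com -o "$script" || return 1
    # Page through the script; quit the pager when you are done reading.
    ${PAGER:-less} "$script"
    printf 'Run %s now? [y/N] ' "$script"
    read -r answer
    if [ "$answer" = "y" ]; then
        sh "$script"
    fi
}

# Usage (nothing is downloaded until you call the function):
#   review_then_install_docker
```

Defining a function rather than listing bare commands keeps the download, the review, and the run as one deliberate step.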

#### Add the Ubuntu User to the Docker Group

When the script completes, there is one last thing to do. By default, the command-line Docker client requires `sudo` to issue commands to the Docker daemon. Adding the `ubuntu` user to the `docker` group allows that user to issue Docker commands without `sudo`.

<include type="listing" label="add-to-docker-group">
    
```
$ sudo usermod -aG docker ubuntu
```

</include>

#### Disconnect and Reconnect to Update Group

Finally, for the group change to take effect, disconnect from and reconnect to your remote system. You can do this by typing `exit` or pressing `Ctrl-D`, and then reconnecting via ssh to your EC2 instance.
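After reconnecting, you may want to confirm that the new group membership is active in your session. The sketch below uses the standard `id` command; the helper name `in_group` is made up for this example.

```shell
# Return success if the current session's user belongs to the named group.
# `id -nG` lists the names of all groups for the current process.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_group docker; then
    echo "docker group is active in this session"
else
    echo "docker group is not active yet; log out and back in"
fi
```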

### Test Docker Installation

Minimally, using Docker to run your code consists of the following:

1. Pull a prebuilt **image**, or build a new one from a `Dockerfile`.
2. Run the image as a new **container**.

If you have just installed Docker for the first time, you might try some minimal commands to verify that the Docker client is correctly installed and available on your path. Any of `docker version`, `docker help`, or `which docker` works well as a minimal test.


In [None]:
%%bash

docker version

In [None]:
%%bash
docker help

In [None]:
%%bash 
which docker
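The checks above can also be folded into one small script that distinguishes a missing client from an unreachable daemon. This is a sketch: the function name `check_docker` and the three status strings are invented for illustration. It relies on `docker version` exiting with a non-zero status when it cannot contact the daemon.

```shell
# Report one of: client-missing, daemon-unreachable, ok.
check_docker() {
    # Is the docker client on the PATH at all?
    if ! command -v docker >/dev/null 2>&1; then
        echo "client-missing"
        return 1
    fi
    # The client exists; can it talk to the daemon?
    if ! docker version >/dev/null 2>&1; then
        echo "daemon-unreachable"
        return 2
    fi
    echo "ok"
}

check_docker || true
```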

## Hello, Docker

Having verified that the Docker client is properly installed, you can move on to the canonical “Hello, World!”.

In [None]:
%%bash

docker run hello-world

### What just happened?

When you execute this command, the Docker client sends the `run hello-world` command to the Docker engine. The Docker engine then does the following:

1. Checks for the hello-world image in your local cache of images.
2. If the image does not exist locally, downloads the image from Docker Hub.
3. Creates a new container using the image.
4. Allocates a filesystem and adds a read-write layer to the top of the image.
5. Sets up a network interface and assigns the container an IP address.
6. Executes the `/hello` binary, as specified in the image’s Dockerfile.
7. When that process completes, stops the container.

![](img/image001.png)
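To make the steps above more tangible, here is a rough sketch that performs them one at a time with `docker pull`, `docker create`, `docker start`, and `docker rm` instead of a single `docker run`. The helper name `run_in_steps` and the `DOCKER` variable are assumptions made for this example; setting `DOCKER=echo` gives a dry run that prints the commands rather than executing them.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

run_in_steps() {
    image="$1"
    # Steps 1-2: make sure the image is in the local cache.
    $DOCKER pull "$image" || return 1
    # Steps 3-4 (roughly): create the container and its writable layer.
    # `docker create` prints the new container's ID.
    cid=$($DOCKER create "$image") || return 1
    # Steps 5-6 (roughly): start the container and attach to its output.
    $DOCKER start -a "$cid"
    # Remove the stopped container afterwards.
    $DOCKER rm "$cid" >/dev/null
}

# Usage:
#   run_in_steps hello-world
```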

## Docker Run Ubuntu

In [None]:
%%bash

docker run ubuntu ls -la

### What just happened?


Here, we run the latest Ubuntu image (`run ubuntu`). When you execute this command, the Docker client sends the command to the Docker engine.
The Docker engine does the following:

1. Checks for the ubuntu image in your local cache of images.
1. Downloads the image from Docker Hub, unless the image exists locally.
1. Creates a new container using the image.
1. Allocates a filesystem and adds a read-write layer to the top of the image.
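1. Sets up a network interface and assigns the container an IP address.
1. Executes the process `ls -la` within the container.

$\square$ **Note:** the [`Dockerfile`](https://github.com/tianon/docker-brew-ubuntu-core/blob/1cc295b1507b68a66942b2ff5c2dbf395850208a/xenial/Dockerfile#L45) for the Ubuntu image defines `/bin/bash` as the default command (`CMD`). Arguments given after the image name replace that default, so `docker run ubuntu ls -la` runs `ls -la` directly rather than through `bash`.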
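If you are curious which default command an image declares, `docker image inspect` can print it from the image's configuration. The sketch below assumes the image has already been pulled; the helper name `show_default_cmd` and the `DOCKER` dry-run variable are illustrative.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Print the image's default command (the CMD entry) as JSON.
show_default_cmd() {
    $DOCKER image inspect --format '{{json .Config.Cmd}}' "$1"
}

# Usage:
#   show_default_cmd ubuntu
```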

## PIDs

In [None]:
%%bash

ps

In [None]:
%%bash

ps aux | wc -l

### Docker PIDs

In [None]:
%%bash 

docker run ubuntu ps

In [None]:
%%bash 

docker run ubuntu ps aux | wc -l

In [None]:
%%bash 

docker run ubuntu ps aux

### Wait, what?

It is useful to take note of the state of the system while your Ubuntu container was running. It is not unusual that `ls` would show a complete standard Linux filesystem. It is not unusual that `ps` would return just a few items. It is **highly unusual** that `ps aux` would return one item.

`ps aux` shows (a) processes for all users, (u) showing the owner of the process, and (x) including processes that are not attached to any terminal. In other words, in running `ps aux`, you have effectively shown all of the processes currently running on the system. 

Again, it is highly unusual that the only process running on the system is the command we just passed. **THIS IS DOCKER**.
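One way to probe this further is to ask a process inside the container for its own PID, and then to repeat the process count while sharing the host's PID namespace via the `--pid=host` flag. This is a sketch; the helper names and the `DOCKER` dry-run variable are assumptions for illustration. The first helper should print `1`, since the command is the first process in the container's own PID namespace.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Print the PID that a shell sees for itself inside a fresh container.
pid_inside_container() {
    # Single quotes keep $$ from being expanded by the host shell.
    $DOCKER run --rm ubuntu sh -c 'echo $$'
}

# Count processes while sharing the host's PID namespace, for contrast.
count_with_host_pids() {
    $DOCKER run --rm --pid=host ubuntu ps aux | wc -l
}

# Usage:
#   pid_inside_container
#   count_with_host_pids
```

On Docker Desktop for Mac, the "host" in the second helper is the Linux virtual machine that runs the Docker engine, not macOS itself.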


### Deep Dive: Docker PIDs

- https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
- https://github.com/krallin/tini
- https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html

## [Containers are not virtual machines](https://blog.docker.com/2016/03/containers-are-not-vms/). Containers are virtual processes.

### Docker Conceptual Paradigms

A `Dockerfile` defines a Docker image. At run time, Docker uses an image to launch a container.



In [None]:
%%bash

docker images

In [None]:
%%bash

docker ps 

In [None]:
%%bash

docker ps -a


### Weak Analogy - Object-Oriented Programming

- Images ~ Classes
- Containers ~ Objects

### Strong Analogy - Binaries

- Images ~ Compiled Binaries
- Containers ~ Running Processes of a Binary

That's why we use `docker ps`!
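To see the relationship concretely, you can start two containers from the same image and list them side by side. This is a sketch: the container names `demo-a` and `demo-b`, the helper name, and the `DOCKER` dry-run variable are made up for the example.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

containers_from_one_image() {
    # Two separate containers, both launched from the ubuntu image.
    $DOCKER run --name demo-a ubuntu true
    $DOCKER run --name demo-b ubuntu true
    # List all containers (running or exited) created from that image.
    $DOCKER ps -a --filter ancestor=ubuntu --format '{{.Names}} {{.Image}} {{.Status}}'
    # Clean up the two demo containers.
    $DOCKER rm demo-a demo-b >/dev/null
}

# Usage:
#   containers_from_one_image
```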

## Defining an Image

A `Dockerfile` is used to define a Docker image.

### The `build-tool` image

```
FROM python:2
RUN touch /etc/in-docker
RUN pip install git+https://github.com/databricks-edu/build-tooling
RUN apt-get update && apt-get install -y less zip unzip vim nano
RUN curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
RUN unzip awscli-bundle.zip
RUN ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
```

The image is built on top of the official [Python](https://hub.docker.com/_/python) image (`FROM python:2`).

### `docker build`

To use this image, it must be built. Brian and I have set up an automated build process to do this for us. 

Every time a change is made to the GitHub repo `databricks-edu/build-tooling`, an automated build is triggered on Docker Hub: https://hub.docker.com/r/databrickseducation/build-tool

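The build could also be done manually via `docker build`, where the last argument is the directory that contains the `Dockerfile` (the build context), e.g.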

```
docker build -t SOME_IMAGE_NAME SOME_DOCKERFILE_LOCATION
```

Because the image is built automatically on Docker Hub, all that is necessary to get the latest updates of the build tooling is a `docker pull`.

Brian added this `docker pull` step to the script `docker/install.sh`.
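As a sketch of the two routes, the helpers below either build the image locally or pull the published one. The local tag `build-tool:local`, the helper names, and the `DOCKER` dry-run variable are assumptions for illustration; the contents of `docker/install.sh` are not reproduced here.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Build locally from a directory that contains a Dockerfile (the build context).
build_locally() {
    context_dir="${1:-.}"
    $DOCKER build -t build-tool:local "$context_dir"
}

# Or fetch the image that the automated build published on Docker Hub.
pull_published() {
    $DOCKER pull databrickseducation/build-tool
}

# Usage:
#   build_locally path/to/directory/with/Dockerfile
#   pull_published
```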

In [None]:
cd ~/repos/build-tooling/

In [None]:
%%bash

git pull

In [None]:
%%bash

./docker/install.sh

## Using the Build Tools

### The Aliases

In [None]:
%%bash

head -n 14 ~/.build-tools-aliases.sh | tail -n 4

### What will happen?

When you run 

```
bdc some_course.yaml
``` 

via the alias, you are actually running

```
docker run -it --rm -w `pwd` -e DB_SHARD_HOME=$DB_SHARD_HOME -e HOME=$HOME -v $HOME:$HOME databrickseducation/build-tool:$BUILD_TOOL_DOCKER_TAG bdc
```

The Docker client sends the command to the Docker engine. The Docker engine does the following:

1. Locates the `databrickseducation/build-tool` image in your local cache of images.
1. Creates a new container using the image.
1. Allocates a filesystem and adds a read-write layer to the top of the image.
1. Executes the process `bdc some_course.yaml` within the container.

The following flags are used:

- `-it` keeps the process interactive (`-i` keeps STDIN open) with a terminal attached (`-t` allocates a pseudo-TTY)
- `--rm` removes the container when the process is complete
- `-e` sets each of these environment variables inside the container
- `-v` mounts `$HOME` on the host at `$HOME` in the container (the same path, because `-e HOME=$HOME` passes the host's value into the container)
- `-w` sets the working directory in the container (the directory from which Docker will run the command) to the current directory on the host system (`` `pwd` ``).
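For illustration, the same command can be written as a shell function with one flag per line. This is a sketch and not the contents of `~/.build-tools-aliases.sh`: the name `bdc_sketch`, the fallback tag `latest`, and the `DOCKER` dry-run variable are assumptions.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

bdc_sketch() {
    # -it: interactive with a TTY; --rm: remove the container afterwards
    # -w:  start in the same directory as on the host
    # -e:  pass DB_SHARD_HOME and HOME into the container
    # -v:  mount the host's home directory at the same path
    $DOCKER run -it --rm \
        -w "$(pwd)" \
        -e DB_SHARD_HOME="${DB_SHARD_HOME:-}" \
        -e HOME="$HOME" \
        -v "$HOME:$HOME" \
        "databrickseducation/build-tool:${BUILD_TOOL_DOCKER_TAG:-latest}" \
        bdc "$@"
}

# Usage:
#   bdc_sketch some_course.yaml
```

Mounting `$HOME` at the same path and reusing the host's working directory means relative paths such as `some_course.yaml` resolve the same way inside the container as they do on the host, provided you run the command from somewhere under `$HOME`.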

### The `course` tool

The course tool requires several additional environment variables. To supply them, the alias file contains two helper functions:

- `create_course_envfile` - which creates a file of the necessary environment variables
- `course` - a wrapper function that creates a temporary envfile and passes this file to the docker engine at runtime. 

In [None]:
%%bash

tail -n 23 ~/.build-tools-aliases.sh
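The general pattern behind these two helpers can be sketched as follows, using the `--env-file` flag of `docker run`. The functions below are not the real `create_course_envfile` and `course`: the `_sketch` names, the variable list (`DB_SHARD_HOME`, `HOME`), the fallback tag `latest`, and the `DOCKER` dry-run variable are placeholders chosen for illustration.

```shell
# DOCKER defaults to the real client; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Write NAME=value lines for each named variable that is exported.
create_envfile_sketch() {
    envfile="$1"
    shift
    : > "$envfile"    # start from an empty file
    for name in "$@"; do
        # printenv fails for variables that are not set, so those are skipped
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value" >> "$envfile"
        fi
    done
}

# Create a temporary env file, hand it to docker, then remove it.
course_sketch() {
    envfile=$(mktemp) || return 1
    create_envfile_sketch "$envfile" DB_SHARD_HOME HOME
    $DOCKER run -it --rm --env-file "$envfile" \
        -w "$(pwd)" -v "$HOME:$HOME" \
        "databrickseducation/build-tool:${BUILD_TOOL_DOCKER_TAG:-latest}" \
        course "$@"
    status=$?
    rm -f "$envfile"
    return $status
}
```

An env file keeps the `docker run` line short when many variables are needed, which is presumably why the alias file takes this route rather than a long list of `-e` flags.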

## What do I need to do?

1. Install Docker.
2. Clone the `build-tooling` repo.
3. `cd build-tooling`
4. (If necessary) update `build-tooling`
   ```
   git pull
   ```
5. Run the install script.

   ```
   ./docker/install.sh
   ```
6. Use the build tools as you normally would.

   ```
   course arg1 arg2 ...
   bdc ... course.yaml
   ...
   ```
 