# Docker

<p align=center><a href=https://www.docker.com><img src=images/Docker_Logo.png width=400></a></p>

### Context & .dockerignore

"The build context is the set of files located at the specified PATH or URL. Those files are sent to the Docker daemon during the build so it can use them in the filesystem of the image." The context is the `PATH` (or URL) argument passed to `docker build`; it is commonly `.`, i.e. the directory from which `docker build` is executed.

> When we build the image in a given directory, __everything is added recursively to the context__ (so it can be copied into image, like above)

A `Dockerfile` might be surrounded by a lot of files, and sending all of them to the Docker daemon takes time (the build might even crash if there are too many files!).

Because of that, a `.dockerignore` file can be specified (its syntax is very similar to `.gitignore`):

In [None]:
__main__.py
requirements.txt
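
When only a handful of files belong in the image, an allowlist can be easier to maintain than a list of exclusions. The sketch below is a hypothetical variant, assuming the project needs only `__main__.py` and `requirements.txt` inside the image: it ignores everything and then re-includes those two files with `!` exceptions. It writes (and overwrites) `.dockerignore` in the current directory.

```shell
# Hypothetical allowlist-style .dockerignore (assumes only these two files
# are needed inside the image).
cat > .dockerignore <<'EOF'
# Ignore everything in the build context...
*
# ...except the files the image actually needs
!__main__.py
!requirements.txt
EOF

# Show the result
cat .dockerignore
```

With such a file, `COPY . .` would only ever see the two re-included files, no matter how large the surrounding directory grows.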

## Docker Containers


> <font size=+1>Containers are instantiations of images</font>

Since we have built our `repository/python_image`, we can create and run a container from it:

In [None]:
docker run repository/python_image:latest --help

### Thinking of containers


> One should think of the containers as __standalone units__ (like applications) __having single responsibility__

Examples could be (but are not limited to):
- MySQL database container (other containers connect to it via `EXPOSE`d ports)
- Data preprocessing creating a single artifact (preprocessed dataset)
- Neural network training creating a single artifact (neural network)

__Never try to fit everything into a single container!__

In the above case, our container is similar to simply running `python` from the command line

> Containers should be immutable (their internal state is always the same)

This allows us to:
- Destroy and recreate containers quickly
- Always be in a well-defined state

## Dockerfile commands (instructions)


> Docker provides a number of instructions that let us work with an image in a way similar to the command line

__Instructions that modify the filesystem (`RUN`, `COPY`, `ADD`) create a new layer__:
- Images can be built from any layer upwards
- __Layers are cached and reused by consecutive builds__
- __Layers can be reused between different images__

### [FROM](https://docs.docker.com/engine/reference/builder/#from)


> `FROM [--platform=<platform>] <image>[:<tag>] [AS <name>]`

- Starts a __build stage__ of an image
- Specifies the base image (like `ubuntu`, `node`, `conda`), which defines what one can do in the image
- `AS` gives the build stage a name; we will see its usage during [multi-stage builds](https://docs.docker.com/develop/develop-images/multistage-build/).

It can be mixed with `ARG` (which allows us to pass this value from a command line) like this:

In [None]:
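# VERSION is declared before the first FROM, so it is outside of any build stage
ARG VERSION=latest
# The build stage starts here
FROM busybox:$VERSION

# Re-declare VERSION (without a value) to make it available inside the build stage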
ARG VERSION
RUN echo $VERSION > image_version

### [RUN](https://docs.docker.com/engine/reference/builder/#run)


> Runs specified command __during build stage__ (e.g. installing some packages)

Forms:

> `RUN <command>` (execute via `shell`)

__or__:

> `RUN ["executable", "param1", "param2"]` (`exec` form)

Which form to use?
- `shell` - if we want to run a shell command (executed via `/bin/sh -c` by default) like `apt-get install`
- `exec` - if the base image has no shell __or__ we don't want string munging


### [ENTRYPOINT](https://docs.docker.com/engine/reference/builder/#entrypoint)


> __Defines entrypoint (command which will be run) WHEN CONTAINER IS CREATED FROM AN IMAGE__

Forms:

> `ENTRYPOINT ["executable", "param1", "param2"]` (preferred `exec` form)

__or__

> `ENTRYPOINT command param1 param2` (`shell` form)

- The container runs as an executable (which you should always aim for, more on that later!)
- You should always specify it (unless you want the container to just start a `shell`)
- __At least one of `ENTRYPOINT` or `CMD` is needed__

#### `exec` form

- __Does not invoke shell__, hence it is not dependent on it
- __Allows us to use an optional `CMD`__ as default arguments (covered in the next section)

In [None]:
FROM ubuntu
# When we run a container from the image, top -b will be run
ENTRYPOINT ["top", "-b"]

### [CMD](https://docs.docker.com/engine/reference/builder/#cmd)


> __Specifies default arguments to entrypoint (if any) WHICH USER CAN OVERRIDE DURING `docker run`__

Forms:

> `CMD ["executable","param1","param2"]` (specify `executable` as `entrypoint`, whole command can be overridden)

__or__

> `CMD ["param1","param2"]` (as default parameters to ENTRYPOINT, only those could be overridden)

__or__ 

> `CMD command param1 param2` (shell form, __discouraged__: the command is wrapped in `/bin/sh -c`, so it cannot serve as default arguments to an `exec`-form `ENTRYPOINT`)

In [None]:
FROM ubuntu
ENTRYPOINT ["top", "-b"]
CMD ["-c"]

Now if we `run` the container from the above image, command `top -b -c` will be run.

- `top -b` __will always run__
- `-c` can be changed to some other flag/command via `docker run`

Let's see how `CMD` interacts with `ENTRYPOINT` for a better understanding:

Note: `/bin/sh -c` simply runs the string that follows it as a shell command.

![](images/docker_entrypoint_cmd_interaction.png)
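
The rule for `exec`-form instructions can be approximated with a small shell function. This is only a sketch of the rule, not Docker's implementation: `effective_command` is a hypothetical helper, and it treats commands as plain space-separated strings.

```shell
# Hypothetical helper: prints the command a container would run.
#   $1 = ENTRYPOINT from the Dockerfile (exec form, written as one string)
#   $2 = default CMD from the Dockerfile
#   $3 = arguments given after the image name in `docker run` (may be empty)
effective_command() {
  entrypoint=$1
  default_cmd=$2
  run_args=$3
  if [ -n "$run_args" ]; then
    # Arguments passed to `docker run` replace CMD entirely
    echo "$entrypoint $run_args"
  else
    # Otherwise CMD supplies the default arguments
    echo "$entrypoint $default_cmd"
  fi
}

effective_command "top -b" "-c" ""     # prints: top -b -c
effective_command "top -b" "-c" "-H"   # prints: top -b -H
```

In both calls the `ENTRYPOINT` part stays fixed; only the trailing arguments change, which mirrors what happens when a flag is appended to `docker run`.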

### [COPY](https://docs.docker.com/engine/reference/builder/#copy)


> __Allows users to specify which file(s) or directories should be copied into the image from host system__

> `COPY <src> <destination>`

The idiom `COPY . .` is often used; it copies files from the build context into the current working directory inside the image.

It might look like we're copying something to the same location, but that's not what's happening. We essentially have two file systems which we are moving files between. The first argument to `COPY` refers to the build context (the `PATH` passed to `docker build`, usually the directory you run it from). The second argument to `COPY` refers to the file system inside the image (and hence inside every container created from it).

### Other commands

There are a few others, notably:
- `LABEL <key>=<value>` - allows us to add metadata to our image (like author, maintainer, way of contacting)
- `WORKDIR dir` - sets the working directory for the instructions that follow (and for the running container)
- `ENV <key>=<value>` - sets an environment variable available to the following instructions in the build stage and to containers run from the image
- `EXPOSE <port>` - `EXPOSE 80` documents that the container listens on port `80`; it does not publish the port by itself (the person running the `docker` command decides which ports to publish with `-p`, hence this one isn't used very often).

## Commands tips



### Know how to use cache

> __Some commands invalidate cache__ and when this happens, every step following it will have to be re-run when you create the image

Let's look at the example Dockerfile below (similar to the first one):

In [None]:
FROM ubuntu:18.04

RUN apt-get update
COPY . .

RUN apt-get install -y --no-install-recommends python3
RUN rm -rf /var/lib/apt/lists/*

Now `python3` will be reinstalled during `docker build` whenever any file in the context changes, because:

> For `COPY`, Docker compares checksums of the copied files; if any of them changed, the cache is invalidated for this step __and for every step after it__

Instead we could do what we've seen at the very beginning __as Python installation is not dependent on the context__.

> __If possible, put `COPY` statements AFTER setting up OS dependencies__

### Chain commands together

> Whenever possible __chain multiple commands using `&&`__ so they are all in a single `RUN` directive

Docker layers work in a similar fashion to `git` commits: __each layer only stores changes relative to the previous one__, so a file deleted in a later `RUN` still takes up space in the layer that created it.

Leaving commands in separate `RUN` directives is often undesirable, because:
- Temporary files are left behind and increase the image's size (which is why `rm -rf /var/lib/apt/lists/*`, seen at the beginning of the lesson, belongs in the same `RUN` as the install)
- Containers __are less of a black box__, which means attackers can analyze the Docker system more easily and find its weak points

__Main command to look out for is `RUN`, most of the commands (like `LABEL`) DO NOT create an additional layer__
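
Applied to the example from the caching section, the two tips could look roughly like this. It is a sketch rather than a definitive layout: the file name `Dockerfile.cached` and the tag `myapp_cached` are made up for illustration, and the file is written to the current directory.

```shell
# Write a hypothetical Dockerfile.cached that applies both tips
cat > Dockerfile.cached <<'EOF'
FROM ubuntu:18.04

# OS dependencies first, chained in one RUN so the apt lists
# never end up in a layer of their own
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

# Context last: changes to project files only invalidate this step
COPY . .
EOF

# Build it (requires a running Docker daemon):
# docker image build -f Dockerfile.cached -t myapp_cached .
```

Editing a project file and rebuilding should now reuse the cached `RUN` layer and only repeat the `COPY` step.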

## Small, self-contained images


> __The smaller the image, the better__

Imagine we have to run `10` containers from a single published image. If the image weighs `1GB` and each container runs on a different machine, we would need at least `10GB` of bandwidth.

Compare that to an image of `10MB`. Other pros include:
- Lower latency for users (setup takes considerably less time)
- Easier to recreate a fleet of containers (more on that during Kubernetes)
- Easier to replace a failed container

There is one killer feature of Docker which helps us achieve it, namely...


## Multi-stage builds



> First image builds the application (creating an artifact), while the second copies it and sets it up for running in a container

The easiest approach is to look at an example `Dockerfile`:

In [None]:
# FIRST (BUILDER) STAGE
FROM golang:1.7.3 AS builder

# Obtain golang code
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html  
COPY app.go .
# Compile it as a single executable file called app
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# SECOND STAGE
# alpine is a very slim file system (few MB) great for lightweight deployment
FROM alpine:latest  

# Setup only bare necessities
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy self-contained app into the smaller image
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
# Setup the application as container's ENTRYPOINT
ENTRYPOINT ["./app"]  

> __You should use multi-stage builds wherever possible!__

### Pros

- Drastically reduces image size
- Simplifies maintenance

### Cons

- __Mainly usable for compiled languages__ like Go or C++ (sorry Python :( )
- __Works best with statically linked binaries__ (i.e. everything is contained in a single executable)
- __Hard to make it work with ML/DL__ as those require a lot of dependencies

One way around it is to export the model with `torchscript` and serve it through PyTorch's `C++` frontend (or to convert the neural network to Tensorflow), which means changing the language.

> Be aware, that this approach is currently really hard and might bring you a lot of headaches!

> __Remember deployment is not only about neural networks, there are other things (like servers, databases etc.) that might benefit from this approach!__

## Docker Hub

> __Online registry with many official and third party images uploaded and ready to be run as containers (or act as base)__

![](images/dockerhub_main_page.png)

One can `docker run` those images directly and those will be downloaded automatically:

In [None]:
docker run busybox:latest ls -la

### Tips

- Use official images if possible
- Use smallest image fitting the job (e.g. `alpine` instead of `ubuntu` if possible)
- __Inspect unofficial images before using them__ (or roll your own)

## Docker Commands


Now that we know a little bit about images, we may dive into Docker's command line interface.

> __In newer Docker versions (`>=1.13`) the command line was redesigned to be more readable and logically grouped__

__High level most important commands__:
- [`docker image SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/image/) - manage `Docker` images (like building or inspecting them)
- [`docker container SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/container/) - manage `Docker` containers (creating from image, stopping, restarting, killing, inspecting etc.)
- [`docker volume SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/volume/) - manage volumes (persistent data storage which might be shared and attached to Docker containers)
- [`docker network SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/network/) - manage `Docker` networks (like creating, inspecting, listing etc., __not covered here, we will use Kubernetes for network related tasks__)

__High level less important commands__:
- [`docker config SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/config/) - manage Swarm configs (non-sensitive configuration data made available to services)
- [`docker stack`](https://docs.docker.com/engine/reference/commandline/stack/) - manage multiple containers as a whole (__not covered, we will use Kubernetes instead__)
- [`docker secret`](https://docs.docker.com/engine/reference/commandline/secret/) - manage Swarm secrets (like passwords and other sensitive data needed inside containers)
- [`docker system`](https://docs.docker.com/engine/reference/commandline/system/) - manage Docker itself (how much space is used, cleaning images/containers etc.)

Many of those subcommands are also available directly under `docker` (e.g. `docker image build` is equivalent to `docker build`), which may be a source of confusion.

> __Check out documentation for `docker build` instead of `docker image build` as it has more information, BUT USE THE LATTER AS IT IS WAY MORE READABLE!__


### docker image build


> `docker image build [OPTIONS] PATH | URL | -`

As seen previously one can use it to `build` `IMAGE` from `Dockerfile` __and__ `context`.

Besides building from a local path, one can also build from:
- `github`: `docker build github.com/creack/docker-firefox`
- `tar.gz`: `docker build -f ctx/Dockerfile http://server/ctx.tar.gz` (here context is on a different server)
- From `stdin` (no context in this case): `docker build - < Dockerfile`

#### Options

- `-t` - tag the image (__always use it!__): `docker build -t whenry/fedora-jboss:latest -t whenry/fedora-jboss:v2.1 .` (multiple tags supported)
- `-f` - specify different file: `docker build -f dockerfiles/Dockerfile.debug -t myapp_debug .`; __useful for separate production, testing, debugging images!__
- `--build-arg` - pass arguments to the build stage (`ARG` above): `docker build --build-arg HTTP_PROXY=http://10.20.30.2:1234 .`




### docker container run



__The most important command, which you will use all the time, with A LOT of options!__

> __REMINDER:__ "Docker runs processes in isolated containers. A container is a process which runs on a host." 

> `docker container run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]`

Using `OPTIONS`, a developer can override defaults set by the image creator, including, but not limited to:
- detached or foreground running
- network settings
- runtime constraints
- command run

> `COMMAND` specifies a command to be passed to image entrypoint

> `ARG` arguments passed to the command

We have seen examples above

#### Running container interactively

> `docker container run` can attach to the container's streams (`STDIN`, `STDOUT`, `STDERR`) and allocate a terminal so we can interact with the container

- `-a NAME_OF_STREAM` attach to the given stream (`STDIN`, `STDOUT` or `STDERR`)
- `-t` allocate pseudo TTY (terminal)
- `-i` keep STDIN open even if not attached to CLI

In a command such as `docker run -it ubuntu /bin/bash`, `/bin/bash` is the command run inside the container, and the allocated TTY is attached to it.

#### Options

- `--name NAME` - specify name for the container (__always do this__)
- `--rm` - remove container after exit (__usually do this__); otherwise the stopped container stays on your system and takes up disk space without doing anything
- `-d` - run in a detached mode (default runs in foreground, equivalent to `-d=false`)
- `-m` - set memory limit (`docker run -it -m 300M ubuntu:14.04 /bin/bash`), there are also flags for other resources
- `-e NAME=VALUE` - pass environment variable to the container (`export today=Wednesday; docker run -e "deep=purple" -e today --rm alpine env` would allow us to use `$today` inside the container)
- `--entrypoint` - override default image entrypoint (`docker run -it --entrypoint /bin/bash example/redis`); __reset via `--entrypoint=""`!__

> For more check out [`docker run` reference](https://docs.docker.com/engine/reference/run/)

#### Exit codes

> If your `docker run` fails check the return code to know where to look for bugs

- `125` - error within the daemon (e.g. wrong flag passed; `docker run --foo busybox`)
- `126` - contained command cannot be __invoked__ (`docker run busybox /etc` - it is a directory, not a command)
- `127` - contained command cannot be found (`docker run busybox foo` - no command `foo`)

> Otherwise the return code of the contained command will be returned (usually `0` if executed correctly)
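
As a memory aid, the codes listed above can be wrapped in a small lookup function. `explain_docker_exit` is a hypothetical helper written for this lesson, not part of the Docker CLI.

```shell
# Hypothetical helper: maps the exit code of `docker run` to the
# meanings listed above.
explain_docker_exit() {
  case "$1" in
    0)   echo "success" ;;
    125) echo "error within the docker daemon" ;;
    126) echo "contained command cannot be invoked" ;;
    127) echo "contained command not found" ;;
    *)   echo "exit code of the contained command" ;;
  esac
}

explain_docker_exit 127   # prints: contained command not found
```

In practice one would call it right after a failed run, e.g. `docker run busybox foo; explain_docker_exit $?`.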

## Docker volumes

We have been through a lot of content, but there is one thing left... __Containers may create artifacts__ (like metrics from training or data after preprocessing).

__How to get them out of container?__

There are two options:
- using `docker container cp` command
- using volumes

> __Volumes are persistent storage shared between host machine and Docker container(s)__

Benefits should be obvious:
- data sharing between containers and hosts (example: data preprocessing container creates dataset, neural network container trains our model on it)
- we can copy to/from the containers and do the live updates of their data content
- we can mount the volume as read-only for increased security

### docker volume create

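> Create a named volume managed by Docker. __A new volume is empty__; when it is first mounted into a container, Docker copies into it whatever the image already has at the mount path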

(see `exercise` to check how to verify what's inside the volume)

In [None]:
docker volume create docker_lesson

Now, we can __mount__ the volume at the `/lesson` directory inside a container (and list its contents):

In [None]:
docker container run --rm -v docker_lesson:/lesson busybox ls /lesson

### Mounting

Let's take a look at how to `mount` a volume using the more explicit `--mount` flag:

In [None]:
docker run \
  --name devtest \
  --mount source=myvol2,target=/app \
  nginx:latest
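
The same mount can also be written with the shorter `-v` syntax used earlier (`-v myvol2:/app`). To make the correspondence explicit, here is a sketch of a hypothetical helper, `volume_to_mount`, that translates a named-volume `-v` spec into the option string expected by `--mount`. It only handles the `name:/path` and `name:/path:ro` shapes.

```shell
# Hypothetical helper: convert "name:/path[:ro]" into --mount options.
volume_to_mount() {
  spec=$1
  src=${spec%%:*}      # volume name: text before the first colon
  rest=${spec#*:}      # everything after the first colon
  target=${rest%%:*}   # mount path inside the container
  mount="type=volume,source=$src,target=$target"
  case "$rest" in
    *:ro) mount="$mount,readonly" ;;   # :ro maps to the readonly option
  esac
  echo "$mount"
}

volume_to_mount "myvol2:/app"      # prints: type=volume,source=myvol2,target=/app
volume_to_mount "myvol2:/app:ro"   # prints: type=volume,source=myvol2,target=/app,readonly
```

The output can be passed straight to the flag, e.g. `docker run --mount "$(volume_to_mount myvol2:/app:ro)" nginx:latest`.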

## Exercise

- Read the following aliases with comments and copy them to your aliases location
- Run `--help` with them to know a little bit more (and check docs/google if something interests you)

In [None]:
# Images
alias di="docker image" # General for docker images

alias dib="docker image build" # Build docker image
alias dil="docker image ls" # List docker images (check --help)
alias dip="docker image push" # Push NAME image
alias dirm="docker image rm" # Remove NAME image
alias dirmall="docker image prune -a" # Remove all images not used by containers

# Containers
alias dc="docker container" # General for docker containers

alias dcr="docker container run" # Run container from an IMAGE image
alias dccp="docker container cp" # Copy files between a container and the host
alias dce="docker container exec" # Execute COMMAND inside container
alias dci="docker container inspect" # Inspect container
alias dck="docker container kill"  # Kill container
alias dcl="docker container ls" # List running containers (add -a to list all)
alias dcs="docker container stop" # Stop running container

alias dcrma='docker container ls -a -q | xargs docker container rm' # Remove all non-running containers

# Volumes
alias dv="docker volume" # General volume command

alias dvc="docker volume create" # Create NAMED volume
alias dvl="docker volume ls" # List all available volumes
alias dvrm="docker volume rm" # Remove volumes

# List files inside the created volume
dvi(){
  docker run --rm -i -v="$1":/tmp/myvolume busybox find /tmp/myvolume
}

# System
alias dsi="docker system info" # Display system-wide information
alias dsdf="docker system df" # Disk space used by images, containers and volumes
alias dsp="docker system prune" # Remove stopped containers, unused networks and dangling images

## Summary

In this notebook we have learned:
- How to ignore files that we don't want to be part of our Docker build.
- What each of the Dockerfile instructions does when building an image.
- How to make multi-stage Docker builds.
- How to create persistent volumes to get the data from the container to our local machine.