# Docker Context, Instructions, & Tips

<p align=center><a href=https://www.docker.com><img src=images/Docker_Logo.png width=400></a></p>

## Context and .dockerignore


The build context is the set of files located at the `PATH` or `URL` passed to `docker build`. These files are sent to the Docker daemon at build time so that instructions such as `COPY` can add them to the image's file system. In the common case of `docker build .`, the context is the directory from which `docker build` is executed.

> When an image is built from a given directory, __everything in it is added recursively to the context__ so that any of those files can be copied into the image.

Since the directory containing the `Dockerfile` may hold many other files, sending all of them to the Docker daemon can take a significant amount of time.

As a solution, a `.dockerignore` file, similar to `.gitignore`, can be placed in the root of the context.
Each line in this file is a path or pattern matching files or folders that should be excluded from the context, i.e. not sent to the Docker daemon during the build.

An example `.dockerignore` might look like this:

```
__main__.py
requirements.txt
```

## Dockerfile Instructions


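Dockerfile instructions are the keywords (conventionally written in capitals, e.g. `FROM`, `RUN` or `COPY`) found at the start of each line of a Dockerfile.

Instructions that change the file system (such as `RUN` and `COPY`) each create a new _layer_; most others only add metadata to the image.
- When an image is rebuilt, the layers before the first changed instruction can be reused, and the build continues from there.
- Layers are cached and reused by subsequent builds.
- Layers can be reused between different images.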

### [FROM](https://docs.docker.com/engine/reference/builder/#from)

> `FROM [--platform=<platform>] <image>[:<tag>] [AS <name>]`

- The `FROM` instruction starts a new __build stage__ of an image.
- It specifies the base image (e.g. `ubuntu`, `node` or a `conda` image), which determines what is available to build on.
- `AS <name>` names the build stage, so that later stages can refer to it in _[multi-stage builds](https://docs.docker.com/develop/develop-images/multistage-build/)_.
- It can be combined with `ARG`, which allows a value to be passed from the command line (via `--build-arg`), as follows:

```dockerfile
# Declared before FROM: outside any build stage, so it can only be used in FROM
ARG VERSION=latest
# Here, the build stage starts
FROM busybox:$VERSION

# Re-declare (without a value) to make VERSION available inside this build stage
ARG VERSION
RUN echo $VERSION > image_version
```

### [RUN](https://docs.docker.com/engine/reference/builder/#run)


> The `RUN` command runs the command specified __during the build stage__ (e.g. installing some packages).

In other words, `RUN` executes its command when the image is being built, not when a container is run from it; the result is stored in the image.

*__`RUN` can be used in several forms__*

> `RUN <command>` (execute via `shell`)

__or__

> `RUN ["executable", "param1", "param2"]` (`exec` form)

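- The `shell` form runs the command through a shell (`/bin/sh -c` by default on Linux), which suits shell commands such as `apt-get install`.
- The `exec` form is used if the base image has no shell __or__ to avoid shell string munging (the arguments are passed to the executable as-is).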
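
The sketch below contrasts the two forms. It assumes an `ubuntu:22.04` base image; the tag and the `/opt/app` path are illustrative choices, not requirements:

```dockerfile
FROM ubuntu:22.04

# Shell form: executed as /bin/sh -c "...", so shell features such as
# && chaining and variable expansion ($HOME) are available.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Exec form: the JSON array is passed directly to the executable without a shell,
# so a string like "$HOME" here would be passed literally rather than expanded.
RUN ["mkdir", "-p", "/opt/app"]
```

Note that the exec form must be a JSON array, so each element needs double quotes.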


### [ENTRYPOINT](https://docs.docker.com/engine/reference/builder/#entrypoint)


> __This instruction defines the entrypoint (i.e. the command to be run) when a container is started from an image.__

The command specified in an `ENTRYPOINT` instruction is not run while the image is being built; it runs each time a container is started from the image.

*__`ENTRYPOINT` can be used in several forms__*

> `ENTRYPOINT ["executable", "param1", "param2"]` (preferred `exec` form)

- __It is not run through a shell__ and therefore does not depend on one.
- __It allows `CMD` to supply default arguments__ to the entrypoint.

__or__

> `ENTRYPOINT command param1 param2` (`shell` form)

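- It runs as a subcommand of `/bin/sh -c`, so any `CMD` and any arguments passed to `docker run` are ignored.

A Dockerfile should specify at least one of `ENTRYPOINT` or `CMD` (either may also be inherited from the base image).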


```dockerfile
FROM ubuntu
# When a container is started from this image, `top -b` (batch mode) is run.
ENTRYPOINT ["top", "-b"]
```
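
For comparison, here is a sketch of the same image using the shell form. As the Dockerfile reference recommends for this form, the command is prefixed with `exec`. This makes `top` replace the shell, so it receives signals such as the `SIGTERM` sent by `docker stop`:

```dockerfile
FROM ubuntu
# Shell form: runs as /bin/sh -c "exec top -b".
# `exec` makes top replace the shell process, so signals reach top directly.
# With this form, any CMD and extra `docker run` arguments are ignored.
ENTRYPOINT exec top -b
```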

### [CMD](https://docs.docker.com/engine/reference/builder/#cmd)


> __This instruction provides defaults for a running container: either the default command or, when an `ENTRYPOINT` is set, the default arguments to it. These defaults can be overridden by passing arguments to `docker run`.__

*__Forms__*

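> `CMD ["executable","param1","param2"]` (exec form; the default command when no `ENTRYPOINT` is set; the whole command is replaced by any command given to `docker run`)

__or__

> `CMD ["param1","param2"]` (default parameters to `ENTRYPOINT`; only these are replaced by arguments given to `docker run`)

__or__

> `CMD command param1 param2` (shell form; this form is __discouraged__ because it runs through `/bin/sh -c`, so it cannot act as plain arguments to an exec-form `ENTRYPOINT`, and the process may not receive signals directly)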

```dockerfile
FROM ubuntu
ENTRYPOINT ["top", "-b"]
CMD ["-c"]
```

Now, if we run a container from the above image without extra arguments, the command `top -b -c` is run.

Things to note:
- `top -b` __will always run__ (unless `--entrypoint` is passed to `docker run`).
- `-c` is replaced by any arguments given to `docker run` after the image name (e.g. `docker run <image> -H` runs `top -b -H`).

The figure below summarises how `CMD` and `ENTRYPOINT` interact.

Note that `/bin/sh -c` is the shell that shell-form instructions run under; it executes the string that follows it.

![](images/docker_entrypoint_cmd_interaction.png)

### [COPY](https://docs.docker.com/engine/reference/builder/#copy)


> __This instruction copies files or directories from the build context into the image's file system.__

> `COPY <src> <destination>`

Often, '`COPY . .`' is used, which copies the entire build context (minus anything excluded by `.dockerignore`) into the current working directory of the image (set with `WORKDIR`).

Essentially, `COPY` moves files between two file systems. The first argument refers to the build context (e.g. the directory passed to `docker build`). The second refers to the image's file system, which containers started from the image will see.
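
The sketch below shows how source and destination paths resolve. The files `config.yaml` and `data/` are hypothetical parts of the build context, and `alpine:3.19` is an arbitrary base image:

```dockerfile
FROM alpine:3.19

# Relative destinations below resolve against this directory in the image
WORKDIR /app

# <src> is relative to the build context on the host;
# <destination> "." is /app in the image, so this creates /app/config.yaml
COPY config.yaml .

# When <src> is a directory, its contents are copied; "data/" ends up at /app/data/
COPY data/ data/

# Copies the whole build context (minus .dockerignore matches) into /app
COPY . .
```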

### Other instructions

Other notable instructions include:
- `LABEL <key>=<value>`, which adds metadata to images (such as author, maintainer, contact information, etc.).
- `WORKDIR dir`, which sets the working directory for subsequent instructions such as `RUN`, `COPY`, `CMD` and `ENTRYPOINT` (creating it if it does not exist).
- `ENV <key>=<value>`, which sets an environment variable available to subsequent instructions in the build stage and to containers run from the image.
- `EXPOSE <port>`, which indicates the port the container listens on. It does not actually publish the port; it acts as documentation for the user, who publishes the port with the `-p` flag of `docker run`. For example, `EXPOSE 80` documents that the application listens on port `80`, and `docker run -p 8080:80 <image>` makes it reachable on port `8080` of the host.
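
The sketch below combines these instructions. The label value, the `APP_HOME` variable, port `8000` and the `app.py` file are hypothetical placeholders:

```dockerfile
FROM python:3.11-slim

# Metadata only: visible via `docker image inspect`, no file-system layer added
LABEL maintainer="team@example.com"

# Available to later instructions and inside running containers
ENV APP_HOME=/app

# Later COPY/RUN/CMD instructions run relative to $APP_HOME (created if missing)
WORKDIR $APP_HOME
COPY app.py .

# Documentation only: publish at run time, e.g. `docker run -p 8000:8000 <image>`
EXPOSE 8000

CMD ["python", "app.py"]
```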

## Command Tips



### Using the cache

> __Some changes invalidate the cache.__ When an instruction's cache is invalidated (e.g. because the files it copies have changed), that instruction and every step following it must be re-run when the image is built.

Consider the example Dockerfile below (similar to the first one):

```dockerfile
FROM ubuntu:18.04

RUN apt-get update
COPY . .

RUN apt-get install -y --no-install-recommends python3
RUN rm -rf /var/lib/apt/lists/*
```

Here, whenever any file in the build context changes, `python3` is reinstalled during the next `docker build`.

> This occurs because Docker compares checksums of the files copied by `COPY`; when any of them changes, the `COPY` layer and every layer after it are rebuilt.

Since __the Python installation does not depend on the context__, it can be moved before the `COPY` instruction so that its layers stay cached.

> __If possible, `COPY` statements should be inserted after setting up the OS dependencies.__
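
Applying this tip to the example above gives the following sketch. When files in the context change, only the final `COPY` (and anything after it) is re-run, while the package-installation layers come from the cache:

```dockerfile
FROM ubuntu:18.04

# OS dependencies first: these layers stay cached as long as the
# instructions themselves are unchanged.
# (Separate RUNs mirror the original example; see "Chaining commands" below.)
RUN apt-get update
RUN apt-get install -y --no-install-recommends python3
RUN rm -rf /var/lib/apt/lists/*

# Context-dependent step last: a change to any copied file
# only invalidates this layer and those after it
COPY . .
```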

### Chaining commands

> Whenever possible, __multiple commands should be chained using `&&`__ so they can all be part of a single `RUN` directive.

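Each layer stores only the changes made by its instruction, similar to a `git` commit, so __a file deleted in a later layer still exists in the earlier layer that added it__.

Splitting work across multiple `RUN` instructions is therefore often undesirable because
- temporary files removed in a later `RUN` still take up space in earlier layers, which increases the image's size (e.g. `rm -rf /var/lib/apt/lists/*` in a separate `RUN`, as in the example above).
- intermediate layers can be inspected, so files removed later (e.g. secrets) remain recoverable, which makes it easier for attackers to analyse the image and find its faults.

__The main instruction to consider is `RUN`, since most instructions (e.g. `LABEL`) do not create a file-system layer.__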
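
Applied to the package installation from the caching example above, chaining puts the update, the install and the cleanup into one layer. As a result, the deleted apt lists never end up in the image. This sketch assumes the same `ubuntu:18.04` base:

```dockerfile
FROM ubuntu:18.04

# One RUN -> one layer. Because the cleanup happens in the same layer,
# /var/lib/apt/lists/* is never stored in the image. Combining update and
# install also stops a stale, cached `apt-get update` layer from being reused.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

COPY . .
```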

## Image Size

When working with containers, small, self-contained images are preferred to large images.

Consider a case where `10` containers are to be run on different machines from a single `1GB` published image. Each machine must pull the image, so about `10GB` of bandwidth would be required, compared with about `100MB` for a `10MB` image.

### Benefits of a small image
- Low latency for users (setup takes considerably less time).
- Easy to recreate a fleet of containers.
- Easy to replace a failed container.

One Docker feature that helps achieve small images is multi-stage builds.


## Multi-stage Builds


Multi-stage builds use several `FROM` stages (each possibly based on a different image) in a single Dockerfile to create the final image.
Generally, the first stage builds the application (i.e. creates an artifact), while the second copies the artifact and configures it for running in a container.

> __Multi-stage builds should be employed wherever possible.__

To understand this better, we consider an example `Dockerfile`.

```dockerfile
# FIRST (BUILDER) STAGE
FROM golang:1.7.3 AS builder

# Fetch the dependency and copy in the Go source code
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
# Compile as a single, statically linked executable called app
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# SECOND STAGE
# Alpine is a very slim base image (a few MB) that is highly suitable for lightweight deployment.
FROM alpine:latest

# Set up only the bare necessities (CA certificates for HTTPS requests).
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy only the compiled app from the builder stage into the smaller image.
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
# Set up the application as the container's ENTRYPOINT.
ENTRYPOINT ["./app"]
```

### Pros

- A multi-stage build drastically reduces the image size.
- It simplifies maintenance.

### Cons

- It is __mainly suitable for compiled languages__, such as Go and C++.
- It works best when the artifact can be __statically linked__ (i.e. everything is contained in a single executable).
- It does not typically work well with deep learning projects, as they commonly require many runtime dependencies.
    - This can be addressed by moving from Python to C++, e.g. by exporting the model with `torchscript` and using PyTorch's `C++` frontend. However, be warned that this approach comes with many hurdles and is difficult to carry out.


### Image sourcing
> __As an online registry, Docker Hub contains many official and third-party images ready to be run as containers or used as a base.__

![](images/dockerhub_main_page.png)

`docker run` downloads (pulls) an image automatically if it is not already available locally:

```shell
docker run busybox:latest ls -la
```

### Tips

- Use official images whenever possible.
- Use the smallest image fitting for the job (e.g. `alpine` instead of `ubuntu` whenever possible).
- __Explore unofficial images__ or roll your own.

## Docker Commands


Of course, there's lots more to learn about Docker's CLI.

### Important high-level commands
- [`docker image SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/image/): manages `Docker` images (e.g. building or inspecting).
- [`docker container SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/container/): manages `Docker` containers (creating from an image, stopping, restarting, killing, inspecting, etc.).
- [`docker volume SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/volume/): manages volumes (persistent data storage, which might be shared and attached to Docker containers).
- [`docker network SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/network/): manages `Docker` networks (e.g. creating, inspecting, and listing; this is not covered here, as Kubernetes will be employed for network-related tasks).

### Less-important high-level commands
- [`docker config SUBCOMMAND`](https://docs.docker.com/engine/reference/commandline/config/): manages Swarm configs (non-sensitive configuration data for Swarm services).
- [`docker stack`](https://docs.docker.com/engine/reference/commandline/stack/): manages Swarm stacks, i.e. multiple services deployed as a whole (not covered here; Kubernetes is used instead).
- [`docker secret`](https://docs.docker.com/engine/reference/commandline/secret/): manages Swarm secrets (such as passwords and other sensitive data made available to services).
- [`docker system`](https://docs.docker.com/engine/reference/commandline/system/): manages Docker itself (disk usage, cleaning up images/containers, system information, etc.).

Note that many of these subcommands also have top-level shortcuts (e.g. `docker build` is equivalent to `docker image build`), which can be a source of confusion.

> __Explore the documentation for `docker build` instead of `docker image build` as it has more information; note, however, that the latter is considerably more readable.__


### Docker image build


> `docker image build [OPTIONS] PATH | URL | -`

As shown previously, this command builds an image from a `Dockerfile` __and__ a `context`.

Apart from a local directory, the context can come from
- a Git repository: `docker build github.com/creack/docker-firefox`.
- a `tar.gz` archive: `docker build -f ctx/Dockerfile http://server/ctx.tar.gz` (here, the context is a remote tarball, and `-f` gives the Dockerfile's path inside it).
- `stdin` (no context in this case): `docker build - < Dockerfile`.

#### Options

- `-t` names and tags the image (__it should always be used__): `docker build -t whenry/fedora-jboss:latest -t whenry/fedora-jboss:v2.1 .` (multiple tags are supported).
- `-f` specifies a Dockerfile other than the default `PATH/Dockerfile`: `docker build -f dockerfiles/Dockerfile.debug -t myapp_debug .` (useful for separate production, testing, and debugging images).
- `--build-arg` passes arguments to the build stage (`ARG` above): `docker build --build-arg HTTP_PROXY=http://10.20.30.2:1234 .`
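
The sketch below shows how `--build-arg` and `ARG` fit together. `APP_ENV` and the `myapp` tag are hypothetical names chosen for illustration:

```dockerfile
FROM alpine:3.19

# Declares a build argument with a default; override it with
#   docker build --build-arg APP_ENV=debug -t myapp:debug .
ARG APP_ENV=production

# ARG values exist only during the build; copy into ENV to keep the value at run time
ENV APP_ENV=$APP_ENV

RUN echo "Building for $APP_ENV" > /build_env.txt
```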




### Docker container run



This is considered the most important command, and it has many options.

> __REMINDER:__ Docker runs processes in isolated containers. A container is a process that runs on a host. 

> `docker container run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]`

Using `OPTIONS`, developers can override the defaults set by the image creator, including, but not limited to,
- detached or foreground running.
- network settings.
- runtime constraints.
- the command that is run.

> `COMMAND` overrides the image's default command (`CMD`); if the image defines an `ENTRYPOINT`, `COMMAND` is passed to it as arguments.

> `ARG...` are further arguments passed along with `COMMAND`.

#### Running a container interactively

> `docker container run` can attach streams (`STDIN`, `STDOUT`, `STDERR`) of the container and attach a terminal to facilitate interaction with the container.

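- `-a STREAM` attaches to the given stream (`STDIN`, `STDOUT` or `STDERR`).
- `-t` allocates a pseudo-TTY (terminal).
- `-i` keeps `STDIN` open even when not attached.

For example, `docker container run -it --rm ubuntu /bin/bash` starts an interactive session: `-it` combines `-i` and `-t`, and `/bin/bash` is the command run in the attached terminal.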

#### Options

- `--name NAME` specifies the name of the container (__this should always be done__).
- `--rm` removes the container after it exits (__usually do this__); otherwise, the stopped container remains on the system, taking up space.
- `-d` runs the container in detached mode (by default it runs in the foreground, equivalent to `-d=false`).
- `-m` sets the memory limit (`docker run -it -m 300M ubuntu:14.04 /bin/bash`); there are also flags for other resources.
- `-e NAME=VALUE` passes an environment variable to the container, while `-e NAME` without a value passes the host's value. For example, `export today=Wednesday; docker run -e "deep=purple" -e today --rm alpine env` makes both `deep` and `today` available inside the container.
- `--entrypoint` overrides the default image entrypoint (`docker run -it --entrypoint /bin/bash example/redis`); __it can be reset via `--entrypoint=""`.__

> For more information, check out [`docker run` reference](https://docs.docker.com/engine/reference/run/).

#### Exit codes

> If `docker run` fails, inspect its exit code to determine whether the problem lies with Docker itself or with the command run inside the container.

- `125`: the error is with the Docker daemon itself (e.g. an unknown flag was passed: `docker run --foo busybox`).
- `126`: the contained command cannot be __invoked__ (`docker run busybox /etc`; `/etc` is a directory, not a command).
- `127`: the contained command cannot be found (`docker run busybox foo`; there is no command `foo`).

> Otherwise, the exit code of the contained command is returned (usually `0` if it executed successfully).

## Docker Volumes

__Containers may create artifacts__ (such as metrics from training or data after preprocessing).

There are __two approaches for retrieving these artifacts from containers__:
- using the `docker container cp` command
- using volumes

> __Volumes are persistent storage spaces shared between the host machine and Docker container(s).__

### Pros
- Data sharing between containers and hosts (e.g. the data-preprocessing container creates the datasets, while the neural-network container trains the model on it).
- It is possible to copy to/from the containers and perform a live update of their data contents.
- It is possible to mount a volume as read-only (e.g. `-v docker_lesson:/lesson:ro`) for increased security.

### Docker volume create

> This command creates a named volume managed by Docker. __A new volume is empty__; it does not include the contents of the directory in which the command was run.

```shell
docker volume create docker_lesson
```

Now, we can __mount__ the volume to the `/lesson` directory inside the container and list its contents:

```shell
docker container run --rm -v docker_lesson:/lesson busybox ls /lesson
```

### Mounting

Alternatively, a volume can be mounted with the more explicit `--mount` flag (if the volume does not exist yet, Docker creates it):

```shell
docker run -d \
  --name devtest \
  --mount source=myvol2,target=/app \
  nginx:latest
```

## Cheat Sheet 

Here are some shell aliases for common Docker commands, with explanations.

```shell
# Images
alias di="docker image" # General for docker images

alias dib="docker image build" # Build docker image
alias dil="docker image ls" # List docker images (check --help)
alias dip="docker image push" # Push NAME image
alias dirm="docker image rm" # Remove NAME image
alias dirmall="docker image prune -a" # Remove all images not used by any container

# Containers
alias dc="docker container" # General for docker containers

alias dcr="docker container run" # Run container from an IMAGE image
alias dccp="docker container cp" # Copy files between a container and the host
alias dce="docker container exec" # Execute COMMAND inside a running container
alias dci="docker container inspect" # Inspect container
alias dck="docker container kill" # Kill container
alias dcl="docker container ls" # List running containers (add -a to include stopped ones)
alias dcs="docker container stop" # Stop running container

alias dcrma="docker container prune" # Remove all stopped containers

# Volumes
alias dv="docker volume" # General volume command

alias dvc="docker volume create" # Create NAMED volume
alias dvl="docker volume ls" # List all available volumes
alias dvrm="docker volume rm" # Remove volumes

# List files inside the given volume (usage: dvi VOLUME_NAME)
dvi(){
  docker run --rm -v "$1":/tmp/myvolume busybox find /tmp/myvolume
}

# System
alias dsi="docker system info" # Display system-wide information
alias dsdf="docker system df" # Show disk space used by images, containers, volumes and build cache
alias dsp="docker system prune" # Remove stopped containers, unused networks, dangling images and build cache
```

## Conclusion

At this point, you should have a good understanding of
- how to ignore unwanted files in our Docker build.
- the role of each of the Docker commands in building an image.
- how to create multi-stage Docker builds. 
- how to create persistent volumes to retrieve data from a container to our local machine.