Introduction to Docker

A brief introduction to Docker Containers


What is Docker? 🐳

Docker is a set of platform-as-a-service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.

In other words, Docker is a platform used to containerize your software. With it, you can build your application and package it, together with all of its dependencies, into a container. These containers can then be easily shipped to run on other machines.

Docker Architecture

Representation of Docker Architecture

Docker Containerized Apps

The Docker software as a service consists of three components:

Software: The Docker Engine includes:

The Docker daemon, called dockerd, which is a process that manages Docker containers and handles container objects. The daemon listens for requests sent via the Docker Engine API.
The Docker client program, called docker, which provides a command-line interface (CLI) that allows users to interact with Docker daemons.

📝 Note:

Docker Desktop is an application, available for macOS, Windows, and Linux, that bundles the Docker daemon, the Docker client (CLI), and other tools to run locally on your machine.

Objects:

Docker objects are various entities used to assemble an application in Docker. Objects are of three classes:

A Docker container is a standardized, encapsulated environment that runs applications, and is managed using the Docker API or CLI.
A Docker image is a read-only template used to build containers. Images are used to store and ship applications.
A Docker service allows containers to be scaled across multiple Docker daemons, resulting in what is known as a Docker swarm, a set of cooperating daemons that communicate through the Docker API.

An important distinction is between base and child images.

A base image is an image that has no parent image, usually one with just an OS version installed (busybox, alpine, ubuntu, centos, amazonlinux, debian, etc.).
A child image is built on a base image with some extra functionality integrated.

Images can also be classified as official or user images.

Official images are maintained and supported by the staff at Docker.
User images are images built on base images with extra functionality, created and shared by users. These images can be identified as user/image-name. Publishers can be certified users or general users.

Registries: A Docker registry is a repository for Docker images.

Docker clients connect to registries to download ("pull") images for use or upload ("push") images that they have built.
Container registries can be public or private. Two main public registries are Docker Hub and the GitLab Container Registry. Docker Hub is the default registry where Docker looks for images.
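Because Docker Hub is the default registry, a short name such as alpine is shorthand for a longer image reference. The sketch below only illustrates that naming convention. The function expand_image_ref is a hypothetical helper, not a Docker command, and it assumes the usual defaults: the docker.io host, the library namespace for official images, and the latest tag.

```shell
# Hypothetical helper: expand a short image name into a full reference,
# following the defaults Docker applies when parts are left out.
expand_image_ref() {
  ref="$1"

  # The first path component is a registry host only if the name has a slash
  case "$ref" in
    */*) first="${ref%%/*}" ;;
    *)   first="" ;;
  esac

  case "$first" in
    "")                ref="docker.io/library/$ref" ;;  # official image, e.g. alpine
    *.*|*:*|localhost) ;;                               # registry host already given
    *)                 ref="docker.io/$ref" ;;          # user image, e.g. rocker/rstudio
  esac

  # Add the default tag when the last component has no tag or digest
  last="${ref##*/}"
  case "$last" in
    *:*|*@*) ;;
    *) ref="$ref:latest" ;;
  esac

  printf '%s\n' "$ref"
}

expand_image_ref alpine
expand_image_ref rocker/rstudio
```

An image hosted on another registry keeps its own host name, so expand_image_ref registry.gitlab.com/group/project:1.0 returns its argument unchanged (the group/project path here is a made-up example).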

Docker images at Docker Hub 🐳

Docker Hub (https://hub.docker.com) is the official registry for images.

Some popular images are:

hello-world. Used for testing your Docker Engine installation. (To download: docker pull hello-world)
alpine. A minimal Linux image, less than 5 MB in size. (To download: docker pull alpine)
ubuntu. The Ubuntu Linux distribution. (To download: docker pull ubuntu)
rocker/rstudio. RStudio image. (To download: docker pull rocker/rstudio)
jupyter/datascience-notebook. Jupyter Notebook Data Science Stack. (To download: docker pull jupyter/datascience-notebook)
pangeo/pangeo-notebook. Pangeo big data geosciences. (To download: docker pull pangeo/pangeo-notebook)

Getting Started with Docker 🐳

ℹ️ First, you need to open a user account on Docker.com.

Next, to use Docker, you need either Docker Desktop installed on your machine or access to a GitHub Codespaces development environment in a GitHub organization.

Docker Desktop installs all the Docker tools for container development and deployment. It provides all the software needed to run containers on your local machine.

Docker Desktop

It is good practice to also have a code editor such as VS Code installed on your computer; it allows you to easily synchronize files with GitHub repositories. Please install it on your machine.

Besides developing code in VS Code, you can add extensions for container development and deployment (Docker, Kubernetes, Google Cloud, Azure, and others) and for data science work (Python, Jupyter Notebooks, PyTorch, Azure ML, and more).

VS Code IDE

We will assume that you have your environment ready to start working with Docker.

Exercise 1. Testing your Docker environment with the hello-world docker image.

Open VS Code, and start a Terminal.

You can do a simple test by running the hello-world Docker image.

docker run hello-world

Docker will download the latest hello-world image (if it is not already on your machine) and run it as a container. You should get back a message like this:

Hello from Docker!
This message shows that your installation appears to be working correctly. ...

And it explains all the processes that were involved in printing the Hello World message to your Terminal.

Next, we can enter the command docker ps -a, and Docker will list all containers, including those that have already exited. Of the returned information, we need to note the CONTAINER ID. We can remove an exited container with the command docker rm CONTAINER_ID. We do not need to enter the full ID; the first few characters are sufficient, as long as they identify the container uniquely.
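After a few test runs, exited containers pile up in the docker ps -a listing. As a minimal sketch, the function below collects the IDs of all exited containers and removes them in one go. The function name remove_exited_containers is our own choice; the docker options it uses (-a, -q, --filter) are standard.

```shell
# Remove every container that has already exited.
remove_exited_containers() {
  # -a: include stopped containers, -q: print only the IDs,
  # --filter status=exited: keep only containers that have finished running
  ids=$(docker ps -a -q --filter "status=exited")

  if [ -z "$ids" ]; then
    echo "No exited containers to remove."
    return 0
  fi

  # Word splitting is intended here: each ID becomes one argument
  docker rm $ids
}
```

Call it with remove_exited_containers once Docker is running. Docker also ships a built-in command with a similar effect, docker container prune, which removes all stopped containers after asking for confirmation.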

Basic Docker commands 💻

The most useful option to start with is --help: docker --help

Initial commands:

Command	Description
`docker --help`	List all Docker command options
`docker create IMAGE_NAME`	Creates a stopped container from the image. If the image is not found locally, it is downloaded from Docker Hub first.
`docker run [Options] IMAGE_NAME`	Creates and starts a container from the image. If the image is not found locally, it is downloaded from Docker Hub first.
`docker rename CONTAINER NEW_NAME`	Rename a container.
`docker search TERM`	Searches Docker Hub for images.

Container and Image manipulation:

Command	Description
`docker container --help`	List Docker container options
`docker container ls`	Lists the running containers
`docker ps`	Lists the running containers (same as `docker container ls`)
`docker ps -a`	Lists all containers, running and stopped, with their status
`docker container rm CONTAINER_ID` or `docker rm CONTAINER_ID`	Removes a container by ID
`docker image --help`	List Docker image options
`docker image ls`	Lists the images available locally
`docker image rm IMAGE_ID` or `docker rmi IMAGE_ID`	Removes a specific local image

From the terminal you can list the running containers by typing docker ps or docker container ls; add -a to include stopped containers. These commands return information about each container (CONTAINER ID, IMAGE, COMMAND, CREATED, STATUS, PORTS, NAMES). The CONTAINER ID and IMAGE values are used as arguments for other docker commands. Every new container gets a new CONTAINER ID, and also a new randomly generated name unless you set one with --name.

📝 Note:

(We can substitute the full `CONTAINER_ID` or `IMAGE_ID` string with its first few characters, as long as that prefix is unique; 3 or 4 characters are usually enough.)
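The word "unique" matters here: a prefix is only usable when exactly one ID starts with it. The following sketch is an illustration of that rule and not how Docker implements it; resolve_prefix is a hypothetical function and the IDs are made up.

```shell
# Illustration only: accept an ID prefix when it matches exactly one ID.
# Usage: resolve_prefix PREFIX ID [ID...]
resolve_prefix() {
  prefix="$1"
  shift
  match=""
  count=0

  for id in "$@"; do
    case "$id" in
      "$prefix"*) match="$id"; count=$((count + 1)) ;;
    esac
  done

  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$match"
  else
    echo "prefix '$prefix' matches $count IDs" >&2
    return 1
  fi
}

# Two made-up container IDs that share the first character
resolve_prefix 3f 3fa9c1d2e4b7 3b77aa01c9d0
```

With these two IDs, the prefix 3f picks out the first one, while the prefix 3 alone is ambiguous and is rejected.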

Docker Start/Stop/Restart/Pause/Unpause CONTAINER_ID:

Command	Description
`docker start CONTAINER_ID`	Starts a stopped container
`docker stop CONTAINER_ID`	Stops a running container
`docker restart CONTAINER_ID`	Stops a container (if it is running) and starts it again
`docker pause CONTAINER_ID`	Pauses a running container
`docker unpause CONTAINER_ID`	Resumes a paused container
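To see what these commands do to a container, it helps to print its state after each one. The sketch below assumes a container that is currently running and was started without --rm (otherwise stopping it would also remove it). The function names container_state and lifecycle_demo are hypothetical; docker inspect with the .State.Status field is standard.

```shell
# Print the current state of a container (running, paused, exited, ...).
container_state() {
  docker inspect --format '{{.State.Status}}' "$1"
}

# Walk a running container through pause -> unpause -> stop -> start,
# printing its state after each step.
lifecycle_demo() {
  name="$1"
  for action in pause unpause stop start; do
    docker "$action" "$name" >/dev/null || return 1
    printf '%-8s -> %s\n' "$action" "$(container_state "$name")"
  done
}
```

Run it as lifecycle_demo CONTAINER_ID, using an ID or name taken from docker ps.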

Docker Volumes:

Command	Description
`docker volume --help`	List Docker volume options
`docker volume ls`	List available volumes
`docker volume create myvol`	Creates a local volume named `myvol`
`docker volume inspect myvol`	Shows detailed information about the volume
`docker volume rm myvol`	Removes the volume

List of Dockerfile instructions

If we start with a base Docker image and would like to customize it by adding packages to fit our needs, then we need to write a Dockerfile to build a new Docker image. The Dockerfile is a plain text file with one instruction per line, and by convention it has no file extension.

Instruction	Description
`FROM`	Initializes a new build stage and sets the base image.
`ARG`	Defines a variable that users can pass at build time to the builder with the `docker build --build-arg` option. `ARG` is the only instruction that may come before `FROM`, where it can supply a default value for the base image:
	`ARG VERSION=latest`
	`FROM base:${VERSION}`
`RUN`	Executes any command in a new layer on top of the current image and commits the results. Has 2 forms:
	`RUN <command>`. The shell form; the command runs in a shell.
	`RUN ["executable", "param1", "param2"]`. The exec form.
`CMD`	Provides the default command or parameters for a running container. Has 3 forms:
	`CMD ["executable", "param1", "param2"]`. The exec form (preferable).
	`CMD ["param1", "param2"]`. As default parameters to ENTRYPOINT.
	`CMD command param1 param2`. The shell form.
`LABEL`	Adds metadata to an image.
`EXPOSE`	Informs Docker that the container listens on the specified network ports at runtime. `EXPOSE` does not publish the port by itself; to reach it from the host, map it with `docker run -p 80:80`.
`ENV`	Sets an environment variable value.
`ADD`	Copies new files, directories, or remote file URLs from `<src>` and adds them to the filesystem of the image at the path `<dest>`.
`COPY`	Copies new files or directories from `<src>` and adds them to the filesystem of the image at the path `<dest>`.
	Has 2 forms:
	`COPY <src> ... <dest>`
	`COPY ["<src>",...,"<dest>"]`
`ENTRYPOINT`	Configures the executable that runs when the container starts. Has 2 forms:
	`ENTRYPOINT ["executable", "param1", "param2"]`. The exec form (preferable).
	`ENTRYPOINT command param1 param2`. The shell form.
	You can override the default value with `docker run --entrypoint` and an executable command.
`VOLUME`	Creates a mount point for externally mounted volumes.
	Format can be `VOLUME ["/home/user"]` or `VOLUME /home/user`.
`USER`	Sets the user name (or UID) to use when running the image and for any `RUN`, `CMD`, and `ENTRYPOINT` instructions that follow it in the `Dockerfile`.
`WORKDIR`	Sets the working directory for any `RUN`, `CMD`, `ENTRYPOINT`, `COPY`, and `ADD` instructions that follow it in the `Dockerfile`.
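To see several of these instructions working together, the sketch below writes a small Dockerfile and a script for it to copy. Everything in it is an assumption made for illustration: the directory dockerfile-demo, the script greet.sh, and the GREETING variable are names we made up. It shows ARG before FROM, ENV, WORKDIR, COPY, RUN, and the ENTRYPOINT plus CMD combination, where CMD supplies default parameters.

```shell
# Create a build context directory (the name is arbitrary)
mkdir -p dockerfile-demo

# A tiny script that the Dockerfile will COPY into the image
cat > dockerfile-demo/greet.sh <<'EOF'
#!/bin/sh
# Prints a greeting for the name given as the first argument
echo "$GREETING, $1!"
EOF

cat > dockerfile-demo/Dockerfile <<'EOF'
# ARG before FROM supplies a default tag for the base image
ARG VERSION=latest
FROM alpine:${VERSION}

LABEL description="Dockerfile instruction demo"

# Environment variable available inside the running container
ENV GREETING=Hello

# Later instructions run relative to this directory
WORKDIR /app
COPY greet.sh .
RUN chmod +x greet.sh

# ENTRYPOINT fixes the executable; CMD supplies its default parameter
ENTRYPOINT ["/app/greet.sh"]
CMD ["World"]
EOF
```

To try it, build with docker build -t dockerfile-demo dockerfile-demo and run with docker run --rm dockerfile-demo, which should print Hello, World!. Anything typed after the image name replaces the CMD parameter (docker run --rm dockerfile-demo Tucson), -e GREETING=Hola overrides the ENV value, and --build-arg VERSION=3.19 selects a different base image tag at build time.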

Exercise 2. Running an Alpine Linux docker image.

Next we will run a minimal Docker image based on Alpine Linux named alpine, which is only 5 MB in size. 🐧

Let's create a local volume called myvol.

docker volume create myvol

Then type docker volume ls to list it, and find out what docker volume inspect myvol reports.

A local volume can be attached to a Docker container using the -v myvol:/tmp option, which mounts myvol at the container's /tmp directory so that files written there are stored in the volume.

Run

docker run -it --rm -v myvol:/tmp --name AlpineLinux alpine

This command will download the latest Docker image of Alpine Linux and run a Docker container, using the following options:

-it runs the container in interactive mode inside the terminal.
--rm tells Docker to remove the container (not the image) when it exits.
-v myvol:/tmp mounts the volume myvol at the /tmp directory in the Docker container.
--name assigns the fixed name AlpineLinux to our running Docker container. If we don't specify a name, Docker will assign a random one on every run.
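The -v and --rm options work well together: the container is thrown away on exit, while whatever it wrote to the volume stays. A minimal sketch of this, assuming the myvol volume created above, writes a file from one container and reads it back from a second, brand-new one. The function name volume_roundtrip and the file name note.txt are made up for the example.

```shell
# Write a file through one container and read it back from another.
# Usage: volume_roundtrip VOLUME_NAME
volume_roundtrip() {
  vol="$1"

  # First container: writes a note under /tmp, then exits and is removed (--rm)
  docker run --rm -v "$vol":/tmp alpine \
    sh -c 'echo "saved in the volume" > /tmp/note.txt' || return 1

  # Second container: a fresh one, yet the note is still in the volume
  docker run --rm -v "$vol":/tmp alpine cat /tmp/note.txt
}
```

Calling volume_roundtrip myvol should print the note back, even though the container that wrote it no longer exists.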

Next, explore the Alpine container doing the following:

Use the apk update command, to update the available packages list.
If you want to use the nano editor, you will find that it is not installed. Use apk add nano to install it.
Change directory to /tmp and edit a sample.txt text file and save it.

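Because we used --rm, the Alpine container is removed when it stops, and we lose everything stored outside the mounted volume, including the packages we installed. Files under /tmp are kept in myvol, but they are not directly visible in our local directories. We need a way of getting our work out to a working directory on the local machine.

To save a copy of the file we created inside the Alpine Docker container, we can use the docker container cp command, which copies files/folders between a container and the local filesystem. Run it from a second terminal while the container is still running: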

docker container cp AlpineLinux:/tmp/sample.txt .

This will copy the file sample.txt from the /tmp directory of the running Docker container into the present working directory (.) of the terminal you are working in. The copy command works in both directions, to get files into the Docker container or out of it.

Once you finish using this container, type exit inside it, or from another terminal enter docker ps to find the CONTAINER ID and then enter docker stop CONTAINER_ID. Since this container was started with --rm, stopping it also removes it. Containers started without --rm can be stopped, paused/unpaused, or restarted later.

Customizing a Docker image 🐳

Say we want to enhance our Alpine Linux base image by adding an editor and the ability to compile C code. To do so, we add the nano editor and an essential C developer kit.

Create/Edit a file named Dockerfile in one of your directories.

# The base image
FROM alpine:latest

LABEL author="your-name" 
LABEL email="your@email-address"
LABEL version="v1.0"
LABEL description="This is your first Dockerfile"
LABEL date_created="2022-05-10"

# Install dev environment (editors & gcc compilers)
RUN apk update && \
    apk add nano && \
    apk add alpine-sdk

Then we can build a new customized Alpine Linux image for software development, using the following command:

docker build -t linux/alpine-sdk:latest .

The -t option sets the name and tag, linux/alpine-sdk:latest, of the customized Docker image. The final . in the above command is the build context, the directory where Docker looks for the Dockerfile and any files to copy; in this case it is the present working directory.

After running the above command, we can see that we now have a new Docker image in our list.

docker image list

Now we can run the new docker image and test it.

docker run -it --rm \
  --name alpine-sdk -v myvol:/home/src \
  linux/alpine-sdk:latest

In your Docker container change directory to /home/src. Edit/copy the usual Hello World! in C (hello.c), using the nano editor:

// Simple C program to display "Hello World"
  
// Header file for input output functions
#include <stdio.h>
  
// main function -
// where the execution of program begins
int main()
{
  
    // prints hello world
    printf("Hello World! \n");
  
    return 0;
}

Then compile and run it.

gcc -o hello hello.c
./hello

and see if your program worked.

Exercise 3. Running an RStudio Server.

If we use RStudio for data analysis, then the Docker image rocker/rstudio can be used.

To run it, from a terminal we enter the following command:

docker run -it --rm \
   -v "$(pwd)":/home/rstudio -e PASSWORD=rs_rocks \
   -p 8787:8787 rocker/rstudio:latest

Where the options we have used are:

-it keeps the process attached to the terminal, where all of its log output is displayed.
--rm deletes the container (not the image) after it stops.
-v "$(pwd)":/home/rstudio links the present working directory of the terminal running the process to the container directory /home/rstudio.
-e PASSWORD=rs_rocks sets a login password for the default user rstudio.
-p 8787:8787 maps the port, in the form -p HOST:CONTAINER. The first number is the port on the local machine that the browser connects to; the second is the port RStudio Server listens on inside the container.
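If port 8787 is already taken on your machine, only the host side of the -p mapping needs to change. The sketch below wraps the command in a function that takes the host port as an argument and reads the password from an environment variable instead of typing it on the command line. The names run_rstudio and RSTUDIO_PASSWORD are our own assumptions, and the function uses -d (detached mode) in place of -it so that the terminal stays free.

```shell
# Start RStudio Server on a chosen host port (default 8787).
# Usage: run_rstudio [HOST_PORT]
run_rstudio() {
  host_port="${1:-8787}"

  # ${VAR:?message} aborts with the message if the variable is not set
  docker run -d --rm \
    -v "$(pwd)":/home/rstudio \
    -e PASSWORD="${RSTUDIO_PASSWORD:?set RSTUDIO_PASSWORD first}" \
    -p "$host_port":8787 \
    rocker/rstudio:latest >/dev/null || return 1

  echo "RStudio Server: http://localhost:$host_port"
}
```

For example, after setting RSTUDIO_PASSWORD, run_rstudio 8080 starts the server and prints the address to open in the browser. The container side of the mapping stays at 8787 because that is the port the source command maps to inside the container.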

We then connect via browser to the RStudio Docker container landing page: http://localhost:8787

To log in, use the username rstudio and the password we set, rs_rocks.

Since we mapped our local present working directory to the /home/rstudio directory, all our R scripts can be accessed from there. Any saved edit will be saved in our local directory.

We are all set.

To stop the RStudio session, we need to save all our work from inside the RStudio container and then quit our session. Then use the command docker stop CONTAINER_ID to stop it.

See more information and R Docker Images options in The Rocker Project. 🚀

Exercise 4. Running a Jupyter Notebook.

Next, we show how to start a personal Jupyter Lab Notebook server in a local Docker container running the jupyter/datascience-notebook Docker image.

docker run -it --rm \
   -v "${PWD}":/home/jovyan/work \
   -p 8888:8888 jupyter/datascience-notebook

Where the following options are used:

-it keeps the process attached to the terminal, where all of its log output is displayed.
--rm deletes the container (not the image) after it stops.
-v "${PWD}":/home/jovyan/work links the present working directory of the terminal running the process to the container directory /home/jovyan/work, which will appear in the Files section of the Jupyter Notebook.
-p 8888:8888 is the port mapping, in the form -p HOST:CONTAINER. The host port number is used to connect to the local machine running the container via http://127.0.0.1:8888/lab?token=TOKEN_ID.

After running the above command, the terminal will receive all messages from the running container. Look for lines similar to the following, which give instructions on how to access your container.

To access the server, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
    Or copy and paste one of these URLs:
        http://43ec85338263:8888/lab?token=e53fd4dfeaca90cc78df58d1bde9bfb941f94041052eb5f6
     or http://127.0.0.1:8888/lab?token=e53fd4dfeaca90cc78df58d1bde9bfb941f94041052eb5f6

Copy the last URL, paste it into a web browser tab, and you are ready to start working.
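If the URL has scrolled out of view, it can be recovered from the container's log. The following is a small sketch that assumes the log lines look like the sample above; extract_jupyter_url is a hypothetical helper that filters whatever text it receives on standard input.

```shell
# Print the last http://127.0.0.1 access URL (with token) found on stdin.
extract_jupyter_url() {
  grep -o 'http://127\.0\.0\.1:[0-9]*/lab?token=[0-9a-f]*' | tail -n 1
}

# Try it on a line copied from the sample log above
echo "     or http://127.0.0.1:8888/lab?token=e53fd4dfeaca90cc78df58d1bde9bfb941f94041052eb5f6" \
  | extract_jupyter_url
```

With a running container, feed it the real log: docker logs CONTAINER_ID 2>&1 | extract_jupyter_url. The 2>&1 is there because the server writes its messages to standard error.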

You can copy a Jupyter Notebook into the working directory where your terminal is running, and it will show inside the Jupyter container, since the local directory is mapped to the working directory in the Jupyter Notebook. All changes made will be saved to the local machine.

To stop the Jupyter Notebook container, you can press Ctrl-C twice in the terminal to shut down the server, or you can use the standard docker stop CONTAINER_ID.

You can read more about how to use this Docker image at Jupyter Docker Stacks.

Basic References 🐳

Official Docs

Supplementary

CyVerse: Container Camp Spring 2022.
Docker for beginners. Prakhar Srivastav.
Docker Cheat Sheet
Awesome Docker
Why use Docker Containers for Machine Learning Development. Shashank Prasanna, AWS.

University of Arizona, D7 Data Science Institute, 2022.


Carlos Lizárraga, Data Lab, Data Science Institute, University of Arizona.

CC BY-NC-SA 4.0
