# What is Docker?
  
**Gain an Introduction to Docker**

Docker is a tool used to develop, run, and ship containers. It's an essential part of every data professional's toolbelt, helping to create robust, secure, and scalable applications and workflows. In this course, you'll become a Docker pro, gaining hands-on experience using the Docker CLI.

**Explore the Docker Basics**

You'll start this course with an introduction to containers: what they are, their uses, and their advantages. You'll learn about Docker Engine, Docker's tool to create, run, and manage containers. Then you'll learn the difference between a container and an image, and how containers compare to virtual machines. Along the way, you'll learn the Docker terminology and get hands-on experience with Docker commands using the Docker Command Line Interface.

**Learn About Docker Containers & Images**

As you progress, you'll learn how to create and manage Docker containers using Dockerfiles and Dockerfile instructions. To wrap up, you'll learn Docker image security best practices to make your images safe and secure.

## Resources
  
**Notebook Syntax**
  
<span style='color:#7393B3'>NOTE:</span>  
- Denotes additional information deemed to be *contextually* important
- Colored in blue, HEX #7393B3
  
<span style='color:#E74C3C'>WARNING:</span>  
- Significant information that is *functionally* critical  
- Colored in red, HEX #E74C3C
  
---
  
**Links**
  
[Docker Website](https://www.docker.com)  
  
---
  
**Notable Functions**
  
<table>
  <tr>
    <th>Index</th>
    <th>Operator</th>
    <th>Use</th>
  </tr>
  <tr>
    <td>1</td>
    <td>NaN</td>
    <td>NaN</td>
  </tr>
</table>
  
---
  
**Language and Library Information**  
  
CLI (Command Line Interface)
  
---
  
**Miscellaneous Notes**
  
NaN

## Containers and their advantages
  
In this course we’ll give you hands-on experience managing, running, and creating containers. By the end of this course, you’ll be able to use containers in your own workflows and understand when you should.
  
**Prerequisites**
  
Before taking Introduction to Docker, we advise completing the prerequisite course. A basic understanding of how to work with the shell is needed in this course. We'll use nano extensively to edit files, together with several commands to find our way around the file system.
  
**Containers**
  
In this first chapter, we'll introduce containers and Docker, and look at the differences between containers and virtual machines. A container is a portable computing environment. It contains everything needed to run a workflow or application, including dependencies, code, and configuration.
  
**Making it less abstract**
  
We can think of a container like a new computer on which we copy our code or workflow and install all needed dependencies. Once everything is installed and configured, we make backups of that computer. Imagine we could now use that backup on another computer, and everything we installed and configured would work just like it did originally. This backup is similar to a container; both are a packaging of code together with its dependencies (including the operating system) and configuration. That's where the analogy ends since a container has many advantages over something like a backup.
  
<center><img src='../_images/containers-and-their-advantages-docker.png' alt='img' width='740'></center>
  
**Containers run identically every time**
  
One of the main benefits of containers is that whenever a container is run, the workflow or application it contains will behave identically. In other words, containers provide reproducibility. If a container X gives an output Y for some input, it will give the identical output for that same input every time it is run, whether now, in five minutes, or in two months.
  
<center><img src='../_images/containers-and-their-advantages-docker1.png' alt='img' width='740'></center>
  
**Containers run identically everywhere**
  
The second main benefit of containers is that wherever a container is run, it will behave identically. In other words, containers provide portability. Portable means the container will run the same on our computer, a colleague's computer, and the cloud. There is no risk of removed dependencies, lost configuration files, or other changes that break our application.
  
<center><img src='../_images/containers-and-their-advantages-docker2.png' alt='img' width='740'></center>
  
**Isolation**
  
This is possible because of isolation between the container and the rest of the environment; running a container will have no impact outside of the container and vice versa. Anything happening outside the container will not affect the result of a container. A container has limited resource access to the operating system it is running on; everything else is kept separate.
  
<center><img src='../_images/containers-and-their-advantages-docker3.png' alt='img' width='740'></center>
  
**Containers provide security**
  
Because containers are isolated from each other, a compromised container does not automatically expose the other containers on the same host or the host itself. The compromised container still only has access to limited resources on the host and nothing more. This makes containers not only great for safely deploying applications but also for quickly prototyping workflows. You can be sure that whatever you do in the container won't affect anything outside of it and that you can start with a clean slate at any point.
  
<center><img src='../_images/containers-and-their-advantages-docker4.png' alt='img' width='740'></center>
  
**Containers are lightweight**
  
Besides security, portability, and reproducibility, containers have another advantage: they are lightweight. In other words, they use few extra resources compared to running an application outside of a container. Containers also have little overhead compared to alternatives that provide isolation. This is especially relevant when comparing containers to virtual machines, which we will do in detail later on.
  
- Security
- Portability
- Reproducibility
- Lightweight, in comparison to running an application:
  - Outside of a container
  - Using a virtual machine
  
**Containers and data science**
  
All these advantages make containers relevant for data science. Containers make any task or workflow reproducible, not only on our own machine but also everywhere else. They help us avoid many issues when sharing our work: dependencies are automatically included, and datasets can be included too. Most importantly, we can be sure our code will work on our colleague's machine. Additionally, the lightweight nature of containers makes them easier to share than alternatives.
  
- Automatically reproducible
- Dependencies are automatically included
- Datasets can be included
- Code will work on your colleague's machine
- Easier sharing than alternatives
  
**Let's practice!**
  
You now have a grasp on what a container is, its advantages, and why these advantages are relevant in data science. Let's practice!

### What is included in a container?
  
A container is a portable computing environment, but what is included exactly?
  
---
  
Possible Answers
  
- [ ] A container contains dependencies and code needed to run a workflow or application, but you have to configure your application from outside the container.
- [ ] A container contains the configuration and code needed to run a workflow or application, but you have to install dependencies separately.
- [ ] A container contains dependencies and configuration needed to run a workflow or application, but you cannot put your code into a container.
- [x] A container contains everything needed to run a workflow or application: dependencies, code, and configuration.
  
Exactly! A container includes everything needed to run a workflow or application!

## The Docker Engine
  
Docker is an open-source tool that allows us to create, run and manage containers. Docker first launched in 2013, and even though containers had existed for more than a decade, it was with the launch of Docker that containers exploded in popularity.
  
**Docker ecosystem**
  
Over time, Docker has grown to be part of a large ecosystem of tools around containers; we will focus on Docker Engine, which is everything you need to create, run and manage containers. Other parts of this ecosystem are, for example, Docker Compose, a tool for defining and running multi-container Docker applications, and Kubernetes, a system for container scheduling and management.
  
**Docker Engine**
  
Docker Engine has two main parts: a server and a client. The client, called the Docker client, is a command line interface used to talk to the server. The server is a background process that requires no user interaction. Such a process is called a daemon, a term we will encounter repeatedly in reference to the Docker server. In addition to the Docker client and daemon, Docker Engine also includes API specifications, which define how you can interact with the Docker daemon. These APIs are not only used by the Docker client to talk to the daemon but also specify how other applications can work with the daemon.
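
To make the client, daemon, and API relationship more concrete, here is a minimal Python sketch of a program acting as its own small client. It assumes the daemon listens on the default Linux Unix socket `/var/run/docker.sock` and uses the Engine API's container listing endpoint `/containers/json`; the helper names `containers_path`, `UnixSocketConnection`, and `list_containers` are made up for this illustration and are not part of Docker.

```python
import http.client
import json
import socket

# Assumption: the daemon listens on the default Unix socket used on Linux.
DOCKER_SOCKET = "/var/run/docker.sock"


def containers_path(include_stopped=False):
    """Build the Engine API path that lists containers."""
    path = "/containers/json"
    if include_stopped:
        # The 'all' query parameter also includes containers that are not running.
        path += "?all=true"
    return path


class UnixSocketConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def list_containers(include_stopped=False, socket_path=DOCKER_SOCKET):
    """Ask the daemon for its containers, the way a client would."""
    connection = UnixSocketConnection(socket_path)
    try:
        connection.request("GET", containers_path(include_stopped))
        response = connection.getresponse()
        return json.loads(response.read())
    finally:
        connection.close()
```

On a machine with a running daemon and permission to access the socket, calling `list_containers()` should return the same kind of information the Docker client shows with `docker ps`. The point of the sketch is that the Docker client is just one of many possible programs speaking this API.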
  
1. [Docker Engine](https://docs.docker.com/engine/)
  
<center><img src='../_images/the-docker-engine.png' alt='img' width='740'></center>
  
**The Docker daemon**
  
The Docker daemon is responsible for managing all Docker objects, such as images, containers, and more. However, we can't directly tell the daemon what to do; we need a client to give us a human-usable interface to it. Here the Docker command line interface is the default option, but there are others, like Docker Desktop, which gives us a Graphical User Interface to manage our containers.
  
1. [Docker Engine](https://docs.docker.com/engine/)
2. [Docker Architecture](https://docs.docker.com/get-started/overview/#docker-architecture)
  
<center><img src='../_images/the-docker-engine1.png' alt='img' width='740'></center>
  
**Images and Containers**
  
The daemon manages both images and containers, but there is a difference between them. While an image is a blueprint or recipe, like an idle copy of a hard drive with all the software we want to run, a container is a running image, like a copy of that same hard drive plugged into a running computer. We could create an image with Ubuntu and Python 3.9 installed. Once we start this image, we'll have a running container with Ubuntu and Python 3.9 where we can execute our code.
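
The blueprint idea can be sketched as a toy model in Python. This is only an analogy, not how Docker is implemented: `Image`, `Container`, and `start` are hypothetical names invented for this example. The image is read-only, and every container started from it gets its own identity and its own state.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)


@dataclass(frozen=True)
class Image:
    """A read-only blueprint: a name plus the software baked into it."""
    name: str
    software: tuple


@dataclass
class Container:
    """A running instance started from an image, with its own state."""
    image: Image
    container_id: int = field(default_factory=lambda: next(_next_id))
    files: dict = field(default_factory=dict)


def start(image):
    """Start a new container from an image; the image itself is unchanged."""
    return Container(image=image)


image = Image(name="ubuntu-python", software=("Ubuntu", "Python 3.9"))
first = start(image)
second = start(image)

# Changes made inside one container stay in that container.
first.files["result.txt"] = "only in the first container"
```

Both containers point back to the same image, yet the file written in `first` does not appear in `second`, and the image cannot be modified by either of them.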
  
  
<center><img src='../_images/the-docker-engine2.png' alt='img' width='740'></center>
  
**Containers are processes**
  
Up until now, we talked about containers abstractly. To better understand containers, we can think of them as processes. When we start a container, a process is started, just as when we start a text editor, Spotify, or any other application. What makes a container process different is its permissions to resources like the file system, memory, and network.
  
<center><img src='../_images/the-docker-engine3.png' alt='img' width='740'></center>
  
Many resources are not only off-limits to a container process but also invisible to it. For example, instead of seeing all the files on your hard drive, the process is given access to only a single folder and cannot see files outside of that folder.
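
The idea of hiding something from a process, rather than merely denying access, can be illustrated with an ordinary child process in Python. This is a loose analogy under simplifying assumptions: it hides an environment variable, whereas Docker relies on operating system features such as Linux namespaces to hide files, processes, and networks. The function name `child_can_see` and the variable `HOST_ONLY_SETTING` are invented for the example.

```python
import os
import subprocess
import sys


def child_can_see(variable, hidden=()):
    """Start a child process and report whether it can see `variable`.

    Names listed in `hidden` are left out of the child's environment,
    so from the child's point of view they do not exist at all.
    """
    child_env = {k: v for k, v in os.environ.items() if k not in hidden}
    check = "import os, sys; print(sys.argv[1] in os.environ)"
    result = subprocess.run(
        [sys.executable, "-c", check, variable],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


# A setting that exists in the parent ("host") process.
os.environ["HOST_ONLY_SETTING"] = "secret"

visible = child_can_see("HOST_ONLY_SETTING")
hidden = child_can_see("HOST_ONLY_SETTING", hidden=("HOST_ONLY_SETTING",))
```

In the second call the child does not get an error when looking for the variable; it simply has no way of knowing the variable exists, which is the same kind of separation a container process experiences for files outside its folder.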
  
<center><img src='../_images/the-docker-engine4.png' alt='img' width='740'></center>
  
**Containers are isolated processes**
  
Hiding resources, rather than only blocking access to them, may seem like a small difference, but it allows running a process that is isolated from the rest of the machine. From the inside, this looks like an entirely separate operating system: the container brings its own operating system files and tools rather than relying on those installed on the host. The Docker daemon ensures that the OS running in the container is unaware of other containers and the host OS. The operating system inside the container can start and manage its own processes without interfering with any processes running on the host OS. In other words, the operating system in the container is kept unaware of anything happening outside itself, isolated from the host and from other containers.
  
<center><img src='../_images/the-docker-engine5.png' alt='img' width='740'></center>
  
**Let's practice!**
  
Let's ingrain our understanding of the Docker ecosystem and Docker Engine with some exercises!

### Which parts does the Docker Engine consist of?
  
Docker is part of a now much larger ecosystem of applications built to support containers in various ways. Starting out on our journey to understand and use containers, we’re focusing on the Docker Engine and its components. Of which parts does the Docker Engine consist?
  
---
  
Possible Answers

- [ ] Just the Docker daemon, other applications like the Docker client can then use the Docker daemon to work with containers.
- [ ] The Docker daemon, Docker client, and Docker Compose. The latter is used for multi-container applications.
- [ ] The Docker daemon, Docker client, Docker Compose, and Kubernetes are everything you need to run complex container setups in production environments.
- [ ] The Docker client and daemon together make up the Docker client-server interface of Docker Engine.
- [x] The Docker client, daemon, and its API specifications.
  
Correct! The Docker client and the Docker daemon make up the two main parts of Docker's client-server architecture. The API specifications define how the client and external applications can talk with the daemon.

### Containers and images
  
Getting an intuitive understanding of Docker and the many new concepts surrounding it can be difficult. The difference between images and containers in particular can seem blurry because they are closely related. Did you fully grasp the difference?
  
---
  
Possible Answers
  
- [ ] An image is just a different name for a container, but they really mean the same thing.
- [ ] Containers are what you use in the Docker ecosystem; images are just the more general name.
- [x] Images are blueprints, while a container is a running image; the image is just the template it was started from.
- [ ] Containers are blueprints, while an image is a running container; the container is just the template it was started from.
  
Exactly! It seems like you got the difference between images and containers figured out.

## Containers vs. Virtual Machines
  
When learning about containers, the comparison to Virtual Machines is inevitable.
  
**Containers and Virtual Machines**
  
Both Virtual Machines and containers aim to run software side by side on the same physical machine safely, that is, without interfering with each other. In that sense, Virtual Machines achieve many of the same goals as containers. However, the two work very differently under the hood, which makes their use cases different.
  
<center><img src='../_images/containers-vs-virtual-machines.png' alt='img' width='740'></center>
  
**Resource Virtualization**
  
Running software side by side on the same physical machine safely is done using virtualization. Virtualization means that resources like RAM, CPU, or Disk can be split up and look like separate resources to the software using them. For example, a hard disk of 100GB can be virtualized to look like four hard disks of 25GB. This way, different pieces of software can each use 25GB, yet they can't interfere with the other parts. Both containers and VMs are virtualization technologies.
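
The disk example above can be sketched as a small Python model. It is a simplified illustration, not how any real virtualization layer is built: `VirtualDisk` and `split_disk` are hypothetical names, and sizes are tracked as plain numbers of gigabytes.

```python
class VirtualDisk:
    """One slice of a physical disk that looks like a disk of its own."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0

    def write(self, size_gb):
        """Store data, refusing anything that would exceed this slice."""
        if self.used_gb + size_gb > self.capacity_gb:
            raise ValueError("not enough space on this virtual disk")
        self.used_gb += size_gb


def split_disk(total_gb, parts):
    """Split a physical disk into equally sized virtual disks."""
    if total_gb % parts != 0:
        raise ValueError("total size must divide evenly")
    return [VirtualDisk(total_gb // parts) for _ in range(parts)]


# One 100 GB disk presented as four separate 25 GB disks.
disks = split_disk(100, 4)
disks[0].write(20)
```

Each piece of software only ever sees its own 25 GB disk. Filling up `disks[0]` leaves the other three untouched, and a write that would exceed 25 GB is rejected instead of spilling into a neighbor's space.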
  
<center><img src='../_images/containers-vs-virtual-machines1.png' alt='img' width='740'></center>
  
**Containers vs Virtual Machines**
  
The key difference between containers and virtual machines is that virtual machines virtualize the entire machine down to the hardware, whereas containers are virtualized in a software layer above the operating system. This means separation is stronger in VMs, as only the hardware is shared, while containers also share the host operating system.
  
<center><img src='../_images/containers-vs-virtual-machines2.png' alt='img' width='740'></center>
  
**Security of Virtualization**
  
This stronger separation makes VMs more secure than containers and points us to the main drawback of containers: there is always a possibility for attackers to get access to the host OS. This, in turn, can give access to all containers running on the same machine. Since attackers breaking out of a container to the host operating system is the main risk of using containers, Docker and other container providers spend extensive resources on making their containers as secure as possible. The risk of attackers accessing the host is limited when using an industry-standard container provider like Docker. Nonetheless, it is worth considering VMs when security is paramount, for example, when working with sensitive data.
  
<center><img src='../_images/containers-vs-virtual-machines3.png' alt='img' width='740'></center>
  
**Containers are lightweight**
  
While containers have a slight disadvantage in the amount of security they provide, there are several advantages of using containers over VMs. One significant advantage is their size in memory and on disk compared to VMs. In other words, containers require less RAM and less disk space. Containers are significantly smaller because they only need to include a small part of a full OS, sharing the rest of the OS with the Host OS and other containers.
  
<center><img src='../_images/containers-vs-virtual-machines4.png' alt='img' width='740'></center>
  
**Advantages of containers**
  
The smaller size is at the base of many advantages of containers. It makes containers faster to start and stop, and also faster to distribute, change, or update. Because of their small size, there is a large ecosystem of pre-made containers with many popular software applications like programming languages, databases, or web servers pre-installed. In comparison, VMs can quickly become several gigabytes in size, which means they are often built from scratch for every use case.
  
<center><img src='../_images/containers-vs-virtual-machines5.png' alt='img' width='740'></center>
  
**Advantages of Virtual Machines**
  
Of course, slightly better security is not the only advantage of VMs. If your use case needs a Graphical User Interface, then for now, a VM is usually the best option. Containers are geared toward command-line applications, and running GUI applications in them takes extra setup, while VMs fully support both GUIs and command lines.
  
<center><img src='../_images/containers-vs-virtual-machines6.png' alt='img' width='740'></center>
  
**Let's practice!**
  
Now it is your turn to show you understand the difference between Virtual Machines and containers.

### What is virtualization?
  
You just learned about virtual machines and containers, both of which are virtualization technologies, but do you remember what that means exactly?
  
---
  
Possible Answers
  
- [ ] The goal of virtualization is to run multiple pieces of software isolated from each other on the same machine.
- [ ] Virtualization allows an operating system to run on top of another operating system.
- [ ] Virtualization is when system resources like RAM, CPU, or Disk can be ‘virtualized’ and represented as multiple resources.
- [x] All of the above.
  
Exactly, it looks like you get what virtualization stands for!

### How do containers and virtual machines relate?
  
Which of the following answers best describes the relationship between containers and virtual machines?
  
---
  
Possible Answers

- [ ] Containers and virtual machines are two names for different ways to use exactly the same technology.
- [ ] Both are ways to virtualize hardware. However, only virtual machines are really used in industry because they offer much better security than containers.
- [x] Containers and virtual machines are both virtualization technologies, but they have different strengths and weaknesses, making their use cases different.
  
Well done! Both containers and virtual machines have their uses. Even though containers' popularity far surpasses that of VMs, VMs are still used extensively today.