# AUA, DS 229 – MLOps
### Week 7 – Introduction to Docker

***

<center><img src="./images/docker_logo.png" width=400 height = 700/></center>

**Docker** is an open-source platform that allows you to create, deploy, and manage applications using containerization technology. **Containerization** is a way of running multiple applications on the same machine in an isolated manner, without interfering with each other.

With Docker, you can package your application and all its dependencies into a container, which can then be deployed to any environment that supports Docker. **This ensures that your application runs consistently across different environments**, making it easier to move between development, testing, and production environments.

***
## The Linux Command Line   

### Navigating the file system  
| Command | Description |
| :------| :-----------|
| `pwd` | to print the path of the current working directory |
| `ls` | to list the files and directories |
| `ls -l` | to list in long format (permissions, owner, size, modification time) |
| `cd /` | to go to the root directory |
| `cd bin` | to go to the bin directory |
| `cd ..` | to go one level up |
|  `cd ~` | to go to the home directory |
  
### Manipulating files and directories 
| Command | Description |
| :------| :-----------|
|`mkdir test`              | to create the test directory  |
|`mv test docker`          | to rename a directory   |
|`touch file.txt`          | to create file.txt  |
|`mv file.txt hello.txt`   | to rename a file   |
|`rm hello.txt`            | to remove a file   |
|`rm -r docker`            | to recursively remove a directory  |
  
### Editing and viewing files  
| Command | Description |
| :------| :-----------|
|`nano file.txt`        | to edit file.txt  |
|`cat file.txt`         | to view file.txt  |
|`less file.txt`        | to view with scrolling capabilities  |
|`head file.txt`        | to view the first 10 lines  |
|`head -n 5 file.txt`   | to view the first 5 lines   |
|`tail file.txt`        | to view the last 10 lines   |
|`tail -n 5 file.txt`   | to view the last 5 lines   |
  
### Searching for text  
| Command | Description |
| :------| :-----------|
|`grep hello file.txt`        | to search for hello in file.txt  |
|`grep -i hello file.txt`     | case-insensitive search   |
|`grep -i hello file*.txt`    | to search in files with a pattern  |
|`grep -i -r hello .`         | to search in the current directory  |
  
### Finding files and directories  
| Command | Description |
| :------| :-----------|
|`find`               | to list all files and directories  |
|`find -type d`       | to list directories only  |
|`find -type f`       | to list files only  |
|`find -name "f*"`    | to filter by name using a pattern  |
  
### Managing environment variables  
| Command | Description |
| :------| :-----------|
|`printenv`           | to list all variables and their values  |
|`printenv PATH`      | to view the value of PATH  |
|`echo $PATH`         | to view the value of PATH  |
|`export name=bob`    | to set a variable in the current session  |
  
### Managing processes 
| Command | Description |
| :------| :-----------|
|`ps`                 | to list the running processes  |
|`kill 34`            | to kill the process with ID 34  |
  
### Managing users and groups  
| Command | Description |
| :------| :-----------|
|`useradd -m john`    | to create a user with a home directory  |
|`adduser john`       | to add a user interactively  |
|`usermod`            | to modify a user  |
|`userdel`            | to delete a user  |
|`groupadd devs`      | to create a group   |
|`groups john`        | to view the groups for john  |
|`groupmod`           | to modify a group  |
|`groupdel`           | to delete a group  |
  
### File permissions  
| Command | Description |
| :------| :-----------|
|`chmod u+x deploy.sh`    | to give the owning user execute permission  |
|`chmod g+x deploy.sh`    | to give the owning group execute permission  |
|`chmod o+x deploy.sh`    | to give everyone else execute permission  |
|`chmod ug+x deploy.sh`   | to give the owning user and group execute permission  |
|`chmod ug-x deploy.sh`   | to remove the execute permission from the owning user and group |
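
The symbolic modes in the table map onto individual permission bits. As a minimal sketch (the file name `deploy.sh` follows the table; the temporary directory and the helper names `add_bits` / `remove_bits` are made up to keep the example self-contained), the same `u+x` and `ug-x` changes can be made from Python with `os.chmod` and the constants in the `stat` module:

```python
import os
import stat
import tempfile


def add_bits(path, bits):
    """Turn on the given permission bits, e.g. stat.S_IXUSR for `chmod u+x`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | bits)


def remove_bits(path, bits):
    """Turn off the given permission bits, e.g. for `chmod ug-x`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~bits)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "deploy.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho deploying\n")
    os.chmod(script, 0o644)  # start from rw-r--r--

    add_bits(script, stat.S_IXUSR)  # chmod u+x deploy.sh
    after_add = stat.filemode(os.stat(script).st_mode)

    remove_bits(script, stat.S_IXUSR | stat.S_IXGRP)  # chmod ug-x deploy.sh
    after_remove = stat.filemode(os.stat(script).st_mode)

print(after_add)     # -rwxr--r--
print(after_remove)  # -rw-r--r--
```

The strings printed here use the same layout as the first column of `ls -l`.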

#### Some examples

In [None]:
!pwd

In [None]:
!ls

In [None]:
!mkdir class_notes 

In [None]:
!ls

In [None]:
!mkdir class_notes

In [None]:
!man mkdir

In [None]:
!man mkdir | head -n 5

In [None]:
!man mkdir | tail -n 5

In [None]:
!mkdir -p class_notes

In [None]:
!touch ./class_notes/notes.txt

In [None]:
!ls class_notes/

In [None]:
!echo "Spring break is coming :D"

In [None]:
!echo "Spring break is coming :D" >> ./class_notes/notes.txt

In [None]:
!cat ./class_notes/notes.txt

In [None]:
!echo "Spring break is coming :D" >> ./class_notes/notes.txt

In [None]:
!cat ./class_notes/notes.txt

In [None]:
!echo Yuhu! > ./class_notes/notes.txt

In [None]:
!cat ./class_notes/notes.txt

In [None]:
!rm ./class_notes/notes.txt
!touch ./class_notes/notes.txt
!cat ./class_notes/notes.txt

!echo 1 > ./class_notes/notes.txt
!echo 2 > ./class_notes/notes.txt
!echo 3 > ./class_notes/notes.txt

!cat ./class_notes/notes.txt

In [None]:
!rm ./class_notes/notes.txt
!touch ./class_notes/notes.txt
!cat ./class_notes/notes.txt

!echo 1 >> ./class_notes/notes.txt
!echo 2 >> ./class_notes/notes.txt
!echo 3 >> ./class_notes/notes.txt

!cat ./class_notes/notes.txt
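
The difference between `>` (overwrite) and `>>` (append) in the two cells above has a direct counterpart in Python's file modes. The sketch below is only an analogy, written against a temporary directory so it does not touch `class_notes/`: mode `"w"` behaves like `>`, and mode `"a"` behaves like `>>`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    notes = os.path.join(tmp, "notes.txt")

    # Mode "w" truncates the file on every open, like `echo N > notes.txt`.
    for n in (1, 2, 3):
        with open(notes, "w") as f:
            f.write(f"{n}\n")
    with open(notes) as f:
        overwritten = f.read()

    os.remove(notes)

    # Mode "a" adds to the end of the file, like `echo N >> notes.txt`.
    for n in (1, 2, 3):
        with open(notes, "a") as f:
            f.write(f"{n}\n")
    with open(notes) as f:
        appended = f.read()

print(repr(overwritten))  # '3\n'
print(repr(appended))     # '1\n2\n3\n'
```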

In [None]:
!cat ./class_notes/notes.txt | grep 1

In [None]:
!cat ./class_notes/notes.txt | grep 2

In [None]:
!rm -rf class_notes/

In [None]:
# The PATH variable is an environment variable containing an ordered list of paths 
# that Linux will search for executables when running a command. Using these paths 
# means that we don't have to specify an absolute path when running a command.

!printenv PATH

In [None]:
!echo $PATH
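
To make the "ordered list of paths" idea concrete, here is a rough sketch of the lookup a shell performs. The function `find_on_path` and the command name `hello` are invented for this illustration; the standard library's `shutil.which` does the same kind of search and is what you would normally use.

```python
import os
import shutil
import tempfile


def find_on_path(command, path):
    """Return the first executable named `command` in the ordered `path` string, or None."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Put an executable called `hello` in both directories.
    for directory in (first, second):
        tool = os.path.join(directory, "hello")
        with open(tool, "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        os.chmod(tool, 0o755)

    search_path = os.pathsep.join([first, second])
    found = find_on_path("hello", search_path)

    # The earlier directory wins because the list is searched in order.
    first_wins = found == os.path.join(first, "hello")
    # shutil.which gives the same answer for the same search path.
    same_as_which = shutil.which("hello", path=search_path) == found
    # A command that exists in none of the directories is not found.
    missing = find_on_path("no-such-command", search_path)

print(first_wins)     # True
print(same_as_which)  # True
print(missing)        # None
```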

In [None]:
!printenv | head -n 6

In [None]:
!export MY_ENV=MY_VALUE  # Visible only inside the shell that runs this line; in a notebook, use `%env MY_ENV=MY_VALUE` to keep it.
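
In a notebook, each `!` line runs in its own short-lived shell, so a variable exported on one line is not visible on the next. The sketch below reproduces this with `subprocess` (it assumes a POSIX `sh` is available) and contrasts it with `os.environ`, which changes the current Python process and is inherited by child processes started afterwards.

```python
import os
import subprocess

os.environ.pop("MY_ENV", None)  # make sure we start without the variable

# Each call starts a fresh shell, just like a separate `!` line in a notebook.
subprocess.run("export MY_ENV=MY_VALUE", shell=True, check=True)
second_shell = subprocess.run(
    'echo "${MY_ENV:-unset}"', shell=True, check=True,
    capture_output=True, text=True,
).stdout.strip()

# Setting os.environ affects this process and every child started after it.
os.environ["MY_ENV"] = "MY_VALUE"
child_shell = subprocess.run(
    'echo "${MY_ENV:-unset}"', shell=True, check=True,
    capture_output=True, text=True,
).stdout.strip()

print(second_shell)  # unset
print(child_shell)   # MY_VALUE
```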

***

## Virtual Machines vs Containers

A virtual machine (VM) is a software-based emulation of a physical computer or server that allows multiple operating systems and applications to run on a single physical machine. A virtual machine typically runs on top of a hypervisor (e.g. VirtualBox), which is responsible for managing and allocating the physical resources of the host machine (such as CPU, memory, and storage) to the virtual machines.

Each virtual machine is isolated from other virtual machines and the host machine, allowing multiple operating systems and applications to coexist on the same physical machine without interfering with each other. This makes it possible to run applications that require different operating systems or software configurations on the same physical machine.

Virtual machines are commonly used in data centers and cloud computing environments to maximize the utilization of physical hardware resources, reduce hardware costs, and simplify the deployment and management of applications. They are also used for testing and development, allowing developers to test their software on multiple operating systems without needing to use separate physical machines for each one. 

Docker containers, just like virtual machines, provide a way to run applications in an isolated environment, but there are some disadvantages of VMs compared to Docker containers:

1) **Resource overhead**: Virtual machines are heavy and resource-intensive. Each VM requires its own (full) operating system, which takes up significant disk space and memory. In contrast, Docker containers share the host operating system (kernel), which makes them much lighter and more efficient in terms of resource usage.

2) **Slow startup time**: Virtual machines can take several minutes to start up, depending on the size of the virtual hard disk, the amount of memory assigned, and other factors. In contrast, Docker containers can start up in just a few seconds, making them ideal for applications that need to be quickly deployed and scaled.

3) **Limited portability**: Virtual machines are tied to a specific hypervisor and hardware configuration, which can limit their portability. If you want to move a VM from one environment to another, you may need to make significant changes to the VM configuration. In contrast, Docker containers are highly portable and can be run on any host that supports Docker.


A container is an isolated environment for running an application. It’s essentially an operating-system process with its own file system. All containers on a host share the operating system of the host, while each VM requires its own OS. In particular, **containers share the kernel of the host OS**. [[image source](https://medium.com/@anandthanu/how-do-you-explain-an-os-kernel-to-a-5-year-old-92a08755e014)]
><center><img src="./images/kernel.png" width=500 height = 700/></center>
> The <b>kernel</b> is the core component of an operating system that provides low-level services to other parts of the system, such as device drivers, memory management, and process management. It is responsible for managing the system's hardware resources, including the CPU, memory, and I/O devices.
The kernel is the first component of the operating system to load into memory during the boot process and remains in memory throughout the lifetime of the system. It is responsible for managing system resources and providing an interface between applications and hardware.

**Docker Image**: A Docker image is a read-only template that contains a set of instructions for creating a Docker container. It is essentially a snapshot of a file system at a particular point in time, with all the necessary dependencies and configuration files to run an application. Docker images are built using a **Dockerfile**, which is a text file that contains a series of commands for building the image.

A **Dockerfile** is a plain text file that contains instructions for Docker to package up an application into a **Docker image**. A Docker image contains **EVERYTHING** that an application needs to run - a cut-down OS, a runtime environment (e.g. Python), application files and dependencies, and environment variables. Once an image is ready, we can tell Docker to run the application inside a special process called a container. As a result, we can bundle an application into an image and run it on any machine that runs Docker.  
We can share our images by publishing them on Docker registries. The most popular Docker registry is **Docker Hub**.

<center><img src="./images/image_container.png" width=800 height = 500/></center>  

[[image source](https://medium.com/swlh/understand-dockerfile-dd11746ed183)]

**Containers are isolated!**  
We can start multiple containers from the same image. If we create or modify files inside one container, the other containers (even those started from the same image) won't see the changes. This is because each container gets its own file system from the image, so changes made in one container are invisible to the others.

A running container uses an isolated filesystem. This custom filesystem is provided by the image. Since the image contains the container’s filesystem, it must contain everything needed to run an application - all dependencies, configurations, scripts, binaries, etc. The image also contains other configuration for the container, such as environment variables, a default command to run, and other metadata.
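
One way to picture this isolation is a toy model in which the image is a read-only mapping and every container adds its own private writable layer on top. This is only an analogy built on `collections.ChainMap`, not how Docker is implemented, and the file paths and the helper `start_container` are invented for the illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# Toy model: the image is a read-only mapping of path -> content.
image = MappingProxyType({
    "/app/main.py": "<application code>",
    "/app/requirements.txt": "numpy==1.23.5",
})


def start_container(image):
    """Give a new 'container' its own empty writable layer on top of the image."""
    return ChainMap({}, image)


first = start_container(image)
second = start_container(image)

# Writes land in the first container's private layer only.
first["/app/notes.txt"] = "written inside the first container"
first["/app/requirements.txt"] = "edited in the first container only"

print("/app/notes.txt" in second)        # False
print(second["/app/requirements.txt"])   # numpy==1.23.5
print(image["/app/requirements.txt"])    # numpy==1.23.5
```

Reads fall through to the shared image, while writes stay in the container that made them, which is why the second container and the image itself are unchanged.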

**Summary**  
A Docker image is a static, read-only template that packages an application with all its dependencies, while a Docker container is a lightweight, isolated running instance of an image. Docker images can be built, shared, and downloaded, while Docker containers can be started, stopped, and deleted, and can run on any system that supports Docker.

**A Docker image/container contains only the application layer of the OS and uses the kernel and CPU of the host machine.** That is why Docker containers start so fast: the host's kernel is already running, so a new container simply shares it instead of booting an operating system of its own.

## Let's practice!

For demonstration we will develop a simple random integer generator for the interval `[a, b)` using `numpy`.

In [None]:
!docker --version

In [None]:
!python application/source/random_num_gen.py --a=6 --b=20

In [None]:
!touch requirements.txt

In [None]:
!mv requirements.txt application/

In [None]:
!pip list | head -n 5

In [None]:
!pip list | grep numpy

In [None]:
!echo "numpy==1.23.5" >> application/requirements.txt

In [None]:
!ls application/

In [None]:
!cat application/requirements.txt

In [None]:
!pip install -r application/requirements.txt

Now that our random number generator is developed, we will start dockerizing it.

### A step-by-step guide to dockerize an application

Here are the general steps to dockerize an application:
<div class="alert alert-block alert-success">
    
1) <b>Choose a base image</b>: You will need to choose a base image that matches the operating system and dependencies required by your application. For example, you may choose an image with a specific version of Linux or an image that already has the necessary programming language or framework installed.

2) <b>Write a Dockerfile</b>: A Dockerfile is a script that contains instructions for building a Docker image. The Dockerfile should start with the chosen base image, and then include commands to install any additional dependencies required by your application.

3) <b>Build the Docker image</b>: Use the Dockerfile to build the Docker image by running the docker build command in your terminal. This will create a new image that includes your application code and all its dependencies.

4) <b>Run the Docker container</b>: Once the Docker image is built, you can run it as a container using the docker run command. This will start the application inside the container, which runs in isolation from the host system.

5) <b>Publish the Docker image</b>: If you want to share your Docker image with others or use it on another machine, you can publish it to a Docker registry like Docker Hub or Amazon ECR. This step is optional.
</div>

These are the basic steps involved in dockerizing an application, but the specific details will depend on the application's requirements and complexity. 
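
Steps 3 to 5 each correspond to one or two `docker` CLI calls. As a rough sketch, the helper below only assembles and prints those calls; the image name `my_app`, the tag `v1`, the Docker Hub user `my_user`, and both function names are placeholders invented for this example. Steps 1 and 2 (choosing a base image and writing the Dockerfile) are done by hand, and pushing requires a prior `docker login`.

```python
import shlex
import subprocess


def dockerize_commands(image, tag, hub_user, context="."):
    """Return the docker CLI calls for steps 3-5 as argument lists."""
    local_ref = f"{image}:{tag}"
    remote_ref = f"{hub_user}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local_ref, context],  # step 3: build the image
        ["docker", "run", "--rm", local_ref],           # step 4: run a container
        ["docker", "tag", local_ref, remote_ref],       # step 5: name it for the registry
        ["docker", "push", remote_ref],                 #         and publish it
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = dockerize_commands("my_app", "v1", "my_user")
run_all(commands)  # dry run: prints the commands without calling docker
```

Keeping `dry_run=True` as the default makes it safe to inspect the commands before anything is built or pushed.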


#### 1) Let's go to [Docker Hub](https://hub.docker.com/) (Docker's official image registry) and choose an image that suits our needs


Since our application only needs Python, it makes sense to choose a base image that already has Python installed. So let's search for the [official Python Docker image](https://hub.docker.com/_/python) (just search for 'python' on Docker Hub). 

<center><img src="./images/python_slim.png" width=800 height = 700/></center>

#### 2) Now that we have already chosen our base image, we can start writing the Dockerfile

#### Dockerfile

* **FROM** - specify base image
* **WORKDIR** - specify the working directory; all subsequent instructions are executed from this directory
* **COPY** - copy files and directories from the current directory (where Dockerfile resides) into an image
* **ADD** - add files or directories (has some additional features compared with COPY, like local-only tar extraction and remote URL support)
* **RUN** - execute OS commands
* **ENV** - set environment variables
* **EXPOSE** - document the port the containerized application listens on (it does not publish the port; use `docker run -p` for that)
* **USER** - specify user that should run the application
* **CMD** - set the default command, or the default arguments passed to **ENTRYPOINT**
* **ENTRYPOINT** - set the executable that runs when the container starts


**ENTRYPOINT** and **CMD** are both instructions that can be used in a Dockerfile to specify how a container should run an application. However, they have different purposes and behaviors.

**ENTRYPOINT** is used to specify the command that should be run when the container is started. It is typically used to set the primary command that the container should run. For example, if you were building a Docker image for a Python script, you might use **ENTRYPOINT** to specify that the container should run the python command followed by the name of your script. You can also specify additional arguments to the command using the **CMD** instruction.

**CMD** is used to specify default arguments that should be passed to the **ENTRYPOINT** command when the container is started. If a user specifies their own command when starting the container (using `docker run <image> <command>`), the <b>CMD</b> instruction is overridden. For example, if your <b>ENTRYPOINT</b> is set to run a Python script, you might use <b>CMD</b> to specify default arguments like the path to the script or any command-line arguments that should be passed to the script.
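
The interaction between the two instructions can be summarized in a few lines of Python. This is a simplified model that only covers the exec (JSON array) form; it ignores the shell form and the `--entrypoint` flag of `docker run`. The function `resolve_command` and the script name `script.py` are hypothetical.

```python
def resolve_command(entrypoint, cmd, run_args=()):
    """Return the argv a container starts with, given exec-form ENTRYPOINT and CMD.

    Arguments given after the image name in `docker run` replace CMD;
    ENTRYPOINT stays in front either way.
    """
    arguments = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + arguments


entrypoint = ["python", "script.py"]  # hypothetical ENTRYPOINT
cmd = ["--a=6", "--b=20"]             # hypothetical CMD (default arguments)

print(resolve_command(entrypoint, cmd))
# ['python', 'script.py', '--a=6', '--b=20']
print(resolve_command(entrypoint, cmd, ["--a=5", "--b=7"]))
# ['python', 'script.py', '--a=5', '--b=7']
```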

In [None]:
!touch Dockerfile

In [None]:
!ls

**Copy/paste this content into your Dockerfile**

```Dockerfile
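# Base image: the official Python 3.8 image in its slim variant.
FROM python:3.8-slim
# Install dependencies first so this layer is reused when only the code changes.
COPY ./application/requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code into the image.
COPY application/ ./
# ENTRYPOINT is the program to run; CMD holds default arguments that `docker run` can override.
ENTRYPOINT ["python", "source/random_num_gen.py"]
CMD ["--a=6", "--b=20"]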
```

The `--no-cache-dir` option tells pip not to save the downloaded packages locally. The cache would only help if pip were run again to install the same packages, which is not the case when building an image. We usually pass `--no-cache-dir` to shrink the image size by disabling the cache.

#### 3) Build an image

**Run the following command in a terminal to build an image**

```bash
docker build -t randint_img .
```

P.S.  
We don't run it here because JupyterLab/Jupyter Notebook doesn't visualize progress bars.

#### 4) Run a container

In [None]:
!docker images

In [None]:
# Run the docker container:
!docker run randint_img --a=5 --b=7

In [None]:
!docker run randint_img

In [None]:
!docker ps -a

Choose a container id and print its logs.

In [None]:
!docker logs 0604da8526cc

#### 5) Publish docker image

**Run the following command in a terminal to build your final image (recommended, although you can also use the most recent working image)**

```bash
docker build -t randint_img:prod .
```

In [None]:
!docker login

In [None]:
!docker images

In [None]:
!docker tag randint_img:prod davitp3/randint_img:prod  # Tagging to make it possible to push into Docker Hub.

In [None]:
!docker images 

In [None]:
!docker push davitp3/randint_img:prod  # Pushing the image into Docker Hub.

#### Bonus: Let's pull and run our image assuming we don't have anything related to the project.

In [None]:
!docker system prune -a -f  # Remove stopped containers plus unused images, networks, and build cache.

In [None]:
!docker images

Go to [Docker Hub](https://hub.docker.com/) and search for your published image.

In [None]:
!docker pull davitp3/randint_img:prod

In [None]:
!docker images

In [None]:
!docker run davitp3/randint_img:prod --a=-4 --b=96

#### Cleaning up the workspace 😃

In [None]:
!rm -rf application/requirements.txt Dockerfile
!docker system prune -a -f

<center><img src="./images/spring_brake.png" width=800 height = 600/></center>  

# References
- [Docker](https://www.docker.com/)