# AUA, DS 229 – MLOps
### Week 9 – Docker++

***

<center><img src="./images/docker_logo.png" width=400 height = 700/></center>

## Linux: users, groups and permissions

The command `ls -l` can be used to see the permissions and ownership of files and directories. The `-l` option produces a detailed (long) listing, and adding `-a` (`ls -la`) also displays hidden files.

In [None]:
!ls -l  # You can also try `ls -la` for hidden files.

The permissions and ownership of a file or directory are displayed in the first column, which consists of 10 characters. The second column, containing a single number, is the number of hard links to the file; for a directory on typical Linux file systems this is 2 plus the number of its subdirectories. The owner is shown in the following column, followed by the group name, the size in bytes, and the date and time of the last modification. The file's name is displayed at the end. For example, 
> `drwxr-xr-x` are the permissions  
> `7` is the number of hard links  
> `david` is the owner of the file or directory  
> `staff` is the group that owns the file or directory  
> `160` is the size in bytes   
> `Mar 11 15:20` is the date and time of the last modification   
> `images` is the name of the directory  
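
Since this notebook runs on a Python kernel, the same columns can also be read programmatically. The sketch below is a minimal illustration, assuming a Unix-like system where the standard `pwd` and `grp` modules exist; `ls_long_fields` is a hypothetical helper, not part of any library. It relies on `os.lstat` and `stat.filemode`, which renders the same 10-character string that `ls -l` prints.

```python
import grp
import os
import pwd
import stat
import time


def ls_long_fields(path: str) -> dict:
    """Collect the columns that `ls -l` prints for a single path."""
    st = os.lstat(path)  # lstat describes a symlink itself, as ls -l does

    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry (common inside containers)
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid without a group entry
        group = str(st.st_gid)

    return {
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "links": st.st_nlink,               # number of hard links
        "owner": owner,
        "group": group,
        "size": st.st_size,                 # in bytes
        "mtime": time.strftime("%b %d %H:%M", time.localtime(st.st_mtime)),
        "name": os.path.basename(path),
    }
```

Calling `ls_long_fields("images")` from this notebook's directory should report the same mode, link count, owner, group and size as the `images` row of `ls -l`.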


In the first column, the very first character shows the file type: `-` for a regular file and `d` for a directory. The remaining nine characters show the permissions, broken down into 3 sets of 3 characters each: the first set represents the owner's permissions, the second set the group's permissions, and the third set everyone else's permissions. The possible characters are:
- `r`: read permission (file: allows it to be opened and read; directory: allows the user to list the files within it)
- `w`: write permission (file: allows its content to be modified; directory: allows the user to create, rename, or delete files within it)
- `x`: execute permission (file: allows it to be executed as a program or script; directory: allows the user to enter it and access the files and directories inside)
- `-`: no permission

<center><img src="./images/linux_permissions.png" width=600 height = 600/></center>

[[image source](https://pressidium.com/blog/deciphering-linux-file-system-permissions/)]
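
To make the 10-character layout concrete, here is a small sketch that splits a mode string into the file type and the three permission triplets. `parse_mode_string` and `describe` are hypothetical helper names; only the `-`, `d` and `l` type characters are named, and special characters such as `s` or `t` are simply ignored by `describe`.

```python
def parse_mode_string(mode: str) -> dict:
    """Split an `ls -l` mode string such as 'drwxr-xr-x' into its parts."""
    if len(mode) != 10:
        raise ValueError(f"expected 10 characters, got {len(mode)}: {mode!r}")
    kinds = {"-": "regular file", "d": "directory", "l": "symbolic link"}
    return {
        "type": kinds.get(mode[0], "other"),
        "owner": mode[1:4],    # first triplet
        "group": mode[4:7],    # second triplet
        "others": mode[7:10],  # third triplet
    }


def describe(triplet: str) -> list:
    """Translate a triplet such as 'r-x' into permission names."""
    names = {"r": "read", "w": "write", "x": "execute"}
    return [names[c] for c in triplet if c in names]


parts = parse_mode_string("drwxr-xr-x")
for who in ("owner", "group", "others"):
    print(who, describe(parts[who]))
```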

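The command `chmod` is short for **change mode**. `chmod` is used to change permissions on files and directories. Permissions can be given either symbolically with letters or as numbers (also known as octal form). In the symbolic form, `u`, `g`, `o` and `a` select *who* is affected (owner, group, others, all), the operators `+`, `-` and `=` add, remove or set permissions exactly, and the permission letters are:
- `r`: Read
- `w`: Write
- `x`: Execute
- `X`: Execute, but only if the file is a directory or already has execute permission for some user
- `s`: Set user or group ID on execution (setuid/setgid)
- `t`: Sticky bit (historically: save program text on swap device)
- `u`: The permissions the file currently has for its owner (e.g. `chmod g=u file` copies the owner's permissions to the group)
- `g`: The permissions the file currently has for users in the same group
- `o`: The permissions the file currently has for others not in the group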


Some examples:
- `chmod u+x file.txt` – adds execute permission (`+x`) for the owner (`u`) of "file.txt"
- `chmod go-w file.txt` – removes write permission (`-w`) for the group (`g`) and other users (`o`) of "file.txt"
- `chmod -R a+x directory` – recursively (`-R`) adds execute permission (`+x`) for all users (`a`) to the "directory" and its contents
- `chmod 644 file.txt` – gives read and write permission (`6`) to the owner, and read permission (`4`) to the group and other users of "file.txt" (octal format of `chmod`)

<center><img src="./images/octal_chmod.png" width=400 height = 400/></center>
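
The symbolic and octal forms describe the same 9 permission bits: each octal digit is the sum of `r`=4, `w`=2 and `x`=1 for one class. The sketch below models that arithmetic; it is not a reimplementation of `chmod`. `apply_symbolic`, `to_rwx` and `from_rwx` are hypothetical helpers, and only single clauses of the form `[ugoa]+[+-=][rwx]*` are supported (no `X`, `s`, `t`, comma-separated clauses or umask handling).

```python
import re

SHIFT = {"u": 6, "g": 3, "o": 0}  # position of each class's rwx triplet
BITS = {"r": 4, "w": 2, "x": 1}   # value of each permission inside a triplet


def apply_symbolic(mode: int, spec: str) -> int:
    """Apply one symbolic clause such as 'u+x', 'go-w' or 'o=r' to `mode`."""
    match = re.fullmatch(r"([ugoa]+)([-+=])([rwx]*)", spec)
    if match is None:
        raise ValueError(f"unsupported clause: {spec!r}")
    who, op, perms = match.groups()
    classes = "ugo" if "a" in who else who
    value = sum(BITS[p] for p in set(perms))
    for c in set(classes):
        shift = SHIFT[c]
        if op == "+":
            mode |= value << shift
        elif op == "-":
            mode &= ~(value << shift)
        else:  # '=': clear the whole triplet, then set exactly `perms`
            mode = (mode & ~(0o7 << shift)) | (value << shift)
    return mode


def to_rwx(mode: int) -> str:
    """Render the 9 low permission bits, e.g. 0o644 -> 'rw-r--r--'."""
    chars = []
    for shift in (6, 3, 0):
        triplet = (mode >> shift) & 0o7
        for ch, bit in (("r", 4), ("w", 2), ("x", 1)):
            chars.append(ch if triplet & bit else "-")
    return "".join(chars)


def from_rwx(text: str) -> int:
    """Inverse of to_rwx: 'rwxr-xr-x' -> 0o755."""
    if len(text) != 9:
        raise ValueError("expected 9 characters")
    value = 0
    for ch in text:
        value = (value << 1) | (ch != "-")
    return value


print(oct(apply_symbolic(0o644, "u+x")))  # 0o744, like `chmod u+x` on a 644 file
print(to_rwx(0o644))                      # rw-r--r--
```

To apply a result to a real file, pass it to `os.chmod`, e.g. `os.chmod("file.txt", apply_symbolic(stat.S_IMODE(os.stat("file.txt").st_mode), "u+x"))`, which has the same effect as `chmod u+x file.txt`.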

> In addition to the most common read/write/execute file permissions, there are some additional modes that you might find useful, specifically the `+t` mode (sticky bit) and the `+s` mode (setuid bit). These functions describe the behavior of files and executables in multi-user situations.

>When set on a directory, the sticky bit, or `+t` mode, means that a file within that directory can be deleted or renamed only by the file's owner, the directory's owner, or root, regardless of which other users have write access to the directory (for example through group membership). This is useful for shared directories such as `/tmp`, or for a directory owned by a group through which a number of users share write access to a given set of files.
>
>It’s important to note that setting the sticky bit on a file does not prevent a user with write permission on the enclosing directory from deleting or renaming the file; the sticky bit must be set on the enclosing directory. The sticky bit has no function on modern Linux systems when set on files.
>
>To set the sticky bit on a directory named **my_directory**, issue the following command: `chmod +t my_directory`. To remove the sticky bit from a file or directory, use `chmod -t`. Note that to change the sticky bit, you need to be either root or the file/directory owner. The root user is able to delete directories and files within them regardless of the status of the sticky bit.

> The setuid bit (`u+s`), when set on an executable file, lets users with permission to execute that file run it with the permissions of the file's owner. For instance, if a program is owned by the root user and executable by the marketing group, members of the marketing group run it as if they were the root user. This may pose security risks, so executables should be evaluated carefully before receiving the setuid bit. To set it on a file named **file**, issue `chmod u+s file`; the related setgid bit, set with `chmod g+s file`, makes the program run with the file's group instead of its owner.
>
>On a directory, the setgid bit (`g+s`) has a different effect: files created inside it inherit the directory's group rather than the default group of the user who created them. (Linux ignores the setuid bit on directories.)
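
The special bits show up in the `x` positions of the mode string: `t` for the sticky bit in the "others" triplet, and `s` for setuid/setgid in the owner/group triplet. The sketch below, assuming a Linux system, prints them with `stat.filemode` and reproduces `chmod +t` by OR-ing `stat.S_ISVTX` into the current mode; `add_sticky` and `has_sticky` are hypothetical helper names.

```python
import os
import stat

# How the special bits are rendered by `ls -l` (file-type bit + permission bits).
STICKY_DIR = stat.S_IFDIR | 0o1777  # a world-writable shared directory such as /tmp
SETUID_EXE = stat.S_IFREG | 0o4755  # rwxr-xr-x executable after `chmod u+s`
SETGID_EXE = stat.S_IFREG | 0o2755  # rwxr-xr-x executable after `chmod g+s`

for label, mode in [("sticky dir", STICKY_DIR), ("setuid", SETUID_EXE), ("setgid", SETGID_EXE)]:
    print(f"{label:>10}: {stat.filemode(mode)}")
# sticky dir: drwxrwxrwt
#     setuid: -rwsr-xr-x
#     setgid: -rwxr-sr-x


def add_sticky(path: str) -> None:
    """Equivalent of `chmod +t path`: keep the current bits and add S_ISVTX."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_ISVTX)


def has_sticky(path: str) -> bool:
    """True if the sticky bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)
```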

**Ownership**  
By default, all files are “owned” by the user who creates them and by that user’s default group. To change the ownership of a file, use the `chown` command in the `chown user:group /path/to/file` format (changing the owning user requires root privileges, e.g. `sudo chown ...`). In the following example, the ownership of the “file” file is changed to the “linux_user” user in the “developers” group:  
`chown linux_user:developers file`.

To change the ownership of a directory and all the files contained inside, use the recursive option `-R`. In the following example, the ownership of **directory** and its contents is changed to the “linux_user” user in the “developers” group:  `chown -R linux_user:developers directory`.
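
To see what `-R` does, the sketch below walks a directory tree with `os.walk` and calls `shutil.chown` on every entry, roughly like `chown -R user:group directory`. `chown_recursive` is a hypothetical helper; symbolic links are skipped for simplicity, and, just like `chown`, giving files to another user requires root privileges.

```python
import os
import shutil


def chown_recursive(root: str, user, group) -> int:
    """Rough equivalent of `chown -R user:group root`; returns how many paths changed.

    `user`/`group` may be names or numeric ids (shutil.chown accepts both).
    Symbolic links are skipped in this sketch.
    """
    shutil.chown(root, user=user, group=group)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            shutil.chown(path, user=user, group=group)
            changed += 1
    return changed
```

As root, `chown_recursive("directory", "linux_user", "developers")` would mirror the `chown -R` example above.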






### File permissions  
| Command | Description |
| :------| :-----------|
|`chmod u+x file`    | give the owning user execute permission  |
|`chmod g+x file`    | give the owning group execute permission  |
|`chmod o+x file`    | give everyone else execute permission  |
|`chmod ug+x file`   | give the owning user and group execute permission  |
|`chmod ug-x file`   | remove execute permission from the owning user and group |

### Managing users and groups  
| Command | Description |
| :------| :-----------|
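|`useradd -m john`    | create a user with a home directory  |
|`adduser john`       | add a user interactively (Debian/Ubuntu wrapper around `useradd`, recommended there)  |
|`usermod -aG devs john` | modify a user, e.g. append john to the devs group  |
|`userdel -r john`    | delete a user (`-r` also removes the home directory)  |
|`groupadd devs`      | create a group   |
|`groups john`        | view the groups john belongs to  |
|`groupmod -n developers devs` | modify a group, e.g. rename devs to developers  |
|`groupdel devs`      | delete a group  |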

#### A small example demonstrating the necessity of `chmod`

With the common default umask of `022`, newly created files are assigned `-rw-r--r--` permissions, so nobody can execute them.

In [None]:
!touch dummy_script.sh
!ls -l

In [None]:
!echo "echo hello world" > dummy_script.sh
!cat dummy_script.sh

In [None]:
! ./dummy_script.sh  # We don't have permission to execute the script.

In [None]:
!chmod u+x dummy_script.sh

In [None]:
!ls -l

In [None]:
! ./dummy_script.sh

In [None]:
!rm -rf dummy_script.sh
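
The `-rw-r--r--` default mentioned above is not hard-coded: a new regular file is typically requested with mode `0o666`, and the process's *umask* removes bits from it, so `0o666 & ~0o022 == 0o644`. The sketch below checks this with `os.umask` and `os.open`; `create_with_umask` is a hypothetical helper. Because the umask is process-wide, it is restored right after the file is created.

```python
import os


def create_with_umask(path: str, umask: int) -> int:
    """Create an empty file under `umask` and return its permission bits."""
    previous = os.umask(umask)  # set the new mask; os.umask returns the old one
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)  # requested mode
        os.close(fd)
    finally:
        os.umask(previous)      # the umask is process-wide, so restore it
    return os.stat(path).st_mode & 0o777
```

With a umask of `0o077`, the same call yields `0o600` (`-rw-------`), which is why the umask, not `touch`, decides who can read a new file.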

## Dockerfile optimization, interactive mode of containers, managing multiple users, volumes

<center><img src="./images/image_container.png" width=1000 height = 800/></center>

### Building images

`docker build -t <image-name> <path-to-the-directory-that-contains-Dockerfile>` – build an image    
`docker build -t <image-name>:<TAG> <path-to-the-directory-that-contains-Dockerfile>` – build an image and assign a tag   
`docker image tag <image-name> <image-name>:<new-TAG>` – re-tag an image  
`docker images` or `docker image ls` – list all images  
`docker image prune` – remove dangling images (untagged images not used by any container); add `-a` to remove all unused images  
`docker image rm <TAG1|NAME1> <TAG2|NAME2> ...` – remove one or more images


`docker ps` – show running docker containers (*ps* is short for processes)  
`docker ps -a` – show all containers including stopped ones  
`docker container prune` – remove all stopped containers  
`docker run <image-name>` – run a docker container  
`docker run -d <image-name>` – run a docker container in detached mode (i.e. in the background)  
`docker run --name=<container-name> <image-name>` – run a docker container and give a name to it (otherwise docker assigns random names)  

`docker run -it <image-name>` – run a docker container in interactive mode  
`docker run -it <image-name> <command-to-run-at-the-start-of-container>` – run a docker container in interactive mode and execute a command when starting the container (e.g. `bash`, `sh` for shell)  
`docker logs <container-id>` – see the output of a container  
`docker logs -f <container-id>` – see the output of a container in real-time   
`docker logs -n 6 <container-id>` – see the last 6 lines of the output of a container  
`docker logs -n 20 -t <container-id>` – see the last 20 lines of the output of a container with timestamps  
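
`docker logs` shows what the container's main process writes to stdout and stderr. When stdout is not a terminal, Python block-buffers `print` output, so `docker logs -f` may lag behind; setting `ENV PYTHONUNBUFFERED=1` in the Dockerfile or using `print(..., flush=True)` avoids that. A `logging.StreamHandler` flushes after every record, so the sketch below (the function name `make_logger` is hypothetical) writes timestamped lines straight to stdout.

```python
import logging
import sys


def make_logger(name: str = "app", stream=None) -> logging.Logger:
    """Logger writing timestamped lines to stdout, where `docker logs` reads them."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # configure once, even if called repeatedly
        handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # don't also pass records to the root logger
    return logger


make_logger().info("application started")
```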


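When you pull an image that is published for several platforms (a multi-platform image), Docker automatically downloads the variant that matches your CPU architecture (e.g. `amd64` or `arm64`); `docker pull --platform` can request a different one.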

When we build an image with `docker build` and pass the directory of the Dockerfile, that directory becomes the *build context*: the Docker client sends **its entire content** to the Docker engine, which then executes the instructions written in the Dockerfile. As a result, the Docker engine has no access to files outside the build context, so a Dockerfile cannot, for example, `COPY` files from outside that directory into an image.

#### COPY & ADD
`COPY <file-1> <file-2> <path-to-a-directory>/` - copy files into a directory (the destination must end with `/` when copying several files)  
`COPY <filename_*.extension> <path-to-a-directory>/` - copy all files matching the pattern, e.g. all files that start with 'filename_', into a directory  
`COPY . <path-to-a-directory>/` - copy everything from the build context into the specified directory   
`COPY ["<file-1>", "<file-2>", "<path-to-a-directory>/"]` - copy files into a directory; use this JSON form when names contain spaces  

We can specify working directory inside an image and use relative paths for the next commands:  
`WORKDIR /dir1`   
`COPY <file-1> <file-2> .`  
instead of   
`COPY <file-1> <file-2> /dir1/`  

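`ADD <archive.tar.gz> <path-to-a-directory>/` - like `COPY`, but automatically extracts local tar archives (including gzip, bzip2 and xz compressed ones); note that `.zip` files are **not** extracted. `ADD` can also download files from URLs.

It is recommended to use `COPY` instead of `ADD` unless you need these extra features, because `COPY` behaves more predictably.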

#### ENV
`ENV <VAR-NAME>=<VALUE>` - set an environment variable

#### EXPOSE
`EXPOSE <port>` - document the port the application inside the container listens on; it does not publish the port to the host (that is done with `docker run -p`)

#### RUN & CMD
`RUN` is a build-time instruction while `CMD` is runtime. The commands specified with `RUN` are executed to build the image. The commands specified with `CMD` are executed when starting a container.

<mark>If a Dockerfile contains several `CMD` instructions, only the last one takes effect.</mark>

There are 2 forms of `CMD`:
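* Shell form: `CMD command param1 param2`, which runs the command through a shell (`/bin/sh -c` on Linux, `cmd /S /C` on Windows)
* Exec form: `CMD ["executable", "param1", "param2"]`, which runs the executable directly without a shell (RECOMMENDED: the process then receives signals such as the `SIGTERM` sent by `docker stop`)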

#### ENTRYPOINT
`ENTRYPOINT` and `CMD` are both Dockerfile instructions that specify how a container should run an application. However, they have different purposes and behaviors. Like `CMD`, `ENTRYPOINT` has a shell form and an exec form.

`ENTRYPOINT` specifies the command that is run when the container starts. **It is typically used to set the *primary* command that the container should run**. For example, if you were building a Docker image for a Python script, you might use `ENTRYPOINT ["python", "script.py"]` so that the container always runs the Python interpreter followed by the name of your script. Additional default arguments for that command can be given with the `CMD` instruction.

`CMD` **specifies default arguments that are passed to the** `ENTRYPOINT` **command (exec form) when the container starts. If a user passes arguments when starting the container (using** `docker run <image> <args>`**), they override the** `CMD` **instruction**, while the `ENTRYPOINT` stays in place (it can only be replaced with `docker run --entrypoint`). Without an `ENTRYPOINT`, `CMD` is the complete default command, and `docker run <image> <command>` replaces it. For example, if your `ENTRYPOINT` is set to run a Python script, you might use `CMD` to specify the default command-line arguments that should be passed to the script.
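
These interaction rules are easy to mix up, so here is a small model of them, following the "how CMD and ENTRYPOINT interact" table in Docker's Dockerfile reference. It is a sketch, not Docker's code: `effective_command` is a hypothetical function, lists stand for the exec form and strings for the shell form, and the shell form is assumed to be wrapped in Linux's default `/bin/sh -c`.

```python
def effective_command(entrypoint=None, cmd=None, run_args=None) -> list:
    """Model the argv a container starts with.

    entrypoint, cmd: list (exec form), str (shell form) or None (not set).
    run_args: words typed after the image name in `docker run`, or None.
    """
    def to_argv(value):
        if value is None:
            return []
        if isinstance(value, str):  # shell form is run via /bin/sh -c
            return ["/bin/sh", "-c", value]
        return list(value)          # exec form is used as-is

    if isinstance(entrypoint, str):
        # Shell-form ENTRYPOINT ignores both CMD and the docker run arguments.
        return to_argv(entrypoint)
    args = list(run_args) if run_args else to_argv(cmd)  # run args replace CMD
    return to_argv(entrypoint) + args


print(effective_command(["python", "script.py"], ["--a=6", "--b=20"]))
# ['python', 'script.py', '--a=6', '--b=20']
print(effective_command(["python", "script.py"], ["--a=6", "--b=20"], ["--a=1"]))
# ['python', 'script.py', '--a=1']
```

The second call corresponds to `docker run <image> --a=1`: the arguments replace `CMD` but are still appended to the exec-form `ENTRYPOINT`.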

In [None]:
!touch Dockerfile

<div class="alert alert-block alert-danger">
<b>Action</b>:
    Copy/paste this content into Dockerfile
</div> 

```Dockerfile
FROM python:3.8-slim

WORKDIR /app

COPY ./application/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
```

<div class="alert alert-block alert-danger">
    
and build an image by running `docker build -t randint_img .`
</div> 

***

<div class="alert alert-block alert-danger">
<b>Task 1</b>:
    <b>Ignoring files and directories</b>

Run the container by starting a shell program (`docker run -it randint_img sh`) and observe that it contains redundant files and directories. They are there because the last step of the `Dockerfile` copies everything from the build context. To exclude certain files or directories, list them in a special file named `.dockerignore` (similar to `.gitignore`) placed in the root of the build context, i.e. here in the same directory as the `Dockerfile`.
</div> 

In [None]:
# Create `.dockerignore`.
!touch .dockerignore

# Add redundant directories and files.
!echo images/ >> .dockerignore
!echo "dockerpp.ipynb" >> .dockerignore

!cat .dockerignore

<div class="alert alert-block alert-danger">
    
Re-build the image `docker build -t randint_img .` and run `docker run -it randint_img sh` to make sure that ignored files don't exist.
</div> 
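
Conceptually, `.dockerignore` filters the build context before the Docker client sends it to the engine. The sketch below approximates that filtering to predict which files end up in the context; `build_context_files` and `read_dockerignore` are hypothetical helpers. It is a simplification: patterns are matched with Python's `fnmatch` against each relative path and its parent directories, whereas Docker's own matcher also supports `**` and `!` exception lines, and its `*` does not cross `/`.

```python
import fnmatch
import os


def read_dockerignore(lines) -> list:
    """Keep non-empty, non-comment patterns; 'images/' is treated like 'images'."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns


def build_context_files(context_dir: str, ignore_lines) -> list:
    """Approximate the files `docker build` would send from `context_dir`."""
    patterns = read_dockerignore(ignore_lines)
    kept = []
    for dirpath, _dirnames, filenames in os.walk(context_dir):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), context_dir)
            parts = rel.split(os.sep)
            # Excluded if the path itself or any parent directory matches a pattern.
            prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
            if not any(fnmatch.fnmatchcase(p, pat) for p in prefixes for pat in patterns):
                kept.append("/".join(parts))
    return sorted(kept)
```

In this notebook, `build_context_files(".", Path(".dockerignore").read_text().splitlines())` (with `from pathlib import Path`) should list roughly what remains in the image after `COPY . .`.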

In [None]:
!rm -rf .dockerignore  # Cleaning.

***

<div class="alert alert-block alert-danger">
<b>Task 2</b>:
    <b>Environment variables</b>

Create an environment variable inside `Dockerfile` (below, `MY_VAR`), re-build the image and print its value in a container shell (e.g. `echo $MY_VAR`).
</div> 


```Dockerfile
FROM python:3.8-slim

WORKDIR /app

COPY ./application/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

ENV MY_VAR=2

COPY . .
```
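
Besides `echo $MY_VAR` in a shell, application code reads the variable from its environment. A minimal sketch, where `read_my_var` is a hypothetical helper: the `ENV` value is the default, and `docker run -e MY_VAR=5 randint_img ...` overrides it at runtime without rebuilding the image.

```python
import os


def read_my_var(default: int = 0) -> int:
    """Return MY_VAR (set by `ENV MY_VAR=2`) as an int, or `default` if unset."""
    raw = os.environ.get("MY_VAR")
    return int(raw) if raw is not None else default


print(read_my_var())  # inside a container built from the Dockerfile above this prints 2
```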

***

<div class="alert alert-block alert-danger">
<b>Task 3</b>:
    <b>Creating multiple users</b>

For security reasons we often don't want to run our application as the root user, which has the highest privileges. Instead, we can create a new user inside `Dockerfile` and switch to it with the `USER` instruction; all subsequent `RUN` instructions, as well as the container's `CMD`/`ENTRYPOINT`, then run as that user. 
    
Update `Dockerfile` and re-run container.
</div> 


```Dockerfile
FROM python:3.8-slim

RUN adduser --system --group app_user
USER app_user

WORKDIR /app

COPY ./application/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

ENV MY_VAR=2

COPY . .
```

Above, we create a new user called **app_user** that will run the application. The `--system` flag creates a system account, and `--group` creates a group with the same name (**app_user**) and makes it the user's primary group, which is a common Linux convention.

Check the user with `whoami` and the group with `groups app_user` (generic syntax: `groups <user>`).
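
The same check can be done from inside the application, for example to log which identity it runs under. A sketch assuming a Linux container with the standard `pwd`/`grp` modules; `current_identity` and its helpers are hypothetical names. Inside the container above it should report `app_user` with `is_root` set to `False`.

```python
import grp
import os
import pwd


def _user_name(uid: int) -> str:
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # no passwd entry for this uid
        return str(uid)


def _group_name(gid: int) -> str:
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:  # no group entry for this gid
        return str(gid)


def current_identity() -> dict:
    """Python counterpart of `whoami` and `groups` for the running process."""
    uid, gid = os.geteuid(), os.getegid()
    gids = sorted(set(os.getgroups()) | {gid})  # primary + supplementary groups
    return {
        "user": _user_name(uid),
        "uid": uid,
        "groups": [_group_name(g) for g in gids],
        "is_root": uid == 0,
    }


print(current_identity())
```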

***

<div class="alert alert-block alert-danger">
<b>Task 4</b>:
    <b>Optimal usage of cache</b>

Discuss the differences between the two `Dockerfile`s:
</div> 

**Version 1**
```Dockerfile
FROM python:3.8-slim

RUN adduser --system --group app_user
USER app_user

WORKDIR /app

COPY . .
RUN pip install --no-cache-dir -r ./application/requirements.txt

ENV MY_VAR=2
```


**Version 2**
```Dockerfile
FROM python:3.8-slim

RUN adduser --system --group app_user
USER app_user

WORKDIR /app

COPY ./application/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

ENV MY_VAR=2

COPY . .
```

**Answer**:

<div class="alert alert-block alert-success">
Docker caches every layer it builds. When an instruction is unchanged (and, for `COPY`/`ADD`, the copied files are unchanged too), Docker reuses the cached layer instead of executing the instruction again, which saves time. <b>Note that once a layer / instruction is re-built, all the following layers / instructions need to be re-built</b>. It is therefore recommended to put instructions that change rarely at the top of the Dockerfile and instructions that change frequently at the bottom. With such a setup, Docker can serve most layers from its cache, saving time when building an image.
</div> 


In the first version, we copy all the source code into the image before installing the requirements. Since source code changes frequently, every change to the codebase invalidates the `COPY . .` layer, and the Docker engine has to redo all the installations (`pip install -r requirements.txt`). To avoid that, the second version copies only `requirements.txt`, installs the requirements, and only then copies the source code (and the remaining files). As the application requirements rarely change, the Docker engine takes the installation layer from its cache, speeding up all subsequent image builds.
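
The effect can be reproduced with a toy model of the cache, shown below. It is a simplification, not Docker's actual algorithm: each layer key hashes the parent key, the instruction text and, for `COPY`, the copied files' contents, so a changed key also changes every later key. `layer_keys` and `cache_hits` are hypothetical helpers, the two versions are reduced to the lines that matter here, and the file contents are made up.

```python
import hashlib


def layer_keys(instructions, files):
    """Toy cache keys: hash of parent key + instruction (+ copied file contents)."""
    keys, parent = [], ""
    for instr in instructions:
        h = hashlib.sha256(parent.encode() + instr.encode())
        words = instr.split()
        if words[0] == "COPY":
            src = words[1][2:] if words[1].startswith("./") else words[1]
            for path in sorted(files):
                if src == "." or path == src:  # '.' copies the whole context
                    h.update(path.encode() + files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cache_hits(old_keys, new_keys):
    """True where a layer of the new build can be reused from the old build."""
    return [i < len(old_keys) and key == old_keys[i] for i, key in enumerate(new_keys)]


VERSION_1 = ["FROM python:3.8-slim", "WORKDIR /app", "COPY . .",
             "RUN pip install --no-cache-dir -r ./application/requirements.txt"]
VERSION_2 = ["FROM python:3.8-slim", "WORKDIR /app",
             "COPY ./application/requirements.txt .",
             "RUN pip install --no-cache-dir -r requirements.txt", "COPY . ."]

# Hypothetical file contents: only the source code changes between builds.
before = {"application/requirements.txt": "numpy\n", "application/source/app.py": "print(1)\n"}
after = {**before, "application/source/app.py": "print(2)\n"}

for name, version in [("Version 1", VERSION_1), ("Version 2", VERSION_2)]:
    print(name, cache_hits(layer_keys(version, before), layer_keys(version, after)))
# Version 1 [True, True, False, False]
# Version 2 [True, True, True, True, False]
```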

***

Docker also provides functionality to save images to a file and load them back. Play with the following commands on your own:  
`docker image save -o <output-file> <image>`  
`docker image load -i <path-to-file>`

### Containers

>If an application listens on port X inside a container, that port is reachable only from within the container (`EXPOSE X` in the Dockerfile merely documents it). This means that outside of the container we cannot reach the application. To map port Y on the host to the container port X, run the container with the port argument:  
`docker run -p Y:X <image-name>`
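
A common pitfall is the address the application binds to: Docker forwards published ports to the container's network interface, so a server listening only on `127.0.0.1` inside the container is unreachable through `-p`. The sketch below uses only the standard library and binds to `0.0.0.0`; `HelloHandler`, `make_server` and `main` are hypothetical names, and port `8000` stands in for X.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from the container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence the default per-request logging
        pass


def make_server(port: int = 8000, host: str = "0.0.0.0") -> HTTPServer:
    """Bind on all interfaces so traffic forwarded by `-p Y:X` can reach us."""
    return HTTPServer((host, port), HelloHandler)


def main() -> None:
    # Hypothetical entry point, e.g. an app.py started with CMD ["python", "app.py"].
    make_server().serve_forever()
```

If an image ran `main()` with `EXPOSE 8000`, then `docker run -p 8080:8000 <image-name>` should make the reply available at `http://localhost:8080` on the host.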

`docker stop <container-name | container-id>` - to stop a container  
`docker start <container-name | container-id>` - to start a stopped container  
`docker rm <container-name | container-id>` or `docker container rm <container-name | container-id>` - to remove a stopped container  
`docker rm -f <container-name | container-id>` or `docker container rm -f <container-name | container-id>` - to force-remove a container, even a running one  

**Since containers have isolated file systems**, two containers created from the same image cannot see each other's files. Hence, it is not recommended to store files / data in a container's file system. Moreover, when we delete a container, its file system is deleted with it, so we can lose data. This is where volumes come into play!

In [None]:
!docker run randint_img python application/source/random_num_gen.py --a=6 --b=20

***

<div class="alert alert-block alert-danger">
<b>Task 5</b>:
    <b>Show that containers created from the same image have isolated file systems</b>

To do so, start a new container with a shell (`docker run -it randint_img sh`), create a plain `dummy.txt` file, write something into it and exit (just execute `exit`). Next, in a separate terminal window (for clarity), start another container from the same image (`docker run -it randint_img sh`) and make sure that the file created in the previous step doesn't exist there. Finally, exit that container and start the first container again in interactive mode (`docker start -i <container-id>`, find its ID with `docker ps -a`) to check that the file still exists. 
    
This is why we should never store data files (logs, output metrics of application, etc.) in a container's file system. 
</div> 


>If we want to run a command inside a running docker container we need to use `exec`:  
>`docker exec <container-name | container-id> <command>`  
>* `docker exec <container-name> ls` will show all the files in the container under the directory specified by `WORKDIR` in Dockerfile  
>* `docker exec -it <container-name> sh` - run a shell (interactively), to get out run `exit` command  
>After running a command with `exec` the container won't stop, it will continue running.

***

The solution for letting containers share and persist data is the concept of a **volume**.

**A *volume* is storage that lives outside of a container's file system. It can be a directory on the host, cloud storage, etc**.  
`docker volume create <volume-name>` - create a volume on the host   
`docker volume inspect <volume-name>` - inspect a volume 

In [None]:
!docker volume create my_volume

In [None]:
!docker volume inspect my_volume

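The **Mountpoint** key shows the directory where the volume's data is stored. 
>With Docker Desktop (on macOS, and also on Windows), Docker runs inside a Linux VM, so this path exists inside the VM and cannot be browsed directly from the host. On a native Linux installation we can simply follow the path (root privileges are usually required).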

`docker run -v <volume-name>:<path-in-container> <image-name>` - mount a volume at a path in the container  
**If the specified volume or the directory in the container does not exist yet, Docker creates it automatically**. Note that a directory that Docker creates in the container is owned by ROOT, meaning that other users cannot write there; this is why the Dockerfile in Task 6 creates `some_dir` in advance. When an empty volume is mounted over a directory that already exists in the image, Docker first copies that directory's content (and its ownership) into the volume.  
After mapping, we can write to the specified directory inside a container, stop and delete that container, create a new one and see that the data written in the previous steps is still there. This is because the directory in the container is mapped to a location outside the container, i.e. the volume. We can also create several mappings by repeating `-v`.

***

<div class="alert alert-block alert-danger">
<b>Task 6</b>:
    <b>Demonstrate how a volume is used</b>

Change the Dockerfile as shown below (create `some_dir` which will get mapped to a volume) 
</div> 

    
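```Dockerfile
FROM python:3.8-slim

RUN adduser --system --group app_user
USER app_user

WORKDIR /app

# Install the requirements first so this layer stays cached (see Task 4).
COPY ./application/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

ENV MY_VAR=2

COPY . .

# Directory that will be mounted onto a volume; created by app_user, so app_user owns it.
RUN mkdir some_dir
```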


<div class="alert alert-block alert-danger">
    
and build docker image: `docker build -t randint_img .`  
    
Now that image is created, we can run container shell program in interactive mode by mapping **my_volume** to `some_dir`:  
    `docker run -it -v my_volume:/app/some_dir randint_img sh`  
    
Next, inside `some_dir` we can create a new file, write some data to it, exit and delete the container (`docker container rm <container-id>`). Then, if we create another container from the same image with the same `-v my_volume:/app/some_dir` mapping, we can verify that the data created in the previous step still exists even though we have deleted the container. This is the magic of volumes!
    
</div> 
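
A tiny program makes the persistence visible: a counter stored in the mounted directory survives across containers. A sketch under stated assumptions: `bump_counter` is a hypothetical helper, the volume is assumed to be mounted at `/app/some_dir` as in the task above, and the `DATA_DIR` environment variable is a made-up override.

```python
import os
from pathlib import Path

# Hypothetical: the volume is mounted at /app/some_dir, overridable via DATA_DIR.
DATA_DIR = Path(os.environ.get("DATA_DIR", "/app/some_dir"))


def bump_counter(data_dir: Path = DATA_DIR) -> int:
    """Increment the counter stored in data_dir/counter.txt and return it."""
    counter_file = data_dir / "counter.txt"
    value = int(counter_file.read_text()) + 1 if counter_file.exists() else 1
    counter_file.write_text(str(value))
    return value
```

Saved in the build context as a hypothetical `counter.py` ending with `print(bump_counter())`, running `docker run --rm -v my_volume:/app/some_dir randint_img python counter.py` twice should print `1` and then `2`, while without `-v` every fresh container starts again from `1`.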

***

If a container holds a file that we want to analyze further on the host (for example, a logging output), we can copy that file to the host using the following command:  
`docker cp <container-name | container-id>:<path-to-file> <path-to-storage-in-host>`  
The same command with the arguments in reverse order copies a file from the host into a container. `docker cp` works with both running and stopped containers.

<div class="alert alert-block alert-danger">
<b>Task 7</b>:
    <b>Demonstrate how to copy files from container to host</b>
    
Run a container, create a file in it (e.g. `/app/output.txt`), fill it with dummy data and exit. Next, run the command below to copy that file to the host:  
    `docker cp <container-id>:/app/output.txt .`
    
We can also copy files into a container. Try  
    1) `docker cp dockerpp.ipynb <container-id>:/app/`  
    2) `docker start -i <container-id>` and check that the file is now in `/app`
</div> 


> Sometimes we don't want to rebuild the docker image every time we make a tiny change to the codebase (say, we fixed a typo in a comment). In such situations a *bind mount* may be useful.  
>`docker run -v $(pwd):<path-in-container> <image-name>` - bind-mount the current host directory into a docker container  
With this command we can, for example, map the source code on the host into a container. Any change we make on the host is then immediately visible inside the container, without rebuilding the image. 

***

#### Cleaning up the workspace

In [None]:
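!rm -rf Dockerfile 
!rm -rf .dockerignore
!rm -rf output.txt
# Warning: this removes ALL stopped containers, unused networks, unused images and the build cache
# on this machine, not only the ones created in this notebook.
!docker system prune -a -f
# Remove the volume after the prune, so no stopped container still references it.
!docker volume rm my_volume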

# References
- [Docker](https://www.docker.com/)