# A short introduction to containerized software

**After using nf-core pipelines to answer bioinformatics questions, we will now focus on the processes that lie behind these pipelines.**

**Today, we will focus on containerization, namely via Docker.** 



**1. Check if Docker is installed.**

In [1]:
!docker info

Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 12
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.

Docker version 27.3.1 is installed.
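
If `docker info` fails, it helps to know whether the Docker client is missing or the daemon is simply not reachable. The following is a minimal sketch of such a check, meant for a terminal or a `%%bash` cell. The helper name `have_tool` and the variable `docker_status` are hypothetical and not part of Docker.

```shell
#!/bin/sh
# have_tool is a hypothetical helper: it succeeds if the named command is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool docker; then
    # The client is installed; print its version.
    docker --version
    # docker info only succeeds if the daemon can be reached.
    if docker info >/dev/null 2>&1; then
        docker_status="client and daemon available"
    else
        docker_status="client installed, daemon not reachable"
    fi
else
    docker_status="docker client not found"
fi

echo "$docker_status"
```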

### What is a container?

"A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings."

https://www.docker.com/resources/what-container/

### Why do we use containers?

They are portable and consistent, meaning that (research) results can be replicated.
They are less resource intensive than full virtual machines.

https://www.docker.com/resources/what-container/

### What is a docker image?

"Docker images are the basis of containers. An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes."

https://docs.docker.com/reference/glossary/#image

It is a standalone, executable package with the instructions and files needed to create a container.

### Let's run our first docker image:

### Log in to Docker

In [None]:
!docker login

# the rest had to be done via a browser

### Run your first docker container

In [None]:
!docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [3]:
!docker ps -a


In [None]:
CONTAINER ID   IMAGE         COMMAND    CREATED         STATUS                     PORTS     NAMES
cca684428911   hello-world   "/hello"   2 minutes ago   Exited (0) 2 minutes ago             great_cannon
08caf040c0e5   hello-world   "/hello"   8 minutes ago   Exited (0) 8 minutes ago             gifted_poitras
1d44e9994ce9   hello-world   "/hello"   4 days ago      Exited (0) 4 days ago                loving_shannon


### Delete the container again and prove that it is deleted

In [None]:
!docker rm cca684428911 
!docker rm 08caf040c0e5 
!docker rm 1d44e9994ce9

!docker ps -a

In [None]:
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
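
Removing containers one ID at a time gets tedious. As a sketch, the IDs of all containers created from one image can be collected with the `ancestor` filter of `docker ps` and passed to `docker rm` in one go. The variable names are illustrative, and the `command -v` guard only exists so that the snippet does nothing on a machine without Docker.

```shell
#!/bin/sh
image="hello-world"

if command -v docker >/dev/null 2>&1; then
    # -a: include stopped containers, -q: print only the container IDs
    ids=$(docker ps -a -q --filter "ancestor=$image" 2>/dev/null || true)
    if [ -n "$ids" ]; then
        # $ids is left unquoted on purpose so that each ID becomes its own argument
        docker rm $ids
    fi
    # Proof: this should now list no containers for the image
    docker ps -a --filter "ancestor=$image" || true
else
    echo "docker not found, nothing to clean up"
fi
```

Alternatively, starting the container with `docker run --rm hello-world` removes it automatically as soon as it exits.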

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

**Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/**

**Please describe the steps you took to download and run the software for the example fastq file from last week below:**

1. I looked for a conda package: https://anaconda.org/bioconda/fastqc
2. I installed the conda package in a new environment:
    - conda create -n fastqc
    - conda activate fastqc
    - conda install bioconda::fastqc
    - This installed version fastqc-0.12.1.
    - This seems to be the newest stable version, as it is the same one mentioned on the official website.
    - The package was installed successfully.
3. I looked up the correct command to run FastQC:
    - fastqc --help
4. I ran it on one file:
    - fastqc SRX19144486_SRR23195516_1.fastq.gz
    - This successfully created a FastQC report.

### Very well, now let's try to make use of its docker container

**1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)**<br>
**2. use the container to generate a fastqc html of the example fastq file**

In [None]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42

In [None]:
0.12.1--5cfd0f3cb6760c42: Pulling from library/fastqc
6360b3717211: Pull complete 
2ec3f7ad9b3c: Pull complete 
7716ca300600: Pull complete 
4f4fb700ef54: Pull complete 
8c61d418774c: Pull complete 
03dae77ff45c: Pull complete 
aab7f787139d: Pull complete 
837d55536720: Pull complete 
897362c12ca7: Pull complete 
3893cbe24e91: Pull complete 
d1b61e94977b: Pull complete 
5e39529b9f20: Pull complete 
92b86e1d0b98: Pull complete 
Digest: sha256:0c524d3abe2642c09c5852299bd79bf78ba0ee2ef040473324caab0826f64d44
Status: Downloaded newer image for community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42
community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42


In [None]:
# run the container and save the results to a new "fastqc_results" directory

# make the working directory writable for the user inside the container
!chmod -R 777 $(pwd)
# create the output directory
!mkdir $(pwd)/fastqc_results
# mount the working directory into the container at /fastqc_docker and run FastQC on the mounted file
!docker run -v $(pwd):/fastqc_docker community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc /fastqc_docker/SRX19144486_SRR23195516_1.fastq.gz -o /fastqc_docker/fastqc_results
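
`chmod -R 777` makes the whole working directory writable for every user, which is more than the container needs. A possible alternative, sketched below under the assumption that the image runs fine as a non-root user, is to start the container with the caller's user and group ID via `--user`, so that the report files end up owned by you. `--rm` removes the container after the run. The variable names are illustrative, and the guard skips the run if Docker or the input file is missing.

```shell
#!/bin/sh
image="community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42"
reads="SRX19144486_SRR23195516_1.fastq.gz"
outdir="fastqc_results"

# -p: no error if the directory already exists
mkdir -p "$outdir"

if command -v docker >/dev/null 2>&1 && [ -f "$reads" ]; then
    # Run as the calling user, mount the current directory, clean up afterwards
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$(pwd)":/fastqc_docker \
        "$image" \
        fastqc "/fastqc_docker/$reads" -o "/fastqc_docker/$outdir"
else
    echo "docker or $reads not available, skipping the FastQC run"
fi
```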


### Now that you know how to use a docker container, which approach between running everything manually and using docker was easier and which approach will be easier in the future?

Docker is easier for large projects like Nextflow pipelines. For small projects I will still use the manual method.

### What would you say, which approach is more reproducible?

Docker

### Compare the file to last week's FastQC results. Are they identical?
### Is the fastqc version identical?

The file from last week would be _SNI_oxy_3_1.gz_ <br>
They look identical <br><br>

Both versions are 0.12.1.


## Dockerfiles

**We have now used Docker containers and images directly to boost our research.**

**Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)**

**Hints:**
**1. The base image is Debian Linux, so you need to know the apt-get command to install "cowsay"**

In [None]:
# open the file "my_dockerfile" in a text editor

# this is the base image the container is built on. In this case, it is a slim version of the Debian operating system.
FROM debian:bullseye-slim

# these are the labels that are added to the image. They are metadata that can be used to identify the author of the image.
LABEL image.author.name "am"
LABEL image.author.email "cows_are_cool@gmail.com"

# !TODO: add the command that is run to install the dependencies for the image. In this case, it should be updating the package list and installing curl and cowsay.

# Install the application dependencies.
# apt-get update refreshes the package list, apt-get install -y installs curl and cowsay
# (-y accepts all prompts automatically), and apt-get upgrade -y upgrades the installed packages.
RUN apt-get update && \
    apt-get install -y curl cowsay && \
    apt-get upgrade -y

# !TODO: add an ENV line to set environmental variables. In this case, it should set the PATH variable to /usr/games. Explain in the notebook why this is necessary.

# Explanation of the RUN and ENV lines:
# Environment variables set with ENV are stored in the image, so every container started from it
# gets the same configuration. This makes the environment easier to deploy, change, and share.
# On Debian, the cowsay executable is installed in /usr/games, which is not on the default PATH
# of this image. Appending it to PATH lets "cowsay" be called by name.
ENV PATH="$PATH:/usr/games"
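
Why the PATH entry matters can be tried outside Docker as well. The sketch below uses a stand-in program called `mycow` in a temporary directory (both are made up for illustration): the shell only finds it by name once its directory has been appended to `PATH`, which is the same effect the `ENV` line has for `/usr/games/cowsay` inside the image.

```shell
#!/bin/sh
# Create a temporary directory holding a tiny executable script.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "moo"\n' > "$tooldir/mycow"
chmod +x "$tooldir/mycow"

# Before: the directory is not on PATH, so the lookup by name fails.
if command -v mycow >/dev/null 2>&1; then
    before="found"
else
    before="not found"
fi

# Append the directory, as ENV PATH="$PATH:/usr/games" does in the Dockerfile.
PATH="$PATH:$tooldir"

# After: the same name now resolves and can be executed.
after=$(mycow)

echo "before: $before"
echo "after:  $after"

# Clean up the temporary directory.
rm -rf "$tooldir"
```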


In [None]:
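# build the docker image

# create a fresh directory for building the image
!mkdir docker_build
!mv my_dockerfile docker_build/
# use the %cd magic here: "!cd" runs in a subshell and would not change the notebook's working directory
%cd docker_build/

# build
!docker build -f my_dockerfile -t cowsay_image .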


In [None]:

[+] Building 21.1s (6/6) FINISHED                                                                                                                    docker:default
 => [internal] load build definition from my_dockerfile                                                                                                        0.1s
 => => transferring dockerfile: 881B                                                                                                                           0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim                                                                                        1.8s
 => [internal] load .dockerignore                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                0.0s
 => [1/2] FROM docker.io/library/debian:bullseye-slim@sha256:3f9e53602537cc817d96f0ebb131a39bdb16fa8b422137659a9a597e7e3853c1                                  5.7s
 => => resolve docker.io/library/debian:bullseye-slim@sha256:3f9e53602537cc817d96f0ebb131a39bdb16fa8b422137659a9a597e7e3853c1                                  0.0s
 => => sha256:fa0650a893c25858ebb09921bc9b7824594e23405374a6adbcd3b4e27e28e3cf 31.43MB / 31.43MB                                                               4.4s
 => => sha256:3f9e53602537cc817d96f0ebb131a39bdb16fa8b422137659a9a597e7e3853c1 984B / 984B                                                                     0.0s
 => => sha256:d64241f857a1d4515f831751dad27fe6c974fe73d58b909936fefff6914ad3b9 529B / 529B                                                                     0.0s
 => => sha256:7f0f93ec8f75ec8fa17871b7488bb80619d8ee06a0d6c7eae3912acf8ae7f859 1.46kB / 1.46kB                                                                 0.0s
 => => extracting sha256:fa0650a893c25858ebb09921bc9b7824594e23405374a6adbcd3b4e27e28e3cf                                                                      1.2s
 => [2/2] RUN apt-get update &&     apt-get install -y curl cowsay &&     apt-get upgrade -y                                                                  12.9s
 => exporting to image                                                                                                                                         0.5s 
 => => exporting layers                                                                                                                                        0.5s 
 => => writing image sha256:922e3da57fcfcecc348c8231eaa7d3ac36e4e764f0d2b22ea5d69a7940a717ea                                                                   0.0s 
 => => naming to docker.io/library/cowsay_image                                                                                                                0.0s 
                                                                                                                                                                    
 2 warnings found (use docker --debug to expand):                                                                                                                   
 - LegacyKeyValueFormat: "LABEL key=value" should be used instead of legacy "LABEL key value" format (line 5)
 - LegacyKeyValueFormat: "LABEL key=value" should be used instead of legacy "LABEL key value" format (line 6)


In [None]:
# make sure that the image has been built
!docker images

In [None]:
REPOSITORY                                                 TAG                        IMAGE ID       CREATED              SIZE

cowsay_image                                               latest                     922e3da57fcf   About a minute ago   151MB

In [None]:
# run a container from the image
!docker run cowsay_image cowsay -t "Hey, Docker is not that bad"

In [None]:
< Hey, Docker is not that bad >
 -----------------------------
        \   ^__^
         \  (--)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics and create a new Dockerfile that holds the Salmon tool used in RNA-seq

**To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz**

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image

FROM debian:bullseye-slim

LABEL image.author.name="am"
LABEL image.author.email="salmons_are_cool@gmail.com"

# Install dependencies

RUN apt-get update && \
    apt-get install -y curl tar && \
    apt-get upgrade -y

# Download and install Salmon

RUN curl -L -o salmon-1.5.2_linux_x86_64.tar.gz https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz && \
    tar -xzvf salmon-1.5.2_linux_x86_64.tar.gz && \
    cp salmon-1.5.2_linux_x86_64/bin/salmon /usr/local/bin/

    
# Set the PATH environment variable (/usr/bin is already on the default PATH, so this line is redundant but harmless)

ENV PATH="$PATH:/usr/bin" 

In [47]:
# build the image in new directory
!docker build -f salmon_docker -t salmon_image .


In [None]:
# run a container from the image to print the version of salmon

!docker run salmon_image salmon-1.5.2_linux_x86_64/bin/salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

No. Prebuilt images are shared through registries such as Docker Hub, so they can be pulled directly or referenced from pipelines.

**Find the salmon docker image online and run it on your computer.**

The Dockerfile can be found here: https://github.com/COMBINE-lab/salmon/blob/master/docker/Dockerfile

The image is hosted at https://hub.docker.com/layers/combinelab/salmon/0.12.0/images/sha256-8ceee4b7c3b49af7eb002ffd5754c660342a3c06adf6afd63508fbc7b7f34799


In [None]:
!docker pull combinelab/salmon:1.10.3

In [None]:
1.10.3: Pulling from combinelab/salmon
7c457f213c76: Pull complete 
36c51ed9ab84: Pull complete 
bbd50bdd40c3: Pull complete 
0063893c1bc1: Pull complete 
1c8f9485d7ab: Pull complete 
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Downloaded newer image for combinelab/salmon:1.10.3
docker.io/combinelab/salmon:1.10.3

In [None]:
!docker run combinelab/salmon:1.10.3 --version

salmon 1.10.3

This was the latest available version at the time of writing.

**What is https://biocontainers.pro/ ?**

"BioContainers is a community-driven project that provides the infrastructure and basic guidelines to create, manage and distribute bioinformatics packages (e.g conda) and containers (e.g docker, singularity). BioContainers is based on the popular frameworks Conda, Docker and Singularity."

https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html