# A short introduction to containerized software

After using nf-core pipelines to answer bioinformatics questions, we will now look at the processes behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    27.2.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /Users/jansteiger/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /Users/jansteiger/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /Users/jansteiger/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/jansteiger/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/jansteiger/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /Users/jansteiger/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    V
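The same check can be scripted from the notebook. The following is a minimal Python sketch using only the standard library; the helper name `docker_cli_path` is made up for illustration. It only tests whether a `docker` executable is on `PATH`. It does not tell you whether the Docker daemon is running, which `docker info` additionally reports.

```python
import shutil


def docker_cli_path(executable="docker"):
    """Return the full path of the Docker CLI, or None if it is not on PATH."""
    return shutil.which(executable)


path = docker_cli_path()
if path is None:
    print("docker not found on PATH; install Docker first")
else:
    print(f"docker CLI found at {path}")
```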

### What is a container?

A container is a running instance of an image: an isolated process (or group of processes) that runs with the filesystem and settings the image specifies.

### Why do we use containers?

We use containers to make running software easier and more reproducible: the same image provides the same environment on every machine.

### What is a docker image?

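A Docker image is a read-only template that specifies the environment a process runs in: the filesystem, the installed software, and default settings. Containers are created from images.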

### Let's run our first docker image:

### Log in to Docker

In [None]:
# Do this directly on the command line with `docker login`

### Run your first docker container

In [16]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [21]:
!docker ps -a

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED              STATUS                          PORTS                  NAMES
542cdeeab707   hello-world                                                        "/hello"                 55 seconds ago       Exited (0) 54 seconds ago                              pensive_kalam
20764d265a05   hello-world                                                        "/hello"                 About a minute ago   Exited (0) About a minute ago                          sharp_germain
6133e2de4254   hello-world                                                        "/hello"                 2 minutes ago        Exited (0) 2 minutes ago                               agitated_lehmann
f3a82ca130d7   hello-world                                                        "/hello"                 3 minutes ago        Exited (0) 3 minutes ago                               eager_raman
e5181da4d907   communi

### Delete the container again and prove that it is deleted

In [24]:
!docker rm 542cdeeab707

542cdeeab707


In [26]:
!docker ps -a

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED          STATUS                      PORTS                  NAMES
20764d265a05   hello-world                                                        "/hello"                 2 minutes ago    Exited (0) 2 minutes ago                           sharp_germain
6133e2de4254   hello-world                                                        "/hello"                 3 minutes ago    Exited (0) 3 minutes ago                           agitated_lehmann
f3a82ca130d7   hello-world                                                        "/hello"                 4 minutes ago    Exited (0) 4 minutes ago                           eager_raman
e5181da4d907   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42   "/usr/local/bin/_ent…"   6 minutes ago    Exited (0) 6 minutes ago                           admiring_cray
ad605ae292ba   hello-world                                    
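Removing many stopped containers one ID at a time gets tedious. The following is a minimal sketch, not part of the original worksheet, of how the IDs could be collected in Python. It assumes the listing was produced in tab-separated form, for example with `docker ps -a --format '{{.ID}}\t{{.Image}}\t{{.Status}}'`; the function name `exited_container_ids` is hypothetical, and the sample rows are copied from the listing above.

```python
def exited_container_ids(ps_output, image):
    """Return IDs of exited containers that were created from `image`.

    `ps_output` is expected to be tab-separated text with three columns
    (ID, image, status), one container per line.
    """
    ids = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip empty lines
        container_id, container_image, status = line.split("\t", 2)
        if container_image == image and status.startswith("Exited"):
            ids.append(container_id)
    return ids


# Sample rows taken from the listing above, rewritten as tab-separated text.
sample = (
    "20764d265a05\thello-world\tExited (0) 2 minutes ago\n"
    "6133e2de4254\thello-world\tExited (0) 3 minutes ago\n"
    "e5181da4d907\tcommunity.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42"
    "\tExited (0) 6 minutes ago\n"
)

to_remove = exited_container_ids(sample, "hello-world")
print("docker rm " + " ".join(to_remove))
```

`docker rm` accepts several container IDs in one call, so the printed line removes all selected containers at once. Starting containers with `docker run --rm` avoids the leftovers in the first place.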

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Download FastQC
2. Download and install a Java Runtime Environment (JRE), which FastQC needs to run
3. Run the software on the example fastq file

### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example fastq file

In [89]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0

0.12.1--d3caca66b4f3d3b0: Pulling from library/fastqc

7c3c4e61: Pulling fs layer
f16d8faf: Pulling fs layer
96a0b3ed: Pulling fs layer
89cafbe9: Pulling fs layer
ca8d6a51: Pulling fs layer
b700ef54: Pulling fs layer
e1045197: Pulling fs layer
4a5e885d: Pulling fs layer
98df307d: Pulling fs layer
f7ad9b3c: Pulling fs layer
81345b94: Pulling fs layer

In [92]:
# run the container and save the results to a new "fastqc_results" directory
!docker run -v "/Users/jansteiger/Documents/Comp_workflows/day2/test_sheet/fastq":/data -v "/Users/jansteiger/Documents/Comp_workflows/ILIAS_ Docker worksheet/fastqc_results":/output community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0 fastqc /data/SRX19144486_SRR23195516_1.fastq.gz -o /output

application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
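The long `docker run` line above is easy to get wrong, especially when a host path contains a space. The following is a minimal Python sketch, under the assumption that you want to assemble the command programmatically; the helper name `docker_run_command` and the host paths are placeholders, while the image name and FastQC arguments are the ones used above. Each `-v host:container` pair makes a host directory visible inside the container.

```python
import shlex


def docker_run_command(image, tool_args, mounts):
    """Build the argument list for `docker run` with bind mounts.

    `mounts` maps host directories to paths inside the container.
    """
    command = ["docker", "run"]
    for host_dir, container_dir in mounts.items():
        command += ["-v", f"{host_dir}:{container_dir}"]
    command.append(image)
    command += tool_args
    return command


# Hypothetical host paths; replace them with your own directories.
command = docker_run_command(
    image="community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0",
    tool_args=["fastqc", "/data/SRX19144486_SRR23195516_1.fastq.gz", "-o", "/output"],
    mounts={
        "/path/to/fastq": "/data",
        "/path/with spaces/fastqc_results": "/output",
    },
)

# shlex.join quotes arguments that contain spaces, so the printed line
# can be pasted into a shell.
print(shlex.join(command))
```

Because the command is a list of separate arguments, it could also be passed to `subprocess.run(command, check=True)` without any shell quoting.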

In [84]:
!docker ps -a

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED          STATUS                      PORTS                  NAMES
de458f2d5e94   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42   "/usr/local/bin/_ent…"   17 seconds ago   Exited (0) 17 seconds ago                          bold_carson
d292578f34ad   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42   "/usr/local/bin/_ent…"   4 minutes ago    Exited (2) 4 minutes ago                           quirky_heisenberg
f809a5e3b7cb   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42   "/usr/local/bin/_ent…"   5 minutes ago    Exited (2) 5 minutes ago                           hungry_booth
dc24d4fb7e96   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42   "/usr/local/bin/_ent…"   5 minutes ago    Exited (2) 5 minutes ago                           jovial_cray
4f9c326cd9f1   community.wave.seqera.io/library/fastqc:0.12.1--5

### Now that you know how to use a Docker container: which approach was easier, running everything manually or using Docker, and which will be easier in the future?

Running everything manually can be easier at first, especially if most of the software is already installed, because the barrier to entry is lower than for Docker. For more demanding tasks, and whenever reproducibility matters, Docker makes many things easier.

### What would you say, which approach is more reproducible?

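As mentioned above, Docker is more reproducible: the image fixes the tool version and its dependencies, so the same environment can be recreated on any machine.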

### Compare the file to last week's FastQC results. Are they identical?
### Is the FastQC version identical?


Results are identical  
FastQC version is identical (0.12.1)

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. The base image is Debian Linux, so you need to know the `apt-get` command to install "cowsay"

In [93]:
# open the file "my_dockerfile" in a text editor
!cat my_dockerfile

# this is the base image the container is built on. In this case, it is a slim version of the Debian operating system.
FROM debian:bullseye-slim

# these are the labels that are added to the image. They are metadata that can be used to identify the author of the image.
LABEL image.author.name "yourname"
LABEL image.author.email "yourmail"

# !TODO: add the command that is run to install the dependencies for the image. In this case, it should be updating the package list and installing curl and cowsay.



# !TODO: add an ENV line to set environmental variables. In this case, it should set the PATH variable to /usr/games. Explain in the notebook why this is necessary.


### Explain the RUN and ENV lines you added to the file

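`RUN` executes a command while the image is built. Here it updates the package list and installs curl and cowsay with `apt-get`.  
`ENV` sets an environment variable in the image. Here it adds `/usr/games` to `PATH`, because Debian installs the cowsay executable there and that directory is not on the default `PATH`; without it, `cowsay` would not be found by name.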
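Why the `PATH` entry matters can be shown without Docker. The following is a small Python sketch, assuming a POSIX system: it creates a stand-in executable named `cowsay` in a temporary directory (playing the role of `/usr/games`) and looks it up with and without that directory on the search path. The variable names are illustrative only.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as games_dir:
    # Create a stand-in executable named "cowsay" in a directory that is
    # not on the search path, mimicking /usr/games on Debian.
    fake_cowsay = os.path.join(games_dir, "cowsay")
    with open(fake_cowsay, "w") as handle:
        handle.write("#!/bin/sh\necho moo\n")
    os.chmod(fake_cowsay, os.stat(fake_cowsay).st_mode | stat.S_IXUSR)

    with tempfile.TemporaryDirectory() as empty_dir:
        # Search path without the games directory: the lookup fails.
        without_games = shutil.which("cowsay", path=empty_dir)
        # Search path with the games directory appended: the lookup succeeds.
        with_games = shutil.which(
            "cowsay", path=empty_dir + os.pathsep + games_dir
        )

print("found without games dir:", without_games is not None)
print("found with games dir:", with_games is not None)
```

The shell inside a container resolves command names the same way, which is why the `ENV` line is needed before `docker run my_image cowsay ...` can work.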

In [1]:
# build the docker image
!docker build -t my_image -f my_dockerfile .

[+] Building 0.3s (2/3)                                    docker:desktop-linux
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 1.31kB                                     0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
 => [auth] library/debian:pull token for registry-1.docker.io              0.0s

In [2]:
# make sure that the image has been built
!docker image ls my_image

REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
my_image     latest    64f5d4d7f6a0   7 minutes ago   201MB


In [3]:
# run the Docker image
!docker run my_image cowsay "Docker is working"

 ___________________
< Docker is working >
 -------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new Dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image

In [4]:
# build the image
!docker build -t salmon -f salmon_docker .

[+] Building 0.3s (1/2)                                    docker:desktop-linux
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 821B                                       0.0s
 => WARN: FromPlatformFlagConstDisallowed: FROM --platform flag should no  0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [5]:
# run the Docker image to print the salmon version
!docker run salmon salmon --version

salmon 1.5.2
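The release tarball used here is built for Linux on x86_64, while the `hello-world` output earlier reports an arm64 host, so the image has to target a different platform than the machine it is built on. The following is a small Python sketch of two helper checks; both function names are hypothetical, and the URL pattern is an assumption generalized from the single v1.5.2 link given above.

```python
import platform


def salmon_release_url(version):
    """Return the download URL of the Linux x86_64 salmon release tarball.

    Assumption: other releases follow the same naming as v1.5.2.
    """
    return (
        "https://github.com/COMBINE-lab/salmon/releases/download/"
        f"v{version}/salmon-{version}_linux_x86_64.tar.gz"
    )


def needs_emulation(machine):
    """Return True if `machine` cannot run x86_64 binaries natively."""
    return machine.lower() not in {"x86_64", "amd64"}


print(salmon_release_url("1.5.2"))
if needs_emulation(platform.machine()):
    print("Host is not x86_64: build and run with --platform linux/amd64")
```

On such a host, passing `--platform linux/amd64` to `docker build` and `docker run` is one way to request the x86_64 variant of the base image, which Docker Desktop then runs under emulation.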


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

BioContainers is a community-driven project that provides the infrastructure and basic guidelines to create, manage, and distribute bioinformatics packages (e.g. Conda) and containers (e.g. Docker, Singularity).

In [6]:
!docker pull combinelab/salmon
!docker run combinelab/salmon salmon --version

Using default tag: latest
latest: Pulling from combinelab/salmon
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Image is up to date for combinelab/salmon:latest
docker.io/combinelab/salmon:latest

What's next:
    View a summary of image vulnerabilities and recommendations → docker scout quickview combinelab/salmon
salmon 1.10.3
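Note that the pull above used the default tag `latest` and returned salmon 1.10.3, not the 1.5.2 built earlier. For reproducible work it helps to notice when an image reference carries no explicit tag or digest. The following is a minimal Python sketch of such a check; the function name `has_explicit_version` is made up, and `combinelab/salmon:1.10.3` is used only as an example string, assuming a tag with that name exists.

```python
def has_explicit_version(image_ref):
    """Return True if the image reference carries a tag or a digest."""
    if "@" in image_ref:
        # A digest such as name@sha256:... pins the exact image content.
        return True
    # Only the last path component can hold a tag; a colon earlier in the
    # reference belongs to a registry port such as localhost:5000.
    last_component = image_ref.rsplit("/", 1)[-1]
    return ":" in last_component


for ref in [
    "combinelab/salmon",
    "combinelab/salmon:1.10.3",
    "community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0",
]:
    print(ref, has_explicit_version(ref))
```

The check treats an explicit `:latest` as a version too, although that tag can move over time; a specific version tag or a digest is the safer choice.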
