# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatics questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [None]:
!docker info

### What is a container?

A Docker container is a standardized, encapsulated environment that runs an application. It contains not only the code but also all of its dependencies.

### Why do we use containers?
Docker lets you standardize the development and release cycle by using a consistent, isolated environment. Multiple containers can run in parallel because each one carries its own dependencies and they do not influence one another.\
Containers also use less memory than full virtual machines, and since most platforms support Docker, it is easy to switch environments and to ship and run them.\
Simply put, we remove all outside influences.

### What is a Docker image?
A container image is a standardized package that includes all of the files, binaries, libraries, and configuration needed to run a container. One image can be used to start one or several instances of the application at once, each as its own container.

### Let's run our first Docker image:

### Log in to Docker

In [None]:
# Do this directly on the command line, e.g. with `docker login`, since it prompts for credentials

### Run your first docker container

In [5]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [None]:
!docker container ls -a
# container ID: 59d02ed05cf5

CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED          STATUS                     PORTS     NAMES
59d02ed05cf5   hello-world                                                                   "/hello"                 10 seconds ago   Exited (0) 9 seconds ago             flamboyant_hamilton
d3ee65153f61   quay.io/biocontainers/bioconductor-tximeta:1.20.1--r43hdfd78af_0              "/usr/local/env-exec…"   4 days ago       Exited (0) 4 days ago                nxf-7Drlxs7VBYRBMvD6dawKJe8w
0e388e92a5a5   community.wave.seqera.io/library/hisat2_samtools:6be64e12472a7b75             "/usr/local/bin/_ent…"   5 days ago       Exited (0) 5 days ago                nxf-COHIGoNBKAGFOs0m2he7VII8
b5dbc9fcf588   community.wave.seqera.io/library/cutadapt_trim-galore_pigz:a98edd405b34582d   "/usr/local/bin/_ent…"   6 days ago       Exited (0) 6 days ago                nxf-RUHXOtVtKSudl4CXFgoEvVTE
e6456af7e3dc   community.wa
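
Reading the ID off the table works for a single container, but with many stopped containers it can help to let Docker filter the list. The following is a minimal sketch, assuming Docker is on the PATH. The helper names (`build_ls_command`, `parse_ids`, `container_ids`) are hypothetical, and the sketch relies on the `--filter ancestor=<image>` and `--format` options of `docker container ls`.

```python
import shutil
import subprocess


def build_ls_command(image):
    """Return the argv that lists the IDs of all containers created from `image`."""
    return [
        "docker", "container", "ls", "-a",
        "--filter", f"ancestor={image}",
        "--format", "{{.ID}}",
    ]


def parse_ids(output):
    """Split the command output into container IDs, one per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def container_ids(image):
    """Run the listing command; return an empty list if Docker is not installed."""
    if shutil.which("docker") is None:
        return []
    result = subprocess.run(
        build_ls_command(image), capture_output=True, text=True, check=False
    )
    return parse_ids(result.stdout)


if __name__ == "__main__":
    print(container_ids("hello-world"))
```

Each returned ID can then be passed to `docker container rm`.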

### Delete the container again and prove that it has been deleted

In [7]:
!docker container rm 59d02ed05cf5
!docker container ls -a
# the container ID is no longer visible in the list

59d02ed05cf5
CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED       STATUS                     PORTS     NAMES
d3ee65153f61   quay.io/biocontainers/bioconductor-tximeta:1.20.1--r43hdfd78af_0              "/usr/local/env-exec…"   4 days ago    Exited (0) 4 days ago                nxf-7Drlxs7VBYRBMvD6dawKJe8w
0e388e92a5a5   community.wave.seqera.io/library/hisat2_samtools:6be64e12472a7b75             "/usr/local/bin/_ent…"   5 days ago    Exited (0) 5 days ago                nxf-COHIGoNBKAGFOs0m2he7VII8
b5dbc9fcf588   community.wave.seqera.io/library/cutadapt_trim-galore_pigz:a98edd405b34582d   "/usr/local/bin/_ent…"   6 days ago    Exited (0) 6 days ago                nxf-RUHXOtVtKSudl4CXFgoEvVTE
e6456af7e3dc   community.wave.seqera.io/library/cutadapt_trim-galore_pigz:a98edd405b34582d   "/usr/local/bin/_ent…"   6 days ago    Exited (0) 6 days ago                nxf-7YIyWdmupSRArIXtyjDfMAZB
63242b4f3436   quay.

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

In [None]:
# Install FastQC, then run it in bash on the FASTQ file from Tuesday.
# FastQC expects the output directory to exist, so create it first.
!mkdir -p ./fastqc/
!fastqc SRX19144486_SRR23195516_1.fastq.gz -o ./fastqc/

### Very well, now let's try to make use of its Docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [None]:
# pull the container
!docker image pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

In [None]:
# run the container and save the results to the "notebooks/day_03_part2/fastqc" directory
!docker run -v /mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025:/mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025 -w /mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025 community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc notebooks/day_02/output_ngs/fastq/SRX19144486_SRR23195516_1.fastq.gz -o notebooks/day_03_part2/fastqc
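
The command above mounts the project directory into the container at the same path (`-v host:container`) and makes it the working directory (`-w`), so relative paths mean the same thing inside and outside the container. Below is a sketch of assembling the same command from Python. `fastqc_command` is a hypothetical helper, and `--rm` is an optional addition (not part of the original command) that removes the container once it exits.

```python
IMAGE = "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"


def fastqc_command(project_dir, fastq, outdir, image=IMAGE):
    """Build a `docker run` argv that runs FastQC on `fastq` inside `image`.

    `fastq` and `outdir` are paths relative to `project_dir`.
    """
    project = str(project_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{project}:{project}",  # mount the project at the same path
        "-w", project,                 # resolve relative paths from there
        image,
        "fastqc", fastq, "-o", outdir,
    ]


if __name__ == "__main__":
    cmd = fastqc_command(
        "/mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025",
        "notebooks/day_02/output_ngs/fastq/SRX19144486_SRR23195516_1.fastq.gz",
        "notebooks/day_03_part2/fastqc",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The output directory has to exist before FastQC runs.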


### Now that you know how to use a Docker container, which approach was easier and which approach will be easier in the future?

In my opinion, Bash scripts were still easier to use. However, they require more knowledge about the individual tools, which in turn takes more time. When time is short, pipelines and containers should be the easier solution.

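### What would you say, which approach is more reproducible?
Docker containers feel more isolated than pipelines, but also more flexible. Because an image pins the tool version together with its dependencies, the container approach should be more reproducible than a local installation.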

### Compare the file to last week's FastQC results. Are they identical?
### Is the FastQC version identical?

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker images are usually built on a Linux base image; with a Debian or Ubuntu base, you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

RUN tells the image builder to execute a command at build time, such as installing packages.\
ENV sets an environment variable that is available for the rest of the build and inside every container started from the image.
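
To make the two instructions concrete, here is a minimal sketch of what such a Dockerfile could look like, written out from Python. The contents are an assumption, since the actual "my_dockerfile" is not shown here. Only the base image `debian:bullseye-slim` is taken from the build log. `write_dockerfile` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Hypothetical Dockerfile contents; the real "my_dockerfile" may differ.
DOCKERFILE = """\
FROM debian:bullseye-slim

# RUN executes at build time: refresh the package index, install cowsay,
# then delete the index again to keep the layer small.
RUN apt-get update \\
    && apt-get install -y --no-install-recommends cowsay \\
    && rm -rf /var/lib/apt/lists/*

# ENV persists into running containers: Debian installs cowsay into
# /usr/games, which is not on the default PATH.
ENV PATH="/usr/games:${PATH}"

# Default command when the container starts without arguments.
CMD ["cowsay", "Hello from Docker!"]
"""


def write_dockerfile(path, text=DOCKERFILE):
    """Write the Dockerfile text to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(text)
    return target


if __name__ == "__main__":
    # Write into a temporary directory so no existing file is overwritten.
    out = write_dockerfile(Path(tempfile.mkdtemp()) / "my_dockerfile")
    print(out.read_text())
```

Without the ENV line, `cowsay` would have to be called by its full path, `/usr/games/cowsay`.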

In [8]:
# build the docker image
!docker build --file ./my_dockerfile -t cowsaycurl:csc .

[+] Building 0.2s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 869B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s

In [9]:
# make sure that the image has been built
!docker image ls

REPOSITORY                                                           TAG                        IMAGE ID       CREATED         SIZE
cowsaycurl                                                           csc                        506e5b210463   12 days ago     223MB
hello-world                                                          latest                     54e66cc1dd1f   2 months ago    20.3kB
quay.io/biocontainers/samtools                                       1.22.1--h96c455f_0         23dc2c29f457   3 months ago    109MB
community.wave.seqera.io/library/sortmerna                           4.3.7--b730cad73fc42b8e    3c873f2a4c00   4 months ago    553MB
community.wave.seqera.io/library/hisat2_samtools                     6be64e12472a7b75           c34a62b3e0c6   4 months ago    788MB
quay.io/biocontainers/multiqc                                        1.29--pyhdfd78af_0         f4ebfe78e45d   4 months ago    1.68GB
community.wave.seqera.io/library/picard                             

In [10]:
# run a container from the image we just built
!docker run cowsaycurl:csc


 ____________________
< Hello from Docker! >
 --------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
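
One way to approach this is sketched below as a Python function that returns the Dockerfile text. This is an assumption about the file's structure, not the actual "salmon_docker". The install location `/opt/salmon` and the helper name `salmon_dockerfile` are made up for illustration, and the sketch assumes the tarball contains a single top-level directory with a `bin/` folder.

```python
SALMON_URL = (
    "https://github.com/COMBINE-lab/salmon/releases/download/"
    "v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz"
)


def salmon_dockerfile(url=SALMON_URL):
    """Return hypothetical Dockerfile text that installs salmon from `url`."""
    return f"""\
FROM debian:bullseye-slim

# curl and CA certificates are needed to download over HTTPS.
RUN apt-get update \\
    && apt-get install -y --no-install-recommends ca-certificates curl \\
    && rm -rf /var/lib/apt/lists/*

# Download the release tarball and unpack it into /opt/salmon.
# --strip-components=1 drops the top-level directory of the archive.
RUN mkdir -p /opt/salmon \\
    && curl -fsSL {url} \\
    | tar -xz --strip-components=1 -C /opt/salmon

# Make the salmon binary available on the PATH.
ENV PATH="/opt/salmon/bin:${{PATH}}"

# Print the version when the container starts without arguments.
CMD ["salmon", "--version"]
"""


if __name__ == "__main__":
    print(salmon_dockerfile())
```

Using `--strip-components=1` avoids hard-coding the name of the directory inside the archive.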

In [12]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -f ./salmon_docker -t salmon:salmon .

[+] Building 0.2s (1/2)                                          docker:default
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 790B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s

In [14]:
# run the docker image to give out the version of salmon
!docker run salmon:salmon

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

It is a registry of bioinformatics containers and workflows. Additionally, it contains instructions on how to run each tool with Docker or Singularity, or how to install the tool itself with Conda.

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

A website that automatically builds Docker images from a list of packages. The images are stored on Seqera's servers for some time, and instructions for downloading each image are provided.