# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatics questions, we now turn to the mechanisms behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [2]:
!docker info

Client:
 Version:    27.5.1
 Context:    default
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /usr/local/lib/docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.27
    Path:     /usr/local/lib/docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.2-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /usr/local/lib/docker/cli-plugins/docker
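
The same check can be scripted. The following is a minimal sketch, assuming a POSIX shell; the variable name `docker_status` is made up for this example. It distinguishes a missing `docker` command from a Docker installation whose daemon does not answer.

```shell
# Is the docker CLI on PATH, and does the daemon answer?
if command -v docker >/dev/null 2>&1; then
    if docker info >/dev/null 2>&1; then
        docker_status="running"
    else
        docker_status="installed, daemon not reachable"
    fi
else
    docker_status="not installed"
fi
echo "Docker: ${docker_status}"
```

If the result is "installed, daemon not reachable", starting Docker Desktop or the Docker service is usually the next thing to try.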

### What is a container?

A container is a portable, lightweight, self-sufficient, standard unit of software that bundles an application with its libraries, dependencies, and configuration files, so that it runs the same way in any computing environment. Key properties of Docker containers include standardized packaging, isolation, portability, speed, and efficiency.

### Why do we use containers?

Because of their very helpful features: standardized packaging, isolation, portability, speed, and efficiency.

### What is a docker image?

A Docker image is a read-only template (blueprint) used to create Docker containers. Images are typically built from a Dockerfile; running an image creates a Docker container. An image contains everything needed to run a piece of software, including the application code, dependencies, libraries, configuration files, environment variables, and tools.

### Let's run our first docker image:

### Login to docker

In [None]:
# You need to do this directly on the command line (for example with `docker login`)

### Run your first docker container

In [3]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

Digest: sha256:54e66cc1dd1fcb1c3c58bd8017914dbed8701e2d8c74d9262e26bd9cc1642d31
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 h

### Find the container ID

In [5]:
!docker ps -a

CONTAINER ID   IMAGE         COMMAND    CREATED         STATUS                     PORTS     NAMES
126b8a04543a   hello-world   "/hello"   4 minutes ago   Exited (0) 4 minutes ago             clever_chandrasekhar


### Delete the container again and prove that it is deleted

In [6]:
!docker rm 126b8a04543a

126b8a04543a


In [7]:
!docker ps -a

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
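
Copying the container ID by hand works for a single container. As a sketch of how the lookup and removal could be combined, the snippet below asks `docker ps` for the IDs of all containers created from one image and removes them. The variable names are chosen for this example, and it assumes the containers have already exited (as `hello-world` containers do).

```shell
# Image whose leftover containers should be removed.
image="hello-world"
container_ids=""

if docker info >/dev/null 2>&1; then
    # -a: include stopped containers, -q: print IDs only,
    # --filter ancestor=...: keep only containers created from this image.
    container_ids="$(docker ps -aq --filter "ancestor=${image}")"
    if [ -n "${container_ids}" ]; then
        # Word splitting is intended here: one container ID per argument.
        docker rm ${container_ids}
    else
        echo "No containers found for image ${image}."
    fi
    # Show what is left; the table should now be empty for this image.
    docker ps -a --filter "ancestor=${image}"
else
    echo "Docker daemon not reachable; nothing to clean up."
fi
```

Alternatively, `docker run --rm hello-world` removes the container automatically as soon as it exits, so no cleanup is needed afterwards.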


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. First, I went to the website and clicked on "Download Now".
2. I downloaded FastQC v0.12.1 (Win/Linux zip file) and extracted the zip folder.
3. Then I executed the run_fastqc Windows batch file.
4. I opened the FASTQ file and ran the analysis.

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [9]:
# container image obtained from the Seqera Containers website: community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29



0.12.1--af7a5314d5015c29: Pulling from library/fastqc

c6865366: Pulling fs layer
97a3ef36: Pulling fs layer
c00c10a5: Pulling fs layer
74b0f85e: Pulling fs layer
2b0c44d2: Pulling fs layer
b097362e: Pulling fs layer
a01cff0b: Pulling fs layer
b700ef54: Pulling fs layer
a16bbe82: Pulling fs layer
acc3b8ff: Pulling fs layer
47592a0a: Pulling fs layer
c6865366: Pull complete

In [10]:
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago     20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago   1.37GB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago   1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago   1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago     110MB
quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   2 years ago     493MB
quay.io/biocontainers/gsea                                 4.3.2--hdfd78af_0          0010041fff53   2 years ago     849MB
quay.io/bioco

In [None]:
# run the container and save the results to a new "fastqc_results" directory

!docker run community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

In [23]:
# save the results to a new "fastqc_results" directory

!docker run \
    -v "/mnt/c/Users/Mayal/Documents/Uni Tübingen/3. Semester/Practical - Computational workflows/computational-workflows-2025/notebooks/day_02/SRFetch_results/SRFetch_results/fastq:/data" \
    -v "/mnt/c/Users/Mayal/Documents/Uni Tübingen/3. Semester/Practical - Computational workflows/computational-workflows-2025/notebooks/day_02/SRFetch_results/SRFetch_results/fastqc_results:/results" \
    community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
    fastqc /data/SRX19144486_SRR23195516_1.fastq.gz -o /results


application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
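
The `-v host_path:container_path` options are what make the input file visible inside the container and the report visible outside it. The following is a minimal sketch of the same pattern with shorter paths. The directory and file names are hypothetical placeholders, and it relies on the FastQC help text, which says that the output directory must exist because the program will not create it.

```shell
# Hypothetical locations; replace them with the real input folder and file.
input_dir="${PWD}/fastq"
output_dir="${PWD}/fastqc_results"
sample="sample_1.fastq.gz"
image="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

# Create the output folder first so that it exists and belongs to the current user.
mkdir -p "${output_dir}"

if docker info >/dev/null 2>&1 && [ -f "${input_dir}/${sample}" ]; then
    # --rm removes the container after the run.
    # The input is mounted read-only (:ro); results are written to /results.
    docker run --rm \
        -v "${input_dir}:/data:ro" \
        -v "${output_dir}:/results" \
        "${image}" \
        fastqc "/data/${sample}" -o /results
else
    echo "Docker or input file not available; skipping the FastQC run."
fi
```

Docker expects absolute host paths for `-v`, which is why the sketch builds them from `${PWD}`.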

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

At least today:  
Installing FastQC v0.12.1 directly took much less time, and it was very easy to open the FASTQ file. The runtime was also only a few minutes, whereas with Docker it was much more complicated to get the container and finally run the analysis on the file. It also took much longer to run (nearly 15 minutes).

(But both seem to provide the same results)


### What would you say, which approach is more reproducible?

The purpose of containers is to provide a standardized environment, which makes them very reproducible. Working with Docker will therefore give more reproducible results.

### Compare the file to last week's FastQC results. Are they identical?
### Is the fastqc version identical?

(Hard to compare, since we only have yesterday's report covering all 32 samples, while today's single FastQC report is also named differently.)

Otherwise, we could typically compare, for example, the per-sequence quality scores and other modules.
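
The version question can be checked from both sides. The sketch below assumes that the `fastqc_data.txt` file inside a FastQC result zip starts with a line of the form `##FastQC<TAB>version`. The function name and the path to last week's report are hypothetical.

```shell
# Print the FastQC version stored in a fastqc_data.txt file.
# Assumption: the first line looks like "##FastQC<TAB>0.12.1".
fastqc_report_version() {
    head -n 1 "$1" | cut -f 2
}

# Hypothetical path to an unpacked report from last week.
old_report="last_week/sample_fastqc/fastqc_data.txt"
image="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

if [ -f "${old_report}" ]; then
    echo "Old report: FastQC $(fastqc_report_version "${old_report}")"
fi

if docker info >/dev/null 2>&1; then
    # Version reported by the tool inside the container.
    docker run --rm "${image}" fastqc --version
fi
```

If both versions match, differences between the reports are more likely to come from the input files than from the tool.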

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux, so on a Debian- or Ubuntu-based image you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor
# done

### Explain the RUN and ENV lines you added to the file

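RUN: runs a command while the image is being built; the result is stored in the image  
ENV: defines an environment variable that is available in the image and in every container started from it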
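
To make the two instructions concrete, here is a minimal sketch of what such a Dockerfile could look like. It is not the actual `my_dockerfile` from this exercise. The file name, the image tag `cowsay-sketch`, and the `CMD` line are assumptions, and the base image is the one reported in the build log of this notebook. The sketch writes the Dockerfile into a temporary folder and builds it only if a Docker daemon is reachable.

```shell
# Temporary build folder, so the build context stays small.
build_dir="$(mktemp -d)"

# The quoted 'EOF' keeps the shell from expanding ${PATH} inside the Dockerfile.
cat > "${build_dir}/cowsay.Dockerfile" <<'EOF'
FROM debian:bullseye-slim

# RUN executes at build time; its result is stored as a new image layer.
RUN apt-get update \
    && apt-get install -y cowsay \
    && rm -rf /var/lib/apt/lists/*

# ENV sets a variable for later build steps and for every container.
# Debian installs cowsay into /usr/games, which is not on the default PATH.
ENV PATH="/usr/games:${PATH}"

# Assumption: default command if the container is started without arguments.
CMD ["cowsay", "Hello World"]
EOF

if docker info >/dev/null 2>&1; then
    docker build -t cowsay-sketch -f "${build_dir}/cowsay.Dockerfile" "${build_dir}"
    docker run --rm cowsay-sketch cowsay "Hello World"
else
    echo "Docker daemon not reachable; wrote ${build_dir}/cowsay.Dockerfile only."
fi
```

Without the `ENV` line, `docker run ... cowsay` would most likely fail with "executable file not found", because the binary exists but is not on the `PATH`.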

In [None]:
!pwd

/mnt/c/Users/Mayal/Documents/Uni Tübingen/3. Semester/Practical - Computational workflows/computational-workflows-2025/notebooks/day_03_part2


In [53]:
# build the docker image
!docker build -t cowsay -f "/mnt/c/Users/Mayal/Documents/Uni Tübingen/3. Semester/Practical - Computational workflows/computational-workflows-2025/notebooks/day_03_part2/my_dockerfile" .


[+] Building 0.0s (0/1)                                          docker:default
[+] Building 0.2s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 871B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 871B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[+] Building 0.5

In [54]:
# make sure that the image has been built
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
cowsay                                                     latest                     07c6b728b427   10 seconds ago   191MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago      20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago    1.37GB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago    1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago      110MB
quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   2 years ago      493MB
quay.

In [55]:
# run the docker image
!docker run cowsay cowsay "Hello World"

 _____________
< Hello World >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics and create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
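
A Dockerfile for this task could look like the sketch below. The `FROM` line and the two `RUN` lines follow the build steps printed in the log of this notebook. The final `CMD` line is an assumption (the last steps of the log are cut off), and the file name and the tag `salmon-sketch` are made up for this example.

```shell
# Temporary build folder, so the build context stays small.
build_dir="$(mktemp -d)"

cat > "${build_dir}/salmon.Dockerfile" <<'EOF'
FROM debian:bullseye-slim

# Tools needed to download and unpack the release archive.
RUN apt-get update && apt-get install -y curl tar gzip \
    && rm -rf /var/lib/apt/lists/*

# Download the release, unpack it into /opt, and link the binary onto the PATH.
RUN curl -L https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
    | tar -xz -C /opt \
    && ln -s /opt/salmon-1.5.2_linux_x86_64/bin/salmon /usr/bin/salmon

# Assumption: print the version when the container starts without arguments.
CMD ["salmon", "--version"]
EOF

if docker info >/dev/null 2>&1; then
    docker build -t salmon-sketch -f "${build_dir}/salmon.Dockerfile" "${build_dir}"
    docker run --rm salmon-sketch
else
    echo "Docker daemon not reachable; wrote ${build_dir}/salmon.Dockerfile only."
fi
```

Piping `curl` straight into `tar` avoids keeping the downloaded archive in the image, which keeps the layer smaller.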

In [14]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -t salmon -f "/mnt/c/Users/Mayal/Documents/Uni Tübingen/3. Semester/Practical - Computational workflows/computational-workflows-2025/notebooks/day_03_part2/salmon_docker" "/mnt/c/Users/Mayal/Documents/Uni Tübingen/3. Semester/Practical - Computational workflows/computational-workflows-2025/notebooks/day_03_part2"  

failed to fetch metadata: fork/exec /usr/local/lib/docker/cli-plugins/docker-buildx: no such file or directory

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon   1.29MB
Step 1/7 : FROM debian:bullseye-slim
 ---> 78305db6185d
Step 2/7 : LABEL image.author.name="yourname"
 ---> Using cache
 ---> 666dddbc2ac8
Step 3/7 : LABEL image.author.email="yourmail"
 ---> Using cache
 ---> 9f1349faee9d
Step 4/7 : RUN apt-get update && apt-get install -y     curl     tar     gzip     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 30e5fdb1d34e
Step 5/7 : RUN curl -L https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz     | tar -xz -C /opt     && ln -s /opt/salmon-1.5.2_linux_x86_64/bin/salmon /usr/bin/salmon
 ---> Using cache
 ---> 749050085b4a
Step 6/7 : E

In [15]:
# build the image
# again? or shall we prove that it worked?

# proof
!docker images


REPOSITORY                                  TAG                     IMAGE ID       CREATED              SIZE
salmon                                      1.5.2                   8107cd1772f3   About a minute ago   306MB
salmon                                      latest                  8107cd1772f3   About a minute ago   306MB
<none>                                      <none>                  3b586a6fed4f   2 minutes ago        81.8MB
debian                                      bullseye-slim           78305db6185d   3 days ago           80.7MB
quay.io/biocontainers/r-shinyngs            1.8.8--r43hdfd78af_0    3ae022b36dce   17 months ago        1.34GB
quay.io/biocontainers/pandas                1.5.2                   f82af9c4f9b8   2 years ago          342MB
quay.io/biocontainers/r-base                4.2.1                   8932c551c68b   2 years ago          825MB
quay.io/biocontainers/bioconductor-deseq2   1.34.0--r41hc247a5b_3   375cfc132ace   2 years ago          1.14GB
quay.io

In [16]:
# run the docker image to print the version of salmon

!docker run salmon

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

In [17]:
!docker pull quay.io/biocontainers/salmon:0.11.3--h86b0361_2
!docker run --rm quay.io/biocontainers/salmon:0.11.3--h86b0361_2 salmon --version

0.11.3--h86b0361_2: Pulling from biocontainers/salmon

95caeb02: Pulling fs layer
c00e8b61: Pulling fs layer
de50789a: Pulling fs layer
8b9f3d2a: Pulling fs layer
99a2256f: Pulling fs layer
336f2e44: Pulling fs layer
3e01f2b6: Pulling fs layer
bb32200b: Pulling fs layer
Digest: sha256:54e0c2f0e0d6c04a9e6cf8f4a75f3fe31c1512851edc5be430ad7b28cb71e95e

biocontainers.pro is the website of BioContainers, a community-driven project built on frameworks such as Docker and Conda and consisting of several components. The "Registry" lists bioinformatics containers and workflows with metadata and statistics. "Specifications" covers the specifications and architecture for creating, deploying, and maintaining software containers using Conda and Docker technologies. The third component, "Resources", provides tools and resources for working with containers. The project also has many partners, such as Bioconda and Nextflow.

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

Starting with Nextflow, Seqera was founded to empower scientists with modern software engineering and data analysis tools built on an open-science foundation. By providing tools like Nextflow, the idea is to make data-intensive research scalable, flexible, and collaborative. On this website, you can also select packages and obtain a container image that bundles them.