# A short introduction to containerized software

Having used nf-core pipelines to answer bioinformatic questions, we now turn to the machinery behind these pipelines.

Today we focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [2]:
!docker info

Client:
 Version:    28.4.0
 Context:    default
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /usr/local/lib/docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /usr/local/lib/docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /usr/local/lib/docker/cli-plugins/docker
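
Because the notebook runs on a Python kernel, the same check can also be sketched in Python. This is a minimal sketch and not part of the original exercise: the helper name `docker_available` is hypothetical, and it only tests whether a `docker` executable is on `PATH` and answers `docker --version`. It does not confirm that the Docker daemon is running, which `docker info` does.

```python
import shutil
import subprocess


def docker_available():
    """Return True if a `docker` executable is on PATH and reports its version."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Docker CLI found:", docker_available())
```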

### What is a container?

A container is an isolated environment that packages software and its dependencies so it runs consistently across different systems.


### Why do we use containers?

We use containers to make software reproducible and portable, and to run it without dependency conflicts.

### What is a docker image?

A Docker image is a template that defines everything needed to create a container, including application code, dependencies, and system libraries.


### Let's run our first docker image:

### Log in to Docker

In [None]:
# Run `docker login` directly in a terminal, since it prompts for credentials interactively

### Run your first docker container

In [3]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
Digest: sha256:54e66cc1dd1fcb1c3c58bd8017914dbed8701e2d8c74d9262e26bd9cc1642d31
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 h

### Find the container ID

In [9]:
!docker ps --all


CONTAINER ID   IMAGE                               COMMAND                  CREATED         STATUS                      PORTS     NAMES
8b933c357a4f   hello-world                         "/hello"                 5 minutes ago   Exited (0) 5 minutes ago              serene_nash
982e35744a96   quay.io/biocontainers/wget:1.20.1   "/bin/bash -c 'eval …"   25 hours ago    Exited (0) 24 hours ago               nxf-AU37dXnuZO6KmvMPnw4KRogd
5978ba9f850a   quay.io/biocontainers/wget:1.20.1   "/bin/bash -c 'eval …"   26 hours ago    Exited (137) 25 hours ago             nxf-SektalNi09DoDa2EXzCOOXQF
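
When many containers are listed, picking the ID by eye gets tedious. The sketch below filters the listing by image name. It assumes the listing was produced with `docker ps --all --format "{{.ID}}\t{{.Image}}\t{{.Names}}"`, which prints one tab-separated row per container. The function name `containers_for_image` is hypothetical, and the sample text simply restates the rows shown above.

```python
def containers_for_image(ps_output, image):
    """Return (container_id, name) pairs for every row whose image equals `image`."""
    # Expected input: one row per container, written as ID<TAB>IMAGE<TAB>NAME,
    # as printed by: docker ps --all --format "{{.ID}}\t{{.Image}}\t{{.Names}}"
    matches = []
    for line in ps_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        container_id, container_image, name = fields
        if container_image == image:
            matches.append((container_id, name))
    return matches


# Sample rows copied from the listing above
sample = (
    "8b933c357a4f\thello-world\tserene_nash\n"
    "982e35744a96\tquay.io/biocontainers/wget:1.20.1\tnxf-AU37dXnuZO6KmvMPnw4KRogd\n"
    "5978ba9f850a\tquay.io/biocontainers/wget:1.20.1\tnxf-SektalNi09DoDa2EXzCOOXQF\n"
)
print(containers_for_image(sample, "hello-world"))
```

To feed it live data, the listing could be captured with `subprocess.run([...], capture_output=True, text=True).stdout`, passing the `docker ps` arguments as a list.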


### Delete the container again and prove that it is deleted

In [10]:
!docker rm 8b933c357a4f 

8b933c357a4f


In [11]:
!docker ps --all

CONTAINER ID   IMAGE                               COMMAND                  CREATED        STATUS                      PORTS     NAMES
982e35744a96   quay.io/biocontainers/wget:1.20.1   "/bin/bash -c 'eval …"   25 hours ago   Exited (0) 24 hours ago               nxf-AU37dXnuZO6KmvMPnw4KRogd
5978ba9f850a   quay.io/biocontainers/wget:1.20.1   "/bin/bash -c 'eval …"   26 hours ago   Exited (137) 25 hours ago             nxf-SektalNi09DoDa2EXzCOOXQF


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. conda install fastqc
2. fastqc <path/to/file.fastq>

### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/).
2. Use the container to generate a FastQC HTML report for the example fastq file.

In [12]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc
c6865366: Pulling fs layer
b700ef54: Pulling fs layer
a01cff0b: Pulling fs layer
d6c3110d: Pulling fs layer
a16bbe82: Pulling fs layer
97a3ef36: Pulling fs layer
acc3b8ff: Pulling fs layer
b097362e: Pulling fs layer
7ea432cc: Pulling fs layer
47592a0a: Pulling fs layer
2b0c44d2: Pulling fs layer
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1

In [14]:
# list the local images to look up the image ID of the fastqc container
!docker images --all

REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago     20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago   1.37GB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0         74b59572f1d0   14 months ago   20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago   1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago   1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago     110MB
quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   2 years ago     493MB
quay.io/biocon

In [21]:
# run fastqc in the container: the input directory is mounted at /data, the output directory at /out
!docker run -v "/home/clara/computational-workflows-2025/notebooks/day_02/results/fastq/":/data -v "/home/clara/computational-workflows-2025/notebooks/day_03_part2":/out b7f6caf35926 fastqc "/data/SRX19144486_SRR23195516_1.fastq.gz" -o /out

application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
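
The `docker run` call above works because each `-v host_dir:container_dir` option makes a host directory visible inside the container, so FastQC reads from `/data` and writes to `/out`. The sketch below assembles such a command from a host file path. The function name `fastqc_docker_cmd` is hypothetical, and two options are additions not used in the original cell: `--rm` removes the container after it exits, and the `:ro` suffix mounts the input directory read-only.

```python
import shlex
from pathlib import PurePosixPath


def fastqc_docker_cmd(image, fastq, out_dir):
    """Build the argument list for running FastQC on one file inside a container."""
    fastq = PurePosixPath(fastq)
    return [
        "docker", "run", "--rm",
        "-v", f"{fastq.parent}:/data:ro",  # host input directory -> /data (read-only)
        "-v", f"{out_dir}:/out",           # host output directory -> /out
        image,
        "fastqc", f"/data/{fastq.name}", "-o", "/out",
    ]


cmd = fastqc_docker_cmd(
    "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29",
    "/home/clara/computational-workflows-2025/notebooks/day_02/results/fastq/SRX19144486_SRR23195516_1.fastq.gz",
    "/home/clara/computational-workflows-2025/notebooks/day_03_part2",
)
# Print the command in a form that can be pasted into a shell
print(shlex.join(cmd))
```

Referring to the image by its full name and tag rather than by its local image ID keeps the command usable on another machine.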

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

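Installing the tool from the command line and pulling the Docker image took about the same effort. If you plan to use the tool regularly, however, a container is likely easier in the long term, because the same image can be reused on any machine without reinstalling dependencies.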

### What would you say, which approach is more reproducible?

Docker is more reproducible, since the image pins a specific version of the software together with its dependencies.

### Compare the file to last week's fastqc results. Are they identical?
### Is the fastqc version identical?

The two runs we just performed produced identical results. The versions are also identical here (0.12.1), but they could have differed if the container had been built from an older release.
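
One way to check whether two output files are byte-for-byte identical is to compare their checksums. The following is a minimal sketch with hypothetical function names (`sha256sum`, `same_content`). Note that byte identity is a strict criterion: two reports may differ in incidental bytes even when the QC results agree, in which case the report contents need to be compared instead.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files contain exactly the same bytes."""
    return sha256sum(path_a) == sha256sum(path_b)
```

Calling `same_content(report_from_conda, report_from_docker)` with the paths of the two reports (both variable names are placeholders) then answers the question directly.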

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. The base image is a Debian-based Linux, so you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

# PATH has to be extended so that the installed executable is found (Debian places cowsay in /usr/games, which is not on the default PATH)

### Explain the RUN and ENV lines you added to the file

RUN executes a command while the image is being built and is used, for example, to install software.
ENV sets environment variables that persist in the image, such as PATH, which tells the shell where to look for executables.
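
The notebook does not show the contents of `my_dockerfile`, so the following is only a sketch of what such a file might contain, written out from Python. The base image `debian:bullseye-slim` is taken from the build log; the exact `RUN` and `ENV` lines, the constant `DOCKERFILE`, and the helper `write_dockerfile` are assumptions for illustration.

```python
from pathlib import Path

# Assumed contents: the course's actual my_dockerfile is not shown in the notebook.
DOCKERFILE = """\
FROM debian:bullseye-slim

# RUN executes at build time: refresh the package index and install cowsay
RUN apt-get update \\
    && apt-get install -y cowsay \\
    && rm -rf /var/lib/apt/lists/*

# ENV persists into containers: Debian installs cowsay into /usr/games,
# which is not on the default PATH
ENV PATH="/usr/games:${PATH}"
"""


def write_dockerfile(path):
    """Write the sketch Dockerfile to `path` and return the path."""
    target = Path(path)
    target.write_text(DOCKERFILE)
    return target


print(DOCKERFILE)
```

After `write_dockerfile("my_dockerfile_sketch")`, the image could be built with `docker build -f my_dockerfile_sketch -t cowsay-sketch .` (file and tag names are placeholders). Passing `-t` gives the image a name; without it the image shows up as `<none>` in `docker images`.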

In [33]:
# build the docker image
!docker build -f ./my_dockerfile ./docker

[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 823B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [34]:
# make sure that the image has been built
!docker images --all


REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
<none>                                                     <none>                     ae0e68e7aa4f   3 seconds ago   232MB
<none>                                                     <none>                     00b2cf5e91b7   7 minutes ago   229MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago     20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago   1.37GB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0         74b59572f1d0   14 months ago   20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago   1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago   1.82GB
quay.io/nf-cor

In [2]:
# run the image, piping a fortune into cowsay
!fortune | docker run -i ae0e68e7aa4f cowsay

 ________________________________
< You will get what you deserve. >
 --------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image
# We got the info on dependencies from salmon's own Dockerfile on GitHub

In [8]:
# build the image
!docker build -f ./salmon_docker ./docker -t salmon

[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 792B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [9]:
!docker images --all 

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
salmon                                                     latest                     84aca0d9c838   15 seconds ago   2.09GB
<none>                                                     <none>                     43ca0f015977   19 hours ago     2.08GB
<none>                                                     <none>                     bf2ca02d0dc7   20 hours ago     2.08GB
<none>                                                     <none>                     4afc13586387   20 hours ago     2.08GB
<none>                                                     <none>                     ae0e68e7aa4f   20 hours ago     232MB
<none>                                                     <none>                     00b2cf5e91b7   20 hours ago     229MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago      20.3kB
comm

In [10]:
# run the docker image to give out the version of salmon
!docker run 84aca0d9c838 salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

In [None]:
!docker pull combinelab/salmon

Using default tag: latest
latest: Pulling from combinelab/salmon
9485d7ab: Pulling fs layer
0bdd40c3: Pulling fs layer
1ed9ab84: Pulling fs layer
7f213c76: Pulling fs layer
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Downloaded newer image for combinelab/salmon:latest
docker.io/combinelab/salmon:latest
docker: invalid reference format

Run 'docker run --help' for more information


In [3]:
!docker run --rm -it combinelab/salmon salmon --help


salmon v1.10.3

Usage:  salmon -h|--help or 
        salmon -v|--version or 
        salmon -c|--cite or 
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index      : create a salmon index
     quant      : quantify a sample
     alevin     : single cell analysis
     swim       : perform super-secret operation
     quantmerge : merge multiple quantifications into a single file
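
The image was pulled as `combinelab/salmon` without a tag, so Docker fell back to `latest`, which here is salmon v1.10.3 rather than the 1.5.2 built earlier. The sketch below shows how a short image reference expands into registry, repository, and tag. It is a simplified illustration of Docker's naming rules, not the parser Docker itself uses, and the function name `split_image_reference` is hypothetical.

```python
def split_image_reference(ref):
    """Split an image reference into (registry, repository, tag, digest)."""
    name, _, digest = ref.partition("@")
    first, sep, rest = name.partition("/")
    # The first path component names a registry only if it looks like a host
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", name
    # A tag is a colon-separated suffix of the last path component
    last_component = remainder.rsplit("/", 1)[-1]
    if ":" in last_component:
        repository, _, tag = remainder.rpartition(":")
    else:
        repository = remainder
        tag = None if digest else "latest"  # no tag given: Docker assumes latest
    # Official Docker Hub images live under the implicit "library/" namespace
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository
    return registry, repository, tag, digest or None


for ref in (
    "hello-world",
    "combinelab/salmon",
    "quay.io/biocontainers/wget:1.20.1",
    "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29",
):
    print(ref, "->", split_image_reference(ref))
```

For reproducible analyses it helps to always write the tag explicitly, since `latest` can point to a different version the next time the image is pulled.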



Biocontainers.pro is a community project that provides pre-built containers for bioinformatics tools.

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

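Seqera Containers (seqera.io/containers) is a free service that builds Docker or Singularity/Apptainer images on demand from a chosen set of Conda or PyPI packages and serves them from a public registry (community.wave.seqera.io, the source of the fastqc image pulled earlier), so no hand-written Dockerfile is needed.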