# A short introduction to containerized software

After using nf-core pipelines to answer bioinformatics questions, we now turn to the processes that lie behind these pipelines.

Today we focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [5]:
!docker info

Client:
 Version:    28.4.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /Users/peterbrederlow/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /Users/peterbrederlow/.docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /Users/peterbrederlow/.docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /Users/peterbrederlow/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /Users/peterbrederlow/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /Users/peterbrederlow/.docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
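
Outside the notebook, the same check can be scripted. The following is a minimal sketch, assuming a POSIX shell; the variable name `docker_status` is made up for illustration. It only tests whether the `docker` CLI is on the PATH, not whether the daemon is running (`docker info` covers that).

```shell
# Report whether the docker CLI is on PATH without failing if it is absent
if command -v docker >/dev/null 2>&1; then
    docker_status="found"
    docker --version
else
    docker_status="missing"
    echo "docker not found on PATH; install Docker first"
fi
```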


### What is a container?

A container is an isolated, packaged application that runs on the host's kernel.

Source: What is a container? | Docker Docs. (n.d.). Retrieved October 1, 2025, from https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-a-container/

### Why do we use containers?

Containers prevent dependency conflicts when different steps in a pipeline need different software versions. Each process runs in its own container with exactly the versions it requires.

### What is a docker image?

A Docker image is a standardized package that includes all of the files, binaries, libraries, and configurations needed to run a container.

Source: What is an image? | Docker Docs. (n.d.). Retrieved October 1, 2025, from https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-an-image/

### Let's run our first docker image:

### Run your first docker container

In [6]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [7]:
!docker ps -a

CONTAINER ID   IMAGE         COMMAND    CREATED          STATUS                      PORTS     NAMES
2d0d5a48311c   hello-world   "/hello"   1 second ago     Exited (0) 1 second ago               quirky_bohr
a8274af1edab   hello-world   "/hello"   12 minutes ago   Exited (0) 12 minutes ago             nervous_taussig
da03790567b5   hello-world   "/hello"   5 days ago       Exited (0) 5 days ago                 relaxed_spence


### Delete the container again and prove that it is deleted

In [8]:
!docker rm 2d0d5a48311c

2d0d5a48311c


In [9]:
!docker ps -a

CONTAINER ID   IMAGE         COMMAND    CREATED          STATUS                      PORTS     NAMES
a8274af1edab   hello-world   "/hello"   16 minutes ago   Exited (0) 16 minutes ago             nervous_taussig
da03790567b5   hello-world   "/hello"   5 days ago       Exited (0) 5 days ago                 relaxed_spence
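
The listing still shows older `hello-world` containers. Removing them one ID at a time gets tedious, so here is a hedged sketch of a small helper that removes all exited containers created from one image. The function name `remove_exited` is hypothetical; the `ancestor` and `status` filters are standard `docker ps` filters.

```shell
# Remove every exited container created from a given image (sketch).
# remove_exited is a hypothetical helper name, not part of Docker.
remove_exited() {
    image="$1"
    # -a: include stopped containers, -q: print only the container IDs
    ids="$(docker ps -aq --filter "ancestor=$image" --filter "status=exited")"
    if [ -n "$ids" ]; then
        # $ids is left unquoted on purpose so each ID becomes its own argument
        docker rm $ids
    else
        echo "no exited containers for $image"
    fi
}
```

A call such as `remove_exited hello-world` would then clean up all of the containers listed above. Passing `--rm` to `docker run` avoids the leftovers in the first place.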


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. first we looked for the correct nf-core pipeline
2. then we ran the pipeline using nextflow
...

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [10]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc

2b0c44d2: Pulling fs layer
a01cff0b: Pulling fs layer
7ea432cc: Pulling fs layer
47592a0a: Pulling fs layer
74b0f85e: Pulling fs layer
b700ef54: Pulling fs layer
d6c3110d: Pulling fs layer
97a3ef36: Pulling fs layer
c00c10a5: Pulling fs layer
b097362e: Pulling fs layer
a16bbe82: Pulling fs layer
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1

In [20]:
# run the container and save the results to the "dockertests" directory (FastQC expects the output directory to already exist)
!docker run \
    -v "/Users/peterbrederlow/Documents/Uni/MasterBioinformatik/SS25/WorkflowCourse/computational-workflows-2025/":/data \
    community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
    fastqc /data/SRFetch_results/fastq/SRX19144486_SRR23195516_2.fastq.gz \
    -o /data/dockertests

application/gzip
Started analysis of SRX19144486_SRR23195516_2.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_2.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
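
The run above hard-codes an absolute host path. As a hedged sketch, the same call can be wrapped in a small shell function that mounts the current directory instead and creates the output directory first. The function name `run_fastqc` and its two arguments are assumptions for illustration; the image tag is the one pulled above.

```shell
# Run FastQC from the container on a file below the current directory (sketch).
# run_fastqc is a hypothetical helper; $1 = FASTQ path relative to $PWD,
# $2 = output directory relative to $PWD.
run_fastqc() {
    fastq="$1"
    outdir="$2"
    # FastQC does not create the output directory itself
    mkdir -p "$outdir"
    # --rm removes the container after it exits; -v mounts $PWD at /data
    docker run --rm \
        -v "$PWD":/data \
        community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
        fastqc "/data/$fastq" -o "/data/$outdir"
}
```

Called from the course directory, this would look like `run_fastqc SRFetch_results/fastq/SRX19144486_SRR23195516_2.fastq.gz dockertests`.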

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

Definitely the Nextflow route, since it pulled the Docker containers and set up the directories automatically.

### What would you say, which approach is more reproducible?

Also the Nextflow pipeline run with the Docker profile, since the pipeline pins a container image for every process, so each run uses the same software versions.

### Compare the file to last week's fastqc results, are they identical?
### Is the fastqc version identical?

Yes, it is identical. The version is 0.12.1 in both cases. 

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux (the base image here is Debian), so you need to know the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor
# !TODO: add the command that is run to install the dependencies for the image. In this case, it should be updating the package list and installing curl and cowsay.
RUN apt-get update && apt-get install --yes curl cowsay

### Explain the RUN and ENV lines you added to the file
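
As a hedged illustration of what the two instructions do, the sketch below writes a minimal Dockerfile to a temporary directory. Its contents are an assumption and not the course's actual `my_dockerfile`; only the base image `debian:bullseye-slim` is taken from the build log. `RUN` executes a command at build time and stores the result as an image layer. `ENV` sets an environment variable for later build steps and for containers started from the image.

```shell
# Sketch only: write a minimal Dockerfile to a temporary directory.
# The file contents are an assumption, not the course's actual my_dockerfile.
workdir="$(mktemp -d)"
cat > "$workdir/my_dockerfile" <<'EOF'
FROM debian:bullseye-slim
# RUN: refresh the package list, install the tools, drop the apt cache
RUN apt-get update && \
    apt-get install --yes curl cowsay && \
    rm -rf /var/lib/apt/lists/*
# ENV: Debian installs cowsay into /usr/games, which is not on the default PATH
ENV PATH="/usr/games:${PATH}"
EOF
echo "wrote $workdir/my_dockerfile"
```

Such a file could then be built with `docker build -t cowshoutimage -f "$workdir/my_dockerfile" "$workdir"`.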

In [33]:
# build the docker image
!docker build -t cowshoutimage -f /Users/peterbrederlow/Documents/Uni/MasterBioinformatik/SS25/WorkflowCourse/computational-workflows-2025/notebooks/day_03_part2/my_dockerfile .

[+] Building 0.3s (2/3)                                    docker:desktop-linux
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 901B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
 => [auth] library/debian:pull token for registry-1.docker.io              0.0s

In [34]:
# make sure that the image has been built
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
cowshoutimage                                              latest                     5aac37c50755   29 seconds ago   188MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago      16.9kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago    1.37GB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0         74b59572f1d0   14 months ago    20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago    1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago      110MB
quay.i

In [35]:
# run a container from the image
!docker run --rm cowshoutimage cowsay "Is it running though?!"

 ________________________
< Is it running though?! >
 ------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [17]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -t salmon_docker -f /Users/peterbrederlow/Documents/Uni/MasterBioinformatik/SS25/WorkflowCourse/computational-workflows-2025/notebooks/day_03_part2/salmon_docker .

[+] Building 0.0s (1/1) FINISHED                           docker:desktop-linux
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 796B                                       0.0s
salmon_docker:18
--------------------
  16 |         mv salmon-${SALMON_VERSION}_linux_x86_64/bin/salmon /usr/local/bin/ && \
  17 |         rm -rf salmon.tar.gz salmon-${SALMON_VERSION}_linux_x86_64
  18 | >>>     --platform linux/amd64
  19 |     
  20 |     # Set the PATH environment variable (to /usr/bin)
--------------------
ERROR: failed to build: failed to solve: dockerfile parse error on line 18: unknown instruction: --platform
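
The build fails because `--platform` is not a Dockerfile instruction. It is either an option of `FROM` (`FROM --platform=linux/amd64 ...`) or a flag of `docker build` (`docker build --platform linux/amd64 ...`). The platform matters here because the salmon release is an x86_64 binary, while the `hello-world` output above shows an arm64 host. The sketch below writes one possible corrected file to a temporary directory. Its contents are an assumption, not the course's actual `salmon_docker`: the base image is guessed from the cowsay build, and the archive is unpacked into `/opt/salmon` with `--strip-components=1` so the sketch does not depend on the archive's top-level directory name.

```shell
# Sketch only: the contents below are an assumption, not the course's salmon_docker file.
workdir="$(mktemp -d)"
cat > "$workdir/salmon_docker" <<'EOF'
# The platform is requested on the FROM line instead of as a separate instruction
FROM --platform=linux/amd64 debian:bullseye-slim
ENV SALMON_VERSION=1.5.2
RUN apt-get update && \
    apt-get install --yes curl ca-certificates && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /opt/salmon && \
    curl -fsSL -o salmon.tar.gz \
      https://github.com/COMBINE-lab/salmon/releases/download/v${SALMON_VERSION}/salmon-${SALMON_VERSION}_linux_x86_64.tar.gz && \
    tar -xzf salmon.tar.gz -C /opt/salmon --strip-components=1 && \
    rm salmon.tar.gz
# Keep the whole release tree together in case it bundles shared libraries next to bin/
ENV PATH="/opt/salmon/bin:${PATH}"
EOF
echo "wrote $workdir/salmon_docker"
```

Building it would follow the same pattern as before, for example `docker build -t salmon_docker -f "$workdir/salmon_docker" "$workdir"`. Recent Docker versions may print a lint warning about a constant `--platform` on `FROM`; passing the flag to `docker build` instead avoids that.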


In [1]:
!docker run --interactive -t salmon_docker bash 

root@b85b14d58c04:/# ^C
root@b85b14d58c04:/# 

In [12]:
# run a container from the image to print the version of salmon
!docker run --rm salmon_docker salmon --version

## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?