# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatics questions, we will now look at the processes behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



1. Check if Docker is installed.

In [2]:
!docker info

Client:
 Version:    27.2.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /usr/local/lib/docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     
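The same check can be scripted, which may be handy at the top of a setup script. The sketch below assumes a POSIX shell; the function name `docker_status` is hypothetical and not part of Docker. It only distinguishes three cases: no CLI, CLI without a reachable daemon, and a working installation.

```shell
# Report whether the docker CLI is installed and whether the daemon answers.
# docker_status is a hypothetical helper name, not a Docker command.
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found on PATH"
    elif docker info >/dev/null 2>&1; then
        echo "docker CLI and daemon are available"
    else
        echo "docker CLI found, but the daemon is not reachable"
    fi
}

docker_status
```

If the last message appears, starting Docker Desktop (or the Docker service) is usually the next thing to try.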

### What is a container?

A Docker container is a standardized, encapsulated environment that runs applications. It contains not only the code but all dependencies.

### Why do we use containers?
Docker allows you to standardize the development and release cycle using a consistent and isolated environment. Multiple containers can even run in parallel, since they do not influence one another and each has all of its required parts isolated.
Containers also tend to use less memory than full virtual machines, and since most platforms support Docker, it is easy to switch environments and to ship and run them.

### What is a Docker image?
A container image is a standardized package that includes all of the files, binaries, libraries, and configurations needed to run a container. An image can be used to run one or even multiple instances of the application at once as containers.

### Let's run our first docker image:

### Login to docker

In [None]:
# You need to do this directly on the command line

### Run your first docker container

In [3]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
c7bbc9d7: Download complete
Digest: sha256:54e66cc1dd1fcb1c3c58bd8017914dbed8701e2d8c74d9262e26bd9cc1642d31
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For

### Find the container ID

In [None]:
!docker container ls -a
# container ID: ef26577064b3

CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED         STATUS                      PORTS                  NAMES
ef26577064b3   hello-world                                                                   "/hello"                 8 minutes ago   Exited (0) 8 minutes ago                           reverent_ganguly
59a6bf7c51bd   quay.io/biocontainers/bioconductor-deseq2:1.34.0--r41hc247a5b_3               "/usr/local/env-exec…"   2 hours ago     Exited (137) 2 hours ago                           nxf-AKsxTE5POqmtNuAp8XIPSAZD
52fb8fca6e44   community.wave.seqera.io/library/cutadapt_trim-galore_pigz:a98edd405b34582d   "/usr/local/bin/_ent…"   22 hours ago    Exited (0) 17 hours ago                            nxf-tl55d5l277A8Fmzqp83UtYlr
f714d42ce1d9   quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0                               "/usr/local/env-exec…"   23 hours ago    Exited (255) 23 hours ago                    

### Delete the container again and prove that it is deleted

In [None]:
!docker container rm ef26577064b3
!docker container ls -a
# the container ID is no longer visible in the list

Error response from daemon: No such container: ef26577064b3
CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED        STATUS                      PORTS     NAMES
59a6bf7c51bd   quay.io/biocontainers/bioconductor-deseq2:1.34.0--r41hc247a5b_3               "/usr/local/env-exec…"   2 hours ago    Exited (137) 2 hours ago              nxf-AKsxTE5POqmtNuAp8XIPSAZD
52fb8fca6e44   community.wave.seqera.io/library/cutadapt_trim-galore_pigz:a98edd405b34582d   "/usr/local/bin/_ent…"   22 hours ago   Exited (0) 17 hours ago               nxf-tl55d5l277A8Fmzqp83UtYlr
f714d42ce1d9   quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0                               "/usr/local/env-exec…"   23 hours ago   Exited (255) 23 hours ago             nxf-E1qatC4Kbh0P29dKJvwhPJT4
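The `Error response from daemon` line in the output above suggests that the container had already been removed by an earlier run of the cell. As a possible alternative to removing containers by hand, `docker run` accepts a `--rm` flag that deletes the container as soon as it exits. A minimal sketch, where the function name `run_hello_world_once` is hypothetical:

```shell
# Run hello-world with --rm so Docker deletes the container when it exits,
# then list all containers (running or stopped) created from that image.
# run_hello_world_once is a hypothetical helper name.
run_hello_world_once() {
    docker run --rm hello-world
    docker container ls -a --filter ancestor=hello-world
}
```

Calling `run_hello_world_once` should leave no new entry behind. Containers from earlier runs without `--rm` would still be listed until they are removed.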


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

In [46]:
# Install it and run it in bash with the FASTQ file from Tuesday:
!fastqc SRX19144486_SRR23195516_1.fastq.gz -o ./fastqc/

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
/bin/bash: line 1: fastqc: command not found


### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [19]:
# pull the container
!docker image pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc
acc3b8ff: Pulling fs layer
c6865366: Download complete

In [47]:
# run the container and save the results to a new "fastqc_results" directory
!docker run -v /mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025:/mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025 -w /mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025 community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc notebooks/day_02/output_ngs/fastq/SRX19144486_SRR23195516_1.fastq.gz -o notebooks/day_03_part2/fastqc


shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR2319551
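The long absolute paths in the command above can be shortened by mounting the current directory instead. The sketch below assumes it is called from the repository root and that both paths are relative to it; `FASTQC_IMAGE` and `fastqc_in_docker` are hypothetical names. It creates the output directory first, because FastQC expects that directory to exist.

```shell
# Image pulled earlier in this notebook
FASTQC_IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

# Usage: fastqc_in_docker <fastq file> <output directory>
# Mounts the current directory at the same path inside the container (-v),
# makes it the working directory (-w), and removes the container afterwards.
fastqc_in_docker() {
    mkdir -p "$2"
    docker run --rm -v "$PWD":"$PWD" -w "$PWD" "$FASTQC_IMAGE" fastqc "$1" -o "$2"
}
```

With that in place, the run above could be written as `fastqc_in_docker notebooks/day_02/output_ngs/fastq/SRX19144486_SRR23195516_1.fastq.gz notebooks/day_03_part2/fastqc`.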

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

In my opinion, bash scripts were still easier to use. However, they require more knowledge about the individual tools, which in turn takes more time. When time is short, pipelines and containers should be the easier solution.

### What would you say, which approach is more reproducible?
Docker containers feel more isolated than pipelines, but still flexible.

### Compare the file to last week's FastQC results. Are they identical?
### Is the FastQC version identical?
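One way to approach the version question is to ask the tool inside the container directly and compare the answer with the version shown in last week's report. A small sketch, where the function name `fastqc_container_version` is hypothetical:

```shell
# Print the FastQC version that ships inside the container image.
# fastqc_container_version is a hypothetical helper name.
fastqc_container_version() {
    docker run --rm \
        "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29" \
        fastqc --version
}
```

The image tag already pins a version, so the printed value mainly serves as a cross-check against the pipeline's report.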

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Most Docker images are Linux-based. On a Debian or Ubuntu base image, you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

RUN tells the image builder to run a specific command at build time, such as installing packages.\
ENV sets an environment variable that is available to later build steps and to containers started from the image.
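To make the two instructions concrete, here is a sketch of what such a Dockerfile might contain. It is not the course's reference solution: the base image `ubuntu:22.04`, the `CMD` line, and the file name `cowsay_example.Dockerfile` (chosen so that `my_dockerfile` is not overwritten) are all assumptions. The shell snippet writes the file with a heredoc.

```shell
# Write an example Dockerfile; the quoted 'EOF' keeps ${PATH} unexpanded.
cat > cowsay_example.Dockerfile <<'EOF'
# Base image (an assumption; any image that provides apt-get should work)
FROM ubuntu:22.04

# RUN executes at build time: refresh the package index and install cowsay
RUN apt-get update \
    && apt-get install -y cowsay \
    && rm -rf /var/lib/apt/lists/*

# ENV persists into running containers: Ubuntu installs cowsay into
# /usr/games, which is not on the default PATH of the base image
ENV PATH="/usr/games:${PATH}"

# Default command when a container starts (an assumption for illustration)
CMD ["cowsay", "Hello from Docker!"]
EOF
```

Without the `ENV` line, the `cowsay` command would have to be called by its full path inside the container.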

In [48]:
# build the docker image
!docker build .

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[+] Building 0.0s (0/0)  docker:default
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/docker/buildx/util/gitutil.gitPath({0x27f1f38, 0x1})
	github.com/docker/buildx/util/gitutil/path.go:35 +0x133
github.com/docker/buildx/util/gitutil.New({0xc000790d18, 0x2, 0x0?})
	github.com/docker/buildx/util/gitutil/gitutil.go:51 +0x85
github.com/docker/buildx/build.getGitAttributes({0x2823828, 0xc00069a370}, {0x7fff89ad59ac, 0x1}, {0x0, 0x0})
	github.com/docker/buildx/build/git.go:54 +0x3bb
github.com/docker/buildx/build.BuildWithResultHandler({0x2823828, 0xc00069a370}, {0xc00070e7e0, 0x1, 0x1}, 0xc000792a08, 0xc000709950, {0xc000500480, 0x1c}, {0x28239b0, ...}, ...)
	github.com/docker/buildx/build/build.go:223 +0x7

In [None]:
# make sure that the image has been built


In [None]:
# run a container from the image you just built
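The three steps (build, verify, run) can be sketched as one helper. Note that a plain `docker build .` looks for a file literally named `Dockerfile`, so a differently named file such as `my_dockerfile` has to be passed with `-f`. The function name `build_and_run` and the tag `my_cowsay` used below are hypothetical.

```shell
# Usage: build_and_run <dockerfile> <image tag> [command...]
# Builds the image from the current directory, lists it to confirm that the
# build succeeded, then runs a container from it and removes it afterwards.
# build_and_run is a hypothetical helper name.
build_and_run() {
    dockerfile="$1"
    tag="$2"
    shift 2
    docker build -f "$dockerfile" -t "$tag" . || return 1
    docker image ls "$tag"
    docker run --rm "$tag" "$@"
}
```

For example, `build_and_run my_dockerfile my_cowsay cowsay "Moo"` would build the image, show it in the image list, and print the cow. It should be called from an existing directory that contains the Dockerfile.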


## Let's do some bioinformatics and create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
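A possible shape for such a Dockerfile is sketched below; it is not the course's reference solution. The base image, the install location `/opt/salmon`, and the file name `salmon_example.Dockerfile` (chosen so that `salmon_docker` is not overwritten) are assumptions, as is the archive layout: the sketch assumes a single top-level directory containing `bin/salmon`, which `--strip-components=1` removes.

```shell
# Write an example Dockerfile; the quoted 'EOF' keeps ${PATH} unexpanded.
cat > salmon_example.Dockerfile <<'EOF'
# Base image (an assumption)
FROM ubuntu:22.04

# curl fetches the release; ca-certificates is needed for HTTPS downloads
RUN apt-get update \
    && apt-get install -y curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Download the release archive (-L follows redirects) and unpack it
RUN mkdir -p /opt/salmon \
    && curl -fsSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
       | tar -xz --strip-components=1 -C /opt/salmon

# Make the salmon binary available on PATH in running containers
ENV PATH="/opt/salmon/bin:${PATH}"

# Default command (an assumption): print the version
CMD ["salmon", "--version"]
EOF
```

It could then be built with `docker build -f salmon_example.Dockerfile -t salmon_example .` and queried with `docker run --rm salmon_example salmon --version`. The binary is built for x86_64, so other architectures may need emulation.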

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image

In [None]:
# build the image


In [None]:
# run the docker image to give out the version of salmon


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?