# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatic questions, we will now focus on the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



1. Check if Docker is installed.

In [2]:
!docker info

Client:
 Version:    27.2.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /Users/Jessie/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /Users/Jessie/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /Users/Jessie/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/Jessie/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/Jessie/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /Users/Jessie/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    V

### What is a container?

It is a standard unit of software that packages code together with all its dependencies, so that the program runs the same way in any computing environment. The container includes all code, libraries, settings, etc. needed to run an application.

### Why do we use containers?

Containers include everything needed to run an application and ensure that the application runs quickly and reliably in any computing environment.

### What is a docker image?

It is a read-only template that is used to create a Docker container. It includes all of the files, binaries, libraries, and configurations needed to run a container.

### Let's run our first docker image:

### Log in to Docker

In [1]:
# You need to do this directly on the command line

### Run your first docker container

In [6]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

cf1e890891187e7636d49bcd3120af3530a08cc53a1ac01b630a9ee9495e8c42
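
The ID above can be looked up from the list of containers. A minimal sketch, assuming the container was created from the `hello-world` image; `list_container_ids` is a hypothetical helper name for illustration, not a Docker command:

```shell
# Hypothetical helper: print the full IDs of all containers
# (running or exited) that were created from a given image.
list_container_ids() {
    docker ps --all --no-trunc --filter "ancestor=$1" --format '{{.ID}}'
}

# Only query Docker when the CLI is available on this machine.
if command -v docker >/dev/null 2>&1; then
    list_container_ids hello-world || echo "Docker daemon not reachable" >&2
fi
```

`--all` matters here because the hello-world container exits right after printing its message, and a plain `docker ps` lists running containers only.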

### Delete the container again and prove that it is deleted

In [7]:
!docker rm /cf1e890891187e7636d49bcd3120af3530a08cc53a1ac01b630a9ee9495e8c42

/cf1e890891187e7636d49bcd3120af3530a08cc53a1ac01b630a9ee9495e8c42


In [9]:
!docker logs cf1e890891187e7636d49bcd3120af3530a08cc53a1ac01b630a9ee9495e8c42

Error response from daemon: No such container: cf1e890891187e7636d49bcd3120af3530a08cc53a1ac01b630a9ee9495e8c42


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Downloaded and expanded fastqc_v0.12.1.zip
2. Ran chmod +x fastqc to make the file executable
3. Added fastqc to my PATH (/usr/local/bin/fastqc)
4. Ran fastqc SRX19144488_SRR23195511_1.fastq.gz
...

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [18]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0

0.12.1--d3caca66b4f3d3b0: Pulling from library/fastqc

4a5e885d: Pulling fs layer
f7ad9b3c: Pulling fs layer
f16d8faf: Pulling fs layer
b700ef54: Pulling fs layer
8fb245f5: Pulling fs layer
96a0b3ed: Pulling fs layer
21e027fa: Pulling fs layer
98df307d: Pulling fs layer
81345b94: Pulling fs layer
89cafbe9: Pulling fs layer
98df307d: Waiting fs layer
81345b94: Waiting fs layer
7c3c4e61: Pull complete

In [25]:
# run the container and save the results to a new "fastqc_results" directory
!mkdir /Users/Jessie/PycharmProjects/comp_workflows/day4/fastqc_results

mkdir: /Users/Jessie/PycharmProjects/comp_workflows/day4/fastqc_results: File exists
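
The `File exists` message means the directory was already there, presumably from an earlier run of the cell. A small sketch of an idempotent alternative using `mkdir -p`; the temporary path is only a stand-in for the real results directory:

```shell
# Stand-in for the real results directory (a scratch location for illustration).
results_dir="$(mktemp -d)/fastqc_results"

# -p creates missing parent directories and does not fail if the directory exists.
mkdir -p "$results_dir"
mkdir -p "$results_dir"   # second call is a silent no-op

ls -d "$results_dir"
```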


In [35]:
!docker tag community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0 fastqc-image

In [37]:
!docker run -v /Users/Jessie/PycharmProjects/comp_workflows/day2/files/fastq:/data -v /Users/Jessie/PycharmProjects/comp_workflows/day4/fastqc_results:/output fastqc-image fastqc /data/SRX19144488_SRR23195511_1.fastq.gz -o /output

application/gzip
Started analysis of SRX19144488_SRR23195511_1.fastq.gz
Approx 5% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 10% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 15% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 20% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 25% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 30% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 35% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 40% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 45% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 50% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 55% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 60% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 65% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 70% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 75% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 80% complete for SRX

In [41]:
# Alternatively, from the command line, I also tried:
!docker run --rm -it -v /Users/Jessie/PycharmProjects/comp_workflows/day2/files/fastq:/data fastqc-image

(base) root@a3042d8e9183:/tmp# ^C
(base) root@a3042d8e9183:/tmp# zsh:cd:1: no such file or directory: /data
Skipping 'SRX19144488_SRR23195511_1.fastq.gz' which didn't exist, or couldn't be read
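
Judging from the output above, the interactive shell opened inside the container, but the commands typed afterwards were executed by zsh on the host, where `/data` does not exist. A non-interactive call avoids this. The sketch below wraps the earlier `docker run` in a hypothetical helper function, `run_fastqc_in_container`. The `fastqc-image` tag and the `/data` and `/output` mount points follow the cells above, while `--rm` and the read-only (`:ro`) input mount are additions of this sketch:

```shell
# Hypothetical helper around the docker run call used earlier.
#   $1: host directory containing the FASTQ files (mounted read-only at /data)
#   $2: host directory for the reports (mounted at /output)
#   $3: name of the FASTQ file inside $1
run_fastqc_in_container() {
    docker run --rm \
        -v "$1":/data:ro \
        -v "$2":/output \
        fastqc-image \
        fastqc "/data/$3" -o /output
}

# Example call (paths are placeholders):
# run_fastqc_in_container "$PWD/fastq" "$PWD/fastqc_results" SRX19144488_SRR23195511_1.fastq.gz
```

Because no `-it` is involved, this form also works from a notebook cell, and `--rm` removes the stopped container once FastQC has finished.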


### Now that you know how to use a docker container, which approach between running everything manually and using docker was easier and which approach will be easier in the future?

Running FastQC manually was easy since I already had the tool installed on my laptop and could simply run the command "fastqc SRX19144486_SRR23195516_1.fastq.gz". With Docker there is a steeper learning curve, so it took me longer, since I had to figure out the command to run the correct Docker image with my input file. In the future it will probably be easier to use Docker, especially for tools that I don't already have installed locally.

### What would you say, which approach is more reproducible?

Using the Docker container is more reproducible since everything is containerized and packaged, and therefore the results are independent of my computing environment. The manual installation of FastQC is dependent on my specific OS.

### Compare the file to last week's fastqc results. Are they identical?
### Is the fastqc version identical?

For me, the results from the previous analysis (Sham_oxy_2_1.gz) and from the Docker run are identical, and the same fastqc version (0.12.1) was used for both.

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers usually run Linux, so on a Debian- or Ubuntu-based image you need the apt-get command to install "cowsay"

In [2]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

The RUN line first updates the package list to make sure that the latest versions of the packages will be installed. Then it installs curl and cowsay (-y is used to automatically answer "yes" to installation prompts). I also added apt-get clean to clean up the image.

The ENV line is necessary because cowsay will be installed in the /usr/games directory, but this directory isn't included in the default PATH. So unless we add this directory to our PATH, we would have to explicitly provide the full path to cowsay (/usr/games/cowsay) if we wanted to run it. But the ENV line ensures that we can execute the "cowsay" command from any location.
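
The file itself is not shown in this notebook, so the following is only a sketch of what a Dockerfile matching this description could look like. The base image `debian:bullseye-slim` is taken from the build log, and the file name `cowsay_dockerfile` and the scratch directory are assumptions of this sketch. The shell snippet writes the file so that it can be inspected or built:

```shell
# Write a hypothetical Dockerfile to a scratch directory (not the author's my_dockerfile).
workdir="$(mktemp -d)"
cat > "$workdir/cowsay_dockerfile" <<'EOF'
# Base image as reported in the build log
FROM debian:bullseye-slim

# Refresh the package index, install curl and cowsay without prompts (-y),
# then clean the apt cache to keep the image small
RUN apt-get update \
    && apt-get install -y curl cowsay \
    && apt-get clean

# cowsay is installed to /usr/games, which is not on the default PATH
ENV PATH="${PATH}:/usr/games"
EOF

cat "$workdir/cowsay_dockerfile"
# To build: docker build -t cowsay_image -f "$workdir/cowsay_dockerfile" "$workdir"
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell leaves `${PATH}` untouched and Docker expands it at build time instead.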

In [44]:
# build the docker image
!docker build -t cowsay_image -f my_dockerfile .

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 1.03kB                                     0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 1.03kB                                     0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_do

In [45]:
# make sure that the image has been built
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
cowsay_image                                               latest                     f4798f6b5fbf   37 seconds ago   144MB
community.wave.seqera.io/library/fastqc                    0.12.1--d3caca66b4f3d3b0   0685c8e726ea   3 months ago     925MB
fastqc-image                                               latest                     0685c8e726ea   3 months ago     925MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       3ae022b36dce   5 months ago     1.34GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          db9ec43ce403   5 months ago     1.25GB
docker/welcome-to-docker                                   latest                     648f93a1ba7d   11 months ago    19MB
hello-world                                                latest                     ee301c921b8a   17 months ago    9.14kB

In [51]:
# run the docker image
!docker run --rm cowsay_image cowsay "mooo!"

 _______
< mooo! >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with the docker file and create a new docker file that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
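
The `salmon_docker` file is not shown in this notebook, so the following is only a sketch of how such a Dockerfile could be written. It assumes that the release archive unpacks into a `salmon-*` folder containing `bin/` and `lib/`. The `--platform=linux/amd64` pin is inferred from the warning in the build log, and the file name `salmon_dockerfile` is made up for this example:

```shell
# Write a hypothetical Dockerfile to a scratch directory (not the author's salmon_docker).
workdir="$(mktemp -d)"
cat > "$workdir/salmon_dockerfile" <<'EOF'
# The release is an x86_64 binary, so the platform is pinned here
# (assumption; relevant on ARM hosts such as Apple Silicon)
FROM --platform=linux/amd64 debian:bullseye-slim

# curl downloads the release; ca-certificates lets it verify the HTTPS connection
RUN apt-get update \
    && apt-get install -y curl ca-certificates \
    && apt-get clean

# Download and unpack the tarball, then move the binary and its bundled
# libraries to standard locations (the wildcard avoids hard-coding the folder name)
RUN curl -sSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz | tar xz \
    && mv /salmon-*/bin/* /usr/bin/ \
    && mv /salmon-*/lib/* /usr/lib/
EOF

cat "$workdir/salmon_dockerfile"
# To build: docker build -t salmon_image -f "$workdir/salmon_dockerfile" "$workdir"
```

`curl -L` is needed because GitHub release URLs redirect to the actual download location.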

In [3]:
# use the file "salmon_docker" in this directory to build a new docker image

In [106]:
# build the image
!docker build -t salmon_image -f salmon_docker .

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 922B                                       0.0s
[0m => WARN: FromPlatformFlagConstDisallowed: FROM --platform flag should no  0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 922B                                       0.0s
[0m => WARN: FromPlatformFlagConstDisallowed: FROM --platform flag should no  0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [107]:
# run the docker image to print the version of salmon

!docker run --rm salmon_image salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

In [108]:
!docker pull combinelab/salmon:latest

latest: Pulling from combinelab/salmon

7f213c76: Pulling fs layer
1ed9ab84: Pulling fs layer
0bdd40c3: Pulling fs layer
893c1bc1: Pulling fs layer
9485d7ab: Pull complete
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Downloaded newer image for combinelab/salmon:latest
docker.io/combinelab/salmon:latest

What's next:
    View a summary of image vulnerabilities and recommendations →

In [109]:
!docker run --rm combinelab/salmon salmon --version

salmon 1.10.3


No, it is not necessary to build a Docker image yourself every time, since pre-built images for many tools can be found on Docker Hub or Seqera Containers.
BioContainers is a community project that focuses specifically on bioinformatics and hosts pre-built containers for many tools and workflows, which can be pulled directly as images. The containers come pre-configured with all necessary dependencies, so you don't need to set up the environment yourself, and workflows run consistently across different environments.