# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatic questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [5]:
!docker info

Client:
 Version:    27.2.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /usr/local/lib/docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version
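
A minimal sketch of the check in step 1, assuming a POSIX shell. `docker info` only succeeds when the client is installed and the daemon is reachable, so the two failure cases can be told apart. The function name `check_docker` is hypothetical.

```shell
# Hypothetical helper: report whether the docker client is on PATH
# and whether the daemon answers.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker client not found"
        return 1
    fi
    if docker info >/dev/null 2>&1; then
        echo "docker client and daemon are available"
    else
        echo "docker client found, but the daemon is not reachable"
        return 2
    fi
}

# "|| true" keeps a failed check from aborting a script run with "set -e".
check_docker || true
```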

### What is a container?

A container is a package of software that includes code and all its dependencies needed to run an application [[Source]](https://www.docker.com/resources/what-container/).

### Why do we use containers?

Using a container makes it easy to run an application because you do not have to install all its dependencies yourself. It also ensures that the computation is not influenced by the computing environment and delivers the same results regardless of the infrastructure [[Source]](https://www.docker.com/resources/what-container/).

### What is a Docker image?

A Docker image is an immutable (read-only) template from which Docker containers are created. The image contains all the files, binaries, libraries, and configurations required to run a container [[Source]](https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-an-image/).

### Let's run our first docker image:

### Log in to Docker

In [None]:
# Run `docker login` directly on the command line; it prompts for your credentials interactively

### Run your first docker container

In [6]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

Digest: sha256:91fb4b041da273d5a3273b6d587d62d518300a6ad268b28628f74997b93171b2
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examp

### Find the container ID

In [7]:
!docker --help


Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Common Commands:
  run         Create and run a new container from an image
  exec        Execute a command in a running container
  ps          List containers
  build       Build an image from a Dockerfile
  pull        Download an image from a registry
  push        Upload an image to a registry
  images      List images
  login       Log in to a registry
  logout      Log out from a registry
  search      Search Docker Hub for images
  version     Show the Docker version information
  info        Display system-wide information

Management Commands:
  builder     Manage builds
  buildx*     Docker Buildx
  compose*    Docker Compose
  container   Manage containers
  context     Manage contexts
  debug*      Get a shell into any image or container
  desktop*    Docker Desktop commands (Alpha)
  dev*        Docker Dev Environments
  extension*  Manages Docker extensions
  feedbac

In [13]:
!docker ps -a

CONTAINER ID   IMAGE                                                 COMMAND                  CREATED         STATUS                     PORTS     NAMES
31b020bd3e8f   hello-world                                           "/hello"                 9 minutes ago   Exited (0) 9 minutes ago             charming_goldberg
0a005d86b72b   quay.io/biocontainers/trim-galore:0.6.7--hdfd78af_0   "/usr/local/env-exec…"   2 days ago      Exited (137) 2 days ago              nxf-wk0TJxRGXheirSoX5kelNjbQ
68e212e12352   quay.io/biocontainers/python:3.9--1                   "/usr/local/env-exec…"   3 days ago      Exited (0) 3 days ago                nxf-mj8T5jSkBbftWpzbiyROgMZC
071fd6a33431   quay.io/biocontainers/python:3.9--1                   "/usr/local/env-exec…"   3 days ago      Exited (0) 3 days ago                nxf-up0Hnsry2UyHx6wbxoVfm4pw


The full container ID is `31b020bd3e8fd38e3e91beb110a9882993f4c9c203d5d723539c91e6893b1887`. By default, `docker ps -a` shows only its first 12 characters (`31b020bd3e8f`).
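
One way to obtain the full ID is sketched below, assuming the `hello-world` container still exists. `short_id` is a hypothetical helper that only illustrates how the short and the full form relate; the Docker call is guarded so the sketch also runs on a machine without Docker.

```shell
# Hypothetical helper: the short ID shown by `docker ps` is the first
# 12 characters of the full 64-character container ID.
short_id() {
    printf '%s\n' "$1" | cut -c1-12
}

short_id 31b020bd3e8fd38e3e91beb110a9882993f4c9c203d5d723539c91e6893b1887

# Print the untruncated IDs of all containers created from hello-world.
if command -v docker >/dev/null 2>&1; then
    docker ps -a --no-trunc --filter ancestor=hello-world --format '{{.ID}}' || true
fi
```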

### Delete the container again and prove that it is deleted

In [14]:
!docker rm 31b020bd3e8fd38e3e91beb110a9882993f4c9c203d5d723539c91e6893b1887

31b020bd3e8fd38e3e91beb110a9882993f4c9c203d5d723539c91e6893b1887


In [15]:
!docker ps -a

CONTAINER ID   IMAGE                                                 COMMAND                  CREATED      STATUS                    PORTS     NAMES
0a005d86b72b   quay.io/biocontainers/trim-galore:0.6.7--hdfd78af_0   "/usr/local/env-exec…"   2 days ago   Exited (137) 2 days ago             nxf-wk0TJxRGXheirSoX5kelNjbQ
68e212e12352   quay.io/biocontainers/python:3.9--1                   "/usr/local/env-exec…"   3 days ago   Exited (0) 3 days ago               nxf-mj8T5jSkBbftWpzbiyROgMZC
071fd6a33431   quay.io/biocontainers/python:3.9--1                   "/usr/local/env-exec…"   3 days ago   Exited (0) 3 days ago               nxf-up0Hnsry2UyHx6wbxoVfm4pw
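
The visual comparison of the two listings can also be automated. This is a minimal sketch, assuming the listing arrives on standard input. `container_listed` is a hypothetical helper, and the sample rows are shortened copies of the listing above.

```shell
# Hypothetical helper: succeed if a container ID appears at the start of
# any line of a `docker ps -a` listing read from standard input.
container_listed() {
    grep -q "^$1"
}

# Sample rows copied from the listing printed after `docker rm`.
listing='0a005d86b72b   quay.io/biocontainers/trim-galore:0.6.7--hdfd78af_0
68e212e12352   quay.io/biocontainers/python:3.9--1
071fd6a33431   quay.io/biocontainers/python:3.9--1'

if printf '%s\n' "$listing" | container_listed 31b020bd3e8f; then
    echo "container still exists"
else
    echo "container was removed"
fi
```

On a live system, the same check would read `docker ps -a | container_listed 31b020bd3e8f`.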


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Download the Win/Linux zip file
2. Take a look at the INSTALL.txt
3. Unzip the archive into a suitable location
4. Run `./fastqc /mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows/day_02/results/fastq/SRX19144488_SRR23195511_1.fastq.gz /mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows/day_02/results/fastq/SRX19144488_SRR23195511_2.fastq.gz /mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows/day_02/results/fastq/SRX19144486_SRR23195516_1.fastq.gz /mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows/day_02/results/fastq/SRX19144486_SRR23195516_2.fastq.gz  --outdir=/mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows/day_04/fastqc_results_local`
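
The long command in step 4 is easier to read and to adapt when the repeated path prefix lives in variables. This is a sketch with hypothetical variable and function names (`BASE`, `FASTQ_DIR`, `OUT_DIR`, `fastqc_cmd`); the paths and sample names are taken from step 4. It prints the command instead of running it, so it can be inspected first.

```shell
# Hypothetical variable names; the paths are the ones used in step 4.
BASE=/mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows
FASTQ_DIR="$BASE/day_02/results/fastq"
OUT_DIR="$BASE/day_04/fastqc_results_local"

# Print the FastQC command for both samples and both read files.
# FastQC does not create the --outdir directory itself, so it has to
# exist before the printed command is run (e.g. via mkdir -p).
fastqc_cmd() {
    printf './fastqc'
    for sample in SRX19144488_SRR23195511 SRX19144486_SRR23195516; do
        for read in 1 2; do
            printf ' %s/%s_%s.fastq.gz' "$FASTQ_DIR" "$sample" "$read"
        done
    done
    printf ' --outdir=%s\n' "$OUT_DIR"
}

fastqc_cmd
```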

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [16]:
# create a container from the image (Docker pulls the image first because it is not available locally)
!docker container create community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42

Unable to find image 'community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42' locally
0.12.1--5cfd0f3cb6760c42: Pulling from library/fastqc

b3717211: Pulling fs layer
f7ad9b3c: Pulling fs layer
ca300600: Pulling fs layer
b700ef54: Pulling fs layer
d418774c: Pulling fs layer
e77ff45c: Pulling fs layer
f787139d: Pulling fs layer
55536720: Pulling fs layer
62c12ca7: Pulling fs layer
cbe24e91: Pulling fs layer
1e94977b: Pulling fs layer
1e94977b: Waiting
6e1d0b98: Pull complete

In [17]:
!docker ps -a

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED         STATUS                    PORTS     NAMES
922cfc0ff8c3   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42   "/usr/local/bin/_ent…"   3 minutes ago   Created                             ecstatic_bartik
0a005d86b72b   quay.io/biocontainers/trim-galore:0.6.7--hdfd78af_0                "/usr/local/env-exec…"   2 days ago      Exited (137) 2 days ago             nxf-wk0TJxRGXheirSoX5kelNjbQ
68e212e12352   quay.io/biocontainers/python:3.9--1                                "/usr/local/env-exec…"   3 days ago      Exited (0) 3 days ago               nxf-mj8T5jSkBbftWpzbiyROgMZC
071fd6a33431   quay.io/biocontainers/python:3.9--1                                "/usr/local/env-exec…"   3 days ago      Exited (0) 3 days ago               nxf-up0Hnsry2UyHx6wbxoVfm4pw


In [3]:
# run the container and save the results to a new "fastqc_results" directory
!docker container run -v /mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows:/tmp community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc /tmp/day_02/results/fastq/SRX19144486_SRR23195516_1.fastq.gz /tmp/day_02/results/fastq/SRX19144486_SRR23195516_2.fastq.gz /tmp/day_02/results/fastq/SRX19144488_SRR23195511_1.fastq.gz /tmp/day_02/results/fastq/SRX19144488_SRR23195511_2.fastq.gz --outdir=/tmp/day_04/fastqc_results

application/gzip
application/gzip
application/gzip
application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX191444
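
The `-v host_path:container_path` option bind-mounts a host directory into the container, which is why the FastQC arguments above start with `/tmp` rather than `/mnt/c/...`. The mapping can be sketched as follows; `to_container_path` and the two variable names are hypothetical and not part of Docker.

```shell
# Both sides of the -v option used in the cell above.
HOST_ROOT=/mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows
CONTAINER_ROOT=/tmp

# Hypothetical helper: translate a host path below HOST_ROOT into the
# path under which the container sees the same file.
to_container_path() {
    # Strip the host prefix and prepend the mount point.
    printf '%s%s\n' "$CONTAINER_ROOT" "${1#"$HOST_ROOT"}"
}

to_container_path "$HOST_ROOT/day_02/results/fastq/SRX19144486_SRR23195516_1.fastq.gz"
# prints /tmp/day_02/results/fastq/SRX19144486_SRR23195516_1.fastq.gz
```

Because the output directory `/tmp/day_04/fastqc_results` lies inside the same mount, the reports end up on the host under `day_04/fastqc_results`.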

In [40]:
!docker container run community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc --version

FastQC v0.12.1


### Now that you know how to use a docker container, which approach between running everything manually and using docker was easier and which approach will be easier in the future?

The container approach was easier than installing the application locally. It will become even easier in the future because running an application with Docker always requires the same steps, whatever the tool.

### What would you say, which approach is more reproducible?

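Using the Docker container is more reproducible: the image fixes the tool version and all of its dependencies, so the same command gives the same environment on any machine, whereas a local installation depends on what is installed on the host.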

### Compare the file to last week's FastQC results. Are they identical?
### Is the FastQC version identical?

Both the local run and the Docker container used FastQC v0.12.1, and their results are identical.
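
One way to back up such a comparison is a byte-for-byte check. This is a sketch under assumptions: `same_content` is a hypothetical helper, and the two throwaway files only stand in for real reports. For actual FastQC output, the `fastqc_data.txt` file inside each result zip is a natural candidate to compare.

```shell
# Hypothetical helper: report whether two files have identical bytes.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Demo with two throwaway files standing in for two reports.
tmpdir="$(mktemp -d)"
printf 'FastQC v0.12.1\n' > "$tmpdir/local.txt"
printf 'FastQC v0.12.1\n' > "$tmpdir/docker.txt"
same_content "$tmpdir/local.txt" "$tmpdir/docker.txt"
rm -rf "$tmpdir"
```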

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers typically run Linux. On a Debian-based image, you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

- The `RUN` instruction `RUN [OPTIONS] <command> ...` executes the given commands and commits the result as a new layer on top of the current image.
In our file, it first updates the package list so that the most recent package versions are installed. It then installs curl and cowsay; the `-y` option automatically confirms the installation prompts.
- The `ENV` instruction `ENV <key>=<value> ...` sets the environment variable `<key>` to the value `<value>`. Cowsay is installed in the `/usr/games` directory, which is not part of the default `PATH`. We therefore add this directory to `PATH`; otherwise we would have to specify the full path to run it.

[[Source]](https://docs.docker.com/reference/dockerfile/)
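
The contents of `my_dockerfile` are not shown in this notebook. The following is a minimal sketch consistent with the explanation above, assuming the `debian:bullseye-slim` base image that appears in the build log. The file name `my_dockerfile.sketch` is hypothetical and chosen so that the real file is not overwritten.

```shell
# Write a sketch Dockerfile; the quoted 'EOF' keeps the shell from
# expanding ${PATH}, so Docker resolves it at build time instead.
cat > my_dockerfile.sketch <<'EOF'
# Base image: a small Debian release (assumed from the build log).
FROM debian:bullseye-slim

# Refresh the package index, then install curl and cowsay without prompts.
RUN apt-get update && apt-get install -y curl cowsay

# Debian installs cowsay into /usr/games, which is not on the default PATH.
ENV PATH="/usr/games:${PATH}"
EOF

cat my_dockerfile.sketch
```

Chaining `apt-get update` and `apt-get install` in one `RUN` keeps both in the same layer, so a cached, outdated package index is never combined with a fresh install step.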

In [8]:
# build the docker image
!docker build -f ./my_dockerfile -t julia938/cowsay .

[+] Building 0.0s (0/0)  docker:default
[+] Building 0.0s (0/1)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
[+] Building 0.2s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 853B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.1s
[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 853B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [39]:
# make sure that the image has been built
!docker image ls

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
julia938/salmon                                            latest                     7bbcf36ae2dd   10 minutes ago   596MB
julia938/cowsay                                            latest                     db22ac292928   41 minutes ago   153MB
julia938/ubuntu_sftp                                       latest                     397bd1ef1fa9   4 weeks ago      95MB
ubuntu_sftp                                                latest                     397bd1ef1fa9   4 weeks ago      95MB
bash                                                       latest                     bd4206c5bc03   2 months ago     14.4MB
python                                                     latest                     ab363ab21d7b   3 months ago     1.02GB
ubuntu                                                     latest                     17c0145030df   4 months ago     76.2MB


In [33]:
# run a container from the image we just built
!docker run julia938/cowsay cowsay Moo

 _____
< Moo >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
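
The contents of `salmon_docker` are not shown in this notebook either. The following is a sketch of how such a file could look, not the file that was actually built. It assumes the `debian:bullseye-slim` base image from the build log and assumes that the release archive contains a single top-level directory with `bin/salmon` inside. The file name `salmon_docker.sketch` and the install location `/opt/salmon` are hypothetical.

```shell
# Write a sketch Dockerfile; the quoted 'EOF' keeps ${PATH} unexpanded.
cat > salmon_docker.sketch <<'EOF'
FROM debian:bullseye-slim

# curl downloads the release; ca-certificates lets it verify HTTPS.
RUN apt-get update && apt-get install -y curl ca-certificates

# Download and unpack salmon 1.5.2. --strip-components=1 drops the
# top-level directory of the archive (assumed to contain bin/salmon).
RUN mkdir -p /opt/salmon \
    && curl -L https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
    | tar -xzf - --strip-components=1 -C /opt/salmon

# Make the salmon binary callable by name.
ENV PATH="/opt/salmon/bin:${PATH}"
EOF

cat salmon_docker.sketch
```

Stripping the top-level directory avoids hard-coding the directory name used inside the archive.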

In [31]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -f ./salmon_docker -t julia938/salmon .

[+] Building 0.0s (0/0)  docker:default
[+] Building 0.0s (0/0)  docker:default
[+] Building 0.0s (0/1)                                          docker:default
[+] Building 0.2s (1/2)                                          docker:default
 => [internal] load build definition from salmon_docker                    0.1s
 => => transferring dockerfile: 610B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.1s
[+] Building 0.4s (1/2)                                          docker:default
 => [internal] load build definition from salmon_docker                    0.1s
 => => transferring dockerfile: 610B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[+] Buildin

In [32]:
# run the Docker image to print the salmon version
!docker run julia938/salmon salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

In bioinformatics, there are many Docker images available for various use cases. Bioinformaticians do not need to create Docker images from scratch each time. Instead, they can pull pre-built images from registries such as Seqera Containers, BioContainers on quay.io, or Docker Hub.

In [36]:
!docker pull combinelab/salmon

Using default tag: latest
latest: Pulling from combinelab/salmon

7f213c76: Pulling fs layer
1ed9ab84: Pulling fs layer
0bdd40c3: Pulling fs layer
893c1bc1: Pulling fs layer
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Downloaded newer image for combinelab/salmon:latest
docker.io/combinelab/salmon:latest


In [37]:
!docker run combinelab/salmon salmon --version

salmon 1.10.3
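
`docker pull combinelab/salmon` resolves to the `latest` tag, which is why this image reports 1.10.3 while the self-built image reports 1.5.2. For reproducible work it can help to pin the image by its digest instead. This is a minimal sketch using the digest printed by the pull above; `pinned_ref` and the variable names are hypothetical.

```shell
# Image name and digest as printed by `docker pull` above.
IMAGE=combinelab/salmon
DIGEST=sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370

# Hypothetical helper: build an image reference that always resolves to
# exactly this image, even after `latest` moves on to a newer release.
pinned_ref() {
    printf '%s@%s\n' "$IMAGE" "$DIGEST"
}

# Print the pull command rather than running it.
echo "docker pull $(pinned_ref)"
```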


What is https://biocontainers.pro/ ?

BioContainers is a community-driven project that offers the infrastructure and guidelines for creating, managing, and distributing bioinformatics packages (e.g., Conda) and containers (e.g., Docker, Singularity). It is built on widely used frameworks such as Conda, Docker, and Singularity. [[Source]](https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html).