# A short introduction to containerized software

After using nf-core pipelines to answer bioinformatics questions, we will now focus on the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



##### 1. Check if Docker is installed.

In [2]:
!docker info

Client:
 Version:    27.2.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /usr/local/lib/docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     

### What is a container?

A container runs on top of a host operating system's kernel.<br>
It holds everything needed to run a software application, including the code, libraries, and system tools.<br>
It ensures that the application runs the same way, no matter where it is deployed, making it easier to share and manage software across different environments.

**Source:**<br>
IBM, Susnjara, S., & Smalley, I. (2024, May 9). Containers. What are containers? https://www.ibm.com/topics/containers


### Why do we use containers?

Packaging tools in containers avoids installing each tool individually and makes it possible to track the
versions used for the analysis.<br>
Further, you can manage the degree of separation between a container's network, storage, and other underlying subsystems from both other containers and the host machine, allowing for precise control over their isolation.<br><br>
**Source:**<br>
“What is Docker?” (2024, September 10). Docker Documentation. https://docs.docker.com/get-started/docker-overview/

### What is a docker image?

A Docker image is a standardized package that includes all of the files, binaries, libraries, and configurations needed to run a container. <br>
A Docker image can't be modified once it is created. You can only make a new image or add changes on top of it.<br><br>
**Source:**<br>
“What is Docker?” (2024, September 10). Docker Documentation. https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-an-image/

### Let's run our first docker image:

### Login to docker

In [None]:
# You need to do this directly on the command line

I ran "docker login" from the command line to log in. 

### Run your first docker container

In [7]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [8]:
! docker ps -a
# ps: list containers (only running ones by default)
# -a: list all containers, including stopped ones

CONTAINER ID   IMAGE         COMMAND    CREATED          STATUS                      PORTS     NAMES
bbcf397901aa   hello-world   "/hello"   18 seconds ago   Exited (0) 17 seconds ago             eager_lumiere


### Delete the container again and prove that it is deleted

In [10]:
# remove the container with the respective ID
! docker rm bbcf397901aa

bbcf397901aa


In [11]:
! docker ps -a

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
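
The lookup and removal steps above can also be combined, so the container ID does not have to be copied by hand. The following is only a sketch: it assumes the containers of interest were created from the "hello-world" image and uses the `--filter ancestor=...` and `-q` options of `docker ps`. If Docker is not available, the lookup simply returns nothing.

```bash
#!/usr/bin/env bash
set -euo pipefail

IMAGE="hello-world"

# -a: include stopped containers; -q: print only the container IDs;
# --filter ancestor=...: keep only containers created from this image.
ids=$(docker ps -a -q --filter "ancestor=$IMAGE" 2>/dev/null || true)

if [ -n "$ids" ]; then
    # Word splitting is intended here: one argument per container ID.
    # shellcheck disable=SC2086
    docker rm $ids
else
    echo "No containers created from $IMAGE were found."
fi
```

For throwaway runs, `docker run --rm hello-world` removes the container automatically when it exits, so no cleanup is needed afterwards.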


##### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

##### Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Install FastQC using Conda: conda install -c bioconda fastqc
2. Verify the installation with "conda list"
3. Check "fastqc --help" for the required parameters
4. Create an output directory
5. Run "fastqc *path to file*"
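
The steps above can be sketched as a small script. This is an illustration under assumptions, not the exact commands that were run: the input path and the output directory name `fastqc_manual_results` are hypothetical, and passing the output directory with `-o` is one possible way to connect steps 4 and 5. By default the script only prints the commands (dry run); set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command, and execute it only when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

FASTQ="fastq/SRX19144486_SRR23195516_1.fastq.gz"  # assumed location of the example file
OUTDIR="fastqc_manual_results"                     # hypothetical output directory

run conda install -y -c bioconda fastqc   # step 1: install FastQC
run conda list fastqc                     # step 2: verify the installation
run fastqc --help                         # step 3: look up the parameters
run mkdir -p "$OUTDIR"                    # step 4: create the output directory
run fastqc -o "$OUTDIR" "$FASTQ"          # step 5: run FastQC on the file
```

Usage: save the block as a script and call it with `DRY_RUN=0 bash script.sh` inside an activated Conda environment.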


##### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [12]:
# pull the container
! docker pull community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42

0.12.1--5cfd0f3cb6760c42: Pulling from library/fastqc
Digest: sha256:0c524d3abe2642c09c5852299bd79bf78ba0ee2ef040473324caab0826f64d44
Status: Image is up to date for community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42
community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42


In [20]:
# run the container and save the results to a new "fastqc_results" directory
! mkdir -p './fastqc_results'
! docker run -v "/home/maikenaegele/ComputationalWorkflow/Sheet 4/Maike_Naegele_sheet4/fastq":/data -v "/home/maikenaegele/ComputationalWorkflow/Sheet 4/Maike_Naegele_sheet4/fastqc_results":/output community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc -o /output /data/SRX19144486_SRR23195516_1.fastq.gz


# -v "...:/data": The -v flag mounts a volume from your local machine into the Docker container.
#                 Here, it's mounting the local directory to the /data directory inside the 
#                 container. 

#-v "...:/output": Another volume is being mounted, but this one maps the local directory to
#                 the /output directory inside the container.
#                 This is where FastQC will store the results of its analysis.

#community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42: 
# This is the Docker image being used. The image contains FastQC version 0.12.1 and
# comes from the community.wave.seqera.io/library repository.
# Docker will download and run this image if it’s not already on your local machine.

#fastqc -o /output /data/SRX19144486_SRR23195516_1.fastq.gz:
# This is the command being executed inside the container.

# -o /output specifies the output directory, where FastQC will place its results. 
    
# /data/SRX19144486_SRR23195516_1.fastq.gz is the input FASTQ file.

application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
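
The `docker run` call above uses absolute paths that exist only on one machine. A more portable variant can be sketched by deriving the mount paths from the current directory. The sketch assumes the notebook directory contains a `fastq` folder with the example file. The `--rm` flag (remove the container after it exits) and the `:ro` suffix (mount the input read-only) are additions that are not part of the original command. The script prints the command and only executes it when `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42"
INPUT_DIR="$PWD/fastq"             # assumption: the FASTQ file lives in ./fastq
OUTPUT_DIR="$PWD/fastqc_results"
FASTQ="SRX19144486_SRR23195516_1.fastq.gz"

# Build the command as an array so that paths containing spaces stay intact.
cmd=(docker run --rm
     -v "$INPUT_DIR:/data:ro"
     -v "$OUTPUT_DIR:/output"
     "$IMAGE"
     fastqc -o /output "/data/$FASTQ")

# Show the command with shell quoting applied.
printf '%q ' "${cmd[@]}"
echo

if [ "${DRY_RUN:-1}" = "0" ]; then
    mkdir -p "$OUTPUT_DIR"
    "${cmd[@]}"
fi
```

Keeping the command in an array avoids the quoting problems that directory names such as "Sheet 4" would otherwise cause.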

### Now that you know how to use a docker container, which approach between running everything manually and using docker was easier and which approach will be easier in the future?

It was really easy to set up Docker.<br>
Everything is pre-configured inside the container, so we can run the tool immediately.  
Also, we don't have to worry about conflicts between versions or dependencies, or about the machine on which we run the command.

### What would you say, which approach is more reproducible?

Docker is more reproducible. <br>
All necessary software, dependencies, and configurations are included in the container. This means that the software will run the same way regardless of where the container is executed.<br>
A given tool therefore behaves identically on all machines and in all environments. 
<br><br>

### Compare the file to last week's fastqc results, are they identical?


Yes, the results are identical.<br><br>

### Is the fastqc version identical?

Yes. Both the container and my manual run used bioconda::fastqc=0.12.1 (version 0.12.1).<br><br>
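
One way to check this claim directly is to ask both installations for their version. The sketch below assumes that `fastqc` is on the local PATH (for example in an activated Conda environment) and that Docker can run the image pulled earlier. If either is missing, the corresponding value falls back to "unavailable" instead of aborting.

```bash
#!/usr/bin/env bash
set -euo pipefail

IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42"

# Version reported inside the container; "unavailable" if Docker cannot run the image.
container_version=$(docker run --rm "$IMAGE" fastqc --version 2>/dev/null || echo "unavailable")

# Version of the locally installed FastQC; "unavailable" if it is not on the PATH.
local_version=$(fastqc --version 2>/dev/null || echo "unavailable")

echo "container: $container_version"
echo "local:     $local_version"

if [ "$container_version" = "$local_version" ] && [ "$local_version" != "unavailable" ]; then
    echo "versions match"
else
    echo "versions differ or could not be determined"
fi
```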

## Dockerfiles

We have now used Docker containers and images directly to boost our research. 

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker images are usually based on Linux. On a Debian-based image, packages such as "cowsay" are installed with the apt-get command

In [21]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

*RUN apt-get update && apt-get install -y curl cowsay*: <br>
- RUN: Execute build commands.<br>
- apt-get update: This updates the package manager's list of available software, ensuring that it has the most current information about which packages can be installed or updated.
- apt-get install -y curl cowsay: This installs the 2 software packages curl and cowsay. <br> The -y flag automatically confirms any prompts during installation.<br><br>

*ENV PATH="/usr/games:$PATH"*: <br>
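This instruction prepends /usr/games to the container's PATH environment variable. Debian installs the cowsay executable into /usr/games, which is not on the default PATH, so without this line the container would not find "cowsay" by name. <br><br>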
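
A Dockerfile containing the two lines explained above could look roughly like the following sketch. The base image `debian:bullseye-slim` is taken from the build log of this notebook. The file name `my_dockerfile.sketch` and the tag `cowsay_sketch` are hypothetical and chosen so that the existing "my_dockerfile" and its image are not overwritten. The real file is larger than this, so treat the sketch as a minimal reconstruction, not as its actual content.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal Dockerfile. The quoted 'EOF' keeps $PATH literal,
# so it is expanded by Docker at build time and not by this shell.
cat > my_dockerfile.sketch <<'EOF'
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y curl cowsay
ENV PATH="/usr/games:$PATH"
EOF

cat my_dockerfile.sketch

# To build and try the image (requires Docker):
#   docker build -f my_dockerfile.sketch -t cowsay_sketch .
#   docker run --rm cowsay_sketch cowsay "Hello from Cowsay"
```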

In [22]:
# build the docker image
! docker build -f my_dockerfile -t my_dockerimage  .

[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 831B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [23]:
# make sure that the image has been built
! docker images

REPOSITORY                                                                 TAG                                          IMAGE ID       CREATED         SIZE
broadinstitute/picard                                                      latest                                       93a0c41fca9d   8 days ago      661MB
my_dockerimage                                                             latest                                       1124f534ce8f   12 days ago     151MB
quay.io/nf-core/bedtools_coreutils                                         a623c13f66d5262b                             02b83cd419b0   3 months ago    154MB
quay.io/biocontainers/star                                                 2.7.11b--h43eeafb_2                          8838afaa4d11   3 months ago    92MB
community.wave.seqera.io/library/fastqc                                    0.12.1--5cfd0f3cb6760c42                     1df9a8700d59   4 months ago    908MB
quay.io/biocontainers/samtools                              

In [24]:
# run the docker image
! docker run my_dockerimage cowsay "Hello from Cowsay"

 ___________________
< Hello from Cowsay >
 -------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with Docker and create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image
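
The content of "salmon_docker" is not shown in this notebook. A minimal sketch of such a file, under stated assumptions, is given below: the base image `debian:bullseye-slim` comes from the build log, the download URL is the one given in the task, and the archive is assumed to unpack into `salmon-1.5.2_linux_x86_64/` in the image's root directory, which matches the path used to call salmon in this notebook. The file name `salmon_docker.sketch` and the tag `salmon_sketch` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal Dockerfile that downloads and unpacks the salmon release.
cat > salmon_docker.sketch <<'EOF'
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y curl ca-certificates
RUN curl -L -o salmon.tar.gz https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
    && tar -xzf salmon.tar.gz \
    && rm salmon.tar.gz
EOF

cat salmon_docker.sketch

# To build and try the image (requires Docker and network access):
#   docker build -f salmon_docker.sketch -t salmon_sketch .
#   docker run --rm salmon_sketch salmon-1.5.2_linux_x86_64/bin/salmon --version
```

The `-L` option tells curl to follow redirects, which GitHub release downloads rely on.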

In [25]:
# build the image

! docker build -f salmon_docker -t my_salmon_image  .

[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 680B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [26]:
# run the docker image to print the version of salmon
#! docker run my_salmon_image ls
#! docker run my_salmon_image ls salmon-1.5.2_linux_x86_64/bin
! docker run my_salmon_image salmon-1.5.2_linux_x86_64/bin/salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

No, bioinformaticians use pre-built Docker images available on public repositories that can be pulled and directly used with the necessary tools and dependencies included.

Find the salmon docker image online and run it on your computer.

In [27]:
! docker pull combinelab/salmon:latest

latest: Pulling from combinelab/salmon

7f213c76: Pulling fs layer
1ed9ab84: Pulling fs layer
0bdd40c3: Pulling fs layer
893c1bc1: Pulling fs layer
9485d7ab: Pull complete
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Downloaded newer image for combinelab/salmon:latest
docker.io/combinelab/salmon:latest


In [28]:
! docker run combinelab/salmon salmon --version

salmon 1.10.3



What is https://biocontainers.pro/ ?

BioContainers is a community project that facilitates the creation, management, and distribution of bioinformatics software packages <br>
and containers using Conda, Docker, and Singularity by packaging bioinformatics tools in container images.<br>
Its goals include providing specifications for developing software, offering ready-to-use containers, creating reproducible workflows, and integrating best practices in documentation and development.<br>
It also includes Docker and Conda guidelines for building containers, a registry of BioContainers images, and guidelines for contributions.<br><br>
**Source:**<br>
da Veiga Leprevost F, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Vera Alvarez R, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192.
