# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatic questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.3.3-1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.27.0-1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 1
  Paused: 0
  Stopped: 1
 Images: 10
 Server Version: 24.0.9-2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: bc20cb4497af

### What is a container?

A container is a standard unit of software that packages one or more tools together with their dependencies at specific versions, so that the tools can be run in a fixed and reproducible environment.

### Why do we use containers?

Containers ensure that the environment in which a tool operates is fixed, so that results do not change because of differing tool versions or dependency issues.

### What is a docker image?

A Docker image is a read-only template that includes certain software and from which containers running those tools can be spun up. Images are used to share environments and to ensure that every container for the same purpose is set up identically.

### Let's run our first docker image:

### Log in to Docker

In [None]:
# Do this directly on the command line (docker login prompts for credentials interactively)

### Run your first docker container

In [2]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

c7bbc9d7: Pull complete
Digest: sha256:54e66cc1dd1fcb1c3c58bd8017914dbed8701e2d8c74d9262e26bd9cc1642d31
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hu
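
The pull output above shows that the short name `hello-world` was resolved to `hello-world:latest` from `library/hello-world` on Docker Hub. As a rough sketch of how such defaults are filled in, the hypothetical helper `parse_image_reference` below (not part of Docker) splits an image reference into registry, repository, and tag. It assumes references without a digest (`@sha256:...`), which it does not handle.

```python
def parse_image_reference(ref):
    """Split an image reference into registry, repository, and tag.

    Simplified sketch: digests (name@sha256:...) are not handled.
    """
    name, tag = ref, "latest"  # a missing tag defaults to "latest"

    # A ':' only marks a tag if it appears after the last '/';
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]

    # The first path component is a registry host if it looks like one.
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        # No registry given: Docker Hub, with "library/" for official images.
        registry, repository = "docker.io", name
        if "/" not in repository:
            repository = "library/" + repository

    return {"registry": registry, "repository": repository, "tag": tag}


print(parse_image_reference("hello-world"))
print(parse_image_reference("quay.io/biocontainers/python:3.9--1"))
```

The same rules explain why `quay.io/biocontainers/python:3.9--1` is pulled from quay.io rather than from Docker Hub.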

### Find the container ID

In [None]:
!docker ps -a
# The ID of the hello-world container is 768ad48732b4

CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS                      PORTS     NAMES
768ad48732b4   hello-world                           "/hello"                 3 minutes ago   Exited (0) 3 minutes ago              nostalgic_margulis
4e7a74671a2c   quay.io/biocontainers/python:3.9--1   "/usr/local/env-exec…"   26 hours ago    Exited (127) 26 hours ago             nxf-xIQ8T3qr9Plk1haTMaFWBAz4
a013b78c311b   nfcore/devcontainer:latest            "/bin/sh -c 'echo Co…"   2 days ago      Up 33 minutes                         vigilant_jennings
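
Reading the container ID off the table by eye works for a handful of containers. As a minimal sketch of doing the same lookup programmatically, the code below asks `docker ps -a` for tab-separated output via its `--format` option and filters by image name. The function names and the `sample` text (copied from the table above) are illustrative assumptions; `list_containers` is defined but not called here, since it needs a running Docker daemon.

```python
import subprocess

# Go template understood by `docker ps --format`: ID, image, status, tab-separated
PS_FORMAT = "{{.ID}}\t{{.Image}}\t{{.Status}}"


def list_containers():
    """Return `docker ps -a` output, one tab-separated line per container."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def container_ids_for_image(ps_output, image):
    """Collect the IDs of all containers created from the given image."""
    ids = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        container_id, container_image, _status = line.split("\t", 2)
        if container_image == image:
            ids.append(container_id)
    return ids


# Sample input mirroring the listing above, so the sketch runs without Docker
sample = (
    "768ad48732b4\thello-world\tExited (0) 3 minutes ago\n"
    "4e7a74671a2c\tquay.io/biocontainers/python:3.9--1\tExited (127) 26 hours ago\n"
    "a013b78c311b\tnfcore/devcontainer:latest\tUp 33 minutes\n"
)
print(container_ids_for_image(sample, "hello-world"))
```

With Docker available, `container_ids_for_image(list_containers(), "hello-world")` would perform the same lookup on the live listing; an empty list after `docker container rm` confirms the removal.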


### Delete the container again and show proof that it is deleted

In [None]:
# The container can be deleted using the container rm command in conjunction with the container ID
!docker container rm 768ad48732b4

768ad48732b4


In [None]:
# Checking the containers again reveals that the hello world container has been removed
!docker ps -a

CONTAINER ID   IMAGE                                 COMMAND                  CREATED        STATUS                      PORTS     NAMES
4e7a74671a2c   quay.io/biocontainers/python:3.9--1   "/usr/local/env-exec…"   26 hours ago   Exited (127) 26 hours ago             nxf-xIQ8T3qr9Plk1haTMaFWBAz4
a013b78c311b   nfcore/devcontainer:latest            "/bin/sh -c 'echo Co…"   2 days ago     Up 37 minutes                         vigilant_jennings


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. I searched for a FastQC container online.
2. I went to Seqera Containers and added fastqc to the container.
3. I clicked "Get container" and copied the address.
4. I entered `docker run <address>` into the console.

### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [16]:
# docker run pulls the image automatically if it is not available locally, then starts a container from it
!docker run community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
!docker ps -a
!pwd

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED              STATUS                              PORTS     NAMES
a1dc8ae375d0   community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29   "/usr/local/bin/_ent…"   1 second ago         Exited (0) Less than a second ago             magical_murdock
16eee06a5cb1   community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29   "/usr/local/bin/_ent…"   13 seconds ago       Exited (0) 12 seconds ago                     thirsty_shtern
79ee53208ce2   community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29   "/usr/local/bin/_ent…"   About a minute ago   Exited (2) About a minute ago                 elated_haslett
d1486612c962   community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29   "/usr/local/bin/_ent…"   3 minutes ago        Exited (0) 3 minutes ago                      gallant_goldwasser
19bbef3ce134   bash                                        

In [33]:
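# run the container and save the results to a new "fastqc_results" directory
# note: the shell variable is $PWD (upper case); $pwd is unset and expands to nothing,
# which produced the error "Specified output directory ':/fastqc_results' does not exist"
# FastQC does not create the output directory itself, so create it first
!mkdir -p fastqc_results
!docker container run -v "$PWD":"$PWD" -w "$PWD" community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc test_fastq.fq -o fastqc_results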
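
A bind mount (`-v host_path:container_path`) is what makes files in the working directory visible inside the container, and `-w` sets the directory the tool starts in. As a minimal sketch, the hypothetical helper `fastqc_command` below assembles such a command as an argument list without running it, so the mount and paths can be inspected first. The directory `/tmp/day_03` is a made-up example path, and a POSIX system is assumed.

```python
import os
import shlex

IMAGE = "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"


def fastqc_command(fastq, outdir="fastqc_results", workdir=None):
    """Build a `docker container run` command that runs FastQC on one file.

    The working directory is mounted at the same path inside the container,
    so relative paths mean the same thing on the host and in the container.
    """
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "container", "run", "--rm",  # --rm removes the container afterwards
        "-v", f"{workdir}:{workdir}",          # bind mount: host path -> same path in container
        "-w", workdir,                         # start in the mounted directory
        IMAGE,
        "fastqc", fastq, "-o", outdir,
    ]


cmd = fastqc_command("test_fastq.fq", workdir="/tmp/day_03")  # hypothetical directory
print(shlex.join(cmd))
```

Before executing the command (for example with `subprocess.run(cmd, check=True)`), the output directory has to exist on the host, since FastQC reports an error when `-o` points to a missing directory.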


### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

### Which approach would you say is more reproducible?

### Compare the file to last week's FastQC results. Are they identical?
### Is the fastqc version identical?

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay).

Hints:
1. Docker images are usually based on a Linux distribution; on a Debian or Ubuntu base image, you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file
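
In a Dockerfile, `RUN` executes a command while the image is being built, and `ENV` sets an environment variable that persists in the image and in every container started from it. The sketch below holds one possible cowsay Dockerfile as a Python string. It is an illustration, not the file used in this notebook: the base image is taken from the build log in the next cell, and the `ENV` line relies on Debian installing cowsay into `/usr/games`, which is not on the default `PATH`.

```python
# One possible Dockerfile for cowsay (an illustrative sketch, not the notebook's file)
COWSAY_DOCKERFILE = """\
# Start from a small Debian base image
FROM debian:bullseye-slim

# RUN: executed at build time; refresh the package index and install cowsay
RUN apt-get update \\
    && apt-get install -y cowsay \\
    && rm -rf /var/lib/apt/lists/*

# ENV: Debian places the cowsay executable in /usr/games, so add it to PATH
ENV PATH="/usr/games:${PATH}"

# Default command when a container is started from the image
CMD ["cowsay", "hello"]
"""

print(COWSAY_DOCKERFILE)
```

Chaining the apt-get commands in a single `RUN` keeps them in one image layer, and removing the package lists afterwards keeps that layer smaller.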

In [44]:
# build the docker image
# I renamed the file to "Dockerfile" because docker build did not find it otherwise
# (alternatively, the file name can be passed with the -f option)
!docker rmi cowsay_image
!docker build -t cowsay_image .

Error response from daemon: conflict: unable to remove repository reference "cowsay_image" (must force) - container 9a55eeb26d97 is using its referenced image e969e909289a
[+] Building 0.2s (2/3)                                          docker:default
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 850B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s

In [45]:
# make sure that the image has been built
!docker images
# We can see that the cowsay image has been built

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
cowsay_image                                               latest                     1219891ed21d   30 minutes ago   152MB
<none>                                                     <none>                     e969e909289a   30 minutes ago   152MB
cowsay_container                                           latest                     743c799b185a   30 minutes ago   152MB
nfcore/devcontainer                                        latest                     3302bb600cbe   4 weeks ago      2.77GB
hello-world                                                latest                     1b44b5a3e06a   7 weeks ago      10.1kB
bash                                                       latest                     736ead5a0e94   2 months ago     15.5MB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   57ed62363d5f   11 months ago    922MB
quay.i

In [46]:
# run a container from the image that was built from the Dockerfile
!docker rm container_test_cowsay
!docker run --name container_test_cowsay 1219891ed21d

container_test_cowsay
 _______
< hello >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with Dockerfiles and create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
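
As a hedged sketch of what such a Dockerfile could look like, the Python string below installs curl, downloads the release archive from the URL above, and unpacks it. Several details are assumptions rather than facts from this notebook: the base image, the install location `/opt/salmon`, and the archive layout (a single top-level directory containing `bin/salmon`, which `--strip-components=1` removes).

```python
SALMON_URL = (
    "https://github.com/COMBINE-lab/salmon/releases/download/"
    "v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz"
)

# Illustrative Dockerfile text; /opt/salmon is an arbitrary install location
SALMON_DOCKERFILE = f"""\
FROM debian:bullseye-slim

# curl downloads the archive; ca-certificates is required for HTTPS
RUN apt-get update \\
    && apt-get install -y --no-install-recommends curl ca-certificates \\
    && rm -rf /var/lib/apt/lists/*

# -L follows the GitHub redirect; --strip-components=1 drops the archive's
# top-level directory (ASSUMPTION: bin/ sits directly below it)
RUN mkdir -p /opt/salmon \\
    && curl -fsSL {SALMON_URL} \\
       | tar -xz --strip-components=1 -C /opt/salmon

# Make the salmon executable available on PATH
ENV PATH="/opt/salmon/bin:${{PATH}}"

# Print the salmon version by default
CMD ["salmon", "--version"]
"""

print(SALMON_DOCKERFILE)
```

Saved as `salmon_docker`, the file could be built with `docker build -f salmon_docker -t salmon_image .` and checked with `docker run --rm salmon_image`, where `salmon_image` is a hypothetical tag chosen for this example.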

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image

In [None]:
# build the image


In [1]:
# run the docker image to give out the version of salmon


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

BioContainers is a community project that provides guidelines and infrastructure for distributing bioinformatics software as containers for ease of use.

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

Seqera Containers is a web service that makes it easy to build container images online: the user chooses the software, and a matching container image is generated automatically.