# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatic questions, we will now focus on the processes that lie behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [None]:
!docker info

### What is a container?

A Docker container is a standardized, encapsulated environment that runs applications. It contains not only the code but all dependencies.

### Why do we use containers?
Docker allows you to standardize the development and release cycle using a consistent and isolated environment. Because containers do not influence one another and each one has all required parts isolated, you can even run multiple containers in parallel.
Containers also use less memory than full virtual machines, and since most platforms support Docker, it is easy to switch environments and to ship and run applications.\
Simply put, we remove all outside influences.

### What is a docker image?
A container image is a standardized package that includes all of the files, binaries, libraries, and configurations needed to run a container. One image can be used to run one or even multiple instances of the application at once, as containers.

### Let's run our first docker image:

### Log in to Docker

In [None]:
# This needs to be done directly on the command line, because docker login asks for a username and password interactively

### Run your first docker container

In [None]:
!docker run hello-world

### Find the container ID

In [None]:
!docker container ls -a
# container ID: ef26577064b3

### Delete the container again and prove that it has been deleted

In [None]:
!docker container rm ef26577064b3
!docker container ls -a
# container ID no longer visible in the list
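Copying container IDs by hand works, but the lookup and deletion can also be scripted. A minimal sketch, assuming a POSIX shell with the Docker CLI on the PATH; `container_ids_for_image` and `remove_containers_for_image` are hypothetical helper names of our own, not Docker commands. The lookup relies on `docker container ls` with `--all`, `--quiet` and an `ancestor=` filter, which lists the containers created from a given image.

```shell
# Sketch: look up and delete containers by image instead of copying IDs by hand.
# container_ids_for_image and remove_containers_for_image are our own (hypothetical) helper names.

# Print the IDs of all containers (running or exited) created from an image.
container_ids_for_image() {
  docker container ls --all --quiet --filter "ancestor=$1"
}

# Remove every container created from an image, or report that there is none.
remove_containers_for_image() {
  local ids
  ids=$(container_ids_for_image "$1")
  if [ -n "$ids" ]; then
    # $ids is deliberately unquoted so that each ID becomes its own argument
    docker container rm $ids
  else
    echo "no containers found for image $1"
  fi
}
```

Usage: `remove_containers_for_image hello-world`. This only works for stopped containers, which is fine for hello-world since it exits immediately. Alternatively, `docker run --rm hello-world` removes the container automatically as soon as it exits, so there is nothing to clean up afterwards.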

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Below, please describe the steps you took to download FastQC and run it on last week's example FASTQ file:

In [None]:
# Install FastQC, then run it in bash on the FASTQ file from Tuesday
# (FastQC does not create the output directory itself, so create it first):
!mkdir -p ./fastqc
!fastqc SRX19144486_SRR23195516_1.fastq.gz -o ./fastqc/

### Very well, now let's try to make use of its docker container

1. create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. use the container to generate a FastQC HTML report for the example FASTQ file

In [None]:
# pull the container
!docker image pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

In [None]:
# run the container and save the results to the "notebooks/day_03_part2/fastqc" directory (it must already exist)
!docker run -v /mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025:/mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025 -w /mnt/c/Users/katja/Documents/GitHub/computational-workflows-2025 community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc notebooks/day_02/output_ngs/fastq/SRX19144486_SRR23195516_1.fastq.gz -o notebooks/day_03_part2/fastqc
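The command above hard-codes one user's Windows path. A minimal sketch of a more portable variant, assuming a POSIX shell and the same Seqera image tag as pulled above: it mounts the current directory (`$PWD`) at the same path inside the container, creates the output directory first, and adds `--rm` so no stopped container is left behind. `fastqc_in_docker` is a hypothetical helper name of our own. Input and output paths must lie below the current directory, because nothing else is mounted.

```shell
# Sketch: run FastQC from the Seqera container on files below the current directory.
# Assumption: the image tag is the one pulled above; fastqc_in_docker is our own helper name.
FASTQC_IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

# Usage: fastqc_in_docker OUTDIR FILE...
fastqc_in_docker() {
  local outdir="$1"
  shift
  # FastQC refuses to write into a directory that does not exist, so create it first
  mkdir -p "$outdir"
  # --rm removes the container after it exits; mounting $PWD at the same path and
  # using it as the working directory keeps relative input/output paths valid inside
  docker run --rm -v "$PWD":"$PWD" -w "$PWD" \
    "$FASTQC_IMAGE" fastqc "$@" -o "$outdir"
}
```

Run from the repository root, the call equivalent to the cell above would be `fastqc_in_docker notebooks/day_03_part2/fastqc notebooks/day_02/output_ngs/fastq/SRX19144486_SRR23195516_1.fastq.gz`.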


### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

In my opinion, bash scripts were still easier to use. However, they require more knowledge about the individual tools, which in turn requires more time. If time is limited, pipelines and containers are the easier solution.

### Which approach would you say is more reproducible?
Docker containers feel more isolated, though less flexible, than pipelines.

### Compare the files to last week's FastQC results. Are they identical?
### Is the FastQC version identical?
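One way to answer both questions, sketched under assumptions: `unzip` is installed, and the comparison uses the `fastqc_data.txt` inside each `*_fastqc.zip` rather than the HTML report, which we expect to differ anyway because it records when the report was generated. The first line of `fastqc_data.txt` holds the FastQC version. `fastqc_data` and `compare_fastqc` are hypothetical helper names of our own.

```shell
# Print the fastqc_data.txt stored inside a FastQC zip archive.
fastqc_data() {
  # -p extracts to stdout; '*' also matches the archive's top-level directory
  unzip -p "$1" '*/fastqc_data.txt'
}

# Usage: compare_fastqc RUN1_fastqc.zip RUN2_fastqc.zip
compare_fastqc() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  fastqc_data "$1" > "$a"
  fastqc_data "$2" > "$b"
  # first line of fastqc_data.txt: "##FastQC<TAB><version>"
  echo "run 1: $(head -n 1 "$a")"
  echo "run 2: $(head -n 1 "$b")"
  # diff prints any differing lines; no output plus the message means identical data
  if diff "$a" "$b"; then
    echo "fastqc_data.txt identical"
  fi
  rm -f "$a" "$b"
}
```

For the two runs above this would be something like `compare_fastqc fastqc/SRX19144486_SRR23195516_1_fastqc.zip notebooks/day_03_part2/fastqc/SRX19144486_SRR23195516_1_fastqc.zip`, with the paths adjusted to wherever each run wrote its output.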

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker images are usually based on a Linux distribution such as Debian or Ubuntu, so you need to know the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

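RUN executes a command, such as installing packages, while the image is being built; its result is stored as a new layer of the image.\
ENV sets an environment variable that is available in the following build steps and in every container started from the image.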
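A possible `my_dockerfile` for the cowsay exercise, written out with a shell heredoc so the whole file is visible in one place. This is a sketch under our own assumptions: an `ubuntu:22.04` base image, and curl installed alongside cowsay because the image is tagged `cowsaycurl` below. The RUN line installs the packages at build time; the ENV line adds `/usr/games`, where the Debian/Ubuntu cowsay package puts its binary, to the `PATH`. Running the block overwrites any existing `my_dockerfile` in the current directory.

```shell
# Write a sketch of my_dockerfile (overwrites an existing file of that name).
# The quoted 'EOF' stops the shell from expanding ${PATH} while writing the file.
cat > my_dockerfile <<'EOF'
# base image (assumption: Ubuntu 22.04)
FROM ubuntu:22.04

# RUN: install cowsay and curl at build time, then drop the apt cache to keep the image small
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends cowsay curl \
    && rm -rf /var/lib/apt/lists/*

# ENV: cowsay lives in /usr/games, which is not on the image's default PATH
ENV PATH="/usr/games:${PATH}"

# default command when the image is run without arguments
CMD ["cowsay", "Hello from inside a container!"]
EOF
```

With this file, the build and run cells below should print a cow saying the CMD message; passing a command after the image name, as in `docker run cowsaycurl:csc cowsay moo`, overrides the default CMD.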

In [None]:
# build the docker image
!docker build --file ./my_dockerfile -t cowsaycurl:csc .

In [None]:
# make sure that the image has been built
!docker image ls

In [None]:
# run a container from the image
!docker run cowsaycurl:csc


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
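A possible `salmon_docker` along the same lines, again written with a heredoc. The assumptions are ours, not the exercise's: `ubuntu:22.04` as base, `/opt/salmon` as install location, and a release tarball with a single top-level directory, which `--strip-components=1` removes so that the binary should end up at `/opt/salmon/bin/salmon`. The release is an x86_64 build, so on an ARM machine you may need `docker build --platform linux/amd64 ...`. Running the block overwrites any existing `salmon_docker`.

```shell
# Write a sketch of salmon_docker (overwrites an existing file of that name).
cat > salmon_docker <<'EOF'
# base image (assumption: Ubuntu 22.04)
FROM ubuntu:22.04

# curl downloads the release; ca-certificates lets curl verify GitHub's HTTPS certificate
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# -L follows GitHub's redirect; the tarball is unpacked straight from the pipe into /opt/salmon
RUN mkdir -p /opt/salmon \
    && curl -fsSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
       | tar -xzf - -C /opt/salmon --strip-components=1

# make the salmon binary available in every container started from this image
ENV PATH="/opt/salmon/bin:${PATH}"

# default command: print the salmon version
CMD ["salmon", "--version"]
EOF
```

With this file, `docker run --rm salmon:salmon` should print the salmon version, since that is the default command.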

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -f ./salmon_docker -t salmon:salmon .

In [None]:
# build the image


In [None]:
# run the docker image to give out the version of salmon


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?