# 1 - Dependency management

## Theory [2]

We will start with a few theoretical questions:

* [0.5] What is conda? How does it differ from apt, yarn, and other package managers? Additionally, discuss the role of Mamba in the Conda ecosystem. How does Mamba improve upon Conda, and what are the potential benefits and drawbacks of using Mamba over Conda?

* [0.5] What is Docker, and how does it differ from dependency management systems? From virtual machines?

* [0.5] What are the advantages and disadvantages of using containers over other approaches?

* [0.5] Explain how Docker works: what are Dockerfiles, how are containers created, and how are they run and destroyed?




## Problem [6.25]

The problem itself is relatively simple.

Imagine that you developed an excellent RNA-seq analysis pipeline and want to share it with the world. Based on your experience, you are confident that the popularity of the pipeline will be proportional to its ease of use. So, you decided to help your future users by packaging all dependencies into a Conda environment and a Docker container.

Here is the list of tools and their versions that are used in your work:
* [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), v0.11.9
* [bcftools](https://github.com/samtools/bcftools), v1.18
* [samtools](https://github.com/samtools/samtools), v1.16.1
* [multiqc](https://github.com/ewels/MultiQC), v1.13



**Anaconda**:

* [1] Install conda (or, better, mamba), create a new virtual environment, and install all necessary packages.

*You won't be able to install some tools - that's fine. List their names, and explain what should be done to make them conda-friendly ([conda-forge](https://conda-forge.org/docs/maintainer/adding_pkgs.html) channel, [bioconda](https://bioconda.github.io/contributor/workflow.html) channel).*
* [0.5] Download the full human genome (e.g., GRCh38 or hg19) and index it using samtools. Write a simple executable script using samtools and bedtools that does the following:

    1.   Download a BAM file from the [1000 Genomes Project](https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/). Download only the BAM file!
    2.   Index the downloaded BAM file.
    3.   Convert the BAM file to a sorted BAM file.
    4.   Extract the information you consider important from the header of the BAM file and explain why it matters.
* [0.25] Export the environment to a file and verify that it can be rebuilt from that file without problems.
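
As a rough illustration of the create, export, and rebuild cycle described above, here is a minimal sketch. The environment name `rnaseq-tools`, the file name `environment.yml`, and the `CONDA` override variable are assumptions made for this example, and the solver may well reject some of the pinned versions (those are the tools to report). Setting `CONDA=mamba` should work as well, assuming your mamba version accepts the same subcommands.

```shell
#!/bin/sh
# Sketch only: names below are hypothetical placeholders, not part of the task.
set -eu

CONDA=${CONDA:-conda}            # override with CONDA=mamba if preferred
ENV_NAME="rnaseq-tools"          # hypothetical environment name
ENV_FILE="environment.yml"       # hypothetical export file name

# Tool versions from the list above, in conda's name=version form.
PACKAGES="fastqc=0.11.9 bcftools=1.18 samtools=1.16.1 multiqc=1.13"

create_env() {
    # $PACKAGES is left unquoted on purpose so each spec becomes one argument.
    "$CONDA" create --yes --name "$ENV_NAME" \
        --channel conda-forge --channel bioconda $PACKAGES
}

export_env() {
    "$CONDA" env export --name "$ENV_NAME" > "$ENV_FILE"
}

rebuild_env() {
    # Recreate under a different name so the original environment stays intact.
    "$CONDA" env create --name "${ENV_NAME}-check" --file "$ENV_FILE"
}

# Nothing is executed unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    create_env
    export_env
    rebuild_env
fi
```

Running the script with `CONDA=echo` and `--run` prints the `create` and `env create` commands without executing them, and leaves the empty output of the dry-run export in `environment.yml`.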

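The four BAM steps listed above could be laid out roughly as in the sketch below. It assumes `curl`, `samtools`, and `awk` are on `$PATH`, and that the BAM URL is passed as the first argument; the function names `process_bam` and `summarize_header` are invented for this example. Counting header record types is only one possible starting point for step 4, and deciding which fields matter is still up to you.

```shell
#!/bin/sh
# Sketch only: the BAM URL is supplied by the caller, nothing is hard-coded.
set -eu

summarize_header() {
    # Count header records by type (@HD, @SQ, @RG, @PG, @CO) read from stdin.
    awk -F '\t' '/^@/ { n[$1]++ } END { for (t in n) print t, n[t] }' | sort
}

process_bam() {
    bam_url=$1                                   # URL of a single .bam file
    bam_file=$(basename "$bam_url")
    sorted_bam="${bam_file%.bam}.sorted.bam"

    curl -fL -o "$bam_file" "$bam_url"           # 1. download only the BAM
    samtools index "$bam_file"                   # 2. index the downloaded BAM
    samtools sort -o "$sorted_bam" "$bam_file"   # 3. write a sorted copy
    samtools view -H "$bam_file" | summarize_header   # 4. inspect the header
}

# Run the steps only when a URL is given on the command line.
if [ "$#" -ge 1 ]; then
    process_bam "$1"
fi
```

Remember to make the script executable (`chmod a+x`) before submitting it.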

**Docker**:
* [3.5] Create a Dockerfile for a container with **all** required dependencies. Don't forget about comments; test that all tools are accessible and work inside the container. Repeat steps 1-4 from the Anaconda section and write a script for them. Hints:
 - You are not allowed to use conda or conda image here.
 - If needed, grant rights to execute downloaded/compiled binaries using chmod (`chmod a+x BINARY_NAME`)
 - Move all executables to a directory on `$PATH` (e.g. `/usr/local/bin`) to make them accessible without specifying the full path.
 - Typical command to run a container interactively and delete it on exit: `docker run --rm -it IMAGE_NAME`
* [1] Use [hadolint](https://hadolint.github.io/hadolint/) and remove as many reported warnings as possible.
* [0.5] Add relevant [labels](https://docs.docker.com/engine/reference/builder/#label), e.g. maintainer, version, etc. ([hint](https://medium.com/@chamilad/lets-make-your-docker-image-better-than-90-of-existing-ones-8b1e5de950d))
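
One way to tie the lint, build, and test steps together is a small helper script like the sketch below. The image tag `rnaseq-tools:0.1` and the `DOCKER` override variable are assumptions for this example, and the smoke test assumes the image defines no `ENTRYPOINT`, so the tool name can be passed directly as the container command.

```shell
#!/bin/sh
# Sketch only: the image tag is a hypothetical placeholder.
set -eu

DOCKER=${DOCKER:-docker}         # override with DOCKER=echo for a dry run
IMAGE="rnaseq-tools:0.1"         # hypothetical image name and tag
TOOLS="fastqc bcftools samtools multiqc"

lint_dockerfile() {
    # Reports Dockerfile issues; a non-zero exit status stops the script.
    hadolint Dockerfile
}

build_image() {
    "$DOCKER" build --tag "$IMAGE" .
}

smoke_test() {
    # Each tool should print its version and exit with status 0.
    for tool in $TOOLS; do
        "$DOCKER" run --rm "$IMAGE" "$tool" --version
    done
}

# Nothing is executed unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    lint_dockerfile
    build_image
    smoke_test
fi
```

A passing smoke test only shows that the binaries start; running the BAM script inside the container is still needed to confirm that the tools actually work.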

## Extra points [1.75]

You will be awarded extra points for the following:
* [1.25] Minimizing the size of the final Docker image. That is, removing all intermediates, unnecessary binaries/caches, etc. Don't forget to compare & report the final size before and after all the optimizations.

* [0.4] Create an extra Dockerfile that starts from [a conda base image](https://hub.docker.com/r/continuumio/anaconda3) and builds everything from your conda environment file.

Hint: `conda env create --quiet -f environment.yml && conda clean -a` ([example](https://github.com/nf-core/clipseq/blob/master/Dockerfile))

* [0.1] Share a meme about Docker or your impression of this assignment
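
For the before/after size report requested in the first extra-points item, a comparison along these lines may be enough. The sketch assumes both the unoptimized and the optimized image are available locally and that their names are passed as the two arguments; the function names are invented for this example.

```shell
#!/bin/sh
# Sketch only: image names are supplied by the caller.
set -eu

image_size_bytes() {
    # Prints the image size in bytes as reported by the Docker engine.
    docker image inspect --format '{{.Size}}' "$1"
}

reduction_percent() {
    # Integer percentage saved when going from $1 bytes to $2 bytes.
    before=$1
    after=$2
    echo $(( (before - after) * 100 / before ))
}

# Usage: ./compare_sizes.sh IMAGE_BEFORE IMAGE_AFTER
if [ "$#" -eq 2 ]; then
    size_before=$(image_size_bytes "$1")
    size_after=$(image_size_bytes "$2")
    echo "before: $size_before bytes, after: $size_after bytes," \
         "saved: $(reduction_percent "$size_before" "$size_after")%"
fi
```

The percentage uses integer arithmetic, so it is rounded toward zero.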
