<a href="https://colab.research.google.com/github/AskelaAsk/infr/blob/dependencies/Depend.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1 - Dependencies management

***git branch name:*** dependencies

## Theory [2]

As usual, we will start with a few theoretical questions:

* [0.5] What is Docker, and how does it differ from dependency management systems? From virtual machines?
* [0.5] What are the advantages and disadvantages of using containers over other approaches?
* [0.5] Explain how Docker works: what are Dockerfiles, how are containers created, and how are they run and destroyed?
* [0.25] Name and describe at least one Docker competitor (i.e., a tool based on the same containerization technology).
* [0.25] What is conda? How does it differ from apt, yarn, and others?

## Problem [6.5]

The problem itself is relatively simple. 

Imagine that you developed an excellent RNA-seq analysis pipeline and want to share it with the world. Based on your experience, you are confident that the popularity of the pipeline will be proportional to its ease of use. So, you decided to help your future users and to pack all dependencies in a Conda environment and a Docker container.

Here is the list of tools and their versions that are used in your work:
* [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), v0.11.9
* [STAR](https://github.com/alexdobin/STAR), v2.7.10b
* [samtools](https://github.com/samtools/samtools), v1.16.1
* [picard](https://github.com/broadinstitute/picard), v2.27.5
* [salmon](https://github.com/COMBINE-lab/salmon), commit tag 1.9.0
* [bedtools](https://github.com/arq5x/bedtools2), v2.30.0
* [multiqc](https://github.com/ewels/MultiQC), v1.13



**Anaconda**:

* [1] Install conda, create a new virtual environment, and install all necessary packages. 
* [0.75] You won't be able to install some tools - that's fine. List their names, and explain what should be done to make them conda-friendly ([conda-forge](https://conda-forge.org/docs/maintainer/adding_pkgs.html) channel, [bioconda](https://bioconda.github.io/contributor/workflow.html) channel). 
* [0.25] [Export](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#exporting-the-environment-yml-file) the environment ([example](https://github.com/nf-core/clipseq/blob/master/environment.yml)) to a file and verify that it can be [rebuilt](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) from that file without problems.
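
The export-and-rebuild step can be sketched as follows. This is a minimal illustration, not the required solution: the file name `hw_env_minimal.yml`, the environment name `hw_env_rebuilt`, and the `RUN_CONDA` switch are hypothetical, and the hand-written file lists only the tools that the cells below install with conda. Running `conda env export --name hw_env --no-builds` should produce a longer file of the same shape.

```shell
# Write a minimal environment file with pinned versions.
# (hw_env_minimal.yml is a hypothetical name chosen for this sketch.)
cat > hw_env_minimal.yml <<'EOF'
name: hw_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.9
  - star=2.7.10b
  - samtools=1.16.1
  - bedtools=2.30.0
  - salmon=1.9.0
  - fastqc=0.11.9
EOF

# Rebuild under a different name so the original environment stays untouched,
# then run one tool from the new environment as a quick check.
rebuild_env() {
    conda env create --quiet --name "$1" --file hw_env_minimal.yml &&
        conda run --name "$1" samtools --version
}

# The rebuild is opt-in (RUN_CONDA=1) because it needs conda and network access.
if command -v conda >/dev/null 2>&1 && [ "${RUN_CONDA:-0}" = "1" ]; then
    rebuild_env hw_env_rebuilt
fi
```

If the rebuild succeeds and the tool reports its version, the file is sufficient to recreate the environment on another machine.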



In [None]:
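%%bash
# Install Miniconda. The %%bash magic must be the first line of the cell.
MINICONDA_INSTALLER_SCRIPT=Miniconda3-4.5.4-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
# -b: batch mode (no prompts), -f: install even if the prefix exists, -p: install prefix
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX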

In [None]:
%%bash
# Update conda, then add the conda-forge and bioconda channels
conda install --channel defaults conda python=3.9 --yes
conda update --channel defaults --all --yes
conda update -n base -c defaults conda --yes
# Each --add puts the channel on top of the list,
# so the resulting priority is conda-forge > bioconda > defaults
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

In [None]:
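%%bash
# Create the environment. `conda activate` is left out: it needs shell initialisation
# and does not carry over between cells, so address the environment by name (--name hw_env).
conda create --yes --name hw_env python=3.9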

In [None]:
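%%bash
# Install the pinned tools into hw_env rather than into base
conda install -y --name hw_env star=2.7.10b
conda install -y --name hw_env samtools=1.16.1
conda install -y --name hw_env bedtools=2.30.0
conda install -y --name hw_env salmon=1.9.0
conda install -y --name hw_env fastqc=0.11.9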

In [None]:
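# Install MultiQC with the environment's own pip, pinned to the required version
!conda run --name hw_env pip install multiqc==1.13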

In [None]:
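%%bash
# Build Picard 2.27.5 from source
git clone -b 2.27.5 https://github.com/broadinstitute/picard.git
cd picard
./gradlew shadowJar
# Running the JAR without arguments prints the usage text (it may exit non-zero).
# `./gradlew clean` is not run afterwards, since it would delete build/libs/picard.jar.
java -jar build/libs/picard.jar || true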

In [None]:
!conda env export --name hw_env > hw_env.yml


**Docker**:
* [3] Create a Dockerfile for a container with **all** required dependencies. Conda usage is not allowed, don't forget about comments; test that all tools are accessible and work inside the container. Hints:
 - If needed, grant rights to execute downloaded/compiled binaries using chmod (`chmod a+x BINARY_NAME`)
 - Move all executables to `$PATH` folders (e.g. `/usr/local/bin`) to make them accessible without specifying the full path.
 - Typical command to run a container interactively (`-it`) and delete it on exit (`--rm`): `docker run --rm -it name:tag`
* [1] Use [hadolint](https://hadolint.github.io/hadolint/) and remove as many reported warnings as possible.
* [0.5] Add relevant [labels](https://docs.docker.com/engine/reference/builder/#label), e.g. maintainer, version, etc. ([hint](https://medium.com/@chamilad/lets-make-your-docker-image-better-than-90-of-existing-ones-8b1e5de950d))
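
One possible way to test that all tools are accessible inside the container is a small smoke test such as the sketch below. The helper names `tool_on_path` and `check_tools` are hypothetical, and the executable names are assumptions that depend on how each tool was installed (for example, `picard` only exists if a wrapper script with that name was created).

```shell
# Return 0 when a command is found on PATH, non-zero otherwise.
tool_on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool; the return value is the number of missing tools.
check_tools() {
    missing=0
    for tool in "$@"; do
        if tool_on_path "$tool"; then
            echo "ok       $tool -> $(command -v "$tool")"
        else
            echo "MISSING  $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Executable names below are assumed, not prescribed by the assignment.
check_tools fastqc STAR samtools picard salmon bedtools multiqc ||
    echo "some tools are not on PATH"
```

Being on `PATH` does not prove that a tool works, so a stricter check would also run each tool's version or help command inside the container started with `docker run --rm -it name:tag`.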

In [None]:
# Base image with a fixed tag for reproducible builds
FROM ubuntu:18.04
# Unprivileged user for running the tools
RUN useradd --create-home --shell /bin/bash app_user
WORKDIR /home/app_user
# The installation steps below require root
USER root
# System packages needed to download, unpack, and build the tools
RUN apt-get update && \
    apt-get install -y \
        ca-certificates git make openjdk-11-jdk python3-pip unzip wget && \
    rm -rf /var/lib/apt/lists/*

# FastQC v0.11.9 (needs Java, installed above).
# Aliases written to .bashrc are not read by non-interactive shells,
# so the launcher is linked into a $PATH directory instead.
RUN wget -q https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip && \
    unzip -q fastqc_v0.11.9.zip -d /opt && \
    rm fastqc_v0.11.9.zip && \
    chmod a+x /opt/FastQC/fastqc && \
    ln -s /opt/FastQC/fastqc /usr/local/bin/fastqc


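# samtools v1.16.1: compile from the release tarball, which bundles HTSlib.
# misc/samtools.pl is only a helper script, not the samtools executable.
RUN apt-get update && \
    apt-get install -y bzip2 gcc libc6-dev libbz2-dev liblzma-dev zlib1g-dev && \
    rm -rf /var/lib/apt/lists/* && \
    wget -q https://github.com/samtools/samtools/releases/download/1.16.1/samtools-1.16.1.tar.bz2 && \
    tar -xjf samtools-1.16.1.tar.bz2 && \
    cd samtools-1.16.1 && \
    ./configure --without-curses && make && make install && \
    cd .. && rm -r samtools-1.16.1 samtools-1.16.1.tar.bz2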

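# Gradle 7.6 (optional: Picard is built with its bundled ./gradlew wrapper)
RUN wget -q https://services.gradle.org/distributions/gradle-7.6-bin.zip && \
    mkdir /opt/gradle && \
    unzip -q -d /opt/gradle gradle-7.6-bin.zip && \
    rm gradle-7.6-bin.zip
# ENV persists in the image; a variable set with `RUN export` is lost when that RUN step ends
ENV PATH=$PATH:/opt/gradle/gradle-7.6/bin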

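# Picard v2.27.5: build the JAR, keep only the JAR, and expose it via a wrapper script on $PATH
RUN git clone -b 2.27.5 https://github.com/broadinstitute/picard.git && \
    cd picard && ./gradlew shadowJar && \
    mkdir -p /opt/picard && cp build/libs/picard.jar /opt/picard/picard.jar && \
    cd .. && rm -rf picard /root/.gradle && \
    printf '#!/bin/sh\nexec java -jar /opt/picard/picard.jar "$@"\n' > /usr/local/bin/picard && \
    chmod a+x /usr/local/bin/picard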


# salmon v1.9.0: keep bin/ and lib/ together, since the binary may rely on the bundled shared libraries
RUN wget -q https://github.com/COMBINE-lab/salmon/releases/download/v1.9.0/salmon-1.9.0_linux_x86_64.tar.gz && \
    mkdir /opt/salmon && \
    tar -xzf salmon-1.9.0_linux_x86_64.tar.gz -C /opt/salmon --strip-components=1 && \
    rm salmon-1.9.0_linux_x86_64.tar.gz && \
    ln -s /opt/salmon/bin/salmon /usr/local/bin/salmon
    
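# STAR v2.7.10b: prebuilt static binary from the release archive.
# Download, unpack, and clean up in one RUN step so the archive is not kept in a layer.
RUN wget -q https://github.com/alexdobin/STAR/releases/download/2.7.10b/STAR_2.7.10b.zip && \
    unzip -q STAR_2.7.10b.zip && \
    chmod a+x STAR_2.7.10b/Linux_x86_64_static/STAR && \
    mv STAR_2.7.10b/Linux_x86_64_static/STAR /usr/local/bin/STAR && \
    rm -r STAR_2.7.10b STAR_2.7.10b.zip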

# bedtools v2.30.0: prebuilt static binary (asset name assumed from the v2.30.0 release page)
RUN wget -q -O /usr/local/bin/bedtools https://github.com/arq5x/bedtools2/releases/download/v2.30.0/bedtools.static.binary && \
    chmod a+x /usr/local/bin/bedtools


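# MultiQC v1.13. The UTF-8 locale is set because Click-based command-line tools
# can refuse to start under an ASCII locale on older Python 3 versions.
ENV LC_ALL=C.UTF-8 LANG=C.UTF-8
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir multiqc==1.13

# Drop root privileges for the running container
USER app_user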

## Extra points [1.5]

You will be awarded extra points for the following:
* [0.5] Using [multi-stage builds](https://docs.docker.com/build/building/multi-stage/) in Docker. E.g. to build STAR and copy only the executable to the final image.

* [0.75] Minimizing the size of the final Docker image. That is, removing all intermediates, unnecessary binaries/caches, etc. Don't forget to compare & report the final size before and after all the optimizations.

* [0.25] Create an extra Dockerfile that starts from [a conda base image](https://hub.docker.com/r/continuumio/anaconda3) and builds everything from your conda environment file. 

Hint: `conda env create --quiet -f environment.yml && conda clean -a` ([example](https://github.com/nf-core/clipseq/blob/master/Dockerfile))
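
For the multi-stage and image-size items above, a rough sketch could look like the script below. It writes a two-stage Dockerfile in which the first stage downloads and unpacks STAR and the final stage receives only the executable, then reports how much smaller the final image is. The file name `Dockerfile.multistage`, the image tags `hw_star:*`, the `RUN_DOCKER` switch, and the helper `size_reduction_percent` are hypothetical; the STAR URL and archive layout are taken from the Dockerfile earlier in this notebook. The first stage only stands in for a single-stage image here, so a real report should compare your actual before and after images.

```shell
# Write a two-stage Dockerfile (file name is an assumption of this sketch).
cat > Dockerfile.multistage <<'EOF'
# Stage 1: throwaway stage that holds the download tools and the archive
FROM ubuntu:18.04 AS fetch
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates unzip wget && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /tmp
RUN wget -q https://github.com/alexdobin/STAR/releases/download/2.7.10b/STAR_2.7.10b.zip && \
    unzip -q STAR_2.7.10b.zip && \
    chmod a+x STAR_2.7.10b/Linux_x86_64_static/STAR

# Stage 2: the final image receives only the executable
FROM ubuntu:18.04
COPY --from=fetch /tmp/STAR_2.7.10b/Linux_x86_64_static/STAR /usr/local/bin/STAR
CMD ["STAR", "--version"]
EOF

# Percentage by which an image shrank, given the sizes in bytes.
size_reduction_percent() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) * 100 / b }'
}

# Building is opt-in (RUN_DOCKER=1) because it needs a Docker daemon and network access.
if [ "${RUN_DOCKER:-0}" = "1" ]; then
    docker build -t hw_star:fetch --target fetch -f Dockerfile.multistage .
    docker build -t hw_star:multistage -f Dockerfile.multistage .
    before=$(docker image inspect --format '{{.Size}}' hw_star:fetch)
    after=$(docker image inspect --format '{{.Size}}' hw_star:multistage)
    echo "before: $before bytes, after: $after bytes"
    echo "reduction: $(size_reduction_percent "$before" "$after")%"
fi
```

The same pattern extends to tools that are compiled from source: the compilers and headers stay in the first stage, and only the resulting binaries are copied with `COPY --from`.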
