# miRNA Detection using miRDeep2

## 1. Installation

I followed the step-by-step tutorial: https://drmirdeep.github.io/mirdeep2_tutorial.html

In [None]:
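# clone miRDeep2 and the patch side by side
git clone https://github.com/rajewsky-lab/mirdeep2.git
git clone https://github.com/Drmirdeep/mirdeep2_patch.git
cd mirdeep2
perl install.pl
# called without arguments, this should print the usage message if the install worked
miRDeep2.pl
# the patch repository was cloned next to mirdeep2, not inside it
cd ../mirdeep2_patch
bash patchme.sh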

$PATH should contain mirdeep2/bin before mirdeep2_patch.
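
To check the order without reading $PATH by eye, a small helper like the following can be used. This is only a sketch: path_order is a name made up for this note, it is not part of miRDeep2, and it matches the two directories by substring.

```shell
# Hypothetical helper (not part of miRDeep2).
# $1 = colon-separated search path, $2 and $3 = directory name fragments.
# Prints "ok" if the first entry matching $2 comes before the first entry
# matching $3, "wrong-order" otherwise, and "missing" if either is absent.
path_order() (
    set -f          # no filename expansion while splitting on ':'
    IFS=:
    i=0; first=0; second=0
    for entry in $1; do
        i=$((i + 1))
        case $entry in
            *"$2"*) if [ "$first" -eq 0 ]; then first=$i; fi ;;
        esac
        case $entry in
            *"$3"*) if [ "$second" -eq 0 ]; then second=$i; fi ;;
        esac
    done
    if [ "$first" -eq 0 ] || [ "$second" -eq 0 ]; then
        echo missing
    elif [ "$first" -lt "$second" ]; then
        echo ok
    else
        echo wrong-order
    fi
)
```

Calling path_order "$PATH" mirdeep2/bin mirdeep2_patch prints ok when the order is as required.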

Download the miRBase reference files for version 22 (I used the newer version 22 instead of version 21 from the tutorial).

In [None]:
perl mirbase.pl 22

This downloads hairpin.fa.gz and mature.fa.gz for version 22 into the directory mirbase/22/.
To get the GFF files as well, run:

In [None]:
perl mirbase.pl 22 1

Extract the mature and hairpin sequences of the reference species, mouse (mmu), from the miRBase files downloaded above. In addition, I used mature miRNA information from rat (rno) and human (hsa), which should improve the performance according to the tutorial.

In [None]:
extract_miRNAs.pl ~/mirbase/22/mature.fa.gz mmu > mature_ref.fa
extract_miRNAs.pl ~/mirbase/22/hairpin.fa.gz mmu > hairpin_ref.fa
extract_miRNAs.pl ~/mirbase/22/mature.fa.gz rno,hsa > mature_other.fa
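
A quick sanity check on the extracted files is to count the FASTA records and to see whether any header belongs to an unexpected species. The two function names below are made up for this note, and the second one assumes that the headers written by extract_miRNAs.pl start with the species code followed by a dash (for example >mmu-), as miRBase identifiers do.

```shell
# Hypothetical helpers (not part of miRDeep2).

# Number of FASTA records (header lines starting with '>') in file $1.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Number of headers in file $1 that do NOT start with ">$2-",
# e.g. $2 = mmu. Assumes miRBase-style identifiers.
count_other_species() {
    grep '^>' "$1" | grep -vc "^>$2-" || true
}
```

For mature_ref.fa and hairpin_ref.fa, count_other_species FILE mmu would be expected to print 0 if that header assumption holds.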

In the next step, the small RNA sequencing adapter has to be clipped. According to the supplementary material, the libraries were prepared with the TruSeq Small RNA Library Prep Kit (Illumina). I then searched the Illumina website (https://support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html) for the corresponding adapter and found:

TruSeq Small RNA: TGGAATTCTCGGGTGCCAAGG

A simple check counts how many lines of a sample contain the first 8 nt of the adapter:

In [None]:
grep -c TGGAATTC SRR5144155_1.fastq
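
grep -c counts every matching line, so a header line that happens to contain the pattern would be counted as well. A slightly stricter variant looks only at the sequence lines and also reports the total number of reads. This is a sketch under the assumption that the FASTQ file uses exactly four lines per read; adapter_fraction is a made-up name.

```shell
# Hypothetical helper: print "hits/total", where total is the number of
# reads in FASTQ file $1 and hits is the number of reads whose sequence
# line contains the string $2. Assumes 4 lines per read (no wrapping).
adapter_fraction() {
    awk -v adapter="$2" '
        NR % 4 == 2 {
            total++
            if (index($0, adapter) > 0) hits++
        }
        END { printf "%d/%d\n", hits, total }
    ' "$1"
}
```

For the sample above the call would be adapter_fraction SRR5144155_1.fastq TGGAATTC.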

## 2. Data preprocessing for novel miRNA prediction

For novel miRNA prediction we need to map the reads against a reference database which has to be indexed by bowtie 1.

For this I reuse the indexes already built for the circRNA detection.
Reference Database file: mm10.fa
Index files: mm10.1.ebwt …
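
Before mapping, it can be worth confirming that the index is complete. A bowtie 1 index consists of six files per prefix, with the suffixes .1, .2, .3, .4, .rev.1 and .rev.2 followed by .ebwt (or .ebwtl for large indexes). The helper below is a sketch with a made-up name that lists whichever of these are missing.

```shell
# Hypothetical helper: print the bowtie 1 index files that are missing
# for index prefix $1 (e.g. /path/to/mm10). Prints nothing if all six
# files exist, either as .ebwt or as .ebwtl.
missing_bowtie_index_files() (
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -e "$1.$suffix.ebwt" ] && [ ! -e "$1.$suffix.ebwtl" ]; then
            echo "$1.$suffix.ebwt"
        fi
    done
)
```

The argument is the same prefix that is later passed to mapper.pl with -p.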

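The mapper.pl module does the preprocessing (adapter clipping, collapsing of identical reads, ...) and the mapping. In the command below, -e marks the input as FASTQ, -h converts it to FASTA, -i converts RNA to the DNA alphabet, -j removes reads with non-canonical letters, -k clips the given 3' adapter sequence, -l 18 discards reads shorter than 18 nt, -m collapses the reads, -p gives the bowtie index prefix, -s and -t name the output files for the processed reads and the mappings, -v prints a progress report, and -o 4 sets the number of threads.

Example command for the first sample: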

In [None]:
mapper.pl /nfs/home/students/ciora/data/mouse_brain_GSE100265/fastq/miRNA_SRP096019/SRR5144155_1.fastq \
    -e -h -i -j -k TGGAATTC -l 18 -m \
    -p /data/home/students/ciora/methods/miRDeep2/reference/mm10 \
    -s /data/home/students/ciora/methods/miRDeep2/output/mapping/SRR5144155_1_reads_collapsed.fa \
    -t /data/home/students/ciora/methods/miRDeep2/output/mapping/SRR5144155_1_reads_vs_ref.arf \
    -v -o 4

Use a script for all samples:

In [None]:
cd ~/methods/miRDeep2/output/
bash /nfs/home/students/ciora/circRNA-detection/scripts/preprocesing_miRNA_miRDeep2.sh
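
The script itself is not reproduced in these notes, so the following is not that script, only a sketch of what a per-sample loop could look like. It prints one mapper.pl command per FASTQ file with the same options as the example above, without running anything. The function name and its three arguments are made up for illustration.

```shell
# Hypothetical dry-run helper (not the script used above).
# $1 = directory with *.fastq files, $2 = bowtie index prefix,
# $3 = output directory. Prints one mapper.pl command per sample.
print_mapper_commands() (
    fastq_dir=$1; index_prefix=$2; out_dir=$3
    for fq in "$fastq_dir"/*.fastq; do
        [ -e "$fq" ] || continue          # glob matched nothing
        sample=$(basename "$fq" .fastq)
        printf 'mapper.pl %s -e -h -i -j -k TGGAATTC -l 18 -m -p %s -s %s -t %s -v -o 4\n' \
            "$fq" "$index_prefix" \
            "$out_dir/${sample}_reads_collapsed.fa" \
            "$out_dir/${sample}_reads_vs_ref.arf"
    done
)
```

Piping the output to sh would execute the commands. Paths containing spaces are not handled by this sketch.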

## 3. Identification of known and novel miRNAs

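The module miRDeep2.pl calls all modules necessary for the identification step. Its positional arguments are, in order: the collapsed reads, the genome FASTA file, the read mappings (ARF), the mature miRNAs of the reference species, the mature miRNAs of the related species, and the hairpin sequences of the reference species. The option -t mmu names the species being analyzed.

Example command for the first sample: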

In [None]:
miRDeep2.pl /nfs/home/students/ciora/methods/miRDeep2/output/mapping/SRR5144155_1_reads_collapsed.fa \
    /data/home/students/ciora/methods/miRDeep2/reference/mm10.fa \
    /data/home/students/ciora/methods/miRDeep2/output/mapping/SRR5144155_1_reads_vs_ref.arf \
    /data/home/students/ciora/methods/miRDeep2/reference/mature_ref.fa \
    /data/home/students/ciora/methods/miRDeep2/reference/mature_other.fa \
    /data/home/students/ciora/methods/miRDeep2/reference/hairpin_ref.fa \
    -t mmu 2>/data/home/students/ciora/methods/miRDeep2/output/SRR5144155_report.log
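
miRDeep2.pl runs sanity checks on its input files before the actual analysis. My understanding, which should be treated as an assumption here, is that FASTA identifiers containing whitespace are rejected by these checks. The sketch below (made-up function name) counts such headers up front so that a run does not stop on them.

```shell
# Hypothetical helper: count the FASTA header lines in file $1 that
# contain whitespace. Assumption: miRDeep2 does not accept such headers.
count_headers_with_whitespace() {
    grep '^>' "$1" | grep -c '[[:space:]]' || true
}
```

A result of 0 for the genome file and the three miRNA files means there is nothing to clean up under that assumption.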

Use a script for all samples:

In [None]:
cd ~/methods/miRDeep2/output/
bash /nfs/home/students/ciora/circRNA-detection/scripts/miRNA_identification_miRDeep2.sh 1 1