# <span style="color:green">Formation South Green 2022</span> - Structural Variants Detection by using short and long reads 

# __DAY 1: How to map reads against a reference genome?__ 

Created by C. Tranchant (DIADE-IRD), J. Orjuela (DIADE-IRD), F. Sabot (DIADE-IRD) and A. Dereeper (PHIM-IRD)


***

# <span style="color: #006E7F">Table of contents</span>
<a class="anchor" id="home"></a>

[I - Preparing data](#data)

* [Download sequencing data (SR & LR) for Simulated clones](#download)

[II - Mapping Practice](#mapping) 
  
[2.1. Mapping short reads vs a reference with  `bwa mem`](#bwamem)

   * [ Reference indexation](#refindex)
   * [Run the mapping with `bwa mem`](#bwamem2-cmd)
   * [Calculate stats from mapping `samtools flagstat`](#flagstats)
   * [Convert sam into bam `samtools view`](#samtoolsview)
   * [Generate a bam file that contains only the reads correctly paired mapped `samtools view`](#corrmap)
   * [Indexing bam file](#indexbam) 
   * [EXERCISE: MAP ALL SR WITH BWA-MEM2](#mapallmem)

[2.2. Mapping long reads vs a reference with  `minimap2`](#minimap2)
   * [EXERCISE: MAP ALL LR WITH MINIMAP2](#mapallminimap)

[III - Centralize final mapping data into a single bam directory](#reorder)


***


# <span style="color:#006E7F">__I - Preparing data__ <a class="anchor" id="data"></span>  

### <span style="color: #4CACBC;"> Go to the working directory</span>  


In [8]:
# go to the working directory and list its content
cd /home/jovyan/work/
ls

myFirstJupyterBook.ipynb  SV_DATA  training_SV_teaching


### <span style="color: #4CACBC;"> Download sequencing data (SR & LR) for Simulated clones <a class="anchor" id="download"> </span>  

Before starting, please download the data created specially for this practical training. The data are available from the I-Trop server.

Each participant will analyse one Clone, and the results will be collected in this shared file.

To generate Clone data, a 1Mb contig was extracted from chromosome 1 of rice.

20 levels of variation were generated and long reads were simulated for each.

We have introduced different variations (SNP, indel, indel+translocations) and also some contaminations.

In [None]:
# download available compressed DATA 
wget --no-check-certificate -rm -nH --cut-dirs=1 --reject="index.html*" https://itrop.ird.fr/sv-training/SV_DATA.tar.gz
# decompress data
tar zxvf SV_DATA.tar.gz
rm SV_DATA.tar.gz

--2022-06-20 14:45:22--  https://itrop.ird.fr/sv-training/SV_DATA.tar.gz
Resolving itrop.ird.fr (itrop.ird.fr)... 91.203.35.184
Connecting to itrop.ird.fr (itrop.ird.fr)|91.203.35.184|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4134064863 (3.8G) [application/x-gzip]
Saving to: ‘SV_DATA.tar.gz’


2022-06-20 14:46:38 (52.0 MB/s) - ‘SV_DATA.tar.gz’ saved [4134064863/4134064863]

FINISHED --2022-06-20 14:46:38--
Total wall clock time: 1m 16s
Downloaded: 1 files, 3.8G in 1m 16s (52.0 MB/s)
SV_DATA/
SV_DATA/REF/
SV_DATA/REF/reference.fasta
SV_DATA/LONG_READS/
SV_DATA/LONG_READS/Clone10.fastq.gz
SV_DATA/LONG_READS/Clone11.fastq.gz
SV_DATA/LONG_READS/Clone12.fastq.gz
SV_DATA/LONG_READS/Clone13.fastq.gz
SV_DATA/LONG_READS/Clone14.fastq.gz
SV_DATA/LONG_READS/Clone15.fastq.gz
SV_DATA/LONG_READS/Clone16.fastq.gz
SV_DATA/LONG_READS/Clone17.fastq.gz
SV_DATA/LONG_READS/Clone18.fastq.gz
SV_DATA/LONG_READS/Clone19.fastq.gz
SV_DATA/LONG_READS/Clone1.fastq.gz
SV_DATA/LONG_RE
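The wget log above reports the archive length as 4134064863 bytes. As a rough sketch, the size on disk can be compared with that value before extracting, to catch an interrupted download. `check_size` is a hypothetical helper, not part of the training material, and the check has to run before the `rm SV_DATA.tar.gz` step.

```shell
# Hypothetical helper: compare a file's size in bytes with an expected value.
check_size() {
    local file=$1 expected=$2 actual
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    actual=$(wc -c < "$file" | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $file ($actual bytes)"
    else
        echo "SIZE MISMATCH: $file has $actual bytes, expected $expected"
        return 1
    fi
}

# Expected size taken from the "Length:" line of the wget log
check_size SV_DATA.tar.gz 4134064863 || true
```

A size match does not prove that the content is intact, it only rules out a truncated file.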

### <span style="color: #4CACBC;"> List the content of the directory work and check that the directory SV_DATA has been created</span>  


In [None]:
# check data 
ls -l

### <span style="color: #4CACBC;"> List the content of the directory SV_DATA</span>  

In [None]:
ls -lRt SV_DATA

# <span style="color:#006E7F">__II -  MAPPING PRACTICE__ <a class="anchor" id="mapping"></span>  

Read congruency is an important measure in determining assembly accuracy.

Clusters of read pairs or single long reads that align incorrectly are strong indicators of mis-assembly.

Read mapping is usually the first step before SNP or variant calling.

### <span style="color: #4CACBC;"> Make a folder for your results</span>  

In [22]:
mkdir -p ~/work/MAPPING-ILL
cd ~/work/MAPPING-ILL

### <span style="color: #4CACBC;"> Declare important variables</span>  

We are going to set up bash variables holding the paths to our data. We set a bash variable like this: `var="value"`
and call it like this: `echo $var`
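One detail matters when a variable is built from another one: bash expands `${i}` at the moment of the assignment, so the value of `i` has to be assigned before any path that uses it. The small sketch below illustrates this; the `_DEMO` variable names are made up for the illustration.

```shell
# Demo variable names (suffix _DEMO) are illustrative, not part of the course setup.
i=""
BEFORE_DEMO="Clone${i}_R1.fastq.gz"   # i is still empty here
i=10
AFTER_DEMO="Clone${i}_R1.fastq.gz"    # i is expanded to 10

echo "defined before i: $BEFORE_DEMO"
echo "defined after i:  $AFTER_DEMO"
```

Changing `i` afterwards does not update the paths already assigned: they have to be declared again, which is why the loops later in this notebook redefine them inside the loop body.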


In [23]:
#CLONE NUMBER THAT YOU ARE GOING TO ANALYZE 
# (set it first: the paths below expand ${i} when they are assigned)
i=10 

# REFERENCE 
REF_DIR="/home/jovyan/work/SV_DATA/REF/"
REF="/home/jovyan/work/SV_DATA/REF/reference.fasta"

# ONT DATA
ONT="/home/jovyan/work/SV_DATA/LONG_READS/Clone${i}.fastq.gz"

# ILLUMINA DATA
ILL_R1="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R1.fastq.gz"
ILL_R2="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R2.fastq.gz"

##### Print the variables i, REF, ILL_R1 & ILL_R2

In [24]:
echo "Clone${i} $REF" 
echo $ILL_R1 $ILL_R2

Clone10 /home/jovyan/work/SV_DATA/REF/reference.fasta
/home/jovyan/work/SV_DATA/SHORT_READS/Clone10_R1.fastq.gz /home/jovyan/work/SV_DATA/SHORT_READS/Clone10_R2.fastq.gz


-------------
# <span style="color: #4CACBC;"> 2.1. Mapping short reads vs a reference with  `bwa mem` <a class="anchor" id="bwamem"></span>  

In this practice, we are going to map short reads against a reference. To find out how well the reads align back to the reference, we map them with bwa-mem2 and use samtools to compute basic alignment statistics.

In this exercise, we will use the reference.fasta assembly as well as the ILLUMINA READS from your favorite CLONE.

The tool bwa (run here as bwa-mem2, a faster implementation of the bwa mem algorithm) needs 2 steps: 
- **Reference indexing**: `bwa index reference`
- **Mapping in itself**: `bwa mem  -R READGROUP [options] reference fastq1 fastq2 > out.sam`
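The `-R READGROUP` option expects a complete `@RG` header line in which the fields are separated by the two characters `\t`. The sketch below builds such a string and prints the resulting command without running it. The `ID`, `SM` and `PL` values and the file names are illustrative assumptions, not values imposed by the training.

```shell
i=10
# bash keeps \t as two characters inside double quotes; bwa converts it to a TAB
RG="@RG\tID:Clone${i}\tSM:Clone${i}\tPL:ILLUMINA"

# Print the command that would be run (dry run, nothing is executed)
printf '%s\n' "bwa-mem2 mem -R '${RG}' -t 8 reference.fasta Clone${i}_R1.fastq.gz Clone${i}_R2.fastq.gz > Clone${i}.sam"
```

A read group is not required for the mapping itself, but several downstream variant callers expect the sample name it carries.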

## <span style="color: #4CACBC;"> Reference indexation  <a class="anchor" id="refindex"></span>  

Before mapping, we need to index the reference file! Check the bwa-mem2 index command line.

In [12]:
cd $REF_DIR

In [13]:
echo -e "\nIndexing reference $REF\n"
bwa-mem2 index $REF


Indexing reference /home/jovyan/work/SV_DATA/REF/reference.fasta

[bwa_index] Pack FASTA... 0.01 sec
init ticks = 162934945
ref seq len = 2040002
binary seq ticks = 61204851
build index ticks = 604754568
ref_seq_len = 2040002
count = 0, 576483, 1020001, 1463519, 2040002
BWT[1932441] = 4
CP_SHIFT = 5, CP_MASK = 31
sizeof CP_OCC = 64
max_occ_ind = 63750
ref_seq_len = 2040002
count = 0, 576483, 1020001, 1463519, 2040002
BWT[1932441] = 4
CP_SHIFT = 6, CP_MASK = 63
sizeof CP_OCC = 64
max_occ_ind = 31875


### <span style="color: #4CACBC;">Check that the indexes have been created </span>  

In [14]:
ls

reference.fasta       reference.fasta.ann          reference.fasta.pac
reference.fasta.0123  reference.fasta.bwt.2bit.64
reference.fasta.amb   reference.fasta.bwt.8bit.32
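Instead of reading the listing by eye, the presence of the index files can be tested in a small function. This is only a sketch: `check_bwa_mem2_index` is a hypothetical helper, and its suffix list is copied from the `ls` output above. The `.bwt.8bit.32` file is left out on the assumption that not every bwa-mem2 version writes it.

```shell
# Hypothetical helper: report which index files are missing next to a FASTA.
# The suffix list comes from the listing above and may differ between versions.
check_bwa_mem2_index() {
    local fasta=$1 missing=0 ext
    for ext in 0123 amb ann bwt.2bit.64 pac; do
        if [ ! -s "${fasta}.${ext}" ]; then
            echo "missing index file: ${fasta}.${ext}"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "index complete for ${fasta}"
    fi
    return "$missing"
}

check_bwa_mem2_index "${REF:-reference.fasta}" || echo "run: bwa-mem2 index ${REF:-reference.fasta}"
```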


## <span style="color: #4CACBC;"> => Now let's map, using the READS FROM ONE CLONE ONLY </span>  

* Go into the directory MAPPING-ILL
* Create a subdirectory to save the files generated by the mapping step. 
Eg: If you are going to analyze the `clone1`, create the subdirectory `dirClone1`. 

In [25]:
cd ~/work/MAPPING-ILL
echo -e "\n>>>>>>>>>> Creation directory for Clone$i\n"
mkdir -p dirClone$i
cd dirClone$i


>>>>>>>>>> Creation directory for Clone10



## <span style="color: #4CACBC;"> Run the mapping with `bwa mem` <a class="anchor" id="bwamem2-cmd"></span>  

In [26]:
echo -e "\n>>>>>>>>>> Mapping Clone$i\n"
bwa-mem2 mem -M -t 8 $REF $ILL_R1 $ILL_R2 > Clone$i.sam


>>>>>>>>>> Mapping Clone10

-----------------------------
Executing in AVX2 mode!!
-----------------------------
Ref file: /home/jovyan/work/SV_DATA/REF/reference.fasta
Entering FMI_search
reference seq len = 2040003
count
0,	1
1,	576484
2,	1020002
3,	1463520
4,	2040003

Reading other elements of the index from files /home/jovyan/work/SV_DATA/REF/reference.fasta
prefix: /home/jovyan/work/SV_DATA/REF/reference.fasta
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = /home/jovyan/work/SV_DATA/REF/reference.fasta.0123
Reference genome size: 2040002 bp
Done readng reference genome !!

[0000] 1: Calling process()

Threads used (compute): 8
Info: projected #read in a task: 529811
------------------------------------------
Memory pre-allocation for chaining: 1114.7223 MB
Memory pre-allocation for BSW: 1916.9362 MB
Memory pre-allocation for BWT: 618.5134 MB
------------------------------------------
No. of pipeline threads: 2
[0000] read

### <span style="color: #4CACBC;">Check that the file `.sam` has been created by `bwa mem` </span>  


In [28]:
ls -l

total 118940
-rw-r--r-- 1 jovyan users 121793468 Jun 20 14:56 Clone10.sam


### <span style="color: #4CACBC;">Display the beginning and the end of the sam file just created </span>  

In [29]:
head Clone10.sam

@SQ	SN:Reference	LN:1020001
@PG	ID:bwa	PN:bwa	VN:2.0pre2	CL:bwa-mem2 mem -M -t 8 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/SHORT_READS/Clone10_R1.fastq.gz /home/jovyan/work/SV_DATA/SHORT_READS/Clone10_R2.fastq.gz
Reference-Clone10295760	77	*	0	0	*	*	0	0	GTATAAGTACCCGGTCGAATCAAAGGTAACGTTAAATAGGTACTCCGCCAGGGCAGATTTCAACAGCCAAACTGCCCCCCAGGGGTATCTTACAGGCAATGGCTTAGAAGCGTTCCTAAGTGGACGACTCTCTGGAAACTCGCCAATGAG	CC=G=GGGGGGG=IIIIGGIIGICGIIIICIGIIIIIGGCIIGIGIG=CIGIIIICIGICGGGGGCCCGGCGCGCGGGGG8CGG8CGGGGGCCCGGGGGGIGCGGGG=GCGGCCGCGG55GGCGCG8GGGGCGG=CCGCGG5GGCGCGC=	AS:i:0	XS:i:0
Reference-Clone10295760	141	*	0	0	*	*	0	0	GCACCCAAGGTGATCAACCCGGCGCTGCATGAGTATGCAACATGTTCGGCAGATGCCGTCAGTTTGGCATGCGTAATTCAATGTCGCAAGGAGGATATCCCGCTGGGATTACATTCGCGTATAGTTTATGGGCCTTCATTCGTTTTTACG	CC=GGGGGGGGGGIGIIIIIICIGICI5IGCGIGGGICIGIIIICCIIG=IICGC=G==GGCGGGIGI=CCGGCGGCIGG=CGCG5CG=CGCGGG5GCGGGCCCII=GGGGGCG==GGGGGGGCCGCGGCCCGGGGGGGCCCCGGCGCGC	AS:i:0	XS:i:0
Reference-Clone10295758	99	Reference	12451

In [30]:
tail Clone10.sam

Reference-Clone1010	353	Reference	571519	60	83H67M	=	571653	284	ATTACCTAATGCATACATAGTTCTACAAACATCTTAGTTCAGATCAGATGCATCATCACATTGTTAC	GGCCGGGGGCCGGGG=GIC5CGC5GGGG5GCGGG8GGGGGGGGGG5GCGGG5GG5GC5CCGG=GGCC	NM:i:0	MD:Z:67	AS:i:67	XS:i:0	SA:Z:Reference,569030,+,82M68S,60,3;
Reference-Clone1010	145	Reference	571653	60	150M	=	569030	-2773	TCAGAAGCAGATCAACAACTGGTTCATCAACCAGAGGAAACGGCACTGGAAGCCATCGGAGGACATGCCGTTCGTCATGATGGAAGGTTTTCACCCACAGAATGCTGCTGCATTGTACATGGATGGCCCGTTCATGGCAGATGGAATGTA	CGGCCGCG8CGCGGCGG8CCCCGGGCGCGGCGG=GGGCGGCGG=ICI=GGGGGCGGCCC8G==GCG=GGGGCCGGGGGCCGG5CICCCG=IIGGIGICCIGCIIG=GCGIGGIIIIIGGIGICGIIGGI8IIIIIGIGGGGGGG=GGCCC	NM:i:0	MD:Z:150	AS:i:150	XS:i:0
Reference-Clone108	99	Reference	263937	60	150M	=	264214	427	TTTAGTTGATGAACACAAATAATAATTGATTAAAGGGAACTTTCCATTCGGTCGTTTCCTGTCTCCTTCTTTGGGTACTACTATCATTTTCTTTTTCTGAAATTCCTTTTGCTGTATATCATTTCAGCATGCAATACTTAATCTGACAAA	CCCGGGGGGCGGGIIGCICIGIIGCGCCIIGIIIGI5IGIIIGIGI=GGCGGGIIGIIGIGIGIGIGGGGG5CGGGCIGG=IGGGGCGGGGGGCGC5=GGIGCGGCCGC=CG8C5GGCCGGGCG
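The sixth column of each alignment line is the CIGAR string, for example `150M` or `83H67M` in the records above. As a reading aid, the sketch below computes how many reference bases an alignment covers. `cigar_ref_len` is a hypothetical helper written for this illustration, based on the SAM convention that M, D, N, = and X consume reference bases while I, S, H and P do not.

```shell
# Hypothetical helper: reference span of an alignment from its CIGAR string.
# Operations M, D, N, = and X consume reference bases; I, S, H and P do not.
cigar_ref_len() {
    local cigar=$1 total=0
    local re='^([0-9]+)([MIDNSHP=X])(.*)$'
    while [[ $cigar =~ $re ]]; do
        case ${BASH_REMATCH[2]} in
            M|D|N|"="|X) total=$(( total + ${BASH_REMATCH[1]} )) ;;
        esac
        cigar=${BASH_REMATCH[3]}
    done
    echo "$total"
}

cigar_ref_len 150M      # full-length match
cigar_ref_len 83H67M    # hard-clipped record from the tail output
```

Added to the mapping position (column 4), this span gives the end coordinate of the alignment on the reference.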

## <span style="color: #4CACBC;"> Convert sam into bam `samtools view` <a class="anchor" id="samtoolsview"></span>  


In [31]:
samtools view -@4 -bh -S -o  Clone$i.bam Clone$i.sam 

[samopen] SAM header is present: 1 sequences.


#### Check that the bam file has been created 

* Have a look at the filesize of the sam and bam files.
* Remove the sam file 

In [32]:
ls -lh
rm Clone$i.sam

total 155M
-rw-r--r-- 1 jovyan users  39M Jun 20 14:56 Clone10.bam
-rw-r--r-- 1 jovyan users 117M Jun 20 14:56 Clone10.sam


## <span style="color: #4CACBC;"> Calculate stats from mapping `samtools flagstat`<a class="anchor" id="flagstats"></span>   

In [33]:
samtools flagstat Clone$i.bam >Clone$i.flagstat

### <span style="color: #4CACBC;"> Display the content of the flagstat file</span>  


In [34]:
cat Clone10.flagstat

296107 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
221301 + 0 mapped (74.74%:-nan%)
296107 + 0 paired in sequencing
148037 + 0 read1
148070 + 0 read2
218229 + 0 properly paired (73.70%:-nan%)
219681 + 0 with itself and mate mapped
1620 + 0 singletons (0.55%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
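The percentages printed by `samtools flagstat` are simply ratios of the counts in the first column. As a sketch, the mapped percentage can be recomputed with `awk`, which is handy later for comparing clones. `mapped_percent` is a hypothetical helper; it reads a flagstat file given as argument, or standard input when no file is given.

```shell
# Hypothetical helper: percentage of mapped reads from `samtools flagstat` text.
mapped_percent() {
    awk '
        / in total /                      { total  = $1 }
        /^[0-9]+ [+] [0-9]+ mapped [(]/   { mapped = $1 }
        END { if (total > 0) printf "%.2f\n", 100 * mapped / total }
    ' "$@"
}

# Two lines copied from the Clone10 flagstat output above
mapped_percent <<'EOF'
296107 + 0 in total (QC-passed reads + QC-failed reads)
221301 + 0 mapped (74.74%:-nan%)
EOF
```

With these two lines the function prints `74.74`, the value reported by flagstat itself.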


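## <span style="color: #4CACBC;"> Generate a bam file that contains only the properly paired mapped reads `samtools view`<a class="anchor" id="corrmap"></a></span>   

The `-f 0x02` option keeps only the reads whose "properly paired" flag bit is set. SAM flags are explained here: https://broadinstitute.github.io/picard/explain-flags.html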

In [35]:
samtools view -bh -@4 -f 0x02 -o Clone$i.mappedpaired.bam Clone$i.bam 
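The second column of a SAM record is a sum of bit values, and `-f 0x02` keeps a record only if the "properly paired" bit is part of that sum. The sketch below decodes a flag into names, in the spirit of the Picard page linked above. `decode_flag` and the short names it prints are made up for this illustration; the bit positions follow the SAM specification.

```shell
# Hypothetical helper: list the SAM flag bits set in a decimal or 0x-prefixed value.
decode_flag() {
    local flag=$(( $1 )) out="" k
    local names=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                 read1 read2 secondary qc_fail duplicate supplementary)
    for k in "${!names[@]}"; do
        if (( flag & (1 << k) )); then
            out="${out:+$out }${names[k]}"
        fi
    done
    echo "$out"
}

decode_flag 99     # first read of a properly paired fragment
decode_flag 77     # first read of a pair where neither mate mapped
decode_flag 0x02   # the bit required by -f 0x02
```

Flags 99 and 77 both appear in the `head` output of the sam file: only the first one contains `proper_pair`, so only that record passes the filter.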

## <span style="color: #4CACBC;"> Sorting final bam </span>  

* Generate the sorted bam file
* Check that the new bam file has been created
* Remove the bam file previously created (Clone$i.mappedpaired.bam)

In [36]:
samtools sort -@4 Clone$i.mappedpaired.bam Clone$i.SORTED 
rm Clone$i.mappedpaired.bam

## <span style="color: #4CACBC;"> Indexing bam file<a class="anchor" id="indexbam"></span>   

In [37]:
samtools index Clone$i.SORTED.bam

In [38]:
ls -lrt

total 56880
-rw-r--r-- 1 jovyan users 40018974 Jun 20 14:56 Clone10.bam
-rw-r--r-- 1 jovyan users      381 Jun 20 14:56 Clone10.flagstat
-rw-r--r-- 1 jovyan users 18213243 Jun 20 14:56 Clone10.SORTED.bam
-rw-r--r-- 1 jovyan users     2856 Jun 20 14:56 Clone10.SORTED.bam.bai
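The `samtools sort input.bam output_prefix` form used above, together with the `[samopen]` messages, suggests a 0.1.x release of samtools on this server. Releases from 1.3 onward no longer accept an output prefix and expect the output file through `-o`. The sketch below only prints the command matching a version string; `sort_command` is a hypothetical helper, the version numbers are example inputs, and releases 1.0 to 1.2 are not handled separately.

```shell
# Hypothetical helper: print (not run) the sort command matching a samtools version string.
sort_command() {
    local version=$1 in_bam=$2 prefix=$3
    case $version in
        0.*) echo "samtools sort -@4 ${in_bam} ${prefix}" ;;          # legacy: output prefix
        *)   echo "samtools sort -@4 -o ${prefix}.bam ${in_bam}" ;;   # samtools >= 1.3: -o FILE
    esac
}

sort_command 0.1.19 Clone10.mappedpaired.bam Clone10.SORTED
sort_command 1.15   Clone10.mappedpaired.bam Clone10.SORTED
```

Both commands are meant to produce `Clone10.SORTED.bam`, so the indexing step that follows stays the same.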


## <span style="color: #4CACBC;"> => Let's map the data from all clones using a loop, with a single folder per sample<a class="anchor" id="mapallmem"></a></span>   

In [None]:
for i in {1..20}
    do
        cd ~/work/MAPPING-ILL
        echo -e "\n\n>>>>>>>>>> Creation directory for Clone$i"
        mkdir -p dirClone$i
        cd dirClone$i
        
        echo -e "\n>>>> Declare variables for Clone$i"
        REF="/home/jovyan/work/SV_DATA/REF/reference.fasta"
        ILL_R1="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R1.fastq.gz"
        ILL_R2="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R2.fastq.gz"

        echo -e "\n>>>> Mapping Clone$i\n"
        bwa-mem2 mem -M -t 8 $REF $ILL_R1 $ILL_R2 > Clone$i.sam
        
        echo -e "\n>>>> convert sam to bam for Clone$i"
        samtools view -@4 -bh -S -o  Clone$i.bam Clone$i.sam 
        rm Clone$i.sam
        echo -e "\n>>>> Flagstats from all reads $i"
        samtools flagstat Clone$i.bam >Clone$i.flagstat
        
        echo -e "\n>>>> Extract only properly paired reads for Clone$i"
        samtools view -bh -@4 -f 0x02 -o Clone$i.mappedpaired.bam Clone$i.bam 
        
        echo -e "\n>>>> Sort mappedpaired bam file $i"
        samtools sort -@4 Clone$i.mappedpaired.bam Clone$i.SORTED 
        rm Clone$i.mappedpaired.bam
    done
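A loop over 20 clones keeps running even when the input of one clone is missing, which leaves empty or misleading files behind. One possible guard, sketched below, checks that both FASTQ files exist before a clone is processed. `has_read_pair` and `SR_DIR` are hypothetical names; the default path is the one used in this notebook.

```shell
# Hypothetical helper: succeed only if both FASTQ files of a clone exist and are non-empty.
has_read_pair() {
    local dir=$1 clone=$2
    [ -s "${dir}/Clone${clone}_R1.fastq.gz" ] && [ -s "${dir}/Clone${clone}_R2.fastq.gz" ]
}

SR_DIR="${SR_DIR:-/home/jovyan/work/SV_DATA/SHORT_READS}"   # path used in this notebook
for i in {1..20}; do
    if ! has_read_pair "$SR_DIR" "$i"; then
        echo "Clone${i}: read pair not found in ${SR_DIR}, skipping"
        continue
    fi
    echo "Clone${i}: ready to map"
done
```

In the real loop, the `echo "ready to map"` line would be replaced by the mapping, conversion and sorting commands.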

# <span style="color: #4CACBC;"> 2.2. Mapping long reads vs a reference with `minimap2` <a class="anchor" id="minimap2"></a></span> 


The process for long reads (LR) is similar to the one used for short reads (SR). In this case, the mapper is minimap2.

In [41]:
# Declare variables
i=10
REF_DIR="/home/jovyan/work/SV_DATA/REF/"
REF="/home/jovyan/work/SV_DATA/REF/reference.fasta"
ONT="/home/jovyan/work/SV_DATA/LONG_READS/Clone${i}.fastq.gz"

## <span style="color: #4CACBC;"> Mapping with `minimap2`</span> 

## <span style="color: #4CACBC;"> => Now let's map, using the READS FROM ONE CLONE ONLY</span>  

In [42]:
mkdir -p ~/work/MAPPING-ONT
cd ~/work/MAPPING-ONT
echo -e "\nCreation directory for Clone$i\n"
echo Clone$i
mkdir -p dirClone$i
cd dirClone$i


Creation directory for Clone10

Clone10


In [44]:
echo -e "\nMapping Clone$i minimap2 \n"
minimap2 -ax map-ont -t 12 ${REF} ${ONT} > Clone${i}_ONT.sam 


Mapping Clone10 minimap2 

[M::mm_idx_gen::0.084*2.42] collected minimizers
[M::mm_idx_gen::0.113*4.01] sorted minimizers
[M::main::0.113*4.01] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.121*3.80] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.126*3.69] distinct minimizers: 165344 (91.75% are singletons); average occurrences: 1.156; average spacing: 5.336
[M::worker_pipeline::9.120*7.21] mapped 11235 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -t 12 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/LONG_READS/Clone10.fastq.gz
[M::main] Real time: 9.135 sec; CPU: 65.813 sec; Peak RSS: 0.583 GB


In [45]:
head -n3 Clone${i}_ONT.sam

@SQ	SN:Reference	LN:1020001
@PG	ID:minimap2	PN:minimap2	VN:2.17-r941	CL:minimap2 -ax map-ont -t 12 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/LONG_READS/Clone10.fastq.gz
86f59255-6632-405c-a329-62d9dba8a95f	0	Reference	495672	60	3157S38M1D8M1D49M1I3M1D8M1I37M2I5M2D52M1D10M2I8M1D11M1I18M1I17M1I62M1D56M1D27M2D32M3D16M1D3M3I41M1I5M1D3M4I6M1I3M1I2M1I6M1D164M4I13M1I29M1D2M1I22M1I3M2I6M2D14M1D15M1D47M1I61M1D32M3D13M2I17M1I60M1D22M1I62M1D17M2I15M1D8M1I2M1I21M1D22M1D2M1D37M1D5M2I3M1D5M3D3M2D27M1I8M1D17M1I8M1I4M1I5M1I13M2I42M1I2M1I24M1I6M1I1M1I39M1D38M3D11M1I9M2D12M1I1M1I44M1I36M1D11M1D48M2I4M2I5M2I103M1I2M2D20M2D45M1I15M1I37M1I10M1I5M1I42M2I23M1D17M7S	*	0	0	TATTTTCAAATACTAAATGATTTCAACTGAAAACGTCATCAATAACTCAAAGTTGTATTAATCATCAAGATCTATAACTTTCATTTTGGTCAGTTCTTCATCGGACAAAGTAATTTGTAAGATTGTTCCACAAGATGTACTTATCTTTTATATAGTTAATAAAAACTATAAGAGTGGTTACATTTTGTGAACAGTCTTATTAATAACTTTGTCGGATGAAGAAATGTCTAAATGGGTTATAGATCTTGCTGAGTTATACAACTACGTTCTTGATGACTTTTCAGCCGAAATCAAATACTACTGCAAAATATTG

## <span style="color: #4CACBC;"> Convert sam to bam</span>  

In [46]:
echo -e "\nConvert samtobam and filter it \n"
samtools view -@8 -bh -S -F 0x904 -o Clone${i}_ONT.bam Clone${i}_ONT.sam
rm Clone${i}_ONT.sam


Convert samtobam and filter it 

[samopen] SAM header is present: 1 sequences.
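The `-F 0x904` option works the opposite way to `-f`: a record is removed as soon as one of the listed bits is set. The mask 0x904 combines 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary), so only primary alignments of mapped reads remain. The sketch below reproduces that test with bash arithmetic; `kept_by_F` is a hypothetical helper.

```shell
# Hypothetical helper: would a record with this flag survive `samtools view -F <mask>`?
kept_by_F() {
    local flag=$(( $1 )) mask=$(( $2 ))
    (( (flag & mask) == 0 ))
}

printf '0x904 = %d = 0x4 (unmapped) + 0x100 (secondary) + 0x800 (supplementary)\n' $(( 0x904 ))
for flag in 0 16 4 256 2048 2064; do
    if kept_by_F "$flag" 0x904; then
        echo "flag $flag: kept"
    else
        echo "flag $flag: removed"
    fi
done
```

Flags 0 and 16 (forward and reverse primary alignments) are kept, while the four other values are removed.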


## <span style="color: #4CACBC;"> Sort and index bam</span>  

In [47]:
echo -e "\nSort and index bam \n"
samtools sort -@8 Clone${i}_ONT.bam Clone${i}_ONT_SORTED 
samtools index Clone${i}_ONT_SORTED.bam


Sort and index bam 



## <span style="color: #4CACBC;"> Calculate stats from mapping</span>  

In [48]:
echo -e "\nCalculate stats from mapping\n"
samtools flagstat Clone${i}_ONT_SORTED.bam >Clone${i}_ONT.flagstats


Calculate stats from mapping



## <span style="color: #4CACBC;"> Display the content of the flagstat file</span>

In [49]:
head Clone10_ONT.flagstats

9281 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
9281 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
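The 100.00% mapped rate above is a consequence of the `-F 0x904` filter: unmapped reads were removed before the statistics were computed. A more informative figure, sketched below, compares the 9281 remaining records with the 11235 sequences reported in the minimap2 log. This assumes that the log line counts every read of the input file, which is the case when a single batch is reported.

```shell
# Numbers copied from the outputs above
primary_records=9281    # "in total" line of the filtered flagstat
input_reads=11235       # "mapped 11235 sequences" in the minimap2 log

kept_pct=$(awk -v p="$primary_records" -v n="$input_reads" \
    'BEGIN { printf "%.2f", 100 * p / n }')
echo "${kept_pct}% of the input reads kept a primary alignment"
```

Running `samtools flagstat` on the unfiltered sam file would give the same information directly, at the cost of keeping that file a little longer.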


## <span style="color: #4CACBC;"> => Let's map with data from all clones using a loop for mapping, with a single folder per sample and ONT reads<a class="anchor" id="mapallminimap"></span> 

In [None]:
for i in {1..20}
    do
        ONT="/home/jovyan/work/SV_DATA/LONG_READS/Clone${i}.fastq.gz"
        mkdir -p ~/work/MAPPING-ONT
        cd ~/work/MAPPING-ONT
        echo -e "\n>>>>>>>>>> Creation directory for Clone$i\n"
        mkdir -p dirClone$i
        cd dirClone$i
        
        echo -e ">>>> Mapping Clone$i minimap2\n"
        minimap2 -ax map-ont -t 12 ${REF} ${ONT} > Clone${i}_ONT.sam 
        
        # Convert samtobam 
        echo -e ">>> Convert samtobam and filter it \n"
        samtools view -@8 -bh -S -F 0x904 -o Clone${i}_ONT.bam Clone${i}_ONT.sam
        rm Clone${i}_ONT.sam

        echo -e ">>>> Sort and index bam \n"
        # sort and index bam
        samtools sort -@8 Clone${i}_ONT.bam Clone${i}_ONT_SORTED 
        samtools index Clone${i}_ONT_SORTED.bam

        # Calculate stats from mapping
        echo -e ">>>> Calculate stats from mapping\n"
        samtools flagstat Clone${i}_ONT_SORTED.bam >Clone${i}_ONT.flagstats
    done

In [54]:
ls

Clone20_ONT.bam        Clone20_ONT_SORTED.bam
Clone20_ONT.flagstats  Clone20_ONT_SORTED.bam.bai


# <span style="color:#006E7F">__III -  Centralize final mapping data into a single bam directory__ <a class="anchor" id="reorder"></a></span>  

### <span style="color: #4CACBC;"> Gather the Illumina BAM files into a single folder</span>  


In [56]:
mkdir -p ~/work/MAPPING-ILL/BAM
cd ~/work/MAPPING-ILL/

for i in {1..20}
    do
         ln -s ~/work/MAPPING-ILL/dirClone$i/Clone$i.SORTED.bam BAM/
    done

In [57]:
ls /home/jovyan/work/MAPPING-ILL/BAM -l

total 0
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone10.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone10/Clone10.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone11.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone11/Clone11.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone12.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone12/Clone12.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone13.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone13/Clone13.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone14.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone14/Clone14.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone15.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone15/Clone15.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone16.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone16/Clone16.SORTED.bam
lrwxrwxrwx 1 jovyan users 59 Jun 20 16:37 Clone17.SORTED.bam -> /home/jovyan/work/MAPPING-ILL/dirClone17/Clone17.SORTED.bam
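Running the `ln -s` loop a second time fails with "File exists" for every link already present, and the `.bai` index files are not linked. A possible variant, sketched below, uses `ln -sf` so that it can be rerun and also links an index when one exists. `link_sorted_bams` is a hypothetical helper; the source directory should be given as an absolute path so that the links stay valid.

```shell
# Hypothetical helper: (re)create links for every sorted BAM, and its index if present.
link_sorted_bams() {
    local src_root=$1 dest=$2 bam
    mkdir -p "$dest"
    for bam in "$src_root"/dirClone*/*SORTED.bam; do
        [ -e "$bam" ] || continue              # the glob matched nothing
        ln -sf "$bam" "$dest"/
        if [ -e "${bam}.bai" ]; then
            ln -sf "${bam}.bai" "$dest"/
        fi
    done
}

if [ -d ~/work/MAPPING-ILL ]; then
    link_sorted_bams ~/work/MAPPING-ILL ~/work/MAPPING-ILL/BAM
fi
```

The same function can be called with `~/work/MAPPING-ONT` for the long-read BAM files, since the pattern `*SORTED.bam` matches both naming schemes used in this notebook.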


### <span style="color: #4CACBC;"> Gather the ONT BAM files into a single folder</span>  

In [58]:
mkdir -p ~/work/MAPPING-ONT/BAM
cd ~/work/MAPPING-ONT/

for i in {1..20}
    do
         ln -s ~/work/MAPPING-ONT/dirClone$i/Clone${i}_ONT_SORTED.bam BAM/
    done

In [59]:
ls ~/work/MAPPING-ONT/BAM -l

total 80
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone10_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone10/Clone10_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone11_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone11/Clone11_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone12_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone12/Clone12_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone13_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone13/Clone13_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone14_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone14/Clone14_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone15_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone15/Clone15_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone16_ONT_SORTED.bam -> /home/jovyan/work/MAPPING-ONT/dirClone16/Clone16_ONT_SORTED.bam
lrwxrwxrwx 1 jovyan users 63 Jun 20 16:37 Clone17_ONT_SORTED.bam ->