# <span style="color:green">Formation South Green 2022</span> - Structural Variants Detection by using short and long reads 

# __DAY 1 : How to map reads against a reference genome ?__ 

Created by C. Tranchant (DIADE-IRD), J. Orjuela (DIADE-IRD), F. Sabot (DIADE-IRD) and A. Dereeper (PHIM-IRD)

## __1. Preparing the working environment__ 

### First create a dedicated folder to work 


In [50]:
# go to work directory and download data
cd /home/jovyan/work/
ls

MyFirsrtJupyterBook.ipynb  SV_DATA  training_SV_teaching


## Download sequencing data (SR & LR) for Simulated clones

Before starting, please download the data created specially for this practical training. The data are available from the I-Trop server.

Each participant will analyse one clone, and the results will be filled in a shared file.

To generate Clone data, a 1Mb contig was extracted from chromosome 1 of rice.

20 levels of variation were generated and long reads were simulated for each.

We have introduced different variations (SNP, indel, indel+translocations) and also some contaminations.

In [None]:
# download available compressed DATA 
wget --no-check-certificate -rm -nH --cut-dirs=1 --reject="index.html*" https://itrop.ird.fr/sv-training/SV_DATA.tar.gz
# decompress data
tar zxvf SV_DATA.tar.gz
rm SV_DATA.tar.gz

### List the content of the work directory and check that the directory SV_DATA has been created

In [52]:
# check data 
ls -l

total 12
-rw-r--r-- 1 jovyan users 2017 Jun 19 17:34 MyFirsrtJupyterBook.ipynb
drwxr-s--- 5 jovyan users 4096 Jun 14 05:44 SV_DATA
drwxr-sr-x 4 jovyan users 4096 Jun 19 19:06 training_SV_teaching


### List the content of the directory SV_DATA

In [8]:
ls -lRt SV_DATA

SV_DATA:
total 12
drwxr-s--- 2 jovyan users 4096 Jun 14 05:41 REF
drwxr-s--- 2 jovyan users 4096 Jun 14 05:38 LONG_READS
drwxr-s--- 2 jovyan users 4096 Jun 14 05:37 SHORT_READS

SV_DATA/REF:
total 1000
-rw-r----- 1 jovyan users 1020013 Jun 14 05:40 reference.fasta

SV_DATA/LONG_READS:
total 3434928
-rw-r----- 1 jovyan users 166242371 Sep 13  2021 Clone11.fastq.gz
-rw-r----- 1 jovyan users 152904101 Sep 13  2021 Clone1.fastq.gz
-rw-r----- 1 jovyan users 167744275 Sep 13  2021 Clone10.fastq.gz
-rw-r----- 1 jovyan users 169317131 Sep 13  2021 Clone5.fastq.gz
-rw-r----- 1 jovyan users 153330738 Sep 13  2021 Clone4.fastq.gz
-rw-r----- 1 jovyan users 168418218 Sep 13  2021 Clone18.fastq.gz
-rw-r----- 1 jovyan users 175381821 Sep 13  2021 Clone6.fastq.gz
-rw-r----- 1 jovyan users 177768476 Sep 13  2021 Clone15.fastq.gz
-rw-r----- 1 jovyan users 167532884 Sep 13  2021 Clone14.fastq.gz
-rw-r----- 1 jovyan users 189926682 Sep 13  2021 Clone16.fastq.gz
-rw-r----- 1 jovyan users 161560783 Sep 13  

-----------------------
# 2. MAPPING PRACTICE

Read congruency is an important measure in determining assembly accuracy.

Clusters of read pairs or single long reads that align incorrectly are strong indicators of mis-assembly.

Read mapping is usually the first step before SNP or variant calling.

### 2.1 Make a folder for your results

In [54]:
mkdir -p ~/work/MAPPING-ILL
cd ~/work/MAPPING-ILL

### 2.2 Declare important variables

We are going to set up bash variables with the paths to our data. We set a bash variable like this: `var="value"`
and call it with: `echo $var`
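
One detail worth keeping in mind: inside double quotes, bash expands `${i}` at the moment the variable is assigned, not when it is used later. The sketch below illustrates this; the variable name `sample_path` is only illustrative and is not used elsewhere in this notebook.

```shell
# Variables inside double quotes are expanded at assignment time
i=5
sample_path="Clone${i}_R1.fastq.gz"    # stores the text Clone5_R1.fastq.gz

i=10                                    # changing i afterwards does NOT update sample_path
echo "i=$i but sample_path=$sample_path"
# prints: i=10 but sample_path=Clone5_R1.fastq.gz

# Re-assign the path after changing i to refresh it
sample_path="Clone${i}_R1.fastq.gz"
echo "i=$i and sample_path=$sample_path"
# prints: i=10 and sample_path=Clone10_R1.fastq.gz
```

In practice this means the clone number should be set before any path that depends on it.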


In [69]:
# CLONE NUMBER THAT YOU ARE GOING TO ANALYZE
# (set it first: ${i} is expanded when the paths below are assigned)
i=10

# REFERENCE
REF_DIR="/home/jovyan/work/SV_DATA/REF/"
REF="/home/jovyan/work/SV_DATA/REF/reference.fasta"

# ONT DATA
ONT="/home/jovyan/work/SV_DATA/LONG_READS/Clone${i}.fastq.gz"

# ILLUMINA DATA
ILL_R1="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R1.fastq.gz"
ILL_R2="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R2.fastq.gz"

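##### Print the variables i, REF, ILL_R1 & ILL_R2

The paths must contain the same clone number as `i`. In the output recorded below they show Clone5 although `i` is 10, because the paths had been built from an earlier value of `i`; in that case, re-run the variable declaration cell.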

In [56]:
echo "Clone${i} $REF" 
echo $ILL_R1 $ILL_R2

Clone10 /home/jovyan/work/SV_DATA/REF/reference.fasta
/home/jovyan/work/SV_DATA/SHORT_READS/Clone5_R1.fastq.gz /home/jovyan/work/SV_DATA/SHORT_READS/Clone5_R2.fastq.gz


## 2.1 Mapping short reads vs a reference with `bwa mem`

In this practical, we are going to map short reads against a reference. To find out how well the reads align back to the reference, we use bwa-mem2 for the mapping and samtools to compute basic alignment statistics.

In this exercise, we will use the reference.fasta assembly as well as the ILLUMINA reads from your favorite clone.

The tool bwa needs 2 steps: 
- **Reference indexing**: `bwa index reference`
- **Mapping in itself**: `bwa mem  -R READGROUP [options] reference fastq1 fastq2 > out.sam`
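
The `-R READGROUP` option expects a SAM `@RG` header line in which tabs are written as the two characters `\t`. Below is a minimal sketch of building such a string for one clone. The `ID`, `SM` and `PL` values are illustrative assumptions, not values prescribed by this training, and the mapping command itself is only shown as a comment.

```shell
i=10    # clone number (example value)

# Single quotes keep \t as two literal characters; the mapper converts them to tabs
RG='@RG\tID:Clone'"${i}"'\tSM:Clone'"${i}"'\tPL:ILLUMINA'
printf '%s\n' "$RG"

# The string would then be passed to the mapper like this (not executed here):
# bwa-mem2 mem -R "$RG" -t 4 "$REF" "$ILL_R1" "$ILL_R2" > Clone$i.sam
```

`printf '%s\n'` is used instead of `echo` so that the backslashes are printed unchanged in every shell.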

## Reference indexing

Before mapping, we need to index the reference file! Check the bwa-mem2 index command line.

In [57]:
cd $REF_DIR

In [58]:
echo -e "\nIndexing reference $REF\n"
bwa-mem2 index $REF


Indexing reference /home/jovyan/work/SV_DATA/REF/reference.fasta

[bwa_index] Pack FASTA... 0.01 sec
init ticks = 169114249
ref seq len = 2040002
binary seq ticks = 60890156
build index ticks = 592072544
ref_seq_len = 2040002
count = 0, 576483, 1020001, 1463519, 2040002
BWT[1932441] = 4
CP_SHIFT = 5, CP_MASK = 31
sizeof CP_OCC = 64
max_occ_ind = 63750
ref_seq_len = 2040002
count = 0, 576483, 1020001, 1463519, 2040002
BWT[1932441] = 4
CP_SHIFT = 6, CP_MASK = 63
sizeof CP_OCC = 64
max_occ_ind = 31875


#### Check that the indexes have been created

In [16]:
ls

reference.fasta       reference.fasta.ann          reference.fasta.pac
reference.fasta.0123  reference.fasta.bwt.2bit.64
reference.fasta.amb   reference.fasta.bwt.8bit.32
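
Instead of checking the listing by eye, the presence of the index files can be tested in a small function. This is only a sketch: the function name `missing_index_files` is hypothetical, and the list of suffixes is copied from the listing above, so it may differ with other bwa-mem2 versions.

```shell
# Print the expected index files that are missing (or empty) next to a reference FASTA
missing_index_files() {
    ref=$1
    for suffix in 0123 amb ann bwt.2bit.64 bwt.8bit.32 pac; do
        [ -s "$ref.$suffix" ] || echo "$ref.$suffix"
    done
}

# Demonstration on a throw-away directory where only three index files exist
demo=$(mktemp -d)
for s in 0123 amb ann; do
    echo "x" > "$demo/reference.fasta.$s"
done
missing_index_files "$demo/reference.fasta"   # lists the three absent files
rm -r "$demo"
```

With the real data, `missing_index_files "$REF"` should print nothing once indexing has finished.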


##  Let's map now, but WITH READS FROM ONE CLONE ONLY

* Go into the directory MAPPING-ILL
* Create a subdirectory to save the files generated by the mapping step. 
Eg: If you are going to analyze the `clone1`, create the subdirectory `dirClone1`. 

In [68]:
cd ~/work/MAPPING-ILL
echo -e "\n>>>>>>>>>> Creation directory for Clone$i\n"
mkdir -p dirClone$i
cd dirClone$i


>>>>>>>>>> Creation directory for Clone10

/home/jovyan/work/MAPPING-ILL/dirClone10


#### Run the mapping with `bwa mem`

In [70]:
echo -e "\n>>>>>>>>>> Mapping Clone$i\n"
bwa-mem2 mem -M -t 4 $REF $ILL_R1 $ILL_R2 > Clone$i.sam


>>>>>>>>>> Mapping Clone10

-----------------------------
Executing in AVX2 mode!!
-----------------------------
Ref file: /home/jovyan/work/SV_DATA/REF/reference.fasta
Entering FMI_search
reference seq len = 2040003
count
0,	1
1,	576484
2,	1020002
3,	1463520
4,	2040003

Reading other elements of the index from files /home/jovyan/work/SV_DATA/REF/reference.fasta
prefix: /home/jovyan/work/SV_DATA/REF/reference.fasta
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = /home/jovyan/work/SV_DATA/REF/reference.fasta.0123
Reference genome size: 2040002 bp
Done readng reference genome !!

[0000] 1: Calling process()

Threads used (compute): 4
Info: projected #read in a task: 264910
------------------------------------------
Memory pre-allocation for chaining: 557.3706 MB
Memory pre-allocation for BSW: 958.4681 MB
Memory pre-allocation for BWT: 309.2567 MB
------------------------------------------
No. of pipeline threads: 2
[0000] read_c

#### Check that the `.sam` file has been created by `bwa mem`

In [72]:
ls

Clone10.sam


#### Display the beginning and the end of the sam file just created

In [71]:
head Clone10.sam

@SQ	SN:Reference	LN:1020001
@PG	ID:bwa	PN:bwa	VN:2.0pre2	CL:bwa-mem2 mem -M -t 4 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/SHORT_READS/Clone10_R1.fastq.gz /home/jovyan/work/SV_DATA/SHORT_READS/Clone10_R2.fastq.gz
Reference-Clone10295760	77	*	0	0	*	*	0	0	GTATAAGTACCCGGTCGAATCAAAGGTAACGTTAAATAGGTACTCCGCCAGGGCAGATTTCAACAGCCAAACTGCCCCCCAGGGGTATCTTACAGGCAATGGCTTAGAAGCGTTCCTAAGTGGACGACTCTCTGGAAACTCGCCAATGAG	CC=G=GGGGGGG=IIIIGGIIGICGIIIICIGIIIIIGGCIIGIGIG=CIGIIIICIGICGGGGGCCCGGCGCGCGGGGG8CGG8CGGGGGCCCGGGGGGIGCGGGG=GCGGCCGCGG55GGCGCG8GGGGCGG=CCGCGG5GGCGCGC=	AS:i:0	XS:i:0
Reference-Clone10295760	141	*	0	0	*	*	0	0	GCACCCAAGGTGATCAACCCGGCGCTGCATGAGTATGCAACATGTTCGGCAGATGCCGTCAGTTTGGCATGCGTAATTCAATGTCGCAAGGAGGATATCCCGCTGGGATTACATTCGCGTATAGTTTATGGGCCTTCATTCGTTTTTACG	CC=GGGGGGGGGGIGIIIIIICIGICI5IGCGIGGGICIGIIIICCIIG=IICGC=G==GGCGGGIGI=CCGGCGGCIGG=CGCG5CG=CGCGGG5GCGGGCCCII=GGGGGCG==GGGGGGGCCGCGGCCCGGGGGGGCCCCGGCGCGC	AS:i:0	XS:i:0
Reference-Clone10295758	99	Reference	12451

In [73]:
tail Clone10.sam

Reference-Clone1010	353	Reference	571519	60	83H67M	=	571653	284	ATTACCTAATGCATACATAGTTCTACAAACATCTTAGTTCAGATCAGATGCATCATCACATTGTTAC	GGCCGGGGGCCGGGG=GIC5CGC5GGGG5GCGGG8GGGGGGGGGG5GCGGG5GG5GC5CCGG=GGCC	NM:i:0	MD:Z:67	AS:i:67	XS:i:0	SA:Z:Reference,569030,+,82M68S,60,3;
Reference-Clone1010	145	Reference	571653	60	150M	=	569030	-2773	TCAGAAGCAGATCAACAACTGGTTCATCAACCAGAGGAAACGGCACTGGAAGCCATCGGAGGACATGCCGTTCGTCATGATGGAAGGTTTTCACCCACAGAATGCTGCTGCATTGTACATGGATGGCCCGTTCATGGCAGATGGAATGTA	CGGCCGCG8CGCGGCGG8CCCCGGGCGCGGCGG=GGGCGGCGG=ICI=GGGGGCGGCCC8G==GCG=GGGGCCGGGGGCCGG5CICCCG=IIGGIGICCIGCIIG=GCGIGGIIIIIGGIGICGIIGGI8IIIIIGIGGGGGGG=GGCCC	NM:i:0	MD:Z:150	AS:i:150	XS:i:0
Reference-Clone108	99	Reference	263937	60	150M	=	264214	427	TTTAGTTGATGAACACAAATAATAATTGATTAAAGGGAACTTTCCATTCGGTCGTTTCCTGTCTCCTTCTTTGGGTACTACTATCATTTTCTTTTTCTGAAATTCCTTTTGCTGTATATCATTTCAGCATGCAATACTTAATCTGACAAA	CCCGGGGGGCGGGIIGCICIGIIGCGCCIIGIIIGI5IGIIIGIGI=GGCGGGIIGIIGIGIGIGIGGGGG5CGGGCIGG=IGGGGCGGGGGGCGC5=GGIGCGGCCGC=CG8C5GGCCGGGCG
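
A SAM file is plain tab-separated text: header lines start with `@`, and every other line is one alignment record whose second column is the FLAG. The sketch below shows how to count both kinds of lines and tally the FLAG values. It runs on a tiny made-up SAM file (three invented records) so that it is self-contained.

```shell
# Build a tiny SAM file for demonstration (1 header line + 3 made-up records)
tmp_sam=$(mktemp)
printf '@SQ\tSN:Reference\tLN:1020001\n' > "$tmp_sam"
printf 'r1\t99\tReference\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\n' >> "$tmp_sam"
printf 'r1\t147\tReference\t200\t60\t5M\t=\t100\t-105\tACGTA\tIIIII\n' >> "$tmp_sam"
printf 'r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\n' >> "$tmp_sam"

# Header lines start with '@'; all other lines are alignment records
n_header=$(grep -c '^@' "$tmp_sam")
n_records=$(grep -vc '^@' "$tmp_sam")
echo "header lines: $n_header, alignment records: $n_records"

# Count how often each FLAG value (column 2) occurs
grep -v '^@' "$tmp_sam" | cut -f2 | sort -n | uniq -c

rm -f "$tmp_sam"
```

On the real file, the same pipeline would be `grep -v '^@' Clone10.sam | cut -f2 | sort -n | uniq -c`.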

##  Convert sam into bam - `samtools view`

In [75]:
samtools view -@4 -bh -S -o  Clone$i.bam Clone$i.sam 

[samopen] SAM header is present: 1 sequences.


#### Check that the bam file has been created

* Have a look at the filesize of the sam and bam files.
* Remove the sam file 

In [81]:
ls -lh
rm Clone$i.sam

total 39M
-rw-r--r-- 1 jovyan users 39M Jun 19 19:36 Clone10.bam
rm: cannot remove 'Clone10.sam': No such file or directory



## Calculate stats from mapping `samtools flagstat`

In [82]:
samtools flagstat Clone$i.bam >Clone$i.flagstat

#### Display the content of the flagstat file

In [83]:
cat Clone10.flagstat

296107 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
221305 + 0 mapped (74.74%:-nan%)
296107 + 0 paired in sequencing
148037 + 0 read1
148070 + 0 read2
218251 + 0 properly paired (73.71%:-nan%)
219689 + 0 with itself and mate mapped
1616 + 0 singletons (0.55%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
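
The percentages in this report are simply each count divided by the total number of reads. As a sketch, they can be recomputed with `awk` from the first field of the relevant lines. The example recreates a few lines of the report above in a temporary file so that it is self-contained.

```shell
# Recreate some lines of the flagstat report shown above
flagstat_file=$(mktemp)
cat > "$flagstat_file" <<'EOF'
296107 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
221305 + 0 mapped (74.74%:-nan%)
296107 + 0 paired in sequencing
218251 + 0 properly paired (73.71%:-nan%)
EOF

# The first field of each line is the number of QC-passed reads
pct_mapped=$(awk '/in total/ {total=$1} / mapped \(/ {n=$1} END {printf "%.2f", 100*n/total}' "$flagstat_file")
pct_proper=$(awk '/in total/ {total=$1} /properly paired/ {n=$1} END {printf "%.2f", 100*n/total}' "$flagstat_file")
echo "mapped: ${pct_mapped}%  properly paired: ${pct_proper}%"
# prints: mapped: 74.74%  properly paired: 73.71%

rm -f "$flagstat_file"
```

The same `awk` commands can be pointed at `Clone$i.flagstat` to collect the values for several clones.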


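##  Generate a bam file that contains only the properly paired mapped reads - `samtools view`

The option `-f 0x02` keeps only the reads whose "properly paired" flag bit is set. SAM flag values can be decoded at https://broadinstitute.github.io/picard/explain-flags.html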

In [84]:
samtools view -bh -@4 -f 0x02 -o Clone$i.mappedpaired.bam Clone$i.bam 
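
The FLAG column is a sum of bits, each with a meaning defined in the SAM specification. The sketch below decodes a few FLAG values that appear in the SAM output earlier in this notebook (99, 145, 353, 77) and tests whether `-f 0x02` would keep them. The function names `decode_flag` and `is_kept_by_f2` are hypothetical helpers written for this illustration.

```shell
# Print the names of the bits that are set in a SAM FLAG value
decode_flag() {
    flag=$1
    names=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}      # number before the colon
        name=${pair#*:}      # label after the colon
        if [ $((flag & bit)) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "$flag:$names"
}

# samtools view -f 0x02 keeps a read only if the proper_pair bit is set
is_kept_by_f2() { [ $(( $1 & 0x2 )) -ne 0 ]; }

for f in 99 145 353 77; do
    decode_flag "$f"
    if is_kept_by_f2 "$f"; then echo "  -> kept by -f 0x02"; else echo "  -> removed by -f 0x02"; fi
done
```

Among these four values, only 99 carries the `proper_pair` bit, so only that record would pass the filter.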

##  Sorting final bam 

* Generate the sorted bam file
* Check that the new bam file has been created
* Remove the bam file previously created (Clone$i.mappedpaired.bam)

In [85]:
samtools sort -@4 Clone$i.mappedpaired.bam Clone$i.SORTED 
rm Clone$i.mappedpaired.bam

##  Indexing bam file

In [86]:
samtools index Clone$i.SORTED.bam

In [87]:
ls -lrt

total 56884
-rw-r--r-- 1 jovyan users 40019082 Jun 19 19:36 Clone10.bam
-rw-r--r-- 1 jovyan users      381 Jun 19 19:39 Clone10.flagstat
-rw-r--r-- 1 jovyan users 18215336 Jun 19 19:43 Clone10.SORTED.bam
-rw-r--r-- 1 jovyan users     2856 Jun 19 19:43 Clone10.SORTED.bam.bai
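
A note on versions: the `samtools sort in.bam out.prefix` form used above, together with the `[samopen]` messages, suggests a legacy samtools release. From samtools 1.3 onwards that form is no longer accepted and the output file is given with `-o`. The sketch below only prints the two command variants for comparison; the helper name `sort_command` is hypothetical and nothing is executed.

```shell
# Print the sort command for a given samtools generation
# $1 = "legacy" (syntax used in this notebook) or "modern" (samtools 1.3 and later)
# $2 = input bam, $3 = output prefix (without .bam)
sort_command() {
    case "$1" in
        legacy) printf 'samtools sort -@4 %s %s\n' "$2" "$3" ;;
        modern) printf 'samtools sort -@ 4 -o %s.bam %s\n' "$3" "$2" ;;
        *) echo "unknown syntax: $1" >&2; return 1 ;;
    esac
}

i=10
sort_command legacy "Clone$i.mappedpaired.bam" "Clone$i.SORTED"
sort_command modern "Clone$i.mappedpaired.bam" "Clone$i.SORTED"
```

Both variants produce `Clone10.SORTED.bam`; `samtools index` is called the same way in either case.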


## Let's now map data from several clones (here clones 1 to 5) with a loop, using one folder per sample

In [88]:
for i in {1..5}
    do
        cd ~/work/MAPPING-ILL
        echo -e "\n\n>>>>>>>>>> Creation directory for Clone$i"
        mkdir -p dirClone$i
        cd dirClone$i
        
        echo -e "\n>>>> Declare variables$i"
        REF="/home/jovyan/work/SV_DATA/REF/reference.fasta"
        ILL_R1="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R1.fastq.gz"
        ILL_R2="/home/jovyan/work/SV_DATA/SHORT_READS/Clone${i}_R2.fastq.gz"

        echo -e "\n>>>> Mapping Clone$i\n"
        bwa-mem2 mem -M -t 8 $REF $ILL_R1 $ILL_R2 > Clone$i.sam
        
        echo -e "\n>>>> convert sam to bam for Clone$i"
        samtools view -@4 -bh -S -o  Clone$i.bam Clone$i.sam 
        rm Clone$i.sam
        echo -e "\n>>>> Flagstats from all reads $i"
        samtools flagstat Clone$i.bam >Clone$i.flagstat
        
        echo -e "\n>>>> Extract only properly paired reads $i"
        samtools view -bh -@4 -f 0x02 -o Clone$i.mappedpaired.bam Clone$i.bam 
        
        echo -e "\n>>>> Sort mappedpaired bam file $i"
        samtools sort -@4 Clone$i.mappedpaired.bam Clone$i.SORTED 
        rm Clone$i.mappedpaired.bam
    done


>>>>>>>>>> Creation directory for Clone1

>>>> Declare variables1

>>>> Mapping Clone1

-----------------------------
Executing in AVX2 mode!!
-----------------------------
Ref file: /home/jovyan/work/SV_DATA/REF/reference.fasta
Entering FMI_search
reference seq len = 2040003
count
0,	1
1,	576484
2,	1020002
3,	1463520
4,	2040003

Reading other elements of the index from files /home/jovyan/work/SV_DATA/REF/reference.fasta
prefix: /home/jovyan/work/SV_DATA/REF/reference.fasta
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = /home/jovyan/work/SV_DATA/REF/reference.fasta.0123
Reference genome size: 2040002 bp
Done readng reference genome !!

[0000] 1: Calling process()

Threads used (compute): 4
Info: projected #read in a task: 264910
------------------------------------------
Memory pre-allocation for chaining: 557.3706 MB
Memory pre-allocation for BSW: 958.4681 MB
Memory pre-allocation for BWT: 309.2567 MB
-----------------------
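
When looping over many clones, it can help to check that the input files exist before launching the mapper, so that a missing sample is skipped with a clear message. Below is a minimal sketch of such a guard. The function name `has_short_reads` is hypothetical, and the demonstration uses empty placeholder files in a temporary directory rather than the real data.

```shell
# Succeed only if both paired FASTQ files exist for a clone
# $1 = directory holding the reads, $2 = clone number
has_short_reads() {
    dir=$1
    clone=$2
    [ -f "$dir/Clone${clone}_R1.fastq.gz" ] && [ -f "$dir/Clone${clone}_R2.fastq.gz" ]
}

# Demonstration: a throw-away directory that contains reads for Clone1 only
demo_dir=$(mktemp -d)
touch "$demo_dir/Clone1_R1.fastq.gz" "$demo_dir/Clone1_R2.fastq.gz"

for i in 1 2; do
    if has_short_reads "$demo_dir" "$i"; then
        echo "Clone$i: inputs found, mapping would run here"
    else
        echo "Clone$i: FASTQ files missing, skipped"
    fi
done

rm -r "$demo_dir"
```

In the real loop, the directory would be `/home/jovyan/work/SV_DATA/SHORT_READS` and the `echo` in the first branch would be replaced by the mapping commands.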

## 2.2 Mapping Long reads vs a Reference

The process for long reads (LR) is similar to the one used for short reads (SR). In this case, the mapper is minimap2.

In [89]:
# Declare variables
i=10
REF_DIR="/home/jovyan/work/SV_DATA/REF/"
REF="/home/jovyan/work/SV_DATA/REF/reference.fasta"
ONT="/home/jovyan/work/SV_DATA/LONG_READS/Clone${i}.fastq.gz"

##  Let's map now, but WITH READS FROM ONE CLONE ONLY

In [90]:
mkdir -p ~/work/MAPPING-ONT
cd ~/work/MAPPING-ONT
echo -e "\nCreation directory for Clone$i\n"
echo Clone$i
mkdir -p dirClone$i
cd dirClone$i


Creation directory for Clone10

Clone10


In [92]:
echo -e "\nMapping Clone$i minimap2 \n"
minimap2 -ax map-ont -t 8 ${REF} ${ONT} > Clone${i}_ONT.sam 


Mapping Clone10 minimap2 

[M::mm_idx_gen::0.079*1.05] collected minimizers
[M::mm_idx_gen::0.109*2.36] sorted minimizers
[M::main::0.109*2.36] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.118*2.26] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.123*2.21] distinct minimizers: 165344 (91.75% are singletons); average occurrences: 1.156; average spacing: 5.336
[M::worker_pipeline::12.493*5.52] mapped 11235 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -t 8 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/LONG_READS/Clone10.fastq.gz
[M::main] Real time: 12.507 sec; CPU: 68.945 sec; Peak RSS: 0.530 GB


In [93]:
head -n3 Clone${i}_ONT.sam

@SQ	SN:Reference	LN:1020001
@PG	ID:minimap2	PN:minimap2	VN:2.17-r941	CL:minimap2 -ax map-ont -t 8 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/LONG_READS/Clone10.fastq.gz
86f59255-6632-405c-a329-62d9dba8a95f	0	Reference	495672	60	3157S38M1D8M1D49M1I3M1D8M1I37M2I5M2D52M1D10M2I8M1D11M1I18M1I17M1I62M1D56M1D27M2D32M3D16M1D3M3I41M1I5M1D3M4I6M1I3M1I2M1I6M1D164M4I13M1I29M1D2M1I22M1I3M2I6M2D14M1D15M1D47M1I61M1D32M3D13M2I17M1I60M1D22M1I62M1D17M2I15M1D8M1I2M1I21M1D22M1D2M1D37M1D5M2I3M1D5M3D3M2D27M1I8M1D17M1I8M1I4M1I5M1I13M2I42M1I2M1I24M1I6M1I1M1I39M1D38M3D11M1I9M2D12M1I1M1I44M1I36M1D11M1D48M2I4M2I5M2I103M1I2M2D20M2D45M1I15M1I37M1I10M1I5M1I42M2I23M1D17M7S	*	0	0	TATTTTCAAATACTAAATGATTTCAACTGAAAACGTCATCAATAACTCAAAGTTGTATTAATCATCAAGATCTATAACTTTCATTTTGGTCAGTTCTTCATCGGACAAAGTAATTTGTAAGATTGTTCCACAAGATGTACTTATCTTTTATATAGTTAATAAAAACTATAAGAGTGGTTACATTTTGTGAACAGTCTTATTAATAACTTTGTCGGATGAAGAAATGTCTAAATGGGTTATAGATCTTGCTGAGTTATACAACTACGTTCTTGATGACTTTTCAGCCGAAATCAAATACTACTGCAAAATATTGT

In [94]:
# Convert SAM to BAM, dropping unmapped, secondary and supplementary alignments (-F 0x904)
echo -e "\nConvert samtobam and filter it \n"
samtools view -@4 -bh -S -F 0x904 -o Clone${i}_ONT.bam Clone${i}_ONT.sam
rm Clone${i}_ONT.sam


Convert samtobam and filter it 

[samopen] SAM header is present: 1 sequences.
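
The value `0x904` passed to `-F` is the combination of three SAM flag bits: unmapped (`0x4`), secondary (`0x100`) and supplementary (`0x800`). With `-F`, a read is removed as soon as any of these bits is set. The sketch below rebuilds the mask with shell arithmetic and applies the same test to a few example FLAG values; `is_dropped` is a hypothetical helper.

```shell
# 0x904 is the bitwise OR of three SAM flag bits
UNMAPPED=$((0x4))
SECONDARY=$((0x100))
SUPPLEMENTARY=$((0x800))
mask=$((UNMAPPED | SECONDARY | SUPPLEMENTARY))
printf 'mask = %d (0x%X)\n' "$mask" "$mask"
# prints: mask = 2308 (0x904)

# samtools view -F MASK removes a read if ANY bit of MASK is set in its FLAG
is_dropped() { [ $(( $1 & mask )) -ne 0 ]; }

for flag in 0 16 4 256 2048 2064; do
    if is_dropped "$flag"; then
        echo "FLAG $flag: dropped"
    else
        echo "FLAG $flag: kept"
    fi
done
```

Only primary alignments of mapped reads (FLAG 0 on the forward strand, 16 on the reverse strand) remain after this filter.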


In [96]:
echo -e "\nSort and index bam \n"
# sort and index bam
samtools sort -@8 Clone${i}_ONT.bam Clone${i}_ONT_SORTED 
samtools index Clone${i}_ONT_SORTED.bam


Sort and index bam 



In [97]:
# Calculate stats from mapping
echo -e "\nCalculate stats from mapping\n"
samtools flagstat Clone${i}_ONT_SORTED.bam >Clone${i}_ONT.flagstats


Calculate stats from mapping



In [98]:
head Clone10_ONT.flagstats

9281 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
9281 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr


## Let's now map ONT reads from several clones (here clones 1 to 5) with a loop, using one folder per sample

In [45]:
for i in {1..5}
    do
        ONT="/home/jovyan/work/SV_DATA/LONG_READS/Clone${i}.fastq.gz"
        mkdir -p ~/work/MAPPING-ONT
        cd ~/work/MAPPING-ONT
        echo -e "\n>>>>>>>>>> Creation directory for Clone$i\n"
        mkdir -p dirClone$i
        cd dirClone$i
        
        echo -e ">>>> Mapping Clone$i minimap2\n"
        minimap2 -ax map-ont -t 4 ${REF} ${ONT} > Clone${i}_ONT.sam 
        
        # Convert samtobam 
        echo -e ">>> Convert samtobam and filter it \n"
        samtools view -@4 -bh -S -F 0x904 -o Clone${i}_ONT.bam Clone${i}_ONT.sam
        rm Clone${i}_ONT.sam

        echo -e ">>>> Sort and index bam \n"
        # sort and index bam
        samtools sort -@4 Clone${i}_ONT.bam Clone${i}_ONT_SORTED 
        samtools index Clone${i}_ONT_SORTED.bam

        # Calculate stats from mapping
        echo -e ">>>> Calculate stats from mapping\n"
        samtools flagstat Clone${i}_ONT_SORTED.bam >Clone${i}_ONT.flagstats
    done


>>>>>>>>>> Creation directory for Clone1

>>>> Mapping Clone1 minimap2

[M::mm_idx_gen::0.082*1.05] collected minimizers
[M::mm_idx_gen::0.109*1.69] sorted minimizers
[M::main::0.110*1.69] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.118*1.64] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.123*1.62] distinct minimizers: 165344 (91.75% are singletons); average occurrences: 1.156; average spacing: 5.336
[M::worker_pipeline::17.557*3.31] mapped 10241 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -t 4 /home/jovyan/work/SV_DATA/REF/reference.fasta /home/jovyan/work/SV_DATA/LONG_READS/Clone1.fastq.gz
[M::main] Real time: 17.572 sec; CPU: 58.115 sec; Peak RSS: 0.417 GB
>>> Convert samtobam and filter it 

[samopen] SAM header is present: 1 sequences.
>>>> Sort and index bam 

>>>> Calculate stats from mapping


>>>>>>>>>> Creation directory for Clone2

>>>> Mapping Clone2 minimap2

[M::mm_id

In [46]:
ls

Clone5_ONT.bam        Clone5_ONT_SORTED.bam
Clone5_ONT.flagstats  Clone5_ONT_SORTED.bam.bai


# GATHER THE ILLUMINA BAM FILES INTO A SINGLE FOLDER

In [42]:
mkdir -p ~/work/MAPPING-ILL/BAM
cd ~/work/MAPPING-ILL/

for i in {1..20}
    do
         ln -s ~/work/MAPPING-ILL/dirClone$i/Clone${i}.bam BAM/
    done

In [49]:
ls /home/jovyan/work/MAPPING-ILL/BAM -l

total 0
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone10.bam -> /home/jovyan/work/MAPPING-ILL/dirClone10/Clone10.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone11.bam -> /home/jovyan/work/MAPPING-ILL/dirClone11/Clone11.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone12.bam -> /home/jovyan/work/MAPPING-ILL/dirClone12/Clone12.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone13.bam -> /home/jovyan/work/MAPPING-ILL/dirClone13/Clone13.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone14.bam -> /home/jovyan/work/MAPPING-ILL/dirClone14/Clone14.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone15.bam -> /home/jovyan/work/MAPPING-ILL/dirClone15/Clone15.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone16.bam -> /home/jovyan/work/MAPPING-ILL/dirClone16/Clone16.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone17.bam -> /home/jovyan/work/MAPPING-ILL/dirClone17/Clone17.bam
lrwxrwxrwx 1 jovyan users 52 Jun 19 18:16 Clone18.bam -> /home/jovyan/work/MAPPING-ILL/dirClone18/Clone18.bam
lr

# GATHER THE ONT BAM FILES INTO A SINGLE FOLDER

In [43]:
mkdir -p ~/work/MAPPING-ONT/BAM
cd ~/work/MAPPING-ONT/

for i in {1..20}
    do
         ln -s ~/work/MAPPING-ONT/dirClone$i/Clone${i}_ONT_SORTED.bam BAM/
    done
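
`ln -s` creates a link even when its target does not exist, which leaves dangling links for clones that have not been mapped. A possible variant, sketched below, links a file only if it is present and uses `-f` so that the cell can be re-run without errors. The function name `link_if_exists` is hypothetical, and the demonstration runs in a temporary directory with an empty placeholder file.

```shell
# Link a file into a directory only if it exists; -f replaces an existing link
# $1 = target file (absolute path), $2 = destination directory
link_if_exists() {
    target=$1
    dest_dir=$2
    if [ -e "$target" ]; then
        ln -sf "$target" "$dest_dir/"
    else
        echo "skip (not found): $target" >&2
    fi
}

# Demonstration: only Clone1 has a BAM file here
demo=$(mktemp -d)
mkdir -p "$demo/dirClone1" "$demo/BAM"
touch "$demo/dirClone1/Clone1.bam"
for i in 1 2; do
    link_if_exists "$demo/dirClone$i/Clone$i.bam" "$demo/BAM"
done
ls "$demo/BAM"
rm -r "$demo"
```

In the loops above, the call would take the form `link_if_exists ~/work/MAPPING-ONT/dirClone$i/Clone${i}_ONT_SORTED.bam BAM`.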