# <span style="color:#006E7F">__Introduction to Oxford Nanopore Data Analysis__ <a class="anchor"></span>  


Created by J. Orjuela (DIADE-IRD), F. Sabot (DIADE-IRD) and G. Sarah (AGAP-INRAE) - September 2021, SouthGreen training

Adapted by J. Orjuela (DIADE-IRD), F. Sabot (DIADE-IRD) - November 2022


# <span style="color:#006E7F">__TP2 - Assembly and correction__ <a class="anchor" id="data"></span>  
    
## <span style="color: #4CACBC;"> 1. Assemblies </span>  

Long sequencing reads yield more contiguous genome assemblies, but assembly is not a trivial task. Eukaryotic genome assembly is complex (large genome sizes, high rates of repeated sequences, high heterozygosity levels and even polyploidy). While prokaryotic genomes may appear less challenging, specific features, such as circular DNA molecules, must be taken into consideration to achieve a high-quality assembly.

* For assembly, ONT recommends sequencing a human genome to a minimum depth of 30x with 25–35 kb reads.
However, sequencing to a depth of 60x is advisable to obtain the best assembly metrics. ONT also recommends basecalling in high accuracy mode. The greatest contig N50 is usually obtained with Shasta and Flye. Polishing/correction is also recommended (Racon and Medaka). https://nanoporetech.com/sites/default/files/s3/literature/human-genome-assembly-workflow.pdf


* Long reads simplify genome assembly, with the ability to span repeat-rich sequences (characteristic of antimicrobial resistance genes) and structural variants. Nanopore sequencing also shows a lack of bias in GC-rich regions, in contrast to other sequencing platforms. For microbial genome assembly, ONT suggests using the third-party de novo assembly tool Flye, followed by one round of polishing with Medaka. https://nanoporetech.com/sites/default/files/s3/literature/microbial-genome-assembly-workflow.pdf 
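
The depth recommendations above can be checked before launching an assembly. The following is a minimal sketch, not part of the original course material: it estimates depth as the total number of sequenced bases divided by the expected genome size. The function name `fastq_depth` is hypothetical, and the sketch assumes standard 4-line FASTQ records (sequences not wrapped over several lines).

```shell
# Estimate sequencing depth: total sequenced bases / genome size.
# Assumption: 4 lines per FASTQ record (header, sequence, '+', qualities).
fastq_depth() {
    # $1: FASTQ file (plain or gzip-compressed), $2: genome size in bases
    # 'gzip -cdf' decompresses gzip input and passes plain text through unchanged.
    gzip -cdf "$1" |
    awk -v gsize="$2" '
        NR % 4 == 2 { bases += length($0); reads++ }   # 2nd line of each record = sequence
        END { printf "reads=%d bases=%d depth=%.1fx\n", reads, bases, bases / gsize }'
}

# Example call (path and genome size taken from the Flye cell of this notebook):
# fastq_depth ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz 2100000
```

A depth well below 30x would suggest that the assembly metrics may suffer, according to the recommendations quoted above.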

Flye  https://github.com/fenderglass/Flye 

Canu  https://canu.readthedocs.io/en/latest/quick-start.html

Miniasm  https://github.com/lh3/miniasm + Minipolish  https://github.com/rrwick/Minipolish

Shasta  https://github.com/chanzuckerberg/shasta

Smartdenovo  https://github.com/ruanjue/smartdenovo

Raven  https://github.com/lbcb-sci/raven

### <span style="color: #4CACBD;"> 1.1 Assembly using Flye </span>

We are going to assemble some clones using Flye https://github.com/fenderglass/Flye

Flye first builds disjointigs, which are concatenations of multiple disjoint genomic segments, and uses them to construct a repeat graph. Reads are then mapped to this repeat graph to resolve conflicts (unbridged repeats) and output contigs.

In [None]:
# create a directory to store the Flye assembly results
mkdir -p ~/work/RESULTS/FLYE
cd ~/work/RESULTS/
pwd

In [None]:
# Run Flye on the raw ONT reads; --genome-size is the expected genome size in bases (2.1 Mb here)
time flye --nano-raw ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz --genome-size 2100000 --out-dir FLYE --threads 8

### How many contigs were obtained by Flye?

### What do you think about the N50 and the mean contig length?
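
One possible way to answer these questions from the command line is sketched below, assuming the assembly is a standard FASTA file. The function name `fasta_stats` is hypothetical and not part of Flye. N50 is taken here as the length of the contig at which the cumulative sum of contig lengths, sorted from longest to shortest, reaches half of the total assembly length.

```shell
# Summarise a FASTA file: number of contigs, total length, mean length and N50.
fasta_stats() {
    # Step 1: print one length per sequence (multi-line sequences are summed).
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { gsub(/[[:space:]]/, ""); len += length($0) }
         END { if (seen) print len }' "$1" |
    # Step 2: sort the lengths from longest to shortest.
    sort -rn |
    # Step 3: N50 is the length at which the running sum reaches half the total.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "contigs=0"; exit }
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { n50 = lens[i]; break }
             }
             printf "contigs=%d total=%d mean=%.1f N50=%d\n", NR, total, total / NR, n50
         }'
}

# Example call on the Flye output of this notebook:
# fasta_stats ~/work/RESULTS/FLYE/assembly.fasta
```

Flye also writes an `assembly_info.txt` table in its output directory, which lists the length and coverage of each contig and can be used to cross-check these numbers.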

## <span style="color: #4CACBC;"> 2. Polishing assemblies with Racon </span>  

Racon corrects raw contigs generated by rapid assembly methods with original ONT reads.

The community usually runs 2 to 4 rounds of Racon.

Polish the contigs assembled by Flye using two rounds of Racon.

In [None]:
# display Racon usage and options
racon --help

## Racon first round

In [None]:
mkdir -p ~/work/RESULTS/FLYE_RACON
cd ~/work/RESULTS/FLYE_RACON

In [None]:
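# Map the raw reads to the Flye assembly; minimap2 writes the overlaps in PAF format to stdout
time minimap2 -t 6 ../FLYE/assembly.fasta  ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz 1> assembly.minimap4racon1.paf
# Polish: racon takes the reads, the overlaps (PAF) and the target sequences, in that order
time racon -t 6 ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz assembly.minimap4racon1.paf ../FLYE/assembly.fasta > assembly.racon1.fasta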

## Racon second round

In [None]:
# Second round: map the reads to the round-1 consensus, then polish that consensus again
cd ~/work/RESULTS/FLYE_RACON
time minimap2 -t 4 assembly.racon1.fasta ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz 1> assembly.minimap4racon2.paf
time racon -t 4 ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz assembly.minimap4racon2.paf assembly.racon1.fasta > assembly.racon2.fasta
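
Since each round repeats the same two commands on the output of the previous round, the procedure can be written as a loop. This is a minimal sketch, not part of the original course material: the function name `racon_rounds` is hypothetical, it reuses the file naming of the cells above, and it assumes that `minimap2` and `racon` are available in the `PATH` and that it is run from the directory where the results should be written.

```shell
# Run N rounds of Racon polishing.
# Each round maps the reads to the output of the previous round, then polishes it.
racon_rounds() {
    # $1: reads (FASTQ), $2: draft assembly (FASTA), $3: number of rounds, $4: threads (default 4)
    reads="$1"; current="$2"; rounds="$3"; threads="${4:-4}"
    i=1
    while [ "$i" -le "$rounds" ]; do
        paf="assembly.minimap4racon${i}.paf"
        polished="assembly.racon${i}.fasta"
        # Map the reads to the current consensus (PAF output on stdout).
        minimap2 -t "$threads" "$current" "$reads" > "$paf" || return 1
        # Polish the current consensus with the reads and the overlaps.
        racon -t "$threads" "$reads" "$paf" "$current" > "$polished" || return 1
        current="$polished"
        i=$((i + 1))
    done
    # Print the name of the final polished assembly.
    echo "$current"
}

# Example call reproducing the two rounds above:
# cd ~/work/RESULTS/FLYE_RACON
# racon_rounds ~/work/DATA/real_Hh/H_M1C132_1.fastq.gz ../FLYE/assembly.fasta 2 4
```

Changing the third argument makes it easy to compare 2, 3 or 4 rounds on the same draft assembly.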