# Nanopore sequencing processing and assembly




## Guppy Basecaller

Guppy Basecaller converts the raw electrical signal ("squiggle") data from nanopore sequencing into DNA sequences.
Here's an example of a commonly used command with Guppy Basecaller:


In [None]:
guppy_basecaller -i /input/directory -s /output/directory --config configuration.cfg

In this command:

- -i /input/directory specifies the input directory where the raw data files (.fast5 files) are located.
- -s /output/directory specifies the output directory where the basecalled reads will be written.
- --config configuration.cfg specifies a configuration file that selects the basecalling model and other settings. Here we used dna_r9.4.1_450bps_hac, the high-accuracy (hac) configuration for R9.4.1 flow cells at 450 bases per second.

This command will basecall the raw data in the specified input directory using the settings from the configuration file, and write the output to the specified output directory.
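Porechop, the next tool in this workflow, takes a single FASTQ file, whereas Guppy writes its results into a directory. The following is a minimal sketch for merging them, assuming the basecalled reads are plain .fastq files somewhere under the output directory (the exact layout depends on the Guppy version and settings). The helper name merge_fastq is hypothetical.

```shell
# Hypothetical helper: merge every .fastq file found under a directory into one file.
# Usage: merge_fastq <directory> <merged.fastq>
merge_fastq() {
    local src_dir="$1"
    local merged="$2"
    # Sort the file list so the merged read order is reproducible between runs.
    find "$src_dir" -type f -name '*.fastq' | sort | while IFS= read -r f; do
        cat "$f"
    done > "$merged"
}
```

For example, `merge_fastq /output/directory input.fastq` would produce the input file used in the Porechop step. The merged file should be written outside the directory being searched so that find does not pick it up.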

## Porechop

Porechop is a tool developed for Oxford Nanopore sequencing data. It is used for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads.

Here's an example of a commonly used command with Porechop:


In [None]:
porechop -i input.fastq -o output.fastq


In this command:

- -i input.fastq specifies the input file (in FASTQ format) that you want to trim adapters from.
- -o output.fastq specifies the output file where the trimmed reads will be written.

This command reads the input FASTQ file, trims adapters from the reads, and writes the resulting trimmed reads to the output file.
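A quick way to sanity-check the trimming step is to compare read counts before and after. Because chimeric reads are split, the output may contain more reads than the input. The sketch below assumes standard four-line FASTQ records; count_reads is a hypothetical helper name.

```shell
# Hypothetical helper: count the reads in an uncompressed FASTQ file.
# Usage: count_reads <file.fastq>
count_reads() {
    # Each record is assumed to span exactly four lines.
    awk 'END { print int(NR / 4) }' "$1"
}
```

Running `count_reads input.fastq` and `count_reads output.fastq` gives the two numbers to compare.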

## NanoFilt

NanoFilt is a tool to filter Oxford Nanopore sequencing data. It reads in a FASTQ file (or stdin), filters reads based on a minimum quality and/or a minimum length, and writes out the filtered reads to stdout.

Here's an example of a commonly used command with NanoFilt:


In [None]:
gunzip -c input.fastq.gz | NanoFilt -q 9 -l 200 | gzip > output.fastq.gz


In this command:

- gunzip -c input.fastq.gz decompresses the input FASTQ file and streams it to stdout.
- NanoFilt -q 9 -l 200 keeps only reads with an average quality score of at least 9 and a length of at least 200 bases.
- gzip > output.fastq.gz compresses the filtered reads and writes them to the output file.

This command decompresses the input FASTQ file, filters the reads based on the specified minimum quality and length, and writes the filtered reads to the output file in compressed format.
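To make the filtering criteria concrete, here is a rough awk sketch of a length and mean-quality filter. It is an illustration only, not NanoFilt's implementation. It assumes four-line FASTQ records, Phred+33 quality encoding, and a mean quality computed by averaging per-base error probabilities rather than raw Phred scores (this definition is an assumption). The name filter_fastq is hypothetical.

```shell
# Hypothetical helper: keep reads with mean quality >= min_q and length >= min_len.
# Usage: filter_fastq <min_q> <min_len> < reads.fastq > filtered.fastq
filter_fastq() {
    awk -v min_q="$1" -v min_len="$2" '
    BEGIN {
        # Map each printable ASCII character to its code to decode quality strings.
        for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
        n = length($0)
        if (n == 0) next
        err_sum = 0
        for (i = 1; i <= n; i++) {
            q = ord[substr($0, i, 1)] - 33
            err_sum += exp(-q / 10 * log(10))   # Phred score -> error probability
        }
        # Convert the mean error probability back to a Phred-scaled score.
        mean_q = -10 * log(err_sum / n) / log(10)
        if (length(seq) >= min_len && mean_q >= min_q)
            print header "\n" seq "\n" plus "\n" $0
    }'
}
```

With the thresholds used above, the equivalent call would be `gunzip -c input.fastq.gz | filter_fastq 9 200`.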

## Flye

Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by Oxford Nanopore and PacBio instruments. It is designed to be accurate, fast, and scalable.

Here's an example of a commonly used command with Flye:


In [None]:
flye --nano-corr input.fastq --out-dir output_directory 


In this command:

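- --nano-corr input.fastq specifies the input file containing the filtered reads. Note that Flye intends --nano-corr for ONT reads that have already been error-corrected; for uncorrected reads the corresponding option is --nano-raw.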
- --out-dir output_directory specifies the directory where the assembly result will be written.

This command runs Flye on the input reads, assembles the reads, and writes the assembly result to the specified output directory.
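After assembly, it can help to check how many contigs were produced and their combined length. The sketch below assumes the contigs are in a FASTA file such as the assembly.fasta referenced in the later steps; fasta_stats is a hypothetical helper name.

```shell
# Hypothetical helper: report the number of contigs and total bases in a FASTA file.
# Usage: fasta_stats <assembly.fasta>
fasta_stats() {
    awk '
    /^>/ { contigs++; next }          # header lines start a new contig
    { bases += length($0) }           # every other line is sequence
    END { printf "contigs=%d total_bp=%d\n", contigs, bases }
    ' "$1"
}
```

For example, `fasta_stats output_directory/assembly.fasta` prints a single summary line.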

## Minimap2

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. 

Here's an example of a commonly used command with Minimap2:


In [None]:
minimap2 -ax map-ont /path/to/flye/output/assembly.fasta /path/to/nanofilt/output/reads.fastq | samtools sort -o /path/to/minimap2/output/reads.sorted.bam


In this command:

- -ax map-ont combines -a (output in SAM format) with -x map-ont (the preset for Oxford Nanopore reads).
- /path/to/flye/output/assembly.fasta is the file path to the assembled sequences, used here as the reference.
- /path/to/nanofilt/output/reads.fastq is the file path to the filtered nanopore reads.
- samtools sort -o /path/to/minimap2/output/reads.sorted.bam sorts the alignments and writes them to the given output file.

This command runs minimap2 to align the filtered reads back to the assembly produced by Flye, then sorts the alignments with samtools.

## Racon

Racon is a standalone consensus module for correcting raw contigs, intended mainly for rapid assembly methods that do not include a consensus step. Here it is used to further polish the contigs assembled by Flye.

Here's an example of a commonly used command with Racon:


In [None]:
racon /path/to/nanofilt/output/reads.fastq /path/to/minimap2/output/reads.sorted.bam /path/to/flye/output/assembly.fasta > /path/to/racon/output/assembly.polished.fasta


In this command:

- /path/to/nanofilt/output/reads.fastq is the file path to the nanopore reads.
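- /path/to/minimap2/output/reads.sorted.bam is the file path to the sorted alignment file. Note that Racon reads overlaps in MHAP, PAF, or SAM format, so a BAM file would first need to be converted to SAM (for example with samtools view -h).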
- /path/to/flye/output/assembly.fasta is the file path to the assembled sequences.
- /path/to/racon/output/assembly.polished.fasta specifies the output file path.
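
The alignment and polishing steps can be chained into one round of polishing. The following is a sketch under assumptions rather than a definitive recipe: it writes the minimap2 alignments as uncompressed SAM, since Racon's documented overlap formats are MHAP, PAF, and SAM, and it skips the sorting step. The helper name polish_once and the naming of the intermediate file are hypothetical.

```shell
# Hypothetical helper: one round of Racon polishing.
# Usage: polish_once <reads.fastq> <assembly.fasta> <polished.fasta>
polish_once() {
    local reads="$1"
    local assembly="$2"
    local polished="$3"
    # Intermediate SAM file, named after the polished output (an arbitrary choice).
    local overlaps="${polished%.fasta}.sam"
    # Align the reads to the assembly and keep the alignments as SAM.
    minimap2 -ax map-ont "$assembly" "$reads" > "$overlaps" || return 1
    # Racon argument order: reads, overlaps, target sequences.
    racon "$reads" "$overlaps" "$assembly" > "$polished"
}
```

For example, `polish_once reads.fastq assembly.fasta assembly.polished.fasta` would leave the alignments in assembly.polished.sam next to the polished contigs.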