# Part 3. Hands-on genome assembly

Welcome to the hands-on tutorial! Here we will assemble two bacterial genomes: an *E. coli* used for comparative genomics and a *Vibrio alginolyticus* that was originally isolated in the Colombian Pacific (doi: 10.1128/spectrum.02928-23). *V. alginolyticus* plays a role in mangrove ecosystems: it has the ability to form symbiotic relationships with other organisms, and it is used as an indicator species for monitoring pollution and eutrophication in coastal and estuarine waters, including mangroves.

Before you start, keep your files organized: create a Google Drive folder to store all the material for this course, and upload to it all the .ipynb notebooks downloaded from GitHub.

### Download data

In [None]:
!wget https://zenodo.org/records/14885059/files/long_reads_tutorial.tar.gz

### Extract the .tar.gz file

In [None]:
!tar -xvf long_reads_tutorial.tar.gz

In [None]:
%cd long_reads_tutorial

In [None]:
#We can list the files in that folder
!ls
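
Before assembling, it can help to get a rough idea of how much data the read file contains. The following is a minimal sketch, not part of the original material: it assumes standard four-line FASTQ records (no wrapped sequences), and the helper name `fastq_stats` is hypothetical. The path is the Nanopore read file passed to the assembler in this tutorial.

```python
import gzip
import os


def fastq_stats(path):
    """Return (number_of_reads, total_bases, longest_read) for a FASTQ file.

    Assumes each record spans exactly four lines.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = total_bases = longest = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record holds the sequence
                length = len(line.strip())
                n_reads += 1
                total_bases += length
                longest = max(longest, length)
    return n_reads, total_bases, longest


reads = "data/SRR31094202_m10k_q15_Valginolyticus_nanopore.fastq.gz"
if os.path.exists(reads):  # only runs inside the extracted tutorial folder
    n_reads, total_bases, longest = fastq_stats(reads)
    print(f"reads: {n_reads}, bases: {total_bases}, longest read: {longest}")
```

Dividing the total number of bases by the expected genome size gives an approximate sequencing depth.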

## Genome assembly using NGSEP

Running the jar requires Java, so load the Java module first. You can then run the following command, which lists the different algorithms integrated in NGSEP.
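
If you are unsure whether Java is available in your session, a quick check along these lines may help. This is only a sketch; the helper name `java_version` is hypothetical and not part of NGSEP.

```python
import shutil
import subprocess


def java_version():
    """Return the `java -version` banner, or None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The JVM usually prints its version banner to stderr rather than stdout.
    return (result.stderr or result.stdout).strip()


print(java_version() or "java not found on PATH")
```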


In [None]:
#This takes about 3 seconds
!java -jar NGSEPcore_5.0.0.jar

You will get the following output:

![image](./images/ngsep.png)

To visualize the assembler options, run:

In [None]:
!java -jar NGSEPcore_5.0.0.jar Assembler

You will get the following output:

![image](./images/ngsep_2.png)

## *Vibrio alginolyticus* Assembly - Nanopore sequencing from a Colombian Sample

In [None]:
#Runs in about 35 minutes
!java -XX:+UseSerialGC -Xmx12g -jar NGSEPcore_5.0.0.jar Assembler -i data/SRR31094202_m10k_q15_Valginolyticus_nanopore.fastq.gz -o Valginolyticus_nanopore_ngsep

After about 35 minutes, you will get the following output:

![image](./images/ngsep_3.png)

## Quality Evaluation

Assemblers can introduce errors that undermine the quality of an assembly, so the results need to be reviewed. In this section we will use QUAST and BUSCO to assess the quality of the genome assembly.

In [None]:
#Takes around 30 seconds
!pip install quast

In [None]:
!quast.py -t 4 Valginolyticus_nanopore_ngsep.fa data/Vibrioalginolyticus_ASM2365091v1.fna

You will get the following output:

![image](./images/quast.png)
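
Among the metrics QUAST reports are the number of contigs, the total length, and the N50. To make the N50 concrete, here is a minimal sketch of how it can be computed from a FASTA file. The function names are hypothetical, and QUAST's own implementation may differ in details such as minimum contig length filters.

```python
import os


def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


assembly = "Valginolyticus_nanopore_ngsep.fa"
if os.path.exists(assembly):  # only runs once the assembly has been produced
    lengths = read_contig_lengths(assembly)
    print(f"contigs: {len(lengths)}, total length: {sum(lengths)}, N50: {n50(lengths)}")
```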

## Let's install BUSCO

First, we need to install a Colab version of Conda:

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
#This takes about 4 minutes
!conda install -y bioconda::busco

In [None]:
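#Takes about 20 seconds
#BUSCO v4 and later use the `busco` command; `run_BUSCO.py` was the v3 entry point and does not match the odb10 lineages
!busco -i Valginolyticus_nanopore_ngsep.fa -m genome -l bacteria_odb10 -o valginolyticus_ngsep_busco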
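
BUSCO summarizes its result in a one-line score of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` (Complete, Single-copy, Duplicated, Fragmented, Missing, and the number of genes searched). If you want to compare several assemblies programmatically, a small parser such as the sketch below may be useful. The function name is hypothetical and the example line uses made-up numbers, not results from this tutorial.

```python
import re


def parse_busco_score(line):
    """Parse a BUSCO one-line score into a dict of percentages plus 'n'."""
    scores = {key: float(value) for key, value in re.findall(r"([CSDFM]):([\d.]+)%", line)}
    total = re.search(r"n:(\d+)", line)
    if total:
        scores["n"] = int(total.group(1))
    return scores


# Made-up example line, for illustration only
example = "C:90.0%[S:88.0%,D:2.0%],F:4.0%,M:6.0%,n:100"
print(parse_busco_score(example))
```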

## Gene annotation
We will use Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm), a gene prediction tool for prokaryotic genomes. It identifies genes and translation initiation sites and writes them to a GFF3 file.

In [None]:
!conda install bioconda::prodigal

In [None]:
#takes about 15 seconds
!prodigal -f gff -i Valginolyticus_nanopore_ngsep.fa -o Valginolyticus_nanopore_ngsep.gff3
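
To get a quick feel for the annotation, you can count the predicted genes per contig. The sketch below assumes the file is tab-separated GFF with `CDS` as the feature type in the third column, which is how Prodigal labels its predictions; the function name is hypothetical.

```python
import os
from collections import Counter


def count_cds_per_contig(gff_path):
    """Count CDS features per sequence in a GFF file."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment, header, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 9 and fields[2] == "CDS":
                counts[fields[0]] += 1
    return counts


gff = "Valginolyticus_nanopore_ngsep.gff3"
if os.path.exists(gff):  # only runs once Prodigal has produced the file
    counts = count_cds_per_contig(gff)
    print(f"total predicted genes: {sum(counts.values())}")
    for contig, n_genes in counts.most_common():
        print(contig, n_genes)
```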

## Alignment and genome comparison
We will use NGSEP GenomesAligner, which aligns genomes based on gene synteny.

In [None]:
#Takes about 1 minute
!java -XX:+UseSerialGC -Xmx12g -jar NGSEPcore_5.0.0.jar GenomesAligner -o galn Valginolyticus_nanopore_ngsep.fa Valginolyticus_nanopore_ngsep.gff3 data/Vibrioalginolyticus_ASM2365091v1.fna data/Vibrioalginolyticus_ASM2365091v1.gff3
