# Genome annotation
Genome annotation is essential for understanding an organism's genetic blueprint and is divided into two main processes: structural and functional annotation. Structural annotation identifies the locations of genes and genomic elements such as coding regions, exons, and regulatory sequences, providing a map of the genome's architecture. Functional annotation assigns roles to these genes, linking them to biological processes, protein functions, and pathways. Together, these approaches offer a comprehensive view that is crucial for research on complex pathogen-host interactions, as in phytoplasma studies, where understanding both gene locations and functions is key to advancing diagnostics and disease management.

In [None]:
# @title
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
!conda config --add channels defaults
!conda config --add channels bioconda
!conda config --add channels conda-forge

# Genome Annotation: Tailoring Approaches for Prokaryotes and Eukaryotes

Annotation approaches vary based on the type of organism, with specific gene models suited to either prokaryotic or eukaryotic genomes. For prokaryotes, specialized annotation tools are essential to capture their unique genomic features, such as densely packed genes and operons, which differ significantly from the complex structures found in eukaryotic genomes. Certain annotators are designed exclusively for prokaryotic genomes and SHOULD NOT be used for eukaryotes, as they lack the capacity to handle introns and the intricate regulatory regions characteristic of eukaryotic genes. Choosing the right annotation tool is crucial for accurate genome analysis and meaningful biological insights.

**Let's annotate a prokaryotic genome - Prokka**

Prokka handles prokaryotic genomes only; do not use it for eukaryotes.

Install ncbi-datasets to fetch genomes from GenBank

In [None]:
!conda install conda-forge::ncbi-datasets-cli -y

Download a Xanthomonas oryzae genome. Unzip the file, create a new folder named "genomes," and move the FASTA file containing the genome to the "genomes" folder.

In [None]:
!datasets download genome accession GCF_004355885.3 --include genome,seq-report
!unzip ncbi_dataset.zip
!mkdir -p genomes
!mv ncbi_dataset/data/GCF_004355885.3/*.fna genomes/
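Before annotating, it can help to confirm that the download produced a sensible FASTA file. The following is a minimal sketch, not part of the original workflow: the helper name `summarize_fasta` is hypothetical, and it assumes the genome files sit in the "genomes" folder with a `.fna` extension, as in the cell above.

```python
from pathlib import Path


def summarize_fasta(path):
    """Return (number of sequences, total bases) for a FASTA file."""
    n_seqs = 0
    total_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_seqs += 1  # each header line starts a new sequence
            else:
                total_bases += len(line)  # sequence lines add to the length
    return n_seqs, total_bases


# Summarize every .fna file found in the "genomes" folder (assumed location).
for fasta in sorted(Path("genomes").glob("*.fna")):
    n_seqs, total_bases = summarize_fasta(fasta)
    print(f"{fasta.name}: {n_seqs} sequences, {total_bases} bp")
```

A sequence count or total length far from what is expected for the organism usually points to an incomplete download.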

Install the Prokka tool

In [None]:
!conda install -c bioconda prokka -y

Run Prokka to annotate the Xanthomonas oryzae genome

In [None]:
!prokka --locustag xoo --outdir prokka_results genomes/GCF_004355885.3_ASM435588v3_genomic.fna

Your results will be saved in prokka_results. This folder contains several files with the gene predictions, protein sequences, functional annotation, a GenBank file, and a GFF file.
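One quick way to look at the functional annotation is to tally the feature table that Prokka writes alongside the other outputs. The sketch below is illustrative only: it assumes a tab-separated `.tsv` file with `ftype` and `product` columns in prokka_results (check the header of your own file), and the helper name `summarize_prokka_tsv` is hypothetical. The output file prefix depends on the run, so the files are located with a glob.

```python
import csv
import glob
from collections import Counter


def summarize_prokka_tsv(path):
    """Count feature types and hypothetical CDS in a Prokka-style TSV table."""
    feature_counts = Counter()
    hypothetical = 0
    with open(path, newline="") as handle:
        # Assumed columns: "ftype" (feature type) and "product" (annotation).
        for row in csv.DictReader(handle, delimiter="\t"):
            feature_counts[row["ftype"]] += 1
            if row["ftype"] == "CDS" and row["product"] == "hypothetical protein":
                hypothetical += 1
    return feature_counts, hypothetical


# Report on any TSV tables present in the output folder.
for tsv in glob.glob("prokka_results/*.tsv"):
    counts, hypothetical = summarize_prokka_tsv(tsv)
    print(tsv, dict(counts), f"hypothetical CDS: {hypothetical}")
```

The share of CDS labelled "hypothetical protein" gives a rough sense of how much of the genome received a functional assignment.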

**Let's annotate a eukaryotic genome - Augustus**

Augustus is a gene predictor designed primarily for eukaryotic genomes, although it also includes parameter sets for a few prokaryotic species.

Download a fungal genome, Fusarium oxysporum. Unzip the file and copy the FASTA file containing the genome to the existing "genomes" folder.

In [None]:
!datasets download genome accession GCA_013085055.1 --include genome,seq-report

In [None]:
!unzip -o ncbi_dataset.zip

In [None]:
!cp ncbi_dataset/data/GCA_013085055.1/*.fna genomes/

Install the Augustus annotation tool

In [None]:
!apt-get update
!apt-get install -y augustus

Run Augustus

In [None]:
!augustus --species=fusarium --codingseq=on --protein=on genomes/GCA_013085055.1_ASM1308505v1_genomic.fna > augutus_annot.gff

Augustus will produce a GFF file that contains the structural annotation. This GFF file can be used to extract the coding and protein sequences.

In [None]:
!perl /usr/share/augustus/scripts/getAnnoFasta.pl augutus_annot.gff

Check the number of predicted proteins and coding sequences

In [None]:
!grep -c '>' augutus_annot.aa
!grep -c '>' augutus_annot.codingseq
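As a cross-check on the counts above, the feature column of the GFF file can be tallied directly. This is a minimal sketch under stated assumptions: it expects the nine tab-separated GFF columns with the feature type in the third column, and it assumes Augustus labels its features `gene` and `transcript` (inspect your own file to confirm). The helper name `count_gff_features` is hypothetical.

```python
import os
from collections import Counter


def count_gff_features(lines):
    """Count feature types (third column) in GFF-formatted lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF record
        counts[fields[2]] += 1
    return counts


# File name as written by the Augustus cell above.
gff_path = "augutus_annot.gff"
if os.path.exists(gff_path):
    with open(gff_path) as handle:
        counts = count_gff_features(handle)
    print(counts["gene"], "genes;", counts["transcript"], "transcripts")
```

When Augustus reports one transcript per gene, the transcript count would be expected to match the number of protein sequences counted with grep.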