
# Annotating metagenomes using Prokka

Authored by Jackson Sorensen
EDAMAME-2015 wiki


EDAMAME tutorials have a CC-BY license. Share, adapt, and attribute please!


## Overarching Goal

  • This tutorial will contribute towards understanding microbial metagenome analysis.

## Learning Objectives

  • Use awk to change fasta headers
  • Use Prokka to annotate a metagenome
  • Investigate the outputs of the Prokka annotation
  • Use grep to find annotated sequences of interest

Now that our assembly has finished, we are going to work on annotation. Annotation is the process of identifying coding sequences, RNAs, and other important features in raw (meta)genomic fasta files. Several annotation programs are available, but we will only be using Prokka for this course, in part because it is fast and in part because running the actual command is quite simple. Prokka makes use of a set of software tools to provide a quick and robust annotation of contigs from genomes and metagenomes. Prokka can identify coding regions, rRNA, tRNA, signal peptides, and noncoding RNA.

Before we can run the annotation, we need to change the fasta headers in final.contigs.fa. The fasta headers produced by MEGAHIT are too long and cause an error in Prokka; take a look at them using head. We will use a quick awk script, adapted from Pierre Lindenbaum's original script, to replace the fasta headers and write the result to a new file.

```shell
awk '/^>/{print ">contig" ++i; next}{print}' < final.contigs.fa > New_Headers.fa
```
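To see exactly what the one-liner does, it can be tried on a tiny file first. The sketch below uses a hypothetical two-contig file with MEGAHIT-style headers (the file names `toy_contigs.fa` and `toy_New_Headers.fa` and the sequences are invented for illustration). Every line starting with `>` is replaced by `>contig` plus a counter that `++i` increments before printing, and every other line passes through unchanged.

```shell
set -eu

# Hypothetical toy input: two contigs with MEGAHIT-style headers
# (invented for illustration, not taken from the Centralia assembly).
cat > toy_contigs.fa <<'EOF'
>k141_7 flag=1 multi=3.0000 len=12
ACGTACGTACGT
>k141_42 flag=0 multi=5.0000 len=8
GGGGCCCC
EOF

# Same awk one-liner as above: header lines become ">contig1", ">contig2", ...
# and sequence lines are printed as they are.
awk '/^>/{print ">contig" ++i; next}{print}' < toy_contigs.fa > toy_New_Headers.fa

cat toy_New_Headers.fa
# >contig1
# ACGTACGTACGT
# >contig2
# GGGGCCCC
```

Note that the original header text (including `len=`) is discarded, so keep final.contigs.fa if you later need to map `contigN` back to MEGAHIT's IDs.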

Take a look at New_Headers.fa using head, and you will see that the fasta headers are now shorter and numbered sequentially. With that taken care of, we can start our annotation. This step will take several hours, so be sure you are working inside a tmux session when you start the command.
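Before launching a run that takes hours, it can be worth confirming that the renaming touched only the headers. The function below is a hypothetical helper (`check_renaming` is not part of any tool) that compares the two files in three ways: the same number of `>` lines, no duplicated new headers, and identical sequence lines (compared via `cksum`). The demo files `toy_before.fa` and `toy_after.fa` are illustrative only.

```shell
set -eu

# Hypothetical helper: confirm that rewriting headers changed nothing else.
# $1 = original FASTA, $2 = FASTA with rewritten headers.
check_renaming() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the 0.
  before=$(grep -c '^>' "$1" || true)
  after=$(grep -c '^>' "$2" || true)
  # 1. Same number of sequences in both files.
  if [ "$before" -ne "$after" ]; then
    echo "Header count differs: $before vs $after" >&2
    return 1
  fi
  # 2. No duplicate headers (uniq -d prints only repeated lines).
  if [ -n "$(grep '^>' "$2" | sort | uniq -d)" ]; then
    echo "Duplicate headers in $2" >&2
    return 1
  fi
  # 3. Sequence lines (everything except headers) are byte-for-byte identical.
  if [ "$(grep -v '^>' "$1" | cksum)" != "$(grep -v '^>' "$2" | cksum)" ]; then
    echo "Sequence lines differ" >&2
    return 1
  fi
  echo "OK: $after sequences, headers unique, sequences unchanged"
}

# Demo on hypothetical toy files (names are illustrative only).
printf '>k141_1 flag=1 multi=2.0000 len=6\nACGTAC\n>k141_2 flag=1 multi=4.0000 len=4\nTTGA\n' > toy_before.fa
awk '/^>/{print ">contig" ++i; next}{print}' < toy_before.fa > toy_after.fa
check_renaming toy_before.fa toy_after.fa
```

On the real data the call would be `check_renaming final.contigs.fa New_Headers.fa`.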

```shell
prokka --outdir CentraliaMG_Prokka New_Headers.fa
```
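Because the run is long and Prokka will not write into an output directory that already exists, a small pre-flight check can make these requirements explicit in a script. The wrapper below is a hypothetical sketch (`run_prokka_checked` is not a Prokka command): it stops if the input FASTA is missing or empty, or if the output directory is already present, and otherwise runs the same Prokka command as above.

```shell
set -eu

# Hypothetical pre-flight wrapper (not part of Prokka): check the inputs
# before starting a run that takes hours.
# $1 = input FASTA, $2 = output directory that Prokka will create.
run_prokka_checked() {
  # Stop if the input FASTA is missing or empty.
  if [ ! -s "$1" ]; then
    echo "Input '$1' is missing or empty." >&2
    return 1
  fi
  # Stop if the output directory already exists; Prokka refuses to reuse it.
  if [ -e "$2" ]; then
    echo "Output directory '$2' already exists; rename or remove it first." >&2
    return 1
  fi
  # Same command as above, with the paths passed in as arguments.
  prokka --outdir "$2" "$1"
}
```

Inside your tmux session this would be called as `run_prokka_checked New_Headers.fa CentraliaMG_Prokka`. Prokka also has a `--force` option that overwrites an existing output folder; use it only when you are sure the old results can be discarded.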

Make sure that a directory called "CentraliaMG_Prokka" does not already exist before running this command, as Prokka will fail if it does. Once Prokka finishes, it will output 11 files in total. You can find out more about each of these output files in the Prokka paper.
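One of the learning objectives is using grep to find annotated sequences of interest. Among Prokka's outputs, the `.faa` file holds the predicted protein sequences in FASTA format. The sketch below runs on a hypothetical toy file, `toy_prokka.faa`, whose headers are modelled loosely on a locus tag followed by a product name; the tags, products, and sequences are placeholders, not real output. `grep -c '^>'` counts proteins and `grep -i` lists matching headers. Because a protein sequence can span several lines, the last command uses awk to print each whole matching record.

```shell
set -eu

# Hypothetical toy protein FASTA standing in for Prokka's .faa output.
# Locus tags, products, and sequences are placeholders.
cat > toy_prokka.faa <<'EOF'
>TOY_00001 Chromosomal replication initiator protein DnaA
MKTAYIAKQR
>TOY_00002 hypothetical protein
MSSHHHHHHS
>TOY_00003 Nitrogenase iron protein
MAMRQCAIYG
QRSTVWYKLA
EOF

# Count predicted proteins: one header line per protein.
grep -c '^>' toy_prokka.faa

# List the headers whose product matches a term of interest (case-insensitive).
grep -i 'nitrogenase' toy_prokka.faa

# Print each matching record in full: a header sets "keep", and every line
# is printed while "keep" is true, so multi-line sequences come along too.
awk '/^>/{keep = tolower($0) ~ /nitrogenase/} keep' toy_prokka.faa
```

On the real results, replace `toy_prokka.faa` with the `.faa` file inside CentraliaMG_Prokka (for example `CentraliaMG_Prokka/*.faa`) and swap in your own search term.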

# Resources and help

General bioinformatic help

Pierre Lindenbaum

## Prokka

## Software Prokka uses
