# Gaius-Augustus/BRAKER


# BRAKER2 User Guide {#braker2-user-guide .unnumbered}

## Authors and Contact Information {#authors-and-contact-information .unnumbered}

Katharina J. Hoff, Simone Lange, Alexandre Lomsadze, Mark Borodovsky, Mario Stanke


If you are viewing this file as README.md, figures will not be displayed properly. We recommend viewing the file docs/userguide.pdf.

# Introduction

## What is BRAKER2?

The rapidly growing number of sequenced genomes requires fully automated methods for accurate gene structure annotation. With this goal in mind, we have developed BRAKER1 (Hoff et al. 2015), a combination of GeneMark-ET (Lomsadze, Burns, and Borodovsky 2014) and AUGUSTUS (Stanke et al. 2008; Stanke et al. 2006), that uses genomic and RNA-Seq data to automatically generate full gene structure annotations in novel genomes.

However, the quality of RNA-Seq data that is available for annotating a novel genome is variable, and in some cases, RNA-Seq data is not available at all.

BRAKER2 is an extension of BRAKER1 which allows for fully automated training of the gene prediction tools GeneMark-EX (Lomsadze et al. 2005; Ter-Hovhannisyan et al. 2008; Lomsadze, Burns, and Borodovsky 2014)[^1] and AUGUSTUS from RNA-Seq and/or protein homology information, and that integrates the extrinsic evidence from RNA-Seq and protein homology information into the prediction.

In contrast to other available methods that rely on protein homology information, BRAKER2 reaches high gene prediction accuracy even in the absence of the annotation of very closely related species and in the absence of RNA-Seq data.

BRAKER2 can also combine RNA-Seq and protein homology information.

## Keys to successful gene prediction

• Use a high quality genome assembly. If you have a huge number of very short scaffolds in your genome assembly, those short scaffolds will likely increase runtime dramatically but will not increase prediction accuracy.

• Use simple scaffold names in the genome file (e.g. >contig1 will work better than >contig1my custom species namesome putative function /more/information/  and lots of special characters %&!*(){}). Make the scaffold names in all your fasta files simple before running any alignment program.

• In order to predict genes accurately in a novel genome, the genome should be masked for repeats. This will avoid the prediction of false positive gene structures in repetitive and low complexity regions. Repeat masking is also essential for mapping RNA-Seq data to a genome. In case of GeneMark-EX and AUGUSTUS, softmasking (i.e. putting repeat regions into lower case letters and all other regions into upper case letters) leads to better results than hardmasking (i.e. replacing letters in repetitive regions by the letter N for unknown nucleotide). If the genome is softmasked, use the --softmasking flag of braker.pl.

• Many genomes have gene structures that will be predicted accurately with standard parameters of GeneMark-EX and AUGUSTUS within BRAKER2. However, some genomes have clade-specific features, e.g. a special branch point model in fungi, or non-standard splice-site patterns. Please read the options section [options] in order to determine whether any of the custom options may improve gene prediction accuracy in the genome of your target species.

• Always check gene prediction results before further usage! You can e.g. use a genome browser for visual inspection of gene models in context with extrinsic evidence data.
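The advice on scaffold names and softmasking can be checked before running any aligner. The following is a minimal sketch, not part of BRAKER2: the function names `simplify_fasta_headers` and `softmasked_fraction` are hypothetical helpers. The first truncates each FASTA header at the first whitespace (it does not remove special characters inside the first token, and it does not guarantee that the shortened names stay unique); the second reports which fraction of the sequence is in lower case.

```shell
# Keep only the first whitespace-delimited token of each FASTA header.
# Usage: simplify_fasta_headers genome.fa > genome.simple.fa
simplify_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Print the fraction of lowercase (soft-masked) bases in a FASTA file,
# rounded to two decimals.
# Usage: softmasked_fraction genome.fa
softmasked_fraction() {
    awk '!/^>/ {
             total += length($0)                # count all bases on the line
             masked += gsub(/[acgtn]/, "")      # count lowercase bases
         }
         END {
             if (total > 0) printf "%.2f\n", masked / total
             else print "0.00"
         }' "$1"
}
```

A fraction close to zero suggests that the genome is not softmasked, in which case the --softmasking flag would have no effect.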

## Overview of modes for running BRAKER2

BRAKER2 mainly features semi-unsupervised, extrinsic evidence data (RNA-Seq and/or protein spliced alignment information) supported training of GeneMark-EX^2 and subsequent training of AUGUSTUS with integration of extrinsic evidence in the final gene prediction step. However, there are now a number of additional pipelines included in BRAKER2. In the following, we give an overview of possible input files and pipelines:

• genome file, only. In this mode, GeneMark-ES is trained on the genome sequence alone. Long genes predicted by GeneMark-ES are selected for training AUGUSTUS. Final predictions by AUGUSTUS are ab initio. This approach will likely yield lower prediction accuracy than all other pipelines described here (see figure [braker-main-a]),

• genome and RNA-Seq file from the same species (see figure [braker-main-b]); this approach is suitable for RNA-Seq libraries with a good coverage of the transcriptome, important: this approach requires that each intron is covered by many alignments, i.e. it does not work with assembled transcriptome mappings,

• genome file and database of proteins that may be of longer evolutionary distance to the target species (see figure [braker-main-c]); this approach is suitable if no RNA-Seq data is available, and if no protein data from a very closely related species is available, important: this approach requires a database of protein families, i.e. many representatives of each protein family must be present in the database, please contact Alexandre Lomsadze for information about the required external GaTech protein mapping pipeline,

• genome and RNA-Seq file from the same species, and proteins that may be of longer evolutionary distance to the target species (see figure [braker-main-d]); important: this approach requires a database of protein families, i.e. many representatives of each protein family must be present in the database,

• genome file and file with proteins of short evolutionary distance (see figure [braker2-sidetrack-b]); this approach is suitable if RNA-Seq data is not available and if the reference species is very closely related,

• genome and RNA-Seq file and proteins of short evolutionary distance (see figures [braker2-sidetrack-a] and [braker2-sidetrack-c]). In both cases, GeneMark-ET is trained supported by RNA-Seq data, and the resulting gene predictions are used for training AUGUSTUS. In approach A), protein alignment information is used in the gene prediction step with AUGUSTUS, only. In approach C), protein spliced alignment data is used to complement the training set for AUGUSTUS. The latter approach is in particular suitable if RNA-Seq data does not produce a sufficiently high number of training gene structures for AUGUSTUS, and if a very closely related and already annotated species is available.

# Installation

## Supported software versions

At the time of release, this BRAKER2 version was tested with:

• AUGUSTUS 3.3.1[^3]

• GeneMark-ET 4.33

• BAMTOOLS 2.5.1 (Barnett et al. 2011)

• SAMTOOLS 1.7-4-g93586ed (Li et al. 2009)

• (Spaln 2.3.1 (Gotoh 2008b; Gotoh 2008a; Iwata and Gotoh 2012))[^4]

• (Exonerate 2.2.0 (Slater and Birney 2005))[^5]

• NCBI BLAST+ 2.2.31+ (Altschul et al. 1990; Camacho et al. 2009)

## BRAKER2

### Perl pipeline dependencies

Running BRAKER2 requires a Linux-system with bash and Perl. Furthermore, BRAKER2 requires the following CPAN-Perl modules to be installed:

• File::Spec::Functions

• Hash::Merge

• List::Util

• Logger::Simple

• Module::Load::Conditional

• Parallel::ForkManager

• POSIX

• Scalar::Util::Numeric

• YAML

On Ubuntu, for example, install the modules with CPANminus[^6]: sudo cpanm Module::Name, e.g. sudo cpanm Hash::Merge.

BRAKER2 also uses a Perl module helpMod.pm that is not available on CPAN. This module is part of the BRAKER2 release and does not require separate installation.
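To see which of the CPAN modules listed above still need to be installed, a loop like the following can help. This is a sketch only: `missing_perl_modules` is a hypothetical helper name, and the check simply asks perl to load each module.

```shell
# Print the name of every Perl module (given as arguments) that cannot be loaded.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Check the modules listed in this guide.
missing_perl_modules File::Spec::Functions Hash::Merge List::Util \
    Logger::Simple Module::Load::Conditional Parallel::ForkManager \
    POSIX Scalar::Util::Numeric YAML
```

Each name printed can then be installed with sudo cpanm as described above; no output means that all modules were found.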

### BRAKER2 components {#Executability}

BRAKER2 is a collection of Perl scripts and a Perl module. The main script that will be called in order to run BRAKER2 is braker.pl. Additional Perl components are:

• align2hints.pl

• filterGenemark.pl

• filterIntronsFindStrand.pl

• startAlign.pl

• helpMod.pm

• findGenesInIntrons.pl

• downsample_traingenes.pl

All Perl scripts (files ending with *.pl) that are part of BRAKER2 must be executable in order to run BRAKER2. This should already be the case if you download BRAKER2 from our website. Executability may be lost if you e.g. transfer BRAKER2 on a USB stick to another computer. In order to check whether required files are executable, run the following command in the directory that contains BRAKER2 Perl scripts:

ls -l *.pl


The output should be similar to this:

-rwxr-xr-x 1 katharina katharina  18191 Mai  7 10:25 align2hints.pl
-rwxr-xr-x 1 katharina katharina 408782 Aug 17 18:24 braker.pl
-rwxr-xr-x 1 katharina katharina   5024 Mai  7 10:25 downsample_traingenes.pl
-rwxr-xr-x 1 katharina katharina  30453 Mai  7 10:25 filterGenemark.pl
-rwxr-xr-x 1 katharina katharina   5754 Mai  7 10:25 filterIntronsFindStrand.pl
-rwxr-xr-x 1 katharina katharina   7765 Mai  7 10:25 findGenesInIntrons.pl
-rwxr-xr-x 1 katharina katharina  41674 Mai  7 10:25 startAlign.pl


It is important that the x in -rwxr-xr-x is present for each script. If that is not the case, run

chmod a+x *.pl


in order to change file attributes.
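Instead of reading the ls output by eye, the same check can be scripted. The sketch below uses a hypothetical helper name, `nonexecutable_scripts`, and only reports files; it does not change any permissions.

```shell
# List Perl scripts in a directory (default: current directory) that lack
# the executable bit. Prints nothing if every *.pl file is executable.
nonexecutable_scripts() {
    for script in "${1:-.}"/*.pl; do
        [ -f "$script" ] && [ ! -x "$script" ] && echo "$script"
    done
    return 0
}
```

Calling nonexecutable_scripts /your_path_to_braker/ prints the scripts that still need chmod a+x.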

You may find it helpful to add the directory in which BRAKER2 Perl scripts reside to your $PATH environment variable. For a single bash session, enter:

PATH=/your_path_to_braker/:$PATH
export PATH


To make this $PATH modification available to all bash sessions, add the above lines to a startup script (e.g. ~/.bashrc).

## Bioinformatics software dependencies

BRAKER2 calls upon various bioinformatics software tools that are not part of BRAKER2. Some tools are obligatory, i.e. BRAKER2 will not run at all if these tools are not present on your system. Other tools are optional. Please install all tools that are required for running BRAKER2 in the mode of your choice.

### Mandatory tools

#### GeneMark-EX

Download GeneMark-EX[^7] from http://exon.gatech.edu/GeneMark/license_download.cgi. Unpack and install GeneMark-EX as described in GeneMark-EX’s README file. If already contained in your $PATH variable, BRAKER2 will guess the location of gmes_petap.pl automatically. Otherwise, BRAKER2 can find GeneMark-EX executables either by locating them in an environment variable GENEMARK_PATH, or by taking a command line argument (--GENEMARK_PATH=/your_path_to_GeneMark-EX/gmes_petap/).

In order to set the environment variable for your current Bash session, type:

export GENEMARK_PATH=/your_path_to_GeneMark-ET/gmes_petap/


Add the above line to a startup script (e.g. ~/.bashrc) in order to make it available to all bash sessions.[^8]

#### AUGUSTUS

Download AUGUSTUS from https://github.com/Gaius-Augustus/Augustus. Unpack AUGUSTUS and install AUGUSTUS according to AUGUSTUS README.TXT.

You should compile AUGUSTUS on your own system in order to avoid problems with versions of libraries used by AUGUSTUS. Compilation instructions are provided in the AUGUSTUS README.TXT file (Augustus/README.txt).

AUGUSTUS consists of augustus, the gene prediction tool, additional C++ tools located in augustus/auxprogs and Perl scripts located in augustus/scripts. Perl scripts must be executable (see instructions in section [Executability]).

The C++ tool bam2hints is an essential component of BRAKER2. Sources are located in
Augustus/auxprogs/bam2hints. Make sure that you compile bam2hints on your system (it should be automatically compiled when AUGUSTUS is compiled, but in case of problems with bam2hints, please read troubleshooting instructions in Augustus/auxprogs/bam2hints/README).

If you would like to train UTR parameters and integrate RNA-Seq coverage information into gene prediction with BRAKER2 (which is possible only if an RNA-Seq bam-file is provided as extrinsic evidence), utrrnaseq and bam2wig in the auxprogs directory are also required. If compilation with the default Makefile fails, please read troubleshooting instructions in Augustus/auxprogs/bam2wig/README.txt and Augustus/auxprogs/utrrnaseq/README, respectively.

Since BRAKER2 is a pipeline that trains AUGUSTUS, i.e. writes species specific parameter files, BRAKER2 needs writing access to the configuration directory of AUGUSTUS that contains such files (Augustus/config/). If you install AUGUSTUS globally on your system, the config folder will typically not be writable by all users. Either make the directory where config resides recursively writable to users of AUGUSTUS, or copy the config/ folder (recursively) to a location where users have writing permission.

AUGUSTUS will locate the config folder by looking for an environment variable $AUGUSTUS_CONFIG_PATH. If the $AUGUSTUS_CONFIG_PATH environment variable is not set, then BRAKER2 will look in the path ../config relative to the directory in which it finds an AUGUSTUS executable. Alternatively, you can supply the variable as a command line argument to BRAKER2 (--AUGUSTUS_CONFIG_PATH=/your_path_to_AUGUSTUS/augustus/config/). We recommend that you export the variable e.g. for your current bash session:

export AUGUSTUS_CONFIG_PATH=/your_path_to_AUGUSTUS/augustus/config/


In order to make the variable available to all Bash sessions, add the above line to a startup script, e.g. ~/.bashrc.

BRAKER2 expects the entire config directory of AUGUSTUS at $AUGUSTUS_CONFIG_PATH, i.e. the subfolders species with its contents (at least generic) and extrinsic! Providing a writable but empty folder at $AUGUSTUS_CONFIG_PATH will not work for BRAKER. If you need to separate the augustus binary and $AUGUSTUS_CONFIG_PATH, we recommend that you recursively copy the un-writable config contents to a writable location. For example, suppose you have a system-wide installation of AUGUSTUS at /usr/bin/augustus, an unwritable copy of config sits at /usr/bin/augustus_config/, and the folder /home/yours/ is writable to you. Copy with the following command (and additionally set the then required variables):

cp -r /usr/bin/augustus_config/ /home/yours/
export AUGUSTUS_CONFIG_PATH=/home/yours/augustus_config
export AUGUSTUS_BIN_PATH=/usr/bin
export AUGUSTUS_SCRIPTS_PATH=/usr/bin/augustus_scripts

##### Modification of $PATH

Adding the directories of AUGUSTUS binaries and scripts to your $PATH variable enables your system to locate these tools automatically. It is not a requirement for running BRAKER2 to do this, because BRAKER2 will try to guess them from the location of another environment variable ($AUGUSTUS_CONFIG_PATH), or both directories can be supplied as command line arguments to braker.pl, but we recommend adding them to your $PATH variable. For your current bash session, type:

PATH=/your_path_to_augustus/bin/:/your_path_to_augustus/scripts/:$PATH
export PATH


For all your bash sessions, add the above lines to a startup script (e.g. ~/.bashrc).
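Because BRAKER2 needs a complete and writable AUGUSTUS config directory, it can save time to verify the directory before starting a run. The following sketch assumes only what this section states (the subfolders species/generic and extrinsic must exist, and species must be writable because new parameter files are written there); `check_augustus_config` is a hypothetical helper, not a BRAKER2 or AUGUSTUS command.

```shell
# Check that an AUGUSTUS config directory is complete and writable.
# Takes the directory as argument, or falls back to $AUGUSTUS_CONFIG_PATH.
# Prints one line per problem; prints nothing if all checks pass.
check_augustus_config() (
    config_dir=${1:-$AUGUSTUS_CONFIG_PATH}
    if [ -z "$config_dir" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set"
        exit 0
    fi
    [ -d "$config_dir/species/generic" ] || echo "missing: $config_dir/species/generic"
    [ -d "$config_dir/extrinsic" ]       || echo "missing: $config_dir/extrinsic"
    [ -w "$config_dir/species" ]         || echo "not writable: $config_dir/species"
    exit 0
)
```

Note that the writability test always succeeds for the root user, so the check is most informative when run as the user who will run braker.pl.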

#### Bamtools

Download BAMTOOLS (e.g. git clone https://github.com/pezmaster31/bamtools.git). Install BAMTOOLS by typing the following in your shell:

cd your-bamtools-directory
mkdir build
cd build
cmake ..
make

If already in your $PATH variable, BRAKER2 will find bamtools automatically. Otherwise, BRAKER2 can locate the bamtools binary either by using an environment variable $BAMTOOLS_PATH, or by taking a command line argument (--BAMTOOLS_PATH=/your_path_to_bamtools/bin/[^9]). In order to set the environment variable e.g. for your current bash session, type:

export BAMTOOLS_PATH=/your_path_to_bamtools/bin/


Add the above line to a startup script (e.g. ~/.bashrc) in order to set the environment variable for all bash sessions.

#### NCBI BLAST+

On Ubuntu, install with sudo apt-get install ncbi-blast+.

If already in your $PATH variable, BRAKER2 will find blastp automatically. Otherwise, BRAKER2 can locate the blastp binary either by using an environment variable $BLAST_PATH, or by taking a command line argument (--BLAST_PATH=/your_path_to_blast/). In order to set the environment variable e.g. for your current bash session, type:

export BLAST_PATH=/your_path_to_blast/


Add the above line to a startup script (e.g. ~/.bashrc) in order to set the environment variable for all bash sessions.
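The mandatory tools can be located either through a dedicated environment variable or through $PATH. The sketch below mimics that lookup so that a missing tool is noticed before a run. It is an illustration only: `locate_tool` is a hypothetical helper, and the order used here (environment variable first, then $PATH) is an assumption, not a statement about how braker.pl resolves the tools internally.

```shell
# Locate an executable: first in the directory given as second argument
# (e.g. the value of $BAMTOOLS_PATH), then via $PATH.
# Prints the full path, or returns non-zero if the tool is not found.
locate_tool() {
    if [ -n "$2" ] && [ -x "$2/$1" ]; then
        echo "${2%/}/$1"
    else
        command -v "$1"
    fi
}

# Report mandatory tools named in this guide that cannot be located.
for pair in "gmes_petap.pl:$GENEMARK_PATH" "augustus:$AUGUSTUS_BIN_PATH" \
            "bamtools:$BAMTOOLS_PATH" "blastp:$BLAST_PATH"; do
    tool=${pair%%:*}   # text before the first colon
    dir=${pair#*:}     # text after the first colon (may be empty)
    locate_tool "$tool" "$dir" >/dev/null || echo "not found: $tool"
done
```

Any tool reported as not found needs either its environment variable exported as shown above, or the corresponding command line argument passed to braker.pl.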

### Optional tools

#### Samtools

Samtools is not required for running BRAKER2 if all your files are formatted correctly (i.e. all sequences should have short and unique fasta names). If you are not sure whether all your files are formatted correctly, it might be helpful to have Samtools installed because BRAKER2 can automatically fix certain format issues by using Samtools.

As a prerequisite for Samtools, download and install htslib (e.g.  git clone https://github.com/samtools/htslib.git, follow the htslib documentation for installation).

Download and install Samtools (e.g. git clone git://github.com/samtools/samtools.git, subsequently follow the Samtools documentation for installation).

If already in your $PATH variable, BRAKER2 will find samtools automatically. Otherwise, BRAKER2 can find Samtools either by taking a command line argument (--SAMTOOLS_PATH=/your_path_to_samtools/), or by using an environment variable $SAMTOOLS_PATH. For exporting the variable, e.g. for your current bash session, type:

export SAMTOOLS_PATH=/your_path_to_samtools/


Add the above line to a startup script (e.g. ~/.bashrc) in order to set the environment variable for all bash sessions.

#### Python3 & Biopython

If Python3 and Biopython are installed, BRAKER2 can generate FASTA-files with coding sequences and protein sequences predicted by AUGUSTUS. This is an optional step; it can be disabled with the command-line flag --skipGetAnnoFromFasta. Python3 and Biopython are not required if this flag is set.

On Ubuntu, Python3 is installed by default. Install the Python3 package manager with:

sudo apt-get install python3-pip


Subsequently, install Biopython with:

sudo pip3 install biopython


On Ubuntu, python3 will be in your $PATH variable by default, and BRAKER2 will automatically locate it. However, you have the option to specify the python3 binary location in two other ways:

1. Export an environment variable $PYTHON3_PATH, e.g. in your ~/.bashrc file:

export PYTHON3_PATH=/path/to/python3/

2. Specify the command line option --PYTHON3_PATH=/path/to/python3/ to braker.pl.
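Whether the optional FASTA export can work on a given machine depends on Biopython being importable by the chosen interpreter. A quick check, written as a sketch with a hypothetical helper name (`have_biopython`), might look like this:

```shell
# Return 0 if the given Python 3 interpreter (default: python3 from $PATH)
# can import Biopython, non-zero otherwise.
have_biopython() {
    "${1:-python3}" -c 'import Bio' >/dev/null 2>&1
}

if have_biopython; then
    echo "Biopython found"
else
    echo "Biopython not found; install it or run braker.pl with --skipGetAnnoFromFasta"
fi
```

The function takes the interpreter itself as its argument (for example /path/to/python3/python3), not the directory that contains it.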

#### GenomeThreader

This tool is required, only, if you would like to run protein to genome alignments with BRAKER2 using GenomeThreader. This is a suitable approach if an annotated species of short evolutionary distance to your target genome is available. Download GenomeThreader from http://genomethreader.org/. Unpack and install according to gth/README.

BRAKER2 will try to locate the GenomeThreader executable by using an environment variable ALIGNMENT_TOOL_PATH. Alternatively, this can be supplied as command line argument (--ALIGNMENT_TOOL_PATH=/your/path/to/gth).

#### Spaln

This tool is required, only, if you would like to run protein to genome alignments with BRAKER2 using Spaln. This is a suitable approach if an annotated species of short evolutionary distance to your target genome is available. (We recommend the usage of GenomeThreader instead of Spaln.) Download Spaln from http://www.genome.ist.i.kyoto-u.ac.jp/~aln_user. Unpack and install according to spaln/doc/SpalnReadMe22.pdf.

BRAKER2 will try to locate the Spaln executable by using an environment variable ALIGNMENT_TOOL_PATH. Alternatively, this can be supplied as command line argument (--ALIGNMENT_TOOL_PATH=/your/path/to/spaln).

#### Exonerate

This tool is required, only, if you would like to run protein to genome alignments with BRAKER2 using Exonerate. This is a suitable approach if an annotated species of short evolutionary distance to your target genome is available. (We recommend the usage of GenomeThreader instead of Exonerate because Exonerate is comparably slower and has lower specificity than GenomeThreader.) Download Exonerate from https://github.com/nathanweeks/exonerate. Unpack and install according to exonerate/README. (On Ubuntu, download and install by typing sudo apt-get install exonerate.)

BRAKER2 will try to locate the Exonerate executable by using an environment variable

# Bug reporting

Before reporting bugs, please check that you are using the most recent versions of AUGUSTUS and BRAKER. Also, check the list of Common Problems (see section [commonproblems]), before reporting bugs.

## Reporting bugs on github

If you found a bug, please open an issue at https://github.com/Gaius-Augustus/BRAKER/issues (or contact katharina.hoff@uni-greifswald.de).

Information worth mentioning in your bug report:

Check in braker/yourSpecies/braker.log at which step braker.pl crashed.

There are a number of other files that might be of interest, depending on where in the pipeline the problem occurred. Some of the following files will not be present if they did not contain any errors.

• braker/yourSpecies/errors/bam2hints.*.stderr - will give details on a bam2hints crash (step for converting bam file to intron gff file)

• braker/yourSpecies/hintsfile.gff - is this file empty? If yes, something went wrong during hints generation - does this file contain hints from source “b2h” and of type “intron”? If not: GeneMark-ET will not be able to execute properly.

• braker/yourSpecies/startAlign.stderr - if you provided a protein fasta file and this file is not empty, something went wrong during protein alignment

• braker/yourSpecies/startAlign.stdout - may give clues on at which point protein alignment went wrong

• braker/yourSpecies/(align_gth|align_exonerate|align_spaln)/*err - errors reported by the alignment tools gth/exonerate/spaln

• braker/yourSpecies/errors/GeneMark-ET.stderr - errors reported by GeneMark-ET

• braker/yourSpecies/errors/GeneMark-ET.stdout - may give clues about the point at which errors in GeneMark-ET occurred

• braker/yourSpecies/GeneMark-ET/genemark.gtf - is this file empty? If yes, something went wrong during executing GeneMark-ET

• braker/yourSpecies/GeneMark-ET/genemark.f.good.gtf - is this file empty? If yes, something went wrong during filtering GeneMark-ET genes for training AUGUSTUS

• braker/yourSpecies/genbank.good.gb - try a “grep -c LOCUS genbank.good.gb” to determine the number of training genes for training AUGUSTUS, should not be low

• braker/yourSpecies/errors/firstetraining.stderr - contains errors from first iteration of training AUGUSTUS

• braker/yourSpecies/errors/secondetraining.stderr - contains errors from second iteration of training AUGUSTUS

• braker/yourSpecies/errors/optimize_augustus.stderr - contains errors from optimize_augustus.pl (additional training step for AUGUSTUS)

• braker/yourSpecies/errors/augustus*.stderr - contain AUGUSTUS execution errors
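The checks in the list above can be gathered into one pass over the working directory, which makes it easier to paste a summary into a bug report. This is a sketch under the assumption that the directory layout is exactly as listed above; `braker_diagnose` is a hypothetical helper, not a script shipped with BRAKER2, and a non-empty stderr file may also contain harmless warnings.

```shell
# Summarise the state of a BRAKER2 working directory
# (e.g. braker/yourSpecies) for a bug report.
braker_diagnose() (
    workdir=$1
    # Files that should exist and should not be empty.
    for f in hintsfile.gff GeneMark-ET/genemark.gtf GeneMark-ET/genemark.f.good.gtf; do
        if [ ! -e "$workdir/$f" ]; then
            echo "missing: $f"
        elif [ ! -s "$workdir/$f" ]; then
            echo "empty: $f"
        fi
    done
    # Non-empty stderr files are worth attaching to the report.
    for f in "$workdir"/errors/*.stderr "$workdir"/startAlign.stderr; do
        [ -s "$f" ] && echo "has errors: ${f#"$workdir"/}"
    done
    # Number of training genes for AUGUSTUS.
    if [ -f "$workdir/genbank.good.gb" ]; then
        echo "training genes: $(grep -c LOCUS "$workdir/genbank.good.gb")"
    fi
    exit 0
)
```

Running braker_diagnose braker/yourSpecies prints one line per finding; together with the last lines of braker.log this usually narrows down the failing step.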

## Common problems {#commonproblems}

• BRAKER complains that the RNA-Seq file does not correspond to the provided genome file, but I am sure the files correspond to each other!
Please check the headers of the genome FASTA file. If the headers are long and contain whitespaces, some RNA-Seq alignment tools will truncate sequence names in the BAM file. This leads to an error with BRAKER. Solution: shorten/simplify FASTA headers in the genome file before running the RNA-Seq alignment and BRAKER.

• There are duplicate Loci in the train.gb file (after using GenomeThreader)!
This issue arises if outdated versions of AUGUSTUS and BRAKER are used. Solution: Please update AUGUSTUS and BRAKER from github (https://github.com/Gaius-Augustus/Augustus, https://github.com/Gaius-Augustus/BRAKER).
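The first problem above (sequence names in the BAM file not matching the genome file) can be diagnosed by comparing the complete FASTA header lines with the sequence names in the BAM header. The sketch below assumes that the names must match the full header text after the > character; the helper names are hypothetical, and the SAM header is expected as a text file, e.g. produced beforehand with samtools view -H rnaseq.bam > header.sam.

```shell
# Print the complete header text (without ">") of each FASTA record.
fasta_names() {
    awk '/^>/ { print substr($0, 2) }' "$1" | sort -u
}

# Print the sequence names (SN: fields of @SQ lines) from SAM header text on stdin.
sam_header_names() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' | sort -u
}

# Print names that occur in the SAM header but not as a FASTA header.
# Usage: names_only_in_bam genome.fa header.sam
names_only_in_bam() (
    fa_names=$(mktemp) || exit 1
    bam_names=$(mktemp) || exit 1
    fasta_names "$1" > "$fa_names"
    sam_header_names < "$2" > "$bam_names"
    comm -13 "$fa_names" "$bam_names"
    rm -f "$fa_names" "$bam_names"
)
```

Any name printed indicates a header that was probably truncated by the aligner; simplifying the FASTA headers and repeating the alignment should make the output empty.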

# Citing BRAKER2 and software called by BRAKER2

Since BRAKER2 is a pipeline that calls several Bioinformatics tools, publication of results obtained by BRAKER2 requires that not only BRAKER2 is cited, but also the tools that are called by BRAKER2:

• Always cite:

• Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M. and Stanke, M. (2015). BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, 32(5):767-769.

• Stanke, M., Diekhans, M., Baertsch, R. and Haussler, D. (2008). Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, doi: 10.1093/bioinformatics/btn013.

• Stanke, M., Schöffmann, O., Morgenstern, B. and Waack, S. (2006). Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62.

• If any kind of AUGUSTUS training was performed by BRAKER2, cite:

• Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990). A basic local alignment search tool. J Mol Biol, 215:403–410.

• Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). Blast+: architecture and applications. BMC bioinformatics, 10(1):421.

• If BRAKER was executed with a genome file and no extrinsic evidence, cite:

• Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y.O. and Borodovsky, M. (2005). Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 33(20):6494–6506.

• Ter-Hovhannisyan, V., Lomsadze, A., Chernoff, Y.O. and Borodovsky, M. (2008). Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome research, pages gr–081612, 2008.

• If BRAKER was executed with RNA-Seq information or with information from proteins of remote homology, cite:

• Lomsadze, A., Burns, P.D. and Borodovsky, M. (2014). Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Research, 42(15):e119.

• If BRAKER was executed with RNA-Seq alignments in bam-format, cite:

• Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R.; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078-9.

• Barnett, D.W., Garrison, E.K., Quinlan, A.R., Strömberg, M.P. and Marth G.T. (2011). BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 27(12):1691-2

• If BRAKER was executed with proteins of closely related species, cite:

• Gremme, G. (2013). Computational Gene Structure Prediction. PhD thesis, Universität Hamburg.

# Licence

All source code, i.e. scripts/*.pl or scripts/*.py are under the Artistic Licence (see http://www.opensource.org/licenses/artistic-license.php).

Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215: 403–10.

Barnett, Derek W, Erik K Garrison, Aaron R Quinlan, Michael P Strömberg, and Gabor T Marth. 2011. “BamTools: A C++ Api and Toolkit for Analyzing and Managing Bam Files.” Bioinformatics 27 (12). Oxford University Press: 1691–2.

Gotoh, Osamu. 2008a. “A Space-Efficient and Accurate Method for Mapping and Aligning cDNA Sequences onto Genomic Sequence.” Nucleic Acids Research 36 (8). Oxford University Press: 2630–8.

———. 2008b. “Direct Mapping and Alignment of Protein Sequences onto Genomic Sequence.” Bioinformatics 24 (21). Oxford University Press: 2438–44.

Gremme, G. 2013. “Computational Gene Structure Prediction.” PhD thesis, Universität Hamburg.

Hoff, Katharina J, Simone Lange, Alexandre Lomsadze, Mark Borodovsky, and Mario Stanke. 2015. “BRAKER1: Unsupervised Rna-Seq-Based Genome Annotation with Genemark-et and Augustus.” Bioinformatics 32 (5). Oxford University Press: 767–69.

Iwata, Hiroaki, and Osamu Gotoh. 2012. “Benchmarking Spliced Alignment Programs Including Spaln2, an Extended Version of Spaln That Incorporates Additional Species-Specific Features.” Nucleic Acids Research 40 (20). Oxford University Press: e161–e161.

Li, Heng, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, and Richard Durbin. 2009. “The Sequence Alignment/Map Format and Samtools.” Bioinformatics 25 (16). Oxford University Press: 2078–9.

Lomsadze, A., V. Ter-Hovhannisyan, Y.O. Chernoff, and M. Borodovsky. 2005. “Gene identification in novel eukaryotic genomes by self-training algorithm.” Nucleic Acids Research 33 (20): 6494–6506. doi:10.1093/nar/gki937.

Lomsadze, Alexandre, Paul D Burns, and Mark Borodovsky. 2014. “Integration of Mapped Rna-Seq Reads into Automatic Training of Eukaryotic Gene Finding Algorithm.” Nucleic Acids Research 42 (15). Oxford University Press: e119–e119.

Slater, Guy St C, and Ewan Birney. 2005. “Automated Generation of Heuristics for Biological Sequence Comparison.” BMC Bioinformatics 6 (1). BioMed Central: 31.

Stanke, Mario, Mark Diekhans, Robert Baertsch, and David Haussler. 2008. “Using Native and Syntenically Mapped cDNA Alignments to Improve de Novo Gene Finding.” Bioinformatics 24 (5). Oxford University Press: 637–44.

Stanke, Mario, Oliver Schöffmann, Burkhard Morgenstern, and Stephan Waack. 2006. “Gene Prediction in Eukaryotes with a Generalized Hidden Markov Model That Uses Hints from External Sources.” BMC Bioinformatics 7 (1). BioMed Central: 62.

Ter-Hovhannisyan, Vardges, Alexandre Lomsadze, Yury O Chernoff, and Mark Borodovsky. 2008. “Gene Prediction in Novel Fungal Genomes Using an Ab Initio Algorithm with Unsupervised Training.” Genome Research. Cold Spring Harbor Lab, gr–081612.

[^1]: EX = ES/ET/EP/ETP, all available for download under the name GeneMark-ES/ET

[^3]: Please use the latest version of AUGUSTUS distributed by the original developers; it is available from github at https://github.com/Gaius-Augustus/Augustus. Problems have been reported by users who tried to run BRAKER with AUGUSTUS releases maintained by third parties, e.g. Bioconda.

[^6]: install with sudo apt-get install cpanminus
[^8]: GeneMark-EX is not a mandatory tool if AUGUSTUS is to be trained from GenomeThreader alignments with the option --trainFromGth.