Regression-based annotation of protein-coding sequences from ribosome profiling data

ORF-RATER (Open Reading Frame - Regression Algorithm for Translational Evaluation of Ribosome-protected footprints) comprises a series of scripts for coding sequence annotation based on ribosome profiling data.

The software was created at Jonathan Weissman's lab at UCSF and is described in Fields, Rodriguez, et al., "A regression-based analysis of ribosome-profiling data reveals a conserved complexity to mammalian translation", Molecular Cell 60, 816-827 (2015).

Usage information can be found in the Detailed Protocol included in the paper's supplemental materials, or by running each script with the --help/-h flag.

Required packages include numpy, scipy, pysam, biopython, pandas, tables, scikit-learn, pybedtools, and plastid, all of which are available through PyPI.
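Before running the scripts, it can help to confirm that each required package is importable. The following is a minimal sketch, not part of ORF-RATER itself; the helper name `missing_packages` is hypothetical, and the mapping reflects the usual import names (for example, biopython is imported as `Bio` and scikit-learn as `sklearn`).

```python
import importlib.util

# PyPI distribution name -> module name used at import time.
# The two differ for biopython (Bio) and scikit-learn (sklearn).
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pysam": "pysam",
    "biopython": "Bio",
    "pandas": "pandas",
    "tables": "tables",
    "scikit-learn": "sklearn",
    "pybedtools": "pybedtools",
    "plastid": "plastid",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose module cannot be located."""
    return [
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

Any package reported as missing can then be installed from PyPI. The check only confirms that a module can be located, not that its version is compatible.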

Some features require the multiisotonic package, which must be downloaded manually. Multiisotonic additionally requires python-igraph.

Transcripts must be provided in UCSC's BED12 format. The most reliable method I've found to convert from GTF to BED12 goes through the genePred format, using UCSC's "gtfToGenePred" and "genePredToBed" utilities, which UCSC distributes with its Genome Browser command-line tools. The full command is: gtfToGenePred INPUT_GTFFILE.gtf stdout | genePredToBed stdin OUTPUT_BEDFILE.bed. Similarly, a BED file can be converted to GTF with: bedToGenePred INPUT_BEDFILE.bed stdout | genePredToGtf file stdin OUTPUT_GTFFILE.gtf.
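After a conversion, a quick structural check of the BED12 output can catch malformed lines early. The sketch below is illustrative only and is not part of ORF-RATER; the function name `check_bed12_line` is hypothetical. It assumes the standard BED12 layout: 12 tab-separated fields, with blockCount, blockSizes, and blockStarts in columns 10 to 12, and block starts given relative to chromStart.

```python
def check_bed12_line(line):
    """Return a list of structural problems found in one BED12 line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return ["expected 12 tab-separated fields, found %d" % len(fields)]

    try:
        chrom_start, chrom_end = int(fields[1]), int(fields[2])
        block_count = int(fields[9])
        # Block lists are comma-separated and may end with a trailing comma.
        sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    except ValueError:
        return ["non-integer value in a coordinate or block field"]

    problems = []
    if chrom_start >= chrom_end:
        problems.append("chromStart must be less than chromEnd")
    if len(sizes) != block_count or len(starts) != block_count:
        problems.append("blockCount does not match blockSizes/blockStarts")
    if starts[0] != 0:
        problems.append("first blockStart must be 0")
    if starts[-1] + sizes[-1] != chrom_end - chrom_start:
        problems.append("last block does not end at chromEnd")
    return problems


if __name__ == "__main__":
    example = "chr1\t1000\t2000\ttx1\t0\t+\t1100\t1900\t0\t2\t300,400,\t0,600,"
    print(check_bed12_line(example))
```

An empty list means that no structural problem was detected. The check does not verify that the transcript model is biologically sensible.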

Contact Alex Fields for further information or assistance.