OpenCRAVAT is a Python package that performs genomic variant interpretation, including variant impact, annotation, and scoring. OpenCRAVAT is similar to the original web-based CRAVAT, but it can be installed locally and is easy to integrate into bioinformatics pipelines. OpenCRAVAT also has a modular architecture with a wide variety of analysis modules that can be selected and installed based on the needs of a given study. The modules are made available via the CRAVAT Store and are developed both by the CRAVAT team and the broader variant analysis community. OpenCRAVAT is a product of the Karchin Lab at Johns Hopkins University in collaboration with In Silico Solutions, with funding provided by the National Cancer Institute's ITCR program.
OpenCRAVAT is a modular Python package that is available from the PyPI repository and can be installed with pip. It takes a file of genomic variants as input. The most common input format is VCF, but other formats are supported, including dbSNP identifiers and the 23andMe and Ancestry.com file formats.
The analysis performed by OpenCRAVAT depends upon user-selected annotation and visualization options, available for download from the free OpenCRAVAT Store. In addition to the interactive user interface, OpenCRAVAT provides several output formats including text reports, Excel spreadsheets, and a SQLite database of results used by cravat_view.
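Because the results are stored in a SQLite database, they can be inspected with any SQLite client. The following is a minimal sketch using Python's standard `sqlite3` module. It only lists the tables present in a database file and makes no assumption about the OpenCRAVAT result schema; the function name `list_tables` and the way the path is passed in are illustrative, not part of OpenCRAVAT.

```python
import os
import sqlite3
import sys


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted by name."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is whatever result database your run produced):
    #   python list_tables.py path/to/result.sqlite
    for table in list_tables(sys.argv[1]):
        print(table)
```

Listing the tables first is a safe way to explore a result file before writing queries against specific columns.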
When the cravat program is run, it executes the series of modules required for variant analysis. First, the appropriate converter is run to parse the input variant file. Next, a mapper module determines the transcripts and associated genes affected by each variant, including protein impact. Then cravat runs all of the requested and installed annotation modules. After all annotation is complete, an aggregator program collects and collates the results into a SQLite database. Finally, reporter modules are run to produce results in the requested formats.
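The order of stages described above can be sketched as a toy pipeline. This is a conceptual illustration only, not OpenCRAVAT's code or API: every function name, the four-column input format, the chromosome-based gene lookup, and the example annotator are hypothetical stand-ins chosen to keep the sketch short.

```python
import sqlite3


def convert(lines):
    """Converter stage: parse 'chrom pos ref alt' lines into variant dicts."""
    variants = []
    for line in lines:
        chrom, pos, ref, alt = line.split()
        variants.append({"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt})
    return variants


def map_variants(variants, gene_lookup):
    """Mapper stage: attach a gene to each variant (toy lookup by chromosome)."""
    for v in variants:
        v["gene"] = gene_lookup.get(v["chrom"], "unknown")
    return variants


def annotate(variants, annotators):
    """Annotation stage: run every requested annotator on every variant."""
    for v in variants:
        for name, func in annotators.items():
            v[name] = func(v)
    return variants


def aggregate(variants, columns):
    """Aggregator stage: collate all results into one in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE variant ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO variant VALUES ({placeholders})",
        [tuple(v[c] for c in columns) for v in variants],
    )
    return conn


def report(conn, columns):
    """Reporter stage: render the aggregated table as tab-separated text."""
    rows = conn.execute(f"SELECT {', '.join(columns)} FROM variant ORDER BY rowid")
    return "\n".join("\t".join(str(x) for x in row) for row in rows)


def run_pipeline(lines, gene_lookup, annotators):
    """Run the stages in order: convert, map, annotate, aggregate, report."""
    variants = annotate(map_variants(convert(lines), gene_lookup), annotators)
    columns = ["chrom", "pos", "ref", "alt", "gene"] + list(annotators)
    return report(aggregate(variants, columns), columns)
```

For example, `run_pipeline(["chr1 100 A G"], {"chr1": "GENE_A"}, {"length_change": lambda v: len(v["alt"]) - len(v["ref"])})` passes one variant through all five stages. The point of the structure is that annotators are independent of one another and only meet at the aggregation step, which is what allows modules to be installed selectively.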
As of August 28, 2019, OpenCRAVAT has the following annotators available, with more on the way.
- Gene-level annotators: BioGRID, Cancer Gene Census, Cancer Gene Landscape, CIViC Gene, COSMIC Gene, Essential Genes, ExAC Gene and CNV, gnomAD, GTEx, HGVS Format, InterPro, LINSIGHT, MuPIT, gnomAD Gene, Gene Ontology, HGDP, IntAct, LoFtool, NCBI Gene, NDEx, P(rec), p(HI), PubMed, RVIS, TARGET, UniProt, VEST
- Variant-level annotators: ABraOM, BRCA1 Multiplex Assay, CHASMplus, CIViC, ClinVar, COSMIC, dbSNP, denovo-DB, ESP6500, FATHMM, Flanking Sequence, GERP++, GHIS, GRASP, GWAS Catalog, Mutation Assessor, MutPred, ncRNA, PhastCons, PhD-SNPg, PhyloP, Promoter IR, Pseudogene, Repeat Sequences, REVEL, SiPhy, 1000 Genomes, 1000 Genomes-Admixed American, 1000 Genomes-African, 1000 Genomes-East Asian, 1000 Genomes-European, 1000 Genomes-South Asian, UK10k Cohorts, VEST, VISTA Enhancer Browser
In most cases, OpenCRAVAT can process approximately 1 million variants per hour. This estimate assumes that two-thirds of the input variants are in coding regions, that approximately ten annotation modules are run, and that the system is a laptop at least four years old with a solid state drive. Runtimes depend heavily on disk speed: a mechanical hard drive will perform about 1/3 to 1/4 as well as an SSD. Most modern processors perform equivalently, since the disk will bottleneck annotation speed before the processor does. However, processors with fewer than four cores may see longer runtimes. Memory size is not typically a limitation.
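As a rough planning aid, the figures above can be turned into a back-of-the-envelope estimate. The sketch below uses only the numbers stated in this section (about 1 million variants per hour on an SSD, and 1/3 to 1/4 of that performance on a mechanical drive). The function name and the `disk` parameter are illustrative, and the result is only as good as the stated assumptions about coding fraction and number of annotators.

```python
# Baseline throughput from the text: roughly 1 million variants per hour on an SSD.
VARIANTS_PER_HOUR_SSD = 1_000_000

# A mechanical drive performs about 1/3 to 1/4 as well, so runs take 3x to 4x longer.
HDD_SLOWDOWN_RANGE = (3, 4)


def estimate_hours(n_variants, disk="ssd"):
    """Return a (low, high) estimate of runtime in hours for n_variants."""
    base = n_variants / VARIANTS_PER_HOUR_SSD
    if disk == "ssd":
        return (base, base)
    if disk == "hdd":
        low, high = HDD_SLOWDOWN_RANGE
        return (base * low, base * high)
    raise ValueError(f"unknown disk type: {disk!r}")
```

Under these assumptions, `estimate_hours(2_000_000)` gives about 2 hours on an SSD, while `estimate_hours(2_000_000, "hdd")` gives a range of about 6 to 8 hours.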
For a simple introduction to running OpenCRAVAT, please consult the Quickstart guide.
How to cite
Masica, D. L., Douville, C., Tokheim, C., Bhattacharya, R., Kim, R., Moad, K., ... & Karchin, R. (2017). CRAVAT 4: Cancer-Related Analysis of Variants Toolkit. Cancer Research, 77(21), e35-e38.
OpenCRAVAT users are encouraged to cite individual annotations used in their study analysis.