Open-CRAVAT is a Python package that performs genomic variant interpretation, including variant impact, annotation, and scoring. Open-CRAVAT is similar to the original web-based CRAVAT, but it can be installed locally and is easy to integrate into bioinformatics pipelines. Open-CRAVAT also has a modular architecture with a wide variety of analysis modules that can be selected and installed based on the needs of a given study. The modules are made available via the CRAVAT Store and are developed both by the CRAVAT team and by the broader variant analysis community. Open-CRAVAT is a product of the Karchin Lab at Johns Hopkins University in collaboration with In Silico Solutions, with funding provided by the National Cancer Institute's ITCR program.
Open-CRAVAT is a modular Python package available from PyPI and installable with pip. It takes a file of genomic variants as input. The most common input format is VCF, but other formats are supported. The Open-CRAVAT package includes three programs:
- cravat runs variant analysis
- cravat-admin configures cravat, including getting modules from the CRAVAT Store
- cravat_view is used to interactively visualize, sort, filter, and explore cravat results
The type of analysis performed by cravat depends on which annotators have been installed. The CRAVAT Store contains all of the available annotators and can be browsed with the cravat-admin tool. A graphical CRAVAT Store is planned for the cravat_view program. Some annotators include large reference databases, so they take time to install and use considerable disk space. Open-CRAVAT provides several output formats, including text reports, Excel spreadsheets, and a SQLite database of results used by cravat_view.
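Because the results are stored in an ordinary SQLite database, they can be inspected with Python's standard sqlite3 module. The sketch below only lists the tables in a database file and makes no assumption about Open-CRAVAT's schema. The function name list_tables is hypothetical, and the path to the result database is something you must supply, since this README does not state the file name.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only.

    Hypothetical helper: nothing here relies on Open-CRAVAT's schema.
    """
    # mode=ro avoids creating an empty file if the path is wrong.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Calling list_tables on the result database of a finished run shows which tables are available to query further.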
When the cravat program is run, it executes the series of modules required for variant analysis. First, the appropriate converter is run to parse the input variant file. Next, a mapper module determines the transcripts and associated genes affected by each variant, including protein impact. Then cravat runs all of the requested/installed annotation modules. After all annotation is complete, an aggregator program collects and collates the results into a SQLite database. Finally, reporter modules are run to produce the requested format of results.
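The order of stages described above can be illustrated with a toy pipeline. This is a minimal sketch, not Open-CRAVAT's implementation: every function name, the gene lookup, the single "variant" table, and the column names are hypothetical stand-ins chosen for illustration.

```python
import sqlite3

# Toy stand-ins for the stages; none of this is Open-CRAVAT's real API.


def convert(vcf_lines):
    """Converter: parse minimal VCF data lines into variant dicts."""
    variants = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):  # skip blank/header lines
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        variants.append({"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt})
    return variants


def map_variants(variants, gene_lookup):
    """Mapper: attach a gene to each variant using a toy position lookup."""
    for v in variants:
        v["gene"] = gene_lookup.get((v["chrom"], v["pos"]))
    return variants


def annotate(variants, annotators):
    """Annotators: each one adds a column named after itself."""
    for v in variants:
        for name, func in annotators.items():
            v[name] = func(v)
    return variants


def aggregate(variants, annotator_names):
    """Aggregator: collect all columns into one in-memory SQLite table."""
    columns = ["chrom", "pos", "ref", "alt", "gene"] + list(annotator_names)
    conn = sqlite3.connect(":memory:")
    quoted = ", ".join('"{}"'.format(c) for c in columns)
    conn.execute("CREATE TABLE variant ({})".format(quoted))
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        "INSERT INTO variant VALUES ({})".format(placeholders),
        [tuple(v[c] for c in columns) for v in variants],
    )
    return conn


def report(conn):
    """Reporter: render the aggregated table as tab-separated text."""
    cursor = conn.execute("SELECT * FROM variant ORDER BY chrom, pos")
    lines = ["\t".join(d[0] for d in cursor.description)]
    for row in cursor:
        lines.append("\t".join("" if x is None else str(x) for x in row))
    return "\n".join(lines)


def run_pipeline(vcf_lines, gene_lookup, annotators):
    """Run converter, mapper, annotators, aggregator, and reporter in order."""
    variants = convert(vcf_lines)
    variants = map_variants(variants, gene_lookup)
    variants = annotate(variants, annotators)
    conn = aggregate(variants, annotators.keys())
    return report(conn)
```

The point of the sketch is the hand-off: each stage only adds to what the previous stage produced, and reporters read from the aggregated database rather than from the annotators directly.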
As of September 24, 2018, Open-CRAVAT has the following annotators available, with more on the way.
- Cancer Genome Census
- Cancer Genome Landscape
- COSMIC Gene
- Gene Ontology
- HGVS Format
- NCBI Gene
- Repeat Sequences
- Annotator template
- 1000 Genomes
Open-CRAVAT requires Python 3.5 or newer. Installation has two steps: installing the Open-CRAVAT pip package, and then installing the base components, which Open-CRAVAT needs in order to run.
For macOS: We recommend installing Python 3 with the installer provided at python.org rather than by any other method. After installing Python 3, open a new terminal and use it to run the commands below. Terminal sessions that were already open before Python 3 was installed will not work properly with open-cravat commands.
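If you are unsure which interpreter a given python3 command points to, a short check such as the following can confirm the version requirement before installing. It is a minimal sketch; the function name python_is_supported is hypothetical and is not part of Open-CRAVAT.

```python
import sys

MIN_VERSION = (3, 5)  # minimum Python version stated in this README


def python_is_supported(version_info=None):
    """Return True if the given (or the running) interpreter is 3.5 or newer."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "OK" if python_is_supported() else "too old for Open-CRAVAT"
    print("Python {}: {}".format(current, status))
```

Run it with the same interpreter that pip3 installs into, since a machine can have several Python installations.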
For Ubuntu: pip3 provided by apt does not install executables properly. We recommend the following steps before proceeding.
If pip3 has already been installed with apt, remove it with
sudo apt remove python-pip
Then download and run the pip installer with
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
Install Python Package
pip3 install open-cravat
Install Base Components
One of the base components, the hg38 gene mapper, can take ~15 minutes to install.
Open-CRAVAT is now ready to use. With the base components installed, Open-CRAVAT can annotate variants with genes and sequence ontology. For more annotation, you can pick and install additional annotation modules, as described in the "Install Annotators" section.
If you want, you can test if Open-CRAVAT is working properly using a built-in test input. Make the test input file with
cravat-admin make-example-input .
which will create example_input in the current working directory. If you want to create the file in another directory, replace "." with the path to the directory.
Then, run Open-CRAVAT analysis on the test input file with
cravat example_input
If example_input is not in the current working directory, use the full path to example_input instead of just "example_input". The run will create several files, all with the prefix "example_input.". The final result file is example_input.xlsx, which can be opened with Excel.
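To confirm that the test run produced its output, you can list the files that share the "example_input." prefix. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes you point it at the directory where the output files were written.

```python
import glob
import os


def list_run_outputs(directory, input_name="example_input"):
    """Return sorted paths of files in directory named '<input_name>.<something>'."""
    # glob.escape protects against special characters in the path itself.
    pattern = glob.escape(os.path.join(directory, input_name)) + ".*"
    return sorted(glob.glob(pattern))


def excel_report_exists(directory, input_name="example_input"):
    """Return True if the final Excel result file is present."""
    return os.path.isfile(os.path.join(directory, input_name + ".xlsx"))


if __name__ == "__main__":
    for path in list_run_outputs("."):
        print(path)
    print("Excel report found:", excel_report_exists("."))
```

The input file itself is not listed, because only names that continue with a dot after the input name match the pattern.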
Install Annotators
You can search for annotators to install with the command
cravat-admin ls -a
This also tells you which annotators you have installed.
To install a new annotator:
cravat-admin install <annotator name>
For example, to install the clinvar annotator:
cravat-admin install clinvar
Depending on the size of the data for the annotator, it may take some time to download and install.
To run your analysis, type:
cravat <input file>
This command has various command-line options, which you can list by typing cravat -h. By default, it creates text, Excel, and SQLite output in the current directory and runs all of the installed annotators. Command-line options can be used to select specific output formats or to run a subset of the installed annotators.
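For use inside a bioinformatics pipeline, the cravat command can be called from Python with the standard subprocess module. This is a minimal sketch under the assumption that cravat is on your PATH; the function names are hypothetical, and only the documented form cravat <input file> is used, with no additional options.

```python
import shutil
import subprocess


def build_cravat_command(input_path):
    """Build the invocation documented above: cravat <input file>."""
    return ["cravat", input_path]


def run_cravat(input_path):
    """Run cravat on one input file.

    Raises RuntimeError if cravat is not on PATH, and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    if shutil.which("cravat") is None:
        raise RuntimeError("cravat not found on PATH; is open-cravat installed?")
    subprocess.run(build_cravat_command(input_path), check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when input paths contain spaces.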
Open-CRAVAT Documentation Pages
For variant interpretation methods developers, the following pages describe how to package your method as an Open-CRAVAT annotation module and how to publish it to make it available to all Open-CRAVAT users.