X-CAP is a gradient boosted tree (GBT) model that predicts the pathogenicity of single-nucleotide stopgain variants. This repository contains the trained model and the code to run X-CAP on a VCF file of variants.
bin
: the trained X-CAP model

data
: variants in D_original and D_validation (HGMD variants are only labeled with accession numbers)

example
: demo of X-CAP on chromosome 21 variants in D_validation

figures
: code to generate figures/tables in the paper

predictions
: X-CAP predictions for all stopgains in the human proteome

src
: code to generate X-CAP features and run X-CAP on a VCF file
- Create a Conda environment with necessary requirements.
conda env create --file environment.yml
- Download ANNOVAR.
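Before running X-CAP, it can help to confirm that the ANNOVAR download is where you expect it. The following is a minimal sketch, not part of this repository: `check_annovar_dir` is a hypothetical helper, and the script names it looks for are an assumption based on a standard ANNOVAR distribution (this README does not state which ANNOVAR scripts X-CAP calls).

```python
from pathlib import Path

# Assumption: these Perl scripts ship with a standard ANNOVAR download.
# Which of them X-CAP actually invokes is not stated in this README.
EXPECTED_ANNOVAR_SCRIPTS = ("annotate_variation.pl", "table_annovar.pl")


def check_annovar_dir(annovar_dir):
    """Return a list of problems found; an empty list means the checks passed."""
    root = Path(annovar_dir)
    if not root.is_dir():
        return [f"ANNOVAR directory not found: {root}"]
    problems = []
    for name in EXPECTED_ANNOVAR_SCRIPTS:
        if not (root / name).is_file():
            problems.append(f"missing ANNOVAR script: {root / name}")
    return problems
```

Calling `check_annovar_dir("<annovar_dir>")` and printing the returned list gives a quick report of anything that looks missing.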
To run X-CAP on an arbitrary VCF file, use the following command:
python src/run_xcap.py <input_vcf_file> <reference_vcf_file> <output_dir> <annovar_dir>
<input_vcf_file>
: Path to the VCF file containing the variants of uncertain significance (VUS) to be scored. X-CAP scores only the variants that ANNOVAR annotates as stopgains and disregards the rest.

<reference_vcf_file>
: Path to a VCF file containing a reference set of known stopgain variants. X-CAP uses these to produce features for the variants in <input_vcf_file>. During evaluation of X-CAP, we used training variants from D_original as the reference set.

<output_dir>
: Directory where X-CAP results will be written. X-CAP features are written to <output_dir>/stopgains.features and scores to <output_dir>/xcap_scores.vcf.

<annovar_dir>
: Directory containing the ANNOVAR program. X-CAP uses ANNOVAR to filter out non-stopgain variants.
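For scripted use, the command above can be wrapped in a small Python helper. This is a sketch under stated assumptions rather than code from this repository: `build_xcap_command` and `run_xcap` are hypothetical names, the helper assumes it is run from the repository root inside the Conda environment, and it creates the output directory up front because this README does not say whether `run_xcap.py` does so itself.

```python
import subprocess
import sys
from pathlib import Path


def build_xcap_command(input_vcf, reference_vcf, output_dir, annovar_dir,
                       script="src/run_xcap.py"):
    """Build the argument list for run_xcap.py in the documented order."""
    return [sys.executable, script,
            str(input_vcf), str(reference_vcf), str(output_dir), str(annovar_dir)]


def run_xcap(input_vcf, reference_vcf, output_dir, annovar_dir):
    """Validate the inputs, run X-CAP, and return the two documented output paths."""
    for vcf in (input_vcf, reference_vcf):
        if not Path(vcf).is_file():
            raise FileNotFoundError(f"VCF file not found: {vcf}")
    if not Path(annovar_dir).is_dir():
        raise NotADirectoryError(f"ANNOVAR directory not found: {annovar_dir}")
    # Assumption: pre-creating the output directory is harmless.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_xcap_command(input_vcf, reference_vcf, out, annovar_dir),
                   check=True)
    return out / "stopgains.features", out / "xcap_scores.vcf"
```

Failing early on a missing input path avoids starting an ANNOVAR run that cannot finish, and returning the two output paths lets downstream code pick up the features and scores without hard-coding file names.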
An illustrative demo can be found in the example subdirectory.