
MAB-Lab/OpenVar


OpenVar: annotation of genetic variants using deep genome annotation

OpenVar is a tool for annotating genetic variants against deep genome annotations. It currently supports the standard annotations (Ensembl and NCBI RefSeq), as well as the OpenProt deep open reading frame (ORF) annotation (www.openprot.org). The package can also be used with any other genome annotation, provided you supply a matching SnpEff database (see http://pcingola.github.io/SnpEff/se_buildingdb/ for information on how to build a custom SnpEff database).

Installation of OpenVar

To install the OpenVar package, you will need to clone this repository on your computer. On the main page of this repository, click the Clone button. Click on HTTPS to clone with HTTPS, or SSH to clone with SSH, and copy the link. On your computer, go to the desired directory and type:

    git clone [the link of the repository]

For more information on how to clone a GitHub repository, see https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository-from-github/cloning-a-repository

Using OpenVar

Depending on your setup, you may need to edit the paths to the genome files at the beginning of the openvar.py file. All the data OpenVar needs to run is in the data folder of this repository.

Once you have edited the necessary paths, you can start using OpenVar in Python. The basic commands below will get you started; refer to the description of each class for more detail.

First, you will need to import openvar:

    from OpenVar.openvar import *

Then, you can create a SeqStudy object with the following command:

    vcf = SeqStudy(
        data_dir = 'path/to/data/directory',
        file_name = 'filename.vcf',
        results_dir = 'path/to/results/directory',
        study_name = 'studyname',
        specie = 'human',
        genome_version = 'hg38',
        annotation = 'OP_Ensembl',
        picard_path = 'path/to/picard/directory'
    )

Then, create an OpenVar object with the following command:

    opv = OpenVar(snpeff_path = 'path/to/snpeff/', vcf = vcf)

Then, to run OpenVar with a classical annotation (Ensembl or NCBI RefSeq), use the following command:

    opv.run_snpeff(vcf.single_vcf_path, vcf.annotation)

If using the OpenProt annotation, use the following command instead:

    opv.run_snpeff_parallel_pipe()

Then, generate a report with the following commands:

    opvr = OPVReport(opv)
    opvr.aggregate_annotated_vcf()
    opvr.write_tabular()

If using a classical annotation (Ensembl or NCBI RefSeq), you can generate report statistics and figures with the following command:

    opvr.compute_summary_stats()

If using the OpenProt annotation, use the following command instead:

    opvr.compute_chrom_gene_level_stats(write_summary_pkl = True)
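The steps above can be chained into a single helper. The following is only a sketch: the function name annotate_and_report and the use_openprot flag are hypothetical and not part of OpenVar, and the method calls are exactly those listed in this README. The report class is passed in as an argument so that the sketch does not depend on how OpenVar is imported.

```python
def annotate_and_report(opv, vcf, report_cls, use_openprot):
    """Run the annotation and reporting steps in the order given in the README.

    opv          -- an OpenVar object, as created above
    vcf          -- the SeqStudy object that was passed to OpenVar
    report_cls   -- the report class (OPVReport in the README)
    use_openprot -- hypothetical flag: True for the OpenProt annotation,
                    False for a classical one (Ensembl or NCBI RefSeq)
    """
    # Step 1: annotate the variants with SnpEff.
    if use_openprot:
        opv.run_snpeff_parallel_pipe()
    else:
        opv.run_snpeff(vcf.single_vcf_path, vcf.annotation)

    # Step 2: aggregate the annotated VCF and write the tabular report.
    opvr = report_cls(opv)
    opvr.aggregate_annotated_vcf()
    opvr.write_tabular()

    # Step 3: compute the statistics that match the annotation type.
    if use_openprot:
        opvr.compute_chrom_gene_level_stats(write_summary_pkl=True)
    else:
        opvr.compute_summary_stats()
    return opvr
```

With the objects created earlier, a call would look like annotate_and_report(opv, vcf, OPVReport, use_openprot=True) for a study annotated with 'OP_Ensembl'.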

The last command generates a pickle object identical to the one produced by the OpenVar web-based application. This allows you to quickly return to previous analyses and view the general statistics as presented on the Results page of the OpenVar web-based application. To load the pickle object, run the following commands:

    import pickle
    summary_path = 'path/to/summary/pickle'
    with open(summary_path, 'rb') as f:
        summary = pickle.load(f)
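The layout of the summary object is not described in this README, so it can help to look at its top-level structure before using it. The sketch below makes no assumption about that layout; load_summary and describe are hypothetical helper names, not OpenVar functions. As with any pickle, only load files that you generated yourself or that come from a source you trust.

```python
import pickle


def load_summary(summary_path):
    """Load a summary pickle and return the unpickled object."""
    with open(summary_path, "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Return a short description of an object without assuming its layout."""
    if isinstance(obj, dict):
        # List the top-level keys, converted to strings so they can be sorted.
        return "dict with keys: " + ", ".join(sorted(str(k) for k in obj))
    if isinstance(obj, (list, tuple, set)):
        return f"{type(obj).__name__} with {len(obj)} items"
    return f"object of type {type(obj).__name__}"
```

For example, print(describe(load_summary('path/to/summary/pickle'))) shows what kind of object was stored before you start indexing into it.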

If you have any questions regarding OpenVar, don't hesitate to contact us: https://openprot.org/p/ng/contactUs

Input file format

An example input file can be found here. The expected input format is the Variant Call Format (VCF), the de facto standard file format for genomic variants. Files in other formats should be converted to VCF first; below are a few example shell commands for doing so.
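Before running OpenVar, a quick structural check can catch files that are not VCF at all. The sketch below only tests two things required by the VCF specification: a ##fileformat=VCF meta line, and a #CHROM header line starting with the eight fixed columns. It is not the validation OpenVar itself performs, and looks_like_vcf is a hypothetical helper name.

```python
# The eight fixed, tab-separated columns of a VCF header line.
VCF_FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def looks_like_vcf(lines):
    """Return True if the lines contain a ##fileformat=VCF meta line
    followed by a #CHROM header with the eight fixed columns."""
    saw_fileformat = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##"):
            # Meta-information line; remember whether the format is declared.
            if line.startswith("##fileformat=VCF"):
                saw_fileformat = True
            continue
        if line.startswith("#"):
            # The column header line ends the check.
            return saw_fileformat and line.split("\t")[:8] == VCF_FIXED_COLUMNS
        # A data line appeared before any column header.
        return False
    return False
```

Because a file object iterates over its lines, the check can be run as: with open('filename.vcf') as f: print(looks_like_vcf(f)).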

BED files

BED input files can be converted using PLINK with the following command, where [filename] is the shared prefix of the PLINK binary fileset (.bed, .bim and .fam files):

    plink --bfile [filename] --recode vcf --out [vcf name]

dbSNP identifiers

To produce a VCF input from a list of dbSNP identifiers, download the VCF file containing all variants within dbSNP here. Then use the following command, redirecting the output to a file rather than to a folder:

    grep -wFf dbsnp_id_list.txt my_vcf.vcf > /path_to_output_folder/[vcf name].vcf
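The grep command keeps only the lines that contain one of the identifiers, so VCF header lines (those starting with #) are normally not copied to the output. A Python alternative that preserves the header is sketched below. The function names are hypothetical, and unlike grep, which matches an identifier anywhere on the line, this version matches only against the ID column (the third column, which may hold several identifiers separated by semicolons).

```python
def filter_vcf_by_ids(vcf_lines, wanted_ids):
    """Yield all header lines, plus the data lines whose ID column
    contains at least one of the wanted identifiers."""
    wanted = {i.strip() for i in wanted_ids if i.strip()}
    for line in vcf_lines:
        if line.startswith("#"):
            # Keep meta-information and column header lines untouched.
            yield line
            continue
        fields = line.rstrip("\r\n").split("\t")
        # The ID column is the third field; it may list several IDs.
        if len(fields) > 2 and wanted.intersection(fields[2].split(";")):
            yield line


def filter_vcf_file(id_list_path, vcf_path, out_path):
    """Read one identifier per line from id_list_path and write the
    filtered VCF to out_path."""
    with open(id_list_path) as ids, open(vcf_path) as vcf, open(out_path, "w") as out:
        out.writelines(filter_vcf_by_ids(vcf, ids))
```

A call such as filter_vcf_file('dbsnp_id_list.txt', 'my_vcf.vcf', 'filtered.vcf') streams the input line by line, so the full dbSNP file does not need to fit in memory. Compressed downloads would need to be decompressed first or opened with gzip.open in text mode.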
