Work supporting the comparison of SnpEff and VEP effect prediction and HGVS identifiers


variant-annotation-comparison-2017

In 2014, I did some benchmarking of variant effect prediction algorithms. You can read about that in this blog post.

I wanted to follow up on that work and see how the current batch of algorithms is performing. VEP and SnpEff are the most commonly used algorithms these days, so I limited my analysis to them.

I used the same input VCF that I created originally. It contains all SNPs, all 1 base pair insertions and deletions, and two possible 2 and 3 base pair insertions and deletions at every position spanning the CFTR gene, with 100 bp margins on either side.
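
To make the shape of that input concrete, here is a minimal sketch of how the SNPs and 1 base pair indels for a region could be enumerated. This is not the script that produced CFTR_artifical_variant.vcf.gz; the function name `enumerate_variants` and its arguments are hypothetical, and the 2 and 3 base pair indels are left out for brevity.

```python
BASES = "ACGT"


def enumerate_variants(ref_seq, start_pos):
    """Return (pos, ref, alt) VCF-style tuples for every SNP and 1 bp indel.

    ref_seq   -- reference bases for the region (hypothetical input)
    start_pos -- 1-based coordinate of ref_seq[0]
    """
    variants = []
    for i, ref in enumerate(ref_seq):
        pos = start_pos + i
        # SNPs: substitute each of the three other bases.
        for alt in BASES:
            if alt != ref:
                variants.append((pos, ref, alt))
        # 1 bp insertions: VCF anchors the inserted base on the current base.
        for ins in BASES:
            variants.append((pos, ref, ref + ins))
        # 1 bp deletion: remove the base that follows the anchor base.
        if i + 1 < len(ref_seq):
            variants.append((pos, ref + ref_seq[i + 1], ref))
    return variants
```

For a region of n bases this gives 3n SNPs, 4n insertions, and n - 1 deletions, which is why the input VCF grows quickly even for a single gene.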

You can annotate this VCF with each algorithm using the CWL scripts provided.

For VEP:

docker build --tag="andrewjesaitis/vep" -f Dockerfile.vep .
cwltool vep-workflow.cwl cftr-job-vep.yml

Similarly for SnpEff:

docker build --tag="andrewjesaitis/snpeff" -f Dockerfile.snpeff .
cwltool snpeff-workflow.cwl cftr-job-snpeff.yml

Then you can open the Jupyter Notebook and rerun all cells.

Otherwise, just skip to the punchline and open the notebook on GitHub.

I've written up a discussion of the results and dug into some particularly troublesome variants on my blog.

I've also added gzipped VCFs that highlight some mismatches. Note that these VCFs contain repeated variants (since I am outputting a single variant-transcript pair per line). The keys in the INFO field are self-documenting. These files are impact_mismatch.vcf.gz, effect_mismatch.vcf.gz, and hgvs_mismatch.vcf.gz.
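
Because each variant appears once per transcript, it can help to regroup the lines by variant before inspecting them. The following is a rough sketch using only the Python standard library; the helper names `group_by_variant` and `load_mismatches` are hypothetical, and no particular INFO keys are assumed (whatever keys are present are parsed as-is).

```python
import gzip
from collections import defaultdict


def group_by_variant(lines):
    """Group VCF data lines by (chrom, pos, ref, alt).

    Each value is a list of INFO dicts, one per variant-transcript line.
    """
    groups = defaultdict(list)
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = {}
        for item in fields[7].split(";"):
            key, sep, value = item.partition("=")
            # Flag-style entries without "=" are stored as True.
            info[key] = value if sep else True
        groups[(chrom, int(pos), ref, alt)].append(info)
    return dict(groups)


def load_mismatches(path):
    """Read a gzipped VCF and return its lines grouped by variant."""
    with gzip.open(path, "rt") as handle:
        return group_by_variant(handle)
```

For example, `load_mismatches("hgvs_mismatch.vcf.gz")` would return a dict whose values list every transcript-level record for a given variant.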