VEP annotator based on TinDaisy-Core and GenomeVIP
This branch generates annotations with different VEP versions, including:
- VEP v95
- VEP v99 / GENCODE v33
- VEP v100 / GENCODE v34
- VEP v102 / GENCODE v36
- VEP v109
Each version has its own Docker image tag:
IMAGE_V99="mwyczalkowski/vep-annotate:20220505-v99"
IMAGE_V100="mwyczalkowski/vep-annotate:20220505-v100"
IMAGE_V102="mwyczalkowski/vep-annotate:20220505-v102"
IMAGE_V95="mwyczalkowski/vep-annotate:20230530-v95"
IMAGE_V109="mwyczalkowski/vep-annotate:20230530-v109"
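As a convenience, the mapping from VEP version to image tag can be wrapped in a small shell function. This is only a sketch: the function name select_image is hypothetical and is not part of this repository; the tags are copied from the list above.

```shell
# Hypothetical helper: map a VEP version number to its Docker image tag.
select_image() {
    case "$1" in
        95)  echo "mwyczalkowski/vep-annotate:20230530-v95" ;;
        99)  echo "mwyczalkowski/vep-annotate:20220505-v99" ;;
        100) echo "mwyczalkowski/vep-annotate:20220505-v100" ;;
        102) echo "mwyczalkowski/vep-annotate:20220505-v102" ;;
        109) echo "mwyczalkowski/vep-annotate:20230530-v109" ;;
        *)   echo "Unsupported VEP version: $1" >&2; return 1 ;;
    esac
}

# Example: look up the image for VEP v102.
IMAGE=$(select_image 102)
echo "$IMAGE"
```

The resulting tag can then be passed to docker pull "$IMAGE".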
This tool supports either a VEP cache or online VEP database lookups, and can stage a compressed version of the VEP cache for Cromwell applications.
It is recommended to use a VEP cache in production runs for better performance. However, online VEP lookups allow for VEP annotation without having to install a large VEP cache file. Which mode is used depends on how VEP annotation is invoked and the arguments passed.
When src/vep_annotate.sh is invoked, it accepts the arguments cache_dir and cache_gz.
- If cache_dir is defined, it gives the location of the VEP cache.
  - This provides the fastest execution because the cache is already available and does not need to be staged.
- If cache_dir is not defined and cache_gz is defined, the contents of cache_gz will be staged.
  - Staging extracts the contents of cache_gz into "./vep-cache" and uses that directory as the VEP cache.
  - This is required for Cromwell runs, which do not stage directories.
- If neither cache_dir nor cache_gz is defined, online VEP database lookups are performed.
  - Online VEP database lookups a) use the online database (so the cache does not need to be installed) and b) do not use tmp files. This mode is meant for testing and lightweight applications.
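The selection rules above can be sketched in shell as follows. This is an illustration, not the actual code in src/vep_annotate.sh: the function names vep_cache_mode and stage_cache and the example archive path are hypothetical.

```shell
# Hypothetical helper: report which mode the rules above would select.
# $1 = cache_dir (may be empty), $2 = cache_gz (may be empty)
vep_cache_mode() {
    if [ -n "$1" ]; then
        echo "cache_dir"      # use the existing cache directory directly
    elif [ -n "$2" ]; then
        echo "cache_gz"       # stage the archive into ./vep-cache first
    else
        echo "online"         # no cache given: online VEP database lookups
    fi
}

# Hypothetical helper: extract a gzipped tar archive into ./vep-cache.
# $1 = path to cache_gz
stage_cache() {
    mkdir -p ./vep-cache && tar -zxf "$1" -C ./vep-cache
}

# Example with an assumed archive path and no cache_dir.
vep_cache_mode "" "/data/vep-cache.tar.gz"   # prints: cache_gz
```

Checking cache_dir first mirrors the precedence described above: a cache directory that is already present wins over an archive that would have to be extracted.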
For CWL invocation, cache_dir is not a supported option because Cromwell does not copy directories; this may change in the future or for different CWL engines. For the VEP cache to be used (highly recommended), a compressed archive of the VEP cache contents, created with tar -zcf, must be provided.
VEP cache creation is described in install/README.md.
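A minimal sketch of building and inspecting such an archive is shown below. The function names and default paths are hypothetical, and the sketch assumes the archive should contain paths relative to the cache root; install/README.md remains the authoritative description of the expected layout.

```shell
# Hypothetical paths; adjust to the local installation.
CACHE_ROOT="${CACHE_ROOT:-/path/to/vep-cache}"
CACHE_GZ="${CACHE_GZ:-vep-cache.tar.gz}"

# Archive the contents of a cache directory with tar -zcf.
# $1 = cache root directory, $2 = output archive (keep it outside $1)
make_cache_gz() {
    tar -zcf "$2" -C "$1" .
}

# List the first few archive entries to check the layout before use.
# $1 = archive to inspect
peek_cache_gz() {
    tar -ztf "$1" | head -n 5
}
```

Typical use would be make_cache_gz "$CACHE_ROOT" "$CACHE_GZ" followed by peek_cache_gz "$CACHE_GZ".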
Arbitrary arguments may be passed to VEP with the --vep_opts argument to src/vep_annotate.sh.
VEP arguments will in general differ for different pipelines, with TinDaisy and TinJasmine using different values.
At this time, --vep_opts is hard-coded in the CWL, requiring different CWL for the two pipelines. Future development may make this more general.
Note also that to use GRCh37, the following additional VEP argument is required: --port 3337
The VEP argument --flag_pick is always added to the VEP invocation.
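To illustrate how these pieces combine, the sketch below assembles a VEP option string from user-supplied options, the GRCh37 port argument, and --flag_pick. The function build_vep_opts is hypothetical and does not reflect how src/vep_annotate.sh is implemented; in particular, the script may expect --port 3337 to be supplied by the caller inside --vep_opts rather than added automatically. The options --hgvs and --symbol are used only as example VEP flags.

```shell
# Hypothetical helper: assemble the extra options handed to VEP.
# $1 = user-supplied options (as given via --vep_opts), $2 = assembly name
build_vep_opts() {
    opts="$1"
    # GRCh37 needs the additional --port 3337 argument.
    if [ "$2" = "GRCh37" ]; then
        opts="$opts --port 3337"
    fi
    # --flag_pick is always added to the VEP invocation.
    opts="$opts --flag_pick"
    # Trim the leading space left when no user options were given.
    echo "$opts" | sed 's/^ *//'
}

build_vep_opts "--hgvs --symbol" "GRCh37"
# prints: --hgvs --symbol --port 3337 --flag_pick
```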
Support for VEP custom annotation is provided with the following two arguments:
--custom_filename s: Path to VEP custom annotation file.
--custom_args s: Arguments passed to VEP custom annotation. Required if --custom_filename is defined.
Example of custom_args: ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN
A ClinVar VEP custom annotation file can be obtained here: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/
Note that the associated .tbi index file must also be provided.
Custom annotation is required for identifying ClinVar variants.
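The sketch below shows one way the two custom annotation arguments could be validated and joined into VEP's --custom option, which in the VEP versions listed here takes the file name followed by comma-separated fields. The function build_custom_opt is hypothetical, and the assumption that src/vep_annotate.sh joins the values in exactly this way is not confirmed by this document.

```shell
# Hypothetical helper: build the VEP --custom option from the two arguments.
# $1 = custom_filename, $2 = custom_args
build_custom_opt() {
    # custom_args is required whenever custom_filename is defined.
    if [ -z "$2" ]; then
        echo "custom_args is required when custom_filename is defined" >&2
        return 1
    fi
    # The .tbi index is assumed to sit next to the annotation file.
    if [ ! -f "$1.tbi" ]; then
        echo "Missing index file: $1.tbi" >&2
        return 1
    fi
    echo "--custom $1,$2"
}
```

For example, build_custom_opt clinvar.vcf.gz "ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN" would fail unless clinvar.vcf.gz.tbi exists in the same directory (the file name here is only an example).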
- Matthew Wyczalkowski m.wyczalkowski@wustl.edu
- Song Cao scao@wustl.edu
- Jay Mashl rmashl@wustl.edu