- annotate variants with gnomad allele frequencies from combined exomes + whole genomes at > 30K variants/second using only a 1.5GB compressed annotation file.
- call de novo variants with a simple expression that uses mom, dad, and kid labels and is applied to each trio in a cohort (as inferred from a pedigree file).
kid.het && mom.hom_ref && dad.hom_ref && kid.DP > 10 && mom.DP > 10 && dad.DP > 10
- define and filter on arbitrary groups with labels. For example, 7 sets of samples each with 1 normal and 3 tumor time-points:
normal.AD[1] == 0 && tumor1.AB < tumor2.AB && tumor2.AB < tumor3.AB
- filter variants with simple expressions:
variant.call_rate > 0.9 && variant.FILTER == "PASS" && INFO.AC < 22 && variant.num_hom_alt == 0
- see using slivar for rare disease research
slivar has sub-commands:
- expr: filter and/or annotate with INFO, trio, sample, group expressions
- make-gnotate: make a compressed zip file of annotations for use by slivar
- compound-hets: true compound hets using phase-by-inheritance within gene annotations
Table of Contents
- How it works
- Gnotation Files
get the latest binary from: https://github.com/brentp/slivar/releases/latest
slivar_static does not depend on any libraries and should work on any 64 bit linux system.
slivar_shared will require libhts.so (from htslib) to be in the usual places or in a directory indicated in
or use via docker from: brentp/slivar:latest
expr allows filtering on (abstracted) trios and groups. For example, given a VCF (and ped/fam file) with multiple trios, slivar will apply an expression with kid, mom, dad identifiers to each trio that it automatically finds.
expr can also be used, for example, to annotate with population allele frequencies from a gnotate file without any sample filtering. See the wiki for more detail and the gnotate section for gnotation files that we distribute for slivar.
When --trio is used, slivar finds all trios in a VCF, PED pair and lets the user specify an expression with the identifiers kid, mom, and dad that is applied to each possible trio. For example, a simple expression to call de novo variants:
variant.FILTER == 'PASS' && \
variant.call_rate > 0.95 && \                 # genotype must be known for most of cohort.
INFO.gnomad_af < 0.001 && \                   # rare in gnomad (must be in INFO [but see below])
kid.het && mom.hom_ref && dad.hom_ref && \    # also unknown
kid.DP > 7 && mom.DP > 7 && dad.DP > 7 && \   # sufficient depth in all
(mom.AD[1] + dad.AD[1]) == 0                  # no evidence for alternate in the parents
This selects passing variants that are rare in gnomad, have the expected genotypes, and show no alternate-allele evidence in the parents. If there are 200 trios in the ped::vcf given, then this expression will be tested on each of those 200 trios.
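As a rough illustration of "one expression, tested on every trio", the sketch below evaluates the same de novo predicate against two invented trios in plain JavaScript. The `makeSample` helper, the sample ids, and all values are hypothetical, and `AD` is assumed to be a `[ref, alt]` array of allele depths; this is not slivar's internal code.

```javascript
// The de novo expression from the text, written as a plain predicate.
function isDenovo(variant, INFO, kid, mom, dad) {
  return variant.FILTER === "PASS" &&
    variant.call_rate > 0.95 &&
    INFO.gnomad_af < 0.001 &&
    kid.het && mom.hom_ref && dad.hom_ref &&
    kid.DP > 7 && mom.DP > 7 && dad.DP > 7 &&
    (mom.AD[1] + dad.AD[1]) === 0;
}

// Hypothetical helper: builds a sample object with the genotype flags
// derived from alts (0 hom-ref, 1 het, 2 hom-alt, -1 unknown).
function makeSample(id, alts, DP, AD) {
  return {
    id, alts, DP, AD,
    hom_ref: alts === 0, het: alts === 1, hom_alt: alts === 2, unknown: alts === -1,
  };
}

// Invented data for a single variant.
const variant = { FILTER: "PASS", call_rate: 0.99 };
const INFO = { gnomad_af: 0.0001 };
const trios = [
  { kid: makeSample("k1", 1, 30, [15, 15]), mom: makeSample("m1", 0, 25, [25, 0]), dad: makeSample("d1", 0, 28, [28, 0]) },
  { kid: makeSample("k2", 1, 30, [14, 16]), mom: makeSample("m2", 1, 22, [10, 12]), dad: makeSample("d2", 0, 31, [31, 0]) },
];

// The same predicate is tested on each trio; only the first one passes.
const denovoKids = trios
  .filter((t) => isDenovo(variant, INFO, t.kid, t.mom, t.dad))
  .map((t) => t.kid.id);
console.log(denovoKids);
```

The second trio fails because the mother is heterozygous, so the variant is inherited rather than de novo.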
slivar does not give direct access to the genotypes, instead exposing hom_ref, het, hom_alt, and unknown, or via alts, where 0 is homozygous reference, 1 is heterozygous, 2 is homozygous alternate, and -1 means the genotype is unknown. It is recommended to decompose a VCF before sending it to slivar.
Here it is assumed that
trio_autosomal_recessive is defined in
slivar-functions.js; an example implementation of that
and other useful functions is provided here
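For orientation only, a function of this kind could look like the sketch below. It is a hypothetical, minimal version built on the alts encoding (0, 1, 2, -1); the function actually distributed in the project's slivar-functions.js may differ and may add depth or quality checks.

```javascript
// Hypothetical sketch, not the project's implementation.
// Assumes each sample object carries `alts`:
// 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, -1 = unknown.
function trio_autosomal_recessive(kid, mom, dad) {
  // Affected child is homozygous alternate; both parents are carriers.
  return kid.alts === 2 && mom.alts === 1 && dad.alts === 1;
}

// Invented genotypes for two trios at one variant.
const carrierParents = trio_autosomal_recessive({ alts: 2 }, { alts: 1 }, { alts: 1 });
const unknownDad = trio_autosomal_recessive({ alts: 2 }, { alts: 1 }, { alts: -1 });
console.log(carrierParents, unknownDad); // true false
```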
trio is a special-case of a group that can be inferred from a pedigree. For more specialized use-cases, a group can be specified. For example we could, instead of using --trio, use a group file like:
#kid	mom	dad
sample1	sample2	sample3
sample4	sample5	sample6
sample7	sample8	sample9
Here we have specified 3 trios below a header with their "labels". The same can be accomplished using --trio, but with a group file we can, for example, specify quartets like this:
#kid	mom	dad	sibling
sample1	sample2	sample3	sample10
sample4	sample5	sample6	sample11
sample7	sample8	sample9	sample12
sample10 will be available as "sibling" in the first family and an expression like:
kid.alts == 1 && mom.alts == 0 && dad.alts == 0 && sibling.alts == 0
could be specified and it would automatically be applied to each of the 3 families.
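The label-to-sample mapping can be pictured with the following sketch, which parses a quartet group file and tests the expression above against each row. It assumes whitespace-delimited columns and uses an invented genotype lookup (`altsById`); the `parseGroups` helper is hypothetical and is not how slivar itself is implemented.

```javascript
// The quartet group file from the text, as a string.
const groupText = [
  "#kid\tmom\tdad\tsibling",
  "sample1\tsample2\tsample3\tsample10",
  "sample4\tsample5\tsample6\tsample11",
  "sample7\tsample8\tsample9\tsample12",
].join("\n");

// Hypothetical parser: the header supplies the labels, and each following
// row maps those labels to sample ids.
function parseGroups(text) {
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const labels = lines[0].replace(/^#/, "").trim().split(/\s+/);
  return lines.slice(1).map((line) => {
    const ids = line.trim().split(/\s+/);
    const group = {};
    labels.forEach((label, i) => { group[label] = ids[i]; });
    return group;
  });
}

// Invented genotypes for one variant: sample id -> alts (unlisted ids are 0).
const altsById = { sample1: 1, sample4: 1, sample5: 1 };
const altsOf = (id) => (id in altsById ? altsById[id] : 0);

// One expression, applied to each of the 3 families.
const families = parseGroups(groupText);
const passingKids = families
  .filter((g) => altsOf(g.kid) === 1 && altsOf(g.mom) === 0 &&
                 altsOf(g.dad) === 0 && altsOf(g.sibling) === 0)
  .map((g) => g.kid);
console.log(passingKids);
```

Only the first family passes here: in the second the mother is also heterozygous, and in the third the kid is homozygous reference.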
Another example could be looking at somatic variants with 3 samples, each with a normal and 4 time-points of a tumor:
#normal	tumor1	tumor2	tumor3	tumor4
ss1	ss8	ss9	ss10	ss11
ss2	ss12	ss13	ss14	ss15
ss3	ss16	ss17	ss18	ss19
where, again, each row is a sample and the IDs (starting with "ss") will be injected for each sample to allow a single expression like:
normal.hom_ref && normal.DP > 10 \
  && tumor1.AB > 0 \
  && tumor1.AB < tumor2.AB \
  && tumor2.AB < tumor3.AB \
  && tumor3.AB < tumor4.AB
to find a somatic variant that has increasing frequency (AB is allele balance) along the tumor time-points. More detail on groups is provided here
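To make the "increasing frequency" condition concrete, here is a small sketch that evaluates the same comparison chain on two invented rows, one with rising allele balance and one that plateaus. The objects and values are hypothetical stand-ins for what the labels would resolve to.

```javascript
// The somatic expression from the text as a predicate over one labelled row.
function increasingAlongTumor(g) {
  return g.normal.hom_ref && g.normal.DP > 10 &&
    g.tumor1.AB > 0 &&
    g.tumor1.AB < g.tumor2.AB &&
    g.tumor2.AB < g.tumor3.AB &&
    g.tumor3.AB < g.tumor4.AB;
}

// Invented allele balances for one variant in two sample sets.
const rising = {
  normal: { hom_ref: true, DP: 40 },
  tumor1: { AB: 0.05 }, tumor2: { AB: 0.12 }, tumor3: { AB: 0.2 }, tumor4: { AB: 0.35 },
};
const plateau = {
  normal: { hom_ref: true, DP: 40 },
  tumor1: { AB: 0.05 }, tumor2: { AB: 0.12 }, tumor3: { AB: 0.12 }, tumor4: { AB: 0.35 },
};

const risingPasses = increasingAlongTumor(rising);
const plateauPasses = increasingAlongTumor(plateau);
console.log(risingPasses, plateauPasses); // true false
```

Because every comparison is strict, a time-point that merely holds steady is enough to reject the variant.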
Users can specify a boolean expression that is tested against each
sample using e.g.:
--sample-expr "hi_quality:sample.DP && sample.GQ > 10"
Each sample that passes this expression will have its sample id appended to the hi_quality INFO field, which is added to the output VCF.
Users can make their own
gnotate files like:
slivar make-gnotate --prefix gnomad \
    --field AF_popmax:gnomad_popmax_af \
    --field nhomalt:gnomad_num_homalt \
    gnomad.exomes.r2.1.sites.vcf.gz gnomad.genomes.r2.1.sites.vcf.gz
This will pull AF_popmax and nhomalt from the INFO field and put them into the resulting zip file as gnomad_popmax_af and gnomad_num_homalt.
The resulting zip file will contain the union of values seen in the exome and genomes files with the maximum value for any intersection.
Note that the names (gnomad_popmax_af and gnomad_num_homalt in this case) should be chosen carefully, as those will be the names added to the INFO of any file annotated with the resulting zip file.
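The "union of values, maximum on intersection" rule can be sketched as follows. This is only an illustration of the merge semantics on invented data; the variant keys and the `mergeMax` helper are hypothetical and say nothing about the actual on-disk encoding of a gnotate file.

```javascript
// Merge two maps of variant key -> value, keeping every key (union) and the
// larger value when a key appears in both inputs.
function mergeMax(a, b) {
  const merged = new Map(a);
  for (const [key, value] of b) {
    const seen = merged.get(key);
    merged.set(key, seen === undefined ? value : Math.max(seen, value));
  }
  return merged;
}

// Invented popmax allele frequencies keyed by "chrom:pos:ref:alt".
const exomes = new Map([["1:1000:A:T", 0.001], ["1:2000:G:C", 0.02]]);
const genomes = new Map([["1:2000:G:C", 0.05], ["1:3000:T:G", 0.0003]]);

const combined = mergeMax(exomes, genomes);
console.log(combined.size, combined.get("1:2000:G:C")); // 3 0.05
```

Taking the maximum is the conservative choice for rare-variant filtering: a variant is only treated as rare if it is rare in both sources.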
More information on
make-gnotate is in the wiki
This command is used to find compound heterozygous variants (with phasing-by-inheritance) in trios. It is used after filtering to rare(-ish) heterozygotes.
See a full description of use here
This command is used to convert a filtered and annotated VCF to a TSV (tab-separated value file) for final examination. An example use is:
slivar tsv -p $ped \
    -s denovo -s x_recessive \
    -c CSQ \
    -i gnomad_popmax_af -i gnomad_nhomalt \
    -g gene_desc.txt -g clinvar_gene_desc.txt \
    $vcf > final.tsv
Here, denovo and x_recessive indicate the INFO fields that contain lists of samples (as added by slivar) that should be extracted, and gnomad_popmax_af and gnomad_nhomalt are pulled from the INFO field.
The -c argument (CSQ) tells slivar that it can get gene, transcript and impact information from the CSQ field in the INFO.
The -g arguments are tab-delimited files of gene -> description where the description is added to the text output for quick inspection.
Run slivar tsv without any arguments for examples on how to create these for pLI and clinvar.
Also see the wiki
slivar duo-del finds structural deletions in parent-child duos using non-transmission of alleles. This can find deletions in exome data using genotypes, thereby avoiding the problems associated with depth-based CNV calling in exomes.
anything in the INFO is available as e.g. INFO.CSQ
if FORMAT.AB is not present, it is added so one can filter with kid.AB > 0.25 && kid.AB < 0.75
variant attributes are:
calculated variant attributes include:
sample attributes (via kid, mom, dad) include anything in the FORMAT, available as e.g. kid.DP
sample attributes hom_ref, het, hom_alt, and unknown, which are synonyms for sample.alts of 0, 1, 2, -1 respectively.
sample attributes from the ped, including id, are available as, e.g. kid.sex.
sample relations are available as mom, dad, and kids; mom and dad will be undefined if not available and kids will be an empty array.
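The relationship between alts, the genotype synonyms, and AB can be sketched as below. The `describeSample` helper is hypothetical, and two details are assumptions rather than documented behavior: that AB is computed as alt depth divided by (ref + alt) depth from AD, and that a zero-depth sample gets an AB of 0.

```javascript
// Hypothetical helper that derives the genotype synonyms and an allele
// balance from `alts` and a [ref, alt] allele-depth array.
function describeSample(alts, AD) {
  const [refDepth, altDepth] = AD;
  const total = refDepth + altDepth;
  return {
    alts, AD,
    hom_ref: alts === 0,
    het: alts === 1,
    hom_alt: alts === 2,
    unknown: alts === -1,
    // Assumption: AB = alt / (ref + alt); 0 when there is no depth.
    AB: total > 0 ? altDepth / total : 0,
  };
}

// Invented heterozygous kid with 18 reference and 22 alternate reads.
const kid = describeSample(1, [18, 22]);
const balancedHet = kid.het && kid.AB > 0.25 && kid.AB < 0.75;
console.log(kid.AB, balancedHet); // 0.55 true
```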
How it works
For each variant, each trio (and each sample), slivar fills the appropriate attributes. This can be intensive for VCFs with many samples, but it is done as efficiently as possible, such that slivar can evaluate tens of thousands of variants per second even with dozens of trios.
slivar outputs a summary table with one row per sample and one column per expression, where each value indicates the number of variants that passed that expression in that sample. By default, this goes to STDOUT, but if the environment variable SLIVAR_SUMMARY_FILE is set, slivar will write the summary to that file instead.
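The shape of such a summary can be sketched as a simple tally. The record format, the `summarize` and `toTsv` helpers, and the tab-separated layout below are illustrative assumptions, not slivar's actual output format.

```javascript
// Count, per sample and per expression name, how many variants passed.
function summarize(passes) {
  const table = {};
  for (const { sample, expr } of passes) {
    if (!table[sample]) table[sample] = {};
    table[sample][expr] = (table[sample][expr] || 0) + 1;
  }
  return table;
}

// Render rows of samples and columns of expressions as tab-separated text.
function toTsv(table, exprs) {
  const header = ["sample", ...exprs].join("\t");
  const rows = Object.keys(table).sort().map((sample) =>
    [sample, ...exprs.map((e) => table[sample][e] || 0)].join("\t"));
  return [header, ...rows].join("\n");
}

// Invented pass events: one record per (variant, sample, expression) hit.
const passes = [
  { sample: "kid1", expr: "denovo" },
  { sample: "kid1", expr: "denovo" },
  { sample: "kid1", expr: "x_recessive" },
  { sample: "kid2", expr: "denovo" },
];

const summary = summarize(passes);
const summaryTsv = toTsv(summary, ["denovo", "x_recessive"]);
console.log(summaryTsv);
```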
Users can create their own gnotation files with
slivar make-gnotate, but we provide:
gnomad for hg37 with AF popmax, numhomalts (total and controls only) here
gnomad for hg38 with AF popmax, numhomalts (updated in release v0.1.2) here
spliceai scores (maximum value of the 4 scores in spliceai) here
The available fields can be seen with, for example:
$ unzip -l gnomad.hg38.v2.zip | grep -oP "gnotate-[^.]+" | sort -u
gnotate-gnomad_nhomalt
gnotate-gnomad_nhomalt_controls
gnotate-gnomad_popmax_af
gnotate-gnomad_popmax_af_controls
gnotate-variant
INFO.gnomad_nhomalt, INFO.gnomad_nhomalt_controls, INFO.gnomad_popmax_af, and INFO.gnomad_popmax_af_controls will be the fields after they are added to the INFO.