Brent Pedersen edited this page Jul 25, 2023 · 4 revisions

TLDR

slivar expr \
    --js js/csq.js \
    --info "INFO.gnomad_af < 0.05 && CSQs(INFO.CSQ, VCF.CSQ, ['SIFT']).some(function(csq) { return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05 })" \
    ...

Using CSQ information in slivar

The consequence field in the VCF is an unstructured, "|"-delimited field that contains transcript-specific information about a variant. Most commonly, it indicates the effect (consequence) of the variant on each transcript--such as missense, stop_gain, etc.

slivar contains javascript in js/csq.js to facilitate working with these. That code can be concatenated with js/slivar-functions.js or your own javascript to provide the following functionality.

Note that:

  1. Using this has a performance cost, so it's best to put it at the end of the --info expression, where it is evaluated only when the preceding conditions pass.
  2. There will often be multiple CSQs for each variant.
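
The ordering advice in note 1 relies on the fact that `&&` in javascript short-circuits: the right-hand side is not evaluated when the left-hand side is false. A minimal sketch in plain javascript, outside slivar; `expensiveCheck` and the allele frequencies are hypothetical stand-ins for the `CSQs(...).some(...)` call and for `INFO.gnomad_af`:

```javascript
// Hypothetical stand-in for the costly CSQs(...).some(...) call.
// It counts how many times it is actually invoked.
let expensiveCalls = 0;
function expensiveCheck() {
    expensiveCalls += 1;
    return true;
}

// Made-up allele frequencies; only one is below the 0.05 cutoff.
const alleleFrequencies = [0.20, 0.01, 0.30];

const passing = alleleFrequencies.filter(function (af) {
    // `&&` short-circuits: expensiveCheck() runs only when af < 0.05 is true.
    return af < 0.05 && expensiveCheck();
});

console.log(passing.length, expensiveCalls); // prints: 1 1
```

Here the costly function runs once for three inputs, because the cheap frequency test rejects the other two first.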

The user must pass INFO.CSQ (or INFO.BCSQ or INFO.ANN) and VCF.CSQ (or VCF.BCSQ ...), which contains the list of field names present in each consequence, to the CSQs function as follows:

CSQs(INFO.CSQ, VCF.CSQ, [])

where the final argument (here []) is an array of fields that should be converted from String to Number--this will likely include any allele frequencies along with scores such as SIFT, but it can be empty if the user does not need to access any of the numeric fields.

CSQs returns an array of CSQ objects. A CSQ object is simply a javascript object with keys as defined in the CSQ header and values from that particular variant, so given a CSQ header from VEP like this:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: \
    Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE">

The keys of the CSQ object are (NOTE all caps for keys): ['CONSEQUENCE', 'CODONS', 'AMINO_ACIDS', 'GENE', 'SYMBOL', 'FEATURE', 'EXON', 'POLYPHEN', 'SIFT', 'PROTEIN_POSITION', 'BIOTYPE']. So we can access, e.g. csq.GENE. Given a variant with a CSQ field like:

CSQ=upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000456328|||||processed_transcript,\ # newline added for clarity
  downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000488147|||||unprocessed_pseudogene

(NOTE there are multiple transcripts separated by ",") then csq.SYMBOL would give DDX11L1 for the first transcript and WASH7P for the second, while csq.GENE would give the Ensembl gene IDs ENSG00000223972 and ENSG00000227232.
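
To make the mapping from header to keys concrete, here is a rough sketch of how such a value could be turned into objects. `parseCsqSketch` is a hypothetical helper written for this illustration; it is not the code in js/csq.js, and its handling of empty numeric fields (converted to NaN) is an assumption of the sketch, not documented slivar behavior.

```javascript
// Hypothetical helper for illustration only; NOT the implementation in js/csq.js.
function parseCsqSketch(value, description, numericFields) {
    // Field names follow "Format: " in the header description; keys are upper-cased.
    const keys = description.split("Format: ")[1].split("|").map(function (k) {
        return k.trim().toUpperCase();
    });
    // One entry per transcript, separated by ",".
    return value.split(",").map(function (entry) {
        const parts = entry.split("|");
        const csq = {};
        keys.forEach(function (key, i) {
            const raw = parts[i] === undefined ? "" : parts[i];
            if (numericFields.indexOf(key) !== -1) {
                // Sketch-only choice: empty -> NaN, so `csq.SIFT < 0.05` is false.
                csq[key] = raw === "" ? NaN : Number(raw);
            } else {
                csq[key] = raw;
            }
        });
        return csq;
    });
}

// Header description and CSQ value copied from the example above.
const description = "Consequence type as predicted by VEP. Format: " +
    "Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE";
const value =
    "upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000456328|||||processed_transcript," +
    "downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000488147|||||unprocessed_pseudogene";

const csqs = parseCsqSketch(value, description, ["SIFT"]);
csqs.forEach(function (csq) {
    console.log(csq.CONSEQUENCE, csq.GENE, csq.SYMBOL, csq.BIOTYPE);
});
```

Each object carries one transcript's values, so `csqs[0].SYMBOL` is DDX11L1 and `csqs[1].SYMBOL` is WASH7P.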

It's likely we want to check that at least one of the CSQs for each variant meets some criterion. In javascript, we can do this as:

my_csqs.some(check_fn)

where check_fn is a function that accepts a single CSQ object:

function check_fn(csq) {
    return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05
}
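
As a quick check of how `some` behaves, the same function can be run in plain javascript against hand-built objects. The `my_csqs` array below is a made-up stand-in for what `CSQs(...)` returns, with only the two fields the check reads:

```javascript
// Made-up stand-ins for CSQ objects; values are for illustration only.
const my_csqs = [
    { CONSEQUENCE: "synonymous", SIFT: 0.40 },
    { CONSEQUENCE: "missense", SIFT: 0.01 },
];

function check_fn(csq) {
    return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05;
}

// true: the second transcript satisfies the check.
console.log(my_csqs.some(check_fn));
// false: `every` would require all transcripts to satisfy it.
console.log(my_csqs.every(check_fn));
// false: `some` on an empty array is always false.
console.log([].some(check_fn));
```

`some` stops at the first match, so a variant passes as soon as any one of its transcripts meets the criterion.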

Or we can put it all into a single expression with an anonymous function to send to slivar:

slivar expr \
    --js js/csq.js \
    --info "INFO.gnomad_af < 0.05 && CSQs(INFO.CSQ, VCF.CSQ, ['SIFT']).some(function(csq) { return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05 })" \
    ...