
Add MechPredict plugin #772

Merged
jamie-m-a merged 27 commits into Ensembl:main from ainefairbrother:MechPredict-devel
Jun 5, 2025

Contributor

@ainefairbrother ainefairbrother commented Feb 5, 2025

JIRA ticket: ENSVAR-6662

Description

This PR adds the MechPredict plugin, which annotates missense variants with one of three predicted gene-level mechanisms:

  • Dominant-negative (DN)
  • Gain-of-function (GOF)
  • Loss-of-function (LOF)

MechPredict does this by reading in gene-level probabilities predicted by an external model and assigning the most likely mechanism based on empirically derived cut-offs described in the related manuscript. For example, if gene A has the probability values DN = 0.2, GOF = 0.3 and LOF = 0.9, then the returned interpretation is "gene_predicted_as_associated_with_loss_of_function_mechanism".
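
A minimal shell sketch of the assignment step described above, for illustration only. It assumes the rule is "take the highest probability among those that meet their cut-off". The cut-off values (0.5 each), the function name interpret_mechanism, the behaviour when no cut-off is met, and the DN and GOF interpretation strings (formed by analogy with the LOF string quoted above) are all placeholders, not taken from the plugin or the manuscript.

```shell
#!/usr/bin/env bash
# Placeholder cut-offs: NOT the empirically derived values from the manuscript.
DN_CUTOFF=0.5   # hypothetical
GOF_CUTOFF=0.5  # hypothetical
LOF_CUTOFF=0.5  # hypothetical

# interpret_mechanism pDN pGOF pLOF
# Prints an interpretation string for the highest-probability mechanism that
# meets its cut-off; prints nothing if no cut-off is met (assumed behaviour).
interpret_mechanism() {
  awk -v dn="$1" -v gof="$2" -v lof="$3" \
      -v cdn="$DN_CUTOFF" -v cgof="$GOF_CUTOFF" -v clof="$LOF_CUTOFF" '
    BEGIN {
      best = ""; best_p = -1
      # "+ 0" forces numeric comparison of the command-line values.
      if (dn  + 0 >= cdn  + 0 && dn  + 0 > best_p) { best = "dominant_negative"; best_p = dn  + 0 }
      if (gof + 0 >= cgof + 0 && gof + 0 > best_p) { best = "gain_of_function";  best_p = gof + 0 }
      if (lof + 0 >= clof + 0 && lof + 0 > best_p) { best = "loss_of_function";  best_p = lof + 0 }
      if (best != "") print "gene_predicted_as_associated_with_" best "_mechanism"
    }'
}

# Gene A from the example above: DN = 0.2, GOF = 0.3, LOF = 0.9
interpret_mechanism 0.2 0.3 0.9
```

With the placeholder cut-offs, only LOF meets its threshold for gene A, so the sketch prints the loss-of-function interpretation. The plugin's real cut-offs and tie-breaking rules should be taken from the module itself.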

Notes

  • New VEP fields added by plugin
    • MechPredict_pDN: Numeric
    • MechPredict_pGOF: Numeric
    • MechPredict_pLOF: Numeric
    • MechPredict_interpretation: Character
  • The plugin only annotates transcript-variant pairs with missense_variant as the consequence. This is because the methods used by the authors to generate the predictions were optimised to assess missense mutations, the most common protein-altering mutations.
  • The plugin reads in MechPredict_input.tsv, which can be generated using the instructions in the module's header.
  • There is a known exception found during testing:
    • The 'Test with 50 missense variants - should annotate all' test annotates only 49 variants. I believe this is due to how VEP handles consequence severity: if a variant-transcript pair has more than one consequence, VEP assigns the more severe one.
    • As such, in the one unannotated case, start_lost is assigned over missense_variant, so missense_variant is removed as a consequence and the pair is not annotated by MechPredict.
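
One way to track down the unannotated record from the known exception is to list the data lines of the output VCF that carry no MechPredict interpretation. This is a sketch, not part of the plugin or its tests: the helper name list_unannotated is hypothetical, and it assumes, as the checks under Testing do, that annotated records contain the string "_mechanism".

```shell
#!/usr/bin/env bash
# list_unannotated VCF
# Prints CHROM, POS and ID (tab-separated) for every data record that does
# not contain a MechPredict interpretation string.
list_unannotated() {
  awk -F'\t' -v OFS='\t' '!/^#/ && !/_mechanism/ { print $1, $2, $3 }' "$1"
}

# Usage (path taken from the missense test in this PR):
#   list_unannotated /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf
```

For the missense test this should print a single record, whose consequence field can then be inspected to confirm that start_lost was reported instead of missense_variant.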

Testing

Test with 50 missense variants - should annotate all

# run vep with MechPredict
./vep --input_file /hps/software/users/ensembl/variation/fairbrot/data/test-data/clinvar_20210102_missense_50.vcf.gz \
--output_file /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf \
--format vcf \
--vcf \
--dir_plugins /hps/software/users/ensembl/variation/fairbrot/VEP_plugins \
--plugin MechPredict,file=/nfs/production/flicek/ensembl/variation/data/MechPredict/MechPredict_input.tsv \
--offline \
--cache \
--cache_version 113 \
--dir_cache /nfs/production/flicek/ensembl/variation/data/VEP/tabixconverted \
--assembly GRCh38 \
--fasta /nfs/production/flicek/ensembl/variation/data/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# check output - count the records that carry a MechPredict interpretation
# (49 of 50 expected; see the known exception under Notes)
grep -v "^#" /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf | \
    grep -c "_mechanism"

Test with 50 intron variants - should annotate none

# run vep with MechPredict
./vep --input_file /hps/software/users/ensembl/variation/fairbrot/data/test-data/clinvar_20210102_intron_50.vcf.gz \
--output_file /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_intron_out.vcf \
--format vcf \
--vcf \
--dir_plugins /hps/software/users/ensembl/variation/fairbrot/VEP_plugins \
--plugin MechPredict,file=/nfs/production/flicek/ensembl/variation/data/MechPredict/MechPredict_input.tsv \
--offline \
--cache \
--cache_version 113 \
--dir_cache /nfs/production/flicek/ensembl/variation/data/VEP/tabixconverted \
--assembly GRCh38 \
--fasta /nfs/production/flicek/ensembl/variation/data/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# check output - count the records that carry a MechPredict interpretation
# (0 expected, since intron variants are not annotated)
grep -v "^#" /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_intron_out.vcf | \
    grep -c "_mechanism"

@jamie-m-a jamie-m-a self-assigned this Feb 14, 2025
Member

@sarahhunt sarahhunt left a comment

Congratulations on your first plugin, @ainefairbrother!

I spotted a couple of typos and places where we can make the information we are supplying clearer.
There are also optimisations we can make by changing data structures; let me know if it's useful to talk about these.

Contributor

@jamie-m-a jamie-m-a left a comment

LGTM!

@jamie-m-a jamie-m-a merged commit 8ff05ff into Ensembl:main Jun 5, 2025