Fix multi-exon HGVSp parsing #1063

Merged
merged 2 commits into from
Feb 6, 2024

@nuno-agostinho nuno-agostinho commented Jan 18, 2024

ENSVAR-6167

Motivation

When an HGVSp such as ENSP00000346614.5:p.D237G refers to an amino acid whose codon spans two exons, VEP retrieves the wrong reference codon (GAG) instead of the expected one (GAT):

MSG: Sequence translated from reference (GAG -> E) does not match input sequence (D)

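This occurs because VEP fetches the codon from the contiguous genomic sequence (exon followed by intron) instead of skipping the intron. In the sequence below, exonic bases are in uppercase and intronic bases in lowercase: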
GAgtaagtggcgtatgtaaaattgtcattctacacaaaaaatcacgagcagagggcaaagtgaaatcgtggctgctttatcattaattttgcatgtgcagcggagagcttgtcctttgtgctctaaatccttgctacaaacggttacataaaagatctaagaaagtggagacaaaggaaggtgggtaaagttagaaggaaaaaaagagctagaaaagtgtgcaagtcacttcatacctgaattcttgacatttgactggaattgttctgattagaccatggtcctcaaggcatttcacagttttttttaagtctgcgctgccttaggggattttatccttgagacatccactggcttaactcaagtttccttcaaaatatgtagctaaatacagctgttcagctaatagctcagaggttctttggagaacaaatggaatgttatttactaatattacttgtggcatgttagcacttttgtgttctgccaagtgcttttgggtccattctcaaagccgccatggctaagctggtagtacgttggcgatggcccatatgggaagtggaagtggtagatcttcaggggactttcaaaatgctttgaatttaactctttcttcccctttattctaattcctagT

Logic

This issue was fixed by:

  • Fetching the cDNA sequence (i.e., ignoring the intron) before determining the nucleotide changes between the reference and replacement amino acids
  • Using the genomic sequence (including the whole intron) as the reference allele
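
The two steps above can be illustrated with a minimal Python sketch. This is not the VEP implementation (VEP is written in Perl and uses the Ensembl API); it only assumes the convention from the Motivation section, where exonic bases are uppercase and intronic bases are lowercase. The function names, the shortened intron and the partial codon table are hypothetical and exist only for illustration.

```python
# Illustrative only: not the VEP code path.
# Assumed convention: exonic bases are uppercase, intronic bases are lowercase.

# Partial codon table, just enough for this example.
CODON_TABLE = {"GAT": "D", "GAG": "E", "GGT": "G"}


def naive_codon(genomic: str) -> str:
    """Buggy behaviour: take the first three genomic bases, intron included."""
    return genomic[:3].upper()


def spliced_codon(genomic: str) -> str:
    """Fixed behaviour: drop intronic (lowercase) bases, as in the cDNA."""
    return "".join(base for base in genomic if base.isupper())


def reference_allele(genomic: str) -> str:
    """The reference allele keeps the whole genomic span, intron included."""
    return genomic.upper()


# Shortened, hypothetical stand-in for the real intron shown in the Motivation.
genomic = "GAgtaagtggcgtattctaattcctagT"

print(CODON_TABLE[naive_codon(genomic)])    # E (does not match the input D)
print(CODON_TABLE[spliced_codon(genomic)])  # D (matches the input)
print(reference_allele(genomic))            # full span, used as the reference allele
```

Reading the codon from the spliced sequence yields GAT (D), which matches the input amino acid, while the genomic span is still kept whole for the reference allele.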

Testing

Examples of HGVSp where the codon spans two exons:

ENSP00000346614.5:p.Asp237Gly

ENSP00000489869.1:p.V108A
ENSP00000489869.1:p.V108S
ENSP00000489869.1:p.V108I
ENSP00000489869.1:p.H123T
ENSP00000489869.1:p.R136A
 
ENSP00000297350.4:p.G134I
ENSP00000297350.4:p.D273F
 
ENSP00000381693.2:p.P936S
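
To look for further test cases of this kind, one possible approach is to check whether the three coding positions of a residue straddle an exon junction. The sketch below assumes the lengths of the coding portion of each exon are already known; the function name and the example lengths are hypothetical and are not taken from any of the transcripts listed above.

```python
from itertools import accumulate


def codon_spans_exons(coding_exon_lengths, residue):
    """Return True if the codon for a 1-based residue crosses an exon junction.

    coding_exon_lengths: lengths of the coding portion of each exon, in
    transcript order (hypothetical input; real values come from the annotation).
    """
    first = 3 * residue - 2  # 1-based CDS position of the codon's first base
    last = 3 * residue       # 1-based CDS position of the codon's last base
    # Each junction is the CDS position of the last base of an exon,
    # for every exon except the final one.
    junctions = list(accumulate(coding_exon_lengths))[:-1]
    # The codon is split if an exon ends on its first or second base.
    return any(first <= junction < last for junction in junctions)


# Hypothetical coding exon lengths: junctions fall after CDS positions 100 and 150.
lengths = [100, 50, 80]
print(codon_spans_exons(lengths, 33))  # False: codon 97-99 lies within exon 1
print(codon_spans_exons(lengths, 34))  # True: codon 100-102 crosses the junction at 100
print(codon_spans_exons(lengths, 50))  # False: codon 148-150 ends exactly at the exon end
```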

Please also test with other HGVSp examples, such as those listed in https://www.ebi.ac.uk/seqdb/confluence/display/EV/HGVS#HGVS-SupportedHGVS

@nuno-agostinho nuno-agostinho changed the title Fix HGVSp parsing whose codons span two exons Fix HGVSp parsing whose respective codon spans two exons Jan 18, 2024
@nuno-agostinho nuno-agostinho marked this pull request as ready for review January 18, 2024 19:14
@nuno-agostinho nuno-agostinho changed the title Fix HGVSp parsing whose respective codon spans two exons Fix multi-exon HGVSp parsing Jan 18, 2024
@nakib103 nakib103 self-requested a review January 19, 2024 14:11
@nakib103 nakib103 self-assigned this Jan 19, 2024

@nakib103 nakib103 left a comment

LGTM! The code change makes sense and everything works fine!

@nakib103 nakib103 merged commit 1026793 into Ensembl:postreleasefix/112 Feb 6, 2024
1 check passed

nakib103 commented Feb 6, 2024

Merged to main and postreleasefix/112.

@nuno-agostinho nuno-agostinho deleted the fix/hgvsp branch February 6, 2024 09:22