Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HGVS to VCF Output for HGVS nomenclature with Duplication #67

Closed
Mchhu opened this issue Nov 21, 2023 · 1 comment
Closed

HGVS to VCF Output for HGVS nomenclature with Duplication #67

Mchhu opened this issue Nov 21, 2023 · 1 comment

Comments

@Mchhu
Copy link

Mchhu commented Nov 21, 2023

Using the example code to convert HGVS to VCF, when I put in a HGVS transcript with a duplication, the output for the ref and alt is incorrect. I have tested this with other HGVS nomenclature and this seems to only be affect variants that has a duplication. Below are a couple of examples with the input and outputs. I'm not sure what the problem is here.

Input: NM_021619.3:c.1071_1076dup
Output: ('NC_000009.11', 133557022, 'C', 'C')

Input: NM_181523.3:c.1299+80_1425+64dup
Output: ('NC_000005.9', 67589390, 'A', 'A')

import pyhgvs
from pysam.libcfaidx import FastaFile
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

genome = FastaFile("/cdot/GCF_000001405.25_GRCh37.p13_genomic.fna")
factory = RESTPyHGVSTranscriptFactory()
pyhgvs.parse_hgvs_name('NM_021619.3:c.1071_1076dup', genome, get_transcript=factory.get_transcript_grch37)
@davmlaw
Copy link
Contributor

davmlaw commented Nov 27, 2023

Hi, this is due to a bug in pyhgvs, see counsyl/hgvs#50

I made a pull request in March 2020 but it hasn't been accepted, there is no activity on the project and I think it has been abandoned. Using my code, I get:

In [4]: pyhgvs.parse_hgvs_name('NM_021619.3:c.1071_1076dup', genome, get_transcript=factory.get_transcript_grch37)
Out[4]: ('NC_000009.11', 133556992, 'T', 'TCGCCGC')

In [5]: pyhgvs.parse_hgvs_name('NM_181523.3:c.1299+80_1425+64dup', genome, get_transcript=factory.get_transcript_grch37)
Out[5]: 
('NC_000005.9',
 67589388,
 'A',
 'AGATGAATTACATGTAATCAAGTCTAAAAAACTTGACACTCGTAATTACATAATTGCAATTTTAAAGATGTTTCCATGTCAGCTATTTTGTTAAACAATTGTTATTTGATTAAATACCTTATCCATTGAATTTATTTTAATCTTTCTAGGATCAAGTTGTCAAAGAAGATAATATTGAAGCTGTAGGGAAAAAATTACATGAATATAACACTCAGTTTCAAGAAAAAAGTCGAGAATATGATAGATTATATGAAGAATATACCCGCACATCCCAGGTGAGTTTTCTATGAAAATCAGATTAAAAAATAAGAGTTCTAAACTTTTAAAGACTAACATG')

I think your options are:

  • Use biocommons HGVS
  • Use my fork of pyHGVS ie python3 -m pip install git+https://github.com/SACGF/hgvs#egg=pyhgvs - warning: I have abandoned my fork as I have moved to Biocommons HGVS

@davmlaw davmlaw closed this as completed Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants