[FEATURE] | HGVS addition to pipeline #8

Open
Fatimabp opened this issue Sep 7, 2021 · 4 comments

@Fatimabp

Fatimabp commented Sep 7, 2021

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered, if any
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@Fatimabp Fatimabp added the enhancement New feature or request label Sep 7, 2021
@G-kodes
Member

G-kodes commented Sep 7, 2021

There is a dedicated Python package for generating accurate HGVS notation. Ideally, I would like to implement that and use the generated HGVS notations to query the Ensembl database (this gives us the uniqueness guarantee of HGVS notation when querying variants). This would also facilitate the ALFA project integration mentioned in #9.
https://hgvs.readthedocs.io/en/stable/index.html
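
As a rough illustration of what querying Ensembl by HGVS notation could look like, here is a minimal sketch using only the Python standard library. It assumes the public Ensembl REST endpoint `vep/:species/hgvs/:hgvs_notation` is the lookup we would want; the thread does not settle on an endpoint, and the function names `vep_hgvs_url` and `fetch_vep` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server is the service being queried.
ENSEMBL_REST = "https://rest.ensembl.org"


def vep_hgvs_url(notation: str, species: str = "human") -> str:
    """Build the VEP-by-HGVS URL for one notation (hypothetical helper)."""
    # '>' is not URL-safe, so it is percent-encoded; ':' is kept readable.
    encoded = quote(notation, safe=":")
    return f"{ENSEMBL_REST}/vep/{species}/hgvs/{encoded}"


def fetch_vep(notation: str, timeout: float = 30.0):
    """Fetch the JSON response for one HGVS notation (performs a network call)."""
    request = Request(
        vep_hgvs_url(notation),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_vep() to actually query.
    print(vep_hgvs_url("NC_000019.10:g.40843345C>G"))
```

Keeping URL construction separate from the network call means the notation handling can be tested offline.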

@G-kodes G-kodes changed the title from "HGVS addition to pipeline" to "[FEATURE] | HGVS addition to pipeline" Sep 7, 2021
@G-kodes
Member

G-kodes commented Sep 7, 2021

It has been brought to my attention that the HGVS Python package is not yet compatible with Windows. The creators and maintainers of one of its dependencies have no intention of making their package Windows-compatible. However, it is an optional dependency, so a new roadmap item has been registered to mark it accordingly so that the HGVS package does not break on Windows. Until then, we will have to write this code on a Linux machine in order to debug it.

biocommons/hgvs#522

@G-kodes
Member

G-kodes commented Sep 8, 2021

I have performed a proof-of-concept test on a Linux machine using variant rs2259219 as a test reference. Using the following information:

Start Coordinate: 40843345
Stop Coordinate: 40843345
Reference Allele: C
Alternate Allele: G
Reference Sequence ID: NC_000019.10
Sequence Type: g (Genomic)

I managed to compile NC_000019.10:g.40843345C>G, which matches the notation provided by Ensembl. The next issue is making sure that, during our querying, we have access to all of this information so that we can construct HGVS notation per variant and use it as our new IDs.
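
For reference, the string assembly in this proof of concept can be sketched in plain Python, without the hgvs package. This is only an illustration of how the fields combine for a single-base genomic substitution, taking G as the alternate allele as in the resulting notation. The function name `genomic_substitution` is hypothetical, and the sketch does no validation against a reference sequence, which the hgvs package would be expected to handle.

```python
def genomic_substitution(accession: str, position: int, ref: str, alt: str) -> str:
    """Format a single-base genomic substitution as an HGVS g. string.

    Hypothetical helper: covers only the simple substitution case and does
    not check the reference allele against the actual sequence.
    """
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be a single base (A, C, G or T)")
    if ref == alt:
        raise ValueError("ref and alt are identical, so this is not a substitution")
    if position < 1:
        raise ValueError("HGVS genomic positions are 1-based")
    return f"{accession}:g.{position}{ref}>{alt}"


if __name__ == "__main__":
    # Values taken from the rs2259219 test above.
    print(genomic_substitution("NC_000019.10", 40843345, "C", "G"))
```

The per-variant fields our queries need to carry are therefore the reference sequence accession, the coordinate, and both alleles.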

@G-kodes G-kodes added this to To do in Pharmacogenetics Pipeline BETA-Release via automation Sep 14, 2021
@G-kodes G-kodes moved this from To do to In progress in Pharmacogenetics Pipeline BETA-Release Sep 14, 2021
@G-kodes G-kodes moved this from In progress to To do in Pharmacogenetics Pipeline BETA-Release Sep 14, 2021
@G-kodes G-kodes added this to the BETA Release milestone Sep 14, 2021
@Fatimabp
Author

Hi guys, I've been reading up on HGVS nomenclature and have a few concerns. Although HGVS is the most accurate way of representing variants, there do seem to be some issues because of the way things are named.

  1. We have to ensure we are using the correct version numbers. Some papers express HGVS with gene names, which makes it difficult to identify the protein isoform or version they are referring to.
  2. Repeat shifting: VCF deletions in repeats are shifted left, but in HGVS they are shifted right. If a variant is described at two different locations, it might not be identified as the same variant.
  3. The rules for HGVS nomenclature get insanely complicated to interpret and write, especially for introns and non-coding regions.

Let me know what you guys think.
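
To make the repeat-shifting concern in point 2 concrete, here is a small self-contained sketch. The toy sequence and the function names are made up for illustration; this is not code from the pipeline or from the hgvs package. It shows one deletion inside a `CA` repeat ending up at different coordinates depending on whether it is shifted left (VCF convention) or right (HGVS convention).

```python
def apply_deletion(seq: str, start: int, length: int) -> str:
    """Return seq with `length` bases removed, starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


def shift_left(seq: str, start: int, length: int) -> int:
    """Move a deletion as far left as possible without changing the result."""
    while start > 0 and seq[start - 1] == seq[start + length - 1]:
        start -= 1
    return start


def shift_right(seq: str, start: int, length: int) -> int:
    """Move a deletion as far right as possible without changing the result."""
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


if __name__ == "__main__":
    seq = "TTCACACAGG"   # toy sequence containing a CACACA repeat
    start, length = 4, 2  # delete the middle "CA" unit (0-based start)

    left = shift_left(seq, start, length)
    right = shift_right(seq, start, length)

    # Both placements produce the same sequence but different coordinates.
    assert apply_deletion(seq, left, length) == apply_deletion(seq, right, length)
    print(f"left-shifted (1-based):  {left + 1}_{left + length}")
    print(f"right-shifted (1-based): {right + 1}_{right + length}")
```

Comparing variants across the two conventions would therefore need a normalisation step that brings both descriptions to the same placement before matching on position.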
