Wrong HGVSc when using RefSeq transcripts #1064

Open
jpuntomarcos opened this issue Oct 13, 2021 · 5 comments

@jpuntomarcos

Describe the issue

HGVSc annotations are incorrect when the NM sequence differs from the GRCh37 sequence. For example, if I use VEP online (GRCh37) to query the 11:64572018_A/C variant with RefSeq transcripts, I get:
[Screenshot of the VEP results]

HGVSp and codons are correct: at this position, the NM transcript contains a G and the alternate allele provided is a T (minus strand), so the predicted GCA/TCA codons are right (and so is the HGVSp). However, the HGVSc notation seems wrong: NM_000244.3:c.1636A>T is actually what we obtain when annotating against Ensembl transcripts, or when using the --use_given_ref parameter, which uses the reference allele provided by the user.
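The reported codons can be cross-checked against the c. position. For a coding position c.N, the codon number is (N - 1) // 3 + 1 and the 0-based offset within the codon is (N - 1) % 3. The Python sketch below is only an illustration (the helper names are hypothetical and this is not VEP code); it assumes the GCA/TCA codons from the report cover c.1636 and derives the substitution they imply.

```python
from typing import Tuple


def codon_position(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position (c.N) to (codon number, 0-based offset in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3


def hgvsc_from_codons(c_pos: int, ref_codon: str, alt_codon: str) -> str:
    """Build a c. substitution from the reference and alternate codons covering c_pos."""
    _, offset = codon_position(c_pos)
    # The two codons should differ only at the offset that c_pos points to.
    for i, (ref, alt) in enumerate(zip(ref_codon, alt_codon)):
        if i != offset and ref != alt:
            raise ValueError(f"codons {ref_codon}/{alt_codon} differ outside offset {offset}")
    return f"c.{c_pos}{ref_codon[offset]}>{alt_codon[offset]}"


print(codon_position(1636))                   # (546, 0): first base of codon 546
print(hgvsc_from_codons(1636, "GCA", "TCA"))  # c.1636G>T
```

Under that assumption, the codons point to a G>T change at the first base of codon 546, i.e. c.1636G>T rather than the reported c.1636A>T.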

Thanks :)

Additional information

System

Full VEP command line

./vep --appris --buffer_size 500 --check_existing --distance 5000 --hgvs --mane --polyphen b --refseq --regulatory --sift b --species homo_sapiens --symbol --transcript_version --tsl --cache --input_file [input_data] --output_file [output_file] --port 3337
@dglemos self-assigned this on Oct 13, 2021
@dglemos
Contributor

dglemos commented Oct 13, 2021

Hi @jpuntomarcos,
In cases where RefSeq transcripts do not match the reference genome, we use alignment files provided by NCBI to create a modified reference that matches the transcript, and we use this for consequence calling. The HGVS calculation does not currently use this modified reference.
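To make the difference concrete, here is a minimal Python sketch (hypothetical helpers, not VEP's implementation) that describes a genomic SNV against a transcript on either strand. It assumes the transcript's own base at the position is available (for example, from an alignment-based correction like the one described above) and falls back to the genomic base otherwise. When the two disagree, the resulting c. description changes.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_transcript_strand(allele: str, strand: int) -> str:
    """Convert a forward-strand genomic allele to transcript orientation (strand 1 or -1)."""
    return allele if strand == 1 else allele.translate(COMPLEMENT)[::-1]


def hgvsc_snv(c_pos: int, ref: str, alt: str) -> str:
    """Describe a substitution; an allele identical to the reference is written as c.N=."""
    return f"c.{c_pos}=" if ref == alt else f"c.{c_pos}{ref}>{alt}"


def describe_allele(c_pos: int, genomic_ref: str, alt: str, strand: int,
                    transcript_ref: Optional[str] = None) -> str:
    """Describe a genomic SNV at c_pos, preferring the transcript's own reference base."""
    ref = (transcript_ref if transcript_ref is not None
           else to_transcript_strand(genomic_ref, strand))
    return hgvsc_snv(c_pos, ref, to_transcript_strand(alt, strand))


# Hypothetical minus-strand position c.100: the genome reads A on the + strand
# (T on the transcript strand), while the RefSeq transcript itself has G there.
print(describe_allele(100, "A", "C", -1))                      # c.100T>G (genome-based)
print(describe_allele(100, "A", "C", -1, transcript_ref="G"))  # c.100= (transcript-based)
print(describe_allele(100, "A", "A", -1, transcript_ref="G"))  # c.100G>T (genomic allele vs transcript)
```

The design choice is simply to take the reference base from the transcript rather than re-deriving it from the genome, so that the HGVSc and the consequence call are computed against the same sequence.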

@jpuntomarcos
Author

Thanks @dglemos. Are there plans to change this so that the HGVS calculation uses the NCBI-based reference?

@dglemos
Contributor

dglemos commented Oct 15, 2021

We plan to improve the HGVS calculation; however, I can't give you an exact date for when it will be done.

@bibinf

bibinf commented Nov 18, 2021

Hi @dglemos ,

I'm not sure whether my issue is related to this one, but I've also seen some strange behavior with a BAM-edited annotation:

http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=yJr1DXAflVyUlOCG-7819531

[Screenshot of the VEP results]

The calculated consequence, HGVS.c and HGVS.p are synonymous_variant, NM_000130.5:c.1601A>G and NP_000121.2:p.Arg534=, but I think they should be missense_variant, c.1601G>A and p.Arg534Gln once the correction has been applied. Or am I wrong?

You wrote that the HGVS calculation does not use this reference modification, but on the other hand Arg is shown as the reference amino acid; without the modification it would have been Gln.

Some more background:
The variant is heterozygous in our case. Hence, after your correction, "p.Arg534=" only fits the "new wild-type" allele, not the allele carrying the actual variant.
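The expected result can be reproduced with a small Python sketch based on the standard genetic code. The actual codon 534 of NM_000130.5 is not given in this thread, so the sketch assumes CGG (CGA would give the same outcome); the helper names are hypothetical. Each allele of the heterozygous sample is described against the transcript's reference base G at c.1601, the middle base of codon 534.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def describe_allele(c_pos: int, ref_codon: str, allele: str) -> Tuple[str, str]:
    """Return (HGVSc, HGVSp) for a single-base allele at c_pos, relative to ref_codon."""
    codon_no, offset = (c_pos - 1) // 3 + 1, (c_pos - 1) % 3
    ref_base = ref_codon[offset]
    alt_codon = ref_codon[:offset] + allele + ref_codon[offset + 1:]
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    hgvsc = f"c.{c_pos}=" if allele == ref_base else f"c.{c_pos}{ref_base}>{allele}"
    hgvsp = f"p.{ref_aa}{codon_no}=" if ref_aa == alt_aa else f"p.{ref_aa}{codon_no}{alt_aa}"
    return hgvsc, hgvsp


ASSUMED_CODON_534 = "CGG"  # hypothetical; any CGN codon with G at c.1601 encodes Arg
for allele in ("G", "A"):  # the two alleles of the heterozygous sample
    print(allele, describe_allele(1601, ASSUMED_CODON_534, allele))
# G ('c.1601=', 'p.Arg534=')
# A ('c.1601G>A', 'p.Arg534Gln')
```

Under that assumption, the allele that differs from the transcript (A) is the one that would be reported as c.1601G>A and p.Arg534Gln, while "p.Arg534=" describes only the allele that matches the transcript.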

Best regards and thanks in advance,
Sebastian

@dglemos
Contributor

dglemos commented Nov 19, 2021

Hi @bibinf,
The HGVSc does not use the reference modification as it should; see your example NM_000130.5:c.1601A>G.
You are correct: in your example the amino acid change should be p.Arg534Gln (G>A, not G>G). This issue needs to be addressed; thank you for reporting it.

Best wishes,
Diana
