Wrong normalization result #464

Closed
ianfab opened this issue Sep 26, 2017 · 4 comments · Fixed by #465

Comments

ianfab commented Sep 26, 2017

Running print(hn.normalize(hp.parse_hgvs_variant('NC_000001.10:g.1647893delinsCTTTCTT')))
in the hgvs-shell resulted in NC_000001.10:g.1647900dup, which looks incorrect since the length of the inserted sequence does not match. If I am not mistaken, it should be NC_000001.10:g.1647894_1647899dup (or NC_000001.10:g.1647893_1647894insTTTCTT).
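The length argument can be checked with a little arithmetic. The sketch below is only an illustration: `dup_length` is a hypothetical helper, not part of hgvs, and it assumes that the reference base at 1647893 is C, as the equivalent `insTTTCTT` form implies.

```python
def dup_length(pos: str) -> int:
    """Number of bases covered by a position such as '1647900' or '1647894_1647899'."""
    start, _, end = pos.partition("_")
    # A single position has no "_", so end is empty and the span is one base.
    return int(end or start) - int(start) + 1


# The delins replaces one reference base with seven bases: a net gain of six.
net_gain = len("CTTTCTT") - 1

print(net_gain)                       # 6
print(dup_length("1647900"))          # 1, too short to account for six bases
print(dup_length("1647894_1647899"))  # 6
print(len("TTTCTT"))                  # 6, the equivalent insertion
```

A duplication that represents this variant has to cover six bases, so a single-base `dup` cannot be the right result.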

reece commented Sep 27, 2017

Thanks, @ianfab. I agree, this is a bug.

@icebert Do you have a moment to look at this? It'll probably be more transparent to you than to me.

icebert commented Sep 27, 2017

The correct normalization result should be NC_000001.10:g.1647895_1647900dup. I will look into how this bug arises.

icebert commented Sep 27, 2017

I found that this is caused by the start and end positions sharing a single instance when they are equal. In the hgvs parser, when start and end are the same, both attributes refer to the same object. As a result, when the normalizer changes the end position of such a variant, the start position changes as well, which produces this bug.

Example code:

import hgvs
import hgvs.parser
import hgvs.normalizer
import hgvs.dataproviders.uta

hdp = hgvs.dataproviders.uta.connect()
hp = hgvs.parser.Parser()
hn = hgvs.normalizer.Normalizer(hdp)

# Single-position variant: start and end are the same object.
var_g = 'NC_000001.10:g.1647893delinsCTTTCTT'
var = hp.parse_hgvs_variant(var_g)
print(var.posedit.pos.start is var.posedit.pos.end)  # True

# Two-position interval: start and end are distinct objects.
var_g = 'NC_000001.10:g.1647893_1647894delinsCTTTCTT'
var = hp.parse_hgvs_variant(var_g)
print(var.posedit.pos.start is var.posedit.pos.end)  # False

I think the start and end positions should always be two distinct instances, regardless of whether they are equal.
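The aliasing problem can be reproduced without hgvs or a database connection. The following is a minimal sketch: `Position`, `Interval`, and `shift_end` are hypothetical stand-ins invented for illustration, not the library's classes, and copying the position is just one possible way to keep the two ends independent.

```python
import copy
from dataclasses import dataclass


@dataclass
class Position:  # hypothetical stand-in, not an hgvs class
    base: int


@dataclass
class Interval:  # hypothetical stand-in, not an hgvs class
    start: Position
    end: Position


def shift_end(interval: Interval, offset: int) -> None:
    """Mutate only the end position, as a normalization step might."""
    interval.end.base += offset


# Shared instance: moving the end silently moves the start too.
p = Position(1647893)
shared = Interval(start=p, end=p)
shift_end(shared, 7)
print(shared.start.base, shared.end.base)  # 1647900 1647900

# Independent instances: only the end moves.
q = Position(1647893)
independent = Interval(start=q, end=copy.copy(q))
shift_end(independent, 7)
print(independent.start.base, independent.end.base)  # 1647893 1647900
```

In the shared case the interval collapses to a single position after the shift, which matches the symptom reported above, where a single-base `dup` came back instead of a six-base one.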

ianfab commented Sep 28, 2017

Thanks a lot for the quick reply and fix.

reece added a commit that referenced this issue Sep 28, 2017
…d_in_position

make start and end position independent when start and end are equal