New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the wrong value for end for symbolic alts #1381

Closed
fnothaft opened this Issue Feb 3, 2017 · 5 comments

Comments

Projects
None yet
2 participants
@fnothaft
Member

fnothaft commented Feb 3, 2017

See https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/converters/VariantContextConverter.scala#L1851. We should not be taking the reference allele length; we should use the end position value...

@fnothaft fnothaft added the bug label Feb 3, 2017

@fnothaft fnothaft added this to the 0.21.1 milestone Feb 3, 2017

@fnothaft fnothaft self-assigned this Feb 3, 2017

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 3, 2017

Member

I actually disagree with this; there might be additional discussion in bdg-formats, or perhaps I didn't bring it up yet.

The end field is documented strictly as Calculated by start + referenceAllele.length() and I find complicating that definition with "...or unless the VCF INFO key END is present and valid etc." messy.

Would a new field VariantAnnotation.symbolicEnd be better for mapping for the VCF INFO key "END"?

Member

heuermh commented Feb 3, 2017

I actually disagree with this; there might be additional discussion in bdg-formats, or perhaps I didn't bring it up yet.

The end field is documented strictly as Calculated by start + referenceAllele.length() and I find complicating that definition with "...or unless the VCF INFO key END is present and valid etc." messy.

Would a new field VariantAnnotation.symbolicEnd be better for mapping for the VCF INFO key "END"?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 3, 2017

Member

You can disagree all you want, but we throw an exception when saving Variants with multi-base symbolic alts back to VCF right now.

Member

fnothaft commented Feb 3, 2017

You can disagree all you want, but we throw an exception when saving Variants with multi-base symbolic alts back to VCF right now.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 8, 2017

Member

When we convert htsjdk VariantContext to o.b.a.models.VariantContext, we're setting Variant.end to start + referenceAllele.length() and if VCF INFO key END is present, adding an entry to Variant.attributes. That seems correct to me.

On converting o.b.a.models.VariantContext to htsjdk VariantContext, is it htsjdk that throws an exception? I suppose the conversion should be

builder.stop(variant.getAnnotation.getSomatic ?
  variant.getAnnotation.getAttributes.get("END") : variant.getEnd)

(wrapped in Options with defaults as necessary)

Or am I misunderstanding the problem?

Member

heuermh commented Feb 8, 2017

When we convert htsjdk VariantContext to o.b.a.models.VariantContext, we're setting Variant.end to start + referenceAllele.length() and if VCF INFO key END is present, adding an entry to Variant.attributes. That seems correct to me.

On converting o.b.a.models.VariantContext to htsjdk VariantContext, is it htsjdk that throws an exception? I suppose the conversion should be

builder.stop(variant.getAnnotation.getSomatic ?
  variant.getAnnotation.getAttributes.get("END") : variant.getEnd)

(wrapped in Options with defaults as necessary)

Or am I misunderstanding the problem?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 8, 2017

Member

When we convert htsjdk VariantContext to o.b.a.models.VariantContext, we're setting Variant.end to start + referenceAllele.length() and if VCF INFO key END is present, adding an entry to Variant.attributes. That seems correct to me.

I think you're doing too narrow of a reading here. Reference allele != referenceAllele.length. If you've got a symbolic call, the reference allele is typically given as the first base covered by the symbolic alt for brevity, but obviously the reference allele is not actually a single base, it is the whole sequence that is covered by the alt.

I'd also note that the only place that the stop position is codified in the VCF spec (at least, for 4.2, as I'm reading it) is for symbolic alts with an END key.

On converting o.b.a.models.VariantContext to htsjdk VariantContext, is it htsjdk that throws an exception?

Correct.

builder.stop(variant.getAnnotation.getSomatic ?
variant.getAnnotation.getAttributes.get("END") : variant.getEnd)

Not somatic, symbolic ;)

Member

fnothaft commented Feb 8, 2017

When we convert htsjdk VariantContext to o.b.a.models.VariantContext, we're setting Variant.end to start + referenceAllele.length() and if VCF INFO key END is present, adding an entry to Variant.attributes. That seems correct to me.

I think you're doing too narrow of a reading here. Reference allele != referenceAllele.length. If you've got a symbolic call, the reference allele is typically given as the first base covered by the symbolic alt for brevity, but obviously the reference allele is not actually a single base, it is the whole sequence that is covered by the alt.

I'd also note that the only place that the stop position is codified in the VCF spec (at least, for 4.2, as I'm reading it) is for symbolic alts with an END key.

On converting o.b.a.models.VariantContext to htsjdk VariantContext, is it htsjdk that throws an exception?

Correct.

builder.stop(variant.getAnnotation.getSomatic ?
variant.getAnnotation.getAttributes.get("END") : variant.getEnd)

Not somatic, symbolic ;)

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 8, 2017

Member

If you've got a symbolic call...

Right, the current definition for Variant.end doesn't make sense for symbolic alleles. Over in bigdatagenomics/bdg-formats#107 you propose updating the definition with the two conditions. I would prefer keeping the symbolic end in VariantAnnotation.attributes or in a separate VariantAnnotation.symbolicEnd field.

Not somatic, symbolic ;)

Distracted brain is not detail-oriented brain, apparently. :)

Member

heuermh commented Feb 8, 2017

If you've got a symbolic call...

Right, the current definition for Variant.end doesn't make sense for symbolic alleles. Over in bigdatagenomics/bdg-formats#107 you propose updating the definition with the two conditions. I would prefer keeping the symbolic end in VariantAnnotation.attributes or in a separate VariantAnnotation.symbolicEnd field.

Not somatic, symbolic ;)

Distracted brain is not detail-oriented brain, apparently. :)

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 13, 2017

[ADAM-1381] Fix Variant end position.
Resolves #1381. Sets variant end position to the proper site for symbolic alleles.

@heuermh heuermh closed this in #1389 Feb 21, 2017

heuermh added a commit that referenced this issue Feb 21, 2017

[ADAM-1381] Fix Variant end position.
Resolves #1381. Sets variant end position to the proper site for symbolic alleles.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment