New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDEL realigner binary search conditional is flipped #1402

Closed
fnothaft opened this Issue Feb 25, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@fnothaft
Member

fnothaft commented Feb 25, 2017

This is a bad one. See https://github.com/bigdatagenomics/adam/blame/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/read/realignment/RealignIndels.scala#L88-L93. However, what's weird with this is that it should indicate that the realigner is borked, and that isn't consistent with validation data, nor our instrumentation telemetry. I've started to pull in some additional debug infrastructure to capture what's going on at each target in order to suss this out.

@fnothaft fnothaft added the bug label Feb 25, 2017

@fnothaft fnothaft self-assigned this Feb 25, 2017

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Feb 25, 2017

Member

The conditional says if the read starts before the start of the indel realignment target, which is the head of the tail after splitting targets in half, then return the head else return the tail.

Seems right to me, unless the conditional in TargetOrdering.lt doesn't match its method scaladoc (target.readRange.compareTo(_) < 0).

Member

heuermh commented Feb 25, 2017

The conditional says if the read starts before the start of the indel realignment target, which is the head of the tail after splitting targets in half, then return the head else return the tail.

Seems right to me, unless the conditional in TargetOrdering.lt doesn't match its method scaladoc (target.readRange.compareTo(_) < 0).

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 25, 2017

Member

Ah see, the trouble is, what you wrote is read < target. What TargetOrdering.lt(tail.head._1, read) implements is target < read.

Member

fnothaft commented Feb 25, 2017

Ah see, the trouble is, what you wrote is read < target. What TargetOrdering.lt(tail.head._1, read) implements is target < read.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 25, 2017

Member

And yeah, I think the method scaladoc is wrong.

Member

fnothaft commented Feb 25, 2017

And yeah, I think the method scaladoc is wrong.

@fnothaft fnothaft referenced this issue Feb 25, 2017

Closed

Add coveralls #1403

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 25, 2017

Member

Lo and behold, the TargetOrdering.lt line is missed, as is the tree split.

Member

fnothaft commented Feb 25, 2017

Lo and behold, the TargetOrdering.lt line is missed, as is the tree split.

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 26, 2017

@fnothaft fnothaft referenced this issue Feb 27, 2017

Open

Add regression test suite #1407

0 of 9 tasks complete

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 28, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 2, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 2, 2017

@fnothaft fnothaft added this to the 0.22.0 milestone Mar 3, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 15, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 23, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 26, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.
Resolves #1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 28, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.
Resolves #1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 31, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.
Resolves #1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
* Fixed reference file.
* Serialization fixes and debug.
* Fix bad score.
* Clean up clipping code?
* Unclip clipped reads.

@heuermh heuermh closed this in #1412 Mar 31, 2017

heuermh added a commit that referenced this issue Mar 31, 2017

[ADAM-1402] Fix INDEL realigner bad binary search.
Resolves #1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
* Fixed reference file.
* Serialization fixes and debug.
* Fix bad score.
* Clean up clipping code?
* Unclip clipped reads.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment