Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDEL realigner binary search conditional is flipped #1402

Closed
fnothaft opened this issue Feb 25, 2017 · 4 comments
Closed

INDEL realigner binary search conditional is flipped #1402

fnothaft opened this issue Feb 25, 2017 · 4 comments
Assignees
Labels
bug
Milestone

Comments

@fnothaft
Copy link
Member

@fnothaft fnothaft commented Feb 25, 2017

This is a bad one. See https://github.com/bigdatagenomics/adam/blame/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/read/realignment/RealignIndels.scala#L88-L93. However, what's weird with this is that it should indicate that the realigner is borked, and that isn't consistent with validation data, nor our instrumentation telemetry. I've started to pull in some additional debug infrastructure to capture what's going on at each target in order to suss this out.

@fnothaft fnothaft added the bug label Feb 25, 2017
@fnothaft fnothaft self-assigned this Feb 25, 2017
@heuermh
Copy link
Member

@heuermh heuermh commented Feb 25, 2017

The conditional says if the read starts before the start of the indel realignment target, which is the head of the tail after splitting targets in half, then return the head else return the tail.

Seems right to me, unless the conditional in TargetOrdering.lt doesn't match its method scaladoc (target.readRange.compareTo(_) < 0).

@fnothaft
Copy link
Member Author

@fnothaft fnothaft commented Feb 25, 2017

Ah see, the trouble is, what you wrote is read < target. What TargetOrdering.lt(tail.head._1, read) implements is target < read.

@fnothaft
Copy link
Member Author

@fnothaft fnothaft commented Feb 25, 2017

And yeah, I think the method scaladoc is wrong.

@fnothaft
Copy link
Member Author

@fnothaft fnothaft commented Feb 25, 2017

Lo and behold, the TargetOrdering.lt line is missed, as is the tree split.

fnothaft added a commit to fnothaft/adam that referenced this issue Feb 26, 2017
@fnothaft fnothaft mentioned this issue Feb 27, 2017
0 of 9 tasks complete
fnothaft added a commit to fnothaft/adam that referenced this issue Feb 28, 2017
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 2, 2017
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 2, 2017
@fnothaft fnothaft added this to the 0.22.0 milestone Mar 3, 2017
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 15, 2017
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 23, 2017
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 26, 2017
Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 28, 2017
Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
fnothaft added a commit to fnothaft/adam that referenced this issue Mar 31, 2017
Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
* Fixed reference file.
* Serialization fixes and debug.
* Fix bad score.
* Clean up clipping code?
* Unclip clipped reads.
heuermh added a commit that referenced this issue Mar 31, 2017
Resolves #1402. Includes fixes to consensus generator and reference scorer.

Improve INDEL realigner performance:

* Exit early when realigning will not yield a better score.
* Eliminate substring call in sweep over reference.
* Change datastructures to be immutable wherever possible.
* Add bound checking and other error checking.
* Rewrite target association code to use array instead of set, and improve load balancing.
* Delete high coverage targets with reduceByKey.

Additionally:
* Improve telemetry/logging to sort out load balance issue.
* Support using reference file in INDEL realignment.
* Log reads with negative alignment sizes.
* Improved test coverage for insertion realignment.
* Fix CIGARs on reads that partially overlap INDEL.
* Soft clip reads that partially align to an insertion.
* Eliminate non-determinism.
* Fixed reference file.
* Serialization fixes and debug.
* Fix bad score.
* Clean up clipping code?
* Unclip clipped reads.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.