INDEL realigner cleanup #1412

Merged

Commits on Mar 31, 2017

  1. [ADAM-1402] Fix INDEL realigner bad binary search.

    Resolves bigdatagenomics#1402. Includes fixes to consensus generator and reference scorer.
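
For readers unfamiliar with the kind of lookup this fix concerns, here is a minimal sketch of a binary search that maps a read to an overlapping realignment target. It is not ADAM's code. The function name `find_overlapping_target`, the half-open `(start, end)` tuples, and the assumption that targets are sorted and non-overlapping are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: half-open interval [start, end).
Interval = Tuple[int, int]


def find_overlapping_target(targets: Sequence[Interval],
                            read: Interval) -> Optional[int]:
    """Return the index of a target overlapping `read`, or None.

    Assumes `targets` is sorted by start and that no two targets overlap.
    """
    read_start, read_end = read
    lo, hi = 0, len(targets)  # search window is [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        target_start, target_end = targets[mid]
        if target_end <= read_start:
            # Target lies entirely before the read: discard the left half.
            lo = mid + 1
        elif target_start >= read_end:
            # Target lies entirely after the read: discard the right half.
            hi = mid
        else:
            # Neither entirely before nor entirely after: they overlap.
            return mid
    return None
```

A common way for such a search to go wrong is an off-by-one in the window bounds or in the overlap comparison, which is why the sketch keeps the window half-open and treats touching intervals as non-overlapping.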
    
    Improve INDEL realigner performance:
    
    * Exit early when realigning will not yield a better score.
    * Eliminate substring call in sweep over reference.
    * Change data structures to be immutable wherever possible.
    * Add bound checking and other error checking.
    * Rewrite target association code to use array instead of set, and improve load balancing.
    * Delete high coverage targets with reduceByKey.
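
The first two items above can be illustrated with a small sketch, under stated assumptions. It shows one possible reading of them, not the code in this commit: the read is compared against the reference by index instead of slicing out a substring at each offset, and an offset is abandoned as soon as its running mismatch score can no longer beat the best score seen so far. The names `mismatch_score` and `sweep`, and the scoring rule (sum of base qualities at mismatching positions), are hypothetical.

```python
from typing import Sequence, Tuple


def mismatch_score(reference: str, read: str, qualities: Sequence[int],
                   offset: int, limit: int) -> int:
    """Sum the qualities of mismatching bases with the read placed at `offset`.

    Indexes into `reference` directly (no slice is allocated) and returns
    early once the running total reaches `limit`.
    """
    total = 0
    for i, base in enumerate(read):
        if reference[offset + i] != base:
            total += qualities[i]
            if total >= limit:
                # Cannot beat the current best: stop scoring this offset.
                return total
    return total


def sweep(reference: str, read: str,
          qualities: Sequence[int]) -> Tuple[int, int]:
    """Return (best_offset, best_score) over all placements of the read."""
    if len(read) > len(reference):
        raise ValueError("read is longer than reference")
    if len(read) != len(qualities):
        raise ValueError("read and qualities differ in length")
    best_offset = -1
    best_score = sum(qualities) + 1  # worse than any achievable score
    for offset in range(len(reference) - len(read) + 1):
        score = mismatch_score(reference, read, qualities, offset, best_score)
        if score < best_score:
            best_offset, best_score = offset, score
            if best_score == 0:
                break  # a perfect match cannot be improved
    return best_offset, best_score
```

For example, `sweep("ACGTACGTTT", "CGTT", [10, 10, 10, 10])` places the read at offset 5 with a score of 0, and the offsets between the first partial match and the perfect match are abandoned after their first mismatch.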
    
    Additionally:
    
    * Improve telemetry/logging to sort out load balance issue.
    * Support using reference file in INDEL realignment.
    * Log reads with negative alignment sizes.
    * Improve test coverage for insertion realignment.
    * Fix CIGARs on reads that partially overlap INDEL.
    * Soft clip reads that partially align to an insertion.
    * Eliminate non-determinism.
    * Fix reference file.
    * Serialization fixes and debug.
    * Fix bad score.
    * Clean up clipping code?
    * Unclip clipped reads.
    fnothaft committed Mar 31, 2017
    b6de9a5