Performance tune-up in BQSR #1358

Closed
fnothaft opened this Issue Jan 19, 2017 · 0 comments

fnothaft commented Jan 19, 2017

Details TBA.

@fnothaft fnothaft added this to the 0.21.1 milestone Jan 19, 2017

@fnothaft fnothaft self-assigned this Jan 19, 2017

fnothaft added a commit to fnothaft/adam that referenced this issue Mar 20, 2017

[ADAM-1358] Refactor BQSR to improve performance and legibility.
Resolves #1358.

* Adds instrumentation to BQSR.
* Changed SnpTable to remove RichVariant conversion, use VariantRDD API.
* Refactoring SnpTable to eliminate per-residue costly masked site lookup.
* Restructuring core of SnpTable around an array to improve GC performance.
  Additionally, wrote custom serializer to improve serialization performance.
* Added test suite for SnpTable, to test table creation.
* Refactored SnpTable to use an IntervalArray-like approach. This approach
  improves masked site lookup performance by 50%.
* Added tests to SnpTableSuite to cover lookup case, and reenabled tests in
  BaseQualityRecalibrationSuite.
* Adding unit test coverage to covariates
* Revert "[ADAM-775] Allow all IUPAC codes in BQSR"
  This reverts commit 207eeba.
* Pulled Seq allocation for base check out into an immutable set.
* Rewrote dinuc covariate. 50% improvement in performance.
* Rewrite main BQSR aggregate as reduce by key
* Added tests to recalibrator, recalibration table.
* Major refactor of BQSR tables.
* Starting to factor out the QualityScore class
* Refactoring CovariateKey to reduce size in memory
* Eliminated `org.bdgenomics.adam.rich.DecadentRead` (partially resolves #577)
* Refactor CovariateKey to store record group ID instead of record group name.
* Removed `org.bdgenomics.adam.models.QualityScore`.
* Split multi-class files into one class per file (excepting private classes) to improve navigability.
* Scaladoc all the recalibrators! You get a scaladoc! And you get a scaladoc!
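
For readers unfamiliar with the array-backed lookup mentioned in the SnpTable bullets above, here is a minimal sketch of the general idea: keep masked positions in one sorted primitive array and find the sites overlapping a read with a single binary search followed by a short scan, instead of one lookup per residue. This is not ADAM's `SnpTable`; the `MaskedSites` class, its `maskedIn` method, and the single-contig simplification are assumptions made purely for illustration.

```scala
import java.util.Arrays

// Hypothetical stand-in for a single contig's masked-site table.
// Not the actual ADAM SnpTable implementation.
final class MaskedSites(positions: Array[Long]) {
  // One sorted, de-duplicated primitive array: a single object for the GC
  // to track, rather than a boxed Long per site in a hash set.
  private val sorted: Array[Long] = positions.distinct.sorted

  /** Returns the masked positions that fall in the half-open range [start, end). */
  def maskedIn(start: Long, end: Long): Set[Long] = {
    // binarySearch returns (-(insertion point) - 1) when the key is absent.
    val idx = Arrays.binarySearch(sorted, start)
    var i = if (idx >= 0) idx else -(idx + 1)
    val hits = Set.newBuilder[Long]
    // Scan forward only while sites still overlap the queried range.
    while (i < sorted.length && sorted(i) < end) {
      hits += sorted(i)
      i += 1
    }
    hits.result()
  }
}

object MaskedSitesExample {
  def main(args: Array[String]): Unit = {
    val table = new MaskedSites(Array(500L, 100L, 105L, 250L, 105L))
    // One query per read span, then cheap set membership per residue.
    val masked = table.maskedIn(100L, 251L)
    assert(masked == Set(100L, 105L, 250L))
    assert(table.maskedIn(106L, 250L).isEmpty)
  }
}
```

The design choice sketched here is to pay one O(log n) search per read rather than one per base; whether this matches the performance numbers quoted in the commit message depends on details not given in this issue.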

@heuermh heuermh closed this in b1fce67 Mar 20, 2017
