perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync#15
perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync#15
Conversation
…ives-gf Replaces FlexAminoAcidGraph + GeneratingFunction with CSR-based PrimitiveAminoAcidGraph + flat-array PrimitiveGeneratingFunction in DBScanner.computeSpecEValue hot path. Ported without the experimental remainingUses / eager ScoreDist release (proven 5.9% CPU regression for 3.3% RSS gain). Legacy FlexAminoAcidGraph and GeneratingFunction remain for other callers. Parity verified by new TestPrimitiveRegression (ported from feat/primitives-gf) and existing test suite. Phase 1 of 3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites PrimitiveGeneratingFunctionGroup as a running merger. Each per-mass-index GF is computed, merged into the aggregate via addProbDist, and released before the next mass index is built, so peak memory stays at one graph + one GF regardless of tolerance-window width. Math is identical because addProbDist with scoreDiff=0 and aaProb=1f is a linear sum; the running aggregate transparently widens its bounds if a later GF has a wider score range. This addresses the Phase-1 Astral regression (127% slower, +7.3% RSS) caused by concurrent accumulation of distByNode across all mass indices. TMT parity preserved. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
There was a problem hiding this comment.
Pull request overview
This PR introduces new “primitive” (array-based) implementations of the amino-acid graph and generating function, and wires them into DBScanner.computeSpecEValue to reduce CPU time and peak memory usage on the DB-search hot path.
Changes:
- Add
PrimitiveAminoAcidGraph(CSR) andPrimitiveGeneratingFunction(flat-array DP) to replaceFlexAminoAcidGraph+GeneratingFunctioninDBScanner.computeSpecEValue. - Add
PrimitiveGeneratingFunctionGroupthat merges per-mass-index score distributions in a streaming fashion to avoid retaining all per-index DP tables at once. - Add
TestPrimitiveRegressionto validate primitive parity vs legacy behavior, including support for negative nominal-mass states.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java | Switches the E-value computation hot path to the primitive graph/GF and streaming merge group. |
| src/main/java/edu/ucsd/msjava/msgf/PrimitiveAminoAcidGraph.java | New CSR-backed amino-acid graph used by the primitive generating function. |
| src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunction.java | New array-based generating function DP producing ScoreDist for spectral probabilities. |
| src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunctionGroup.java | New streaming merger to aggregate ScoreDist across isotope mass indices without retaining all GFs. |
| src/test/java/msgfplus/TestPrimitiveRegression.java | Adds regression tests comparing primitive vs legacy graph/GF, including negative nominal-mass handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Adds two user-facing pages to close documentation gaps surfaced in the landscape review: - Troubleshooting.md: centroiding (MSConvert peakPicking filter), XML prolog / encoding errors, FASTA size limits and split-and-merge workaround, -Xmx sizing table, -tasks tuning for OOM, thread-cap behaviour, and the OpenMS TOPPAS adapter workaround. - IsobaricLabeling.md: fixed-mod recipes for TMT-6/10/11, TMTpro-16/18, iTRAQ-4/8, the correct -protocol flag per label, and a full worked Mods.txt example. Addresses the recurring "does MS-GF+ support TMT-16plex?" question (issue MSGFPlus#82) which is a docs, not a code, gap. Also wires the two pages into docs/README.md's table of contents. The .gitignore update excludes a local-only working research note (.claude/investigations/msgfplus_research_report.md) from the repo, consistent with the existing "benchmark harness is local-only" convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Astral profiling revealed that synchronized Hashtable contention in the shared scorer tables dominated runtime: Hashtable.get alone took 23% of CPU and the surrounding monitor machinery (TrySpin, SafeFetch, ObjectSynchronizer) added another ~20% — 43% of CPU was sync overhead with 4 worker threads serializing through per-table monitors. NewRankScorer tables (fragOFFTable, insignificantFragOFFTable, rankDistTable, ionErrDistTable, noiseErrDistTable, ionExistenceTable) are populated once in readFromFile and read-only during search, so plain HashMap is safe. NewScorerFactory.scorerTable is a mutable cache with possible concurrent writes before warmup, so it uses ConcurrentHashMap. ScoringParameterGenerator(s) use HashMap too to match the NewRankScorer field types (build-time, not search path). Benchmarks (baseline = dev, branch = feat/primitives-optimization): PXD001819 LFQ: 176.5s -> 109.4s (-38.0%), 2254 -> 2472 MB (+9.7%) TMT: 644.3s -> 265.6s (-58.8%), 2872 -> 3125 MB (+8.8%) Astral: 2155.9s -> 723.3s (-66.5%), 6775 -> 7627 MB (+12.6%) PSM@1%FDR and SII counts match exactly on all three datasets. RSS grows modestly because worker threads now actually run in parallel (no more monitor serialization), so 4 concurrent graph/GF states are alive instead of effectively one. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
b1a1498): PortPrimitiveAminoAcidGraph,PrimitiveGeneratingFunction,PrimitiveGeneratingFunctionGroupfromfeat/primitives-gf. Wire intoDBScanner.computeSpecEValue, replacingFlexAminoAcidGraph + GeneratingFunction + GeneratingFunctionGroupon the hot path. Explicitly excludes theremainingUseseager-release experiment.9d07047): RewritesPrimitiveGeneratingFunctionGroupas a streaming merger. Each per-mass-index GF is computed, merged viaScoreDist.addProbDist, and released before the next mass index is built — peak memory stays at one graph + one GF regardless of tolerance-window width.addProbDistis linear withscoreDiff=0/aaProb=1f, so math is identical; running aggregate widens bounds transparently.java.util.Hashtable(8442f2c): Profiling revealed that synchronized Hashtable contention inNewRankScorer/NewScorerFactorywas consuming ~43% of CPU with 4 worker threads (23%Hashtable.get+ 10%ObjectMonitor::TrySpin+ ~10% misc monitor machinery).NewRankScorertables are populated once at scorer load and read-only during search, soHashMapis safe.NewScorerFactory.scorerTableneeds concurrent writes during warmup →ConcurrentHashMap.ScoringParameterGenerator(s)also updated to match field types (build-time only, not search hot path).Legacy
FlexAminoAcidGraphandGeneratingFunctionremain for other callers.Benchmark: PXD001819 LFQ (UPS1 + Yeast, LTQ Orbitrap Velos, CID)
Benchmark: TMT (PXD007683, Orbitrap Fusion Lumos)
Benchmark: Astral (ProteoBench Module 8, Orbitrap Astral)
Host: macOS, 11 CPUs, 18 GB RAM, Java 21.0.6,
-Xmx8192m(TMT/Astral) /-Xmx4096m(PXD001819), 4 threads.Why RSS grew with the Hashtable change
Phase 2 alone showed RSS -7.2% on Astral. After dropping the Hashtable monitor, the 4 worker threads actually run in parallel (previously most of their time was spent blocked on the per-table monitor), so 4× graph/GF state is live simultaneously. The RSS regression is the cost of real parallelism. All datasets remain well within
-Xmx8192m.Test plan
mvn -B verify— 141 tests pass (57 skipped for missing external data)TestPrimitiveRegression(ported fromfeat/primitives-gf) — 2/2 parity testsNon-goals (explicitly not in this PR)
remainingUses/ eagerScoreDistrelease experiment (proven bad tradeoff perfeat/primitives-gfdata)distByNodebounded retention /GFTable-style eviction for primitivesFlexAminoAcidGraphor legacyGeneratingFunctionshort[] massToNodeIdx, two-pass CSR build,NominalMassscratch reuse) — profiling showed GC is only 1.9% of wall time, undercutting the premise🤖 Generated with Claude Code