Skip to content

perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync#15

Merged
ypriverol merged 4 commits intodevfrom
feat/primitives-optimization
Apr 17, 2026
Merged

perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync#15
ypriverol merged 4 commits intodevfrom
feat/primitives-optimization

Conversation

@ypriverol
Copy link
Copy Markdown
Member

@ypriverol ypriverol commented Apr 17, 2026

Summary

  • Commit 1 — Primitive CSR graph + flat-array GF (b1a1498): Port PrimitiveAminoAcidGraph, PrimitiveGeneratingFunction, PrimitiveGeneratingFunctionGroup from feat/primitives-gf. Wire into DBScanner.computeSpecEValue, replacing FlexAminoAcidGraph + GeneratingFunction + GeneratingFunctionGroup on the hot path. Explicitly excludes the remainingUses eager-release experiment.
  • Commit 2 — Streaming mass-index merger (9d07047): Rewrites PrimitiveGeneratingFunctionGroup as a streaming merger. Each per-mass-index GF is computed, merged via ScoreDist.addProbDist, and released before the next mass index is built — peak memory stays at one graph + one GF regardless of tolerance-window width. addProbDist is linear with scoreDiff=0/aaProb=1f, so math is identical; running aggregate widens bounds transparently.
  • Commit 3 — Drop java.util.Hashtable (8442f2c): Profiling revealed that synchronized Hashtable contention in NewRankScorer / NewScorerFactory was consuming ~43% of CPU with 4 worker threads (23% Hashtable.get + 10% ObjectMonitor::TrySpin + ~10% misc monitor machinery). NewRankScorer tables are populated once at scorer load and read-only during search, so HashMap is safe. NewScorerFactory.scorerTable needs concurrent writes during warmup → ConcurrentHashMap. ScoringParameterGenerator(s) also updated to match field types (build-time only, not search hot path).

Legacy FlexAminoAcidGraph and GeneratingFunction remain for other callers.

Benchmark: PXD001819 LFQ (UPS1 + Yeast, LTQ Orbitrap Velos, CID)

Variant Wall time Peak RSS PSM@1%FDR SII
baseline 176.5 s 2254 MB 14802 78118
branch 109.4 s 2472 MB 14802 78118
delta -38.0% +9.7% 0.0% 0.0%

Benchmark: TMT (PXD007683, Orbitrap Fusion Lumos)

Variant Wall time Peak RSS PSM@1%FDR SII
baseline 644.3 s 2872 MB 8547 87116
branch 265.6 s 3125 MB 8547 87116
delta -58.8% +8.8% 0.0% 0.0%

Benchmark: Astral (ProteoBench Module 8, Orbitrap Astral)

Variant Wall time Peak RSS PSM@1%FDR SII
baseline 2155.9 s 6775 MB 25037 272542
branch 723.3 s 7627 MB 25037 272542
delta -66.5% +12.6% 0.0% 0.0%

Host: macOS, 11 CPUs, 18 GB RAM, Java 21.0.6, -Xmx8192m (TMT/Astral) / -Xmx4096m (PXD001819), 4 threads.

Why RSS grew with the Hashtable change

Phase 2 alone showed RSS -7.2% on Astral. After dropping the Hashtable monitor, the 4 worker threads actually run in parallel (previously most of their time was spent blocked on the per-table monitor), so 4× graph/GF state is live simultaneously. The RSS regression is the cost of real parallelism. All datasets remain well within -Xmx8192m.

Test plan

  • mvn -B verify — 141 tests pass (57 skipped for missing external data)
  • TestPrimitiveRegression (ported from feat/primitives-gf) — 2/2 parity tests
  • PXD001819 LFQ PSM@1%FDR exact parity on a full-dataset run
  • TMT PSM@1%FDR exact parity on a full-dataset run
  • Astral PSM@1%FDR exact parity on a full-dataset run
  • Wall-time improves on every dataset (38% / 59% / 66%)

Non-goals (explicitly not in this PR)

  • remainingUses / eager ScoreDist release experiment (proven bad tradeoff per feat/primitives-gf data)
  • distByNode bounded retention / GFTable-style eviction for primitives
  • Changes to FlexAminoAcidGraph or legacy GeneratingFunction
  • Original "Phase 3" compactions (short[] massToNodeIdx, two-pass CSR build, NominalMass scratch reuse) — profiling showed GC is only 1.9% of wall time, undercutting the premise

🤖 Generated with Claude Code

ypriverol and others added 2 commits April 16, 2026 23:21
…ives-gf

Replaces FlexAminoAcidGraph + GeneratingFunction with CSR-based
PrimitiveAminoAcidGraph + flat-array PrimitiveGeneratingFunction in
DBScanner.computeSpecEValue hot path. Ported without the experimental
remainingUses / eager ScoreDist release (proven 5.9% CPU regression
for 3.3% RSS gain).

Legacy FlexAminoAcidGraph and GeneratingFunction remain for other
callers. Parity verified by new TestPrimitiveRegression (ported from
feat/primitives-gf) and existing test suite.

Phase 1 of 3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites PrimitiveGeneratingFunctionGroup as a running merger. Each
per-mass-index GF is computed, merged into the aggregate via
addProbDist, and released before the next mass index is built, so peak
memory stays at one graph + one GF regardless of tolerance-window
width. Math is identical because addProbDist with scoreDiff=0 and
aaProb=1f is a linear sum; the running aggregate transparently widens
its bounds if a later GF has a wider score range.

This addresses the Phase-1 Astral regression (127% slower, +7.3% RSS)
caused by concurrent accumulation of distByNode across all mass
indices. TMT parity preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 20ba372d-32d9-4e60-8f99-0029152917da

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/primitives-optimization

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ypriverol
Copy link
Copy Markdown
Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces new “primitive” (array-based) implementations of the amino-acid graph and generating function, and wires them into DBScanner.computeSpecEValue to reduce CPU time and peak memory usage on the DB-search hot path.

Changes:

  • Add PrimitiveAminoAcidGraph (CSR) and PrimitiveGeneratingFunction (flat-array DP) to replace FlexAminoAcidGraph + GeneratingFunction in DBScanner.computeSpecEValue.
  • Add PrimitiveGeneratingFunctionGroup that merges per-mass-index score distributions in a streaming fashion to avoid retaining all per-index DP tables at once.
  • Add TestPrimitiveRegression to validate primitive parity vs legacy behavior, including support for negative nominal-mass states.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java Switches the E-value computation hot path to the primitive graph/GF and streaming merge group.
src/main/java/edu/ucsd/msjava/msgf/PrimitiveAminoAcidGraph.java New CSR-backed amino-acid graph used by the primitive generating function.
src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunction.java New array-based generating function DP producing ScoreDist for spectral probabilities.
src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunctionGroup.java New streaming merger to aggregate ScoreDist across isotope mass indices without retaining all GFs.
src/test/java/msgfplus/TestPrimitiveRegression.java Adds regression tests comparing primitive vs legacy graph/GF, including negative nominal-mass handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ypriverol and others added 2 commits April 17, 2026 09:08
Adds two user-facing pages to close documentation gaps surfaced in the
landscape review:

- Troubleshooting.md: centroiding (MSConvert peakPicking filter), XML
  prolog / encoding errors, FASTA size limits and split-and-merge
  workaround, -Xmx sizing table, -tasks tuning for OOM, thread-cap
  behaviour, and the OpenMS TOPPAS adapter workaround.
- IsobaricLabeling.md: fixed-mod recipes for TMT-6/10/11, TMTpro-16/18,
  iTRAQ-4/8, the correct -protocol flag per label, and a full worked
  Mods.txt example. Addresses the recurring "does MS-GF+ support
  TMT-16plex?" question (issue MSGFPlus#82) which is a docs, not a code, gap.

Also wires the two pages into docs/README.md's table of contents.

The .gitignore update excludes a local-only working research note
(.claude/investigations/msgfplus_research_report.md) from the repo,
consistent with the existing "benchmark harness is local-only"
convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Astral profiling revealed that synchronized Hashtable contention in the
shared scorer tables dominated runtime: Hashtable.get alone took 23%
of CPU and the surrounding monitor machinery (TrySpin, SafeFetch,
ObjectSynchronizer) added another ~20% — 43% of CPU was sync overhead
with 4 worker threads serializing through per-table monitors.

NewRankScorer tables (fragOFFTable, insignificantFragOFFTable,
rankDistTable, ionErrDistTable, noiseErrDistTable, ionExistenceTable)
are populated once in readFromFile and read-only during search, so
plain HashMap is safe. NewScorerFactory.scorerTable is a mutable cache
with possible concurrent writes before warmup, so it uses
ConcurrentHashMap. ScoringParameterGenerator(s) use HashMap too to
match the NewRankScorer field types (build-time, not search path).

Benchmarks (baseline = dev, branch = feat/primitives-optimization):
  PXD001819 LFQ:  176.5s -> 109.4s (-38.0%),  2254 -> 2472 MB (+9.7%)
  TMT:            644.3s -> 265.6s (-58.8%),  2872 -> 3125 MB (+8.8%)
  Astral:        2155.9s -> 723.3s (-66.5%),  6775 -> 7627 MB (+12.6%)

PSM@1%FDR and SII counts match exactly on all three datasets. RSS
grows modestly because worker threads now actually run in parallel
(no more monitor serialization), so 4 concurrent graph/GF states are
alive instead of effectively one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ypriverol ypriverol changed the title perf(msgf): primitive CSR graph + streaming GF merge perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync Apr 17, 2026
@ypriverol ypriverol merged commit e597230 into dev Apr 17, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants