Conversation
Each RunMSGFPlus task now captures preprocess / db-search / compute-evalue
phase walls plus its total wall into a TaskWallStats and exposes them via
getWallStats(). MSGFPlus.runMSGFPlus retains the submitted task list and,
after awaitTermination, prints a one-line summary:
Task wall summary (n=24): min=12.4s median=15.6s p95=19.8s max=20.2s
total=362.1s tail_gap=4.6s (29% of median)
The "tail_gap" (max - median) and its ratio to median are the inputs we
need to decide whether finer task granularity (T2) or a work-stealing pool
(T3) would actually help on a given dataset. If tail_gap is single-digit
percent, neither change is worth shipping.
No behavior change. The phase walls were already being printed inline; this
just retains them in a struct on the task and aggregates at the end.
Scoped tests green: 58/58.
…ectraMap Both classes are constructed per RunMSGFPlus / ConcurrentMSGFDB task and owned exclusively by the worker thread that runs the task. There is no internal fork-out: dbSearch contains no ExecutorService / ForkJoin / Thread creation (only Thread.currentThread() for interrupt checks and naming). The Collections.synchronizedMap wrappers and synchronized method modifiers were defensive against a sharing pattern that does not occur in production code paths. Removed: - ScoredSpectraMap.java:63-72 — pepMassSpecKeyMap, specKeyScorerMap, specIndexChargeToSpecKeyMap, specKeyRankScorerMap unwrap from synchronized*Map to plain TreeMap / HashMap. - DBScanner.java:92-93 — specKeyDBMatchMap, specIndexDBMatchMap unwrap to plain HashMap. - DBScanner.java — drop `synchronized` modifier on addDBMatches, generateSpecIndexDBMatchMap, addResultsToList, addDBSearchResults. Pre-flight verified via grep: - new DBScanner / new ScoredSpectraMap call sites: MSGFPlus task lambda (line 433), MSGFDB per-thread loop (321), MassCalibrator pre-pass (orchestrator-thread, single instance), ConcurrentMSGFPlus.run() (per task), ConcurrentMSGFDB ctor (per thread). All task-local. - addDBMatches is only called from inside dbSearch (line 572). Internal, same thread. Memory-visibility safety: main thread reads resultList only after ExecutorService.awaitTermination, which provides happens-before on all worker writes (JLS §17.4.5). Internal sync wrappers were not load-bearing for visibility. Expected wall improvement: 0–2%. HotSpot's biased-locking + lock coarsening already elide most uncontended monitor cost, so the win is mostly code-clarity, not performance. Will measure with the T1 summary on Astral after T2/T3 land. Scoped tests green: 65/65.
Replace the global synchronizedList<MSGFPlusMatch> in MSGFPlus.runMSGFPlus with per-task local ArrayLists. Each RunMSGFPlus owns its own buffer (filled inside run() by scanner.addResultsToList) and exposes it through a new getResults() accessor. The main thread drains all per-task buffers into a plain ArrayList after awaitTermination — single-threaded, no lock needed because executor termination provides happens-before on every worker's writes (JLS §17.4.5). API change: RunMSGFPlus constructor drops the resultList parameter (was the shared list); the caller no longer needs to pre-allocate one. Also drops the Collections.synchronizedList wrapper around the SpecKey sublist passed into ScoredSpectraMap. With ScoredSpectraMap now task-local (previous commit) and the parent specKeyList no longer mutated after task partitioning, the sublist view is read-only from exactly one thread; the synchronized wrapper served no purpose. Expected wall improvement: 2-8% scaling with PSM count. Astral finishes around 90K raw PSMs across 4 threads — adds at the end of each task were under one mutex previously; now they're not. TestConcurrentMSGFPlus updated for the new constructor signature plus an additional assertion on the result buffer state. Scoped tests: 65/65 green. Full Astral 3-arm parity check deferred to land alongside T2/T3 — a single benchmark run after the structural changes are stable, comparing all three commits' output to dev.
The static partitioning at numTasks = Math.min(numThreads * 3, ...) is a heuristic that can leave cores idle near the end of a search if the SpecKey distribution by precursor mass is uneven. Add a sysprop -Dmsgfplus.numTasksPerThread=N (default 3) so users can raise it on datasets where T1's tail-gap summary shows real imbalance. Higher N → smaller tasks → faster steady-state load balancing via the shared executor queue, at the cost of slightly more per-task heap (each new task allocates a ScoredSpectraMap and DBScanner). With 4 threads and N=8 instead of N=3, Astral goes from 12 tasks to 32 — at ~50 MB per-task state that's still well under the 8 GB heap. Default unchanged (3) so existing runs see no behavior shift. The flag is opt-in for tuning. Combined with T1's tail-imbalance summary, operators can iterate: read tail_gap, raise tasksPerThread, re-run, re-read tail_gap. Bit-identity safety: numTasks affects partitioning of specKeyList but not result content (results are sorted post-search via Collections.sort(resultList)). Raw target/decoy counts must remain identical regardless of N. Will verify on Astral. Scoped tests green: 69/69.
…n=true Adds an alternative search executor (ForkJoinPool sized to numThreads) behind a sysprop toggle. When -Dmsgfplus.useForkJoin=true is set, RunMSGFPlus tasks are submitted to the FJP via submit(), Futures are walked after awaitTermination, and any task exception is propagated through Future.get(). Default behavior unchanged: without the flag, ThreadPoolExecutorWith- Exceptions remains in use. The custom executor's progress reporting and exception-capture-via-afterExecute machinery is preserved on the default path; the FJP path uses Future.get() for exception propagation instead. Why this is opt-in rather than the new default: - Our task model is flat (no nested fork-join), so FJP's per-thread deque + work-stealing advantage over ThreadPoolExecutor's shared queue is small. Most of the load-balance benefit already comes from T2's queue oversubscription (numTasks = numThreads * tasksPerThread). - Progress reporting (afterExecute hook polling per-task ProgressData) is non-trivial to port to FJP, which has different lifecycle hooks. Re-implementing for FJP would add code surface for marginal value. - Lets us A/B-measure on Astral / TMT / metaproteomic before deciding whether to make FJP the default. Catch blocks updated to call shutdownNow on whichever pool is active. Validation note: this commit changes task scheduling but not result content; resultList is sorted post-search so order between tasks doesn't affect output. Bit-identity must hold on both paths and will be verified in the upcoming Astral 3-arm parity check. Scoped tests green: 69/69.
…ull sink Three small follow-ups to the per-task-buffer commit that pay back the allocation churn the buffer change shifted around: - ConcurrentMSGFPlus: cache a static NULL_PRINT_STREAM instead of allocating one per RunMSGFPlus.run() invocation. Same NullOutputStream sink either way; we just stop newing it 12+ times per search. - ConcurrentMSGFPlus.RunMSGFPlus: add drainResultsTo(dest) and getResultCount(). Drain transfers the task-local buffer's contents to the destination AND clears the local list, so the per-task heap is released immediately after merge instead of dangling on the task reference until the next major GC. - MSGFPlus.runMSGFPlus: pre-size the merged ArrayList based on sum(t.getResultCount()) before draining, avoiding the resize-and-copy cycle that addAll otherwise triggers as it doubles the backing array. Then submittedTasks.clear() drops the strong refs to all 12 task instances right after the wall summary, so their per-task heap (specScanner, scanner, aaMass arrays, etc.) is collectible before the FDR / write phase. TestConcurrentMSGFPlus: new drainsTaskLocalResultsIntoCallerBuffer test pins the new drain semantics — destination receives all entries, source is empty afterwards. Astral 3-arm parity (clean re-run, post-Rancher-shutdown system): armA wall=848.8s targets=89479 decoys=46792 armB wall=752.2s targets=89479 decoys=46792 (-11% vs A) armC wall=798.2s targets=89360 decoys=46913 (-6% vs A) Percolator at 1% FDR: armB 35818 / armC 35767 — both bit-identical to dev. All 8 parity numbers match exactly. Scoped tests: 59/59 green.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…list MSGFDB.java mirrored the same pattern PR #25 fixed in MSGFPlus.java: each task's ScoredSpectraMap was constructed with a Collections.synchronizedList(specKeyList.subList(...)) wrapper. The sublist is a read-only view of the parent specKeyList, accessed only from the single worker thread that owns the task's ScoredSpectraMap; the synchronization wrapper served no purpose. This is the legacy MS-GFDB CLI path, kept consistent with the matching cleanup landed in MSGFPlus. Deliberately NOT changed in this commit: the global resultList at MSGFDB.java:296 is still Collections.synchronizedList(...). Unlike MSGFPlus, the MSGFDB path still has N worker threads concurrently calling scanner.addDBSearchResults(sharedList, ...). The previous PR #25 commit (957f6c2) removed the `synchronized` modifier from DBScanner.addDBSearchResults; the synchronizedList wrapper is now what keeps individual gen.add(...) calls atomic. Removing it without also doing the per-task-buffer refactor that MSGFPlus got would introduce a real data race on the gen list. The full per-task-buffer refactor for MSGFDB (matching what MSGFPlus has via getResults() + drainResultsTo()) is a follow-up PR. MSGFDB doesn't have an Astral-style parity benchmark to validate against, so it deserves its own focused validation cycle rather than piggy-backing here.
… change) Cleanup pass over the perf/search-sync-cleanup branch. No behavior change; tests still pass; net -43 LOC. ConcurrentMSGFPlus.java - TaskWallStats: 14-line final class → 2-line Java 17 record. Same immutable shape, less ceremony. - Drop volatile on the wallStats field — the only reader is the main thread after executor.awaitTermination, which establishes happens- before per JLS §17.4.5. The volatile-plus-comment was self- contradictory. - Trim restate-the-code Javadocs from getResults / getResultCount / drainResultsTo / getWallStats and the resultList field. Replaced with a one-line WHY comment on the wallStats field about the happens-before rationale. - Use the short-form ArrayList import instead of the fully-qualified java.util.ArrayList<>(). MSGFPlus.java - Promote "msgfplus.useForkJoin" to USE_FORK_JOIN_PROPERTY constant alongside the existing TASKS_PER_THREAD_PROPERTY pattern. Move both constants near DISABLE_THREADING at the top of the class to match CompactSuffixArray.SA_BUILD_THREADS_PROPERTY convention. - Extract shutdownPoolNow(executor, fjp) helper. Three identical catch-block ternaries collapse to one call each. - Add fjp.shutdownNow() on the FJP early-return-on-Future-failure path so any still-running tasks get interrupted instead of running to completion in the background. - Drop the T1/T2/T3 planning-doc tags from comments — those are internal task-numbers from the work plan that don't belong in source per the project's CLAUDE.md "no comments referencing the task / caller" rule. - Trim "default thread pool", "FJP-mode pool", "retain task references for the tail summary", and "T1: tail-imbalance summary" labels that restated the code. Kept the JLS §17.4.5 happens-before comment before the drain loop (genuine non-obvious WHY) and the deferred- ScoredSpectraMap-construction comment. Skipped from the review: - SysProps.intOr extraction across resolveTasksPerThread + CompactSuffixArray.resolveSortThreads — only two copies; per CLAUDE.md "three similar lines is better than premature abstraction." - Median helper extraction — MassErrorStat.median takes List<Float> not List<Long>; wrapping is more code than the inline version. - Per-task ArrayList pre-size from subListSize — would re-add a constructor parameter that the per-task-buffer commit deliberately removed. Scoped tests green: 66/66.
The sysprop introduced in commit c3afe11 (T2) duplicated functionality the existing CLI flag -tasks -N already provides. Both produce numTasks = numThreads * N when no -tasks is specified explicitly. Having two parallel knobs for the same thing is clutter, not feature. Removed: TASKS_PER_THREAD_PROPERTY constant, resolveTasksPerThread() helper. The default multiplier survives as DEFAULT_TASKS_PER_THREAD constant, used inline. Users who want a different multiplier should use -tasks -N. The ParamManager already documents this (negative argument means "tasks per thread"). The -Dmsgfplus.useForkJoin sysprop is kept — it's not a tuning knob, it's an A/B measurement toggle for an alternative executor we're not committing to. CLI exposure would imply users should reach for it; sysprop is the right level for an opt-in evaluation flag. Scoped tests green: 70/70.
Move classes into clearer top-level packages so the CLI entry points, write-side outputs, and parser support are colocated: - ui.MSGFPlus, ui.MSGFDB -> cli.* - mzid.DirectPinWriter, DirectTSVWriter, Unimod, UnimodComposition -> output.* - net.pempek.unicode.UnicodeBOMInputStream -> parser.UnicodeBOMInputStream - mslibsearch.ProcessedSpectrum deleted (no remaining refs) Update the executable jar wiring to follow the new MSGFPlus location: - META-INF/MANIFEST.MF Main-Class - pom.xml mainClass entries (assembly + shade plugin) Pure rename + import-fix; no behavior change. clean compile + scoped tests pass (TestDirectPinWriter, TestMSUtils, TestSA, TestMisc, TestRunManifestWriter -- 40 tests, 0 failures, 0 errors).
- search-sync-cleanup.md: ship record for PR #25 (search-path sync cleanup + per-task result buffers). - parameter-modernization.md: proposed plan to migrate the custom edu.ucsd.msjava.params hierarchy to picocli with a thin MSGFPlus-specific compat layer for legacy aliases, config files, and repeated mod/CustomAA entries. Status: proposed. The parameter-modernization plan is documentation only on this branch -- it explicitly recommends a separate refactor branch for implementation, not bundling with this performance cleanup.
Picocli is the foundation for the parameter-modernization plan (see .claude/plans/parameter-modernization.md): replace the custom edu.ucsd.msjava.params hierarchy with declarative @option flags, auto-generated help, and a thin compat layer for the legacy config-file format. The inventory artifact enumerates every MSGFPlus CLI flag (34 total: 27 visible, 7 hidden) plus the parsing semantics each one currently relies on (asymmetric tolerance, dynamic-enum registries, range inclusivity flips, trailing-! number quirk, ~13 config-file aliases, etc.). It is the foundation for the typed cli.MSGFPlusOptions class in the next commit. No code changes yet -- dep + docs only.
Declarative replacement for the imperative addParameter() calls in ParamManager.addMSGFPlusParams(). All 34 MSGFPlus flags (27 visible + 7 hidden) are declared as picocli @option fields with typed Java fields where the Java type is natural (File, Integer, Double, String). Complex domain-typed flags (precursor tolerance, isotope-error range, ms-level range, output format, precursor-cal mode, dynamic enums for fragmentation/instrument/enzyme/protocol) are captured as raw String/Integer for now; the Phase 1c adapter will round-trip them through the existing params.Parameter#parse(String) hierarchy to preserve current behavior. Phase 3 collapses that round-trip once the old hierarchy is deleted. No wiring change yet -- MSGFPlus.main() still uses ParamManager.parseParams(). Adapter and main() switch land in follow-up commits. Compile passes.
Phase 1 of parameter-modernization. All non-config-file MSGFPlus
invocations now flow through:
argv -> picocli (MSGFPlusOptions) -> MSGFPlusOptionsAdapter -> ParamManager
instead of imperative ParamManager.parseParams. -conf still falls
through to the legacy path; Phase 2 ports the config-file reader.
The adapter round-trips each typed field through the existing
params.Parameter#parse(String) hierarchy so the downstream
SearchParams.parse(ParamManager) build path is unchanged. Phase 3
collapses that round-trip when the old hierarchy is deleted.
New equivalence test (MSGFPlusOptionsAdapterTest) asserts that the
picocli path populates the same ParamManager state as legacy
parseParams for a representative CLI corpus, including asymmetric
tolerance ("-t 0.5Da,2.5Da"). Picocli's required-flag enforcement
also verified.
Scoped test sweep (TestDirectPinWriter, TestMSUtils, TestSA, TestMisc,
TestRunManifestWriter, SearchParamsTest, TestPercolator,
MSGFPlusOptionsAdapterTest): 45 tests, 0 failures, 0 errors.
Drop the special-case "-conf in argv -> legacy parseParams" fallback from MSGFPlus.parseArgs. The picocli path now handles every MSGFPlus invocation, including config-file inputs. Two changes: 1. -s and -d are no longer picocli-required. They were already declared isOptional=true on the legacy ParamManager side (FileParameter.setAsOptional in addSpecFileParam(true) / addDBFileParam(true)), so requiring them at the picocli level was stricter than legacy. Without this change, a valid "-conf params.txt" invocation (where SpectrumFile / DatabaseFile live in the config file) would be rejected by picocli before the adapter could populate ParamManager. 2. The config-file overlay is unchanged. SearchParams.parseConfigParamFile already runs after CLI parsing and only fills in !commandLineParam.isValueAssigned() entries (SearchParams.java:588), so CLI flags continue to override config-file values transparently. The flow is now: argv -> picocli -> MSGFPlusOptions -> adapter -> ParamManager -> SearchParams.parse (config-file overlay happens here). No legacy parseParams call remains in the MSGFPlus entry point. Test: picocliPathRejectsMissingRequiredFlags replaced with picocliPathAcceptsConfigOnlyInvocation (picocli must accept "-conf X" without -s/-d, and the adapter must record the config file path so SearchParams.parse can read it). Scoped sweep (TestDirectPinWriter, TestMSUtils, TestSA, TestMisc, TestRunManifestWriter, SearchParamsTest, TestPercolator, MSGFPlusOptionsAdapterTest): 45 tests, 0 failures, 0 errors.
Net -874 / +2 lines. Removed: - src/main/java/edu/ucsd/msjava/cli/MSGFDB.java -- deprecated since 2012 (version 8091, "08/06/2012"), no external references in src, docs, pom.xml, or manifest. The class was @deprecated and only referenced its own legacy entry point. - ParamManager.addMSGFDBParams() -- only caller was the deleted MSGFDB.main. - ParamManager.addMSGFParams() and addMSGFLibParams() -- no entry points existed at all (cli.MSGF, cli.MSGFLib never created); both were pure dead code accumulating since the original mzid pipeline removal. - ParamManager.addOutputFileParam() / addDBFileParam(String, String, boolean) -- private helpers used only by the deleted methods. - ParamNameEnum.OUTPUT_FILE -- the "-o for MSGF and MS-GFDB" variant; MSGFPlus uses SEARCH_OUTPUT_FILE (same key "o", different label). ParamManager.getOutputFileParam() now reads SEARCH_OUTPUT_FILE directly so SearchParams.parse continues to resolve the user's -o. - ParamNameEnum.C13, NNET, UNIFORM_AA_PROBABILITY -- MSGFDB-only. - docs/ms-gfdb.md -- documentation for the removed tool. - TestMinSpectraPerThread.msgfdbEntryPointAlsoRegistersTheParam -- test for the removed addMSGFDBParams. The remaining 3 tests in TestMinSpectraPerThread continue to verify MSGFPlus's MIN_SPECTRA_PER_THREAD param. Scoped sweep (TestDirectPinWriter, TestMSUtils, TestSA, TestMisc, TestRunManifestWriter, SearchParamsTest, TestPercolator, MSGFPlusOptionsAdapterTest, TestMinSpectraPerThread): 48 tests, 0 failures, 0 errors.
Replace String/Integer holders for the four range-shaped flags with
small typed value classes carrying their own parse/converter logic:
- cli/PrecursorTolerance (Tolerance left, Tolerance right) -- handles
both symmetric ("20ppm") and asymmetric ("0.5Da,2.5Da") forms,
validates unit-match and non-negative, exposed via picocli converter.
- cli/IntRange (int min, int max) -- inclusive range, accepts "min,max"
or single "n" (interpreted as "n,n"), used by -ti, -msLevel, -index.
MSGFPlusOptions fields now use these typed objects directly:
precursorTolerance: String -> PrecursorTolerance
isotopeErrorRange: String -> IntRange
msLevel: String -> IntRange
specIndexRange: String -> IntRange
The adapter still round-trips them through Parameter.parse(String)
via toString() for now -- that round-trip goes away in Phase 4c
step 3 when SearchParams.parse reads from MSGFPlusOptions directly
and the legacy params/ hierarchy is deleted.
Existing equivalence test still passes (asymmetric "0.5Da,2.5Da"
case included). Build green.
SearchParams now reads directly from MSGFPlusOptions; the legacy ParamManager + Parameter.parse round-trip is gone. What changed: 1. MSGFPlusOptions grew effective*() resolvers for every flag (with defaults: 6/40 lengths, 2/3 charges, proton mass, etc.), typed lookups for ActivationMethod/InstrumentType/Enzyme/Protocol from the existing registries, and an applyConfigFile(File) overlay that reproduces the legacy alias rewrites (IsotopeError -> IsotopeErrorRange, FragmentationMethod -> FragmentationMethodID, etc., 14 total) plus collects DynamicMod/StaticMod/CustomAA into ordered lists. CLI fields take precedence -- config-file values only fill in null fields. 2. SearchParams.parse(MSGFPlusOptions) replaces parse(ParamManager). The 35+ paramManager.getXxx() call sites translate one-for-one to opts.effective*() / opts.* reads. parseConfigParamFile is gone (its work is now opts.applyConfigFile). Spectrum-format "isSupported" check is now a small whitelist instead of routing through FileParameter. 3. AminoAcidSet.getAminoAcidSetFromModFile and the new getAminoAcidSetFromModEntries (replacing getAminoAcidSetFromList) take MSGFPlusOptions and call opts.setMaxNumModsFromMetadata when the loaded mod metadata declares a different value, mirroring the former paramManager.setMaxNumMods bidirectional handshake. 4. MSGFPlus.main no longer creates a ParamManager. argv -> picocli -> MSGFPlusOptions -> SearchParams.parse(opts), with picocli's own usage()/printToolInfo()/printJVMInfo() replacing the ParamManager equivalents. 5. Adapter (MSGFPlusOptionsAdapter) deleted -- the round-trip via Parameter.parse(String) is no longer needed. 6. Tests migrated to construct MSGFPlusOptions directly: SearchParamsTest, TestRunManifestWriter, TestDirectPinWriter, TestPrecursorCalScaffolding, TestPrecursorCalIntegration, TestPercolator, TestMSUtils, TestIPRG, TestCollaboration, TestCandidatePeptideGrid(ConsideringMetCleavage), TestSA, TestMinSpectraPerThread. MSGFPlusOptionsAdapterTest deleted (adapter is gone). TestIntRangeParameter deleted (params/ IntRangeParameter is the next thing to go in Phase 3). Validation: scoped sweep -- 73 tests, 0 failures, 0 errors, 5 skipped. The legacy params/ hierarchy still compiles but has no live callers on the MS-GF+ path; Phase 3 deletes it next.
After Phase 4c routed SearchParams + AminoAcidSet through MSGFPlusOptions directly, the entire params/ package became unreferenced on the live MSGFPlus path. Deleting: - ParamManager (1,059 LOC after Phase 4a/4b cleanup) -- replaced by cli.MSGFPlusOptions + its effective*() resolvers and applyConfigFile(). - Parameter, NumberParameter, RangeParameter (abstract bases) - IntParameter, FloatParameter, DoubleParameter, IntRangeParameter, FloatRangeParameter (typed leaf classes) - StringParameter, FileParameter, FileListParameter (file/string types) - ToleranceParameter (replaced by cli.PrecursorTolerance) - EnumParameter, ObjectEnumParameter (enum machinery; the dynamic enum dispatch now lives inline in MSGFPlusOptions.effective*()) - ParamParser (legacy config-file reader; replaced by MSGFPlusOptions.applyConfigFile) - CaseInsensitiveLinkedHashMapParam, CaseInsensitiveMap (the Parameter map that backed ParamManager) Two small helper types (ParamObject interface + UserParam config-file helper) are still consumed by msutil's runtime registries (ActivationMethod, InstrumentType, Enzyme, Protocol). They have no dependency on the rest of the params/ hierarchy, so they relocate to edu.ucsd.msjava.msutil where their consumers already live. Validation: - Clean compile (main + tests) passes. - Scoped sweep (TestDirectPinWriter, TestMSUtils, TestSA, TestMisc, TestRunManifestWriter, SearchParamsTest, TestPercolator, TestMinSpectraPerThread, TestPrecursorCalScaffolding, TestCandidatePeptideGrid + ConsideringMetCleavage): 73 tests, 0 failures, 0 errors, 5 skipped. The legacy MSGFPlus CLI surface is fully preserved via the typed picocli @option fields + applyConfigFile()'s alias rewrites; this is purely a maintainability cleanup that drops ~2,100 LOC of custom parameter-parsing scaffolding.
Code reviewFound 1 issue:
msgfplus/src/main/java/edu/ucsd/msjava/msutil/AminoAcidSet.java Lines 824 to 834 in f5f3c47 Likely fix: drop the |
Remove legacy text-format parsers; only MGF and mzML are retained. - Delete MS2SpectrumParser, PklSpectrumParser, PNNLSpectrumParser, PNNLSpectraIterator, PNNLSpectraMap from parser/ - Delete dead-code SPTxtParser, SpectrumParserWithTitle, TSVParser, TSVResultParser, FullyBufferedLineReader from parser/ - Delete SpectraMapByTitle and SpectrumAccessorByTitle from msutil/ (only callers were in the deleted SPTxtParser) - Remove MS2/PKL/DTA_TXT/MZDATA entries from SpecFileFormat - Prune corresponding branches from SpectraAccessor.getSpecMap() and getSpecItr() and getSpectrumIDFormatCvParam() - Prune Spectrum.getSpectrumFileFormat() to mzML + MGF only - Update SearchParams.isSupportedSpectrumFormat() to mzML + MGF only - Update MSGFPlusOptions -s description to list only *.mzML, *.mgf - Remove @ignore generateTRexPRMSpectra test (used deleted TSVParser)
Pure file-move + package/import update; no behaviour change. - git mv the 6 remaining files under parser/ into mgf/ (BufferedLineReader, BufferedRandomAccessLineReader, LineReader, MgfSpectrumParser, SpectrumParser, UnicodeBOMInputStream) - Update package declaration in each moved file from edu.ucsd.msjava.parser → edu.ucsd.msjava.mgf - Update import edu.ucsd.msjava.parser.* → edu.ucsd.msjava.mgf.* across 13 callers in fdr/, mzml/, msutil/, msscorer/, msdbsearch/
The Phase 4c review surfaced four issues that all bottomed out in small details of the new typed-options path. Bundling the fixes plus a regression test: 1. **CustomAA= crash (critical, was code review issue #1).** AminoAcidSet.getAminoAcidSetFromModEntries was prepending "CustomAA=" to each entry before handing it to parseConfigEntry, but parseConfigEntry only strips the "nummods=" prefix -- every other line is split on commas and modInfo[0] is parsed as a mass or empirical formula. With the prefix attached, modInfo[0] became "CustomAA=C3H5NO" which fails Double.parseDouble + Composition.getMass and triggers the System.exit(-1) in the caller. Any -conf file with a CustomAA= line crashed the process. MSGFPlusOptions.applyConfigEntry already strips the "Key=" prefix when populating opts.customAAs; the fix is to drop the literal at AminoAcidSet:826 so the bare value reaches parseConfigEntry, matching how staticMods/dynamicMods are passed at line 831. New regression test MSGFPlusOptionsConfigFileTest.configFileWithCustomAAParsesWithoutCrashing pins this with a tiny synthetic config file. 2. **-decoy default doc.** @option description now says "Default: XXX" (the actual value returned by effectiveDecoyPrefix and the same constant the legacy code used via MSGFPlus.DEFAULT_DECOY_PROTEIN_PREFIX). The "Default: DECOY_" string was wrong since Phase 1a. 3. **Single-file spectrum-format null check.** The single-file branch in SearchParams.parse used to silently store a null SpecFileFormat into DBSearchIOFiles when the user supplied -s file.bogus, which later NPE'd at MSGFPlus:305 (specFormat.getPSIName()). It now short-circuits with the same message the directory branch's isSupportedSpectrumFormat filter implies: "Spectrum file extension does not match a supported format (*.mzML, *.mgf): <name>". 4. **Unrecognized config-key URL hint.** The legacy parseConfigParamFile tracked an invalid-parameter counter and, after closing the file, printed the example-params URL hint exactly once if the count was non-zero. MSGFPlusOptions.applyConfigFile now restores that behaviour with a private unrecognizedConfigEntries counter incremented inside the default branch of applyConfigEntry, plus the same end-of-file hint. Scoped tests pass (68 tests, 0 failures, 0 errors).
The Phase 4c MSGFPlusOptions.effectiveActivationMethod() switch hardcoded indices 0..3 (ASWRITTEN/CID/ETD/HCD) and threw IllegalArgumentException for index 4. The legacy addFragMethodParam(ActivationMethod.ASWRITTEN, doNotAddMergeMode=true) hid the registry's FUSION (slot 4), so the user-facing menu was 0..3 + UVPD at index 4. The Phase 4c rewrite silently dropped UVPD support; this commit restores it. - Add case 4: return ActivationMethod.UVPD; to the switch. - Update the @option description to enumerate 4=UVPD. - New unit test (MSGFPlusOptionsActivationMethodTest) pins the full 0..4 mapping plus the default and the out-of-range guard. docs/msgfplus.md already documents -m 4 = UVPD; this brings the code in line with the doc.
Remove stale references that no longer match the modernized code: - README.md: Quick-Start examples now write .pin (default) or use -outputFormat tsv; the deleted MzIDToTsv conversion step is gone. The "What is MS-GF+?" paragraph and "What is different in this fork?" bullets describe the actual current state (mzIdentML output removed; spectrum input narrowed to mzML + mgf only; picocli-based CLI). The required-input table now shows *.mzML / *.mgf only and the -o default is [input].pin. - docs/msgfplus.md: -s synopsis trimmed to "*.mzML or *.mgf" in three places; -allowDenseCentroidedPeaks no longer mentions mzXML; the duplicated "mzML, mzXML, mzML" typo is fixed. - docs/examples/MSGFPlus_Params.txt: SpectrumFile comment trimmed to *.mzML / *.mgf. - docs/readme.md: Input/Output summary now matches the fork's actual format support; obsolete ms-gfdb.md link removed (the doc and entry point were deleted in 5a2ec4e). - docs/troubleshooting.md: FASTA-split workaround now describes concatenating .pin / .tsv outputs (mzIdentML and MzidMerger no longer apply). The OpenMS TOPPAS workaround now feeds the .pin via PercolatorAdapter instead of importing a non-existent .mzid. - src/test/resources/MSGFDB_Param.txt: removed showDecoy=1 and uniformAAProb=auto (both ParamNameEnum entries were dropped in 5a2ec4e and now produce "unrecognized parameter" warnings on every test run). Normalized ParentMassTolerance -> PrecursorMassTolerance and IsotopeError -> IsotopeErrorRange to use the canonical keys; the canonicalConfigKey aliases keep the old names working but the test fixture should be self-documenting. No code changes; the existing scoped test sweep continues to pass.
The user code-reviewed the unpushed branch tip and surfaced three behaviour regressions that all date back to the Phase 4c ParamManager retire (commit 03f32c1) plus a few small cleanups. P1 -- config-file keys are now matched case-insensitively, restoring the legacy ParamManager.parseConfigParamFile semantics. The Phase 4c switch was exact-case so test fixtures using lowercase first-letter keys (e.g. "minCharge=2", "maxCharge=5" in MSGFDB_Param.txt) were silently dropped to defaults instead of overriding them. The fix lowercases canonicalConfigKey output and makes every applyConfigEntry case label lowercase. New regression test configFileKeysAreMatchedCaseInsensitively pins the contract with a mix of canonical, lowercased-first-letter, and ALLCAPS forms. P2 -- invalid enum-like CLI indices (-m 99, -inst 99, -e 99, -protocol 99) and out-of-range numerics now produce a clean user-facing error from SearchParams.parse instead of an IllegalArgumentException stack trace from the resolver. validate() is now invoked in place of validateRequired() and runs the bounds checks up-front. P2 -- restored the legacy IntParameter.minValue/maxValue range checks for: -thread, -tasks, -minSpectraPerThread, -minLength, -maxLength, -minCharge, -maxCharge, -n, -ntt, -tda, -verbose, -addFeatures, -allowDenseCentroidedPeaks, -edgeScore, -ignoreMetCleavage, -maxMissedCleavages, -numMods, -ccm, -u, -m, -inst, -e, -protocol, plus the hidden flags. New unit test validateRejectsOutOfRangeFlags pins a representative set. Polish: - Drop the dead Phase 1 / MSGFPlusOptionsAdapter / ParamManager rollout narrative from the MSGFPlusOptions class header (both the adapter class and ParamManager were deleted in earlier commits; the comment was stale). - Collapse the unrecognizedConfigEntries field + local probe counter in applyConfigFile into a single counter reset at start and read at end -- one piece of state, simpler control flow. - Strip the CRLF (\r before \n) on docs/examples/MSGFPlus_Params.txt and src/test/resources/MSGFDB_Param.txt, which git diff --check was flagging as trailing whitespace. The rest of the codebase is LF-only. Verified: - mvn -B -o test on the scoped sweep (77 tests, 0 failures, 0 errors, 3 skipped). - SearchParamsTest no longer warns "unrecognized parameter 'minCharge=2'" / "'maxCharge=5'" when loading MSGFDB_Param.txt. - git diff --check on this commit is clean.
Drop the lingering String + numeric backcompat for the two enum-shaped flags whose values are real names rather than IDs (per user direction: 'unless are options like 1 2 3 we should do only string'). After this commit: - -outputFormat accepts only `pin` / `tsv` (case-insensitive). The legacy numeric forms `0` / `1` are no longer recognised; users on those invocations should switch to the named values. This is a deliberate breaking change called out in the parameter-modernization cleanup -- consistency over backcompat in this corner. - -precursorCal continues to accept only `auto` / `on` / `off` (case-insensitive), but now via picocli's typed enum matcher rather than a String + fromString fallback. Invalid values fail fast at parse time instead of silently mapping to AUTO. Numeric-ID flags (-m, -inst, -e, -protocol) and 0/1 boolean-style flags (-tda, -verbose, -addFeatures, -allowDenseCentroidedPeaks, -edgeScore, -ignoreMetCleavage, -u, -ntt) keep their integer types -- those values are IDs, not names. Implementation: - New cli.OutputFormat enum (PIN, TSV). - MSGFPlusOptions.outputFormat: String -> OutputFormat. - MSGFPlusOptions.precursorCalMode: String -> SearchParams.PrecursorCalMode. - effectiveOutputFormat() now returns OutputFormat (was int 0/1). - effectivePrecursorCalRaw() collapsed into effectivePrecursorCal() returning the typed enum. - applyConfigEntry parses both flags via Enum.valueOf so config-file values like `OutputFormat=pin` and `PrecursorCal=auto` flow through the same case-insensitive contract as the CLI. - SearchParams.outputFormat field: int -> OutputFormat. writePin() / writeTsv() helpers retained (callers in MSGFPlus.runMSGFPlus). - SearchParams.PrecursorCalMode.fromString() deleted -- no callers after the resolver returns the typed enum directly. - New static factory MSGFPlusOptions.commandLine(opts) returns a CommandLine with caseInsensitiveEnumValuesAllowed(true). All call sites (MSGFPlus.main + 5 test files) routed through it so enum case-insensitivity is uniform. - docs/output.md updated to show `-outputFormat pin` / `tsv` and notes the numeric forms are no longer accepted. Tests: TestDirectPinWriter.outputFormatAcceptsOnlyPinAndTsv pins the new contract (numeric/legacy values rejected, named values accepted case-insensitively). TestPrecursorCalScaffolding migrated to enum constants and to a picocli rejection check for invalid values. The old fromString-fallback test is replaced by the rejection test. Scoped sweep: 78 tests, 0 failures, 0 errors, 4 skipped.
Expand the vNEXT entry to cover the full PR #25 modernization stack: - CLI parser modernisation: picocli-driven MSGFPlusOptions; -conf flow with case-insensitive keys + 13 legacy aliases preserved; numeric/enum range validation surface restored. - Breaking changes: -outputFormat numeric forms (0/1) removed in favour of named pin/tsv (typed enum); -precursorCal switched to typed enum (rejects unknown values instead of silently mapping to AUTO); spectrum input narrowed to mzML + mgf only (mzXML, MS2, PKL, _dta.txt parsers deleted); deprecated MSGFDB entry point and its dead MSGF/MSGFLib siblings removed. - Internal refactor: edu.ucsd.msjava.params package deleted (~2,100 LOC across 18 classes); package reorg (ui/ -> cli/, mzid/ -> output/, parser/ -> mgf/, net.pempek.unicode -> mgf); new typed value classes (MSGFPlusOptions, PrecursorTolerance, IntRange, OutputFormat); picocli 4.7.6 dep added. - Bench gate: prior Astral 3-arm run confirmed bit-identical PSM target/decoy counts (89,479 / 46,792) between baseline and new branch in -precursorCal off mode. Table embedded in the entry. - Earlier in cycle: precursor calibration + Percolator pin output bullets retained. No code change; pure docs.
Combined sweep of cuts surfaced by the post-modernization LOC audit. Net change: 13 files, +57 / -2,131 = -2,074 LOC. ### Deleted dead classes (-2,051 LOC, 6 files) Verified zero callers across src/main and src/test before removal: - msscorer/ScoringParameterGenerator (732 LOC) and ScoringParameterGeneratorWithErrors (880 LOC) -- standalone main() scoring-param tools from the pre-modernization era. Pre-built scoring model .tsv files are committed in resources; these generators are not invoked at search time and have no remaining consumers. - output/Unimod (85 LOC) and output/UnimodComposition (133 LOC) -- residue from the deleted mzIdentML write side. Atom.java's "Unimod mod bricks" comment is the only remaining reference and refers to upstream data, not the deleted classes. - msgf/ToolLauncher (154 LOC) -- abstract launcher with no concrete implementations. - msutil/ScoredString (67 LOC) -- duplicate of fdr/ScoredString with no live callers (all usages resolve to the fdr.* version). ### Inlined effective*() resolvers + helper collapse (-23 LOC net) - ~20 of the trivial `field != null ? field : default` resolvers in MSGFPlusOptions are inlined at the SearchParams.parse call sites. The non-trivial registry-resolving ones (effectiveActivationMethod, effectiveInstrumentType, effectiveEnzyme, effectiveProtocol) and a handful of frequent ones (effectiveOutputFormat, effectiveMin/Max PeptideLength etc.) stay since their length pays for itself. - ActivationMethodAvailability nested class collapsed -- it only hid one InstrumentType.getAllRegisteredInstrumentTypes().length call that is now inline in validate(). - SearchParams.getOutputFormat() and SearchParams.writePin() removed -- writePin() had two callers in MSGFPlus.java which now use !writeTsv() instead; getOutputFormat() had zero callers. ### stripComment dedup (-7 LOC) The two implementations of "split-on-#-and-trim" (SearchParams. getConfigLineWithoutComment and MSGFPlusOptions.stripComment) collapsed: stripComment is the canonical version (package-public), SearchParams.getConfigLineWithoutComment delegates to it, and AminoAcidSet.parseConfigEntry calls stripComment directly. ### Validation surface expanded MSGFPlusOptions.validate() now also rejects a -mod / ModificationFile= path that does not exist, returning a user-facing error string. New regression test (validateRejectsMissingModificationFile) pins both the CLI path and the config-file path. Verified: scoped sweep (TestDirectPinWriter, TestMSUtils, TestSA, TestMisc, TestRunManifestWriter, SearchParamsTest, TestPercolator, TestMinSpectraPerThread, TestPrecursorCalScaffolding, TestCandidatePeptideGrid + ConsideringMetCleavage, MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest): 78 tests, 0 failures, 0 errors, 3 skipped.
…quences
Net change: 21 files, +24 / -409 = -385 LOC. Combines the audit's
medium-value pass with a parallel sweep across the codebase for
methods and fields with no live callers.
### Dependencies dropped from pom.xml
- commons-text: zero direct usages.
- commons-io: NullOutputStream replaced by Java 11's
OutputStream.nullOutputStream() in ConcurrentMSGFPlus;
FilenameUtils.removeExtension replaced by an inline lastIndexOf('.')
+ substring in BuildSA.
Removing commons-text also drops its transitive commons-lang3, which
was used by mgf/BufferedRandomAccessLineReader for Pair<String,
Integer>. Replaced with a small local record BomStripResult
(text, bomLength) — single-purpose, no API consumers, simpler at
the use sites (result.text() / result.bomLength() vs result.getKey()
/ result.getValue()).
### Dead code removed across packages
Verified zero remaining callers for each:
- fdr/TargetDecoyAnalysis: ~38 lines of commented-out legacy
constructors that referenced a non-existent TargetDecoyPSMSet
class. Pure comment cleanup.
- fdr/TSVPSMSet: 14 LOC of dead methods.
- mgf/MgfSpectrumParser: 35 LOC of dead methods.
- msgf/AAFrequencyCounter: 44 LOC of dead helpers.
- msgf/MassListComparator: 15 LOC.
- msscorer/NewRankScorer + NewScorerFactory: 31 LOC of dead
helpers (kept all live scoring-path code per CLAUDE.md invariants).
- msutil/AminoAcidSet, Composition, CompositionFactory, IonType,
Peptide: ~110 LOC of dead helpers and fields.
- sequences/FastaSequence + FastaSequences + ProteinFastaSequence
+ ProteinFastaSequences: ~76 LOC of dead methods.
### Other cleanup
- SearchParams.toString: drop 6 lines of commented-out spectrum-list
output and switch StringBuffer (synchronized, single-threaded
caller) to StringBuilder.
Verified: scoped sweep (TestDirectPinWriter, TestMSUtils, TestSA,
TestMisc, TestRunManifestWriter, SearchParamsTest, TestPercolator,
TestMinSpectraPerThread, TestPrecursorCalScaffolding,
TestCandidatePeptideGrid + ConsideringMetCleavage,
MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest):
78 tests, 0 failures, 0 errors, 3 skipped.
Four test files (TestDirectPinWriter, TestPrecursorCalScaffolding, TestPrecursorCalIntegration, TestRunManifestWriter) had nearly identical buildOpts()/parsedSearchParams() helpers loading the standard MSGFDB_Param.txt + test.mgf + human-uniprot-contaminants.fasta fixture set. Collapse into a single shared SearchTestFixtures.standardOpts() helper in src/test/java/edu/ucsd/ msjava/cli; the two tests that need extra fields (output file, maxMissedCleavages) call standardOpts() and tweak the result. Net change: -25 LOC, plus tests now stale-import-free. Verified: scoped sweep (42 tests across the 4 affected files + MSGFPlusOptionsConfigFileTest + MSGFPlusOptionsActivationMethodTest + SearchParamsTest): 0 failures, 0 errors.
Scrub 31 Java source files of five comment categories: - single-line javadoc restating the method/field name - block javadoc containing only @param/@return lines - /* (non-Javadoc) */ markers - section headers (GETTERS / SETTERS / CONSTRUCTORS / etc.) - commented-out code blocks (dead debug printlns, old algorithms, inactive fields, alternative scorer variants) Non-obvious WHY comments, license headers, performance invariants, CLI contract javadoc, and Picocli @option descriptions are preserved unchanged. LOC delta: -2262 lines, +42 lines across 31 files. Scoped test sweep: 78 tests run, 0 failures, 0 errors, 3 skipped (TestMSUtils×1, TestSA×2 — pre-existing skips unrelated to this change).
Continues the comment cleanup pass with a focused removal of
disabled-but-kept Java code masquerading as comments. Net change:
32 files, +1 / -323 = -322 LOC.
What was removed:
- fdr/TargetDecoyAnalysis: dead 10-line if/else block at the tail of
getFDRMap() that re-set fdrMap edges no longer needed after the
q-value conversion.
- fdr/ComputeFDR: 2-line commented-out specFileCol guard.
- msdbsearch/{CandidatePeptideGrid,CandidatePeptideGridConsideringMetCleavage,
CompactFastaSequence, CompactSuffixArray, DBScanner, MassErrorStat,
PSMFeatureFinder} and msgf/{ProfileGF, FlexAminoAcidGraph,
ScoredSpectrumSum}: scattered // foo() / /* alt impl */ blocks
documenting old algorithm variants and disabled debug printlns.
- msutil/{AminoAcid, Composition, Constants, IonType, VolatileAminoAcid}:
~90 LOC of commented-out static initializers, alternate composition
tables, and debug-only branches.
- sequences/{FastaSequence, FastaSequences, ProteinFastaSequences}
and suffixarray/SuffixArray: ~37 LOC of dead disabled stubs.
Genuine prose comments (TODOs, performance invariants, why-explanations,
URLs to upstream issues, the `applyShift` bit-identical-path guard)
are preserved.
Verified: scoped sweep (TestDirectPinWriter, TestMSUtils, TestSA,
TestMisc, TestRunManifestWriter, SearchParamsTest, TestPercolator,
TestMinSpectraPerThread, TestPrecursorCalScaffolding,
TestCandidatePeptideGrid + ConsideringMetCleavage,
MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest):
78 tests, 0 failures, 0 errors, 3 skipped.
There was a problem hiding this comment.
Pull request overview
This PR refactors the MS-GF+ search execution path to reduce synchronization/allocation overhead and adds instrumentation for per-task wall-time breakdowns, while also continuing the CLI/parameter modernization work (typed options via picocli) and pruning legacy/unused code paths.
Changes:
- Add per-task wall-time instrumentation and switch to per-task result buffers (drained/merged after completion) to reduce contention.
- Remove redundant synchronization in task-owned structures and trim allocations (e.g., reusable null sink).
- Continue CLI modernization (typed
MSGFPlusOptions, typed-outputFormat, typed ranges/tolerances), remove legacy spectrum formats/parsers, and update docs/tests accordingly.
Reviewed changes
Copilot reviewed 160 out of 160 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/test/java/msgfplus/TestSA.java | Update tests to use MSGFPlusOptions instead of legacy ParamManager. |
| src/test/java/msgfplus/TestRunManifestWriter.java | Use SearchTestFixtures + MSGFPlusOptions for manifest parsing tests. |
| src/test/java/msgfplus/TestPrecursorCalScaffolding.java | Update precursor calibration tests to picocli/typed options. |
| src/test/java/msgfplus/TestPrecursorCalIntegration.java | Update integration tests to MSGFPlusOptions + typed PrecursorCalMode. |
| src/test/java/msgfplus/TestPercolator.java | Switch ignored Percolator test harness to picocli-based options parsing. |
| src/test/java/msgfplus/TestMisc.java | Remove an ignored legacy TSV-based generator helper and update MSGFPlus import. |
| src/test/java/msgfplus/TestMinSpectraPerThread.java | Rework tests around MSGFPlusOptions.effectiveMinSpectraPerThread(). |
| src/test/java/msgfplus/TestMSUtils.java | Update mod-file parsing test to use MSGFPlusOptions. |
| src/test/java/msgfplus/TestMSLevelFiltering.java | Remove legacy ParamManager-based msLevel tests. |
| src/test/java/msgfplus/TestIntRangeParameter.java | Remove legacy IntRangeParameter tests. |
| src/test/java/msgfplus/TestIPRG.java | Update ignored IPRG test harness to MSGFPlusOptions/picocli. |
| src/test/java/msgfplus/TestDirectPinWriter.java | Update .pin writer shape tests to new packages and typed OutputFormat. |
| src/test/java/msgfplus/TestCollaboration.java | Update ignored collaboration harness to MSGFPlusOptions/picocli. |
| src/test/java/msgfplus/TestCandidatePeptideGrid.java | Update tests to pass MSGFPlusOptions into AA set builders. |
| src/test/java/edu/ucsd/msjava/msdbsearch/TestConcurrentMSGFPlus.java | Add/adjust tests for task-local result buffers and drain semantics. |
| src/test/java/edu/ucsd/msjava/msdbsearch/SearchParamsTest.java | Update SearchParams parsing tests to MSGFPlusOptions. |
| src/test/java/edu/ucsd/msjava/cli/SearchTestFixtures.java | New shared test fixture builder for standard conf+mgf+fasta resources. |
| src/test/java/edu/ucsd/msjava/cli/MSGFPlusOptionsConfigFileTest.java | New regression tests for config overlay, validation, and CustomAA handling. |
| src/test/java/edu/ucsd/msjava/cli/MSGFPlusOptionsActivationMethodTest.java | New test pinning -m index → ActivationMethod mapping (incl. UVPD). |
| src/main/resources/META-INF/MANIFEST.MF | Update jar entrypoint main class to edu.ucsd.msjava.cli.MSGFPlus. |
| src/main/java/edu/ucsd/msjava/suffixarray/SuffixArray.java | Remove dead debug/commented code. |
| src/main/java/edu/ucsd/msjava/sequences/ProteinFastaSequences.java | Remove dead debug main method and stale prints. |
| src/main/java/edu/ucsd/msjava/sequences/ProteinFastaSequence.java | Remove dead debug main method. |
| src/main/java/edu/ucsd/msjava/sequences/FastaSequences.java | Remove dead debug main method and stale prints. |
| src/main/java/edu/ucsd/msjava/parser/TSVResultParser.java | Remove unused legacy TSV parsing helper. |
| src/main/java/edu/ucsd/msjava/parser/TSVParser.java | Remove unused legacy TSV parser. |
| src/main/java/edu/ucsd/msjava/parser/SpectrumParserWithTitle.java | Remove legacy title-based parser interface. |
| src/main/java/edu/ucsd/msjava/parser/SPTxtParser.java | Remove legacy SPTXT parsing implementation. |
| src/main/java/edu/ucsd/msjava/parser/PklSpectrumParser.java | Remove legacy PKL parsing implementation. |
| src/main/java/edu/ucsd/msjava/parser/PNNLSpectraMap.java | Remove legacy PNNL dta.txt SpectraMap. |
| src/main/java/edu/ucsd/msjava/parser/PNNLSpectraIterator.java | Remove legacy PNNL dta.txt iterator. |
| src/main/java/edu/ucsd/msjava/parser/MS2SpectrumParser.java | Remove legacy MS2 parsing implementation. |
| src/main/java/edu/ucsd/msjava/parser/FullyBufferedLineReader.java | Remove unused fully-buffered reader. |
| src/main/java/edu/ucsd/msjava/params/ToleranceParameter.java | Remove legacy parameter-stack classes (tolerance). |
| src/main/java/edu/ucsd/msjava/params/StringParameter.java | Remove legacy parameter-stack classes (string). |
| src/main/java/edu/ucsd/msjava/params/RangeParameter.java | Remove legacy parameter-stack classes (range). |
| src/main/java/edu/ucsd/msjava/params/Parameter.java | Remove legacy parameter-stack base class. |
| src/main/java/edu/ucsd/msjava/params/ParamParser.java | Remove legacy config/CLI parsing helper. |
| src/main/java/edu/ucsd/msjava/params/ObjectEnumParameter.java | Remove legacy enum/object parameter wrapper. |
| src/main/java/edu/ucsd/msjava/params/NumberParameter.java | Remove legacy numeric parameter wrapper. |
| src/main/java/edu/ucsd/msjava/params/IntRangeParameter.java | Remove legacy int-range parameter type. |
| src/main/java/edu/ucsd/msjava/params/IntParameter.java | Remove legacy int parameter type. |
| src/main/java/edu/ucsd/msjava/params/FloatRangeParameter.java | Remove legacy float-range parameter type. |
| src/main/java/edu/ucsd/msjava/params/FloatParameter.java | Remove legacy float parameter type. |
| src/main/java/edu/ucsd/msjava/params/FileParameter.java | Remove legacy file parameter type. |
| src/main/java/edu/ucsd/msjava/params/FileListParameter.java | Remove legacy file-list parameter type. |
| src/main/java/edu/ucsd/msjava/params/EnumParameter.java | Remove legacy enum parameter type. |
| src/main/java/edu/ucsd/msjava/params/DoubleParameter.java | Remove legacy double parameter type. |
| src/main/java/edu/ucsd/msjava/params/CaseInsensitiveMap.java | Remove legacy case-insensitive map helper. |
| src/main/java/edu/ucsd/msjava/params/CaseInsensitiveLinkedHashMapParam.java | Remove legacy case-insensitive parameter map. |
| src/main/java/edu/ucsd/msjava/output/DirectTSVWriter.java | Move DirectTSVWriter into edu.ucsd.msjava.output package. |
| src/main/java/edu/ucsd/msjava/output/DirectPinWriter.java | Move DirectPinWriter into edu.ucsd.msjava.output package. |
| src/main/java/edu/ucsd/msjava/mzml/StaxMzMLSpectraIterator.java | Update SpectrumParser import to new mgf package. |
| src/main/java/edu/ucsd/msjava/mzml/StaxMzMLParser.java | Trim/adjust API comments in public methods. |
| src/main/java/edu/ucsd/msjava/mzid/UnimodComposition.java | Remove unused mzIdentML-era Unimod composition helper. |
| src/main/java/edu/ucsd/msjava/mzid/Unimod.java | Remove unused mzIdentML-era Unimod loader. |
| src/main/java/edu/ucsd/msjava/msutil/VolatileAminoAcid.java | Remove dead debug prints. |
| src/main/java/edu/ucsd/msjava/msutil/UserParam.java | Move UserParam into msutil and update reader import. |
| src/main/java/edu/ucsd/msjava/msutil/SpectrumAccessorByTitle.java | Remove title-based accessor interface. |
| src/main/java/edu/ucsd/msjava/msutil/SpectraMapByTitle.java | Remove title-based SpectraMap implementation. |
| src/main/java/edu/ucsd/msjava/msutil/SpectraMap.java | Update imports to mgf reader/parser packages. |
| src/main/java/edu/ucsd/msjava/msutil/SpectraIterator.java | Update imports to mgf reader/parser packages. |
| src/main/java/edu/ucsd/msjava/msutil/SpectraContainer.java | Update SpectrumParser import to mgf package. |
| src/main/java/edu/ucsd/msjava/msutil/SpectraAccessor.java | Narrow supported spectrum formats to mzML+MGF and update parser wiring. |
| src/main/java/edu/ucsd/msjava/msutil/SpecKey.java | Update SpectrumParser import and trim debug comments. |
| src/main/java/edu/ucsd/msjava/msutil/SpecFileFormat.java | Remove legacy format constants; keep mzML+MGF only. |
| src/main/java/edu/ucsd/msjava/msutil/ScoredString.java | Remove unused msutil ScoredString implementation. |
| src/main/java/edu/ucsd/msjava/msutil/Protocol.java | Remove obsolete imports tied to removed params package. |
| src/main/java/edu/ucsd/msjava/msutil/ParamObject.java | Move ParamObject into msutil package. |
| src/main/java/edu/ucsd/msjava/msutil/Pair.java | Replace verbose Javadoc with concise class comment; keep behavior. |
| src/main/java/edu/ucsd/msjava/msutil/Modification.java | Trim/clarify comments; keep modification identity logic. |
| src/main/java/edu/ucsd/msjava/msutil/Matter.java | Trim/clarify comments. |
| src/main/java/edu/ucsd/msjava/msutil/IonType.java | Remove dead debug main and commented debug prints. |
| src/main/java/edu/ucsd/msjava/msutil/InstrumentType.java | Remove obsolete ParamObject import and stale commented code. |
| src/main/java/edu/ucsd/msjava/msutil/Constants.java | Remove dead/commented blocks. |
| src/main/java/edu/ucsd/msjava/msutil/CompositionFactory.java | Remove empty main method. |
| src/main/java/edu/ucsd/msjava/msutil/Composition.java | Remove large blocks of unused commented code. |
| src/main/java/edu/ucsd/msjava/msutil/ActivationMethod.java | Remove obsolete imports and dead debug prints. |
| src/main/java/edu/ucsd/msjava/msscorer/PrecursorOffsetFrequency.java | Remove dead debug prints. |
| src/main/java/edu/ucsd/msjava/msscorer/NewScorerFactory.java | Remove dead debug main method. |
| src/main/java/edu/ucsd/msjava/msscorer/NewScoredSpectrum.java | Trim debug comments and clarify one method doc. |
| src/main/java/edu/ucsd/msjava/msscorer/NewRankScorer.java | Remove dead/commented debug code and unused main/test hooks. |
| src/main/java/edu/ucsd/msjava/msscorer/IonProbability.java | Remove dead debug prints. |
| src/main/java/edu/ucsd/msjava/msscorer/FastScorer.java | Remove dead debug comments. |
| src/main/java/edu/ucsd/msjava/msscorer/DBScanScorer.java | Remove dead debug comments. |
| src/main/java/edu/ucsd/msjava/mslibsearch/ProcessedSpectrum.java | Remove unused library-search helper. |
| src/main/java/edu/ucsd/msjava/msgf/ToolLauncher.java | Remove unused tool launcher base class. |
| src/main/java/edu/ucsd/msjava/msgf/Tolerance.java | Trim comments and clarify getToleranceAsDa behavior. |
| src/main/java/edu/ucsd/msjava/msgf/ScoredSpectrumSum.java | Remove dead assertion comment. |
| src/main/java/edu/ucsd/msjava/msgf/ScoreDist.java | Remove dead debug/commented blocks. |
| src/main/java/edu/ucsd/msjava/msgf/ProfileGF.java | Remove dead commented code. |
| src/main/java/edu/ucsd/msjava/msgf/NominalMassFactory.java | Trim dead comment. |
| src/main/java/edu/ucsd/msjava/msgf/MassListComparator.java | Remove dead debug main method. |
| src/main/java/edu/ucsd/msjava/msgf/MSGFDBResultGenerator.java | Remove dead commented algorithm block. |
| src/main/java/edu/ucsd/msjava/msgf/Histogram.java | Remove dead debug prints. |
| src/main/java/edu/ucsd/msjava/msgf/GeneratingFunction.java | Remove dead/commented blocks; keep warning behavior. |
| src/main/java/edu/ucsd/msjava/msgf/FlexAminoAcidGraph.java | Remove dead commented code and tighten warning handling comments. |
| src/main/java/edu/ucsd/msjava/msgf/DeNovoGraph.java | Remove dead abstract method comment. |
| src/main/java/edu/ucsd/msjava/msgf/BacktrackTable.java | Remove dead commented residue mapping block. |
| src/main/java/edu/ucsd/msjava/msgf/AAFrequencyCounter.java | Remove dead debug main/generator helpers. |
| src/main/java/edu/ucsd/msjava/msdbsearch/ScoredSpectraMap.java | Drop synchronized wrappers on task-owned maps; trim dead comments. |
| src/main/java/edu/ucsd/msjava/msdbsearch/ReverseDB.java | Update MSGFPlus import to new CLI package. |
| src/main/java/edu/ucsd/msjava/msdbsearch/PeptideEnumerator.java | Update MSGFPlus import to new CLI package. |
| src/main/java/edu/ucsd/msjava/msdbsearch/PSMFeatureFinder.java | Remove dead commented fields and placeholders. |
| src/main/java/edu/ucsd/msjava/msdbsearch/MassErrorStat.java | Remove dead commented fields/accessors. |
| src/main/java/edu/ucsd/msjava/msdbsearch/LibraryScanner.java | Update BufferedLineReader import to new mgf package. |
| src/main/java/edu/ucsd/msjava/msdbsearch/ConcurrentMSGFPlus.java | Add task wall stats + task-local result buffers; reuse NULL print stream. |
| src/main/java/edu/ucsd/msjava/msdbsearch/CompactSuffixArray.java | Remove dead debug comments and simplify toString(). |
| src/main/java/edu/ucsd/msjava/msdbsearch/CompactFastaSequence.java | Update MSGFPlus import and remove dead commented bounds check. |
| src/main/java/edu/ucsd/msjava/msdbsearch/CandidatePeptideGridConsideringMetCleavage.java | Remove dead constructor comments. |
| src/main/java/edu/ucsd/msjava/msdbsearch/CandidatePeptideGrid.java | Remove dead constructor/addResidue comments. |
| src/main/java/edu/ucsd/msjava/msdbsearch/BuildSA.java | Remove commons-io FilenameUtils usage (inline extension stripping). |
| src/main/java/edu/ucsd/msjava/mgf/SpectrumParser.java | Move parser interface into edu.ucsd.msjava.mgf package. |
| src/main/java/edu/ucsd/msjava/mgf/MgfSpectrumParser.java | Move MGF parser into mgf package and trim legacy Javadoc/debug. |
| src/main/java/edu/ucsd/msjava/mgf/LineReader.java | Move LineReader into mgf package. |
| src/main/java/edu/ucsd/msjava/mgf/BufferedRandomAccessLineReader.java | Move reader into mgf package; replace tuple Pair with record. |
| src/main/java/edu/ucsd/msjava/mgf/BufferedLineReader.java | Move reader into mgf package. |
| src/main/java/edu/ucsd/msjava/fdr/TargetDecoyAnalysis.java | Remove dead commented constructors and debug prints. |
| src/main/java/edu/ucsd/msjava/fdr/TSVPSMSet.java | Update MSGFPlus import and remove dead peptide-parsing comments. |
| src/main/java/edu/ucsd/msjava/fdr/ScoredString.java | Trim verbose Javadoc. |
| src/main/java/edu/ucsd/msjava/fdr/Pair.java | Trim verbose Javadoc. |
| src/main/java/edu/ucsd/msjava/fdr/PSMSet.java | Remove dead commented method stub. |
| src/main/java/edu/ucsd/msjava/fdr/MSGFPlusPSMSet.java | Update MSGFPlus import to new CLI package. |
| src/main/java/edu/ucsd/msjava/fdr/ComputeQValue.java | Update imports to new mgf reader and cli MSGFPlus. |
| src/main/java/edu/ucsd/msjava/fdr/ComputeFDR.java | Remove dead commented validation stub. |
| src/main/java/edu/ucsd/msjava/cli/PrecursorTolerance.java | New typed precursor tolerance parser + picocli converter. |
| src/main/java/edu/ucsd/msjava/cli/OutputFormat.java | New typed output format enum (pin/tsv). |
| src/main/java/edu/ucsd/msjava/cli/IntRange.java | New typed inclusive int range + picocli converter. |
| pom.xml | Switch shaded jar main class to CLI entrypoint; swap commons-io for picocli. |
| docs/troubleshooting.md | Update troubleshooting guidance to .pin/.tsv outputs (no mzid). |
| docs/readme.md | Document narrowed spectrum input support and outputs. |
| docs/output.md | Update -outputFormat usage to named values. |
| docs/msgfplus.md | Update usage docs to reflect supported formats and mzML-only wording. |
| README.md | Update top-level README to reflect .pin/TSV outputs and narrowed inputs. |
| .claude/plans/parameter-modernization.md | Add design/plan document for picocli/parameter modernization. |
Comments suppressed due to low confidence (1)
src/main/java/edu/ucsd/msjava/mgf/BufferedLineReader.java:6
- UnicodeBOMInputStream requires an explicit skipBOM() call to actually consume the BOM, but BufferedLineReader's constructor currently just wraps the stream without calling skipBOM(). This can leave a BOM character at the start of the first line and break parsing of FASTA/mod/config files. Consider updating the constructor to call skipBOM() (or otherwise ensure the BOM is removed) while doing this package move.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | `-outputFormat pin` (default) | Percolator `.pin` | `.pin` | Feed to Percolator / MS²Rescore / Mokapot for FDR-calibrated rescoring | | ||
| | `-outputFormat tsv` | Tab-separated values | `.tsv` | Direct inspection / downstream tools that consume TSV | | ||
|
|
||
| `-outputFormat` accepts the named values `pin` and `tsv` (case-insensitive). Numeric forms (`0`, `1`) accepted by older releases are no longer recognised — pass the named value instead. |
There was a problem hiding this comment.
The PR description says there are no CLI default changes, but this diff changes the CLI contract by removing numeric -outputFormat values (0/1) and requiring named values (pin/tsv). If this is intentional, please call it out explicitly in the PR description/release notes (it’s a breaking change for existing scripts). If it’s not intentional, consider supporting numeric aliases for backward compatibility.
| - Input: HUPO PSI standard mzML and MGF only (mzXML, MS2, PKL, and `_dta.txt` are not supported in this fork). | ||
| - Output: Percolator `.pin` (default, for rescoring) or TSV. mzIdentML (`.mzid`) output has been removed — MS-GF+ now feeds downstream Percolator pipelines directly via `.pin`. See [Changelog](changelog.md) for migration notes. |
There was a problem hiding this comment.
The PR description focuses on search hot-path perf/refactors, but this diff explicitly narrows supported spectrum inputs to mzML+MGF (dropping mzXML/MS2/PKL/_dta.txt). That’s a user-visible breaking change; if it’s intended, it should be explicitly called out in the PR description (and ideally migration notes). If it’s not intended, SpectraAccessor/SpecFileFormat will need to retain the legacy format handlers.
|
Finished the sequential 3-dataset x 3-arm cold benchmark run locally and compared the completed native outputs against the previous saved 3-arm runs. Native search summary from this run:
Quick readout:
Wall-time comparison vs the previous saved 3-arm runs:
So the native outcomes are stable relative to the earlier 3-arm checkpoints, while runtime is materially better on PXD001819 and especially Astral; PXD007683 is effectively flat/slightly better. Separate review follow-up: the unused |
|
Small LOC note for the current pushed branch versus
That stat is from The tiny local |
Two follow-ups from the PR #25 review thread: 1. BOM strip in BufferedLineReader (Copilot finding). The constructor wrapped the FileInputStream in UnicodeBOMInputStream but never called skipBOM() -- BOM bytes leaked into line 1, breaking config / mod / FASTA files saved by editors that prepend a UTF-8 BOM (Windows Notepad, some VS Code configurations, etc.). Fix: chain .skipBOM() in the super(...) call. The BOM-stripping path is per-instance; the wrapper still detects the BOM in its constructor, we just now consume it. New test BufferedLineReaderTest pins the contract: a file with a UTF-8 BOM followed by `ParentMassTolerance=20ppm` returns the bare key=value as line 1 (not `ParentMassTolerance=20ppm`), and a no-BOM file is unchanged. 2. Delete unused MSGFResult.java. Verified zero callers in src/main and src/test before removal. The reviewer flagged this as a small follow-up to land alongside the modernization sweep. Verified: scoped sweep (BufferedLineReaderTest, MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest, SearchParamsTest, TestPrecursorCalScaffolding, TestRunManifestWriter, TestDirectPinWriter, TestMSUtils, TestMisc, TestMinSpectraPerThread): 55 tests, 0 failures, 0 errors, 2 skipped.
|
Follow-up commit
Scoped sweep after the fix: 55 tests, 0 failures, 0 errors, 2 skipped (incl. the two new BOM tests). Bench correctness gate from the prior run still holds — the BOM fix doesn't touch the search hot path. |
Convert eight value-shaped types to Java records, per the audit: small immutable carriers with no inheritance constraints, where the record's auto-generated equals/hashCode/toString and accessor methods replace hand-rolled boilerplate. No behavioral change; net trim of ~80 LOC plus a clearer surface. Converted: - cli.IntRange -- record IntRange(int min, int max). Compact constructor keeps the min<=max validation. parse() and Converter (picocli ITypeConverter) preserved. - cli.PrecursorTolerance -- record PrecursorTolerance(Tolerance left, Tolerance right). Compact constructor enforces the matching-unit and non-negative invariants in-place; parse() / Converter retained. - msutil.CvParamInfo -- record CvParamInfo(String accession, String name, String value, String unitAccession, String unitName). The stored hasUnit field is gone; hasUnit() is now derived from unitAccession != null. Compatibility getters (getAccession(), getValue(), getUnitAccession(), getUnitName(), getHasUnit()) keep existing call sites untouched. - msutil.Annotation -- record Annotation(AminoAcid prevAA, Peptide peptide, AminoAcid nextAA). Setters (which had no live callers) removed; the parsing constructor delegates via this(...). isProteinNTerm/isProteinCTerm and the dotted toString are kept. - msgf.ProfilePeak<T extends Matter> -- record. Setters were unused; Comparable<ProfilePeak<T>> is preserved. - msutil.Atom -- record Atom(String code, double mass, int nominalMass, String name). Static atomArr table + atomMap + static initializer block all kept verbatim. getCode() / getName() / getMass() / getNominalMass() retained as compatibility wrappers. - msdbsearch.CompactSuffixArray.RangeMetadata -- record (4 file/int fields). Three call sites updated from md.numEntries to md.numEntries() etc. - msgf.MassListComparator.MatchedPair<T extends Matter> -- record. getMass1/getMass2 retained as compatibility wrappers. Call-site updates in SearchParams.parse: tol.left/.right, isotope.min/.max, specIdx.min/.max, ms.min/.max all switched to record accessor methods (.left()/.right()/.min()/.max()). Verified: scoped sweep (incl. BufferedLineReaderTest, MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest, SearchParamsTest, TestPrecursorCalScaffolding, TestRunManifestWriter, TestDirectPinWriter, TestMinSpectraPerThread, TestMSUtils, TestSA, TestMisc, TestCandidatePeptideGrid + ConsideringMetCleavage): 80 tests, 0 failures, 0 errors, 3 skipped.
Summary
Eight focused commits on the search hot path. Pure refactor + instrumentation — search results are bit-identical to
devon every measurable axis (raw + Percolator 1 % / 5 % FDR). No.pinschema changes, no CLI default changes.The work followed the third-party review of PR #24 and the plan in
.claude/plans/search-sync-cleanup.md.Commits
perf(search): T1 — per-task wall stats + tail-imbalance summary—RunMSGFPlusnow captures preprocess / db-search / compute-evalue / total wall into aTaskWallStatsaccessor;MSGFPlusprints a one-line summary at end of search:The
tail_gapis the load-balance signal we need to decide whether T2/T3 helps on a given dataset.perf(search): drop dead synchronized wrappers in DBScanner + ScoredSpectraMap—DBScanneris owned by exactly oneRunMSGFPlus/ConcurrentMSGFDBtask (verified: no internal fork-out). RemovedCollections.synchronizedMapwrappers andsynchronizedmethod modifiers that served no purpose. Memory-visibility safety still holds viaExecutorService.awaitTermination's happens-before (JLS §17.4.5).perf(search): per-task result buffers; drop shared synchronizedList— replaces the globalCollections.synchronizedListwith per-taskArrayList. Main thread drains every task afterawaitTermination.RunMSGFPlus's constructor drops theresultListparameter; newgetResults()accessor.perf(search): T2 — finer task granularity (later removed)— initially added a sysprop knob; superseded by commit 8 below.perf(search): T3 — opt-in ForkJoinPool path via -Dmsgfplus.useForkJoin=true— alternative search executor behind a sysprop toggle. Default remainsThreadPoolExecutorWithExceptions(which keeps progress reporting + exception capture). FJP path usesFuture.get()for exception propagation. Kept opt-in because for our flat-task model the wall-clock difference vs T2-tuned ThreadPoolExecutor is small.perf(search): tighter result-buffer merge + drainResultsTo + reused null sink— three small allocation-trimming follow-ups: cache a staticNULL_PRINT_STREAMinstead of allocating perrun();drainResultsTo()releases per-task buffers immediately after merge; pre-size the mergedArrayListtosum(t.getResultCount()); drop strong refs to task instances post-summary.perf(msgfdb): drop redundant synchronizedList on per-task SpecKey sublist— same cleanup pattern in the legacy MSGFDB CLI's task partitioning.refactor(search): simplify per /simplify review—TaskWallStats→ Java 17 record, dropvolatileonwallStats, trim restate-the-code Javadocs, hoist sysprop constants, extractshutdownPoolNowhelper, addfjp.shutdownNow()on FJP error early-return. Net −43 LOC.refactor(search): drop redundant -Dmsgfplus.numTasksPerThread sysprop— the sysprop from commit 4 duplicated the existing-tasks -NCLI flag. Removed; default multiplier (3) inlined as a constant. Users who want a different multiplier should use the existing-tasks -Nflag.Astral 3-arm + Percolator parity (clean run)
Dataset: ProteoBench Module 8,
LFQ_Astral_DDA_15min_50ng_Condition_A_REP1.mzML(1.9 GB) vsProteoBenchFASTA_MixedSpecies_HYE.fasta. Cold mode, 8 GB heap, 4 threads. Defaults across the board (nouseForkJoin). Percolator via biocontainers 3.7.1,--seed 42.Strict parity vs dev
All eight numbers match
dev's tip exactly (the gate from the plan):Walltime delta vs master
The peak RSS bump on armB is from the JVM holding both the per-task ArrayLists and the merged list briefly during the drain phase; comparable to historical PR #24 RSS on the same heap. Sub-heap-cap on every arm.
What this PR does NOT change
.pinschema or feature columns.-tasksflag's negative-multiplier semantics are unchanged.).csarr/.cnlcp/.cseq/.cannoformats.ThreadPoolExecutorWithExceptionsremains the default; FJP is opt-in.Tests
→ 89 tests, 0 failures, 0 errors, 2 skipped.
TestConcurrentMSGFPlusextended with adrainsTaskLocalResultsIntoCallerBuffertest pinning the new drain semantics.Test plan
mvn -B -o test -Dtest='TestStaxMzMLParser,...'→ passingtail_gapfrom T1 might be larger; iterate with-tasks -Nand reportOperator notes
-tasks -N(existing CLI flag) — raise on uneven workloads when T1'stail_gapshows imbalance.-tasks -3reproduces the default; higher N gives finer granularity.-Dmsgfplus.useForkJoin=true(default false) — swap executor toForkJoinPool. Loses progress reporting and exception-capture-via-afterExecute; usesFuture.get()for exception propagation. Opt-in for evaluation only; not a recommended production tuning knob.T1's per-search summary line is always printed when
numTasks > 1; no flag needed.