Comet v2026.02.0

@jke000 jke000 released this 12 Jun 00:54
· 3 commits to master since this release

What's Changed

New Features

  • Concurrent multi-threaded real-time search (RTS)

    • The RTS path now supports N concurrent C# Tasks sharing a single CometSearchManagerWrapper instance. Both the MS2 fragment index search and the MS1 spectral library alignment are thread-safe: preprocessing uses per-thread RtsScratch pools, DoSingleSpectrumSearchMultiResults operates on a thread-local Query* without touching g_pvQuery, and DoMS1SearchMultiResults serializes only the RT alignment history update. This significantly improves throughput for MS2 RTS searches on multi-core hardware. An example implementation is in SearchMS1MS2.cs in the RealtimeSearch project.
  • Compound modifications (aka Comet Multi-Modification)

    • Merged the compound modifications branch to facilitate future code support. A new compoundmods_file parameter accepts a file listing J-residue mass modifications. These are searched via a dedicated CompoundModSearch() path integrated into SearchForPeptides() and MergeVarMods(), with output writers and post-analysis updated to handle the new modification encodings. The intended use is adduct screening; this is currently considered a beta/test feature.
  • Peak memory reporting

    • Comet now reports peak resident set size at the end of index creation and search steps. Peak memory is also surfaced to the RTS C# layer via CometSearchManagerWrapper::GetPeakMemory().
  • Python q-value / FDR tool

    • A new tools/qvalue.py script computes q-values from Comet tab-delimited output and supports side-by-side comparison of two result files with an optional --diff flag to list differing PSMs.

Performance Improvements

  • Parallel .idx index building

    • GeneratePlainPeptideIndex now uses a parallel per-length sort+dedup phase followed by a k-way heap merge write. On benchmarks with the human proteome, index creation is 1.3× (tryptic) to 1.9× (no-enzyme/MHC) faster than in v2026.01.1.
  • RTS preprocessing thread-local pool (RtsScratch)

    • All six scratch arrays used during single-spectrum preprocessing (raw data, fast XCorr, correlation, sparse matrix blocks) are pre-allocated once per thread and reused across spectra, eliminating per-spectrum heap allocation. Only the elements actually read/written are zeroed on each reuse.
  • E-value computation restructured with CSR inverted index

    • GenerateXcorrDecoys() now uses a pre-built CSR inverted index (s_invIdx_data, s_invIdx_start) and a thread-local 3000-element float accumulator, replacing the previous per-decoy inner loop. Decoy scores are accumulated via scatter then histogrammed once, reducing cache pressure significantly.
  • AScore optimizations

    • Eliminated redundant Scan copies in AScoreCalculator and AScoreDllInterface (two copies reduced to one via pass-by-value + std::move).
    • getMassList() now caches its result; repeated calls with identical parameters return immediately without recomputation.
    • matchPeaks() replaces an unordered_map with two vector<bool> arrays, removing all hash operations from the hot matching loop.
  • In-memory protein name cache for RTS

    • g_pvProteinNameCache (an unordered_map<file_offset, string>) is populated once at index load time. RTS protein lookups are now O(1) in-memory instead of seeking into the FASTA on every hit.
  • AcquirePoolSlot() contention reduction

    • The previous busy-spin wait on _pbSearchMemoryPool is replaced by a std::condition_variable::wait_for with proper lock/notify at all release sites, eliminating CPU waste under thread contention.
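
The change from a busy-spin to std::condition_variable::wait_for in AcquirePoolSlot() follows a common pattern, sketched below. The SlotPool class and its member names are hypothetical and are not Comet's code; only the standard-library calls are real.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical slot pool; names are illustrative, not Comet's.
class SlotPool {
public:
    explicit SlotPool(std::size_t n) : inUse_(n, false) {}

    // Sleeps until a slot is free or the timeout expires.
    // Returns the slot index, or -1 on timeout.
    int acquire(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        int slot = -1;
        // The predicate runs under the lock after every wakeup,
        // so spurious wakeups are harmless.
        const bool found = cv_.wait_for(lock, timeout, [&] {
            slot = findFreeLocked();
            return slot >= 0;
        });
        if (!found) return -1;
        inUse_[static_cast<std::size_t>(slot)] = true;
        return slot;
    }

    // Every release site must notify, or waiters only wake on timeout.
    void release(int slot) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            inUse_[static_cast<std::size_t>(slot)] = false;
        }
        cv_.notify_one();
    }

private:
    // Caller must hold mutex_.
    int findFreeLocked() const {
        for (std::size_t i = 0; i < inUse_.size(); ++i)
            if (!inUse_[i]) return static_cast<int>(i);
        return -1;
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<bool> inUse_;
};
```

A waiting thread sleeps inside wait_for instead of polling, so it consumes no CPU until release() notifies it or the timeout passes.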

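The two phases of the parallel .idx build listed under Performance Improvements (per-length sort+dedup, then a k-way heap merge) can be sketched as below. This is an illustration under assumptions: the Bucket layout, the function names, and the plain lexicographic ordering are invented for the example, and the notes do not state the real sort key or threading scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical layout: one bucket per peptide length.
using Bucket = std::vector<std::string>;

// Phase 1: sort and deduplicate each bucket on its own thread.
// Buckets are independent, so no locking is needed.
void sortDedupBuckets(std::vector<Bucket>& buckets) {
    std::vector<std::thread> workers;
    workers.reserve(buckets.size());
    for (Bucket& b : buckets) {
        workers.emplace_back([&b] {
            std::sort(b.begin(), b.end());
            b.erase(std::unique(b.begin(), b.end()), b.end());
        });
    }
    for (std::thread& t : workers) t.join();
}

// Phase 2: k-way merge of the sorted buckets using a min-heap of
// (next value, bucket index) pairs.
std::vector<std::string> kWayMerge(const std::vector<Bucket>& buckets) {
    using Item = std::pair<std::string, std::size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> pos(buckets.size(), 0);
    for (std::size_t i = 0; i < buckets.size(); ++i)
        if (!buckets[i].empty()) heap.emplace(buckets[i][0], i);

    std::vector<std::string> out;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        out.push_back(top.first);
        const std::size_t i = top.second;
        // Refill the heap from the bucket that just supplied an element.
        if (++pos[i] < buckets[i].size()) heap.emplace(buckets[i][pos[i]], i);
    }
    return out;
}
```

The heap never holds more than one entry per bucket, so the merge streams output in sorted order without a global sort. A production build would likely use a bounded thread pool rather than one thread per bucket.
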
Bug Fixes

  • I/L deduplication: When equal_I_and_L=1, the FASTA-original (L-containing) peptide sequence is now preserved in the index; the I-containing variant is the one discarded. Previously the choice was arbitrary, which produced spurious entries in the index.
  • g_pvProteinsList heap-allocation storm: Replaced element-by-element vector growth with a CSR (compressed sparse row) pre-allocation, eliminating O(N²) reallocation behavior on large databases.
  • DBIndex::sPeptide / PlainPeptideIndexStruct::sPeptide: Refactored from std::string to char[MAX_PEPTIDE_LEN] fixed-size arrays, eliminating per-peptide heap allocations during index construction and search.
  • set_Z_user_amino_acid parameter: Was incorrectly setting the X residue mass; now sets Z as intended.
  • Peptide length range error message: Was displaying scan range values instead of peptide length values.
  • logout() routing: All logout() calls now go to stdout instead of stderr.
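
The CSR (compressed sparse row) pre-allocation mentioned in the g_pvProteinsList fix can be sketched as a two-pass build: count entries per row, prefix-sum the counts into offsets, then fill one flat array. The Csr struct, the buildCsr name, and the (row, value) input format are assumptions for illustration; the notes do not describe what g_pvProteinsList actually stores.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CSR container: row r owns
// values[offsets[r]] .. values[offsets[r + 1] - 1].
struct Csr {
    std::vector<std::size_t> offsets;
    std::vector<long> values;
};

// Builds the CSR from (row, value) pairs, e.g. (peptide index, protein offset).
Csr buildCsr(std::size_t numRows,
             const std::vector<std::pair<std::size_t, long>>& pairs) {
    Csr csr;
    csr.offsets.assign(numRows + 1, 0);

    // Pass 1: count entries per row (stored one position to the right).
    for (const auto& p : pairs) ++csr.offsets[p.first + 1];

    // Prefix sum turns counts into start offsets.
    for (std::size_t r = 0; r < numRows; ++r)
        csr.offsets[r + 1] += csr.offsets[r];

    // Pass 2: a single allocation, then fill through per-row write cursors.
    csr.values.resize(pairs.size());
    std::vector<std::size_t> cursor(csr.offsets.begin(), csr.offsets.end() - 1);
    for (const auto& p : pairs) csr.values[cursor[p.first]++] = p.second;
    return csr;
}
```

Because the total size is known before any value is written, there is exactly one allocation for the values, in contrast to growing many small vectors element by element.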

Tools and Build

  • Fragment ion index parameters added to the params file generated by comet -p.
  • Visual Studio Clean Solution now removes Linux-built expat and zlib directories, preventing stale headers from interfering with subsequent builds.
  • expat source distribution switched from .tar.gz to .zip for consistent cross-platform unpacking.
  • Linux binary restored to static linking (-static) for compatibility with older glibc environments (e.g., Ubuntu 18.04 Docker images).

Full Changelog: v2026.01.1...v2026.02.0