
Merge correlate_preloaded into correlate with auto-detection of PSM spectrum property #262

Merged
RalfG merged 4 commits into release/4.2 from feat/merge-correlate-preloaded on Apr 15, 2026

RalfG commented on Apr 15, 2026

Merge correlate_preloaded into correlate with auto-detection, skip invalid peptidoforms with summarized warnings, and optimize performance.


Changed

  • correlate now auto-detects preloaded spectra on PSMs (MS2Spectrum/AnnotatedMS2Spectrum), removing the need for a separate correlate_preloaded function
  • Invalid peptidoforms (unsupported amino acids, sequence length outside 4–100 residues, missing charge) are skipped with an empty ProcessingResult and a summarized warning
  • read_psms accepts list[PSM] in addition to PSMList, str, and Path
  • Replaced np.corrcoef with direct Pearson formula (~60% faster per call)
  • Cached proforma_to_mass_shift with lru_cache
  • Use model_construct for internal ObservedSpectrum creation (skips Pydantic validation)
  • Consolidated pearson() in correlation.py, reused by ms2pip_pearson and calculate_correlations
  • Extracted resolve_spectra dispatcher and MatchedSpectrum NamedTuple in _spectrum_processing
  • Extracted _preprocess_spectrum helper to deduplicate preprocessing logic
  • Made internal spectrum functions private (_load_and_match_spectra, _preloaded_to_annotations, _read_raw_spectra, _to_observed_spectrum)
  • Moved validate_peptidoform and filter_valid_psms to psm_input
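
The widened read_psms input handling amounts to normalizing several input types to a single container. Below is a minimal sketch under stated assumptions: PSM and PSMList are simplified stand-ins defined here (not the psm_utils classes), _read_psm_file is a hypothetical placeholder for file parsing, and the error messages are illustrative. The PR description does not show ms2pip's actual implementation.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Union


@dataclass
class PSM:
    """Hypothetical stand-in for a PSM; only two illustrative fields."""
    peptidoform: str
    spectrum_id: str


@dataclass
class PSMList:
    """Hypothetical stand-in for a PSM container wrapping a plain list."""
    psm_list: list = field(default_factory=list)


def _read_psm_file(path: Path) -> PSMList:
    """Hypothetical placeholder: real code would parse a PSM file here."""
    raise NotImplementedError(f"file parsing not implemented in this sketch: {path}")


def read_psms(psms: Union[str, Path, PSMList, list]) -> PSMList:
    """Normalize every supported input type to a PSMList."""
    if isinstance(psms, PSMList):
        return psms  # already the canonical container
    if isinstance(psms, list):
        # Reject lists containing non-PSM objects instead of failing later.
        if not all(isinstance(p, PSM) for p in psms):
            raise TypeError("a list input must contain only PSM objects")
        return PSMList(psm_list=psms)
    if isinstance(psms, (str, Path)):
        return _read_psm_file(Path(psms))
    raise TypeError(f"unsupported PSM input type: {type(psms).__name__}")
```

Checking for the container type first keeps the already-normalized path cheap; a plain list is wrapped without copying its PSM objects.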

Removed

  • correlate_preloaded from public API

RalfG added 4 commits April 14, 2026 21:33
Make `correlate` auto-detect whether PSMs carry preloaded spectra
(MS2Spectrum/AnnotatedMS2Spectrum) or need file-based loading, removing
the need for a separate `correlate_preloaded` function.
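
The auto-detection described in this commit can be pictured as a check on each PSM's spectrum attribute. The sketch below makes several assumptions. A PSM is assumed to expose a `spectrum` attribute that is `None` when nothing was preloaded. Mixed input is treated as an error, although the PR text does not say how it is handled. The name `detect_spectrum_source` is invented for illustration; the PR's own dispatcher is `resolve_spectra`, whose signature is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MS2Spectrum:
    """Hypothetical stand-in for a preloaded spectrum (m/z and intensity lists)."""
    mz: list
    intensity: list


@dataclass
class PSM:
    """Hypothetical stand-in; `spectrum` is None when nothing was preloaded."""
    peptidoform: str
    spectrum_id: str
    spectrum: Optional[MS2Spectrum] = None


def detect_spectrum_source(psms: Sequence[PSM]) -> str:
    """Return "preloaded" if every PSM carries a spectrum, "file" if none do."""
    if not psms:
        raise ValueError("no PSMs provided")
    has_spectrum = [psm.spectrum is not None for psm in psms]
    if all(has_spectrum):
        return "preloaded"  # use the spectra already attached to the PSMs
    if not any(has_spectrum):
        return "file"  # fall back to reading spectra from a spectrum file
    # Assumption: mixed input is rejected rather than partially loaded.
    raise ValueError("some PSMs carry spectra and some do not")
```

A caller would branch on the returned value, converting attached spectra directly in the "preloaded" case and requiring a spectrum file in the "file" case.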
Validate peptidoforms before passing to ms2rescore-rs: check for
unsupported amino acids, sequence length (4-100), and missing charge.
Invalid PSMs are skipped with an empty ProcessingResult and a single
summarized warning log message, matching the old silent-skip behavior.
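
The validate-and-skip behavior in this commit can be sketched as follows. This sketch makes three assumptions. A PSM is reduced to a plain `(sequence, charge)` tuple. The supported residues are taken to be the 20 standard amino acids, although the set actually accepted by ms2rescore-rs may differ. `ProcessingResult` is a stand-in whose `predicted=None` marks an empty result. Each invalid PSM gets an empty result, and the skip reasons are counted so that only one warning is logged for the whole batch.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

# Assumption: the 20 standard residues; the real supported set may differ.
SUPPORTED_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 4, 100


@dataclass
class ProcessingResult:
    """Hypothetical stand-in; predicted=None marks an empty (skipped) result."""
    psm_index: int
    predicted: Optional[dict] = None


def invalid_reason(sequence: str, charge: Optional[int]) -> Optional[str]:
    """Return why a peptidoform is invalid, or None if it passes all checks."""
    if charge is None:
        return "missing charge"
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        return f"length outside {MIN_LENGTH}-{MAX_LENGTH}"
    if not set(sequence) <= SUPPORTED_AA:
        return "unsupported amino acid"
    return None


def split_valid(psms: list[tuple[str, Optional[int]]]):
    """Return valid indices, empty results for skipped PSMs, and reason counts."""
    valid = []
    empty_results = []
    reasons = Counter()
    for i, (sequence, charge) in enumerate(psms):
        reason = invalid_reason(sequence, charge)
        if reason is None:
            valid.append(i)
        else:
            reasons[reason] += 1
            empty_results.append(ProcessingResult(psm_index=i))
    if reasons:
        # One summarized message instead of one warning per skipped PSM.
        summary = ", ".join(f"{n} x {r}" for r, n in sorted(reasons.items()))
        logger.warning("Skipped %d invalid PSM(s): %s", sum(reasons.values()), summary)
    return valid, empty_results, reasons
```

Emitting placeholder results for skipped PSMs keeps the output aligned index-for-index with the input, so downstream code does not have to track which PSMs were dropped.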
- Replace np.corrcoef with direct dot-product Pearson formula
- Consolidate pearson() in correlation.py, reuse in result.py
- Cache proforma_to_mass_shift with lru_cache (avoids recomputation)
- Use model_construct for internal ObservedSpectrum creation
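
The Pearson change replaces `np.corrcoef`, which builds a full 2x2 correlation matrix, with a direct formula: r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)). The sketch below is a generic implementation of that formula, not the ms2pip `pearson()` from correlation.py, and the speedup quoted in the PR is the author's measurement, not reproduced here. It also shows `functools.lru_cache` on a hypothetical `total_mass_shift` that sums modification masses in a simplified bracket notation. This is neither full ProForma parsing nor ms2pip's `proforma_to_mass_shift`. Caching helps when the same peptidoform strings recur across many PSMs.

```python
import re
from functools import lru_cache

import numpy as np


def pearson(x, y) -> float:
    """Pearson r from centered dot products; avoids building a 2x2 matrix."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))
    if denom == 0.0:
        return float("nan")  # constant input: correlation is undefined
    return float(np.dot(xc, yc) / denom)


# Hypothetical lookup of monoisotopic modification mass shifts (Da).
_MOD_MASSES = {"Oxidation": 15.994915, "Phospho": 79.966331}
_MOD_PATTERN = re.compile(r"\[([^\]]+)\]")


@lru_cache(maxsize=None)
def total_mass_shift(peptidoform: str) -> float:
    """Sum bracketed modification masses; simplified, not full ProForma."""
    return sum((_MOD_MASSES[name] for name in _MOD_PATTERN.findall(peptidoform)), 0.0)
```

Returning NaN for a zero denominator roughly mirrors what `np.corrcoef` yields for constant input, without its runtime warning. Because `lru_cache` keys on the argument, the cached function must take hashable inputs such as strings.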
RalfG merged commit b981a2e into release/4.2 on Apr 15, 2026
5 checks passed
RalfG deleted the feat/merge-correlate-preloaded branch on April 15, 2026, 09:54