Merge correlate_preloaded into correlate with auto-detection of PSM spectrum property #262
Merged
RalfG merged 4 commits into release/4.2 on Apr 15, 2026
Make `correlate` auto-detect whether PSMs carry preloaded spectra (MS2Spectrum/AnnotatedMS2Spectrum) or need file-based loading, removing the need for a separate `correlate_preloaded` function.
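The auto-detection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `PSM`, `MS2Spectrum`, and `AnnotatedMS2Spectrum` classes are simplified hypothetical stand-ins, and `has_preloaded_spectra` and `resolve_mode` are made-up names.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the real spectrum and PSM types.
@dataclass
class MS2Spectrum:
    mz: list[float]
    intensity: list[float]


@dataclass
class AnnotatedMS2Spectrum(MS2Spectrum):
    annotations: Optional[list[str]] = None


@dataclass
class PSM:
    peptidoform: str
    spectrum_id: str
    spectrum: Optional[MS2Spectrum] = None  # preloaded spectrum, if any


def has_preloaded_spectra(psms: list[PSM]) -> bool:
    """Return True when the list is non-empty and every PSM carries a spectrum."""
    return bool(psms) and all(isinstance(p.spectrum, MS2Spectrum) for p in psms)


def resolve_mode(psms: list[PSM], spectrum_file: Optional[str] = None) -> str:
    """Choose between the preloaded path and the file-based path."""
    if has_preloaded_spectra(psms):
        return "preloaded"
    if spectrum_file is None:
        raise ValueError("PSMs carry no spectra and no spectrum file was given.")
    return "file"
```

In this sketch a mixed list (some PSMs with spectra, some without) falls back to file-based loading; how the actual implementation treats that case is not stated here.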
Validate peptidoforms before passing to ms2rescore-rs: check for unsupported amino acids, sequence length (4-100), and missing charge. Invalid PSMs are skipped with an empty ProcessingResult and a single summarized warning log message, matching the old silent-skip behavior.
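A rough sketch of this kind of validation with one summarized warning is shown below. The function names mirror those mentioned in this PR, but the signatures, the plain `(sequence, charge)` tuples, the inclusive length bounds, and the set of supported residues are all assumptions made for illustration.

```python
import logging
from collections import Counter
from typing import Optional

logger = logging.getLogger(__name__)

# Assumption: only the 20 canonical residues are supported; the real set may differ.
SUPPORTED_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 4, 100  # bounds from the commit message, assumed inclusive


def validate_peptidoform(sequence: str, charge: Optional[int]) -> Optional[str]:
    """Return a reason string when the peptidoform is invalid, else None."""
    if charge is None:
        return "missing charge"
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        return "sequence length outside 4-100"
    if not set(sequence) <= SUPPORTED_AMINO_ACIDS:
        return "unsupported amino acid"
    return None


def filter_valid_psms(psms):
    """Keep valid (sequence, charge) pairs and log a single summary for the rest."""
    valid, reasons = [], Counter()
    for sequence, charge in psms:
        reason = validate_peptidoform(sequence, charge)
        if reason is None:
            valid.append((sequence, charge))
        else:
            reasons[reason] += 1
    if reasons:
        summary = ", ".join(f"{n} x {r}" for r, n in sorted(reasons.items()))
        logger.warning("Skipped %d invalid PSMs (%s)", sum(reasons.values()), summary)
    return valid, dict(reasons)
```

Counting reasons and logging once keeps the log readable for large PSM lists. The sketch simply drops invalid entries, whereas the commit describes returning an empty ProcessingResult for them.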
- Replace np.corrcoef with direct dot-product Pearson formula
- Consolidate pearson() in correlation.py, reuse in result.py
- Cache proforma_to_mass_shift with lru_cache (avoids recomputation)
- Use model_construct for internal ObservedSpectrum creation
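For reference, a direct dot-product Pearson computation might look like the sketch below. It illustrates the technique and is not the `pearson()` from `correlation.py`; in particular, returning 0.0 for constant input is an assumption.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson correlation computed from centered dot products."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))
    if denom == 0.0:
        # Assumption: constant input is reported as zero correlation.
        return 0.0
    return float(np.dot(xc, yc) / denom)
```

This avoids building the full 2x2 correlation matrix that `np.corrcoef` returns when only the single off-diagonal value is needed.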
Merge `correlate_preloaded` into `correlate` with auto-detection, skip invalid peptidoforms with summarized warnings, and optimize performance.

Changed
- `correlate` now auto-detects preloaded spectra on PSMs (`MS2Spectrum`/`AnnotatedMS2Spectrum`), removing the need for a separate `correlate_preloaded` function
- Invalid peptidoforms are skipped with an empty `ProcessingResult` and a summarized warning
- `read_psms` accepts `list[PSM]` in addition to `PSMList`, `str`, and `Path`
- Replaced `np.corrcoef` with direct Pearson formula (~60% faster per call)
- Cached `proforma_to_mass_shift` with `lru_cache`
- Use `model_construct` for internal `ObservedSpectrum` creation (skips Pydantic validation)
- Consolidated `pearson()` in `correlation.py`, reused by `ms2pip_pearson` and `calculate_correlations`
- `resolve_spectra` dispatcher and `MatchedSpectrum` NamedTuple in `_spectrum_processing`
- `_preprocess_spectrum` helper to deduplicate preprocessing logic
- (`_load_and_match_spectra`, `_preloaded_to_annotations`, `_read_raw_spectra`, `_to_observed_spectrum`)
- `validate_peptidoform` and `filter_valid_psms` to `psm_input`

Removed
- `correlate_preloaded` from public API
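
The `lru_cache` caching mentioned above for `proforma_to_mass_shift` follows a common pattern, sketched here. The function name `modification_mass_shift` and the lookup table are hypothetical; the real function's signature and parsing logic are not shown in this PR description.

```python
from functools import lru_cache

# Hypothetical lookup table standing in for real modification parsing.
_MASS_SHIFTS = {"Oxidation": 15.994915, "Carbamidomethyl": 57.021464}


@lru_cache(maxsize=None)
def modification_mass_shift(label: str) -> float:
    """Return the mass shift for a modification label, computed once per label."""
    return _MASS_SHIFTS[label]
```

Because the same few modification labels recur across many PSMs, repeated calls with an already-seen label return the cached value instead of recomputing it. Arguments must be hashable for `lru_cache` to work, which holds for strings.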