rename updateProteinReferences function to removeDanglingProteinReferences#8500
Conversation
WalkthroughRenamed and reimplemented IDFilter::updateProteinReferences(...) to IDFilter::removeDanglingProteinReferences(...) across three overloads (PeptideIdentificationList, ConsensusMap, ConsensusMap+ref_run), updated call sites, Python bindings, and tests; implementations now use run-to-accessions-based filtering semantics described in comments. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@jpfeuffer would this be a better name? removeInvalidProteinReferences |
|
Hmm maybe. Although invalid might feel like they are invalid because of errors or something. |
removeDanglingProteinReferences() ? |
The function removes PeptideEvidence entries that reference proteins that no longer exist in the protein hits. The new name more clearly describes this removal behavior, aligning with other IDFilter methods like removeUnreferencedProteins and removeUngroupedProteins. Changes: - Renamed all three overloads of the function - Improved documentation with detailed @param[in]/[out] annotations - Updated all usages in TOPP tools and library code - Updated Python bindings with expanded docstring - Updated unit tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ef976f8 to
6acc344
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (1)
1215-1253:keepNBestHits(AnnotatedMSRun&):removeDanglingProteinReferencescurrently operates on a copy onlyIn this block you re-wrap
peptide_idintotemp_vec, callremoveDanglingProteinReferences(temp_vec, annotated_data.getProteinIdentifications());, but never copy the cleaned hit back intopeptide_id. As a result, the peptide/protein references inannotated_dataare not actually updated, despite the comment saying “we still need to update protein references” — this was already the case withupdateProteinReferences.If you want this cleanup to affect the
AnnotatedMSRun, consider assigning back after the call:temp_vec = {peptide_id}; - removeDanglingProteinReferences(temp_vec, annotated_data.getProteinIdentifications()); + removeDanglingProteinReferences(temp_vec, annotated_data.getProteinIdentifications()); + if (!temp_vec.empty()) + { + peptide_id = temp_vec[0]; + }If this was only intended as a safety net and you don’t actually rely on it, alternatively drop the call (and the comment) to avoid confusion.
Based on learnings, keeping behavior and documentation aligned avoids subtle maintenance bugs.
🧹 Nitpick comments (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (1)
301-405:removeDanglingProteinReferencesimplementations align with the documented semanticsThe three overloads correctly (a) derive valid accessions from the provided protein identifications, (b) rebuild
PeptideEvidencelists to keep only evidences pointing to existing proteins, and (c) optionally drop peptide hits without remaining evidences when requested. This matches the header documentation and preserves the behavior of the formerupdateProteinReferences.If you touch this again later, consider factoring out the shared “per‑run accession map + evidence filtering” logic between the ConsensusMap and PeptideIdentificationList variants to avoid duplication, but it’s not urgent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h(3 hunks)src/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cpp(3 hunks)src/openms/source/PROCESSING/ID/IDFilter.cpp(3 hunks)src/pyOpenMS/pxds/IDFilter.pxd(1 hunks)src/tests/class_tests/openms/source/IDFilter_test.cpp(3 hunks)src/topp/FalseDiscoveryRate.cpp(1 hunks)src/topp/IDFilter.cpp(1 hunks)src/topp/IsobaricWorkflow.cpp(2 hunks)src/topp/ProteomicsLFQ.cpp(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/topp/IDFilter.cpp
- src/topp/FalseDiscoveryRate.cpp
- src/pyOpenMS/pxds/IDFilter.pxd
🧰 Additional context used
📓 Path-based instructions (3)
src/openms/**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
src/openms/**/*.{cpp,h,hpp,cc,cxx}: Follow the existing C++ coding conventions in the codebase
Use the established naming patterns for classes, methods, and variables
Use OpenMS data structures (e.g., MSExperiment, FeatureMap, PeptideIdentification)
Follow the established error handling patterns
Utilize OpenMS logging mechanisms
Be mindful of memory usage when processing large datasets
Consider algorithmic complexity for data processing operations
Use appropriate OpenMS containers and algorithms
Files:
src/openms/source/PROCESSING/ID/IDFilter.cppsrc/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cppsrc/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
src/tests/class_tests/openms/**/*.cpp
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Write unit tests for new functionality
Files:
src/tests/class_tests/openms/source/IDFilter_test.cpp
src/openms/include/OpenMS/**/*.h
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Add Doxygen comments for new public methods and classes
Files:
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
🧠 Learnings (6)
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Applies to src/tests/class_tests/openms/**/*.cpp : Write unit tests for new functionality
Applied to files:
src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Applies to src/tests/class_tests/openms/*Test.cpp : Follow naming convention: `ClassNameTest.cpp` for `ClassName`
Applied to files:
src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-09-03T12:58:20.032Z
Learnt from: timosachsenberg
Repo: OpenMS/OpenMS PR: 8177
File: src/tests/topp/CMakeLists.txt:1857-1865
Timestamp: 2025-09-03T12:58:20.032Z
Learning: OpenMS tests: When adding new TOPP tests in src/tests/topp/CMakeLists.txt that compare outputs via FuzzyDiff, always add set_tests_properties("<tool>_<n>_out<m>" PROPERTIES DEPENDS "<tool>_<n>") to avoid flakiness under parallel CTest runs.
Applied to files:
src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-10-21T13:02:16.431Z
Learnt from: timosachsenberg
Repo: OpenMS/OpenMS PR: 8318
File: src/openms/include/OpenMS/ML/CLUSTERING/HashGrid.h:466-480
Timestamp: 2025-10-21T13:02:16.431Z
Learning: In OpenMS, when making internal methods public (e.g., in HashGrid.h), remove XXX/TODO comments about implementation details rather than documenting them in the public API documentation.
Applied to files:
src/tests/class_tests/openms/source/IDFilter_test.cppsrc/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Applies to src/openms/**/*.{cpp,h,hpp,cc,cxx} : Use OpenMS data structures (e.g., MSExperiment, FeatureMap, PeptideIdentification)
Applied to files:
src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Update relevant documentation when modifying existing functionality
Applied to files:
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
🧬 Code graph analysis (4)
src/topp/IsobaricWorkflow.cpp (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
removeDanglingProteinReferences(302-337)removeDanglingProteinReferences(302-302)removeDanglingProteinReferences(339-370)removeDanglingProteinReferences(339-339)removeDanglingProteinReferences(372-405)removeDanglingProteinReferences(372-372)
src/tests/class_tests/openms/source/IDFilter_test.cpp (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
removeDanglingProteinReferences(302-337)removeDanglingProteinReferences(302-302)removeDanglingProteinReferences(339-370)removeDanglingProteinReferences(339-339)removeDanglingProteinReferences(372-405)removeDanglingProteinReferences(372-372)
src/topp/ProteomicsLFQ.cpp (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
removeDanglingProteinReferences(302-337)removeDanglingProteinReferences(302-302)removeDanglingProteinReferences(339-370)removeDanglingProteinReferences(339-339)removeDanglingProteinReferences(372-405)removeDanglingProteinReferences(372-372)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (2)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
removeDanglingProteinReferences(302-337)removeDanglingProteinReferences(302-302)removeDanglingProteinReferences(339-370)removeDanglingProteinReferences(339-339)removeDanglingProteinReferences(372-405)removeDanglingProteinReferences(372-372)src/openms/include/OpenMS/METADATA/ProteinIdentification.h (1)
vector(58-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-win
- GitHub Check: build-macos-arm
- GitHub Check: build-lnx-arm
- GitHub Check: cppcheck-test
🔇 Additional comments (5)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (1)
785-831: Doxygen forremoveDanglingProteinReferencesis consistent with implementation and namingThe new overloads are well-documented: parameter directions are clear, the per-run matching semantics match the implementations in
IDFilter.cpp, and the naming aligns withremoveUnreferencedProteins. No issues here.src/topp/IsobaricWorkflow.cpp (1)
766-795: Renamed calls in FDR cleanup use the new API correctlySwitching from
updateProteinReferencestoremoveDanglingProteinReferences(cmap, rm_pep)preserves the intended behavior: peptide evidences (and optionally peptide hits) are cleaned after protein-FDR filtering and before removing unreferenced proteins / updating groups. The flag wiring viarm_pepis still correct.src/topp/ProteomicsLFQ.cpp (1)
1314-1342: ConsensusMap clean‑up calls now useremoveDanglingProteinReferenceswith correct semanticsBoth updated call sites in
inferProteinGroups_passtrueto drop peptide hits without surviving protein references, which is precisely what you want after removing decoy / low-confidence proteins. The rename is consistent and behavior is preserved.src/tests/class_tests/openms/source/IDFilter_test.cpp (1)
242-273: Updated unit test correctly exercisesremoveDanglingProteinReferencessemanticsThis START_SECTION now validates both modes of the new API: it checks that evidences are restricted to surviving protein hits when
remove_peptides_without_referenceis false, and that peptide hits lacking any remaining evidences are dropped when it is true. This matches the implementation and gives good coverage of the renamed function.src/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cpp (1)
85-94: Protein inference now uses the new overloads in the intended wayAll three updated call sites (
pep_ids+ single run,ConsensusMap+prot_run, andpep_ids+ multi-runprot_ids) pass the filtered protein sets intoremoveDanglingProteinReferenceswithremove_peptides_without_reference = true, ensuring PSMs referencing proteins that didn’t meetmin_peptides_per_proteinare removed. The temporary vector trick in the single-run case preserves previous behavior while fitting the new API.Also applies to: 184-191, 260-263
|
Yeah better! |
The function removes PeptideEvidence entries that reference proteins that no longer exist in the protein hits. The new name more clearly describes this removal behavior, aligning with other IDFilter methods like removeUnreferencedProteins and removeUngroupedProteins.
Changes:
Description
Checklist
How can I get additional information on failed tests during CI
Click to expand
If your PR is failing you can check outIf you click in the column that lists the failed tests you will get detailed error messages.
Advanced commands (admins / reviewer only)
Click to expand
/reformat(experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.rebuild jenkinswill retrigger Jenkins-based CI buildsSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.