rename updateProteinReferences function to removeDanglingProteinReferences#8500

Merged
timosachsenberg merged 3 commits into develop from claude/investigate-protein-references-XlC6z
Dec 17, 2025

@timosachsenberg
Contributor

@timosachsenberg timosachsenberg commented Dec 15, 2025

The function removes PeptideEvidence entries that reference proteins that no longer exist in the protein hits. The new name more clearly describes this removal behavior, aligning with other IDFilter methods like removeUnreferencedProteins and removeUngroupedProteins.

Changes:

  • Renamed all three overloads of the function
  • Improved documentation with detailed @param and @note descriptions
  • Updated all usages in TOPP tools and library code
  • Updated Python bindings with expanded docstring
  • Updated unit tests
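As a concrete picture of the removal described above, here is a minimal single-run sketch. The structs are hypothetical, simplified stand-ins for OpenMS's PeptideEvidence, PeptideHit and ProteinHit, not the real classes. `dropDanglingEvidences` is an illustrative name, and the actual IDFilter implementation may be organized differently.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the OpenMS types (not the real API).
struct PeptideEvidence { std::string protein_accession; };
struct PeptideHit { std::string sequence; std::vector<PeptideEvidence> evidences; };
struct ProteinHit { std::string accession; };

// Keep only the evidences whose protein accession still exists among `proteins`.
// Peptide hits themselves are never removed here, even if they end up empty.
void dropDanglingEvidences(std::vector<PeptideHit>& hits,
                           const std::vector<ProteinHit>& proteins)
{
  // Accessions of the proteins that survived filtering.
  std::set<std::string> valid;
  for (const auto& prot : proteins)
  {
    valid.insert(prot.accession);
  }

  for (auto& hit : hits)
  {
    std::vector<PeptideEvidence> kept;
    for (const auto& ev : hit.evidences)
    {
      if (valid.count(ev.protein_accession) > 0)
      {
        kept.push_back(ev);
      }
    }
    hit.evidences = std::move(kept);
  }
}
```

Suppose protein P2 has been filtered out. A hit that referenced both P1 and P2 keeps only its P1 evidence. In the real API, a separate flag (remove_peptides_without_reference) controls whether hits left without any evidence are also dropped; this sketch does not model that flag.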

Description

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI

If your PR is failing, you can check out:
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.seqan.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" option at the top right to search for your PR number.
    If you click on the column that lists the failed tests, you will get detailed error messages.

Advanced commands (admins / reviewers only)

  • /reformat (experimental) applies the clang-format style changes as an additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • Setting the label "NoJenkins" will skip tests for this PR on Jenkins (saves resources, e.g., on edits that do not affect tests).
  • Commenting with "rebuild jenkins" will retrigger Jenkins-based CI builds.

⚠️ Note: Once you have opened a PR, try to minimize the number of pushes to it. Every push triggers CI (automated builds and tests), which is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

Summary by CodeRabbit

  • Refactor
    • Renamed the public API for protein-reference cleanup across identification and consensus workflows. Behavior unified to remove dangling protein references (per-run aware) after filtering.
    • User impact: callers should migrate to the updated API surface; runtime behavior and processing flows remain functionally consistent.

@coderabbitai
Contributor

coderabbitai Bot commented Dec 15, 2025

Walkthrough

Renamed and reimplemented IDFilter::updateProteinReferences(...) to IDFilter::removeDanglingProteinReferences(...) across three overloads (PeptideIdentificationList, ConsensusMap, ConsensusMap+ref_run), updated call sites, Python bindings, and tests; implementations now use run-to-accessions-based filtering semantics described in comments.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Header Declaration | src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h | Renamed three static method overloads from updateProteinReferences to removeDanglingProteinReferences; updated docstrings to describe removal of dangling protein references and per-run matching semantics. |
| Implementation | src/openms/source/PROCESSING/ID/IDFilter.cpp | Replaced updateProteinReferences overloads with removeDanglingProteinReferences implementations; unified logic around a run_to_accessions mapping and conditional removal of peptides without references. |
| Algorithm Call Sites | src/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cpp | Replaced calls to updateProteinReferences(...) with removeDanglingProteinReferences(...) in single-run, cmap+ref_run, and multi-run pathways. |
| Tools / TOPP Call Sites | src/topp/FalseDiscoveryRate.cpp, src/topp/IDFilter.cpp, src/topp/IsobaricWorkflow.cpp, src/topp/ProteomicsLFQ.cpp | Updated post-processing calls to use removeDanglingProteinReferences(...) in the same argument positions; control flow unchanged. |
| Python Bindings | src/pyOpenMS/pxds/IDFilter.pxd | Added/updated declaration for removeDanglingProteinReferences(PeptideIdentificationList&, libcpp_vector[ProteinIdentification]&, bool) and extended wrap-doc; removed the updateProteinReferences binding. |
| Tests | src/tests/class_tests/openms/source/IDFilter_test.cpp | Updated test call sites to removeDanglingProteinReferences(...); test logic and assertions preserved. |
| Manifest / Build | manifest_file, CMakeLists.txt | Updated to reflect file changes (build metadata updated as part of the PR). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • src/openms/source/PROCESSING/ID/IDFilter.cpp: verify the new centralized run_to_accessions construction and filtering covers all previous edge cases (ref_run vs aggregated runs) and preserves intended behavior for removal vs update semantics.
    • Call-site correctness in BasicProteinInferenceAlgorithm.cpp and TOPP tools: ensure the correct overload is selected and boolean flags retain intended meaning.
    • src/pyOpenMS/pxds/IDFilter.pxd: ensure Python bindings and wrap-doc remain consistent with C++ signatures and ownership/locking assumptions.
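The walkthrough refers to per-run matching. One way to picture it is a map from each protein run's identifier to that run's surviving accessions. Under this model, an evidence survives only if its accession is listed for the run its peptide identification belongs to. The sketch below is minimal and uses hypothetical stand-in types; `ProteinRun`, `PeptideId`, `buildRunToAccessions` and `filterPerRun` are illustrative names, not OpenMS API. Treating a peptide identification whose run identifier is missing from the map as having no valid accessions is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins (not OpenMS classes): a protein run with its identifier
// and surviving accessions, and a peptide ID that records the run it belongs to.
struct ProteinRun { std::string identifier; std::vector<std::string> accessions; };
struct PeptideId { std::string run_identifier; std::vector<std::string> evidence_accessions; };

using RunToAccessions = std::map<std::string, std::set<std::string>>;

// Build run identifier -> set of protein accessions still present in that run.
RunToAccessions buildRunToAccessions(const std::vector<ProteinRun>& runs)
{
  RunToAccessions run_to_accessions;
  for (const auto& run : runs)
  {
    run_to_accessions[run.identifier].insert(run.accessions.begin(), run.accessions.end());
  }
  return run_to_accessions;
}

// Keep an evidence only if its accession exists in the peptide ID's own run.
void filterPerRun(std::vector<PeptideId>& peptides, const RunToAccessions& run_to_accessions)
{
  for (auto& pep : peptides)
  {
    std::vector<std::string> kept;
    auto run_it = run_to_accessions.find(pep.run_identifier);
    // Assumption of this sketch: a peptide ID whose run is unknown keeps nothing.
    if (run_it != run_to_accessions.end())
    {
      for (const auto& acc : pep.evidence_accessions)
      {
        if (run_it->second.count(acc) > 0)
        {
          kept.push_back(acc);
        }
      }
    }
    pep.evidence_accessions = std::move(kept);
  }
}
```

Because the lookup is per run, an accession that survives in run A does not rescue an evidence that belongs to run B.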

Suggested reviewers

  • jpfeuffer
  • cbielow

Poem

🐇 I hopped through code with tiny paws,
Renamed a method — applause! applause!
I chased the dangling protein traces,
Tidied hits in many places.
Now peptides sing in tidy rows — hooray, the repo grows!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: renaming updateProteinReferences to removeDanglingProteinReferences across the codebase. |

@timosachsenberg timosachsenberg changed the title from "rename updateProteinReferences function" to "rename updateProteinReferences function to removeInvalidProteinReferences" Dec 16, 2025
@timosachsenberg
Contributor Author

@jpfeuffer would this be a better name? removeInvalidProteinReferences

Comment thread src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h Outdated
@jpfeuffer
Contributor

Hmm maybe. Although invalid might feel like they are invalid because of errors or something.

@timosachsenberg
Contributor Author

> Hmm maybe. Although invalid might feel like they are invalid because of errors or something.

removeDanglingProteinReferences() ?

The function removes PeptideEvidence entries that reference proteins
that no longer exist in the protein hits. The new name more clearly
describes this removal behavior, aligning with other IDFilter methods
like removeUnreferencedProteins and removeUngroupedProteins.

Changes:
- Renamed all three overloads of the function
- Improved documentation with detailed @param[in]/[out] annotations
- Updated all usages in TOPP tools and library code
- Updated Python bindings with expanded docstring
- Updated unit tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@timosachsenberg timosachsenberg force-pushed the claude/investigate-protein-references-XlC6z branch from ef976f8 to 6acc344 December 16, 2025 16:28

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (1)

1215-1253: keepNBestHits(AnnotatedMSRun&): removeDanglingProteinReferences currently operates on a copy only

In this block you re-wrap peptide_id into temp_vec, call removeDanglingProteinReferences(temp_vec, annotated_data.getProteinIdentifications());, but never copy the cleaned hit back into peptide_id. As a result, the peptide/protein references in annotated_data are not actually updated, despite the comment saying “we still need to update protein references” — this was already the case with updateProteinReferences.

If you want this cleanup to affect the AnnotatedMSRun, consider assigning back after the call:

```diff
         temp_vec = {peptide_id};
         removeDanglingProteinReferences(temp_vec, annotated_data.getProteinIdentifications());
+        if (!temp_vec.empty())
+        {
+          peptide_id = temp_vec[0];
+        }
```

If this was only intended as a safety net and you don’t actually rely on it, alternatively drop the call (and the comment) to avoid confusion.

Based on learnings, keeping behavior and documentation aligned avoids subtle maintenance bugs.
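The `temp_vec` issue is an ordinary value-semantics effect. Wrapping an object into a temporary vector copies it, so any cleanup applied to the vector never reaches the original unless the result is assigned back. The minimal sketch below uses hypothetical stand-ins (`Pid`, `cleanup`) in place of the actual OpenMS types. `cleanup` simply removes accession "P2" to simulate a dangling reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide identification (not the OpenMS class).
struct Pid { std::vector<std::string> accessions; };

// Stand-in for a cleanup that takes a vector of IDs by reference, like the
// PeptideIdentificationList overload; here it simply removes accession "P2".
void cleanup(std::vector<Pid>& ids)
{
  for (auto& id : ids)
  {
    std::vector<std::string> kept;
    for (const auto& acc : id.accessions)
    {
      if (acc != "P2")
      {
        kept.push_back(acc);
      }
    }
    id.accessions = kept;
  }
}

// Pattern flagged in the review: temp_vec holds a copy, so peptide_id is unchanged.
void cleanupWithoutWriteBack(Pid& peptide_id)
{
  std::vector<Pid> temp_vec = {peptide_id};
  cleanup(temp_vec);
}

// Suggested fix: copy the cleaned element back into the original object.
void cleanupWithWriteBack(Pid& peptide_id)
{
  std::vector<Pid> temp_vec = {peptide_id};
  cleanup(temp_vec);
  if (!temp_vec.empty())
  {
    peptide_id = temp_vec[0];
  }
}
```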

🧹 Nitpick comments (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (1)

301-405: removeDanglingProteinReferences implementations align with the documented semantics

The three overloads correctly (a) derive valid accessions from the provided protein identifications, (b) rebuild PeptideEvidence lists to keep only evidences pointing to existing proteins, and (c) optionally drop peptide hits without remaining evidences when requested. This matches the header documentation and preserves the behavior of the former updateProteinReferences.

If you touch this again later, consider factoring out the shared “per‑run accession map + evidence filtering” logic between the ConsensusMap and PeptideIdentificationList variants to avoid duplication, but it’s not urgent.
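The suggested refactoring could look roughly like this. A single helper filters one peptide-hit list against a set of valid accessions and, optionally, drops hits left without any reference. Both a peptide-list caller and a feature-based caller (standing in for the ConsensusMap case) then reuse that helper. All names here are hypothetical and simplified, including `filterHits`, `cleanPeptideIds`, `cleanFeatures` and the structs; this is not the current OpenMS code. For brevity, a single accession set is passed in. A per-run variant would first look up the set by run identifier.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the OpenMS types.
struct PeptideHit { std::vector<std::string> accessions; };
struct PeptideId { std::vector<PeptideHit> hits; };
struct Feature { std::vector<PeptideId> peptide_ids; }; // e.g. a consensus feature

// Shared logic: keep only valid accessions; optionally drop hits left without any.
void filterHits(std::vector<PeptideHit>& hits, const std::set<std::string>& valid,
                bool remove_peptides_without_reference)
{
  for (auto& hit : hits)
  {
    auto& acc = hit.accessions;
    acc.erase(std::remove_if(acc.begin(), acc.end(),
                             [&](const std::string& a) { return valid.count(a) == 0; }),
              acc.end());
  }
  if (remove_peptides_without_reference)
  {
    hits.erase(std::remove_if(hits.begin(), hits.end(),
                              [](const PeptideHit& h) { return h.accessions.empty(); }),
               hits.end());
  }
}

// Caller 1: a plain list of peptide identifications.
void cleanPeptideIds(std::vector<PeptideId>& ids, const std::set<std::string>& valid,
                     bool remove_peptides_without_reference)
{
  for (auto& id : ids)
  {
    filterHits(id.hits, valid, remove_peptides_without_reference);
  }
}

// Caller 2: peptide identifications attached to features.
void cleanFeatures(std::vector<Feature>& features, const std::set<std::string>& valid,
                   bool remove_peptides_without_reference)
{
  for (auto& f : features)
  {
    cleanPeptideIds(f.peptide_ids, valid, remove_peptides_without_reference);
  }
}
```

With this split, each overload keeps its own container traversal, and the filtering rule lives in one place.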

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef976f8 and 6acc344.

📒 Files selected for processing (9)
  • src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (3 hunks)
  • src/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cpp (3 hunks)
  • src/openms/source/PROCESSING/ID/IDFilter.cpp (3 hunks)
  • src/pyOpenMS/pxds/IDFilter.pxd (1 hunks)
  • src/tests/class_tests/openms/source/IDFilter_test.cpp (3 hunks)
  • src/topp/FalseDiscoveryRate.cpp (1 hunks)
  • src/topp/IDFilter.cpp (1 hunks)
  • src/topp/IsobaricWorkflow.cpp (2 hunks)
  • src/topp/ProteomicsLFQ.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/topp/IDFilter.cpp
  • src/topp/FalseDiscoveryRate.cpp
  • src/pyOpenMS/pxds/IDFilter.pxd
🧰 Additional context used
📓 Path-based instructions (3)
src/openms/**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

src/openms/**/*.{cpp,h,hpp,cc,cxx}: Follow the existing C++ coding conventions in the codebase
Use the established naming patterns for classes, methods, and variables
Use OpenMS data structures (e.g., MSExperiment, FeatureMap, PeptideIdentification)
Follow the established error handling patterns
Utilize OpenMS logging mechanisms
Be mindful of memory usage when processing large datasets
Consider algorithmic complexity for data processing operations
Use appropriate OpenMS containers and algorithms

Files:

  • src/openms/source/PROCESSING/ID/IDFilter.cpp
  • src/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cpp
  • src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
src/tests/class_tests/openms/**/*.cpp

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Write unit tests for new functionality

Files:

  • src/tests/class_tests/openms/source/IDFilter_test.cpp
src/openms/include/OpenMS/**/*.h

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Add Doxygen comments for new public methods and classes

Files:

  • src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
🧠 Learnings (6)
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Applies to src/tests/class_tests/openms/**/*.cpp : Write unit tests for new functionality

Applied to files:

  • src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Applies to src/tests/class_tests/openms/*Test.cpp : Follow naming convention: `ClassNameTest.cpp` for `ClassName`

Applied to files:

  • src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-09-03T12:58:20.032Z
Learnt from: timosachsenberg
Repo: OpenMS/OpenMS PR: 8177
File: src/tests/topp/CMakeLists.txt:1857-1865
Timestamp: 2025-09-03T12:58:20.032Z
Learning: OpenMS tests: When adding new TOPP tests in src/tests/topp/CMakeLists.txt that compare outputs via FuzzyDiff, always add set_tests_properties("<tool>_<n>_out<m>" PROPERTIES DEPENDS "<tool>_<n>") to avoid flakiness under parallel CTest runs.

Applied to files:

  • src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-10-21T13:02:16.431Z
Learnt from: timosachsenberg
Repo: OpenMS/OpenMS PR: 8318
File: src/openms/include/OpenMS/ML/CLUSTERING/HashGrid.h:466-480
Timestamp: 2025-10-21T13:02:16.431Z
Learning: In OpenMS, when making internal methods public (e.g., in HashGrid.h), remove XXX/TODO comments about implementation details rather than documenting them in the public API documentation.

Applied to files:

  • src/tests/class_tests/openms/source/IDFilter_test.cpp
  • src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Applies to src/openms/**/*.{cpp,h,hpp,cc,cxx} : Use OpenMS data structures (e.g., MSExperiment, FeatureMap, PeptideIdentification)

Applied to files:

  • src/tests/class_tests/openms/source/IDFilter_test.cpp
📚 Learning: 2025-08-05T12:43:11.681Z
Learnt from: CR
Repo: OpenMS/OpenMS PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-05T12:43:11.681Z
Learning: Update relevant documentation when modifying existing functionality

Applied to files:

  • src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h
🧬 Code graph analysis (4)
src/topp/IsobaricWorkflow.cpp (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
  • removeDanglingProteinReferences (302-337)
  • removeDanglingProteinReferences (302-302)
  • removeDanglingProteinReferences (339-370)
  • removeDanglingProteinReferences (339-339)
  • removeDanglingProteinReferences (372-405)
  • removeDanglingProteinReferences (372-372)
src/tests/class_tests/openms/source/IDFilter_test.cpp (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
  • removeDanglingProteinReferences (302-337)
  • removeDanglingProteinReferences (302-302)
  • removeDanglingProteinReferences (339-370)
  • removeDanglingProteinReferences (339-339)
  • removeDanglingProteinReferences (372-405)
  • removeDanglingProteinReferences (372-372)
src/topp/ProteomicsLFQ.cpp (1)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
  • removeDanglingProteinReferences (302-337)
  • removeDanglingProteinReferences (302-302)
  • removeDanglingProteinReferences (339-370)
  • removeDanglingProteinReferences (339-339)
  • removeDanglingProteinReferences (372-405)
  • removeDanglingProteinReferences (372-372)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (2)
src/openms/source/PROCESSING/ID/IDFilter.cpp (6)
  • removeDanglingProteinReferences (302-337)
  • removeDanglingProteinReferences (302-302)
  • removeDanglingProteinReferences (339-370)
  • removeDanglingProteinReferences (339-339)
  • removeDanglingProteinReferences (372-405)
  • removeDanglingProteinReferences (372-372)
src/openms/include/OpenMS/METADATA/ProteinIdentification.h (1)
  • vector (58-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-win
  • GitHub Check: build-macos-arm
  • GitHub Check: build-lnx-arm
  • GitHub Check: cppcheck-test
🔇 Additional comments (5)
src/openms/include/OpenMS/PROCESSING/ID/IDFilter.h (1)

785-831: Doxygen for removeDanglingProteinReferences is consistent with implementation and naming

The new overloads are well-documented: parameter directions are clear, the per-run matching semantics match the implementations in IDFilter.cpp, and the naming aligns with removeUnreferencedProteins. No issues here.

src/topp/IsobaricWorkflow.cpp (1)

766-795: Renamed calls in FDR cleanup use the new API correctly

Switching from updateProteinReferences to removeDanglingProteinReferences(cmap, rm_pep) preserves the intended behavior: peptide evidences (and optionally peptide hits) are cleaned after protein-FDR filtering and before removing unreferenced proteins / updating groups. The flag wiring via rm_pep is still correct.

src/topp/ProteomicsLFQ.cpp (1)

1314-1342: ConsensusMap clean‑up calls now use removeDanglingProteinReferences with correct semantics

Both updated call sites in inferProteinGroups_ pass true to drop peptide hits without surviving protein references, which is precisely what you want after removing decoy / low-confidence proteins. The rename is consistent and behavior is preserved.

src/tests/class_tests/openms/source/IDFilter_test.cpp (1)

242-273: Updated unit test correctly exercises removeDanglingProteinReferences semantics

This START_SECTION now validates both modes of the new API: it checks that evidences are restricted to surviving protein hits when remove_peptides_without_reference is false, and that peptide hits lacking any remaining evidences are dropped when it is true. This matches the implementation and gives good coverage of the renamed function.

src/openms/source/ANALYSIS/ID/BasicProteinInferenceAlgorithm.cpp (1)

85-94: Protein inference now uses the new overloads in the intended way

All three updated call sites (pep_ids + single run, ConsensusMap + prot_run, and pep_ids + multi-run prot_ids) pass the filtered protein sets into removeDanglingProteinReferences with remove_peptides_without_reference = true, ensuring PSMs referencing proteins that didn’t meet min_peptides_per_protein are removed. The temporary vector trick in the single-run case preserves previous behavior while fitting the new API.

Also applies to: 184-191, 260-263
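The effect described for protein inference can be pictured in three steps. First, count the distinct peptide sequences supporting each protein. Second, keep only the proteins that meet min_peptides_per_protein. Third, with remove_peptides_without_reference = true, drop PSMs whose evidences all pointed to discarded proteins. The compact sketch below uses a hypothetical flat `Psm` type with one accession list per PSM. It illustrates the described effect only and is not BasicProteinInferenceAlgorithm's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical flat PSM: peptide sequence plus the protein accessions it maps to.
struct Psm { std::string sequence; std::vector<std::string> accessions; };

// Return the accessions supported by at least min_peptides distinct sequences.
std::set<std::string> proteinsPassingMinPeptides(const std::vector<Psm>& psms,
                                                 std::size_t min_peptides)
{
  std::map<std::string, std::set<std::string>> peptides_per_protein;
  for (const auto& psm : psms)
  {
    for (const auto& acc : psm.accessions)
    {
      peptides_per_protein[acc].insert(psm.sequence);
    }
  }
  std::set<std::string> passing;
  for (const auto& entry : peptides_per_protein)
  {
    if (entry.second.size() >= min_peptides)
    {
      passing.insert(entry.first);
    }
  }
  return passing;
}

// Remove dangling accessions, then drop PSMs left without any reference.
void dropPsmsOfFailedProteins(std::vector<Psm>& psms, const std::set<std::string>& passing)
{
  for (auto& psm : psms)
  {
    auto& acc = psm.accessions;
    acc.erase(std::remove_if(acc.begin(), acc.end(),
                             [&](const std::string& a) { return passing.count(a) == 0; }),
              acc.end());
  }
  psms.erase(std::remove_if(psms.begin(), psms.end(),
                            [](const Psm& p) { return p.accessions.empty(); }),
             psms.end());
}
```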

@jpfeuffer
Contributor

Yeah better!

@timosachsenberg timosachsenberg enabled auto-merge (squash) December 16, 2025 16:51
@timosachsenberg timosachsenberg changed the title from "rename updateProteinReferences function to removeInvalidProteinReferences" to "rename updateProteinReferences function to removeDanglingProteinReferences" Dec 16, 2025
@timosachsenberg timosachsenberg merged commit c54be7f into develop Dec 17, 2025
24 checks passed
@timosachsenberg timosachsenberg deleted the claude/investigate-protein-references-XlC6z branch December 17, 2025 06:59

Labels

None yet

Projects

None yet

3 participants