Skip to content

fix(maxquant): Filter out decoys with decoy column#133

Merged
tonywu1999 merged 1 commit into devel from fix-maxquant on May 27, 2026


@tonywu1999 commented May 22, 2026

Motivation and Context

https://groups.google.com/g/msstats/c/NwsByfS2Y5M

I was using MaxQuant output with MSstats for the first time. After running MaxQtoMSstatsFormat and then dataProcess, I noticed "REV_" (reverse) sequences in the processed_data$ProteinLevelData slot. An older MaxQuant output shows that the decoy column used to be named "Reverse" in the proteinGroups.txt and evidence.txt output files. In MaxQuant 2.8.0.0, the "Reverse" column is missing from these files, and a new column called "Decoy" appears to carry the same information. I couldn't find any documentation of this change, but I believe it is why reverse sequences appeared in my processed data.
I reran MSstats after renaming the "Decoy" column to "Reverse" in the proteinGroups and evidence data. MSstats then removed the decoy sequences, and they no longer appeared in the processed data.
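The workaround above amounts to treating "Decoy" and "Reverse" as interchangeable decoy flags. Below is a minimal base-R sketch of that idea. `flag_decoys()` is a hypothetical helper, not an MSstatsConvert function. The sketch assumes flagged rows hold "+" (MaxQuant's usual marker) and that unflagged rows are empty or NA. The table is made-up toy data.

```r
# Hypothetical helper (not part of MSstatsConvert): TRUE for rows flagged as
# decoys. Older MaxQuant output flags decoys in "Reverse"; the report above
# describes a "Decoy" column in MaxQuant 2.8.0.0. Assumes the flag value is "+".
flag_decoys <- function(df, decoy_cols = c("Reverse", "Decoy")) {
  present <- intersect(decoy_cols, colnames(df))
  flagged <- rep(FALSE, nrow(df))
  for (col in present) {
    values <- as.character(df[[col]])
    # Empty strings and NA count as "not flagged".
    flagged <- flagged | (!is.na(values) & values == "+")
  }
  flagged
}

# Toy evidence table in the newer layout: "Decoy" instead of "Reverse".
evidence <- data.frame(
  Proteins = c("P12345", "REV__P12345", "P67890"),
  Decoy = c("", "+", ""),
  stringsAsFactors = FALSE
)
cleaned <- evidence[!flag_decoys(evidence), , drop = FALSE]
print(cleaned$Proteins)
# [1] "P12345" "P67890"
```

Because the helper checks whichever of the two columns exists, the same call handles files from older and newer MaxQuant versions.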

Motivation and Context

MaxQuant has introduced the "Potential.contaminant" column in its output as an additional way to flag potentially problematic proteins. Previously, .cleanRawMaxQuant() filtered proteins only on the Contaminant, Reverse, and Decoy columns. This PR updates the function to also filter out proteins flagged in the new Potential.contaminant column. This keeps MSstatsConvert in step with recent changes to MaxQuant's output format and keeps potentially problematic proteins out of downstream analysis.

Changes

  • R/clean_MaxQuant.R:
    • Added "Potentialcontaminant" to the filter_cols vector (line 20) so that rows flagged in this column are removed
    • Updated the informational message (lines 21-22) to include "Potential.contaminant" in the list of filtered protein categories
    • Updated the informational message for the remove_by_site = TRUE case (lines 25-26) to also mention "Potential.contaminant" alongside the existing filters ("Contaminant", "Reverse", "Decoy", and "Only.identified.by.site")
    • Total changes: +3/-3 lines
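The change list above describes a filtering step. Below is a hedged sketch of it, not the body of .cleanRawMaxQuant(). `filter_maxquant_rows()` is hypothetical, and the sketch makes three assumptions:

  • Column names have already been standardized by dropping dots, so "Potential.contaminant" becomes "Potentialcontaminant".
  • "Only.identified.by.site" likewise appears as "Onlyidentifiedbysite".
  • Flagged rows contain "+".

```r
# Hypothetical sketch, not the actual .cleanRawMaxQuant() code.
# Assumes standardized column names (dots removed) and "+" as the flag value.
filter_maxquant_rows <- function(df, remove_by_site = FALSE) {
  filter_cols <- c("Contaminant", "Potentialcontaminant", "Reverse", "Decoy")
  if (remove_by_site) {
    filter_cols <- c(filter_cols, "Onlyidentifiedbysite")
  }
  # Only check the filter columns this table actually contains.
  present <- intersect(filter_cols, colnames(df))
  keep <- rep(TRUE, nrow(df))
  for (col in present) {
    values <- as.character(df[[col]])
    # Keep a row unless this column marks it with "+".
    keep <- keep & (is.na(values) | values != "+")
  }
  message("** Removed rows flagged in: ", paste(present, collapse = ", "))
  df[keep, , drop = FALSE]
}

# Toy protein-group table (made-up identifiers).
pg <- data.frame(
  Proteinids = c("P1", "P2", "CON__P3", "REV__P4"),
  Potentialcontaminant = c("", "", "+", ""),
  Decoy = c("", "", "", "+"),
  Onlyidentifiedbysite = c("", "+", "", ""),
  stringsAsFactors = FALSE
)
nrow(filter_maxquant_rows(pg))                         # 2 rows: P1, P2
nrow(filter_maxquant_rows(pg, remove_by_site = TRUE))  # 1 row: P1
```

Checking only the columns that are present lets one filter list cover MaxQuant versions that write "Reverse", "Decoy", or both. The cost is the silent-skip risk that CodeRabbit raises in its review: a misspelled name is never reported.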

Unit Tests

No unit tests were added or modified in this PR. The existing suite in inst/tinytest/test_cleanRaw.R exercises MaxQuant cleaning (lines 40-55). Its test data (mq_pg.csv) already contains the Potential.contaminant column, so the new filter is covered implicitly, but no test specifically checks Potentialcontaminant filtering.
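An explicit test could check each filter column in isolation. It would build a two-row table where only one column flags the second row, then assert that only that row is dropped. The sketch below is self-contained base R. `drop_flagged()` is a hypothetical stand-in for the package's filter, and the "+" flag value is an assumption. In inst/tinytest, the assertion would instead call the real cleaning function through tinytest's expect_* helpers.

```r
# Hypothetical stand-in for the filter under test (not MSstatsConvert code):
# drop rows where any listed column that exists in `df` holds "+".
drop_flagged <- function(df, cols) {
  cols <- intersect(cols, colnames(df))
  flags <- lapply(cols, function(col) {
    values <- as.character(df[[col]])
    !is.na(values) & values == "+"
  })
  # OR the per-column flags together; with no matching columns, flag nothing.
  flagged <- Reduce(`|`, flags, rep(FALSE, nrow(df)))
  df[!flagged, , drop = FALSE]
}

filter_cols <- c("Contaminant", "Potentialcontaminant", "Reverse", "Decoy")

# Table-driven check: each column, on its own, must remove the row it flags.
for (col in filter_cols) {
  df <- data.frame(Proteinids = c("KEEP", "DROP"), stringsAsFactors = FALSE)
  df[[col]] <- c("", "+")
  out <- drop_flagged(df, filter_cols)
  stopifnot(identical(out$Proteinids, "KEEP"))
}

# A table with none of the filter columns must pass through unchanged.
plain <- data.frame(Proteinids = c("A", "B"), stringsAsFactors = FALSE)
stopifnot(nrow(drop_flagged(plain, filter_cols)) == 2)
```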

Coding Guidelines

No violations of coding guidelines identified. The changes follow the existing code patterns and maintain consistency with the R coding style used throughout the package.



coderabbitai Bot commented May 22, 2026


Walkthrough

.cleanRawMaxQuant() now includes Potentialcontaminant as an additional column for filtering contaminants. The function's status message is updated to report Potential.contaminant alongside the existing Contaminant, Reverse, and Decoy filters when removing flagged rows.

Changes

Contaminant Filtering Enhancement

  • Layer: Expand contaminant filter columns and status message
  • File(s): R/clean_MaxQuant.R
  • Summary: The contaminant filter configuration is expanded to include Potentialcontaminant as an additional filter column alongside Contaminant, Reverse, and Decoy. The informational status message is updated to mention Potential.contaminant in the filtering output.

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Poem

🐰 A contaminant lurks in the data stream,
Potential and actual—now caught by the scheme!
With dots in the names and filters aligned,
MaxQuant data shines, with contaminants consigned!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Title check: ⚠️ Warning. Explanation: The title mentions filtering decoys with a decoy column, but the actual change adds 'Potential.contaminant' filtering, not decoy-related changes. Resolution: Update the title to accurately reflect that the change filters the Potential.contaminant column alongside existing contaminant-like filters.

✅ Passed checks (4 passed)

  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. The pull request description is comprehensive and well-structured, addressing motivation, changes, testing, and coding guidelines.

@tonywu1999 changed the title from "fix(maxquant): Fix MaxQuant converter w.r.t. recent MaxQ changes" to "fix(maxquant): Filter out decoys with decoy column" on May 22, 2026

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/clean_MaxQuant.R`:
- Around lines 20-27: The current filter_cols in clean_MaxQuant.R uses incorrect MaxQuant header names (e.g., "Potentialcontaminant" and "Decoy"), so filtering silently skips expected columns. Change filter_cols to use the literal MaxQuant column names present in our inputs (e.g., "Contaminant", "Potential.contaminant", "Reverse") and remove "Decoy". If remove_by_site is TRUE, append "Only.identified.by.site" to filter_cols. Update the msg text to match these literal names so the log reflects the columns actually being filtered. The filter_cols and msg variables are in clean_MaxQuant.R.
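The review's concern is that any name in filter_cols that does not exactly match a header is skipped silently. Whether "Potentialcontaminant" or "Potential.contaminant" is the correct spelling depends on how column names are standardized before .cleanRawMaxQuant() runs, and this page does not show that step. One defensive option is to compare names only after stripping non-alphanumeric characters, and to report any filter column that matches nothing. The sketch below does this with hypothetical helpers `normalize_name()` and `match_filter_cols()`. Its header is toy data written in MaxQuant's space-separated style.

```r
# Hypothetical helpers, not part of MSstatsConvert.
# Drop every character that is not a letter or digit, so "Potential contaminant",
# "Potential.contaminant" and "Potentialcontaminant" share one key.
normalize_name <- function(x) gsub("[^[:alnum:]]", "", x)

# Return the real column names in `df` that match `wanted` after
# normalization, and report any wanted name that matches no column.
match_filter_cols <- function(df, wanted) {
  have <- normalize_name(colnames(df))
  want <- normalize_name(wanted)
  missing <- wanted[!want %in% have]
  if (length(missing) > 0) {
    message("Filter columns not found: ", paste(missing, collapse = ", "))
  }
  colnames(df)[have %in% want]
}

# Toy header with spaces; read.delim() keeps check.names = TRUE (the
# read.table default), so spaces in the header become dots.
txt <- "Protein IDs\tPotential contaminant\tReverse\nP1\t\t\nCON__P2\t+\t\n"
pg <- read.delim(text = txt, stringsAsFactors = FALSE)
colnames(pg)
# "Protein.IDs" "Potential.contaminant" "Reverse"

match_filter_cols(pg, c("Potentialcontaminant", "Reverse", "Decoy"))
# message: Filter columns not found: Decoy
# returns "Potential.contaminant" "Reverse"
```

The sketch reports unmatched names instead of skipping them silently. A rename such as "Reverse" to "Decoy" would then show up in the log rather than in the processed data.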

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between b9564f2 and 40b0139.

📒 Files selected for processing (1)
  • R/clean_MaxQuant.R

@tonywu1999 merged commit 3145427 into devel on May 27, 2026
2 checks passed
@tonywu1999 deleted the fix-maxquant branch on May 27, 2026 at 19:08