
Merge MZMinetoMSstatsFormat converter into development#132

Merged
tonywu1999 merged 6 commits into devel from MSstatsConvert/work/20260516_mzmine_converter
May 27, 2026

@swaraj-neu (Contributor) commented May 19, 2026

Summary

Adds MZMinetoMSstatsFormat, a new converter for MZMine metabolomics output, mirroring the structure and conventions of DIANNtoMSstatsFormat. This is Phase 1 of a cross-package task to add metabolomics support to the MSstats family; Phase 2 (the BIO = "Metabolomics" value in MSstatsShiny) will be a separate PR in Vitek-Lab/MSstatsShiny that depends on this one.

What's added

  • R/clean_MZMine.R: internal .cleanRawMZMine() that handles the wide-to-long pivot, column normalization, and the optional compound-name join.
  • R/converters_MZMinetoMSstatsFormat.R: public converter. Its signature mirrors DIANNtoMSstatsFormat, with metabolomics-appropriate defaults (remove_shared_peptides = FALSE, no decoy/oxidation filters, IsotopeLabelType = "Light").
  • R/MSstatsConvert_core_functions.R: MSstatsMZMineFiles class registration plus MSstatsClean method dispatch. Dispatch is name-based in this package, so no other registration site needs editing.
  • Fixtures under inst/tinytest/raw_data/MZMine/: a wide-format quant CSV (6 features × 4 samples), spectral-library annotations CSV, and run-annotation CSV.
  • Tests at inst/tinytest/test_converters_MZMinetoMSstatsFormat.R: 38 assertions covering happy path, IsotopeLabelType, NA charge/fragment columns, highest-score-wins annotation policy, mz_rt fallback for unannotated features, and removeProtein_with1Feature behavior.
  • Vignette: new "Metabolomics with MZMine" section in vignettes/msstats_data_format.Rmd.
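A minimal sketch of what name-based dispatch means in practice, assuming the class name is assembled by pasting the tool name into a fixed pattern. The class and generic names below (SketchInputFiles, SketchMZMineFiles, sketchClean) are hypothetical stand-ins, not MSstatsConvert's actual definitions of MSstatsInputFiles, MSstatsMZMineFiles or MSstatsClean.

```r
library(methods)

# Hypothetical stand-ins for MSstatsInputFiles / MSstatsMZMineFiles / MSstatsClean.
setClass("SketchInputFiles", representation("VIRTUAL"))
setClass("SketchMZMineFiles", contains = "SketchInputFiles",
         slots = c(input = "data.frame"))

setGeneric("sketchClean", function(msstats_object, ...) standardGeneric("sketchClean"))
setMethod("sketchClean", signature = "SketchMZMineFiles",
          function(msstats_object, ...) "MZMine cleaner called")

# Name-based construction: the tool name alone selects the class, and the
# class selects the cleaning method, so no lookup table needs updating.
tool <- "MZMine"
obj <- new(paste0("Sketch", tool, "Files"), input = data.frame(rowID = 1))
sketchClean(obj)
#> [1] "MZMine cleaner called"
```

Adding a new tool under this pattern only requires a new class plus a method for it; the construction code stays untouched.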

Column mapping

| MSstats column | Metabolomics meaning | Source |
| --- | --- | --- |
| ProteinName | Compound name | From mzmine_annotations (highest-scoring hit per feature), or mz_rt fallback |
| PeptideSequence | Feature ID | row ID from quant file |
| PrecursorCharge | NA | |
| FragmentIon | NA | |
| ProductCharge | NA | |
| IsotopeLabelType | "Light" | Matches DIANN's non-SILAC convention |
| Run | Sample run | Per-sample column name, with "Peak area" stripped |
| Intensity | Peak area | Melted from per-sample columns |

Design choices worth noting

Optional mzmine_annotations parameter. Most features in real MZMine datasets have no spectral-library annotation (in the example dataset I tested with, only 69 of 2,569 features, about 3%, were annotated). To make the output biologically usable, MZMinetoMSstatsFormat accepts an optional mzmine_annotations data frame and joins in compound names, keeping the highest-scoring hit per feature. When it is NULL, every feature falls back to an mz_rt string (paste0(round(mz, 4), "_", round(rt, 2))). This is only the simple spectral-library join, not the full multi-source SIRIUS/MS2Query/CANOPUS consensus workflow that some labs use.
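A base-R sketch of the highest-score-wins join with the mz_rt fallback described in this paragraph. The annotation columns id, compound_name and score follow the names used elsewhere in this PR; the quant-side columns row_id, mz and rt, and all values, are hypothetical. The package itself uses data.table, so this only illustrates the logic and does not reproduce .cleanRawMZMine().

```r
# Hypothetical toy inputs. id/compound_name/score follow the PR's annotation
# columns; row_id/mz/rt on the quant side are assumed names.
quant <- data.frame(row_id = c(1, 2, 3),
                    mz = c(123.0564, 245.1289, 455.2821),
                    rt = c(1.234, 3.451, 0.6512))
annotations <- data.frame(id = c(1, 1, 2),
                          compound_name = c("CaffeineLow", "Caffeine", "Lactate"),
                          score = c(0.71, 0.93, 0.88),
                          stringsAsFactors = FALSE)

# Highest score wins: sort by id, then by descending score, keep the first hit per id.
annotations <- annotations[order(annotations$id, -annotations$score), ]
best_hit <- annotations[!duplicated(annotations$id), ]

# Left join so unannotated features survive, then name them with mz_rt.
merged <- merge(quant, best_hit, by.x = "row_id", by.y = "id", all.x = TRUE)
mz_rt <- paste0(round(merged$mz, 4), "_", round(merged$rt, 2))
merged$ProteinName <- ifelse(is.na(merged$compound_name), mz_rt, merged$compound_name)
merged$ProteinName
# -> "Caffeine", "Lactate", "455.2821_0.65"
```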

Verification

  • tinytest::test_all("."): 732/732 pass, including the 38 new MZMine assertions.
  • R CMD check on MSstatsConvert: clean. One new NOTE (MZMinetoMSstatsFormat: no visible binding for global variable 'Intensity'); same pattern as DIANN/Skyline/Spectronaut.
  • Cross-package check: R CMD check on ../MSstats shows no MZMine-related findings (pre-existing WARNINGs/NOTEs only). I wasn't able to run cross-checks on MSstatsTMT, MSstatsPTM, or MSstatsLiP because they aren't cloned in my workspace, so I'm flagging this in case you want to verify them before merge.

Heads-up for review

Column-name standardization affects user-facing Run values. .standardizeColnames strips spaces and dots from column names, so the per-sample column "sampleA.mzML Peak area" becomes sampleAmzMLPeakarea internally, and the resulting Run value (after the Peakarea suffix is stripped) is sampleAmzML. A user's annotation file must therefore use Run = sampleAmzML, not sampleA.mzML or sampleA. This is consistent with how every MSstatsConvert converter behaves, but MZMine's column names are uglier than most, so the transformation is more visible. It's worth deciding whether the docs should call this out more loudly, whether the converter should normalize the user's Run values internally, or whether it's fine as-is.
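To make the Run transformation concrete, here is a small sketch. standardize_like_msstats() is a hypothetical approximation that only strips spaces and dots as described above; the real .standardizeColnames() may apply additional rules. The annotation values are made up.

```r
# Hypothetical approximation of the renaming described above (strip spaces and
# dots); the real .standardizeColnames() may apply additional rules.
standardize_like_msstats <- function(x) gsub("[ .]", "", x)

sample_cols <- c("sampleA.mzML Peak area", "sampleB.mzML Peak area")
runs <- sub("Peakarea$", "", standardize_like_msstats(sample_cols))
runs
#> [1] "sampleAmzML" "sampleBmzML"

# One way a user could align their own annotation before calling the converter.
annotation <- data.frame(Run = c("sampleA.mzML", "sampleB.mzML"),
                         Condition = c("Control", "Treated"),
                         BioReplicate = c(1, 2),
                         stringsAsFactors = FALSE)
annotation$Run <- standardize_like_msstats(annotation$Run)
```

Applying the same normalization to the user's Run column inside the converter would be the "normalize internally" option; the sketch shows it done on the caller's side instead.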

Closes Phase 1 of the metabolomics integration. Phase 2 (MSstatsShiny BIO = "Metabolomics") will follow in a separate PR.

Motivation and context (short summary)

Adds first-class support for untargeted metabolomics MZMine exports to MSstatsConvert. The PR provides a converter and cleaning pipeline that pivots MZMine wide-format " Peak area" tables into MSstats long format, applies metabolomics-appropriate defaults (IsotopeLabelType = "Light", NA for charge/fragment fields), and requires a spectral-library annotations table to derive ProteinName (highest-score wins). Unlike an earlier design note, the implementation requires mzmine_annotations and drops features without a matching annotation (no mz_rt fallback).

Detailed changes

  • New public converter
    • Exported MZMinetoMSstatsFormat(input, annotation = NULL, mzmine_annotations, removeProtein_with1Feature = FALSE, ...) — R/converters_MZMinetoMSstatsFormat.R
      • Validates mzmine_annotations is provided (stops if missing).
      • Runs MSstatsImport → MSstatsClean → MSstatsPreprocess with metabolomics defaults:
        • IsotopeLabelType = "Light"
        • Fraction = 1
        • remove_shared_peptides = FALSE
        • remove_single_feature_proteins controlled by removeProtein_with1Feature
      • Converts Intensity == 0 → NA, balances design via MSstatsBalancedDesign, logs progress.
  • Cleaning internals
    • Added .cleanRawMZMine(msstats_object, mzmine_annotations) — R/clean_MZMine.R
      • Detects per-sample peak-area columns (suffix "Peakarea") and requires rowID metadata.
      • Enforces mzmine_annotations presence and required columns: id, compound_name, score.
      • Coerces score to numeric and errors if any coercion yields NA.
      • Orders annotations by id and descending score and deduplicates to highest-scoring hit per id.
      • Inner-joins annotations to quant table and drops quant rows without a matching annotation (no mz_rt fallback in implementation).
      • Sets PeptideSequence = rowID; PrecursorCharge/FragmentIon/ProductCharge = typed NA; strips trailing "Peakarea" from Run values; melts peak-area columns to Run/Intensity; logs success.
  • S4 integration
    • Added MSstatsMZMineFiles S4 class and registered method:
      • setClass("MSstatsMZMineFiles", contains = "MSstatsInputFiles")
      • setMethod("MSstatsClean", signature = "MSstatsMZMineFiles", .cleanRawMZMine) — R/MSstatsConvert_core_functions.R
  • Tests, fixtures, vignette, and examples
    • Fixtures added under inst/tinytest/raw_data/MZMine: mzmine_input.csv, annotation.csv, mzmine_annotations.csv.
    • New tinytest: inst/tinytest/test_converters_MZMinetoMSstatsFormat.R
      • Verifies schema (11 MSstats columns) and 16-row output for provided fixture (4 annotated features × 4 runs).
      • Tests metabolomics defaults and behavior documented below.
    • Vignette extended: vignettes/msstats_data_format.Rmd includes a "Metabolomics with MZMine" section and runnable example using bundled CSVs.
  • Documentation and package metadata
    • New man pages: man/MZMinetoMSstatsFormat.Rd, man/dot-cleanRawMZMine.Rd.
    • Updated man/MSstatsClean.Rd and man/MSstatsInputFiles.Rd to include MZMine method/class.
    • DESCRIPTION: added clean_MZMine.R and converters_MZMinetoMSstatsFormat.R to Collate.
    • NAMESPACE: exports MZMinetoMSstatsFormat.
    • .Rbuildignore and .gitignore: minor entries added/adjusted.
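A base-R sketch of the wide-to-long melt and the metabolomics defaults listed above, using a toy table whose column names are assumed to be already standardized. The concrete NA types for PrecursorCharge, FragmentIon and ProductCharge are assumptions (the PR only says "typed NA"), and the zero-to-NA step, which the converter performs rather than the cleaner, is folded in here for brevity.

```r
# Toy wide table; column names assumed to be already standardized.
wide <- data.frame(rowID = c(1, 6),
                   sampleAmzMLPeakarea = c(1500, 0),
                   sampleBmzMLPeakarea = c(1720, 980))

area_cols <- grep("Peakarea$", names(wide), value = TRUE)

# Melt: one row per (feature, run); unlist() stacks the area columns in order.
long <- data.frame(
  PeptideSequence = rep(as.character(wide$rowID), times = length(area_cols)),
  Run = rep(sub("Peakarea$", "", area_cols), each = nrow(wide)),
  Intensity = unlist(wide[area_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Metabolomics placeholders (NA types are assumptions) and fixed label type.
long$PrecursorCharge <- NA_integer_
long$FragmentIon <- NA_character_
long$ProductCharge <- NA_integer_
long$IsotopeLabelType <- "Light"

# MSstats convention: a zero peak area means "not observed", so store NA.
long$Intensity[long$Intensity == 0] <- NA
```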

Unit tests added/modified

  • inst/tinytest/test_converters_MZMinetoMSstatsFormat.R (new; ~38 assertions)
    • Validates output has 11 columns and 16 rows for the provided fixture (4 annotated features × 4 runs).
    • Asserts IsotopeLabelType == "Light" for all rows.
    • Asserts PrecursorCharge, FragmentIon, ProductCharge are NA.
    • Asserts Fraction == 1.
    • Confirms highest-score-wins annotation policy (feature with duplicate annotations selects the highest-scoring compound_name).
    • Confirms specific peptide→protein mappings for fixture features (Caffeine, GlucoseHigh, Lactate).
    • Confirms features absent from mzmine_annotations are filtered out (no mz_rt fallback).
    • Confirms zero-intensity input cells convert to NA in output.
    • Confirms annotation merging with run metadata (Condition, BioReplicate).
    • Tests removeProtein_with1Feature = TRUE behavior: retains only proteins with ≥2 features (expected output: ProteinName == "Caffeine", 8 rows, peptides "1" and "6").
    • Author reports tinytest suite passes (732/732 total tests, including these 38).
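The removeProtein_with1Feature rule can be sketched as "keep compounds backed by at least two distinct features". The table below has one row per feature; the Caffeine IDs "1" and "6" come from the test description, while the Lactate and GlucoseHigh IDs are made up. With four runs per feature, the two surviving Caffeine features account for the 8 rows the test expects.

```r
# Toy one-row-per-feature table; the Lactate/GlucoseHigh feature IDs are made up.
features <- data.frame(ProteinName = c("Caffeine", "Caffeine", "Lactate", "GlucoseHigh"),
                       PeptideSequence = c("1", "6", "3", "4"),
                       stringsAsFactors = FALSE)

# Count distinct features per compound and keep compounds with at least two.
n_features <- tapply(features$PeptideSequence, features$ProteinName,
                     function(x) length(unique(x)))
keep <- names(n_features)[n_features >= 2]
kept <- features[features$ProteinName %in% keep, ]
kept$PeptideSequence
#> [1] "1" "6"
```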

Coding guidelines / notable issues

  • roxygen2 compatibility: A pre-existing roxygen2 docstring issue surfaced (an @inheritParams references a function that does not produce a discoverable topic). roxygen2 v7.3.3 fails while v8 tolerates it. To keep the PR scoped, document() was not re-run and the author edited DESCRIPTION; they recommend a separate housekeeping PR to fix the referenced ProteinProspector documentation so roxygen2 v7-compatible docs can be generated.
  • Behavior divergence from earlier design notes: Although earlier notes mentioned an mz_rt fallback for unannotated features, the current implementation requires mzmine_annotations and drops features without annotations. Reviewers should confirm this is intentional for Phase 1.
  • Column-name normalization caveat: MSstatsConvert's internal standardization strips spaces/dots from column names (e.g., "sampleA.mzML Peak area" → "sampleAmzML" after removing spaces/dots and the "Peakarea" suffix). This affects the Run values exposed to users and requires external annotation Run keys to match the standardized names; the vignette documents this but reviewers may prefer converter-side handling or clearer guidance in docs.
  • No other major coding-guideline violations detected; S4 class/method additions follow existing package patterns.


@swaraj-neu swaraj-neu requested a review from tonywu1999 May 19, 2026 00:24
@swaraj-neu swaraj-neu self-assigned this May 19, 2026
@swaraj-neu swaraj-neu added the enhancement New feature or request label May 19, 2026
@coderabbitai (bot, Contributor) commented May 19, 2026



📥 Commits

Reviewing files that changed from the base of the PR and between e4d8966 and 91feff3.

📒 Files selected for processing (1)
  • R/clean_MZMine.R
📝 Walkthrough

Adds MZMine metabolomics support: new S4 class (MSstatsMZMineFiles), internal cleaner .cleanRawMZMine (requires annotation table, filters unannotated features, melts Peakarea columns), exported converter MZMinetoMSstatsFormat(), tests, and documentation/vignette with examples.

Changes

MZMine Metabolomics Converter

| Layer | File(s) | Summary |
| --- | --- | --- |
| Package build configuration | .Rbuildignore, .gitignore, DESCRIPTION, NAMESPACE | Build ignore patterns updated to exclude doc and Meta; DESCRIPTION Collate extended to include clean_MZMine.R and converters_MZMinetoMSstatsFormat.R; NAMESPACE exports MZMinetoMSstatsFormat. |
| S4 class and method infrastructure | R/MSstatsConvert_core_functions.R | Adds MSstatsMZMineFiles S4 class and registers MSstatsClean method for it, delegating to .cleanRawMZMine. |
| MZMine raw data cleaning | R/clean_MZMine.R | Implements .cleanRawMZMine() that requires mzmine_annotations (id, compound_name, score), validates inputs, selects highest-score annotation per feature, drops unmatched features, melts Peakarea columns to long format, sets peptide/protein/charge placeholders, and returns a typed data.table. |
| Public converter API | R/converters_MZMinetoMSstatsFormat.R | Exports MZMinetoMSstatsFormat() that enforces mzmine_annotations, runs MSstatsImport + MSstatsClean, builds MSstats annotation, fills fixed feature columns (Fraction=1, IsotopeLabelType="Light"), converts zero intensities to NA, balances design, and returns the processed input. |
| Test suite | inst/tinytest/test_converters_MZMinetoMSstatsFormat.R | Tinytest validates output schema (11 columns), expected rows (16 rows = 4 annotated features × 4 runs), metabolomics defaults (IsotopeLabelType="Light", ion/charge fields NA, Fraction=1), annotation selection rules, filtering of unannotated features, zero→NA behavior, condition/BioReplicate propagation, and removeProtein_with1Feature=TRUE behavior. |
| Documentation | man/MSstatsClean.Rd, man/MSstatsInputFiles.Rd, man/MZMinetoMSstatsFormat.Rd, man/dot-cleanRawMZMine.Rd, vignettes/msstats_data_format.Rmd | Adds roxygen-generated docs for the new S4 method/class, exported MZMinetoMSstatsFormat(), internal .cleanRawMZMine(), and a vignette section with runnable examples; documents that mzmine_annotations is required and unmatched features are dropped. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • mstaniak

Poem

🐰 I hopped through peakarea rows with care,

Chose highest-score names from library fare,
Melted wide tables to tidy long song,
Kept only annotated features all along,
Now MSstats and metabolites sing along.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description is comprehensive and well-structured, covering motivation, detailed changes, column mapping, design decisions, verification, and implementation notes. However, it does not follow the provided template structure (missing explicit 'Motivation and Context', 'Changes' bullet list, 'Testing' section, and checklist). | Consider reorganizing the description to explicitly match the template sections: 'Motivation and Context' (summary), 'Changes' (bullet list), 'Testing' (test coverage), and 'Checklist Before Requesting a Review'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main change: adding the MZMinetoMSstatsFormat converter to the development branch, which is the primary objective of this pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@swaraj-neu (Contributor, Author)

Pre-existing roxygen2 issue surfaced during this work.

When re-documenting the package with roxygen2 7.3.3, the build fails with "In topic 'MSstatsClean': @inherits failed to find topic '.cleanRawProteinProspector'."

This is unrelated to MZMine. Somewhere in the package an @inheritParams .cleanRawProteinProspector tag points at a function whose roxygen block doesn't produce a discoverable topic. roxygen2 v8 tolerates this silently; v7.3.3 treats it as an error. To keep this PR scoped to MZMine, I worked around it by hand-editing DESCRIPTION rather than re-running document(). Worth a separate housekeeping PR to fix the ProteinProspector docstring.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
inst/tinytest/test_converters_MZMinetoMSstatsFormat.R (1)

98-100: ⚡ Quick win

Strengthen fallback coverage assertion to detect missing mz_rt IDs.

On Lines 98-100, all(ProteinName %in% expected_mz_rt) only checks “no unexpected values”; it does not guarantee every expected fallback appears. Use set equality on unique values to avoid false positives.

Proposed test assertion hardening
 expected_mz_rt = c("123.056_1.23", "245.129_3.45", "367.201_5.67",
                    "489.334_7.89", "555.447_9.1", "123.056_1.45")
-expect_true(all(as.character(output_nolib_dt$ProteinName) %in% expected_mz_rt))
+expect_equal(
+  sort(unique(as.character(output_nolib_dt$ProteinName))),
+  sort(expected_mz_rt)
+)
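A toy base-R illustration of why the suggested assertion is stronger; the values are made up and only two expected IDs are used.

```r
expected_mz_rt <- c("123.056_1.23", "245.129_3.45")
observed <- c("123.056_1.23", "123.056_1.23")  # "245.129_3.45" is missing

# Subset check: passes even though an expected fallback ID never appears.
all(observed %in% expected_mz_rt)
#> [1] TRUE

# Set equality on sorted unique values: fails, exposing the gap.
identical(sort(unique(observed)), sort(expected_mz_rt))
#> [1] FALSE
```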
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@inst/tinytest/test_converters_MZMinetoMSstatsFormat.R` around lines 98 - 100,
The test currently only asserts that there are no unexpected ProteinName values
using expected_mz_rt and output_nolib_dt$ProteinName, which can miss missing
expected mz_rt IDs; change the assertion to compare the sets (or sorted uniques)
so it verifies every expected fallback is present and no extras — e.g., replace
the all(... %in% ...) check with a set equality check between
unique(as.character(output_nolib_dt$ProteinName)) and expected_mz_rt (or sorted
variants) to ensure exact match.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/clean_MZMine.R`:
- Around line 55-56: The code sorts annotations using setorder(ann, id, -score)
but assumes score is numeric; coerce ann$score to numeric before ordering to
avoid mis-ranking when score is character/factor. Update the logic around the
ann data.table (before setorder) to convert score (e.g., ann[, score :=
as.numeric(score)]) and handle potential NAs (optionally warn if coercion
produces NA) so setorder and the subsequent unique(ann, by = "id") produce
correct ann_top results.

In `@R/converters_MZMinetoMSstatsFormat.R`:
- Around line 8-10: Update the parameter docs for annotation to state that the
`Run` column must match the standardized sample/run names produced by the
converter (i.e., after the function's column-name cleaning which strips the
trailing " Peak area" and normalizes spaces/dots), because the code (look for
the converter function that accepts `annotation` and the internal name-cleaning
logic that normalizes sample column names) uses those cleaned names when
joining; tell users to either provide `Run` values in that standardized form or
to run the same name-cleaning routine on their annotation before passing it in.
- Around line 54-55: The wrapper call to MSstatsImport in
converters_MZMinetoMSstatsFormat.R ignores the variadic arguments (...)
advertised for passing to data.table::fread; update the call where input is
assigned (the MSstatsImport invocation) to forward ... into MSstatsImport so
caller-supplied fread options are preserved, i.e., include ... in the
MSstatsImport argument list when calling MSstatsImport(list(input = input),
"MSstats", "MZMine") so the fread passthrough works as documented.

---

Nitpick comments:
In `@inst/tinytest/test_converters_MZMinetoMSstatsFormat.R`:
- Around line 98-100: The test currently only asserts that there are no
unexpected ProteinName values using expected_mz_rt and
output_nolib_dt$ProteinName, which can miss missing expected mz_rt IDs; change
the assertion to compare the sets (or sorted uniques) so it verifies every
expected fallback is present and no extras — e.g., replace the all(... %in% ...)
check with a set equality check between
unique(as.character(output_nolib_dt$ProteinName)) and expected_mz_rt (or sorted
variants) to ensure exact match.
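The `...` forwarding point in the converter comment is a general R pitfall: a wrapper that accepts `...` but never passes it on silently discards caller options, with no warning. In this sketch utils::read.csv stands in for data.table::fread, and both wrapper functions are hypothetical; MSstatsImport's actual signature is not shown here.

```r
# utils::read.csv stands in for data.table::fread here.
import_without_dots <- function(input, ...) utils::read.csv(text = input)
import_with_dots <- function(input, ...) utils::read.csv(text = input, ...)

csv <- "rowID;sampleA Peak area\n1;1500"
ncol(import_without_dots(csv, sep = ";"))  # sep silently dropped: 1 column
ncol(import_with_dots(csv, sep = ";"))     # sep forwarded: 2 columns
```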

📥 Commits

Reviewing files that changed from the base of the PR and between b9564f2 and 589ebb5.

⛔ Files ignored due to path filters (3)
  • inst/tinytest/raw_data/MZMine/annotation.csv is excluded by !**/*.csv
  • inst/tinytest/raw_data/MZMine/mzmine_annotations.csv is excluded by !**/*.csv
  • inst/tinytest/raw_data/MZMine/mzmine_input.csv is excluded by !**/*.csv
📒 Files selected for processing (13)
  • .Rbuildignore
  • .gitignore
  • DESCRIPTION
  • NAMESPACE
  • R/MSstatsConvert_core_functions.R
  • R/clean_MZMine.R
  • R/converters_MZMinetoMSstatsFormat.R
  • inst/tinytest/test_converters_MZMinetoMSstatsFormat.R
  • man/MSstatsClean.Rd
  • man/MSstatsInputFiles.Rd
  • man/MZMinetoMSstatsFormat.Rd
  • man/dot-cleanRawMZMine.Rd
  • vignettes/msstats_data_format.Rmd

Comment thread R/clean_MZMine.R Outdated
Comment thread R/converters_MZMinetoMSstatsFormat.R Outdated
Comment thread R/converters_MZMinetoMSstatsFormat.R Outdated
@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/clean_MZMine.R`:
- Around line 55-57: The score column is being coerced with as.numeric directly
(ann[, score := suppressWarnings(as.numeric(score))]) which will return factor
internal codes if score is a factor; change the coercion to go through character
first (i.e., suppressWarnings(as.numeric(as.character(score)))) so factor levels
are converted correctly, keep the anyNA(ann$score) check and the existing stop
message unchanged to validate numeric coercion.
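A short illustration of the factor pitfall behind this comment: as.numeric() on a factor returns its level codes rather than the values it displays. coerce_score() is a hypothetical helper combining the two-step coercion with an NA check, and its error text is illustrative, not the package's message.

```r
score <- factor(c("0.93", "0.71", "0.88"))

as.numeric(score)                # factor level codes, not scores
#> [1] 3 1 2
as.numeric(as.character(score))  # the intended numeric scores
#> [1] 0.93 0.71 0.88

# Sketch of the coercion plus validation; the error text is illustrative.
coerce_score <- function(score) {
  out <- suppressWarnings(as.numeric(as.character(score)))
  if (anyNA(out)) stop("score column contains non-numeric values")
  out
}
```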

📥 Commits

Reviewing files that changed from the base of the PR and between 589ebb5 and 7ec49d1.

📒 Files selected for processing (4)
  • R/clean_MZMine.R
  • R/converters_MZMinetoMSstatsFormat.R
  • inst/tinytest/test_converters_MZMinetoMSstatsFormat.R
  • man/MZMinetoMSstatsFormat.Rd
✅ Files skipped from review due to trivial changes (1)
  • man/MZMinetoMSstatsFormat.Rd

Comment thread R/clean_MZMine.R Outdated
@swaraj-neu (Contributor, Author)

@tonywu1999

End-to-end test:

MZMinetoMSstatsFormat against the biologist's master_annotation_table.csv

Same input file: mzmine-summer25-microbial-results_iimn_gnps_quant.csv.
Same library annotations: mzmine-summer25-microbial-results_annotations.csv.
I compared our converter's output to the master_annotation_table that the notebook produces.

PART 1: ROW KEY COMPARISON

Their (feature, run) keys: 59087
Our (feature, run) keys: 59087
In both: 59087
Only in theirs: 0
Only in ours: 0

-> Every (feature, run) pair matches. No rows dropped, no extra rows.

PART 2: INTENSITY COMPARISON

status N
match 53491
both_NA 5596

Zero value mismatches.

-> All peak-area values agree. The 5,596 both-NA rows are where the
biologist's file has 0 and ours has NA (we convert zero -> NA on
purpose, per MSstats convention).
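A base-R sketch of how a key and intensity comparison like Parts 1 and 2 can be computed. The column names and toy values are assumptions, and this is not the script that produced the numbers above; "both_NA" follows the report's meaning (their 0, our NA).

```r
# Toy stand-ins for the notebook table ("theirs") and converter output ("ours").
theirs <- data.frame(feature = c(1, 1, 2), run = c("A", "B", "A"), area = c(10, 0, 5))
ours <- data.frame(feature = c(1, 1, 2), run = c("A", "B", "A"), Intensity = c(10, NA, 5))

# Part 1: compare (feature, run) keys as sets.
key <- function(d) paste(d$feature, d$run, sep = "|")
key_counts <- c(in_both = length(intersect(key(theirs), key(ours))),
                only_theirs = length(setdiff(key(theirs), key(ours))),
                only_ours = length(setdiff(key(ours), key(theirs))))

# Part 2: classify intensities on shared keys; their 0 vs our NA is expected.
both <- merge(theirs, ours, by = c("feature", "run"))
status <- ifelse(both$area == 0 & is.na(both$Intensity), "both_NA",
          ifelse(!is.na(both$Intensity) & both$area == both$Intensity,
                 "match", "mismatch"))
table(status)
```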

PART 3: COMPOUND NAME COMPARISON

Their annotated features (any source): 1796
sirius: 1679
library: 69
ms2query: 48

Their library-annotated features: 69
Our library-annotated features: 69

In both: 69
Only in theirs (we missed naming): 0
Only in ours (their score < 0.7 cutoff): 0

Name mismatches on the 69 shared library-annotated features: 0
-> Perfect agreement on every library-named feature.

Note on the 1,679 features named by SIRIUS in their pipeline:
those show up as mz_rt fallback strings in our output (e.g.
"455.282_0.65"), because Phase 1 only implements the library
annotation source, not SIRIUS / MS2Query / CANOPUS.

Comment thread inst/tinytest/raw_data/MZMine/mzmine_input.csv
Comment thread inst/tinytest/test_converters_MZMinetoMSstatsFormat.R Outdated
Comment thread inst/tinytest/raw_data/MZMine/mzmine_input.csv
Comment thread inst/tinytest/test_converters_MZMinetoMSstatsFormat.R Outdated
Comment thread vignettes/msstats_data_format.Rmd Outdated
Comment thread R/clean_MZMine.R Outdated
Comment thread R/clean_MZMine.R Outdated
Comment thread R/clean_MZMine.R Outdated
Comment thread R/clean_MZMine.R Outdated
Comment thread R/clean_MZMine.R Outdated
Brings metabolomics into the MSstats family by adding an MZMine converter that mirrors the structure of DIANNtoMSstatsFormat. Phase 1 of a two-phase task; Phase 2 (MSstatsShiny BIO=Metabolomics) will be a separate PR.
- Let MSstatsPreprocess fill IsotopeLabelType
- Hardcode IsotopeLabelType in converter
- Remove redundant inherited @params
- Simplify score coercion
- Improve non-numeric score error
- Rename `ann` → feature_to_compound
- Rename melt variable to Run
- Refactor compound-name assignment with explicit data.table join
@swaraj-neu swaraj-neu force-pushed the MSstatsConvert/work/20260516_mzmine_converter branch from 40a8d8a to 53fed81 Compare May 27, 2026 14:35
- Error on NULL/missing mzmine_annotations
- Drop quant-only unmatched features (no mz_rt fallback)
- Log retained feature IDs after join
- Update tests, vignette, and roxygen for filtering + MSI Level 2 scope
- Remove unused .cleanRawMZMine metadata requirements
removeProtein_with1Feature default unchanged (FALSE).
@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
R/clean_MZMine.R (1)

54-59: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Coerce score through character before numeric ranking.

At Line 54, as.numeric(score) can mis-rank factor-typed values and select the wrong top annotation at Line 58/59.

Proposed fix
-    feature_to_compound[, score := suppressWarnings(as.numeric(score))]
+    feature_to_compound[, score := suppressWarnings(as.numeric(as.character(score)))]
#!/bin/bash
# Verify the current coercion pattern in the cleaner
rg -n -C2 'score := suppressWarnings\(as.numeric\(score\)\)' R/clean_MZMine.R
# Expected: one hit showing direct as.numeric(score) coercion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/clean_MZMine.R` around lines 54 - 59, The score column coercion can
mis-rank factor values; update the transformation of feature_to_compound$score
to first convert to character and then to numeric (i.e., use
as.numeric(as.character(score)) or equivalent) before calling setorder on id and
-score and before unique-ing by id so that ranking uses correct numeric values;
modify the line where score is assigned (currently using
suppressWarnings(as.numeric(score))) to perform the two-step coercion and
preserve the suppressWarnings behavior if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/clean_MZMine.R`:
- Around line 68-73: The retained features log currently uses
feature_to_compound$id which includes annotation IDs not present in the quant
table; change retained_ids to the IDs actually present in the joined quant rows
(i.e., extract unique IDs from the quant/annotation join result used in this
function) and use that list when building retained_msg and calling
getOption("MSstatsLog") / getOption("MSstatsMsg"); ensure you reference the
joined result variable (the object the rest of clean_MZMine.R uses for quant
rows) rather than feature_to_compound$id so the count and comma-separated list
reflect truly retained features.
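A sketch of the logging fix using hypothetical IDs: the retained set is the overlap that survives the inner join, not the full annotation ID list. The package would log through getOption("MSstatsLog") as the comment notes; message() stands in for it here.

```r
# Hypothetical IDs: feature 99 is annotated but absent from the quant table.
quant_ids <- c(1, 2, 3, 6)
annotation_ids <- c(1, 2, 6, 99)

# Logging annotation_ids would over-report; the inner join keeps only the overlap.
retained_ids <- sort(intersect(quant_ids, annotation_ids))
retained_msg <- paste0("Retained ", length(retained_ids), " annotated features: ",
                       paste(retained_ids, collapse = ", "))
message(retained_msg)
```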

---

Duplicate comments:
In `@R/clean_MZMine.R`:
- Around line 54-59: The score column coercion can mis-rank factor values;
update the transformation of feature_to_compound$score to first convert to
character and then to numeric (i.e., use as.numeric(as.character(score)) or
equivalent) before calling setorder on id and -score and before unique-ing by id
so that ranking uses correct numeric values; modify the line where score is
assigned (currently using suppressWarnings(as.numeric(score))) to perform the
two-step coercion and preserve the suppressWarnings behavior if desired.

📥 Commits

Reviewing files that changed from the base of the PR and between 40a8d8a and e4d8966.

⛔ Files ignored due to path filters (3)
  • inst/tinytest/raw_data/MZMine/annotation.csv is excluded by !**/*.csv
  • inst/tinytest/raw_data/MZMine/mzmine_annotations.csv is excluded by !**/*.csv
  • inst/tinytest/raw_data/MZMine/mzmine_input.csv is excluded by !**/*.csv
📒 Files selected for processing (13)
  • .Rbuildignore
  • .gitignore
  • DESCRIPTION
  • NAMESPACE
  • R/MSstatsConvert_core_functions.R
  • R/clean_MZMine.R
  • R/converters_MZMinetoMSstatsFormat.R
  • inst/tinytest/test_converters_MZMinetoMSstatsFormat.R
  • man/MSstatsClean.Rd
  • man/MSstatsInputFiles.Rd
  • man/MZMinetoMSstatsFormat.Rd
  • man/dot-cleanRawMZMine.Rd
  • vignettes/msstats_data_format.Rmd
✅ Files skipped from review due to trivial changes (3)
  • .gitignore
  • NAMESPACE
  • man/MSstatsInputFiles.Rd
🚧 Files skipped from review as they are similar to previous changes (2)
  • man/MZMinetoMSstatsFormat.Rd
  • DESCRIPTION

Comment thread R/clean_MZMine.R
@swaraj-neu swaraj-neu requested a review from tonywu1999 May 27, 2026 19:21
@tonywu1999 tonywu1999 merged commit 4221ab4 into devel May 27, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the MSstatsConvert/work/20260516_mzmine_converter branch May 27, 2026 21:05

Labels

enhancement New feature or request


2 participants