feat: align QPX parquet output with quantms template schema #8974
timosachsenberg merged 9 commits into develop from
Conversation
Note: Reviews paused. This branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Replaces the legacy ConsensusFeature export with QPX-compliant Arrow/Parquet schemas: introduces
Changes
Sequence Diagram(s)
sequenceDiagram
participant Exporter as Exporter (ConsensusMap / QPXFile / PGExport)
participant SchemaReg as ArrowSchemaRegistry
participant Arrow as Arrow Builders
participant Parquet as Parquet Writer
Exporter->>SchemaReg: request schema (QPXFeature/QPXPSM)
SchemaReg-->>Exporter: return schema definition
Exporter->>Arrow: create builders per schema
Arrow-->>Exporter: builders ready
Exporter->>Arrow: append rows (nested structs, lists, scalars)
Arrow-->>Exporter: row builders filled
Exporter->>Arrow: finish arrays
Arrow-->>Exporter: arrays finalized
Exporter->>Parquet: validate table against schema
Parquet-->>Exporter: validation OK
Exporter->>Parquet: write quantms.*.parquet
Parquet-->>Exporter: write complete / file paths
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp (1)
1030-1033: Assert the `scan` element type too. This currently only proves that `scan` is a list. A regression to `list<int64>` would still pass even though this schema is supposed to pin `scan` to `list<int32>`.
Suggested assertion

  TEST_EQUAL(s->field(13)->type()->id(), arrow::Type::LIST)
  TEST_EQUAL(s->field(13)->nullable(), false)
+ const auto scan_type = std::static_pointer_cast<arrow::ListType>(s->field(13)->type());
+ TEST_EQUAL(scan_type->value_type()->id(), arrow::Type::INT32)

As per coding guidelines, `src/tests/class_tests/**/*_test.cpp`: "Write unit tests for new functionality and ensure existing tests pass before suggesting changes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp` around lines 1030 - 1033, Add an assertion that the "scan" list's element type is int32: locate the existing checks on s->field(13) (name "scan", type LIST) and add an assertion that the list's element type id is arrow::Type::INT32 by retrieving the list's value type (e.g. static_cast/ptr cast to arrow::ListType and call value_type()->id() or value_field()->type()->id()) and comparing it to arrow::Type::INT32 so a regression to list<int64> will fail.
src/openms/source/FORMAT/ArrowSchemaRegistry.cpp (1)
403-437: Factor the shared QPX nested types.
`modificationsType()`, `additionalScoresType()`, and `cvParamsType()` are duplicated verbatim between `QPXPSMSchema` and `QPXFeatureSchema`. Since both classes are supposed to stay template-identical, extracting these into shared internal helpers will make the next schema update much harder to apply inconsistently.
Also applies to: 485-519
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/ArrowSchemaRegistry.cpp` around lines 403 - 437, Extract the duplicated Arrow nested type builders used in QPXPSMSchema::modificationsType, QPXPSMSchema::additionalScoresType, QPXPSMSchema::cvParamsType (and the identical methods in QPXFeatureSchema) into shared internal helper functions (e.g., makeScoresType(), makePositionStruct(), makeModificationsListType(), makeCvParamsListType) placed in an unnamed namespace or as static helpers in this translation unit; replace the bodies of QPXPSMSchema::modificationsType, ::additionalScoresType and ::cvParamsType (and the corresponding QPXFeatureSchema methods) to call those helpers so both classes reuse the same arrow::struct_/arrow::list constructions and avoid duplication during future schema updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openms/source/FORMAT/ConsensusMapArrowExport.cpp`:
- Around line 607-669: The code currently deduplicates peptide evidences by
accession (seen_accs) which loses repeated mappings; instead, only use seen_accs
when populating the unique protein_accs vector (for anchor_protein/unique), but
collect and push an EvidenceInfo for every entry from
best_hit->getPeptideEvidences() into evidences. Concretely: in the loop over
best_hit->getPeptideEvidences(), always create and push an EvidenceInfo (ei) for
each ev, and only call seen_accs.insert(acc) / push_back into protein_accs when
the accession is first encountered; leave the export loop that appends to
pg_accessions_builder / pga_struct_b / pga_acc_b / pga_start_b / pga_end_b /
pga_pre_b / pga_post_b unchanged so every PeptideEvidence is emitted (also apply
the same change in the analogous block around lines 708-709).
- Around line 537-554: The code currently reads the spectrum reference from the
generic meta value "spectrum_reference" which misses the dedicated
PeptideIdentification member; update the block that builds the scan (the logic
around pep_ids, spec_ref, scan_builder and scan_val_b) to call
PeptideIdentification::getSpectrumReference() first (on pep_ids[0]) and assign
that to spec_ref, and only if that returns empty fall back to checking
metaValueExists("spectrum_reference") / getMetaValue("spectrum_reference"); then
pass spec_ref to extractScan and append scan_num to scan_val_b as before.
In `@src/openms/source/FORMAT/QPXFile.cpp`:
- Around line 199-203: The export currently uses
ProteinIdentification::getPrimaryMSRunPath() and drops the hit metavalue
"reference_file_name", causing every PSM to be stamped with the first
identifier-level file; change the logic in QPXFile.cpp to prefer a per-hit or
per-PeptideIdentification file reference (check hit meta
"run_file_name"/"reference_file_name" or PeptideIdentification::getMetaValue())
when available and only fall back to
ProteinIdentification::getPrimaryMSRunPath() if no per-PSM/PeptideIdentification
file info exists or if the identifier-level path is unambiguous; update the code
paths around the excluded_hit_mvs handling and where run_file_name is resolved
(the blocks that reference excluded_hit_mvs and build PSM output) to read
per-hit/PeptideIdentification meta first and preserve reference_file_name if
used at PSM granularity.
---
Nitpick comments:
In `@src/openms/source/FORMAT/ArrowSchemaRegistry.cpp`:
- Around line 403-437: Extract the duplicated Arrow nested type builders used in
QPXPSMSchema::modificationsType, QPXPSMSchema::additionalScoresType,
QPXPSMSchema::cvParamsType (and the identical methods in QPXFeatureSchema) into
shared internal helper functions (e.g., makeScoresType(), makePositionStruct(),
makeModificationsListType(), makeCvParamsListType) placed in an unnamed
namespace or as static helpers in this translation unit; replace the bodies of
QPXPSMSchema::modificationsType, ::additionalScoresType and ::cvParamsType (and
the corresponding QPXFeatureSchema methods) to call those helpers so both
classes reuse the same arrow::struct_/arrow::list constructions and avoid
duplication during future schema updates.
In `@src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp`:
- Around line 1030-1033: Add an assertion that the "scan" list's element type is
int32: locate the existing checks on s->field(13) (name "scan", type LIST) and
add an assertion that the list's element type id is arrow::Type::INT32 by
retrieving the list's value type (e.g. static_cast/ptr cast to arrow::ListType
and call value_type()->id() or value_field()->type()->id()) and comparing it to
arrow::Type::INT32 so a regression to list<int64> will fail.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6afe0f20-15eb-4ebe-8460-c8928ec91e9e
📒 Files selected for processing (9)
src/openms/include/OpenMS/FORMAT/ArrowSchemaRegistry.h
src/openms/source/FORMAT/ArrowSchemaRegistry.cpp
src/openms/source/FORMAT/ConsensusMapArrowExport.cpp
src/openms/source/FORMAT/ProteinGroupArrowExport.cpp
src/openms/source/FORMAT/QPXFile.cpp
src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp
src/tests/class_tests/openms/source/QPXFile_test.cpp
src/topp/IsobaricWorkflow.cpp
src/topp/ProteomicsLFQ.cpp
♻️ Duplicate comments (1)
src/openms/source/FORMAT/QPXFile.cpp (1)
187-195: ⚠️ Potential issue | 🟠 Major
Resolve `run_file_name` and `scan` at PSM granularity. Line 199 strips `reference_file_name`/`scan` out of the metadata path, but Lines 431-454 only rebuild those columns from the identifier-level MS run list and parsed `spectrum_reference`. In merged IDs or imports that already carry per-hit / per-`PeptideIdentification` values, this still exports the wrong raw file (first MS run) or an empty `scan` list even though the PSM already has the source metadata.
Possible fix
- std::map<String, String> id_to_filename;
+ std::map<String, StringList> id_to_filenames;
  for (const auto& prot_id : protein_identifications)
  {
    StringList ms_runs;
    prot_id.getPrimaryMSRunPath(ms_runs);
    if (!ms_runs.empty())
    {
-     id_to_filename[prot_id.getIdentifier()] = ms_runs[0];
+     id_to_filenames[prot_id.getIdentifier()] = ms_runs;
    }
  }
  static const std::unordered_set<std::string> excluded_hit_mvs = {
    "target_decoy", "predicted_RT", "predicted_rt", "ion_mobility", "IM",
-   "scan", "reference_file_name"
+   "scan", "reference_file_name", "run_file_name"
  };
...
- auto fn_it = id_to_filename.find(pep_id.getIdentifier());
+ auto fn_it = id_to_filenames.find(pep_id.getIdentifier());
...
- if (fn_it != id_to_filename.end())
+ if (hit.metaValueExists("run_file_name"))
+ {
+   (void)run_file_name_builder.Append(hit.getMetaValue("run_file_name").toString());
+ }
+ else if (hit.metaValueExists("reference_file_name"))
+ {
+   (void)run_file_name_builder.Append(hit.getMetaValue("reference_file_name").toString());
+ }
+ else if (pep_id.metaValueExists("run_file_name"))
+ {
+   (void)run_file_name_builder.Append(pep_id.getMetaValue("run_file_name").toString());
+ }
+ else if (pep_id.metaValueExists("reference_file_name"))
+ {
+   (void)run_file_name_builder.Append(pep_id.getMetaValue("reference_file_name").toString());
+ }
+ else if (fn_it != id_to_filenames.end() && fn_it->second.size() == 1)
  {
-   (void)run_file_name_builder.Append(fn_it->second);
+   (void)run_file_name_builder.Append(fn_it->second[0]);
  }
  else
  {
    (void)run_file_name_builder.Append("");
  }
...
- if (!spec_ref.empty())
+ if (hit.metaValueExists("scan"))
+ {
+   (void)scan_value_b->Append(static_cast<int32_t>(static_cast<int>(hit.getMetaValue("scan"))));
+ }
+ else if (pep_id.metaValueExists("scan"))
+ {
+   (void)scan_value_b->Append(static_cast<int32_t>(static_cast<int>(pep_id.getMetaValue("scan"))));
+ }
+ else if (!spec_ref.empty())
  {
    Int scan_num = extractScan(spec_ref);

Also applies to: 199-203, 228-229, 431-454
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/QPXFile.cpp` around lines 187 - 195, The export currently uses id_to_filename (built from protein_identifications via getPrimaryMSRunPath) and parsed spectrum_reference to populate reference_file_name/run_file_name and scan, which ignores per-PSM or per-PeptideIdentification metadata; update the logic in QPXFile.cpp (places around id_to_filename population and the column rebuild at the section handling spectrum_reference / scan around the existing spectrum parsing) to first check and use per-hit or per-PeptideIdentification fields (e.g., inspect PeptideHit metadata and PeptideIdentification members for run_file_name, reference_file_name, and scan) and only fall back to id_to_filename[prot_id.getIdentifier()] or parsing spectrum_reference when those per-PSM values are absent, ensuring scan numbers and source file names are resolved at PSM granularity.
🧹 Nitpick comments (2)
src/openms/source/FORMAT/QPXFile.cpp (1)
164-185: Use `'\n'` in the new error logs. The added reserve/finish error paths still use `std::endl`, which forces an unnecessary flush on each branch. Please switch the new messages to `'\n'` to stay consistent with the repo logging guidance. As per coding guidelines: "Use OpenMS logging macros and `OpenMS::LogStream`; avoid `std::cout`/`err` directly; avoid `std::endl` for performance (use `\n`)".
Also applies to: 526-555, 563-565
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/QPXFile.cpp` around lines 164 - 185, Replace the use of std::endl in the new Reserve/Finish error logging branches with '\n' to avoid unnecessary flushes; specifically update the error messages that reference charge_builder, pep_builder, is_decoy_builder, calculated_mz_builder, observed_mz_builder, mass_error_ppm_builder, predicted_rt_builder, run_file_name_builder, rt_builder, ion_mobility_builder, and missed_cleavages_builder in QPXFile.cpp (and the similar occurrences around the ranges noted: ~526-555 and 563-565) so each OPENMS_LOG_ERROR << ... ends with '\n' instead of std::endl. Ensure no other behavior changes are made and preserve the existing message text and ToString() usage.src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp (1)
1023-1067: Assert the child type of the new primitive-list fields. These sections only verify that `scan`, `protein_accessions`, `mz_array`, `intensity_array`, `charge_array`, `ion_type_array`, `ion_mobility_array`, `gg_accessions`, and `gg_names` are `LIST`. A regression like `list<int32>` → `list<int64>` or `list<float32>` → `list<float64>` would still pass, which weakens coverage for the schema-alignment change.
Example assertions

  TEST_EQUAL(s->field(13)->type()->id(), arrow::Type::LIST)
+ TEST_EQUAL(s->field(13)->type()->Equals(arrow::list(arrow::int32())), true)
...
  TEST_EQUAL(s->field(19)->type()->id(), arrow::Type::LIST)
+ TEST_EQUAL(s->field(19)->type()->Equals(arrow::list(arrow::float32())), true)

As per coding guidelines, "Write unit tests for new functionality and ensure existing tests pass before suggesting changes"
Also applies to: 1191-1194, 1247-1254
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp` around lines 1023 - 1067, The tests only assert that many columns (e.g. "scan", "protein_accessions", "mz_array", "intensity_array", "charge_array", "ion_type_array", "ion_mobility_array", also "gg_accessions" and "gg_names") are LISTs but do not assert the child element type; update the assertions to check the list child type for each field by inspecting s->field(N)->type()->field(0)->type()->id() (or equivalent Arrow API) and compare to the correct arrow::Type (e.g., INT32 for scan/charge_array, FLOAT for mz_array/intensity_array/ion_mobility_array, STRING/UTF8 for protein_accessions/ion_type_array/gg_names, etc.); add these child-type checks adjacent to the existing LIST assertions and apply the same fixes at the other indicated blocks (around the other ranges cited).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/openms/source/FORMAT/QPXFile.cpp`:
- Around line 187-195: The export currently uses id_to_filename (built from
protein_identifications via getPrimaryMSRunPath) and parsed spectrum_reference
to populate reference_file_name/run_file_name and scan, which ignores per-PSM or
per-PeptideIdentification metadata; update the logic in QPXFile.cpp (places
around id_to_filename population and the column rebuild at the section handling
spectrum_reference / scan around the existing spectrum parsing) to first check
and use per-hit or per-PeptideIdentification fields (e.g., inspect PeptideHit
metadata and PeptideIdentification members for run_file_name,
reference_file_name, and scan) and only fall back to
id_to_filename[prot_id.getIdentifier()] or parsing spectrum_reference when those
per-PSM values are absent, ensuring scan numbers and source file names are
resolved at PSM granularity.
---
Nitpick comments:
In `@src/openms/source/FORMAT/QPXFile.cpp`:
- Around line 164-185: Replace the use of std::endl in the new Reserve/Finish
error logging branches with '\n' to avoid unnecessary flushes; specifically
update the error messages that reference charge_builder, pep_builder,
is_decoy_builder, calculated_mz_builder, observed_mz_builder,
mass_error_ppm_builder, predicted_rt_builder, run_file_name_builder, rt_builder,
ion_mobility_builder, and missed_cleavages_builder in QPXFile.cpp (and the
similar occurrences around the ranges noted: ~526-555 and 563-565) so each
OPENMS_LOG_ERROR << ... ends with '\n' instead of std::endl. Ensure no other
behavior changes are made and preserve the existing message text and ToString()
usage.
In `@src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp`:
- Around line 1023-1067: The tests only assert that many columns (e.g. "scan",
"protein_accessions", "mz_array", "intensity_array", "charge_array",
"ion_type_array", "ion_mobility_array", also "gg_accessions" and "gg_names") are
LISTs but do not assert the child element type; update the assertions to check
the list child type for each field by inspecting
s->field(N)->type()->field(0)->type()->id() (or equivalent Arrow API) and
compare to the correct arrow::Type (e.g., INT32 for scan/charge_array, FLOAT for
mz_array/intensity_array/ion_mobility_array, STRING/UTF8 for
protein_accessions/ion_type_array/gg_names, etc.); add these child-type checks
adjacent to the existing LIST assertions and apply the same fixes at the other
indicated blocks (around the other ranges cited).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c4dcc616-c842-4d4f-87e9-b4107cedbd53
📒 Files selected for processing (3)
src/openms/source/FORMAT/QPXFile.cpp
src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp
src/tests/topp/CMakeLists.txt
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openms/source/FORMAT/ConsensusMapArrowExport.cpp (1)
434-456: ⚠️ Potential issue | 🔴 Critical
Guard `findScoreType(...PEP)` here. Unlike `QPXFile::exportToArrow`, Line 437 is unguarded. `IDScoreSwitcherAlgorithm::findScoreType()` throws when no PEP-like score exists, which turns a missing optional score into a full export failure.
Possible fix
  if (has_id)
  {
-   auto result = idsa.findScoreType(pep_ids[0], IDScoreSwitcherAlgorithm::ScoreType::PEP);
+   IDScoreSwitcherAlgorithm::ScoreSearchResult result{};
+   try
+   {
+     result = idsa.findScoreType(pep_ids[0], IDScoreSwitcherAlgorithm::ScoreType::PEP);
+   }
+   catch (...)
+   {
+   }
    if (!result.score_name.empty() && best_hit)
    {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/ConsensusMapArrowExport.cpp` around lines 434 - 456, The call to idsa.findScoreType(pep_ids[0], IDScoreSwitcherAlgorithm::ScoreType::PEP) can throw when no PEP-like score exists; wrap or guard that call in a presence check or try/catch so a missing PEP does not abort export. Specifically, before calling idsa.findScoreType use the same existence check pattern used in QPXFile::exportToArrow (or catch the exception from idsa.findScoreType), and then only access result.score_name/is_main_score_type and use pep_builder.Append, pep_builder.AppendNull, best_hit and pep_ids accordingly when a valid result is returned; otherwise call pep_builder.AppendNull to preserve export.
♻️ Duplicate comments (3)
src/openms/source/FORMAT/ConsensusMapArrowExport.cpp (2)
612-618: ⚠️ Potential issue | 🟠 Major
Emit `pg_positions` from the evidences you already collected. Lines 621-636 preserve accession/start/end for every peptide mapping, but Line 717 still writes `pg_positions = null` for every row. That drops the new positional column entirely and loses repeated mappings that `pg_accessions` now preserves.
Possible fix
- // === pg_positions (empty for now — no positional data available at consensus level) ===
- (void)pg_positions_builder.AppendNull();
+ (void)pg_positions_builder.Append();
+ for (const auto& ei : evidences)
+ {
+   if (ei.start == PeptideEvidence::UNKNOWN_POSITION || ei.end == PeptideEvidence::UNKNOWN_POSITION)
+   {
+     continue;
+   }
+   (void)pgp_struct_b->Append();
+   (void)pgp_acc_b->Append(ei.acc);
+   (void)pgp_start_b->Append(ei.start);
+   (void)pgp_end_b->Append(ei.end);
+ }

Also applies to: 640-677, 716-717
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/ConsensusMapArrowExport.cpp` around lines 612 - 618, The code collects per-peptide mapping info into protein_accs and evidences (struct EvidenceInfo) but still emits pg_positions as null; update the export logic that writes pg_positions to build and write a positions string from evidences (one entry per protein_accs item, preserving repeats and ordering) using each EvidenceInfo's start/end (and optionally pre/post if expected), matching the pg_accessions output format (i.e., same delimiter and ordering as protein_accs) so repeated mappings are represented; locate the writer that currently sets pg_positions = null and replace it with code that iterates evidences to produce the positions field.
609-610: ⚠️ Potential issue | 🟠 Major
Populate `id_run_file_name` instead of hard-coding nulls. Line 610 makes this new column permanently null, even when the best hit or `PeptideIdentification` already carries `reference_file_name`/`run_file_name`. That drops identification-run provenance for multi-run consensus exports.
Possible fix
- // === id_run_file_name ===
- (void)id_run_file_name_builder.AppendNull();
+ // === id_run_file_name ===
+ String id_run_file_name;
+ if (best_hit && best_hit->metaValueExists("reference_file_name"))
+ {
+   id_run_file_name = best_hit->getMetaValue("reference_file_name").toString();
+ }
+ else if (best_hit && best_hit->metaValueExists("run_file_name"))
+ {
+   id_run_file_name = best_hit->getMetaValue("run_file_name").toString();
+ }
+ else if (!pep_ids.empty() && pep_ids[0].metaValueExists("reference_file_name"))
+ {
+   id_run_file_name = pep_ids[0].getMetaValue("reference_file_name").toString();
+ }
+ else if (!pep_ids.empty() && pep_ids[0].metaValueExists("run_file_name"))
+ {
+   id_run_file_name = pep_ids[0].getMetaValue("run_file_name").toString();
+ }
+ if (id_run_file_name.empty())
+ {
+   (void)id_run_file_name_builder.AppendNull();
+ }
+ else
+ {
+   (void)id_run_file_name_builder.Append(id_run_file_name);
+ }
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/ConsensusMapArrowExport.cpp` around lines 609 - 610, The id_run_file_name column is always set to null by calling id_run_file_name_builder.AppendNull(); instead populate it from the identification provenance: extract run file name from the PeptideIdentification (meta keys "reference_file_name" or "run_file_name") or from the best PeptideHit if present, then call id_run_file_name_builder.Append(<that string>) instead of AppendNull when a value exists; keep AppendNull only as the fallback. Update the logic near ConsensusMapArrowExport.cpp where id_run_file_name_builder is used so the builder appends the actual run name taken from the PeptideIdentification/PeptideHit metadata (use the existing best-hit retrieval code paths to find the value).src/openms/source/FORMAT/QPXFile.cpp (1)
431-451: ⚠️ Potential issue | 🟠 Major
Only use the identifier-level run fallback when it is unambiguous. The new fallback at Lines 447-450 still goes through `id_to_filename`, which currently keeps only `ms_runs[0]`. When one `ProteinIdentification` covers multiple primary MS runs, unresolved PSMs get stamped with the first file, so `run_file_name` becomes wrong for merged search results.
Possible fix
- std::map<String, String> id_to_filename;
+ std::map<String, StringList> id_to_filenames;
  for (const auto& prot_id : protein_identifications)
  {
    StringList ms_runs;
    prot_id.getPrimaryMSRunPath(ms_runs);
    if (!ms_runs.empty())
    {
-     id_to_filename[prot_id.getIdentifier()] = ms_runs[0];
+     id_to_filenames[prot_id.getIdentifier()] = ms_runs;
    }
  }
...
- auto fn_it = id_to_filename.find(pep_id.getIdentifier());
+ auto fn_it = id_to_filenames.find(pep_id.getIdentifier());
...
- if (run_file.empty() && fn_it != id_to_filename.end())
+ if (run_file.empty() && fn_it != id_to_filenames.end() && fn_it->second.size() == 1)
  {
-   run_file = fn_it->second;
+   run_file = fn_it->second[0];
  }
Verify each finding against the current code and only fix it if needed. In `@src/openms/source/FORMAT/QPXFile.cpp` around lines 431 - 451, The fallback to identifier-level filenames can assign the wrong run when an identifier covers multiple MS runs; modify the block that uses id_to_filename/fn_it so you only append fn_it->second when that mapping is unambiguous: e.g., check that fn_it != id_to_filename.end() AND that the identifier maps to a single filename (or id_to_filename contains exactly one entry for this identifier / a precomputed unambiguous set) before using run_file = fn_it->second; update the logic around run_file, fn_it, id_to_filename and run_file_name_builder in the same scope as pep_id and hit to avoid defaulting to the first ms_runs entry when multiple runs exist.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/openms/source/FORMAT/ConsensusMapArrowExport.cpp`:
- Around line 434-456: The call to idsa.findScoreType(pep_ids[0],
IDScoreSwitcherAlgorithm::ScoreType::PEP) can throw when no PEP-like score
exists; wrap or guard that call in a presence check or try/catch so a missing
PEP does not abort export. Specifically, before calling idsa.findScoreType use
the same existence check pattern used in QPXFile::exportToArrow (or catch the
exception from idsa.findScoreType), and then only access
result.score_name/is_main_score_type and use pep_builder.Append,
pep_builder.AppendNull, best_hit and pep_ids accordingly when a valid result is
returned; otherwise call pep_builder.AppendNull to preserve export.
---
Duplicate comments:
In `@src/openms/source/FORMAT/ConsensusMapArrowExport.cpp`:
- Around line 612-618: The code collects per-peptide mapping info into
protein_accs and evidences (struct EvidenceInfo) but still emits pg_positions as
null; update the export logic that writes pg_positions to build and write a
positions string from evidences (one entry per protein_accs item, preserving
repeats and ordering) using each EvidenceInfo's start/end (and optionally
pre/post if expected), matching the pg_accessions output format (i.e., same
delimiter and ordering as protein_accs) so repeated mappings are represented;
locate the writer that currently sets pg_positions = null and replace it with
code that iterates evidences to produce the positions field.
- Around line 609-610: The id_run_file_name column is always set to null by
calling id_run_file_name_builder.AppendNull(); instead populate it from the
identification provenance: extract run file name from the PeptideIdentification
(meta keys "reference_file_name" or "run_file_name") or from the best PeptideHit
if present, then call id_run_file_name_builder.Append(<that string>) instead of
AppendNull when a value exists; keep AppendNull only as the fallback. Update the
logic near ConsensusMapArrowExport.cpp where id_run_file_name_builder is used so
the builder appends the actual run name taken from the
PeptideIdentification/PeptideHit metadata (use the existing best-hit retrieval
code paths to find the value).
In `@src/openms/source/FORMAT/QPXFile.cpp`:
- Around line 431-451: The fallback to identifier-level filenames can assign the
wrong run when an identifier covers multiple MS runs; modify the block that uses
id_to_filename/fn_it so you only append fn_it->second when that mapping is
unambiguous: e.g., check that fn_it != id_to_filename.end() AND that the
identifier maps to a single filename (or id_to_filename contains exactly one
entry for this identifier / a precomputed unambiguous set) before using run_file
= fn_it->second; update the logic around run_file, fn_it, id_to_filename and
run_file_name_builder in the same scope as pep_id and hit to avoid defaulting to
the first ms_runs entry when multiple runs exist.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d36fbe18-3460-40fe-b266-6606c0522728
📒 Files selected for processing (4)
src/openms/source/FORMAT/ArrowSchemaRegistry.cpp
src/openms/source/FORMAT/ConsensusMapArrowExport.cpp
src/openms/source/FORMAT/QPXFile.cpp
src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/topp/ProteinQuantifier.cpp (1)
996-1009: Extract the shared protein-annotation setup into one helper. This block duplicates the mzTab export preparation above. Pulling the annotate/insert/swap sequence into a shared helper will keep the QPX and mzTab paths from drifting the next time `inference_in_cxml` handling changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/topp/ProteinQuantifier.cpp` around lines 996 - 1009, Extract the duplicated annotate/insert/swap sequence into a single helper (e.g., annotateAndAttachProteins) and call it from both places: move the logic that obtains protein_quants via quantifier.getProteinResults(), calls quantifier.annotateQuantificationsToProteins(protein_quants, proteins_), and then either inserts proteins_ into consensus.getProteinIdentifications() or swaps with consensus.getProteinIdentifications()[0] depending on inference_in_cxml into that helper; the helper should accept the quantifier (or protein_quants), proteins_ (by reference), consensus (or a reference to consensus.getProteinIdentifications()), and the inference_in_cxml flag so callers simply invoke annotateAndAttachProteins(...) instead of duplicating the block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5f223088-85b1-4cc3-a280-8c0fb4c4b9c0
📒 Files selected for processing (1)
src/topp/ProteinQuantifier.cpp
```cpp
if (!ConsensusMapArrowExport::exportToParquet(consensus, out_qpx + "/quantms.feature.parquet"))
{
  OPENMS_LOG_ERROR << "Failed to write features Parquet file" << std::endl;
}
```
Fail fast when a requested Parquet export cannot be written.
Lines 1013, 1031, and 1037 only log and continue, so the tool can still return EXECUTION_OK with a partial quantms.*.parquet set. Please throw on the first failed export instead of continuing. As per coding guidelines, exceptions must derive from Exception::Base; throw with file/line/OPENMS_PRETTY_FUNCTION.
Also applies to: 1031-1034, 1037-1040
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/topp/ProteinQuantifier.cpp` around lines 1013 - 1016, Replace the silent
error logs after each failed Parquet export with a thrown OpenMS exception
derived from Exception::Base: when ConsensusMapArrowExport::exportToParquet(...)
returns false (and the analogous checks at the other two locations), throw an
appropriate Exception (e.g., Exception::UnableToCreateFile or other
Exception::Base-derived) constructed with a clear message, __FILE__, __LINE__,
and OPENMS_PRETTY_FUNCTION; remove the OPENMS_LOG_ERROR-only path so the program
fails fast on the first failed export.
```diff
 if (out.empty() && peptide_out.empty() && out_qpx.empty())
 {
   throw Exception::RequiredParameterNotGiven(__FILE__, __LINE__,
                                              OPENMS_PRETTY_FUNCTION,
-                                             "out/peptide_out");
+                                             "out/peptide_out/out_qpx");
```
Don't advertise out_qpx in builds that don't compile Parquet support.
In non-Parquet builds, Line 1066 still tells users to provide out_qpx, but that option is not registered there. Please keep the exception text aligned with the compiled feature set.
Suggested fix

```diff
+#ifdef WITH_PARQUET
+  const char* required_outputs = "out/peptide_out/out_qpx";
+#else
+  const char* required_outputs = "out/peptide_out";
+#endif
   if (out.empty() && peptide_out.empty() && out_qpx.empty())
   {
     throw Exception::RequiredParameterNotGiven(__FILE__, __LINE__,
                                                OPENMS_PRETTY_FUNCTION,
-                                               "out/peptide_out/out_qpx");
+                                               required_outputs);
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/topp/ProteinQuantifier.cpp` around lines 1062 - 1066, The thrown
Exception::RequiredParameterNotGiven in the conditional that checks
out/peptide_out/out_qpx currently mentions "out_qpx" even in builds without
Parquet support; change the exception message so "out_qpx" is only included when
the same Parquet compile-time macro used to register the out_qpx option is
defined (i.e., mirror the macro used elsewhere where out_qpx is conditionally
registered), by building the message string conditionally (via `#ifdef` the
Parquet macro or by checking the same compile-time flag) before calling
Exception::RequiredParameterNotGiven and otherwise only mentioning
"out/peptide_out".
🧹 Nitpick comments (1)
src/openms/include/OpenMS/FORMAT/ArrowSchemaRegistry.h (1)
204-208: Add `@ingroup` to the new public schema docs. Both new exported structs have `@brief`, but they still need the module `@ingroup` tag so they get attached to the generated FORMAT/API docs. As per coding guidelines, `src/openms/include/**/*.h`: "Doxygen documentation with `@brief` required (not auto-generated from first line); use `@defgroup`/`@ingroup`."

Also applies to: 247-251
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openms/include/OpenMS/FORMAT/ArrowSchemaRegistry.h` around lines 204 - 208, The Doxygen for the new exported struct QPXPSMSchema (and the nearby second exported struct defined just below it) is missing an `@ingroup` tag so they don't appear in the generated FORMAT/API docs; update the Doxygen block for QPXPSMSchema and the adjacent exported struct to include the appropriate module group (e.g., add an "@ingroup FORMAT" line) so both structs are attached to the correct doxygen group per the project's documentation guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8092531c-e3f4-4553-8aa3-c952db116935
📒 Files selected for processing (1)
src/openms/include/OpenMS/FORMAT/ArrowSchemaRegistry.h
Update the QPX feature, PSM, and protein group parquet exports to match the quantms parquet exchange format specification. This ensures interoperability with downstream quantms tools and viewers.

Schema changes (all 3 files):
- Rename ConsensusFeatureExportSchema -> QPXFeatureSchema
- Add new QPXPSMSchema (keep PSMSchema for import path)
- Column renames: precursor_charge->charge, reference_file_name->run_file_name, start/stop_ion_mobility->ion_mobility_start/stop
- Type fixes: is_decoy/unique int->bool, charge int32->int16, scan scalar->list<int32>, cv_params string->list<struct>, modifications position string->int32 with amino_acid+scores, intensities field names, pg_accessions list<string>->list<struct>
- New columns: mass_error_ppm, missed_cleavages, pg_positions, id_run_file_name, cross_links, peak arrays (mz/intensity/charge/ion_type/ion_mobility arrays)
- Removed columns: quality, score, score_type, spectrum_reference, feature_metavalues, scan_reference_file_name, rank, peptide_identification_index, psm/spectrum_metavalues, run_identifier, higher_score_better
- Correct nullability on all fields and struct children
- Rename output files: features.parquet->quantms.feature.parquet, psms.parquet->quantms.psm.parquet, protein_groups.parquet->quantms.pg.parquet

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix CMakeLists.txt TOPP test references to use new filenames (quantms.feature/psm/pg.parquet instead of old names)
- Add QPXPSMSchema unit tests to ArrowSchemaRegistry_test.cpp (field count, column names, types, nullability, validation)
- Use actual finalized array length for MakeArrayOfNull in QPXFile.cpp instead of fragile pre-loop estimate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Emit all PeptideEvidences in pg_accessions (not just first per accession), preserving repeated accessions at different positions
- Use PeptideIdentification::getSpectrumReference() before falling back to metavalue "spectrum_reference" for scan extraction
- Prefer per-hit/per-PeptideIdentification file reference for run_file_name in QPXFile, fall back to identifier-level path
- Deduplicate shared Arrow type builders: QPXFeatureSchema delegates modificationsType/additionalScoresType/cvParamsType to QPXPSMSchema
- Add scan list element type assertion (int32) in schema tests
- Fix CMakeLists.txt QPX test filenames to use quantms.* naming

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add out_qpx parameter to ProteinQuantifier for exporting QPX parquet files (quantms.feature.parquet, quantms.psm.parquet, quantms.pg.parquet). Only supported for consensusXML input; warns and skips for featureXML and idXML inputs. Reuses existing ConsensusMapArrowExport, QPXFile, and ProteinGroupArrowExport classes. Protein quantifications are annotated to protein groups before export (sharing the same preparation as the mzTab export path). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Throw InvalidParameter instead of silently returning success when out_qpx is the only output and input is featureXML/idXML. When other outputs (out/peptide_out) are also set, still warn and skip QPX. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add brief descriptions to struct declarations and all static type helper methods to satisfy docstring coverage requirements. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
QPXFile::exportToArrow now restores PSMSchema output for lossless round-trips used by FeatureMapArrowIO and ConsensusMapArrowIO. New exportPSMsToQPXArrow produces QPXPSMSchema for exchange format. exportToParquet uses the QPX-specific method. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… names Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 696c43a to 11ddd49
- Update QPX output filenames to quantms.* scheme (#8974)
- Add GenericWrapper removal (BREAKING) (#8981)
- Add ModifiedSincSmoother new algorithm (#8217)
- Add experimental BrukerTimsFile/BRUKER_TDF format support (#8975)

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
- Writes `quantms.feature.parquet`, `quantms.psm.parquet`, `quantms.pg.parquet`
- Rename `ConsensusFeatureExportSchema` to `QPXFeatureSchema` and add new `QPXPSMSchema` (existing `PSMSchema` kept unchanged for the import path)

Changes
Schema & naming
| Before | After |
| --- | --- |
| `precursor_charge` (int32) | `charge` (int16) |
| `reference_file_name` | `run_file_name` |
| `start_ion_mobility`/`stop_ion_mobility` | `ion_mobility_start`/`ion_mobility_stop` |
| `is_decoy` (int32) | `is_decoy` (bool) |
| `unique` (int32) | `unique` (bool) |
| `scan` (string/int32) | `scan` (list<int32>) |
| `cv_params` (string) | `cv_params` (list<struct>) |
| `intensities` (sample_accession/channel) | `intensities` (label/intensity) |
| `pg_accessions` (list<string>) | `pg_accessions` (list<struct>) |

New columns
- `mass_error_ppm`, `missed_cleavages`, `pg_positions`, `id_run_file_name`
- `cross_links`, `mz_array`, `intensity_array`, `charge_array`, `ion_type_array`, `ion_mobility_array`

Removed columns
- `quality`, `score`, `score_type`, `spectrum_reference`, `feature_metavalues`, `scan_reference_file_name`
- `score`, `score_type`, `higher_score_better`, `rank`, `peptide_identification_index`, `psm_metavalues`, `spectrum_metavalues`, `run_identifier`, `spectrum_reference`

Protein groups
- Nullability corrected (`not null` on label, intensity, score_name, score_value, etc.)

Files modified (9)
- `ArrowSchemaRegistry.h/.cpp` — new QPXFeatureSchema + QPXPSMSchema
- `ConsensusMapArrowExport.cpp` — rewritten feature export
- `QPXFile.cpp` — rewritten PSM export
- `ProteinGroupArrowExport.cpp` — nullability fixes
- `ProteomicsLFQ.cpp`/`IsobaricWorkflow.cpp` — filename updates
- `ArrowSchemaRegistry_test.cpp`/`QPXFile_test.cpp` — updated tests

Test plan
- `ArrowSchemaRegistry_test` passes (schema validation)
- `QPXFile_test` passes (PSM export round-trip)
- `TOPP_ProteomicsLFQ_qpx` passes (end-to-end QPX generation)
- `TOPP_ProteomicsLFQ_qpx_features` passes (feature schema check)
- `TOPP_ProteomicsLFQ_qpx_psms` passes (PSM schema check)
- `TOPP_ProteomicsLFQ_qpx_pg` passes (PG schema check)

🤖 Generated with Claude Code
Summary by CodeRabbit