Make Arrow/Parquet a required dependency #8991

Merged
timosachsenberg merged 4 commits into develop from feature/make-arrow-parquet-required
Mar 25, 2026

Conversation

Contributor

@timosachsenberg timosachsenberg commented Mar 25, 2026

Summary

  • Remove the WITH_PARQUET CMake option — Arrow/Parquet is now a required dependency (was already ON by default)
  • Remove all #ifdef WITH_PARQUET / #ifndef WITH_PARQUET preprocessor guards and their fallback code paths from 86 files
  • Remove target_compile_definitions(... WITH_PARQUET=1) from library and test CMake files
  • Remove -DWITH_PARQUET=ON from CI workflows (no longer needed)
  • Simplify pyOpenMS CMakeLists.txt to always build _arrow_zerocopy module
  • Clean up doxygen @cond WITH_PARQUET guards and doc placeholder generation

Test plan

  • Full CI matrix passes (Linux, macOS, Windows)
  • pyOpenMS wheel builds succeed
  • All Parquet/Arrow class tests pass
  • All TOPP tool tests (ParquetConverter, QPXConverter, OpenSwathWorkflow XIC, TargetedFileConverter parquet roundtrip) pass
  • pyOpenMS _arrow_zerocopy module loads and tests pass

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Build Changes
    • Arrow/Parquet are now required and enabled by default (no extra build flag).
  • New Features
    • Parquet import/export, converters and IO are available by default across tools, command‑line utilities and Python bindings.
  • Documentation
    • Docs and Python module text updated to reflect unconditional Arrow/Parquet availability.
  • Tests
    • Parquet-related tests and test registrations run unconditionally.

Arrow/Parquet is now always built — no CMake option needed. This removes
the WITH_PARQUET option, all #ifdef/#ifndef WITH_PARQUET preprocessor
guards, compile definitions, CMake conditionals, and CI flag overrides
across 86 files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Removed the WITH_PARQUET build option and all conditional compilation around Arrow/Parquet; Arrow and Parquet discovery, targets, headers, sources, tests, TOPP tools, and pyOpenMS bindings are now required and built/registered unconditionally.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| CMake build & discovery | CMakeLists.txt, cmake/OpenMSConfig.cmake.in, cmake/cmake_findExternalLibs.cmake, cmake/package_general.cmake | Deleted WITH_PARQUET option and wrappers; Arrow/Parquet find_package and target selection now run unconditionally. |
| CI & capture scripts | .github/workflows/openms_ci_matrix_full.yml, .github/workflows/pyopenms-wheels-cibuildwheel.yml, tools/ci/capture-env.sh | Removed -DWITH_PARQUET=ON flags and removed WITH_PARQUET from env capture; CI no longer toggles Parquet via these envs. |
| Source lists & targets | src/openms/CMakeLists.txt, src/openms/include/OpenMS/FORMAT/sources.cmake, src/openms/source/FORMAT/sources.cmake, src/openms/source/FORMAT/DATAACCESS/sources.cmake, src/tests/class_tests/openms/CMakeLists.txt, src/tests/class_tests/openms/executables.cmake, src/tests/topp/CMakeLists.txt, src/topp/executables.cmake | Moved many Arrow/Parquet sources, tests and target linkages from conditional inclusion to unconditional; removed propagation of WITH_PARQUET compile definitions. |
| Public headers — Arrow/Parquet I/O | src/openms/include/OpenMS/FORMAT/... (ArrowSchemaRegistry.h, ParquetFile.h, QPXFile.h, ZipRandomAccessFile.h, ConsensusMapArrowExport.h, ConsensusMapArrowIO.h, FeatureMapArrowIO.h, MSExperimentArrowExport.h, ProteinGroupArrowExport.h, ProteinIdentificationArrowIO.h, etc.) | Removed #ifdef WITH_PARQUET guards so declarations and Arrow types are always visible to translation units. |
| Sources — Arrow/Parquet implementations | src/openms/source/... (ParquetFile.cpp, ArrowSchemaRegistry.cpp, ConsensusMapArrowExport.cpp, ConsensusMapArrowIO.cpp, FeatureMapArrowIO.cpp, MSExperimentArrowExport.cpp, ProteinGroupArrowExport.cpp, ProteinIdentificationArrowIO.cpp, QPXFile.cpp, ZipRandomAccessFile.cpp, XICParquetFile.cpp, XIMParquetFile.cpp, ParquetFilter.cpp, etc.) | Removed compile-time guards; implementations and Arrow/Parquet includes now always compiled. |
| Parquet consumers & OpenSwath | src/openms/source/FORMAT/DATAACCESS/MSChromatogramParquetConsumer.cpp, src/openms/source/FORMAT/DATAACCESS/MobilogramParquetConsumer.cpp, src/openms/source/ANALYSIS/OPENSWATH/* | Consumers, readers and writers now always include and execute Parquet logic; previous NotImplemented/no-op fallbacks removed. |
| TOPP tools & integrations | src/topp/* (IsobaricWorkflow.cpp, OpenSwathAssayGenerator.cpp, OpenSwathDecoyGenerator.cpp, OpenSwathWorkflow.cpp, ProteinQuantifier.cpp, ProteomicsLFQ.cpp, TargetedFileConverter.cpp, TextExporter.cpp, ParquetConverter.cpp, QPXConverter.cpp) | Parquet/QPX inputs, outputs and export paths are now registered and compiled unconditionally; Doxygen conditionals removed. |
| pyOpenMS build, bindings & docs | src/pyOpenMS/CMakeLists.txt, src/pyOpenMS/bindings/* (arrow_zerocopy.cpp, bind_format.cpp), src/pyOpenMS/pxds/*, src/pyOpenMS/pyopenms/*, src/pyOpenMS/CLAUDE.md, src/pyOpenMS/README.md | Unconditionally build _arrow_zerocopy, always register/link Arrow target, and always expose Parquet-related Python bindings; docs/comments updated to remove WITH_PARQUET wording. |
| Tests | src/tests/... and src/pyOpenMS/tests/... (many listed in summary) | Removed WITH_PARQUET guards from unit/integration tests; Parquet-related tests and test registrations are now compiled and executed unconditionally; some skip logic simplified/removed. |
| Docs & agent guidance | AGENTS.md, doc/CMakeLists.txt, doc/doxygen/public/TOPP.doxygen | Marked Arrow/Parquet as required/always enabled; removed placeholder doc-generation for missing Parquet and removed Doxygen conditional blocks. |

Sequence Diagram(s)

sequenceDiagram
    participant Dev as CI/Developer
    participant CMake
    participant Finder as "find_package(Arrow/Parquet)"
    participant Compiler
    participant Runtime as "TOPP / pyOpenMS"

    Dev->>CMake: configure (no WITH_PARQUET)
    CMake->>Finder: discover Arrow & Parquet (required)
    Finder-->>CMake: provide OPENMS_ARROW_TARGET / OPENMS_PARQUET_TARGET
    CMake->>Compiler: add Arrow/Parquet targets, sources, modules
    Compiler-->>Runtime: build binaries & Python modules with Arrow/Parquet
    Runtime->>Finder: call Arrow/Parquet APIs (I/O, consumers, bindings)

Suggested reviewers

  • jpfeuffer
  • pjones
  • poshul

Poem

🐰 In my burrow I code and play,
Guards hopped off and ran away.
Arrow pipes now hum and sing,
Parquet doors forever spring. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the main objective: making Arrow/Parquet a required dependency instead of optional, removing all conditional compilation and build flags. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/topp/OpenSwathWorkflow.cpp (2)

1269-1276: ⚠️ Potential issue | 🟠 Major

Preserve the parent directory when prefixing multi-run -out_chrom files.

This rewrite prefixes the whole path, not just the filename. results/chrom.xic becomes sample_results/chrom.xic, and /tmp/chrom.xic becomes sample_/tmp/chrom.xic, so multi-run chromatogram output can land in the wrong location or fail. Please mirror the File::path / File::basename split already used for out_mobilogram.

Suggested fix
     String out_chrom_current = out_chrom;
     if (!out_chrom.empty() && run_groups.size() > 1)
     {
-      // For multi-run, use basename prefix to make unique filenames
-      String base_name = out_chrom.substr(0, out_chrom.find_last_of('.'));
-      String extension = out_chrom.substr(out_chrom.find_last_of('.'));
-      out_chrom_current = file_basename + "_" + base_name + extension;
+      // Preserve parent directory when creating per-run filenames.
+      String parent = File::path(out_chrom);
+      String filename = File::basename(out_chrom);
+      String stem = filename.substr(0, filename.find_last_of('.'));
+      String extension = filename.substr(filename.find_last_of('.'));
+      String fname_with_prefix = file_basename + "_" + stem + extension;
+      out_chrom_current = (parent == "." ? fname_with_prefix : parent + "/" + fname_with_prefix);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/topp/OpenSwathWorkflow.cpp` around lines 1269 - 1276, The current
multi-run prefixing logic incorrectly prefixes the entire out_chrom path; update
the code that sets out_chrom_current (using out_chrom, file_basename) to split
out_chrom into directory path and basename (mirror the approach used for
out_mobilogram/File::path and File::basename), then prefix only the basename
with file_basename and rejoin with the original directory so the parent
directory is preserved; locate the logic around out_chrom_current/out_chrom and
replace the basename/ext manipulation with a path+basename split, prefix
basename with file_basename + "_" and append extension, then combine path +
prefixed_basename to form out_chrom_current.
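
To make the failure mode concrete, here is a minimal Python sketch of the two behaviours. It is an illustration only, not OpenMS code: the function names `prefix_whole_path` and `prefix_output_filename` are hypothetical, and POSIX-style paths are assumed to match the examples in the comment above.

```python
from pathlib import PurePosixPath


def prefix_whole_path(out_path: str, run_prefix: str) -> str:
    """Reproduce the reported bug: the prefix lands in front of the directory."""
    stem, dot, ext = out_path.rpartition(".")
    return f"{run_prefix}_{stem}{dot}{ext}"


def prefix_output_filename(out_path: str, run_prefix: str) -> str:
    """Prefix only the final path component and keep the parent directory."""
    path = PurePosixPath(out_path)
    # with_name() replaces the last component and leaves the parent untouched.
    return str(path.with_name(f"{run_prefix}_{path.name}"))


if __name__ == "__main__":
    for example in ("results/chrom.xic", "/tmp/chrom.xic", "chrom.xic"):
        print(example, "->", prefix_whole_path(example, "sample"),
              "vs", prefix_output_filename(example, "sample"))
```

The second function corresponds to the File::path / File::basename split proposed in the suggested fix: the directory is carried through unchanged and only the filename receives the per-run prefix.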

980-993: ⚠️ Potential issue | 🔴 Critical

-append_oswpq currently drops existing archive contents.

For .oswpq outputs, this path always writes into a fresh temp directory and then deletes/rebuilds the target archive. Nothing hydrates parquet_dir from the existing .oswpq, so a second invocation with -append_oswpq overwrites prior runs instead of appending them. Please seed the temp directory from the existing archive before write(), and only replace the final archive after the new package is assembled successfully.

Also applies to: 1346-1367

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/topp/OpenSwathWorkflow.cpp` around lines 980 - 993, The append flow for
.oswpq currently always creates an empty parquet_dir and rebuilds the archive,
losing prior runs; update the logic around parquet_temp_dir/parquet_dir (the
blocks using getFlag_("append_oswpq"), parquet_temp_dir, parquet_dir and calling
OpenSwathOSWParquetWriter::write()) to, when append_oswpq is set and the target
.oswpq exists, first extract/unpack the existing .oswpq into parquet_dir so
existing run data is present before calling parquet_writer.write(), and only
replace the final .oswpq after the new write succeeds (create the new archive in
a temp file and atomically rename/replace the original). Apply the same change
to the other identical block (around the later code referenced at the second
instance) so both code paths hydrate the temp dir from the existing archive and
only overwrite the archive upon successful assembly.
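
The hydrate-then-replace flow requested above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the OpenSwathWorkflow implementation: it assumes the .oswpq container is a plain zip archive, and `append_to_archive` and its `new_files` parameter are hypothetical names.

```python
import os
import tempfile
import zipfile


def append_to_archive(archive_path: str, new_files: dict[str, bytes]) -> None:
    """Add new_files to a zip archive without dropping its existing members."""
    archive_dir = os.path.dirname(os.path.abspath(archive_path))
    with tempfile.TemporaryDirectory() as staging:
        # 1. Hydrate the staging directory from the existing archive, if any.
        if os.path.exists(archive_path):
            with zipfile.ZipFile(archive_path) as existing:
                existing.extractall(staging)
        # 2. Write the new run's files next to the existing ones.
        for name, payload in new_files.items():
            target = os.path.join(staging, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as handle:
                handle.write(payload)
        # 3. Assemble the new archive in a temp file beside the target, so the
        #    final rename stays on one filesystem.
        fd, tmp_archive = tempfile.mkstemp(suffix=".tmp", dir=archive_dir)
        os.close(fd)
        try:
            with zipfile.ZipFile(tmp_archive, "w") as out:
                for root, _dirs, files in os.walk(staging):
                    for fname in files:
                        full = os.path.join(root, fname)
                        out.write(full, os.path.relpath(full, staging))
            # 4. Replace the original only after assembly succeeded.
            os.replace(tmp_archive, archive_path)
        except BaseException:
            os.unlink(tmp_archive)
            raise
```

If assembly fails at any point, the original archive is left untouched, which is the property the comment asks for.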
src/pyOpenMS/tests/unittests/test_XIMParquetFile.py (1)

30-38: ⚠️ Potential issue | 🟠 Major

Stop skipping the required parquet bindings.

After this PR, missing XIMParquetFile or a "Parquet support" constructor failure is a regression, not a supported configuration. Keeping these pytest.skip() paths can let a broken pyOpenMS build pass this suite as skipped instead of failed.

Suggested fix
 def _get_xim():
     import pyopenms as poms

-    # Check if XIMParquetFile class exists
-    if not hasattr(poms, "XIMParquetFile"):
-        pytest.skip("pyopenms built without parquet support")
-
-    try:
-        return poms.XIMParquetFile(_xim_path())
-    except RuntimeError as e:
-        if "Parquet support" in str(e):
-            pytest.skip("pyopenms built without parquet support")
-        raise
+    if not hasattr(poms, "XIMParquetFile"):
+        pytest.fail("XIMParquetFile must be available when Arrow/Parquet is required")
+    return poms.XIMParquetFile(_xim_path())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyOpenMS/tests/unittests/test_XIMParquetFile.py` around lines 30 - 38,
The test currently skips when the XIMParquetFile class is missing or when its
constructor raises a RuntimeError containing "Parquet support", which hides
regressions; remove the pytest.skip paths and instead assert
presence/constructability: replace the hasattr(poms, "XIMParquetFile") check and
the try/except around poms.XIMParquetFile(_xim_path()) with an explicit
assertion that XIMParquetFile exists and that constructing it does not raise (or
let any exception bubble so the test fails), referencing XIMParquetFile and the
_xim_path() call to locate the code to change.
src/pyOpenMS/tests/unittests/test_XICParquetFile.py (1)

30-38: ⚠️ Potential issue | 🟠 Major

Stop skipping the required parquet bindings.

This helper still treats missing XICParquetFile or a "Parquet support" constructor error as skippable. With Arrow/Parquet now mandatory, that will hide a broken pyOpenMS build instead of failing the regression test.

Suggested fix
 def _get_xic():
     import pyopenms as poms

-    # Check if XICParquetFile class exists
-    if not hasattr(poms, 'XICParquetFile'):
-        pytest.skip("pyopenms built without parquet support")
-
-    try:
-        return poms.XICParquetFile(_xic_path())
-    except RuntimeError as e:
-        if "Parquet support" in str(e):
-            pytest.skip("pyopenms built without parquet support")
-        raise
+    if not hasattr(poms, "XICParquetFile"):
+        pytest.fail("XICParquetFile must be available when Arrow/Parquet is required")
+    return poms.XICParquetFile(_xic_path())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyOpenMS/tests/unittests/test_XICParquetFile.py` around lines 30 - 38,
Remove the logic that treats missing parquet bindings as skippable: do not call
pytest.skip when poms lacks XICParquetFile or when constructing
poms.XICParquetFile(_xic_path()) raises a RuntimeError mentioning "Parquet
support"; instead assert or call pytest.fail so the test fails loudly. Locate
the helper that references poms and XICParquetFile (the hasattr check and the
try/except around poms.XICParquetFile(_xic_path())) and replace the skip
branches with explicit failures (e.g., assert hasattr(poms, 'XICParquetFile') or
pytest.fail with a clear message, and re-raise or pytest.fail on constructor
RuntimeError) so missing Arrow/Parquet bindings cause test failures rather than
skipped tests.
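
Both helpers (`_get_xim` and `_get_xic`) follow the same pattern, so the fail-loudly behaviour could be factored into one small function. The sketch below is hypothetical: `require_binding` is not part of pyOpenMS or pytest, and a `SimpleNamespace` stands in for the `pyopenms` module so the example runs without it. Raising `AssertionError` is reported by pytest as a failure rather than a skip.

```python
from types import SimpleNamespace


def require_binding(module, class_name: str):
    """Return module.<class_name>, failing loudly if the binding is absent."""
    binding = getattr(module, class_name, None)
    if binding is None:
        raise AssertionError(
            f"{class_name} must be available when Arrow/Parquet is required"
        )
    return binding


if __name__ == "__main__":
    # Stand-in for a build that exposes only one of the two classes.
    fake_poms = SimpleNamespace(XICParquetFile=object)
    print(require_binding(fake_poms, "XICParquetFile"))
    try:
        require_binding(fake_poms, "XIMParquetFile")
    except AssertionError as err:
        print("failed as intended:", err)
```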
🧹 Nitpick comments (2)
src/pyOpenMS/pyopenms/addons/msexperiment.py (1)

157-167: Updated warning message is appropriate.

The removal of the -DWITH_PARQUET=ON rebuild suggestion is correct since this flag no longer exists. However, now that Arrow/Parquet is a required dependency, this import failure path should be rare (indicating a broken/incomplete installation). Consider whether the warning should reflect that this is an unexpected state rather than a normal fallback scenario.

💡 Optional: Clarify that this is an unexpected state
     try:
         from pyopenms._arrow_zerocopy import spectra_to_arrow, chromatograms_to_arrow
         _use_zerocopy = True
     except ImportError:
         _use_zerocopy = False
         warnings.warn(
-            "pyopenms._arrow_zerocopy not available — falling back to slow Python "
-            "Arrow export. This module requires Arrow/Parquet for 4-14x faster export.",
+            "pyopenms._arrow_zerocopy not available — falling back to slower Python "
+            "Arrow export. This is unexpected as Arrow/Parquet is a required dependency. "
+            "The installation may be incomplete.",
             stacklevel=2,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyOpenMS/pyopenms/addons/msexperiment.py` around lines 157 - 167, The
ImportError fallback treating missing pyopenms._arrow_zerocopy as a normal path
should be updated to reflect that Arrow/Parquet are required and the import
failure indicates a broken/incomplete installation; in the except ImportError
block (symbols: pyopenms._arrow_zerocopy, spectra_to_arrow,
chromatograms_to_arrow, _use_zerocopy) set _use_zerocopy = False but change the
warnings.warn message to state this is unexpected (e.g.,
"pyopenms._arrow_zerocopy not found — Arrow/Parquet are required; this indicates
a broken or incomplete installation, falling back to slower Python export") and
keep stacklevel=2 so diagnostics point to user code.
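
For reference, the import-with-fallback pattern discussed here can be exercised in isolation. The sketch below is generic and hypothetical: `load_accelerator` is not a pyOpenMS function, and the module name passed in is a placeholder chosen so the fallback branch is taken.

```python
import importlib
import warnings


def load_accelerator(module_name: str):
    """Import a compiled helper module; warn and return None if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name} not available; falling back to the slower Python "
            "path. This is unexpected for a required dependency and may "
            "indicate an incomplete installation.",
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
        return None


if __name__ == "__main__":
    # Placeholder name that should not resolve, to trigger the fallback.
    helper = load_accelerator("no_such_module_for_this_sketch")
    print("accelerated path available:", helper is not None)
```

Keeping the fallback while rewording the message preserves current behaviour for users with a damaged install, and still signals that the state is not a supported configuration.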
src/tests/class_tests/openms/CMakeLists.txt (1)

81-116: Formatting: inconsistent indentation for library arguments.

The unconditional target_link_libraries calls have no indentation for the library arguments, whereas the guarded blocks below (lines 118-147) properly indent the arguments. Consider aligning for consistency.

🎨 Suggested formatting fix
 target_link_libraries(Arrow_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(MSExperimentArrowExport_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(ConsensusMapArrowExport_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(QPXFile_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(ProteinIdentificationArrowIO_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(FeatureMapArrowIO_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(ConsensusMapArrowIO_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(Libzip_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
 target_link_libraries(ZipRandomAccessFile_test
-${OPENMS_ARROW_TARGET}
-${OPENMS_PARQUET_TARGET}
+  ${OPENMS_ARROW_TARGET}
+  ${OPENMS_PARQUET_TARGET}
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tests/class_tests/openms/CMakeLists.txt` around lines 81 - 116, The
target_link_libraries calls for Arrow_test, MSExperimentArrowExport_test,
ConsensusMapArrowExport_test, QPXFile_test, ProteinIdentificationArrowIO_test,
FeatureMapArrowIO_test, ConsensusMapArrowIO_test, Libzip_test, and
ZipRandomAccessFile_test have unindented library arguments; update each of these
target_link_libraries invocations so that the library names
(${OPENMS_ARROW_TARGET} and ${OPENMS_PARQUET_TARGET}) are indented on the
following line (matching the style used in the guarded blocks) to ensure
consistent CMake indentation and readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pyOpenMS/CMakeLists.txt`:
- Around line 303-326: Replace the permissive fallback with a mandatory,
versioned Arrow lookup and pick the same static/shared target as core OpenMS: if
OPENMS_ARROW_TARGET is not set, call find_package(Arrow 23 CONFIG REQUIRED) (not
QUIET), then set OPENMS_ARROW_TARGET to Arrow::arrow_static or
Arrow::arrow_shared based on ARROW_USE_STATIC (or by checking TARGET
Arrow::arrow_static / Arrow::arrow_shared) so the chosen target matches core;
only add/build the _arrow_zerocopy module (nanobind_add_module and
target_link_libraries(_arrow_zerocopy ... ${OPENMS_ARROW_TARGET})) when
OPENMS_ARROW_TARGET is defined.

In `@src/pyOpenMS/tests/unittests/test_arrow_zerocopy.py`:
- Around line 13-18: The test currently swallows ImportError for
pyopenms._arrow_zerocopy by skipping the module, which hides build/package
regressions; instead remove the try/except skip and import spectra_to_arrow and
chromatograms_to_arrow directly from pyopenms._arrow_zerocopy (or, if you prefer
an explicit failure, replace the pytest.skip in the except with pytest.fail) so
that an ImportError fails the test and surfaces the broken build. Use the
module/function names pyopenms._arrow_zerocopy, spectra_to_arrow, and
chromatograms_to_arrow to locate and update the import.

In `@src/tests/class_tests/openms/executables.cmake`:
- Around line 294-303: The test target OpenSwathOSWParquetRoundTrip_test is
registered unconditionally but depends on OpenSwath headers; move the symbol
OpenSwathOSWParquetRoundTrip_test out of the unconditional list(APPEND
format_executables_list and place it inside the existing NOT DISABLE_OPENSWATH
guarded block where TransitionParquetFile_test, OpenSwathOSWParquetReader_test,
and OpenSwathOSWParquetWriter_test are registered so the test is only added when
DISABLE_OPENSWATH is off; ensure you remove it from the top-level list and add
it alongside those three test names in the guarded block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e453a69-4aea-4b5f-96cf-e0b91acbea00

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac2733 and 13a2405.

📒 Files selected for processing (86)
  • .github/workflows/openms_ci_matrix_full.yml
  • .github/workflows/pyopenms-wheels-cibuildwheel.yml
  • AGENTS.md
  • CMakeLists.txt
  • cmake/OpenMSConfig.cmake.in
  • cmake/cmake_findExternalLibs.cmake
  • cmake/package_general.cmake
  • doc/CMakeLists.txt
  • doc/doxygen/public/TOPP.doxygen
  • src/openms/CMakeLists.txt
  • src/openms/include/OpenMS/FORMAT/ArrowSchemaRegistry.h
  • src/openms/include/OpenMS/FORMAT/ConsensusMapArrowExport.h
  • src/openms/include/OpenMS/FORMAT/ConsensusMapArrowIO.h
  • src/openms/include/OpenMS/FORMAT/FeatureMapArrowIO.h
  • src/openms/include/OpenMS/FORMAT/MSExperimentArrowExport.h
  • src/openms/include/OpenMS/FORMAT/ParquetFile.h
  • src/openms/include/OpenMS/FORMAT/ProteinGroupArrowExport.h
  • src/openms/include/OpenMS/FORMAT/ProteinIdentificationArrowIO.h
  • src/openms/include/OpenMS/FORMAT/QPXFile.h
  • src/openms/include/OpenMS/FORMAT/ZipRandomAccessFile.h
  • src/openms/include/OpenMS/FORMAT/sources.cmake
  • src/openms/source/ANALYSIS/OPENSWATH/OpenSwathOSWParquetReader.cpp
  • src/openms/source/ANALYSIS/OPENSWATH/OpenSwathOSWParquetWriter.cpp
  • src/openms/source/ANALYSIS/OPENSWATH/TransitionParquetFile.cpp
  • src/openms/source/APPLICATIONS/OpenSwathBase.cpp
  • src/openms/source/APPLICATIONS/ToolHandler.cpp
  • src/openms/source/FORMAT/ArrowSchemaRegistry.cpp
  • src/openms/source/FORMAT/ConsensusMapArrowExport.cpp
  • src/openms/source/FORMAT/ConsensusMapArrowIO.cpp
  • src/openms/source/FORMAT/DATAACCESS/MSChromatogramParquetConsumer.cpp
  • src/openms/source/FORMAT/DATAACCESS/MobilogramParquetConsumer.cpp
  • src/openms/source/FORMAT/DATAACCESS/sources.cmake
  • src/openms/source/FORMAT/FeatureMapArrowIO.cpp
  • src/openms/source/FORMAT/MSExperimentArrowExport.cpp
  • src/openms/source/FORMAT/ParquetFile.cpp
  • src/openms/source/FORMAT/ProteinGroupArrowExport.cpp
  • src/openms/source/FORMAT/ProteinIdentificationArrowIO.cpp
  • src/openms/source/FORMAT/QPXFile.cpp
  • src/openms/source/FORMAT/XICParquetFile.cpp
  • src/openms/source/FORMAT/XIMParquetFile.cpp
  • src/openms/source/FORMAT/ZipRandomAccessFile.cpp
  • src/openms/source/FORMAT/sources.cmake
  • src/pyOpenMS/CLAUDE.md
  • src/pyOpenMS/CMakeLists.txt
  • src/pyOpenMS/README.md
  • src/pyOpenMS/bindings/arrow_zerocopy.cpp
  • src/pyOpenMS/bindings/bind_format.cpp
  • src/pyOpenMS/pxds/ConsensusMapArrowIO.pxd
  • src/pyOpenMS/pxds/FeatureMapArrowIO.pxd
  • src/pyOpenMS/pxds/ProteinIdentificationArrowIO.pxd
  • src/pyOpenMS/pxds/QPXFile.pxd
  • src/pyOpenMS/pyopenms/__init__.py
  • src/pyOpenMS/pyopenms/addons/msexperiment.py
  • src/pyOpenMS/tests/benchmark_pyopenms.py
  • src/pyOpenMS/tests/unittests/test_XICParquetFile.py
  • src/pyOpenMS/tests/unittests/test_XIMParquetFile.py
  • src/pyOpenMS/tests/unittests/test_arrow_io_classes.py
  • src/pyOpenMS/tests/unittests/test_arrow_zerocopy.py
  • src/tests/class_tests/openms/CMakeLists.txt
  • src/tests/class_tests/openms/executables.cmake
  • src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp
  • src/tests/class_tests/openms/source/ConsensusMapArrowExport_test.cpp
  • src/tests/class_tests/openms/source/ConsensusMapArrowIO_test.cpp
  • src/tests/class_tests/openms/source/FeatureMapArrowIO_test.cpp
  • src/tests/class_tests/openms/source/Libzip_test.cpp
  • src/tests/class_tests/openms/source/MSChromatogramParquetConsumer_test.cpp
  • src/tests/class_tests/openms/source/MobilogramParquetConsumer_test.cpp
  • src/tests/class_tests/openms/source/OpenSwathOSWParquetReader_test.cpp
  • src/tests/class_tests/openms/source/OpenSwathOSWParquetRoundTrip_test.cpp
  • src/tests/class_tests/openms/source/ProteinIdentificationArrowIO_test.cpp
  • src/tests/class_tests/openms/source/TransitionParquetFile_test.cpp
  • src/tests/class_tests/openms/source/XICParquetFile_test.cpp
  • src/tests/class_tests/openms/source/XIMParquetFile_test.cpp
  • src/tests/topp/CMakeLists.txt
  • src/topp/IsobaricWorkflow.cpp
  • src/topp/OpenSwathAssayGenerator.cpp
  • src/topp/OpenSwathDecoyGenerator.cpp
  • src/topp/OpenSwathWorkflow.cpp
  • src/topp/ParquetConverter.cpp
  • src/topp/ProteinQuantifier.cpp
  • src/topp/ProteomicsLFQ.cpp
  • src/topp/QPXConverter.cpp
  • src/topp/TargetedFileConverter.cpp
  • src/topp/TextExporter.cpp
  • src/topp/executables.cmake
  • tools/ci/capture-env.sh
💤 Files with no reviewable changes (58)
  • src/topp/ParquetConverter.cpp
  • src/openms/source/FORMAT/FeatureMapArrowIO.cpp
  • src/openms/include/OpenMS/FORMAT/QPXFile.h
  • src/openms/source/FORMAT/ProteinGroupArrowExport.cpp
  • src/topp/QPXConverter.cpp
  • src/tests/class_tests/openms/source/ConsensusMapArrowIO_test.cpp
  • doc/CMakeLists.txt
  • src/openms/include/OpenMS/FORMAT/FeatureMapArrowIO.h
  • src/openms/source/APPLICATIONS/ToolHandler.cpp
  • src/openms/include/OpenMS/FORMAT/MSExperimentArrowExport.h
  • cmake/OpenMSConfig.cmake.in
  • src/tests/class_tests/openms/source/XICParquetFile_test.cpp
  • src/openms/include/OpenMS/FORMAT/ConsensusMapArrowExport.h
  • doc/doxygen/public/TOPP.doxygen
  • src/tests/class_tests/openms/source/ConsensusMapArrowExport_test.cpp
  • src/tests/class_tests/openms/source/TransitionParquetFile_test.cpp
  • src/tests/class_tests/openms/source/MobilogramParquetConsumer_test.cpp
  • src/openms/include/OpenMS/FORMAT/ProteinIdentificationArrowIO.h
  • src/tests/class_tests/openms/source/OpenSwathOSWParquetRoundTrip_test.cpp
  • tools/ci/capture-env.sh
  • src/openms/source/ANALYSIS/OPENSWATH/TransitionParquetFile.cpp
  • src/openms/source/FORMAT/ConsensusMapArrowExport.cpp
  • src/openms/source/FORMAT/MSExperimentArrowExport.cpp
  • src/openms/source/FORMAT/ZipRandomAccessFile.cpp
  • src/openms/source/FORMAT/ConsensusMapArrowIO.cpp
  • src/openms/include/OpenMS/FORMAT/ConsensusMapArrowIO.h
  • src/tests/class_tests/openms/source/OpenSwathOSWParquetReader_test.cpp
  • src/openms/source/FORMAT/ParquetFile.cpp
  • src/tests/class_tests/openms/source/ArrowSchemaRegistry_test.cpp
  • src/tests/class_tests/openms/source/FeatureMapArrowIO_test.cpp
  • src/openms/source/FORMAT/ProteinIdentificationArrowIO.cpp
  • src/openms/include/OpenMS/FORMAT/ArrowSchemaRegistry.h
  • src/topp/OpenSwathAssayGenerator.cpp
  • CMakeLists.txt
  • src/tests/class_tests/openms/source/Libzip_test.cpp
  • src/openms/source/FORMAT/ArrowSchemaRegistry.cpp
  • src/tests/class_tests/openms/source/MSChromatogramParquetConsumer_test.cpp
  • src/openms/include/OpenMS/FORMAT/ProteinGroupArrowExport.h
  • src/openms/source/FORMAT/QPXFile.cpp
  • src/topp/ProteomicsLFQ.cpp
  • src/tests/class_tests/openms/source/XIMParquetFile_test.cpp
  • src/topp/OpenSwathDecoyGenerator.cpp
  • src/openms/source/ANALYSIS/OPENSWATH/OpenSwathOSWParquetWriter.cpp
  • src/openms/include/OpenMS/FORMAT/ZipRandomAccessFile.h
  • src/openms/source/FORMAT/XIMParquetFile.cpp
  • src/topp/ProteinQuantifier.cpp
  • src/topp/IsobaricWorkflow.cpp
  • src/openms/source/APPLICATIONS/OpenSwathBase.cpp
  • src/openms/source/FORMAT/DATAACCESS/MSChromatogramParquetConsumer.cpp
  • .github/workflows/openms_ci_matrix_full.yml
  • src/topp/TargetedFileConverter.cpp
  • src/tests/class_tests/openms/source/ProteinIdentificationArrowIO_test.cpp
  • src/openms/include/OpenMS/FORMAT/ParquetFile.h
  • src/openms/source/FORMAT/XICParquetFile.cpp
  • src/openms/source/FORMAT/DATAACCESS/MobilogramParquetConsumer.cpp
  • src/openms/source/ANALYSIS/OPENSWATH/OpenSwathOSWParquetReader.cpp
  • src/topp/TextExporter.cpp
  • src/pyOpenMS/bindings/bind_format.cpp

timosachsenberg and others added 2 commits March 25, 2026 09:09
Doxygen 1.9.8 fails to register the @page when it is indented inside
a /** */ block and followed by raw HTML at column 0 (<CENTER>). Align
with the pattern used by all other TOPP tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Make standalone pyOpenMS Arrow lookup mandatory and version-constrained (Arrow 23 CONFIG REQUIRED), matching core OpenMS
- Respect ARROW_USE_STATIC preference in standalone Arrow target selection
- Remove try/except skip in test_arrow_zerocopy.py — ImportError should fail, not skip
- Move OpenSwathOSWParquetRoundTrip_test into NOT DISABLE_OPENSWATH guard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
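
The "ImportError should fail, not skip" point in the commit above can be illustrated with a minimal, stdlib-only sketch. The helper names `import_or_skip` and `import_required` are hypothetical and do not come from test_arrow_zerocopy.py; they only contrast the two behaviours.

```python
import importlib
import unittest


def import_or_skip(module_name):
    """Old pattern: a missing module is converted into a skipped test."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The failure is hidden: the test run reports "skipped", not "failed".
        raise unittest.SkipTest(f"{module_name} unavailable: {exc}")


def import_required(module_name):
    """New pattern: a missing module raises ImportError and fails the test."""
    return importlib.import_module(module_name)
```

Once a module is a required part of the build, the second form is the safer default: a broken install shows up as a test failure instead of a quietly shorter test report.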

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/pyOpenMS/CMakeLists.txt (1)

320-339: ⚠️ Potential issue | 🟡 Minor

Keep pyopenms_stubs in sync with the new required module.

Line 339 adds _arrow_zerocopy to the normal build, but pyopenms_stubs still depends on a separate list that excludes it. Building pyopenms_stubs directly can therefore run before _arrow_zerocopy exists, leaving the new module out of generated stubs and potentially breaking stub generation once it is imported.

🧩 Proposed fix
   if(PYOPENMS_GENERATE_STUBS)
-    set(_STUB_DEPS _pyopenms)
-    foreach(domain ${PYOPENMS_DOMAINS})
-      list(APPEND _STUB_DEPS "_pyopenms_${domain}")
-    endforeach()
-
     add_custom_target(pyopenms_stubs
       COMMAND ${Python_EXECUTABLE} -m nanobind.stubgen
               -m pyopenms
@@
-      DEPENDS ${_STUB_DEPS}
+      DEPENDS pyopenms_compile
       COMMENT "Generating .pyi stub files for pyopenms"
     )
   endif()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pyOpenMS/CMakeLists.txt` around lines 320 - 339, The pyopenms stubs
target isn't updated to depend on the new _arrow_zerocopy module, so building
pyopenms_stubs can run before _arrow_zerocopy exists; update the pyopenms_stubs
dependency list to include the new target (referencing _arrow_zerocopy and the
existing PYOPENMS_DOMAINS/_pyopenms_* targets) or add
add_dependencies(pyopenms_stubs _arrow_zerocopy) where pyopenms_compile and
other stub dependencies are declared so stub generation waits for the new
module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8c083734-9afb-4a58-ad01-e0197fa923d4

📥 Commits

Reviewing files that changed from the base of the PR and between c9f485e and dc7aaaf.

📒 Files selected for processing (3)
  • src/pyOpenMS/CMakeLists.txt
  • src/pyOpenMS/tests/unittests/test_arrow_zerocopy.py
  • src/tests/class_tests/openms/executables.cmake

- Fix multi-run out_chrom path prefixing to preserve parent directory
  (mirror the File::path/File::basename split used for mobilograms)
- Extract existing .oswpq archive before appending new runs so prior
  data is preserved when -append_oswpq is set
- Replace pytest.skip with assert in XIC/XIMParquetFile tests so
  missing bindings fail loudly instead of silently skipping
- Update _arrow_zerocopy ImportError warning to indicate broken install
- Use pyopenms_compile as stubs dependency (includes _arrow_zerocopy)
- Fix CMake target_link_libraries indentation for Arrow/Parquet tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
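
The "extract before appending" item above can be sketched as follows. This is an illustration only: it assumes a ZIP-style container with flat member names, which the commit message does not state, and `append_member` is a hypothetical helper, not an OpenMS function.

```python
import os
import tempfile
import zipfile


def append_member(archive_path, name, data):
    """Rewrite archive_path with its existing members plus one new member.

    Assumes flat member names (no subdirectories) for simplicity.
    """
    with tempfile.TemporaryDirectory() as staging:
        # Extract whatever is already in the archive so it is not lost
        # when the archive is rewritten below.
        if os.path.exists(archive_path):
            with zipfile.ZipFile(archive_path) as existing:
                existing.extractall(staging)
        # Stage the new member next to the existing ones.
        with open(os.path.join(staging, name), "wb") as fh:
            fh.write(data)
        # Rewrite the archive from the staging directory.
        with zipfile.ZipFile(archive_path, "w") as out:
            for member in sorted(os.listdir(staging)):
                out.write(os.path.join(staging, member), arcname=member)
```

Opening the archive in "w" mode without the extraction step would silently drop the earlier runs, which is the failure mode the commit describes.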

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/topp/OpenSwathWorkflow.cpp (1)

1276-1283: Path preservation fix looks correct, but consider applying consistently to other debug outputs.

The fix properly preserves the parent directory for multi-run chromatogram output paths.

However, similar multi-run path handling at lines 1147-1149 (irt_trafo_out) and 1156-1158 (irt_mzml_out) still uses the old pattern that doesn't preserve parent directories:

String base_name = irt_trafo_out.substr(0, irt_trafo_out.find_last_of('.'));
String extension = irt_trafo_out.substr(irt_trafo_out.find_last_of('.'));
irt_trafo_out = file_basename + "_" + base_name + extension;

If a user specifies -Debugging:irt_trafo /output/dir/trafo.trafoXML, this would produce run1_/output/dir/trafo.trafoXML instead of /output/dir/run1_trafo.trafoXML.

♻️ Suggested fix for consistency (lines 1147-1149)
     if (!irt_trafo_out.empty() && run_groups.size() > 1)
     {
-      // For multi-run, use basename prefix to make unique filenames
-      String base_name = irt_trafo_out.substr(0, irt_trafo_out.find_last_of('.'));
-      String extension = irt_trafo_out.substr(irt_trafo_out.find_last_of('.'));
-      irt_trafo_out = file_basename + "_" + base_name + extension;
+      // Preserve parent directory when creating per-run filenames.
+      String parent = File::path(irt_trafo_out);
+      String filename = File::basename(irt_trafo_out);
+      String stem = filename.substr(0, filename.find_last_of('.'));
+      String extension = filename.substr(filename.find_last_of('.'));
+      String fname_with_prefix = file_basename + "_" + stem + extension;
+      irt_trafo_out = (parent == "." ? fname_with_prefix : parent + "/" + fname_with_prefix);
     }

A similar change would apply to irt_mzml_out at lines 1156-1158.
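
For illustration, the difference between the two patterns can be reproduced outside OpenMS. The sketch below uses Python's `posixpath` in place of `File::path` / `File::basename`; the function names are hypothetical. Unlike the C++ snippet, which compares the parent against ".", `posixpath.split` returns an empty parent for a bare filename.

```python
import posixpath


def prefix_naive(run_name, out_path):
    """Old pattern: split only at the last '.', so the directory stays in the stem."""
    dot = out_path.rfind(".")
    return run_name + "_" + out_path[:dot] + out_path[dot:]


def prefix_preserving_parent(run_name, out_path):
    """Fixed pattern: split off the parent directory before prefixing the filename."""
    parent, filename = posixpath.split(out_path)
    stem, extension = posixpath.splitext(filename)
    prefixed = run_name + "_" + stem + extension
    # A bare filename has an empty parent; return the prefixed name as-is.
    return posixpath.join(parent, prefixed) if parent else prefixed


print(prefix_naive("run1", "/output/dir/trafo.trafoXML"))
# run1_/output/dir/trafo.trafoXML
print(prefix_preserving_parent("run1", "/output/dir/trafo.trafoXML"))
# /output/dir/run1_trafo.trafoXML
```

The sketch assumes the output path contains an extension; inputs without a '.' are not handled in either variant.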

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/topp/OpenSwathWorkflow.cpp` around lines 1276 - 1283, The irt_trafo_out
and irt_mzml_out multi-run filename handling still prepends file_basename to the
full path (producing run1_/output/dir/trafo.trafoXML) instead of preserving the
parent directory; update the logic for irt_trafo_out and irt_mzml_out to mirror
the fix used for out_chrom: use
File::path(irt_trafo_out)/File::basename(irt_trafo_out) (and similarly for
irt_mzml_out) to extract parent, stem and extension, then build
fname_with_prefix = file_basename + "_" + stem + extension and set the final
path to either fname_with_prefix or parent + "/" + fname_with_prefix based on
parent == ".".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 161beca0-dc7d-470d-99d9-0ff08c514ae4

📥 Commits

Reviewing files that changed from the base of the PR and between dc7aaaf and 887d415.

📒 Files selected for processing (6)
  • src/pyOpenMS/CMakeLists.txt
  • src/pyOpenMS/pyopenms/addons/msexperiment.py
  • src/pyOpenMS/tests/unittests/test_XICParquetFile.py
  • src/pyOpenMS/tests/unittests/test_XIMParquetFile.py
  • src/tests/class_tests/openms/CMakeLists.txt
  • src/topp/OpenSwathWorkflow.cpp
✅ Files skipped from review due to trivial changes (1)
  • src/pyOpenMS/pyopenms/addons/msexperiment.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/pyOpenMS/tests/unittests/test_XICParquetFile.py
  • src/pyOpenMS/CMakeLists.txt

@timosachsenberg timosachsenberg merged commit 573199f into develop Mar 25, 2026
19 of 20 checks passed
@timosachsenberg timosachsenberg deleted the feature/make-arrow-parquet-required branch March 25, 2026 11:19
github-actions bot pushed a commit that referenced this pull request Mar 26, 2026
Add missing entries for:
- #8991: Arrow/Parquet made required dependency; WITH_PARQUET CMake option removed
- #8993: BREAKING IMFormat enum rename (CONCATENATED→IM_PEAK, MULTIPLE_SPECTRA→IM_SPECTRUM)
- #8997: Fix TOPPView performance regression when right-clicking in large mzML
- #8999: BrukerTimsFile tiered scan→1/K0 calibration with RationalScan2ImConverter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
timosachsenberg pushed a commit that referenced this pull request Mar 26, 2026
Add missing entries for:
- #8991: Arrow/Parquet made required dependency; WITH_PARQUET CMake option removed
- #8993: BREAKING IMFormat enum rename (CONCATENATED→IM_PEAK, MULTIPLE_SPECTRA→IM_SPECTRUM)
- #8997: Fix TOPPView performance regression when right-clicking in large mzML
- #8999: BrukerTimsFile tiered scan→1/K0 calibration with RationalScan2ImConverter

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>