
Add OME-Arrow demonstration notebook#210

Merged
d33bs merged 5 commits into cytomining:main from d33bs:ome-arrow-demo
Apr 16, 2026

@d33bs (Member) commented Apr 15, 2026

Description

This PR adds a brief demonstration notebook which shows how OME-Arrow interacts with CytoDataFrame.

What kind of change(s) are included?

  • Documentation (changes docs or other related content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • These changes pass all pre-commit checks.
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

  • Documentation

    • Added a new example demonstrating how to use OME-Arrow and CytoDataFrame together, including image cropping and data export to Parquet format.
  • Chores

    • Updated development tool versions and configurations.

@coderabbitai

coderabbitai bot commented Apr 15, 2026

📝 Walkthrough

The PR adds a new documentation example demonstrating OME-Arrow and CytoDataFrame interoperability, links it from the examples index, bumps pre-commit hook versions, and broadens the Ruff per-file ignores so that the new example passes linting.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Pre-commit Hook Updates | .pre-commit-config.yaml | Bumped pyproject-fmt from v2.21.0 to v2.21.1 and ruff-pre-commit from v0.15.8 to v0.15.10. |
| Documentation Example | docs/src/examples.md, docs/src/examples/omearrow_and_cytodataframe.py | Added a new example page showcasing OME-Arrow and CytoDataFrame interoperability: loading an OME-TIFF, cropping images, creating PyArrow tables, and persisting them to Parquet. Linked it in the examples index. |
| Tooling Configuration | pyproject.toml | Expanded the Ruff per-file ignores so that E501 and E402 are ignored across all docs/src/examples/*.py files. Reorganized tool configuration sections without changing values. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hopped through code with glee,
New examples for all eyes to see,
OME-Arrow danced with CytoDataFrame,
Version bumps aligned their aim,
Configuration tweaks complete the game! 🐰✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add OME-Arrow demonstration notebook' directly and clearly describes the main change: adding a new example notebook demonstrating OME-Arrow functionality with CytoDataFrame. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@d33bs d33bs marked this pull request as ready for review April 15, 2026 20:28
@d33bs d33bs requested a review from jenna-tomkinson as a code owner April 15, 2026 20:28

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
docs/src/examples/omearrow_and_cytodataframe.py (2)

53-58: Avoid fixed output filenames in docs execution.

Writing example.ome.parquet to the current directory can leave artifacts and conflict across repeated/parallel runs. Prefer a temporary directory.

♻️ Suggested temporary output usage
+from pathlib import Path
+from tempfile import TemporaryDirectory
+
 # write a parquet table
-pq.write_table(table, "example.ome.parquet")
-
-# read the parquet table with cytodataframe
-# (showing the OME-Arrow image that was written)
-CytoDataFrame("example.ome.parquet")
+with TemporaryDirectory() as tmp_dir:
+    parquet_path = Path(tmp_dir) / "example.ome.parquet"
+    pq.write_table(table, parquet_path)
+
+    # read the parquet table with cytodataframe
+    # (showing the OME-Arrow image that was written)
+    CytoDataFrame(str(parquet_path))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/examples/omearrow_and_cytodataframe.py` around lines 53 - 58, the
snippet writes to a fixed filename via pq.write_table(table, "example.ome.parquet"),
which can leave artifacts and clash across runs; change it to write into a temporary
directory or temp file and pass that path to CytoDataFrame instead. Use Python's
tempfile (e.g., TemporaryDirectory or NamedTemporaryFile) to create a unique
path, join the temp path with the parquet filename, call pq.write_table(table,
temp_path) and then construct CytoDataFrame(str(temp_path)); ensure the temp resource
remains available for the read operation.
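A minimal sketch of the temporary-directory pattern, with one caveat worth hedging: in Jupyter, only a cell's last top-level expression is displayed, so a bare `CytoDataFrame(...)` inside a `with` block (as in the suggestion above) will not render. The hypothetical helper `roundtrip_in_tempdir` below returns the read result to the caller instead. It assumes the reader (for example `CytoDataFrame`) fully loads the data and releases the file before the directory is deleted; that assumption has not been verified against CytoDataFrame here.

```python
from __future__ import annotations

from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable, TypeVar

T = TypeVar("T")


def roundtrip_in_tempdir(
    write: Callable[[Path], object],
    read: Callable[[Path], T],
    filename: str = "example.ome.parquet",
) -> tuple[T, Path]:
    """Write a file into a fresh temporary directory and read it back.

    Hypothetical helper (not part of CytoDataFrame or PyArrow). The directory
    and everything in it are deleted when the `with` block exits, so `read`
    must fully load its data before returning. The path is returned only for
    inspection; it no longer exists afterwards.
    """
    with TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / filename
        write(path)  # e.g. pq.write_table(table, path)
        result = read(path)  # e.g. CytoDataFrame(str(path)), assumed to load eagerly
    # Hand the loaded result back so the caller can display it as a
    # top-level expression in the notebook cell.
    return result, path
```

In the notebook this might be used as `cdf, _ = roundtrip_in_tempdir(lambda p: pq.write_table(table, p), lambda p: CytoDataFrame(str(p)))`, with `cdf` on its own line at the end of the cell so it renders.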

26-29: Make the sample-image path execution-context safe.

The current relative path only works from specific working directories. A small fallback lookup makes the example much more robust.

♻️ Suggested path handling
+from pathlib import Path
+
 # load a tiff using OME Arrow
-oa_img = OMEArrow(
-    "../../../tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"
-)
+candidate_paths = [
+    Path("../../../tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"),
+    Path("tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"),
+]
+tiff_path = next((p for p in candidate_paths if p.exists()), None)
+if tiff_path is None:
+    raise FileNotFoundError("Could not locate demo TIFF file in expected repo paths.")
+oa_img = OMEArrow(str(tiff_path))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/examples/omearrow_and_cytodataframe.py` around lines 26 - 29, the
example uses a hard-coded relative TIFF path which fails outside specific
working directories; update the OMEArrow instantiation to resolve the sample
image relative to the example file location (use __file__ or
Path(__file__).resolve().parent) and fall back to a small search (e.g., glob for
the filename pattern) if the file is not found, then pass that resolved path
into OMEArrow so OMEArrow(...) always receives an absolute,
execution-context-safe path.
pyproject.toml (1)

113-113: Scope E501 ignore more narrowly.

Applying E501 to all docs/src/examples/*.py may mask accidental long lines in future examples. Consider keeping the glob for E402 and restricting E501 to files that specifically need it.

♻️ Suggested config adjustment
-lint.per-file-ignores."docs/src/examples/*.py" = [ "E402", "E501" ]
+lint.per-file-ignores."docs/src/examples/*.py" = [ "E402" ]
+lint.per-file-ignores."docs/src/examples/omearrow_and_cytodataframe.py" = [ "E501" ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 113, the per-file ignore rule
lint.per-file-ignores."docs/src/examples/*.py" currently silences both E402 and
E501 for every example file; narrow E501 so long-line checks still run by
keeping the glob entry for E402 only and moving E501 to specific files or
tighter globs (e.g., add separate lint.per-file-ignores entries for the
individual example filenames that truly need E501 or a more specific pattern),
updating the lint.per-file-ignores mapping accordingly so only E402 is applied
to "docs/src/examples/*.py" and E501 is applied only where necessary.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b07eb67b-0dc8-4ddc-a2ba-2b4e7c05e60c

📥 Commits

Reviewing files that changed from the base of the PR and between 00c5943 and 0e84970.

📒 Files selected for processing (5)
  • .pre-commit-config.yaml
  • docs/src/examples.md
  • docs/src/examples/omearrow_and_cytodataframe.ipynb
  • docs/src/examples/omearrow_and_cytodataframe.py
  • pyproject.toml

Member

@gwaybio gwaybio left a comment


This might be beyond the scope of this PR, but I also wonder if the benchmark notebook belongs somewhere more prominently? (I can be convinced either way; it may actually make sense to only highlight those results in a future paper and therefore bundle that analysis notebook in the same location of other paper

@d33bs
Member Author

d33bs commented Apr 16, 2026

Thanks @gwaybio !

I also wonder if the benchmark notebook belongs somewhere more prominently?

The benchmarks for OME-Arrow currently reside in another repo, https://github.com/WayScience/ome-arrow-benchmarks , where it felt healthier to focus solely on benchmarking work (because it differs so much from the package-specific focus). Within OME-Arrow we do include some benchmarking in the form of a smoke test that checks we don't regress on performance as changes proceed.

Regarding prominence and ease of understanding, I do think a paper would eventually help here. The modular, component-based repo design (as opposed to a monorepo) has some drawbacks when it comes to how easy it is to find things. I feel it ends up making the work more sustainable over time, though, because we don't know which parts will survive and in what ways they might need to change.

@d33bs d33bs merged commit 91cad90 into cytomining:main Apr 16, 2026
9 checks passed
@d33bs d33bs deleted the ome-arrow-demo branch April 16, 2026 15:51