# OME-Zarr Metadata Enrichment — Starter Notebook

This notebook walks through the OME-Zarr outputs produced by the brieflow pipeline on its test data.
It demonstrates how to inspect, enrich, and validate zarr store metadata so that the undergrad
can then implement each enrichment step in the pipeline itself.

**Prerequisites:** Run `bash run_brieflow_omezarr.sh` from the `small_test_analysis/` directory first.

In [1]:
import json
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import zarr

# Add brieflow workflow to path so we can import its libraries
sys.path.insert(0, str(Path("../../../workflow").resolve()))
from lib.shared.io import read_image, save_image

# Root paths
ZARR_ROOT = Path("../brieflow_output_zarr")
assert ZARR_ROOT.exists(), f"Run the zarr pipeline first: {ZARR_ROOT}"

## 1. Explore the output directory structure

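The pipeline writes zarr stores directly into HCS-compliant plate zarr directories.
The `sbs` and `phenotype` modules each have a `{plate}.zarr/` store with the nested hierarchy
`{row}/{col}/{tile}/`, plus supporting tabular data in `parquets/`, `tsvs/`, and `eval/`.
`preprocess` is laid out differently: one plate zarr per modality (under `sbs/` and `phenotype/`),
alongside `ic_fields/` and `metadata/`: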

In [2]:
for module in ["preprocess", "sbs", "phenotype"]:
    module_dir = ZARR_ROOT / module
    if not module_dir.exists():
        continue
    subdirs = sorted(d.name for d in module_dir.iterdir() if d.is_dir())
    print(f"{module}/  →  {subdirs}")

preprocess/  →  ['ic_fields', 'metadata', 'phenotype', 'sbs']
sbs/  →  ['1.zarr', 'eval', 'parquets', 'tsvs']
phenotype/  →  ['1.zarr', 'eval', 'parquets', 'tsvs']


## 2. Read pipeline metadata (parquet)

The preprocess module extracts hardware metadata from the raw images.
This is where pixel sizes, channel info, and other acquisition parameters live.

In [3]:
# Read phenotype metadata
ph_meta = pd.read_parquet(
    ZARR_ROOT / "preprocess/metadata/phenotype/1/A/1/combined_metadata.parquet"
)
print(f"Shape: {ph_meta.shape}")
print(f"Columns: {list(ph_meta.columns)}")
ph_meta

Shape: (3, 15)
Columns: ['x_pos', 'y_pos', 'z_pos', 'pfs_offset', 'plate', 'well', 'tile', 'filename', 'channels', 'pixel_size_x', 'pixel_size_y', 'pixel_size_z', 'objective_magnification', 'zoom_magnification', 'binning_xy']


| | x_pos | y_pos | z_pos | pfs_offset | plate | well | tile | filename | channels | pixel_size_x | pixel_size_y | pixel_size_z | objective_magnification | zoom_magnification | binning_xy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41197.1 | -36826.3 | 4034.02 | | 1 | A1 | 2 | small_test_data/phenotype/empty_images/P001_Ph... | 4 | 1.625 | 1.625 | 1.5 | 4 | 1 | |
| 1 | 38958.4 | -36815.7 | 3165.58 | | 1 | A1 | 5 | small_test_data/phenotype/real_images/P001_Phe... | 4 | 0.325 | 0.325 | 1.5 | 20 | 1 | |
| 2 | 41958.0 | -32393.4 | 3153.32 | | 1 | A1 | 141 | small_test_data/phenotype/real_images/P001_Phe... | 4 | 0.325 | 0.325 | 1.5 | 20 | 1 | |


In [4]:
# Key fields for metadata enrichment (row 0 is tile 2; pixel sizes differ between tiles)
print("Pixel sizes (from hardware):")
print(f"  X: {ph_meta['pixel_size_x'].iloc[0]} µm")
print(f"  Y: {ph_meta['pixel_size_y'].iloc[0]} µm")
print(f"  Z: {ph_meta['pixel_size_z'].iloc[0]} µm")
print(f"Objective: {ph_meta['objective_magnification'].iloc[0]}x")
print(f"Channels: {ph_meta['channels'].iloc[0]}")

Pixel sizes (from hardware):
  X: 1.625 µm
  Y: 1.625 µm
  Z: 1.5 µm
Objective: 4x
Channels: 4
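
The `.iloc[0]` lookup above happens to pick tile 2, which is also the tile inspected in section 3, but the hardware metadata is not uniform across tiles: tile 2 (from `empty_images/`) was acquired at 4x with 1.625 µm pixels, while tiles 5 and 141 are 20x with 0.325 µm pixels. A minimal per-tile lookup could look like the sketch below. `pixel_sizes_for_tile` is a hypothetical helper, and it compares the `tile` column as strings because the column's stored dtype is not shown here.

```python
import pandas as pd


def pixel_sizes_for_tile(meta: pd.DataFrame, tile) -> tuple[float, float]:
    """Return (pixel_size_y, pixel_size_x) in micrometers for one tile.

    Hypothetical helper: the `tile` column is compared as strings so the
    lookup works whether the parquet stores tiles as ints or as strings.
    """
    rows = meta[meta["tile"].astype(str) == str(tile)]
    if len(rows) != 1:
        raise ValueError(f"expected one metadata row for tile {tile!r}, found {len(rows)}")
    row = rows.iloc[0]
    return float(row["pixel_size_y"]), float(row["pixel_size_x"])


# Usage with the notebook's dataframe:
# pixel_sizes_for_tile(ph_meta, 2)  # 4x tile
# pixel_sizes_for_tile(ph_meta, 5)  # 20x tile
```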


## 3. Inspect a per-tile zarr store

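Each tile is written directly into the HCS plate zarr at `{plate}.zarr/{row}/{col}/{tile}/`.
Image stores within each tile (e.g., `aligned.zarr`, `illumination_corrected.zarr`) are
independent OME-Zarr stores with their own pyramid levels. The cell below opens the preprocessed
phenotype image of tile 2 in well A/1 (`preprocess/phenotype/1.zarr/A/1/2/image.zarr`).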

In [5]:
# Pick a phenotype tile — direct-write path: {module}/{plate}.zarr/{row}/{col}/{tile}/
tile_store = ZARR_ROOT / "preprocess/phenotype/1.zarr/A/1/2/image.zarr"

# Read zarr.json (OME-NGFF v0.5 uses zarr v3 format)
with open(tile_store / "zarr.json") as f:
    tile_meta = json.load(f)

print("=== zarr.json ===")
print(json.dumps(tile_meta, indent=2))

=== zarr.json ===
{
  "attributes": {
    "ome": {
      "version": "0.5",
      "multiscales": [
        {
          "datasets": [
            {
              "path": "0",
              "coordinateTransformations": [
                {
                  "scale": [
                    1.0,
                    1.0,
                    1.0
                  ],
                  "type": "scale"
                }
              ]
            },
            {
              "path": "1",
              "coordinateTransformations": [
                {
                  "scale": [
                    1.0,
                    1.0,
                    1.0
                  ],
                  "type": "scale"
                }
              ]
            },
            {
              "path": "2",
              "coordinateTransformations": [
                {
                  "scale": [
                    1.0,
                    1.0,
                    1.0
                  ],
                  

In [6]:
# Inspect the OME multiscale metadata
ome = tile_meta["attributes"]["ome"]
ms = ome["multiscales"][0]

print("Axes:")
for ax in ms["axes"]:
    # Spatial axes need a unit; the channel axis takes none
    note = "  ←  MISSING: 'unit' key" if ax["type"] == "space" and "unit" not in ax else ""
    print(f"  {ax['name']} ({ax['type']}){note}")

print("\nPyramid levels:")
for ds in ms["datasets"]:
    scales = ds["coordinateTransformations"][0]["scale"]
    print(f"  level {ds['path']}: scale={scales}  ←  MISSING: real pixel sizes")

print("\nOmero channels:")
omero = tile_meta["attributes"].get("omero", {})
for ch in omero.get("channels", []):
    print(f"  {ch['label']}  ←  MISSING: real name, contrast window")

Axes:
  c (channel)
  y (space)  ←  MISSING: 'unit' key
  x (space)  ←  MISSING: 'unit' key

Pyramid levels:
  level 0: scale=[1.0, 1.0, 1.0]  ←  MISSING: real pixel sizes
  level 1: scale=[1.0, 1.0, 1.0]  ←  MISSING: real pixel sizes
  level 2: scale=[1.0, 1.0, 1.0]  ←  MISSING: real pixel sizes
  level 3: scale=[1.0, 1.0, 1.0]  ←  MISSING: real pixel sizes
  level 4: scale=[1.0, 1.0, 1.0]  ←  MISSING: real pixel sizes

Omero channels:
  c0  ←  MISSING: real name, contrast window
  c1  ←  MISSING: real name, contrast window
  c2  ←  MISSING: real name, contrast window
  c3  ←  MISSING: real name, contrast window


In [7]:
# Read image data at different pyramid levels
root = zarr.open_group(str(tile_store), mode="r")
for level in ["0", "1", "2", "3", "4"]:
    if level in root:
        arr = root[level]
        print(
            f"Level {level}: shape={arr.shape}, dtype={arr.dtype}, chunks={arr.chunks}"
        )

Level 0: shape=(4, 2400, 2400), dtype=uint16, chunks=(4, 1024, 1024)
Level 1: shape=(4, 1200, 1200), dtype=uint16, chunks=(4, 1024, 1024)
Level 2: shape=(4, 600, 600), dtype=uint16, chunks=(4, 600, 600)
Level 3: shape=(4, 300, 300), dtype=uint16, chunks=(4, 300, 300)
Level 4: shape=(4, 150, 150), dtype=uint16, chunks=(4, 150, 150)
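
Cell 7 hard-codes the level names `"0"` through `"4"`. A small sketch that reads them from the `multiscales` metadata instead, so stores with more or fewer levels need no code change. `pyramid_level_paths` is a hypothetical helper; it relies on the spec's rule that `datasets` are listed from highest to lowest resolution.

```python
def pyramid_level_paths(zarr_json: dict) -> list[str]:
    """Return the dataset paths of the first multiscales entry, highest resolution first.

    Hypothetical helper; expects the OME-NGFF v0.5 layout attributes -> ome -> multiscales.
    """
    multiscales = zarr_json["attributes"]["ome"]["multiscales"]
    return [ds["path"] for ds in multiscales[0]["datasets"]]


# Usage in the notebook (tile_meta from cell 5, root from cell 7):
# for level in pyramid_level_paths(tile_meta):
#     arr = root[level]
#     print(f"Level {level}: shape={arr.shape}, dtype={arr.dtype}, chunks={arr.chunks}")
```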


In [8]:
# Compare: using the high-level read_image() API, which returns a plain numpy array
img = read_image(tile_store)
print(f"read_image() shape: {img.shape}, dtype: {img.dtype}")
print(f"  (raw level-0 zarr shape: {root['0'].shape}; no singleton dim to squeeze here)")

read_image() shape: (4, 2400, 2400), dtype: uint16
  (raw level-0 zarr shape: (4, 2400, 2400); no singleton dim to squeeze here)


## 4. Inspect a label store

Segmentation outputs (nuclei, cells) live under a `labels/` group within each tile:
`{plate}.zarr/{row}/{col}/{tile}/labels/nuclei.zarr`

In [9]:
# Labels now live inside the plate zarr: {plate}.zarr/{row}/{col}/{tile}/labels/
label_store = ZARR_ROOT / "sbs/1.zarr/A/1/0/labels/nuclei.zarr"
with open(label_store / "zarr.json") as f:
    label_meta = json.load(f)

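attrs = label_meta.get("attributes", {})
# OME-NGFF v0.5 nests OME keys under "ome"; also accept a top-level "image-label"
is_label = "image-label" in attrs or "image-label" in attrs.get("ome", {})
print(f"Is label store: {is_label}")
print(f"Label dtype: {zarr.open_group(str(label_store), mode='r')['0'].dtype}")
print("  <- Should be int32 for segmentation masks")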

Is label store: True
Label dtype: int64
  <- Should be int32 for segmentation masks


## 5. Inspect the HCS plate structure

With direct-write, there is **no separate `hcs/` directory**. The plate zarr IS the
output directory. The finalize step writes metadata-only `zarr.json` files at plate,
row, and well levels to make the hierarchy navigable.

**Structure:**
```
{module}/{plate}.zarr/
├── zarr.json                    <- plate metadata (rows, columns, wells)
└── {row}/
    ├── zarr.json                <- row group
    └── {col}/
        ├── zarr.json            <- well metadata (lists fields/tiles)
        └── {tile}/
            ├── aligned.zarr/    <- image stores (Snakemake writes these)
            ├── ...
            └── labels/
                ├── zarr.json    <- labels group
                ├── nuclei.zarr/
                └── cells.zarr/
```

In [10]:
# Inspect plate-level metadata for SBS and phenotype
for module in ["sbs", "phenotype"]:
    plate_path = ZARR_ROOT / f"{module}/1.zarr"
    with open(plate_path / "zarr.json") as f:
        pmeta = json.load(f)
    plate = pmeta["attributes"]["ome"]["plate"]
    wells = [w["path"] for w in plate["wells"]]
    print(f"--- {module} plate 1 ---")
    print(f"  Rows: {[r['name'] for r in plate['rows']]}")
    print(f"  Columns: {[c['name'] for c in plate['columns']]}")
    print(f"  Wells: {wells}")

    # Show fields in first well
    well_dir = plate_path / wells[0]
    with open(well_dir / "zarr.json") as f:
        wmeta = json.load(f)
    fields = [img["path"] for img in wmeta["attributes"]["ome"]["well"]["images"]]
    print(f"  Fields in {wells[0]}: {fields}")
    print()

--- sbs plate 1 ---
  Rows: ['A']
  Columns: ['1', '2']
  Wells: ['A/1', 'A/2']
  Fields in A/1: ['0', '2', '32']

--- phenotype plate 1 ---
  Rows: ['A']
  Columns: ['1', '2']
  Wells: ['A/1', 'A/2']
  Fields in A/1: ['141', '2', '5']
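
Cell 10 only looks at the first well, but an enrichment pass will need to visit every field, and the plate and well `zarr.json` files already list them. A sketch, assuming the metadata layout shown in the outputs above; `iter_fields` is a hypothetical helper that follows the metadata rather than listing directories, so stray files or folders inside a well are ignored.

```python
import json
from collections.abc import Iterator
from pathlib import Path


def iter_fields(plate_path: Path) -> Iterator[tuple[str, str, Path]]:
    """Yield (well_path, field_path, field_dir) for every field listed in an HCS plate zarr.

    Hypothetical helper: plate zarr.json -> ome.plate.wells[].path,
    then each well zarr.json -> ome.well.images[].path.
    """
    plate_meta = json.loads((plate_path / "zarr.json").read_text())
    for well in plate_meta["attributes"]["ome"]["plate"]["wells"]:
        well_dir = plate_path / well["path"]
        well_meta = json.loads((well_dir / "zarr.json").read_text())
        for image in well_meta["attributes"]["ome"]["well"]["images"]:
            yield well["path"], image["path"], well_dir / image["path"]


# Usage: list every SBS field an enrichment pass would visit
# for well, field, field_dir in iter_fields(ZARR_ROOT / "sbs/1.zarr"):
#     print(well, field, field_dir)
```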



In [11]:
# Inspect plate + well + field metadata for SBS
plate_path = ZARR_ROOT / "sbs/1.zarr"
print("=== SBS plate metadata ===")
with open(plate_path / "zarr.json") as f:
    print(json.dumps(json.load(f), indent=2))

print("\n=== Well A/1 metadata ===")
with open(plate_path / "A/1/zarr.json") as f:
    print(json.dumps(json.load(f), indent=2))

=== SBS plate metadata ===
{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "ome": {
      "version": "0.5",
      "plate": {
        "acquisitions": [
          {
            "id": 0,
            "name": "default"
          }
        ],
        "columns": [
          {
            "name": "1"
          },
          {
            "name": "2"
          }
        ],
        "rows": [
          {
            "name": "A"
          }
        ],
        "wells": [
          {
            "path": "A/1",
            "rowIndex": 0,
            "columnIndex": 0
          },
          {
            "path": "A/2",
            "rowIndex": 0,
            "columnIndex": 1
          }
        ]
      }
    }
  }
}

=== Well A/1 metadata ===
{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "ome": {
      "version": "0.5",
      "well": {
        "images": [
          {
            "path": "0",
            "acquisition": 0
          },
          {
            "path"

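The plate metadata above follows a simple rule: one `rows`/`columns` entry per distinct row and column name, and each well points back to them by index. A sketch that rebuilds the same `plate` dict from well paths alone, for example to sanity-check the finalize step's output. It is not the pipeline's finalize code: `build_plate_metadata` is a hypothetical name, the single `default` acquisition simply mirrors the output above, and sorting digit-only names numerically (so `"10"` follows `"2"`) is a design choice.

```python
def _sort_key(name: str):
    """Numeric order for digit-only names ("2" < "10"), alphabetical otherwise."""
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def build_plate_metadata(well_paths: list[str], acquisition_name: str = "default") -> dict:
    """Build an OME-NGFF `plate` dict from well paths such as "A/1" (hypothetical helper)."""
    rows = sorted({p.split("/")[0] for p in well_paths}, key=_sort_key)
    cols = sorted({p.split("/")[1] for p in well_paths}, key=_sort_key)
    wells = []
    for path in sorted(well_paths, key=lambda p: tuple(_sort_key(part) for part in p.split("/"))):
        row, col = path.split("/")
        # Each well refers to its row and column by position in the lists above
        wells.append({"path": path, "rowIndex": rows.index(row), "columnIndex": cols.index(col)})
    return {
        "acquisitions": [{"id": 0, "name": acquisition_name}],
        "columns": [{"name": c} for c in cols],
        "rows": [{"name": r} for r in rows],
        "wells": wells,
    }
```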
In [12]:
# Field contents: SBS vs phenotype (direct-write — no symlinks)
for module, field_path in [
    ("sbs", ZARR_ROOT / "sbs/1.zarr/A/1/0"),
    ("phenotype", ZARR_ROOT / "phenotype/1.zarr/A/1/2"),
]:
    print(f"--- {module} field ---")
    for item in sorted(field_path.iterdir()):
        kind = "dir" if item.is_dir() else "file"
        print(f"  {item.name:35s}  ({kind})")
    print()

--- sbs field ---
  aligned.zarr                         (dir)
  illumination_corrected.zarr          (dir)
  labels                               (dir)
  log_filtered.zarr                    (dir)
  max_filtered.zarr                    (dir)
  peaks.zarr                           (dir)
  standard_deviation.zarr              (dir)

--- phenotype field ---
  aligned.zarr                         (dir)
  illumination_corrected.zarr          (dir)
  labels                               (dir)



In [13]:
# Labels: inspect the labels group within each tile
for module, field_path in [
    ("sbs", ZARR_ROOT / "sbs/1.zarr/A/1/0"),
    ("phenotype", ZARR_ROOT / "phenotype/1.zarr/A/1/2"),
]:
    labels_path = field_path / "labels"
    print(f"--- {module} labels ---")
    with open(labels_path / "zarr.json") as f:
        lmeta = json.load(f)
    label_names = lmeta["attributes"]["ome"]["labels"]
    print(f"  Available: {label_names}")

    # Read a label store
    nuc = read_image(labels_path / "nuclei.zarr")
    print(f"  Nuclei shape: {nuc.shape}, dtype: {nuc.dtype}")
    print(f"  Unique labels: {len(np.unique(nuc))}")
    print()

--- sbs labels ---
  Available: ['cells', 'nuclei']
  Nuclei shape: (1200, 1200), dtype: int64
  Unique labels: 3686

--- phenotype labels ---
  Available: ['cells', 'identified_cytoplasms', 'nuclei']
  Nuclei shape: (2400, 2400), dtype: uint16
  Unique labels: 1



In [14]:
# Preprocess has separate plate zarrs per modality (sbs/, phenotype/)
print("Preprocess plate zarrs:")
for modality in ["sbs", "phenotype"]:
    plate_path = ZARR_ROOT / f"preprocess/{modality}/1.zarr"
    if not plate_path.exists():
        continue
    with open(plate_path / "zarr.json") as f:
        pmeta = json.load(f)
    wells = [w["path"] for w in pmeta["attributes"]["ome"]["plate"]["wells"]]
    print(f"  preprocess/{modality}/1.zarr: wells={wells}")

# Show preprocess SBS field — has per-cycle image subgroups
field = ZARR_ROOT / "preprocess/sbs/1.zarr/A/1/2"
print(f"\nPreprocess SBS field contents (per-cycle images):")
for item in sorted(field.iterdir()):
    kind = "dir" if item.is_dir() else "file"
    print(f"  {item.name:30s}  ({kind})")

Preprocess plate zarrs:
  preprocess/sbs/1.zarr: wells=['A/1', 'A/2']
  preprocess/phenotype/1.zarr: wells=['A/1', 'A/2']

Preprocess SBS field contents (per-cycle images):
  1                               (dir)
  10                              (dir)
  11                              (dir)
  2                               (dir)
  3                               (dir)
  4                               (dir)
  5                               (dir)
  6                               (dir)
  7                               (dir)
  8                               (dir)
  9                               (dir)


---

## 6. Metadata enrichment tasks

The subsections below (6a to 6e) are templates for each enrichment the undergrad will implement.
Each shows what the metadata currently looks like and what it *should* look like
per [OME-NGFF v0.5](https://ngff.openmicroscopy.org/latest/); section 7 then combines
them into a prototype that writes the enriched metadata.

### References
- [OME-NGFF v0.5 spec](https://ngff.openmicroscopy.org/latest/)
- [HCS plate layout](https://ngff.openmicroscopy.org/latest/#hcs-layout)
- [BioHub spec](../../../zarr3_biohub_spec.md)

### 6a. Axis units

**Current:** Axes have `name` and `type` but no `unit`.

**Target:** Spatial axes should have `"unit": "micrometer"` per OME-NGFF spec.

In [15]:
# Current axes
print("Current axes (missing units):")
for ax in ms["axes"]:
    print(f"  {ax}")

# What they should look like:
target_axes = [
    {"name": "c", "type": "channel"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
]
print("\nTarget axes (with units):")
for ax in target_axes:
    print(f"  {ax}")

Current axes (missing units):
  {'name': 'c', 'type': 'channel'}
  {'name': 'y', 'type': 'space'}
  {'name': 'x', 'type': 'space'}

Target axes (with units):
  {'name': 'c', 'type': 'channel'}
  {'name': 'y', 'type': 'space', 'unit': 'micrometer'}
  {'name': 'x', 'type': 'space', 'unit': 'micrometer'}
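
The axis change is a small dictionary edit. A sketch of a hypothetical `add_axis_units` helper that returns an enriched copy (so the metadata that was read is never mutated by accident), sets the unit only on `space` axes, and leaves any existing unit alone. `"micrometer"` is one of the UDUNITS-2 unit names the OME-NGFF spec recommends.

```python
import copy


def add_axis_units(axes: list[dict], space_unit: str = "micrometer") -> list[dict]:
    """Return a copy of `axes` with `unit` set on every spatial axis (hypothetical helper).

    Channel axes get no unit, and units that are already present are kept.
    """
    enriched = copy.deepcopy(axes)
    for ax in enriched:
        if ax.get("type") == "space":
            ax.setdefault("unit", space_unit)
    return enriched


# Usage: add_axis_units(ms["axes"]) should equal target_axes from the cell above
```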


### 6b. Pixel sizes in coordinateTransformations

**Current:** All scale factors are `[1.0, 1.0, 1.0]` (placeholder).

**Target:** Use real pixel sizes from the hardware metadata parquet.
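At level 0, scale = `[1.0, pixel_size_y, pixel_size_x]` (the channel axis keeps 1.0).
At level N, scale = `[1.0, pixel_size_y * 2^N, pixel_size_x * 2^N]`, because each pyramid level
halves y and x (2400 → 1200 → 600 → 300 → 150 in section 3).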

In [16]:
# Get pixel sizes from hardware metadata
px_x = ph_meta["pixel_size_x"].iloc[0]
px_y = ph_meta["pixel_size_y"].iloc[0]
print(f"Pixel size from metadata: X={px_x} µm, Y={px_y} µm")

# Build correct coordinateTransformations for 5 pyramid levels
coarsening_factor = 2
n_levels = 5
print("\nTarget coordinateTransformations:")
for i in range(n_levels):
    scale_y = px_y * (coarsening_factor**i)
    scale_x = px_x * (coarsening_factor**i)
    print(f"  level {i}: scale=[1.0, {scale_y:.4f}, {scale_x:.4f}]")

Pixel size from metadata: X=1.625 µm, Y=1.625 µm

Target coordinateTransformations:
  level 0: scale=[1.0, 1.6250, 1.6250]
  level 1: scale=[1.0, 3.2500, 3.2500]
  level 2: scale=[1.0, 6.5000, 6.5000]
  level 3: scale=[1.0, 13.0000, 13.0000]
  level 4: scale=[1.0, 26.0000, 26.0000]
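
The `2^N` rule assumes every level is exactly half the size of the previous one. A slightly more defensive sketch derives each level's factor from the actual array shapes (for example, the shapes printed in cell 7), so an unusual downsampling factor would still produce correct scales. `scales_from_shapes` is hypothetical and assumes `(c, y, x)` axis order; with odd image sizes the ratio is not an exact power of two, and which convention to use is a decision for the pipeline.

```python
def scales_from_shapes(
    level_shapes: list[tuple[int, int, int]], pixel_size_y: float, pixel_size_x: float
) -> list[list[float]]:
    """Per-level [c, y, x] scale vectors derived from the array shapes (hypothetical helper).

    Each level's factor is base_size / level_size, measured rather than assumed to be 2^N.
    """
    _, base_y, base_x = level_shapes[0]
    scales = []
    for _, size_y, size_x in level_shapes:
        scales.append([1.0, pixel_size_y * base_y / size_y, pixel_size_x * base_x / size_x])
    return scales


# Usage with the stores from cell 7:
# shapes = [root[level].shape for level in ["0", "1", "2", "3", "4"]]
# scales_from_shapes(shapes, px_y, px_x)
```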


### 6c. Channel names

**Current:** Channels are labeled `c0`, `c1`, `c2`, `c3` (generic).

**Target:** Meaningful names from config, e.g., `DAPI`, `COXIV`, `CENPA`, `WGA`.

In [17]:
# Current
print("Current channel labels:")
for ch in omero.get("channels", []):
    print(f"  {ch['label']}")

# From config
phenotype_channels = ["DAPI", "COXIV", "CENPA", "WGA"]
sbs_channels = ["DAPI", "G", "T", "A", "C"]

print(f"\nTarget phenotype channels: {phenotype_channels}")
print(f"Target SBS channels: {sbs_channels}")

Current channel labels:
  c0
  c1
  c2
  c3

Target phenotype channels: ['DAPI', 'COXIV', 'CENPA', 'WGA']
Target SBS channels: ['DAPI', 'G', 'T', 'A', 'C']
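
The two name lists have different lengths (four phenotype channels, five SBS channels), so pairing the wrong list with an image would silently mislabel or drop channels. A hypothetical `channel_entries` helper that refuses a mismatch before building the `omero` channel dicts; the white `FFFFFF` color mirrors the prototype in section 7.

```python
def channel_entries(names: list[str], n_channels: int, color: str = "FFFFFF") -> list[dict]:
    """Build omero channel dicts, rejecting a name list that does not match the data.

    Hypothetical helper; n_channels should come from the image's channel axis (e.g. img.shape[0]).
    """
    if len(names) != n_channels:
        raise ValueError(f"{len(names)} channel names for {n_channels} channels: {names}")
    return [{"label": name, "active": True, "color": color} for name in names]


# Usage: channel_entries(phenotype_channels, img.shape[0])
```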


### 6d. Contrast limits (rendering window)

**Current:** No `window` key in channel metadata.

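**Target:** Each channel gets `"window": {"start": p1, "end": p99, "min": 0, "max": dtype_max}`,
where `start`/`end` are the 1st/99th percentile intensities (the default display range) and
`min`/`max` span the dtype's range (0 to 65535 for `uint16`).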

In [18]:
# Compute contrast limits from actual data
img = read_image(tile_store)
print(f"Image shape: {img.shape} (channels, y, x)")

for i, ch_name in enumerate(phenotype_channels):
    ch_data = img[i]
    p1 = float(np.percentile(ch_data, 1))
    p99 = float(np.percentile(ch_data, 99))
    dtype_max = (
        int(np.iinfo(ch_data.dtype).max)
        if np.issubdtype(ch_data.dtype, np.integer)
        else 1.0
    )
    print(
        f"  {ch_name}: window={{start: {p1:.0f}, end: {p99:.0f}, min: 0, max: {dtype_max}}}"
    )

Image shape: (4, 2400, 2400) (channels, y, x)
  DAPI: window={start: 103, end: 124, min: 0, max: 65535}
  COXIV: window={start: 1584, end: 1952, min: 0, max: 65535}
  CENPA: window={start: 116, end: 156, min: 0, max: 65535}
  WGA: window={start: 1648, end: 2080, min: 0, max: 65535}
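
Computing percentiles over every full-resolution pixel is fine for a 2400 x 2400 tile but grows with image size. One option, sketched below, is to compute the window from a coarser pyramid level, which approximates the full-resolution percentiles (downsampling smooths the intensity tails slightly). `contrast_window` is a hypothetical helper; unlike the cell above, it also takes `min` from the dtype, which is 0 for unsigned types such as `uint16`.

```python
import numpy as np


def contrast_window(channel: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> dict:
    """omero-style window for one channel (hypothetical helper).

    start/end come from intensity percentiles; min/max span the dtype's range
    (0.0 to 1.0 for float data, matching the cell above).
    """
    start, end = np.percentile(channel, [low_pct, high_pct])
    if np.issubdtype(channel.dtype, np.integer):
        info = np.iinfo(channel.dtype)
        lo, hi = int(info.min), int(info.max)
    else:
        lo, hi = 0.0, 1.0
    return {"start": float(start), "end": float(end), "min": lo, "max": hi}


# Usage: percentiles from pyramid level 2 (600 x 600) instead of level 0
# level2 = zarr.open_group(str(tile_store), mode="r")["2"][:]
# windows = [contrast_window(level2[i]) for i in range(level2.shape[0])]
```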


### 6e. Label dtype and segmentation metadata

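**Current:** Label dtypes are inconsistent: the nuclei masks in the test data are a mix of `int64` and `uint16`.

**Target:** Segmentation masks should be `int32`. The `image-label` attribute can
include provenance such as the segmentation method and label identity; the spec itself defines a
`source` entry (a relative path back to the segmented image) and optional per-label `colors` and `properties`.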

In [19]:
# Check label dtypes across modules
for label_path in sorted(ZARR_ROOT.rglob("nuclei.zarr")):
    # Only inspect stores nested under a labels/ group (skips same-named image stores elsewhere)
    rel = label_path.relative_to(ZARR_ROOT)
    if "labels" not in rel.parts:
        continue
    label_root = zarr.open_group(str(label_path), mode="r")
    dtype = label_root["0"].dtype
    status = "OK" if dtype == np.int32 else f"NEEDS CONVERSION (currently {dtype})"
    print(f"  {rel}: {status}")

  phenotype/1.zarr/A/1/141/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  phenotype/1.zarr/A/1/2/labels/nuclei.zarr: NEEDS CONVERSION (currently uint16)
  phenotype/1.zarr/A/1/5/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  phenotype/1.zarr/A/2/141/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  phenotype/1.zarr/A/2/2/labels/nuclei.zarr: NEEDS CONVERSION (currently uint16)
  phenotype/1.zarr/A/2/5/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  sbs/1.zarr/A/1/0/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  sbs/1.zarr/A/1/2/labels/nuclei.zarr: NEEDS CONVERSION (currently uint16)
  sbs/1.zarr/A/1/32/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  sbs/1.zarr/A/2/0/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
  sbs/1.zarr/A/2/2/labels/nuclei.zarr: NEEDS CONVERSION (currently uint16)
  sbs/1.zarr/A/2/32/labels/nuclei.zarr: NEEDS CONVERSION (currently int64)
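
Casting `int64` masks to `int32` is only safe while every label ID fits in the `int32` range (`uint16` masks always fit), because NumPy's `astype` wraps out-of-range integers silently. A sketch of a hypothetical `to_int32_labels` helper that checks the range before casting; rewriting the label stores themselves is left out.

```python
import numpy as np


def to_int32_labels(labels: np.ndarray) -> np.ndarray:
    """Cast an integer label mask to int32, refusing values that would not survive the cast.

    Hypothetical helper; float or bool masks are rejected rather than guessed at.
    """
    if not np.issubdtype(labels.dtype, np.integer):
        raise TypeError(f"label mask must be an integer array, got {labels.dtype}")
    info = np.iinfo(np.int32)
    if labels.size:
        lo, hi = int(labels.min()), int(labels.max())
        if lo < info.min or hi > info.max:
            raise OverflowError(f"label values outside int32 range: [{lo}, {hi}]")
    return labels.astype(np.int32, copy=False)
```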


---

## 7. Prototype: writing enriched metadata to a zarr store

This shows how to modify a zarr store's `zarr.json` in place.
The undergrad should use this pattern to implement each enrichment.

In [20]:
import shutil

# Work on a copy so we don't modify pipeline outputs
demo_store = Path("./demo_enriched.zarr")
if demo_store.exists():
    shutil.rmtree(demo_store)
shutil.copytree(tile_store, demo_store)

# Read current metadata
with open(demo_store / "zarr.json") as f:
    meta = json.load(f)

# --- Enrich axes with units ---
ms_meta = meta["attributes"]["ome"]["multiscales"][0]
for ax in ms_meta["axes"]:
    if ax["type"] == "space":
        ax["unit"] = "micrometer"

# --- Enrich pixel sizes ---
px_x = ph_meta["pixel_size_x"].iloc[0]
px_y = ph_meta["pixel_size_y"].iloc[0]
for i, ds in enumerate(ms_meta["datasets"]):
    factor = 2**i
    ds["coordinateTransformations"] = [
        {"type": "scale", "scale": [1.0, px_y * factor, px_x * factor]}
    ]

# --- Enrich channel names + contrast limits ---
img = read_image(tile_store)
channels_enriched = []
for i, name in enumerate(phenotype_channels):
    ch_data = img[i]
    p1, p99 = float(np.percentile(ch_data, 1)), float(np.percentile(ch_data, 99))
    dtype_max = (
        int(np.iinfo(ch_data.dtype).max)
        if np.issubdtype(ch_data.dtype, np.integer)
        else 1.0
    )
    channels_enriched.append(
        {
            "label": name,
            "active": True,
            "color": "FFFFFF",
            "window": {"start": p1, "end": p99, "min": 0, "max": dtype_max},
        }
    )
# setdefault avoids a KeyError if the store has no omero block yet
meta["attributes"].setdefault("omero", {})["channels"] = channels_enriched

# --- Write back ---
with open(demo_store / "zarr.json", "w") as f:
    json.dump(meta, f, indent=2)

print("Enriched zarr.json:")
print(json.dumps(meta, indent=2))

Enriched zarr.json:
{
  "attributes": {
    "ome": {
      "version": "0.5",
      "multiscales": [
        {
          "datasets": [
            {
              "path": "0",
              "coordinateTransformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    1.625,
                    1.625
                  ]
                }
              ]
            },
            {
              "path": "1",
              "coordinateTransformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    3.25,
                    3.25
                  ]
                }
              ]
            },
            {
              "path": "2",
              "coordinateTransformations": [
                {
                  "type": "scale",
                  "scale": [
                    1.0,
                    6.5,
                    

In [21]:
# Verify the enriched store still reads correctly
img_enriched = read_image(demo_store)
img_original = read_image(tile_store)
print(
    f"Data unchanged after metadata enrichment: {np.array_equal(img_enriched, img_original)}"
)

# Cleanup
shutil.rmtree(demo_store)

Data unchanged after metadata enrichment: True
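
Before moving this into the pipeline, it can help to run a cheap consistency check on the enriched metadata. The hypothetical `check_multiscales` below only covers what this notebook changes (units on spatial axes, one scale value per axis, spatial scales that do not shrink at coarser levels); it is not a substitute for a full OME-NGFF validator.

```python
def check_multiscales(zarr_json: dict) -> list[str]:
    """Return problems found in the first multiscales entry; an empty list means none found.

    Hypothetical lightweight self-check, not a full OME-NGFF validator.
    """
    problems = []
    ms = zarr_json["attributes"]["ome"]["multiscales"][0]
    axes = ms["axes"]
    space = [i for i, ax in enumerate(axes) if ax.get("type") == "space"]
    for i in space:
        if "unit" not in axes[i]:
            problems.append(f"axis {axes[i]['name']!r} has no unit")
    previous = None
    for ds in ms["datasets"]:
        # Take the first transformation of type "scale", if any
        scale = next(
            (t["scale"] for t in ds.get("coordinateTransformations", []) if t.get("type") == "scale"),
            None,
        )
        if scale is None or len(scale) != len(axes):
            problems.append(f"level {ds['path']}: scale missing or not one value per axis")
            continue
        spatial = [scale[i] for i in space]
        if previous is not None and any(s < p for s, p in zip(spatial, previous)):
            problems.append(f"level {ds['path']}: spatial scale smaller than the previous level")
        previous = spatial
    return problems


# Usage with the notebook's dicts:
# check_multiscales(meta)       # enriched metadata from cell 20
# check_multiscales(tile_meta)  # original metadata: reports the missing y/x units
```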


---

## 8. What needs to happen in the pipeline

Once the enrichment logic is prototyped here, integrate it into the pipeline:

1. **`save_image()` / `write_image_omezarr()`** — Accept and write richer metadata
   (pixel sizes, channel names, contrast limits)
2. **Snakemake rules** — Pass config values (pixel_size_um, channel names) as `params`
3. **Scripts** — Forward `snakemake.params` metadata into `save_image()` calls
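4. **Config YAML** — Add `pixel_size_um` and channel name lists if not already present. A single
   pixel size per module may not be enough: the test metadata mixes 4x tiles (1.625 µm) and 20x
   tiles (0.325 µm), so per-tile values from the hardware metadata parquet are safer.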

### iohub for metadata automation

[iohub](https://github.com/czbiohub-sf/iohub) (CZ Biohub) can read/write OME-Zarr
with rich metadata handling. It may be useful for:
- Automatically deriving metadata from pipeline data (channel names from config,
  pixel sizes from hardware metadata parquets)
- Writing compliant HCS metadata more efficiently than manual zarr.json editing
- Validating our output against the OME-NGFF spec

Evaluate whether iohub can replace or complement the manual metadata writing approach.

### Key question: does metadata propagate through read -> process -> write?

Test this by:
1. Write metadata to a store
2. Read with `read_image()`
3. Process the array (e.g., crop, filter)
4. Write to a new store with `save_image()`
5. Check: is the metadata preserved?

Currently `read_image()` returns a numpy array and discards metadata,
so metadata will NOT propagate automatically. The pipeline needs explicit forwarding.
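
Since `read_image()` drops metadata, a script has to read it separately and pass it on. A minimal sketch, assuming the `zarr.json` layout seen in this notebook (`omero` at the top level of `attributes`, with a fallback to `ome.omero`); `read_ome_metadata` is a hypothetical helper, and the usage lines assume isotropic pixels because the `save_image()` call in the demo cell below takes a single `pixel_size`.

```python
import json
from pathlib import Path


def read_ome_metadata(store: Path) -> dict:
    """Collect level-0 spatial pixel sizes and channel labels from a store's zarr.json.

    Hypothetical helper; the returned keys are illustrative, not save_image()'s signature.
    """
    attrs = json.loads((Path(store) / "zarr.json").read_text())["attributes"]
    ms = attrs["ome"]["multiscales"][0]
    level0 = next(
        t["scale"] for t in ms["datasets"][0]["coordinateTransformations"] if t["type"] == "scale"
    )
    # Map spatial axis names (y, x) to their level-0 scale values
    spatial = {ax["name"]: s for ax, s in zip(ms["axes"], level0) if ax.get("type") == "space"}
    omero = attrs.get("omero") or attrs["ome"].get("omero", {})
    return {
        "pixel_size_y": spatial.get("y"),
        "pixel_size_x": spatial.get("x"),
        "channel_names": [ch["label"] for ch in omero.get("channels", [])],
    }


# Usage inside a pipeline script (src_store and dst_store are placeholders):
# meta = read_ome_metadata(src_store)
# arr = read_image(src_store)  # ... crop, filter, etc.
# save_image(arr, dst_store, pixel_size=meta["pixel_size_x"], channel_names=meta["channel_names"])
```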

### Visual validation with Napari

After enriching metadata, verify it renders correctly in Napari using
`tests/viewer/load_omezarr_in_napari.py`.

```bash
# On your laptop (not cluster -- needs a display)
conda create -n napari-viz -c conda-forge python=3.11 napari zarr numpy -y
conda activate napari-viz

# Per-tile store (direct-write path):
python tests/viewer/load_omezarr_in_napari.py output/sbs/1.zarr/A/1/0/aligned.zarr

# HCS field (labels nested under labels/):
python tests/viewer/load_omezarr_in_napari.py output/sbs/1.zarr/A/1/0
```

What to check:
- Channel names visible (not `c0`, `c1`, ...)
- Scale bar shows correct physical coordinates
- Contrast limits produce sensible default rendering
- Segmentation labels overlay correctly on images

In [22]:
# Demonstrate: metadata does NOT survive read → write roundtrip
demo_src = Path("./demo_src.zarr")
demo_dst = Path("./demo_dst.zarr")

# Write with a demo pixel size (0.65 µm, not this tile's real value) and real channel names
save_image(img_original, demo_src, pixel_size=0.65, channel_names=phenotype_channels)

# Read back (only gets array, metadata is lost)
arr = read_image(demo_src)

# Write to new store (no metadata forwarded)
save_image(arr, demo_dst)

# Compare metadata
with open(demo_src / "zarr.json") as f:
    src_meta = json.load(f)
with open(demo_dst / "zarr.json") as f:
    dst_meta = json.load(f)

src_channels = [ch["label"] for ch in src_meta["attributes"]["omero"]["channels"]]
dst_channels = [ch["label"] for ch in dst_meta["attributes"]["omero"]["channels"]]
print(f"Source channels: {src_channels}")
print(f"Dest channels:   {dst_channels}  ← reverted to generic names")
print("\n→ Metadata must be explicitly forwarded through pipeline scripts.")

# Cleanup
shutil.rmtree(demo_src)
shutil.rmtree(demo_dst)

Source channels: ['DAPI', 'COXIV', 'CENPA', 'WGA']
Dest channels:   ['c0', 'c1', 'c2', 'c3']  ← reverted to generic names

→ Metadata must be explicitly forwarded through pipeline scripts.
