Support for fixed-shape N-dimensional CTable columns by FrancescAlted · Pull Request #637 · Blosc/python-blosc2

FrancescAlted · 2026-05-18T08:11:51Z

Summary:

Added blosc2.ndarray(...) schema support for per-row fixed-shape array columns.
Extended CTable append/extend/read paths to validate and preserve ndarray item shapes.
Added logical Column.shape, ndim, size, and item-shape helpers.
Added ndarray-column reductions and row-wise generated-column helpers.
Improved display/info output for ndarray-backed columns and nested column namespaces.
Extended Arrow/Parquet/CSV/Pandas interop to handle ndarray columns where supported.
Added validation around groupby/indexing Cython helpers and fixed object-key group_reduce sorting with None.
Added/updated tests covering ndarray columns, nested namespace info, groupby fallback sorting, and low-level validation paths.

Changed:

- src/blosc2/ctable.py
  - ndarray column metadata: item_shape, item_ndim, item_size, logical ndim/size
  - tuple inner-axis indexing: t.embedding[:, 0], t.image[:, :, :, 0]
  - direct comparison guards for full ndarray columns
  - string where() ndarray-column guard + 1-D row-mask validation
  - scalar-only guards for sort/index/describe/cov
  - axis-aware reductions: sum/mean/min/max/std/norm(axis=...)
  - RowTransformer + Column.row_transformer
  - add_generated_column()
  - generated-column append/extend autofill
  - generated-column staleness and refresh:
    - refresh_generated_column()
    - refresh_generated_columns(source=...)
  - compact display for ndarray cells
  - Column.summary()
  - Arrow FixedSizeList export/import for ndarray columns
- src/blosc2/groupby.py
  - group-by/aggregate guards for ndarray columns
- src/blosc2/__init__.py
  - exports RowTransformer

Added tests:

- tests/ctable/test_ctable_ndarray_columns.py
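To make the metadata names above concrete, here is a minimal pure-Python sketch of how item_shape, item_ndim, item_size and the (nrows, *item_shape) storage shape relate to each other. `ItemLayout` and `physical_shape` are hypothetical names invented for this illustration; they are not part of blosc2, and the real Column attributes may be computed differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ItemLayout:
    """Hypothetical helper for illustration only; not a blosc2 class."""

    item_shape: tuple[int, ...]

    @property
    def item_ndim(self) -> int:
        # Number of inner axes of each per-row item
        return len(self.item_shape)

    @property
    def item_size(self) -> int:
        # Number of elements in each per-row item
        return prod(self.item_shape)

    def physical_shape(self, nrows: int) -> tuple[int, ...]:
        # Assumption: rows are stacked along a leading axis
        return (nrows, *self.item_shape)


layout = ItemLayout(item_shape=(32, 32, 3))
print(layout.item_ndim, layout.item_size, layout.physical_shape(100))
# 3 3072 (100, 32, 32, 3)
```

Under this layout, an index such as t.image[:, :, :, 0] addresses the row axis first and the three inner axes after it.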

- to_csv(): ndarray column cells are serialized as JSON arrays (e.g., "[1.0, 2.0, 3.0]"). Null ndarray cells write empty CSV fields, matching the scalar null convention.
- from_csv(): ndarray column cells are parsed from JSON arrays and stacked into the proper (nrows, *item_shape) storage. Empty cells for nullable ndarray columns restore the null sentinel. Wrong-shaped JSON arrays raise a clear ValueError with the expected item_shape.
- _csv_ndarray_col_to_array(): new static helper for the JSON→ndarray conversion path.
- to_pandas(): new method. Scalar columns become regular DataFrame columns. Ndarray columns become object-dtype columns whose cells hold NumPy arrays of per-row item_shape.
- from_pandas(): new classmethod. Builds a CTable from a DataFrame using an explicit row_cls schema. Object-dtype columns are NOT automatically inferred as ndarray; the schema must declare blosc2.ndarray fields explicitly. Validates column name matching and rejects non-object columns for ndarray specs.
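The CSV convention described above (JSON arrays per cell, empty fields for nulls, a ValueError on wrong shapes) can be sketched with the standard library alone. This is an illustration of the convention, not the CTable code: `encode_cell`, `decode_cell` and `nested_shape` are hypothetical helpers, nested lists stand in for NumPy arrays, and `None` stands in for the null sentinel.

```python
import csv
import io
import json


def nested_shape(value) -> tuple:
    """Shape of a rectangular nested list; raises ValueError if ragged."""
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    shapes = {nested_shape(v) for v in value}
    if len(shapes) != 1:
        raise ValueError("ragged nested list")
    return (len(value), *shapes.pop())


def encode_cell(cell) -> str:
    # A null cell (None in this sketch) becomes an empty CSV field
    return "" if cell is None else json.dumps(cell)


def decode_cell(text: str, item_shape: tuple, name: str):
    stripped = text.strip()
    if stripped == "":
        return None  # the real code restores the column's null sentinel here
    value = json.loads(stripped)
    shape = nested_shape(value)
    if shape != tuple(item_shape):
        raise ValueError(
            f"Column {name!r}: expected item shape {tuple(item_shape)}, got {shape}"
        )
    return value


rows = [(0, [1.0, 2.0, 3.0]), (1, None), (2, [4.0, 5.0, 6.0])]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "embedding"])
for row_id, cell in rows:
    writer.writerow([row_id, encode_cell(cell)])

buf.seek(0)
reader = csv.reader(buf)
next(reader)  # skip the header row
restored = [(int(rec[0]), decode_cell(rec[1], (3,), "embedding")) for rec in reader]
assert restored == rows
```

The csv module quotes the JSON text because it contains commas, so each array survives as a single field.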

Copilot

Pull request overview

This PR adds first-class support for fixed-shape N-dimensional array (“ndarray”) columns in blosc2.CTable, including schema representation, storage/layout changes, validation, reductions, generated-column helpers, and multiple interop paths (Arrow/Parquet/CSV/Pandas), plus expanded test coverage.

Changes:

Introduces NDArraySpec / blosc2.ndarray(...) schema support and compiler/validation plumbing for fixed-shape per-row arrays.
Extends CTable/Column to preserve ndarray item shapes through append/extend/read, enable tuple inner slicing, axis-aware reductions, and generated columns via RowTransformer (with stale tracking + refresh APIs).
Updates groupby/indexing Cython helpers with additional validation and improves group-reduce sorting behavior with None.
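The generated-column behavior summarized above (autofill on append, stale tracking, refresh, and an error when a stale column is read) can be pictured with a toy model. Everything here is hypothetical: `ToyTable` and `StaleColumnError` are invented for the sketch, and it assumes that overwriting a source cell is what marks the generated column stale, which this page does not confirm. Only the method name `refresh_generated_column()` is borrowed from the PR description.

```python
import math


class StaleColumnError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


class ToyTable:
    """Toy stand-in with one source and one generated column; not blosc2.CTable."""

    def __init__(self, transform):
        self._transform = transform  # row-wise function: source cell -> generated cell
        self._source = []
        self._generated = []
        self._stale = False

    def append(self, cell):
        # Autofill: the generated cell is computed as the row is appended
        self._source.append(cell)
        self._generated.append(self._transform(cell))

    def overwrite_source(self, index, cell):
        # Assumption: editing a source cell invalidates the generated column
        self._source[index] = cell
        self._stale = True

    @property
    def generated(self):
        if self._stale:
            raise StaleColumnError("generated column is stale; refresh it first")
        return list(self._generated)

    def refresh_generated_column(self):
        self._generated = [self._transform(c) for c in self._source]
        self._stale = False


t = ToyTable(transform=lambda v: math.sqrt(sum(x * x for x in v)))
t.append([3.0, 4.0])
print(t.generated)  # [5.0]

t.overwrite_source(0, [6.0, 8.0])
try:
    t.generated
except StaleColumnError as exc:
    print("stale:", exc)

t.refresh_generated_column()
print(t.generated)  # [10.0]
```

Raising on a stale read, rather than silently returning outdated values, matches the intent of the "exception when accessed is raised now" commit in the timeline.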

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/ndarray/test_indexing.py | Adds tests ensuring Cython indexing helpers validate input shapes/lengths. |
| tests/ctable/test_groupby.py | Adds tests for object-key sorting (including `None`) and shape validation for checked kernels. |
| tests/ctable/test_ctable_ndarray_columns.py | New comprehensive tests for ndarray columns (metadata, slicing, reductions, nullable behavior, Arrow roundtrip, generated columns + staleness). |
| tests/ctable/test_ctable_dataclass_schema.py | Tests ndarray schema roundtrip/persistence and nested namespace info helpers. |
| tests/ctable/test_csv_interop.py | Adds CSV and Pandas interop tests for fixed-shape ndarray columns (including nullable behavior). |
| tests/ctable/test_column.py | Updates expectations for `Column.info_items` (logical `shape`/`nrows` instead of logical/physical length). |
| src/blosc2/schema.py | Adds `NDArraySpec` and `ndarray()` schema constructor with metadata serialization. |
| src/blosc2/schema_vectorized.py | Adds batch validation logic for ndarray columns (including null-sentinel handling). |
| src/blosc2/schema_validation.py | Extends null-masking logic to recognize ndarray null sentinels during row validation. |
| src/blosc2/schema_compiler.py | Adds schema compiler support for `NDArraySpec` (metadata kind mapping, display width, annotation validation, deserialization). |
| src/blosc2/indexing_ext.pyx | Adds validation guards for array dimensionality, length matching, and bounds in indexing kernels. |
| src/blosc2/groupby.py | Adds ndarray guardrails for group-by/agg, optional-kernel fallback handling, and improved object-key sorting with `None`. |
| src/blosc2/groupby_ext.pyx | Adds shape validation for checked dense groupby kernels. |
| src/blosc2/ctable.py | Core implementation: ndarray physical shapes, coercion, tuple indexing, reductions w/ `axis`, `RowTransformer`, generated columns + staleness/refresh, improved info/display, Arrow/CSV/Pandas interop, and additional safety guards. |
| src/blosc2/__init__.py | Exports `ndarray`, `NDArraySpec`, and `RowTransformer`; keeps compatibility attrs on `blosc2.ndarray`. |
| plans/ctable-ndarray-cols.md | Design/status plan document for ndarray columns and generated-column behavior. |
| plans/ctable-ndarray-cols-copilot-sonnet.md | Additional design/implementation planning notes for ndarray columns. |
| doc/reference/reduction_functions.rst | Updates reduction function reference listing (currently drops `group_reduce`). |
| doc/reference/ctable.rst | Enhances `CTable` reference docs around `__getitem__`/`__getattr__` and attribute-vs-item column access. |

Copilot

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.

    order = list(acc)
    if sort:
-        order.sort(key=lambda k: (1, "") if k is _NAN_KEY else (0, display[k]))
+        order.sort(key=lambda k: _sortable_key_part(display[k]))
    groups = np.asarray([display[k] for k in order], dtype=keys.dtype)
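The body of `_sortable_key_part` is not shown in this hunk. As a hedged illustration of the general technique, the sketch below ranks `None` after every real value by wrapping each key in a (rank, value) tuple, which mirrors the `(1, "")` / `(0, ...)` pattern of the removed line. `sortable_key_part` is a hypothetical stand-in, and placing `None` last is an assumption.

```python
def sortable_key_part(value):
    """Hypothetical stand-in; the real helper's behavior may differ."""
    if value is None:
        # Rank 1 sorts after every rank-0 entry, so None is never
        # compared directly against a str or int
        return (1, "")
    return (0, value)


keys = ["pear", None, "apple", "fig"]
print(sorted(keys, key=sortable_key_part))
# ['apple', 'fig', 'pear', None]
```

Because tuples compare element by element, the second slot is only consulted when both ranks are equal, which is what keeps mixed `None`/value keys from raising TypeError.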


+        new_cols: dict[str, blosc2.NDArray] = {}
+        for col in schema.columns:
+            shape = cls._column_physical_shape(col, capacity)
+            chunks, blocks = cls._column_chunks_blocks(col, shape)
+            new_cols[col.name] = mem_storage.create_column(
+                col.name,
+                dtype=col.dtype,
+                shape=shape,
+                chunks=chunks,
+                blocks=blocks,
+                cparams=None,
+                dparams=None,
+            )


+            if isinstance(nv, float) and np.isnan(nv):
+                elem_mask = np.isnan(arr)
+            else:
+                elem_mask = arr == nv
+            inner_axes = tuple(range(1, elem_mask.ndim))
+            return elem_mask.all(axis=inner_axes) if inner_axes else elem_mask.astype(np.bool_)
        if isinstance(nv, float) and np.isnan(nv):
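The added lines in the hunk above reduce an element-wise sentinel comparison to one boolean per row. The following standalone NumPy sketch wraps the same logic in a function so it can be tried in isolation; `row_null_mask` is a hypothetical name, and the rule that a row counts as null only when every element equals the sentinel is taken from the `.all(axis=inner_axes)` call in the diff.

```python
import numpy as np


def row_null_mask(arr: np.ndarray, null_value) -> np.ndarray:
    """Return a 1-D boolean mask that is True where a whole row is the sentinel."""
    if isinstance(null_value, float) and np.isnan(null_value):
        # NaN never compares equal to itself, so it needs isnan()
        elem_mask = np.isnan(arr)
    else:
        elem_mask = arr == null_value
    # Collapse every axis except the leading row axis
    inner_axes = tuple(range(1, elem_mask.ndim))
    return elem_mask.all(axis=inner_axes) if inner_axes else elem_mask.astype(np.bool_)


data = np.array([[1.0, 2.0], [np.nan, np.nan], [np.nan, 3.0]])
print(row_null_mask(data, float("nan")).tolist())
# [False, True, False]
```

The third row contains one NaN but is not reported as null, since only a fully sentinel-filled item marks a missing cell.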


+        if isinstance(col.spec, NDArraySpec):
+            try:
+                arr = np.asarray(val, dtype=col.spec.dtype)
+                is_null = arr.shape == col.spec.item_shape and bool(
+                    np.isnan(arr).all() if isinstance(nv, float) and math.isnan(nv) else (arr == nv).all()
+                )


+            if stripped == "" and null_value is not None:
+                rows.append(np.full(item_shape, null_value, dtype=dtype))
+            else:
+                arr = np.array(json.loads(stripped), dtype=dtype)
+                if arr.shape != item_shape:
+                    raise ValueError(
+                        f"Column {col.name!r}: expected item shape {item_shape}, got {arr.shape}"
+                    )
+                rows.append(arr)
+


+        if self.nullable:
+            d["nullable"] = True
+        if hasattr(self, "null_value"):
+            d["null_value"] = self.null_value
+        return d


- DictionarySpec
- ListSpec
- VLStringSpec / VLBytesSpec
- other varlen scalar specs

It now fills data via obj.extend(...), so the normal CTable ingestion paths handle dictionary/list/varlen/ndarray/scalar columns consistently. Pandas missing values (None, NaN, pandas.NA) are normalized to None for special/object-style columns before ingestion.

FrancescAlted added 14 commits May 16, 2026 09:34

First implementation

5cb8634

Add plans for ndarray follow-up

263d6be

Updated plan with add_generated_column()

9b02394

Updated plan with new blosc2.transform() descriptor

bf683c1

Better docstrings for add_computed_column() and add_generated_column()

8364755

Be explicit about CTable.__getattr__ attribute-style column access limitations

ea23f9a

Safer handling of stalled generated column; an exception when accessed is raised now

10d8024

Nullable ndarray columns are implemented now

94bdde7

Add a .info to NestedColumnNamespace

b865b11

Add shape, chunks, blocks to Column.info

877e2bf

Allow object keys containing None with sortable values

4242edd

Added missing validation for several functions in Cython extensions

ecb981d

FrancescAlted requested a review from Copilot May 18, 2026 08:12

Copilot started reviewing on behalf of FrancescAlted May 18, 2026 08:12

Copilot AI reviewed May 18, 2026

Comment thread doc/reference/reduction_functions.rst

Comment thread src/blosc2/ctable.py Outdated

Comment thread src/blosc2/ctable.py Outdated

Comment thread src/blosc2/ctable.py

FrancescAlted added 4 commits May 18, 2026 10:43

Add docstrings for blosc2.group_reduce() in sphinx

fb39e4c

Small optimization when converting NDArray objects to a Python list

e65002c

Avoid a full Python object materialization of the batch in iter_arrow_batches()

08ac7ae

Small optimization that can help in workloads with many materialized/generated columns

ff66694

FrancescAlted requested a review from Copilot May 18, 2026 09:02

Copilot started reviewing on behalf of FrancescAlted May 18, 2026 09:03

Copilot AI reviewed May 18, 2026

FrancescAlted added 6 commits May 18, 2026 11:25

Changed _group_reduce_numpy() sorting to use a dedicated _group_reduce_sort_key()

2753bb9

Changed Column._null_mask_for() so NaN sentinels are better detected

623f67c

NumPy floating NaN sentinels are treated like Python float('nan')

c04f5d8

NumPy scalar null sentinels are normalized to plain Python scalars

73a2ce8

Better error handling in CTable._csv_ndarray_col_to_array()

2a0d19a

FrancescAlted merged commit b3d514e into main May 18, 2026
17 checks passed

FrancescAlted deleted the ndim-cols branch May 18, 2026 10:35

FrancescAlted merged 24 commits into main from ndim-cols

2 participants
