
CTable: Columnar Compressed Table for Blosc2 #614

Merged
FrancescAlted merged 22 commits into Blosc:ctable4 from Jacc4224:my_ctable3
Apr 8, 2026

Conversation


@Jacc4224 Jacc4224 commented Apr 7, 2026

CTable: Columnar Compressed Table for Blosc2

This PR introduces CTable, a typed columnar table backed by blosc2.NDArray per column, with full
schema validation, persistence, and interoperability.

Schema layer

  • @dataclass + blosc2.field() spec primitives: int8/16/32/64, uint8/16/32/64, float32/64, bool,
    complex64/128, string, bytes
  • Pydantic-backed row validation (append) and vectorized NumPy bulk validation (extend)
  • Schema serialization/deserialization for persistence
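
The vectorized bulk-validation path can be sketched in plain NumPy: whole-column checks instead of per-row Pydantic calls. This is an illustrative standalone sketch, not the PR's actual code — `validate_batch`, its column names, and its limits are all hypothetical:

```python
import numpy as np

def validate_batch(names, ages, max_name_len=16, age_min=0, age_max=150):
    # Vectorized checks over whole columns, in the spirit of the extend() path.
    names = np.asarray(names, dtype=np.str_)
    ages = np.asarray(ages)
    bad_len = np.char.str_len(names) > max_name_len   # string-length constraint
    bad_age = (ages < age_min) | (ages > age_max)     # numeric range constraint
    bad = bad_len | bad_age
    if bad.any():
        # Re-raise as a plain ValueError, listing every offending row at once
        raise ValueError(f"invalid rows at indices {np.flatnonzero(bad).tolist()}")
    return True

validate_batch(["ada", "grace"], [36, 45])   # passes
```

One `np.char.str_len` call replaces a Python-level loop over every string cell, which is where the bulk path wins over row-at-a-time validation.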

Core table operations

  • append() / extend() with schema validation
  • Tombstone deletion model: delete() sets a mask, compact() closes gaps
  • sort_by(cols, ascending, inplace) — single and multi-column stable sort
  • where(expr) row-filter views, select(cols) column-projection views (no data copy)
  • Schema mutations: add_column, drop_column, rename_column
  • Column aggregates: sum, min, max, mean, std, any, all, unique, value_counts
  • Table-level: describe(), cov(), head(), tail(), sample(), info()
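
The tombstone model and the multi-column stable sort can both be sketched in plain NumPy (`TinyColumn` is a hypothetical stand-in, not the PR's Column class):

```python
import numpy as np

class TinyColumn:
    """Minimal single-column store illustrating the tombstone model:
    delete() only flips a validity mask; compact() rewrites without gaps."""
    def __init__(self, data):
        self.data = np.asarray(data)
        self.valid = np.ones(len(self.data), dtype=bool)

    def delete(self, idx):
        self.valid[idx] = False            # tombstone: no data movement

    def compact(self):
        self.data = self.data[self.valid]  # close the gaps in one pass
        self.valid = np.ones(len(self.data), dtype=bool)

col = TinyColumn([10, 20, 30, 40])
col.delete(1)
col.compact()          # col.data is now [10, 30, 40]

# Multi-column stable sort: np.lexsort sorts by the *last* key first,
# so keys are passed in reverse priority order.
city = np.array(["b", "a", "b", "a"])
temp = np.array([3, 1, 2, 4])
order = np.lexsort((temp, city))   # sort by city, then temp within city
```

Keeping deletion as a mask flip makes delete() cheap and leaves physical row order intact until compact() is explicitly requested.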

Persistence

  • File-backed NDArrays: one .b2nd per column + _valid_rows.b2nd + _meta.b2frame
  • CTable(Row, urlpath=...), CTable.open(), t.save(), CTable.load()
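
The per-column file layout can be illustrated with a small stdlib helper; `expected_layout` is hypothetical, and only the file names themselves come from the PR description:

```python
from pathlib import Path

def expected_layout(root, columns):
    # One .b2nd per column, plus the validity mask and metadata frame
    # named in the PR description.
    root = Path(root)
    return {root / f"{name}.b2nd" for name in columns} | {
        root / "_valid_rows.b2nd",
        root / "_meta.b2frame",
    }

paths = expected_layout("climate.ctable", ["city", "temp"])
```

Storing each column in its own file means reads of a projected view (`select(cols)`) never have to touch the other columns' data on disk.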

Interoperability

  • Arrow: to_arrow() → pyarrow.Table, from_arrow(arrow_table) → CTable — direct bulk-write per
    column, no row-level overhead
  • CSV: to_csv(path), from_csv(path, row_cls) → CTable — stdlib only, chunk-aware, respects
    deletions

Tests & examples

  • 363 tests across construct, validation, delete, compact, sort, aggregates, persistence, Arrow
    interop, CSV interop
  • 9 standalone example scripts (examples/ctable/) covering every feature
  • Benchmark: pandas ↔ CTable roundtrip pipeline (bench/ctable/bench_pandas_roundtrip.py)
  • Jupyter notebook tutorial (examples/ctable/ctable_tutorial.ipynb) with a 10-city climate
    dataset as the running example

Jacc4224 and others added 22 commits March 26, 2026 11:05
Introduce CTable, a new columnar table class for efficient in-memory
data storage using Blosc2 as the underlying compression engine.

Each column is represented as a Column object wrapping a blosc2.NDArray
with typed, compressed storage. Building on top of blosc2's existing
infrastructure, CTable supports append, iteration and
column-based queries.

This is an early-stage (beta) implementation; the table is always fully
loaded in memory.

New files:
- src/blosc2/ctable.py: CTable and Column class definitions
- tests/ctable/: unit tests covering construction, slicing, deletion,
  compaction and row logic
- bench/ctable/: benchmarks comparing CTable against pandas
Add CTable, a columnar in-memory table built on top of blosc2
  - Add schema.py with spec primitives: int8/16/32/64, uint8/16/32/64,
    float32/64, bool, complex64/128, string, bytes — sharing a _NumericSpec
    mixin to avoid boilerplate
  - Add schema_compiler.py: compile_schema(), CompiledSchema/Column/Config,
    schema_to_dict() / schema_from_dict() for persistence groundwork
  - Export all spec types and field() from blosc2 namespace

  Validation:
  - Add schema_validation.py: Pydantic-backed row validation for append(),
    cached per schema, re-raised as plain ValueError
  - Add schema_vectorized.py: vectorized NumPy constraint checks for extend(),
    using np.char.str_len() for string/bytes columns
  - validate= per-call override on extend() (None inherits table default)

  CTable refactor:
  - Constructor accepts dataclass schemas; legacy Pydantic adapter kept
  - Schema introspection: table.schema, column_schema(), schema_dict()
  - _last_pos cache eliminates backward chunk scan on every append/extend
  - _grow() shared resize helper; delete() writes back in-place without
    creating a new array; _n_rows updated by subtraction not count_nonzero
  - head() and tail() unified through _find_physical_index()

  Tests and docs:
  - 135 tests across 10 test files, all passing
  - plans/ctable-implementation-log.md and ctable-user-guide.md added
  - Benchmarks: bench_validation.py and bench_append_regression.py
…QoL)

  Persistency:
    - FileTableStorage backend: disk layout _meta.b2frame / _valid_rows.b2nd / _cols/<name>.b2nd
    - CTable(Row, urlpath=..., mode="w"/"a"/"r"), CTable.open(), CTable.save(), CTable.load()
    - Read-only mode blocks all writes; save() always writes compacted rows

  Column aggregates: sum, min, max, mean, std, any, all (chunk-aware via iter_chunks)
  Column utilities: unique(), value_counts(), assign(), boolean mask __getitem__/__setitem__

  Schema mutations: add_column (fills default for existing rows), drop_column, rename_column
    - All three update schema, handle disk files, and block on views

  View mutability model fix:
    - Views allow value writes (assign, __setitem__) — only structural mutations are blocked
    - _read_only=True reserved for mode="r" disk tables; base is not None guards structural ops

  QoL: __str__ pandas-style, __repr__, cbytes/nbytes, sample(n), Column.iter_chunks(size)

  Tests: 258 tests, ~5s — new test_persistency.py (33), test_schema_mutations.py (41),
    expanded test_column.py; optimized helpers to use to_numpy() instead of row[i]
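
The chunk-aware aggregate pattern described above (iter_chunks feeding a running accumulator, so no column is ever fully materialized) can be sketched as follows; `chunked_mean` is an illustrative stand-in, not the PR's code:

```python
import numpy as np

def chunked_mean(chunks):
    # Accumulate partial sums and counts per chunk; only one chunk's
    # worth of data is decompressed/live at a time.
    total, count = 0.0, 0
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        total += chunk.sum()
        count += chunk.size
    return total / count

chunked_mean([[1, 2], [3, 4, 5]])   # -> 3.0
```

The same accumulator shape works for sum, min, max, any, and all; std needs a second accumulated moment but follows the identical chunk loop.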
Arrow compatibility
Examples
Tutorial
@FrancescAlted FrancescAlted changed the title from "Ctable 4 request" to "CTable: Columnar Compressed Table for Blosc2" on Apr 8, 2026
@FrancescAlted FrancescAlted merged commit a3852b6 into Blosc:ctable4 Apr 8, 2026
6 of 12 checks passed
@FrancescAlted FrancescAlted mentioned this pull request Apr 8, 2026