TreeStore to accept both CTable and NDArray as leaves #633

Merged: FrancescAlted merged 10 commits into main from ts-ctable-ndarray on May 8, 2026

FrancescAlted commented on May 8, 2026

This PR makes CTable objects first-class high-level leaf objects in TreeStore.

A TreeStore can now contain regular Blosc2 leaves, such as NDArray, together with inline CTable objects:

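import blosc2

# `table` is an existing blosc2.CTable instance
with blosc2.TreeStore("bundle.b2z", mode="w") as ts:
    ts["/x"] = blosc2.arange(10)  # regular NDArray leaf
    ts["/table"] = table          # CTable stored inline

with blosc2.open("bundle.b2z", mode="r") as ts:
    x = ts["/x"]          # NDArray
    table = ts["/table"]  # CTable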

   TreeStore can now store CTable objects as first-class inline leaves
   alongside NDArrays:

       ts["/arr"] = blosc2.arange(10)
       ts["/table"] = ctable          # CTable stored inline
       table = ts["/table"]           # returns CTable transparently

   Physical layout: CTable internals (_meta, _valid_rows, _cols/*, _indexes/*)
   are stored as ordinary Blosc2 leaves inside the outer store's working
   directory, avoiding nested ZIPs and keeping everything directly addressable
   by offset in .b2z bundles.
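
   To make the layout concrete, here is a minimal sketch in plain Python (not blosc2 code) of how a table's logical component keys could be prefixed with its object root so that each component becomes an ordinary leaf path. The helper name `physical_paths` and the column names are hypothetical; only the component names (_meta, _valid_rows, _cols/*, _indexes/*) come from the description above.

```python
import posixpath


def physical_paths(root, columns, indexes=()):
    """Return the leaf paths an inline table rooted at `root` might occupy.

    Illustrative only: the real mapping is done by the storage backend.
    """
    logical = ["_meta", "_valid_rows"]
    logical += [f"_cols/{name}" for name in columns]
    logical += [f"_indexes/{name}" for name in indexes]
    # Every logical key lives directly under the object root, so no nested
    # container is needed to hold the table.
    return [posixpath.join(root, key) for key in logical]


paths = physical_paths("/table", columns=["id", "score"], indexes=["id"])
for path in paths:
    print(path)
```

   Because every component is a flat leaf path under the root, the outer store can address each one the same way it addresses any other leaf.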

   Implementation:
   - Add TreeStoreTableStorage backend in ctable_storage.py that maps CTable
     logical keys onto an outer TreeStore's map_tree/working_dir
   - Refactor CTable.save() and CTable.open() around shared _save_to_storage()
     and _open_from_storage() helpers; add private _save_to_treestore() and
     _open_from_treestore() used by TreeStore
   - Add persistent object registry in TreeStore (embed-store vlmeta) to
     track object roots; probe _/<key>/_meta as fallback for old stores
   - Update TreeStore.__setitem__ to dispatch CTable and block writes to
     object internals
   - Update TreeStore.__getitem__ to return CTable for registered object roots
   - Update TreeStore.__delitem__ to remove all physical leaves of an object
     root and unregister it; block direct deletion of internals
   - Update keys(), __contains__, walk(), get_children(), get_descendants()
     to treat object roots as opaque leaves and hide their internals
   - get_subtree() raises ValueError on object root paths
   - TreeStore.close()/discard() flush inline CTable handles before repacking
   - Add 30 new tests covering b2d/b2z, append mode, traversal, guards,
     deletion, roundtrips, and string columns
   - Add plans/tree_store_ctable_ndarray.md with design rationale
   - TreeStore.values() now collapses object roots
   - Parent subtree deletion removes nested inline CTable objects
   - Missing object registry fallback now hides/protects inline CTable internals
   - Inline CTable indexes work in .b2d and .b2z
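
   The object-root semantics listed above can be sketched with a toy, dict-backed store. This is not the TreeStore implementation: `ToyStore`, `put_object`, and `_owner` are hypothetical names, and refusing plain writes to an existing object root is a simplification made for this sketch. It only illustrates three of the listed behaviours: internals are hidden from keys(), direct writes to and deletions of internals are blocked, and deleting an object root removes all of its physical leaves.

```python
class ToyStore:
    """Plain-dict stand-in for a tree store with opaque object roots."""

    def __init__(self):
        self._leaves = {}           # physical path -> value
        self._object_roots = set()  # registry of object roots

    def _owner(self, key):
        """Return the object root that holds `key` as an internal, or None."""
        for root in self._object_roots:
            if key.startswith(root + "/"):
                return root
        return None

    def put_object(self, root, components):
        """Store an object as several physical leaves and register its root."""
        for name, value in components.items():
            self._leaves[f"{root}/{name}"] = value
        self._object_roots.add(root)

    def __setitem__(self, key, value):
        # Toy simplification: plain writes may not touch a root or its internals.
        if key in self._object_roots or self._owner(key) is not None:
            raise ValueError(f"{key!r} belongs to an object root")
        self._leaves[key] = value

    def __delitem__(self, key):
        if self._owner(key) is not None:
            raise ValueError("cannot delete object internals directly")
        if key in self._object_roots:
            # Remove every physical leaf under the root, then unregister it.
            prefix = key + "/"
            for path in [p for p in self._leaves if p.startswith(prefix)]:
                del self._leaves[path]
            self._object_roots.discard(key)
        else:
            del self._leaves[key]

    def keys(self):
        """List visible keys: plain leaves plus object roots, no internals."""
        visible = {p for p in self._leaves if self._owner(p) is None}
        return sorted(visible | self._object_roots)


store = ToyStore()
store["/x"] = list(range(10))
store.put_object("/table", {"_meta": {}, "_valid_rows": [True], "_cols/a": [1]})
visible_before = store.keys()

try:
    store["/table/_cols/a"] = [2]
    blocked = False
except ValueError:
    blocked = True

del store["/table"]
visible_after = store.keys()
```

   In this toy, `visible_before` lists the table as a single key next to `/x`, the write to an internal is refused, and deleting `/table` leaves only `/x` behind.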

Copilot AI left a comment

Pull request overview

This PR extends blosc2.TreeStore so that a single bundle can hold both array leaves (e.g., NDArray) and higher-level table objects (CTable). Each CTable is stored as a collapsed “object root”: TreeStore traversal and deletion treat it as a single key, while its physical components are stored inline under that key.

Changes:

  • Add inline CTable support to TreeStore via an object-root registry + fallback probing of physical /_meta manifests, plus protections to hide/block direct access to object internals.
  • Refactor CTable persistence/opening around a storage-backend helper and add a TreeStoreTableStorage backend for inline storage under a TreeStore subtree.
  • Add tests, examples, and documentation demonstrating mixed NDArray/CTable bundles and expected traversal/deletion behavior.
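
The registry-plus-fallback idea in the first bullet can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the library's code: `find_object_roots` is a hypothetical helper, the registry is modelled as a plain list, and the fallback simply treats any path ending in `/_meta` as marking an object root.

```python
def find_object_roots(physical_paths, registry=None):
    """Return the set of object roots for a store.

    If a registry is available it is trusted as-is; otherwise roots are
    inferred by probing for a `<root>/_meta` leaf (the fallback case).
    """
    if registry is not None:
        return set(registry)
    suffix = "/_meta"
    return {
        path[: -len(suffix)]
        for path in physical_paths
        if path.endswith(suffix) and len(path) > len(suffix)
    }


paths = ["/x", "/table/_meta", "/table/_cols/a", "/group/t2/_meta"]
probed = find_object_roots(paths)                           # no registry: probe
registered = find_object_roots(paths, registry=["/table"])  # registry wins
```

Probing keeps stores without a registry readable, while the registry avoids scanning every path on each open.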

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/blosc2/tree_store.py | Adds object-root registry/probing, CTable dispatch on get/set/del, hides internals in traversal, and manages inline object handles on close/discard. |
| src/blosc2/ctable_storage.py | Introduces TreeStoreTableStorage to persist CTable components inline inside an outer TreeStore; adjusts index-catalog path handling helpers. |
| src/blosc2/ctable.py | Refactors open/save paths via _open_from_storage/_save_to_storage and adds private TreeStore inline save/open hooks. |
| src/blosc2/indexing.py | Avoids reopening .b2z bundle urlpaths for offset-backed leaves during bucket worker setup. |
| tests/test_tree_store.py | Adds comprehensive tests for mixed NDArray/CTable storage, traversal hiding, conflict rules, deletion semantics, append mode, and b2d/b2z roundtrips. |
| examples/tree-store.py | Updates example to demonstrate storing, appending to, and deleting an inline CTable inside a TreeStore. |
| doc/getting_started/tutorials/13.containers.ipynb | Updates tutorial to document inline CTable support in TreeStore and provides a walkthrough example. |
| plans/tree_store_ctable_ndarray.md | Adds a detailed design/plan document describing the inline-subtree approach, registry, semantics, and test plan. |

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI and others added 4 commits May 8, 2026 16:32
Agent-Logs-Url: https://github.com/Blosc/python-blosc2/sessions/973a79eb-d0c3-43ac-8008-54a9bd392be0

Co-authored-by: FrancescAlted <314521+FrancescAlted@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Blosc/python-blosc2/sessions/83225dd5-9d5c-4ca4-ba33-6ab4021b552f

Co-authored-by: FrancescAlted <314521+FrancescAlted@users.noreply.github.com>
FrancescAlted merged commit 1179928 into main on May 8, 2026
17 checks passed
FrancescAlted deleted the ts-ctable-ndarray branch on May 8, 2026 17:16

3 participants