Merged
116 changes: 70 additions & 46 deletions doc/getting_started/tutorials/13.containers.ipynb
@@ -5,22 +5,7 @@
"id": "cell-01",
"metadata": {},
"source": [
"# Working with Containers\n",
"\n",
"This notebook is a guided tour of the main data containers in `python-blosc2`.\n",
"\n",
"The goal is to build a practical mental model first: what each container is, how the containers relate, and when each one is the right tool.\n",
"\n",
"We will cover these containers in this order:\n",
"\n",
"1. `SChunk`\n",
"2. `NDArray`\n",
"3. `ObjectArray`\n",
"4. `BatchArray`\n",
"5. `EmbedStore`\n",
"6. `DictStore`\n",
"7. `TreeStore`\n",
"8. `C2Array`"
"# Working with Containers\n\nThis notebook is a guided tour of the main data containers in `python-blosc2`.\n\nThe goal is to build a practical mental model first: what each container is, how the containers relate, and when each one is the right tool.\n\nWe will cover these containers in this order:\n\n1. `SChunk`\n2. `NDArray`\n3. `ObjectArray`\n4. `BatchArray`\n5. `EmbedStore`\n6. `DictStore`\n7. `TreeStore` (including inline `CTable` support)\n8. `C2Array`"
]
},
{
@@ -444,6 +429,73 @@
" show(\"/exp/run2/data\", tstore[\"/exp/run2/data\"][:])"
]
},
{
"cell_type": "markdown",
"id": "cell-17b",
"metadata": {},
"source": [
"### Storing CTables inside a TreeStore\n",
"\n",
"A `TreeStore` can hold **both NDArrays and CTables** in the same bundle. A `CTable` is stored inline as a named subtree — all its columns, metadata, and index sidecars live as ordinary Blosc2 leaves inside the outer store. From the outside it appears as a single key, exactly like any other leaf:\n",
"\n",
"* `ts[\"/table\"] = ctable` — stores the CTable inline (same syntax as NDArray).\n",
"* `ts[\"/table\"]` — returns a `CTable` object transparently.\n",
"* `\"/table/_meta\" not in ts` — internal keys are hidden from normal traversal.\n",
"* `del ts[\"/table\"]` — removes the whole object and all its leaves at once.\n",
"\n",
"The inline layout means there are **no nested ZIP files**: all leaves are flat members of the outer `.b2z` archive and can be opened by offset without extraction."
]
},
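The "flat members, no nested ZIP files" layout described above can be illustrated with the standard `zipfile` module alone. The member names below are hypothetical stand-ins for the leaves a `TreeStore` might write — a real `.b2z` bundle is produced by `TreeStore` itself — but the structural point holds: every leaf is a top-level archive member that can be located by its header offset without extracting anything.

```python
import io
import zipfile

# Build a toy archive in memory with flat member names that mimic how a
# CTable's internal leaves might sit next to an ordinary NDArray leaf.
# (Member names are illustrative, not the actual TreeStore layout.)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("raw/signal.b2nd", b"...")                 # plain NDArray leaf
    zf.writestr("tables/readings/_meta", b"...")           # CTable sidecar leaf
    zf.writestr("tables/readings/sensor_id.b2nd", b"...")  # CTable column leaf

# Every member is a top-level entry of the outer archive: there is no
# ZIP-inside-a-ZIP, so each leaf is reachable by offset without extraction.
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, info.header_offset)
```

The same property is what lets a reader seek directly to one leaf of a large bundle instead of unpacking the whole archive first.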
{
"cell_type": "code",
"execution_count": null,
"id": "cell-17c",
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"\n",
"@dataclass\n",
"class Reading:\n",
" sensor_id: int = 0\n",
" value: float = 0.0\n",
"\n",
"\n",
"bundle_path = reset(\"bundle.b2z\")\n",
"\n",
"# --- Write: mix NDArrays and CTables in one bundle ----------------------\n",
"t = blosc2.CTable(Reading)\n",
"for i in range(6):\n",
" t.append(Reading(sensor_id=i, value=round(i * 1.1, 2)))\n",
"\n",
"with blosc2.TreeStore(bundle_path, mode=\"w\") as ts:\n",
" ts[\"/raw/signal\"] = np.arange(8, dtype=np.float32)\n",
" ts[\"/tables/readings\"] = t # CTable stored inline\n",
" show(\"keys after write\", sorted(ts.keys()))\n",
" show(\"/tables/readings/_meta in ts (hidden)\", \"/tables/readings/_meta\" in ts)\n",
"\n",
"# --- Read back from the .b2z archive ------------------------------------\n",
"with blosc2.open(bundle_path, mode=\"r\") as ts:\n",
" readings = ts[\"/tables/readings\"] # returns CTable transparently\n",
" show(\"type\", type(readings).__name__)\n",
" show(\"rows\", len(readings))\n",
" show(\"sensor_id\", list(readings[\"sensor_id\"][:]))\n",
" show(\"value\", list(readings[\"value\"][:]))\n",
"\n",
"# --- Append a row in-place (append mode) --------------------------------\n",
"with blosc2.TreeStore(bundle_path, mode=\"a\") as ts:\n",
" r = ts[\"/tables/readings\"]\n",
" r.append(Reading(sensor_id=99, value=-1.0))\n",
" r.close() # optional; outer store also closes it on __exit__\n",
" show(\"rows after append\", len(ts[\"/tables/readings\"]))\n",
"\n",
"# --- Delete the CTable (all internal leaves removed) -------------------\n",
"with blosc2.TreeStore(bundle_path, mode=\"a\") as ts:\n",
" del ts[\"/tables/readings\"]\n",
" show(\"keys after delete\", sorted(ts.keys()))"
]
},
{
"cell_type": "markdown",
"id": "cell-18",
@@ -494,43 +546,15 @@
"id": "cell-20",
"metadata": {},
"source": [
"## Choosing The Right Container\n",
"\n",
"| Container | Backing idea | Best for |\n",
"| --- | --- | --- |\n",
"| `SChunk` | raw compressed chunks | direct chunk-level storage control |\n",
"| `NDArray` | `SChunk` plus array metadata | dense numeric arrays |\n",
"| `ObjectArray` | one variable-length entry per chunk | ragged or heterogeneous Python values |\n",
"| `BatchArray` | one batch per chunk | batch-oriented ingestion and access |\n",
"| `EmbedStore` | one bundled object store | packaging a few Blosc2 objects together |\n",
"| `DictStore` | keyed collection of leaves | portable multi-object datasets |\n",
"| `TreeStore` | hierarchical keyed collection | tree-structured datasets |\n",
"| `C2Array` | remote array handle | arrays hosted by a remote Caterva2 service |\n",
"\n",
"A simple rule of thumb is:\n",
"\n",
"- start with `NDArray` for dense numeric data\n",
"- drop down to `SChunk` if you need chunk-level control\n",
"- use `ObjectArray` or `BatchArray` for variable-length Python objects\n",
"- use `EmbedStore`, `DictStore`, or `TreeStore` when your dataset contains multiple objects"
"## Choosing The Right Container\n\n| Container | Backing idea | Best for |\n| --- | --- | --- |\n| `SChunk` | raw compressed chunks | direct chunk-level storage control |\n| `NDArray` | `SChunk` plus array metadata | dense numeric arrays |\n| `ObjectArray` | one variable-length entry per chunk | ragged or heterogeneous Python values |\n| `BatchArray` | one batch per chunk | batch-oriented ingestion and access |\n| `EmbedStore` | one bundled object store | packaging a few Blosc2 objects together |\n| `DictStore` | keyed collection of leaves | portable multi-object datasets |\n| `TreeStore` | hierarchical keyed collection | tree-structured datasets with NDArrays and/or CTables |\n| `C2Array` | remote array handle | arrays hosted by a remote Caterva2 service |\n\nA simple rule of thumb is:\n\n- start with `NDArray` for dense numeric data\n- drop down to `SChunk` if you need chunk-level control\n- use `ObjectArray` or `BatchArray` for variable-length Python objects\n- use `EmbedStore`, `DictStore`, or `TreeStore` when your dataset contains multiple objects"
]
},
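The rule of thumb in the table above can be written down as a plain decision function. `suggest_container` below is a hypothetical helper, not part of the `python-blosc2` API — it only encodes the prose guidance as code.

```python
# Hypothetical helper (not part of python-blosc2) encoding the container
# rule of thumb as a decision function over data characteristics.
def suggest_container(dense_numeric: bool = False, chunk_control: bool = False,
                      variable_length: bool = False, multi_object: bool = False,
                      hierarchical: bool = False, remote: bool = False) -> str:
    if remote:
        return "C2Array"          # data lives on a remote Caterva2 service
    if multi_object:
        # Multiple objects in one dataset: pick by structure.
        return "TreeStore" if hierarchical else "DictStore"
    if variable_length:
        return "ObjectArray"      # ragged / heterogeneous Python values
    if chunk_control:
        return "SChunk"           # direct chunk-level storage control
    return "NDArray"              # default for dense numeric data


print(suggest_container(dense_numeric=True))
print(suggest_container(multi_object=True, hierarchical=True))
```

This is only a mnemonic; real workloads often combine containers, e.g. `NDArray` leaves inside a `TreeStore`.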
{
"cell_type": "markdown",
"id": "cell-21",
"metadata": {},
"source": [
"## Final Notes\n",
"\n",
"This notebook is intentionally organized from low-level storage to higher-level organization:\n",
"\n",
"- understand `SChunk` first\n",
"- use `NDArray` for most dense numeric workloads\n",
"- move to `ObjectArray` or `BatchArray` when entries stop being fixed-size arrays\n",
"- use `EmbedStore`, `DictStore`, or `TreeStore` when you need to package multiple objects together\n",
"- use `C2Array` when the data lives on a remote service\n",
"\n",
"For deeper details on a specific class, continue with the reference docs and the dedicated tutorials for `ObjectArray`, `BatchArray`, and indexing."
"## Final Notes\n\nThis notebook is intentionally organized from low-level storage to higher-level organization:\n\n- understand `SChunk` first\n- use `NDArray` for most dense numeric workloads\n- move to `ObjectArray` or `BatchArray` when entries stop being fixed-size arrays\n- use `EmbedStore`, `DictStore`, or `TreeStore` when you need to package multiple objects together\n- use `TreeStore` + `CTable` together when your bundle mixes dense arrays with structured tables\n- use `C2Array` when the data lives on a remote service\n\nFor deeper details on a specific class, continue with the reference docs and the dedicated tutorials for `ObjectArray`, `BatchArray`, and indexing."
]
},
{
47 changes: 46 additions & 1 deletion examples/tree-store.py
@@ -5,7 +5,9 @@
# SPDX-License-Identifier: BSD-3-Clause
#######################################################################

# Example usage of TreeStore with hierarchical navigation and vlmeta
# Example usage of TreeStore with hierarchical navigation, vlmeta, and CTables

from dataclasses import dataclass

import numpy as np

@@ -66,3 +68,46 @@
rsub = tstore2["/child0"]
print("/child0/new_leaf via subtree:", rsub["/new_leaf"][:])
print(f"TreeStore file at: {tstore2.localpath}")
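The subtree access used just above (`tstore2["/child0"]` yielding a view whose keys are resolved relative to `/child0`) can be sketched over a plain dict. `SubtreeView` below is a simplified illustration of the idea, not the actual `TreeStore` implementation.

```python
class SubtreeView:
    """Minimal sketch of prefix-rooted access over a flat key space."""

    def __init__(self, leaves: dict, prefix: str):
        self._leaves = leaves
        self._prefix = prefix.rstrip("/")

    def __getitem__(self, key: str):
        # Keys inside the view are resolved relative to the prefix.
        return self._leaves[self._prefix + key]

    def keys(self):
        # Only leaves under the prefix are visible, with the prefix stripped.
        return [k[len(self._prefix):] for k in self._leaves
                if k.startswith(self._prefix + "/")]


leaves = {"/child0/new_leaf": [1, 2, 3], "/child1/data": [4, 5]}
sub = SubtreeView(leaves, "/child0")
print(sub["/new_leaf"])  # resolves to /child0/new_leaf
print(sub.keys())
```

The real store adds compression, on-disk layout, and vlmeta on top, but the path arithmetic is the same.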

# ---------------------------------------------------------------------------
# Mixing NDArrays and CTables in the same TreeStore
# ---------------------------------------------------------------------------


@dataclass
class Reading:
sensor_id: int = 0
value: float = 0.0


with blosc2.TreeStore("example_tree.b2z", mode="a") as ts:
# Create a small CTable in memory and store it inline
t = blosc2.CTable(Reading)
for i in range(5):
t.append(Reading(sensor_id=i, value=float(i) * 1.1))

# Assignment syntax is identical to NDArray
ts["/readings"] = t
print("Keys after adding CTable:", sorted(ts.keys()))

# Object internals are hidden from normal traversal
print("/readings/_meta in ts:", "/readings/_meta" in ts) # False

with blosc2.open("example_tree.b2z", mode="r") as ts:
# CTable is returned transparently; no special open call needed
readings = ts["/readings"]
print(f"CTable type: {type(readings).__name__}, rows: {len(readings)}")
print("sensor_id column:", list(readings["sensor_id"][:]))
print("value column :", list(readings["value"][:]))

# Append a row to an inline CTable via append mode
with blosc2.TreeStore("example_tree.b2z", mode="a") as ts:
readings = ts["/readings"]
readings.append(Reading(sensor_id=99, value=-1.0))
readings.close() # explicit close before outer store repacks
print("After append, rows:", len(ts["/readings"]))

# Delete the CTable object root (removes all internal leaves)
with blosc2.TreeStore("example_tree.b2z", mode="a") as ts:
del ts["/readings"]
print("After deleting /readings:", sorted(ts.keys()))
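The hidden-internals behaviour shown in this example (`"/readings/_meta" in ts` is `False` even though the sidecar leaf exists, and `del ts["/readings"]` removes all internal leaves at once) can be sketched with a thin wrapper over a flat key space. Everything below — including the `_meta` marker name — is an illustration of the mechanism, not the real `TreeStore` code.

```python
OBJECT_ROOT_MARKER = "_meta"  # hypothetical sidecar name, for illustration only


class HidingStore:
    """Sketch: expose object roots as single keys, hide their sidecar leaves."""

    def __init__(self, leaves: dict):
        self._leaves = leaves

    def _is_internal(self, key: str) -> bool:
        # A leaf is internal if its parent directory holds a _meta sidecar.
        parent = key.rsplit("/", 1)[0]
        return key != parent and f"{parent}/{OBJECT_ROOT_MARKER}" in self._leaves

    def __contains__(self, key: str) -> bool:
        if f"{key}/{OBJECT_ROOT_MARKER}" in self._leaves:
            return True  # object root is visible as a single key
        return key in self._leaves and not self._is_internal(key)

    def keys(self):
        roots = {k.rsplit("/", 1)[0] for k in self._leaves
                 if k.endswith("/" + OBJECT_ROOT_MARKER)}
        visible = set(roots)
        for k in self._leaves:
            if not self._is_internal(k):
                visible.add(k)
        return sorted(visible)


leaves = {"/raw/signal": b"...", "/readings/_meta": b"...",
          "/readings/value": b"..."}
store = HidingStore(leaves)
print("/readings" in store)        # True: the object root is a visible key
print("/readings/_meta" in store)  # False: internal leaves are hidden
print(store.keys())
```

Deleting an object root under this scheme amounts to removing every leaf that shares its prefix, which is why `del ts["/readings"]` above takes all the internal leaves with it.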