Utilities for benchmarking Zarr datacubes — generate synthetic stores with different chunking schemes, compressors, and dtypes, then measure read performance under realistic access patterns.
Companion package to the Datacube Guide, which documents common pitfalls when producing and consuming multi-dimensional data products.
pip install datacube-benchmark
Python 3.12+ is required.
Create a synthetic Zarr store on local disk and time a few random-access patterns against it:
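from pathlib import Path
import obstore as obs
import zarr
import datacube_benchmark
# Create (or reuse) a local directory to hold the store.
path = Path.cwd() / "data" / "test.zarr"
path.mkdir(parents=True, exist_ok=True)
store = obs.store.LocalStore(str(path))
# Write a synthetic datacube through the obstore-backed store.
zarr_store = datacube_benchmark.create_zarr_store(store)
# Open the array stored under the "data" path.
arr = zarr.open_array(zarr_store, zarr_version=3, path="data")
# Time 10 random reads per access pattern.
results = datacube_benchmark.benchmark_access_patterns(arr, num_samples=10)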
print(results)
create_zarr_store takes target sizes and chunk shapes as strings or pint quantities (e.g. "1 GB", "10 MB") and writes through an obstore store, so the same call works against a local directory, S3, GCS, or Azure by swapping the store.
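To make the size strings concrete, the following is a minimal, standard-library-only sketch of how a target such as "1 GB" and a chunk target such as "10 MB" translate into byte and element counts. It assumes decimal (SI) prefixes for "MB" and "GB", which is how pint interprets them. The helpers parse_size and elements_for are hypothetical names introduced for illustration; they are not part of datacube_benchmark, which may derive shapes differently.

```python
import re

# Decimal (SI) and binary (IEC) multipliers, in bytes.
_UNITS = {
    "B": 1,
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}


def parse_size(text: str) -> int:
    """Convert a string such as "1 GB" or "10 MB" to a whole number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def elements_for(size: str, itemsize: int) -> int:
    """Number of whole elements of `itemsize` bytes that fit in `size`."""
    return parse_size(size) // itemsize


# Hypothetical example: a float32 array (4 bytes per element).
target = elements_for("1 GB", itemsize=4)   # elements in the full array
chunk = elements_for("10 MB", itemsize=4)   # elements in one chunk
print(target, chunk, target // chunk)
```

Under these assumptions, a 1 GB float32 array split into 10 MB chunks yields on the order of a hundred chunks, which is the quantity that drives the number of object-store requests in a full read.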
- create_zarr_store, create_or_open_zarr_store, create_or_open_zarr_array, create_empty_dataarray: build synthetic Zarr datacubes at a target size, resolution, and chunk shape.
- benchmark_zarr_array: time random reads against one access pattern ("point", "time_series", "spatial_slice", "full") and return summary statistics with units attached.
- benchmark_access_patterns: run all four access patterns and return the combined results as a pandas.DataFrame.
- benchmark_dataset_open: time xarray.open_dataset on a Zarr store.
- Config: a dataclass collecting the common knobs (compressor, target array size, sample counts, concurrency).
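As a rough illustration of what timing random reads per access pattern can look like, here is a standard-library sketch. It is not the package's implementation: the (time, y, x) axis order, the meaning given to each pattern name, and the helpers random_selection and time_reads are all assumptions made for this example.

```python
import random
import statistics
import time


def random_selection(pattern, shape, rng):
    """Return an index tuple for one read from a (time, y, x) array.

    ASSUMPTION: these pattern semantics are illustrative only.
    """
    t, y, x = (rng.randrange(n) for n in shape)
    everything = slice(None)
    if pattern == "point":
        return (t, y, x)                      # one value
    if pattern == "time_series":
        return (everything, y, x)             # all times at one pixel
    if pattern == "spatial_slice":
        return (t, everything, everything)    # one full time step
    if pattern == "full":
        return (everything, everything, everything)
    raise ValueError(f"unknown access pattern: {pattern!r}")


def time_reads(read, selections):
    """Call read(selection) for each selection; return timing stats in seconds."""
    durations = []
    for selection in selections:
        start = time.perf_counter()
        read(selection)
        durations.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
        "n": len(durations),
    }


rng = random.Random(0)          # seeded so the sampled indices are repeatable
shape = (24, 180, 360)          # hypothetical (time, y, x) shape
selections = [random_selection("time_series", shape, rng) for _ in range(10)]
# A no-op stands in for the real read so the sketch runs without a store.
stats = time_reads(lambda selection: None, selections)
print(stats["n"])
```

With a real Zarr array, the stand-in read would be replaced by something like `lambda selection: arr[selection]`, so each timed call fetches and decompresses the chunks that the selection touches.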
See the API reference for the full signatures and parameter docs.