datacube-benchmark

Utilities for benchmarking Zarr datacubes — generate synthetic stores with different chunking schemes, compressors, and dtypes, then measure read performance under realistic access patterns.

Companion package to the Datacube Guide, which documents common pitfalls when producing and consuming multi-dimensional data products.

Installation

pip install datacube-benchmark

Python 3.12+ is required.

Quickstart

Create a synthetic Zarr store on local disk and time a few random-access patterns against it:

from pathlib import Path

import obstore as obs
import zarr

import datacube_benchmark

path = Path.cwd() / "data" / "test.zarr"
path.mkdir(parents=True, exist_ok=True)
store = obs.store.LocalStore(str(path))
zarr_store = datacube_benchmark.create_zarr_store(store)

# zarr-python 3 names this keyword zarr_format; zarr_version is the deprecated spelling.
arr = zarr.open_array(zarr_store, zarr_format=3, path="data")
results = datacube_benchmark.benchmark_access_patterns(arr, num_samples=10)
print(results)

create_zarr_store takes target sizes and chunk shapes as strings or pint quantities (e.g. "1 GB", "10 MB"), and writes through an obstore store — so the same call works against a local directory, S3, GCS, or Azure by swapping the store.
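To get a feel for what a pair of size strings implies, the arithmetic can be sketched in plain Python. This is only an illustration, not how the package parses sizes: parse_size, chunks_needed, and elements_per_chunk are hypothetical helpers, and the sketch assumes decimal (SI) units and uncompressed sizes.

```python
import re

# Decimal (SI) multipliers. Assumption for this sketch: "MB" means 10**6 bytes.
_UNITS = {"B": 1, "kB": 10**3, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a size string such as "10 MB" into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    value, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(value) * _UNITS[unit])


def chunks_needed(array_size: str, chunk_size: str) -> int:
    """Number of chunks required to hold the array (ceiling division)."""
    total = parse_size(array_size)
    per_chunk = parse_size(chunk_size)
    return -(-total // per_chunk)


def elements_per_chunk(chunk_size: str, itemsize: int) -> int:
    """Number of whole elements of `itemsize` bytes that fit in one chunk."""
    return parse_size(chunk_size) // itemsize


n_chunks = chunks_needed("1 GB", "10 MB")
n_elements = elements_per_chunk("10 MB", itemsize=4)  # 4 bytes, e.g. float32
```

Under these assumptions a "1 GB" array with "10 MB" chunks is split into 100 chunks of 2,500,000 float32 values each. The compressed size on disk will generally differ.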

What's in the box

  • create_zarr_store, create_or_open_zarr_store, create_or_open_zarr_array, create_empty_dataarray — build synthetic Zarr datacubes at a target size, resolution, and chunk shape.
  • benchmark_zarr_array — time random reads against one access pattern ("point", "time_series", "spatial_slice", "full") and return summary statistics with units attached.
  • benchmark_access_patterns — run all four access patterns and return the combined results as a pandas.DataFrame.
  • benchmark_dataset_open — time xarray.open_dataset on a Zarr store.
  • Config — a dataclass collecting the common knobs (compressor, target array size, sample counts, concurrency).
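Why the access pattern matters can be sketched by counting how many chunks each kind of read has to touch. The snippet below is a rough illustration in plain Python, not part of the package: chunks_touched is a hypothetical helper, the (time, lat, lon) layout and array shape are made up, and the mapping of each pattern name to a selection is an assumption about what the names mean.

```python
from math import prod


def chunks_touched(selection, chunk_shape):
    """Count the chunks intersected by a rectangular selection.

    `selection` holds one (start, stop) pair per dimension, stop exclusive.
    """
    counts = []
    for (start, stop), chunk in zip(selection, chunk_shape):
        first = start // chunk        # index of the first chunk hit
        last = (stop - 1) // chunk    # index of the last chunk hit
        counts.append(last - first + 1)
    return prod(counts)


# Hypothetical array laid out as (time, lat, lon).
shape = (365, 720, 1440)

# Assumed meaning of each access pattern, expressed as index ranges.
patterns = {
    "point": ((0, 1), (100, 101), (200, 201)),
    "time_series": ((0, 365), (100, 101), (200, 201)),
    "spatial_slice": ((0, 1), (0, 720), (0, 1440)),
    "full": ((0, 365), (0, 720), (0, 1440)),
}

# Two contrasting chunking schemes for the same array.
per_time_step = (1, 720, 1440)   # one chunk per time step
per_pixel_block = (365, 10, 10)  # full time axis, small spatial tiles

touched_time = {k: chunks_touched(v, per_time_step) for k, v in patterns.items()}
touched_block = {k: chunks_touched(v, per_pixel_block) for k, v in patterns.items()}
```

With one chunk per time step, a time series at a single pixel touches 365 chunks while a spatial slice touches one. With small spatial tiles spanning the full time axis, the time series touches one chunk and the spatial slice touches 10,368. Measuring how that trade-off shows up in read times is what the benchmark functions above are for.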

See the API reference for the full signatures and parameter docs.

License

MIT
