Overview
obstore is a high-performance Python library for cloud object storage, backed by the Rust object_store crate. It provides a single, unified, async-native interface across S3, GCS, Azure, HTTP, and local storage, which makes it a strong fit as our storage backend for both ingested datasets and external Zarr stores (#46).
Why obstore over fsspec
A Zarr read from S3 fans out into many small, concurrent requests, roughly one per chunk (a byte-range request when chunks are packed into shards). obstore is designed for exactly this pattern:
Stateless, atomic object API (mirrors how cloud storage actually works) vs. fsspec's filesystem cursor abstraction
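Up to ~9x higher throughput than fsspec for many small, concurrent requests from an async context (as reported by the obstore project; to be confirmed on our own data)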
First-class async support — no thread pool workarounds
Automatic credential refresh and multipart uploads built in
Relevance to this project
Icechunk alignment: Icechunk uses the same underlying Rust object_store crate internally, so adopting obstore means a shared storage layer if we go that route
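To make the access pattern from "Why obstore over fsspec" concrete, here is a minimal sketch of fanning out many small range reads concurrently. `read_ranges`, `obstore_fetcher`, and `in_memory_fetcher` are hypothetical helper names, not part of obstore. The obstore adapter assumes a recent release where `obstore.get_range_async` takes keyword-only `start`/`end` arguments; the in-memory fetcher is only a stand-in so the pattern can be tried without credentials.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# (path, start, end) -> bytes, where `end` is exclusive.
RangeFetcher = Callable[[str, int, int], Awaitable[bytes]]


async def read_ranges(
    fetch: RangeFetcher,
    path: str,
    ranges: Sequence[tuple[int, int]],
    max_concurrency: int = 32,
) -> list[bytes]:
    """Fetch many byte ranges of one object concurrently, preserving order."""
    limit = asyncio.Semaphore(max_concurrency)

    async def one(start: int, end: int) -> bytes:
        async with limit:  # cap the number of in-flight requests
            return await fetch(path, start, end)

    return await asyncio.gather(*(one(s, e) for s, e in ranges))


def obstore_fetcher(store) -> RangeFetcher:
    """Adapter around obstore (assumption: keyword-only start/end signature)."""
    import obstore as obs  # imported lazily so the sketch runs without obstore

    async def fetch(path: str, start: int, end: int) -> bytes:
        buf = await obs.get_range_async(store, path, start=start, end=end)
        return bytes(buf)  # copy the returned buffer into Python bytes

    return fetch


def in_memory_fetcher(blobs: dict[str, bytes]) -> RangeFetcher:
    """Stand-in fetcher backed by a dict, for trying the pattern offline."""

    async def fetch(path: str, start: int, end: int) -> bytes:
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        return blobs[path][start:end]

    return fetch
```

Because the fan-out only depends on the `RangeFetcher` callable, the same `read_ranges` call could be pointed at an obstore store or at an fsspec-based fetcher when comparing the two.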
Caveat to verify
zarr-python v3 supports pluggable Store backends and obstore provides an fsspec compatibility shim. We should verify that our full stack (xarray, zarr-python, any ingestion tooling) can wire up to obstore directly rather than falling back through fsspec — otherwise the performance gain is partially lost.
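One way to guard against a silent fallback is a small check on the store object that zarr ends up using. The sketch below is a heuristic, not an official API: `is_fsspec_backed` and `wrap_obstore_for_zarr` are hypothetical helper names, and the wrapper assumes a zarr-python 3.x release that ships `zarr.storage.ObjectStore` (the obstore-backed store), which should be confirmed against the version we pin.

```python
def is_fsspec_backed(store: object) -> bool:
    """Heuristic: does any class in the store's hierarchy come from fsspec?

    Matches on module and class names, so it flags both zarr's fsspec store
    and obstore's fsspec compatibility shim. It is a name-based guess only.
    """
    for cls in type(store).__mro__:
        qualified = f"{cls.__module__}.{cls.__qualname__}".lower()
        if "fsspec" in qualified:
            return True
    return False


def wrap_obstore_for_zarr(obstore_store, read_only: bool = True):
    """Wrap an obstore store for zarr-python and refuse an fsspec fallback.

    Assumption: zarr.storage.ObjectStore exists in the installed zarr-python.
    """
    from zarr.storage import ObjectStore  # lazy import; needs zarr + obstore

    zarr_store = ObjectStore(obstore_store, read_only=read_only)
    if is_fsspec_backed(zarr_store):
        raise RuntimeError("store is routed through fsspec, not obstore")
    return zarr_store
```

The same `is_fsspec_backed` check could be applied to whatever store object xarray or the ingestion tooling hands to zarr, which is where an accidental fsspec path is most likely to creep in.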
Suggested next steps
Benchmark obstore vs. fsspec/s3fs for representative Zarr chunk reads on our S3 data
Confirm zarr-python v3 + xarray can use obstore natively (not via fsspec shim)
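For the benchmark step, a small backend-agnostic harness keeps the comparison fair by driving both libraries through the same concurrency limit and key list. This is a minimal sketch: `BenchResult` and `bench_reader` are hypothetical names, and the reader callables (one wrapping obstore, one wrapping fsspec/s3fs) are left for us to supply.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence

# key -> chunk bytes; one callable per backend under test.
ChunkReader = Callable[[str], Awaitable[bytes]]


@dataclass
class BenchResult:
    name: str
    n_chunks: int
    total_bytes: int
    seconds: float

    @property
    def mib_per_s(self) -> float:
        """Throughput in MiB/s (infinite if the run took no measurable time)."""
        if self.seconds <= 0:
            return float("inf")
        return self.total_bytes / (1024 * 1024) / self.seconds


async def bench_reader(
    name: str,
    read_chunk: ChunkReader,
    keys: Sequence[str],
    max_concurrency: int = 64,
) -> BenchResult:
    """Read every key once, concurrently, and time the whole batch."""
    limit = asyncio.Semaphore(max_concurrency)

    async def one(key: str) -> int:
        async with limit:  # same in-flight cap for every backend
            return len(await read_chunk(key))

    t0 = time.perf_counter()
    sizes = await asyncio.gather(*(one(k) for k in keys))
    elapsed = time.perf_counter() - t0
    return BenchResult(name, len(keys), sum(sizes), elapsed)
```

Running each backend several times over the same representative chunk keys, and discarding a warm-up run, should give numbers that are less sensitive to connection setup and caching effects.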