This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Add support for Zarr data I/O format #659

Open
forman opened this issue May 18, 2018 · 4 comments

@forman
Member

forman commented May 18, 2018

xarray 0.10 introduced two new entry points for Zarr: the function xr.open_zarr() and the method Dataset.to_zarr().

After a few first tests, I am very enthusiastic about Zarr, a new data format optimized for distributed and concurrent array I/O. It seems to offer much better I/O performance than NetCDF4, which may be due to single-threaded HDF5 decompression in Python (not checked).

As it seems to be a 1:1 representation of the NetCDF4 / HDF5 data model, Cate could use it for very efficient workspace persistence, and users could use it to store intermediate computation results.

The good news is that Cate doesn't require any extra dependencies, as the zarr package is already a dependency of xarray 0.10.

@JanisGailis
Member

After quickly googling around and reading about Zarr, it really looks quite impressive.

@forman
Member Author

forman commented May 18, 2018

And it is lightning fast. I just ingested a Zarr data cube with dims=(time=250, lat=1000, lon=2000) for another project, and when I time-travel through it, the layers are displayed immediately.

@papesci
Contributor

papesci commented May 31, 2018

It looks like an impressive persistence option. It would be a good idea to include it so that users can save processed data.

@forman
Member Author

forman commented Jun 1, 2018

I'll merge my branch, so we can play with it.


3 participants