add cmip6 rechunking example workflow #75
Cool. That's handy, @dgergel! Did you try reading that final, rechunked zarr back into …?

On groups and arrays: I'm not quite sure I follow the problem that using groups fixes for us. I'd advise against them unless we have a need. In my experience, users can get really hung up on groups in NetCDF or HDF5 files, especially if they're new to the files and are reading them in xarray (or R, MATLAB, or equivalent). So I'd avoid hierarchies and complexity in our structures unless we need them.
@brews, good call on checking that xarray can read the zarr array. It looks like it can't, because for xarray to read a zarr array, the array needs to be part of a zarr group. I don't think there's functionality for reading a bare zarr array (documentation from …). I think it should be fine, though, if I write the above as a zarr group rather than a zarr array; it's not much additional complexity, other than having to refer to the array as …
@dgergel Nice. Good to figure that out now. What you're saying sounds reasonable. My main concern with groups in the zarr files is getting stuck with nested hierarchies of groups or similar complexity. If it's a single root group acting as a container for many arrays, this doesn't sound like a problem. The reason I asked about xarray is that most downscaling workflow steps read these zarr files from the store with …

I think you have enough detail in this notebook for me to start a PR for ClimateImpactLab/dodola#21. 👍
@brews Awesome. I wasn't able to save to a zarr group in the actual rechunking step (this should work in theory, but it doesn't), so instead I'm now reading the rechunked zarr array back in and saving it to a zarr group (the read is fast, so I think that's fine). Now it's super easy to read the zarr store in with …
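A minimal, stdlib-only sketch of the layout settled on in this thread: one root zarr group holding arrays side by side, with no nested groups. It assumes the zarr v2 on-disk format in a local directory (a `.zgroup` file at the root, a `.zarray` file per array) and xarray's convention of storing each array's dimension names in its `.zattrs` under `_ARRAY_DIMENSIONS`. The helpers `write_flat_group` and `check_flat_group`, the variable name `tasmax`, and the dimension names are hypothetical and not part of the notebook or dodola; consolidated metadata (`.zmetadata`), GCS stores, and the zarr v3 format are not handled.

```python
import json
import os
import tempfile


def write_flat_group(root, variables):
    """Write zarr v2 metadata for one root group holding several arrays.

    `variables` maps a name to (shape, chunks, dims). Only metadata is
    written (no chunk data); this is illustrative, not how rechunker or
    xarray actually write stores.
    """
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, ".zgroup"), "w") as f:
        json.dump({"zarr_format": 2}, f)
    for name, (shape, chunks, dims) in variables.items():
        node = os.path.join(root, name)
        os.makedirs(node, exist_ok=True)
        zarray = {
            "zarr_format": 2,
            "shape": list(shape),
            "chunks": list(chunks),
            "dtype": "<f4",
            "compressor": None,
            "fill_value": "NaN",
            "order": "C",
            "filters": None,
        }
        with open(os.path.join(node, ".zarray"), "w") as f:
            json.dump(zarray, f)
        # xarray looks up dimension names in this attribute.
        with open(os.path.join(node, ".zattrs"), "w") as f:
            json.dump({"_ARRAY_DIMENSIONS": list(dims)}, f)


def check_flat_group(root):
    """List problems that would complicate reading `root` with xarray."""
    problems = []
    if not os.path.isfile(os.path.join(root, ".zgroup")):
        problems.append("root is not a zarr group (no .zgroup)")
    for entry in sorted(os.listdir(root)):
        node = os.path.join(root, entry)
        if not os.path.isdir(node):
            continue  # root-level metadata files such as .zgroup or .zarray
        if os.path.isfile(os.path.join(node, ".zgroup")):
            problems.append(f"{entry}: nested group")
            continue
        zarray_path = os.path.join(node, ".zarray")
        if not os.path.isfile(zarray_path):
            continue  # not a zarr node; ignored in this sketch
        with open(zarray_path) as f:
            shape = json.load(f)["shape"]
        dims = None
        attrs_path = os.path.join(node, ".zattrs")
        if os.path.isfile(attrs_path):
            with open(attrs_path) as f:
                dims = json.load(f).get("_ARRAY_DIMENSIONS")
        if dims is None or len(dims) != len(shape):
            problems.append(f"{entry}: missing or mismatched _ARRAY_DIMENSIONS")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "rechunked.zarr")
        write_flat_group(store, {
            "tasmax": ((36500, 180, 360), (36500, 10, 10), ("time", "lat", "lon")),
        })
        print(check_flat_group(store))  # []
```

A bare array store (for example, a rechunker target written as an array, as in this notebook) would have a `.zarray` at its root instead of a `.zgroup`, so this check would report it as not being a group.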
This PR adds a notebook that pulls in CMIP6 data, saves it as a zarr store on GCS, rechunks it from time chunks to space chunks using rechunker, and outputs it as a rechunked zarr array. A potential point of discussion is whether we actually want to save it as a zarr group vs. an array.
cc @brews @kemccusker for thoughts on this
closes #74
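A back-of-the-envelope sketch (stdlib only) of why the notebook rechunks from time chunks to space chunks: it counts how many chunks a read has to touch under each layout. The grid shape (100 years of daily data on a 1° grid), the chunk sizes, the selections, and the helper names are hypothetical, not the notebook's actual values, and rechunker itself is not called here.

```python
import math

# Hypothetical daily grid: 100 years x 180 lat x 360 lon (1-degree).
SHAPE = (36500, 180, 360)
TIME_CHUNKS = (365, 180, 360)    # one year of full global maps per chunk
SPACE_CHUNKS = (36500, 10, 10)   # full time series of a 10x10 tile per chunk

# Selections as (start, stop) per dimension, stop exclusive.
POINT_SERIES = ((0, 36500), (42, 43), (100, 101))  # one gridpoint, all days
ONE_DAY_MAP = ((0, 1), (0, 180), (0, 360))         # all gridpoints, one day


def n_chunks(shape, chunks):
    """Total number of chunks in the array's chunk grid."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunks_touched(chunks, selection):
    """Number of chunks a read of `selection` has to load."""
    total = 1
    for (start, stop), size in zip(selection, chunks):
        first = start // size
        last = (stop - 1) // size
        total *= last - first + 1
    return total


for label, chunks in (("time-chunked", TIME_CHUNKS), ("space-chunked", SPACE_CHUNKS)):
    print(
        label,
        n_chunks(SHAPE, chunks),
        chunks_touched(chunks, POINT_SERIES),
        chunks_touched(chunks, ONE_DAY_MAP),
    )
# time-chunked 100 100 1
# space-chunked 648 1 648
```

With time chunks, pulling one gridpoint's full time series touches every time chunk, while with space chunks it touches a single chunk; a one-day global map goes the other way. Steps that work along the time axis at each gridpoint would therefore presumably favor the space-chunked layout.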