add cmip6 rechunking example workflow #75

Merged (2 commits) on Feb 16, 2021

dgergel commented Feb 10, 2021

This PR adds a notebook that pulls in CMIP6 data, saves it as a zarr store on GCS, rechunks it from time chunks to space chunks using rechunker, and outputs it as a rechunked zarr array.
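To make the "time chunks to space chunks" step concrete, here is a rough sketch of why the chunk layout matters. The array shape and chunk sizes are made-up illustration values, not the ones used in the notebook, and the helper functions are hypothetical. It only counts how many chunks each access pattern has to read under the two layouts.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension for a given chunk shape."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))


def chunks_for_point_series(shape, chunks):
    """Chunks read to get the full time series at one (lat, lon) cell.

    Only the chunks along the time axis (axis 0) are crossed.
    """
    return chunk_grid(shape, chunks)[0]


def chunks_for_single_map(shape, chunks):
    """Chunks read to get one time step over the whole grid."""
    _, n_lat, n_lon = chunk_grid(shape, chunks)
    return n_lat * n_lon


# Assumed example: 20 years of daily data on a 1-degree grid (time, lat, lon).
shape = (7300, 180, 360)
time_chunked = (365, 180, 360)   # one chunk per year, whole globe per chunk
space_chunked = (7300, 10, 10)   # full time series, 10x10 spatial tiles

for label, chunks in [("time-chunked", time_chunked), ("space-chunked", space_chunked)]:
    print(
        label,
        "grid:", chunk_grid(shape, chunks),
        "| chunks per point series:", chunks_for_point_series(shape, chunks),
        "| chunks per single map:", chunks_for_single_map(shape, chunks),
    )
```

With these assumed sizes, a per-gridcell time series touches every yearly chunk in the time-chunked layout but only one chunk after rechunking to space chunks, which is the access pattern that per-gridcell steps would want.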

One potential point of discussion is whether we actually want to save it as a zarr group vs. an array.
cc @brews @kemccusker for thoughts on this

closes #74

brews commented Feb 10, 2021

Cool. That's handy, @dgergel!

Did you try reading that final, rechunked zarr back into xarray? I'd ensure the rechunked data reads back into xarray without a hitch, if you haven't already.

On groups and arrays... I'm not quite sure I follow what problem using groups fixes for us? I'd advise against it unless we have a need. In my experience, users can get really hung up on groups in NetCDF or HDF5 files, especially if they're new to the files and they're reading them in xarray (or R, MATLAB, or equivalent). So, I'd avoid hierarchies and complexity in our structures unless we need them.

dgergel commented Feb 11, 2021

@brews good call on checking that xarray can read the zarr array. It looks like it can't: for xarray to read a zarr array, the array needs to be part of a zarr group. I don't think there's any functionality for reading a bare zarr array (the documentation for xr.open_zarr requires a zarr group, not a zarr array).

I think it should be fine though if I write the above as a zarr group versus a zarr array - it's not a lot of additional complexity other than having to refer to the array as source_group['array_name'] similar to a Dataset. Does that seem reasonable to you?

brews commented Feb 11, 2021

> I think it should be fine though if I write the above as a zarr group versus a zarr array - it's not a lot of additional complexity other than having to refer to the array as source_group['array_name'] similar to a Dataset. Does that seem reasonable to you?

@dgergel Nice. Good to figure that out now. What you're saying sounds reasonable.

My main concern with groups in the zarr files is getting stuck with nested hierarchies of groups or similar complexity. If it's a single root group acting as a container for many arrays, this doesn't sound like a problem.

My main concern in asking about xarray is that most downscaling workflow steps read these zarr files from the store with xr.open_zarr(). So, ideally, all output rechunked zarr files will be easily readable into an xr.Dataset via xr.open_zarr().

I think you have enough details in this notebook for me to start a PR for ClimateImpactLab/dodola#21. 👍

dgergel commented Feb 12, 2021

@brews awesome. I wasn't able to save to a zarr group in the actual rechunking step (this should work in theory, but it doesn't), so instead I'm now saving to a zarr group after reading in the rechunked zarr array (which is fast, so I think it's fine). Now it's super easy to read in the zarr store with xr.open_zarr() (see the end of the notebook), so I think we're good.

dgergel merged commit 51d38b5 into master on Feb 16, 2021
delgadom deleted the feature/add_rechunking_cmip6_example branch on February 23, 2022 22:18
Successfully merging this pull request may close these issues.

add example workflow using rechunker package on CMIP6