add cmip6 rechunking example workflow #75

dgergel · 2021-02-10T20:12:48Z

This PR adds a notebook that pulls in CMIP6 data, saves it as a zarr store on GCS, rechunks it from time chunks to space chunks using rechunker, and outputs it as a rechunked zarr array.

A potential point of discussion is whether we actually want to save it as a zarr group vs. an array.
cc @brews @kemccusker for thoughts on this.

closes #74

review-notebook-app · 2021-02-10T20:12:51Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

brews · 2021-02-10T21:52:50Z

Cool. That's handy, @dgergel!

Did you try reading that final, rechunked zarr back into xarray? I'd ensure the rechunked data reads back into xarray without a hitch, if you haven't already.

On groups and arrays... Not quite sure I follow the problem that using groups fixes for us? I'd advise against it unless we have a need. In my experience users can get really hung up on groups in NetCDF or HDF5 files, especially if they're new to the files and they're reading them in xarray (or R, MATLAB, or equivalent). So, I'd avoid hierarchies and complexity in our structures unless we need them.

dgergel · 2021-02-11T22:04:13Z

@brews good call on checking that xarray can read the zarr array. Looks like it can't: for xarray to read a zarr array, the array needs to be part of a zarr group. I don't think there's functionality for reading a bare zarr array (the documentation for xr.open_zarr requires a zarr group, not a zarr array).

I think it should be fine though if I write the above as a zarr group versus a zarr array - it's not a lot of additional complexity other than having to refer to the array as source_group['array_name'] similar to a Dataset. Does that seem reasonable to you?

brews · 2021-02-11T22:23:23Z

> I think it should be fine though if I write the above as a zarr group versus a zarr array - it's not a lot of additional complexity other than having to refer to the array as source_group['array_name'] similar to a Dataset. Does that seem reasonable to you?

@dgergel Nice. Good to figure that out now. What you're saying sounds reasonable.

My main concern with groups in the zarr files is getting stuck with nested hierarchies of groups or similar complexity. If it's a single root group acting as a container for many arrays, this doesn't sound like a problem.

My main concern in asking about xarray is that most downscaling workflow steps read these zarr files from a store with xr.open_zarr(). So, ideally all output rechunked zarr files will be easily readable into an xr.Dataset via xr.open_zarr().

I think you have enough details in this notebook for me to start a PR for ClimateImpactLab/dodola#21. 👍

dgergel · 2021-02-12T01:20:58Z

@brews awesome. I wasn't able to save to a zarr group in the actual rechunking step (this should work in theory, but it doesn't) so instead I'm saving to a zarr group now after reading in the zarr array (which is fast, so I think it's fine). Now it's super easy to read in the zarr store with xr.open_zarr() (see the end of the notebook) so I think we're good.

add cmip6 rechunking example workflow

2d229a2

dgergel added the enhancement label Feb 10, 2021

dgergel requested review from brews and kemccusker February 10, 2021 20:12

brews approved these changes Feb 10, 2021


brews mentioned this pull request Feb 11, 2021

Add rechunker service ClimateImpactLab/dodola#21

Closed

update to save as zarr group versus array for reading in with xarray

c8e7f22

dgergel merged commit 51d38b5 into master Feb 16, 2021

delgadom deleted the feature/add_rechunking_cmip6_example branch February 23, 2022 22:18
