
Writing to zarr fails with message "specified zarr chunks would overlap multiple dask chunks" #347

Closed
TonioF opened this issue Oct 26, 2020 · 10 comments · Fixed by #419
Labels: bug (Something isn't working)

TonioF (Contributor) commented Oct 26, 2020

For some data it is necessary to rechunk it in order to deal with it efficiently. This rechunking may result in the final chunk of a dimension being smaller than the previous ones. For example, a dimension of size 500, originally split into ten chunks of size 50, can be rechunked into chunks of sizes 200, 200, and 100. This is perfectly valid and supported by xarray.
However, when the affected dimension/variable is latitude and the latitude is ascending, the current implementation of the normalization in xcube reverses the order of the chunks, so that the smaller chunk ends up at the start. This is not supported by xarray and may result in an error like the one in the description, for example when writing to zarr.

For the time being, I will work around this issue by ensuring that all chunks have the same size after rechunking, which will make this issue a bit harder to reproduce. I am opening this issue to document that the current way of dealing with ascending latitudes is not optimal. The best solution is probably to remove the step in which latitudes are reversed during normalization and to support ascending latitudes in the data.

See also #251 and #327.
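Zarr's constraint here can be stated concisely: along each dimension, all chunks must share one size, except that the last chunk may be smaller. A minimal plain-Python sketch (the function name is made up for illustration) makes the ascending-latitude problem visible:

```python
def zarr_compatible(chunks):
    """Return True if a tuple of chunk sizes along one dimension can be
    written as zarr chunks: all chunks must have the same size, except
    that the final chunk may be smaller."""
    if len(chunks) <= 1:
        return True
    # All chunks except the last must equal the first chunk's size,
    # and the last chunk must not be larger than the regular size.
    return all(c == chunks[0] for c in chunks[:-1]) and chunks[-1] <= chunks[0]

# A dimension of 500 rechunked to 200/200/100 is fine ...
print(zarr_compatible((200, 200, 100)))   # True
# ... but after reversing the latitude axis, the small chunk comes
# first, which zarr/xarray cannot represent:
print(zarr_compatible((100, 200, 200)))   # False
```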

@TonioF added the bug label on Oct 26, 2020
@TonioF changed the title on Oct 26, 2020 to quote the error message: Writing to zarr fails with message "specified zarr chunks would overlap multiple dask chunks"
AliceBalfanz (Contributor) commented
Hi Tonio, I am facing a similar issue with the same error message.

When executing the example Jupyter notebook examples/store/open_data_directory.ipynb, the line

store.write_data(new_dataset, 'cube-1-250-250-subset.zarr', writer_id='dataset:zarr:posix')

results in

NotImplementedError: Specified zarr chunks encoding['chunks']=(1, 250, 250) for variable named 'quality_flags' would overlap multiple dask chunks ((1, 1, 1, 1, 1), (90, 90, 90, 90), (90, 90, 90, 90, 90, 90, 90, 90, 60)). This is not implemented in xarray yet. Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.
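The condition behind this error can be replicated in a few lines of plain Python (a simplified sketch of the idea, not xarray's actual code): along each dimension, every dask chunk except the last must be an exact multiple of the zarr chunk size from the encoding, otherwise some zarr chunk would span multiple dask chunks.

```python
def would_overlap(zarr_chunks, dask_chunks):
    """Return True if writing with the given zarr 'chunks' encoding
    would make some zarr chunk span multiple dask chunks.
    zarr_chunks: one size per dimension, e.g. (1, 250, 250)
    dask_chunks: per dimension, a tuple of dask chunk sizes."""
    for zc, dcs in zip(zarr_chunks, dask_chunks):
        # Every dask chunk except the last must be an exact multiple
        # of the zarr chunk size along that dimension.
        if any(dc % zc != 0 for dc in dcs[:-1]):
            return True
    return False

# The encoding and dask chunking from the error message above:
print(would_overlap(
    (1, 250, 250),
    ((1, 1, 1, 1, 1),
     (90, 90, 90, 90),
     (90, 90, 90, 90, 90, 90, 90, 90, 60)),
))  # True: 90 is not a multiple of 250
```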

xarray does not adjust the encoding of variables when rechunking a dataset.

new_dataset.c2rcc_flags.encoding
{'chunks': (1, 250, 250),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 '_FillValue': nan,
 'dtype': dtype('float64')}

This problem is already known:

pydata/xarray#2300

Maybe we could implement in xcube that, when a dataset is rechunked, the encoding is adjusted as well?
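Such an adjustment could look roughly like this (a minimal sketch over plain dicts, not actual xcube or xarray API; the helper name is made up): after rechunking, drop the stored 'chunks' encoding wherever it disagrees with the actual chunking, so the zarr writer infers chunks from the data instead.

```python
def clean_chunks_encoding(encoding, actual_chunks):
    """If the stored zarr 'chunks' encoding disagrees with the actual
    chunk sizes of the (rechunked) variable, drop it so the writer can
    infer chunks instead. Returns the (possibly modified) encoding."""
    stored = encoding.get('chunks')
    if stored is not None and tuple(stored) != tuple(actual_chunks):
        del encoding['chunks']
    return encoding

enc = {'chunks': (1, 250, 250), 'dtype': 'float64'}
# After rechunking to 90x90 spatial tiles, the stored encoding is stale
# and gets removed:
clean_chunks_encoding(enc, (1, 90, 90))
print(enc)  # {'dtype': 'float64'}
```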

TonioF (Contributor, Author) commented Nov 16, 2020

Maybe we could implement in xcube, that if a dataset is rechunked, then the encoding is adjusted as well?

We already did: https://github.com/dcs4cop/xcube/blob/bc4cdb4aa5e88557d920d71b7dff4100015c3512/xcube/core/chunk.py#L10 (Make sure you set the format when you use this, otherwise the encodings won't be updated).

I still couldn't run your notebook cell successfully, though; that seems to be due to a different problem.

AliceBalfanz (Contributor) commented
Thanks for your suggestion. I changed the notebook to use xcube's chunk_dataset instead of xarray's way of chunking a dataset. It now works like a dream :)

@forman forman added this to To do in Release 0.6 Nov 18, 2020
@forman forman moved this from To do to Deferred in Release 0.6 Nov 18, 2020
rabernat commented Mar 3, 2021

This can and should be fixed in xarray. But it's also very easy to just delete the encoding.

del new_dataset.c2rcc_flags.encoding['chunks']

as a simple workaround.

forman (Member) commented Mar 5, 2021

@rabernat thanks! The xcube function chunk_dataset() is basically xarray.chunk() plus correction of the "chunks" encoding in each variable.

@AliceBalfanz

Thanks for your suggestion, I changed the notebook to use xcube chunk_dataset instead of xarrays way to chunk a dataset.

We should make sure that any xr.Dataset returned from higher-level xcube functions always has a chunking compatible with Zarr, as this is our standard I/O format. Users should not be forced to rechunk just for the purpose of writing to Zarr; this is counter-intuitive. I suggest we provide a utility function, ensure_valid_chunks(), that ensures "valid" chunking (including possibly deleting the "chunks" encoding property, as suggested by @rabernat). We would then apply it to all datasets before returning them from xcube functions.

I guess there are good reasons why encoding is not adjusted in xarray.chunk().

rabernat commented Mar 5, 2021

The xcube function chunk_dataset() is basically xarray.chunk() plus correction of "chunks" encoding in each variable.

🙌 Could I convince you to submit this as a PR to xarray itself? 😁

forman (Member) commented Mar 5, 2021

I'm convinced. Hope to find a little time next week. Maybe there is a related issue already?

rabernat commented Mar 5, 2021

pydata/xarray#2300 is the main one.

forman added a commit that referenced this issue Mar 17, 2021
Workaround for some cases that end up in #347
@forman forman added this to Done in Release 0.7.1 Mar 17, 2021
rabernat commented

FYI I have started a PR to fix this upstream in Xarray. Your review there would be helpful.

forman (Member) commented Mar 23, 2021

Sure @rabernat. Thanks!
