Issue warning when map_blocks() function with axis arguments conflicts with known dask array chunk structure #6810
Thanks for the detailed write-up! Since this seems like a relatively common scenario, I think it's a great opportunity to warn in advance, especially since the code doesn't raise anything but silently returns an unexpected result. I am trying to think about whether this is a general problem with the … Please feel free to open a pull request!
Hey @jsignell, it looks like this issue may be general. I only tried one more function, np.sum, but found a similar result. Example below. Create an array whose chunk structure happens to run along the columns (each column is part of a different chunk):
Undesired output when summing across columns: because of the chunk structure, each column is treated independently:
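A minimal sketch of these two steps, assuming a small 3x4 integer array chunked one column per block (the data and shapes are illustrative, not taken from the original comment). `keepdims=True` is passed to `np.sum` so each block keeps its `(3, 1)` shape and the chunk metadata that dask infers stays valid:

```python
import numpy as np
import dask.array as da

# Illustrative 3x4 array where every column is its own block (chunks=(3, 1)).
x = da.from_array(np.arange(12).reshape(3, 4), chunks=(3, 1))
print(x.chunks)  # ((3,), (1, 1, 1, 1))

# map_blocks calls np.sum on each (3, 1) block independently, so "summing
# across columns" only ever sees a single column per call.
col_sums = x.map_blocks(np.sum, axis=1, keepdims=True).compute()
print(col_sums)  # identical to the input, not per-row totals
```

Nothing fails here; the result simply has the wrong meaning, which is why the thread suggests a warning.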
When summing across rows we get the expected output:
Output:
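The row-wise reduction, sketched on the same illustrative array. Because each block already contains a complete column, the per-block sum is correct; the reduced `(1, 1)` block shape is declared through the `chunks=` argument of `map_blocks`:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(12).reshape(3, 4), chunks=(3, 1))

# Summing down the rows (axis=0) only needs data inside one column, and each
# block holds a whole column, so the per-block result is already correct.
# Each (3, 1) block shrinks to (1, 1), so that chunk shape is declared.
row_sums = x.map_blocks(np.sum, axis=0, keepdims=True, chunks=(1, 1)).compute()
print(row_sums)  # [[12 15 18 21]]
```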
Rechunking so that each chunk spans the full axis we are reducing over:
This results in the expected output:
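A sketch of the rechunk fix under the same assumptions: `rechunk({1: -1})` merges axis 1 into a single chunk, so each block holds complete rows before `np.sum(axis=1)` is mapped over it:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(12).reshape(3, 4), chunks=(3, 1))

# -1 means "one chunk spanning the whole axis", so every block now contains
# complete rows.
y = x.rechunk({1: -1})
print(y.chunks)  # ((3,), (4,))

# Each (3, 4) block reduces to (3, 1); declare that output chunk shape.
col_sums = y.map_blocks(np.sum, axis=1, keepdims=True, chunks=(3, 1)).compute()
print(col_sums.ravel())  # [ 6 22 38]
```

Rechunking moves data between blocks, so on large arrays it is worth doing once, right before the axis-dependent step.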
I am happy to open a PR. Please let me know if you have any more thoughts. Thanks!
Yeah, that makes sense. It seems reasonable to me to raise a warning for these cases. Go for it! Just ping me if you run into issues.
I think this is good to close, since in the linked PR it was decided that this warning may not be needed.
Thanks for following up here, @Madhu94!
Hello, I recently ran into this issue and wanted to suggest issuing a warning when a function is mapped onto a dask array with arguments that could yield unexpected or undesirable behavior given the array's known chunk structure. I provide an example below.
Minimal example:
I want to horizontally stack multiple 1-d dask arrays as columns and argsort them along the columns (axis=1).
Unexpected/undesired output:
This code produces unexpected and undesirable output. The hstacked array remains chunked along the columns (each column is its own chunk), so the argsort mapped along axis=1 sees only length-1 rows within each block and returns all zeros:
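A minimal reproduction sketch, assuming three small integer arrays (the values are made up for illustration). Each 1-d array is turned into a column with `[:, None]` before `da.hstack`, which is one way to get the column-per-chunk layout described above:

```python
import numpy as np
import dask.array as da

# Three illustrative 1-d arrays, each stored as a single chunk of length 4.
a = da.from_array(np.array([3, 1, 2, 0]), chunks=4)
b = da.from_array(np.array([0, 2, 1, 3]), chunks=4)
c = da.from_array(np.array([2, 0, 3, 1]), chunks=4)

# Make each a column and stack horizontally; every column keeps its own chunk.
stacked = da.hstack([a[:, None], b[:, None], c[:, None]])
print(stacked.chunks)  # ((4,), (1, 1, 1))

# np.argsort runs on each (4, 1) block separately, and argsort over a
# length-1 axis can only ever return index 0.
order = stacked.map_blocks(np.argsort, axis=1).compute()
print(order)  # every entry is 0
```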
I resolved the issue by rechunking the stacked array so that each row lies entirely within a single chunk:
Desired output:
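A sketch of the rechunking fix on the same illustrative arrays, checked against plain NumPy `argsort` on the materialized array:

```python
import numpy as np
import dask.array as da

a = da.from_array(np.array([3, 1, 2, 0]), chunks=4)
b = da.from_array(np.array([0, 2, 1, 3]), chunks=4)
c = da.from_array(np.array([2, 0, 3, 1]), chunks=4)
stacked = da.hstack([a[:, None], b[:, None], c[:, None]])

# Merge axis 1 into one chunk so every row sits entirely inside one block.
fixed = stacked.rechunk({1: -1})
order = fixed.map_blocks(np.argsort, axis=1).compute()

# Compare with NumPy on the fully computed array.
expected = np.argsort(stacked.compute(), axis=1)
print(order)
print((order == expected).all())  # True
```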
Suggestion:
To help prevent unexpected and undesirable results, it may be worth alerting the user if the arguments to their mapping function (axis=1 in this case) conflict with the known chunk structure of their array. What do you think?
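One possible shape for such a check, as a hedged sketch: `warn_if_axis_split` is a hypothetical helper, not an existing dask API. It only inspects the array's `numblocks` along a caller-supplied axis, because `map_blocks` itself cannot generally know which keyword of an arbitrary function means "axis":

```python
import warnings

import numpy as np
import dask.array as da


def warn_if_axis_split(arr, axis):
    """Hypothetical helper (not part of dask): warn when a per-block function
    is about to be applied along an axis that spans more than one chunk."""
    axis = axis % arr.ndim  # normalize negative axes such as -1
    nblocks = arr.numblocks[axis]
    if nblocks > 1:
        warnings.warn(
            f"axis={axis} is split into {nblocks} chunks; map_blocks applies "
            "the function to each chunk independently. Consider "
            f"arr.rechunk({{{axis}: -1}}) first.",
            UserWarning,
            stacklevel=2,
        )


x = da.from_array(np.arange(12).reshape(3, 4), chunks=(3, 1))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_axis_split(x, axis=1)  # axis 1 has 4 chunks, so this warns
    warn_if_axis_split(x, axis=0)  # axis 0 is a single chunk, so no warning
print(len(caught))  # 1
```

Keeping the check opt-in like this avoids guessing at arbitrary function signatures, at the cost of the user having to remember to call it.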