-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid rechunking in reshape with chunksize=1 #6748
Avoid rechunking in reshape with chunksize=1 #6748
Conversation
When the slow-moving (early) axes in `.reshape` are all size 1, then we can avoid an intermediate rechunk which could cause memory issues. ``` 00 01 | 02 03 # a[0, :, :] ----- | ----- 04 05 | 06 07 08 09 | 10 11 ============= 12 13 | 14 15 # a[1, :, :] ----- | ----- 16 17 | 18 19 20 21 | 22 23 -> (3, 4) 00 01 | 02 03 ----- | ----- 04 05 | 06 07 08 09 | 10 11 ----- | ----- 12 13 | 14 15 ----- | ----- 16 17 | 18 19 20 21 | 22 23 ``` xref dask#5544, specifically the examples given in dask#5544 (comment).
@jrbourbeau or @jakirkham do either of you have a chance to glance over the changes here? This code is pretty tricky, and the diff isn't super-informative... Hopefully the new tests cases illustrate things. I'll have a followup pull request that more fully addresses #5544 (comment) and will include user-documentation, since it will require a new keyword. |
In case it helps, I've pushed #6753 with the docs that explain the overall tradeoff between chunking and merging. That said, the optimization in this PR doesn't actually hit that tradeoff, since the input chunking in this special case is already ideal for the no-communication / no-merge implementation. But hopefully the docs shown in #6753 give a sense for the problem. |
I haven't reviewed this seriously, but the change here seems finely-scoped enough, and @TomAugspurger seems comfident enough that I'm happy to go ahead with this. @TomAugspurger if you feel like self-merging then please go ahead. |
I added a test one more test (reshaping a 3-d array to 1d). Planning to merge once that passes. |
When the slow-moving (early) axes in
.reshape
are all size 1, then wecan avoid an intermediate rechunk which could cause memory issues.
This is demonstrated in reshape a
(2, 3, 4)
array ->(6, 4)
The "ideal" (zero communication) chunking is given by:
Previously, that merged the intermediates and had the result chunks
Now we have the result chunks
This doesn't remove the need for something like #6272, since it only handles the special case of the low axes having
chunksize=1
. But we also don't need a keyword for this since this implementation should be strictly better than the old one for this special case. There's no tradeoff between number of tasks and data movement when the input is already fully chunked along the low axes..xref #5544, specifically the examples
given in #5544 (comment).