Avoid rechunking in reshape with chunksize=1 #6748

TomAugspurger · 2020-10-19T18:10:13Z

When the slow-moving (early) axes in .reshape all have chunksize 1, we
can avoid an intermediate rechunk, which could cause memory issues.

This is demonstrated by reshaping a (2, 3, 4) array to (6, 4):

In [3]: a = da.from_array(np.arange(24).reshape(2, 3, 4), chunks=((1, 1), (1, 2), (2, 2)))
   ...: a.reshape(6, 4)

The "ideal" (zero communication) chunking is given by:

00 01 | 02 03   # a[0, :, :]
----- | -----
04 05 | 06 07
08 09 | 10 11

=============

12 13 | 14 15   # a[1, :, :]
----- | -----
16 17 | 18 19
20 21 | 22 23

-> (6, 4)

00 01 | 02 03
----- | -----
04 05 | 06 07
08 09 | 10 11
----- | -----
12 13 | 14 15
----- | -----
16 17 | 18 19
20 21 | 22 23
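As a rough check of the "zero communication" claim, the pure-Python sketch below maps each input block of the (2, 3, 4) example to the output block it lands in, assuming C (row-major) ordering. The helper name `bounds` and the `mapping` dictionary are made up for illustration and are not part of dask.

```python
from itertools import accumulate, product


def bounds(chunks):
    """Turn a chunk tuple like (1, 2) into index ranges [(0, 1), (1, 3)]."""
    stops = list(accumulate(chunks))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


in_chunks = ((1, 1), (1, 2), (2, 2))   # chunks of the (2, 3, 4) input
out_chunks = ((1, 2, 1, 2), (2, 2))    # "ideal" chunks of the (6, 4) output
rows_per_slab = sum(in_chunks[1])      # 3 rows in each a[i, :, :]

out_row_blocks = bounds(out_chunks[0])
out_col_blocks = bounds(out_chunks[1])

mapping = {}
for (i0, i1), (j0, j1), (k0, k1) in product(*(bounds(c) for c in in_chunks)):
    # In C order, a[i, j, k] lands at row i * 3 + j, column k.
    # Each block spans a single i because the leading axis has chunksize 1.
    rows = (i0 * rows_per_slab + j0, i0 * rows_per_slab + j1)
    cols = (k0, k1)
    # .index raises ValueError if an input block straddles output blocks.
    mapping[(i0, j0, k0)] = (out_row_blocks.index(rows), out_col_blocks.index(cols))

for src, dst in sorted(mapping.items()):
    print(src, "->", dst)
```

Every input block maps to exactly one distinct output block, so no block has to be split or merged.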

Previously, the reshape merged the intermediate chunks, giving the result chunks

Out[2]: ((3, 3), (2, 2))

Now we have the result chunks

Out[2]: ((1, 2, 1, 2), (2, 2))
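To make the new result concrete, here is a minimal sketch of how the chunks along the merged axis come out in this special case. It is not dask's implementation, and the function name `merged_axis_chunks` is hypothetical. It assumes every leading axis being merged has chunksize 1, so the chunking of the last merged axis is simply repeated once per combination of leading indices.

```python
from math import prod


def merged_axis_chunks(leading_chunks, last_merged_chunks):
    """Chunks of the merged output axis when all leading chunks have size 1.

    leading_chunks: chunk tuples of the slow-moving axes being merged
    last_merged_chunks: chunk tuple of the fastest-moving axis being merged
    """
    if any(c != 1 for axis in leading_chunks for c in axis):
        raise ValueError("this sketch only covers leading axes with chunksize=1")
    # One copy of the inner chunking per combination of leading indices.
    repeats = prod(len(axis) for axis in leading_chunks)
    return tuple(last_merged_chunks) * repeats


in_chunks = ((1, 1), (1, 2), (2, 2))
# Merge axes 0 and 1 of the (2, 3, 4) array; axis 2 is carried over unchanged.
out_chunks = (merged_axis_chunks(in_chunks[:1], in_chunks[1]), in_chunks[2])
print(out_chunks)  # ((1, 2, 1, 2), (2, 2))
```

The number of output chunks equals the number of input chunks, which is why there is no tradeoff between task count and data movement here.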

This doesn't remove the need for something like #6272, since it only handles the special case of the low axes having chunksize=1. But we also don't need a keyword for this, since this implementation should be strictly better than the old one for this special case. There's no tradeoff between number of tasks and data movement when the input is already fully chunked along the low axes.

xref #5544, specifically the examples
given in #5544 (comment).


TomAugspurger · 2020-10-20T01:14:56Z

@jrbourbeau or @jakirkham does either of you have a chance to glance over the changes here? This code is pretty tricky, and the diff isn't super-informative... Hopefully the new test cases illustrate things.

I'll have a followup pull request that more fully addresses #5544 (comment) and will include user-documentation, since it will require a new keyword.

TomAugspurger · 2020-10-20T13:57:34Z

In case it helps, I've pushed #6753 with the docs that explain the overall tradeoff between chunking and merging.

That said, the optimization in this PR doesn't actually hit that tradeoff, since the input chunking in this special case is already ideal for the no-communication / no-merge implementation. But hopefully the docs shown in #6753 give a sense for the problem.

mrocklin · 2020-10-26T16:22:09Z

I haven't reviewed this seriously, but the change here seems finely-scoped enough, and @TomAugspurger seems confident enough, that I'm happy to go ahead with this. @TomAugspurger if you feel like self-merging then please go ahead.


TomAugspurger · 2020-10-29T19:41:05Z

I added one more test (reshaping a 3-d array to 1-d). Planning to merge once that passes.

TomAugspurger added the array label Oct 19, 2020

TomAugspurger mentioned this pull request Oct 19, 2020

chunks get combined in 4d array reshape #5544

Closed

TomAugspurger added 2 commits October 19, 2020 16:23

fix conditioni

04ba4c5

remove breakpoint comment

70efe82

TomAugspurger mentioned this pull request Oct 20, 2020

Add option to control rechunking in reshape #6753

Merged

TomAugspurger mentioned this pull request Oct 27, 2020

Expose rechunking in reshape #6596

Closed

TomAugspurger added 2 commits October 29, 2020 14:04

Added 3d->1d test

4d8e2ed

Merge remote-tracking branch 'upstream/master' into reshape-rechunk-s…

7d90974

…pecial

TomAugspurger merged commit c4038ad into dask:master Oct 30, 2020

TomAugspurger deleted the reshape-rechunk-special branch October 30, 2020 14:01

This was referenced Feb 3, 2021

Fix PCA tests running on Dask 2020.12.0 sgkit-dev/sgkit#430

Merged

Dask reshape bug for arrays with fully chunked leading axes sgkit-dev/sgkit#456

Open

Dask array reshape bug for arrays with fully chunked leading axes #7171

Closed
