
Investigate using da.pad in da.overlap #3641

Open

jakirkham opened this issue Jun 19, 2018 · 27 comments

@jakirkham
Member

jakirkham commented Jun 19, 2018

The da.overlap module was written to handle map_overlap and all of its intricacies. It provides some support for adding padding to edge chunks, but it only supports a few padding modes, not nearly as many as da.pad. Given that da.pad now exists, it makes sense to use it for padding in da.overlap, which would expose these same options to map_overlap. It should also help simplify some of the code in da.overlap.
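
For reference, a minimal sketch (assuming current da.pad behavior, which follows numpy.pad) of padding modes that da.pad exposes but map_overlap's boundary handling does not:

import dask.array as da

x = da.ones((8, 8), chunks=(4, 4))

# map_overlap's boundary handling covers roughly: 'reflect', 'periodic',
# 'nearest', 'none', or a constant. da.pad follows numpy.pad, so modes
# like these would become available as well:
ramped = da.pad(x, 2, mode='linear_ramp', end_values=0)
averaged = da.pad(x, 2, mode='mean')
mirrored = da.pad(x, 2, mode='symmetric')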

@jakirkham
Member Author

As a side note, map_overlap allows handling different boundaries with different modes of padding (e.g. reflection on one axis and a constant on another). I'm not aware of any use cases where this is needed. We may consider deprecating this functionality to make this transition simpler.
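
For context, a rough illustration of the per-axis behavior in question, using dict-valued depth and boundary with the Array.map_overlap method (exact accepted values and signatures may vary by dask version):

import dask.array as da

x = da.ones((8, 8), chunks=(4, 4))

# Different boundary treatment per axis: reflect along axis 0,
# pad with a constant (0) along axis 1.
y = x.map_overlap(
    lambda block: block,  # identity, just to exercise the boundary handling
    depth={0: 1, 1: 1},
    boundary={0: 'reflect', 1: 0},
)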

@jakirkham
Member Author

Any thoughts on this, @mrocklin?

@mrocklin
Member

I suspect that the modes used were built to support scikit-image, but I don't recall exactly.

@jakirkham
Member Author

Looking at da.map_overlap's boundary docs, these all seem to be supported by da.pad, so it should be possible to call da.pad in all of these cases. The custom user function case can be handled as well. That said, I agree it would be good to check.
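
My rough reading of how those boundary values would translate into da.pad arguments; this mapping is an assumption and is exactly the sort of thing worth checking (in particular whether dask's 'reflect' corresponds to numpy's 'reflect' or 'symmetric'):

# Assumed mapping from map_overlap boundary values to da.pad / numpy.pad arguments.
BOUNDARY_TO_PAD = {
    'periodic': dict(mode='wrap'),
    'nearest': dict(mode='edge'),
    'reflect': dict(mode='symmetric'),  # or mode='reflect'; needs checking
}
# A scalar boundary value c would map to dict(mode='constant', constant_values=c),
# and a user-supplied function could be passed as a callable mode to numpy.pad.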

@jakirkham
Member Author

@emmanuelle @jni, am thinking about using da.pad to handle boundary conditions within da.map_overlap. Should provide greater flexibility to the user. Would this be of interest for things like apply_parallel? Are there other issues or concerns that you can think of, which would need to be addressed?

@jni
Contributor

jni commented Jun 27, 2018

da.pad only applies to external boundaries, not the internal ones, right? So is this about not having to pad a NumPy array ahead of time?

@jakirkham
Member Author

da.pad only applies to external boundaries, not the internal ones, right?

Correct.

So is this about not having to pad a NumPy array ahead of time?

Sort of.

da.map_overlap provides some padding options, but they are limited. da.pad provides substantially more options (including those supported by map_overlap currently). So this is about finding the best way to make use of da.pad with da.map_overlap to provide these additional options to the user.

@jni
Contributor

jni commented Jun 27, 2018

Oh. Why does da.map_overlap not call out to da.pad?

@jakirkham
Member Author

Indeed, agree it should. Mainly because da.pad only recently got added. :)

We should also think about how we can pass arguments through da.map_overlap to da.pad.
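
One hypothetical shape this could take (nothing like this exists in dask today, and the parameter names below are made up): forward a pad mode and its keyword arguments to da.pad for the external boundaries, and let the existing machinery handle only the internal overlaps.

import dask.array as da

def map_overlap_with_pad(x, func, depth, pad_mode='reflect', pad_kwargs=None, **kwargs):
    # Hypothetical sketch: external boundaries via da.pad,
    # internal overlaps via the existing map_overlap machinery.
    pad_kwargs = pad_kwargs or {}
    padded = da.pad(x, depth, mode=pad_mode, **pad_kwargs)
    # How trimming at the outer edges should interact with boundary='none'
    # is part of what would need to be worked out here.
    return padded.map_overlap(func, depth=depth, boundary='none', **kwargs)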

@jakirkham
Member Author

As you have been looking into related issues, @hmaarrfk, would be interested to hear your thoughts on this. :)

@hmaarrfk
Contributor

so pad adds fancy ramps on the outside of the array. I don't see any problem with that.

The issue with map_overlap was that it was using concatenate to add virtual boundaries that would then get removed when the boundary was none. For a 2D array, this would result in 3 extra memory copies (vertical boundary, horizontal boundary, corner boundary).

I don't know how pad is implemented but whatever it does, it should ensure that a single memory copy is necessary no matter the dimensionality of the originating array. There might even be more ways to optimize higher-dimensionality arrays.

FYI: since dask is only interested in large arrays, I figured I would cross-ref this post.

Some speedup can be expected in numpy 1.15.2 if the future PR mentioned at the bottom of this issue gets merged in: numpy/numpy#11919
Basically, copying large arrays on Linux was limited by page faulting.

@hmaarrfk
Contributor

Here are a few interesting benchmarks. We consider the case where we want to pad a chunked array.
If we want to have a contiguous array in memory, we must incur the cost of at least 1 memory copy. Numpy gives us an easy way to express this idea:

i7 7700 HQ, Dual channel DDR4 2400MHz

import numpy as np
from dask import array as da

shape = (50, 1024, 1024)
chunks = (50, shape[1]//2, shape[2]//2)
dtype = 'uint8'
padding = 10

padded_chunks = (chunks[0] + padding*2, chunks[1] + padding * 2, chunks[2] + padding*2)
d1 = da.ones(shape=shape, chunks=chunks, dtype=dtype).persist()
n1 = np.ones(shape=shape, dtype=dtype)
%%timeit
constant_value = 0
n00 = np.empty(shape=padded_chunks, dtype=n1.dtype)
n01 = np.empty(shape=padded_chunks, dtype=n1.dtype)
n10 = np.empty(shape=padded_chunks, dtype=n1.dtype)
n11 = np.empty(shape=padded_chunks, dtype=n1.dtype)

# Copy each quadrant into the interior of its padded buffer (one copy each).
n00[padding:-padding, padding:-padding, padding:-padding] = n1[:, :n1.shape[1]//2, :n1.shape[2]//2]
n11[padding:-padding, padding:-padding, padding:-padding] = n1[:, n1.shape[1]//2:, n1.shape[2]//2:]
n10[padding:-padding, padding:-padding, padding:-padding] = n1[:, n1.shape[1]//2:, :n1.shape[2]//2]
n01[padding:-padding, padding:-padding, padding:-padding] = n1[:, :n1.shape[1]//2, n1.shape[2]//2:]

# Fill the pad regions with the constant value.
for this_n in [n00, n01, n10, n11]:
    this_n[:padding] = constant_value
    this_n[-padding:] = constant_value
    this_n[:, :padding] = constant_value
    this_n[:, -padding:] = constant_value
    this_n[:, :, :padding] = constant_value
    this_n[:, :, -padding:] = constant_value

12.1 ms ± 59.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Dask doesn't support array assignments, which means we have to use the other primitives that are provided. Note that d1 is persisted, so we don't count the creation of the ones array. Also note that creating zeros isn't the same as initializing the memory to zero; the OS might trick you, and you would still incur page faults.

d_padded = da.pad(d1, padding, mode='constant', constant_values=0)
%timeit _ = d_padded.persist()

43.7 ms ± 452 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

d_overlapped = da.overlap.overlap(d1, depth=padding, boundary='nearest')
%timeit _ = d_overlapped.persist()

98.9 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

I've seen dask take 30 ms to start up in another benchmark, so I'm not sure what the culprit is.

I tried to write code that would effectively implement the numpy logic I wrote above in dask. Unfortunately, it was difficult without being able to assign to slices of dask arrays. Is this a feature that will be possible in the near future?

For tasks that finish in seconds, this kind of overhead is REALLY high. Is there a benchmarking suite for dask? It would be interesting to add these to it.

@mrocklin
Member

Unfortunately, it was difficult without being able to assign to slices of dask arrays. Is this a feature that will be possible in the near future?

It's certainly in-scope, but it would be challenging. Implementing numpy-style slicing on chunked datasets is hard and requires some attention to detail. We did this work for __getitem__. The same work could be done for __setitem__; it just requires someone who has the focus and time to get it done. https://github.com/dask/dask/blob/master/dask/array/slicing.py

@mrocklin
Member

Is there a benchmarking suite for dask? It would be interesting to add these to it.

There is, though it's not well maintained. No reason not to start now though :)

https://github.com/dask/dask-benchmarks

@hmaarrfk
Contributor

WRT slicing, I think one time you mentioned that dask arrays were immutable.

I think that mutable assignments might make a whole copy of the array? Kinda scary. That would require much thought.

I think you could do it without that ability; just expressing things might be different.

@mrocklin
Member

I think one time you mentioned that dask arrays were immutable.

this is not currently the case

I think that mutable assignments might make a whole copy of the array?

What does a "whole copy of the array" mean in this case? A dask array is a task graph that constructs an array. There isn't necessarily any data actively in memory when you copy it. It's just a plan.
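
To make that concrete, a small illustration (just a sketch) that operations on a dask array only extend the task graph, and nothing is materialized until compute/persist:

import dask.array as da

x = da.ones((1000, 1000), chunks=(500, 500))
y = x + 1  # no chunk data is allocated or copied here

# Both are just plans: mappings from task keys to tasks.
print(len(dict(x.__dask_graph__())), len(dict(y.__dask_graph__())))

print(y.compute().sum())  # data only materializes on compute()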

@jakirkham
Member Author

so pad adds fancy ramps on the outside of the array.

In some cases, yes. In other cases it may alter the original edge chunks. For example, look at pad_edge. It really depends on the kind of padding chosen.

I don't know how pad is implemented but whatever it does, it should ensure that a single memory copy is necessary no matter the dimensionality of the originating array.

For the cases where it alters an original edge chunk, it just calls numpy.pad with the appropriate arguments. In the other cases, where the original edge chunks are not altered, we perform some sort of global operation and attach the results onto the edges and corners of the array, so the original edge chunks are not modified in the padded array. In all cases, interior chunks are untouched.
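
A quick way to see which chunks are affected is to compare .chunks before and after padding. I'm only printing the chunk structure here rather than asserting exact values, since it can differ between padding modes and dask versions:

import dask.array as da

x = da.ones((20, 20), chunks=(10, 10))

# Depending on the mode, the pad region is either folded into the existing
# edge chunks or attached as new blocks around the outside; interior chunks
# stay the same either way.
print(x.chunks)
print(da.pad(x, 2, mode='edge').chunks)
print(da.pad(x, 2, mode='constant').chunks)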

@hmaarrfk
Contributor

@jakirkham I didn't know about the numpy pad function. According to this benchmark it should basically not be used. A 4x slowdown!

%%timeit
n00 = np.pad(n1[:, :chunks[1], :chunks[2]], pad_width=padding, mode='constant', constant_values=0)
n11 = np.pad(n1[:, chunks[1]:, chunks[2]:], pad_width=padding, mode='constant', constant_values=0)
n10 = np.pad(n1[:, chunks[1]:, :chunks[2]], pad_width=padding, mode='constant', constant_values=0)
n01 = np.pad(n1[:, :chunks[1], chunks[2]:], pad_width=padding, mode='constant', constant_values=0)

68 ms ± 787 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

FYI, I may have installed that hugepages patch, so your benchmarks may differ.

The challenge that I think dask users will see (well I definitely saw) is that every operation becomes an out of cache operation. As such, repeated memory copies are unacceptable since you will always hit memory bandwidth limits.

@hmaarrfk
Contributor

Effectively, for a 3D array, np.concatenate is called 3*ndims times to pad an array!!

https://github.com/numpy/numpy/blob/v1.15.1/numpy/lib/arraypad.py#L97

I'm worried that I might be missing something. The comment is very encouraging, so I'm not sure about the exact benchmark.

This commit changed the implementation from something that looks like the fast approach I showed, to a new concatenation algorithm:
numpy/numpy@246c06d#diff-4fcddc9140fbe901655b3e1bd484205cR1331

@jakirkham
Member Author

Think I'm missing some things in your benchmark. What are n1, chunks, and padding in this case? Also what are we comparing to?

@hmaarrfk
Contributor

You probably need the code from #3641 (comment),
and I'm comparing to my naive implementation that makes use of np.empty and slicing.

@jakirkham
Member Author

Need to think about this more, but comparing da.pad with a constant to da.overlap.constant is probably a more appropriate comparison for determining whether da.pad should be used in map_overlap.

If we think numpy.pad's performance is poor, it would be good to address that upstream.
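
A rough sketch of that comparison, reusing d1 and padding from the benchmark above. Note it isn't perfectly apples-to-apples: overlap also creates the internal ghost regions between neighboring chunks, so it does strictly more work than external padding alone.

# Constant padding of the external boundary only.
%timeit _ = da.pad(d1, padding, mode='constant', constant_values=0).persist()

# Constant external boundary via the overlap machinery
# (also adds internal overlaps between neighboring chunks).
%timeit _ = da.overlap.overlap(d1, depth=padding, boundary=0).persist()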

@hmaarrfk
Contributor

Here is the numpy issue that looks at this:

numpy/numpy#11126 (comment)

I've provided my benchmark there, which illustrates the problem in a much more succinct way for np.pad.

I would say that benchmarks need to be used to motivate whatever choice is made. After all, dask is all about large arrays.

  1. I would be hesitant to add all the numpy options because it would be "irreversible". Instead, you can call out to the pad function but keep the overlap functions minimal; that way you can roll back in 6 months if np.pad improves in performance (which it might, since I might be motivated to do it depending on how much time I have).
  2. Improve the documentation of the dask.overlap module to teach people how to break down their operations, so they can choose the implementation that is best for their use case.

@hmaarrfk
Contributor

So here is a PR that should make numpy.pad much faster, especially for dask users.

numpy/numpy#11358

Benchmarking dask with this PR included might be super important if we go that route.

@jakirkham
Member Author

Thanks for working with upstream on this @hmaarrfk. It’s greatly appreciated 🙂

@jakirkham
Member Author

Given it's taking a bit to resolve the numpy.pad issue, took a look at da.pad again to see if we could eliminate numpy.pad in the cases it was used. Think PR ( #4152 ) addresses this concern. Also has the benefit of making the padding behavior a bit more consistent between padding modes.

@jakirkham changed the title from "Investigate using da.pad in da.ghost" to "Investigate using da.pad in da.overlap" on Nov 27, 2018
@jakirkham
Member Author

jakirkham commented Jun 20, 2019

If we are comfortable with a breaking change here, we could make another 1.x release, which warns users that boundaries will no longer be handled by da.map_overlap and that they should call da.pad before calling da.map_overlap to get this functionality. As part of 2.0, we could then remove these options.

Edit: Also worth pointing to issue ( #4252 ), which suggests this.
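
Concretely, user code after such a deprecation would look roughly like the following (the interaction between boundary='none' and trimming at the outer edges is still the open question here):

import dask.array as da

x = da.random.random((100, 100), chunks=(50, 50))
depth = 2

# Explicit external padding first ...
padded = da.pad(x, depth, mode='reflect')
# ... then map_overlap handles only the internal overlaps.
result = padded.map_overlap(lambda b: b, depth=depth, boundary='none')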
