Investigate using `da.pad` in `da.overlap` #3641
As a side note,
Any thoughts on this, @mrocklin?
I suspect that the modes used were built to support scikit-image, but I don't recall exactly.
Looking at
@emmanuelle @jni, am thinking about using
Correct.
Sort of.
Oh. Why does
Indeed, agree it should. Mainly because We should also think about how we can pass arguments through
As you have been looking into related issues, @hmaarrfk, would be interested to hear your thoughts on this. :)
The issue with pad: I don't know how pad is implemented, but whatever it does, it should ensure that only a single memory copy is necessary, no matter the dimensionality of the originating array. There might even be more ways to optimize higher-dimensionality arrays. FYI: since dask is only interested in large arrays, I figured I would cross-ref this post. Some speedup can be expected in numpy 1.15.2 if the future PR mentioned at the bottom of this issue gets merged in: numpy/numpy#11919
Here are a few interesting benchmarks. We consider the case where we want to pad a chunked array.

Hardware: i7-7700HQ, dual-channel DDR4 2400 MHz.

```python
import numpy as np
from dask import array as da

shape = (50, 1024, 1024)
chunks = (50, shape[1] // 2, shape[2] // 2)
dtype = 'uint8'
padding = 10
padded_chunks = (chunks[0] + padding * 2,
                 chunks[1] + padding * 2,
                 chunks[2] + padding * 2)

d1 = da.ones(shape=shape, chunks=chunks, dtype=dtype).persist()
n1 = np.ones(shape=shape, dtype=dtype)
```

```python
%%timeit
constant_value = 0
# Preallocate one padded buffer per chunk, copy each quadrant in with a
# single memory copy, then zero the borders.
n00 = np.empty(shape=padded_chunks, dtype=dtype)
n01 = np.empty(shape=padded_chunks, dtype=dtype)
n10 = np.empty(shape=padded_chunks, dtype=dtype)
n11 = np.empty(shape=padded_chunks, dtype=dtype)
n00[padding:-padding, padding:-padding, padding:-padding] = n1[:, :n1.shape[1]//2, :n1.shape[2]//2]
n11[padding:-padding, padding:-padding, padding:-padding] = n1[:, n1.shape[1]//2:, n1.shape[2]//2:]
n01[padding:-padding, padding:-padding, padding:-padding] = n1[:, n1.shape[1]//2:, :n1.shape[2]//2]
n10[padding:-padding, padding:-padding, padding:-padding] = n1[:, :n1.shape[1]//2, n1.shape[2]//2:]  # was a duplicate n01 assignment; n10 was never filled
for this_n in [n00, n01, n10, n11]:
    this_n[:padding] = constant_value
    this_n[-padding:] = constant_value
    this_n[:, :padding] = constant_value
    this_n[:, -padding:] = constant_value
    this_n[:, :, :padding] = constant_value
    this_n[:, :, -padding:] = constant_value
```
Dask doesn't support array assignments, which means we have to use the other primitives that are provided.

```python
d_padded = da.pad(d1, padding, mode='constant', constant_values=0)
%timeit _ = d_padded.persist()
```

I've seen dask take 30 ms to start up in another benchmark, so I'm not sure what the culprit is. I tried to write code that would effectively implement the numpy logic above in dask. Unfortunately, it was difficult without being able to assign to slices of dask arrays. Is this a feature that will be possible in the near future? For tasks that finish in seconds, this kind of overhead is REALLY high.

Is there a benchmarking suite for dask? It would be interesting to add these to it.
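Without slice assignment, one workaround is to rebuild the array from pieces instead of mutating it. A minimal sketch (not from the thread, assuming the current `dask.array` API) of emulating `x[:2, :] = 1` on an immutable dask array:

```python
import dask.array as da

x = da.zeros((10, 10), chunks=5)

# Instead of "x[:2, :] = 1", build the replacement rows and concatenate
# them with the untouched remainder of the array.
top = da.ones((2, 10), chunks=(2, 5))
x2 = da.concatenate([top, x[2:, :]], axis=0)

assert x2.shape == (10, 10)
```

This stays lazy (no chunk is materialized until compute), but it does add graph nodes per assignment, which is part of the overhead being discussed here.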
It's certainly in scope, but it would be challenging. Implementing numpy-style slicing on chunked datasets is hard and requires some attention to detail. We did this work for
There is, though it's not well maintained. No reason not to start now though :)
WRT slicing: I think you once mentioned that dask arrays were immutable. I think that mutable assignments might make a whole copy of the array? Kinda scary. That would require much thought. I think you could do it without that ability; just expressing things might be different.
this is not currently the case
What does a "whole copy of the array" mean in this case? A dask array is a task graph that constructs an array. There isn't necessarily any data actively in memory when you copy it. It's just a plan.
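A minimal sketch (not from the thread, assuming the current `dask.array` API) illustrating the "it's just a plan" point: building and padding an array allocates no chunk data until you call `compute()` or `persist()`.

```python
import dask.array as da

# Building these arrays only constructs a task graph; no chunk data exists yet.
x = da.ones((1000, 1000), chunks=(500, 500), dtype='uint8')
y = da.pad(x, 10, mode='constant', constant_values=0)  # still just a plan

# Only now are chunks materialized and the reduction executed.
total = int(y.sum().compute())
```

Copying or deriving a new dask array therefore copies the plan, not the data, which is why "a whole copy of the array" is cheap at graph-construction time.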
In some cases, yes. In other cases it may alter the original edge chunks. For example, look at
For the cases where it alters an original edge chunk, it just calls
@jakirkham I didn't know about the numpy pad function. According to this benchmark it should basically not be used. A 4x slowdown!

```python
%%timeit
n00 = np.pad(n1[:, :chunks[1], :chunks[2]], pad_width=padding, mode='constant', constant_values=0)
n11 = np.pad(n1[:, chunks[1]:, chunks[2]:], pad_width=padding, mode='constant', constant_values=0)
n01 = np.pad(n1[:, chunks[1]:, :chunks[2]], pad_width=padding, mode='constant', constant_values=0)
n10 = np.pad(n1[:, :chunks[1], chunks[2]:], pad_width=padding, mode='constant', constant_values=0)  # was a duplicate n01 assignment
```
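For what it's worth, the two approaches being benchmarked are equivalent in output: for constant padding, preallocate-and-assign produces exactly the same array as `np.pad`. A small numpy-only check (not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 255, size=(4, 8, 8), dtype=np.uint8)
padding = 2

# Reference: np.pad with constant zeros.
padded_ref = np.pad(a, pad_width=padding, mode='constant', constant_values=0)

# Manual version: preallocate a zeroed buffer and copy the interior in once.
padded = np.zeros(tuple(s + 2 * padding for s in a.shape), dtype=a.dtype)
padded[padding:-padding, padding:-padding, padding:-padding] = a

assert np.array_equal(padded, padded_ref)
```

So the question in this thread is purely about speed and memory traffic, not correctness.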
FYI, I may have installed that hugepages patch, so your benchmarks may differ. The challenge that I think dask users will see (well, I definitely saw) is that every operation becomes an out-of-cache operation. As such, repeated memory copies are unacceptable, since you will always hit memory bandwidth limits.
Effectively, for a 3D array, see https://github.com/numpy/numpy/blob/v1.15.1/numpy/lib/arraypad.py#L97. I'm worried that I might be missing something; the comment is very encouraging, so I'm not sure about the exact benchmark. This commit changed the implementation from something that looks like the
Think I'm missing some things in your benchmark. What are
You probably need code from #3641 (comment)
Need to think about this more, but comparing If we think
Here is the numpy issue that looks at this: I've provided my benchmark there, which illustrates the problem in a much more succinct way. I would say that benchmarks need to be used to motivate whatever choice is made. After all, dask is all about large arrays.
So here is a PR that should make
Benchmarking dask with this PR included might be super important if we do.
Thanks for working with upstream on this @hmaarrfk. It’s greatly appreciated 🙂
Given it's taking a bit to resolve the
If we are comfortable with a breaking change here, we could make another 1.x release, which warns users that boundaries will not be handled by

Edit: Also worth pointing to issue #4252, which suggests this.
The `da.overlap` module was written to handle `map_overlap` and all of its intricacies. It provides some support for adding padding to edge chunks, though it only supports a few modes of padding, not nearly as many as `da.pad` supports. Given that `da.pad` now exists, it makes sense to just use it for performing padding in `da.overlap`, providing these same options to `map_overlap`. This should also help simplify some of the code in `da.overlap`.
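A short sketch (not from the thread, assuming the current `dask.array` APIs) contrasting the two: `da.overlap.overlap` handles boundaries with a small fixed set of options ('reflect', 'periodic', 'nearest', 'none', or a constant), while `da.pad` exposes the full numpy pad vocabulary ('constant', 'edge', 'reflect', 'symmetric', 'wrap', 'linear_ramp', ...).

```python
import dask.array as da
from dask.array.overlap import overlap

x = da.arange(16, chunks=4)

# da.overlap: each chunk grows by `depth` on each side, and the outer
# boundary is filled according to one of a few built-in modes.
g = overlap(x, depth=1, boundary='reflect')

# da.pad: same kind of boundary handling, but with any np.pad mode.
p = da.pad(x, 1, mode='symmetric')
```

Routing `da.overlap`'s boundary handling through `da.pad`, as this issue proposes, would let `map_overlap` accept any of those modes without duplicating the padding logic.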