Improve the implementation of dask.array.block #3987
Improvements are always welcome. Two questions:
I just wanted to point somebody to a
This might also be helpful in replacing the current
Here is some more concrete information about the challenge: the current implementation adds 24 graph operations for a 3D blocking operation with 8 sub-arrays.
I'm not too sure how to handle unknown dimensions. That is really a unique (and quite cool) feature of dask. I'll try to follow up with a real benchmark soon and look into concatenate3.
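For reference, numpy's `np.block` performs the same logical operation eagerly (this is an illustration of the blocking semantics, not of dask's graph-based implementation). Blocking 8 sub-arrays into one 3D result takes a nested list of depth 3:

```python
import numpy as np

# Build a 2x2x2 grid of 8 sub-arrays, each of shape (2, 2, 2); the block
# value encodes its grid position so we can see where each block lands.
grid = [[[np.full((2, 2, 2), i * 4 + j * 2 + k) for k in range(2)]
         for j in range(2)]
        for i in range(2)]

result = np.block(grid)
print(result.shape)  # (4, 4, 4)
```

With nesting depth equal to `ndim`, the innermost lists join along the last axis and outer lists along progressively earlier axes.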
Any update on this @hmaarrfk?
I started with a benchmark. It basically shows that it suffers a little from overhead, but not from copying data.
FWIW a lot of the code for
I've been working on getting a numpy implementation without

The secondary challenge was that the existing benchmarks were hard to interpret (numpy/numpy#12001), and a bug in the existing implementation made one benchmark make my proposed implementation look much worse than the existing one (numpy/numpy#11979).

Ultimately, different algorithms are applicable for different datasets. For blocking small arrays (512x512 final size), it actually turned out that

Dask also offers a lot of optimizations, like its notion of squashing linear operations. I still need to experiment with it, but initial benchmarks (dask/dask-benchmarks#18) show that there isn't much more than 10% to be gained.
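One of the strategies compared in the numpy discussion is assembling blocks by slice assignment into a preallocated output instead of repeated concatenation. A minimal sketch for the simplified 2D case (the function name `block_2d` is hypothetical, not the numpy or dask API):

```python
import numpy as np

def block_2d(rows):
    """Assemble a 2D grid of blocks by slice assignment.

    Assumes blocks within a row share a height and blocks within a
    column share a width, as np.block requires.
    """
    row_heights = [r[0].shape[0] for r in rows]
    col_widths = [b.shape[1] for b in rows[0]]
    dtype = np.result_type(*[b.dtype for r in rows for b in r])
    out = np.empty((sum(row_heights), sum(col_widths)), dtype=dtype)
    i = 0
    for row, h in zip(rows, row_heights):
        j = 0
        for b, w in zip(row, col_widths):
            out[i:i + h, j:j + w] = b  # copy each block into place once
            j += w
        i += h
    return out

blocks = [[np.ones((2, 3)), np.zeros((2, 2))],
          [np.zeros((3, 3)), np.ones((3, 2))]]
print(block_2d(blocks).shape)  # (5, 5)
```

This allocates the final array exactly once and copies each block exactly once, which is why it tends to win for small arrays where per-call overhead dominates.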
Checking in here. Any update on this @hmaarrfk? Is there anything you are blocked on from the dask side?
No; mostly I'm waiting for the numpy PR to go through. A few peculiarities have come up.
OK, just checking in. Thanks for the update. |
Yeah, so while the dask operation is built on multiple dask implementations of concatenate, those aren't actually costly, since data isn't copied until the very end.

Maybe the graph operations can be reduced, making the graph cleaner, but it isn't something I am enthusiastic about, because that numpy PR took so long to get through, probably in large part due to my novice development style.

I am secretly hoping, now that numpy is expanding their high-level interface, that maybe we can expose the internal algorithm of block and use that for dask. Dask would then depend on numpy 1.16 or 1.17.

I'm not sure what would happen when you block two dask arrays with chunks that don't line up, an operation that is probably allowed now.
dask/dask/array/core.py, line 2899 in 14b4c38
I recently rewrote what will likely be numpy's new implementation of block. @eric-wieser suggested an enhancement which should make it really trivial for dask to piggyback off the results and use them in its implementation: numpy/numpy#11971 (comment)

The first time I tried to use dask.block, it was really slow, so I had to write my own. I think this might help, since it avoids calls to concatenate altogether.

Not sure about unknown dimensions.