Apply won't be any faster with threads because it uses Python for loops.
Not sure what is going on with the processes situation. I would run this on the local distributed scheduler and watch the dashboard to get a sense of what is going on.
So this falls in the "I'm not sure I should open a bug" category: it might just be me being new to dask and not understanding why this operation is slow.
See https://gist.github.com/andaag/207fcdc6965b86b7085406221279e4c2
With pandas:
Runtime 22 seconds.
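The pandas baseline itself is only in the linked gist; a hypothetical sketch of its shape (column name and function invented here) would be a plain row-wise `apply`, i.e. a single-threaded Python loop:

```python
import pandas as pd

df = pd.DataFrame({"a": range(100_000)})  # stand-in for the real data

# Row-wise apply: a single-threaded, Python-level loop over every row.
baseline = df.apply(lambda row: row["a"] * 2, axis=1)
```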
With dask:
Runtime 36 seconds. OK, it's a fairly small sample, but that seems like huge overhead for the extra threads we're adding... Let's try a custom parallel function that tears the array apart, runs parallel tasks, then puts it back together again, to rule that out.
With custom parallel function:
Runtime 4.7 seconds. What's going on here?
If dask were slow here merely because of parallelization overhead, my custom function should not be faster.
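The custom function isn't reproduced in the issue; a sketch of the split / run-in-parallel / reassemble pattern described above might look like this (the chunk function `work` and column names are assumptions, not the gist's code):

```python
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor

def work(chunk: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the real per-chunk computation from the gist.
    return chunk.assign(b=chunk["a"] * 2)

def parallel_apply(df: pd.DataFrame, func, n_jobs: int = 4) -> pd.DataFrame:
    # Tear the frame apart into contiguous row chunks...
    bounds = np.linspace(0, len(df), n_jobs + 1, dtype=int)
    chunks = [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n_jobs)]
    # ...run the chunks in parallel worker processes (no GIL contention)...
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        parts = list(pool.map(func, chunks))
    # ...and glue the results back together in the original order.
    return pd.concat(parts)

df = pd.DataFrame({"a": range(1_000)})
result = parallel_apply(df, work)
```

Because the chunk function is called only once per worker, this sidesteps both per-row Python overhead and per-task scheduler overhead, which would explain the 4.7 s number.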
This experiment is run inside https://hub.docker.com/r/andaag/aibox_cuda9, which is built from https://github.com/andaag/aibox. (Sorry that it's huge; it includes deep learning libraries and CUDA.)