Putting lazy collections into Dask Arrays #5879
Comments
There is nothing in Dask itself, but you might be able to add this to serialization methods like the pickle protocol or Dask's custom serialization methods.
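One way to hook into serialization is the pickle protocol's `__reduce__` method. Here is a minimal sketch, assuming a hypothetical `LazyCollection` class (not part of Dask or PyData/Sparse) that materializes itself whenever it is pickled, so a receiving worker deserializes plain data rather than an unevaluated graph:

```python
import pickle

class LazyCollection:
    """Hypothetical lazy collection: holds a thunk instead of data."""

    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        return self._thunk()

    def __reduce__(self):
        # Materialize before pickling: the receiving worker then
        # deserializes a plain list, not an unevaluated graph.
        return (list, (self.compute(),))

lazy = LazyCollection(lambda: [x * 2 for x in range(4)])
restored = pickle.loads(pickle.dumps(lazy))
print(restored)  # [0, 2, 4, 6] -- a plain list, not a LazyCollection
```

Dask's custom serialization machinery offers finer control (e.g. per-type serializers on workers), but the pickle hook is the simplest place to start.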
You can have workers that have only a single thread, but there is no way to temporarily claim an entire worker. One could maybe do something fancy with locks and semaphores, but nothing like that exists today, and it might take some time to gather enough use cases to produce a good design here.
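The lock/semaphore idea mentioned above could look something like this. This is a stdlib-only sketch, not a Dask API: a semaphore with one permit per worker thread, where ordinary tasks take one permit and an "exclusive" task drains all of them, effectively claiming the whole worker for its own internal parallelism:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_THREADS = 4
slots = threading.BoundedSemaphore(N_THREADS)

def ordinary_task(i, results):
    with slots:  # takes one of the worker's N slots
        results.append(i)

def exclusive_task(results):
    # Drain every slot so this task has the worker to itself;
    # its internal (BLAS-style) parallelism can then use all cores.
    for _ in range(N_THREADS):
        slots.acquire()
    try:
        results.append("exclusive")
    finally:
        for _ in range(N_THREADS):
            slots.release()

results = []
with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    futures = [pool.submit(ordinary_task, i, results) for i in range(8)]
    futures.append(pool.submit(exclusive_task, results))
    for f in futures:
        f.result()

print(len(results), "exclusive" in results)  # 9 True
```

In a real deployment one would presumably use `distributed.Semaphore` or worker plugins instead of raw threads, and would need to think about fairness and deadlock, which this sketch ignores.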
For other situations that already have internal parallelism (like BLAS), we tend to turn off parallelism in the user's library. Letting Dask be the only thing that runs in parallel tends to be decently efficient most of the time.
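Turning off library parallelism is usually done by pinning the native thread pools to one thread before the numerical libraries are imported. A minimal sketch using the standard environment variables (the `threadpoolctl` package offers runtime control as an alternative):

```python
import os

# Pin common native thread pools to a single thread *before*
# importing numerical libraries, so that Dask's task-level
# parallelism is the only parallelism in play.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"
```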
This is a nice idea, and probably close to what I want either way.
I'll have to test both and see the performance difference in each case, but since I'm still in the exploratory phase, I can't say anything definitive yet. 😄
Usually, parallelizing within an algorithm is harder than running it many times in parallel. If you're in a situation where you can use Dask to run things many times, it's probably not a bad idea to use Dask. That, at least, has been the experience so far.
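The "run it many times in parallel" pattern is just embarrassingly parallel mapping. A stdlib-only sketch of the idea (with Dask one would wrap `simulate` in `dask.delayed` instead; `simulate` is a made-up stand-in for an expensive, internally serial algorithm):

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    # Stand-in for an expensive algorithm that is serial internally.
    # (For CPU-bound work, a process-based pool or Dask's processes
    # scheduler would sidestep the GIL.)
    total = 0
    for i in range(1000):
        total += (seed * i) % 7
    return total

seeds = range(8)
with ThreadPoolExecutor() as pool:
    results = list(pool.map(simulate, seeds))

print(len(results))  # 8 independent runs, one per task
```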
I'll keep that in mind. 😄
One aspect of this I haven't yet mentioned (and the reason I'm reopening) is: Is there a way to call `.compute()` on my collection before Dask assembles the results, and not just when shuffling nodes?
Perhaps post-compute and post-persist?
https://docs.dask.org/en/latest/custom-collections.html#__dask_postcompute__
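The hook names below are the real custom-collections protocol methods from the linked docs, but the "scheduler" here is a hand-rolled stand-in, not Dask's: it shows where the `__dask_postcompute__` finalizer runs, i.e. after the chunk results are gathered and before the final output is assembled — the spot where a library could run its own `.compute()`:

```python
class MyCollection:
    """Toy custom collection; each task is a zero-arg callable
    (real Dask graphs use (func, *args) tuples)."""

    def __init__(self, data):
        self._data = data

    def __dask_graph__(self):
        return {("x", i): (lambda v=v: v * 2)
                for i, v in enumerate(self._data)}

    def __dask_keys__(self):
        return [("x", i) for i in range(len(self._data))]

    def __dask_postcompute__(self):
        # finalize(results, *extra) is called on the gathered chunk
        # results -- the hook where per-library finalization can run.
        def finalize(results):
            return sum(results)
        return finalize, ()

coll = MyCollection([1, 2, 3])
graph = coll.__dask_graph__()
chunks = [graph[k]() for k in coll.__dask_keys__()]  # what workers produce
finalize, extra = coll.__dask_postcompute__()
print(finalize(chunks, *extra))  # 12
```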
Hello. I was wondering if there is a mechanism for putting lazy arrays into Dask Arrays. I was looking into doing whole-program optimization for PyData/Sparse, which turns out to be hugely better than local optimization. See https://github.com/tensor-compiler/taco and the related papers/talks for details.
However, before transmitting results to another node, I'd like to "signal" to Dask that it should do its equivalent of `.compute()`. I was also wondering if I could "turn off" parallelism on one node, as there will be parallelism inherent to the algorithm itself.