Joblib scatter #1022
Conversation
My benchmarks show this helps, but only a little bit. I have yet to find a scikit-learn task where the distributed backend is noticeably more performant than the existing threading/multiprocessing backends (except for grid-search, which is still beaten by …). One issue is that internal scikit-learn code may transform the input data, so the objects that make it to joblib aren't the same as the input ones (e.g. the …).
Are the objects the same, or different?
I wouldn't be surprised if joblib was near full speed on a single machine. This might have more of an impact on a cluster where communication costs are larger.
Different objects. Usually it's a small thing like a reshape or dtype change.
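For illustration (not from this PR): even a cheap transform like a dtype cast or reshape returns a new Python object, so pre-scattered data that is matched by object identity would no longer be recognized afterwards.

```python
import numpy as np

X = np.arange(6, dtype=np.int64)

# Typical internal transforms return *new* objects, even when cheap:
X_cast = X.astype(np.float64)   # dtype change -> new array
X_2d = X.reshape(3, 2)          # reshape -> new array object (a view of X)

# A scatter cache keyed on object identity (id()) would miss for both:
print(X_cast is X)  # False
print(X_2d is X)    # False
```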
What is the status here? Is this code helpful?
It's useful, but still fails to beat single-machine performance for anything except grid_search (which is still beaten by …).
Do you mean to say that if we have a cluster we should still prefer to use a single machine, or that on a single machine one should use the threaded/multiprocessing joblib backends? If the former, did you get a sense for what the limiting factor was?
The latter. The former depends on the work-to-serialization-cost ratio, and how many tasks there are. Grid search is a good use case here, as fitting individual estimators is usually pretty expensive, and we're using the same data multiple times (which can be cached).
Are there joblib-accelerated algorithms within sklearn that have a high work-to-serialization-cost ratio other than grid search? cc @ogrisel
Ok, I ran a benchmark of fitting using a modified version of this scikit-learn benchmark (see here for the code). This fits a … The threading backend was run on a single …
As you can see here, train time for … I tried running without the … As such, I think this is a net win. I have a patch for scikit-learn that I haven't pushed yet to allow overriding the backend for fitting on these classes, which would avoid the need for the monkeypatch in the benchmark above. (cc @amueller, who recommended this benchmark a few months ago :))
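For context, a rough sketch of how a fit would be driven through this backend. This is not the benchmark code itself: it assumes the backend is registered under the name `'dask.distributed'` when `distributed.joblib` is imported, that a scheduler is reachable at the placeholder address, and that `parallel_backend` forwards the new `scatter=` keyword to the backend constructor. For the forest estimators the monkeypatch mentioned above is also needed, since they hardcode their own backend.

```python
# Sketch only -- the backend name, scheduler address, and dataset are assumptions.
import distributed.joblib  # noqa: F401  (side effect: registers the backend)
from sklearn.externals.joblib import parallel_backend
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100000, n_features=20)

clf = RandomForestClassifier(n_estimators=200, n_jobs=-1)

# Pre-scatter X and y once; each fit task then references the cached copies
# on the workers instead of re-serializing the arrays for every task.
with parallel_backend('dask.distributed', scheduler_host='127.0.0.1:8786',
                      scatter=[X, y]):
    clf.fit(X, y)
```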
To my naive understanding this seems pretty great? Hijacking joblib like this seems like the least intrusive way to accelerate scikit-learn with dask.distributed. It would be interesting to see both how well this scales and also what other common computations in scikit-learn might benefit. Also cc'ing @GaelVaroquaux. From the perspective of this PR I think it's clear that it provides an improvement. Working towards merging sounds good to me.
That looks pretty good indeed. Maybe get some error bars? It looks like the speedup for RF is slightly more than 3x which is a bit strange ;)
Yeah, not sure why previous numbers were less great. Might have grabbed the wrong commit when running them.
MIN_IDEAL_BATCH_DURATION = 0.2
MAX_IDEAL_BATCH_DURATION = 1.0
MIN_IDEAL_BATCH_DURATION = 0.5
MAX_IDEAL_BATCH_DURATION = 5.0
mrocklin
Apr 26, 2017
Member
How did you come by these values?
jcrist
Apr 26, 2017
Author
Member
Oop, leftover from debugging.
self.client = Client(scheduler_host, loop=loop)
if scatter is not None:
    # Keep a reference to the scattered data to keep the ids the same
    self._scatter = list(scatter)
mrocklin
Apr 26, 2017
Member
What happens if someone gives us a single numpy array?
jcrist
Apr 26, 2017
Author
Member
Then it's bad. We should only accept lists here because technically we can pre-scatter any object, so it'd be hard to know if someone wanted us to scatter the object or elements inside it. Will fix to error if not list/tuple.
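A usage sketch of that constraint, assuming the backend class is the `DistributedBackend` defined in `distributed.joblib` with the `scheduler_host` and `scatter` arguments shown in the diff; the address is a placeholder and a running scheduler is required.

```python
import numpy as np
from distributed.joblib import DistributedBackend

X = np.random.random((10000, 50))
y = np.random.randint(0, 2, size=10000)

# OK: scatter is a list/tuple of the objects to pre-scatter to the workers.
backend = DistributedBackend(scheduler_host='127.0.0.1:8786', scatter=[X, y])

# Ambiguous, and rejected after the fix: does this mean "scatter X itself"
# or "scatter each element of X"?
# backend = DistributedBackend(scheduler_host='127.0.0.1:8786', scatter=X)
```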
return Batch(tasks), args2

def apply_async(self, func, callback=None):
    key = '%s-%s' % (joblib_funcname(func), uuid4().hex)
mrocklin
Apr 26, 2017
Member
Should this add "%s-batch-%s" or something to signify that this is more than one function call?
jcrist
Apr 26, 2017
Author
Member
Sure.
Was just monkey-patching in the benchmark. Scikit-learn hardcodes the backend into the …
There might be a better way though.
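For illustration only, one way such a monkeypatch could look. This is a hypothetical sketch, not the patch used in the benchmark; it assumes the hardcoded backend lives in the forest module's `Parallel(...)` calls and that dropping it lets the backend chosen via `parallel_backend()` take effect.

```python
# Hypothetical sketch: strip the hardcoded backend argument from the forest
# module's Parallel calls so the backend chosen via parallel_backend() applies.
import sklearn.ensemble.forest as forest

_OriginalParallel = forest.Parallel  # joblib.Parallel as imported by forest.py

def _PatchedParallel(*args, **kwargs):
    kwargs.pop('backend', None)  # drop e.g. backend='threading' if present
    return _OriginalParallel(*args, **kwargs)

forest.Parallel = _PatchedParallel
```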
Allows pre-scattering data to the cluster. For large arguments (big arrays, etc.) that are used in more than one task, this can be more efficient as it avoids serializing the data for every task. Also fixes the distributed backend to be an instance of `ParallelBackendBase` for both the scikit-learn and joblib modules.
I think this should be good to go.
sols = [func(*args, **kwargs) for func, args, kwargs in tasks]
results = Parallel()(tasks)

ba.terminate()
mrocklin
Apr 27, 2017
Member
Why is this call necessary? Should this be called as part of __exit__?
jcrist
Apr 27, 2017
Author
Member
Backends aren't context managers, there is no __exit__ to the backend. The call was already there for the other tests (e.g. https://github.com/dask/distributed/blob/master/distributed/tests/test_joblib.py#L42); I moved it to use terminate, which is more consistent with the other joblib backends.
@@ -45,42 +46,108 @@ def joblib_funcname(x):
    return funcname(x)


class Batch(object):
mrocklin
Apr 27, 2017
Member
Will users ever see this object? If so, should it get a one-line docstring?
jcrist
Apr 27, 2017
Author
Member
No, this is completely internal.
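Roughly, the idea behind the internal Batch object, reconstructed from the test shown earlier. This is a sketch only; details such as how pre-scattered arguments are substituted back in at call time are omitted and assumed to be handled elsewhere.

```python
class Batch(object):
    """Sketch of the internal wrapper: bundle several joblib tasks into one
    callable so they run as a single task on a worker."""

    def __init__(self, tasks):
        self.tasks = tasks

    def __call__(self):
        # Mirrors the expectation in the test above:
        # [func(*args, **kwargs) for func, args, kwargs in tasks]
        return [func(*args, **kwargs) for func, args, kwargs in self.tasks]
```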
+1 from me
Backends aren't context managers, there is no __exit__ to the backend.
I'm not opposed to changing this.
@jcrist can you raise a PR with the patch to sklearn?
I could, but wanted to wait until a decision on how best to override scikit-learn estimators was reached (see relevant issue: scikit-learn/scikit-learn#8804).