Joblib interface #124

Closed
mrocklin opened this issue Feb 10, 2016 · 4 comments

@mrocklin
Member

Should we support a joblib interface alongside the concurrent.futures interface? It would target simple, embarrassingly parallel computation. We would need to auto-batch long sequences of small inputs, tuning the batch size so that results arrive at a reasonable frequency. Existing joblib users would gain two new capabilities:

  1. They could nest several functions that call out to joblib smoothly (this is a current pain point)
  2. They would get distributed computation

This would be a nice way to support existing codebases within libraries like scikit-learn.
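To make the auto-batching idea above concrete, here is a minimal sketch in plain Python. It is not joblib's or distributed's actual implementation; the names `next_batch_size` and `auto_batched_map`, the 0.5 second target, and the clamp to a factor of two are all assumptions chosen for illustration. Each batch is timed, and the next batch size is scaled so that a batch takes roughly the target duration.

```python
import time


def next_batch_size(current, elapsed, target=0.5):
    """Pick the next batch size so a batch takes roughly `target` seconds.

    Hypothetical helper: the 0.5 s default and the factor-of-two clamp
    are illustrative assumptions, not values taken from joblib.
    """
    if elapsed <= 0:
        # Too fast to measure: grow the batch.
        return current * 2
    ideal = current * target / elapsed
    # Limit the change to a factor of two either way to damp timing noise.
    bounded = min(max(ideal, current / 2), current * 2)
    return max(1, int(round(bounded)))


def auto_batched_map(func, items, target=0.5, clock=time.perf_counter):
    """Apply `func` to every item, grouping the items into adaptive batches.

    Batches run locally here; in a distributed setting each batch would
    presumably be submitted as a single task instead.
    Returns (results, batch_sizes) so the adaptation can be inspected.
    """
    items = list(items)
    results, batch_sizes = [], []
    size, start = 1, 0
    while start < len(items):
        batch = items[start:start + size]
        t0 = clock()
        results.extend(func(x) for x in batch)
        elapsed = clock() - t0
        batch_sizes.append(len(batch))
        start += len(batch)
        size = next_batch_size(size, elapsed, target)
    return results, batch_sizes


if __name__ == "__main__":
    squares, sizes = auto_batched_map(lambda x: x * x, range(1000))
    print(len(squares), "results in", len(sizes), "batches")
```

Starting from a batch size of one keeps the first result quick, and the clamp stops a single noisy timing from swinging the batch size too far.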

@ogrisel
Contributor

ogrisel commented Mar 29, 2016

Now that the joblib backend refactoring has been merged into master, I would be in favor of shipping a distributed backend implementation instead.

Here is an example of scikit-learn using distributed via this backend:

https://github.com/ogrisel/docker-distributed/blob/master/examples/sklearn_parameter_search.ipynb

The code for the backend is here:

https://github.com/ogrisel/docker-distributed/blob/master/examples/distributed_joblib_backend.py

@mrocklin
Member Author

Was the joblib-distributed backend used in that notebook? It looks like you were submitting tasks manually.

@ogrisel
Contributor

ogrisel commented Mar 30, 2016

Oops, sorry, that was the wrong example. Here is the example with the joblib backend:

https://github.com/ogrisel/docker-distributed/blob/master/examples/sklearn_parameter_search_joblib.ipynb

@mrocklin
Member Author

That's really cool. Did you notice a speedup? It's hard to compare the numbers in the notebook directly.

I'll go over the joblib DistributedBackend sometime today, test it a bit more extensively, and submit a PR, probably sometime tomorrow (I'm traveling a bit today).
