Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blockwise Metaestimator #190

Merged
merged 8 commits into from
Jun 4, 2018
Merged

Conversation

TomAugspurger
Copy link
Member

@TomAugspurger TomAugspurger commented Jun 2, 2018

Adds a meta-estimator for wrapping estimators that implement partial_fit.

In [1]: from dask_ml.wrappers import Blockwise
   ...: from dask_ml.datasets import make_classification
   ...: import sklearn.linear_model
   ...:
   ...:

In [2]: X, y = make_classification(chunks=25)

In [3]: est = sklearn.linear_model.SGDClassifier()

In [4]: clf = Blockwise(est, classes=[0, 1])

In [5]: clf.fit(X, y)
Out[5]:
Blockwise(estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hinge', max_iter=None, n_iter=None,
       n_jobs=1, penalty='l2', power_t=0.5, random_state=None,
       shuffle=True, tol=None, verbose=0, warm_start=False))

A few notes:

  1. The name: I went with Blockwise. We could also do Streamable, but I worry people will avoid using it if they think "I don't have streaming data, so this isn't for me.". Any thoughts on the name?
  2. Currently, there's a lot of overlap with the various Partial* estimators scattered throughout dask-ml in terms of code and use. I can / will reduce the code duplication before merging. API-wise, there are benefits to the both. The meta-estimator is nice since it can wrap any sklearn-compatible estimator that implements partial_fit (e.g. from modl). But the "pre-wrapped" versions are maybe nicer for discovery?
  3. I went with **kwargs for the signature instead of a fit_params dict. I could see either working but **kwargs felt a bit more natural.

cc @jakirkham @ogrisel

Closes #188

@TomAugspurger
Copy link
Member Author

This also doesn't yet work with grid search. I'm doing that this week in another PR.

@mrocklin
Copy link
Member

mrocklin commented Jun 2, 2018

I agree with avoiding the term Partial and Streamable. How about Incremental or Sequential? To me Blockwise doesn't capture the sequential nature of the computation. We have other blockwise operations like map_blocks that operate quite differently.

@jakirkham
Copy link
Member

I see the example uses classes. Is it limited to this or can this also work with non-class algorithms like decomposition algorithms?

Maybe fuse them somehow like BlockStreamable or ArrayStreamable or similar? 😉

@stsievert
Copy link
Member

stsievert commented Jun 2, 2018

This would close #188, correct?

Ditto for avoiding Partial and Streamable. How 'bout BlockLearner or ChunkLearner? From the tests, I pulled that it learns from blocks/chunks of a Dask array one at a time. To me, BlockLearner indicates it learns off blocks. That seems sequential to me. Maybe SequentialBlockLearner or SerialBlockLearner if we wanted to reinforce this idea?

machine.

Calling :meth:`Streamable.fit` with a Dask Array will pass each block of
the Dask array to to ``estimator.partial_fit`` *sequentially*.
Copy link
Member

@stsievert stsievert Jun 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should expand on "sequentially". Maybe something like

More concretely, when :meth:Blockwise.fit is called it calls estimator.partial_fit on each set of blocks in the data arrays X and y. It waits for estimator.partial_fit to complete before calling it again on the next block.

@TomAugspurger
Copy link
Member Author

TomAugspurger commented Jun 3, 2018 via email

``partial_fit`` methods would materialize the large Dask Array on a single
machine.

Calling :meth:`Streamable.fit` with a Dask Array will pass each block of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Streamable.fit here intended?

Blockwise
"""
blockwise = Blockwise(estimator, **kwargs)
return blockwise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function is an alias for Blockwise. Why do we want it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ogrisel mentioned a make_ helper. Do you have thoughts here?

Given the simplicity of the class-API, I'm inclined to just remove the function helper.

Scikit-Learn estimators supporting the ``partial_fit`` API. Each individual
chunk in a Dask Array can be passed to the estimator's ``partial_fit`` method.

Dask-ML provides two ways to achieve this: :ref:`incremental.blockwise-metaestimator`, for wrapping any estimator with a `partial_fit` method, and some pre-daskified :ref:`incremental.dask-friendly` incremental.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider deleting these pre-daskified versions? I wonder if we can expect users to build these themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been mulling this over today. I think we should remove the pre-daskified versions. The meta-estimator approach is clearly more flexible. It works with estimators outside scikit-learn implementing the partial_fit API.

The downside of the meta-estimator is (the poor) discoverability. People scanning the API docs could miss the meta-estimator and think "oh, Dask-ML doesn't do dictionary learning". But let's expect the best of our users and guide them to the preferred solution with better documentation and examples.

@TomAugspurger
Copy link
Member Author

I see the example uses classes. Is it limited to this or can this also work with non-class algorithms like decomposition algorithms?

This will work with any class implementing partial_fit. The additional **kwargs are for parameters passed to partial_fit, e.g. classes for SGDClassifier. I'll try to clarify that in the docs.

This would close #188, correct?

Yep, updated.

@stsievert
Copy link
Member

Does this PR change estimator.partial_fit at all? I think I'd expect it to be called on each block.

@TomAugspurger
Copy link
Member Author

Does this PR change estimator.partial_fit at all? I think I'd expect it to be called on each block.

Could you clarify? for Incremental(estimator), estimator.partial_fit is indeed called on each block.

@stsievert
Copy link
Member

Perfect, that’s exactly what I meant. Do we want to implement a test for this? There is a test for fit, but not for partial fit.

Removed make_blockwise helper.
@TomAugspurger
Copy link
Member Author

@stsievert do you mean Incremental.partial_fit? I suppose that should work. Adding a test.

@TomAugspurger TomAugspurger changed the title [WIP] Blockwise Metaestimator Blockwise Metaestimator Jun 4, 2018
@TomAugspurger
Copy link
Member Author

Removed the WIP status.

Copy link
Member

@mrocklin mrocklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@@ -22,6 +23,13 @@ class _WritableDoc(ABCMeta):
# TODO: Py2: remove all this


_partial_deprecation = (
"'{cls.__name__}' is deprecated. Use "
"'dask_ml.wrappers.Incremental({base.__name__}(), **kwargs)' "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Incremental be top-level? This might be a broader discussion though about namespaces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the .wrappers namespace, but I haven't put much thought into a replacement.

@TomAugspurger
Copy link
Member Author

TomAugspurger commented Jun 4, 2018

Last CI failure came from a python2. I've verified that the warning appears manually, and am not inclined to spend time debugging the test, so I've skipped it on py2.

@TomAugspurger
Copy link
Member Author

Merging this, since it'll be working on hyperparameter optimization around it next.

@TomAugspurger TomAugspurger merged commit 5438527 into dask:master Jun 4, 2018
@TomAugspurger TomAugspurger deleted the streamable branch June 4, 2018 17:33
@stsievert
Copy link
Member

stsievert commented Jun 4, 2018

I've thought more about the naming, Incremental vs BlockLearner. I think this comes down to declarative vs imperative naming respectively. I'm on board with Incremental now – we should specify the class goal, not how it's achieved.

EDIT: ...and you mentioned the same goal (what, not how) in #194. Go figure.

@jakirkham
Copy link
Member

How would we go about pinning an Incremental learner to a particular worker? Expect this will come up when dealing with matrix decomposition problems where the learner is more expensive to move than the blocks.

@TomAugspurger
Copy link
Member Author

TomAugspurger commented Jun 4, 2018 via email

@jakirkham
Copy link
Member

Agreed.

Do we know if the scheduler has enough information to make that decision? In particular, will it inspect the attributes of the model when estimating size?

If the answer is no to these, what sort of workarounds (like manual pinning) are available to us?

@mrocklin
Copy link
Member

mrocklin commented Jun 4, 2018

Do we know if the scheduler has enough information to make that decision? In particular, will it inspect the attributes of the model when estimating size?

This is proposed here: scikit-learn/scikit-learn#8642

SKLearn devs seemed amenable to it. If @stsievert has time and interest this might be a good issue to start interacting with the Scikit-Learn community. I suspect that it will be important for proper distributed scheduling.

@jakirkham
Copy link
Member

Thanks for linking that issue. Sounds like that will likely be solved by the next scikit-learn release, correct? In the interim, what should we be doing to keep the model from moving? Open even to hacky solutions in the short term. :)

@mrocklin
Copy link
Member

mrocklin commented Jun 4, 2018

I think that my preferred way would be to implement it in scikit-learn and then use the master version of that short-term. If that approach concerns people then we could also implement this in the distributed.sizeof function.

@jakirkham
Copy link
Member

Maybe our concepts of short term in this context differ. :) I'm thinking, "what could one do today?" ;)

@TomAugspurger
Copy link
Member Author

TomAugspurger commented Feb 7, 2019 via email

@Soumyajaganathan
Copy link

Yes, got it. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants