
Blockwise Metaestimator #190

Merged
merged 8 commits into dask:master from TomAugspurger:streamable on Jun 4, 2018

Conversation

@TomAugspurger
Member

TomAugspurger commented Jun 2, 2018

Adds a meta-estimator for wrapping estimators that implement partial_fit.

In [1]: from dask_ml.wrappers import Blockwise
   ...: from dask_ml.datasets import make_classification
   ...: import sklearn.linear_model
   ...:
   ...:

In [2]: X, y = make_classification(chunks=25)

In [3]: est = sklearn.linear_model.SGDClassifier()

In [4]: clf = Blockwise(est, classes=[0, 1])

In [5]: clf.fit(X, y)
Out[5]:
Blockwise(estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hinge', max_iter=None, n_iter=None,
       n_jobs=1, penalty='l2', power_t=0.5, random_state=None,
       shuffle=True, tol=None, verbose=0, warm_start=False))

A few notes:

  1. The name: I went with Blockwise. We could also do Streamable, but I worry people will avoid using it if they think "I don't have streaming data, so this isn't for me." Any thoughts on the name?
  2. Currently, there's a lot of overlap, in both code and use, with the various Partial* estimators scattered throughout dask-ml. I can / will reduce the code duplication before merging. API-wise, there are benefits to both: the meta-estimator is nice since it can wrap any sklearn-compatible estimator that implements partial_fit (e.g. from modl), but the "pre-wrapped" versions are maybe nicer for discovery.
  3. I went with **kwargs for the signature instead of a fit_params dict. I could see either working, but **kwargs felt a bit more natural.

cc @jakirkham @ogrisel

Closes #188

@TomAugspurger

Member Author

TomAugspurger commented Jun 2, 2018

This also doesn't yet work with grid search. I'm doing that this week in another PR.

@mrocklin

Member

mrocklin commented Jun 2, 2018

I agree with avoiding the term Partial and Streamable. How about Incremental or Sequential? To me Blockwise doesn't capture the sequential nature of the computation. We have other blockwise operations like map_blocks that operate quite differently.

@jakirkham

Member

jakirkham commented Jun 2, 2018

I see the example uses classes. Is it limited to this or can this also work with non-class algorithms like decomposition algorithms?

Maybe fuse them somehow like BlockStreamable or ArrayStreamable or similar? 😉

@stsievert

Contributor

stsievert commented Jun 2, 2018

This would close #188, correct?

Ditto for avoiding Partial and Streamable. How 'bout BlockLearner or ChunkLearner? From the tests, I gathered that it learns from blocks/chunks of a Dask array one at a time. To me, BlockLearner indicates it learns off blocks. That seems sequential to me. Maybe SequentialBlockLearner or SerialBlockLearner if we wanted to reinforce this idea?

machine.
Calling :meth:`Streamable.fit` with a Dask Array will pass each block of
the Dask array to ``estimator.partial_fit`` *sequentially*.

@stsievert

stsievert Jun 2, 2018

Contributor

I think we should expand on "sequentially". Maybe something like

More concretely, when :meth:Blockwise.fit is called it calls estimator.partial_fit on each set of blocks in the data arrays X and y. It waits for estimator.partial_fit to complete before calling it again on the next block.
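That sequential loop can be sketched roughly as follows; `fit_blockwise` is a hypothetical stand-in for what the meta-estimator does internally, not the PR's actual code, and plain NumPy blocks stand in for a chunked Dask array:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def fit_blockwise(estimator, X_blocks, y_blocks, **fit_kwargs):
    """Call estimator.partial_fit on each (X, y) block in order,
    waiting for one call to finish before starting the next."""
    for X, y in zip(X_blocks, y_blocks):
        estimator.partial_fit(X, y, **fit_kwargs)
    return estimator

# Simulate a chunked array with a list of NumPy blocks.
rng = np.random.RandomState(0)
X_blocks = [rng.randn(25, 4) for _ in range(4)]
y_blocks = [rng.randint(0, 2, size=25) for _ in range(4)]

est = fit_blockwise(SGDClassifier(), X_blocks, y_blocks, classes=[0, 1])
```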

@TomAugspurger

Member Author

TomAugspurger commented Jun 3, 2018

``partial_fit`` methods would materialize the large Dask Array on a single
machine.
Calling :meth:`Streamable.fit` with a Dask Array will pass each block of

@mrocklin

mrocklin Jun 3, 2018

Member

Is Streamable.fit here intended?

Blockwise
"""
blockwise = Blockwise(estimator, **kwargs)
return blockwise

@mrocklin

mrocklin Jun 3, 2018

Member

It looks like this function is an alias for Blockwise. Why do we want it?

@TomAugspurger

TomAugspurger Jun 3, 2018

Author Member

@ogrisel mentioned a make_ helper. Do you have thoughts here?

Given the simplicity of the class-API, I'm inclined to just remove the function helper.

Scikit-Learn estimators supporting the ``partial_fit`` API. Each individual
chunk in a Dask Array can be passed to the estimator's ``partial_fit`` method.

Dask-ML provides two ways to achieve this: :ref:`incremental.blockwise-metaestimator`, for wrapping any estimator with a `partial_fit` method, and some pre-daskified :ref:`incremental.dask-friendly` incremental estimators.

@mrocklin

mrocklin Jun 3, 2018

Member

Should we consider deleting these pre-daskified versions? I wonder if we can expect users to build these themselves.

@TomAugspurger

TomAugspurger Jun 3, 2018

Author Member

I've been mulling this over today. I think we should remove the pre-daskified versions. The meta-estimator approach is clearly more flexible. It works with estimators outside scikit-learn implementing the partial_fit API.

The downside of the meta-estimator is (the poor) discoverability. People scanning the API docs could miss the meta-estimator and think "oh, Dask-ML doesn't do dictionary learning". But let's expect the best of our users and guide them to the preferred solution with better documentation and examples.

@TomAugspurger

Member Author

TomAugspurger commented Jun 3, 2018

I see the example uses classes. Is it limited to this or can this also work with non-class algorithms like decomposition algorithms?

This will work with any class implementing partial_fit. The additional **kwargs are for parameters passed to partial_fit, e.g. classes for SGDClassifier. I'll try to clarify that in the docs.
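For context, the reason `classes` must be forwarded is visible in plain scikit-learn: the first `partial_fit` call needs the full label set, since any single block may be missing some labels (a minimal sketch):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.random.RandomState(0).randn(10, 3)
y = np.array([0, 1] * 5)

est = SGDClassifier()
# The first partial_fit call must be told the full set of classes,
# since any single block may not contain every label.
est.partial_fit(X, y, classes=[0, 1])
# Subsequent calls can omit it.
est.partial_fit(X, y)
```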

This would close #188, correct?

Yep, updated.

@stsievert

Contributor

stsievert commented Jun 3, 2018

Does this PR change estimator.partial_fit at all? I think I'd expect it to be called on each block.

@TomAugspurger

Member Author

TomAugspurger commented Jun 3, 2018

Does this PR change estimator.partial_fit at all? I think I'd expect it to be called on each block.

Could you clarify? For Incremental(estimator), estimator.partial_fit is indeed called on each block.

@stsievert

Contributor

stsievert commented Jun 3, 2018

Perfect, that’s exactly what I meant. Do we want to implement a test for this? There is a test for fit, but not for partial_fit.
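Such a test could verify that a second `partial_fit` call continues training instead of refitting from scratch. A rough sketch at the plain scikit-learn level (the dask-ml version would feed Dask array blocks through the wrapper instead):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X, y = rng.randn(50, 4), rng.randint(0, 2, size=50)

est = SGDClassifier(random_state=0)
est.partial_fit(X, y, classes=[0, 1])
n_first = est.t_  # number of weight updates so far

est.partial_fit(X, y)
# Training continued: the update counter advanced instead of resetting.
assert est.t_ > n_first
```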

Rename to incremental.
Removed make_blockwise helper.
@TomAugspurger

Member Author

TomAugspurger commented Jun 4, 2018

@stsievert do you mean Incremental.partial_fit? I suppose that should work. Adding a test.

@TomAugspurger TomAugspurger changed the title from "[WIP] Blockwise Metaestimator" to "Blockwise Metaestimator" on Jun 4, 2018

@TomAugspurger

Member Author

TomAugspurger commented Jun 4, 2018

Removed the WIP status.

@mrocklin
Member

mrocklin left a comment

This looks good to me.

@@ -22,6 +23,13 @@ class _WritableDoc(ABCMeta):
# TODO: Py2: remove all this


_partial_deprecation = (
"'{cls.__name__}' is deprecated. Use "
"'dask_ml.wrappers.Incremental({base.__name__}(), **kwargs)' "

@mrocklin

mrocklin Jun 4, 2018

Member

Should Incremental be top-level? This might be a broader discussion though about namespaces.

@TomAugspurger

TomAugspurger Jun 4, 2018

Author Member

I dislike the .wrappers namespace, but I haven't put much thought into a replacement.

@TomAugspurger

Member Author

TomAugspurger commented Jun 4, 2018

Last CI failure came from a Python 2 build. I've verified manually that the warning appears, and I'm not inclined to spend time debugging the test, so I've skipped it on py2.

@TomAugspurger

Member Author

TomAugspurger commented Jun 4, 2018

Merging this, since I'll be working on hyperparameter optimization around it next.

@TomAugspurger TomAugspurger merged commit 5438527 into dask:master Jun 4, 2018

3 checks passed:
- ci/circleci: py27 (passed)
- ci/circleci: py36 (passed)
- ci/circleci: sklearn_dev (passed)

@TomAugspurger TomAugspurger deleted the TomAugspurger:streamable branch Jun 4, 2018

@stsievert

Contributor

stsievert commented Jun 4, 2018

I've thought more about the naming, Incremental vs BlockLearner. I think this comes down to declarative vs imperative naming respectively. I'm on board with Incremental now – we should specify the class goal, not how it's achieved.

EDIT: ...and you mentioned the same goal (what, not how) in #194. Go figure.

@jakirkham

Member

jakirkham commented Jun 4, 2018

How would we go about pinning an Incremental learner to a particular worker? Expect this will come up when dealing with matrix decomposition problems where the learner is more expensive to move than the blocks.

@TomAugspurger

Member Author

TomAugspurger commented Jun 4, 2018

@jakirkham

Member

jakirkham commented Jun 4, 2018

Agreed.

Do we know if the scheduler has enough information to make that decision? In particular, will it inspect the attributes of the model when estimating size?

If the answer is no to these, what sort of workarounds (like manual pinning) are available to us?
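One workaround available today is to pin the model with the `workers=` argument that `Client.scatter` and `Client.submit` already accept, so the blocks travel to the model rather than the other way around. A sketch, with an illustrative in-process cluster standing in for a real deployment:

```python
from distributed import Client, LocalCluster
from sklearn.linear_model import SGDClassifier

cluster = LocalCluster(n_workers=2, processes=False, dashboard_address=None)
client = Client(cluster)

# Choose one worker and keep the model there; partial_fit tasks submitted
# with the same workers= constraint will then run next to the model.
addr = sorted(client.scheduler_info()["workers"])[0]
model = client.scatter(SGDClassifier(), workers=[addr], hash=False)

# who_has shows the model lives only on the chosen worker.
locations = client.who_has(model)

client.close()
cluster.close()
```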

@mrocklin

Member

mrocklin commented Jun 4, 2018

Do we know if the scheduler has enough information to make that decision? In particular, will it inspect the attributes of the model when estimating size?

This is proposed here: scikit-learn/scikit-learn#8642

SKLearn devs seemed amenable to it. If @stsievert has time and interest this might be a good issue to start interacting with the Scikit-Learn community. I suspect that it will be important for proper distributed scheduling.

@jakirkham

Member

jakirkham commented Jun 4, 2018

Thanks for linking that issue. Sounds like that will likely be solved by the next scikit-learn release, correct? In the interim, what should we be doing to keep the model from moving? Open even to hacky solutions in the short term. :)

@mrocklin

Member

mrocklin commented Jun 4, 2018

I think that my preferred way would be to implement it in scikit-learn and then use the master version of that short-term. If that approach concerns people then we could also implement this in the distributed.sizeof function.
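For the second option, dask's `sizeof` dispatch can be extended today. A sketch that counts an estimator's fitted NumPy arrays; the attribute walk is an assumption about what dominates model size, not scikit-learn's eventual `__sizeof__`:

```python
import numpy as np
from dask.sizeof import sizeof
from sklearn.base import BaseEstimator
from sklearn.linear_model import SGDClassifier

@sizeof.register(BaseEstimator)
def sizeof_estimator(est):
    # Sum the fitted arrays (coef_, intercept_, ...) so the scheduler
    # sees a large model as expensive to move between workers.
    nbytes = sizeof(object())
    for value in vars(est).values():
        if isinstance(value, np.ndarray):
            nbytes += value.nbytes
    return nbytes

est = SGDClassifier()
est.coef_ = np.zeros((1, 1000))  # pretend fitted: 8000 bytes of float64
model_size = sizeof(est)
```

With this registered, `sizeof` reflects the model's real memory footprint, which the scheduler's data-movement heuristics can then take into account.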

@jakirkham

Member

jakirkham commented Jun 5, 2018

Maybe our concepts of short term in this context differ. :) I'm thinking, "what could one do today?" ;)

@jakirkham jakirkham referenced this pull request Jun 6, 2018

Open

Add DictionaryLearning #95

@TomAugspurger

Member Author

TomAugspurger commented Feb 7, 2019

@Soumyajaganathan


Soumyajaganathan commented Feb 7, 2019

Yes, got it. Thanks
