Blockwise Metaestimator #190

TomAugspurger · 2018-06-02T18:27:23Z

Adds a meta-estimator for wrapping estimators that implement partial_fit.

In [1]: from dask_ml.wrappers import Blockwise
   ...: from dask_ml.datasets import make_classification
   ...: import sklearn.linear_model
   ...:
   ...:

In [2]: X, y = make_classification(chunks=25)

In [3]: est = sklearn.linear_model.SGDClassifier()

In [4]: clf = Blockwise(est, classes=[0, 1])

In [5]: clf.fit(X, y)
Out[5]:
Blockwise(estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hinge', max_iter=None, n_iter=None,
       n_jobs=1, penalty='l2', power_t=0.5, random_state=None,
       shuffle=True, tol=None, verbose=0, warm_start=False))

A few notes:

The name: I went with Blockwise. We could also do Streamable, but I worry people will avoid using it if they think "I don't have streaming data, so this isn't for me." Any thoughts on the name?
Currently, there's a lot of overlap with the various Partial* estimators scattered throughout dask-ml, both in code and in use. I can / will reduce the code duplication before merging. API-wise, there are benefits to both. The meta-estimator is nice since it can wrap any sklearn-compatible estimator that implements partial_fit (e.g. from modl). But the "pre-wrapped" versions are maybe nicer for discovery?
I went with **kwargs for the signature instead of a fit_params dict. I could see either working but **kwargs felt a bit more natural.

cc @jakirkham @ogrisel

Closes #188

TomAugspurger · 2018-06-02T18:44:06Z

This also doesn't yet work with grid search. I'm doing that this week in another PR.

mrocklin · 2018-06-02T22:38:02Z

I agree with avoiding the term Partial and Streamable. How about Incremental or Sequential? To me Blockwise doesn't capture the sequential nature of the computation. We have other blockwise operations like map_blocks that operate quite differently.

jakirkham · 2018-06-02T23:06:12Z

I see the example uses classes. Is it limited to this or can this also work with non-class algorithms like decomposition algorithms?

Maybe fuse them somehow like BlockStreamable or ArrayStreamable or similar? 😉

stsievert · 2018-06-02T23:40:04Z

This would close #188, correct?

Ditto for avoiding Partial and Streamable. How 'bout BlockLearner or ChunkLearner? From the tests, I pulled that it learns from blocks/chunks of a Dask array one at a time. To me, BlockLearner indicates it learns off blocks. That seems sequential to me. Maybe SequentialBlockLearner or SerialBlockLearner if we wanted to reinforce this idea?

stsievert · 2018-06-02T23:51:51Z

dask_ml/wrappers.py

+    machine.
+
+    Calling :meth:`Streamable.fit` with a Dask Array will pass each block of
+    the Dask array to to ``estimator.partial_fit`` *sequentially*.


I think we should expand on "sequentially". Maybe something like

More concretely, when :meth:Blockwise.fit is called it calls estimator.partial_fit on each set of blocks in the data arrays X and y. It waits for estimator.partial_fit to complete before calling it again on the next block.
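
To make "sequentially" concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not the dask-ml implementation: `RunningMean` and `fit_blockwise` are hypothetical names, and plain lists stand in for the blocks of a Dask array.

```python
class RunningMean:
    """Toy estimator with a partial_fit method (hypothetical, for illustration)."""

    def __init__(self):
        self.n_seen_ = 0
        self.mean_ = 0.0
        self.calls_ = []  # records the order in which blocks arrive

    def partial_fit(self, block, y=None):
        # Update the running mean with one block of values.
        total = self.mean_ * self.n_seen_ + sum(block)
        self.n_seen_ += len(block)
        self.mean_ = total / self.n_seen_
        self.calls_.append(list(block))
        return self


def fit_blockwise(estimator, blocks):
    """Call estimator.partial_fit on each block, one after another."""
    for block in blocks:
        # Each call finishes before the next block is passed in.
        estimator.partial_fit(block)
    return estimator


blocks = [[1.0, 2.0], [3.0, 4.0], [5.0]]  # stand-ins for Dask array chunks
est = fit_blockwise(RunningMean(), blocks)
print(est.mean_, est.n_seen_)
```

The `calls_` list is only there so the ordering can be inspected; a real estimator would not need it.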

TomAugspurger · 2018-06-03T00:57:36Z

I like Incremental. I avoided that in the old class-based approach because that would lead to name like IncrementalIncrementalPCA :)


mrocklin · 2018-06-02T22:44:10Z

dask_ml/wrappers.py

+    ``partial_fit`` methods would materialize the large Dask Array on a single
+    machine.
+
+    Calling :meth:`Streamable.fit` with a Dask Array will pass each block of


Is Streamable.fit here intended?

mrocklin · 2018-06-02T22:45:54Z

dask_ml/wrappers.py

+    Blockwise
+    """
+    blockwise = Blockwise(estimator, **kwargs)
+    return blockwise


It looks like this function is an alias for Blockwise. Why do we want it?

@ogrisel mentioned a make_ helper. Do you have thoughts here?

Given the simplicity of the class-API, I'm inclined to just remove the function helper.

mrocklin · 2018-06-02T22:47:42Z

docs/source/incremental.rst

+Scikit-Learn estimators supporting the ``partial_fit`` API. Each individual
+chunk in a Dask Array can be passed to the estimator's ``partial_fit`` method.
+
+Dask-ML provides two ways to achieve this: :ref:`incremental.blockwise-metaestimator`, for wrapping any estimator with a `partial_fit` method, and some pre-daskified :ref:`incremental.dask-friendly` incremental.


Should we consider deleting these pre-daskified versions? I wonder if we can expect users to build these themselves.

I've been mulling this over today. I think we should remove the pre-daskified versions. The meta-estimator approach is clearly more flexible. It works with estimators outside scikit-learn implementing the partial_fit API.

The downside of the meta-estimator is its poor discoverability. People scanning the API docs could miss the meta-estimator and think "oh, Dask-ML doesn't do dictionary learning". But let's expect the best of our users and guide them to the preferred solution with better documentation and examples.

TomAugspurger · 2018-06-03T20:24:32Z

> I see the example uses classes. Is it limited to this or can this also work with non-class algorithms like decomposition algorithms?

This will work with any class implementing partial_fit. The additional **kwargs are for parameters passed to partial_fit, e.g. classes for SGDClassifier. I'll try to clarify that in the docs.
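
As a rough illustration of that signature choice (a sketch, not the code in this PR), a wrapper can store extra keyword arguments at construction and forward them to every `partial_fit` call. `IncrementalSketch` and `CountingClassifier` are hypothetical names, and nested lists stand in for Dask blocks.

```python
class IncrementalSketch:
    """Hypothetical wrapper: forwards **kwargs to estimator.partial_fit."""

    def __init__(self, estimator, **kwargs):
        self.estimator = estimator
        self.fit_kwargs = kwargs  # e.g. classes=[0, 1] for a classifier

    def fit(self, X_blocks, y_blocks=None):
        if y_blocks is None:
            # Unsupervised estimators (e.g. decompositions) take no targets.
            y_blocks = [None] * len(X_blocks)
        for X, y in zip(X_blocks, y_blocks):
            self.estimator.partial_fit(X, y, **self.fit_kwargs)
        return self


class CountingClassifier:
    """Toy estimator that needs `classes` on its first partial_fit call."""

    def __init__(self):
        self.counts_ = None

    def partial_fit(self, X, y, classes=None):
        if self.counts_ is None:
            if classes is None:
                raise ValueError("classes must be passed on the first call")
            self.counts_ = {c: 0 for c in classes}
        for label in y:
            self.counts_[label] += 1
        return self


X_blocks = [[[0.1], [0.2]], [[0.3]]]
y_blocks = [[0, 1], [1]]
clf = IncrementalSketch(CountingClassifier(), classes=[0, 1]).fit(X_blocks, y_blocks)
print(clf.estimator.counts_)
```

An estimator whose `partial_fit` takes no extra parameters would simply be wrapped without keyword arguments.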

> This would close #188, correct?

Yep, updated.

stsievert · 2018-06-03T20:46:21Z

Does this PR change estimator.partial_fit at all? I think I'd expect it to be called on each block.

TomAugspurger · 2018-06-03T21:59:31Z

> Does this PR change estimator.partial_fit at all? I think I'd expect it to be called on each block.

Could you clarify? For Incremental(estimator), estimator.partial_fit is indeed called on each block.

stsievert · 2018-06-03T22:36:46Z

Perfect, that’s exactly what I meant. Do we want to implement a test for this? There is a test for fit, but not for partial fit.

Removed make_blockwise helper.

TomAugspurger · 2018-06-04T14:15:59Z

@stsievert do you mean Incremental.partial_fit? I suppose that should work. Adding a test.

TomAugspurger · 2018-06-04T15:56:32Z

Removed the WIP status.

mrocklin

This looks good to me.

mrocklin · 2018-06-04T17:10:25Z

dask_ml/_partial.py

@@ -22,6 +23,13 @@ class _WritableDoc(ABCMeta):
    # TODO: Py2: remove all this


+_partial_deprecation = (
+    "'{cls.__name__}' is deprecated. Use "
+    "'dask_ml.wrappers.Incremental({base.__name__}(), **kwargs)' "


Should Incremental be top-level? This might be a broader discussion though about namespaces.

I dislike the .wrappers namespace, but I haven't put much thought into a replacement.
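
For readers unfamiliar with the `{cls.__name__}` fields in the diff above, this is roughly how such a template could be rendered and emitted. It is a sketch under assumptions: the closing "instead." is made up because the diff is truncated, the warning category is a guess, and both classes are hypothetical stand-ins.

```python
import warnings

# Template modelled on the diff above; the closing "instead." is an assumption.
_partial_deprecation = (
    "'{cls.__name__}' is deprecated. Use "
    "'dask_ml.wrappers.Incremental({base.__name__}(), **kwargs)' "
    "instead."
)


class SGDClassifier:  # hypothetical stand-in for the scikit-learn class
    pass


class PartialSGDClassifier(SGDClassifier):  # hypothetical deprecated wrapper
    def __init__(self):
        # str.format resolves attribute lookups such as {cls.__name__}.
        message = _partial_deprecation.format(cls=type(self), base=SGDClassifier)
        warnings.warn(message, FutureWarning)  # category is an assumption


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    PartialSGDClassifier()

print(caught[0].message)
```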

TomAugspurger · 2018-06-04T17:20:33Z

The last CI failure came from Python 2. I've manually verified that the warning appears, and I'm not inclined to spend time debugging the test, so I've skipped it on py2.

TomAugspurger · 2018-06-04T17:33:37Z

Merging this, since I'll be working on hyperparameter optimization around it next.

stsievert · 2018-06-04T17:57:51Z

I've thought more about the naming, Incremental vs BlockLearner. I think this comes down to declarative vs imperative naming respectively. I'm on board with Incremental now – we should specify the class goal, not how it's achieved.

EDIT: ...and you mentioned the same goal (what, not how) in #194. Go figure.

jakirkham · 2018-06-04T18:25:58Z

How would we go about pinning an Incremental learner to a particular worker? Expect this will come up when dealing with matrix decomposition problems where the learner is more expensive to move than the blocks.

TomAugspurger · 2018-06-04T18:28:40Z

At the moment, we haven't thought much about designing an API for that. All else equal, if the data is indeed cheaper to move than the model, then the scheduler should choose that route.


jakirkham · 2018-06-04T18:50:18Z

Agreed.

Do we know if the scheduler has enough information to make that decision? In particular, will it inspect the attributes of the model when estimating size?

If the answer is no to these, what sort of workarounds (like manual pinning) are available to us?

mrocklin · 2018-06-04T20:01:58Z

> Do we know if the scheduler has enough information to make that decision? In particular, will it inspect the attributes of the model when estimating size?

This is proposed here: scikit-learn/scikit-learn#8642

SKLearn devs seemed amenable to it. If @stsievert has time and interest this might be a good issue to start interacting with the Scikit-Learn community. I suspect that it will be important for proper distributed scheduling.

jakirkham · 2018-06-04T21:25:39Z

Thanks for linking that issue. Sounds like that will likely be solved by the next scikit-learn release, correct? In the interim, what should we be doing to keep the model from moving? Open even to hacky solutions in the short term. :)

mrocklin · 2018-06-04T21:27:46Z

I think that my preferred way would be to implement it in scikit-learn and then use the master version of that short-term. If that approach concerns people then we could also implement this in the distributed.sizeof function.
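
One way to picture "inspecting the attributes of the model" is the shallow estimate below. This is only a sketch under assumptions: `estimate_model_nbytes` and `ToyModel` are hypothetical names, and it is not how distributed.sizeof or scikit-learn actually measure objects.

```python
import sys


def estimate_model_nbytes(model):
    """Rough size estimate: the object itself plus each instance attribute."""
    total = sys.getsizeof(model)
    for value in vars(model).values():
        # Shallow by design: nested containers are not traversed.
        total += sys.getsizeof(value)
    return total


class ToyModel:  # hypothetical model with one learned attribute
    def __init__(self, n_bytes):
        self.components_ = bytes(n_bytes)


small = estimate_model_nbytes(ToyModel(10))
large = estimate_model_nbytes(ToyModel(1_000_000))
print(small < large)
```

An estimate like this would let a scheduler see that a model with large learned attributes is more expensive to move than a small block of data.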

jakirkham · 2018-06-05T19:35:03Z

Maybe our concepts of short term in this context differ. :) I'm thinking, "what could one do today?" ;)

TomAugspurger · 2019-02-07T13:22:51Z

Make sure you're on the latest version of dask-ml.

On Thu, Feb 7, 2019 at 4:04 AM Soumyajaganathan ***@***.***> wrote:

> Here, I am just trying to fit a large dataset with Incremental in Dask, but I am getting an error like this:
>
>     from dask_ml.datasets import make_classification
>     from dask_ml.wrappers import Incremental
>     from sklearn.linear_model import SGDClassifier
>     from dask.distributed import client
>     import dask
>
>     X, y = make_classification(chunks=25)
>     estimator = SGDClassifier(random_state=10)
>     clf = Incremental(estimator,shuffle_blocks=True,random_state=0)
>     clf.fit(X, y)
>
> Error:
>
>     --> 197     new_dsk = dask.sharedict.merge((name, dsk), x.dask, getattr(y, "dask", {}))
>         198     value = Delayed((name, nblocks - 1), new_dsk)
>         199
>     AttributeError: module 'dask' has no attribute 'sharedict'
>
> Then I imported sharedict from Dask to fix this:
>
>     from dask_ml.datasets import make_classification
>     from dask_ml.wrappers import Incremental
>     from sklearn.linear_model import SGDClassifier
>     from dask.distributed import client
>     import dask.sharedict
>     import dask
>     import dask.delayed
>     from toolz import merge
>     from toolz import partial
>     from dask.delayed import Delayed
>
>     X, y = make_classification(chunks=25)
>     estimator = SGDClassifier(random_state=10)
>     clf = Incremental(estimator,shuffle_blocks=True,random_state=0)
>     clf.fit(X, y)
>
> Error:
>
>     ~/.local/lib/python3.6/site-packages/dask/base.py in _extract_graph_and_keys(vals)
>         210         graph = HighLevelGraph.merge(*graphs)
>         211     else:
>     --> 212         graph = merge(*graphs)
>         213
>         214     return graph, keys
>
>     ~/.local/lib/python3.6/site-packages/toolz/dicttoolz.py in merge(*dicts, **kwargs)
>          37     rv = factory()
>          38     for d in dicts:
>     ---> 39         rv.update(d)
>          40     return rv
>          41
>
>     ValueError: dictionary update sequence element #0 has length 36; 2 is required
>
> Help me to fix this.

Soumyajaganathan · 2019-02-07T14:40:52Z

Yes, got it. Thanks

ENH: Blockwise Metaestimator

ebb2a06

Update docs

d98c1a8

stsievert requested changes Jun 2, 2018


mrocklin reviewed Jun 3, 2018


Rename to incremental.

eb40dc0

Removed make_blockwise helper.

TomAugspurger added 2 commits June 4, 2018 10:47

Deprecate the old

0f3dab7

document deprecation

c264dbb

TomAugspurger changed the title from "[WIP] Blockwise Metaestimator" to "Blockwise Metaestimator" Jun 4, 2018

partial fit test

aaba050

linting

ccc858c

mrocklin reviewed Jun 4, 2018


CI fixup

0f368c8

TomAugspurger mentioned this pull request Jun 4, 2018

Better namespaces for Incremental and ParallelPostFit #194

Open

TomAugspurger merged commit 5438527 into dask:master Jun 4, 2018

TomAugspurger deleted the streamable branch June 4, 2018 17:33

TomAugspurger mentioned this pull request Jun 4, 2018

Hyperparameter optimization over Incremental-wrapped models #195

Closed

jakirkham mentioned this pull request Jun 6, 2018

Add DictionaryLearning #95

Open
