
Incremental model selection #288

Merged
23 commits merged into dask:master on Sep 5, 2018

Conversation

@mrocklin (Member) commented Jul 8, 2018

This implements a simple approach to hyper-parameter optimization on
incrementally trained models. It is heavily inspired by (and steals content
from) #221, but it is less opinionated and doesn't attempt to implement
scikit-learn APIs. I don't intend for this to be user-facing, or necessarily to
get merged, but I hope that it enables conversation around the topic.

Currently it implements a function fit, which takes in a model, training
and testing data, and a parameter space. It creates many models, trains them
on a block of data, scores them, and then keeps only the better-performing
ones, training those on subsequent blocks of data.
It continues to do this until only one model remains.
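For intuition, here is a minimal, purely illustrative sketch of that selection loop. It runs synchronously in a plain Python loop (the real implementation is asynchronous on a Dask cluster), and the names toy_fit, blocks, and n_initial are made up for this example:

from sklearn.base import clone
from sklearn.model_selection import ParameterSampler

def toy_fit(model, param_space, blocks, X_test, y_test, n_initial=16, **fit_params):
    # Draw a handful of parameter settings and clone one model per setting
    params = list(ParameterSampler(param_space, n_iter=n_initial))
    models = [clone(model).set_params(**p) for p in params]
    for X_block, y_block in blocks:            # one round per block of data
        for m in models:
            # fit_params would carry things like classes= for classifiers
            m.partial_fit(X_block, y_block, **fit_params)
        scores = [m.score(X_test, y_test) for m in models]
        # keep the better-performing half (halving is just one possible decay)
        order = sorted(range(len(models)), key=lambda i: scores[i], reverse=True)
        models = [models[i] for i in order[: max(1, len(models) // 2)]]
        if len(models) == 1:
            break
    return models[0]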

This function is not intended for public users, but is hopefully something we
can play with as we eventually build a full solution. It is decently
configurable today and we will need to find sane defaults. Notably, this
function does not take a position on the following:

  1. How testing data should be separated from training data,
    currently they are both explicitly provided
  2. How many parameters to start with (provided as integer input)
  3. How to decrease this number over time (provided as function input)

Additionally, it does not (but probably should) provide configuration for the
following:

  1. How to select models from the current group
  2. How to decide on stopping criteria for any particular model
  3. How to decide how many models to select based on the scores received so far

I believe that this implementation is a decent start, though, and should be OK as a
conversation piece. I think that it makes some decisions that we don't want to
think about yet (online determination of how many models to keep) while giving
us knobs for other parameters that I think we'll want to play with (model decay
functions, how to handle testing, etc.).

It is decently well tuned from a Dask perspective.
It optimistically tries moving forward on models between rounds and then cancels
that work if it turns out not to be necessary. This depends on dask master.
Performance is also a bit better with dask/distributed#2100
merged in (but that is not necessary).

Notebook: https://gist.github.com/3a33bed7e7289550bfea8a48af702c82

It would be nice to settle on a sane default for this operation, which I understand is commonly needed.
I suspect that we should play a bit with some real problems using this to see what ends up being important before doing much more development work.


else:
    x_key, y_key = seen[j]
    return Future(x_key), Future(y_key)
@mrocklin (Member, Author):
I don't recommend looking too deeply at this function. It's unclean and likely to go away in the future.

@stsievert (Member) left a review comment:

Thanks for this implementation @mrocklin. I've glanced at the code, but need to take time to thoroughly review. Does this clarify the ideas in #221 (comment)?

It implements starting/stopping work based on the current estimate of good/bad models (which I agree should be done in #221). Is there anything else I should pay close attention to for #221?

Oh, and the gist isn't loading for me.

model = client.submit(_partial_fit, model, X_future, y_future,
                      fit_params, priority=-i + meta['score'])
score = client.submit(_score, model, X_test, y_test, scorer,
                      priority=-time_step + meta['score'])
Review comment (Member):
Nice – the priority keyword is useful, and I wasn't aware of the implementation.

I should spend some time thinking about priority. I think we want to use a multiplicative ratio to try to remove scale: priority=score / i.
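For reference, a hedged sketch of what that multiplicative version might look like, reusing the variables from the snippet above (client, _partial_fit, meta, i); the exact form is only illustrative:

# Same submit call as above, but with a scale-free multiplicative priority
# instead of the additive -i + score form.
model = client.submit(
    _partial_fit, model, X_future, y_future, fit_params,
    priority=meta['score'] / max(i, 1),   # guard against dividing by zero
)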

@mrocklin (Member, Author) commented Jul 8, 2018 via email

@mrocklin (Member, Author) commented Jul 16, 2018

After looking at the Incremental.fit problem I'm inclined to generalize this code a bit so that the caller provides a list of models, training and testing data, and a function that gets called after every batch of training data gets processed to determine how to proceed. That function would accept a dictionary mapping model identity to scoring information, and it would then return a dictionary showing how much further to proceed on each model before checking in again.

>>> scores  # example simplified input
{'model-1': [0.3, 0.4, 0.5, 0.6, 0.7],  # evolving
 'model-2': [0.2, 0.6, 0.6, 0.6, 0.6],  # converged
 'model-3': [0.1, 0.1, 0.1, 0.1, 0.1],  # bad
}
>>> out  # example output
{'model-1': 3,  # evolve forward three more steps, then give us back the score
 'model-2': 0,  # keep this around, but don't bother evolving further
 # 'model-3': ...  # omit model-3 to forget about it entirely
}

def update(scores):
    """ Something like successive halving for Hyperband """
    # keep the best-scoring half (topk here is assumed to be toolz.topk(k, seq, key=...))
    best = topk(len(scores) // 2, scores, key=lambda k: scores[k][-1])
    additional_steps = int(...)
    return {ident: additional_steps for ident in best}

def update(scores):
    """ Something like Incremental.fit """
    assert len(scores) == 1
    [(ident, v)] = scores.items()
    if any(score > v[-10] + tolerance for score in v[-10:]):
        return {ident: 1}  # do one more time step
    else:
        return {ident: 0}  # no more work to do, signals termination

def update(scores):
    """ Something like RandomSearchCV """
    out = {}
    for ident, v in scores.items():
        if any(score > v[-10] + tolerance for score in v[-10:]):
            out[ident] = 1  # keep evolving this one
        else:
            out[ident] = 0  # don't evolve this one any further, it's done
    return out

The full fit function would have something like the following inputs:

def fit(model, parameter_list, X_train, y_train, X_test, y_test, update, fit_params, scorer):
    ...

This may not cover all of these cases, though. Critique is welcome before I invest time into this.
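To make the proposed contract concrete, here is a minimal, purely illustrative driver loop for such a fit function. It is synchronous and Dask-free on purpose, only to show the control flow; blocks and scorer are assumptions, not part of the proposal:

def toy_driver(models, blocks, X_test, y_test, update, scorer):
    # models: {ident: estimator}; blocks: iterator of (X, y) training blocks
    scores = {ident: [] for ident in models}
    instructions = {ident: 1 for ident in models}   # start with one step each
    block_iter = iter(blocks)
    while instructions:
        for ident, n_steps in instructions.items():
            for _ in range(n_steps):
                X, y = next(block_iter)             # assumes enough blocks
                models[ident].partial_fit(X, y)
            scores[ident].append(scorer(models[ident], X_test, y_test))
        # ask the caller-provided policy how much further to go on each model;
        # zero or omitted means "stop training that model for now"
        instructions = {k: v for k, v in update(scores).items() if v > 0}
    return models, scores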

@mrocklin (Member, Author):
I simplified the example above significantly in an edit. Better to look on github rather than depend on e-mail.

@stsievert (Member):
I'm inclined to generalize this code a bit

This looks like a disciplined way of controlling when to call partial_fit on models. It looks like one float determines model quality. update controls whether to pass a particular model to fit.

This looks good to me. I would like to store state inside update (I assume update is called with as_completed as fit completes). For the output of update, I think a negative integer should cancel any fit calls that are currently running for that model.

I think we'd want to control the fit call signature. Could we have a list of metadata alongside the scores? Something like

>>> scores
{'model-1': [0.3, 0.4, 0.5],
 'model-2': [0.5, 0.5, 0.5]}
>>> model_meta
{'model-1': {...},
 'model-2': {...}}
>>> all_meta  # for HyperbandCV
{'scorer': ...,
 'X_test': ...,
 'y_test': ...}

where all_meta and model_meta could be configured by the programmer of Hyperband/Incremental.fit/RandomizedSearchCV in some way.

Then the call signature for fit could be

fit(model, X, y, **model_meta[ident], **all_meta) -> model, meta

I'm not seeing why update should be an argument to fit, but scikit-learn/scikit-learn#9599 (comment) seems related. I renamed {X, y}_train to X, y. I also think we should rename update.

@mrocklin (Member, Author):
It looks like one float determines model quality

I don't understand this. Which float? In principle there is no one thing that determines model quality here. We're a bit lower level.

I would like to store state inside update.

What state would you want to propagate? I expect that the scores input will grow a number of other bits of metadata collected during the computation.

(I assume update is called with as_completed as fit completes)

In the context of successive halving update is called at the end of every epoch

I think a negative integer should cancel any fit calls that are currently running for that model.

The convention I proposed above is that the user passes back the number of additional steps to go forward on each model before calling update again. A value of 0 would result in no additional fits. Removing the element from the dict would result in forgetting the model entirely. I believe that your proposed use of -1 fits one of these two. Is that correct?

I think we'd want to control the fit call signature. Could we have a list of metadata aside the scores?

Yes, I agree that some collection of metadata is likely to be important.

where all_meta and model_meta could be configured by the programmer of Hyperband/Incremental.fit/RandomizedSearchCV in some way.

So the user would also pass functions to include in the partial_fit or score calls? Is there a particular place where they should be run? Should the user have the freedom to provide the entire partial_fit or score call or is this too much? If too much, then can you suggest an implementation of _partial_fit that would cover most needs that you can think of? Where do we call meta_func? Or should it be a context manager or something else? What is the right balance of easy to implement and also sufficient? (these are somewhat rhetorical questions, I wouldn't expect anyone to have a full answer to any of them without trying out a bunch of things).

def _partial_fit(model_and_meta, X, y, fit_params, meta_func=None):
    ...

def _score(model_and_meta, X, y, scorer, meta_func=None):
    ...

I'm not seeing why update should be an argument to fit, but scikit-learn/scikit-learn#9599 (comment) seems related.

I'm probably misusing the term fit here. That's just the name of my full function that does all of the training work. I'm using the term partial_fit for training on a single batch. I suspect that this is what you are calling fit above.

@stsievert (Member):
I got confused by the naming, etc., but am starting to understand (please disregard most of #288 (comment); it's ill-informed). I think this adaptive training will require:

  • specifying how many times _partial_fit is called with a particular model
  • receiving the score for each model after _partial_fit has been called that many times
  • storage and modification of metadata. This metadata is per-model for Hyperband, but it might be global for all models in other use cases (e.g., bayesian searches).

I think the _partial_fit and _score functions are good in this implementation. I think update should have the call signature

update(ident: str, train_metas: {str: dict}, model_metas: {str: dict})
    -> training_iters: {str: int}, model_metas: {str: dict}

  • There are two pieces of metadata: one for training (train_metas), and one to be used and modified in update (model_metas).
  • I would expect this to be called as soon as model ident has finished its _score call.
  • The metadata information in model_metas might change.

I think train_metas should have scores and partial_fit_calls keys, and also the same keys as cv_results_ (e.g., mean_fit_time). This would be the object passed to and modified in _partial_fit and _score.

I think model_metas should hold on to historical information, and be initialized in meta_func. In the Hyperband case, this would mean holding onto the previous brackets and their scores in case any models need to be restarted. This would only live on the scheduler/master and not be passed to or modified in _partial_fit or _score.

I do think update should be renamed.

The convention I proposed ... Removing the element from the dict would result in forgetting the model entirely

I think 'not present' should default to 'do nothing'. I'd only want to specify the state for some subset of the models, not all of them. For Hyperband I'd specify, for some subset of the models, which should be trained more and which should be killed.

I think {'model-0': -1} should cancel the most recent _partial_fit call for 'model-0', but hold onto the results from the previous _partial_fit calls. I think we should use -1 because it's kind of like going back one _partial_fit call.

@mrocklin (Member, Author):
I'm not sure about some of the motivation behind your thoughts above.

For the desire to call update after every score call: I believe that you want this in order to support asynchronous Hyperband, where you evolve tasks based on the scores that have arrived so far. Instead of that approach I recommend that we just submit a training step on every model until a global update check, but submit those steps with a priority equal to each model's score. This means that we still keep things saturated during stragglers, but won't let good models die if they happen to be unlucky about when they finish. If async Hyperband was your only motivation, then I suggest that we keep things a bit simpler for now with the update function and run it only after all of the previous instructions finish.
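A hedged sketch of that speculative-execution idea, assuming a dask.distributed Client; the names _partial_fit, infos, and X_block/y_block are illustrative, not part of the actual code:

# Optimistically submit the next training step for every model, with priority
# proportional to its latest score, then cancel the speculative work for any
# model that update() decides to drop.
speculative = {
    ident: client.submit(_partial_fit, fut, X_block, y_block,
                         priority=info['score'])
    for ident, (fut, info) in infos.items()
}
instructions = update(scores)                  # the global check-in
unneeded = [f for ident, f in speculative.items() if ident not in instructions]
client.cancel(unneeded)                        # throw away work we no longer want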

Similarly, I suspect that this was your motivation for having "no change" be the default in update, but I'm not sure. If you can include the motivation behind your thoughts, that would be useful.

@mrocklin (Member, Author):
Or, putting this another way. I believe that I've presented a sufficient approach to successive halving in my update function above. I believe that the speculative execution should handle the underlying desires behind the move to async hyperband. I've removed the prioritization for now for simplicity, but it will be easy to put it back.

If I'm missing something and the approach is unlikely to work for some reason, please let me know.

@stsievert (Member):
If you can include motivation behind your thoughts that would be useful

Asynchronous Hyperband is part of my motivation, but not all of it. This is an adaptive training function, one that decides how to train models based on prior evaluations and stored metadata. Wikipedia has a good example of the model updating after seeing one evaluation, which is what I'm used to seeing in adaptive algorithms.

run it [update] only after all of the previous instructions finish.

This won't perform well for synchronous Hyperband. There are a couple of embarrassingly parallel loops in synchronous Hyperband, and there's no reason for one loop to wait on another. It will work well for a single successive halving call within synchronous Hyperband.

This is important: the initial call to update would return anywhere between 1 and 81 calls to _partial_fit. This doesn't really go away in later calls to update, which will specify anywhere between 3 and 27 _partial_fit calls.

This could be avoided if there's some way to specify the different Hyperband loops as independent, at least for synchronous Hyperband. This could work for asynchronous Hyperband too, as long as additional arguments are passed.
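For context, the 1-to-81 spread comes from the bracket schedule in the Hyperband paper: each bracket s trains its models for R * eta**(-s) steps before its first halving decision. A small illustration, assuming the common parameters R=81 and eta=3 (these are configurable, so the exact numbers are only an example):

R, eta = 81, 3
s_max = 4                          # int(log(R, eta))
for s in range(s_max, -1, -1):
    r = R // eta ** s              # initial partial_fit budget per model
    print(f"bracket s={s}: first update asks for {r} partial_fit calls")
# prints budgets of 1, 3, 9, 27, and 81 for the five brackets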

run it [update] only after all of the previous instructions finish.

I think this should be supported as well. That's the simplest interface I would expect.

@mrocklin (Member, Author):
Notebook implementing prototypes of Incremental.fit, RandomSearch, and SuccessiveHalving:

https://gist.github.com/4c95bd26d15281d82e0bf2d27632e294

@mrocklin (Member, Author):
This is an adaptive training function, one that decides to train models based on prior evaluations and stored metadata. Wikipedia has a good example of the model updating after seeing one evaluation, what I'm used to seeing in adaptive algorithms

Understood. And in principle I agree that we might gain from making decisions whenever we find new information. To start out though with my current constraints (finding solutions to the above problems) I seem to be able to implement most things about as efficiently as desired. If we choose to get fancier we can get fancier easily. This seems to suffice for now though?

This won't perform well for synchronous Hyperband. There are a couple of embarrassingly parallel loops in synchronous Hyperband, and there's no reason for one loop to wait on another. It will work well for a single successive halving call within synchronous Hyperband.

How would you interleave calls? The paper specifies a nested loop. That would be trivial to implement here. We can definitely launch the various successive-halving computations concurrently if desired. Or perhaps you're referring to some way to fuse and share computations between successive halving computations?

The initial call to update would return anywhere between 1 and 81 calls to _partial_fit

Just checking, 81 is just an example number here, right? One might ask for thousands I think? I believe that 81 is just a number they use in a table to show an example of how things might evolve, and what they use in their LeNet benchmark. I don't think it's a magic number, please let me know if I'm missing something.

This could be avoided if there's some way to specify the different Hyperband loops as independent, at least for synchronous Hyperband. This could work for asynchronous Hyperband too, as long as additional arguments are passed.

It's definitely possible to run multiple of these at the same time, though they won't be sharing work. I can imagine that that would be helpful if we have abundant parallelism, which seems possible.

To make this concrete, something like the following would work:

@gen.coroutine
def hyperband(...):
    yield [successive_halving(...) for _ in ...]

The Dask client will then run this process many times from the same thread.

@mrocklin (Member, Author):
Also, I'm running this notebook using my cull-before-send branch in dask/distributed

@stsievert (Member):
81 is just an example number here

Correct. Sorry for the confusion. maxiter=81 and that can be customized.

It's definitely possible to run multiple of these at the same time,

Then for synchronous Hyperband I am satisfied. My other concerns go out the window.

There might be something more with tpot or Bayesian sampling, but I'll look at the code first.

If we choose to get fancier we can get fancier easily.

Could you push your code? I take it the notebook was run with modifications to _fit. I'd like to look that code over.

@mrocklin (Member, Author):
Ah! Indeed. I was operating on the wrong branch. Pushed!

@mrocklin (Member, Author):
I recommend the graph view of the dashboard when executing.

My current plan is to leave this as-is until some sort of case study arrives to benchmark it against. I'm inclined to experiment a bit with successive_halving to see how different decay rates behave and then use that as motivation for hyperband after we have some practical experience. Presumably UI bits would come after that.

@TomAugspurger (Member):
Pushed some formatting changes.

Will take a closer look later (tomorrow most likely).

""" Create a model by cloning and then setting params """
with log_errors(pdb=True):
    model = clone(model).set_params(**params)
    return model, {"ident": ident, "params": params, "time_step": -1}
Review comment (Member):
I got confused with time_step here. I thought it referred to the number of times update was called, not the number of times partial_fit was called.

@stsievert (Member) commented Jul 26, 2018

This looks pretty good. A lot of my concerns are addressed by the fact that this

  • works with classes (e.g., update=cls.fit), so arbitrary state can be stored and arbitrary functions can be run.
  • can be called in parallel (though I still need to verify this works).

Making this work with asynchronous Hyperband will be some work but not an incredible amount.
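A minimal sketch of the class-based idea from the first bullet: using a bound method as the update callable so that state (here a simple plateau check) lives on the instance. The class and its knobs (patience, tol) are made up for illustration, and scores follows the {ident: [score, ...]} convention from earlier in this thread:

class StopOnPlateau:
    def __init__(self, patience=10, tol=0.001):
        self.patience = patience
        self.tol = tol

    def update(self, scores):
        out = {}
        for ident, history in scores.items():
            recent = history[-self.patience:]
            # keep training while the recent window is still improving
            improving = (len(recent) < self.patience
                         or max(recent) > recent[0] + self.tol)
            out[ident] = 1 if improving else 0
        return out

policy = StopOnPlateau(patience=5)
# e.g. pass policy.update wherever the update/additional_calls argument goes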

@mrocklin (Member, Author):
Making this work with asynchronous Hyperband will be some work but not an incredible amount.

Yeah, I'm happy to set up concurrent execution of successive halving for Hyperband (or walk someone else through it) after we have a bit more understanding of how random search and successive halving perform. I don't think that building Hyperband is likely to be a priority until then.

This implements a simple approach to do hyper-parameter optimization on
incrementally trained models.  It is heavily inspired by (and steals content
from) dask#221 but attempts less.

Currently it implements a function ``fit``, which takes in a model, training
and testing data and a parameter space.  It creates many models and trains them
on a block of data, scores them, and then keeps only the better performing
ones.  It continues to do this until only one is selected.

This function is not intended for public users, but is hopefully something we
can play with as we eventually build a full solution.  It is decently
configurable today and we will need to find sane defaults.  Notably, this
function does not take opinions on the following:

1.  How testing data should be separated from training data,
    currently they are both explicitly provided
2.  How many parameters to start with, or how to decrease this over time
    These are given as inputs with an integer and function respectively

Additionally it does not, but probably should provide configurations for the
following:

1.  How to select models from the current group
2.  How to decide on stopping criteria
3.  Any ability to actively decide the number of surviving models
    based on the scores received so far

I believe that this implementation is a decent start though and should be a
conversation piece.  I think that it makes some decisions that we don't want to
think about yet (online determination of how many models to keep) while giving
us knobs for other parameters that I think we'll want to play with (model decay
functions, how to handle testing, etc.)
This removes logic from the fit function to make it less about successive
halving and more general, so that it can support Incremental.fit and RandomSearchCV.
@mrocklin mrocklin force-pushed the incremental-model-selection branch from b213809 to aaaf7e8 Compare July 28, 2018 00:15
@stsievert (Member):
I thought I discovered some bugs, but turns out I didn't.

@stsievert stsievert mentioned this pull request Aug 1, 2018
@stsievert (Member):
It'd be nice to support keyboard interrupts here, and to preserve the models/infos/etc. on exit.

My use case is a user firing a keyboard interrupt and still recovering the best model's score and related info.

@mrocklin (Member, Author) commented Aug 31, 2018 via email

@mrocklin (Member, Author) commented Aug 31, 2018 via email

@mrocklin (Member, Author) commented Sep 5, 2018

@TomAugspurger do you have thoughts on the right way to make a RandomSearch version of this that is API-familiar to a Scikit-Learn user? Is there even an established API for this kind of thing with partial_fit?

@TomAugspurger (Member) left a review comment:

@TomAugspurger do you have thoughts on the right way to make a RandomSearch version of this that is API-familiar to a Scikit-Learn user?

Does something roughly like this make sense?

class IncrementalSearch(BaseEstimator):
    def __init__(self, estimator, params, additional_calls, X_test, y_test):
        self.estimator = estimator
        self.params = params
        self.additional_calls = additional_calls
        self.X_test = X_test
        self.y_test = y_test

    def fit(self, X, y=None, **fit_params):
        info, model, history = fit(self.estimator, self.params, X, y, self.X_test,
                                   self.y_test, self.additional_calls, **fit_params)
        self.info_ = info
        self.model_ = model
        self.history_ = history
        return self

The only strange part is passing the validation set to the init. That could maybe be done as a fit param, and plucked from fit_params in fit. I'm not sure there's anything in scikit-learn that takes a validation set as an argument (maybe @ogrisel knows otherwise). Estimators like SGDClassifier will internally create a validation set from the training data when early_stopping=True. Perhaps we could do the same?
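A hedged sketch of that fit-param variant, reusing the class sketched above; this is only one possible shape, and the keyword names X_test/y_test are assumptions:

    def fit(self, X, y=None, **fit_params):
        # Pluck the validation set out of fit_params instead of __init__
        X_test = fit_params.pop("X_test", None)
        y_test = fit_params.pop("y_test", None)
        if X_test is None:
            # or: split a validation set off X/y internally, the way
            # SGDClassifier does when early_stopping=True
            raise ValueError("X_test and y_test are required for now")
        info, model, history = fit(self.estimator, self.params, X, y,
                                   X_test, y_test, self.additional_calls,
                                   **fit_params)
        self.info_, self.model_, self.history_ = info, model, history
        return self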

raise gen.Return((info, models, history))


def fit(*args, **kwargs):
Review comment (Member):

Would be best to make this the actual signature.

    Numpy array or small dask array. Should fit in memory.
y_test : Array
    Numpy array or small dask array. Should fit in memory.
start : int
Review comment (Member):
removed I think.

Add additional_calls.

>>> X_train = X[100000:]
>>> y_train = y[100000:]

>>> info, model, history = yield fit(model, params,
Review comment (Member):

Need to update.

@mrocklin (Member, Author) commented Sep 5, 2018

class IncrementalSearch(BaseEstimator):

Would it make sense to factor this out from the Hyperband work going on now?

@TomAugspurger (Member):
Yes, I think so. But I'm not sure what order we should do things in.

  • IncrementalSearch (is that the best name?) that provides a kind of baseline for this class of search models
  • Hyperband is a relatively minimal extension to IncrementalSearch implementing its strategies for model selection, etc.

Does that conform with others' expectations? I'm happy to take over the work from those who are busy with other things.

@mrocklin (Member, Author) commented Sep 5, 2018

I've updated the docstring considerably, including a runnable example. Tests will fail until dask/distributed#2232 is merged in.

@mrocklin (Member, Author) commented Sep 5, 2018

Yes, I think so. But I'm not sure what order we should do things in.

  • IncrementalSearch (is that the best name?) that provides a kind of baseline for this class of search models
  • Hyperband is a relatively minimal extension to IncrementalSearch implementing its strategies for model selection, etc.

Does that conform with others' expectations? I'm happy to take over the work from those who are busy with other things.

Yes, I think that ideally something like Hyperband would look something like the following:

class Hyperband(IncrementalSearch):
    def __init__(self, **hyperband_specific_kwargs):
        ...

    def additional_calls(self, scores):
        ...

If this existed I think that it would be easier to experiment with this sort of thing.

I should write up documentation for this PR, but I might want to wait until such a super-class exists.

@mrocklin (Member, Author) commented Sep 5, 2018

I would be happy to leave the sklearn-specific conventions to someone else :)

@TomAugspurger (Member):

OK then, how's this for a plan:

  1. Merge "add byte_keys to unpack_remotedata call" (distributed#2232)
  2. Merge this
  3. I take a pass at an IncrementalSearch base class today
  4. We rebase Hyperband on IncrementalSearch

And follow-up items for documentation, examples, etc. of IncrementalSearch

The goal being to maximize the effectiveness of @stsievert's time :)

@mrocklin (Member, Author) commented Sep 5, 2018 via email

@@ -218,10 +219,23 @@ def get_futures(partial_fit_calls):

new_scores = list(_scores2.values())

models = {k: client.submit(operator.getitem, v, 0) for k, v in models.items()}
@mrocklin (Member, Author):
This will break the current Hyperband implementation BTW

@stsievert (Member):

class IncrementalSearch(BaseEstimator):

Perfect – I had this idea when playing with stop_on_plateau, but didn't bring it up. This should also inherit from DaskBaseSearchCV, right?

@TomAugspurger (Member) commented Sep 5, 2018 via email

@mrocklin (Member, Author) commented Sep 5, 2018

Good to merge any time, @TomAugspurger. I'll leave this to you.

if isinstance(X_test, da.Array):
    X_test = client.compute(X_test)
else:
    y_test = yield client.scatter(y_test)
Review comment (Member):
Should this be X_test = yield client.scatter(X_test)?

@mrocklin (Member, Author):
Yes, it probably should. My mistake.
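For clarity, the corrected version of that snippet would presumably be:

if isinstance(X_test, da.Array):
    X_test = client.compute(X_test)
else:
    X_test = yield client.scatter(X_test)   # scatter X_test, not y_test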

@TomAugspurger (Member):

Updated that, and made the data a bit smaller in the hope of making it run a bit faster.

@TomAugspurger merged commit 80f0b14 into dask:master on Sep 5, 2018
@mrocklin deleted the incremental-model-selection branch on September 5, 2018, 21:01
TomAugspurger added a commit to TomAugspurger/dask-ml that referenced this pull request Sep 6, 2018
commit 41b0af2
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Wed Sep 5 11:11:20 2018 -0500

    [WIP]: Base class for estimators using incremental fit

commit 80f0b14
Author: Matthew Rocklin <mrocklin@gmail.com>
Date:   Wed Sep 5 14:44:12 2018 -0400

    Incremental model selection (dask#288)

    This implements a simple approach to do hyper-parameter optimization on
    incrementally trained models.  It is heavily inspired by (and steals content
    from) dask#221 but attempts less.