Refactor AutoML Search for Parallel Workers #1295

Closed
3 tasks
christopherbunn opened this issue Oct 9, 2020 · 2 comments · Fixed by #1337 or #1410

@christopherbunn
Contributor

  • Define an evaluation method (name can be whatever) which takes a pipeline plus data as input and outputs scores/results/CV data etc., with no side effects. A sketch of this shape follows the list.
  • To do this: update the calling code to invoke _add_result after _evaluate returns, rather than inside _evaluate, so that _evaluate has no side effects.
  • May have to update _compute_cv_scores too.
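
A minimal sketch of the intended shape, with toy stand-in signatures (the real _evaluate, _compute_cv_scores, and _add_result take more arguments and do actual cross-validation):

```python
class AutoMLSearchSketch:
    """Toy stand-in for AutoMLSearch; only the evaluation flow is modeled."""

    def __init__(self):
        self._results = []

    def _compute_cv_scores(self, pipeline, X, y):
        # Cross-validate the pipeline and return its scores/CV data.
        # Nothing on self is mutated here.
        return {"pipeline": pipeline, "mean_cv_score": 0.0, "cv_data": []}

    def _evaluate(self, pipeline, X, y):
        # Side-effect-free: compute and return the evaluation, nothing else.
        return self._compute_cv_scores(pipeline, X, y)

    def _add_result(self, evaluation):
        # The single place where search results get recorded.
        self._results.append(evaluation)

    def search(self, batch, X, y):
        for pipeline in batch:
            evaluation = self._evaluate(pipeline, X, y)
            # _add_result is called after _evaluate returns, not inside it.
            self._add_result(evaluation)
```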
@christopherbunn christopherbunn added the enhancement An improvement to an existing feature. label Oct 12, 2020
@christopherbunn christopherbunn self-assigned this Oct 12, 2020
@christopherbunn
Contributor Author

I have a first pass at removing the side effects from _evaluate. In doing this, I realized that _evaluate is essentially just _compute_cv_scores plus the side effect rolled into one function. Once the side effect is removed, we're left with a useless wrapper around _compute_cv_scores.

Rather than stripping the side effect out of _evaluate (or removing _evaluate altogether), I propose that we go in the opposite direction and beef up _evaluate: rename it to _evaluate_pipelines, keep the side effect, and move some of the steps from automl.search() into it. Essentially, this renamed function would:

  • Evaluate the current pipeline batch.
  • Update the AutoML Algorithm.
  • For parallel searches, it would copy the data required by _compute_cv_scores into a dictionary and create Dask futures.

automl.search() will still take care of the AutoML Algorithm setup, creating new batches, and checking for the stopping condition.
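
A rough sketch of what the renamed method could look like, assuming a dask.distributed.Client handles the parallel path; the free-function form, argument names, and the algorithm's add_result call are illustrative stand-ins, not the actual EvalML code:

```python
from dask.distributed import Client  # only needed for the parallel path

def _evaluate_pipelines(automl, batch, X, y, client=None):
    """Evaluate a batch of pipelines, record results, update the algorithm."""
    if client is None:
        # Sequential path: evaluate each pipeline in the batch directly.
        evaluations = [automl._compute_cv_scores(p, X, y) for p in batch]
    else:
        # Parallel path: submit one _compute_cv_scores call per pipeline as a
        # Dask future, shipping the required data to the workers.
        futures = [client.submit(automl._compute_cv_scores, p, X, y) for p in batch]
        evaluations = client.gather(futures)
    for evaluation in evaluations:
        automl._add_result(evaluation)  # the side effect stays in this method
        automl._automl_algorithm.add_result(evaluation)  # update the AutoML Algorithm
    return evaluations
```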

Pros:

  • It centralizes the pipeline evaluation logic into a function.
    • When parallel engines are implemented, this will help keep search from getting too unwieldy. This becomes more important if we add ways of processing pipelines other than iteratively and in batches.

Cons:

  • The KeyboardInterrupt logic might need to be adjusted. I think we can just move it into _evaluate_pipelines, but I'm not sure whether a keyboard interrupt that arrives while the AutoML algorithm is generating the next batch would still be covered.
  • Not sure if separating the AutoML update step into a different function from the AutoML setup and batch generation steps will cause problems down the line.

For the scope of this issue, we would just move the AutoML Algorithm update code (lines 482 to 485) into _evaluate and rename the function to _evaluate_pipelines.

@dsherry
Contributor

dsherry commented Oct 22, 2020

@christopherbunn when you say "Update the AutoML Algorithm" that's referring to sending the evaluation results to the AutoMLAlgorithm object, yeah?

Yep, I think I follow what you've described here. I do think we should still have one private method in AutoMLSearch which takes the training data and a pipeline as input and returns the pipeline's scores; currently, that's _evaluate. Let's keep that encapsulation in some way, shape, or form. Happy to discuss more verbally.
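
For concreteness, here's one shape that would keep that encapsulation after the rename, a minimal sketch with stubbed, simplified signatures rather than EvalML's actual methods:

```python
class AutoMLSearchSketch:
    def _compute_cv_scores(self, pipeline, X, y):
        return {"pipeline": pipeline, "mean_cv_score": 0.0}  # stub

    def _evaluate(self, pipeline, X, y):
        # Encapsulated single-pipeline evaluation: training data + one
        # pipeline in, that pipeline's scores out.
        return self._compute_cv_scores(pipeline, X, y)

    def _evaluate_pipelines(self, batch, X, y):
        # The batch-level method delegates, so the encapsulation survives.
        return [self._evaluate(pipeline, X, y) for pipeline in batch]
```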

Hmm, I hear you about the keyboard interrupt. As long as the behavior of that feature doesn't change, I'm on board. If we're gonna change the behavior there, let's discuss first.
