-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop AutoMLSearch from running multiple times #1647
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1647 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 240 240
Lines 18271 18287 +16
=========================================
+ Hits 18263 18279 +16
Misses 8 8
Continue to review full report at Codecov.
|
evalml/automl/automl_search.py
Outdated
# Baseline + first batch + each pipeline iteration + 1 | ||
first_ensembling_iteration = (1 + len(self.allowed_pipelines) + len(self.allowed_pipelines) * self._pipelines_per_batch + 1) | ||
if self.max_iterations < first_ensembling_iteration: | ||
if not self._searched: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure why it changed this much, but I added this if statement to determine if we need to do the search, then put the rest of the search within this if statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered instead just returning instead of true? Saves a little nesting of conditionals and might make the diff a little clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @chukarsten's comment! Makes it easier to follow along
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call @chukarsten! I'll fix that.
evalml/automl/automl_search.py
Outdated
|
||
best_pipeline = self.rankings.iloc[0] | ||
best_pipeline_name = best_pipeline["pipeline_name"] | ||
self._find_best_pipeline() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the best pipeline here
evalml/automl/automl_search.py
Outdated
self._find_best_pipeline() | ||
logger.info(f"Best pipeline: {best_pipeline_name}") | ||
logger.info(f"Best pipeline {self.objective.name}: {best_pipeline['score']:3f}") | ||
self._searched = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and I set searched to be true here so we won't redo the same search
|
||
def _find_best_pipeline(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New method to find the best pipeline and train only if the pipeline is new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@@ -403,6 +404,10 @@ def search(self, data_checks="auto", show_iteration_plot=True): | |||
show_iteration_plot (boolean, True): Shows an iteration vs. score plot in Jupyter notebook. | |||
Disabled by default in non-Jupyter enviroments. | |||
""" | |||
if self._searched: | |||
logger.info("AutoMLSearch has already been run and will not run again on the same instance. Re-initialize AutoMLSearch to search again.") | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 hmm after our discussion yesterday I thought we were going with option 3 from the issue writeup, not option 2. Did I remember wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsherry, I voiced for option 2 since the X_train, y_train
data are now passed through the init of AutoMLSearch
, and during the discussion yesterday, I believe everyone agreed, which is why I proceeded with option 2. Do you think option 3 would be better? My main issue with option 3 is that rerunning the same AutoMLSearch
class, with the same data and defaults, should lead to the same results, and therefore isn't very useful. Additionally, clearing the previous state could be tricky, especially if a user called add_to_rankings
previously. Let me know your thoughts, and we can also discuss more after standup if needed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's entirely fair. Unless we provided a way for people to reconfigure the automl settings on a single automl search instance, rerunning the search from scratch should produce highly similar results. 👍
Sounds good to me. Just wanted to make sure I wasn't missing something.
* Fixed bug where time series baseline estimators were not receiving ``gap`` and ``max_delay`` in ``AutoMLSearch`` :pr:`1645` | ||
* Changes | ||
* Rerunning search for ``AutoMLSearch`` results in a message thrown rather than failing the search, and removed ``has_searched`` property :pr:`1647` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the removal of has_searched
to the breaking changes section as well.
evalml/automl/automl_search.py
Outdated
@@ -403,6 +404,10 @@ def search(self, data_checks="auto", show_iteration_plot=True): | |||
show_iteration_plot (boolean, True): Shows an iteration vs. score plot in Jupyter notebook. | |||
Disabled by default in non-Jupyter enviroments. | |||
""" | |||
if self._searched: | |||
logger.info("AutoMLSearch has already been run and will not run again on the same instance. Re-initialize AutoMLSearch to search again.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit-pick: "AutoMLSearch.search() has already been run..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 LGTM!
fix #1424
has_searched
property fromAutoMLSearch
best_pipeline
inadd_to_rankings
as wellAutoMLSearch.search
twiceUpdated search call

The files changed look like I changed a lot, but I commented on the changes I made from the main branch and this branch in
automl_search.py