Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop AutoMLSearch from running multiple times #1647

Merged
merged 16 commits into from
Jan 6, 2021
Merged

Conversation

bchen1116
Copy link
Contributor

@bchen1116 bchen1116 commented Jan 5, 2021

fix #1424

  • Remove has_searched property from AutoMLSearch
  • Update best_pipeline in add_to_rankings as well
  • No-op when user calls AutoMLSearch.search twice

Updated search call
image

The files changed look like I changed a lot, but I commented on the changes I made from the main branch and this branch in automl_search.py

@bchen1116 bchen1116 self-assigned this Jan 5, 2021
@codecov
Copy link

codecov bot commented Jan 5, 2021

Codecov Report

Merging #1647 (396c739) into main (593b88a) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1647     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      240             
  Lines       18271    18287     +16     
=========================================
+ Hits        18263    18279     +16     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.7% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 593b88a...396c739. Read the comment docs.

# Baseline + first batch + each pipeline iteration + 1
first_ensembling_iteration = (1 + len(self.allowed_pipelines) + len(self.allowed_pipelines) * self._pipelines_per_batch + 1)
if self.max_iterations < first_ensembling_iteration:
if not self._searched:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure why it changed this much, but I added this if statement to determine if we need to do the search, then put the rest of the search within this if statement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered instead just returning instead of true? Saves a little nesting of conditionals and might make the diff a little clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @chukarsten's comment! Makes it easier to follow along

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call @chukarsten! I'll fix that.


best_pipeline = self.rankings.iloc[0]
best_pipeline_name = best_pipeline["pipeline_name"]
self._find_best_pipeline()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the best pipeline here

self._find_best_pipeline()
logger.info(f"Best pipeline: {best_pipeline_name}")
logger.info(f"Best pipeline {self.objective.name}: {best_pipeline['score']:3f}")
self._searched = True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I set searched to be true here so we won't redo the same search


def _find_best_pipeline(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New method to find the best pipeline and train only if the pipeline is new

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -403,6 +404,10 @@ def search(self, data_checks="auto", show_iteration_plot=True):
show_iteration_plot (boolean, True): Shows an iteration vs. score plot in Jupyter notebook.
Disabled by default in non-Jupyter enviroments.
"""
if self._searched:
logger.info("AutoMLSearch has already been run and will not run again on the same instance. Re-initialize AutoMLSearch to search again.")
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bchen1116 hmm after our discussion yesterday I thought we were going with option 3 from the issue writeup, not option 2. Did I remember wrong?

Copy link
Contributor Author

@bchen1116 bchen1116 Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsherry, I voiced for option 2 since the X_train, y_train data are now passed through the init of AutoMLSearch, and during the discussion yesterday, I believe everyone agreed, which is why I proceeded with option 2. Do you think option 3 would be better? My main issue with option 3 is that rerunning the same AutoMLSearch class, with the same data and defaults, should lead to the same results, and therefore isn't very useful. Additionally, clearing the previous state could be tricky, especially if a user called add_to_rankings previously. Let me know your thoughts, and we can also discuss more after standup if needed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's entirely fair. Unless we provided a way for people to reconfigure the automl settings on a single automl search instance, rerunning the search from scratch should produce highly similar results. 👍

Sounds good to me. Just wanted to make sure I wasn't missing something.

* Fixed bug where time series baseline estimators were not receiving ``gap`` and ``max_delay`` in ``AutoMLSearch`` :pr:`1645`
* Changes
* Rerunning search for ``AutoMLSearch`` results in a message thrown rather than failing the search, and removed ``has_searched`` property :pr:`1647`
Copy link
Contributor

@dsherry dsherry Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the removal of has_searched to the breaking changes section as well.

@@ -403,6 +404,10 @@ def search(self, data_checks="auto", show_iteration_plot=True):
show_iteration_plot (boolean, True): Shows an iteration vs. score plot in Jupyter notebook.
Disabled by default in non-Jupyter enviroments.
"""
if self._searched:
logger.info("AutoMLSearch has already been run and will not run again on the same instance. Re-initialize AutoMLSearch to search again.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick: "AutoMLSearch.search() has already been run..."

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bchen1116 LGTM!

@bchen1116 bchen1116 merged commit 584a00e into main Jan 6, 2021
@bchen1116 bchen1116 mentioned this pull request Jan 26, 2021
@freddyaboulton freddyaboulton deleted the bc_1424_automl branch May 13, 2022 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AutoMLSearch: calling search twice on same instance doesn't work
4 participants