Change AutoMLSearch parameters for AutoML Algorithm Upgrade #3304

Merged
jeremyliweishih merged 11 commits into js_2867_default from js_2867_change_search_for_default
Feb 10, 2022

Collaborator

@jeremyliweishih jeremyliweishih commented Feb 3, 2022

Adds on to #3261.

Changes:

  • Set default max batches in search to 4 batches (3 for time series), which corresponds to fast mode
  • Make _automl_algorithm parameter public

codecov Bot commented Feb 3, 2022

Codecov Report

Merging #3304 (8e3d661) into js_2867_default (df9f827) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@                Coverage Diff                @@
##           js_2867_default   #3304     +/-   ##
=================================================
+ Coverage             99.6%   99.7%   +0.1%     
=================================================
  Files                  325     325             
  Lines                31750   31764     +14     
=================================================
+ Hits                 31620   31639     +19     
+ Misses                 130     125      -5     
| Impacted Files | Coverage Δ |
|---|---|
| ...sts/test_automl_search_classification_iterative.py | 100.0% <ø> (ø) |
| ...l_tests/test_automl_search_regression_iterative.py | 100.0% <ø> (ø) |
| evalml/tests/utils_tests/test_logger.py | 100.0% <ø> (ø) |
| evalml/automl/automl_algorithm/automl_algorithm.py | 100.0% <100.0%> (ø) |
| ...valml/automl/automl_algorithm/default_algorithm.py | 100.0% <100.0%> (ø) |
| evalml/automl/automl_search.py | 99.7% <100.0%> (ø) |
| ...ts/automl_tests/parallel_tests/test_automl_dask.py | 96.3% <100.0%> (ø) |
| evalml/tests/automl_tests/test_automl.py | 99.5% <100.0%> (+0.3%) ⬆️ |
| ...ts/automl_tests/test_automl_iterative_algorithm.py | 100.0% <100.0%> (ø) |
| .../automl_tests/test_automl_search_classification.py | 96.4% <100.0%> (ø) |
| ... and 6 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment thread evalml/automl/automl_search.py Outdated
n_automl_pipelines = n_results
assert automl._automl_algorithm.batch_number == max_batches
assert automl._automl_algorithm.pipeline_number + 1 == n_automl_pipelines
if max_batches is None:
Collaborator Author

@jeremyliweishih jeremyliweishih Feb 4, 2022

This is a weird one, so let me try to explain! In this PR, I made these changes, which not only set the default behavior for max batches but, more importantly, move this logic to after the AutoML algorithm is instantiated. On main, the default max batches is set before the AutoML algorithm is instantiated. This meant that if max_batches, max_iterations, etc. were all None on main, max_batches would be set to 1 and passed into IterativeAlgorithm. However, IterativeAlgorithm has this logic, which sets max_iterations if ensembling=True and max_batches is not None. As a result, on main, setting ensembling=True meant that AutoMLSearch used max_iterations as its stopping criterion. With the changes to search in this PR, search correctly uses max_batches as its stopping criterion, so it stops when batch_number > max_batches. The changes to this test reflect that, with the added complication of _pipelines_per_batch.

This also means that if we're using IterativeAlgorithm and we set max_batches and ensembling, max_iterations will be the final stopping criterion. However, this isn't new behavior.

The question now is: what should the intended behavior be? Should max_iterations be used as the stopping criterion even if max_iterations is not set? Or should we stick with max_batches as the default stopping criterion?

Other food for thought: if max_batches is the stopping criterion, is it weird if automl.automl_algorithm.batch_number == max_batches + 1?
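
To make the two stopping criteria concrete, here is a toy simulation in plain Python. It is not EvalML's search loop: `ToyAlgorithm`, `run_search`, and the point at which the batch check happens are assumptions chosen only to illustrate the behavior described above. Stopping on `batch_number > max_batches` leaves `batch_number` at `max_batches + 1`, while a `max_iterations` cap can end the search partway through a batch.

```python
class ToyAlgorithm:
    """Hypothetical stand-in for an AutoML algorithm; not EvalML's class."""

    def __init__(self, pipelines_per_batch):
        self.pipelines_per_batch = pipelines_per_batch
        self.batch_number = 0

    def next_batch(self):
        # Requesting a batch advances the counter, even if the caller
        # later decides not to evaluate that batch.
        self.batch_number += 1
        return [
            f"batch{self.batch_number}_pipeline{i}"
            for i in range(self.pipelines_per_batch)
        ]


def run_search(algorithm, max_batches=None, max_iterations=None):
    """Evaluate pipelines until one of the (assumed) stopping criteria is hit."""
    if max_batches is None and max_iterations is None:
        raise ValueError("Set at least one stopping criterion.")
    evaluated = []
    while True:
        batch = algorithm.next_batch()
        # Batch-based criterion: stop once the counter exceeds the limit.
        if max_batches is not None and algorithm.batch_number > max_batches:
            return evaluated
        for pipeline in batch:
            # Iteration-based criterion: can stop in the middle of a batch.
            if max_iterations is not None and len(evaluated) >= max_iterations:
                return evaluated
            evaluated.append(pipeline)


by_batches = ToyAlgorithm(pipelines_per_batch=3)
evaluated_by_batches = run_search(by_batches, max_batches=4)
print(len(evaluated_by_batches), by_batches.batch_number)

by_iterations = ToyAlgorithm(pipelines_per_batch=3)
evaluated_by_iterations = run_search(by_iterations, max_iterations=7)
print(len(evaluated_by_iterations), by_iterations.batch_number)
```

In this toy run, the batch-limited search evaluates four full batches and ends with the counter one past the limit, which is the `max_batches + 1` situation the question refers to. The iteration-limited search stops inside the third batch.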

@property
def default_max_batches(self):
    """Returns the number of max batches AutoMLSearch should run by default."""
    return 4 if not is_time_series(self.problem_type) else 3
Collaborator Author

Will need to update after #3191.

self.max_iterations = max_iterations
self.max_batches = max_batches
self._pipelines_per_batch = _pipelines_per_batch
if not self.max_iterations and not self.max_time and not self.max_batches:
Collaborator Author

Moved this logic to below AutoMLAlgo instantiation to use automl_algorithm.default_max_batches.
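
A minimal sketch of why the ordering matters, assuming a simplified constructor flow: the default can only come from the algorithm once the algorithm exists. `ToyDefaultAlgorithm` and `resolve_max_batches` are hypothetical names for illustration; only the values 4 and 3 and the `not max_iterations and not max_time and not max_batches` condition come from the diff in this PR.

```python
class ToyDefaultAlgorithm:
    """Hypothetical algorithm exposing a default_max_batches property."""

    def __init__(self, is_time_series=False):
        self.is_time_series = is_time_series

    @property
    def default_max_batches(self):
        # Values taken from the PR: 4 batches, or 3 for time series.
        return 3 if self.is_time_series else 4


def resolve_max_batches(algorithm, max_batches=None, max_iterations=None, max_time=None):
    """Fall back to the algorithm's default only when no criterion was given."""
    if not max_iterations and not max_time and not max_batches:
        return algorithm.default_max_batches
    return max_batches


# The algorithm is created first, then the default is resolved from it.
regular = ToyDefaultAlgorithm()
time_series = ToyDefaultAlgorithm(is_time_series=True)
print(resolve_max_batches(regular))
print(resolve_max_batches(time_series))
print(resolve_max_batches(regular, max_iterations=10))
print(resolve_max_batches(regular, max_batches=2))
```

When the user supplies any stopping criterion, the sketch leaves `max_batches` exactly as passed, so an explicit `max_iterations` or `max_time` is not overridden by the algorithm's default.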

Contributor

@chukarsten chukarsten left a comment

This LGTM, Jeremy! As a suggestion to anyone reading this comment: we should try to keep small, less significant changes (like making automl_algorithm public) in separate PRs, so that the more impactful parts of a PR get more attention. Thanks for doing this, though! Just trying to increase visibility on the more important stuff rather than the tedious!

  with env.test_context(score_return_value={"Log Loss Binary": 0.30}):
      automl.search()
- assert automl.rankings["pipeline_name"][1:].str.contains("Natural Language").all()
+ assert automl.rankings["pipeline_name"][1:-1].str.contains("Natural Language").all()
Contributor

What's changed here that we're not expecting Natural Language in the last pipeline?

Collaborator Author

It's the ensembling pipeline, so it won't contain Natural Language in the pipeline name.
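
A small illustration of the slice change, using a plain list in place of the rankings column. The pipeline names and their order are made up; the assumption is that the first entry is skipped as before and the ensemble pipeline lands last.

```python
# Hypothetical pipeline names, ordered as they might appear in the rankings.
pipeline_names = [
    "Baseline Pipeline",
    "Model A w/ Natural Language Featurization",
    "Model B w/ Natural Language Featurization",
    "Stacked Ensemble Pipeline",
]


def all_contain(names, text):
    """Return True if every name in the list contains the given text."""
    return all(text in name for name in names)


# Old slice: skips only the first entry, so the ensemble is included.
old_check = all_contain(pipeline_names[1:], "Natural Language")
# New slice: also drops the last entry, where the ensemble sits.
new_check = all_contain(pipeline_names[1:-1], "Natural Language")
print(old_check, new_check)
```

Under these assumptions the old slice fails once an ensemble pipeline is part of the search, while the new slice checks only the pipelines in between.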

@jeremyliweishih jeremyliweishih merged commit c6f6676 into js_2867_default Feb 10, 2022
jeremyliweishih added a commit that referenced this pull request Feb 14, 2022
* Make default default

* Remove batches case as the limit is algorithm dependent

* Amend batches case to be greater than 0

* Remove allowed model families and # of pipelines for test_rankings

* Either remove test_automl_tuner_exception or use iterative, tuners are created after the fact in default

* rename to test_automl_feature_selection_with_allowed_component_graphs, must use iterative here

* Rename to test_automl_allowed_component_graphs_iterative_algorithm and remove unnecessary mock logic

* Change to use default and explicitly name algorithm

* Rename to test_describe_pipeline_with_ensembling_iterative and use iterative

* Rename to test_component_graph_with_incorrect_problem_type_iterative and remove extra call

* Rename to test_jobs_cancelled_when_keyboard_interrupt_iterative and use iterative

* Rename to test_max_iteration_works_with_stacked_ensemble_iterative and use iterative

* Change max_batches tests to only test max batches and moved original test

* lint

* Change ts regression test to check algorithm pipeline parameters

* Rename test_automl_respects_pipeline_parameters_with_duplicate_components to test_automl_respects_pipeline_parameters_with_duplicate_components_iterative and use iterative

* make test_automl_adds_pipeline_parameters_to_custom_pipeline_hyperparams algorithm agnostic

* Fix test_automl_drop_unknown_columns by adding verbose to default

* Rename test_pipeline_parameter_warnings_component_graphs to test_pipeline_parameter_warnings_component_graphs_iterative and use iterative

* Rename to test_component_and_pipeline_warnings_surface_in_search and generalize

* Rename to test_graph_automl_iterative and use iterative

* Rename to test_automl_respects_pipeline_order_iterative and use iterative

* Rename to test_get_ensembler_input_pipelines_iterative and use iterative

* Rename to test_automl_one_allowed_component_graph_ensembling_disabled and use iterative

* Generalize test_max_batches_plays_nice_with_other_stopping_criteria

* Rename to test_pipeline_hyperparameters_make_pipeline_other_errors_iterative and use iterative

* Rename to iterative format and use iterative

* Rename to test_automl_ensembling_false_iterative and use iterative

* Rename to iterative format and use iterative

* Use iterative

* lint test_automl.py and related changes

* Fix test_automl_search_classification tests

* Add to RL

* Fix test_automl_search_regression tests

* fix logger tests

* Fix start.ipynb

* actually fix docs

* Fix test_automl_search_sampler_method by using iterative

* lint

* Fix automl.ipynb

* Rename test_automl_feature_selection_with_allowed_component_graphs to test_automl_feature_selection_with_allowed_component_graphs_iterative

* Rename back to test_callback

* Rename back to test_jobs_cancelled_when_keyboard_interrupt

* Move test_automl iterative tests to test_automl_iterative_algorithm.py

* Move to respective iterative files

* Fix imports

* Remove dask engine space

* Add missing iterative call

* Revert back to 10 iterations for test_automl_tuner_exception

* Move test_callback back

* lint

* Fix test I reverted for some reason

* Lint

* Rename tests

* Fix test_describe_pipeline_with_ensembling

* Fix coverage in test_pipeline_custom_hyperparameters_make_pipeline

* Fix coverage for test_automl_respects_pipeline_custom_hyperparameters_with_duplicate_components

* lint

* Fix doc failures

* Fix release notes

* Fix release notes

* Change `AutoMLSearch` parameters for AutoML Algorithm Upgrade (#3304)

* Change default max batches to fast mode

* Make automl_algorithm parameter public

* RL

* Fix release notes

* Add more logic to max_batches change

* Fix tests to work with new max batches

* Drop periods down to 100

* Add default max batches property to algos

* Lint

* Add time series logic to default max batches of default algo

* Fix test_max_batches_num_pipelines
@freddyaboulton freddyaboulton deleted the js_2867_change_search_for_default branch May 13, 2022 15:23

3 participants