Change AutoMLSearch parameters for AutoML Algorithm Upgrade #3304
Codecov Report
@@ Coverage Diff @@
## js_2867_default #3304 +/- ##
=================================================
+ Coverage 99.6% 99.7% +0.1%
=================================================
Files 325 325
Lines 31750 31764 +14
=================================================
+ Hits 31620 31639 +19
+ Misses 130 125 -5
| n_automl_pipelines = n_results | ||
| assert automl._automl_algorithm.batch_number == max_batches | ||
| assert automl._automl_algorithm.pipeline_number + 1 == n_automl_pipelines | ||
| if max_batches is None: |
This is a weird one, so let me try to explain. In this PR, I make these changes, which not only set the default behavior for max_batches but, more importantly, move this logic to after the AutoML algorithm is instantiated. On main, the default max_batches is set before the AutoML algorithm is created. This meant that if max_batches, max_iterations, etc. were all None on main, max_batches would be set to 1 and passed into IterativeAlgorithm. However, IterativeAlgorithm has this logic, which sets max_iterations if ensembling=True and max_batches is not None. As a result, on main, setting ensembling=True meant that AutoMLSearch would use max_iterations as its stopping criterion. With the changes to search in this PR, search now correctly uses max_batches as its stopping criterion, so it stops when batch_number > max_batches. The changes to this test reflect that, with the added complication of _pipelines_per_batch.
This also means that if we're using IterativeAlgorithm and we set max_batches and ensembling, max_iterations will be the final stopping criterion. However, this isn't new behavior.
The question now is: what should the intended behavior be? Should max_iterations be used as the stopping criterion if max_iterations is not set? Or should we stick with max_batches as the default stopping criterion?
Other food for thought: if max_batches is the stopping criterion, is it weird if automl.automl_algorithm.batch_number == max_batches + 1?
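To make the two stopping criteria concrete, here is a minimal toy model of a batch-based search loop. It is not evalml's implementation: `run_toy_search` and its loop structure are hypothetical, and it assumes the batch counter advances when a new batch is requested, which is one way the `batch_number == max_batches + 1` end state mentioned above could arise.

```python
from typing import Optional, Tuple


def run_toy_search(
    max_batches: int,
    pipelines_per_batch: int,
    max_iterations: Optional[int] = None,
) -> Tuple[int, int]:
    """Toy search loop (hypothetical, not evalml code).

    Returns (pipelines evaluated, final batch counter).
    """
    batch_number = 0
    evaluated = 0
    while True:
        # Assumption: asking for the next batch advances the counter first.
        batch_number += 1
        # Batch-based criterion: stop once the counter exceeds max_batches.
        if batch_number > max_batches:
            break
        for _ in range(pipelines_per_batch):
            # Iteration-based criterion: can cut a batch short.
            if max_iterations is not None and evaluated >= max_iterations:
                return evaluated, batch_number
            evaluated += 1
    return evaluated, batch_number


if __name__ == "__main__":
    # Only max_batches set: every batch completes, counter ends one past the limit.
    print(run_toy_search(max_batches=3, pipelines_per_batch=5))
    # max_iterations also set: the search stops partway through the second batch.
    print(run_toy_search(max_batches=3, pipelines_per_batch=5, max_iterations=7))
```

In this sketch, stopping on max_batches always finishes whole batches, while stopping on max_iterations can end mid-batch, which is why the choice of default criterion changes the number of pipelines a test should expect.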
| @property | ||
| def default_max_batches(self): | ||
| """Returns the number of max batches AutoMLSearch should run by default.""" | ||
| return 4 if not is_time_series(self.problem_type) else 3 |
| self.max_iterations = max_iterations | ||
| self.max_batches = max_batches | ||
| self._pipelines_per_batch = _pipelines_per_batch | ||
| if not self.max_iterations and not self.max_time and not self.max_batches: |
Moved this logic to below AutoMLAlgo instantiation to use automl_algorithm.default_max_batches.
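A rough sketch of why the ordering matters, using stand-in classes rather than evalml's real ones: `ToyAlgorithm` and `ToySearch` are hypothetical, and only the `default_max_batches` property (4, or 3 for time series) and the "no stopping criterion set" check are taken from the diff above.

```python
from typing import Optional


class ToyAlgorithm:
    """Hypothetical stand-in for an AutoML algorithm."""

    def __init__(self, is_time_series: bool = False) -> None:
        self.is_time_series = is_time_series

    @property
    def default_max_batches(self) -> int:
        # Mirrors the property in the diff: 4 by default, 3 for time series.
        return 3 if self.is_time_series else 4


class ToySearch:
    """Hypothetical stand-in for the search object."""

    def __init__(
        self,
        max_iterations: Optional[int] = None,
        max_time: Optional[float] = None,
        max_batches: Optional[int] = None,
        is_time_series: bool = False,
    ) -> None:
        self.max_iterations = max_iterations
        self.max_time = max_time
        self.max_batches = max_batches
        # The algorithm must exist before its default can be consulted.
        self.automl_algorithm = ToyAlgorithm(is_time_series)
        # Fall back to the algorithm's default only when nothing was set.
        if not self.max_iterations and not self.max_time and not self.max_batches:
            self.max_batches = self.automl_algorithm.default_max_batches


if __name__ == "__main__":
    print(ToySearch().max_batches)
    print(ToySearch(is_time_series=True).max_batches)
    print(ToySearch(max_iterations=10).max_batches)
```

Because the default lives on the algorithm, the fallback can only run after the algorithm is constructed, which is the reordering this comment describes.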
chukarsten left a comment
This LGTM, Jeremy! I think as a suggestion to anyone reading this comment, we should try to keep small and less significant changes (like publicizing automl_algorithm) in separate PRs to maximize views on perhaps the more impactful parts of a PR. Thanks for doing this, though! Just trying to increase visibility on the more important stuff rather than the tedious!
| with env.test_context(score_return_value={"Log Loss Binary": 0.30}): | ||
| automl.search() | ||
| assert automl.rankings["pipeline_name"][1:].str.contains("Natural Language").all() | ||
| assert automl.rankings["pipeline_name"][1:-1].str.contains("Natural Language").all() |
What's changed here that we're not expecting Natural Language in the last pipeline?
It's the ensembling pipeline, so it won't contain Natural Language in the pipeline name.
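The effect of the slice change can be illustrated with a plain Python list standing in for the `pipeline_name` column. The names below are made up for illustration, and what actually occupies the first row is not stated in this thread, so a placeholder is used.

```python
# Hypothetical pipeline names; the real rankings table is a DataFrame column.
pipeline_names = [
    "Placeholder First Row",  # assumption: skipped by both slices, as in the test
    "Toy Classifier A w/ Natural Language Featurizer",
    "Toy Classifier B w/ Natural Language Featurizer",
    "Toy Stacked Ensemble",  # ensembling pipeline lands last in this toy ordering
]


def all_natural_language(names):
    """True when every name mentions the Natural Language component."""
    return all("Natural Language" in name for name in names)


# Old assertion shape: includes the last row, so the ensemble name breaks it.
old_check = all_natural_language(pipeline_names[1:])
# New assertion shape: drops the last row before checking.
new_check = all_natural_language(pipeline_names[1:-1])

if __name__ == "__main__":
    print(old_check, new_check)
```

With ensembling now running under the default batch settings, the last row no longer carries the component name, so the updated slice excludes it.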
* Make default default
* Remove batches case as the limit is algorithm dependent
* Amend batches case to be greater than 0
* Remove allowed model families and # of pipelines for test_rankings
* Either remove test_automl_tuner_exception or use iterative, tuners are created after the fact in default
* Rename to test_automl_feature_selection_with_allowed_component_graphs, must use iterative here
* Rename to test_automl_allowed_component_graphs_iterative_algorithm and remove unnecessary mock logic
* Change to use default and explicitly name algorithm
* Rename to test_describe_pipeline_with_ensembling_iterative and use iterative
* Rename to test_component_graph_with_incorrect_problem_type_iterative and remove extra call
* Rename to test_jobs_cancelled_when_keyboard_interrupt_iterative and use iterative
* Rename to test_max_iteration_works_with_stacked_ensemble_iterative and use iterative
* Change max_batches tests to only test max batches and moved original test
* lint
* Change ts regression test to check algorithm pipeline parameters
* Rename test_automl_respects_pipeline_parameters_with_duplicate_components to test_automl_respects_pipeline_parameters_with_duplicate_components_iterative and use iterative
* Make test_automl_adds_pipeline_parameters_to_custom_pipeline_hyperparams algorithm agnostic
* Fix test_automl_drop_unknown_columns by adding verbose to default
* Rename test_pipeline_parameter_warnings_component_graphs to test_pipeline_parameter_warnings_component_graphs_iterative and use iterative
* Rename to test_component_and_pipeline_warnings_surface_in_search and generalize
* Rename to test_graph_automl_iterative and use iterative
* Rename to test_automl_respects_pipeline_order_iterative and use iterative
* Rename to test_get_ensembler_input_pipelines_iterative and use iterative
* Rename to test_automl_one_allowed_component_graph_ensembling_disabled and use iterative
* Generalize test_max_batches_plays_nice_with_other_stopping_criteria
* Rename to test_pipeline_hyperparameters_make_pipeline_other_errors_iterative and use iterative
* Rename to iterative format and use iterative
* Rename to test_automl_ensembling_false_iterative and use iterative
* Rename to iterative format and use iterative
* Use iterative
* lint test_automl.py and related changes
* Fix test_automl_search_classification tests
* Add to RL
* Fix test_automl_search_regression tests
* fix logger tests
* Fix start.ipynb
* actually fix docs
* Fix test_automl_search_sampler_method by using iterative
* lint
* Fix automl.ipynb
* Rename test_automl_feature_selection_with_allowed_component_graphs to test_automl_feature_selection_with_allowed_component_graphs_iterative
* Rename back to test_callback
* Rename back to test_jobs_cancelled_when_keyboard_interrupt
* Move test_automl iterative tests to test_automl_iterative_algorithm.py
* Move to respective iterative files
* Fix imports
* Remove dask engine space
* Add missing iterative call
* Revert back to 10 iterations for test_automl_tuner_exception
* Move test_callback back
* lint
* Fix test I reverted for some reason
* Lint
* Rename tests
* Fix test_describe_pipeline_with_ensembling
* Fix coverage in test_pipeline_custom_hyperparameters_make_pipeline
* Fix coverage for test_automl_respects_pipeline_custom_hyperparameters_with_duplicate_components
* lint
* Fix doc failures
* Fix release notes
* Fix release notes
* Change `AutoMLSearch` parameters for AutoML Algorithm Upgrade (#3304)
* Change default max batches to fast mode
* Make automl_algorithm parameter public
* RL
* Fix release notes
* Add more logic to max_batches change
* Fix tests to work with new max batches
* Drop periods down to 100
* Add default max batches property to algos
* Lint
* Add time series logic to default max batches of default algo
* Fix test_max_batches_num_pipelines
Adds on to #3261.
Changes:
Make the _automl_algorithm parameter public