Make DefaultAlgorithm the default AutoML Algorithm#3261
Make DefaultAlgorithm the default AutoML Algorithm#3261jeremyliweishih merged 76 commits intomainfrom
DefaultAlgorithm the default AutoML Algorithm#3261Conversation
Codecov Report
@@ Coverage Diff @@
## main #3261 +/- ##
=======================================
- Coverage 99.8% 99.7% -0.1%
=======================================
Files 324 327 +3
Lines 31764 31814 +50
=======================================
+ Hits 31674 31689 +15
- Misses 90 125 +35
Continue to review full report at Codecov.
|
…e created after the fact in default
…, must use iterative here
…d remove uncessary mock logic
…and remove extra call
…ents to test_automl_respects_pipeline_parameters_with_duplicate_components_iterative and use iterative
…ams algorithm agnostic
…line_parameter_warnings_component_graphs_iterative and use iterative
… and use iterative
…erative and use iterative
chukarsten
left a comment
There was a problem hiding this comment.
Jeremy - great job here. I like what you're doing with the test refactoring and I think you're really hinting at the need to split integration vs. unit testing, which I think will require more planning and thought. It would be nice if we could get some of those coverage gaps filled, but I won't block on that. I think if you want to hit the ones that are low hanging fruit, that would be great.
evalml/automl/automl_search.py
Outdated
| _ensembling_split_size=0.2, | ||
| _pipelines_per_batch=5, | ||
| _automl_algorithm="iterative", | ||
| _automl_algorithm="default", |
| "median", | ||
| ] | ||
| assert 10 <= row["parameters"]["One Hot Encoder"]["top_n"] <= 12 | ||
| assert automl._automl_algorithm._custom_hyperparameters == custom_hyperparameters |
There was a problem hiding this comment.
Agreed. So begins the work of disentangling unit tests from integration tests.
| max_batches=1, | ||
| verbose=verbose, | ||
| ) | ||
| env = AutoMLTestEnv("binary") |
| if "Mode Baseline Binary" in row["pipeline_name"]: | ||
| continue | ||
| if row["pipeline_name"] == "Name_linear": | ||
| assert row["parameters"]["Imputer"]["numeric_impute_strategy"] == "mean" |
There was a problem hiding this comment.
I think this will require some investigation. We probably want to clear these up before we merge this PR. Maybe run these tests and see what is contained in row["pipeline_name"] to try and verify the intent of this line. I suspect this is the product of some naming rot as "Name_linear" doesn't look like anything I've seen, stylistically, since being here.
|
@chukarsten I originally filed #3292 to fix the coverage issues but I'll see if I can get a quick win here! |
…_with_duplicate_components
bchen1116
left a comment
There was a problem hiding this comment.
This looks good to me! Great work on pushing this forward, I just left a few nits/questions, but nothing blocking.
I know you've run some perf tests with the Default Algo before, but I think it could be useful to run perf tests on this PR versus current main, just to ensure there is no change in performance for peace of mind.
| X, y = X_y_regression | ||
| if not allowed_families: | ||
| _ = AutoMLSearch(X_train=X, y_train=y, problem_type=type_, verbose=verbose) | ||
| _ = AutoMLSearch( |
There was a problem hiding this comment.
Is this going to be addressed for default algo in a future PR?
* Change default max batches to fast mode * Make automl_algorithm parameter public * RL * Fix release notes * Add more logic to max_batches change * Fix tests to work with new max batches * Drop periods down to 100 * Add default max batches property to algos * Lint * Add time series logic to default max batches of default algo * Fix test_max_batches_num_pipelines
Fixes #2867. Since there is a lot in this PR, I will use the PR for #2693 to fix docstrings and think about changes to
AutoMLSearchAPI.I will be doing the following in this PR:
_iterativetag_iterativetests to their own filesI will also create a list of tests that we can improve in #2868 with a
MockAlgorithmwith these following criteria:List of tests that we can use a mock algorithm with:
List of tests we can maybe generalize with much larger lift:
Performance tests vs. using top level search method (results are identical):
default_swap_com.html.zip