Make DefaultAlgorithm the default AutoML Algorithm (#3261)
Codecov Report

@@           Coverage Diff            @@
##            main    #3261     +/-   ##
=========================================
- Coverage    99.8%    99.7%     -0.1%
=========================================
  Files         324      327        +3
  Lines       31764    31814       +50
=========================================
+ Hits        31674    31689       +15
- Misses         90      125       +35

Continue to review the full report at Codecov.
Truncated commit messages from this PR's commit list:
- …e created after the fact in default
- …, must use iterative here
- …d remove unnecessary mock logic
- …and remove extra call
- …ents to test_automl_respects_pipeline_parameters_with_duplicate_components_iterative and use iterative
- …ams algorithm agnostic
- …line_parameter_warnings_component_graphs_iterative and use iterative
- … and use iterative
- …erative and use iterative
Jeremy, great job here. I like what you're doing with the test refactoring; you're really hinting at the need to split integration from unit testing, which will require more planning and thought. It would be nice if we could get some of those coverage gaps filled, but I won't block on that. If you want to hit the ones that are low-hanging fruit, that would be great.
evalml/automl/automl_search.py (outdated)

@@ -446,7 +446,7 @@ def __init__(
     allow_long_running_models=False,
     _ensembling_split_size=0.2,
     _pipelines_per_batch=5,
-    _automl_algorithm="iterative",
+    _automl_algorithm="default",
That seems reasonable.
"median", | ||
] | ||
assert 10 <= row["parameters"]["One Hot Encoder"]["top_n"] <= 12 | ||
assert automl._automl_algorithm._custom_hyperparameters == custom_hyperparameters |
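For context, a hypothetical reconstruction of the custom_hyperparameters fixture these assertions appear to exercise. The component names are inferred from the asserted keys, and evalml tests conventionally express ranges with skopt spaces; none of this is copied from the PR itself:

from skopt.space import Categorical, Integer

custom_hyperparameters = {
    # Restrict the Imputer to the two strategies the test checks for.
    "Imputer": {"numeric_impute_strategy": Categorical(["mean", "median"])},
    # Keep top_n inside the 10..12 window asserted above.
    "One Hot Encoder": {"top_n": Integer(10, 12)},
}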
Agreed. So begins the work of disentangling unit tests from integration tests.
    verbose=verbose,
)
env = AutoMLTestEnv("binary")
👍
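For readers unfamiliar with the helper above, a minimal sketch of how AutoMLTestEnv is typically driven; the test_context/score_return_value/mock_fit surface is assumed from its use elsewhere in evalml's test suite, and automl is the AutoMLSearch instance constructed just before it:

# Mock out pipeline fit/score so the search runs instantly.
with env.test_context(score_return_value={automl.objective.name: 1.0}):
    automl.search()
# Verify that the (mocked) pipelines were actually "trained".
env.mock_fit.assert_called()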
if "Mode Baseline Binary" in row["pipeline_name"]: | ||
continue | ||
if row["pipeline_name"] == "Name_linear": | ||
assert row["parameters"]["Imputer"]["numeric_impute_strategy"] == "mean" |
I think this will require some investigation; we probably want to clear these up before we merge this PR. Maybe run these tests and see what is contained in row["pipeline_name"] to try and verify the intent of this line. I suspect this is the product of some naming rot, as "Name_linear" doesn't look like anything I've seen, stylistically, since being here.
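One quick way to carry out the investigation suggested above; full_rankings is a public AutoMLSearch attribute, and the loop mirrors the test code in question:

for _, row in automl.full_rankings.iterrows():
    # Print each evaluated pipeline's name and its Imputer parameters to see
    # whether "Name_linear" ever actually appears in practice.
    print(row["pipeline_name"], row["parameters"].get("Imputer", {}))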
@chukarsten I originally filed #3292 to fix the coverage issues, but I'll see if I can get a quick win here!
…_with_duplicate_components
This looks good to me! Great work on pushing this forward; I just left a few nits/questions, but nothing blocking.
I know you've run some perf tests with the Default Algo before, but I think it could be useful to run perf tests on this PR versus current main, just to ensure, for peace of mind, that there is no change in performance.
@@ -108,7 +108,13 @@ def test_pipeline_count(
     else:
         X, y = X_y_regression
     if not allowed_families:
-        _ = AutoMLSearch(X_train=X, y_train=y, problem_type=type_, verbose=verbose)
+        _ = AutoMLSearch(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be addressed for default algo in a future PR?
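If it does get addressed later, one hypothetical shape for it (not from this PR) would be parametrizing the same count-style test over both algorithms. Here X_y_binary is assumed to be the usual evalml fixture and allowed_pipelines an AutoMLSearch attribute:

import pytest

from evalml.automl import AutoMLSearch

@pytest.mark.parametrize("algorithm", ["iterative", "default"])
def test_pipeline_count_both_algorithms(algorithm, X_y_binary):
    X, y = X_y_binary
    automl = AutoMLSearch(
        X_train=X,
        y_train=y,
        problem_type="binary",
        _automl_algorithm=algorithm,
    )
    # Expected counts would differ per algorithm; this only sketches the shape.
    assert automl.allowed_pipelines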
Squashed commit message:
* Change default max batches to fast mode
* Make automl_algorithm parameter public
* RL
* Fix release notes
* Add more logic to max_batches change
* Fix tests to work with new max batches
* Drop periods down to 100
* Add default max batches property to algos
* Lint
* Add time series logic to default max batches of default algo
* Fix test_max_batches_num_pipelines
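A hedged sketch of the per-algorithm default described by the "Add default max batches property to algos" commit above; the property name comes from that commit wording, but the class shape and value here are purely illustrative:

class SomeAutoMLAlgorithm:
    """Illustrative only; the real logic lives in evalml's algorithm classes."""

    @property
    def default_max_batches(self):
        # How many batches AutoMLSearch should run when the user does not pass
        # max_batches explicitly. The value is made up for this sketch.
        return 4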
Fixes #2867. Since there is a lot in this PR, I will use the PR for #2693 to fix docstrings and think about changes to the AutoMLSearch API.

I will be doing the following in this PR:
- give algorithm-specific tests an _iterative tag
- move _iterative tests to their own files

I will also create a list of tests that we can improve in #2868 with a MockAlgorithm with the following criteria (a hedged sketch of such a mock follows at the end of this description):

List of tests that we can use a mock algorithm with:

List of tests we can maybe generalize with much larger lift:

Performance tests vs. using the top-level search method (results are identical): default_swap_com.html.zip
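A minimal, hypothetical sketch of the kind of MockAlgorithm proposed above for #2868; the next_batch/add_result surface is assumed from evalml's AutoMLAlgorithm base class, and nothing here is taken from this PR:

from unittest.mock import MagicMock

class MockAlgorithm:
    """Stand-in algorithm so unit tests can drive AutoMLSearch without real pipelines."""

    def __init__(self):
        self.batch_number = 0

    def next_batch(self):
        # Hand back one fake pipeline per batch instead of building real ones.
        self.batch_number += 1
        return [MagicMock()]

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        # Accept results without doing any tuning; tests only need the call surface.
        pass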