Replace allowed_pipelines with allowed_component_graphs #2364
Conversation
…yperparameter ranges
Codecov Report
@@ Coverage Diff @@
## main #2364 +/- ##
=======================================
- Coverage 99.7% 99.7% -0.0%
=======================================
Files 281 281
Lines 25014 25057 +43
=======================================
+ Hits 24917 24957 +40
- Misses 97 100 +3
Continue to review full report at Codecov.
bchen1116 left a comment
Didn't finish yet, but left a few questions that would help me understand! Looking good so far!
    optimize_thresholds=False,
    n_jobs=1,
)
automl._automl_algorithm = IterativeAlgorithm(
Did you need to define this since we're patching IterativeAlgo? Why did we not need to do it before?
    objective="log loss binary",
    additional_objectives=["f1"],
)
automl._automl_algorithm = IterativeAlgorithm(
Why is this needed?
Because DummyPipeline's score is being mocked for the purposes of this test (and a score method is something a ComponentGraph doesn't have), I have to pass that pipeline into the IterativeAlgorithm directly, since we can't pass it through AutoMLSearch anymore. This is the same reason it was added to the other tests as well! Luckily, it doesn't look like this increases the execution time.
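For context, a minimal sketch of that pattern, assuming the test fixtures (DummyPipeline, X, y) from the surrounding suite; the mock target and return values are illustrative, not the PR's actual test code:

    from unittest.mock import patch

    from evalml.automl import AutoMLSearch
    from evalml.automl.automl_algorithm import IterativeAlgorithm

    # score is mocked on the pipeline object, so that object must be handed
    # to IterativeAlgorithm directly; AutoMLSearch now only accepts component
    # graphs, which have no score method to mock.
    with patch.object(DummyPipeline, "score", return_value={"Log Loss Binary": 0.5}):
        automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary", n_jobs=1)
        automl._automl_algorithm = IterativeAlgorithm(
            max_iterations=4,
            allowed_pipelines=[DummyPipeline(parameters={})],
            random_seed=0,
        )
        automl.search()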
freddyaboulton left a comment
@ParthivNaresh Looks good! I have some minor comments I'd like to discuss before merge though.
evalml/automl/automl_search.py (Outdated)
)
unique_names = set()
for graph in allowed_component_graphs:
    unique_names.add(list(graph.keys())[0])
I'm wondering if we should change the expected type of allowed_component_graphs from a list of dicts to just a dict. So from

    allowed_component_graphs=[
        {"Name_0": [dummy_classifier_estimator_class]},
        {"Name_1": [dummy_classifier_estimator_class]},
        {"Name_2": [dummy_classifier_estimator_class]},
    ],

to

    allowed_component_graphs={
        "Name_0": [dummy_classifier_estimator_class],
        "Name_1": [dummy_classifier_estimator_class],
        "Name_2": [dummy_classifier_estimator_class],
    },

I see there's some duplicated code in this PR to unpack the list, e.g. list(graph.keys())[0] and next(iter(component_graph)). Making this change may make things clearer and reduce code.
I'll go a step further... is there any reason not to just expect a list or dictionary of ComponentGraph objects?
Originally the format was set up that way to support random_seed, but since we're not passing that anymore, this would definitely be a better alternative.
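If that change lands, a small sketch of the simplification (names and components are illustrative): with a plain dict, each name/graph pair comes straight from .items(), so the single-key unpacking disappears.

    # Before: a list of single-key dicts has to be unpacked element by element.
    allowed_component_graphs = [{"Name_0": ["Imputer", "Random Forest Classifier"]}]
    for graph in allowed_component_graphs:
        name = list(graph.keys())[0]   # awkward single-key unpacking
        components = graph[name]

    # After: a dict maps each name directly to its component graph.
    allowed_component_graphs = {"Name_0": ["Imputer", "Random Forest Classifier"]}
    for name, components in allowed_component_graphs.items():
        ...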
| "mean", | ||
| } | ||
| if row["pipeline_name"] == "Name_linear": | ||
| assert row["parameters"]["Imputer"]["numeric_impute_strategy"] == "mean" |
What's causing this diff? Is it the deletion of _frozen_pipeline_parameters?
Yes, I think so. Since the custom hyperparameter is set to a constant value, all pipelines will return "mean" for the Imputer.
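For concreteness, a sketch of what "set to a constant value" means here, following evalml's skopt-based custom_hyperparameters convention (the exact snippet is illustrative, not this test's code):

    from skopt.space import Categorical

    # A one-element search space: every tuned pipeline can only propose
    # "mean" as the Imputer's numeric_impute_strategy.
    custom_hyperparameters = {
        "Imputer": {"numeric_impute_strategy": Categorical(["mean"])},
    }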
chukarsten left a comment
Wow, that was a lot of work. Good job. I think we should discuss whether we want to preserve the two different ways of passing in the allowed component graphs. To me, the way to do it would be to reduce the complexity and only pass in a dict of component graphs (if they need to be named). The handful of instances where IterativeAlgorithm is set in the tests is also weird, but I don't consider these blocking.
automl._automl_algorithm = IterativeAlgorithm(
    max_iterations=4,
    allowed_pipelines=pipelines,
    tuner_class=SKOptTuner,
    random_seed=0,
    n_jobs=-1,
    number_features=X.shape[1],
    pipelines_per_batch=5,
    ensembling=False,
    text_in_ensembling=False,
    pipeline_params={},
    custom_hyperparameters=None,
)
All these instances of having to set the IterativeAlgorithm are kinda weird.
# Conflicts:
#   docs/source/release_notes.rst
#   evalml/tests/automl_tests/test_automl.py
#   evalml/tests/automl_tests/test_automl_search_classification.py
#   evalml/tests/automl_tests/test_engine_base.py
…ub.com/alteryx/evalml into Replace-AllowedPipelines-ComponentGraphs
Fixes #2159 and #2166
Here are the docs
Here's an example of the new implementation:
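A minimal sketch of the new usage, assuming the dict format settled on above (the demo dataset and component names are placeholders, not the PR's original example):

    from evalml.automl import AutoMLSearch
    from evalml.demos import load_breast_cancer

    X, y = load_breast_cancer()

    # allowed_component_graphs replaces allowed_pipelines: each key names a
    # graph, and each value is the component graph itself (here a linear one).
    automl = AutoMLSearch(
        X_train=X,
        y_train=y,
        problem_type="binary",
        allowed_component_graphs={
            "My Graph": ["Imputer", "Random Forest Classifier"],
        },
    )
    automl.search()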