-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move pipeline building into IterativeAlgorithm
#2854
Conversation
IterativeAlgorithm
Codecov Report
@@ Coverage Diff @@
## main #2854 +/- ##
=======================================
- Coverage 99.7% 99.7% -0.0%
=======================================
Files 302 302
Lines 28256 28296 +40
=======================================
+ Hits 28164 28200 +36
- Misses 92 96 +4
Continue to review full report at Codecov.
|
@@ -129,6 +193,88 @@ def __init__( | |||
" and Real!" | |||
) | |||
|
|||
def _create_pipelines(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is ripped out of AutoMLSearch
and the API changes in IterativeAlgorithm
accommodates this. Now DefaultAlgorithm
and IterativeAlgorithm
have more similar APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultAlgorithm doesn't have the same _create_pipelines
API, right? Or do you just mean because we moved the logic around so we have more similar dependencies / parameter expectations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the IterativeAlgorithm.__init__()
parameters!
raise ValueError("No allowed pipelines to search") | ||
|
||
if self.ensembling and len(self.allowed_pipelines) == 1: | ||
self.logger.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also opted to move all the logging in as well. From my understanding of the logger, there will be no change in output.
@@ -279,3 +481,27 @@ def _transform_parameters(self, pipeline, proposed_parameters): | |||
component_parameters[param_name] = value | |||
parameters[name] = component_parameters | |||
return parameters | |||
|
|||
def _catch_warnings(self, warning_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only used in pipeline building so I moved it in as well.
) | ||
self.logger.debug( | ||
f"allowed_model_families set to {self.allowed_model_families}" | ||
text_in_ensembling = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opted to leave this out as both IterativeAlgorithm
and DefaultAlgorithm
take in text_in_ensembling
as arguments.
@@ -193,6 +193,8 @@ def assert_allowed_pipelines_equal_helper(): | |||
def assert_allowed_pipelines_equal_helper( | |||
actual_allowed_pipelines, expected_allowed_pipelines | |||
): | |||
actual_allowed_pipelines.sort(key=lambda p: p.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pipelines used to be built in AutoMLSearch
, tests would compare against an unsorted list of pipelines. Changed due to self.allowed_pipelines
in IterativeAlgorithm
sorting due to _ESTIMATOR_FAMILY_ORDER
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do this, we have tests in place then to confirm that the order of allowed_pipelines as as expected? My concern here is that by sorting both lists, we're able to confirm that the types of pipelines match, but not the order in which they're executed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, but from what I could tell assert_allowed_pipelines_equal_helper
isn't used in any tests that are checking for order and we have tests in test_iterative_algorithm.py
like test_iterative_algorithm_first_batch_order_param
that do account for the case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: since most of the tests in test_automl
etc use the default iterative algorithm, the pipelines will be sorted in the order defined by _ESTIMATOR_FAMILY_ORDER
in iterative algorithm. But since these tests directly compare against make_pipeline
, the pipelines compared are in a different order. Another solution to this would be to sort the order into _ESTIMATOR_FAMILY_ORDER
but I opted for a simpler solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good--left some smaller comments to address before merging, but otherwise pretty excited by this cleanup and separation!
@@ -129,6 +193,88 @@ def __init__( | |||
" and Real!" | |||
) | |||
|
|||
def _create_pipelines(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultAlgorithm doesn't have the same _create_pipelines
API, right? Or do you just mean because we moved the logic around so we have more similar dependencies / parameter expectations?
@@ -193,6 +193,8 @@ def assert_allowed_pipelines_equal_helper(): | |||
def assert_allowed_pipelines_equal_helper( | |||
actual_allowed_pipelines, expected_allowed_pipelines | |||
): | |||
actual_allowed_pipelines.sort(key=lambda p: p.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do this, we have tests in place then to confirm that the order of allowed_pipelines as as expected? My concern here is that by sorting both lists, we're able to confirm that the types of pipelines match, but not the order in which they're executed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left two nits about docs
Fixes #2656. Mainly moving the pipeline building logic out of
AutoMLSearch
and into_create_pipelines()
inIterativeAlgorithm
. Will comment with my decisions on the PR but most testing changes were due to the API changes inIterativeAlgorithm
or how pipelines were passed intoIterativeAlgorithm
.One of the requirements of #2656 is:
Due to the length of this PR, I will make another issue for that specific requirement.