-
Notifications
You must be signed in to change notification settings - Fork 92
Move pipeline building into IterativeAlgorithm
#2854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
IterativeAlgorithm
Codecov Report
@@ Coverage Diff @@
## main #2854 +/- ##
=======================================
- Coverage 99.7% 99.7% -0.0%
=======================================
Files 302 302
Lines 28256 28296 +40
=======================================
+ Hits 28164 28200 +36
- Misses 92 96 +4
Continue to review full report at Codecov.
|
| " and Real!" | ||
| ) | ||
|
|
||
| def _create_pipelines(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is ripped out of AutoMLSearch and the API changes in IterativeAlgorithm accommodates this. Now DefaultAlgorithm and IterativeAlgorithm have more similar APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultAlgorithm doesn't have the same _create_pipelines API, right? Or do you just mean because we moved the logic around so we have more similar dependencies / parameter expectations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the IterativeAlgorithm.__init__() parameters!
| raise ValueError("No allowed pipelines to search") | ||
|
|
||
| if self.ensembling and len(self.allowed_pipelines) == 1: | ||
| self.logger.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also opted to move all the logging in as well. From my understanding of the logger, there will be no change in output.
| parameters[name] = component_parameters | ||
| return parameters | ||
|
|
||
| def _catch_warnings(self, warning_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only used in pipeline building so I moved it in as well.
| ) | ||
| self.logger.debug( | ||
| f"allowed_model_families set to {self.allowed_model_families}" | ||
| text_in_ensembling = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opted to leave this out as both IterativeAlgorithm and DefaultAlgorithm take in text_in_ensembling as arguments.
| def assert_allowed_pipelines_equal_helper( | ||
| actual_allowed_pipelines, expected_allowed_pipelines | ||
| ): | ||
| actual_allowed_pipelines.sort(key=lambda p: p.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pipelines used to be built in AutoMLSearch, tests would compare against an unsorted list of pipelines. Changed due to self.allowed_pipelines in IterativeAlgorithm sorting due to _ESTIMATOR_FAMILY_ORDER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do this, we have tests in place then to confirm that the order of allowed_pipelines as as expected? My concern here is that by sorting both lists, we're able to confirm that the types of pipelines match, but not the order in which they're executed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, but from what I could tell assert_allowed_pipelines_equal_helper isn't used in any tests that are checking for order and we have tests in test_iterative_algorithm.py like test_iterative_algorithm_first_batch_order_param that do account for the case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: since most of the tests in test_automl etc use the default iterative algorithm, the pipelines will be sorted in the order defined by _ESTIMATOR_FAMILY_ORDER in iterative algorithm. But since these tests directly compare against make_pipeline, the pipelines compared are in a different order. Another solution to this would be to sort the order into _ESTIMATOR_FAMILY_ORDER but I opted for a simpler solution.
angela97lin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good--left some smaller comments to address before merging, but otherwise pretty excited by this cleanup and separation!
| " and Real!" | ||
| ) | ||
|
|
||
| def _create_pipelines(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultAlgorithm doesn't have the same _create_pipelines API, right? Or do you just mean because we moved the logic around so we have more similar dependencies / parameter expectations?
| def assert_allowed_pipelines_equal_helper( | ||
| actual_allowed_pipelines, expected_allowed_pipelines | ||
| ): | ||
| actual_allowed_pipelines.sort(key=lambda p: p.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do this, we have tests in place then to confirm that the order of allowed_pipelines as as expected? My concern here is that by sorting both lists, we're able to confirm that the types of pipelines match, but not the order in which they're executed anymore.
bchen1116
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left two nits about docs
Fixes #2656. Mainly moving the pipeline building logic out of
AutoMLSearchand into_create_pipelines()inIterativeAlgorithm. Will comment with my decisions on the PR but most testing changes were due to the API changes inIterativeAlgorithmor how pipelines were passed intoIterativeAlgorithm.One of the requirements of #2656 is:
Due to the length of this PR, I will make another issue for that specific requirement.