Move pipeline building into IterativeAlgorithm #2854

Merged
jeremyliweishih merged 33 commits into main from js_2656_pipeline_building on Oct 7, 2021

Collaborator

@jeremyliweishih jeremyliweishih commented Sep 28, 2021

Fixes #2656. This mainly moves the pipeline building logic out of AutoMLSearch and into _create_pipelines() in IterativeAlgorithm. I will comment on the PR with my decisions, but most testing changes were due to the API changes in IterativeAlgorithm or to how pipelines are passed into IterativeAlgorithm.

One of the requirements of #2656 is:

  • move algorithm specific tests out of test_automl.py, test_automl_search_classification.py and test_automl_search_regression.py by mocking out next_batch and algorithm specific methods

Due to the length of this PR, I will make another issue for that specific requirement.
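
For reference, the mocking approach named in that requirement could look roughly like the sketch below. It is only an illustration: ToyAlgorithm and ToySearch are hypothetical stand-ins, not evalml classes, and the only real API used is unittest.mock.patch.object from the standard library.

```python
from unittest.mock import patch


class ToyAlgorithm:
    """Hypothetical stand-in for an AutoML algorithm (not an evalml class)."""

    def next_batch(self):
        # In a real algorithm this would build and return pipelines.
        raise RuntimeError("expensive pipeline building")


class ToySearch:
    """Hypothetical stand-in for the search loop that consumes batches."""

    def __init__(self, algorithm):
        self.algorithm = algorithm

    def run_one_batch(self):
        return [f"evaluated {p}" for p in self.algorithm.next_batch()]


def test_search_uses_next_batch():
    # Replace next_batch so the search test no longer depends on
    # algorithm-specific behavior.
    with patch.object(
        ToyAlgorithm, "next_batch", return_value=["p1", "p2"]
    ) as mock_next:
        result = ToySearch(ToyAlgorithm()).run_one_batch()
    mock_next.assert_called_once()
    assert result == ["evaluated p1", "evaluated p2"]
```

With next_batch mocked, the search-level test only checks how batches are consumed, and algorithm-specific expectations can live in the algorithm's own test module.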

@jeremyliweishih jeremyliweishih changed the title Js 2656 pipeline building Move pipeline building into IterativeAlgorithm Sep 28, 2021

codecov bot commented Sep 28, 2021

Codecov Report

Merging #2854 (28ade4e) into main (322dcc0) will decrease coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2854     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        302     302             
  Lines      28256   28296     +40     
=======================================
+ Hits       28164   28200     +36     
- Misses        92      96      +4     
| Impacted Files | Coverage Δ |
|---|---|
| ...valml/automl/automl_algorithm/default_algorithm.py | 100.0% <ø> (ø) |
| evalml/automl/automl_algorithm/automl_algorithm.py | 100.0% <100.0%> (ø) |
| ...lml/automl/automl_algorithm/iterative_algorithm.py | 100.0% <100.0%> (ø) |
| evalml/automl/automl_search.py | 99.9% <100.0%> (-<0.1%) ⬇️ |
| ...ts/automl_tests/parallel_tests/test_automl_dask.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_automl.py | 99.5% <100.0%> (-<0.1%) ⬇️ |
| ...lml/tests/automl_tests/test_iterative_algorithm.py | 100.0% <100.0%> (ø) |
| evalml/tests/conftest.py | 98.3% <100.0%> (-0.3%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -129,6 +193,88 @@ def __init__(
" and Real!"
)

def _create_pipelines(self):
Collaborator Author

This logic is lifted out of AutoMLSearch, and the API changes in IterativeAlgorithm accommodate this. DefaultAlgorithm and IterativeAlgorithm now have more similar APIs.

Contributor

DefaultAlgorithm doesn't have the same _create_pipelines API, right? Or do you just mean because we moved the logic around so we have more similar dependencies / parameter expectations?

Collaborator Author

I meant the IterativeAlgorithm.__init__() parameters!

raise ValueError("No allowed pipelines to search")

if self.ensembling and len(self.allowed_pipelines) == 1:
self.logger.warning(
Collaborator Author

I opted to move all the logging in as well. From my understanding of the logger, the output will not change.
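
A minimal sketch of what "validation and logging living inside the algorithm" can look like is below. ToyIterativeAlgorithm is a hypothetical, simplified stand-in and not evalml's IterativeAlgorithm; the ValueError message is the one from the diff above, while the warning text and the choice to switch ensembling off are assumptions made for illustration.

```python
import logging


class ToyIterativeAlgorithm:
    """Hypothetical, simplified stand-in; not evalml's IterativeAlgorithm."""

    def __init__(self, allowed_pipelines, ensembling=False, logger=None):
        self.allowed_pipelines = list(allowed_pipelines)
        self.ensembling = ensembling
        self.logger = logger or logging.getLogger(__name__)
        self._create_pipelines()

    def _create_pipelines(self):
        # The algorithm now owns both the validation and the logging.
        if not self.allowed_pipelines:
            raise ValueError("No allowed pipelines to search")
        if self.ensembling and len(self.allowed_pipelines) == 1:
            # Warning text and the fallback below are assumptions.
            self.logger.warning(
                "Ensembling needs more than one pipeline; turning it off."
            )
            self.ensembling = False
```

Because the logger is an ordinary attribute of the algorithm, a caller can pass in the same logger object it used before, which is one way the log output could stay unchanged after such a move.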

@@ -279,3 +481,27 @@ def _transform_parameters(self, pipeline, proposed_parameters):
component_parameters[param_name] = value
parameters[name] = component_parameters
return parameters

def _catch_warnings(self, warning_list):
Collaborator Author

This was only used in pipeline building so I moved it in as well.

)
self.logger.debug(
f"allowed_model_families set to {self.allowed_model_families}"
text_in_ensembling = (
Collaborator Author

Opted to leave this out, as both IterativeAlgorithm and DefaultAlgorithm take in text_in_ensembling as an argument.

@@ -193,6 +193,8 @@ def assert_allowed_pipelines_equal_helper():
def assert_allowed_pipelines_equal_helper(
    actual_allowed_pipelines, expected_allowed_pipelines
):
    actual_allowed_pipelines.sort(key=lambda p: p.name)
Collaborator Author

Since pipelines used to be built in AutoMLSearch, tests would compare against an unsorted list of pipelines. This changed because IterativeAlgorithm now sorts self.allowed_pipelines according to _ESTIMATOR_FAMILY_ORDER.

Contributor

If we want to do this, do we have tests in place to confirm that the order of allowed_pipelines is as expected? My concern here is that by sorting both lists, we can confirm that the types of pipelines match, but no longer the order in which they're executed.

Collaborator Author

Yes, I agree, but from what I could tell, assert_allowed_pipelines_equal_helper isn't used in any tests that check for order, and we have tests in test_iterative_algorithm.py, like test_iterative_algorithm_first_batch_order_param, that do account for that case!

Collaborator Author

To clarify: since most of the tests in test_automl etc. use the default iterative algorithm, the pipelines will be sorted in the order defined by _ESTIMATOR_FAMILY_ORDER in the iterative algorithm. But since these tests compare directly against the output of make_pipeline, the pipelines being compared are in a different order. Another solution would be to sort the expected pipelines into _ESTIMATOR_FAMILY_ORDER, but I opted for the simpler one.
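
The trade-off discussed in this thread can be sketched as two separate checks. FakePipeline and both helper names are hypothetical and exist only for this illustration; the first helper mirrors the sort-by-name idea from the diff, and the second is the kind of assertion a dedicated ordering test would need.

```python
from dataclasses import dataclass


@dataclass
class FakePipeline:
    """Hypothetical stand-in for a pipeline; only `name` matters here."""

    name: str


def assert_same_pipelines_ignoring_order(actual, expected):
    # Sort copies by name so the caller's lists are left untouched.
    actual_sorted = sorted(actual, key=lambda p: p.name)
    expected_sorted = sorted(expected, key=lambda p: p.name)
    assert actual_sorted == expected_sorted


def assert_same_pipelines_in_order(actual, expected):
    # Stricter check: the sequence of names must match exactly.
    assert [p.name for p in actual] == [p.name for p in expected]
```

The order-insensitive helper confirms that the same pipelines are present, while execution order has to be covered by a test that uses the stricter form.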

Contributor

@angela97lin angela97lin left a comment

Looks good! I left some smaller comments to address before merging, but otherwise I'm pretty excited by this cleanup and separation!

Contributor

@bchen1116 bchen1116 left a comment

LGTM! Left two nits about docs

@jeremyliweishih jeremyliweishih merged commit 023babf into main Oct 7, 2021
@chukarsten chukarsten mentioned this pull request Oct 14, 2021
@freddyaboulton freddyaboulton deleted the js_2656_pipeline_building branch May 13, 2022 15:03
Development

Successfully merging this pull request may close these issues.

Move pipeline building logic into IterativeAlgorithm
4 participants