Remove pipeline_parameters and custom_hyperparameters and replace with search_parameters #3373

Merged: bchen1116 merged 42 commits into main from bc_search_parameters on Mar 24, 2022

Conversation

@bchen1116
Contributor

fix #3153 and fix #3150

Design doc in confluence

@bchen1116 bchen1116 self-assigned this Mar 14, 2022
@codecov

codecov bot commented Mar 14, 2022

Codecov Report

Merging #3373 (2847634) into main (da8f266) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3373     +/-   ##
=======================================
- Coverage   99.7%   99.6%   -0.0%     
=======================================
  Files        329     329             
  Lines      32405   32380     -25     
=======================================
- Hits       32276   32249     -27     
- Misses       129     131      +2     
Impacted Files Coverage Δ
...sts/test_automl_search_classification_iterative.py 100.0% <ø> (ø)
evalml/automl/automl_algorithm/automl_algorithm.py 100.0% <100.0%> (ø)
...valml/automl/automl_algorithm/default_algorithm.py 100.0% <100.0%> (ø)
...lml/automl/automl_algorithm/iterative_algorithm.py 97.4% <100.0%> (-1.0%) ⬇️
evalml/automl/automl_search.py 99.6% <100.0%> (-0.1%) ⬇️
...ts/automl_tests/parallel_tests/test_automl_dask.py 96.3% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.5% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl_algorithm.py 98.6% <100.0%> (+0.5%) ⬆️
...ts/automl_tests/test_automl_iterative_algorithm.py 100.0% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py 96.5% <100.0%> (ø)
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bchen1116 bchen1116 marked this pull request as ready for review March 14, 2022 20:52
@bchen1116 bchen1116 requested a review from a team March 14, 2022 20:52
Contributor

@eccabay eccabay left a comment

Just have some nitpicky doc comments for now, I'll come back and do a full review later!

for (
name,
component_instance,
) in pipeline.component_graph.component_instances.items():
Contributor

@bchen1116 This code block is doing two things:

  1. Getting random values from the skopt spaces so that the parameters used in the first batch are in the space the tuner is tuning over
  2. Making sure the _pipeline_parameters are correctly added to the parameters so that Drop Columns etc. get the right parameters

I think this would be simpler if step 1 were a tuner method, like get_starting_parameters?
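
The two responsibilities described in this comment could be separated roughly as follows. This is only a minimal sketch under stated assumptions, not the code in the PR: `IntegerRange`, `CategoricalRange`, `SketchTuner`, and `build_first_batch_parameters` are hypothetical stand-ins (the PR works with skopt spaces), the signature of `get_starting_parameters` is assumed, and the component names are illustrative.

```python
import random


# Hypothetical stand-ins for search-space ranges; the PR itself uses skopt spaces.
class IntegerRange:
    def __init__(self, low, high):
        self.low, self.high = low, high

    def sample(self, rng):
        # random.Random.randint is inclusive on both ends.
        return rng.randint(self.low, self.high)


class CategoricalRange:
    def __init__(self, choices):
        self.choices = list(choices)

    def sample(self, rng):
        return rng.choice(self.choices)


class SketchTuner:
    """Toy tuner that owns the search space and proposes first-batch values."""

    def __init__(self, search_space, random_seed=0):
        # search_space: {component_name: {parameter_name: range object}}
        self._search_space = search_space
        self._rng = random.Random(random_seed)

    def get_starting_parameters(self):
        # Step 1: draw one value per tunable parameter, so the first batch
        # starts inside the space the tuner will later explore.
        return {
            component: {name: space.sample(self._rng) for name, space in params.items()}
            for component, params in self._search_space.items()
        }


def build_first_batch_parameters(tuner, pipeline_parameters):
    # Step 2: layer fixed, user-supplied values on top of the sampled ones,
    # so components such as a column dropper receive exactly what was passed.
    parameters = tuner.get_starting_parameters()
    for component, fixed in pipeline_parameters.items():
        parameters.setdefault(component, {}).update(fixed)
    return parameters


tuner = SketchTuner(
    {
        "Random Forest Classifier": {
            "n_estimators": IntegerRange(10, 100),
            "max_features": CategoricalRange(["sqrt", "log2"]),
        }
    }
)
fixed = {"Drop Columns Transformer": {"columns": ["id"]}}
parameters = build_first_batch_parameters(tuner, fixed)
```

With this split, the calling code only has to merge fixed values, and the sampling logic lives next to the search space it depends on.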

Contributor

@freddyaboulton freddyaboulton left a comment

@bchen1116 Thank you for your work on this! I left some suggestions for testing improvements. This is looking pretty good though.

)
if self._sampler_name not in parameters and self._sampler_name is not None:
parameters[self._sampler_name] = {
if (
Contributor

Can this be moved to the AutoMLAlgorithm? It's kind of awkward that these parameters are set in AutoMLSearch while the rest are set in the AutoMLAlgorithm.

Contributor Author

@bchen1116 bchen1116 Mar 22, 2022

@freddyaboulton I think this would be a weird move. We use a lot of information that isn't passed to the AutoMLAlgorithm to determine whether we use a sampler and which sampler to use. We would need to pass all of this relevant data to the AutoMLAlgorithm in order to move this logic, and I'm not sure that's worth it.

Collaborator

I'm on the fence on this:
On one side, I made the decision to move pipeline building into the algorithms and this certainly falls under that category. On the other side, I do understand @bchen1116's concern about bloat in AutoMLAlgorithm. @bchen1116 can you file an issue and use this discussion as context for it?

Contributor

Yeah, if the long-term plan is to move pipeline-building logic to the algorithms, then I think the logic for determining whether or not to add a sampler should move to the algorithms. I think there are some unused parameters in the automl algos right now that can be cleaned up too, e.g. number_features. We can do that in a separate issue.

Contributor Author

Filed issue here

Collaborator

thank you!

assert aml._tuners.keys() == aml_add_pipelines._tuners.keys()
assert aml._tuner_class == aml_add_pipelines._tuner_class
aml.next_batch()
aml._transform_parameters(None, None)
Contributor

What does this line do?

Contributor Author

Codecov would raise errors if I didn't have calls to the next_batch and _transform_parameters methods. These calls are only there to satisfy that check.
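
One plausible reading of this reply is that stub overrides defined inside a test file count as missed lines unless something executes them. A minimal sketch of that situation follows; `SketchAlgorithm` and `DummyAlgorithm` are hypothetical names and are not the classes in the PR.

```python
from abc import ABC, abstractmethod


class SketchAlgorithm(ABC):
    """Hypothetical abstract base, standing in for an AutoML algorithm class."""

    @abstractmethod
    def next_batch(self):
        """Return the next list of pipelines to evaluate."""

    @abstractmethod
    def _transform_parameters(self, pipeline_class, proposed_parameters):
        """Turn proposed parameters into pipeline parameters."""


class DummyAlgorithm(SketchAlgorithm):
    # Stub overrides that exist only so the class can be instantiated in a test.
    def next_batch(self):
        return []

    def _transform_parameters(self, pipeline_class, proposed_parameters):
        return {}


def test_dummy_algorithm_stubs_are_exercised():
    algo = DummyAlgorithm()
    # Without these calls the stub bodies above never run, and a line-coverage
    # tool reports them as missed lines in the test module.
    assert algo.next_batch() == []
    assert algo._transform_parameters(None, None) == {}


test_dummy_algorithm_stubs_are_exercised()
```

Asserting on the return values, rather than calling the stubs bare, makes the intent of the lines clearer to the next reader.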

Collaborator

@jeremyliweishih jeremyliweishih left a comment

Really great work @bchen1116, a big value add in cleaning up the internal API as well as the external parameters API. Appreciate the cleanup in DefaultAlgo as well! Just left some general comments.

@bchen1116 bchen1116 merged commit b442453 into main Mar 24, 2022
@chukarsten chukarsten mentioned this pull request Mar 25, 2022
chukarsten added a commit that referenced this pull request Mar 28, 2022
… replace with `search_parameters` (#3373)"

This reverts commit b442453.
freddyaboulton pushed a commit that referenced this pull request Mar 28, 2022
… replace with `search_parameters`" (#3410)

* Revert "Remove `pipeline_parameters` and `custom_hyperparameters` and replace with `search_parameters` (#3373)"

This reverts commit b442453.

* Release notes.
@chukarsten chukarsten mentioned this pull request Apr 12, 2022