
Update Pipeline parameters pt 2 #3427

Merged: 59 commits into main, Apr 5, 2022

Conversation

bchen1116 (Contributor) commented on Mar 30, 2022:

Fix #3414

Performance tests here:
report_param.html.zip

Running main locally:
[screenshot: performance results on main]

Running this branch locally:
[screenshot: performance results on this branch]

Performance is more or less the same for both the iterative and default algorithms (the iterative algorithm was only tested for one batch locally).

evalml/tuners/tuner.py (outdated; resolved)
freddyaboulton (Contributor) left a comment:

@bchen1116 Almost there! Let's make sure to run perf tests prior to merging so we can be extra confident as well.

evalml/tuners/tuner.py (outdated; resolved)
evalml/automl/automl_algorithm/default_algorithm.py (outdated; resolved)
@@ -326,8 +289,14 @@ def _create_n_pipelines(self, pipelines, n):
self._create_tuner(pipeline)

select_parameters = self._create_select_parameters()
proposed_parameters = self._tuners[pipeline.name].propose()
parameters = self._transform_parameters(pipeline, proposed_parameters)
parameters = (
bchen1116 (author) commented:

Fixed the parameters we use when this is called through _create_fast_final versus _create_n_pipelines with a larger number of pipelines.
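
For readers following along, here is a hedged reconstruction of the full expression. The diff above is truncated after `parameters = (`, so the `else` branch below is an assumption based on the removed `_transform_parameters` line, not the PR's exact code:

```python
# Assumed shape of the new assignment: the diff is truncated after
# "parameters = (", so the else branch is a guess based on the old code.
parameters = (
    self._tuners[pipeline.name].get_starting_parameters(
        self._hyperparameters, self.random_seed
    )
    if n == 1  # the fast-final batch starts from user-supplied hyperparameters
    else self._transform_parameters(pipeline, proposed_parameters)
)
```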

@@ -89,6 +89,36 @@ def _convert_to_pipeline_parameters(self, flat_parameters):
pipeline_parameters[component_name][parameter_name] = parameter_value
return pipeline_parameters

def get_starting_parameters(self, hyperparameter_ranges, random_seed=0):
bchen1116 (author) commented:

Added the hyperparameter_ranges and random_seed args to limit the starting parameters to only the custom hyperparameters that users pass in.
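
A minimal sketch of the idea, assuming skopt-style dimensions for the ranges; this is illustrative, not evalml's actual implementation (the real method presumably feeds the flat result through the _convert_to_pipeline_parameters helper shown in the diff above):

```python
# Illustrative sketch only, not evalml's implementation. Assumes
# hyperparameter_ranges maps flat parameter names either to skopt dimensions
# (Real, Integer, Categorical) or to fixed values supplied by the user.
from skopt.space import Categorical, Integer, Real, Space


def get_starting_parameters(hyperparameter_ranges, random_seed=0):
    """Sample one starting value from each user-supplied range."""
    names, dims, fixed = [], [], {}
    for name, dim in hyperparameter_ranges.items():
        if isinstance(dim, (Real, Integer, Categorical)):
            names.append(name)
            dims.append(dim)
        else:
            fixed[name] = dim  # user passed a fixed value, not a range
    sampled = Space(dims).rvs(random_state=random_seed)[0] if dims else []
    # Flat dict of starting values; the real method would convert this into
    # nested pipeline parameters via _convert_to_pipeline_parameters.
    return {**fixed, **dict(zip(names, sampled))}
```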

freddyaboulton (Contributor) left a comment:

@bchen1116 Looks good to me. Thank you for sticking with this! The perf tests look good and I was able to verify the previous bug in how the default algorithm created the starting batch is no longer there.

I feel confident about the merge but I left a suggestion for a unit test that I think will nail it home for us now and in the future.

Would be good to get two approvals on this one since it's a big refactor. Maybe wait for @jeremyliweishih !

docs/source/release_notes.rst (outdated; resolved)
self._tuners[pipeline.name].get_starting_parameters(
    self._hyperparameters, self.random_seed
)
if n == 1
Reviewer (Contributor) commented:

I think we want _batch_number <= 2 here, right? This works right now because we only call _create_n_pipelines with n=1 in the _create_fast_final batch, but it would not work if the code were refactored.

jeremyliweishih (Collaborator) commented:

A good alternative here would be to refactor this logic to be gated by a flag instead of basing it on the number of pipelines generated.

bchen1116 (author) commented:

@jeremyliweishih great suggestion! I'll change that.
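
A hypothetical sketch of the flag-based approach suggested above; the flag name `is_first_batch` and the surrounding structure are illustrative, not the merged code:

```python
# Hypothetical sketch of the suggested refactor: gate the starting-parameter
# logic behind an explicit flag instead of inferring it from n.
def _create_n_pipelines(self, pipelines, n, is_first_batch=False):
    next_batch = []
    for pipeline in pipelines:
        if pipeline.name not in self._tuners:
            self._create_tuner(pipeline)
        if is_first_batch:
            # First batch: start from the user's custom hyperparameters.
            parameters = self._tuners[pipeline.name].get_starting_parameters(
                self._hyperparameters, self.random_seed
            )
        else:
            # Later batches: let the tuner propose and transform new values.
            proposed = self._tuners[pipeline.name].propose()
            parameters = self._transform_parameters(pipeline, proposed)
        next_batch.append(
            pipeline.new(parameters=parameters, random_seed=self.random_seed)
        )
    return next_batch
```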

)
- automl.automl_algorithm.allowed_pipelines = pipelines
+ automl.automl_algorithm._set_allowed_pipelines(pipelines)
Reviewer (Contributor) commented:

Is the tuner not created otherwise? Wondering why we're making this change.

bchen1116 (author) commented:

Yep! With the changes that we made to iterative, just setting allowed_pipelines doesn't automatically create tuners for them.
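
A minimal sketch of what that setter plausibly does, inferred from this exchange (an assumption, not the merged implementation):

```python
# Assumed behavior of _set_allowed_pipelines, inferred from this thread:
# assigning allowed_pipelines alone no longer creates tuners, so the
# setter does both. Sketch only, not the merged code.
def _set_allowed_pipelines(self, pipelines):
    self.allowed_pipelines = pipelines
    for pipeline in pipelines:
        self._create_tuner(pipeline)  # one tuner per pipeline, keyed by name
```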

if "n_jobs" in init_params:
component_parameters["n_jobs"] = self.n_jobs
try:
if "number_features" in init_params:
Reviewer (Contributor) commented:

Will file an issue to delete this. I don't know what it's used for or why it's only in the iterative algorithm.

],
)
@pytest.mark.parametrize("problem_type", ["binary", "time series binary"])
def test_search_parameters_held_automl(
freddyaboulton (Contributor) commented:

Thoughts on adding a test to check whether the estimator parameters used match the default values? I think we can use inspect to make sure the test does not get stale. We chose the default values precisely because they're fast and get good enough performance, so it would be good to add that coverage. And I think that unit test would have caught the original problem with the first version of this PR, right?

Reviewer (Collaborator) commented:

I like this idea!

bchen1116 (author) commented:

yep, can do!
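
A hypothetical sketch of the suggested test; the fixture name and pipeline access pattern are illustrative, not the test that was actually added:

```python
# Hypothetical sketch of the suggested coverage: read each estimator's
# declared defaults via inspect so the test can't go stale if defaults
# change. The fixture `first_batch_pipelines` is illustrative.
import inspect


def _declared_defaults(estimator_class):
    """Collect default __init__ argument values from the class signature."""
    signature = inspect.signature(estimator_class.__init__)
    return {
        name: param.default
        for name, param in signature.parameters.items()
        if param.default is not inspect.Parameter.empty
    }


def test_first_batch_uses_estimator_defaults(first_batch_pipelines):
    for pipeline in first_batch_pipelines:
        estimator = pipeline.estimator
        defaults = _declared_defaults(type(estimator))
        actual = pipeline.parameters.get(estimator.name, {})
        for name, default in defaults.items():
            # Parameters omitted from the pipeline config fall back to defaults.
            assert actual.get(name, default) == default
```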

jeremyliweishih (Collaborator) left a comment:

LGTM! Should be good to go once Freddy's comments are addressed. Great work @bchen1116!

jeremyliweishih (Collaborator) left a comment:

Final revisions are great. Thanks!

bchen1116 enabled auto-merge (squash) on April 5, 2022 at 15:30.
bchen1116 merged commit 1f1069b into main on Apr 5, 2022.
chukarsten mentioned this pull request on Apr 12, 2022.

Successfully merging this pull request may close these issues.

Fix performance bug in search_parameters PR
3 participants