Update tuner API to work with pipeline parameter dicts #779

Merged: 21 commits into master from ds_754_tuner_refactor on May 20, 2020

Conversation

@dsherry dsherry commented May 15, 2020

Changes

  • Update Tuner init to accept a pipeline class, and propose/add to use pipeline parameter dicts
  • Update pipeline hyperparameters and custom hyperparameters to use the same structure as pipeline parameters: register each component's parameters under the component name
  • Fix #450 ("Tuner doesn't support non-unique parameter names (across all components in a pipeline)"): previously, when two components had a parameter with the same name, those parameters weren't tuned individually

Description
Before this PR, the Tuner classes worked with flat parameter lists. They accepted a "space" description as input, which was a dict with parameter names as the keys and ranges as the values. Tuner.propose and Tuner.add worked with flat lists of parameters:

[
    ('param a', 2.71828),
    ('param b', 'value'),
    ('param c', 1234)
]

The problem with this is that in order to use those parameters in pipelines, we had to convert from the flat format to the format which pipelines expect, where each component's parameters are in a separate dict:

{
    'Component 1': {
        'param a': 2.71828,
        'param b': 'value'
    },
    'Component 2': {
        'param c': 1234
    },
    ...
}

This PR updates Tuners to work with the pipeline parameter format instead of the flat list format, meaning Tuner.propose output can be converted directly into a pipeline, and a set of pipeline parameters can be registered directly in Tuner.add.
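For illustration, here's a minimal sketch of the new round-trip (based on the GridSearchTuner doctest discussed later in this thread; MyPipeline and the score value are hypothetical):

from evalml.tuners import GridSearchTuner

# Hyperparameter ranges use the pipeline parameter format: keyed by component name
ranges = {'My Component': {'param a': [0.0, 10.0], 'param b': ['a', 'b', 'c']}}
tuner = GridSearchTuner(ranges, n_points=5)

proposal = tuner.propose()
# e.g. {'My Component': {'param a': 0.0, 'param b': 'a'}}, already in pipeline format,
# so it can be passed straight to a pipeline:
# pipeline = MyPipeline(parameters=proposal)

# ...and after evaluation, the parameters are registered back along with their score:
tuner.add(proposal, 0.98)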

Why do this now
This is necessary for the automl algorithm work in #754. Without it, we'd need some ugly, buggy code to convert between the old flat Tuner format and the pipeline dict format. Writing that conversion code appeared comparable in effort to the work in this PR, so I went with the more sustainable option :) It's also a good clarifying change to make: our Tuner API is specific to pipelines, and this makes that quite clear.

Future Work
This PR doesn't address the case where we have two components with the same name. In the short term, this isn't a pressing problem, because we don't yet have pipelines with multiple instances of the same component. In the long term, we should update this when we upgrade pipelines to use graphs of components (#273).

Also, if we used OrderedDict for pipeline parameters, that would simplify the tuner code, simplify some test cases, and potentially be easier to understand from a user perspective. TODO: file that as a follow-on issue.
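For illustration, a rough sketch of the OrderedDict idea (hypothetical; the follow-on issue would flesh this out):

from collections import OrderedDict

# Hypothetical: pipeline parameters keyed by component, in pipeline order
pipeline_parameters = OrderedDict([
    ('Component 1', {'param a': 2.71828, 'param b': 'value'}),
    ('Component 2', {'param c': 1234}),
])

# Because iteration order is stable, flattening and un-flattening stay aligned
# without extra bookkeeping in the tuner
flat = [(component, name, value)
        for component, params in pipeline_parameters.items()
        for name, value in params.items()]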

@dsherry dsherry added the enhancement An improvement to an existing feature. label May 15, 2020
codecov bot commented May 15, 2020

Codecov Report

Merging #779 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #779      +/-   ##
==========================================
+ Coverage   99.38%   99.40%   +0.02%     
==========================================
  Files         151      150       -1     
  Lines        5545     5567      +22     
==========================================
+ Hits         5511     5534      +23     
+ Misses         34       33       -1     
Impacted Files Coverage Δ
evalml/pipelines/classification/catboost_binary.py 100.00% <ø> (ø)
...ml/pipelines/classification/catboost_multiclass.py 100.00% <ø> (ø)
evalml/pipelines/regression/catboost.py 100.00% <ø> (ø)
evalml/automl/auto_search_base.py 97.77% <100.00%> (+0.23%) ⬆️
evalml/pipelines/pipeline_base.py 100.00% <100.00%> (ø)
...ts/automl_tests/test_auto_classification_search.py 100.00% <100.00%> (ø)
.../tests/automl_tests/test_auto_regression_search.py 100.00% <100.00%> (ø)
evalml/tests/automl_tests/test_autobase.py 100.00% <100.00%> (ø)
evalml/tests/conftest.py 100.00% <100.00%> (ø)
...ation_pipeline_tests/test_binary_classification.py 100.00% <100.00%> (ø)
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@dsherry dsherry marked this pull request as ready for review May 18, 2020 15:32
@dsherry dsherry requested a review from kmax12 May 18, 2020 15:33
@kmax12 kmax12 (Contributor) left a comment

I like how this API cleans up AutoSearchBase!


class Tuner(ABC):
    """Defines API for Tuners

    Tuners implement different strategies for sampling from a search space. They're used in EvalML to search the space of pipeline hyperparameters.
    """

    @abstractmethod
-    def __init__(self, space, random_state=0):
+    def __init__(self, pipeline_class, random_state=0):
Contributor:

what if the Tuner init took in the hyperparameter range dictionary rather than the pipeline object? that has the advantage of making the Tuner API more general/reusable without sacrificing usability.

I think it would also make it easier to test.

Contributor Author:

I like it. I'll make it happen.
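For reference, a minimal sketch of what the agreed-on change might look like (the parameter name pipeline_hyperparameter_ranges is illustrative, not necessarily the final one):

from abc import ABC, abstractmethod

class Tuner(ABC):
    """Defines API for Tuners, which sample from a search space of pipeline hyperparameters."""

    @abstractmethod
    def __init__(self, pipeline_hyperparameter_ranges, random_state=0):
        # pipeline_hyperparameter_ranges: {component name: {parameter name: range}},
        # e.g. {'My Component': {'param a': [0.0, 10.0], 'param b': ['a', 'b', 'c']}}
        ...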

hyperparameter_ranges.update(cls.custom_hyperparameters)
component_hyperparameters = copy.copy(component.hyperparameter_ranges)
if cls.custom_hyperparameters and component.name in cls.custom_hyperparameters:
    component_hyperparameters.update(cls.custom_hyperparameters.get(component.name, {}))
Contributor:

I think .get(component.name, {}) is redundant with the component.name in cls.custom_hyperparameters check in the if statement on the line above
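A minimal sketch of the suggested simplification (names adapted from the diff above; wrapped in a hypothetical helper so it stands alone):

import copy

def merged_hyperparameters(component, custom_hyperparameters):
    merged = copy.copy(component.hyperparameter_ranges)
    if custom_hyperparameters and component.name in custom_hyperparameters:
        # The membership check already guarantees the key exists, so plain
        # indexing works and the .get(..., {}) fallback is unnecessary
        merged.update(custom_hyperparameters[component.name])
    return merged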

    return [Integer(0, 10), Real(0, 10), ['option_a', 'option_b', 'option_c'], ['option_a', 'option_b', 100, np.inf]]

def dummy_component_hyperparameters():
    return {
        'column a': Integer(0, 10),
Contributor:

minor, but "column" seems a bit confusing here. perhaps it should be "hyperparameter a"?

Contributor Author:

Yeah, good point

    estimator = dummy_estimator()
    component_graph = [estimator]

def dummy_binary_pipeline(dummy_binary_pipeline_class):
    MockBinaryClassificationPipeline = dummy_binary_pipeline_class(_estimator_hyperparameter_ranges=None)
Contributor:

is _estimator_hyperparameter_ranges=None used at all? also, I wonder: if we make the tuner API accept the hyperparameter range dictionary instead, we don't need to worry about all these mocking changes.

Contributor Author:

Yep, true, that's convincing. And yeah, I don't think I had written a test yet which used it.

def _convert_to_pipeline_parameters(self, flat_parameters):
    """Convert from a flat list of values to a dict of pipeline parameters"""
    pipeline_parameters = {component_name: dict() for component_name in self._component_names}
    for i, parameter_value in enumerate(flat_parameters):
Contributor:

i think you could do for flat_parameter_name, parameter_value in zip(self._search_space_names, flat_parameters): here. I find that easier to read, and zip is a commonly used built-in python method.
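A sketch of that rewrite (assuming, as a hypothetical, that each entry of self._search_space_names is a (component name, parameter name) pair aligned with flat_parameters):

def _convert_to_pipeline_parameters(self, flat_parameters):
    """Convert from a flat list of values to a dict of pipeline parameters"""
    pipeline_parameters = {name: dict() for name in self._component_names}
    for flat_parameter_name, parameter_value in zip(self._search_space_names, flat_parameters):
        component_name, parameter_name = flat_parameter_name
        pipeline_parameters[component_name][parameter_name] = parameter_value
    return pipeline_parameters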

self._search_space_names = []
self._search_space_ranges = []
hyperparameter_ranges = pipeline_class.hyperparameters
for component_name, component_ranges in hyperparameter_ranges.items():
Contributor:

I wonder if the parameter transformation could be done more cleanly if you converted the dictionary into an OrderedDict; then throughout this class you could assume things were iterated over in the same order each time.

I've been looking at this and I haven't been able to decide conclusively, but figured I'd share in case it sparks any ideas.

Contributor Author:

I agree the implementation of the transformation could be simplified. Right now, I'm more interested in getting the interface solid. I'll take another peek at it though before we merge this.

Contributor Author (@dsherry, May 20, 2020):

And yeah, I think using OrderedDict would be helpful. It would simplify things. I'm just wary of making too many changes in what is already a big PR.

I'm going to file a follow-on ticket for it.

@dsherry dsherry added this to the May 2020 milestone May 19, 2020
@dsherry dsherry requested review from kmax12, jeremyliweishih and angela97lin and removed request for angela97lin and jeremyliweishih May 20, 2020 13:01
@dsherry dsherry (Contributor Author) commented May 20, 2020

@kmax12 @angela97lin @jeremyliweishih I've addressed Max's comments and I feel this is ready to merge. Please get me a re-review!

@@ -43,6 +43,8 @@ def test_init(X_y):

    assert isinstance(automl.best_pipeline, PipelineBase)
    assert isinstance(automl.get_pipeline(0), PipelineBase)
+    with pytest.raises(RuntimeError, match="Pipeline not found"):
Contributor:

Is this more codecov testing 😅

Contributor Author:

Yep 😂

@angela97lin angela97lin (Contributor) left a comment

Just took a brief glance and left a few comments/questions but this is really neat stuff!

if 'number_features' in inspect.signature(component_class.__init__).parameters:
    component_parameters['number_features'] = number_features

# Inspects each component and checks the parameters list for the right parameters
Contributor:

cool that we can clean this up! neat :D

  proposed_params = tuner.propose()
- assert_params_almost_equal(proposed_params, [5.928446182250184])
+ assert proposed_params == {'Mock Classifier': {'param a': 0}}
Contributor:

Hmmm maybe I haven't read far enough yet but I get that the test is changing because of our format, but why is the value proposed changing too? :o

Contributor Author:

The previous value was incorrect! This is a grid search, and I'd expect a grid search to start at one side of the space and work its way to the other side, and "0" here is all the way at one side, haha. I'm not 100% sure why this was incorrect before, would have to do some detective work.
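For intuition, an illustration (not code from the PR) of why the first grid-search proposal lands at an edge of the space:

import numpy as np

# A uniform grid over [0, 10] with 5 points includes both endpoints:
grid = np.linspace(0, 10, 5)  # array([ 0. ,  2.5,  5. ,  7.5, 10. ])
# Walking the grid in order, the first proposal is the edge value 0,
# consistent with the updated assertion {'Mock Classifier': {'param a': 0}}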

Contributor Author:

Also, yeah, good eye, this is confusing :)

with pytest.raises(TypeError, match=type_error_text):
    GridSearchTuner(((0, 1)))
Contributor:

Is there a reason why we removed this test :o

Contributor Author:

It's a valid range! I changed the way the search space is constructed and validated in the tuners, and I believe that made this a valid input. I think this is the key change... but again, I'd have to go back and do some detective work in order to explain why this didn't work in the old implementation. I think it deferred to skopt. 🤷‍♂️ Good eye though. I'll circle back on this.
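One supporting Python fact (not stated in the thread): the doubled parentheses in GridSearchTuner(((0, 1))) don't create a nested tuple, so the argument is simply the single range (0, 1):

>>> ((0, 1)) == (0, 1)
True
>>> type(((0, 1)))
<class 'tuple'>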

- parameters: Hyperparameters used
- score: Associated score
+ pipeline_parameters (dict): a dict of the parameters used to evaluate a pipeline
+ score (float): the score obtained by evaluating the pipeline with the provided parameters
Contributor:

👍

@kmax12 kmax12 (Contributor) left a comment

looks good. just a few minor comments

    estimator = dummy_estimator()
    component_graph = [estimator]
    return MockBinaryClassificationPipeline(parameters={})
    estimator = MockEstimator
Contributor (@kmax12, May 20, 2020):

i think it may be incorrect to set this here. we do this automatically on init.

Contributor Author:

Ah, this may be left over from a while back. I'll make a note to circle back to it. I don't think it's breaking any tests / it was in place before this PR.

- (1.0, 'B')
- >>> print(tuner.propose())
- (3.25, 'A')
+ >>> tuner = GridSearchTuner({'My Component': {'param a': [0.0, 10.0], 'param b': ['a', 'b', 'c']}}, n_points=5)
Contributor:

why did you switch to the asserts here?

if we write it like

>>> proposal.keys()
{'My Component'}

>>> proposal['My Component']
{'param a': 0.0, 'param b': 'a'}

the tests should run it and validate everything worked correctly.

Contributor Author:

Because the dicts aren't ordered. The previous output was a list, which was ordered. Here's a stackoverflow about it.

Contributor (@kmax12, May 20, 2020):

ah, that being said, it should work for proposal.keys(). all good though

Contributor Author:

Oh, I understand what you mean. Simply because we only have one component. Sure.

Contributor:

it'd work because it's a set

Contributor:

i think...

Contributor Author:

But it's not an ordered set! So if it has more than one element, it'll print in a different order depending on the RNG. I ran into this issue during testing, before we simplified the Tuner interface to accept hyperparams instead of a pipeline class.

Contributor:

isn't an ordered set just a list? i see, so it compares the string representation.
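The underlying point, restated: doctests compare printed output as text, so any value whose print order can vary makes the example flaky, while an equality assert is order-independent:

>>> proposal = {'param b': 'a', 'param a': 0.0}
>>> proposal == {'param a': 0.0, 'param b': 'a'}  # order-independent comparison
True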

@dsherry dsherry merged commit 5a5270d into master May 20, 2020
@dsherry dsherry deleted the ds_754_tuner_refactor branch May 20, 2020 21:50