Skip to content

Component parameters: save all fields for builtin components #847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 12, 2020

Conversation

dsherry
Copy link
Contributor

@dsherry dsherry commented Jun 12, 2020

Fix #522 fix #355

Changes

  • Define a ComponentBase.parameters getter, to keep the saved parameters immutable after component init
  • Make the args in ComponentBase.__init__ optional with defaults: parameters (empty dict), component_obj (None) and random_state (0).
  • Update all our components to save all input values in parameters (except for random_state)
  • Add test coverage to enforce the last point for all components listed in all_components
  • Exclude ComponentBase, Transformer, Estimator and other transformer base classes from all_components list

@@ -19,7 +19,7 @@ def __init__(self, columns=None, random_state=0):
random_state=random_state)

def _check_input_for_columns(self, X):
cols = self.parameters["columns"] or []
cols = self.parameters.get("columns", [])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a style tweak, not important

OneHotEncoder,
PerColumnImputer,
RFClassifierSelectFromModel,
RFRegressorSelectFromModel,
SimpleImputer,
StandardScaler,
Transformer
Copy link
Contributor Author

@dsherry dsherry Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its good to remove these -- they shouldn't show up in all_components() because they're base classes and would never be used in a pipeline directly. Right now, any subclass of ComponentBase which is imported in this file will be included in all_components().

@codecov
Copy link

codecov bot commented Jun 12, 2020

Codecov Report

Merging #847 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #847   +/-   ##
=======================================
  Coverage   99.68%   99.68%           
=======================================
  Files         195      195           
  Lines        7699     7738   +39     
=======================================
+ Hits         7675     7714   +39     
  Misses         24       24           
Impacted Files Coverage Δ
...valml/pipelines/components/estimators/estimator.py 100.00% <ø> (ø)
...s/components/estimators/regressors/et_regressor.py 100.00% <ø> (ø)
...s/components/estimators/regressors/rf_regressor.py 100.00% <ø> (ø)
...l/pipelines/components/transformers/transformer.py 100.00% <ø> (ø)
evalml/tests/component_tests/test_en_classifier.py 100.00% <ø> (ø)
evalml/tests/component_tests/test_en_regressor.py 100.00% <ø> (ø)
evalml/tests/component_tests/test_et_classifier.py 100.00% <ø> (ø)
evalml/tests/component_tests/test_et_regressor.py 100.00% <ø> (ø)
...ification_pipeline_tests/test_en_classification.py 100.00% <ø> (ø)
...ification_pipeline_tests/test_et_classification.py 100.00% <ø> (ø)
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b54ced...ad7dae3. Read the comment docs.

@@ -47,7 +43,7 @@ def _components_dict():
components = dict()
for _, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass):
params = inspect.getargspec(obj.__init__)
if issubclass(obj, ComponentBase):
if issubclass(obj, ComponentBase) and obj is not ComponentBase:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This excludes ComponentBase from all_components()

assert defaults.pop(-1) == 0

expected_parameters = {arg: default for (arg, default) in zip(args, defaults)}
assert parameters == expected_parameters
Copy link
Contributor Author

@dsherry dsherry Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main test. It introspects on the __init__ of all components listed in all_components(), and expects they've saved all args in parameters except for random_state, which should be the last __init__ arg.

I'd love to find a way to get this code into a metaclass so it can be used to raise exceptions at class-definition-time when a user defines a custom component. I was working on that a couple months ago, but... it's hard!! 😂

component2 = component_class(**parameters)
parameters2 = component2.parameters

assert parameters == parameters2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test ensures that instantiating a component with no parameters and instantiating a component with default parameters produces the same parameters attached to the instance.

We need to add default_parameters, will file that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsherry dsherry marked this pull request as ready for review June 12, 2020 01:12
@dsherry dsherry mentioned this pull request Jun 12, 2020
"""A component that fits and predicts given data"""
"""A component that fits and predicts given data

To implement a new Transformer, define your own class which is a subclass of Transformer, including
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean Estimator here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thanks, good eye, will fix!

Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will be great to get this in for to assist #842 and #474.

cb_classifier = catboost.CatBoostClassifier(**parameters,
# catboost will choose an intelligent default for bootstrap_type, so only set if provided
cb_parameters = copy.copy(parameters)
if bootstrap_type is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this inversion!

@dsherry dsherry merged commit bcc9242 into master Jun 12, 2020
@dsherry dsherry deleted the ds_522_standardize_components branch June 12, 2020 14:34
@angela97lin angela97lin mentioned this pull request Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standardize component definitions Custom component support: components should not require component_obj
3 participants