Have component_graph support ComponentBase subclasses instead of instances #850

Merged
merged 4 commits into master from ds_update_handle_components_class
Jun 12, 2020

@dsherry dsherry commented Jun 12, 2020

Problem
Currently, a pipeline's component_graph attribute accepts a list of str and ComponentBase subclass instances, and the helper handle_component standardizes each entry to an instance. However, component_graph is a static (class-level) attribute, so it doesn't make sense to store component instances in it.

class CustomPipeline(BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'One Hot Encoder', StandardScaler(), 'Logistic Regression Classifier']

assert isinstance(CustomPipeline.component_graph[2], ComponentBase)

This isn't a bug at the moment because we never access the instances directly; we just grab each instance's class and use that to create the actual instance during pipeline instantiation. However, it makes our core pipeline code more confusing and could lead to bugs down the road.
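As one illustration of the kind of bug this could invite, here is a minimal sketch of what would happen if some future code did use the stored instances. The classes below are simplified stand-ins written for this example, not evalml's real ComponentBase or StandardScaler, and the `parameters` dict is an assumption.

```python
class ComponentBase:
    """Stand-in for evalml's ComponentBase (hypothetical, simplified)."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)


class StandardScaler(ComponentBase):
    """Stand-in component; not the real evalml class."""


class InstanceGraphPipeline:
    # An instance stored on the class is created once, at class-definition time.
    component_graph = [StandardScaler()]


class ClassGraphPipeline:
    # A class stored on the class carries no per-object state.
    component_graph = [StandardScaler]


a, b = InstanceGraphPipeline(), InstanceGraphPipeline()

# Both pipelines see the very same component object.
shared = a.component_graph[0] is b.component_graph[0]

# Mutating it through one pipeline is visible through the other.
a.component_graph[0].parameters["with_mean"] = False
leaked = b.component_graph[0].parameters

# With classes, each pipeline can build its own fresh component.
fresh_a = ClassGraphPipeline.component_graph[0]()
fresh_b = ClassGraphPipeline.component_graph[0]()
independent = fresh_a is not fresh_b
```

Storing classes sidesteps this because there is no object state on the class attribute to share.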

Fix
This PR updates component_graph to accept a list of str and ComponentBase subclasses, and it no longer accepts instances. The helper, renamed from handle_component to handle_component_class, now standardizes each entry to a class rather than an instance.

class CustomPipeline(BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'One Hot Encoder', StandardScaler, 'Logistic Regression Classifier']

assert issubclass(CustomPipeline.component_graph[2], ComponentBase)
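For readers unfamiliar with the helper, the following is a rough sketch of the behavior described above, not the actual evalml implementation. ComponentBase, the two components, MissingComponentError, and the name registry are all stand-ins defined here for illustration. Rejecting instances with a ValueError is also an assumption.

```python
class ComponentBase:
    """Stand-in for evalml's ComponentBase (hypothetical, simplified)."""

    name = None


class StandardScaler(ComponentBase):
    name = "Standard Scaler"


class SimpleImputer(ComponentBase):
    name = "Simple Imputer"


class MissingComponentError(Exception):
    """Stand-in for the error raised when a component name is unknown."""


# Hypothetical name -> class registry; the real lookup may work differently.
_COMPONENTS_BY_NAME = {cls.name: cls for cls in (StandardScaler, SimpleImputer)}


def handle_component_class(component):
    """Return a ComponentBase subclass for a component name or subclass."""
    # Already a class: pass it through unchanged.
    if isinstance(component, type) and issubclass(component, ComponentBase):
        return component
    # A name: look the class up in the registry.
    if isinstance(component, str):
        try:
            return _COMPONENTS_BY_NAME[component]
        except KeyError as e:
            raise MissingComponentError(
                f"Component {component!r} was not found"
            ) from e
    # Anything else, including component instances, is rejected.
    raise ValueError(
        "component_graph entries must be a str or a ComponentBase subclass, "
        f"got {type(component).__name__}"
    )


graph = ["Simple Imputer", StandardScaler]
classes = [handle_component_class(c) for c in graph]
```

Because every entry comes back as a class, callers can decide when to instantiate and with which parameters.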

@dsherry dsherry added the enhancement An improvement to an existing feature. label Jun 12, 2020
@dsherry dsherry changed the base branch from master to ds_523_init_components_pipelines June 12, 2020 02:13
@dsherry dsherry changed the title from "Pipelines' component_graph: support ComponentBase subclass instead of instances" to "Have component_graph support ComponentBase subclasses instead of instances" Jun 12, 2020

codecov bot commented Jun 12, 2020

Codecov Report

Merging #850 into master will increase coverage by 9.38%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #850      +/-   ##
==========================================
+ Coverage   90.31%   99.69%   +9.38%     
==========================================
  Files         195      195              
  Lines        7740     7745       +5     
==========================================
+ Hits         6990     7721     +731     
+ Misses        750       24     -726     
| Impacted Files | Coverage Δ |
|---|---|
| evalml/utils/gen_utils.py | 100.00% <ø> (ø) |
| ...lml/automl/automl_algorithm/iterative_algorithm.py | 100.00% <100.00%> (ø) |
| evalml/pipelines/components/__init__.py | 100.00% <100.00%> (ø) |
| evalml/pipelines/components/utils.py | 100.00% <100.00%> (ø) |
| evalml/pipelines/pipeline_base.py | 100.00% <100.00%> (+5.88%) ⬆️ |
| ...lml/tests/automl_tests/test_iterative_algorithm.py | 100.00% <100.00%> (ø) |
| evalml/tests/component_tests/test_utils.py | 97.67% <100.00%> (+3.55%) ⬆️ |
| evalml/tests/conftest.py | 100.00% <100.00%> (ø) |
| ...ification_pipeline_tests/test_en_classification.py | 100.00% <100.00%> (ø) |
| ...ts/regression_pipeline_tests/test_en_regression.py | 100.00% <100.00%> (ø) |

... and 26 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 92db722...8bcf35d.

for component in component_graph:
component_parameters = proposed_parameters.get(component.name, {})
init_params = inspect.signature(component.__class__.__init__).parameters
component_graph = [handle_component_class(c) for c in pipeline_class.component_graph]

We do this in at least a couple places in our codebase. Could be nice to define a method for it. Or override PipelineBase.component_graph to return a handled version by default.
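One way the first suggestion could look is sketched below. The method name `resolved_component_graph` is hypothetical, and the classes and the simplified `handle_component_class` are stand-ins so the example runs on its own. None of this is evalml's actual code.

```python
class ComponentBase:
    """Stand-in for evalml's ComponentBase (simplified for illustration)."""

    name = None


class StandardScaler(ComponentBase):
    name = "Standard Scaler"


# Hypothetical registry; evalml's real name lookup may differ.
_BY_NAME = {StandardScaler.name: StandardScaler}


def handle_component_class(component):
    """Simplified stand-in: resolve a name or subclass to a class."""
    if isinstance(component, str):
        return _BY_NAME[component]
    return component


class PipelineBase:
    component_graph = []

    @classmethod
    def resolved_component_graph(cls):
        """Hypothetical helper: component_graph with every entry as a class."""
        return [handle_component_class(c) for c in cls.component_graph]


class CustomPipeline(PipelineBase):
    component_graph = ["Standard Scaler", StandardScaler]


resolved = CustomPipeline.resolved_component_graph()
```

A classmethod keeps the raw component_graph available for inspection while giving callers a single place to get the standardized list.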

raise MissingComponentError(err) from e

component_class = component.__class__

This line in particular is great to get rid of. Before this PR, we were calling handle_component to get a component instance, then calling __class__ to get the component class, then using that to create a fresh component instance below. Now, we just get the class and create the instance.
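The difference between the two paths can be sketched roughly as follows. The classes, the `created` counter, and the `old_path` and `new_path` functions are illustrative stand-ins, not the code touched by this PR.

```python
class ComponentBase:
    """Stand-in for evalml's ComponentBase (simplified for illustration)."""

    created = 0  # counts constructor calls, for illustration only

    def __init__(self, **parameters):
        ComponentBase.created += 1
        self.parameters = parameters


class StandardScaler(ComponentBase):
    """Stand-in component; not the real evalml class."""


def old_path(component_class, parameters):
    # Before: build a throwaway instance, read its class, then instantiate again.
    throwaway = component_class()
    cls = throwaway.__class__
    return cls(**parameters)


def new_path(component_class, parameters):
    # After: the class is already in hand, so instantiate once.
    return component_class(**parameters)


ComponentBase.created = 0
old_component = old_path(StandardScaler, {"with_mean": False})
old_count = ComponentBase.created

ComponentBase.created = 0
new_component = new_path(StandardScaler, {"with_mean": False})
new_count = ComponentBase.created
```

Both paths end with an equivalent component, but the old one constructs an extra instance only to discard it.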

@@ -27,19 +32,28 @@ def test_all_components_core_dependencies_mock():
assert len(all_components()) == 17


def test_handle_component_names():
def test_handle_component_class_names():

Direct unit test coverage for handle_component_class

@dsherry dsherry marked this pull request as ready for review June 12, 2020 12:55
@dsherry dsherry force-pushed the ds_523_init_components_pipelines branch from 7745a61 to 25368c2 Compare June 12, 2020 14:57

@jeremyliweishih jeremyliweishih left a comment

Looks good to me!

Base automatically changed from ds_523_init_components_pipelines to master June 12, 2020 15:15
@dsherry dsherry force-pushed the ds_update_handle_components_class branch from 0dd58b4 to 8bcf35d Compare June 12, 2020 15:15
@dsherry dsherry merged commit e2ea9a2 into master Jun 12, 2020
@dsherry dsherry deleted the ds_update_handle_components_class branch June 12, 2020 15:38
@angela97lin angela97lin mentioned this pull request Jun 30, 2020