Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ComponentGraph to enforce needing .x and .y for each component specified in the graph #2563

Merged
merged 68 commits into from Jul 30, 2021

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Jul 27, 2021

Closes #2481, based on #2490 :)

The output of estimators (and all other components) must be accessed via .x or .y; simply using the component name will no longer suffice. The convention to defaulting to .x if not specified was chosen when we did not have many (if any) components that outputted an appropriate .y value. Now that we do have more target transformers, I think enforcing explicit edges is helpful in clarity.

This PR also deletes some fixtures and does some cleanup for our tests.

@@ -1064,7 +1063,6 @@ def test_add_to_rankings(
y_train=y,
problem_type="binary",
max_iterations=1,
allowed_component_graphs=dummy_classifier_linear_component_graph,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed_component_graphs plays no role in the validity of this test, so removing :)



@pytest.fixture
def dummy_classifier_dict_component_graph(dummy_classifier_estimator_class):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these in favor of example_graph and example_regression_graph since they were confusing; they used dummy classifier but were labeled as Random Forest. Didn't provide much, if any, value over the example_graph fixture

@@ -91,23 +91,6 @@ def dummy_components():
return TransformerA, TransformerB, TransformerC, EstimatorA, EstimatorB, EstimatorC


@pytest.fixture
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to conftest so that it could be used in other files

@angela97lin
Copy link
Contributor Author

@chukarsten I know you keep hoping for a large PR but hopefully, after splitting out the original PR, this is as big as it gets 😂

Copy link
Collaborator

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this is a really great PR and I really appreciate your cleanup as you go, philosophy. Looks really good. Thank you.

component_class = handle_component_class(component_info[0])
self.component_instances[component_name] = component_class
self.input_feature_names = {}
self._feature_provenance = {}
self._i = 0
self._compute_order = self.generate_order(self.component_dict)

def _validate_component_dict(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this function - super readable.

Comment on lines 247 to 249
import pdb

pdb.set_trace()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoooooooOOOooooooOOoooo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

Comment on lines +3811 to +3813
"Imputer": {"numeric_impute_strategy": Categorical(["most_frequent", "mean"])},
"Imputer_1": {"numeric_impute_strategy": Categorical(["median", "mean"])},
"Random Forest Classifier": {"n_estimators": Categorical([50, 100])},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How'd you get this past black without it exploding into 20+ lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL I just double checked: my line is 88 chars long, and black's limit is 88 chars per line--so I just bareeeely made it 😅

if is_linear
else dummy_classifier_dict_component_graph
)
component_graph = {"CG": example_graph}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@angela97lin angela97lin marked this pull request as draft July 29, 2021 16:50
@angela97lin angela97lin marked this pull request as ready for review July 29, 2021 20:38
@angela97lin angela97lin merged commit 0a12418 into main Jul 30, 2021
@angela97lin angela97lin deleted the 2482_x_y branch July 30, 2021 15:46
@chukarsten chukarsten mentioned this pull request Aug 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update ComponentGraph to enforce needing .x and .y for each component specified in the graph
3 participants