Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ComponentGraph _validate_component_dict logic #2599

Merged
merged 8 commits into from Aug 6, 2021

Conversation

angela97lin
Copy link
Contributor

  • Remove "Cannot have multiple y parents for a single component" check in _compute_features, all checking is done in _validate_component_dict now.
  • Cleaned up and added edge case to _validate_component_dict (where user specifies .x, .y, but also "thisisnotvalid" 馃槀). Before, this check would pass as long as there was one .x, one .y properly. This update ensures that all specified values must be .x/.y/X, y :)

@angela97lin angela97lin self-assigned this Aug 5, 2021
@codecov
Copy link

codecov bot commented Aug 5, 2021

Codecov Report

Merging #2599 (ef953f6) into main (6a814dd) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2599     +/-   ##
=======================================
- Coverage   99.9%   99.9%   -0.0%     
=======================================
  Files        295     295             
  Lines      27059   27063      +4     
=======================================
+ Hits       27014   27017      +3     
- Misses        45      46      +1     
Impacted Files Coverage 螖
evalml/pipelines/component_graph.py 99.1% <100.0%> (-0.3%) 猬囷笍
...valml/tests/pipeline_tests/test_component_graph.py 100.0% <100.0%> (酶)

Continue to review full report at Codecov.

Legend - Click here to learn more
螖 = absolute <relative> (impact), 酶 = not affected, ? = missing data
Powered by Codecov. Last update 6a814dd...ef953f6. Read the comment docs.

component_input.endswith(".y") or component_input == "y"
for component_input in component_inputs
)
if not (has_feature_input and has_target_input):
if not has_feature_input:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of redundant / can be covered by case below. Kept for the specific ValueError, but open to just removing and having it encapsulated below 馃槀

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @angela97lin !!

evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
):
ComponentGraph(
{
"Imputer": [Imputer, "X", "y"], # offending line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not actually the offending line right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I swore I updated this but I must have dreamed it. Thank you 馃槀

"Imputer",
"One Hot Encoder.x",
"y",
], # offending line: Imputer not allowed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: I think the comment would make more sense if it was above "Target Imputer"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to put it next to the "Imputer" :)

@@ -700,7 +693,7 @@ def test_component_graph_evaluation_plumbing(
graph = {
"transformer a": [TransformerA, "X", "y"],
"transformer b": [TransformerB, "transformer a.x", "y"],
"transformer c": [TransformerC, "transformer a", "transformer b.x", "y"],
"transformer c": [TransformerC, "transformer a.x", "transformer b.x", "y"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so before "transformer a" was being silently ignored? I'm curious why this change doesn't mean we also have to change the actual expected call args for mock_transc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh it's cause the output of transformer b already had the output of transformer a. I wonder if we should change the test so that the column names don't overlap? Duplicate column names would cause ww to error out anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe so right now, parent_x = output_cache.get(parent_input, output_cache.get(f"{parent_input}.x")) means that it'll try either getting transformer a or transformer a.x as a backup, which is why this worked. We're no longer going to have to default to this anymore but but #2583 is going to clean that up further :)

@angela97lin angela97lin merged commit 2b56388 into main Aug 6, 2021
@angela97lin angela97lin deleted the update_validate_cg branch August 6, 2021 03:05
@chukarsten chukarsten mentioned this pull request Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants