Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ComponentGraph to accept X and y as inputs #2507

Merged
merged 5 commits into from Jul 15, 2021

Conversation

angela97lin
Copy link
Contributor

Closes #2480

@codecov
Copy link

codecov bot commented Jul 14, 2021

Codecov Report

Merging #2507 (0069d42) into main (3404a82) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2507     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        283     283             
  Lines      25635   25674     +39     
=======================================
+ Hits       25534   25573     +39     
  Misses       101     101             
Impacted Files Coverage Δ
evalml/pipelines/component_graph.py 99.4% <100.0%> (+0.1%) ⬆️
...valml/tests/pipeline_tests/test_component_graph.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3404a82...0069d42. Read the comment docs.

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I think one of the main updates was also Users must specify the X and y inputs for each component in the graph. Currently our graph implementation accepts this:

graph = {
        "Log": [LogTransform],
        "Imputer": ["Imputer", "Log.x", "y"],
        "Random Forest": ["Random Forest Classifier", "Imputer.x", "Log.y"],
    }

Are we getting rid of the solo Component name as the first step in the graph altogether? Will that be part of a different PR?

parent_x = output_cache.get(
parent_input, output_cache.get(f"{parent_input}.x")
y_input = (
output_cache[parent_input] if parent_input[-2:] == ".y" else y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, so this all works together to ensure both X and y are returned, whether it's the original or belongs to a parent, even if one of them is None. I like it!


component_graph.fit(X, y)

# Check that we have columns from both the output of DummyColumnNameTransformer as well as the original columns since "X" was specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point

@angela97lin
Copy link
Contributor Author

@ParthivNaresh This is just to enable X and y for now, but doesn't enforce Users must specify the X and y inputs for each component in the graph. yet. I figured this was an important first step if we require users to enforce that they have the option to choose the original data too!

@angela97lin angela97lin merged commit 0bb0f55 into main Jul 15, 2021
@angela97lin angela97lin deleted the 2480_accept_X_y_as_inputs branch July 15, 2021 02:09
@angela97lin angela97lin self-assigned this Jul 15, 2021
@chukarsten chukarsten mentioned this pull request Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update ComponentGraph to accept X and y as inputs
2 participants