Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pipeline graph() to label edges with X, y or both #2654

Merged
merged 7 commits into from
Aug 30, 2021

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Aug 18, 2021

Closes #1962

The current prototype:

graph = {
        "Imputer": ["Imputer", "X", "y"],
        "Target Imputer": ["Target Imputer", "X", "y"],
        "OneHot_RandomForest": ["One Hot Encoder", "Imputer.x", "Target Imputer.y"],
        "OneHot_ElasticNet": ["One Hot Encoder", "Imputer.x", "y"],
        "Random Forest": ["Random Forest Classifier", "OneHot_RandomForest.x", "y"],
        "Elastic Net": ["Elastic Net Classifier", "OneHot_ElasticNet.x", "y"],
        "Logistic Regression": [
            "Logistic Regression Classifier",
            "Random Forest.x",
            "Elastic Net.x",
            "y",
        ],
}

image

I wasn't able to find a great way to create a nice-looking legend, so here's the current prototype, seeking feedback / opinions 👀

@angela97lin angela97lin self-assigned this Aug 18, 2021
@codecov
Copy link

codecov bot commented Aug 18, 2021

Codecov Report

Merging #2654 (ccac88d) into main (1def4fe) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2654     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        300     300             
  Lines      27426   27439     +13     
=======================================
+ Hits       27382   27395     +13     
  Misses        44      44             
Impacted Files Coverage Δ
evalml/pipelines/component_graph.py 99.8% <100.0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1def4fe...ccac88d. Read the comment docs.

@jeremyliweishih
Copy link
Collaborator

jeremyliweishih commented Aug 25, 2021

couple suggestions but not sure if this is achievable in graphviz:

  • I think the edge labels are a little overwhelming. Instead, we can keep the coloring but also color the y-node to make the color association more obvious
  • if "X" and "y" go into the same component it would be nice to bundle them next to each other like a "=" sign
  • Since "y" feeds into many components in your example, it might be nice to make "y" a dotted line or some just not a solid line

@freddyaboulton
Copy link
Contributor

freddyaboulton commented Aug 25, 2021

@angela97lin Agree with @jeremyliweishih that the edge labels are overwhelming! I like his suggestion. If that's not possible with graphviz, maybe we can instead skip the labels that are coming from the original input X and y as a way to declutter?

@chukarsten
Copy link
Contributor

I'm a little confused, should the output of TargetImputer be like "TargetImputer.y" indicating that that component has acted upon y? Same question for Imputer with X. I think if you don't see that change in the labels, then I wonder what the labels are really conveying.

@angela97lin
Copy link
Contributor Author

@chukarsten I think your confusion stems from the labelling, which I'm going to remove. The X and y refer to "this target is connected to that component's target input" but does not actually specify what the target value is.

@angela97lin
Copy link
Contributor Author

Here's what it looks like using dotted lines for the target edges:

image

Here's what it looks like, initalized:

image

I like this much more! Per discussion in OH, I'll file a spike to move away from graphviz and explore other libraries :)

Copy link
Contributor

@eccabay eccabay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I like the solid vs dotted lines, and I think it's a great step in the right direction as we look into moving away from graphviz in the future.

@@ -5,6 +5,7 @@ Release Notes
* Add ``ProphetRegressor`` to AutoML :pr:`2619`
* Removed SVM "linear" and "precomputed" kernel hyperparameter options, and improved default parameters :pr:`2651`
* Updated ``ComponentGraph`` initalization to raise ``ValueError`` when user attempts to use ``.y`` for a component that does not produce a tuple output :pr:`2662`
* Updated pipeline ``graph()`` to label edges with X, y or both :pr:`2654`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer true since the edges aren't labeled, just dotted/not!

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I like the dotted and solid lines here compared to the previous iteration. Great work getting it done and iterating on this!

@angela97lin angela97lin merged commit d3ae705 into main Aug 30, 2021
@angela97lin angela97lin deleted the 1962_graph_edges branch August 30, 2021 20:09
@chukarsten chukarsten mentioned this pull request Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline graph: label edges with X, y or both
6 participants