Add documentation for using ComponentGraphs #2673
Codecov Report
@@ Coverage Diff @@
## main #2673 +/- ##
=======================================
+ Coverage 99.2% 99.9% +0.8%
=======================================
Files 301 301
Lines 27819 27819
=======================================
+ Hits 27573 27770 +197
+ Misses 246 49 -197
eccabay
left a comment
This looks great! I left a couple requests for small new sections.
The biggest thing missing for me is discussion of the flow of a graph, as was mentioned in the original issue. I would love to see a section talking about compute order, and how to know what components to link together and how. There's a little bit of that implicitly shown in the larger example, but it would be great to have a whole section dedicated to discussing it.
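To make the compute-order request concrete, here is a rough sketch of the idea such a section could illustrate. It assumes a simplified form of the dictionary format from this PR, where each input edge is just the name of an upstream component, and it uses Python's standard-library `graphlib` rather than EvalML's own implementation. The component names and the `compute_order` helper are illustrative only.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: name -> [component, *input_edges].
# Edges are simplified to bare upstream names for illustration.
component_dict = {
    "Imputer": ["Imputer"],
    "One Hot Encoder": ["One Hot Encoder", "Imputer"],
    "Standard Scaler": ["Standard Scaler", "Imputer"],
    "Classifier": ["Random Forest Classifier", "One Hot Encoder", "Standard Scaler"],
}


def compute_order(graph):
    """Return component names so that every component comes after its inputs."""
    # Everything after the first list element is an input edge (a predecessor).
    predecessors = {name: value[1:] for name, value in graph.items()}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(predecessors).static_order())


order = compute_order(component_dict)
print(order)
```

The takeaway for readers would be that a component can only be linked to components that appear earlier in the compute order, which is why the graph has to be acyclic.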
"source": [
"## Defining a Component Graph\n",
"\n",
"Component graphs can be defined by specifying the dictionary of components and edges that describe the graph. In this dictionary, each key is the name that should be used to reference a component by, and each corresponding value is a list where the first element is the component or component name, and the remaining elements are the input edges that should be connected to that component.\n",
A couple suggestions, feel free to take them or leave them!
- We could reference Dask computation graphs, since the component graph design was originally modeled after them
- "In this dictionary, each key is the name that should be used to reference a component by, and each corresponding value is a list where the first element is the component or component name, and the remaining elements are the input edges that should be connected to that component." --> "In this dictionary, each key is a reference name for a component. Each corresponding value is a list, where the first element is the component itself, and the remaining elements are the input edges that should be connected to that component. The component as listed in the value can either be the component object itself or its string name."
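A small example right after that paragraph might also help readers see the structure. The sketch below is plain Python, not EvalML code: `Imputer` is a hypothetical stand-in class, `parse_entry` is an illustrative helper, and input edges are simplified to bare component names.

```python
class Imputer:
    """Hypothetical stand-in for a component class; not the EvalML Imputer."""

    name = "Imputer"


def parse_entry(value):
    """Split a graph value into (component name, list of input edges)."""
    component, *input_edges = value
    # The component may be given as an object/class or as its string name.
    name = component if isinstance(component, str) else component.name
    return name, input_edges


component_dict = {
    "Imputer": [Imputer],  # component given as a class
    "OHE": ["One Hot Encoder", "Imputer"],  # component given by string name
    "Estimator": ["Random Forest Classifier", "OHE"],
}

parsed = {key: parse_entry(value) for key, value in component_dict.items()}
for key, (name, inputs) in parsed.items():
    print(f"{key}: component={name!r}, inputs={inputs}")
```

Note that the input edges refer to the dictionary keys (the reference names), not to the component names themselves.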
I love this, thank you for the context!
(Long sentences have always been my weakness, thank you for helping me break it down 😁)
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing Component Graphs\n",
It may be worth highlighting `ComponentGraph.describe()` here as well!
"source": [
"## Components in the Component Graph\n",
"\n",
"You can use `.get_component(name)` and provide the unique component name to access any component in the component graph. Below, we can grab our Imputer component and confirm that `numeric_impute_strategy` has indeed been set to \"most_frequent\"."
Can you add a section about `get_inputs` here as well?
"source": [
"# Component Graphs\n",
"\n",
"EvalML component graphs represent and describe the flow of data in a collection of related components. A component graph is comprised of nodes representing components, and edges between pairs of nodes representing where the inputs and outputs of each component should go. It is the backbone of the features offered by the EvalML [pipeline](pipelines.ipynb), but is also a powerful data structure on its own. EvalML currently supports component graphs as linear and directed acyclic graphs (DAG)."
Maybe add a link to a description of what a DAG is here, for anyone unfamiliar with graph theory.
bchen1116
left a comment
Agreed with Becca's comments, but looks good!
chukarsten
left a comment
Looks solid to me! Thank you!
Closes #2367
Docs here: https://feature-labs-inc-evalml--2673.com.readthedocs.build/en/2673/user_guide/component_graphs.html