Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ComponentGraph class #1415

Merged
merged 56 commits into from Dec 9, 2020
Merged

Add ComponentGraph class #1415

merged 56 commits into from Dec 9, 2020

Conversation

eccabay
Copy link
Contributor

@eccabay eccabay commented Nov 6, 2020

Closes #1277 (design doc here)

@codecov
Copy link

codecov bot commented Nov 6, 2020

Codecov Report

Merging #1415 (f7178a1) into main (06babaa) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1415     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         230      232      +2     
  Lines       15831    16430    +599     
=========================================
+ Hits        15823    16422    +599     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <100.0%> (ø)
evalml/pipelines/component_graph.py 100.0% <100.0%> (ø)
...valml/tests/pipeline_tests/test_component_graph.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_graphs.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06babaa...f7178a1. Read the comment docs.

@eccabay eccabay marked this pull request as ready for review November 9, 2020 16:48
@eccabay eccabay self-assigned this Nov 9, 2020
Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eccabay this is huge!! Nice going, this API is looking great. Very excited to get it into the pipelines! I really like the way we use that recompute method, and I think that from_list method is gonna be super helpful. The try-first-and-fail logic in graph is neat.

I left some comments and suggestions. I didn't dig into the tests yet, will do so on next pass. Some things to call out:

  • I think we should have public methods for fitting and predicting. They could certainly share impl.
  • I think we should keep the user-defined component_dict completely separate from whatever structure we use to hold the component instances. That way, the component_dict holds the template for the graph. I suppose add_node and add_edge would have to modify that structure.

I can also see us wanting to evaluate part(s) of the graph, i.e. call transform on all preprocessing features. We don't have to add that to this PR, but I think that would be great to file as something to add once the DAGs work is complete.

evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Show resolved Hide resolved
evalml/pipelines/component_graph.py Show resolved Hide resolved
evalml/pipelines/component_graph.py Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
@eccabay eccabay requested a review from dsherry November 16, 2020 14:24
Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving tiny comments for now, would love to take a closer look when I get the chance later 😁

docs/source/release_notes.rst Outdated Show resolved Hide resolved
evalml/pipelines/__init__.py Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/tests/pipeline_tests/test_graphs.py Show resolved Hide resolved
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks great, and the tests are awesome! Just had a few questions/comments.

I agree with Dylan about submitting an issue to track the caching for this class, since the current cache might run into memory issues.

There currently is no implementation for adding additional nodes or adding two graphs together, as mentioned in the original issue, is there? Not sure if I missed something, but might be nice to file an issue to track adding those two!

Also, not sure if this was intentional, but this class hasn't been added to the docs yet.

Otherwise, I left a few comments, but looks good to me!

@dsherry
Copy link
Contributor

dsherry commented Dec 8, 2020

I agree with Dylan about submitting an issue to track the caching for this class, since the current cache might run into memory issues.

@bchen1116 good point, already filed! (idk if my prev comment mentioned this) #466

@eccabay
Copy link
Contributor Author

eccabay commented Dec 8, 2020

There currently is no implementation for adding additional nodes or adding two graphs together, as mentioned in the original issue, is there? Not sure if I missed something, but might be nice to file an issue to track adding those two!

Great point @bchen1116! I did remove these capabilities as I iterated through this PR. With the decision for the ComponentGraph class to be completely internal/private (users shouldn't ever need to muck with it), any use cases for adding nodes or merging graphs disappeared. That being said, it may be worth it to file another issue to track adding that in the future, should use cases arise. (This PR is already so large, that work would probably be left to a separate issue)

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eccabay solid work on this!! 🎉

Here's all that I found which I'd consider blocking:

  • I proposed a simplification in compute_final_component_features / _compute_features, so that we have one evaluation impl instead of two.
  • I left a few little suggested cleanups in _compute_features.

evalml/pipelines/component_graph.py Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Show resolved Hide resolved
graph.node(component_name, shape='record', label=label)
edges = self._get_edges()
graph.edges(edges)
return graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have any example images handy it would be cool to see what this looks like!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Here's an uninstantiated pipeline
uninstantiated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here's the same pipeline after instantiation with default parameters!
instantiated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooooh. This rocks. Thanks!

evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
evalml/pipelines/component_graph.py Outdated Show resolved Hide resolved
@eccabay eccabay merged commit 3de3b12 into main Dec 9, 2020
@eccabay eccabay deleted the 1277_component_graph branch December 10, 2020 18:59
@dsherry dsherry mentioned this pull request Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ComponentGraph class
4 participants