Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up PipelineBase's component_graph and _component_graph attributes, add __eq__ for ComponentGraph, and add support for non-linear pipelines for generate_pipeline_code #2332

Merged
merged 31 commits into from Jun 8, 2021

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Jun 2, 2021

Closes #2331 Closes #1563

  • Removed _component_graph in favor of just component_graph
  • Add __eq__ for ComponentGraph
  • Update repr to use component dict. Supports nonlinear, which previously just used compute_order but did not really represent the nonlinear pipeline well.
  • Add nonlinear support for generate_pipeline_code

@angela97lin angela97lin self-assigned this Jun 2, 2021
@codecov
Copy link

codecov bot commented Jun 3, 2021

Codecov Report

Merging #2332 (19025c8) into main (96dc245) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2332     +/-   ##
=======================================
- Coverage   99.9%   99.9%   -0.0%     
=======================================
  Files        281     281             
  Lines      24760   24849     +89     
=======================================
+ Hits       24731   24817     +86     
- Misses        29      32      +3     
Impacted Files Coverage Δ
evalml/tests/automl_tests/test_automl.py 99.7% <ø> (ø)
evalml/pipelines/binary_classification_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/classification_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/component_graph.py 99.4% <100.0%> (-0.6%) ⬇️
evalml/pipelines/pipeline_base.py 99.7% <100.0%> (+0.1%) ⬆️
.../pipelines/time_series_classification_pipelines.py 100.0% <100.0%> (ø)
...valml/pipelines/time_series_regression_pipeline.py 100.0% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)
...lml/tests/automl_tests/test_iterative_algorithm.py 100.0% <100.0%> (ø)
...valml/tests/pipeline_tests/test_component_graph.py 100.0% <100.0%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96dc245...19025c8. Read the comment docs.

@angela97lin angela97lin changed the title Clean up PipelineBase's component_graph and _component_graph attributes Clean up PipelineBase's component_graph and _component_graph attributes, add __eq__ for ComponentGraph, and add support for non-linear pipelines for generate_pipeline_code Jun 4, 2021
@angela97lin angela97lin marked this pull request as ready for review June 4, 2021 14:08
@angela97lin angela97lin requested review from dsherry, freddyaboulton and chukarsten and removed request for dsherry and freddyaboulton June 4, 2021 14:08
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, Angela! This looks awesome! I like how the default repr is now the dictionary repr rather than string for the component graph.

My main concern is that this could be confusing for users, since now the pipeline attribute component_graph won't necessarily have the same value as the arg they pass in, but rather holds the ComponentGraph instance. Not sure if we have a precedent where we let users know about this. Maybe we can add to 'breaking changes' something about deleting attribute _component_graph?

Not necessary, but just wanted to see what people thoght here. Left some nit, but nothing blocking!

evalml/pipelines/component_graph.py Show resolved Hide resolved
evalml/pipelines/pipeline_base.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid, I like this change. _component_graph is one of those semiprivate attributes I was hoping we'd get around to addressing. I left a comment about maybe extending the test for the equality operator to explicitly demonstrate inequality when compute order is different. But nothing blocking. Do what you feel is best.

evalml/pipelines/pipeline_base.py Outdated Show resolved Hide resolved
evalml/tests/automl_tests/test_automl.py Outdated Show resolved Hide resolved
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin This looks good to me! I think this would be good to merge if we list is as a breaking change in the release notes and resolve our discussion about whether the compute order should be used to determine if component graphs are equal.

@angela97lin angela97lin added this to the One Week Sprint milestone Jun 8, 2021
@angela97lin angela97lin merged commit a2e3e34 into main Jun 8, 2021
@angela97lin angela97lin deleted the 2331_component_graph branch June 8, 2021 19:12
@chukarsten chukarsten mentioned this pull request Jun 9, 2021
@chukarsten chukarsten mentioned this pull request Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants