
Remove linearized_component_graph and ComponentGraph.from_list #2556

Merged
6 commits merged into main from remove_linearized_component_graph on Jul 27, 2021

Conversation

angela97lin (Contributor):

#2549 removed almost all usages of ComponentGraph.from_list but didn't actually remove the method 😂

This PR does that, and removes the idea of linearized_component_graph. I think this is a good idea because the method assumes there is a way to make something "linear", which is suspicious in DAG cases. In addition, most places where our code currently uses it can easily be replaced with component_instances, which makes me suspicious that order did not really matter; instead, linearized_component_graph was used as a convenient way to get a mapping from component name to component class. If that mapping is useful, it should be revamped into a simple dictionary, but for now, the functionality is covered by component_instances.
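For illustration, a rough before/after sketch (the dict-edge syntax mirrors the test diffs below; treating component_instances as a dict-like name-to-component mapping is an assumption here):

```python
from evalml.pipelines import ComponentGraph

# Removed in this PR: building a graph from a flat, "linear" list of component names
# component_graph = ComponentGraph.from_list(
#     ["Imputer", "One Hot Encoder", "Logistic Regression Classifier"]
# )

# Remaining API: spell out the DAG edges explicitly as a dictionary
component_graph = ComponentGraph({
    "Imputer": ["Imputer", "X", "y"],
    "One Hot Encoder": ["One Hot Encoder", "Imputer.x", "y"],
    "Logistic Regression Classifier": [
        "Logistic Regression Classifier", "One Hot Encoder.x", "y",
    ],
})

# Instead of linearized_component_graph, the name-to-component mapping
# comes from component_instances (assumed dict-like here)
for name, component in component_graph.component_instances.items():
    print(name, component)
```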

@angela97lin self-assigned this on Jul 26, 2021
codecov bot commented Jul 26, 2021

Codecov Report

Merging #2556 (8b88a2d) into main (3a803e0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2556     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        285     285             
  Lines      26212   26097    -115     
=======================================
- Hits       26176   26063    -113     
+ Misses        36      34      -2     
Impacted Files Coverage Δ
evalml/pipelines/component_graph.py 100.0% <ø> (+0.7%) ⬆️
evalml/pipelines/pipeline_base.py 99.7% <ø> (-<0.1%) ⬇️
...lml/automl/automl_algorithm/iterative_algorithm.py 100.0% <100.0%> (ø)
...valml/tests/pipeline_tests/test_component_graph.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 99.9% <100.0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a803e0...8b88a2d.

@@ -4,18 +4,19 @@ Release Notes
* Enhancements
* Fixes
* Changes
* Removed ``ComponentGraph.from_list`` and update PipelineBase implementation for creating pipelines from a list of components :pr:`2549`
* Updated ``PipelineBase`` implementation for creating pipelines from a list of components :pr:`2549`
angela97lin (Contributor, Author):

Updating release notes to make clear that this is actually the PR that removed from_list :)

@@ -417,7 +367,7 @@ def test_get_estimators(example_graph):
LogisticRegressionClassifier(),
]

component_graph = ComponentGraph.from_list(["Imputer", "One Hot Encoder"])
component_graph = ComponentGraph({"Imputer": ["Imputer", "X", "y"]})
angela97lin (Contributor, Author):

I thought this test still had validity: it checks that get_estimators returns an empty list when a component graph has no estimators. I'm just changing how the graph is initialized :)
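A minimal sketch of what the reworked test now exercises (the instantiate call and the exact get_estimators behavior are assumptions on my part):

```python
from evalml.pipelines import ComponentGraph

# Graph with a single transformer and no estimator, built with the dict API from the diff above
component_graph = ComponentGraph({"Imputer": ["Imputer", "X", "y"]})
component_graph.instantiate({})  # assumed: the graph must be instantiated before get_estimators

# With no estimator in the graph, get_estimators should return an empty list
assert component_graph.get_estimators() == []
```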

"Logistic Regression Classifier",
]
component_graph = ComponentGraph.from_list(component_list)
component_graph = {
angela97lin (Contributor, Author):

Again, I thought this was worth keeping around even though the original question the test answered (does the y value get passed even though the edge from the standard scaler is unclear?) no longer applies; it's still valuable to double-check that we are indeed getting the oversampled y :)

ParthivNaresh (Contributor):

Just a tad confused by "does y value get passed even though edge from standard scaler is unclear?"; we're not expecting it to be passed, right?

angela97lin (Contributor, Author):

Before this PR, we expect that the oversampler's y value gets passed to the estimator because the standard scaler doesn't produce a y value, so the "most recent y" the estimator sees is still the oversampler's y, but this isn't really clear using the list API.

Now that we're specifying edges, it's clearer that the estimator gets the oversampler's y edge, so this test isn't strictly necessary to confirm that anymore, but I wanted to keep it around anyway to catch the case where the estimator does not get the oversampler's y :)
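To make that concrete, here's a hedged sketch of the kind of dict-edge graph the diff above sets up (component names other than "Logistic Regression Classifier" are assumptions):

```python
# The estimator's y input is wired to the sampler's y output rather than the Standard Scaler,
# since the scaler produces no y; with explicit edges, that choice is visible in the graph itself.
component_graph = {
    "Imputer": ["Imputer", "X", "y"],
    "SMOTE Oversampler": ["SMOTE Oversampler", "Imputer.x", "y"],
    "Standard Scaler": ["Standard Scaler", "SMOTE Oversampler.x", "SMOTE Oversampler.y"],
    "Logistic Regression Classifier": [
        "Logistic Regression Classifier",
        "Standard Scaler.x",
        "SMOTE Oversampler.y",  # explicit y edge from the sampler
    ],
}
```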

ParthivNaresh (Contributor) left a comment:

Looks great!

"Logistic Regression Classifier",
]
component_graph = ComponentGraph.from_list(component_list)
component_graph = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tad confused by does y value get passed even though edge from standard scaler is unclear?, we're not expecting it to be passed right?

@@ -1531,31 +1412,6 @@ def test_component_graph_sampler_y_passes(mock_estimator_fit):
assert len(mock_estimator_fit.call_args[0][0]) == int(1.25 * 90)


@patch("evalml.pipelines.components.estimators.RandomForestClassifier.fit")
@patch("evalml.pipelines.components.estimators.DecisionTreeClassifier.fit")
def test_component_graph_sampler_same_given_components(mock_dt_fit, mock_rf_fit):
ParthivNaresh (Contributor):

I'm not sure what the original purpose of this test was, just verifying that SMOTE Oversampler was included in both graphs?

angela97lin (Contributor, Author):

I think it was just testing that, with different estimators, the oversampler was the same and received the same input :)
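A rough sketch of the idea that test captured (component names taken from the patched fits in the hunk above; the edge syntax is assumed):

```python
# Two graphs that differ only in the final estimator share the same sampler wiring,
# so the sampler receives the same input regardless of which estimator follows it.
shared = {
    "Imputer": ["Imputer", "X", "y"],
    "SMOTE Oversampler": ["SMOTE Oversampler", "Imputer.x", "y"],
}
graph_rf = {
    **shared,
    "Random Forest Classifier": ["Random Forest Classifier", "SMOTE Oversampler.x", "SMOTE Oversampler.y"],
}
graph_dt = {
    **shared,
    "Decision Tree Classifier": ["Decision Tree Classifier", "SMOTE Oversampler.x", "SMOTE Oversampler.y"],
}

# Identical sampler specification in both graphs
assert graph_rf["SMOTE Oversampler"] == graph_dt["SMOTE Oversampler"]
```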

@@ -1921,114 +1777,6 @@ def test_component_graph_inverse_transform(
pd.testing.assert_series_equal(answer, expected)


lists = [
ParthivNaresh (Contributor):

Goodnight dear lists

@angela97lin merged commit 31866ba into main on Jul 27, 2021
@angela97lin deleted the remove_linearized_component_graph branch on July 27, 2021 17:40
@chukarsten mentioned this pull request on Aug 3, 2021