Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update make_pipelines for AutoML to create component dictionaries, not lists, to initialize pipelines #2504

Merged
merged 28 commits into from Jul 20, 2021

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Jul 13, 2021

Closes #2493, closes #2494 (in lieu of #2496).

This PR does a few things:

  • Adds a private variable to be used in deprecating our list API for complex pipelines (_supported_by_list_API). Also adds two class properties, modifies_target and modifies_features. _supported_by_list_API will be used in Deprecate the list API for ComponentGraph #2490 when we actually deprecate the list API, but adding here because of its current connection to modifies_target.
  • Updates our component graph to first generate the appropriate component dict from a list of components internally for make_pipeline, rather than passing in the list of components directly. This is also related to eventually deprecating the list API.

Q:

  • We wanted to deprecate the list API because of ambiguity, but in make_pipeline, we needed a way to convert a list of components to a ComponentGraph. Is this reliable enough that we don't have to deprecate the list API after all? My hunch says it is okay to be more restrictive for now (still deprecate list API) since it relies on the user to have modifies_target and modifies_features properly set for their components, and this is a restraint we can always loosen later, but curious about thoughts 😁

Also welcome any other graph examples you guys think of for testing 😁

@angela97lin angela97lin self-assigned this Jul 13, 2021
@codecov
Copy link

codecov bot commented Jul 13, 2021

Codecov Report

Merging #2504 (c802e00) into main (b72cf1f) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2504     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        283     283             
  Lines      25729   25808     +79     
=======================================
+ Hits       25628   25707     +79     
  Misses       101     101             
Impacted Files Coverage Δ
...transformers/preprocessing/polynomial_detrender.py 97.6% <ø> (ø)
evalml/pipelines/components/component_base.py 100.0% <100.0%> (ø)
...valml/pipelines/components/estimators/estimator.py 100.0% <100.0%> (ø)
...components/transformers/imputers/target_imputer.py 100.0% <100.0%> (ø)
...s/components/transformers/samplers/base_sampler.py 100.0% <100.0%> (ø)
...l/pipelines/components/transformers/transformer.py 100.0% <100.0%> (ø)
evalml/pipelines/utils.py 99.2% <100.0%> (+0.2%) ⬆️
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipeline_utils.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b72cf1f...c802e00. Read the comment docs.

assert component_class.modifies_features


def test_component_parameters_supported_by_list_API():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will smoke test in #2490. Right now, all components are still "supported" in list API, until it's deprecated in #2490 :)

@angela97lin angela97lin changed the title Update make_pipelines for AutoML to create ComponentGraphs, not lists, to initialize pipelines Update make_pipelines for AutoML to create component dictionaries, not lists, to initialize pipelines Jul 15, 2021
@angela97lin angela97lin requested a review from dsherry July 15, 2021 17:45
Copy link
Collaborator

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesomesauce. Looks good to me! Nice work.

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, and I love the diversity of tests. What would happen if you added a non-estimator as the last component of the component list after an estimator?
Regarding your question, I think it's perfectly fine to deprecate the list API for the reason you gave. However if we choose not to, I think it would be helpful to update the docs with a line about the inclusion of modifies_target and modifies_features.

evalml/pipelines/utils.py Outdated Show resolved Hide resolved
evalml/pipelines/utils.py Show resolved Hide resolved
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job with the clean implementation and the easy-to-follow and thorough tests! LGTM! Left a suggestion on something we can maybe simplify, but not blocking.

Agreed with @ParthivNaresh about updating the docs with modifies_features and modifies_targets.

evalml/pipelines/components/component_base.py Outdated Show resolved Hide resolved
evalml/pipelines/utils.py Outdated Show resolved Hide resolved
@@ -80,6 +94,10 @@ def default_parameters(cls):

return cls._default_parameters

@classproperty
def _supported_by_list_API(cls):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is move ahead with deprecating the list api for pipelines, I'm wondering if we even need this anymore?

It's currently only used in make_pipeline and it only called on the components from _get_preprocessing_components. That's an implementation we control - so those components should always support the list api right?

I guess what I'm trying to get at is that we decide to get rid of the list api, then the only "list to dict" conversion that would exist is in make_pipeline but that's only because _make_preprocessing_components was written way before the ComponentGraph abstraction! If you're with me so far, then I think what we should do is have _make_preprocessing_components generate a component graph dict directly.

That being said, I think this suggestion would only apply if we decide to deprecate the list api completely. If we go with the "half-half" solution from last week, I think the way you have this laid out currently is good!

I'm curious what you think about all this. It seems we're still on the fence about whether to get rid of the list api? 😅 Maybe this is something to broach with more members of the team?

@angela97lin
Copy link
Contributor Author

@ParthivNaresh Added a test with transformer after estimator, and answered your question. LMK if there's still any confusion! :) Thanks for the doc suggestion, added! (@bchen1116)

@freddyaboulton Ahh yes, there's so much back and forth 😂 I ultimately decided that since post-conversation last week, we decided to do the half-half solution, that's what I'll do in #2490 (in the process of updating right now, RIP). With that in mind, I'll keep _supported_by_list_API with its simple impl right now. make_pipeline still has some logic of grabbing all of the components in order first, then generating the graph rather than generating the graph as we add components because when I tried to impl the dict as we went along, the logic was not any simpler :/ With that in mind, going to keep with our half-half sol and as Parthiv suggested, add the public attributes to our doc 😁

@angela97lin angela97lin merged commit b01fe69 into main Jul 20, 2021
@angela97lin angela97lin deleted the 2493_component_graphs_make_pipelines branch July 20, 2021 04:27
@chukarsten chukarsten mentioned this pull request Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants