Use one sampler in split DefaultAlgorithm pipelines #3696

Merged
jeremyliweishih merged 10 commits into main from js_one_sampler on Sep 13, 2022

Conversation

jeremyliweishih
Collaborator

Fixes #3076.

@codecov

codecov bot commented Sep 6, 2022

Codecov Report

Merging #3696 (0f4f20d) into main (0ac4dae) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3696     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        339     339             
  Lines      34431   34451     +20     
=======================================
+ Hits       34304   34324     +20     
  Misses       127     127             
Impacted Files                                            Coverage Δ
...valml/automl/automl_algorithm/default_algorithm.py     100.0% <100.0%> (ø)
evalml/pipelines/utils.py                                 99.5% <100.0%> (+0.1%) ⬆️
...valml/tests/automl_tests/test_default_algorithm.py     100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipeline_utils.py        99.8% <100.0%> (ø)


@jeremyliweishih jeremyliweishih marked this pull request as ready for review September 8, 2022 17:38
@jeremyliweishih jeremyliweishih requested review from eccabay, chukarsten, christopherbunn and fjlanasa and removed request for eccabay September 8, 2022 17:41
@@ -742,6 +743,7 @@ def _make_pipeline_from_multiple_graphs(
pipeline_name (str): Custom name for the final pipeline.
sub_pipeline_names (Dict): Dictionary mapping original input pipeline names to new names. This will be used to rename components. Defaults to None.
prior_components (Dict): Component graph of components preceding the split of multiple graphs. Must be in component graph format, {"Label Encoder": ["Label Encoder", "X", "y"]} and currently restricted to components that only alter X input.
pre_estimator_components (Dict): Component graph of components before the estimator after the split of multiple graphs. Must be in component graph format, {"Label Encoder": ["Label Encoder", "X", "y"]} and currently restricted to components that only alter X input.
Contributor

I like this implementation - it's a smooth way of hooking up these sorts of components into the component graph. Slightly overengineered for the problem at hand, but will make our lives easier if/when we need to add more pre-estimator components in the future. Do you have any sort of plan for how we'll do that if that becomes the case? Since right now, we just have the one sampler and we set the input X and y knowing they'll be reset as the first_pre_estimator_component, but what would this look like with more components?
I don't think this is a blocking question, just something to keep in mind and maybe file an issue to track.

Collaborator Author

This implementation should work with more components if pre_estimator_components is a component graph that hooks all the pre-estimator components together. For example, I believe the implementation would still work if we have:

{sampler.name: [sampler.name, "X", "y"], "StandardScaler": ["StandardScaler", "sampler.x", "sampler.y"]}
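To make the chaining idea concrete, here is a minimal sketch in plain Python that does not import evalml. It assumes the dict format from the docstring above and that a component's outputs are referenced as "<name>.x" and "<name>.y". The helpers `rewire_first_component` and `dangling_inputs`, the "Undersampler" name, and the `merged_outputs` values are hypothetical and are not part of this PR; the sketch only illustrates why a multi-component pre_estimator_components graph could work when just its first component is rewired.

```python
from typing import Dict, List

ComponentGraphDict = Dict[str, List[str]]


def rewire_first_component(
    pre_estimator_components: ComponentGraphDict, merged_outputs: List[str]
) -> ComponentGraphDict:
    """Return a copy whose first component reads the merged sub-pipeline outputs.

    Hypothetical helper: the placeholder "X" input of the first component is
    replaced by the outputs of the split sub-pipelines; later components are
    left alone because they already point at earlier components.
    """
    rewired = {name: list(spec) for name, spec in pre_estimator_components.items()}
    if not rewired:
        return rewired
    first = next(iter(rewired))
    component, *inputs = rewired[first]
    new_inputs: List[str] = []
    for edge in inputs:
        if edge == "X":
            new_inputs.extend(merged_outputs)
        else:
            new_inputs.append(edge)
    rewired[first] = [component, *new_inputs]
    return rewired


def dangling_inputs(graph: ComponentGraphDict, known: List[str]) -> List[str]:
    """List input edges that no earlier component (or raw X/y) provides."""
    available = {"X", "y", *known}
    missing: List[str] = []
    for name, (component, *inputs) in graph.items():
        missing.extend(edge for edge in inputs if edge not in available)
        available.update({f"{name}.x", f"{name}.y"})
    return missing


# Hypothetical names, chosen only for illustration.
sampler_name = "Undersampler"
pre_estimator_components = {
    sampler_name: [sampler_name, "X", "y"],
    "Standard Scaler": [
        "Standard Scaler",
        f"{sampler_name}.x",
        f"{sampler_name}.y",
    ],
}
merged_outputs = ["Numeric Pipeline - Imputer.x", "Categorical Pipeline - Imputer.x"]

rewired = rewire_first_component(pre_estimator_components, merged_outputs)
print(rewired[sampler_name])
print(dangling_inputs(rewired, merged_outputs))
```

Only the first entry needs rewiring; every later component already names its upstream component, so a check like `dangling_inputs` can catch a mistyped edge (for example "Sampler.y" instead of "Undersampler.y") before the graph is built.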

Contributor

+1 for this implementation, I think this will be super useful in the future. I'm okay with merging it as-is, but we might want to consider updating the parameter names for prior_components and pre_estimator_components to be a bit more precise? Maybe something with a pre/post scheme like pre_pipeline_components/post_pipelines_components.

Collaborator Author

will update @christopherbunn

Contributor

@chukarsten chukarsten left a comment

I think we just need some minor cleanups with respect to logic, but otherwise, good to go.

Contributor

@chukarsten chukarsten left a comment

LGTM thanks man.

@jeremyliweishih jeremyliweishih merged commit 07d7af6 into main Sep 13, 2022
@jeremyliweishih jeremyliweishih deleted the js_one_sampler branch September 13, 2022 18:03
@chukarsten chukarsten mentioned this pull request Sep 20, 2022
Development

Successfully merging this pull request may close these issues.

Use one sampler for split preprocessing pipeline
4 participants