Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move get_estimators to evalml/pipelines/components/utils.py #934

Merged
merged 13 commits into from Jul 21, 2020

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Jul 15, 2020

Closes #911

Although the original issue states suggests moving list_model_families back to pipelines, this PR simply moves get_estimators to evalml/pipelines/components/utils.py while avoiding circular dependencies by deferring import until inside the function.

Alternatively, we could combine the modules?

@angela97lin angela97lin self-assigned this Jul 15, 2020
@codecov
Copy link

codecov bot commented Jul 15, 2020

Codecov Report

Merging #934 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #934   +/-   ##
=======================================
  Coverage   99.87%   99.87%           
=======================================
  Files         171      171           
  Lines        8766     8771    +5     
=======================================
+ Hits         8755     8760    +5     
  Misses         11       11           
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.53% <100.00%> (+<0.01%) ⬆️
evalml/model_family/utils.py 100.00% <100.00%> (ø)
evalml/pipelines/components/utils.py 100.00% <100.00%> (ø)
evalml/pipelines/utils.py 100.00% <100.00%> (ø)
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)
.../automl_tests/test_automl_search_classification.py 100.00% <100.00%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.00% <100.00%> (ø)
evalml/tests/component_tests/test_estimators.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 66e2347...1c415af. Read the comment docs.

@@ -35,6 +34,7 @@ def list_model_families(problem_type):

estimators = []
problem_type = handle_problem_types(problem_type)
from evalml.pipelines.components.utils import _all_estimators_used_in_search
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delaying import until here to avoid circular dependencies 😅

https://stackabuse.com/python-circular-imports/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine but we may want to think about a more sustainable solution before a wide release.

I think the issue is that model_family/utils depends on components/utils and components/utils depends on model_family/utils.

Maybe the solution is to pull components out of pipelines and place it at the top level of the repo and move get_estimators and handle_component_class from components/utils to pipeline/utils.

I think this structure more adequately reflects the dependence relation between the modules because model_family and pipelines both depend on the components and so components should be its own isolated module. Moreover, get_estimators and handle_component_class are only used for pipeline construction so pipelines/utils is probably a more suitable home.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that sounds like a viable solution! I could see an argument for get_estimators and handle_component_class being in components/utils though since handle_component_class takes an input and returns a ComponentBase object and similarly, get_estimators returns Estimators which are also of ComponentBase. 🤔

I guess for now, this will do but if we continue to have issues crop up we should revisit this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, @angela97lin your solution fixes the immediate issue, but I agree we should figure out the long-term plan. Let's talk after standup today @angela97lin @freddyaboulton !

@angela97lin angela97lin marked this pull request as ready for review July 16, 2020 14:05
@dsherry
Copy link
Collaborator

dsherry commented Jul 16, 2020

@angela97lin should this be marked as fixing #911 in the PR body?

@angela97lin
Copy link
Contributor Author

@dsherry Yup, thanks for catching that. Updated!

Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin Looks good! But I'm concerned that if we don't make a more "sustainable" fix in the near future, we'll keep running into this issue. I propose that we rethink how the repo is laid out - the solution I have in mind is pulling components out of pipelines and having it be a stand-alone module. This drastically changes the scope of this issue so we should file a new issue and discuss more possible solutions.

What are your thoughts?

@@ -35,6 +34,7 @@ def list_model_families(problem_type):

estimators = []
problem_type = handle_problem_types(problem_type)
from evalml.pipelines.components.utils import _all_estimators_used_in_search
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine but we may want to think about a more sustainable solution before a wide release.

I think the issue is that model_family/utils depends on components/utils and components/utils depends on model_family/utils.

Maybe the solution is to pull components out of pipelines and place it at the top level of the repo and move get_estimators and handle_component_class from components/utils to pipeline/utils.

I think this structure more adequately reflects the dependence relation between the modules because model_family and pipelines both depend on the components and so components should be its own isolated module. Moreover, get_estimators and handle_component_class are only used for pipeline construction so pipelines/utils is probably a more suitable home.

@angela97lin angela97lin added this to the July 2020 milestone Jul 17, 2020
@angela97lin angela97lin merged commit d710bc7 into main Jul 21, 2020
2 checks passed
@angela97lin angela97lin deleted the 911_circular branch July 21, 2020 15:13
@angela97lin angela97lin mentioned this pull request Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate utils to avoid circular imports
3 participants