Move get_estimators to evalml/pipelines/components/utils.py #934
Codecov Report
@@ Coverage Diff @@
## main #934 +/- ##
=======================================
Coverage 99.87% 99.87%
=======================================
Files 171 171
Lines 8766 8771 +5
=======================================
+ Hits 8755 8760 +5
Misses 11 11
@@ -35,6 +34,7 @@ def list_model_families(problem_type):

    estimators = []
    problem_type = handle_problem_types(problem_type)
    from evalml.pipelines.components.utils import _all_estimators_used_in_search
Delaying import until here to avoid circular dependencies 😅
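For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a deferred import breaking an import cycle. The module names (model_family_utils, components_utils), the all_estimators and supported_families helpers, and the estimator data are hypothetical stand-ins written to a temporary directory; this is not the evalml code itself.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for model_family/utils.py.
MODEL_FAMILY_UTILS = textwrap.dedent('''
    def list_model_families():
        # Deferred import: runs on first call, after both modules have loaded.
        from components_utils import all_estimators
        return sorted({family for _, family in all_estimators()})
''')

# Hypothetical stand-in for components/utils.py.
COMPONENTS_UTILS = textwrap.dedent('''
    # This top-level import back into the other module closes the cycle.
    from model_family_utils import list_model_families

    def all_estimators():
        return [("RandomForestClassifier", "random_forest"),
                ("LogisticRegressionClassifier", "linear_model")]

    def supported_families():
        return list_model_families()
''')


def load_demo_modules():
    """Write both modules to a temp directory and import the second one."""
    tmp = tempfile.mkdtemp()
    Path(tmp, "model_family_utils.py").write_text(MODEL_FAMILY_UTILS)
    Path(tmp, "components_utils.py").write_text(COMPONENTS_UTILS)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    return importlib.import_module("components_utils")


components_utils = load_demo_modules()
families = components_utils.supported_families()
print(families)
```

If the import inside list_model_families were moved to the top of that module, each module would try to import a name from the other before it had been defined, and loading either one would fail with an ImportError.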
I think this is fine, but we may want to think about a more sustainable solution before a wide release.
I think the issue is that model_family/utils depends on components/utils and components/utils depends on model_family/utils.
Maybe the solution is to pull components out of pipelines, place it at the top level of the repo, and move get_estimators and handle_component_class from components/utils to pipelines/utils.
I think this structure more adequately reflects the dependence relation between the modules, because model_family and pipelines both depend on the components, so components should be its own isolated module. Moreover, get_estimators and handle_component_class are only used for pipeline construction, so pipelines/utils is probably a more suitable home.
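To make the dependency argument concrete, here is a rough sketch that checks a module dependency graph for cycles. The edges are a simplified assumption taken only from the comment above (the real import graph in evalml may contain more edges), and find_cycle is a hypothetical helper, not part of the repo.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    visiting = []   # nodes on the current depth-first path
    done = set()    # nodes whose dependencies are fully explored

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Reached a node already on the path: report the loop.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in graph.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in graph:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


# Assumed edges: "A": ["B"] means module A imports module B.
current_layout = {
    "model_family/utils": ["components/utils"],
    "components/utils": ["model_family/utils"],
}
proposed_layout = {
    "components": [],
    "model_family/utils": ["components"],
    "pipelines/utils": ["components"],
}

current_cycle = find_cycle(current_layout)
proposed_cycle = find_cycle(proposed_layout)
print(current_cycle)
print(proposed_cycle)
```

Under these assumed edges the current layout contains a two-module loop, while the proposed layout has none because components no longer imports from either of its dependents.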
Hmm, that sounds like a viable solution! I could see an argument for get_estimators and handle_component_class being in components/utils though, since handle_component_class takes an input and returns a ComponentBase object and, similarly, get_estimators returns Estimators which are also of ComponentBase. 🤔
I guess for now, this will do, but if we continue to have issues crop up we should revisit this?
+1, @angela97lin your solution fixes the immediate issue, but I agree we should figure out the long-term plan. Let's talk after standup today @angela97lin @freddyaboulton !
@angela97lin should this be marked as fixing #911 in the PR body?
@dsherry Yup, thanks for catching that. Updated!
@angela97lin Looks good! But I'm concerned that if we don't make a more "sustainable" fix in the near future, we'll keep running into this issue. I propose that we rethink how the repo is laid out - the solution I have in mind is pulling components out of pipelines and having it be a stand-alone module. This drastically changes the scope of this issue, so we should file a new issue and discuss more possible solutions.
What are your thoughts?
Closes #911
Although the original issue suggests moving list_model_families back to pipelines, this PR simply moves get_estimators to evalml/pipelines/components/utils.py, avoiding circular dependencies by deferring the import until inside the function. Alternatively, we could combine the modules?