-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
363 common method for registering components #898
363 common method for registering components #898
Conversation
Codecov Report
@@ Coverage Diff @@
## master #898 +/- ##
==========================================
- Coverage 99.83% 99.83% -0.01%
==========================================
Files 168 168
Lines 8352 8331 -21
==========================================
- Hits 8338 8317 -21
Misses 14 14
Continue to review full report at Codecov.
|
83cb4f2
to
0adc224
Compare
52370da
to
bb62700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just discussed with Freddy, agreed on plan for changes. Its close though!
evalml/utils/gen_utils.py
Outdated
|
||
return classes | ||
|
||
return get_all_classes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra level of indirection here? Why not just return the list of classes?
evalml/utils/gen_utils.py
Outdated
classes = [] | ||
for cls in all_classes: | ||
try: | ||
cls(*args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not blocking but perhaps we should just update pipelines to have parameters
default to an empty dict. Not sure there's a good reason why we require that for pipelines. What if you want a pipeline with default params? The way to get that would be to pass in an empty dict.
Could file separately
subclasses = [] | ||
|
||
while classes_to_check: | ||
subclass = classes_to_check.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we do this:
# __module__ == 'evalml.pipelines.components'
# __name__ == 'MyComponent'
if not 'evalml.pipelines' in subclass.__module__:
continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
logger.debug('Estimator {} failed import, withholding from all_estimators'.format(estimator_name)) | ||
return estimators | ||
|
||
|
||
def get_estimators(problem_type, model_families=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also move this to evalml/pipelines/components/utils.py
, to be alongside the all_estimators
list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, this leads to a circular dependency becausemodel_family
depends on pipeline_base.py
, which depends on pipelines/components/utils
which depends on model_family
evalml/pipelines/components/utils.py
Outdated
_estimator_message = 'Estimator {} failed import, withholding from all_estimators' | ||
_all_estimators = get_importable_subclasses(Estimator, args=[], | ||
message=_estimator_message, | ||
used_in_automl=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In discussion with Freddy. We are including this so that our tests can look up the components which we don't currently support in automl, e.g. extratrees, elasticnet, and baseline estimators. We hope to delete this in the future once those are supported for automl.
evalml/pipelines/components/utils.py
Outdated
used_in_automl=False) | ||
all_estimators_used_in_search = get_importable_subclasses(Estimator, args=[], | ||
message=_estimator_message, | ||
used_in_automl=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In discussion with Freddy. Let's keep all_estimators_used_in_search
private for now. Once we add all components to automl, we won't need to draw a distinction between those in our codebase vs those in our codebase supported for automl. And at that point we'll want to rename this. So, keeping private.
Use-cases to support:
Plan for this PR:
|
bb62700
to
e475845
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsherry This should be good for a second look now! The one thing I wanted to call out is that I saw another circular dependency issue with model_family and pipeline utils. I resolved this by not importing the pipeline utils in pipelines/__init__.py
and I left a comment explaining my reasoning. This change had some minor impacts on the api_reference.rst
file but apart from that, I kept pretty close to what we discussed earlier.
@@ -16,7 +16,7 @@ | |||
"import evalml\n", | |||
"from evalml import AutoMLSearch\n", | |||
"from evalml.demos import load_diabetes\n", | |||
"from evalml.pipelines import PipelineBase, get_pipelines\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed get_pipelines
because it is not used.
@@ -59,14 +59,7 @@ | |||
BaselineRegressionPipeline, | |||
MeanBaselineRegressionPipeline | |||
) | |||
from .utils import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either the model family utils or the pipeline utils are exposed at the top level of their respective modules but not both.
If both are exposed at the top level, we will get circular dependency errors. The reason is as follows:
model_family
is imported first. This needs to importpipelines/components/utils
because of the line fromevalml.pipelines.components.utils
import_all_estimators_used_in_search
- Before you can import
components/utils
, you need to runpipelines/__init__.py
If you expose, pipeline utils inpipelines/__init__.py
, then you will need to importmodel_family
which takes you full circle.
I decided to not expose the pipeline utils because they are only used by automl search internally and so users are not likely to do from evalml.pipelines.get_pipeline
(at least for now) whereas model_family.list_model_families
feels more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, yeah sounds good to me. We can always change this later if there's a good reason to do so.
Also, there seems to be another issue with code cov like the one I brought up at stand up earliear this week. I reduced the hits and lines by the same amount and now I fail project coverage. I can't find any dark red lines in my diff though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton looks good!
I think we should wait to merge this until @angela97lin mergeas #904 . Unfortunately it looks like there will be conflicts to resolve but hopefully not too bad
all_pipelines | ||
get_pipelines | ||
get_estimators | ||
make_pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton why make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pipelines.utils
are no longer listed in pipelines/__init__.py
, we need to change the current module so that sphinx can build the docs.
@@ -59,14 +59,7 @@ | |||
BaselineRegressionPipeline, | |||
MeanBaselineRegressionPipeline | |||
) | |||
from .utils import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, yeah sounds good to me. We can always change this later if there's a good reason to do so.
_all_estimators = get_importable_subclasses(Estimator, args=[], used_in_automl=False) | ||
_all_estimators_used_in_search = get_importable_subclasses(Estimator, args=[], used_in_automl=True) | ||
_all_transformers = get_importable_subclasses(Transformer, args=[], used_in_automl=False) | ||
all_components = _all_estimators + _all_transformers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this is great.
evalml/pipelines/utils.py
Outdated
pipeline_name = pipeline_class.name | ||
logger.debug('Pipeline {} failed import, withholding from all_pipelines'.format(pipeline_name)) | ||
return pipelines | ||
all_pipelines = get_importable_subclasses(PipelineBase, args=[{}], used_in_automl=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once #904 is merged, we can delete this, and we can then also delete the args
input.
components = all_components() | ||
for component_name, component_class in components.items(): | ||
components = all_components | ||
for component_class in components: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make into one line
class ChildClass(ComponentBase): | ||
pass | ||
|
||
assert ChildClass not in get_importable_subclasses(ComponentBase, args=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
Pull Request Description
Addresses #363 by creating a function called
get_importable_subclasses
for creating a list of all non-base-class subclasses ofTransformer
,Estimator
, andPipelineBase
. This function has a flag for filtering for estimators/pipelines used for automl search. It should be removed in #899.After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of
docs/source/changelog.rst
to include this pull request by adding :pr:123
.