
Add universal error for predict/transform before fitting #969

Merged: 29 commits merged into main on Jul 28, 2020

Conversation

@jeremyliweishih (Collaborator) commented Jul 23, 2020

Fixes #851

  • Add metaclass and wrappers (see the sketch after this list)
  • Add basic mock tests
  • Fix component tests
  • Remove random code that checks for it
  • Feature importance
  • Deal with pipelines
  • Fix all tests
  • Add test iterating through all components

File new issues about pipelines/needs_fitting/mocking.
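To make the metaclass-and-wrappers item concrete, here is a minimal sketch of the general pattern. This is an illustration, not this PR's exact code: the method list, the _is_fitted flag name, and the wrapper body are assumptions based on the review discussion below.

from functools import wraps

from evalml.exceptions import ComponentNotYetFittedError


class ComponentBaseMeta(type):
    """Metaclass that wraps predict/transform-style methods with a fitted check."""

    METHODS_TO_CHECK = ['predict', 'predict_proba', 'transform']

    @classmethod
    def check_for_fit(cls, method):
        @wraps(method)
        def _check_for_fit(self, *args, **kwargs):
            # Raise before the underlying implementation ever runs.
            if not getattr(self, '_is_fitted', False):
                raise ComponentNotYetFittedError(
                    f'This {type(self).__name__} is not fitted yet. You must fit '
                    f'{type(self).__name__} before calling {method.__name__}.')
            return method(self, *args, **kwargs)
        return _check_for_fit

    def __new__(mcs, name, bases, namespace):
        # Wrap each checked method defined on the class being created.
        for attr in mcs.METHODS_TO_CHECK:
            if attr in namespace:
                namespace[attr] = mcs.check_for_fit(namespace[attr])
        return super().__new__(mcs, name, bases, namespace)

A component base class would then declare metaclass=ComponentBaseMeta, and its fit would set self._is_fitted = True so that later calls pass the check.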

@jeremyliweishih changed the title from "Js 851 error" to "Add universal error for predict/transform before fitting" on Jul 23, 2020
@codecov (bot) commented Jul 24, 2020

Codecov Report

Merging #969 into main will increase coverage by 0.17%.
The diff coverage is 99.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #969      +/-   ##
==========================================
+ Coverage   99.68%   99.85%   +0.17%     
==========================================
  Files         178      178              
  Lines        9163     9282     +119     
==========================================
+ Hits         9134     9269     +135     
+ Misses         29       13      -16     
Impacted Files Coverage Δ
...ents/estimators/classifiers/baseline_classifier.py 100.00% <ø> (ø)
...onents/estimators/regressors/baseline_regressor.py 100.00% <ø> (ø)
...components/transformers/encoders/onehot_encoder.py 100.00% <ø> (ø)
...components/transformers/imputers/simple_imputer.py 100.00% <ø> (ø)
.../transformers/preprocessing/datetime_featurizer.py 100.00% <ø> (ø)
...ts/transformers/preprocessing/drop_null_columns.py 100.00% <ø> (ø)
.../tests/component_tests/test_baseline_classifier.py 100.00% <ø> (ø)
...l/tests/component_tests/test_baseline_regressor.py 100.00% <ø> (ø)
.../tests/component_tests/test_datetime_featurizer.py 100.00% <ø> (ø)
...ponent_tests/test_drop_null_columns_transformer.py 100.00% <ø> (ø)
... and 19 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b71cb22...4ca2d4d.

@jeremyliweishih (Collaborator, Author)

@dsherry PR should be ready, but there's a weird lint issue going on (it doesn't fail for me locally but fails on CircleCI).

@jeremyliweishih (Collaborator, Author)

@dsherry I think we should give pipelines the same treatment, with metaclass wrappers for predict etc., but I'm pushing that out of the scope of this PR. As it currently stands, if someone calls predict before fit on a pipeline, the error will surface from the component itself.
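For illustration, the behavior described above would look roughly like this (SomeBinaryPipeline and X are hypothetical, and the exception name follows the snippet later in this review):

from evalml.exceptions import ComponentNotYetFittedError

pipeline = SomeBinaryPipeline(parameters={})  # hypothetical pipeline class
try:
    pipeline.predict(X)  # fit was never called
except ComponentNotYetFittedError as err:
    # The error surfaces from the first unfitted component inside the
    # pipeline, not from the pipeline itself.
    print(err)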

@jeremyliweishih jeremyliweishih marked this pull request as ready for review July 24, 2020 20:18
@freddyaboulton (Contributor) left a comment

@jeremyliweishih This is so cool! I've never seen metaclasses in action before. My only blocking comment is about whether there are better alternatives to defining a static NO_FITTING_REQUIRED list. I can see pros and cons, so I'm looking forward to hearing what you think. Apart from that, nothing blocking.

evalml/pipelines/components/wrappers.py (outdated, resolved)
evalml/pipelines/components/wrappers.py (outdated, resolved)
evalml/pipelines/components/component_base.py (outdated, resolved)

from evalml.exceptions import UnfitComponentError

NO_FITTING_REQUIRED = ['DropColumns', 'SelectColumns']
@freddyaboulton (Contributor):

An alternative to this would be adding a _no_fitting_required flag to ComponentBase, like you did with _has_fit. It might be nice to let developers turn the check_for_fit functionality off if they need to, without having them modify this file. What do you think?

Contributor:
@freddyaboulton oh that's a neat idea. So like this?

class NonLearningComponent(Transformer):
    fitting_required = False
    ...

# the call below won't error
NonLearningComponent().transform(X)

I think that's cool. I think it would be good to do this in a separate PR since there's already a lot in here.

@freddyaboulton (Contributor):

Yea that's what I was thinking and it makes sense to move it to another PR.

@jeremyliweishih (Collaborator, Author):

@freddyaboulton I like the idea! We used to have a field on components called needs_fitting; from what I recall, we removed it and assumed that all components need "fitting" even if fitting doesn't do anything. From the components I had to alter (Drop Columns etc.), it seems there are now more components for which it would be clearer to add back a field denoting whether fitting is needed.

I'll file another issue after this PR is merged to add that back in @dsherry!

evalml/tests/component_tests/test_components.py (outdated, resolved)
evalml/pipelines/components/wrappers.py (outdated, resolved)
@dsherry (Contributor) left a comment

@jeremyliweishih wow, it's so cool that you got this working!! Especially for the properties. Very nice 🎉

I think we should keep all the metaclass code in the same file; it's fine if that's component_base.py. I also think we should keep the wrapper-making functions as classmethods on the metaclass.

I left a few naming suggestions. I agree we should delete the old test coverage left over from individual components.

Will approve once we round off those conversations! I left other suggestions but nothing else which was blocking.

        parameters = {'param_a': param_a, 'param_b': param_b}
        super().__init__(parameters=parameters,
                         component_obj=None,
                         random_state=0)

    def fit(self, X, y=None):
        self.is_fitted = True
        pass
Contributor:

I think this will cause issues with codecov. Try

def fit(self, X, y=None):
    """Docstring"""

I vaguely remember talking with @angela97lin a couple months back and realizing this would resolve a codecov issue we were seeing

Contributor:

But it's weird because the exceptions use pass and are fine in codecov! Idk 🤷

Contributor:

Yeah... I remember using docstrings as a way to cover this case after reading this SO post: https://stackoverflow.com/questions/9202723/excluding-abstractproperties-from-coverage-reports

Not sure why exceptions are fine; maybe a guess is the way Python executes code? That is, when we raise an exception, what's within the class gets scanned? Idk, no idea 🤷‍♀️

evalml/exceptions/exceptions.py (resolved)
evalml/pipelines/components/component_base.py (outdated, resolved)
evalml/pipelines/components/component_base.py (outdated, resolved)
evalml/pipelines/components/component_base.py (resolved)
evalml/pipelines/components/wrappers.py (outdated, resolved)
trans.transform(X)


def test_all_components_check_fit(X_y_binary):
Contributor:

@jeremyliweishih what's the runtime of this test? If it's high, we could consider mocking the impl methods.

Contributor:
Because we don't care what happens in the actual fit/transform/predict, just that the wrappers get called correctly
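A sketch of what that mocking could look like, as a self-contained illustration rather than evalml's actual test; MockTransformer, the _is_fitted handling, and the X_y_binary fixture shape are assumptions:

import pytest

from evalml.exceptions import ComponentNotYetFittedError
from evalml.pipelines.components import Transformer


class MockTransformer(Transformer):
    """Transformer whose fit/transform do no real work, so only the wrappers are exercised."""
    name = "Mock Transformer"

    def __init__(self):
        super().__init__(parameters={}, component_obj=None, random_state=0)

    def fit(self, X, y=None):
        """Minimal fit: just record that fit happened."""
        self._is_fitted = True  # the real PR may set this in a wrapper instead
        return self

    def transform(self, X, y=None):
        """No-op transform."""
        return X


def test_transform_checks_fit(X_y_binary):
    X, y = X_y_binary
    component = MockTransformer()
    # Before fit: the metaclass-added wrapper should raise without running the impl.
    with pytest.raises(ComponentNotYetFittedError):
        component.transform(X)
    # After fit: the call goes through, and no real training work was done.
    component.fit(X, y)
    component.transform(X)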

@angela97lin (Contributor) left a comment

👍 Cool stuff!!

Just left a blocking comment about updating tests :D

@freddyaboulton (Contributor) left a comment

@jeremyliweishih This is awesome! Thanks for the hard work on this. My comments have been addressed. The only question I have is whether we need to check the "X is None and y is None" case in check_for_fit.

klass = type(self).__name__
if not self._is_fitted and klass not in cls.NO_FITTING_REQUIRED:
    raise ComponentNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
elif X is None and y is None:
@freddyaboulton (Contributor):

Do we need to check this case?

@jeremyliweishih (Collaborator, Author):

@freddyaboulton I consolidated the property version (that I had before) into this one method, so that branch is needed for properties.
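To make the consolidation concrete, here is a rough sketch of how one wrapper can serve both methods and properties, extending the snippet above (an illustration, not the PR's exact code; a property getter takes only self, so X and y both arriving as None signals property access):

from functools import wraps

# Inside the metaclass; cls is the metaclass, method is the function being wrapped.
@classmethod
def check_for_fit(cls, method):
    @wraps(method)
    def _check_for_fit(self, X=None, y=None):
        klass = type(self).__name__
        if not self._is_fitted and klass not in cls.NO_FITTING_REQUIRED:
            raise ComponentNotYetFittedError(
                f'This {klass} is not fitted yet. You must fit {klass} '
                f'before calling {method.__name__}.')
        if X is None and y is None:
            # Property access (e.g. feature_importance): nothing to pass through.
            return method(self)
        if y is None:
            return method(self, X)
        return method(self, X, y)
    return _check_for_fit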

@freddyaboulton (Contributor):

Got it! Thanks for clarifying.

@jeremyliweishih merged commit ad5a695 into main on Jul 28, 2020
@dsherry (Contributor) left a comment

LGTM! Nice going

Linked issue (may be closed by merging this pull request):
Components and pipelines: standardize error when calling predict/transform before fit
4 participants