-
Notifications
You must be signed in to change notification settings - Fork 91
Adding inverse transform method to pipelines #2256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
842f409 to
a795b19
Compare
| if self.estimator.predict_uses_y: | ||
| y_arg = y | ||
| predictions = self.estimator.predict(features_no_nan, y_arg).to_series() | ||
| predictions.index = y.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The polynomial detrender needs the predictions to have an index of the same type as the data it was fit on.
I think this is another reason to do #1639
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line equivalent to using this function? It seems nit picky and trivial, but I feel like I've personally been bitten with weird behavior when I don't use pandas' established API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think unfortunately in this case reindex_like is not avaliable for series, but even if I'm wrong it'll fill NaNs in the places where the indices don't match. I think reindex does the same thing.
| return {} | ||
|
|
||
|
|
||
| class TargetTransformer(Transformer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need this abstraction to make it clear which components we expect to be able to invert. It also makes the implementation of ComponentGraph.from_list much easier (imo) without having to make every component return both X and y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
Codecov Report
@@ Coverage Diff @@
## main #2256 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 281 281
Lines 24607 24719 +112
=======================================
+ Hits 24578 24690 +112
Misses 29 29
Continue to review full report at Codecov.
|
0c52447 to
2c640ca
Compare
| y_arg = y | ||
| predictions = self.estimator.predict(features_no_nan, y_arg).to_series() | ||
| predictions.index = y.index | ||
| predictions = self.inverse_transform(predictions).to_series() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the time series regression pipeline is the only pipeline that uses the inverse_transform method.
As we add more target transformers (target encoder as a component, #914), I expect the other pipeline classes to call the inverse_transform method.
| return infer_feature_types(y / 2) | ||
|
|
||
|
|
||
| class SubsetData(Transformer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simulate a transformer that modifies the target but is not a target transformer, e.g. a sampler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to add this to the class' docstring so that if someone needs to write a new test, they might be able to reuse this.
2c640ca to
e6f7068
Compare
7b67aa8 to
180b2dc
Compare
chukarsten
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like great work!
evalml/pipelines/component_graph.py
Outdated
| sampler_index = next(iter([i for i, tup in enumerate(without_inverse_transform) if "sampler" in tup[0]]), -1) | ||
| component_dict[without_inverse_transform[sampler_index][0]].append(f"{with_inverse_transform[-1][0]}.y") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hahaha, oh my dude. This is some big brain stuff.
evalml/pipelines/component_graph.py
Outdated
| def _get_parent_y(self, component_name): | ||
| """Helper for inverse_transform method.""" | ||
| parents = self.get_parents(component_name) | ||
| return next(iter(p[:-2] for p in parents if '.y' in p), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the -2 so that we capture the string of the parent? I'm a little about what happens if we change all these ".y's"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think our component graph would break but it would be covered by unit tests at least. I think if a user does something crazy like .z or something, an exception about the graph not being connected should be raised. Maybe we can make that clearer in the future.
| return {} | ||
|
|
||
|
|
||
| class TargetTransformer(Transformer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
| if self.estimator.predict_uses_y: | ||
| y_arg = y | ||
| predictions = self.estimator.predict(features_no_nan, y_arg).to_series() | ||
| predictions.index = y.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line equivalent to using this function? It seems nit picky and trivial, but I feel like I've personally been bitten with weird behavior when I don't use pandas' established API
| return infer_feature_types(y / 2) | ||
|
|
||
|
|
||
| class SubsetData(Transformer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to add this to the class' docstring so that if someone needs to write a new test, they might be able to reuse this.
| component_graphs = [(ComponentGraph({"Imputer": [Imputer], | ||
| "Log": [LogTransform], | ||
| "Random Forest": ["Random Forest Regressor", "Imputer.x", "Log.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series()))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Random Forest": ["Random Forest Regressor", "Imputer.x", "Double.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 2))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "Log": [LogTransform, "Imputer.x"], | ||
| "Double": [DoubleTransform, "Log.x", "Log.y"], | ||
| "Random Forest": ["Random Forest Regressor", "Double.x", "Double.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 2))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "OneHot": [OneHotEncoder, "Imputer.x"], | ||
| "DateTime": [DateTimeFeaturizer, "OneHot.x"], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Random Forest": ["Random Forest Regressor", "DateTime.x", "Double.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 2))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "OneHot": [OneHotEncoder, "Imputer.x"], | ||
| "DateTime": [DateTimeFeaturizer, "OneHot.x"], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Double2": [DoubleTransform, "Double.y"], | ||
| "Random Forest": ["Random Forest Regressor", "DateTime.x", "Double2.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 4))), | ||
| (ComponentGraph({"Imputer": ['Imputer'], | ||
| "Double": [DoubleTransform], | ||
| "DateTime 1": ["DateTime Featurization Component", "Imputer"], | ||
| "ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"], | ||
| "Double 2": [DoubleTransform], | ||
| "DateTime 2": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 3": [DoubleTransform, "Double 2.y"], | ||
| "RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"], | ||
| "DateTime 3": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 4": [DoubleTransform], | ||
| "Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"], | ||
| "Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET", 'Double 3.y']}), | ||
| lambda y: infer_feature_types(y.to_series() / 4)), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "OneHot": [OneHotEncoder, "Imputer.x"], | ||
| "DateTime": [DateTimeFeaturizer, "OneHot.x"], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Double2": [DoubleTransform, "Double.y"], | ||
| "Subset": [SubsetData, "DateTime.x", "Double2.y"], | ||
| "Random Forest": ["Random Forest Regressor", "Subset.x", "Subset.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 4))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "Random Forest": ["Random Forest Regressor", "Imputer.x"]}), | ||
| lambda y: y), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "DateTime": [DateTimeFeaturizer, "Imputer.x"], | ||
| "OneHot": [OneHotEncoder, "DateTime.x"], | ||
| "Random Forest": ["Random Forest Regressor", "OneHot.x"]}), | ||
| lambda y: y), | ||
| (ComponentGraph({"Random Forest": ["Random Forest Regressor"]}), | ||
| lambda y: y), | ||
| (ComponentGraph({"Imputer": ['Imputer'], | ||
| "Double": [DoubleTransform], | ||
| "DateTime 1": ["DateTime Featurization Component", "Imputer"], | ||
| "ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"], | ||
| "Double 2": [DoubleTransform], | ||
| "DateTime 2": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 3": [DoubleTransform, "Double 2.y"], | ||
| "RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"], | ||
| "DateTime 3": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 4": [DoubleTransform], | ||
| "Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"], | ||
| "Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET"]}), | ||
| lambda y: y) | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, my dude...nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
holy potates
180b2dc to
1dbecbf
Compare
bchen1116
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments and questions just for my own clarification purposes. The tests look great!
| raise ComponentNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.') | ||
| elif method.__name__ == 'inverse_transform': | ||
| return method(self, X, y) | ||
| return method(self, X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't we return method(self, y)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since inverse transform only takes one argument, the y is actually "called" X in this piece of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow that's trippy... Any chance you can leave that as a comment for future notes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do!
evalml/pipelines/pipeline_base.py
Outdated
| Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd). | ||
| Arguments: | ||
| y: (pd.Series, ww.DataTable): Final component features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit pick (no colon after y):
y (pd.Series, ww.DataTable): Final component features
evalml/pipelines/pipeline_meta.py
Outdated
| return method(self, X) | ||
| elif method.__name__ == 'predict': | ||
| return method(self, X, objective) | ||
| elif method.__name__ == 'inverse_transform': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above?
|
|
||
| @pytest.mark.parametrize("component_list,answer", lists) | ||
| def test_from_list_with_target_transformers(component_list, answer): | ||
| assert ComponentGraph.from_list(component_list).component_dict == answer.component_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are 👌
ed17111 to
f54bc19
Compare
| * Added ``inverse_transform`` method to pipelines :pr:`2256`` | ||
| * Fixes | ||
| * Changes | ||
| * Updated logging information in ``AutoMLSearch.__init__`` to clarify pipeline generation :pr:`2263` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ParthivNaresh I'm moving this from the 0.24.1 release to the future release!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton Sure thing!
ParthivNaresh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent job with the tests, looks great!
| * Added ``inverse_transform`` method to pipelines :pr:`2256`` | ||
| * Fixes | ||
| * Changes | ||
| * Updated logging information in ``AutoMLSearch.__init__`` to clarify pipeline generation :pr:`2263` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton Sure thing!
evalml/pipelines/component_graph.py
Outdated
| # If there are target transformers present, connect them together and then pass the final output | ||
| # to the sampler (if present) or the final estimator | ||
| with_inverse_transform = list(filter(lambda tup: issubclass(tup[1], TargetTransformer), component_list)) | ||
| without_inverse_transform = list(filter(lambda tup: not issubclass(tup[1], TargetTransformer), component_list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done
bchen1116
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
69dfa01 to
94b690d
Compare
angela97lin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nitpicky comments but otherwise, really cool stuff!!
evalml/pipelines/component_graph.py
Outdated
| Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd). | ||
| Arguments: | ||
| y: (pd.Series, ww.DataTable): Final component features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo, DataColumn?
| class PolynomialDetrender(TargetTransformer): | ||
| """Removes trends from time series by fitting a polynomial to the data.""" | ||
| name = "Polynomial Detrender" | ||
| _defines_inverse_transform = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to get rid of this!
| """A component that transforms the target.""" | ||
|
|
||
| @abstractmethod | ||
| def inverse_transform(self, y): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be useful in docstring here to specify arguments + what we expect to return for when people override :)
evalml/pipelines/pipeline_base.py
Outdated
| Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd). | ||
| Arguments: | ||
| y (pd.Series, ww.DataTable): Final component features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataColumn? :O
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
| component_graphs = [(ComponentGraph({"Imputer": [Imputer], | ||
| "Log": [LogTransform], | ||
| "Random Forest": ["Random Forest Regressor", "Imputer.x", "Log.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series()))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Random Forest": ["Random Forest Regressor", "Imputer.x", "Double.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 2))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "Log": [LogTransform, "Imputer.x"], | ||
| "Double": [DoubleTransform, "Log.x", "Log.y"], | ||
| "Random Forest": ["Random Forest Regressor", "Double.x", "Double.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 2))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "OneHot": [OneHotEncoder, "Imputer.x"], | ||
| "DateTime": [DateTimeFeaturizer, "OneHot.x"], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Random Forest": ["Random Forest Regressor", "DateTime.x", "Double.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 2))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "OneHot": [OneHotEncoder, "Imputer.x"], | ||
| "DateTime": [DateTimeFeaturizer, "OneHot.x"], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Double2": [DoubleTransform, "Double.y"], | ||
| "Random Forest": ["Random Forest Regressor", "DateTime.x", "Double2.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 4))), | ||
| (ComponentGraph({"Imputer": ['Imputer'], | ||
| "Double": [DoubleTransform], | ||
| "DateTime 1": ["DateTime Featurization Component", "Imputer"], | ||
| "ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"], | ||
| "Double 2": [DoubleTransform], | ||
| "DateTime 2": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 3": [DoubleTransform, "Double 2.y"], | ||
| "RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"], | ||
| "DateTime 3": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 4": [DoubleTransform], | ||
| "Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"], | ||
| "Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET", 'Double 3.y']}), | ||
| lambda y: infer_feature_types(y.to_series() / 4)), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "OneHot": [OneHotEncoder, "Imputer.x"], | ||
| "DateTime": [DateTimeFeaturizer, "OneHot.x"], | ||
| "Log": [LogTransform], | ||
| "Double": [DoubleTransform, "Log.y"], | ||
| "Double2": [DoubleTransform, "Double.y"], | ||
| "Subset": [SubsetData, "DateTime.x", "Double2.y"], | ||
| "Random Forest": ["Random Forest Regressor", "Subset.x", "Subset.y"]}), | ||
| lambda y: infer_feature_types(np.exp(y.to_series() / 4))), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "Random Forest": ["Random Forest Regressor", "Imputer.x"]}), | ||
| lambda y: y), | ||
| (ComponentGraph({"Imputer": [Imputer], | ||
| "DateTime": [DateTimeFeaturizer, "Imputer.x"], | ||
| "OneHot": [OneHotEncoder, "DateTime.x"], | ||
| "Random Forest": ["Random Forest Regressor", "OneHot.x"]}), | ||
| lambda y: y), | ||
| (ComponentGraph({"Random Forest": ["Random Forest Regressor"]}), | ||
| lambda y: y), | ||
| (ComponentGraph({"Imputer": ['Imputer'], | ||
| "Double": [DoubleTransform], | ||
| "DateTime 1": ["DateTime Featurization Component", "Imputer"], | ||
| "ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"], | ||
| "Double 2": [DoubleTransform], | ||
| "DateTime 2": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 3": [DoubleTransform, "Double 2.y"], | ||
| "RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"], | ||
| "DateTime 3": ["DateTime Featurization Component", "Imputer"], | ||
| "Double 4": [DoubleTransform], | ||
| "Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"], | ||
| "Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET"]}), | ||
| lambda y: y) | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
holy potates
b17e826 to
df40b9a
Compare
8705adf to
d9e484d
Compare
d9e484d to
a67528c
Compare
Pull Request Description
Fixes #2046
Follows the recommendation and implementation plan outlined in the design document
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rstto include this pull request by adding :pr:123.