Skip to content

Conversation

@freddyaboulton
Copy link
Contributor

@freddyaboulton freddyaboulton commented May 11, 2021

Pull Request Description

Fixes #2046

Follows the recommendation and implementation plan outlined in the design document


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch from 842f409 to a795b19 Compare May 11, 2021 22:26
if self.estimator.predict_uses_y:
y_arg = y
predictions = self.estimator.predict(features_no_nan, y_arg).to_series()
predictions.index = y.index
Copy link
Contributor Author

@freddyaboulton freddyaboulton May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The polynomial detrender needs the predictions to have an index of the same type as the data it was fit on.
I think this is another reason to do #1639

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line equivalent to using this function? It seems nit picky and trivial, but I feel like I've personally been bitten with weird behavior when I don't use pandas' established API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unfortunately in this case reindex_like is not avaliable for series, but even if I'm wrong it'll fill NaNs in the places where the indices don't match. I think reindex does the same thing.

return {}


class TargetTransformer(Transformer):
Copy link
Contributor Author

@freddyaboulton freddyaboulton May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this abstraction to make it clear which components we expect to be able to invert. It also makes the implementation of ComponentGraph.from_list much easier (imo) without having to make every component return both X and y.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@codecov
Copy link

codecov bot commented May 12, 2021

Codecov Report

Merging #2256 (a67528c) into main (432f876) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2256     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        281     281             
  Lines      24607   24719    +112     
=======================================
+ Hits       24578   24690    +112     
  Misses        29      29             
Impacted Files Coverage Δ
evalml/pipelines/component_graph.py 100.0% <100.0%> (ø)
evalml/pipelines/components/component_base_meta.py 100.0% <100.0%> (ø)
...transformers/preprocessing/polynomial_detrender.py 97.6% <100.0%> (ø)
...l/pipelines/components/transformers/transformer.py 100.0% <100.0%> (ø)
evalml/pipelines/pipeline_base.py 99.7% <100.0%> (+0.1%) ⬆️
evalml/pipelines/pipeline_meta.py 100.0% <100.0%> (ø)
...valml/pipelines/time_series_regression_pipeline.py 100.0% <100.0%> (ø)
...tests/component_tests/test_polynomial_detrender.py 100.0% <100.0%> (ø)
...valml/tests/pipeline_tests/test_component_graph.py 100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 432f876...a67528c. Read the comment docs.

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch 2 times, most recently from 0c52447 to 2c640ca Compare May 13, 2021 14:09
y_arg = y
predictions = self.estimator.predict(features_no_nan, y_arg).to_series()
predictions.index = y.index
predictions = self.inverse_transform(predictions).to_series()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the time series regression pipeline is the only pipeline that uses the inverse_transform method.

As we add more target transformers (target encoder as a component, #914), I expect the other pipeline classes to call the inverse_transform method.

return infer_feature_types(y / 2)


class SubsetData(Transformer):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simulate a transformer that modifies the target but is not a target transformer, e.g. a sampler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add this to the class' docstring so that if someone needs to write a new test, they might be able to reuse this.

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch from 2c640ca to e6f7068 Compare May 13, 2021 17:01
@freddyaboulton freddyaboulton marked this pull request as ready for review May 13, 2021 20:09
@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch 2 times, most recently from 7b67aa8 to 180b2dc Compare May 14, 2021 17:50
Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like great work!

Comment on lines 105 to 106
sampler_index = next(iter([i for i, tup in enumerate(without_inverse_transform) if "sampler" in tup[0]]), -1)
component_dict[without_inverse_transform[sampler_index][0]].append(f"{with_inverse_transform[-1][0]}.y")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha, oh my dude. This is some big brain stuff.

def _get_parent_y(self, component_name):
"""Helper for inverse_transform method."""
parents = self.get_parents(component_name)
return next(iter(p[:-2] for p in parents if '.y' in p), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the -2 so that we capture the string of the parent? I'm a little about what happens if we change all these ".y's"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think our component graph would break but it would be covered by unit tests at least. I think if a user does something crazy like .z or something, an exception about the graph not being connected should be raised. Maybe we can make that clearer in the future.

return {}


class TargetTransformer(Transformer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

if self.estimator.predict_uses_y:
y_arg = y
predictions = self.estimator.predict(features_no_nan, y_arg).to_series()
predictions.index = y.index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line equivalent to using this function? It seems nit picky and trivial, but I feel like I've personally been bitten with weird behavior when I don't use pandas' established API

return infer_feature_types(y / 2)


class SubsetData(Transformer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add this to the class' docstring so that if someone needs to write a new test, they might be able to reuse this.

Comment on lines 904 to 978
component_graphs = [(ComponentGraph({"Imputer": [Imputer],
"Log": [LogTransform],
"Random Forest": ["Random Forest Regressor", "Imputer.x", "Log.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series()))),
(ComponentGraph({"Imputer": [Imputer],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Random Forest": ["Random Forest Regressor", "Imputer.x", "Double.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 2))),
(ComponentGraph({"Imputer": [Imputer],
"Log": [LogTransform, "Imputer.x"],
"Double": [DoubleTransform, "Log.x", "Log.y"],
"Random Forest": ["Random Forest Regressor", "Double.x", "Double.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 2))),
(ComponentGraph({"Imputer": [Imputer],
"OneHot": [OneHotEncoder, "Imputer.x"],
"DateTime": [DateTimeFeaturizer, "OneHot.x"],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Random Forest": ["Random Forest Regressor", "DateTime.x", "Double.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 2))),
(ComponentGraph({"Imputer": [Imputer],
"OneHot": [OneHotEncoder, "Imputer.x"],
"DateTime": [DateTimeFeaturizer, "OneHot.x"],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Double2": [DoubleTransform, "Double.y"],
"Random Forest": ["Random Forest Regressor", "DateTime.x", "Double2.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 4))),
(ComponentGraph({"Imputer": ['Imputer'],
"Double": [DoubleTransform],
"DateTime 1": ["DateTime Featurization Component", "Imputer"],
"ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"],
"Double 2": [DoubleTransform],
"DateTime 2": ["DateTime Featurization Component", "Imputer"],
"Double 3": [DoubleTransform, "Double 2.y"],
"RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"],
"DateTime 3": ["DateTime Featurization Component", "Imputer"],
"Double 4": [DoubleTransform],
"Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"],
"Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET", 'Double 3.y']}),
lambda y: infer_feature_types(y.to_series() / 4)),
(ComponentGraph({"Imputer": [Imputer],
"OneHot": [OneHotEncoder, "Imputer.x"],
"DateTime": [DateTimeFeaturizer, "OneHot.x"],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Double2": [DoubleTransform, "Double.y"],
"Subset": [SubsetData, "DateTime.x", "Double2.y"],
"Random Forest": ["Random Forest Regressor", "Subset.x", "Subset.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 4))),
(ComponentGraph({"Imputer": [Imputer],
"Random Forest": ["Random Forest Regressor", "Imputer.x"]}),
lambda y: y),
(ComponentGraph({"Imputer": [Imputer],
"DateTime": [DateTimeFeaturizer, "Imputer.x"],
"OneHot": [OneHotEncoder, "DateTime.x"],
"Random Forest": ["Random Forest Regressor", "OneHot.x"]}),
lambda y: y),
(ComponentGraph({"Random Forest": ["Random Forest Regressor"]}),
lambda y: y),
(ComponentGraph({"Imputer": ['Imputer'],
"Double": [DoubleTransform],
"DateTime 1": ["DateTime Featurization Component", "Imputer"],
"ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"],
"Double 2": [DoubleTransform],
"DateTime 2": ["DateTime Featurization Component", "Imputer"],
"Double 3": [DoubleTransform, "Double 2.y"],
"RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"],
"DateTime 3": ["DateTime Featurization Component", "Imputer"],
"Double 4": [DoubleTransform],
"Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"],
"Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET"]}),
lambda y: y)
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, my dude...nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy potates

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch from 180b2dc to 1dbecbf Compare May 18, 2021 15:04
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments and questions just for my own clarification purposes. The tests look great!

raise ComponentNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
elif method.__name__ == 'inverse_transform':
return method(self, X, y)
return method(self, X)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we return method(self, y)?

Copy link
Contributor Author

@freddyaboulton freddyaboulton May 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since inverse transform only takes one argument, the y is actually "called" X in this piece of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow that's trippy... Any chance you can leave that as a comment for future notes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).
Arguments:
y: (pd.Series, ww.DataTable): Final component features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick (no colon after y):
y (pd.Series, ww.DataTable): Final component features

return method(self, X)
elif method.__name__ == 'predict':
return method(self, X, objective)
elif method.__name__ == 'inverse_transform':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above?


@pytest.mark.parametrize("component_list,answer", lists)
def test_from_list_with_target_transformers(component_list, answer):
assert ComponentGraph.from_list(component_list).component_dict == answer.component_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are 👌

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch from ed17111 to f54bc19 Compare May 18, 2021 17:17
* Added ``inverse_transform`` method to pipelines :pr:`2256``
* Fixes
* Changes
* Updated logging information in ``AutoMLSearch.__init__`` to clarify pipeline generation :pr:`2263`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ParthivNaresh I'm moving this from the 0.24.1 release to the future release!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton Sure thing!

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent job with the tests, looks great!

* Added ``inverse_transform`` method to pipelines :pr:`2256``
* Fixes
* Changes
* Updated logging information in ``AutoMLSearch.__init__`` to clarify pipeline generation :pr:`2263`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton Sure thing!

# If there are target transformers present, connect them together and then pass the final output
# to the sampler (if present) or the final estimator
with_inverse_transform = list(filter(lambda tup: issubclass(tup[1], TargetTransformer), component_list))
without_inverse_transform = list(filter(lambda tup: not issubclass(tup[1], TargetTransformer), component_list))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch 3 times, most recently from 69dfa01 to 94b690d Compare May 19, 2021 14:32
Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nitpicky comments but otherwise, really cool stuff!!

Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).
Arguments:
y: (pd.Series, ww.DataTable): Final component features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, DataColumn?

class PolynomialDetrender(TargetTransformer):
"""Removes trends from time series by fitting a polynomial to the data."""
name = "Polynomial Detrender"
_defines_inverse_transform = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to get rid of this!

"""A component that transforms the target."""

@abstractmethod
def inverse_transform(self, y):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be useful in docstring here to specify arguments + what we expect to return for when people override :)

Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).
Arguments:
y (pd.Series, ww.DataTable): Final component features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataColumn? :O

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Comment on lines 904 to 978
component_graphs = [(ComponentGraph({"Imputer": [Imputer],
"Log": [LogTransform],
"Random Forest": ["Random Forest Regressor", "Imputer.x", "Log.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series()))),
(ComponentGraph({"Imputer": [Imputer],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Random Forest": ["Random Forest Regressor", "Imputer.x", "Double.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 2))),
(ComponentGraph({"Imputer": [Imputer],
"Log": [LogTransform, "Imputer.x"],
"Double": [DoubleTransform, "Log.x", "Log.y"],
"Random Forest": ["Random Forest Regressor", "Double.x", "Double.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 2))),
(ComponentGraph({"Imputer": [Imputer],
"OneHot": [OneHotEncoder, "Imputer.x"],
"DateTime": [DateTimeFeaturizer, "OneHot.x"],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Random Forest": ["Random Forest Regressor", "DateTime.x", "Double.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 2))),
(ComponentGraph({"Imputer": [Imputer],
"OneHot": [OneHotEncoder, "Imputer.x"],
"DateTime": [DateTimeFeaturizer, "OneHot.x"],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Double2": [DoubleTransform, "Double.y"],
"Random Forest": ["Random Forest Regressor", "DateTime.x", "Double2.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 4))),
(ComponentGraph({"Imputer": ['Imputer'],
"Double": [DoubleTransform],
"DateTime 1": ["DateTime Featurization Component", "Imputer"],
"ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"],
"Double 2": [DoubleTransform],
"DateTime 2": ["DateTime Featurization Component", "Imputer"],
"Double 3": [DoubleTransform, "Double 2.y"],
"RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"],
"DateTime 3": ["DateTime Featurization Component", "Imputer"],
"Double 4": [DoubleTransform],
"Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"],
"Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET", 'Double 3.y']}),
lambda y: infer_feature_types(y.to_series() / 4)),
(ComponentGraph({"Imputer": [Imputer],
"OneHot": [OneHotEncoder, "Imputer.x"],
"DateTime": [DateTimeFeaturizer, "OneHot.x"],
"Log": [LogTransform],
"Double": [DoubleTransform, "Log.y"],
"Double2": [DoubleTransform, "Double.y"],
"Subset": [SubsetData, "DateTime.x", "Double2.y"],
"Random Forest": ["Random Forest Regressor", "Subset.x", "Subset.y"]}),
lambda y: infer_feature_types(np.exp(y.to_series() / 4))),
(ComponentGraph({"Imputer": [Imputer],
"Random Forest": ["Random Forest Regressor", "Imputer.x"]}),
lambda y: y),
(ComponentGraph({"Imputer": [Imputer],
"DateTime": [DateTimeFeaturizer, "Imputer.x"],
"OneHot": [OneHotEncoder, "DateTime.x"],
"Random Forest": ["Random Forest Regressor", "OneHot.x"]}),
lambda y: y),
(ComponentGraph({"Random Forest": ["Random Forest Regressor"]}),
lambda y: y),
(ComponentGraph({"Imputer": ['Imputer'],
"Double": [DoubleTransform],
"DateTime 1": ["DateTime Featurization Component", "Imputer"],
"ET": ["Extra Trees Regressor", "DateTime 1.x", "Double.y"],
"Double 2": [DoubleTransform],
"DateTime 2": ["DateTime Featurization Component", "Imputer"],
"Double 3": [DoubleTransform, "Double 2.y"],
"RandomForest": ["Random Forest Regressor", "DateTime 2.x", "Double 3.y"],
"DateTime 3": ["DateTime Featurization Component", "Imputer"],
"Double 4": [DoubleTransform],
"Linear": ["Linear Regressor", "DateTime 3.x", "Double 4.y"],
"Logistic Regression": ["Linear Regressor", "Linear", "RandomForest", "ET"]}),
lambda y: y)
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy potates

@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch 3 times, most recently from b17e826 to df40b9a Compare May 25, 2021 16:11
@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch 2 times, most recently from 8705adf to d9e484d Compare June 7, 2021 16:59
@freddyaboulton freddyaboulton force-pushed the 2046-inverse-transform branch from d9e484d to a67528c Compare June 7, 2021 20:47
@freddyaboulton freddyaboulton merged commit 9c414d3 into main Jun 7, 2021
@freddyaboulton freddyaboulton deleted the 2046-inverse-transform branch June 7, 2021 21:53
This was referenced Jun 9, 2021
@chukarsten chukarsten mentioned this pull request Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PipelineBase inverse_transform method

6 participants