-
Notifications
You must be signed in to change notification settings - Fork 91
Include target name for pipeline predict #1578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1578 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 240 240
Lines 18409 18476 +67
=========================================
+ Hits 18401 18468 +67
Misses 8 8
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin I think this looks good! I think we should add input_target_name
to the api ref (and input_feature_names
as well?) because it's currently not documented anywhere.
@@ -271,6 +271,13 @@ def test_pad_with_nans(data, num_to_pad, expected): | |||
_check_equality(padded, expected) | |||
|
|||
|
|||
def test_pad_with_nans_with_series_name(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this test. I think this gives us coverage in the case where gap/max_delay != 0
@@ -57,6 +57,7 @@ def fit(self, X, y): | |||
X, y = self._convert_to_woodwork(X, y) | |||
X = _convert_woodwork_types_wrapper(X.to_dataframe()) | |||
y = _convert_woodwork_types_wrapper(y.to_series()) | |||
self.input_target_name = y.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've been looking at this for a bit now and I've been stressing over why grand-child class TimeSeriesClassificationPipeline and TimeSeriesRegressionPipeline don't just inherit the renaming code from the grand-parent PipelineBase class. I guess it's because the grand-children don't invoke a private ._fit function, but I might be missing something. Regardless, I think this code seems to accomplish what it set out to accomplish but I also think it might open up a discussion on bringing the TimeSeries pipes into compliance with the existing paradigm or deciding on.
Just to throw something out there, does it make sense for ._fit() to do it's own special fitting, be it with the component_graph, or the encoding and computing features and fitting and then have .fit() in the base class capture the name of the target and be responsible for the naming? It seems that the behavior of .fit() should have the commonalities between the derivative classes while ._fit() should contain variations relative to that specific pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten Yeah, I hear ya, and I think your intuition is right in that our pipeline code is a bit messy. Just deleted some stuff which hopefully makes this more clear but to summarize, we currently have:
- `fit() is an abstract method of PipelineBase
_fit()
used by Classification/RegressionPipeline classes, where I addedinput_feature_names
_compute_features_during_fit()
is used by the TimeSeriesPipeline classes, where I also addedinput_feature_names
Closes #1493