Skip to content

Include target name for pipeline predict #1578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jan 7, 2021
Merged

Conversation

angela97lin
Copy link
Contributor

Closes #1493

@angela97lin angela97lin self-assigned this Dec 17, 2020
@codecov
Copy link

codecov bot commented Dec 18, 2020

Codecov Report

Merging #1578 (917344e) into main (417091c) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1578     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      240             
  Lines       18409    18476     +67     
=========================================
+ Hits        18401    18468     +67     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/pipelines/classification_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/pipeline_base.py 100.0% <100.0%> (ø)
.../pipelines/time_series_classification_pipelines.py 100.0% <100.0%> (ø)
...valml/pipelines/time_series_regression_pipeline.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.0% <100.0%> (ø)
evalml/tests/utils_tests/test_gen_utils.py 100.0% <100.0%> (ø)
evalml/utils/gen_utils.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 417091c...917344e. Read the comment docs.

@angela97lin angela97lin marked this pull request as ready for review January 5, 2021 19:28
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin I think this looks good! I think we should add input_target_name to the api ref (and input_feature_names as well?) because it's currently not documented anywhere.

@@ -271,6 +271,13 @@ def test_pad_with_nans(data, num_to_pad, expected):
_check_equality(padded, expected)


def test_pad_with_nans_with_series_name():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this test. I think this gives us coverage in the case where gap/max_delay != 0

@@ -57,6 +57,7 @@ def fit(self, X, y):
X, y = self._convert_to_woodwork(X, y)
X = _convert_woodwork_types_wrapper(X.to_dataframe())
y = _convert_woodwork_types_wrapper(y.to_series())
self.input_target_name = y.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've been looking at this for a bit now and I've been stressing over why grand-child class TimeSeriesClassificationPipeline and TimeSeriesRegressionPipeline don't just inherit the renaming code from the grand-parent PipelineBase class. I guess it's because the grand-children don't invoke a private ._fit function, but I might be missing something. Regardless, I think this code seems to accomplish what it set out to accomplish but I also think it might open up a discussion on bringing the TimeSeries pipes into compliance with the existing paradigm or deciding on.

Just to throw something out there, does it make sense for ._fit() to do it's own special fitting, be it with the component_graph, or the encoding and computing features and fitting and then have .fit() in the base class capture the name of the target and be responsible for the naming? It seems that the behavior of .fit() should have the commonalities between the derivative classes while ._fit() should contain variations relative to that specific pipeline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chukarsten Yeah, I hear ya, and I think your intuition is right in that our pipeline code is a bit messy. Just deleted some stuff which hopefully makes this more clear but to summarize, we currently have:

  • `fit() is an abstract method of PipelineBase
  • _fit() used by Classification/RegressionPipeline classes, where I added input_feature_names
  • _compute_features_during_fit() is used by the TimeSeriesPipeline classes, where I also added input_feature_names

@angela97lin angela97lin merged commit 77eb1c2 into main Jan 7, 2021
@angela97lin angela97lin deleted the 1493_include_target_name branch January 7, 2021 00:52
@bchen1116 bchen1116 mentioned this pull request Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline/Estimator predict: include the target name
3 participants