Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return transformed target in ComponentGraph.fit_features #2780

Merged

Conversation

freddyaboulton
Copy link
Contributor

@freddyaboulton freddyaboulton commented Sep 14, 2021

Pull Request Description

Fixes #2703

I realized it's easier to update fit_features as opposed to compute_estimator_features. It's also a smaller change this way.

This can go in the release in two weeks. Just opening up for review.


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Sep 14, 2021

Codecov Report

Merging #2780 (dd83f62) into main (2e6b048) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2780     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        298     298             
  Lines      27604   27631     +27     
=======================================
+ Hits       27536   27563     +27     
  Misses        68      68             
Impacted Files Coverage Δ
evalml/pipelines/component_graph.py 99.8% <100.0%> (+0.1%) ⬆️
evalml/pipelines/time_series_pipeline_base.py 99.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 99.8% <100.0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e6b048...dd83f62. Read the comment docs.

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Does this require performance testing and/or pre and post comparison for permutation importance?

},
)
pipeline.fit(X, y)
pd.testing.assert_series_equal(mock_to_check.call_args[0][1], y + 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting how this fails with Woodwork 0.6.0 because you're trying to add an int to a categorical but passes with 0.7.1 :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then how come min-dependencies git-test-other passes? That installs ww 0.6.0? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I have woodwork 0.6.0 locally and this passes 🤔 )

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! 🚢

@@ -203,7 +203,7 @@ def fit_features(self, X, y):
y (pd.Series): The target training data of length [n_samples].

Returns:
pd.DataFrame: Transformed values.
Tuple: pd.DataFrame, pd.Series: Transformed features and target.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 😂

I'm not sure if this intended but it looks like sphinx / napoleon separates the return type only on the first colon: https://feature-labs-inc-evalml--2780.com.readthedocs.build/en/2780/autoapi/evalml/pipelines/index.html#evalml.pipelines.ComponentGraph.fit_features

If we do Tuple(pd.DataFrame, pd.Series) (unclear of exact syntax, might be square brackets heh), we could get it all on the return type line :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for letting me know!

},
)
pipeline.fit(X, y)
pd.testing.assert_series_equal(mock_to_check.call_args[0][1], y + 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I have woodwork 0.6.0 locally and this passes 🤔 )

@freddyaboulton freddyaboulton force-pushed the 2703-compute-estimator-features-transformed-target branch from 3e92131 to dd83f62 Compare September 15, 2021 21:18
@freddyaboulton freddyaboulton merged commit c980a4f into main Sep 15, 2021
@freddyaboulton freddyaboulton deleted the 2703-compute-estimator-features-transformed-target branch September 15, 2021 23:26
@freddyaboulton
Copy link
Contributor Author

@ParthivNaresh Forgot to reply to your comment. I don't think the perf tests are necessary in this case. This really only impacts time series pipelines with a detrender or log transformer and those are not added to the perf tests pipelines (because they don't meet the log normal check and because we don't automatically add detrenders to pipelines).

@chukarsten chukarsten mentioned this pull request Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Have fit_features return the transformed target
3 participants