
Preserve ww schema in partial dependence #2929

Merged
merged 4 commits into from
Oct 18, 2021

freddyaboulton (Contributor)

Pull Request Description

Fixes #2928 (partial dependence fails when a categorical column is typed as natural language by the user)


After creating the pull request: in order to pass the release_notes_updated check, you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.

codecov bot commented Oct 15, 2021

Codecov Report

Merging #2929 (74d87f9) into main (7d97020) will increase coverage by 0.1%.
The diff coverage is 100.0%.


```diff
@@           Coverage Diff           @@
##            main   #2929     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        302     302
  Lines      28396   28412     +16
=======================================
+ Hits       28303   28319     +16
  Misses        93      93
```
| Impacted Files | Coverage Δ |
|---|---|
| evalml/model_understanding/_partial_dependence.py | 98.8% <100.0%> (+0.1%) ⬆️ |
| ...del_understanding_tests/test_partial_dependence.py | 99.3% <100.0%> (+0.1%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7d97020...74d87f9.
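As a check on the report's numbers: coverage is hits divided by lines. The short sketch below recomputes it from the table. The figures are copied from the report, and `coverage_pct` is a helper introduced here for illustration.

```python
# Figures copied from the Codecov table above.
base = {"lines": 28396, "hits": 28303}  # main (7d97020)
head = {"lines": 28412, "hits": 28319}  # #2929 (74d87f9)


def coverage_pct(report):
    # Coverage is the share of tracked lines that the test suite executed.
    return 100.0 * report["hits"] / report["lines"]


for name, report in (("main", base), ("#2929", head)):
    misses = report["lines"] - report["hits"]
    print(f"{name}: {coverage_pct(report):.4f}% covered, {misses} misses")

delta = coverage_pct(head) - coverage_pct(base)
print(f"change: {delta:+.4f} percentage points")
```

Both percentages round to the 99.7% shown, misses stay at 93, and the unrounded change is under 0.001 percentage points.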

chukarsten (Contributor) left a comment

Awesome work, Freddy! Just a quick question about the copy in partial dependence, but nothing blocking.

```diff
@@ -142,10 +143,15 @@ def _partial_dependence_calculation(pipeline, grid, features, X):
     else:
         prediction_method = pipeline.predict_proba
 
-    X_eval = X.copy()
+    X_eval = X.ww.copy()
```
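The change swaps a plain pandas copy for a Woodwork-aware one. Below is a toy model of why that matters for the linked issue. It assumes, as the issue title suggests, that the failure comes from the copy losing the user's logical types, after which re-inference labels a natural-language column as categorical. `TypedTable`, `infer_type`, and the inference rule are hypothetical and are not Woodwork's API.

```python
import copy
from dataclasses import dataclass, field


def infer_type(values):
    """Hypothetical type inference standing in for automatic inference.

    Repeated strings look categorical; all-distinct strings look like free text.
    """
    if all(isinstance(v, str) for v in values):
        return "Categorical" if len(set(values)) < len(values) else "NaturalLanguage"
    return "Double"


@dataclass
class TypedTable:
    """Hypothetical table that carries user-set logical types next to its data."""

    columns: dict  # column name -> list of values
    logical_types: dict = field(default_factory=dict)

    def plain_copy(self):
        # Analogous to X.copy(): the data is copied but the schema is dropped,
        # so every column's type has to be re-inferred.
        data = copy.deepcopy(self.columns)
        return TypedTable(data, {name: infer_type(vals) for name, vals in data.items()})

    def schema_copy(self):
        # Analogous to X.ww.copy(): the data and the user's logical types carry over.
        return TypedTable(copy.deepcopy(self.columns), dict(self.logical_types))


# The user explicitly types "review" as natural language.
X = TypedTable(
    {"review": ["ok", "ok", "great"], "price": [1.0, 2.0, 3.0]},
    {"review": "NaturalLanguage", "price": "Double"},
)
print(X.plain_copy().logical_types["review"])   # Categorical
print(X.schema_copy().logical_types["review"])  # NaturalLanguage
```

In this toy model, a pipeline fit on the user's `NaturalLanguage` column would then see a `Categorical` column after the plain copy. The schema-preserving copy avoids that mismatch.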
Contributor

Is there a reason we copy this first rather than just build a new df by concatting the series? I've seen this a few times but never had the courage to ask :)

Contributor Author

The partial dependence computation requires us to fill a given feature with only one value while keeping all other features the same. To avoid overwriting the user's original data, I think we need a copy. Since all the features need to be present in the data anyway, I think concatting all the features would be equivalent to a copy + modify operation (and would use the same amount of memory).
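Below is a minimal stdlib sketch of the copy + modify approach described above, for a single feature. Rows are plain dicts, and `predict` is a hypothetical stand-in for a fitted pipeline's prediction method. This is not evalml's implementation. `partial_dependence_rebuild` is the build-new-rows alternative raised in the question, and it yields the same averages.

```python
import copy
from statistics import mean


def predict(row):
    # Hypothetical model: a simple linear function of two features.
    return 2.0 * row["x1"] + 0.5 * row["x2"]


def partial_dependence(rows, feature, grid):
    averages = []
    for value in grid:
        # Copy first so the caller's data is never overwritten.
        rows_eval = copy.deepcopy(rows)
        for row in rows_eval:
            row[feature] = value  # fill the feature with one grid value
        # All other features stay as they were; average the predictions.
        averages.append(mean(predict(r) for r in rows_eval))
    return averages


def partial_dependence_rebuild(rows, feature, grid):
    # Alternative: build fresh rows instead of copy + modify. Every other
    # feature still has to be carried into each new row.
    return [mean(predict({**row, feature: value}) for row in rows) for value in grid]


X = [{"x1": 1.0, "x2": 2.0}, {"x1": 3.0, "x2": 4.0}]
print(partial_dependence(X, "x1", [0.0, 1.0]))  # [1.5, 3.5]
print(partial_dependence_rebuild(X, "x1", [0.0, 1.0]))  # [1.5, 3.5]
```

Either way, the caller's `X` is left untouched. What the copy guarantees is that the fill step cannot leak back into the user's data.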

eccabay (Contributor) left a comment

Solid, LGTM!

freddyaboulton merged commit d83e074 into main on Oct 18, 2021
freddyaboulton deleted the preserve-ww-in-part-dep branch on October 18, 2021 at 17:08
freddyaboulton added a commit to freddyaboulton/evalml that referenced this pull request Oct 18, 2021
* Preserve ww schema

* Fix index

* Add to release notes
chukarsten mentioned this pull request on Oct 27, 2021
Development

Successfully merging this pull request may close these issues.

Partial dependence fails when categorical column gets typed as natural language by user
3 participants