Find rows near the decision boundary #2908

Merged
merged 17 commits into from Oct 19, 2021

bchen1116
Contributor

@bchen1116 bchen1116 commented Oct 14, 2021

Address this issue

Design doc here

A walkthrough of its use is in the docs here. The link in this walkthrough should work once we publish the docs to main.

API doc here

@bchen1116 bchen1116 self-assigned this Oct 14, 2021

codecov bot commented Oct 14, 2021

Codecov Report

Merging #2908 (db16985) into main (2767e33) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2908     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        302     302             
  Lines      28433   28587    +154     
=======================================
+ Hits       28340   28494    +154     
  Misses        93      93             
Impacted Files                                        Coverage Δ
evalml/pipelines/utils.py                             99.5% <100.0%> (+0.1%) ⬆️
evalml/tests/pipeline_tests/test_pipeline_utils.py    99.7% <100.0%> (+0.3%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@freddyaboulton freddyaboulton left a comment

@bchen1116 Thanks for this! I left some minor comments for improving the implementation and tests. I am only "blocking" because I would like to discuss whether "epsilon" is a more useful parameter than "num_rows". I think it's confusing that we return all rows by default.
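
To make the "num_rows" versus "epsilon" question concrete, here is a minimal sketch of both selection strategies over a list of predicted probabilities. It is not the evalml implementation: the function names `rows_by_count` and `rows_by_epsilon`, the sample probabilities, and the 0.5 default threshold are all assumptions made for illustration.

```python
def rows_by_count(probas, threshold=0.5, num_rows=None):
    """Hypothetical helper: indices of the rows closest to the threshold.

    With num_rows=None every row is returned, only sorted by distance.
    """
    order = sorted(range(len(probas)), key=lambda i: abs(probas[i] - threshold))
    return order if num_rows is None else order[:num_rows]


def rows_by_epsilon(probas, threshold=0.5, epsilon=0.1):
    """Hypothetical helper: indices within epsilon of the threshold, closest first."""
    close = [i for i, p in enumerate(probas) if abs(p - threshold) <= epsilon]
    return sorted(close, key=lambda i: abs(probas[i] - threshold))


# Made-up positive-class probabilities for five rows.
probas = [0.05, 0.48, 0.90, 0.55, 0.30]

all_rows = rows_by_count(probas)               # every row, sorted by distance
two_rows = rows_by_count(probas, num_rows=2)   # fixed-size result
near_rows = rows_by_epsilon(probas)            # size depends on the data
```

With the count-based parameter left unset, every row comes back (only reordered), which is the default behavior questioned in the comment above. A distance bound such as epsilon limits the result to rows that are actually near the boundary, at the cost of a result size that varies with the data.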

evalml/tests/pipeline_tests/test_pipeline_utils.py
)
assert all(vals.values == expected_vals)

if types == "all":
Contributor

Isn't this check redundant?

Contributor Author

This is just double-checking that we can exclude passing in y and still get the same results

Contributor

Thanks for explaining!

Contributor

@chukarsten chukarsten left a comment

Super thorough testing. Some of the tests took me a minute to realize what exactly was being tested. Perhaps this could be remedied with more specific test names? test_rows_of_interest_threshold was the one I spent the most time on. Anyway, nothing blocking. Just food for thought.

evalml/pipelines/utils.py

if threshold is not None and (threshold < 0 or threshold > 1):
    raise ValueError(
        "Provided threshold {} must be between [0, 1]".format(threshold)
Contributor

nit: square brackets denote an inclusive range; might want to switch to (0, 1).

Contributor Author

@chukarsten I was thinking we should allow the user to set the threshold as 0 or 1 if they wanted to get the rows closest to those values. What do you think?
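
The inclusive check under discussion can be sketched as a standalone helper. `validate_threshold` is a hypothetical name, not a function in evalml; the condition and error message are copied from the snippet above. As written, the check accepts both 0 and 1, which agrees with the `[0, 1]` notation in the message.

```python
def validate_threshold(threshold):
    """Hypothetical helper: accept None or any value in the closed interval [0, 1]."""
    if threshold is not None and (threshold < 0 or threshold > 1):
        raise ValueError(
            "Provided threshold {} must be between [0, 1]".format(threshold)
        )
    return threshold


# The endpoints pass because the comparison is strict on the outside only.
lower = validate_threshold(0)
upper = validate_threshold(1)
unset = validate_threshold(None)
```

Switching the message to `(0, 1)` would only be accurate if the comparison were also changed to reject the endpoints, so the notation and the condition should be updated together or not at all.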

Contributor

@freddyaboulton freddyaboulton left a comment

Thank you for making the changes @bchen1116 ! This looks good to me!

@bchen1116 bchen1116 merged commit 6f8d37a into main Oct 19, 2021
@chukarsten chukarsten mentioned this pull request Oct 27, 2021
@freddyaboulton freddyaboulton deleted the bc_decision_boundary branch May 13, 2022 15:35