Fixing fraud cost decision function #254

Merged
merged 7 commits into from
Dec 9, 2019

@angela97lin (Contributor) commented Dec 6, 2019

Closes #252

@codecov bot commented Dec 6, 2019

Codecov Report

Merging #254 into master will decrease coverage by 0.25%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #254      +/-   ##
==========================================
- Coverage   97.32%   97.07%   -0.26%     
==========================================
  Files          95       95              
  Lines        2731     2731              
==========================================
- Hits         2658     2651       -7     
- Misses         73       80       +7
Impacted Files Coverage Δ
evalml/tests/automl_tests/test_autoclassifier.py 100% <100%> (ø) ⬆️
evalml/tests/automl_tests/test_autobase.py 100% <100%> (ø) ⬆️
...alml/tests/objective_tests/test_fraud_detection.py 100% <100%> (ø) ⬆️
evalml/tests/automl_tests/test_autoregressor.py 100% <100%> (ø) ⬆️
evalml/objectives/fraud_cost.py 100% <100%> (ø) ⬆️
evalml/tests/objective_tests/test_lead_scoring.py 100% <100%> (ø) ⬆️
evalml/models/auto_base.py 93.69% <0%> (-3.16%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6f23eea...4f75ff4.

@@ -46,7 +46,7 @@ def decision_function(self, y_predicted, extra_cols, threshold):
         if not isinstance(y_predicted, pd.Series):
             y_predicted = pd.Series(y_predicted)
 
-        transformed_probs = (y_predicted * extra_cols[self.amount_col])
+        transformed_probs = (y_predicted.values * extra_cols[self.amount_col])
Contributor

Let's add or change a test case that would have caught this.
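
A minimal sketch of such a regression test, assuming the failure came from pandas index alignment: `pd.Series(y_predicted)` gets a fresh 0..n-1 index, while `extra_cols` keeps the row labels it had before the data was split, so the old expression multiplied values with mismatched labels. The helper `transformed_amounts`, the `amount` column, and the data are hypothetical, and the sketch exercises plain pandas rather than evalml's `FraudCost` class.

```python
import numpy as np
import pandas as pd


def transformed_amounts(y_predicted, amounts, use_values):
    """Hypothetical helper mimicking the line changed in decision_function.

    use_values=False reproduces the old expression, True the fixed one.
    """
    if not isinstance(y_predicted, pd.Series):
        y_predicted = pd.Series(y_predicted)  # fresh RangeIndex 0..n-1
    if use_values:
        return y_predicted.values * amounts  # positional multiplication
    return y_predicted * amounts  # label-aligned multiplication


def test_decision_function_with_non_default_index():
    # Made-up probabilities; exact binary fractions keep the floats exact.
    y_predicted = np.array([0.5, 0.25, 0.75])
    # Row labels 5, 8, 9 mimic a fold taken out of a larger frame.
    extra_cols = pd.DataFrame({"amount": [100.0, 20.0, 40.0]}, index=[5, 8, 9])

    old = transformed_amounts(y_predicted, extra_cols["amount"], use_values=False)
    new = transformed_amounts(y_predicted, extra_cols["amount"], use_values=True)

    # Old behaviour: union of indexes {0, 1, 2, 5, 8, 9}, every value NaN.
    assert len(old) == 6 and old.isna().all()
    # Fixed behaviour: one value per row, keeping extra_cols' index.
    assert new.tolist() == [50.0, 5.0, 30.0]
    assert new.index.tolist() == [5, 8, 9]


if __name__ == "__main__":
    test_decision_function_with_non_default_index()
    print("ok")
```

Note that `.values` makes the multiplication purely positional, so it relies on `y_predicted` and `extra_cols` already being in the same row order.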

Contributor Author

Updated all tests for Auto(*) so that the raise_errors flag is set to True. I'm not sure whether there's a better alternative that doesn't require us to remember to set this.

Contributor

One thought would be to create a global raise_errors setting, perhaps set with an environment variable. I like what you did for now, though.
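
A minimal sketch of that idea, assuming a hypothetical `EVALML_RAISE_ERRORS` environment variable and a stand-in `fit` function (neither exists in evalml as written here): an explicit `raise_errors` argument wins, and the default `None` falls back to the environment.

```python
import os

# Hypothetical variable name; evalml does not read this variable.
RAISE_ERRORS_ENV_VAR = "EVALML_RAISE_ERRORS"


def default_raise_errors():
    """Read the global raise_errors default from the environment.

    Unset or unrecognized values mean False, matching the current default.
    """
    value = os.environ.get(RAISE_ERRORS_ENV_VAR, "")
    return value.strip().lower() in {"1", "true", "yes"}


def fit(X, y, raise_errors=None):
    """Hypothetical stand-in for an Auto(*) fit method.

    None means "use the global setting"; True/False override it.
    Returns the effective flag so the behaviour is easy to inspect.
    """
    if raise_errors is None:
        raise_errors = default_raise_errors()
    # ... the real search loop would run here ...
    return raise_errors


if __name__ == "__main__":
    os.environ[RAISE_ERRORS_ENV_VAR] = "true"
    print(fit(None, None))                      # True: taken from the environment
    print(fit(None, None, raise_errors=False))  # False: explicit argument wins
```

With this pattern a CI job could export the variable once instead of every test passing `raise_errors=True`, while an explicit `raise_errors=False` in a call still overrides it.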

@angela97lin requested a review from kmax12 December 9, 2019 20:31
@angela97lin self-assigned this Dec 9, 2019
kmax12 previously approved these changes Dec 9, 2019
Contributor

@kmax12 left a comment

LGTM

@angela97lin merged commit 68ee2d0 into master Dec 9, 2019
Contributor

@dsherry left a comment

The fix looks good to me :)

How did we end up catching this bug? And how did we not catch it sooner? I ask purely because perhaps the answer would point out a place where we're missing test coverage.

@angela97lin (Contributor Author)

@dsherry I caught the bug purely by coincidence: I was trying to figure out why CatBoost wasn't working on CircleCI and was clicking around the docs. We probably didn't catch it earlier because raise_errors defaults to False in our calls to .fit(), so the error failed silently. That's why part of this PR sets our raise_errors flag to True :)
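
A simplified sketch of how a `raise_errors=False` default can hide a bug like this one. The `search_pipelines` loop and the pipelines below are hypothetical, not evalml's actual code: a failing pipeline is logged and scored as NaN, so the search still completes and nothing fails loudly.

```python
import logging
import math

logger = logging.getLogger(__name__)


def search_pipelines(pipelines, X, y, raise_errors=False):
    """Hypothetical search loop: fit and score each candidate pipeline.

    pipelines maps a name to a callable taking (X, y) and returning a score.
    """
    scores = {}
    for name, fit_and_score in pipelines.items():
        try:
            scores[name] = fit_and_score(X, y)
        except Exception:
            if raise_errors:
                raise  # re-raise so a test fails with the original traceback
            # Swallow the error: only a log line and a NaN score remain.
            logger.warning("Pipeline %s failed; recording NaN", name, exc_info=True)
            scores[name] = math.nan
    return scores


def broken_pipeline(X, y):
    # Hypothetical pipeline standing in for one that errors during training.
    raise ValueError("simulated failure")


if __name__ == "__main__":
    candidates = {"ok": lambda X, y: 0.9, "fraud_cost": broken_pipeline}
    print(search_pipelines(candidates, None, None))  # {'ok': 0.9, 'fraud_cost': nan}
```

Calling the same search with `raise_errors=True` propagates the `ValueError`, which is what turns a silently skipped pipeline into a failing test.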

@dsherry (Contributor) commented Dec 10, 2019

Oh, oops, I missed all of those changes! I must've been looking at a single commit. Cool. I do have more comments, but I'll create a discussion ticket for them.

@angela97lin mentioned this pull request Dec 16, 2019
@angela97lin deleted the fraud branch April 17, 2020 18:45
Development

Successfully merging this pull request may close these issues.

errors while training model for fraudcost
3 participants