Fix Fraud Cost #2450

Merged: 8 commits merged into main on Jul 7, 2021

Conversation

bchen1116 (Contributor) commented Jun 28, 2021

fix #2318

In this PR, I revert the minimization method back to Golden. I did this because I found that the Brent and Bounded methods didn't perform well in the threshold optimization process. While Brent and Bounded are preferred and suggested by scipy because they are faster than Golden, these two weren't able to find the threshold properly when I was testing the fraud objective. As we can see from the previous perf test, the methods performed similarly overall, although Bounded did take longer to run than Golden.

Fraud Doc:
[screenshot: fraud objective documentation build output]
Lead scoring doc build passes without triggering asserts!

Quick examples of thresholding here

Perf test here

codecov bot commented Jun 28, 2021

Codecov Report

Merging #2450 (604f768) into main (d5b8602) will decrease coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2450     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        283     283             
  Lines      25568   25566      -2     
=======================================
- Hits       25466   25464      -2     
  Misses       102     102             
Impacted Files Coverage Δ
...alml/objectives/binary_classification_objective.py 100.0% <100.0%> (ø)
evalml/objectives/fraud_cost.py 100.0% <100.0%> (ø)
...alml/tests/objective_tests/test_fraud_detection.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5b8602...604f768.


Arguments:
ypred_proba (pd.Series): Predicted probabilities
threshold (float): Dollar threshold to determine if transaction is fraud
bchen1116 (Contributor, Author):

This threshold parameter didn't really make sense to me. For our fraud prediction models, we are predicting whether or not a transaction is fraud, so the predicted_proba output should be a probability that the transaction is fraudulent. Multiplying this probability by the amount spent isn't helpful in determining whether there was fraud, so I don't think we should threshold based on some spent amount.

I removed this so that we use the probabilities like our other objectives. Let me know if I'm misinterpreting!
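To make the change concrete, here is a minimal sketch of thresholding directly on the predicted probabilities, the way the other binary classification objectives do it; the probability values and the 0.5 threshold are purely illustrative, not taken from EvalML:

```python
import pandas as pd

# Hypothetical predicted fraud probabilities for five transactions
ypred_proba = pd.Series([0.05, 0.30, 0.55, 0.72, 0.91])

# After this change the threshold is compared directly against the probability,
# rather than against probability * dollar amount.
threshold = 0.5
y_predicted = ypred_proba > threshold
print(y_predicted.tolist())  # [False, False, True, True, True]
```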

Contributor:

I guess the idea was to label as fraud the transactions whose expected loss due to fraud was larger than some threshold.

Maybe this would be helpful in prioritizing larger transactions?

I'm just thinking out loud. I think this is ok from my point of view.

bchen1116 (Contributor, Author):

Yeah, that seemed reasonable. This also definitely didn't work well with the Bounded method, since we bound the search to [0, 1], but it also didn't work well with the Brent and Golden methods, since the resulting thresholds generally stayed below 2 and oftentimes went negative!

It makes sense to prioritize larger transactions, but that should hopefully be something the model can learn rather than something we rely on the objective for, imo.

@@ -90,7 +73,9 @@ def objective_function(self, y_true, y_predicted, X, sample_weight=None):
# calculate money lost from fees
false_positives = (~y_true & y_predicted) * interchange_cost

loss = false_negatives.sum() + false_positives.sum()
# add a penalty if we output naive predictions
all_one_prediction_cost = (2 - len(set(y_predicted))) * fraud_cost.sum()
bchen1116 (Contributor, Author):

Added a penalty for naive predictions. Without this penalty, predicting every transaction as fraudulent would incur only small fraud costs, so the optimizer could settle on that naive solution.

Contributor:

And this "small fraud cost" is currently not captured by the objective, right? I guess I'm wondering why this isn't currently captured by how the objective is computed.

bchen1116 (Contributor, Author):

It is currently captured, but that would only be the cost coming from the false_positive cases, which is small, especially when interchange_cost is small. I wanted to add a big penalization to prevent us from choosing a naive estimator.
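To make the effect of the penalty concrete, here is a minimal sketch, assuming illustrative transaction amounts and a 2% interchange fee (the names fraud_amounts and interchange_cost are stand-ins, not the exact variables in fraud_cost.py):

```python
import numpy as np

# Hypothetical transactions: dollar amounts and true fraud labels
fraud_amounts = np.array([100.0, 2500.0, 40.0, 900.0])
y_true = np.array([False, True, False, False])
interchange_cost = 0.02 * fraud_amounts  # assume a 2% fee per flagged transaction

def loss_with_penalty(y_predicted):
    # money lost to fraud we failed to catch
    false_negatives = (y_true & ~y_predicted) * fraud_amounts
    # fees paid on legitimate transactions we flagged as fraud
    false_positives = (~y_true & y_predicted) * interchange_cost
    loss = false_negatives.sum() + false_positives.sum()
    # penalty term from this PR: zero when the predictions are mixed,
    # equal to the total fraud cost when every prediction is identical
    all_one_prediction_cost = (2 - len(set(y_predicted))) * fraud_amounts.sum()
    return loss + all_one_prediction_cost

print(loss_with_penalty(np.array([False, True, False, False])))  # mixed predictions: no penalty
print(loss_with_penalty(np.array([True, True, True, True])))     # naive "all fraud": penalty dominates
```

With mixed predictions the penalty term is zero; with an all-one (or all-zero) prediction it adds the full sum of the fraud amounts, which swamps the small interchange fees and keeps the optimizer away from the naive estimator.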

@@ -23,7 +23,7 @@ def __init__(
retry_percentage (float): What percentage of customers that will retry a transaction if it
is declined. Between 0 and 1. Defaults to .5

interchange_fee (float): How much of each successful transaction you can collect.
interchange_fee (float): How much of each successful transaction you pay.
bchen1116 (Contributor, Author):

Interchange fees are fees that a merchant pays when a customer uses their card.

@bchen1116 bchen1116 marked this pull request as ready for review June 28, 2021 21:50
freddyaboulton (Contributor) left a review:

@bchen1116 I think this looks good! I have a question on the penalty you're adding.

@@ -43,7 +43,10 @@ def cost(threshold):
cost = self.objective_function(y_true, y_predicted, X=X)
return -cost if self.greater_is_better else cost

optimal = minimize_scalar(cost, bounds=(0, 1), method="Bounded")
optimal = minimize_scalar(
cost, bracket=(0, 1), method="Golden", options={"maxiter": 250}
Contributor:

Apologies if you've shared this with us before, but do you have plots/data for the claim - "I did this because I found that Brent and Bounded methods didn't perform well with the optimization process." ? If you shared it before, it definitely got lost in my inbox 🙈 I'm just curious what the process was to arrive at this conclusion!

bchen1116 (Contributor, Author):

No worries, I was referencing the old perf test I ran when I switched from Golden to Bounded. I saw the performance was the same, but it took a little longer with the new minimization methods. I assumed the reverse would hold true here.

Contributor:

As a general rule of thumb, optimization methods and their performance on different objective functions and datasets can vary depending on the characteristics of the dataset, and on luck as well (there's generally some randomness). It might be a good idea to run performance tests just to make sure, but I don't think performance would be greatly impacted.

Contributor:

Might be cool to play around with maxiter as well! My hunch is that 250 might be a little low, but that's just a guess 😄
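For reference, a minimal sketch of the two minimize_scalar call patterns being compared here, using a smooth toy cost function rather than the real fraud objective (the real code negates the objective when greater_is_better is set):

```python
from scipy.optimize import minimize_scalar

# Toy stand-in for the threshold-vs-cost curve; the real objective comes from
# FraudCost.objective_function evaluated at each candidate threshold.
def cost(threshold):
    return (threshold - 0.37) ** 2

# Golden-section search seeded with a bracketing interval, as in this PR
golden = minimize_scalar(cost, bracket=(0, 1), method="Golden", options={"maxiter": 250})

# Bounded search constrained to [0, 1], the method being reverted here
bounded = minimize_scalar(cost, bounds=(0, 1), method="Bounded")

print(golden.x, bounded.x)  # both land near 0.37 on this smooth toy function
```

On a well-behaved curve the two methods agree; the issue raised in this PR is how they behave on the less regular fraud cost curve, where Golden with a (0, 1) bracket proved more reliable.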


bchen1116 (Contributor, Author):
@jeremyliweishih @freddyaboulton @chukarsten perf test has been updated using maxiter=250.

chukarsten (Contributor) left a review:

Looks good to me! Nice work!

@bchen1116 bchen1116 merged commit 2ce178e into main Jul 7, 2021
@chukarsten chukarsten mentioned this pull request Jul 22, 2021
@freddyaboulton freddyaboulton deleted the bc_2318_fraud branch May 13, 2022 15:02
Successfully merging this pull request may close these issues:
Fix Fraud Objective

4 participants