# Machine Learning Economics
Or, how to evaluate the costs and benefits of a machine learning project proposal in a business setting. There are many types of machine learning that you might want to evaluate: unsupervised clustering, recommendation, anomaly detection, forecasting, etc. Here, we'll focus on classification.

You will have to assign a quantitative value to each of your model outputs. For classification, those are:
- False Positives
- False Negatives
- True Positives
- True Negatives

A false positive is when your model incorrectly classifies an object as falling into the positive class. The ground truth is negative, but your model classifies it as positive. Think of an example from disease detection. There, false positives are far preferable to false negatives, because failing to detect an instance of the disease can be deleterious to human life. Think, "overzealous detection."

A false negative is the reverse. Your model fails to detect a positive case. You are classifying the event as negative, but the ground truth is actually positive. Think, "failure to detect."

A true positive is when your model correctly predicts a positive case.

A true negative is when your model correctly predicts a negative case.
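
To make the four outcomes concrete, here is a minimal sketch that tallies them from paired ground-truth labels and predictions. It assumes binary labels where 1 is the positive class; the function name `count_outcomes` and the sample data are illustrative and not part of the original notebook.

```python
from collections import Counter

def count_outcomes(y_true, y_pred):
    """Tally TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    # Map each (ground truth, prediction) pair to its outcome name
    names = {(1, 1): "TP", (0, 1): "FP", (0, 0): "TN", (1, 0): "FN"}
    counts = Counter(names[(t, p)] for t, p in zip(y_true, y_pred))
    return {k: counts.get(k, 0) for k in ("TP", "FP", "TN", "FN")}

# Hypothetical ground truth and predictions, for illustration only
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
outcomes = count_outcomes(y_true, y_pred)
print(outcomes)
```

Every prediction lands in exactly one of the four buckets, so the counts always sum to the number of examples.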

## Business Case
Now, let's decide on the business case. This is a real scenario from the physical world, and you are going to attempt to build a model that approximates it. That's a tough task, but modern machine learning techniques are making it increasingly reliable.

Let's consider credit card fraud. Every day, hundreds of thousands of credit card transactions are placed, and only a tiny fraction of those are fraudulent. We need to design a machine learning system capable of detecting that fraud, and we want to make sure that our end result is adding quantitative business value, not detracting from it.

### Model Outcome Valuation

A true positive in credit card fraud detection is INCREDIBLY high value. Every true positive means you are automatically capturing a fraudulent transaction, saving your customer and yourself a tremendous amount of labor and risk.

Let's assume that, over the course of a year, we're seeing approximate transaction amounts in the following values:

- 25 dollars for average legitimate transaction amount
- 300 dollars for average fraudulent transaction amount

For a start, let's just set the value of our model outputs to those average transaction amounts. In reality the costs and benefits are larger, because the model is also saving you hours of manual labor to handle these cases. We'll consider fraud as the positive case, and normal transactions as the negative case.

In [3]:
true_positive = 300
true_negative = 25

Now, let's look at the false indicators. 

In [4]:
# overzealous detection
# cost to customer and bank of confirming a legitimate transaction
false_positive = 50 

In [5]:
# failure to detect
# cost to customer and bank of failing to detect a fraudulent transaction
false_negative = 500

Given these rough, high-level estimates, let's see what our total yearly costs are going to be if we assume the following average transaction volumes.
- 100,000 transactions per day
- 5,000 fraudulent transactions per day

In [6]:
total_normal_transactions = 95000 * 365
total_fraudulent_transactions = 5000 * 365
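
One way to keep the bookkeeping straight is to put the four outcome values in a single table and multiply by outcome counts. The sketch below is an illustration rather than part of the original notebook: the names `payoff` and `expected_value` are assumptions, and costs are stored here as negative numbers, whereas the notebook cells keep them positive and subtract them.

```python
# Per-outcome dollar values from the estimates above.
# Assumption: benefits are positive, costs are negative.
payoff = {"TP": 300, "TN": 25, "FP": -50, "FN": -500}

def expected_value(counts, payoff):
    """Sum count * value over the four outcomes."""
    return sum(counts[outcome] * payoff[outcome] for outcome in payoff)

# Annual counts for a hypothetical perfect classifier:
# every fraud is caught and no legitimate transaction is flagged.
perfect_counts = {
    "TP": 5000 * 365,
    "TN": 95000 * 365,
    "FP": 0,
    "FN": 0,
}
perfect_value = expected_value(perfect_counts, payoff)
print(perfect_value)
```

Under these estimates, the perfect classifier is the ceiling on value. Any real model falls below it, because each error swaps a benefit for a cost.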

## Simulations
Now, let's run a few very lightweight tests to understand the expected costs and values of our machine learning systems.

In [7]:
# if we catch all fraud
expected_value_for_perfect_recall = total_fraudulent_transactions * true_positive

print ("Expected value is {}".format(expected_value_for_perfect_recall))
print ("Not bad! That's over $547 million. Over half a billion dollars.")

Expected value is 547500000
Not bad! That's over $547 million. Over half a billion dollars.


In [8]:
# if we don't catch any fraud
expected_cost_for_zero_recall = total_fraudulent_transactions * false_negative
print ("Expected cost is {}".format(expected_cost_for_zero_recall))
print ("Yikes, that's over $900 million in loss, almost a full billion dollars.")

Expected cost is 912500000
Yikes, that's over $900 million in loss, almost a full billion dollars.


In reality, the model output is going to be somewhere in the middle. Let's assume that after you train your model and tune it in SageMaker, your best recall is 90%.

In [10]:
print (expected_value_for_perfect_recall * 0.9)

print ("Ok! So we're still saving almost $500M in aggregate.")

492750000.0
Ok! So we're still saving almost $500M in aggregate.


But there are costs. If our model misses 10% of the fraudulent transactions, each of those misses incurs the full false negative cost.

In [12]:
expected_cost = false_negative * total_fraudulent_transactions * .1
print (-expected_cost)

-91250000.0


In [13]:
expected_value_fraud_capture = expected_value_for_perfect_recall * 0.9 - expected_cost

This means, for the positive class, we're netting just over $400 million in value. Now let's take a look at the negative class, or the legitimate transactions.
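
Before moving on, the positive-class arithmetic can be wrapped in a function of recall, which makes it easy to see how sensitive the result is. This is a sketch under the same estimates as above; `fraud_value` and the recall grid are illustrative names and choices, not from the original notebook.

```python
TRUE_POSITIVE = 300        # value of catching one fraudulent transaction
FALSE_NEGATIVE = 500       # cost of missing one fraudulent transaction
TOTAL_FRAUD = 5000 * 365   # fraudulent transactions per year

def fraud_value(recall):
    """Net annual value for the fraud class at a given recall."""
    caught = TOTAL_FRAUD * recall
    missed = TOTAL_FRAUD * (1 - recall)
    return caught * TRUE_POSITIVE - missed * FALSE_NEGATIVE

for recall in (0.5, 0.7, 0.9, 1.0):
    print("recall={:.1f}  net value=${:,.0f}".format(recall, fraud_value(recall)))

# Recall at which benefits and costs cancel out
break_even_recall = FALSE_NEGATIVE / (TRUE_POSITIVE + FALSE_NEGATIVE)
print("Break-even recall: {}".format(break_even_recall))
```

Under these assumed values, the fraud class only adds value once recall exceeds 62.5%. At 50% recall the net is negative, because each miss costs more than each catch earns.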

In [14]:
expected_value_for_perfect_precision = total_normal_transactions * true_negative
print (expected_value_for_perfect_precision)

866875000


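Great! For correctly classifying all of the normal transactions, we're providing almost $900 million in value. In reality we will probably incorrectly classify at least some of those, so let's assume we correctly classify only 90% of them. Strictly speaking, that fraction is the true negative rate (specificity), not precision, which is the share of flagged transactions that are truly fraudulent; this notebook uses "precision" loosely for it.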

In [15]:
expected_value_for_normal = total_normal_transactions * true_negative * 0.9
print (expected_value_for_normal)
print ("With 90% precision, we're anticipating providing over $700 million in value!")

780187500.0
With 90% precision, we're anticipating providing over $700 million in value!


There are also costs. For the normal case, this means the cost of the false positive. Let's assume we have a 10% false positive rate.

In [17]:
expected_cost_false_positive = total_normal_transactions * false_positive * 0.1
print (-expected_cost_false_positive)

-173375000.0


That looks pricey too! The total cost of false positives at a rate of 10% comes to just over $173 million. Let's sum up the expected value for handling normal cases.

In [18]:
total_expected_value_normal = expected_value_for_normal - expected_cost_false_positive
print (total_expected_value_normal)

606812500.0


Not bad! Our expected value is positive, which is great. That's over $600 million for the normal case. Let's add up the expected value for both the positive and the negative classes, to see where our project stands as a whole.
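
Before combining the two classes, it helps to see why the normal case stays positive. As a worked check using the per-transaction estimates above, let $s$ be the fraction of legitimate transactions classified correctly (the symbol $s$ is introduced here for illustration). The expected value per legitimate transaction, in dollars, is:

$$
V_{\text{normal}}(s) = 25\,s - 50\,(1 - s) = 75\,s - 50
$$

This is zero at $s = 2/3$, so under these estimates the normal case only adds value when more than about 67% of legitimate transactions are classified correctly. At $s = 0.9$ the value is 17.50 dollars per transaction, and $17.5 \times 34{,}675{,}000 = 606{,}812{,}500$, which matches the figure above.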

In [19]:
# use the net value for the normal case so that false positive costs are included
expected_value_total_project = total_expected_value_normal + expected_value_fraud_capture
print (expected_value_total_project)
print ("This is great!! We've hit just over $1 billion here in anticipated value.")

1008312500.0
This is great!! We've hit just over $1 billion here in anticipated value.
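
To see how the project total moves with model quality, the whole calculation can be written as one function. This is a minimal sketch, assuming the same per-outcome estimates and volumes as above and netting out both false negative and false positive costs; `project_value` and the operating points are hypothetical and only illustrate the sensitivity.

```python
TOTAL_NORMAL = 95000 * 365
TOTAL_FRAUD = 5000 * 365
VALUES = {"tp": 300, "tn": 25, "fp": 50, "fn": 500}  # costs stored as positive numbers

def project_value(recall, specificity):
    """Net annual value across both classes.

    recall: fraction of fraudulent transactions caught
    specificity: fraction of legitimate transactions passed correctly
    """
    fraud = TOTAL_FRAUD * (VALUES["tp"] * recall - VALUES["fn"] * (1 - recall))
    normal = TOTAL_NORMAL * (VALUES["tn"] * specificity - VALUES["fp"] * (1 - specificity))
    return fraud + normal

# Hypothetical operating points, chosen only to show the trend
for recall, specificity in [(0.9, 0.9), (0.9, 0.99), (0.7, 0.9), (0.99, 0.8)]:
    value = project_value(recall, specificity)
    print("recall={:.2f} specificity={:.2f} -> ${:,.0f}".format(recall, specificity, value))
```

Because legitimate transactions outnumber fraudulent ones 19 to 1 in this scenario, a one-point change in specificity moves the total more than a one-point change in recall.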
