## Key Performance Indicator

This notebook calculates the expected profit or loss for each model and compares it to the baseline.

## Calculator

**Number of total customers that received an offer**

- Because of the 80/20 train/test split, the number used for model evaluation is 20% of the total.
    

In [None]:
select count(distinct customer_id) from history;

In [None]:
total_num_customers = 160057
num_customers = int(total_num_customers * 0.2)
num_customers

**Calculating baseline**

The baseline is the percentage of returning customers, used as a reference point for the ML models. It is simply the number of returning customers divided by the total number of customers that received an offer.


In [None]:
select count(*) from history
where repeater = True;

In [None]:
num_repeaters = 43438
baseline_percentage = num_repeaters / total_num_customers
print(f"The Baseline Percentage of returners: {round(baseline_percentage * 100,2)}%")

In [None]:
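# Baseline on the 20% test split: every customer gets an offer, so returners
# are true positives and all remaining customers are false positives.
print(f"Number of true positives: {num_repeaters * 0.2}")
print(f"Number of false positives: {num_customers - num_repeaters * 0.2}")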
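
The fractional counts above can be turned into a whole-number confusion matrix for the baseline. A minimal sketch, assuming the test split has the same repeater rate as the full dataset and that fractional customers are truncated; `baseline_confusion` and its `test_percent` parameter are hypothetical names, not part of the original notebook:

```python
def baseline_confusion(total_customers, total_repeaters, test_percent=20):
    """Confusion-matrix counts when every test customer gets an offer."""
    # Integer arithmetic keeps the counts whole (fractions are truncated).
    test_customers = total_customers * test_percent // 100
    tp = total_repeaters * test_percent // 100  # returners who got an offer
    fp = test_customers - tp                    # non-returners who got an offer
    return tp, fp, 0, 0                         # no negatives are predicted


tp, fp, tn, fn = baseline_confusion(160057, 43438)
print(tp, fp, tn, fn)  # 8687 23324 0 0
```

Because the baseline predicts that everyone returns, `tn` and `fn` are zero by construction.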

**Calculating the value of a returning customer**

This is based on the average spend per checkout across all customers and the average number of repeat trips after an offer.

In [None]:
select avg(checkout_amount) from checkouts;

In [None]:
select avg(repeat_trips) from history
where repeater = True;

In [None]:
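avg_checkout_amount = 59.2402  # result of the avg(checkout_amount) query
avg_repeat_trips = 2.4184  # result of the avg(repeat_trips) query
val_returning_customer = avg_checkout_amount * avg_repeat_trips
print(f"The value of a returning customer is: {val_returning_customer}")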

## Profit calculator


Profit is the number of true-positive customers times the value of a returning customer, minus the number of positively predicted customers times the cost per offer. In simpler terms: the revenue from returning customers minus the cost of sending out offers.
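
Written as a formula, with symbols introduced here only for illustration ($N_{TP}$ and $N_{FP}$ for the number of true-positive and false-positive customers, $V$ for the value of a returning customer, and $c$ for the cost per offer):

$$
\text{profit} = N_{TP} \cdot V - (N_{TP} + N_{FP}) \cdot c
$$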

In [None]:


def profit(tp, fp, tn, fn, cost_per_offer, val_returning_customer, num_customers):
    """
    Estimate profit from a test-set confusion matrix (tp, fp, tn, fn).

    The test-set rates are scaled up to num_customers:
        tp_customers = tp rate * num_customers
        p_customers  = (tp + fp) rate * num_customers

    profit = (tp_customers * val_returning_customer) - (p_customers * cost_per_offer)
    """
    total_test_set = fp + tn + fn + tp
    tp_percentage = tp / total_test_set
    p_percentage = (tp + fp) / total_test_set
    
    tp_customers = num_customers * tp_percentage
    p_customers = num_customers * p_percentage

    profit = round((tp_customers * val_returning_customer) - (p_customers * cost_per_offer), 2)
    return profit
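
Setting the profit expression to zero gives a break-even cost per offer: offers stop paying for themselves once `cost_per_offer` exceeds the value of a returning customer times the model's precision, `tp / (tp + fp)`. The customer count and the negative predictions cancel out. A minimal sketch of that calculation; the helper name `break_even_cost` is hypothetical and not part of the original notebook:

```python
def break_even_cost(tp, fp, val_returning_customer):
    """Cost per offer at which profit is exactly zero (hypothetical helper)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # No offers are sent, so there is no cost to break even on.
        return 0.0
    precision = tp / predicted_positive
    return val_returning_customer * precision


# Toy numbers, chosen only to make the arithmetic easy to follow:
# 10 of 40 offers reach a returning customer worth 100 each.
print(break_even_cost(tp=10, fp=30, val_returning_customer=100))  # 25.0
```

Any cost per offer below this value yields a profit under the model above, and any cost above it yields a loss.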
    
    

In [None]:
cost_per_offer = [0.1, 0.25, 0.5, 0.75, 1, 5, 10, 25, 50]
num_customers = 160057
val_returning_customer = 59.2402 * 2.4184
    
def cost_for_model(tp, fp, tn, fn):
    
    profit_per_cost = []
    for cost in cost_per_offer:
        profit_cost = profit(tp, fp, tn, fn, cost, val_returning_customer, num_customers)
        print(profit_cost)
        profit_per_cost.append(profit_cost)
    return profit_per_cost

In [None]:
# Precision optimal model
tp = 1559
tn = 1767
fp = 21524
fn = 7162

precision_profit = cost_for_model(tp, fp, tn, fn)

In [None]:
# Recall optimal model (threshold)

tp = 8343
tn = 1786
fp = 21505
fn = 378

recall_th_profit = cost_for_model(tp, fp, tn, fn)

In [None]:
# Baseline 

tp = 8687
tn = 0
fp = 23324
fn = 0

baseline_profit = cost_for_model(tp, fp, tn, fn)

In [None]:
# Accuracy optimal model
tp = 1549
tn = 21532
fp = 1759
fn = 7172

accuracy_profit = cost_for_model(tp, fp, tn, fn)

In [None]:
# F1 Score optimal model

tp = 2561
tn = 18338
fp = 4953
fn = 6160

f1_score_profit = cost_for_model(tp, fp, tn, fn)
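
Before comparing profits, it can help to confirm that each confusion matrix behaves the way its label suggests. A minimal sketch that recomputes precision, recall, and accuracy from the counts in the cells above; the `models` dictionary and the `summarize` helper are hypothetical names introduced here:

```python
# Confusion matrices copied from the cells above, in the order (tp, tn, fp, fn).
models = {
    "Precision": (1559, 1767, 21524, 7162),
    "Recall Threshold": (8343, 1786, 21505, 378),
    "Baseline": (8687, 0, 23324, 0),
    "Accuracy": (1549, 21532, 1759, 7172),
    "F1 Score": (2561, 18338, 4953, 6160),
}


def summarize(tp, tn, fp, fn):
    """Return precision, recall and accuracy for one confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total,
    }


metrics = {name: summarize(*counts) for name, counts in models.items()}
for name, m in metrics.items():
    print(f"{name:<17} precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} accuracy={m['accuracy']:.3f}")
```

Precision is the figure that matters most for profit here, since it determines how many of the offers sent actually reach a returning customer.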

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))  # wide figure so the legend and labels stay readable
plt.plot(cost_per_offer, precision_profit, label='Precision')
plt.plot(cost_per_offer, accuracy_profit, label='Accuracy')
plt.plot(cost_per_offer, recall_th_profit, label='Recall Threshold')
plt.plot(cost_per_offer, f1_score_profit, label='F1 Score')
plt.plot(cost_per_offer, baseline_profit, label='Baseline')

plt.xlabel('Cost per Offer')
plt.ylabel('Profit / Loss')
plt.title('Profit Comparison')
plt.legend()
plt.grid(True)  # grid lines make values easier to read off
plt.tight_layout()  # adjust spacing so all labels are visible

plt.gca().get_yaxis().set_major_formatter(
    plt.FuncFormatter(lambda x, p: format(int(x), ',')))

plt.show()

## Viewing results

In [None]:
use database ml;
use schema model_results_schema;

select * from model_performance
order by accuracy desc;

In [None]:
select * from confusion_matrix
where id = 109;