
# Baseline Decision Rules

**Purpose:**
To establish a financial benchmark (the "floor") by evaluating naive, non-ML strategies.

**The Question:**
*"How much money do we lose if we don't use Machine Learning at all?"*

**Strategies to Test:**
1.  **Allow All Transactions:** The default state. We trust everyone.
2.  **Block All Transactions:** Extreme risk aversion. We trust no one.

**Success Criteria for Future Models:**
Any Machine Learning model we build in later steps **must** result in a lower total cost than the best baseline found here.

In [None]:
import pandas as pd
import numpy as np

# Load Data
url = "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv"
try:
    df = pd.read_csv(url)
    print(f"✅ Data Loaded: {df.shape[0]:,} transactions")
except Exception as e:
    print(f"❌ Error loading data: {e}")
    raise  # Re-raise: the cells below cannot run without df

# Define the Target (0 = Legitimate, 1 = Fraud)
y_true = df['Class']

✅ Data Loaded: 284,807 transactions


In [None]:
# Define the Business Cost Function
# Derived from decision_framework.md

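def calculate_financial_loss(y_true, y_pred):
    """
    Calculate the total financial loss of a set of allow/block decisions.

    A missed fraud (false negative) costs $100 and a blocked legitimate
    transaction (false positive) costs $5. Correct decisions cost nothing.

    Returns a tuple: (total_loss, fn, fp).
    """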
    # Confusion Matrix Components
    # We can calculate these manually for speed/clarity without sklearn here
    fp = np.sum((y_pred == 1) & (y_true == 0)) # Predicted Fraud, Actually Legit
    fn = np.sum((y_pred == 0) & (y_true == 1)) # Predicted Legit, Actually Fraud

    # Costs
    cost_missed_fraud = 100 # $100 per event
    cost_blocked_user = 5   # $5 per event

    total_loss = (fn * cost_missed_fraud) + (fp * cost_blocked_user)

    return total_loss, fn, fp
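
A quick sanity check of the cost rule on hand-made data can help before applying it to the full dataset. The sketch below repeats the function so it runs on its own; the `toy_true` and `toy_pred` arrays are hypothetical and exist only for illustration.

```python
import numpy as np

def calculate_financial_loss(y_true, y_pred):
    """Same cost rule as the cell above: $100 per missed fraud, $5 per blocked legitimate user."""
    fp = np.sum((y_pred == 1) & (y_true == 0))  # Predicted Fraud, Actually Legit
    fn = np.sum((y_pred == 0) & (y_true == 1))  # Predicted Legit, Actually Fraud
    return (fn * 100) + (fp * 5), fn, fp

# Hypothetical toy data: 5 transactions, 2 of them fraudulent
toy_true = np.array([0, 0, 1, 1, 0])
toy_pred = np.array([0, 1, 0, 1, 0])  # one false alarm, one missed fraud

toy_loss, toy_fn, toy_fp = calculate_financial_loss(toy_true, toy_pred)
print(f"FN={toy_fn}, FP={toy_fp}, loss=${toy_loss}")  # FN=1, FP=1, loss=$105
```

One missed fraud (&#36;100) plus one false alarm (&#36;5) gives &#36;105, which matches the rule by hand.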

In [None]:
# Strategy 1: The "Do Nothing" Approach (Allow All)
# We predict '0' (Legitimate) for every single transaction.

y_pred_allow_all = np.zeros_like(y_true)

loss_allow, fn_allow, fp_allow = calculate_financial_loss(y_true, y_pred_allow_all)

print("--- STRATEGY 1: ALLOW ALL TRANSACTIONS ---")
print(f"Missed Fraud (FN):   {fn_allow:,} (Cost: ${fn_allow * 100:,.0f})")
print(f"Blocked Users (FP):  {fp_allow:,} (Cost: ${fp_allow * 5:,.0f})")
print("------------------------------------------")
print(f"TOTAL FINANCIAL LOSS: ${loss_allow:,.2f}")

--- STRATEGY 1: ALLOW ALL TRANSACTIONS ---
Missed Fraud (FN):   492 (Cost: $49,200)
Blocked Users (FP):  0 (Cost: $0)
------------------------------------------
TOTAL FINANCIAL LOSS: $49,200.00
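
Strategy 2 ("Block All") is listed at the top, but no cell shown here computes it. A minimal sketch of that calculation follows. It rebuilds the labels from the two counts printed above (284,807 transactions, 492 frauds), so it runs without downloading the data; inside the notebook you would presumably reuse `y_true` and `calculate_financial_loss` instead. The names `y_pred_block_all`, `fp_block`, `fn_block`, and `loss_block` are assumptions, not part of the original notebook.

```python
import numpy as np

# Counts taken from the notebook output above
N_TOTAL = 284_807   # all transactions
N_FRAUD = 492       # fraudulent transactions

COST_MISSED_FRAUD = 100  # $ per false negative
COST_BLOCKED_USER = 5    # $ per false positive

# Synthetic labels with the same class counts (order does not matter here)
y_true = np.zeros(N_TOTAL, dtype=int)
y_true[:N_FRAUD] = 1

# Strategy 2: predict '1' (Fraud) for every single transaction
y_pred_block_all = np.ones_like(y_true)

fp_block = int(np.sum((y_pred_block_all == 1) & (y_true == 0)))
fn_block = int(np.sum((y_pred_block_all == 0) & (y_true == 1)))
loss_block = fn_block * COST_MISSED_FRAUD + fp_block * COST_BLOCKED_USER

print("--- STRATEGY 2: BLOCK ALL TRANSACTIONS ---")
print(f"Missed Fraud (FN):   {fn_block:,} (Cost: ${fn_block * COST_MISSED_FRAUD:,.0f})")
print(f"Blocked Users (FP):  {fp_block:,} (Cost: ${fp_block * COST_BLOCKED_USER:,.0f})")
print(f"TOTAL FINANCIAL LOSS: ${loss_block:,.2f}")
```

Under this cost rule, blocking an actual fraud costs nothing, so the whole loss comes from the 284,315 legitimate transactions that get blocked.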


### 📉 Conclusion: The Benchmark to Beat

**The Numbers:**
- **"Allow All" Loss:** &#36;49,200  
  (Calculation: 492 frauds × &#36;100)
- **"Block All" Loss:** ~&#36;1.4 million  
  (Calculation: 284,315 legitimate transactions × &#36;5 = &#36;1,421,575)

---

**The Insight:** The **"Allow All"** strategy is roughly 29 times cheaper than blocking everyone (&#36;49,200 versus about &#36;1.4 million) because fraud is rare: only 492 of 284,807 transactions are fraudulent. Therefore, **&#36;49,200** is our **Baseline Loss**.

**The Goal:** Our Machine Learning pipeline must deliver a Total Financial Loss **lower than &#36;49,200**. If our complex XGBoost model costs the company **&#36;50,000** (due to false alarms), it is **worse than doing nothing**.
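
To make this target concrete, the sketch below computes how many false alarms a model can afford for a given number of missed frauds while still beating the baseline. The helper `max_false_positives` and the example of a model that misses 92 frauds are hypothetical illustrations, not results from this project.

```python
BASELINE_LOSS = 49_200     # "Allow All" loss from this notebook
COST_MISSED_FRAUD = 100    # $ per false negative
COST_BLOCKED_USER = 5      # $ per false positive

def max_false_positives(fn):
    """Largest FP count that keeps total loss strictly below the baseline.

    Returns None when no FP count can beat the baseline.
    """
    budget = BASELINE_LOSS - fn * COST_MISSED_FRAUD
    if budget <= 0:
        return None
    # Strictly below the baseline: fp * cost must be at most budget - 1
    return (budget - 1) // COST_BLOCKED_USER

# Hypothetical model that still misses 92 of the 492 frauds
print(max_false_positives(92))   # 7999
# A model that misses every fraud can never beat "Allow All"
print(max_false_positives(492))  # None
```

In this example, each caught fraud frees up budget for 20 false alarms (&#36;100 / &#36;5), which is the trade-off any later model has to respect.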