
In [4]:
import warnings; warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive')

import joblib
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support,
    f1_score, roc_auc_score, log_loss, confusion_matrix, classification_report
)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#### Load the best tuned XGB model (saved after Fine-Tuning) and the DEV/TEST splits from Drive. DEV will be used to tune the decision threshold; TEST is held out for the final performance report.

In [6]:
# Load the tuned XGBoost model saved at the end of fine-tuning
best_model = joblib.load('/content/drive/MyDrive/xgb_best_model.pkl')

X_dev = pd.read_csv('/content/drive/MyDrive/X_dev.csv')
y_dev = pd.read_csv('/content/drive/MyDrive/y_dev.csv')
X_test = pd.read_csv('/content/drive/MyDrive/X_test.csv')
y_test = pd.read_csv('/content/drive/MyDrive/y_test.csv')

y_dev = y_dev.squeeze()
y_test = y_test.squeeze()

#### Use probabilities (predict_proba) to search thresholds between 0.05–0.95 and pick the value that maximizes F1 on DEV. This avoids the default 0.5 cutoff, which can be sub-optimal with imbalanced data.

In [7]:
proba_dev = best_model.predict_proba(X_dev)[:, 1]
ths = np.linspace(0.05, 0.95, 91)

best_th, best_f1 = 0.5, -1
for t in ths:
    y_hat = (proba_dev >= t).astype(int)
    f1 = f1_score(y_dev, y_hat)
    if f1 > best_f1:
        best_f1, best_th = f1, t

print(f"Chosen threshold on DEV: {best_th:.3f} (DEV F1={best_f1:.3f})")

Chosen threshold on DEV: 0.050 (DEV F1=0.252)
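
The chosen threshold, 0.050, is the lowest value in the grid, so the search may have stopped at the edge rather than at a true F1 peak. The sketch below is one possible way to make that visible: a vectorized version of the same sweep that also reports whether the best threshold sits on the grid boundary. The helper names `f1_by_threshold` and `pick_threshold` are illustrative and not part of the original notebook, and the toy labels and scores are made up for the demonstration.

```python
import numpy as np

def f1_by_threshold(y_true, proba, thresholds):
    """Return the F1 score of (proba >= t) for every t in thresholds."""
    y_true = np.asarray(y_true).astype(bool)
    proba = np.asarray(proba, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # Row i holds the predictions obtained with thresholds[i]
    pred = proba[None, :] >= thresholds[:, None]
    tp = (pred & y_true).sum(axis=1)
    fp = (pred & ~y_true).sum(axis=1)
    fn = (~pred & y_true).sum(axis=1)

    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return np.divide(2 * tp, denom,
                     out=np.zeros(len(thresholds)), where=denom > 0)

def pick_threshold(y_true, proba, thresholds):
    """Return (best threshold, best F1, whether the best is on the grid edge)."""
    thresholds = np.asarray(thresholds, dtype=float)
    scores = f1_by_threshold(y_true, proba, thresholds)
    i = int(np.argmax(scores))  # first maximum, like the strict `>` in the loop
    on_edge = i in (0, len(thresholds) - 1)
    return float(thresholds[i]), float(scores[i]), on_edge

# Toy data, made up for illustration only
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]
th, f1_toy, on_edge = pick_threshold(y_toy, p_toy, [0.2, 0.5, 0.9])
print(f"threshold={th:.2f}  F1={f1_toy:.3f}  on grid edge: {on_edge}")
```

In this notebook the call would presumably be `pick_threshold(y_dev, proba_dev, ths)`. An on-edge result is a hint to widen the grid, or to check whether F1 is being maximized simply by predicting positive for almost every row.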


#### Apply the chosen threshold to TEST predictions and compute Accuracy, Precision, Recall, F1 on labels, ROC-AUC on probabilities, and Log-loss. Print the confusion matrix and a classification report to summarize performance on truly unseen data.

In [8]:
proba_test = best_model.predict_proba(X_test)[:, 1]
y_test_pred = (proba_test >= best_th).astype(int)

prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_test_pred, average='binary', zero_division=0)
acc = accuracy_score(y_test, y_test_pred)
auc_prob = roc_auc_score(y_test, proba_test)
ll = log_loss(y_test, np.vstack([1 - proba_test, proba_test]).T)
cm = confusion_matrix(y_test, y_test_pred)

print("\n=== TEST Metrics (thresholded) ===")
print(f"Accuracy:  {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall:    {rec:.4f}")
print(f"F1-score:  {f1:.4f}")
print(f"ROC-AUC:   {auc_prob:.4f}")
print(f"Log-loss:  {ll:.4f}")

print("\nConfusion Matrix (TEST):")
print(cm)

print("\nClassification Report (TEST):")
print(classification_report(y_test, y_test_pred, zero_division=0))


=== TEST Metrics (thresholded) ===
Accuracy:  0.2977
Precision: 0.1506
Recall:    0.7901
F1-score:  0.2529
ROC-AUC:   0.5020
Log-loss:  0.5416

Confusion Matrix (TEST):
[[10728 40244]
 [ 1895  7133]]

Classification Report (TEST):
              precision    recall  f1-score   support

           0       0.85      0.21      0.34     50972
           1       0.15      0.79      0.25      9028

    accuracy                           0.30     60000
   macro avg       0.50      0.50      0.30     60000
weighted avg       0.74      0.30      0.32     60000
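
Log-loss has no fixed scale, so it helps to compare it with a no-skill reference. A minimal sketch, assuming the reference is a constant prediction equal to the positive rate in TEST (its log-loss is the entropy of the label distribution). The counts are the supports printed in the classification report, and the variable names are illustrative.

```python
import math

# Class supports taken from the TEST classification report
n_neg, n_pos = 50972, 9028
p = n_pos / (n_neg + n_pos)  # positive rate in TEST

# Log-loss of always predicting probability p for the positive class
baseline_log_loss = -(p * math.log(p) + (1 - p) * math.log(1 - p))

model_log_loss = 0.5416  # value reported for the tuned XGBoost model
print(f"Constant-rate log-loss: {baseline_log_loss:.4f}")
print(f"Model log-loss:         {model_log_loss:.4f}")
```

With these counts the constant reference comes out near 0.42, below the model's 0.5416, so the model's probabilities are less informative than simply predicting the base rate.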



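###### On the test set the final XGBoost model reaches a recall of 0.79, but only because the 0.05 threshold labels about 79% of all rows positive (47,377 of 60,000). Precision is 0.15, which matches the share of positives in the data (9,028 of 60,000), and ROC-AUC is 0.50, so the predicted probabilities rank positives no better than chance. The F1-score of 0.25 therefore reflects the low cutoff rather than real discriminative power. The chosen threshold is also the lowest value in the search grid, which suggests that F1 on DEV was maximized by flagging as many rows as possible. The class imbalance is real (about 15% positives), and the chance-level AUC on unseen data may indicate overfitting to the oversampled training distribution or features that carry little signal.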
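
The figures discussed above can be recomputed directly from the TEST confusion matrix. The sketch below does that arithmetic and adds one reference point that is not in the original notebook: the F1 of a classifier that labels every row positive (recall 1, precision equal to prevalence). Variable names are illustrative.

```python
import numpy as np

# TEST confusion matrix as printed above: rows = true class, cols = predicted
cm = np.array([[10728, 40244],
               [1895, 7133]])
tn, fp, fn, tp = cm.ravel()
n = cm.sum()

prevalence = (tp + fn) / n   # share of true positives in TEST
flag_rate = (tp + fp) / n    # share of rows predicted positive
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_model = 2 * precision * recall / (precision + recall)

# Reference: predict positive for every row (recall = 1, precision = prevalence)
f1_all_positive = 2 * prevalence / (1 + prevalence)

print(f"prevalence={prevalence:.4f}  precision={precision:.4f}")
print(f"flag rate ={flag_rate:.4f}  recall   ={recall:.4f}")
print(f"F1 model  ={f1_model:.4f}  F1 all-positive={f1_all_positive:.4f}")
```

Precision close to prevalence and recall close to the flag rate are what random flagging would produce. Here the all-positive reference gives an F1 of about 0.26, slightly above the model's 0.25.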

#### Store the selected DEV threshold and all final TEST metrics in a CSV on Drive so they can be referenced in the report or reused later.

In [10]:
final_results = {
    "Threshold_DEV": best_th,
    "Accuracy_TEST": acc,
    "Precision_TEST": prec,
    "Recall_TEST": rec,
    "F1_TEST": f1,
    "AUC_prob_TEST": auc_prob,
    "LogLoss_TEST": ll
}

pd.DataFrame([final_results]).to_csv('/content/drive/MyDrive/final_xgb_results.csv', index=False)
print("\n Final metrics saved to Drive: final_xgb_results.csv")


 Final metrics saved to Drive: final_xgb_results.csv
