# Fraud Detection Model Testing and Evaluation

This notebook loads a trained XGBoost model to predict fraudulent ad clicks on a test dataset. It evaluates the model's performance using standard metrics and visualizes results, including a confusion matrix.


## 1. Importing Necessary Libraries
Required libraries are imported for:
- **Data processing**: pandas
- **Model loading and prediction**: xgboost
- **Feature scaling**: StandardScaler (from scikit-learn)
- **Evaluation metrics**: scikit-learn (`sklearn.metrics`)
- **Visualization**: matplotlib


## 2. Loading the Pretrained Model
The saved XGBoost model (`xgboost_model.json`) is loaded with XGBoost's low-level `Booster` class. A `Booster` predicts on `DMatrix` objects rather than on DataFrames, so the test data is converted before prediction.


In [None]:
# Import necessary libraries
import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
#from testing_preprocessing import dataset

# Load the saved XGBoost model from JSON file
model = xgb.Booster()
model.load_model('xgboost_model.json')
print("Loaded XGBoost model from 'xgboost_model.json'.")


## 3. Loading and Verifying Test Data
The preprocessed test dataset is loaded using the `test_to_df` function, and the first few rows are displayed for verification.


In [None]:
# Load the preprocessed test data with the project's test_to_df helper
from data_reprocessing import test_to_df
dataset = 'new_test.csv'
test_data = test_to_df(dataset)

# Display the first few rows of the dataset to verify the preprocessing
print(test_data.head())


## 4. Preparing Test Data
- `X_test`: All feature columns, i.e. every column except the target (`is_attributed`).
- `y_test`: The true labels, where 0 marks a fraudulent click and 1 a non-fraudulent click.

The test data is converted to XGBoost's optimized `DMatrix` format, which `Booster.predict` requires.


## 5. Generating Predictions
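The model produces:
1. One predicted probability per row (`y_test_pred_prob`). Assuming the model was trained with the `binary:logistic` objective, this is the probability of class 1 (non-fraudulent), not a separate probability for each class.
2. Binary classifications (`y_test_pred_class`): 1 if the probability is greater than 0.5, otherwise 0.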
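
The thresholding step can be illustrated on its own. This is a minimal sketch with made-up probabilities (`example_prob` and `example_class` are hypothetical names, not part of the notebook); it shows that a value exactly at the threshold falls into class 0 because the comparison is strict.

```python
import numpy as np

# Hypothetical probabilities of class 1 (non-fraudulent), standing in for y_test_pred_prob
example_prob = np.array([0.02, 0.49, 0.50, 0.51, 0.97])

# Strictly greater than the threshold maps to class 1; everything else maps to class 0
threshold = 0.5
example_class = (example_prob > threshold).astype(int)

print(example_class.tolist())  # [0, 0, 0, 1, 1]
```

Raising the threshold assigns more clicks to class 0 (fraudulent), which tends to increase fraud recall at the cost of fraud precision.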


In [None]:
# Separate features and labels from the test data
X_test = test_data.drop(columns=['is_attributed'])  # Features
y_test = test_data['is_attributed']  # True labels

# Convert the test set to DMatrix format for XGBoost
dtest = xgb.DMatrix(X_test)

# Make predictions on the test set using the loaded model
y_test_pred_prob = model.predict(dtest)

# Convert probabilities to binary class predictions (0 = fraudulent, 1 = non-fraudulent)
y_test_pred_class = (y_test_pred_prob > 0.5).astype(int)


## 6. Evaluating Model Performance
The following metrics are calculated, with fraud (label 0) as the positive class for precision, recall, and F1:
- **Accuracy**: Fraction of all clicks classified correctly.
- **Precision**: Fraction of clicks predicted as fraudulent that are actually fraudulent.
- **Recall**: Fraction of actual fraudulent clicks that the model identifies.
- **F1-Score**: Harmonic mean of precision and recall.
- **AUC-ROC**: Area under the ROC curve, computed from the predicted probabilities rather than the thresholded classes.
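
To make the definitions concrete, here is a small worked example in plain Python. The labels are invented for illustration, and all variable names are hypothetical; the only thing taken from the notebook is the convention that 0 (fraudulent) is the positive class.

```python
# Hypothetical labels: 0 = fraudulent (treated as the positive class), 1 = non-fraudulent
example_true = [0, 0, 0, 1, 1, 1, 1, 1]
example_pred = [0, 0, 1, 0, 1, 1, 1, 1]

pairs = list(zip(example_true, example_pred))
tp = sum(1 for t, p in pairs if t == 0 and p == 0)  # fraud correctly flagged
fn = sum(1 for t, p in pairs if t == 0 and p == 1)  # fraud missed
fp = sum(1 for t, p in pairs if t == 1 and p == 0)  # legitimate click flagged as fraud
tn = sum(1 for t, p in pairs if t == 1 and p == 1)  # legitimate click passed

example_accuracy = (tp + tn) / len(pairs)
example_precision = tp / (tp + fp)
example_recall = tp / (tp + fn)
example_f1 = 2 * example_precision * example_recall / (example_precision + example_recall)

print(f"accuracy={example_accuracy:.4f}")    # accuracy=0.7500
print(f"precision={example_precision:.4f}")  # precision=0.6667
print(f"recall={example_recall:.4f}")        # recall=0.6667
print(f"f1={example_f1:.4f}")                # f1=0.6667
```

The sketch omits guards against division by zero, which would be needed if no click were predicted as fraudulent or if the data contained no fraud.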


In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Calculate performance metrics on the test set
accuracy = accuracy_score(y_test, y_test_pred_class)
precision = precision_score(y_test, y_test_pred_class, pos_label=0)  # 0 = fraudulent
recall = recall_score(y_test, y_test_pred_class, pos_label=0)  # 0 = fraudulent
f1 = f1_score(y_test, y_test_pred_class, pos_label=0)  # 0 = fraudulent
auc_roc = roc_auc_score(y_test, y_test_pred_prob)

# Display evaluation results
print(f"Test Set Accuracy: {accuracy:.4f}")
print(f"Test Set Precision (fraud detection): {precision:.4f}")
print(f"Test Set Recall (fraud detection): {recall:.4f}")
print(f"Test Set F1-Score (fraud detection): {f1:.4f}")
print(f"Test Set AUC-ROC: {auc_roc:.4f}")


## 7. Visualizing Confusion Matrix
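The confusion matrix shows the counts of True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN). Rows are true labels and columns are predicted labels, both ordered 0 then 1. With fraud (0) as the positive class, the top-left cell is TP and the bottom-right cell is TN.

A color-coded matrix is displayed for clear interpretation.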
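
Because this notebook treats fraud (label 0) as the positive class, the cell layout differs from the more common convention in which label 1 is positive. The sketch below uses invented labels and hypothetical variable names to show how the four counts can be read from scikit-learn's `confusion_matrix` under that assumption.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = fraudulent (positive class here), 1 = non-fraudulent
example_true = [0, 0, 0, 1, 1, 1, 1, 1]
example_pred = [0, 0, 1, 0, 1, 1, 1, 1]

# labels=[0, 1] pins the row and column order even if one class is absent
example_cm = confusion_matrix(example_true, example_pred, labels=[0, 1])

# Row = true label, column = predicted label, so with class 0 as positive:
# [[TP, FN],
#  [FP, TN]]
tp, fn, fp, tn = example_cm.ravel()
print(tp, fn, fp, tn)  # 2 1 1 4
```

If label 1 were the positive class instead, the same `ravel()` call would yield the counts in the order TN, FP, FN, TP.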


In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Calculate the confusion matrix; labels=[0, 1] fixes the row/column order to match display_labels
cm = confusion_matrix(y_test, y_test_pred_class, labels=[0, 1])

# Plot the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Fraud (0)', 'Non-Fraud (1)'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix on Test Set')
plt.show()


## Conclusion
This notebook evaluates a trained XGBoost model for click-fraud detection on a held-out test set. Accuracy, precision, recall, F1-score, and AUC-ROC summarize how well the model identifies fraudulent clicks, and the confusion matrix shows which kinds of errors it makes.
