# Fraud Detection Model Evaluation with Gaussian Naive Bayes

This notebook evaluates a previously trained Gaussian Naive Bayes model on a test dataset. It includes predictions, evaluation metrics, and visualization of results.


## 1. Importing Required Libraries
This section imports the libraries needed for:
- **Data handling**: pandas.
- **Model loading**: joblib.
- **Evaluation metrics**: accuracy, precision, recall, F1-score, ROC AUC, and confusion matrix.
- **Visualization**: matplotlib for plotting.


## 2. Loading the Trained Naive Bayes Model
The saved Gaussian Naive Bayes model (`naive_bayes_model.joblib`) is loaded using `joblib`. The loaded model is ready for predictions on the test dataset.
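
As a hedged illustration of the save/load round trip, the sketch below trains a tiny stand-in classifier, writes it with `joblib.dump`, and reads it back with `joblib.load`. The toy data, the `toy_nb.joblib` file name, and the variable names are assumptions for illustration only, not the notebook's actual training setup.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data (hypothetical): one feature, two samples per class
X_toy = np.array([[0.0], [0.2], [2.9], [3.1]])
y_toy = np.array([0, 0, 1, 1])
trained = GaussianNB().fit(X_toy, y_toy)

# Save to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_nb.joblib")  # hypothetical file name
    joblib.dump(trained, path)
    restored = joblib.load(path)

# Sanity checks before using a loaded model for predictions
print("Classes:", restored.classes_)
print("Expected number of features:", restored.n_features_in_)
same_predictions = np.array_equal(trained.predict(X_toy), restored.predict(X_toy))
print("Predictions match after reload:", same_predictions)
```

Inspecting `classes_` and `n_features_in_` after loading is one way to confirm that the model matches the data it is about to score. Because joblib files are pickle-based, only files from a trusted source should be loaded.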


In [None]:
# Import necessary libraries
import pandas as pd
import joblib  # For loading the saved model
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Load the saved Naive Bayes model from the joblib file
model = joblib.load('naive_bayes_model.joblib')
print("Loaded Naive Bayes model from 'naive_bayes_model.joblib'.")


## 3. Loading and Verifying Test Data
The preprocessed test dataset is loaded using the `test_to_df` function. The first few rows are displayed to verify the dataset's integrity and structure.
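
Beyond looking at the first rows, a few quick checks can help verify the data. The sketch below is one possible approach, shown on a toy frame: `summarize_test_frame` is a hypothetical helper, and the `ip` and `app` columns are assumed for illustration (only `is_attributed` is taken from this notebook).

```python
import pandas as pd


def summarize_test_frame(df, target="is_attributed"):
    """Return basic integrity facts about a test DataFrame (hypothetical helper)."""
    if target not in df.columns:
        raise KeyError(f"Target column '{target}' is missing")
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "class_counts": df[target].value_counts().to_dict(),
    }


# Toy stand-in for the real test data; feature columns are assumptions
toy_test_data = pd.DataFrame(
    {"ip": [1, 2, 3, 4], "app": [3, 3, 9, 9], "is_attributed": [0, 0, 0, 1]}
)
summary = summarize_test_frame(toy_test_data)
print(summary)
```

The class counts are worth noting early, since a strong imbalance between the two labels affects how accuracy should be read later on.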


In [None]:
# Load the preprocessed test data with the project's test_to_df helper
from data_reprocessing import test_to_df
dataset = 'new_test.csv'
test_data = test_to_df(dataset)

# Display the first few rows of the dataset to verify the preprocessing
print(test_data.head())

## 4. Preparing the Test Dataset
- Features (`X_test`): All columns except the target (`is_attributed`).
- Target (`y_test`): True labels indicating fraud (0) or non-fraud (1).
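- The model generates predictions:
  - `y_test_pred_prob`: Predicted probabilities for class 1 (non-fraud), taken from the second column of `predict_proba`, assuming `model.classes_` is `[0, 1]`.
  - `y_test_pred_class`: Binary class predictions from `predict`, which picks the more probable class (equivalent to a 0.5 threshold when there are two classes).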
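
The relationship between `predict_proba` columns, `classes_`, and `predict` can be sketched on toy data. Everything in the block below (the random data, `toy_model`, and the other variable names) is an assumption for illustration; it does not use the notebook's model or test set.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (hypothetical): one feature, label 0 = fraud, 1 = non-fraud
rng = np.random.default_rng(0)
X_toy = np.concatenate(
    [rng.normal(0.0, 1.0, 50), rng.normal(3.0, 1.0, 50)]
).reshape(-1, 1)
y_toy = np.array([0] * 50 + [1] * 50)

toy_model = GaussianNB().fit(X_toy, y_toy)

# Columns of predict_proba follow the order of toy_model.classes_
proba = toy_model.predict_proba(X_toy)
col_fraud = list(toy_model.classes_).index(0)
col_legit = list(toy_model.classes_).index(1)
p_fraud = proba[:, col_fraud]
p_legit = proba[:, col_legit]

# predict() returns the more probable class, which for two classes
# matches thresholding the class-1 probability at 0.5
thresholded = (p_legit > 0.5).astype(int)
agree = np.array_equal(thresholded, toy_model.predict(X_toy))
print("classes_:", toy_model.classes_)
print("Thresholding matches predict():", agree)
```

Looking up the column through `classes_` rather than hard-coding index 1 avoids silently reading the wrong class if the label encoding ever changes.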


In [None]:
# Separate features and labels from the test data
X_test = test_data.drop(columns=['is_attributed'])  # Features
y_test = test_data['is_attributed']  # True labels

# Make predictions on the test set using the loaded model
y_test_pred_prob = model.predict_proba(X_test)[:, 1]  # P(class 1), i.e. non-fraud; assumes model.classes_ is [0, 1]
y_test_pred_class = model.predict(X_test)  # Binary class predictions

## 5. Evaluating Model Performance
The following metrics are calculated for the test set:
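- **Accuracy**: Fraction of all predictions that are correct.
- **Precision**: Fraction of cases predicted as fraud that are actually fraud.
- **Recall**: Fraction of actual fraud cases that the model identifies.
- **F1-Score**: Harmonic mean of precision and recall.
- **ROC AUC Score**: Area under the ROC curve, measuring how well the predicted probabilities rank one class above the other across all thresholds.

Precision, recall, and F1-score treat fraud (0) as the positive class via `pos_label=0`. The ROC AUC is computed from the class-1 probabilities; for a binary problem the value is the same whichever class is treated as positive, as long as labels and scores are flipped together. Results are displayed for analysis.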
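
To make the definitions concrete, the following sketch computes precision, recall, and F1-score by hand with fraud (0) as the positive class, compares them with scikit-learn, and checks the ROC AUC symmetry. The labels and probabilities are made-up toy values, not results from this notebook's model.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Toy labels and class-1 probabilities (hypothetical): 0 = fraud, 1 = non-fraud
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
p_class1 = np.array([0.1, 0.4, 0.6, 0.3, 0.45, 0.7, 0.8, 0.9, 0.95, 0.85])
y_pred = (p_class1 > 0.5).astype(int)

# Counts with fraud (0) as the positive class
tp = int(np.sum((y_true == 0) & (y_pred == 0)))  # fraud predicted as fraud
fp = int(np.sum((y_true == 1) & (y_pred == 0)))  # non-fraud predicted as fraud
fn = int(np.sum((y_true == 0) & (y_pred == 1)))  # fraud predicted as non-fraud

precision_by_hand = tp / (tp + fp)
recall_by_hand = tp / (tp + fn)
f1_by_hand = (
    2 * precision_by_hand * recall_by_hand / (precision_by_hand + recall_by_hand)
)

# The same metrics from scikit-learn, with pos_label=0
precision_sklearn = precision_score(y_true, y_pred, pos_label=0)
recall_sklearn = recall_score(y_true, y_pred, pos_label=0)
f1_sklearn = f1_score(y_true, y_pred, pos_label=0)

# ROC AUC is unchanged when labels and scores are flipped together
auc_class1 = roc_auc_score(y_true, p_class1)
auc_class0 = roc_auc_score(1 - y_true, 1 - p_class1)

print(f"Precision: {precision_by_hand:.4f} (sklearn: {precision_sklearn:.4f})")
print(f"Recall:    {recall_by_hand:.4f} (sklearn: {recall_sklearn:.4f})")
print(f"F1-score:  {f1_by_hand:.4f} (sklearn: {f1_sklearn:.4f})")
print(f"AUC match: {np.isclose(auc_class1, auc_class0)}")
```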


In [None]:
# Calculate performance metrics on the test set
accuracy = accuracy_score(y_test, y_test_pred_class)
precision = precision_score(y_test, y_test_pred_class, pos_label=0)  # 0 = fraudulent
recall = recall_score(y_test, y_test_pred_class, pos_label=0)  # 0 = fraudulent
f1 = f1_score(y_test, y_test_pred_class, pos_label=0)  # 0 = fraudulent
auc_roc = roc_auc_score(y_test, y_test_pred_prob)

# Display evaluation results
print(f"Test Set Accuracy: {accuracy:.4f}")
print(f"Test Set Precision (fraud detection): {precision:.4f}")
print(f"Test Set Recall (fraud detection): {recall:.4f}")
print(f"Test Set F1-Score (fraud detection): {f1:.4f}")
print(f"Test Set AUC-ROC: {auc_roc:.4f}")


## 6. Visualizing Confusion Matrix
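- The confusion matrix has true labels on the rows and predicted labels on the columns. With fraud (0) as the positive class:
  - Top-left: True Positives (TP), fraud predicted as fraud.
  - Top-right: False Negatives (FN), fraud predicted as non-fraud.
  - Bottom-left: False Positives (FP), non-fraud predicted as fraud.
  - Bottom-right: True Negatives (TN), non-fraud predicted as non-fraud.
- The matrix is plotted as a heatmap, with labels for fraud (0) and non-fraud (1).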
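
The cell layout can be checked on a small example. The sketch below uses made-up labels and unpacks the four counts with fraud (0) as the positive class; note that the common `tn, fp, fn, tp = cm.ravel()` idiom assumes class 1 is positive, so the order differs here.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): 0 = fraud, 1 = non-fraud
y_true = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# Rows are true labels, columns are predicted labels, both in the order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# With fraud (0) as the positive class, reading row by row gives TP, FN, FP, TN
tp, fn, fp, tn = (int(v) for v in cm.ravel())
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```

Passing `labels=[0, 1]` explicitly keeps the row and column order fixed even if one class happens to be absent from the data.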


In [None]:
# Calculate the confusion matrix, fixing the label order to match display_labels below
cm = confusion_matrix(y_test, y_test_pred_class, labels=[0, 1])

# Plot the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Fraud (0)', 'Non-Fraud (1)'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix on Test Set')
plt.show()


## Conclusion
This notebook demonstrates evaluating a Gaussian Naive Bayes model for fraud detection. It calculates key performance metrics and visualizes results, providing insights into the model's effectiveness on unseen data.
