# Advanced Models for Fraud Detection

In this notebook, we train two tree-based ensemble models, Random Forest and XGBoost, on the engineered fraud features and evaluate them on a held-out validation set, so that their performance can be compared against our baseline logistic regression model.
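Fraud labels are usually rare, so a fair comparison between models depends on a validation split whose fraud rate matches the full dataset. This minimal sketch uses synthetic data from scikit-learn's `make_classification` in place of `train_engineered.csv`. The roughly 2% fraud rate and the sample size are illustrative assumptions. It contrasts a plain random split with one stratified on the label through `train_test_split(..., stratify=y)`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (assumed ~2% positive "fraud" class).
X, y = make_classification(
    n_samples=10_000, n_features=20, weights=[0.98, 0.02], random_state=42
)

# Plain random split: the validation fraud rate is left to chance.
_, _, _, y_val_plain = train_test_split(X, y, test_size=0.2, random_state=0)

# Stratified split: the fraud share is preserved in both train and validation sets.
_, _, y_train_strat, y_val_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(f"Overall fraud rate:       {y.mean():.4f}")
print(f"Validation (plain split): {y_val_plain.mean():.4f}")
print(f"Validation (stratified):  {y_val_strat.mean():.4f}")
```

With `stratify=y` the validation fraud rate matches the overall rate up to rounding. The plain split can drift, which makes precision and recall on the fraud class noisier to compare across models.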

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import joblib

# Load the processed training data
train_data = pd.read_csv('../data/processed/train_engineered.csv')
X = train_data.drop('is_fraud', axis=1)
y = train_data['is_fraud']

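# Split the data into training and validation sets, stratifying on the label
# so the rare fraud class keeps the same share in both sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)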

# Random Forest Model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predictions and Evaluation
rf_predictions = rf_model.predict(X_val)
print("Random Forest Classification Report:")
print(classification_report(y_val, rf_predictions))
print("Random Forest Confusion Matrix (rows = actual, columns = predicted):")
print(confusion_matrix(y_val, rf_predictions))

# Save the Random Forest model
joblib.dump(rf_model, '../models/random_forest.pkl')

# XGBoost Model
# random_state fixes the seed, matching the Random Forest above
xgb_model = XGBClassifier(eval_metric='logloss', random_state=42)
xgb_model.fit(X_train, y_train)

# Predictions and Evaluation
xgb_predictions = xgb_model.predict(X_val)
print("XGBoost Classification Report:")
print(classification_report(y_val, xgb_predictions))
print("XGBoost Confusion Matrix (rows = actual, columns = predicted):")
print(confusion_matrix(y_val, xgb_predictions))

# Save the XGBoost model
joblib.dump(xgb_model, '../models/xgboost_final.pkl')

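## Conclusion

In this notebook, we trained Random Forest and XGBoost classifiers on the engineered features, evaluated each on a held-out 20% validation split with a classification report (per-class precision, recall and F1) and a confusion matrix, and saved both models to `../models/`. These results will help us choose the model for our final submission.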
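Choosing the best model calls for a single metric computed the same way for every candidate, including the baseline. This minimal sketch assumes average precision (area under the precision-recall curve) as the selection metric, and it uses synthetic imbalanced data in place of `train_engineered.csv`. The ~2% fraud rate and the `candidates` dictionary are illustrative assumptions. The sketch fits a logistic regression baseline and a Random Forest on a stratified split and then picks the higher-scoring model. XGBoost is left out only to keep the sketch free of extra dependencies. `XGBClassifier` also implements `fit` and `predict_proba`, so it can be added to `candidates` unchanged.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumed ~2% fraud).
X, y = make_classification(
    n_samples=5_000, n_features=20, weights=[0.98, 0.02], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Candidate models; the logistic regression plays the role of the baseline.
candidates = {
    "logistic_regression_baseline": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Predicted probability of the positive (fraud) class for each validation row.
    fraud_proba = model.predict_proba(X_val)[:, 1]
    scores[name] = average_precision_score(y_val, fraud_proba)
    print(f"{name:30s} average precision = {scores[name]:.3f}")

# Select the candidate with the highest validation average precision.
best_name = max(scores, key=scores.get)
print("Best model on validation set:", best_name)
```

Average precision is computed from predicted probabilities rather than hard labels. It therefore does not depend on the default 0.5 cut-off that `predict` applies, which matters when fraud is rare.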