# AI-Powered Fraud Detection: Analysis & Model Training

## 1. Introduction
The goal is to build a model that detects fraudulent transactions with high recall, because the cost of missing a fraudulent transaction (a false negative) is very high.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier
import joblib
import os

## 2. Data Loading & Exploration (EDA)

In [None]:
df = pd.read_csv('../transactions.csv')
df.head()

In [None]:
print(df['Class'].value_counts())
sns.countplot(x='Class', data=df)
plt.title('Class Distribution (0: Legit, 1: Fraud)')
plt.show()

The dataset is highly imbalanced: legitimate transactions (class 0) far outnumber fraudulent ones (class 1), which is typical for fraud detection.
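
To put a number on the imbalance, the class shares can be computed directly instead of read off the plot. The sketch below uses a small hypothetical label series (`toy_labels`, 98 legitimate and 2 fraudulent rows) in place of `df['Class']`; the counts are illustrative only and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for df['Class']: 98 legitimate rows, 2 fraudulent rows.
toy_labels = pd.Series([0] * 98 + [1] * 2, name="Class")

# normalize=True returns each class as a fraction of all rows.
class_share = toy_labels.value_counts(normalize=True)

# Fraction of rows labelled as fraud (class 1); 0.0 if the class is absent.
fraud_share = float(class_share.get(1, 0.0))

print(class_share)
print(f"Fraud share: {fraud_share:.2%}")
```

On the real data, replacing `toy_labels` with `df['Class']` gives the fraud share to report alongside the count plot.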

## 3. Data Preparation & Splitting

In [None]:
X = df.drop('Class', axis=1)
y = df['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
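
Passing `stratify=y` is meant to keep the fraud rate the same in the training and test sets, which matters when positives are rare. The following minimal sketch checks this on a hypothetical toy frame (`toy`, 1000 rows with 5% fraud); the column name `Amount` and all values are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 1000 rows, 50 of them fraudulent (5%).
toy = pd.DataFrame({"Amount": range(1000), "Class": [1] * 50 + [0] * 950})
X_toy = toy.drop("Class", axis=1)
y_toy = toy["Class"]

# Same split settings as the notebook cell above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratification, the fraud rate should be (almost) identical in both sets.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"Train fraud rate: {train_rate:.3f}, test fraud rate: {test_rate:.3f}")
```

The same two `mean()` calls on `y_train` and `y_test` are a quick sanity check on the real split.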

## 4. Model Selection & Training
XGBoost is chosen for its performance and its ability to handle imbalanced datasets. We set `scale_pos_weight` to the ratio of negative to positive training samples, so that the minority class (fraud) carries more weight during training.

In [None]:
# scale_pos_weight = (number of legitimate samples) / (number of fraud samples)
class_counts = y_train.value_counts()
scale_pos_weight = class_counts[0] / class_counts[1]

model = XGBClassifier(scale_pos_weight=scale_pos_weight, random_state=42, eval_metric='logloss')
model.fit(X_train, y_train)
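
As a rough illustration of what this ratio does, the sketch below treats `scale_pos_weight` as a per-sample weight applied to positive examples, which is how it is commonly understood to act in XGBoost's binary objective. The label series `y_toy` (990 legitimate, 10 fraud) is hypothetical, and the code only demonstrates the arithmetic; it does not train a model.

```python
import pandas as pd

# Hypothetical training labels: 990 legitimate, 10 fraudulent.
y_toy = pd.Series([0] * 990 + [1] * 10)

n_neg = int((y_toy == 0).sum())
n_pos = int((y_toy == 1).sum())

# Same formula as in the cell above: negatives divided by positives.
toy_scale_pos_weight = n_neg / n_pos

# Illustrative per-sample weights: 1 for legitimate rows, the ratio for fraud rows.
weights = y_toy.map({0: 1.0, 1: toy_scale_pos_weight})

total_neg_weight = weights[y_toy == 0].sum()
total_pos_weight = weights[y_toy == 1].sum()

print(f"scale_pos_weight: {toy_scale_pos_weight}")
print(f"Total weight, legit: {total_neg_weight}, fraud: {total_pos_weight}")
```

With this choice the two classes contribute equal total weight, which is the intuition behind using the negative-to-positive ratio.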

## 5. Model Evaluation
The key metric is **recall** for class 1 (fraud): the fraction of actual fraudulent transactions that the model flags. Our goal is a recall close to the claimed 98%.

In [None]:
y_pred = model.predict(X_test)
print('Classification Report:')
print(classification_report(y_test, y_pred))
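
The notebook imports `confusion_matrix` but does not use it; the sketch below shows how it relates to class 1 recall. The label lists `y_true_toy` and `y_pred_toy` are hypothetical and exist only to make the arithmetic visible.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 6 legitimate and 4 fraudulent transactions.
y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical predictions: one false alarm and one missed fraud.
y_pred_toy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

# Recall for class 1: caught fraud divided by all actual fraud.
manual_recall = tp / (tp + fn)
sklearn_recall = recall_score(y_true_toy, y_pred_toy)

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Recall (class 1): {manual_recall:.2f}")
```

Calling `confusion_matrix(y_test, y_pred)` on the real predictions shows how many fraudulent transactions were missed (FN), which the recall figure alone does not reveal.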

The model achieves high recall for the fraud class (class 1): it identifies the vast majority of fraudulent transactions in the test set.

## 6. Model Serialization (Saving the Model)

In [None]:
# Ensure the models directory exists
models_dir = '../models'
os.makedirs(models_dir, exist_ok=True)

model_path = os.path.join(models_dir, 'xgboost_model.pkl')
joblib.dump(model, model_path)
print(f'Model saved successfully to {model_path}')
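
A saved model is only useful if it can be loaded back and produces the same predictions. The sketch below shows a `joblib` round trip; it uses a scikit-learn `LogisticRegression` as a hypothetical stand-in for the trained XGBoost model and writes to a temporary directory, so the file name and data are assumptions rather than part of this project.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data and stand-in estimator (not the notebook's XGBoost model).
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])
stand_in_model = LogisticRegression().fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "stand_in_model.pkl")
    joblib.dump(stand_in_model, toy_path)   # serialize to disk
    restored_model = joblib.load(toy_path)  # load it back

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(
    stand_in_model.predict(X_toy), restored_model.predict(X_toy)
)
print(f"Predictions match after reload: {same_predictions}")
```

For the real model, the same check would load `xgboost_model.pkl` and compare its predictions on `X_test` with `y_pred`. Pickled models are generally tied to the library versions that produced them, so recording the XGBoost version next to the file is a sensible precaution.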