# Fraud Detection Case Study

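This notebook builds a fraud detection model end to end: loading the transaction data, checking class balance, engineering balance-error features, training a Random Forest classifier, and evaluating it.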

## Step 1: Load the Dataset

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
file_path = 'Fraud.csv'  # Ensure this file is in your working directory
df = pd.read_csv(file_path)

# Basic exploration
print("Dataset Shape:", df.shape)
df.head()

## Step 2: Data Overview and Cleaning

In [None]:
df.info()  # prints its summary directly and returns None, so no print() wrapper
print(df.describe())
print(df.isnull().sum())  # missing values per column

## Step 3: Class Balance and Target Analysis

In [None]:
print(df['isFraud'].value_counts())
sns.countplot(x='isFraud', data=df)
plt.title('Fraud vs Non-Fraud Count')
plt.show()
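
Raw counts can hide how rare the positive class is, so it may help to look at proportions as well. The sketch below uses a small hypothetical label series (`toy_labels`, not the real `Fraud.csv` data) to show one way of computing the fraud rate:

```python
import pandas as pd

# Hypothetical toy labels for illustration only: 2 fraud cases in 10 transactions.
toy_labels = pd.Series([0, 0, 0, 1, 0, 0, 0, 0, 1, 0], name="isFraud")

# normalize=True returns class proportions instead of raw counts.
class_share = toy_labels.value_counts(normalize=True)

# .get() guards against the case where no fraud rows are present at all.
fraud_rate = class_share.get(1, 0.0)

print(class_share)
print(f"Fraud rate: {fraud_rate:.1%}")
```

On the real data, the same call would be `df['isFraud'].value_counts(normalize=True)`.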

## Step 4: Feature Engineering

In [None]:
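# One-hot encode the transaction type; drop_first=True drops one level to avoid a redundant column.
df = pd.get_dummies(df, columns=['type'], drop_first=True)
# Balance discrepancies: zero when the sender's balance falls by exactly the amount...
df['errorBalanceOrig'] = df['newbalanceOrig'] + df['amount'] - df['oldbalanceOrg']
# ...and zero when the recipient's balance rises by exactly the amount.
df['errorBalanceDest'] = df['oldbalanceDest'] + df['amount'] - df['newbalanceDest']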
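
To make the two balance-error features concrete, here is a minimal sketch on a hypothetical two-row frame (`toy`, with invented values). The first row is a consistent transfer; in the second, neither balance changes even though an amount was recorded:

```python
import pandas as pd

# Hypothetical values for illustration; column names follow the notebook.
toy = pd.DataFrame({
    "amount":         [100.0, 100.0],
    "oldbalanceOrg":  [500.0, 500.0],
    "newbalanceOrig": [400.0, 500.0],  # row 1: sender balance never decreased
    "oldbalanceDest": [0.0,   0.0],
    "newbalanceDest": [100.0, 0.0],    # row 1: recipient balance never increased
})

# Same formulas as in the feature engineering cell.
toy["errorBalanceOrig"] = toy["newbalanceOrig"] + toy["amount"] - toy["oldbalanceOrg"]
toy["errorBalanceDest"] = toy["oldbalanceDest"] + toy["amount"] - toy["newbalanceDest"]

print(toy[["errorBalanceOrig", "errorBalanceDest"]])
```

A consistent row yields zero for both features, while the inconsistent row yields the unaccounted amount. Whether non-zero errors actually correlate with fraud is something to check on the real data.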

## Step 5: Data Preparation

In [None]:
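# Features: drop the target, the isFlaggedFraud column, and the account-name string columns.
X = df.drop(['isFraud', 'isFlaggedFraud', 'nameOrig', 'nameDest'], axis=1)
# Target: 1 for fraud, 0 otherwise.
y = df['isFraud']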

## Step 6: Train/Test Split

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
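
The `stratify=y` argument keeps the class ratio the same in both splits, which matters when one class is rare. A minimal sketch on synthetic data (the arrays `X_toy` and `y_toy` and the 5% positive rate are assumptions made for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 1000 rows, 5% positives.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.zeros(1000, dtype=int)
y_toy[:50] = 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratification, both splits keep roughly the original positive rate.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positive rate: {train_rate:.3f}, test positive rate: {test_rate:.3f}")
```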

## Step 7: Train Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=42)
model.fit(X_train, y_train)
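
With `class_weight='balanced'`, scikit-learn weights each class inversely to its frequency, using `n_samples / (n_classes * count_of_class)`. The sketch below reproduces those weights with `compute_class_weight` on hypothetical labels (`y_toy`, 90 negatives and 10 positives), to show how much more a rare-class sample would count:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels for illustration: 90 non-fraud, 10 fraud.
y_toy = np.array([0] * 90 + [1] * 10)

# 'balanced' weight for class c = n_samples / (n_classes * count(c)).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_toy)

for cls, w in zip([0, 1], weights):
    print(f"class {cls}: weight {w:.3f}")
```

In this toy setup the minority class gets nine times the weight of the majority class; the ratio on the real dataset depends on its actual fraud rate.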

## Step 8: Model Evaluation

In [None]:
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred = model.predict(X_test)
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
probs = model.predict_proba(X_test)[:, 1]
print("ROC AUC Score:", roc_auc_score(y_test, probs))
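
ROC AUC can look strong on heavily imbalanced data even when precision on the rare class is modest, so a precision-recall summary is a common complement. This sketch uses hypothetical labels and scores (`y_true`, `scores`), not output from the model above, to compare the two metrics:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Hypothetical labels and predicted probabilities: 2 positives among 10 samples.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.35, 0.80, 0.70, 0.90])

# ROC AUC: probability that a random positive outranks a random negative.
roc = roc_auc_score(y_true, scores)

# Average precision: summarizes the precision-recall curve and is
# sensitive to false positives ranked above true positives.
ap = average_precision_score(y_true, scores)

print(f"ROC AUC: {roc:.4f}")
print(f"Average precision: {ap:.4f}")
```

For the notebook's model, the equivalent call would be `average_precision_score(y_test, probs)`.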

## Step 9: Feature Importance

In [None]:
importances = model.feature_importances_
feat_names = X.columns
feat_imp = pd.Series(importances, index=feat_names).sort_values(ascending=False)

top10 = feat_imp.head(10)  # Series is already sorted in descending order

plt.figure(figsize=(10, 6))
sns.barplot(x=top10.values, y=top10.index)
plt.title('Top 10 Important Features')
plt.xlabel('Importance')
plt.show()

## Summary
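- Built a Random Forest model (100 trees) with balanced class weighting on a stratified 80/20 train/test split.
- Evaluated it with a confusion matrix, a classification report, and ROC AUC.
- Ranked features by Random Forest importance and plotted the top 10.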