
# 🛡️ Fraud Detection System

## 🎯 Goal
Detect fraudulent transactions in financial data using anomaly detection and classification techniques.

---

### ✅ Guidelines
- Use anomaly detection/classification (Isolation Forest, Autoencoders, Logistic Regression)
- Handle class imbalance with SMOTE
- Evaluate with F1-score and AUC-ROC
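Why F1-score and ROC-AUC instead of accuracy? Take a hypothetical test set with 2% fraud and a trivial model that labels every transaction as legitimate. The sketch below uses made-up labels, not the notebook's data, and scores them with scikit-learn's metric functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical labels: 980 legitimate (0) and 20 fraudulent (1) transactions
y_true = np.array([0] * 980 + [1] * 20)

# A trivial "model" that never flags fraud
y_pred = np.zeros_like(y_true)
y_score = np.zeros(len(y_true))  # identical score for every transaction

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)  # no predicted positives, so F1 is reported as 0
auc = roc_auc_score(y_true, y_score)

print(f"Accuracy: {acc:.2f}")  # looks excellent
print(f"F1-score: {f1:.2f}")   # shows that no fraud was caught
print(f"ROC-AUC:  {auc:.2f}")  # no better than a random ranking
```

This prints an accuracy of 0.98 next to an F1-score of 0.00 and a ROC-AUC of 0.50. High accuracy on its own says almost nothing when fraud is rare.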

---


In [None]:

# 📦 Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

from imblearn.over_sampling import SMOTE


In [None]:

# 📂 Load Dataset
# Use the Kaggle Credit Card Fraud dataset if available; otherwise simulate a small imbalanced dataset
try:
    df = pd.read_csv("creditcard.csv")
except FileNotFoundError:
    # Fallback: synthetic data with roughly 2% positive (fraud) class
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=10000, n_features=20, n_classes=2,
                               weights=[0.98, 0.02], random_state=42)
    df = pd.DataFrame(X, columns=[f'V{i}' for i in range(1, 21)])
    df['Class'] = y

df.head()


In [None]:

# 📊 Explore Dataset
print(df['Class'].value_counts(normalize=True))

sns.countplot(x='Class', data=df)
plt.title("Class Distribution (Imbalance Check)")
plt.show()


In [None]:

# ✂️ Train-Test Split
X = df.drop('Class', axis=1)
y = df['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# 🔄 Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
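The split above uses `stratify=y`, which keeps the fraud rate the same in the train and test sets. The scaler is fit on the training set only, so no test-set statistics leak into preprocessing. The check below illustrates both properties on synthetic data generated the same way as the notebook's fallback dataset. The variable names (`X_tr`, `X_te`, ...) are illustrative and chosen so they do not overwrite the notebook's own variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data, mirroring the notebook's fallback dataset
X, y = make_classification(n_samples=10000, n_features=20, n_classes=2,
                           weights=[0.98, 0.02], random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42, stratify=y)

# Stratification keeps the fraud rate (nearly) identical in both splits
print(f"Fraud rate - train: {y_tr.mean():.4f}, test: {y_te.mean():.4f}")

# The scaler learns its mean and std from the training split only ...
scaler = StandardScaler().fit(X_tr)
assert np.allclose(scaler.mean_, X_tr.mean(axis=0))

# ... so scaled training features are exactly centred, while test features
# are only approximately centred (they were never used to fit the scaler)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)
print("Max |mean| train:", np.abs(X_tr_s.mean(axis=0)).max())
print("Max |mean| test: ", np.abs(X_te_s.mean(axis=0)).max())
```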


In [None]:

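# 🌲 Isolation Forest (unsupervised: labels are not used for fitting)
# Fit on the training set only, then score the held-out test set
contamination = y_train.mean()  # expected fraud rate, estimated from the training labels
iso = IsolationForest(contamination=contamination, random_state=42)
iso.fit(X_train_scaled)

# predict() returns -1 for anomalies and 1 for inliers; map to 1 = fraud, 0 = legitimate
y_pred_iso = np.where(iso.predict(X_test_scaled) == -1, 1, 0)
# decision_function is lower for anomalies, so negate it to get "higher = more suspicious"
y_score_iso = -iso.decision_function(X_test_scaled)

print("Isolation Forest Results:")
print(classification_report(y_test, y_pred_iso))
print("ROC-AUC:", roc_auc_score(y_test, y_score_iso))  # ROC-AUC needs continuous scores, not hard labels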
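The guidelines also list autoencoders, which this notebook does not implement. Here is a minimal sketch of the idea, assuming no deep-learning framework is installed. A small bottleneck network is trained to reconstruct legitimate transactions, and reconstruction error then serves as the fraud score.

Two parts of the sketch are assumptions for illustration:

- scikit-learn's `MLPRegressor` stands in for a proper autoencoder. It is fit with the inputs as their own targets.
- The data is synthetic. Legitimate points lie near a low-dimensional subspace.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic data (assumption): legitimate transactions lie near a 3-D
# subspace of a 10-D feature space; fraud points are scattered off it
mixing = rng.normal(size=(3, 10))
X_legit = rng.normal(size=(2000, 3)) @ mixing + 0.1 * rng.normal(size=(2000, 10))
X_fraud = rng.normal(scale=3.0, size=(40, 10))

# Train on legitimate data only; test on held-out legitimate rows plus all fraud rows
X_train_ae = X_legit[:1500]
X_test_ae = np.vstack([X_legit[1500:], X_fraud])
y_test_ae = np.array([0] * 500 + [1] * 40)

scaler_ae = StandardScaler().fit(X_train_ae)
X_train_s = scaler_ae.transform(X_train_ae)
X_test_s = scaler_ae.transform(X_test_ae)

# Bottleneck network (10 -> 6 -> 3 -> 6 -> 10) trained to reproduce its own input
autoencoder = MLPRegressor(hidden_layer_sizes=(6, 3, 6), activation="tanh",
                           max_iter=1000, random_state=0)
autoencoder.fit(X_train_s, X_train_s)

# Per-transaction reconstruction error: large error = unusual = suspicious
reconstruction = autoencoder.predict(X_test_s)
errors = np.mean((X_test_s - reconstruction) ** 2, axis=1)

print("Autoencoder ROC-AUC:", roc_auc_score(y_test_ae, errors))
```

Training only on legitimate rows is a design choice. The network never learns to reconstruct fraud, so fraud tends to produce larger errors. A real autoencoder would more likely be built in a deep-learning framework with more layers and early stopping.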


In [None]:

# 🔄 Handle Class Imbalance with SMOTE
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train_scaled, y_train)

# 🤖 Logistic Regression
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_resampled, y_resampled)
y_pred_lr = log_reg.predict(X_test_scaled)
y_prob_lr = log_reg.predict_proba(X_test_scaled)[:, 1]  # probability of the fraud class

auc_lr = roc_auc_score(y_test, y_prob_lr)
print("Logistic Regression Results (with SMOTE):")
print(classification_report(y_test, y_pred_lr))
print("ROC-AUC:", auc_lr)

# Plot ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_prob_lr)
plt.plot(fpr, tpr, label=f"Logistic Regression (AUC = {auc_lr:.2f})")
plt.plot([0, 1], [0, 1], 'k--', label="Random classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
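SMOTE does not duplicate fraud rows. Instead, it creates synthetic ones by interpolating between a minority sample and one of its k nearest minority-class neighbours (imbalanced-learn's default is `k_neighbors=5`). With default settings it keeps generating samples until the classes are balanced.

The sketch below illustrates only the interpolation step, using NumPy and scikit-learn's `NearestNeighbors`. The function `smote_sketch` is hypothetical and simplified. It omits details of imbalanced-learn's `SMOTE`, such as `sampling_strategy` handling, and is meant for understanding, not as a replacement.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Hypothetical, simplified SMOTE: generate n_new synthetic minority samples."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbours = nn.kneighbors(X_minority)

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_minority))                  # random minority sample
        neighbour = neighbours[base, rng.integers(1, k + 1)]  # one of its k neighbours (column 0 is itself)
        gap = rng.random()                                    # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbour] - X_minority[base])
    return synthetic

# Usage: 20 hypothetical fraud points in 2-D, oversampled with 100 synthetic ones
rng = np.random.default_rng(1)
X_fraud = rng.normal(loc=3.0, size=(20, 2))
X_new = smote_sketch(X_fraud, n_new=100)
print(X_new.shape)  # (100, 2)
```

Every synthetic point lies on a line segment between two real fraud points. This is why SMOTE is applied only to the training split: resampling before the split would place synthetic neighbours of test rows into the training data.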



# 📌 Summary
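- Isolation Forest flagged anomalies without using labels, but its precision and recall on the fraud class were limited.
- Logistic Regression trained on **SMOTE**-oversampled data handled the class imbalance better and achieved a higher ROC-AUC than Isolation Forest.
- **F1-score and ROC-AUC** were the key evaluation metrics, since accuracy is misleading when fraud makes up only a small fraction of transactions.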

🚀 This notebook demonstrates a basic fraud detection pipeline for financial transaction data.
