
# 💳 AI-Based Financial Fraud Detection

This notebook trains a Random Forest classifier, first on the original data and then on SMOTE-oversampled data, to detect fraudulent credit card transactions.  
It uses a public Kaggle dataset (`creditcard.csv`) and is intended as a portfolio project for finance or data science internship applications (e.g., Goldman Sachs, World Bank).

---


In [None]:
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

file_path = '/content/drive/MyDrive/Fraud_Detection_Project/Dataset/creditcard.csv'
df = pd.read_csv(file_path)

df.head()




## 🔄 Data Preprocessing

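- Inspect the dataset loaded from Google Drive (missing values, class balance)
- Standardize the 'Amount' column
- Drop the 'Time' column, which is not used as a feature here
- Split the data into stratified train and test sets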


In [None]:
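df.info()                          # Prints dtypes and non-null counts (returns None, so no print() needed)
print(df.isnull().sum().sum())     # Total number of missing values
print(df['Class'].value_counts())  # Check class imbalance (0 = legitimate, 1 = fraud)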


In [None]:
from sklearn.preprocessing import StandardScaler

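# Standardize 'Amount' to zero mean and unit variance.
# Caveat: fitting on the full dataset lets test-set statistics leak into training;
# a stricter setup fits the scaler on the training split only.
df['Amount'] = StandardScaler().fit_transform(df[['Amount']])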

# Drop 'Time' column only if it exists
if 'Time' in df.columns:
    df = df.drop(['Time'], axis=1)

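The scaling cell above computes the mean and standard deviation from every row, including rows that later land in the test set. As a rough illustration of the stricter alternative, the sketch below fits the statistics on training values only and reuses them for the test values. It uses plain Python rather than `StandardScaler`, and the amounts and helper names (`fit_standardizer`, `standardize`) are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Return (mean, std) computed from the given values only."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population std, as StandardScaler uses
    return mean, std


def standardize(values, mean, std):
    """Apply z = (x - mean) / std using previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical transaction amounts, already split into train and test.
train_amounts = [10.0, 20.0, 30.0, 40.0]
test_amounts = [25.0, 1000.0]

mean, std = fit_standardizer(train_amounts)          # statistics from train only
train_scaled = standardize(train_amounts, mean, std)
test_scaled = standardize(test_amounts, mean, std)   # reuse the train statistics

print(train_scaled)
print(test_scaled)
```

With scikit-learn, the same idea would be to call `fit` on the training 'Amount' column after the split and `transform` on both splits.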

In [None]:
from sklearn.model_selection import train_test_split

# Step 1: Separate features (X) and target (y)
X = df.drop('Class', axis=1)  # Features (input data)
y = df['Class']               # Target (fraud or not fraud)

# Step 2: Split the data (80% train, 20% test) with stratification
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,           # 20% for testing
    stratify=y,              # Keep fraud ratio same in train and test
    random_state=42          # Reproducibility
)

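To make the effect of `stratify=y` concrete, here is a minimal sketch of a stratified split written in plain Python. It is not scikit-learn's implementation; the helper names (`stratified_split`, `fraud_rate`) and the toy labels are assumptions for illustration only. Each class is shuffled and split separately, so the fraud rate comes out the same in both parts.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
labels = [0] * 990 + [1] * 10
train_idx, test_idx = stratified_split(labels)


def fraud_rate(idx):
    """Fraction of the selected rows that are labelled as fraud."""
    return sum(labels[i] for i in idx) / len(idx)


print(fraud_rate(train_idx), fraud_rate(test_idx))
```

In the notebook itself, comparing `y_train.mean()` with `y_test.mean()` is a quick way to check that the two fraud rates match.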


## 🧠 Model Training with Random Forest + SMOTE

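- Train a baseline Random Forest on the original, imbalanced training data
- Use SMOTE to oversample the minority class (fraud) in the training set only
- Retrain the model on the balanced data
- Evaluate both models on the untouched test set using a confusion matrix and classification report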

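The core idea behind SMOTE is that each synthetic fraud sample is placed somewhere on the line segment between an existing fraud sample and one of its nearest fraud neighbours. The sketch below illustrates that interpolation step in plain Python. It is a simplified stand-in, not the `imblearn` implementation, and the function name `smote_like_samples` and the 2-D points are hypothetical.

```python
import math
import random


def smote_like_samples(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D minority (fraud) points.
fraud_points = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
new_points = smote_like_samples(fraud_points, n_new=5)
print(new_points)
```

Because every synthetic point is derived from training samples, resampling has to happen after the train/test split; otherwise information from test rows would leak into the training data.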

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Step 1: Create the baseline model (trained without any resampling)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Step 2: Train the model
model.fit(X_train, y_train)

# Step 3: Predict on the test set
y_pred = model.predict(X_test)

# Step 4: Evaluate the model
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

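On data this imbalanced, accuracy alone says little, so the fraud-class precision and recall in the classification report deserve the most attention. The sketch below shows how those numbers follow from a 2x2 confusion matrix in scikit-learn's layout (rows are true labels, columns are predictions). The helper `fraud_metrics` and the counts are hypothetical and are not results from this notebook.

```python
def fraud_metrics(cm):
    """Compute fraud-class metrics from a 2x2 confusion matrix.

    cm follows scikit-learn's layout for labels [0, 1]:
    [[TN, FP],
     [FN, TP]]
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts, NOT results from this notebook.
example_cm = [[990, 0], [8, 2]]
metrics = fraud_metrics(example_cm)
print(metrics)
```

In this made-up case the accuracy is above 99% even though only 2 of the 10 fraud cases are caught, which is why recall on the fraud class is the number to compare between the baseline and the SMOTE model.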

In [None]:
!pip install imbalanced-learn


In [None]:
from imblearn.over_sampling import SMOTE

# Apply SMOTE only on training data
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# Check new class distribution
print("Before SMOTE:\n", y_train.value_counts())
print("After SMOTE:\n", y_train_resampled.value_counts())


In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Retrain the same model configuration on the SMOTE-balanced training data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_resampled, y_train_resampled)

# Predict on the original (not resampled) test set
y_pred = model.predict(X_test)

# Evaluate
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


In [None]:
import joblib

# Define path to save the model
model_path = '/content/drive/MyDrive/Fraud_Detection_Project/smote_fraud_model.pkl'

# Save the model
joblib.dump(model, model_path)

print(f"✅ Model saved to: {model_path}")

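A saved model is only useful if it can be restored later. As a small sketch of the round trip, the example below dumps and reloads an object with `joblib` in a temporary directory. The dictionary is a stand-in for the trained classifier, and the file name `demo_model.pkl` is hypothetical.

```python
import os
import tempfile

import joblib

# Stand-in for the trained model: any picklable object round-trips the same way.
stand_in_model = {"n_estimators": 100, "random_state": 42}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical path
    joblib.dump(stand_in_model, demo_path)
    restored = joblib.load(demo_path)

print(restored == stand_in_model)
```

In the notebook, `joblib.load(model_path)` should return the fitted classifier, which can then call `predict` on data prepared with the same preprocessing steps. Loading is most reliable with the same scikit-learn version that was used for saving.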