# Credit Card Fraud Detection Project

This notebook walks through a complete workflow for detecting credit card fraud on the Kaggle credit card fraud dataset: data loading, cleaning and preprocessing, handling class imbalance, model training and evaluation, and optional deployment.

You'll need to download `creditcard.csv` from Kaggle and place it in the same directory as this notebook.

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report, roc_auc_score

print('Libraries imported successfully.')

In [None]:
df = pd.read_csv('creditcard.csv')
print('Shape:', df.shape)
display(df.head())
df.info()  # prints its summary directly; wrapping it in print() would also emit "None"
print(df.describe())
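
The later steps (stratified split, SMOTE) exist because fraud rows are rare, so it may help to check how skewed `Class` is right after loading. A minimal sketch; `class_balance` is a hypothetical helper name, not part of the original notebook:

```python
import pandas as pd


def class_balance(labels):
    """Return per-class row counts and their share of all rows."""
    counts = labels.value_counts().sort_index()
    return pd.DataFrame({'count': counts, 'fraction': counts / counts.sum()})
```

In the notebook this would be called as `display(class_balance(df['Class']))`.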

In [None]:
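# Check for missing values
print(df.isnull().sum())

# Drop 'Time'; 'Amount' is scaled after the split so the scaler never sees test rows
df.drop(columns=['Time'], inplace=True)

In [None]:
# Split into features and target
X = df.drop(columns=['Class'])
y = df['Class']

# Stratified split keeps the fraud rate the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_test = X_train.copy(), X_test.copy()

# Scale 'Amount': fit on the training set only, then apply the same transform to both sets
scaler = StandardScaler()
X_train['Amount'] = scaler.fit_transform(X_train[['Amount']]).ravel()
X_test['Amount'] = scaler.transform(X_test[['Amount']]).ravel()
print('Preprocessing complete.')
print('Training set:', X_train.shape, 'Test set:', X_test.shape)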

In [None]:
# Apply SMOTE to training data
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
print('Resampled training set:', X_train_res.shape)
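
To make the resampling step less of a black box, here is a simplified illustration of the idea behind SMOTE: each synthetic row is a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a teaching sketch, not imblearn's implementation; `smote_like_samples` and its parameters are hypothetical names.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like_samples(X_minority, n_new, k=5, seed=42):
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Assumes X_minority has more than k rows.
    """
    X_minority = np.asarray(X_minority, dtype=float)
    rng = np.random.default_rng(seed)

    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbour_idx = nn.kneighbors(X_minority)

    # Pick a random base row and one of its k neighbours (column 0 is the row itself).
    base = rng.integers(0, len(X_minority), size=n_new)
    chosen = neighbour_idx[base, rng.integers(1, k + 1, size=n_new)]

    # Move a random fraction of the way from the base row towards the neighbour.
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])
```

Because every synthetic row is built from training rows only, resampling is applied to the training split and never to the test split, as in the cell above.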

In [None]:
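# The models are fit on SMOTE-balanced data, so no extra class weighting is applied:
# class_weight='balanced' is a no-op on balanced classes, and a scale_pos_weight
# computed from the original y_train would weight the fraud class a second time.
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1),
    'XGBoost': XGBClassifier(random_state=42)
}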

results = {}
for name, model in models.items():
    model.fit(X_train_res, y_train_res)
    preds = model.predict(X_test)
    report = classification_report(y_test, preds, output_dict=True)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = {'report': report, 'roc_auc': auc}
    print(f"{name} - ROC AUC: {auc:.4f}")
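
`model.predict` uses a fixed 0.5 cut-off, and ROC AUC can look flattering when positives are rare. As a hedged complement, the sketch below computes average precision (area under the precision-recall curve) and the score threshold that maximises F1. `best_f1_threshold` is a hypothetical helper, and choosing by F1 is an assumption; a real deployment might weight missed fraud and false alarms differently.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve


def best_f1_threshold(y_true, scores):
    """Return the score threshold with the highest F1, plus average precision."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)

    # precision and recall carry one extra trailing point with no threshold; drop it.
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
    best = int(np.argmax(f1))

    return {
        'threshold': float(thresholds[best]),
        'precision': float(p[best]),
        'recall': float(r[best]),
        'f1': float(f1[best]),
        'average_precision': float(average_precision_score(y_true, scores)),
    }
```

In the notebook it would be called as `best_f1_threshold(y_test, model.predict_proba(X_test)[:, 1])`. Picking the threshold on the test set gives an optimistic estimate, so a separate validation split is the safer place to tune it.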


## Model Performance Summary
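The `results` dictionary holds precision, recall, F1-score, and ROC AUC for each model. Compare the models on the fraud class (`Class == 1`) rather than on overall accuracy, and look for one that keeps recall high without letting precision collapse.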
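
One way to turn `results` into a comparison table is sketched here. It assumes the structure built in the training loop (`{'report': ..., 'roc_auc': ...}` per model) and that the fraud class appears under the key `'1'` in the `classification_report` dictionary; `summarize_results` is a hypothetical helper name.

```python
import pandas as pd


def summarize_results(results, positive_label='1'):
    """Build a per-model table of fraud-class metrics, best F1 first."""
    rows = {
        name: {
            'precision': entry['report'][positive_label]['precision'],
            'recall': entry['report'][positive_label]['recall'],
            'f1-score': entry['report'][positive_label]['f1-score'],
            'roc_auc': entry['roc_auc'],
        }
        for name, entry in results.items()
    }
    table = pd.DataFrame.from_dict(rows, orient='index')
    return table.sort_values('f1-score', ascending=False).round(4)
```

In the notebook this would be displayed with `display(summarize_results(results))`.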


## Optional Deployment
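You can deploy the trained model with Streamlit or Flask. Both commands assume an `app.py` that loads the model; this notebook does not create one. For a Streamlit app: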
```bash
streamlit run app.py
```
or, for a Flask app whose `app.py` calls `app.run()`:
```bash
python app.py
```
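
Either kind of app needs the fitted model on disk, together with the `StandardScaler` used for `Amount`. A minimal sketch using `joblib`; the helper names and the `fraud_model.joblib` file name are assumptions, not something the notebook defines:

```python
import joblib


def save_artifacts(model, scaler, path='fraud_model.joblib'):
    """Persist the fitted model and the 'Amount' scaler in one file."""
    joblib.dump({'model': model, 'scaler': scaler}, path)
    return path


def load_and_score(path, X):
    """Return fraud probabilities for a DataFrame X that holds raw 'Amount' values.

    X must contain the same feature columns, in the same order, as the training data.
    """
    artifacts = joblib.load(path)
    X = X.copy()  # leave the caller's DataFrame untouched
    X['Amount'] = artifacts['scaler'].transform(X[['Amount']]).ravel()
    return artifacts['model'].predict_proba(X)[:, 1]
```

After training, something like `save_artifacts(models['Random Forest'], scaler)` would write the file that `app.py` then loads. Only load joblib files from a trusted source, since loading can execute arbitrary code.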