# Credit Card Fraud Detection – EDA & Modeling

This notebook performs Exploratory Data Analysis (EDA), preprocessing, and machine learning modeling on the Credit Card Fraud Detection dataset.

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score


## Load Dataset
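Make sure `creditcard.csv` is in the notebook's working directory (normally the folder that contains this notebook), because `pd.read_csv` resolves relative paths from there.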

In [ ]:
df = pd.read_csv('creditcard.csv')
df.head()

## Dataset Overview

In [ ]:
df.info()
df.describe()
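
`df.info()` reports non-null counts, but a compact per-column table of missing values plus a duplicate-row count is often quicker to scan. The sketch below is a minimal illustration: `data_quality_summary` is a hypothetical helper, and it runs on a tiny made-up DataFrame rather than the real data. In the notebook you would call `data_quality_summary(df)`.

```python
import numpy as np
import pandas as pd


def data_quality_summary(df: pd.DataFrame):
    """Return (per-column missing-value table, number of duplicate rows)."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean() * 100,  # percentage of rows that are NaN
    })
    # duplicated() flags every repeat of an earlier identical row
    n_duplicates = int(df.duplicated().sum())
    return summary, n_duplicates


# Made-up stand-in data; the real notebook would pass the loaded `df`.
demo = pd.DataFrame({
    "Time": [0.0, 1.0, 1.0, np.nan],
    "Amount": [10.0, 5.5, 5.5, 3.0],
    "Class": [0, 0, 0, 1],
})
summary, n_dup = data_quality_summary(demo)
print(summary)
print("duplicate rows:", n_dup)
```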

## Class Distribution

In [ ]:
ax = df['Class'].value_counts().plot(kind='bar')
ax.set_title('Fraud vs Non-Fraud Distribution')
ax.set_xlabel('Class (0 = non-fraud, 1 = fraud)')
ax.set_ylabel('Number of transactions')
plt.show()
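
On a linear axis the fraud bar can be almost invisible, so it helps to print the numbers too. The minimal sketch below uses a hypothetical helper, `imbalance_report`, on synthetic labels. It also computes the accuracy of a trivial model that always predicts the majority class, which shows why accuracy alone says little on data this skewed. In the notebook you would call `imbalance_report(df['Class'])`.

```python
import pandas as pd


def imbalance_report(y: pd.Series, positive_label=1) -> dict:
    """Summarize class imbalance for a binary label Series."""
    counts = y.value_counts()
    n_total = int(counts.sum())
    n_pos = int(counts.get(positive_label, 0))
    return {
        "n_total": n_total,
        "n_positive": n_pos,
        "positive_rate": n_pos / n_total,
        # Accuracy obtained by always predicting the most frequent class
        "majority_baseline_accuracy": float(counts.max() / n_total),
        "neg_to_pos_ratio": (n_total - n_pos) / n_pos if n_pos else float("inf"),
    }


# Synthetic labels: 995 legitimate transactions, 5 fraudulent ones.
y_demo = pd.Series([0] * 995 + [1] * 5, name="Class")
print(imbalance_report(y_demo))
```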

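## Train-Test Split

In [ ]:
X = df.drop('Class', axis=1)
y = df['Class']

# stratify=y keeps the fraud proportion the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

## Feature Scaling
Only `Time` and `Amount` are scaled; in the public release of this dataset, V1 to V28 are already PCA-transformed. The scaler is fit on the training split only and then applied to the test split, so no test-set statistics leak into training.

In [ ]:
X_train = X_train.copy()
X_test = X_test.copy()

scale_cols = ['Time', 'Amount']
scaler = StandardScaler()
X_train[scale_cols] = scaler.fit_transform(X_train[scale_cols])  # learn mean/std from train
X_test[scale_cols] = scaler.transform(X_test[scale_cols])        # reuse train statistics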
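
Scaling is only leak-free if the scaler never sees test data. One way to make that automatic is to put the scaler inside a scikit-learn `Pipeline` with a `ColumnTransformer`, so it is refit on whatever data `fit` receives (including each fold under cross-validation). The sketch below is an illustration on synthetic data: the `V1`/`V2` columns, the random values and the label rule are placeholders, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same kind of columns as the real data.
rng = np.random.default_rng(0)
n = 2000
X_demo = pd.DataFrame({
    "Time": rng.uniform(0, 172_800, n),
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(80, n),
})
# Placeholder label loosely tied to V1 so the model has something to learn.
y_demo = (X_demo["V1"] + rng.normal(scale=0.5, size=n) > 2.0).astype(int)

preprocess = ColumnTransformer(
    [("scale", StandardScaler(), ["Time", "Amount"])],
    remainder="passthrough",  # other columns are passed through unscaled
)
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
model.fit(X_tr, y_tr)  # scaler statistics come from X_tr only
print("test accuracy:", model.score(X_te, y_te))
```

Note that `ColumnTransformer` outputs the scaled columns first and the passthrough columns after them, so column order differs from the input DataFrame.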

## Logistic Regression Model

In [ ]:
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)

y_pred_lr = lr.predict(X_test)
print(classification_report(y_test, y_pred_lr))
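
`roc_auc_score` is imported in the first cell but never used; it needs probability scores from `predict_proba`, not the hard 0/1 labels from `predict`. The sketch below also tries `class_weight="balanced"`, a standard `LogisticRegression` option that up-weights the rare class and usually raises recall on it. It runs on synthetic imbalanced data from `make_classification`, so the numbers say nothing about the real dataset. In the notebook the equivalent score is `roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 2% positives) standing in for fraud data.
X_syn, y_syn = make_classification(
    n_samples=5000, n_features=10, n_informative=4,
    weights=[0.98, 0.02], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

results = {}
for name, cw in [("default", None), ("balanced", "balanced")]:
    clf = LogisticRegression(max_iter=1000, class_weight=cw).fit(X_tr, y_tr)
    # Probability of the positive class, used for the threshold-free ROC AUC
    scores = clf.predict_proba(X_te)[:, 1]
    results[name] = {
        "recall": recall_score(y_te, clf.predict(X_te)),
        "roc_auc": roc_auc_score(y_te, scores),
    }
print(results)
```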

## Random Forest Model

In [ ]:
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

y_pred_rf = rf.predict(X_test)
print(classification_report(y_test, y_pred_rf))
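
`confusion_matrix` is also imported but unused. For fraud detection the raw counts of missed frauds (false negatives) and false alarms (false positives) are often more telling than averaged scores, and average precision (the area under the precision-recall curve) is a common companion metric because its chance-level baseline equals the positive-class rate, whereas ROC AUC's is 0.5. The sketch runs on synthetic data; `rf_syn` and the split names are illustrative. In the notebook the equivalents are `confusion_matrix(y_test, y_pred_rf)` and `average_precision_score(y_test, rf.predict_proba(X_test)[:, 1])`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the fraud dataset.
X_syn, y_syn = make_classification(
    n_samples=5000, n_features=10, n_informative=4,
    weights=[0.98, 0.02], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)
rf_syn = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# labels=[0, 1] fixes the order: rows = true class, columns = predicted class
tn, fp, fn, tp = confusion_matrix(y_te, rf_syn.predict(X_te), labels=[0, 1]).ravel()
ap = average_precision_score(y_te, rf_syn.predict_proba(X_te)[:, 1])
print(f"TN={tn} FP={fp} FN={fn} TP={tp}  average precision={ap:.3f}")
```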

## Conclusion
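Random Forest performs better at detecting fraudulent transactions, as reflected in its higher recall and F1-score for the fraud class (`Class = 1`) in the classification reports above. Because fraud cases are a tiny fraction of the data, overall accuracy is dominated by the non-fraud class, so these fraud-class metrics are the ones to compare.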
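
To check this comparison without reading two printed reports, the fraud-class metrics can be tabulated side by side. A minimal sketch: `compare_fraud_metrics` is a hypothetical helper and the toy labels are made up for illustration. In the notebook you would call `compare_fraud_metrics(y_test, {'Logistic Regression': y_pred_lr, 'Random Forest': y_pred_rf})`.

```python
import pandas as pd
from sklearn.metrics import f1_score, precision_score, recall_score


def compare_fraud_metrics(y_true, predictions, positive_label=1):
    """Tabulate precision/recall/F1 on the positive (fraud) class per model."""
    rows = {}
    for name, y_pred in predictions.items():
        rows[name] = {
            "precision": precision_score(y_true, y_pred, pos_label=positive_label, zero_division=0),
            "recall": recall_score(y_true, y_pred, pos_label=positive_label, zero_division=0),
            "f1": f1_score(y_true, y_pred, pos_label=positive_label, zero_division=0),
        }
    # One row per model, best F1 first
    return pd.DataFrame(rows).T.sort_values("f1", ascending=False)


# Toy labels: 6 legitimate, 4 fraudulent.
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds_demo = {
    "model_a": [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    "model_b": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
}
print(compare_fraud_metrics(y_true_demo, preds_demo))
```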