# Credit Card Fraud Detection

## 1. Introduction
This notebook tackles credit card fraud detection, a classic example of imbalanced classification. We load a dataset through `imbalanced-learn`, use SMOTE (Synthetic Minority Over-sampling Technique) to handle the class imbalance, and then train and evaluate a Logistic Regression classifier.

## 2. Data Loading and Preparation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from imblearn.datasets import fetch_datasets

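# Load the dataset. fetch_datasets() returns a dict of Bunch objects keyed by
# dataset name; this assumes a 'creditcard' entry is available in the installed
# imbalanced-learn version (the lookup raises KeyError otherwise).
data = fetch_datasets()['creditcard']
X = data.data    # feature matrix
y = data.target  # class labels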

# Create a DataFrame for easier manipulation
df = pd.DataFrame(X, columns=[f'V{i+1}' for i in range(X.shape[1])])
df['Class'] = y

df.head()

## 3. Exploratory Data Analysis (EDA)

In [None]:
# Check the class distribution
print(df['Class'].value_counts())
sns.countplot(x='Class', data=df)
plt.title('Class Distribution (0: Non-Fraud, 1: Fraud)')
plt.show()

The dataset is highly imbalanced: fraudulent transactions account for only a small fraction of the rows, which is why the minority class needs special handling before training.
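
As a quick illustration of why plain accuracy is a poor yardstick at this level of imbalance, the sketch below scores a classifier that always predicts the majority class. The data is synthetic and the 0.2% fraud rate is a hypothetical figure chosen for the demo, not a property of the dataset above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical toy data: 10,000 transactions, 20 of them fraudulent (0.2%)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10_000, 3))
y_demo = np.zeros(10_000, dtype=int)
y_demo[:20] = 1

# A "model" that ignores the features and always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
y_hat = baseline.predict(X_demo)

baseline_accuracy = accuracy_score(y_demo, y_hat)
baseline_recall = recall_score(y_demo, y_hat)

print(f"Accuracy: {baseline_accuracy:.3f}")
print(f"Fraud recall: {baseline_recall:.3f}")
```

The baseline reaches 99.8% accuracy while catching no fraud at all, so recall and precision on the fraud class are the numbers to watch.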

## 4. Handling Class Imbalance with SMOTE
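
SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbors. The sketch below is a simplified illustration of that idea on toy data; `smote_sketch` is a hypothetical helper written for this notebook, not imbalanced-learn's implementation, and it skips details such as sampling strategies.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    neighbor_idx = nn.kneighbors(X_minority, return_distance=False)[:, 1:]

    # Pick a random base point and one of its k neighbors for every new sample
    base = rng.integers(0, len(X_minority), size=n_new)
    chosen = neighbor_idx[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the segment between the two
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


# Toy minority class: 20 points in 2 dimensions
X_min = np.random.default_rng(1).normal(size=(20, 2))
X_new = smote_sketch(X_min, n_new=50)
print(X_new.shape)
```

Because every synthetic point lies on a line segment between two real minority points, the new samples stay inside the region the minority class already occupies.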

In [None]:
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Hold out a test set BEFORE resampling so that it contains only real transactions.
# Resampling first would put synthetic points derived from test rows into the
# training data and inflate the evaluation scores.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority class in the training split only
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)

# Check the new class distribution of the training data
print(pd.Series(y_res).value_counts())
sns.countplot(x=y_res)
plt.title('Training Class Distribution After SMOTE')
plt.show()

## 5. Model Building and Training

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, roc_curve

# Train a Logistic Regression model on the resampled training data
model = LogisticRegression(max_iter=1000)
model.fit(X_res, y_res)

## 6. Model Evaluation

In [None]:
# Make predictions
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Evaluate the model
print('Classification Report:')
print(classification_report(y_test, y_pred))
roc_auc = roc_auc_score(y_test, y_pred_proba)
print(f'ROC AUC Score: {roc_auc:.2f}')

# Plot ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.figure()
plt.plot(fpr, tpr, label=f'Logistic Regression (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
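
On heavily imbalanced data the ROC curve can look optimistic, because the false positive rate is computed over the very large pool of legitimate transactions. A precision-recall summary is a common complement. The sketch below uses synthetic stand-in data (an assumption for the demo, not the dataset above) and compares average precision against the positive-class prevalence, which is roughly what a random ranking would score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Stand-in data with roughly 2% positives
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# Precision and recall at every distinct score threshold
precision, recall, thresholds = precision_recall_curve(y_te, scores)
ap = average_precision_score(y_te, scores)
prevalence = y_te.mean()

print(f"Average precision: {ap:.3f}")
print(f"Positive prevalence (random-ranking reference): {prevalence:.3f}")
```

The `precision` and `recall` arrays can be passed to `plt.plot(recall, precision)` to draw the curve alongside the ROC plot.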

## 7. Conclusion
We used SMOTE to rebalance the classes and trained a Logistic Regression model, then evaluated it with a classification report and the ROC AUC score. Because fraud cases are rare, the precision and recall reported for the fraud class say more about the model than overall accuracy does, and all of these scores are only trustworthy when the test data contains real transactions and no synthetic samples. Handling imbalance carefully is crucial for building reliable fraud detection systems, where the cost of missing a fraudulent transaction is high.
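
Since a missed fraud usually costs more than a false alarm, the default 0.5 decision threshold is not necessarily the best operating point. The sketch below picks the threshold that minimizes a simple cost on a validation split. The data is synthetic and the two cost constants are hypothetical values chosen for illustration, not figures from this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

COST_FN = 50.0  # hypothetical cost of a missed fraud
COST_FP = 1.0   # hypothetical cost of a false alarm

# Stand-in data with roughly 3% positives
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=10, weights=[0.97, 0.03], random_state=7
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=7
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
val_scores = clf.predict_proba(X_val)[:, 1]


def total_cost(threshold):
    """Total cost of flagging every transaction scored at or above the threshold."""
    flagged = val_scores >= threshold
    false_negatives = np.sum((y_val == 1) & ~flagged)
    false_positives = np.sum((y_val == 0) & flagged)
    return COST_FN * false_negatives + COST_FP * false_positives


# Evaluate thresholds 0.01, 0.02, ..., 0.99 and keep the cheapest one
candidates = np.arange(1, 100) / 100
costs = np.array([total_cost(t) for t in candidates])
best_threshold = candidates[np.argmin(costs)]
cost_at_default = total_cost(0.5)

print(f"Best threshold: {best_threshold:.2f} (cost {costs.min():.0f})")
print(f"Cost at default 0.5 threshold: {cost_at_default:.0f}")
```

The threshold should be chosen on validation data and then applied unchanged to the held-out test set, so that the final scores remain an honest estimate.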