# Session 21-22 Support Vector Machine

# Exercise: Credit Card Fraud Detection

You are a data scientist at a financial institution tasked with improving the company’s fraud detection system.

You are given a large, high-dimensional transactional dataset that is suspected to contain redundant and irrelevant features.

Dataset link:
https://www.kaggle.com/datasets/shayannaveed/credit-card-fraud-detection

Your objective is to apply dimensionality reduction techniques to simplify the data while preserving the most informative patterns for fraud detection.

# Step 1 — Data Understanding

In [None]:
# =========================
# STEP 1: DATA UNDERSTANDING
# =========================

import pandas as pd
import numpy as np

In [None]:
# Load the dataset
df = pd.read_csv("creditcard.csv")

# Display basic information about the dataset
print(df.info())

# Preview the first few rows
df.head()

In [None]:
# Check class distribution (fraud vs non-fraud)
df["Class"].value_counts(normalize=True)

In [None]:
# Separate features and target
X = df.drop("Class", axis=1)  # All input features
y = df["Class"]               # Target variable (0 = non-fraud, 1 = fraud)

# Print dataset shape
print("Feature matrix shape:", X.shape)
print("Target shape:", y.shape)

## Why dimensionality reduction helps (conceptual):

This dataset contains 28 features (V1–V28) that are already the output of a PCA transformation, plus `Time` and `Amount`.

Dimensionality reduction can:
- Reduce noise
- Improve computational efficiency
- Help visualization
- Potentially improve model generalization

# Step 2 — Method Selection

We will use three methods for different purposes:

| Method | Purpose |
|-------|---------|
| PCA | Feature compression for modeling |
| t-SNE | Visualization (NOT modeling) |
| LDA | Supervised dimensionality reduction |

# Step 3 — Dimensionality Reduction

## 3.1 Standardization

In [None]:
# =========================
# FEATURE SCALING
# =========================

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Standardize features to zero mean and unit variance
# (PCA and t-SNE are sensitive to feature scale)
X_scaled = scaler.fit_transform(X)
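
To illustrate why scaling matters before PCA, here is a minimal sketch on synthetic data. The two-feature array `X_demo` and its scales are assumptions made up for this example and are not taken from the credit card dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): two independent features
# on very different scales.
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(0.0, 1.0, size=1000),     # small-scale feature
    rng.normal(0.0, 1000.0, size=1000),  # large-scale feature
])

# Without scaling, the first component is dominated by the large-scale feature.
ratio_raw = PCA(n_components=2).fit(X_demo).explained_variance_ratio_

# After standardization, both features contribute on an equal footing.
X_demo_scaled = StandardScaler().fit_transform(X_demo)
ratio_scaled = PCA(n_components=2).fit(X_demo_scaled).explained_variance_ratio_

print("Unscaled explained variance ratio:", ratio_raw)
print("Scaled explained variance ratio:  ", ratio_scaled)
```

On the unscaled data almost all variance is attributed to one component simply because of units, which is the effect standardization is meant to remove.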

## 3.2 Principal Component Analysis (PCA)

In [None]:
# =========================
# PCA
# =========================

from sklearn.decomposition import PCA

# Keep enough components to explain 95% of variance
pca = PCA(n_components=0.95, random_state=42)

X_pca = pca.fit_transform(X_scaled)

print("Original number of features:", X_scaled.shape[1])
print("Reduced number of features after PCA:", X_pca.shape[1])

### What does this line do?

`pca = PCA(n_components=0.95, random_state=42)`

This line creates a PCA model that selects the smallest number of principal components whose cumulative explained variance exceeds 95%.

`n_components = 0.95` : A float between 0 and 1 is interpreted as a target fraction of the total variance, not as a component count.

`random_state = 42` : Included for reproducibility. In scikit-learn this parameter only affects the randomized solvers (`svd_solver="arpack"` or `"randomized"`). With a fractional `n_components` an exact solver is used, so the result is the same on every run regardless of the seed.
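
The selection rule for a fractional `n_components` can be checked by hand. The sketch below uses synthetic data (`X_demo` and `scales` are assumptions for illustration) and compares a manual count from the cumulative variance with the count PCA chooses.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 10 features whose
# standard deviations decay, so a few directions hold most of the variance.
rng = np.random.default_rng(0)
scales = np.array([10, 8, 6, 4, 2, 1, 0.5, 0.3, 0.2, 0.1])
X_demo = rng.normal(size=(500, 10)) * scales

# Fit all components, then find the first count whose cumulative ratio exceeds 0.95.
cumulative = np.cumsum(PCA().fit(X_demo).explained_variance_ratio_)
k_manual = int(np.argmax(cumulative > 0.95) + 1)

# Let PCA choose the count from the fractional threshold.
pca_auto = PCA(n_components=0.95).fit(X_demo)

print("Cumulative variance:", np.round(cumulative, 3))
print("Manual count:", k_manual, "| PCA count:", pca_auto.n_components_)
```

The fitted attribute `n_components_` holds the number of components actually kept.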

In [None]:
# Explained variance ratio
explained_variance = np.cumsum(pca.explained_variance_ratio_)

# Show cumulative variance
explained_variance

## 3.3 t-SNE (Visualization Only)
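t-SNE is used here for visualization only. scikit-learn's `TSNE` has no `transform` method for unseen data, so it is not used as a preprocessing step for downstream models.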

In [None]:
# =========================
# t-SNE (Visualization)
# =========================

from sklearn.manifold import TSNE

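# Use a random subset for speed (t-SNE is expensive on the full dataset)
rng = np.random.default_rng(42)  # Seeded so the same subset is drawn on every run
sample_idx = rng.choice(len(X_scaled), size=5000, replace=False)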

X_subset = X_scaled[sample_idx]
y_subset = y.iloc[sample_idx]

tsne = TSNE(
    n_components=2,
    perplexity=30,
    learning_rate=200,
    max_iter=1000,
    random_state=42
)

X_tsne = tsne.fit_transform(X_subset)

In [None]:
# Visualization
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.scatter(
    X_tsne[:, 0],
    X_tsne[:, 1],
    c=y_subset,
    cmap="coolwarm",
    alpha=0.6
)
plt.title("t-SNE Visualization of Credit Card Transactions")
plt.xlabel("t-SNE Component 1")
plt.ylabel("t-SNE Component 2")
plt.colorbar(label="Non-Fraud (0) / Fraud (1)")
plt.show()
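
When the fraud class is rare, a uniform random subset can contain very few fraud points, which makes the plot hard to read. One possible alternative, sketched here on synthetic labels, is to keep every fraud row and sample only the non-fraud rows. The names `y_demo`, `subset_size`, and `demo_idx` are hypothetical, and the sketch assumes the number of fraud rows is smaller than the plotting budget.

```python
import numpy as np

# Synthetic labels (an assumption for illustration): 10,000 rows, 20 of them fraud.
rng = np.random.default_rng(42)
y_demo = np.zeros(10_000, dtype=int)
y_demo[rng.choice(10_000, size=20, replace=False)] = 1

# Keep every fraud row and fill the rest of the budget with random non-fraud rows.
subset_size = 1_000  # hypothetical plotting budget
fraud_idx = np.flatnonzero(y_demo == 1)
normal_idx = np.flatnonzero(y_demo == 0)
n_normal = subset_size - len(fraud_idx)
sampled_normal = rng.choice(normal_idx, size=n_normal, replace=False)

# Shuffle so that fraud points are not all drawn first in the scatter plot.
demo_idx = rng.permutation(np.concatenate([fraud_idx, sampled_normal]))

print("Subset size:", len(demo_idx))
print("Fraud rows in subset:", int(y_demo[demo_idx].sum()))
```

A subset built this way over-represents fraud relative to its true rate, so it is suitable for visual inspection but not for estimating class proportions.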

### Interpretation

t-SNE preserves local neighborhoods.

Clusters may appear well separated, but the distances between clusters and the apparent cluster sizes in the plot are NOT interpretable.
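
How well local neighborhoods are preserved can be quantified with scikit-learn's `trustworthiness` score. The following is a small sketch on synthetic blobs (the data and parameter values are assumptions for illustration, not results for the credit card data).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE, trustworthiness

# Synthetic data (an assumption for illustration): three Gaussian blobs in 10 dimensions.
X_demo, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Embed into 2D; other t-SNE settings are left at their defaults.
X_embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_demo)

# Trustworthiness lies in [0, 1]; higher values mean that points that are
# neighbors in the embedding were also neighbors in the original space.
score = trustworthiness(X_demo, X_embedded, n_neighbors=5)
print("Trustworthiness (5 neighbors):", round(float(score), 3))
```

The score only describes local structure, which is consistent with the caution above about reading global distances from the plot.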

## 3.4 Linear Discriminant Analysis (LDA)

LDA is supervised, so it uses labels.

In [None]:
# =========================
# LDA
# =========================

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# For binary classification, LDA can reduce to at most 1 component
lda = LinearDiscriminantAnalysis(n_components=1)

X_lda = lda.fit_transform(X_scaled, y)

print("LDA reduced shape:", X_lda.shape)

### Interpretation

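LDA finds the projection that maximizes the separation between classes relative to the scatter within each class.

The number of components is at most min(number_of_classes - 1, number_of_features), so for this binary problem the maximum is 1.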
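
The component limit can be verified on a small synthetic example. The data below is generated with `make_classification` purely for illustration, and the error behavior shown is that of recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic imbalanced binary data (an assumption for illustration).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_redundant=0,
    weights=[0.95, 0.05], random_state=0,
)

# Two classes allow exactly one discriminant axis.
X_one = LinearDiscriminantAnalysis(n_components=1).fit_transform(X_demo, y_demo)
print("Projected shape:", X_one.shape)

# Asking for more than n_classes - 1 components is rejected at fit time.
try:
    LinearDiscriminantAnalysis(n_components=2).fit(X_demo, y_demo)
    too_many_rejected = False
except ValueError as err:
    too_many_rejected = True
    print("ValueError:", err)
```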

# Step 4 — Analysis and Interpretation

## Information Retention

In [None]:
# PCA information retention
print("Total variance retained by PCA:", explained_variance[-1])

## Summary Comparison

In [None]:
# =========================
# METHOD COMPARISON SUMMARY
# =========================

methods_summary = pd.DataFrame({
    "Method": ["Original", "PCA", "t-SNE", "LDA"],
    "Dimensions": [
        X_scaled.shape[1],
        X_pca.shape[1],
        2,
        X_lda.shape[1]
    ],
    "Purpose": [
        "Raw data",
        "Modeling & compression",
        "Visualization only",
        "Supervised discrimination"
    ]
})

methods_summary

## Final Interpretation

PCA:
- Reduces dimensionality while retaining variance
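- Suitable as a preprocessing step for fraud detection models, although directions with high variance are not guaranteed to be the ones that separate fraud from non-fraud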

t-SNE:
- Excellent for visualization
- NOT suitable for training classifiers

LDA:
- Uses class labels
- Maximizes fraud vs non-fraud separation
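- Can be useful for this binary problem, but its benefit on heavily imbalanced data should be confirmed on held-out data, since the fraud class is estimated from few samples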
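
If the PCA features are used for modeling, the scaler and PCA should be fitted on the training split only. The sketch below shows one way to do this with a scikit-learn pipeline. The synthetic data, the logistic regression classifier, and the `class_weight="balanced"` setting are all assumptions chosen as simple stand-ins, not the method prescribed by this exercise.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fraud data (an assumption for illustration):
# 30 features and only a few percent positives.
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=30, n_informative=10, n_redundant=5,
    weights=[0.98, 0.02], random_state=0,
)

# Stratify so both splits keep the same positive rate.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# Scaler and PCA are fitted on the training split only, inside the pipeline.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
model.fit(X_train, y_train)

# Average precision is more informative than accuracy when positives are rare.
scores = model.predict_proba(X_test)[:, 1]
ap = average_precision_score(y_test, scores)
print("Components kept:", model.named_steps["pca"].n_components_)
print("Average precision:", round(float(ap), 3))
```

Wrapping the steps in a pipeline keeps test data out of the fitted scaler and PCA, so the reported score is not inflated by leakage.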