# 👩‍💻 Detecting Credit Card Fraud with Isolation Forest

## 📋 Overview
In this lab, you'll apply **Isolation Forest** to detect anomalies (potential fraud cases) in the **Credit Card Fraud Detection** dataset. You'll visualize the data, prepare it for modeling, experiment with Isolation Forest hyperparameters, and optionally compare the results to a simple K-Nearest Neighbors (KNN) anomaly detection method.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- Prepare transactional data for anomaly detection

- Train and evaluate an Isolation Forest model

- Visualize and interpret anomalies in feature space

- Adjust hyperparameters to control anomaly sensitivity


## Task 1: Load and Understand the Dataset
**Context:** Start with basic exploration to understand the structure and distribution of data.

**Steps:**

1. Load the dataset using Pandas.


2. Display the first few rows (`.head()`), dataset info (`.info()`), and basic statistics (`.describe()`).


3. Focus on the transaction amount (`amt`), the transaction time (`unix_time`), and any available merchant identifiers.

**Prompting Questions:**
- Are there any obvious missing values?

- Do the transaction amount (`amt`) and transaction time (`unix_time`) have reasonable ranges?

**💡 Tip:** Look at the maximum and minimum values for potential anomalies.

```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

# Load the dataset
data = pd.read_csv('Credit_Card_Transactions.csv')

# Preview the data
# <your code here>
```

**⚙️ Test Your Work:**

- DataFrame loads successfully

- Features correctly identified
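
One way to approach the prompting questions is sketched below. It uses a tiny hand-made DataFrame with the lab's `amt` and `unix_time` column names so it runs without the CSV; the values are invented purely for illustration, and the same calls would be applied to `data` in the lab.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: invented values, same column names
sample = pd.DataFrame({
    "amt": [12.50, 89.99, np.nan, 4500.00, 3.20],
    "unix_time": [1325376018, 1325376044, 1325376051, 1325376076, 1325376186],
})

# Count missing values per column
missing_counts = sample.isna().sum()
print(missing_counts)

# Minimum and maximum of each feature (NaNs are skipped by default)
ranges = sample[["amt", "unix_time"]].agg(["min", "max"])
print(ranges)

# unix_time holds seconds since 1970-01-01 UTC; converting it to timestamps
# makes it easier to judge whether the time range is plausible
timestamps = pd.to_datetime(sample["unix_time"], unit="s")
print(timestamps.min(), "to", timestamps.max())
```

A very large maximum in `amt` relative to the typical value is worth noting at this stage, since it hints at the skew discussed in Task 2.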

## Task 2: Initial Data Visualization
**Context:** Get a first look at patterns and potential anomalies.

**Steps:**

1. Create a scatter plot of transaction amount (`amt`) against transaction time (`unix_time`).


2. Look for isolated points or dense groupings.

**Prompting Questions:**
- Are there points far away from others (potential fraud)?

- Is transaction volume time-dependent?

**💡 Tip:** Log-transform the transaction amount (`amt`) if its distribution is extremely skewed.

```python
# Create scatter plot of TransactionAmount vs TransactionTime
```

**⚙️ Test Your Work:**

- Scatter plot shows some areas more populated than others

- Potential anomalies visually identifiable
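
The log-transform tip can be illustrated with a short sketch. The amounts below are synthetic (drawn from a log-normal distribution as an assumed stand-in for `amt`), and `np.log1p` is one reasonable choice of transform, not the only one.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed amounts standing in for the real `amt` column
rng = np.random.default_rng(42)
amt = pd.Series(rng.lognormal(mean=3.5, sigma=1.2, size=5000), name="amt")

# log1p computes log(1 + x), so an amount of exactly 0 maps to 0
# instead of negative infinity
log_amt = np.log1p(amt)

# Skewness close to 0 indicates a roughly symmetric distribution
print(f"skewness before: {amt.skew():.2f}")
print(f"skewness after:  {log_amt.skew():.2f}")
```

Plotting `log_amt` on the y-axis instead of the raw amount usually spreads out the dense band of small transactions, which makes isolated points easier to see.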

## Task 3: Prepare the Data for Isolation Forest
**Context:** Isolation Forest is far less sensitive to feature scale than distance-based methods, but standardizing keeps the features on a comparable footing and is needed for the optional KNN comparison, which relies on distances.

**Steps:**

1. Select key features: transaction amount (`amt`) and transaction time (`unix_time`).


2. Standardize features using `StandardScaler`.

**Prompting Questions:**
- Are features roughly standardized (mean 0, std 1) after scaling?
- Is any transformation needed (log-scaling high-skew features)?

**💡 Tip:** `StandardScaler` expects 2-D input. A two-column DataFrame already qualifies; reshape only when scaling a single column: `.values.reshape(-1,1)`.

```python
# Standardize features before applying Isolation Forest
```

**⚙️ Test Your Work:**

- Scaled feature set (`X_scaled`) ready for modeling
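
A minimal sketch of the scaling step and the mean/standard deviation check follows. The feature values are synthetic stand-ins for `data[['amt', 'unix_time']]`; only the column names come from the lab.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for data[['amt', 'unix_time']]
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "amt": rng.lognormal(mean=3.5, sigma=1.0, size=1000),
    "unix_time": rng.integers(1325376000, 1356998400, size=1000),
})

# A two-column DataFrame is already 2-D, so no reshape is needed here
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

# Each column should now have mean ~0 and standard deviation ~1
print("means:", X_scaled.mean(axis=0).round(3))
print("stds: ", X_scaled.std(axis=0).round(3))

# A single column selected as a Series is 1-D and must be reshaped first
amt_scaled = StandardScaler().fit_transform(features["amt"].values.reshape(-1, 1))
print(amt_scaled.shape)
```

Note that `fit_transform` returns a NumPy array, so the column names are gone afterwards; column 0 is `amt` and column 1 is `unix_time` in this sketch.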

## Task 4: Implement Isolation Forest
**Context:** Train the model to detect anomalies.

**Steps:**

1. Configure `IsolationForest()` with a `contamination` rate (start with 0.02, which assumes roughly 2% of transactions are anomalous).


2. Fit the model to the scaled feature set.


3. Predict anomalies (-1 = anomaly, 1 = normal).

**Prompting Questions:**
- How sensitive is the model to contamination changes?

- Are anomalies well-separated visually?

**💡 Tip:** Use `random_state=42` for reproducibility.

```python
# Train Isolation Forest and predict anomalies
```

**⚙️ Test Your Work:**
- Anomaly predictions assigned correctly (1 = normal, -1 = anomaly)
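
To get a feel for how `contamination` controls sensitivity, the sketch below refits the model at a few rates and reports the fraction of points flagged. The data is synthetic (a dense cluster plus a handful of distant points, assumed here as a stand-in for `X_scaled`), and the three rates are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic 2-D data standing in for X_scaled:
# a dense cluster plus a few far-away points
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(0, 1, size=(980, 2)),
    rng.uniform(4, 8, size=(20, 2)),
])

flagged_fraction = {}
for contamination in [0.01, 0.02, 0.05]:
    model = IsolationForest(contamination=contamination, random_state=42)
    labels = model.fit_predict(X_demo)  # -1 = anomaly, 1 = normal
    flagged_fraction[contamination] = float(np.mean(labels == -1))
    print(f"contamination={contamination:.2f} -> "
          f"flagged fraction={flagged_fraction[contamination]:.3f}")
```

The flagged fraction tracks the `contamination` value closely because the parameter sets the score threshold rather than changing how the trees are built. Choosing it is therefore a statement about how many alerts you are willing to review, not something the model learns.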

## Task 5: Analyze Anomalies
**Context:** See how anomalies align with original plots.

**Steps:**

1. Add anomaly labels back to the DataFrame.


2. Visualize anomalies on the scatter plot (different color/marker).


3. Reflect on the spatial distribution of anomalies.

**Prompting Questions:**
- Are anomalies generally isolated from dense clusters?

- Are there false positives (normal behavior marked as fraud)?

**💡 Tip:** Use different markers or colors for fraud vs normal.

```python
# Visualize anomalies detected by Isolation Forest
```

**⚙️ Test Your Work:**
- Clear separation of anomaly points from normal points on plot
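
One possible way to follow the marker/color tip is to draw normal points and anomalies as two separate layers. In this sketch `plot_anomalies` is a hypothetical helper, the `Anomaly_IF` column name is an assumption about where the labels were stored, and the demo data and labels are invented.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_anomalies(df, label_col="Anomaly_IF"):
    """Draw normal points and anomalies as two separately styled layers."""
    normal = df[df[label_col] == 1]
    anomalies = df[df[label_col] == -1]

    fig, ax = plt.subplots(figsize=(8, 6))
    # Faint, small markers for the bulk of the data
    ax.scatter(normal["unix_time"], normal["amt"], s=10, alpha=0.3, label="Normal")
    # Larger red crosses drawn on top so anomalies stay visible
    ax.scatter(anomalies["unix_time"], anomalies["amt"], s=40, marker="x",
               color="red", label="Anomaly")
    ax.set_xlabel("Transaction Time (unix_time)")
    ax.set_ylabel("Transaction Amount (amt)")
    ax.legend()
    return ax


# Synthetic demo data with invented labels, only to exercise the helper
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "unix_time": rng.integers(1325376000, 1356998400, size=200),
    "amt": rng.lognormal(mean=3.5, sigma=1.0, size=200),
    "Anomaly_IF": rng.choice([1, -1], size=200, p=[0.95, 0.05]),
})
ax = plot_anomalies(demo)
```

Drawing the anomalies last keeps them on top of the dense normal layer, and the explicit legend avoids having to decode a colormap. In a script, call `plt.show()` afterwards to display the figure.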

## Task 6: Reflect on Findings
**Context:** Understand practical implications of your anomaly detection.

**Steps:**

1. Summarize the main traits of detected fraud cases.


2. Discuss how fraud detection impacts business operations.


3. Reflect on differences in density between normal and anomalous transactions.

**Prompting Questions:**
- Do anomalies have unusually high or low transaction amounts?

- How might adjusting contamination affect business risk tolerance?
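
A grouped summary is one simple way to describe the traits of flagged transactions. The sketch below builds synthetic data in which a few transactions have very large amounts, then compares amount statistics per label; the data, the `Anomaly_IF` column name, and the choice of statistics are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Synthetic transactions: mostly modest amounts plus a few very large ones
rng = np.random.default_rng(7)
n_normal, n_extreme = 980, 20
demo = pd.DataFrame({
    "amt": np.concatenate([
        rng.lognormal(mean=3.5, sigma=0.8, size=n_normal),
        rng.uniform(3000, 6000, size=n_extreme),
    ]),
    "unix_time": rng.integers(1325376000, 1356998400, size=n_normal + n_extreme),
})

# Scale, fit, and attach the labels (-1 = anomaly, 1 = normal)
X = StandardScaler().fit_transform(demo[["amt", "unix_time"]])
demo["Anomaly_IF"] = IsolationForest(contamination=0.02, random_state=42).fit_predict(X)

# Compare amount statistics between flagged and normal transactions
summary = demo.groupby("Anomaly_IF")["amt"].describe()
print(summary[["count", "mean", "50%", "max"]])
```

On real data the same `groupby` can be repeated for other columns, such as merchant identifiers, to see whether flagged transactions share traits beyond amount and time.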

## (Optional) Compare Isolation Forest vs KNN Anomaly Detection
**Bonus Task:**
- Use `NearestNeighbors` to calculate distances to nearest neighbors.

- Flag points farthest from their neighbors as potential anomalies.

- Compare results qualitatively to Isolation Forest.

**Prompting Questions:**
- Does KNN capture similar anomalies?

- How do models differ in outlier sensitivity?
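
To put a number on "does KNN capture similar anomalies?", the two sets of flags can be compared directly. The sketch below uses synthetic data in place of `X_scaled`; the 2% threshold for both methods and the Jaccard overlap as the comparison measure are choices made for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

# Synthetic 2-D data standing in for X_scaled
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(0, 1, size=(980, 2)),
    rng.uniform(4, 8, size=(20, 2)),
])

# Isolation Forest flags (True = anomaly)
if_flags = IsolationForest(contamination=0.02, random_state=42).fit_predict(X_demo) == -1

# KNN score: mean distance to the 5 nearest neighbors.
# kneighbors() with no argument excludes each point from its own neighbor list.
nn = NearestNeighbors(n_neighbors=5).fit(X_demo)
distances, _ = nn.kneighbors()
mean_dist = distances.mean(axis=1)
knn_flags = mean_dist > np.percentile(mean_dist, 98)  # top 2% farthest

# Overlap between the two sets of flagged points
both = int(np.sum(if_flags & knn_flags))
either = int(np.sum(if_flags | knn_flags))
print(f"flagged by IF: {int(if_flags.sum())}, by KNN: {int(knn_flags.sum())}, by both: {both}")
print(f"Jaccard overlap: {both / either:.2f}")
```

Points flagged by only one method are the interesting ones to inspect: KNN reacts to local sparsity, while Isolation Forest reacts to how easily a point is separated by axis-aligned splits.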

## ✅ Success Checklist
- Dataset loaded and understood

- Scatter plot of raw data created

- Features standardized

- Isolation Forest model trained and anomalies predicted

- Anomalies visualized on plots

- Findings and reflections documented


## 🔍 Common Issues & Solutions

**Problem:** Model predicts all points as normal
 
**Solution:** Adjust the `contamination` parameter; start around 0.02 or 0.05
 
**Problem:** Scaling errors
 
**Solution:** Ensure the correct features are selected
 
**Problem:** Visualizations look unclear
 
**Solution:** Use alpha blending or separate colors to highlight anomalies

## 🔑 Key Points
- Isolation Forest isolates anomalies through random splits

- Feature scaling matters most for distance-based methods such as KNN; Isolation Forest itself is far less sensitive to it

- Visualizing data helps validate anomaly predictions
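
The first key point can be made concrete with the model's raw scores. In the sketch below, a single distant point is added to a synthetic cluster; because random splits separate it after very few cuts, its score from `score_samples` is lower than that of typical points. The data and the outlier's position are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# A dense synthetic cluster plus one obviously distant point
rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(500, 2))
outlier = np.array([[8.0, 8.0]])
X_demo = np.vstack([inliers, outlier])

model = IsolationForest(random_state=42).fit(X_demo)

# score_samples returns values <= 0; the lower the score, the more
# abnormal the point (it was isolated in fewer random splits on average)
scores = model.score_samples(X_demo)
print(f"mean inlier score: {scores[:-1].mean():.3f}")
print(f"outlier score:     {scores[-1]:.3f}")
```

The `contamination` parameter only decides where to cut this score distribution into "normal" and "anomaly"; the ranking of points comes from the scores themselves.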


## Exemplar Solution

After completing this activity (or if you get stuck!), take a moment to review the exemplar solution. It can offer insights into different techniques and approaches, so reflect on what you can learn from it to improve your coding skills.

Remember, multiple solutions can exist for some problems; the goal is to learn and grow as a programmer by exploring various approaches.

<details>
<summary><strong>Click HERE to see an exemplar solution</strong></summary>
    
```python
# -------------------------------
# Task 1: Load and Understand Dataset
# -------------------------------

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

# Load dataset
data = pd.read_csv('Credit_Card_Transactions.csv')  # Update path if needed

# Quick view of data
print(data.head())
data.info()  # info() prints its report directly and returns None
print(data.describe())

# Select and drop NaNs from both features and corresponding rows in original data
selected_cols = ['amt', 'unix_time']
data = data.dropna(subset=selected_cols)
features = data[selected_cols]

# -------------------------------
# Task 2: Initial Data Visualization
# -------------------------------

# Basic scatter plot of transactions
plt.figure(figsize=(8,6))
plt.scatter(features['unix_time'], features['amt'], alpha=0.5)
plt.title('Transaction Amount vs Transaction Time')
plt.xlabel('Transaction Time (unix_time)')
plt.ylabel('Transaction Amount (amt)')
plt.grid(True)
plt.show()

# -------------------------------
# Task 3: Prepare Data for Isolation Forest
# -------------------------------

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

# -------------------------------
# Task 4: Implement Isolation Forest
# -------------------------------

# Initialize and fit Isolation Forest
iso_forest = IsolationForest(contamination=0.02, random_state=42)
anomaly_labels = iso_forest.fit_predict(X_scaled)

# Assign anomaly labels to data
data['Anomaly_IF'] = anomaly_labels  # 1 = normal, -1 = anomaly

# -------------------------------
# Task 5: Analyze and Visualize Anomalies
# -------------------------------

# Plot anomalies on scatter plot
plt.figure(figsize=(8,6))
plt.scatter(
    data['unix_time'],
    data['amt'],
    c=data['Anomaly_IF'],
    cmap='coolwarm',
    edgecolors='k',
    alpha=0.7
)
plt.title('Isolation Forest: Fraud Detection')
plt.xlabel('Transaction Time (unix_time)')
plt.ylabel('Transaction Amount (amt)')
plt.grid(True)
plt.show()

# -------------------------------
# (Optional) Compare to KNN Anomaly Detection
# -------------------------------

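# Use Nearest Neighbors to find anomalies
neighbors = NearestNeighbors(n_neighbors=5)
neighbors.fit(X_scaled)
# Calling kneighbors() with no argument queries the training points and
# excludes each point from its own neighbor list; passing X_scaled would
# return the point itself as the first "neighbor" at distance 0
distances, indices = neighbors.kneighbors()

# Use mean distance to the 5 nearest neighbors as anomaly score
mean_distances = distances.mean(axis=1)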

# Set threshold manually (top 2% farthest)
threshold = np.percentile(mean_distances, 98)
anomaly_knn = (mean_distances > threshold).astype(int)  # 1 = anomaly

# Add KNN anomaly predictions
data['Anomaly_KNN'] = anomaly_knn

# Visualize KNN anomalies
plt.figure(figsize=(8,6))
plt.scatter(
    data['unix_time'],
    data['amt'],
    c=data['Anomaly_KNN'],
    cmap='plasma',
    edgecolors='k',
    alpha=0.6
)
plt.title('KNN Anomaly Detection: Fraud Detection')
plt.xlabel('Transaction Time (unix_time)')
plt.ylabel('Transaction Amount (amt)')
plt.grid(True)
plt.show()

# -------------------------------
# Task 6: Reflect on Findings
# -------------------------------

"""Findings:
- Isolation Forest flagged ~2% of transactions as anomalies.
- Many anomalies correspond to unusually high or low amounts or odd transaction times.
- KNN flagged a slightly different set, focused more on isolated or sparse data points.
- Both methods are unsupervised and do not use the 'is_fraud' label.
- You can optionally compare predictions to 'is_fraud' to see how well models align.
"""

```

</details>
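
The exemplar's closing note mentions comparing predictions to `is_fraud`. A minimal sketch of that comparison follows, using a handful of invented rows; it assumes `is_fraud` is coded as 0/1 and that the Isolation Forest labels are stored in a column named `Anomaly_IF`.

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score

# Invented labels and predictions, only to show the mechanics
demo = pd.DataFrame({
    "Anomaly_IF": [1, 1, -1, 1, -1, 1, 1, -1, 1, 1],
    "is_fraud":   [0, 0,  1, 0,  0, 0, 1,  1, 0, 0],
})

# Map Isolation Forest output (-1 = anomaly) to the 0/1 convention of is_fraud
predicted_fraud = (demo["Anomaly_IF"] == -1).astype(int)

# Cross-tabulate true labels against predictions
print(pd.crosstab(demo["is_fraud"], predicted_fraud,
                  rownames=["is_fraud"], colnames=["predicted"]))

# Precision: share of flagged transactions that are truly fraud
# Recall: share of fraud cases that were flagged
precision = precision_score(demo["is_fraud"], predicted_fraud)
recall = recall_score(demo["is_fraud"], predicted_fraud)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Because fraud is rare, precision and recall are more informative here than plain accuracy, which would look high even for a model that flags nothing.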