# Isolation Forest - Complete Guide
## Anomaly Detection and Outlier Identification

## 📚 Learning Objectives
- Understand the Isolation Forest algorithm for anomaly detection
- Learn how isolation works for outlier detection
- Tune the contamination parameter
- Apply to fraud detection and outlier identification
- Compare with other anomaly detection methods
- Handle imbalanced anomaly detection problems

## 🎯 What is Isolation Forest?

**Isolation Forest** is an unsupervised machine learning algorithm for anomaly detection that isolates anomalies instead of profiling normal data points.

### Key Insight:
**Anomalies are few and different** - they are easier to isolate than normal points!

### How It Works:
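1. **Randomly select a feature** and a split value drawn uniformly between that feature's minimum and maximum
2. **Recursively partition** the data until each point is isolated or a depth limit is reached
3. **Anomalies require fewer splits** to isolate (shorter path length)
4. **Normal points require more splits** (longer path length)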
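
The steps above can be sketched in a few lines of NumPy. This is a simplified, hypothetical helper (`isolation_depth`) that follows a single point down one random partition; it is not scikit-learn's implementation, and it assumes the point is one of the rows of `X` and stops at a fixed depth limit. Averaging the depth over many random trees shows why a far-away point isolates faster than a central one.

```python
import numpy as np

def isolation_depth(x, X, rng, depth=0, max_depth=50):
    """Count random axis-parallel splits until x is alone in its partition.

    Hypothetical helper for illustration; x is assumed to be a row of X.
    """
    if len(X) <= 1 or depth >= max_depth:
        return depth
    feature = rng.integers(X.shape[1])           # step 1: pick a random feature
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:                                 # remaining points tie on this feature
        return depth
    split = rng.uniform(lo, hi)                  # step 1: split value in [min, max)
    # step 2: keep only the side of the split that contains x, then recurse
    keep = X[:, feature] < split if x[feature] < split else X[:, feature] >= split
    return isolation_depth(x, X[keep], rng, depth + 1, max_depth)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(255, 2)), [[6.0, 6.0]]])   # 255 inliers + 1 far outlier
outlier = X[-1]
inlier = X[np.argmin(np.linalg.norm(X, axis=1))]             # point closest to the centre

# steps 3-4: averaged over many random trees, the outlier needs fewer splits
outlier_depth = np.mean([isolation_depth(outlier, X, rng) for _ in range(200)])
inlier_depth = np.mean([isolation_depth(inlier, X, rng) for _ in range(200)])
print(f"mean depth, outlier: {outlier_depth:.1f}, central inlier: {inlier_depth:.1f}")
```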

### Anomaly Score:
- Based on the **average path length** of a point across all isolation trees, normalized by the expected path length for the subsample size
- **Short path** → High anomaly score → Likely anomaly
- **Long path** → Low anomaly score → Likely normal
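
The normalization comes from the original Isolation Forest paper: for a subsample of size n, c(n) = 2H(n-1) - 2(n-1)/n is the average path length of an unsuccessful search in a binary search tree, with H(i) ≈ ln(i) + 0.5772 (Euler's constant), and the score is s(x, n) = 2^(-E[h(x)] / c(n)). A minimal sketch follows; the function names are ours, not a library API. Scores near 1 indicate anomalies, and s = 0.5 means the point isolates about as quickly as an average point. scikit-learn's `score_samples` returns the negative of this score, which is why its output reads "more negative = more anomalous".

```python
import numpy as np

def average_path_length(n):
    """c(n): expected path length of an unsuccessful search in a BST built on n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = np.log(n - 1) + np.euler_gamma      # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); close to 1 => anomaly, <= 0.5 => normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

n = 256                                   # default subsample size when n_samples >= 256
c = average_path_length(n)
print(f"c({n}) = {c:.2f}")
for h in [1.0, c, 2 * c]:
    print(f"E[h(x)] = {h:5.2f} -> s = {anomaly_score(h, n):.3f}")
```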

### Advantages:
✅ **Fast** - Linear in the number of samples, since each tree is built on a small subsample (256 by default)  
✅ **Scalable** - Works with large datasets  
✅ **No distance metrics** - Unlike LOF and DBSCAN  
✅ **Copes with high dimensions** - Often degrades less than distance-based methods  
✅ **Few key parameters** - Mainly contamination, n_estimators, and max_samples  
✅ **Unsupervised** - No labels needed  

### When to Use:
✅ Fraud detection  
✅ Network intrusion detection  
✅ Manufacturing defect detection  
✅ Medical anomaly detection  
✅ Outlier removal before modeling  
✅ High-dimensional data  

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification, make_blobs
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")

## Part 1: Understanding Isolation Forest
### 1️⃣ Create Synthetic Data with Anomalies

In [None]:
# Create normal data
np.random.seed(42)
n_samples = 300
n_outliers = 30

# Normal points (clustered)
X_normal = np.random.randn(n_samples, 2) * 0.5 + np.array([0, 0])

# Outliers (uniform over [-4, 4] x [-4, 4]; a few may fall inside the normal cluster)
X_outliers = np.random.uniform(low=-4, high=4, size=(n_outliers, 2))

# Combine
X = np.vstack([X_normal, X_outliers])
y_true = np.hstack([np.zeros(n_samples), np.ones(n_outliers)])  # 0=normal, 1=anomaly

print(f"Dataset shape: {X.shape}")
print(f"Normal points: {n_samples} ({n_samples/(n_samples+n_outliers)*100:.1f}%)")
print(f"Anomalies: {n_outliers} ({n_outliers/(n_samples+n_outliers)*100:.1f}%)")

# Visualize
plt.figure(figsize=(12, 8))
plt.scatter(X_normal[:, 0], X_normal[:, 1], 
           c='blue', s=50, alpha=0.6, edgecolors='black', label='Normal')
plt.scatter(X_outliers[:, 0], X_outliers[:, 1],
           c='red', s=100, alpha=0.8, edgecolors='black', marker='x', 
           linewidths=2, label='Anomalies')
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Synthetic Dataset with Anomalies', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 2️⃣ Apply Isolation Forest

In [None]:
# Train Isolation Forest
# contamination = expected proportion of outliers
contamination_rate = n_outliers / (n_samples + n_outliers)

iso_forest = IsolationForest(
    contamination=contamination_rate,
    random_state=42,
    n_estimators=100
)

# Fit and predict
y_pred = iso_forest.fit_predict(X)
# Convert to 0/1 (IsolationForest returns 1 for normal, -1 for anomaly)
y_pred_binary = (y_pred == -1).astype(int)

# Get anomaly scores
anomaly_scores = iso_forest.score_samples(X)
# More negative = more anomalous

print(f"\n📊 Isolation Forest Results:")
print(f"Detected anomalies: {(y_pred == -1).sum()}")
print(f"True anomalies: {y_true.sum():.0f}")
print(f"\nAnomaly score range: [{anomaly_scores.min():.3f}, {anomaly_scores.max():.3f}]")
print(f"More negative = more anomalous")

# Evaluate
print(f"\n📋 Classification Report:")
print(classification_report(y_true, y_pred_binary, target_names=['Normal', 'Anomaly']))

# Confusion matrix
cm = confusion_matrix(y_true, y_pred_binary)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
           xticklabels=['Normal', 'Anomaly'],
           yticklabels=['Normal', 'Anomaly'],
           cbar_kws={'label': 'Count'})
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.title('Confusion Matrix - Isolation Forest', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
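
`fit_predict` turns the continuous scores into labels using a threshold stored in the fitted attribute `offset_`. In scikit-learn, `decision_function(X)` equals `score_samples(X) - offset_`, and `predict` returns -1 wherever that difference is negative. The self-contained check below regenerates data shaped like the cell above (the exact values are placeholders, not the notebook's arrays) and verifies both relationships.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(scale=0.5, size=(300, 2)),   # normal cluster
                    rng.uniform(-4, 4, size=(30, 2))])        # scattered outliers

model = IsolationForest(contamination=30 / 330, random_state=42).fit(X_demo)
scores = model.score_samples(X_demo)          # lower = more anomalous
margin = model.decision_function(X_demo)      # shifted so that 0 is the threshold

print(f"offset_ = {model.offset_:.4f}")
print("decision_function == score_samples - offset_:",
      np.allclose(margin, scores - model.offset_))
print("predict == -1 exactly where the margin is negative:",
      np.array_equal(model.predict(X_demo) == -1, margin < 0))
```

Working with `decision_function` is convenient when you want to move the threshold yourself: any cutoff other than 0 is just a different alert rate on the same scores.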

### 3️⃣ Visualize Anomaly Scores

In [None]:
# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Predictions
normal_mask = y_pred == 1
anomaly_mask = y_pred == -1

axes[0].scatter(X[normal_mask, 0], X[normal_mask, 1],
               c='blue', s=50, alpha=0.6, edgecolors='black', label='Predicted Normal')
axes[0].scatter(X[anomaly_mask, 0], X[anomaly_mask, 1],
               c='red', s=100, alpha=0.8, edgecolors='black', marker='x',
               linewidths=2, label='Predicted Anomaly')
axes[0].set_xlabel('Feature 1', fontsize=12)
axes[0].set_ylabel('Feature 2', fontsize=12)
axes[0].set_title('Isolation Forest Predictions', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Plot 2: Anomaly scores (heatmap)
scatter = axes[1].scatter(X[:, 0], X[:, 1], c=anomaly_scores,
                         cmap='RdYlGn', s=80, alpha=0.7, edgecolors='black')
axes[1].set_xlabel('Feature 1', fontsize=12)
axes[1].set_ylabel('Feature 2', fontsize=12)
axes[1].set_title('Anomaly Scores\n(Red = Anomalous, Green = Normal)', 
                 fontsize=14, fontweight='bold')
plt.colorbar(scatter, ax=axes[1], label='Anomaly Score')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Distribution of anomaly scores
plt.figure(figsize=(12, 6))
plt.hist(anomaly_scores[y_true == 0], bins=30, alpha=0.6, 
        label='Normal Points', color='blue', edgecolor='black')
plt.hist(anomaly_scores[y_true == 1], bins=30, alpha=0.6,
        label='True Anomalies', color='red', edgecolor='black')
plt.xlabel('Anomaly Score', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.title('Distribution of Anomaly Scores', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print("\n💡 Anomalies have more negative scores!")

### 4️⃣ Parameter Tuning: Contamination

In [None]:
# Test different contamination values
# Note: scoring each value against y_true requires ground-truth labels,
# which are usually unavailable in a truly unsupervised setting
from sklearn.metrics import precision_score, recall_score, f1_score

contamination_values = [0.01, 0.05, 0.1, 0.15, 0.2, 0.25]
results = []

for contam in contamination_values:
    iso = IsolationForest(contamination=contam, random_state=42, n_estimators=100)
    y_pred_temp = iso.fit_predict(X)
    y_pred_temp_binary = (y_pred_temp == -1).astype(int)
    
    # Calculate metrics against the known labels
    precision = precision_score(y_true, y_pred_temp_binary, zero_division=0)
    recall = recall_score(y_true, y_pred_temp_binary, zero_division=0)
    f1 = f1_score(y_true, y_pred_temp_binary, zero_division=0)
    n_detected = (y_pred_temp == -1).sum()
    
    results.append({
        'Contamination': contam,
        'Detected': n_detected,
        'Precision': precision,
        'Recall': recall,
        'F1-Score': f1
    })

results_df = pd.DataFrame(results)
print("\n📊 Impact of Contamination Parameter:")
print(results_df.to_string(index=False))

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Metrics vs contamination
axes[0].plot(results_df['Contamination'], results_df['Precision'], 
            marker='o', linewidth=2, label='Precision', color='blue')
axes[0].plot(results_df['Contamination'], results_df['Recall'],
            marker='s', linewidth=2, label='Recall', color='green')
axes[0].plot(results_df['Contamination'], results_df['F1-Score'],
            marker='^', linewidth=2, label='F1-Score', color='red')
axes[0].axvline(x=contamination_rate, color='black', linestyle='--', 
               linewidth=2, label=f'True Rate ({contamination_rate:.2f})')
axes[0].set_xlabel('Contamination Parameter', fontsize=12)
axes[0].set_ylabel('Score', fontsize=12)
axes[0].set_title('Performance vs Contamination Parameter', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Number detected
axes[1].bar(results_df['Contamination'].astype(str), results_df['Detected'],
           color='skyblue', edgecolor='black', alpha=0.7)
axes[1].axhline(y=n_outliers, color='red', linestyle='--', linewidth=2,
               label=f'True Anomalies ({n_outliers})')
axes[1].set_xlabel('Contamination Parameter', fontsize=12)
axes[1].set_ylabel('Number of Anomalies Detected', fontsize=12)
axes[1].set_title('Detected Anomalies vs Contamination', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print(f"\n💡 Best F1-Score: {results_df.loc[results_df['F1-Score'].idxmax(), 'F1-Score']:.4f}")
print(f"💡 At contamination: {results_df.loc[results_df['F1-Score'].idxmax(), 'Contamination']}")
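
Because `contamination` only sets the threshold (`offset_`) and does not influence how the trees are grown, the sweep above does not strictly need one refit per value. With a fixed `random_state`, fitting once and taking the matching percentile of `score_samples` should reproduce the same labels. The sketch below checks this on regenerated placeholder data; it relies on scikit-learn computing `offset_` as that percentile of the training scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(scale=0.5, size=(300, 2)),
                    rng.uniform(-4, 4, size=(30, 2))])

# Fit once; the trees do not depend on the contamination setting
base = IsolationForest(n_estimators=100, random_state=42).fit(X_demo)
scores = base.score_samples(X_demo)

matches = {}
for contam in [0.01, 0.05, 0.1, 0.15, 0.2, 0.25]:
    threshold = np.percentile(scores, 100 * contam)   # same rule used for offset_
    flagged_fast = scores < threshold                  # anomaly = below threshold
    refit = IsolationForest(contamination=contam, n_estimators=100, random_state=42)
    flagged_refit = refit.fit_predict(X_demo) == -1
    matches[contam] = np.array_equal(flagged_fast, flagged_refit)
    print(f"contamination={contam:.2f}: flagged={flagged_fast.sum():3d}, "
          f"same as refit: {matches[contam]}")
```

On large datasets this turns a loop of full refits into one fit plus a few percentile calls.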

## Part 2: Real-World Application - Credit Card Fraud Detection
### 5️⃣ Simulate Credit Card Transactions

In [None]:
# Create synthetic credit card transaction data
np.random.seed(42)
n_transactions = 10000
n_fraud = 200  # ~2% of the 10,200 transactions (200 / 10,200 = 1.96%)

# Normal transactions
normal_amount = np.random.gamma(shape=2, scale=50, size=n_transactions)
normal_time = np.random.uniform(0, 24, size=n_transactions)
normal_merchant_cat = np.random.choice([1, 2, 3, 4, 5], size=n_transactions, 
                                      p=[0.3, 0.25, 0.2, 0.15, 0.1])
normal_location = np.random.normal(0, 1, size=n_transactions)

# Fraudulent transactions (different patterns)
fraud_amount = np.random.uniform(500, 2000, size=n_fraud)  # Large amounts
fraud_time = np.random.uniform(0, 6, size=n_fraud)  # Late night
fraud_merchant_cat = np.random.choice([4, 5], size=n_fraud)  # Unusual categories
fraud_location = np.random.normal(5, 2, size=n_fraud)  # Unusual locations

# Combine
df_transactions = pd.DataFrame({
    'Amount': np.concatenate([normal_amount, fraud_amount]),
    'Time_of_Day': np.concatenate([normal_time, fraud_time]),
    'Merchant_Category': np.concatenate([normal_merchant_cat, fraud_merchant_cat]),
    'Location_Score': np.concatenate([normal_location, fraud_location]),
    'Is_Fraud': np.concatenate([np.zeros(n_transactions), np.ones(n_fraud)])
})

# Shuffle
df_transactions = df_transactions.sample(frac=1, random_state=42).reset_index(drop=True)

print(f"Total transactions: {len(df_transactions)}")
print(f"Fraud rate: {df_transactions['Is_Fraud'].mean()*100:.2f}%")
print(f"\n📊 Transaction Statistics:")
print(df_transactions.groupby('Is_Fraud').describe().T)

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Amount distribution
axes[0, 0].hist(df_transactions[df_transactions['Is_Fraud']==0]['Amount'], 
               bins=50, alpha=0.6, label='Normal', color='blue', edgecolor='black')
axes[0, 0].hist(df_transactions[df_transactions['Is_Fraud']==1]['Amount'],
               bins=50, alpha=0.6, label='Fraud', color='red', edgecolor='black')
axes[0, 0].set_xlabel('Transaction Amount ($)', fontsize=11)
axes[0, 0].set_ylabel('Frequency', fontsize=11)
axes[0, 0].set_title('Transaction Amount Distribution', fontsize=12, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3, axis='y')

# Time distribution
axes[0, 1].hist(df_transactions[df_transactions['Is_Fraud']==0]['Time_of_Day'],
               bins=24, alpha=0.6, label='Normal', color='blue', edgecolor='black')
axes[0, 1].hist(df_transactions[df_transactions['Is_Fraud']==1]['Time_of_Day'],
               bins=24, alpha=0.6, label='Fraud', color='red', edgecolor='black')
axes[0, 1].set_xlabel('Time of Day (hour)', fontsize=11)
axes[0, 1].set_ylabel('Frequency', fontsize=11)
axes[0, 1].set_title('Transaction Time Distribution', fontsize=12, fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3, axis='y')

# Scatter: Amount vs Time
axes[1, 0].scatter(df_transactions[df_transactions['Is_Fraud']==0]['Amount'],
                  df_transactions[df_transactions['Is_Fraud']==0]['Time_of_Day'],
                  alpha=0.3, s=20, label='Normal', color='blue')
axes[1, 0].scatter(df_transactions[df_transactions['Is_Fraud']==1]['Amount'],
                  df_transactions[df_transactions['Is_Fraud']==1]['Time_of_Day'],
                  alpha=0.8, s=50, label='Fraud', color='red', marker='x', linewidths=2)
axes[1, 0].set_xlabel('Transaction Amount ($)', fontsize=11)
axes[1, 0].set_ylabel('Time of Day (hour)', fontsize=11)
axes[1, 0].set_title('Amount vs Time', fontsize=12, fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Fraud rate by merchant category
fraud_by_cat = df_transactions.groupby('Merchant_Category')['Is_Fraud'].mean()
axes[1, 1].bar(fraud_by_cat.index, fraud_by_cat.values * 100,
              color='coral', edgecolor='black', alpha=0.7)
axes[1, 1].set_xlabel('Merchant Category', fontsize=11)
axes[1, 1].set_ylabel('Fraud Rate (%)', fontsize=11)
axes[1, 1].set_title('Fraud Rate by Merchant Category', fontsize=12, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

### 6️⃣ Apply Isolation Forest for Fraud Detection

In [None]:
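# Prepare features
# Note: Merchant_Category is a category code, used here as if it were numeric
X_fraud = df_transactions[['Amount', 'Time_of_Day', 'Merchant_Category', 'Location_Score']].values
y_fraud_true = df_transactions['Is_Fraud'].values

# Scale features (harmless here; Isolation Forest splits are largely scale-invariant)
scaler = StandardScaler()
X_fraud_scaled = scaler.fit_transform(X_fraud)

# Train Isolation Forest
# contamination = true fraud rate, known here only because the data are synthetic
fraud_rate = y_fraud_true.mean()
iso_fraud = IsolationForest(
    contamination=fraud_rate,
    random_state=42,
    n_estimators=100
)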

# Predict
y_fraud_pred = iso_fraud.fit_predict(X_fraud_scaled)
y_fraud_pred_binary = (y_fraud_pred == -1).astype(int)

# Get fraud scores
fraud_scores = iso_fraud.score_samples(X_fraud_scaled)
df_transactions['Fraud_Score'] = fraud_scores
df_transactions['Predicted_Fraud'] = y_fraud_pred_binary

print(f"\n📊 Fraud Detection Results:")
print(f"Total transactions: {len(df_transactions)}")
print(f"True fraud cases: {y_fraud_true.sum():.0f}")
print(f"Detected fraud cases: {y_fraud_pred_binary.sum()}")
print(f"\n📋 Classification Report:")
print(classification_report(y_fraud_true, y_fraud_pred_binary, 
                          target_names=['Legitimate', 'Fraud']))

# Confusion matrix
cm_fraud = confusion_matrix(y_fraud_true, y_fraud_pred_binary)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Confusion matrix
sns.heatmap(cm_fraud, annot=True, fmt='d', cmap='Reds', ax=axes[0],
           xticklabels=['Legitimate', 'Fraud'],
           yticklabels=['Legitimate', 'Fraud'],
           cbar_kws={'label': 'Count'})
axes[0].set_xlabel('Predicted Label', fontsize=12)
axes[0].set_ylabel('True Label', fontsize=12)
axes[0].set_title('Confusion Matrix - Fraud Detection', fontsize=14, fontweight='bold')

# ROC Curve
# ROC analysis only needs a ranking, not probabilities: negate the scores
# so that higher values mean more suspicious
fraud_rank_score = -fraud_scores
fpr, tpr, thresholds = roc_curve(y_fraud_true, fraud_rank_score)
roc_auc = roc_auc_score(y_fraud_true, fraud_rank_score)

axes[1].plot(fpr, tpr, linewidth=2, label=f'Isolation Forest (AUC = {roc_auc:.3f})', color='red')
axes[1].plot([0, 1], [0, 1], 'k--', linewidth=2, label='Random Classifier')
axes[1].set_xlabel('False Positive Rate', fontsize=12)
axes[1].set_ylabel('True Positive Rate', fontsize=12)
axes[1].set_title('ROC Curve - Fraud Detection', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n🎯 ROC-AUC Score: {roc_auc:.4f}")

### 7️⃣ Analyze Detected Fraud Cases

In [None]:
# Top suspicious transactions
top_suspicious = df_transactions.nsmallest(20, 'Fraud_Score')

print("\n🚨 Top 20 Most Suspicious Transactions:")
print(top_suspicious[['Amount', 'Time_of_Day', 'Merchant_Category', 
                      'Location_Score', 'Fraud_Score', 'Is_Fraud', 'Predicted_Fraud']])

# Analyze false positives and false negatives
false_positives = df_transactions[(df_transactions['Is_Fraud'] == 0) & 
                                 (df_transactions['Predicted_Fraud'] == 1)]
false_negatives = df_transactions[(df_transactions['Is_Fraud'] == 1) & 
                                 (df_transactions['Predicted_Fraud'] == 0)]

print(f"\n❌ False Positives: {len(false_positives)}")
print(f"❌ False Negatives: {len(false_negatives)}")

if len(false_positives) > 0:
    print(f"\n📊 False Positive Characteristics:")
    print(false_positives[['Amount', 'Time_of_Day', 'Merchant_Category', 'Location_Score']].describe())

if len(false_negatives) > 0:
    print(f"\n📊 False Negative Characteristics:")
    print(false_negatives[['Amount', 'Time_of_Day', 'Merchant_Category', 'Location_Score']].describe())

# Visualize fraud score distribution
plt.figure(figsize=(14, 6))
plt.hist(df_transactions[df_transactions['Is_Fraud']==0]['Fraud_Score'],
        bins=50, alpha=0.6, label='Legitimate', color='blue', edgecolor='black')
plt.hist(df_transactions[df_transactions['Is_Fraud']==1]['Fraud_Score'],
        bins=50, alpha=0.6, label='Fraud', color='red', edgecolor='black')
plt.xlabel('Fraud Score (more negative = more suspicious)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.title('Distribution of Fraud Scores', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
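
Section 7 ranks transactions by `Fraud_Score`. When a review team can only inspect a fixed number of alerts k, a natural follow-up is the fraction of true fraud among the k most suspicious transactions (precision@k). A minimal sketch with a hypothetical helper `precision_at_k`, assuming lower scores are more suspicious as with `score_samples`; the toy scores and labels are made up.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of true positives among the k lowest (most anomalous) scores."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    top_k = np.argsort(scores)[:k]          # indices of the k most negative scores
    return labels[top_k].mean()

# Toy example: five transactions, two of them fraudulent
toy_scores = [-0.70, -0.40, -0.60, -0.30, -0.50]
toy_labels = [1, 0, 1, 0, 0]
for k in (1, 2, 3):
    print(f"precision@{k} = {precision_at_k(toy_scores, toy_labels, k):.3f}")
```

With the notebook's data, the equivalent call would be `precision_at_k(df_transactions['Fraud_Score'], df_transactions['Is_Fraud'], 20)`.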

## 📊 Key Takeaways

### Isolation Forest Advantages:
✅ **Fast and scalable** - Linear in the number of samples; each tree uses a small subsample  
✅ **No distance metrics** - Avoids the distance computations that degrade in high dimensions  
✅ **Few key parameters** - Mainly contamination, n_estimators, and max_samples  
✅ **Unsupervised** - No labels needed for training  
✅ **Intuitive scores** - A score reflects how quickly a point is isolated (average path length)  
✅ **Robust to contaminated training data** - Subsampling limits swamping and masking  

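### Isolation Forest Disadvantages:
❌ **Thresholding needs a contamination estimate** - Binary labels depend on the assumed outlier rate (or the default 'auto' threshold)  
❌ **Weaker on clustered anomalies** - Dense groups of anomalies can mask each other; works best for scattered outliers  
❌ **Threshold sensitive to contamination** - A wrong value trades precision against recall, although the score ranking (and ROC-AUC) is unaffected  
❌ **No probability estimates** - Only anomaly scores  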

### Parameter Guidelines:

**contamination:**
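- Expected proportion of outliers in dataset
- Default: 'auto' (fixed threshold from the original paper, offset_ = -0.5); older scikit-learn releases defaulted to 0.1
- Only moves the decision threshold; the trees and scores are unchanged
- Too low: Misses anomalies
- Too high: Too many false positives
- **Recommendation:** Use domain knowledge or a small labeled validation set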

**n_estimators:**
- Number of isolation trees
- Default: 100
- More trees: More stable, slower
- **Recommendation:** 100-200 for most cases

**max_samples:**
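- Number of samples drawn to build each tree
- Default: 'auto' (min(256, n_samples))
- Smaller: Faster; the original paper found small subsamples (e.g. 256) usually sufficient, partly because they reduce swamping and masking
- **Recommendation:** Keep default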
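
A quick way to confirm these defaults in your installed scikit-learn version is to inspect `get_params()` and the fitted attributes. With `contamination='auto'`, `offset_` is fixed at -0.5, and `max_samples_` holds the subsample size actually used. The data below are random placeholders.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X_demo = np.random.default_rng(1).normal(size=(1000, 3))   # placeholder data

iso = IsolationForest(random_state=0).fit(X_demo)
params = iso.get_params()
print("contamination:", params["contamination"])
print("n_estimators:", params["n_estimators"])
print("max_samples:", params["max_samples"], "->", iso.max_samples_)
print("offset_:", iso.offset_)
```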

### When to Use Isolation Forest:
✅ Fraud detection (credit cards, insurance)  
✅ Network intrusion detection  
✅ Manufacturing defect detection  
✅ Medical anomaly detection  
✅ Outlier removal before modeling  
✅ High-dimensional data  
✅ Large datasets (scalable)  

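### When NOT to Use:
❌ Anomalies form dense clusters (consider cluster-based approaches such as DBSCAN and inspect small clusters)  
❌ Need calibrated probabilities (One-Class SVM and LOF do not provide them either; calibrate scores on labeled data)  
❌ Very small datasets (roughly < 100 samples)  
❌ All data points are important (no outliers)  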

### Real-World Applications:
1. **Credit Card Fraud** - Detect unusual transactions
2. **Network Security** - Identify intrusions
3. **Manufacturing** - Find defective products
4. **Healthcare** - Detect unusual patient vitals
5. **E-commerce** - Identify fake reviews
6. **IoT** - Detect sensor anomalies

### Best Practices:

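1. **Scaling is optional** - Splits are drawn between each feature's min and max, so per-feature rescaling barely changes the trees; scale anyway if you will compare against distance-based detectors
   ```python
   from sklearn.preprocessing import StandardScaler

   scaler = StandardScaler()
   X_scaled = scaler.fit_transform(X)
   ```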

2. **Tune contamination** - Use domain knowledge or validation
   ```python
   from sklearn.ensemble import IsolationForest

   # If you know ~2% are anomalies
   iso = IsolationForest(contamination=0.02)
   ```

3. **Use anomaly scores** - Don't rely only on binary predictions
   ```python
   iso.fit(X)
   scores = iso.score_samples(X)
   # More negative = more anomalous
   ```

4. **Validate results** - Check detected anomalies make sense

5. **Combine with other methods** - e.g. average rank-normalized scores from several detectors
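
Whether scaling matters for Isolation Forest can be checked empirically. Each split threshold is drawn uniformly between the node's minimum and maximum for the chosen feature, so an affine per-feature rescaling maps every split onto an equivalent one. With the same `random_state`, raw and standardized inputs should therefore give (almost) identical scores; small differences can come from floating-point rounding. A sketch on random placeholder data with features on very different scales:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X_raw = rng.normal(size=(500, 3)) * np.array([1.0, 100.0, 1000.0])  # very different scales
X_std = StandardScaler().fit_transform(X_raw)

# Same random_state => same feature choices and same relative split positions
raw_scores = IsolationForest(random_state=0).fit(X_raw).score_samples(X_raw)
std_scores = IsolationForest(random_state=0).fit(X_std).score_samples(X_std)

corr = np.corrcoef(raw_scores, std_scores)[0, 1]
print(f"correlation of scores: {corr:.4f}")
print(f"max absolute difference: {np.abs(raw_scores - std_scores).max():.2e}")
```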

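One simple way to combine detectors, sketched here as an assumption rather than a prescribed recipe, is to convert each detector's scores to ranks and average them, so that detectors with different score scales contribute equally. The example pairs Isolation Forest with scikit-learn's `LocalOutlierFactor`, whose fitted `negative_outlier_factor_` follows the same "lower = more anomalous" convention; the data and `n_neighbors=20` are placeholders.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)
X_demo = np.vstack([rng.normal(size=(300, 2)), [[8.0, 8.0], [-7.0, 9.0]]])  # 2 planted outliers

iso_scores = IsolationForest(random_state=0).fit(X_demo).score_samples(X_demo)
lof = LocalOutlierFactor(n_neighbors=20).fit(X_demo)
lof_scores = lof.negative_outlier_factor_          # lower = more anomalous, like score_samples

# Rank each score vector (rank 1 = most anomalous) and average the ranks
ensemble_rank = (rankdata(iso_scores) + rankdata(lof_scores)) / 2
most_anomalous = np.argsort(ensemble_rank)[:2]
print("indices flagged by the rank-averaged ensemble:", sorted(most_anomalous.tolist()))
```

Rank averaging ignores how far apart scores are, which makes it robust to differently scaled detectors but also discards some information; a weighted average or taking the minimum rank are common variations.
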
### Comparison with Other Methods (rough, qualitative):

| Method | Speed | High-D | Scalability | Interpretability |
|--------|-------|--------|-------------|------------------|
| **Isolation Forest** | Fast | Good | Excellent | Medium |
| **One-Class SVM** | Slow | Poor | Poor | Low |
| **LOF** | Slow | Poor | Poor | Medium |
| **DBSCAN** | Medium | Poor | Medium | High |
| **Autoencoder** | Medium | Excellent | Good | Low |

### Isolation Forest Workflow:
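```python
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# X: feature matrix of shape (n_samples, n_features), assumed to be a NumPy array

# 1. Prepare data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 2. Estimate contamination
contamination = 0.05  # 5% outliers expected

# 3. Train model
iso = IsolationForest(
    contamination=contamination,
    n_estimators=100,
    random_state=42
)
predictions = iso.fit_predict(X_scaled)  # 1 = normal, -1 = anomaly

# 4. Get anomaly scores
scores = iso.score_samples(X_scaled)  # lower = more anomalous

# 5. Analyze results
anomalies = X[predictions == -1]
```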

### Performance Metrics:
- **Precision**: Of detected anomalies, how many are true?
- **Recall**: Of true anomalies, how many detected?
- **F1-Score**: Balance between precision and recall
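- **ROC-AUC**: Threshold-independent; the probability that a randomly chosen anomaly is ranked as more anomalous than a randomly chosen normal point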
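
To make these definitions concrete, here is a tiny hand-checkable example with made-up labels and scores: with two true anomalies, one caught and one missed, plus one false alarm, precision, recall, and F1 all equal 0.5. For ROC-AUC the scores must be oriented so that higher means more anomalous, hence the sign flip when using `score_samples`-style values.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1])            # 1 = anomaly
y_pred = np.array([0, 1, 0, 1, 0])            # one hit, one false alarm, one miss

# TP = 1, FP = 1, FN = 1
precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 1 / 2
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 1 / 2
f1 = f1_score(y_true, y_pred)                 # harmonic mean of 0.5 and 0.5 = 0.5

# score_samples-style scores: lower = more anomalous, so negate them for ROC-AUC
scores = np.array([-0.1, -0.4, -0.2, -0.8, -0.3])
auc = roc_auc_score(y_true, -scores)          # 5 of 6 (anomaly, normal) pairs ranked correctly

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} roc_auc={auc:.3f}")
```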

### Next Steps:
1. Apply to your own anomaly detection problem
2. Try ensemble with other anomaly detectors
3. Experiment with different contamination values
4. Compare with autoencoders for complex data
5. Deploy for real-time fraud detection