# Anomaly Detection: Network Intrusion Detection using PyCaret

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BalaAnbalagan/pycaret-automl-examples/blob/main/anomaly-detection/network_intrusion_detection.ipynb)

## Problem Statement

Cybersecurity threats are constantly evolving. Network intrusion detection systems (NIDS) must identify anomalous network traffic that could indicate attacks, malware, or unauthorized access. This is an **unsupervised anomaly detection** problem: we don't have labels for all attack types, but we still need to flag unusual behavior.

## Business Value

- **Cybersecurity**: Detect zero-day attacks and unknown threats
- **Network Operations**: Identify performance anomalies
- **Compliance**: Meet security monitoring requirements
- **Incident Response**: Early warning system for breaches
- **Cost Savings**: Prevent data breaches and downtime

## Dataset Information

**Source**: [Kaggle - Network Intrusion Detection (2024)](https://www.kaggle.com/datasets/bcccdatasets/network-intrusion-detection)

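**Original**: Based on the CIC-IDS-2017 dataset. This notebook does not download it; Cell 2 generates a small synthetic dataset with comparable flow features.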

**Features**: Network traffic characteristics including:
- Flow duration
- Packet counts and sizes
- Bytes per second
- Flag counts (SYN, ACK, FIN, etc.)
- Protocol information

**Task**: Identify anomalous network flows (potential attacks)

## What You Will Learn

1. **Anomaly Detection**: Finding outliers in data
2. **Unsupervised Learning**: No labeled attacks needed
3. **Multiple Algorithms**: Isolation Forest, LOF, One-Class SVM
4. **Anomaly Scoring**: Ranking suspicious activity
5. **Threshold Selection**: Balancing false positives vs false negatives
6. **Cybersecurity Application**: Real-world threat detection
7. **Model Deployment**: Production intrusion detection

---

## Cell 1: Install and Import Required Libraries (Google Colab Compatible)

### What
We're installing PyCaret with compatible dependencies for Google Colab and importing all necessary Python libraries for our analysis.

### Why
Google Colab comes with pre-installed packages that can conflict with PyCaret's dependencies. This cell ensures compatibility by installing packages in the correct order to avoid runtime crashes.

### Technical Details
- Detect if running in Google Colab
- Install compatible versions of base packages (numpy, pandas, scipy, scikit-learn)
- Install PyCaret without forcing full dependency resolution
- Avoid version conflicts that cause runtime crashes

### Expected Output
Installation progress messages and a reminder to restart the runtime. After restart, the notebook will work smoothly without dependency errors.

### IMPORTANT
⚠️ After running this cell, you MUST restart the runtime:
- Click: **Runtime → Restart runtime** (or Ctrl+M .)
- After restart, skip this cell and run all other cells normally

In [None]:
# ============================================================
# INSTALLATION CELL - Google Colab Compatible
# ============================================================
# This cell fixes dependency conflicts that cause runtime crashes

import sys

# Check if running in Colab
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("=" * 60)
    print("🔧 Google Colab Detected")
    print("=" * 60)
    print("📦 Installing PyCaret with compatible dependencies...")
    print("⏳ This will take 2-3 minutes, please be patient...")

    # Upgrade pip first
    !pip install -q --upgrade pip

    # Install compatible base packages FIRST (prevents conflicts)
    # Version specifiers are quoted so the shell does not treat '>' and '<' as redirections
    print("Step 1/3: Installing base packages with compatible versions...")
    !pip install -q --upgrade \
        "numpy>=1.23.0,<2.0.0" \
        "pandas>=2.0.0,<2.3.0" \
        "scipy>=1.10.0,<1.14.0" \
        "scikit-learn>=1.3.0,<1.6.0" \
        "matplotlib>=3.7.0,<3.9.0"

    # Install PyCaret (will use already installed base packages)
    print("Step 2/3: Installing PyCaret...")
    !pip install -q pycaret

    # Install additional ML packages
    print("Step 3/3: Installing additional ML packages...")
    !pip install -q \
        category-encoders \
        lightgbm \
        xgboost \
        catboost \
        optuna \
        plotly \
        kaleido

    print("\n" + "=" * 60)
    print("✅ Installation Complete!")
    print("=" * 60)
    print("⚠️  CRITICAL: You MUST restart the runtime now!")
    print("   👉 Click: Runtime → Restart runtime (or Ctrl+M .)")
    print("🔄 After restart:")
    print("   1. Skip this installation cell")
    print("   2. Run all other cells normally")
    print("   3. Everything will work without crashes!")
    print("=" * 60)

else:
    print("=" * 60)
    print("📍 Local Environment Detected")
    print("=" * 60)
    print("Installing standard PyCaret with full dependencies...")
    !pip install "pycaret[full]"
    print("✅ Installation complete!")
    print("=" * 60)

# Import libraries after installation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("📚 Libraries imported successfully!")
print(f"   - Pandas version: {pd.__version__}")
print(f"   - NumPy version: {np.__version__}")
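
After the restart, a quick look at the installed versions can confirm that the pins took effect before the rest of the notebook runs. The sketch below uses only the standard library; the helper name `installed_version` is ours for illustration and is not part of PyCaret.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this notebook depends on
for name in ["pycaret", "numpy", "pandas", "scipy", "scikit-learn", "matplotlib"]:
    print(f"{name:>14}: {installed_version(name) or 'not installed'}")
```

If `pycaret` shows as not installed, the installation cell still needs to be run.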

---

## Cell 2: Load Network Traffic Dataset

### What
Generating a synthetic network traffic dataset for anomaly detection.

### Why
For demonstration, we generate a small synthetic dataset rather than loading captured traffic. In production, the data would come from:
- Real-time network packet capture
- Feature extraction from flow data
- Continuous monitoring

### Technical Details
Network traffic features capture flow characteristics that can reveal attacks.

### Expected Output
Dataset loaded with network flow features.

In [None]:
# For this demo, we'll create a synthetic network traffic dataset
# In production, use actual network flow data

np.random.seed(42)

# Generate normal network traffic (90%)
n_normal = 900
normal_data = {
    'flow_duration': np.random.normal(120000, 30000, n_normal),
    'total_fwd_packets': np.random.poisson(50, n_normal),
    'total_bwd_packets': np.random.poisson(45, n_normal),
    'flow_bytes_per_sec': np.random.normal(5000, 1000, n_normal),
    'flow_packets_per_sec': np.random.normal(100, 20, n_normal),
    'fwd_header_length': np.random.normal(200, 50, n_normal),
    'bwd_header_length': np.random.normal(180, 40, n_normal)
}

# Generate anomalous traffic (10% - attacks, scans, etc.)
n_anomaly = 100
anomaly_data = {
    'flow_duration': np.random.normal(5000, 2000, n_anomaly),  # Very short
    'total_fwd_packets': np.random.poisson(200, n_anomaly),  # Unusually high
    'total_bwd_packets': np.random.poisson(5, n_anomaly),  # Unusually low
    'flow_bytes_per_sec': np.random.normal(50000, 10000, n_anomaly),  # Very high
    'flow_packets_per_sec': np.random.normal(500, 100, n_anomaly),  # Very high
    'fwd_header_length': np.random.normal(400, 100, n_anomaly),  # High
    'bwd_header_length': np.random.normal(50, 20, n_anomaly)  # Low
}

# Combine and create DataFrame
df_normal = pd.DataFrame(normal_data)
df_anomaly = pd.DataFrame(anomaly_data)

# Add true labels BEFORE shuffling so each label stays with its row
# (for evaluation only - not used in training)
df_normal['true_anomaly'] = 0
df_anomaly['true_anomaly'] = 1
df = pd.concat([df_normal, df_anomaly], ignore_index=True)

# Shuffle rows (features and labels together)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

print("Network traffic dataset created!")
print(f"\nShape: {df.shape[0]} network flows, {df.shape[1]-1} features")
print(f"\nTrue distribution (for evaluation):")
print(f"- Normal traffic: {(df['true_anomaly']==0).sum()}")
print(f"- Anomalous traffic: {(df['true_anomaly']==1).sum()}")
print("\nFirst 5 rows:")
df.head()

---

## Cell 3: Exploratory Data Analysis

### What
Exploring network traffic patterns to understand normal vs anomalous behavior.

### Why
Understanding data helps:
- Identify features that distinguish anomalies
- Set realistic expectations
- Guide algorithm selection

### Technical Details
We'll visualize distributions to see if anomalies are visually distinct.

### Expected Output
Statistical summary and visualizations.

In [None]:
print("=" * 60)
print("NETWORK TRAFFIC ANALYSIS")
print("=" * 60)

# Remove true_anomaly for unsupervised analysis
features_df = df.drop('true_anomaly', axis=1)

print("\nStatistical Summary:")
display(features_df.describe())

# Visualize distributions
fig, axes = plt.subplots(2, 4, figsize=(18, 10))
axes = axes.ravel()

for idx, col in enumerate(features_df.columns):
    axes[idx].hist(features_df[col], bins=50, alpha=0.7, edgecolor='black')
    axes[idx].set_title(col, fontsize=10, fontweight='bold')
    axes[idx].set_xlabel('Value')
    axes[idx].set_ylabel('Frequency')
    axes[idx].grid(alpha=0.3)

# Remove extra subplot
fig.delaxes(axes[7])

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("- Most traffic follows normal patterns (main distribution)")
print("- Some outliers visible (potential anomalies)")
print("- Different features show different spread")
print("- Anomaly detection will flag unusual combinations")
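
The histograms show outliers only by eye. A per-feature count gives a rough number to go with them. The sketch below applies the common 1.5 x IQR rule to made-up data; the helper `iqr_outlier_counts` and the toy columns are assumptions for illustration, and the rule only looks at one feature at a time, so it will not catch unusual combinations the way the detectors later in the notebook can.

```python
import numpy as np
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values per column that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    outside = (frame < (q1 - k * iqr)) | (frame > (q3 + k * iqr))
    return outside.sum()


# Toy data: 95 "normal" values plus 5 extreme ones in the first column
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "flow_bytes_per_sec": np.concatenate([rng.normal(5000, 1000, 95),
                                          rng.normal(50000, 10000, 5)]),
    "flow_packets_per_sec": rng.normal(100, 20, 100),
})
print(iqr_outlier_counts(toy))
```

In the notebook, the same function could be called on `features_df`.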

---

## Cell 4: PyCaret Setup for Anomaly Detection

### What
Initializing PyCaret's anomaly detection environment.

### Why
Anomaly detection setup prepares data for:
- Outlier identification
- Anomaly scoring
- Threshold-based flagging

### Technical Details
Like clustering, anomaly detection is unsupervised:
- No target variable
- No train/test split (use all data)
- Normalization important

### Expected Output
Setup summary for anomaly detection.

In [None]:
from pycaret.anomaly import *

print("=" * 60)
print("PYCARET SETUP - ANOMALY DETECTION")
print("=" * 60)
print("\nConfiguring unsupervised anomaly detection...\n")

# Setup (exclude true_anomaly from training)
# Note: PyCaret's seed parameter is session_id
anomaly_setup = setup(
    data=features_df,
    normalize=True,
    session_id=42,
    verbose=True
)

print("\n" + "=" * 60)
print("✓ Anomaly detection setup complete!")
print("=" * 60)
print("\nKey Points:")
print("- UNSUPERVISED: No labels used in training")
print("- GOAL: Flag unusual network flows")
print("- ASSUMPTION: Most traffic is normal, anomalies are rare")
print("\nReady to detect network intrusions!")
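
Why normalization matters is easier to see with numbers. To our knowledge, `normalize=True` applies z-score scaling by default; the sketch below reproduces that transformation with plain NumPy on made-up values, so it illustrates the idea and does not show PyCaret's internals.

```python
import numpy as np

# Two made-up features on very different scales
flow_duration = np.array([120000.0, 90000.0, 150000.0, 5000.0])
packets_per_sec = np.array([100.0, 80.0, 120.0, 500.0])
X = np.column_stack([flow_duration, packets_per_sec])

# z-score: subtract the column mean, divide by the column standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print("Raw column std:   ", X.std(axis=0))
print("Scaled column std:", X_scaled.std(axis=0))
```

Without scaling, distance-based detectors such as LOF would be dominated by `flow_duration` simply because its values are numerically larger.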

---

## Cell 5: Create Anomaly Detection Models

### What
Creating multiple anomaly detection models with different algorithms.

### Why
Different algorithms detect different types of anomalies:

**Isolation Forest**:
- Fast and scalable
- Isolates anomalies using random trees
- Good for high-dimensional data

**LOF (Local Outlier Factor)**:
- Density-based detection
- Compares local density to neighbors
- Good for varying density clusters

**One-Class SVM**:
- Learns boundary around normal data
- Flags points outside boundary
- Good for well-defined normal region

### Technical Details
Each algorithm assigns anomaly scores and labels.

### Expected Output
Models created with anomaly detection results.

In [None]:
print("=" * 60)
print("CREATING ANOMALY DETECTION MODELS")
print("=" * 60)

# Isolation Forest
print("\n1. Isolation Forest")
print("   - Fast, scalable algorithm")
print("   - Isolates anomalies using random trees")
iforest = create_model('iforest', fraction=0.1)

# LOF (Local Outlier Factor)
print("\n2. Local Outlier Factor (LOF)")
print("   - Density-based detection")
print("   - Compares local density to neighbors")
lof = create_model('lof', fraction=0.1)

# One-Class SVM
print("\n3. One-Class SVM")
print("   - Learns boundary around normal data")
print("   - Flags outliers outside boundary")
ocsvm = create_model('svm', fraction=0.1)

print("\n" + "=" * 60)
print("All models created!")
print("=" * 60)
print("\nNote: 'fraction=0.1' means expect ~10% anomalies")
print("Adjust based on your network's typical anomaly rate")
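
To see the three ideas outside PyCaret, the sketch below uses scikit-learn's own estimators as stand-ins on toy data. These are not the exact objects PyCaret builds (its anomaly module wraps PyOD), and the data and parameter values are assumptions for illustration. `contamination` plays a role similar to `fraction`; for the One-Class SVM, `nu` is an upper bound on the share of training points treated as errors.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Toy data: 180 normal points and 20 far-away points in 2 dimensions
X = np.vstack([rng.normal(0, 1, size=(180, 2)),
               rng.normal(8, 1, size=(20, 2))])
X = StandardScaler().fit_transform(X)

detectors = {
    "iforest": IsolationForest(contamination=0.1, random_state=42),
    "lof": LocalOutlierFactor(n_neighbors=20, contamination=0.1),
    "ocsvm": OneClassSVM(nu=0.1, gamma="scale"),
}

flags = {}
for name, det in detectors.items():
    # scikit-learn returns -1 for outliers and 1 for inliers; map to 1/0
    flags[name] = (det.fit_predict(X) == -1).astype(int)
    print(f"{name:>8}: flagged {flags[name].sum()} of {len(X)} points")
```

The detectors flag a similar number of points but not necessarily the same ones, which is why it is worth comparing them on your own traffic.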

---

## Cell 6: Assign Anomaly Labels and Scores

### What
Assigning anomaly labels (0=normal, 1=anomaly) and scores to each network flow.

### Why
Anomaly detection output includes:
- **Binary label** (`Anomaly`): Normal (0) or Anomaly (1)
- **Anomaly score** (`Anomaly_Score`): Continuous score taken from the detector's decision function (higher = more anomalous, i.e. further from the normal region)

### Technical Details
Scores allow ranking:
- Investigate highest-scoring flows first
- Adjust thresholds based on resources
- Create priority alerts

### Expected Output
Dataset with anomaly predictions and scores.

In [None]:
print("=" * 60)
print("ASSIGNING ANOMALY LABELS AND SCORES")
print("=" * 60)

# Assign predictions (using Isolation Forest)
predictions = assign_model(iforest)

# Add true labels for evaluation
predictions['true_anomaly'] = df['true_anomaly'].values

print("\nPredictions completed!")
print(f"\nColumns added:")
print("- Anomaly: Binary label (0=Normal, 1=Anomaly)")
print("- Anomaly_Score: Continuous score (higher = more suspicious)")

print("\n" + "=" * 60)
print("DETECTION RESULTS")
print("=" * 60)
print(f"\nTotal flows analyzed: {len(predictions)}")
print(f"Flagged as anomalies: {(predictions['Anomaly']==1).sum()}")
print(f"Marked as normal: {(predictions['Anomaly']==0).sum()}")

print("\nSample predictions (sorted by anomaly score):")
display(predictions[['flow_duration', 'total_fwd_packets', 'Anomaly', 'Anomaly_Score', 'true_anomaly']]
        .sort_values('Anomaly_Score', ascending=False).head(10))
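
The link between a continuous score, a threshold, and the binary label can be sketched directly. The example below uses scikit-learn's `IsolationForest` on toy data as a stand-in for the PyCaret model; the data, the `fraction` value, and the quantile-based cut-off are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(450, 3)),    # normal flows
               rng.normal(6, 1, size=(50, 3))])    # anomalous flows

model = IsolationForest(random_state=0).fit(X)

# score_samples is "higher = more normal"; negate so higher = more suspicious
scores = -model.score_samples(X)

fraction = 0.1
threshold = np.quantile(scores, 1 - fraction)      # cut off the top 10% of scores
labels = (scores > threshold).astype(int)

# Review queue: indices of flows ordered from most to least suspicious
review_order = np.argsort(scores)[::-1]
print(f"threshold={threshold:.3f}, flagged={labels.sum()}")
print("first five to review:", review_order[:5])
```

Raising `fraction` lowers the threshold and lengthens the review queue; lowering it does the opposite.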

---

## Cell 7: Evaluate Detection Performance

### What
Evaluating how well our model detects true anomalies (since we have ground truth for this demo).

### Why
In production, we don't have labels, but for this demo:
- Check if model catches real attacks
- Understand false positive rate
- Validate approach before deployment

### Technical Details
Key metrics:
- **Precision**: Of flagged flows, how many are truly anomalous?
- **Recall**: Of true anomalies, how many did we catch?
- **F1-Score**: Balance between precision and recall

### Expected Output
Confusion matrix and performance metrics.

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print("=" * 60)
print("ANOMALY DETECTION PERFORMANCE")
print("=" * 60)
print("\nNote: In production, we don't have true labels.")
print("This evaluation is possible because we created synthetic data.\n")

# Classification report
print(classification_report(predictions['true_anomaly'], predictions['Anomaly'],
                          target_names=['Normal', 'Anomaly']))

# Confusion matrix
cm = confusion_matrix(predictions['true_anomaly'], predictions['Anomaly'])

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Normal', 'Anomaly'],
            yticklabels=['Normal', 'Anomaly'])
plt.title('Confusion Matrix - Anomaly Detection', fontsize=14, fontweight='bold')
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.tight_layout()
plt.show()

print("\n" + "=" * 60)
print("CONFUSION MATRIX BREAKDOWN")
print("=" * 60)
tn, fp, fn, tp = cm.ravel()
print(f"\nTrue Negatives (TN):  {tn} - Correctly identified normal traffic")
print(f"False Positives (FP): {fp} - False alarms (normal flagged as anomaly)")
print(f"False Negatives (FN): {fn} - Missed attacks (anomaly marked as normal) ⚠️")
print(f"True Positives (TP):  {tp} - Correctly caught anomalies ✓")

print("\n" + "=" * 60)
print("CYBERSECURITY PERSPECTIVE")
print("=" * 60)
print("\n- False Positives (FP): Alert fatigue, wasted investigation time")
print("- False Negatives (FN): Missed threats - MOST CRITICAL!")
print("\nTrade-off: Lower threshold = More FP but fewer FN (catch more attacks)")

---

## Cell 8: Visualize Anomalies

### What
Visualizing detected anomalies in a low-dimensional space using dimensionality reduction.

### Why
Helps understand:
- Where anomalies lie in feature space
- If they form patterns
- Model behavior

### Technical Details
PCA or t-SNE reduces dimensions for visualization. For anomaly models, PyCaret's `plot='tsne'` option draws an interactive 3D t-SNE embedding.

### Expected Output
Interactive t-SNE plot showing normal vs anomalous flows.

In [None]:
print("=" * 60)
print("VISUALIZING ANOMALIES WITH t-SNE")
print("=" * 60)

# PyCaret's built-in visualization (interactive 3D t-SNE embedding)
plot_model(iforest, plot='tsne')

print("\n" + "=" * 60)
print("INTERPRETING THE VISUALIZATION")
print("=" * 60)
print("\n- Blue points: Normal network traffic")
print("- Red/Yellow points: Detected anomalies")
print("- t-SNE embeds the 7 features into 3 dimensions for plotting")
print("\nGood detection shows:")
print("- Anomalies at edges/outside main cluster")
print("- Clear separation from normal traffic")
print("- Some clustering of similar attack types")
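
For a flat, deterministic view, a PCA projection onto two components is one alternative to the t-SNE plot. The sketch below implements it with NumPy's SVD on toy data; the helper `pca_2d` is ours for illustration. In practice the features would be standardized first, since PCA is sensitive to scale.

```python
import numpy as np


def pca_2d(X: np.ndarray):
    """Project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                       # center each feature
    # Rows of Vt are the principal directions, ordered by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                        # shape (n_samples, 2)
    explained = (s[:2] ** 2) / np.sum(s ** 2)     # variance share per component
    return coords, explained


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(90, 7)),    # 7 features, as in this notebook
               rng.normal(5, 1, size=(10, 7))])
coords, explained = pca_2d(X)
print(coords.shape, explained.round(3))
```

The two columns of `coords` can be passed to `plt.scatter`, colored by the `Anomaly` label, to get a 2D picture.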

---

## Cell 9: Analyze Top Anomalies

### What
Examining the most suspicious network flows for investigation.

### Why
In production:
- Security teams investigate top-scoring flows first
- Limited resources require prioritization
- Understanding patterns helps create rules

### Technical Details
Sort by anomaly score to get most suspicious flows.

### Expected Output
List of highest-scoring anomalies with characteristics.

In [None]:
print("=" * 60)
print("TOP 10 MOST SUSPICIOUS NETWORK FLOWS")
print("=" * 60)

# Get top anomalies
top_anomalies = predictions[predictions['Anomaly']==1].sort_values('Anomaly_Score', ascending=False).head(10)

print("\nThese flows should be investigated first:\n")
display(top_anomalies[['flow_duration', 'total_fwd_packets', 'total_bwd_packets',
                       'flow_bytes_per_sec', 'Anomaly_Score', 'true_anomaly']])

print("\n" + "=" * 60)
print("CHARACTERISTICS OF DETECTED ANOMALIES")
print("=" * 60)

anomalies_only = predictions[predictions['Anomaly']==1].drop(['Anomaly', 'Anomaly_Score', 'true_anomaly'], axis=1)
normal_only = predictions[predictions['Anomaly']==0].drop(['Anomaly', 'Anomaly_Score', 'true_anomaly'], axis=1)

comparison = pd.DataFrame({
    'Feature': anomalies_only.columns,
    'Normal_Mean': normal_only.mean().values,
    'Anomaly_Mean': anomalies_only.mean().values
})
comparison['Difference_%'] = ((comparison['Anomaly_Mean'] - comparison['Normal_Mean']) / 
                               comparison['Normal_Mean'] * 100).round(1)

print("\nAverage values: Normal vs Anomalous traffic")
display(comparison)

print("\nKey patterns in anomalies:")
for idx, row in comparison.iterrows():
    if abs(row['Difference_%']) > 50:
        direction = "higher" if row['Difference_%'] > 0 else "lower"
        print(f"- {row['Feature']}: {abs(row['Difference_%']):.1f}% {direction} than normal")

---

## Cell 10: Save Anomaly Detection Model

### What
Saving the trained anomaly detection model for deployment.

### Why
Production deployment:
- Real-time network monitoring
- Continuous threat detection
- Automated alerting
- Integration with SIEM systems

### Technical Details
Model can score new flows in real-time.

### Expected Output
Saved model ready for production use.

In [None]:
print("=" * 60)
print("SAVING ANOMALY DETECTION MODEL")
print("=" * 60)

model_name = 'network_intrusion_detector'
save_model(iforest, model_name)

print(f"\n✓ Model saved as '{model_name}.pkl'")

print("\n" + "=" * 60)
print("DEPLOYMENT ARCHITECTURE")
print("=" * 60)
print("\n1. Network Traffic Capture")
print("   ↓ Extract flow features")
print("2. Feature Engineering")
print("   ↓ Normalize, transform")
print("3. Anomaly Detection Model")
print("   ↓ Score each flow")
print("4. Alert System")
print("   ↓ High scores trigger alerts")
print("5. Security Team Investigation")

print("\n" + "=" * 60)
print("PRODUCTION CONSIDERATIONS")
print("=" * 60)
print("\n- Set appropriate anomaly threshold based on team capacity")
print("- Implement alert prioritization (score-based)")
print("- Regular model retraining with new normal traffic")
print("- Feedback loop: Confirmed attacks improve model")
print("- Integration with firewall for automatic blocking")

print("\n" + "=" * 60)
print("TO USE THE MODEL")
print("=" * 60)
print("\n```python")
print("from pycaret.anomaly import load_model, predict_model")
print(f"model = load_model('{model_name}')")
print("predictions = predict_model(model, data=new_traffic)")
print("suspicious = predictions[predictions['Anomaly']==1]")
print("```")

---

## Conclusions and Key Takeaways

### What We Accomplished

1. **Anomaly Detection**: Identified unusual network traffic patterns
2. **Unsupervised Learning**: No labeled attacks needed for training
3. **Multiple Algorithms**: Created Isolation Forest, LOF, and One-Class SVM models, then evaluated Isolation Forest
4. **Anomaly Scoring**: Ranked suspicious flows for investigation
5. **Saved Model**: Stored the Isolation Forest model so it can score new flows

### Key Learnings

#### Anomaly Detection vs Other ML Tasks

| Aspect | Classification | Clustering | Anomaly Detection |
|--------|---------------|-----------|------------------|
| Goal | Predict labels | Find groups | Find outliers |
| Labels | Required | None | None |
| Output | Class | Cluster ID | Anomaly score |
| Assumption | Representative labeled data | Natural groups | Most data is normal |
| Use Case | Diagnosis | Segmentation | Fraud, intrusion |

#### Technical Skills
- **Isolation Forest**: Fast, scalable outlier detection
- **LOF**: Density-based anomaly detection
- **One-Class SVM**: Boundary-based detection
- **Anomaly Scoring**: Continuous scores for prioritization
- **Threshold Selection**: Balancing false positives vs false negatives

#### Cybersecurity Applications
- **Network Intrusion Detection**: Flag suspicious traffic
- **Zero-Day Attacks**: Detect unknown threats
- **Insider Threats**: Unusual user behavior
- **DDoS Detection**: Abnormal traffic patterns
- **Malware Detection**: Anomalous system behavior

### Anomaly Detection Algorithms

**Isolation Forest**:
- **How**: Isolates anomalies using random trees
- **Strengths**: Fast, scalable, handles high dimensions
- **Best for**: Large datasets, real-time detection

**Local Outlier Factor (LOF)**:
- **How**: Compares local density to neighbors
- **Strengths**: Good for varying density regions
- **Best for**: Complex data with multiple normal patterns

**One-Class SVM**:
- **How**: Learns boundary around normal data
- **Strengths**: Effective for well-defined normal region
- **Best for**: High-dimensional data with clear normal zone

### Business Value

1. **Security**:
   - Early threat detection
   - Reduced dwell time
   - Proactive defense

2. **Cost Savings**:
   - Prevent data breaches ($millions)
   - Reduce manual monitoring
   - Minimize downtime

3. **Compliance**:
   - Meet security monitoring requirements
   - Audit trail of threats
   - Demonstrate due diligence

### Challenges and Considerations

1. **False Positives**:
   - Alert fatigue if too many
   - Wasted investigation time
   - Need to tune threshold

2. **False Negatives**:
   - Missed attacks are costly
   - More dangerous than false positives
   - Balance with other security layers

3. **Concept Drift**:
   - Normal traffic patterns change
   - New attack types emerge
   - Regular retraining needed
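
One way to notice concept drift is to compare a feature's recent distribution with its training-time baseline. The sketch below uses the population stability index (PSI), which is our choice for illustration and not something the notebook prescribes; the function name, bin count, and data are assumptions. A commonly quoted rule of thumb reads values below about 0.1 as stable and values above about 0.25 as a significant shift, but these cut-offs are conventions, not guarantees.

```python
import numpy as np


def population_stability_index(baseline, current, bins: int = 10) -> float:
    """PSI between two 1-D samples, using quantile bins taken from `baseline`."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    # Use only the inner edges so values beyond the baseline range
    # fall into the first or last bin
    inner = edges[1:-1]
    base_counts = np.bincount(np.searchsorted(inner, baseline), minlength=bins)
    curr_counts = np.bincount(np.searchsorted(inner, current), minlength=bins)
    # A small floor avoids log(0) for empty bins
    p = np.clip(base_counts / len(baseline), 1e-6, None)
    q = np.clip(curr_counts / len(current), 1e-6, None)
    return float(np.sum((q - p) * np.log(q / p)))


rng = np.random.default_rng(7)
baseline = rng.normal(5000, 1000, 5000)   # training-time bytes/sec
same = rng.normal(5000, 1000, 5000)       # same distribution
shifted = rng.normal(6500, 1000, 5000)    # traffic pattern has moved

psi_same = population_stability_index(baseline, same)
psi_shifted = population_stability_index(baseline, shifted)
print(f"PSI (no drift): {psi_same:.3f}")
print(f"PSI (drift):    {psi_shifted:.3f}")
```

A PSI that stays high across several features would be one signal that retraining is due.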

### Production Deployment

**Real-time Pipeline**:
1. Capture network packets
2. Extract flow features
3. Normalize/preprocess
4. Score with model
5. Alert if anomaly
6. Investigate and respond

**Best Practices**:
- Start with conservative threshold (fewer alerts)
- Gradually tune based on investigation feedback
- Implement tiered alerting (critical/medium/low)
- Integrate with SIEM for correlation
- Regular model updates with new normal traffic
- Maintain human-in-the-loop for critical decisions
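
Tiered alerting can be sketched as a mapping from anomaly scores to severity labels. The example below uses quantile cut-offs; the function name `assign_tiers`, the tier names, and the 90th/99th percentile cut-offs are assumptions for illustration and should be tuned to the team's capacity.

```python
import numpy as np
import pandas as pd


def assign_tiers(scores: pd.Series, medium_q: float = 0.90,
                 critical_q: float = 0.99) -> pd.Series:
    """Label each score low / medium / critical using quantile cut-offs."""
    medium_cut = scores.quantile(medium_q)
    critical_cut = scores.quantile(critical_q)
    tiers = np.where(scores >= critical_cut, "critical",
                     np.where(scores >= medium_cut, "medium", "low"))
    return pd.Series(tiers, index=scores.index, name="tier")


# Made-up scores; in the notebook this would be predictions['Anomaly_Score']
rng = np.random.default_rng(3)
scores = pd.Series(rng.random(1000), name="Anomaly_Score")
tiers = assign_tiers(scores)
print(tiers.value_counts())
```

Quantile cut-offs keep the alert volume per tier roughly constant, which suits a fixed-size team; fixed score cut-offs would instead let the volume follow the traffic.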

### Limitations

1. **Training Data Quality**:
   - Assumes training data is mostly normal
   - Contaminated training = poor detection

2. **Novel Attacks**:
   - Can miss sophisticated attacks that mimic normal traffic
   - Should be one layer in defense-in-depth strategy

3. **Feature Engineering**:
   - Quality of features crucial
   - Domain expertise needed

### Future Enhancements

1. **Deep Learning**: Neural networks for complex patterns
2. **Ensemble Methods**: Combine multiple detectors
3. **Temporal Analysis**: Sequence-based detection
4. **Contextual Features**: User, time, location context
5. **Active Learning**: Feedback from investigations
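
As a minimal sketch of the ensemble idea, two detectors can be combined by averaging the ranks of their scores. The example uses scikit-learn estimators on toy data; rank averaging is one simple option among many, and the data and parameters are assumptions for illustration.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, size=(190, 4)),      # normal flows
               rng.uniform(-8, 8, size=(10, 4))])    # scattered outliers

# Each detector's raw score, oriented so that higher = more anomalous
iforest_scores = -IsolationForest(random_state=5).fit(X).score_samples(X)
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_scores = -lof.negative_outlier_factor_

# Raw scores live on different scales, so average their ranks instead
ranks = np.column_stack([rankdata(iforest_scores), rankdata(lof_scores)])
ensemble_score = ranks.mean(axis=1) / len(X)         # scaled into (0, 1]

top10 = np.argsort(ensemble_score)[::-1][:10]
print("Top 10 by ensemble rank:", sorted(top10.tolist()))
```

Points ranked highly by both detectors rise to the top, while a point flagged by only one is pulled toward the middle.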

### Resources

- [PyCaret Anomaly Detection](https://pycaret.gitbook.io/docs/get-started/tutorials/anomaly-detection)
- [Scikit-learn Outlier Detection](https://scikit-learn.org/stable/modules/outlier_detection.html)
- [Network Intrusion Detection](https://www.coursera.org/)

---

**Author**: Bala Anbalagan  
**Date**: January 2025  
**Dataset**: Synthetic network traffic (based on CIC-IDS-2017 patterns)  
**License**: MIT  

---

## Thank you for following this anomaly detection tutorial!

**Key Achievement**: Built an unsupervised intrusion detection system without labeled attacks!

**Main Insight**: Anomaly detection flags unusual patterns without needing labeled examples of each attack, which makes it well suited to cybersecurity, where new threats constantly emerge.

**Next Steps**:
- Apply to real network traffic data
- Integrate with SIEM systems
- Deploy for continuous monitoring

**Disclaimer**: For educational purposes. Production intrusion detection requires comprehensive security architecture, not just anomaly detection. Always combine with firewalls, IDS/IPS, endpoint protection, and security expertise.