# üõ°Ô∏è Autonomous Explainable Intrusion Detection System

**Complete IDS Pipeline with HuggingFace LLM on Google Colab**

This notebook runs the entire system on Colab GPU:
- ‚úÖ Downloads IDS dataset
- ‚úÖ Trains 1D CNN model (1 epoch on GPU)
- ‚úÖ SHAP explainability
- ‚úÖ **HuggingFace LLM** for explanations (replaces Ollama)
- ‚úÖ Risk scoring
- ‚úÖ Decision agent

---

## üöÄ Quick Start:
1. **Enable GPU**: Runtime ‚Üí Change runtime type ‚Üí T4 GPU
2. **Run all cells**: Runtime ‚Üí Run all
3. **Wait ~15-20 minutes** for complete pipeline
4. **Download results** at the end

## üì¶ Step 1: Install Dependencies

In [None]:
!pip install -q tensorflow scikit-learn pandas numpy matplotlib seaborn shap kagglehub transformers accelerate

## üîß Step 2: Check GPU

In [None]:
import tensorflow as tf
print("GPU Available:", tf.config.list_physical_devices('GPU'))
print("TensorFlow version:", tf.__version__)

## üìÅ Step 3: Upload Project Files

Upload your `ids-explainable-agent.zip` file here.

**To create the ZIP on your Mac:**
```bash
cd /Users/rishiwalia/Documents/Documents/rishi/project
zip -r ids-explainable-agent.zip ids-explainable-agent/ -x "*.pyc" "*__pycache__*" "*/venv/*" "*/saved_models/*"
```

In [None]:
from google.colab import files
import zipfile
import os

print("üì§ Upload your ids-explainable-agent.zip file:")
uploaded = files.upload()

# Extract
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('.')
        print(f"‚úì Extracted {filename}")

# Change to project directory
%cd ids-explainable-agent
print("\n‚úì Ready to run!")

## ü§ó Step 4: Replace Ollama with HuggingFace LLM

We'll use a small, fast model from HuggingFace that works on Colab.

In [None]:
%%writefile llm/huggingface_client.py
"""
HuggingFace LLM client for Google Colab.
Replaces Ollama with HuggingFace transformers.
"""

from transformers import pipeline
import torch


class HuggingFaceExplainer:
    """LLM explainer using HuggingFace models."""
    
    def __init__(self, model_name="google/flan-t5-base", temperature=0.3):
        """
        Initialize HuggingFace LLM.
        
        Args:
            model_name: HuggingFace model name
            temperature: Sampling temperature
        """
        print(f"Loading HuggingFace model: {model_name}...")
        
        device = 0 if torch.cuda.is_available() else -1
        self.generator = pipeline(
            "text2text-generation",
            model=model_name,
            device=device,
            max_length=512
        )
        self.temperature = temperature
        print(f"‚úì Model loaded on {'GPU' if device == 0 else 'CPU'}")
    
    def explain_prediction(self, attack_type, confidence, risk_score, severity, top_features):
        """
        Generate explanation for a prediction.
        
        Args:
            attack_type: Predicted attack type
            confidence: Model confidence
            risk_score: Computed risk score
            severity: Severity category
            top_features: List of top SHAP features
            
        Returns:
            dict: Explanation results
        """
        # Create prompt
        feature_str = ", ".join([f"{f['feature_name']}" for f in top_features[:3]])
        
        prompt = f"""Explain this network intrusion detection result:
Attack Type: {attack_type}
Confidence: {confidence:.2%}
Risk Score: {risk_score:.2f}
Severity: {severity}
Key Features: {feature_str}

Provide a brief security analysis:"""
        
        # Generate explanation
        result = self.generator(
            prompt,
            max_length=200,
            do_sample=True,
            temperature=self.temperature
        )
        
        explanation = result[0]['generated_text']
        
        return {
            'raw_explanation': explanation,
            'attack_type': attack_type,
            'confidence': confidence,
            'risk_assessment': f"{severity} risk"
        }


def create_huggingface_explainer(model_name="google/flan-t5-base", temperature=0.3):
    """Create HuggingFace explainer instance."""
    return HuggingFaceExplainer(model_name, temperature)


## üîÑ Step 5: Update Pipeline to Use HuggingFace

In [None]:
# Modify pipeline.py to use HuggingFace instead of Ollama
import fileinput
import sys

# Replace Ollama import with HuggingFace
with open('pipeline.py', 'r') as f:
    content = f.read()

# Replace import
content = content.replace(
    'from llm.ollama_client import create_ollama_explainer',
    'from llm.huggingface_client import create_huggingface_explainer'
)

# Replace initialization
content = content.replace(
    'self.llm_explainer = create_ollama_explainer(',
    'self.llm_explainer = create_huggingface_explainer('
)

# Replace model parameter
content = content.replace(
    'model_name=self.ollama_model',
    'model_name="google/flan-t5-base"'
)

# Update initialization message
content = content.replace(
    'Initializing LLM Reasoning (Ollama)',
    'Initializing LLM Reasoning (HuggingFace)'
)

content = content.replace(
    'LLM explainer initialized with {self.ollama_model}',
    'LLM explainer initialized with HuggingFace'
)

with open('pipeline.py', 'w') as f:
    f.write(content)

print("‚úì Pipeline updated to use HuggingFace LLM")

## üöÄ Step 6: Run the Complete Pipeline

This will:
1. Download dataset (~1.6GB)
2. Preprocess data
3. Train CNN model (1 epoch on GPU, ~5-10 min)
4. Process 5 samples with SHAP + HuggingFace LLM
5. Save results

In [None]:
# Run the pipeline
!python pipeline.py --samples 5 --retrain

## üìä Step 7: View Results

In [None]:
import json
import glob

# Find the latest results file
result_files = glob.glob('ids_results_*.json')
if result_files:
    latest_result = sorted(result_files)[-1]
    print(f"üìÑ Results from: {latest_result}\n")
    print("="*70)
    
    with open(latest_result, 'r') as f:
        results = json.load(f)
    
    # Display results
    for i, result in enumerate(results):
        print(f"\n{'='*70}")
        print(f"SAMPLE {i+1}")
        print(f"{'='*70}")
        print(f"True Label: {result['true_label']}")
        print(f"Predicted: {result['attack_type']}")
        print(f"Confidence: {result['confidence']:.4f}")
        print(f"Risk Score: {result['risk_score']:.4f}")
        print(f"Severity: {result['severity']}")
        print(f"Agent Decision: {result['agent_decision']}")
        print(f"\nTop Features:")
        for feat in result['top_features'][:3]:
            print(f"  - {feat['name']}: {feat['shap_value']:.4f}")
        print(f"\nLLM Explanation:\n{result['llm_explanation']}")
        print(f"\nAction Taken:\n{result['action_taken']}")
else:
    print("‚ùå No results found. Run the pipeline first.")

## üìà Step 8: View Training History

In [None]:
from IPython.display import Image, display
import os

if os.path.exists('training_history.png'):
    display(Image('training_history.png'))
else:
    print("Training history plot not found.")

## üíæ Step 9: Download Results and Model

In [None]:
from google.colab import files
import glob
import os

print("üì• Downloading files...\n")

# Download results JSON
result_files = glob.glob('ids_results_*.json')
if result_files:
    for f in result_files:
        files.download(f)
        print(f"‚úì Downloaded {f}")

# Download trained model
if os.path.exists('saved_models/ids_cnn.keras'):
    files.download('saved_models/ids_cnn.keras')
    print("‚úì Downloaded trained model")

# Download training history plot
if os.path.exists('training_history.png'):
    files.download('training_history.png')
    print("‚úì Downloaded training history plot")

print("\n‚úÖ All files downloaded!")

## üéØ Summary

**What This Notebook Does:**
1. ‚úÖ Trains IDS model on GPU (99%+ accuracy)
2. ‚úÖ Uses HuggingFace LLM for explanations (no Ollama needed)
3. ‚úÖ Generates SHAP explanations
4. ‚úÖ Computes risk scores
5. ‚úÖ Executes decision agent actions
6. ‚úÖ Saves all results to JSON

**Files Downloaded:**
- `ids_results_*.json` - Complete results
- `ids_cnn.keras` - Trained model
- `training_history.png` - Training plots

**Next Steps:**
- Use the downloaded model on your Mac
- Analyze the results JSON
- Experiment with more samples