# 🔍 Splunk SIEM Server for Phishing Detection

This notebook runs a Splunk instance in Google Colab that receives phishing detection alerts from the main notebook.

## Architecture
```
┌─────────────────────────┐      ┌─────────────────────────┐
│   COLAB NOTEBOOK 1      │      │   COLAB NOTEBOOK 2      │
│   (This Notebook)       │      │   (Phishing Detection)  │
│                         │      │                         │
│   Splunk Enterprise     │◄─────│   FastAPI Gateway       │
│   - Web UI (8000)       │ HEC  │   - RF Model            │
│   - HEC (8088)          │      │   - XGBoost Model       │
│                         │      │   - LLM Model           │
│   Exposed via ngrok     │      │                         │
└─────────────────────────┘      └─────────────────────────┘
```

## Instructions
1. Run all cells in this notebook first
2. Copy the ngrok HEC URL displayed
3. Use that URL in the main phishing detection notebook
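
For orientation, the sending side (Notebook 2) wraps each alert in an HEC event envelope before posting it. The sketch below shows one way that envelope could be built and how several envelopes can share one request body. The helper names `build_hec_event` and `encode_batch`, the host name, and the sample alerts are illustrative assumptions, not the gateway's actual code.

```python
import json
import time


def build_hec_event(alert, index="security", sourcetype="phishing_detection",
                    host="phishing-gateway"):
    """Wrap an alert dict in an HEC event envelope.

    Hypothetical helper: the host name and defaults are assumptions.
    """
    return {
        "time": time.time(),      # epoch seconds
        "host": host,
        "sourcetype": sourcetype,
        "index": index,
        "event": alert,           # the alert payload itself
    }


def encode_batch(envelopes):
    """Serialize several envelopes into one request body.

    HEC accepts multiple event objects stacked in one body
    (separate JSON objects, not a JSON array).
    """
    return "\n".join(json.dumps(e) for e in envelopes)


# Made-up sample alerts, for illustration only
alerts = [
    {"alert_id": "demo-001", "is_phishing": True},
    {"alert_id": "demo-002", "is_phishing": False},
]
body = encode_batch(build_hec_event(a) for a in alerts)
```

Batching like this keeps the number of requests through the ngrok tunnel low when many predictions arrive at once.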

---

## 1. Install Dependencies

In [None]:
# Install required packages
!pip install pyngrok requests -q

import os
import time
import requests
import json
from pyngrok import ngrok, conf

print("✓ Dependencies installed")

## 2. Configure ngrok

Get your free ngrok auth token from: https://dashboard.ngrok.com/get-started/your-authtoken
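
If you would rather not paste the token into the notebook, one option is to read it from an environment variable. This is a sketch under that assumption: the variable name `NGROK_AUTHTOKEN` and the helper `load_ngrok_token` are illustrative choices, not something this notebook defines.

```python
import os


def load_ngrok_token(form_value="", env_var="NGROK_AUTHTOKEN"):
    """Return the token from the form field, else from the environment.

    Hypothetical helper. Raises ValueError when neither source
    provides a non-empty token.
    """
    token = (form_value or os.environ.get(env_var, "")).strip()
    if not token:
        raise ValueError(
            "ngrok auth token is required: set the form field "
            f"or the {env_var} environment variable."
        )
    return token
```

The result could then be passed to `ngrok.set_auth_token(...)` in place of the hard-coded string.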

In [None]:
# @title ngrok Configuration (REQUIRED)
# @markdown **You MUST provide an ngrok token for this to work!**
# @markdown Get your free token from: https://dashboard.ngrok.com/get-started/your-authtoken

NGROK_AUTH_TOKEN = ""  # @param {type:"string"}

if NGROK_AUTH_TOKEN:
    from pyngrok import ngrok
    ngrok.set_auth_token(NGROK_AUTH_TOKEN)
    print("✓ ngrok authenticated successfully!")
else:
    raise ValueError(
        "\n" + "="*60 + "\n" +
        "ERROR: ngrok auth token is REQUIRED!\n" +
        "="*60 + "\n" +
        "1. Go to: https://dashboard.ngrok.com/get-started/your-authtoken\n" +
        "2. Sign up for a free account (or log in)\n" +
        "3. Copy your authtoken\n" +
        "4. Paste it in the NGROK_AUTH_TOKEN field above\n" +
        "5. Re-run this cell\n" +
        "="*60
    )

## 3. Install Docker in Colab
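
The cells in this section use fixed `sleep` calls to give `dockerd` time to start. A polling loop is one alternative. The sketch below is a generic helper that retries a check until it passes or a timeout expires; `wait_until` and `docker_is_ready` are hypothetical names introduced here for illustration.

```python
import subprocess
import time


def wait_until(check, timeout=60.0, interval=2.0):
    """Call check() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def docker_is_ready():
    """True when `docker info` exits successfully (daemon reachable)."""
    try:
        result = subprocess.run(["docker", "info"], capture_output=True)
    except FileNotFoundError:
        return False  # docker CLI is not installed
    return result.returncode == 0
```

A call such as `wait_until(docker_is_ready, timeout=60)` returns as soon as the daemon answers instead of always waiting the full sleep.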

In [None]:
%%bash
# Install Docker
echo "Installing Docker..."

# Remove old versions
apt-get remove -y docker docker-engine docker.io containerd runc 2>/dev/null || true

# Install prerequisites
apt-get update -qq
apt-get install -y -qq apt-transport-https ca-certificates curl gnupg lsb-release

# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Set up stable repository
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
apt-get update -qq
apt-get install -y -qq docker-ce docker-ce-cli containerd.io

# Start Docker daemon
dockerd > /dev/null 2>&1 &
sleep 5

echo "✓ Docker installed"
docker --version

In [None]:
# Verify Docker is running
!sleep 3 && docker info --format '{{.ServerVersion}}' 2>/dev/null && echo "✓ Docker daemon is running" || echo "Starting Docker daemon..."

# Start daemon if not running
import subprocess
import time

try:
    subprocess.run(["docker", "info"], capture_output=True, check=True)
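except (subprocess.CalledProcessError, FileNotFoundError):
    # `docker info` failed or the CLI is missing: the daemon is not reachable yet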
    print("Starting Docker daemon...")
    subprocess.Popen(["dockerd"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    time.sleep(10)
    print("✓ Docker daemon started")

## 4. Configure Splunk Settings
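
The defaults in the cell below are demo values. Because the HEC endpoint ends up reachable from the internet, you may prefer random ones. The following is a minimal sketch of one way to generate them; `generate_splunk_credentials` is a hypothetical helper, and the lengths chosen are assumptions.

```python
import secrets
import uuid


def generate_splunk_credentials():
    """Return a random admin password and a GUID-style HEC token."""
    password = secrets.token_hex(12)   # 24 hex characters
    hec_token = str(uuid.uuid4())      # Splunk-generated HEC tokens are GUIDs
    return password, hec_token


password, hec_token = generate_splunk_credentials()
```

Hex output is used for the password so it never starts with `-`, which keeps it safe to pass as a positional argument to the `%%bash -s` cell later on.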

In [None]:
# @title Splunk Configuration
SPLUNK_PASSWORD = "PhishingDemo123!"  # @param {type:"string"}
SPLUNK_HEC_TOKEN = "phishing-hec-token-demo"  # @param {type:"string"}
SPLUNK_INDEX = "security"  # @param {type:"string"}

# Validate password (Splunk's default policy requires at least 8 characters)
if len(SPLUNK_PASSWORD) < 8:
    raise ValueError("Splunk admin password must be at least 8 characters.")

print("✓ Splunk admin password set")
print(f"✓ HEC Token: {SPLUNK_HEC_TOKEN}")
print(f"✓ Index: {SPLUNK_INDEX}")

## 5. Pull and Run Splunk Container

This step pulls the official `splunk/splunk` Docker image and runs it as a single standalone instance.

In [None]:
%%bash -s "$SPLUNK_PASSWORD" "$SPLUNK_HEC_TOKEN" "$SPLUNK_INDEX"

SPLUNK_PASSWORD=$1
SPLUNK_HEC_TOKEN=$2
SPLUNK_INDEX=$3

echo "Pulling Splunk image (this may take 2-3 minutes)..."

# Stop any existing Splunk container
docker stop splunk 2>/dev/null || true
docker rm splunk 2>/dev/null || true

# Pull the latest Splunk image
docker pull splunk/splunk:latest

echo ""
echo "Starting Splunk container..."

# Run Splunk with HEC enabled
docker run -d \
  --name splunk \
  -p 8000:8000 \
  -p 8088:8088 \
  -p 8089:8089 \
  -e SPLUNK_START_ARGS='--accept-license' \
  -e SPLUNK_PASSWORD="$SPLUNK_PASSWORD" \
  -e SPLUNK_HEC_TOKEN="$SPLUNK_HEC_TOKEN" \
  splunk/splunk:latest

echo ""
echo "✓ Splunk container started"
echo "  Waiting for Splunk to initialize (this takes 2-3 minutes)..."

In [None]:
# Wait for Splunk to be ready
import time
import subprocess

print("Waiting for Splunk to start...")
print("(This typically takes 2-3 minutes)\n")

max_wait = 300  # 5 minutes max
start_time = time.time()

while time.time() - start_time < max_wait:
    try:
        # Check if Splunk web is responding
        result = subprocess.run(
            ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", "http://localhost:8000"],
            capture_output=True, text=True, timeout=5
        )
        if result.stdout.strip() in ["200", "303"]:
            print("\n✓ Splunk Web UI is ready!")
            break
    except (subprocess.SubprocessError, OSError):
        pass  # curl failed or timed out; retry on the next pass
    
    elapsed = int(time.time() - start_time)
    print(f"  Still starting... ({elapsed}s)", end="\r")
    time.sleep(10)
else:
    print("\n⚠ Splunk taking longer than expected. Check logs with: !docker logs splunk")

# Show container status
!docker ps --filter name=splunk --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

## 6. Configure HEC (HTTP Event Collector)

In [None]:
import requests
import time
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

SPLUNK_API = "https://localhost:8089"
auth = ("admin", SPLUNK_PASSWORD)

print("Configuring Splunk HEC...\n")

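# Wait for the management port (8089) to respond
for i in range(30):
    try:
        r = requests.get(f"{SPLUNK_API}/services", auth=auth, verify=False, timeout=5)
        if r.status_code in [200, 401]:
            break
    except requests.exceptions.RequestException:
        pass  # port is not accepting connections yet
    time.sleep(5)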

# Step 1: Create the security index
print(f"1. Creating index '{SPLUNK_INDEX}'...")
try:
    r = requests.post(
        f"{SPLUNK_API}/services/data/indexes",
        auth=auth,
        verify=False,
        data={"name": SPLUNK_INDEX}
    )
    if r.status_code in [201, 409]:  # 409 = already exists
        print(f"   ✓ Index '{SPLUNK_INDEX}' ready")
    else:
        print(f"   ⚠ Index creation returned: {r.status_code}")
except Exception as e:
    print(f"   ⚠ Error: {e}")

# Step 2: Enable HEC globally
print("2. Enabling HEC globally...")
try:
    r = requests.post(
        f"{SPLUNK_API}/services/data/inputs/http/http",
        auth=auth,
        verify=False,
        data={"disabled": "0", "enableSSL": "1"}
    )
    if r.status_code in [200, 201]:
        print("   ✓ HEC enabled")
    else:
        print(f"   ⚠ Enabling HEC returned: {r.status_code}")
except Exception as e:
    print(f"   ⚠ Error: {e}")

# Step 3: Create HEC token
print("3. Creating HEC token...")
try:
    # Delete existing token if any
    requests.delete(
        f"{SPLUNK_API}/services/data/inputs/http/phishing_detection",
        auth=auth,
        verify=False
    )
    
    # Create new token
    r = requests.post(
        f"{SPLUNK_API}/services/data/inputs/http",
        auth=auth,
        verify=False,
        data={
            "name": "phishing_detection",
            "token": SPLUNK_HEC_TOKEN,
            "index": SPLUNK_INDEX,
            "indexes": SPLUNK_INDEX,
            "sourcetype": "phishing_detection",
            "disabled": "0"
        }
    )
    if r.status_code in [200, 201, 409]:
        print(f"   ✓ HEC token created: {SPLUNK_HEC_TOKEN}")
    else:
        print(f"   Status: {r.status_code} - {r.text[:200]}")
except Exception as e:
    print(f"   ⚠ Error: {e}")

print("\n✓ Splunk HEC configuration complete!")

## 7. Create ngrok Tunnels

This exposes the Splunk HEC endpoint (port 8088) to the internet so your other Colab notebook can reach it. The Web UI (port 8000) is not tunneled and stays local to this Colab VM.
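
The public URL that ngrok returns still needs the HEC path appended, and it should use HTTPS. As a rough sketch, that normalization can be isolated in one small function; `hec_collector_url` is a hypothetical name, and the hostname in the usage note is a made-up placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def hec_collector_url(public_url, path="/services/collector"):
    """Force https and append the HEC collector path to a tunnel URL."""
    parts = urlsplit(public_url.strip())
    if not parts.netloc:
        raise ValueError(f"Not an absolute URL: {public_url!r}")
    # Keep only the host; replace scheme and path, drop query and fragment
    return urlunsplit(("https", parts.netloc, path, "", ""))
```

For example, `hec_collector_url("http://abc123.ngrok-free.app/")` yields `https://abc123.ngrok-free.app/services/collector`, with no doubled slash.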

In [None]:
# Kill any existing ngrok tunnels
ngrok.kill()
time.sleep(2)

# Create tunnel for HEC (port 8088).
# HEC is served over SSL (enableSSL=1), so ngrok must forward to an HTTPS upstream.
print("Creating ngrok tunnel for Splunk HEC...")
hec_tunnel = ngrok.connect("https://localhost:8088", "http")
HEC_PUBLIC_URL = hec_tunnel.public_url

# Convert to HTTPS if needed
if HEC_PUBLIC_URL.startswith("http://"):
    HEC_PUBLIC_URL = HEC_PUBLIC_URL.replace("http://", "https://", 1)

print("\n" + "="*70)
print("🎉 SPLUNK IS READY!")
print("="*70)
print(f"\n📊 Splunk Web UI (local):  http://localhost:8000")
print(f"   Username: admin")
print(f"   Password: {SPLUNK_PASSWORD}")
print(f"\n🔗 HEC Endpoint (PUBLIC - use this in phishing notebook):")
print(f"   {HEC_PUBLIC_URL}/services/collector")
print(f"\n🔑 HEC Token: {SPLUNK_HEC_TOKEN}")
print(f"📁 Index: {SPLUNK_INDEX}")
print("="*70)

# Store for easy copy
SPLUNK_HEC_FULL_URL = f"{HEC_PUBLIC_URL}/services/collector"

## 8. Test HEC Connection
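
The test cell in this section treats an HTTP 200 as success. HEC also returns a small JSON body whose `code` field is 0 when the event was accepted. The helper below is a sketch that checks both; the name `hec_succeeded` is made up for illustration.

```python
def hec_succeeded(status_code, body):
    """True when HEC acknowledged the event.

    A successful HEC response is HTTP 200 with a JSON body like
    {"text": "Success", "code": 0}.
    """
    return (
        status_code == 200
        and isinstance(body, dict)
        and body.get("code") == 0
    )
```

In the test cell this could be called as `hec_succeeded(r.status_code, r.json())`.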

In [None]:
import requests
import json
from datetime import datetime
import urllib3
urllib3.disable_warnings()

print("Testing HEC connection...\n")

# Test event
test_event = {
    "time": datetime.now().timestamp(),
    "host": "colab-splunk-test",
    "source": "phishing-gateway-test",
    "sourcetype": "phishing_detection",
    "index": SPLUNK_INDEX,
    "event": {
        "alert_id": "test-001",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "event_type": "connection_test",
        "severity": "INFO",
        "message": "HEC connection test from Colab",
        "detection": {
            "classification": "TEST",
            "is_phishing": False,
            "probability": 0.0,
            "risk_score": 0
        }
    }
}

headers = {
    "Authorization": f"Splunk {SPLUNK_HEC_TOKEN}",
    "Content-Type": "application/json"
}

# Try local first
print("Testing local HEC (localhost:8088)...")
try:
    r = requests.post(
        "https://localhost:8088/services/collector",
        headers=headers,
        json=test_event,
        verify=False,
        timeout=10
    )
    if r.status_code == 200:
        print(f"   ✓ Local HEC working! Response: {r.json()}")
    else:
        print(f"   ⚠ Status {r.status_code}: {r.text}")
except Exception as e:
    print(f"   ⚠ Local test failed: {e}")

# Test via ngrok
print(f"\nTesting public HEC ({HEC_PUBLIC_URL})...")
try:
    r = requests.post(
        f"{HEC_PUBLIC_URL}/services/collector",
        headers=headers,
        json=test_event,
        verify=False,
        timeout=10
    )
    if r.status_code == 200:
        print(f"   ✓ Public HEC working! Response: {r.json()}")
        print("\n" + "="*50)
        print("✅ READY TO RECEIVE PHISHING ALERTS!")
        print("="*50)
    else:
        print(f"   ⚠ Status {r.status_code}: {r.text}")
except Exception as e:
    print(f"   ⚠ Public test failed: {e}")

## 9. 📋 Copy These Settings to Phishing Detection Notebook

Run this cell to get the configuration you need for the main notebook:
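
On the receiving end of these settings, the phishing notebook has to turn them into an HTTP request. Below is a rough, standard-library-only sketch of how that could look. `SplunkHECConfig` and `build_request` are hypothetical names, and the URL is a placeholder rather than a real tunnel address.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class SplunkHECConfig:
    """Hypothetical container for the three settings printed by this notebook."""
    hec_url: str
    token: str
    index: str = "security"


def build_request(cfg, alert):
    """Build (but do not send) a POST request carrying one alert to HEC."""
    payload = {
        "index": cfg.index,
        "sourcetype": "phishing_detection",
        "event": alert,
    }
    return urllib.request.Request(
        cfg.hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {cfg.token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


cfg = SplunkHECConfig(
    hec_url="https://example.invalid/services/collector",  # placeholder URL
    token="phishing-hec-token-demo",
)
req = build_request(cfg, {"alert_id": "demo-002", "is_phishing": False})
# urllib.request.urlopen(req, timeout=10) would actually send it.
```

Keeping request construction separate from sending makes it possible to inspect the headers and body before anything goes over the tunnel.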

In [None]:
print("="*70)
print("📋 COPY THESE SETTINGS TO YOUR PHISHING DETECTION NOTEBOOK")
print("="*70)
print(f'''
# Splunk Configuration (paste in phishing notebook)
SPLUNK_HEC_URL = "{SPLUNK_HEC_FULL_URL}"
SPLUNK_HEC_TOKEN = "{SPLUNK_HEC_TOKEN}"
SPLUNK_INDEX = "{SPLUNK_INDEX}"
''')
print("="*70)
print("\nOr configure via API after starting the phishing API:")
print(f'''
curl -X POST "YOUR_PHISHING_API_URL/splunk/configure" \\
  -d "hec_url={SPLUNK_HEC_FULL_URL}" \\
  -d "token={SPLUNK_HEC_TOKEN}" \\
  -d "index={SPLUNK_INDEX}"
''')

## 10. View Splunk Logs & Events

In [None]:
# View recent Splunk container logs
print("Recent Splunk logs:")
print("="*50)
!docker logs splunk --tail 20

In [None]:
# Search for events in Splunk via REST API
import json
import requests
import urllib3
urllib3.disable_warnings()

print(f"Searching for events in index='{SPLUNK_INDEX}'...\n")

# Create a search job
search_query = f"search index={SPLUNK_INDEX} | head 10"

try:
    # Start search
    r = requests.post(
        "https://localhost:8089/services/search/jobs",
        auth=("admin", SPLUNK_PASSWORD),
        verify=False,
        data={
            "search": search_query,
            "output_mode": "json",
            "exec_mode": "oneshot"
        }
    )
    
    if r.status_code == 200:
        results = r.json()
        if "results" in results and results["results"]:
            print(f"Found {len(results['results'])} events:\n")
            for i, event in enumerate(results["results"], 1):
                print(f"--- Event {i} ---")
                raw = event.get("_raw", str(event))
                # Pretty print if JSON
                try:
                    parsed = json.loads(raw)
                    print(json.dumps(parsed, indent=2)[:500])
                except (ValueError, TypeError):
                    # _raw is not valid JSON; print it as plain text
                    print(raw[:500])
                print()
        else:
            print("No events found yet. Send some phishing predictions!")
    else:
        print(f"Search returned status {r.status_code}")
except Exception as e:
    print(f"Error searching: {e}")

## 11. Keep Alive

Run this cell to keep the runtime busy, which helps prevent Colab from disconnecting, and to print a status line (Splunk health and indexed event count) every 30 seconds.
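
The status line relies on two small pieces of logic: formatting the elapsed time and pulling the count out of a `| stats count` search response. As a sketch, they can be written as standalone functions; `format_elapsed` and `parse_event_count` are hypothetical names, not part of the cell below.

```python
def format_elapsed(seconds):
    """Render a duration in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def parse_event_count(search_json):
    """Extract the count from a `| stats count` oneshot response, or 0."""
    results = search_json.get("results", [])
    if not results:
        return 0
    # Splunk returns field values as strings, so convert explicitly
    return int(results[0].get("count", 0))
```

For instance, `format_elapsed(3725)` gives `01:02:05`, and a response of `{"results": [{"count": "42"}]}` parses to `42`.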

In [None]:
import time
from IPython.display import clear_output
import requests
import urllib3
urllib3.disable_warnings()

print("🔄 Keep-alive running. Splunk is accepting events.")
print(f"   HEC URL: {SPLUNK_HEC_FULL_URL}")
print(f"   Token: {SPLUNK_HEC_TOKEN}")
print("\nPress STOP (⬛) to end.\n")
print("="*50)

event_count = 0
start_time = time.time()

while True:
    try:
        # Check Splunk health
        r = requests.get("http://localhost:8000", timeout=5)
        status = "✓ Running" if r.status_code in [200, 303] else f"⚠ Status {r.status_code}"
        
        # Check event count
        try:
            search_r = requests.post(
                "https://localhost:8089/services/search/jobs",
                auth=("admin", SPLUNK_PASSWORD),
                verify=False,
                data={
                    "search": f"search index={SPLUNK_INDEX} | stats count",
                    "output_mode": "json",
                    "exec_mode": "oneshot"
                },
                timeout=10
            )
            if search_r.status_code == 200:
                results = search_r.json().get("results", [])
                if results:
                    event_count = int(results[0].get("count", 0))
        except (requests.exceptions.RequestException, ValueError):
            pass  # keep the previous count if the search fails
        
        elapsed = int(time.time() - start_time)
        hours, remainder = divmod(elapsed, 3600)
        minutes, seconds = divmod(remainder, 60)
        
        print(f"\r[{hours:02d}:{minutes:02d}:{seconds:02d}] Splunk: {status} | Events indexed: {event_count}", end="")
        
    except Exception as e:
        print(f"\r⚠ Health check error: {e}", end="")
    
    time.sleep(30)

## 12. Cleanup (Run when done)

In [None]:
# Stop and remove Splunk container
print("Stopping Splunk...")
!docker stop splunk
!docker rm splunk

# Kill ngrok tunnels
ngrok.kill()

print("\n✓ Cleanup complete!")