
Security Supervisor

scarecr0w12 edited this page Jun 18, 2026 · 1 revision

Security Supervisor System

Overview

CortexPrism implements a three-layer access control system that protects sensitive data from unauthorized agent access: pattern-based data classification, an LLM supervisor, and human approval. The security supervisor runs alongside the existing Parallax policy validator and adds a review layer specifically for sensitive data operations.

Architecture

┌─────────────────────────────────────────────────────────┐
│              Agent Tool Execution Flow                   │
└─────────────────────────────────────────────────────────┘

Agent requests sensitive data
        │
        ↓
┌─────────────────────────────────────────────────────────┐
│  Layer 1: Data Classification                            │
│  - Check sensitivity level of requested data             │
│  - Levels: PUBLIC, NORMAL, SENSITIVE, SECRET             │
│  - Pattern-based detection:                              │
│    • SECRET: passwords, API keys, tokens, SSNs,          │
│      credit cards, private keys                          │
│    • SENSITIVE: email, phone, addresses,                 │
│      confidential markers                                │
│    • Default: non-empty = sensitive                      │
└─────────────────────────────────────────────────────────┘
        │
        ├─→ PUBLIC/NORMAL → Allow (no gate)
        │
        └─→ SENSITIVE/SECRET ↓
                              
┌─────────────────────────────────────────────────────────┐
│  Layer 2: LLM Supervisor                                 │
│  - Fast model (Gemini 2.0 Flash, GPT-4o Mini)           │
│  - Decision caching (1-hour session TTL)                │
│  - Confidence scoring (0.0-1.0)                         │
│  - Reviews: agent intent, data sensitivity,             │
│    operational context, risk assessment                  │
│  - Automatic human escalation for low confidence        │
└─────────────────────────────────────────────────────────┘
        │
        ├─→ ALLOW (confidence > threshold) → grant access
        │
        └─→ DENY or low confidence ↓
                                     
┌─────────────────────────────────────────────────────────┐
│  Layer 3: Human Approval                                │
│  - CLI: Interactive color-coded prompt                  │
│    with reasoning, sample data preview                  │
│  - Web UI: Modal with supervisor reasoning,             │
│    data preview, approve/deny buttons                   │
│  - Temporary grant: 1-hour TTL per session+tool         │
│  - Timeout after 60s → auto-deny                        │
└─────────────────────────────────────────────────────────┘
        │
        ├─→ Approve → cache grant, allow access
        │
        └─→ Deny (or timeout) → reject access
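
The flow in the diagram can be sketched as a single gate function. This is a minimal illustration, not CortexPrism's actual code: `Level`, `Verdict`, `gate_access`, and the two callbacks are hypothetical names, and the strict `>` comparison follows the diagram's "confidence > threshold" wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Level(Enum):
    PUBLIC = "PUBLIC"
    NORMAL = "NORMAL"
    SENSITIVE = "SENSITIVE"
    SECRET = "SECRET"


@dataclass
class Verdict:
    """Hypothetical shape of a supervisor decision."""
    decision: str      # "ALLOW" or "DENY"
    confidence: float  # 0.0-1.0
    reasoning: str


def gate_access(
    level: Level,
    review: Callable[[], Verdict],
    ask_human: Callable[[Verdict], bool],
    threshold: float = 0.7,
) -> bool:
    """Return True if the agent may read the data."""
    # Layer 1: PUBLIC and NORMAL data passes without a gate.
    if level in (Level.PUBLIC, Level.NORMAL):
        return True
    # Layer 2: a confident ALLOW from the supervisor grants access.
    verdict = review()
    if verdict.decision == "ALLOW" and verdict.confidence > threshold:
        return True
    # Layer 3: DENY or low confidence falls through to a human,
    # who sees the supervisor's verdict and has the final say.
    return ask_human(verdict)
```

Passing the supervisor and the human prompt in as callables keeps the routing logic testable without a model or a terminal.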

Key Features

Data Classification Engine

Automatic sensitivity detection using pattern matching:

  • SECRET patterns: Passwords (password, passwd, pwd), API keys (sk-, api_key, token), SSNs (\d{3}-\d{2}-\d{4}), credit cards (\d{4}[\s-]\d{4}[\s-]\d{4}[\s-]\d{4}), private keys (-----BEGIN.*PRIVATE KEY-----)
  • SENSITIVE patterns: Email addresses, phone numbers, physical addresses, confidential markers (confidential, internal, proprietary)
  • Default approach: Non-empty data is assumed sensitive until classified otherwise
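
As a rough sketch of how the patterns above might be applied, the following classifier checks SECRET patterns first, then SENSITIVE ones, then falls back to the default. The SECRET expressions are copied from the list; the email and phone expressions, the `classify` name, and the choice to return NORMAL for empty input are assumptions made for illustration. Physical-address detection is omitted because the source gives no pattern for it.

```python
import re

# SECRET expressions as listed above.
SECRET_PATTERNS = {
    "password": re.compile(r"password|passwd|pwd", re.IGNORECASE),
    "api_key": re.compile(r"sk-|api_key|token", re.IGNORECASE),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "credit_card": re.compile(r"\d{4}[\s-]\d{4}[\s-]\d{4}[\s-]\d{4}"),
    "private_key": re.compile(r"-----BEGIN.*PRIVATE KEY-----"),
}

# Assumed expressions: the source names these categories but not the regexes.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "marker": re.compile(r"confidential|internal|proprietary", re.IGNORECASE),
}


def classify(text: str) -> tuple[str, str]:
    """Return (level, reason), where reason names the rule that fired."""
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(text):
            return "SECRET", name
    for name, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(text):
            return "SENSITIVE", name
    # Default approach: non-empty data is assumed sensitive.
    if text.strip():
        return "SENSITIVE", "default"
    # Assumption: empty input carries nothing to protect.
    return "NORMAL", "empty"
```

Returning the rule name alongside the level makes it easier to explain a classification in an approval prompt. Short substrings such as `sk-` can match ordinary words, so a real deployment would probably want tighter anchoring.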

LLM Supervisor

  • Uses a fast, cheap model (Gemini 2.0 Flash or GPT-4o Mini) for rapid review
  • Decision caching per session (1-hour TTL) prevents repeated approval prompts
  • Confidence scoring — high confidence auto-approves; low confidence escalates to human
  • Cost optimization — cached decisions avoid repeated LLM calls
  • Configurable threshold (confidenceThreshold, default 0.7)
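
The caching behavior could look roughly like the sketch below. It assumes decisions are keyed by session ID plus a request key; the source only says caching is per session, so the key shape, `DecisionCache`, and `review_cached` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    allow: bool
    confidence: float


class DecisionCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: dict[tuple[str, str], tuple[float, Decision]] = {}

    def get(self, session_id: str, key: str) -> Optional[Decision]:
        entry = self._entries.get((session_id, key))
        if entry is None:
            return None
        stored_at, decision = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop the entry so the next review is fresh.
            del self._entries[(session_id, key)]
            return None
        return decision

    def put(self, session_id: str, key: str, decision: Decision) -> None:
        self._entries[(session_id, key)] = (self._clock(), decision)


def review_cached(cache: DecisionCache, session_id: str, key: str,
                  call_model: Callable[[], Decision]) -> Decision:
    """Call the supervisor model only on a cache miss."""
    cached = cache.get(session_id, key)
    if cached is not None:
        return cached
    decision = call_model()
    cache.put(session_id, key, decision)
    return decision
```

With a 3600-second TTL, a second identical request in the same session within the hour is answered from the cache, which is where the cost saving comes from.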

Human Approval Flows

  • CLI: Color-coded interactive prompts showing what data, why the agent wants it, and the supervisor's reasoning
  • Web UI: Modal dialog with sample data preview (truncated for privacy), approve/deny buttons
  • Temporary grants: Approved access cached for the session (1-hour TTL) to prevent approval fatigue
  • Timeout guard: 60-second timeout auto-denies if no human response
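
A timeout guard and temporary grants might be combined along these lines. This is a sketch under assumptions: `ask_with_timeout` and `approve_with_grant` are hypothetical names, the prompt is any blocking callable that returns True for approve, and grants are stored as expiry timestamps keyed by session and tool.

```python
import queue
import threading
from typing import Callable


def ask_with_timeout(prompt: Callable[[], bool], timeout_s: float = 60.0) -> bool:
    """Run a blocking approval prompt; no answer within timeout_s means deny."""
    answers: "queue.Queue[bool]" = queue.Queue(maxsize=1)
    # Daemon thread: an unanswered prompt must not keep the process alive.
    worker = threading.Thread(target=lambda: answers.put(prompt()), daemon=True)
    worker.start()
    try:
        return answers.get(timeout=timeout_s)
    except queue.Empty:
        return False  # fail closed: timeout is treated as deny


def approve_with_grant(
    grants: dict,
    session_id: str,
    tool: str,
    prompt: Callable[[], bool],
    now: float,
    ttl_s: float = 3600.0,
    timeout_s: float = 60.0,
) -> bool:
    """Reuse a live grant for (session, tool), otherwise ask the human."""
    expiry = grants.get((session_id, tool))
    if expiry is not None and now < expiry:
        return True  # still within the temporary grant window
    if ask_with_timeout(prompt, timeout_s):
        grants[(session_id, tool)] = now + ttl_s
        return True
    return False
```

Only approvals are cached here, so a denial or a timeout leads to a fresh prompt on the next request. Whether CortexPrism also remembers denials is not stated in this page.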

Gated Tools

The following tools trigger security supervisor review when accessing sensitive data:

Tool            Gate Condition
memory_search   Results classified as SENSITIVE or SECRET
db_query        Query targets tables with sensitivity columns
browser         Screenshot/snapshot may capture sensitive UI
computer        Screenshot may capture sensitive desktop content
web_fetch       Fetched content matches sensitive patterns

Database Sensitivity

Sensitivity metadata is stored across all databases:

Database    Tables with Sensitivity
cortex.db   sessions, agents
memory.db   episodic_memory, semantic_memory, reflection_memory, graph_entities
lens.db     lens_events (audit logs)

A one-time backfill migration classifies all existing data on first run.
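
To illustrate what such a backfill might involve, here is a sketch using SQLite. The column names `content` and `sensitivity`, the simplified `classify` helper, and the `backfill` function are all assumptions; the actual schema and migration are not described on this page.

```python
import re
import sqlite3

# Tables named in the memory.db row above; used as an allow-list because
# SQL identifiers cannot be bound as query parameters.
ALLOWED_TABLES = {"episodic_memory", "semantic_memory",
                  "reflection_memory", "graph_entities"}

# Simplified stand-in for the real classifier (assumption).
SECRET_RE = re.compile(r"password|passwd|pwd|api_key|token|\d{3}-\d{2}-\d{4}",
                       re.IGNORECASE)


def classify(text: str) -> str:
    if SECRET_RE.search(text):
        return "SECRET"
    return "SENSITIVE" if text.strip() else "NORMAL"


def backfill(conn: sqlite3.Connection, table: str) -> int:
    """Classify rows that have no sensitivity yet; return how many were updated."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f"SELECT rowid, content FROM {table} WHERE sensitivity IS NULL"
    ).fetchall()
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            f"UPDATE {table} SET sensitivity = ? WHERE rowid = ?",
            [(classify(text or ""), rowid) for rowid, text in rows],
        )
    return len(rows)
```

Selecting only rows where `sensitivity IS NULL` makes the migration safe to rerun: a second pass finds nothing left to classify.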

Configuration

{
  "securitySupervisor": {
    "enabled": true,
    "provider": "google",
    "model": "gemini-2.0-flash",
    "cacheTTL": 3600,
    "confidenceThreshold": 0.7
  },
  "classification": {
    "levels": ["SECRET", "SENSITIVE", "NORMAL", "PUBLIC"],
    "customPatterns": [
      { "level": "SECRET", "pattern": "my-company-secret-\\d+", "description": "Internal secrets" }
    ]
  }
}

Configuration Options

Field                          Type      Default             Description
enabled                        boolean   true                Enable/disable the security supervisor
provider                       string    "google"            LLM provider for the supervisor model
model                          string    "gemini-2.0-flash"  Fast model for review decisions
cacheTTL                       number    3600                Decision cache TTL in seconds (1 hour)
confidenceThreshold            number    0.7                 Minimum confidence for auto-approval
classification.levels          string[]  Default 4 levels    Custom classification levels
classification.customPatterns  object[]  []                  Additional regex patterns for classification
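
One way a consumer of this configuration might apply the documented defaults and validate custom patterns is sketched below. The `load_config` function, the `compiled` key it adds, and the validation rules are assumptions; only the field names and default values come from the table above.

```python
import json
import re

# Defaults taken from the configuration options table.
DEFAULTS = {
    "securitySupervisor": {
        "enabled": True,
        "provider": "google",
        "model": "gemini-2.0-flash",
        "cacheTTL": 3600,
        "confidenceThreshold": 0.7,
    },
    "classification": {
        "levels": ["SECRET", "SENSITIVE", "NORMAL", "PUBLIC"],
        "customPatterns": [],
    },
}


def load_config(raw: str) -> dict:
    """Parse JSON config, fill in defaults, and compile custom patterns."""
    user = json.loads(raw)
    merged = {
        section: {**defaults, **user.get(section, {})}
        for section, defaults in DEFAULTS.items()
    }
    threshold = merged["securitySupervisor"]["confidenceThreshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("confidenceThreshold must be between 0.0 and 1.0")
    levels = merged["classification"]["levels"]
    compiled = []
    for entry in merged["classification"]["customPatterns"]:
        if entry["level"] not in levels:
            raise ValueError(f"unknown level: {entry['level']}")
        # re.compile raises re.error on an invalid pattern, so bad
        # regexes surface at load time rather than during a review.
        compiled.append((entry["level"], re.compile(entry["pattern"])))
    merged["classification"]["compiled"] = compiled
    return merged
```

Compiling patterns once at load time also avoids recompiling them for every classification call.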

Web UI

The security supervisor is configurable in the Settings → Security Supervisor tab:

  • Enable/disable toggle
  • Provider and model selection
  • Cache TTL slider
  • Classification level management
  • Custom pattern editor
  • Cache inspection (live decision cache entries)
  • Decision history browser

API endpoints:

  • GET /api/security/supervisor — current configuration
  • PUT /api/security/supervisor — update configuration
  • GET /api/security/supervisor/cache — inspect decision cache
  • DELETE /api/security/supervisor/cache — clear decision cache
  • GET /api/security/supervisor/history — review past decisions
  • GET /api/security/classification — classification configuration
  • PUT /api/security/classification — update classification settings
  • POST /api/security/classification/test — test classification on sample content
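
A client might call these endpoints as in the sketch below. The base URL, the `{"content": ...}` request body for the test endpoint, and the assumption that responses are JSON are all guesses for illustration; this page does not document request or response schemas.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000"  # hypothetical host and port


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for one of the endpoints listed above."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the response, assumed to be JSON."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Test how sample content would be classified (body shape is an assumption).
test_req = build_request(
    "POST", "/api/security/classification/test",
    {"content": "api_key=sk-example"},
)

# Clear the decision cache, for example after changing the threshold.
clear_req = build_request("DELETE", "/api/security/supervisor/cache")
```

Calling `send(test_req)` against a running instance would perform the request; building and sending are kept separate so the requests can be inspected first.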

See Also

  • Security — Parallax policy validator and overall security model
  • Built-in Tools — Tool catalog with security gates documented
  • Agent Loop — How the supervisor integrates into agent turn processing
