Skip to content

martymcenroe/hermes-docs

Repository files navigation

Security & Safety

Threat Model

flowchart TD
    subgraph "Threats"
        T1["AI Discloses Identity"]
        T2["Real PII Leaked"]
        T3["Legal Threat Received"]
        T4["Conversation Escalates"]
        T5["Dashboard Exposed"]
        T6["Secrets in Code/Logs"]
    end

    subgraph "Mitigations"
        M1["AI Disclosure Detection<br/>30+ blocked terms"]
        M2["PII Pattern Blocking<br/>SSN regex + 9-digit check"]
        M3["Legal Threat Detection<br/>Auto-escalate + stop reply"]
        M4["Message Limit<br/>Escalate after 10 exchanges"]
        M5["API Key Auth<br/>Bearer token or query param"]
        M6[".env.local gitignored<br/>Secrets in Cloudflare"]
    end

    T1 --> M1
    T2 --> M2
    T3 --> M3
    T4 --> M4
    T5 --> M5
    T6 --> M6
Loading

Safety Rails

Response Validation Flow

Every AI-generated response passes through safety validation before sending.

flowchart TD
    A["AI Generated Response"] --> B{"Empty?"}
    B --> |"Yes"| FAIL["FAIL: Fall back to template"]
    B --> |"No"| C{"Contains AI<br/>disclosure terms?"}
    C --> |"Yes"| FAIL
    C --> |"No"| D{"Matches real<br/>PII patterns?"}
    D --> |"Yes"| FAIL
    D --> |"No"| E{"Agrees to phone<br/>before star?"}
    E --> |"Yes"| FAIL
    E --> |"No"| PASS["PASS: Send response"]
    FAIL --> F{"3 consecutive<br/>failures?"}
    F --> |"Yes"| G["ESCALATE: Stop auto-reply"]
    F --> |"No"| H["Use template instead"]
Loading

AI Disclosure Terms (blocked)

The following terms in an AI response trigger safety failure:

Category Blocked Phrases
Direct disclosure "i am an ai", "i'm an ai", "as an ai"
Technical terms "language model", "large language model", "llm"
Product names "chatgpt", "openai", "claude", "anthropic"
Generic terms "artificial intelligence", "automated response", "automated system"
Bot references "i am a bot", "i'm a bot", "i'm not a real person"
Meta references "generated by", "programmed to", "i don't actually exist"

Real PII Detection

Pattern Regex What It Catches
Full SSN \d{3}-\d{2}-\d{4} 123-45-6789 format
Unformatted SSN \d{9} Nine consecutive digits

Phone Agreement Detection

If star_verified = false, these phrases are blocked:

  • "call you at", "i'll call", "let's schedule a call"
  • "available for a call", "here's my number"
  • "my phone number", "call me at"

Escalation System

flowchart TD
    A["Inbound Email"] --> B{"Legal threat<br/>detected?"}
    B --> |"Yes"| ESC["ESCALATED<br/>Stop all auto-reply"]
    B --> |"No"| C{"Message count<br/>> 10?"}
    C --> |"Yes"| ESC
    C --> |"No"| D{"Sender domain<br/>in allowlist?"}
    D --> |"Yes"| ESC
    D --> |"No"| E["Continue normal processing"]
Loading

Legal Threat Terms

lawyer, attorney, legal action, lawsuit, litigation, cease and desist, subpoena, court order, legal counsel, sue you, take legal, filing a complaint, report you, FTC, federal trade commission

Company Allowlist (real opportunities)

Google, Microsoft, Apple, Amazon, Meta, Netflix, Stripe, Cloudflare, GitHub, OpenAI, Anthropic

These trigger escalation because they might be real opportunities that deserve manual attention.

PII Fabrication

When recruiters ask for personal information, Hermes generates fake but deterministic data seeded by conversation ID.

Field Generation Method Example
SSN last 4 conversationId * prime % 10000 7842
Full SSN Three seeded segments 483-29-7842
Date of Birth Random year 1985-1995, month, day 03/17/1991
Address Pool of 12 fake addresses 1247 Oak Ridge Blvd
City/State Pool of 12 city/state pairs Austin, TX 78704
DL Number State prefix + 7 digits TX-2847193

Why deterministic? If a recruiter asks twice, they get the same answers. Consistency prevents suspicion.

What is NEVER sent: Real SSN, real DOB, real address, real DL numbers of the owner.

Dashboard Security

Authentication (3 Methods)

Method Header/Param Role Use Case
API Key Authorization: Bearer KEY or ?key=KEY Owner API access, scripts
GitHub OAuth Session cookie via /auth/github Owner/Viewer+/Viewer Browser access
Viewer Token ?viewer=TOKEN Viewer Interview sharing (4hr TTL)

Role-Based Access Control

Role How Assigned Permissions
Owner API key or GitHub username match Full CRUD, admin actions, poke/sweep
Viewer GitHub OAuth non-owner or viewer token Demo-labeled conversations only, anonymized

Viewer Anonymization

Viewer mode masks PII at runtime (no data duplication):

  • Emails → Recruiter #N
  • Phone numbers → <phone-number>
  • External URLs → <company-url> (safe URLs preserved)
  • Names stripped

Viewer Tokens

  • Generated by owner in Admin tab
  • UUID-based, stored hashed in D1
  • 4-hour TTL, can be revoked
  • Expired tokens show a "sign in with GitHub" prompt

GitHub OAuth

  • Uses GitHub OAuth App (not GitHub App)
  • Scope: read:user (identifies GitHub username for role assignment)
  • Session stored in D1 sessions table
  • Cookie: hermes_session, HttpOnly, Secure, SameSite=Lax, 7-day TTL

API Key Requirements

  • Hex-only characters (a-f, 0-9) -- special characters (+, /, =) get URL-mangled
  • 32 bytes minimum: openssl rand -hex 32
  • Stored as Cloudflare secret (encrypted at rest)
  • Local copy in .env.local (gitignored)

Secret Management

Secret Risk If Leaked Rotation
DASHBOARD_API_KEY Full conversation + KB access wrangler secret put DASHBOARD_API_KEY
RESEND_API_KEY Can send email as martymcenroe.ai Rotate in Resend dashboard
GITHUB_CLIENT_ID OAuth app identity (low risk) Regenerate in GitHub Developer Settings
GITHUB_CLIENT_SECRET OAuth token exchange Regenerate in GitHub Developer Settings
AI_ROLLOUT_MODE Configuration only (low risk) wrangler secret put AI_ROLLOUT_MODE

Kill Switches

Disable AI (keep templates)

wrangler secret put AI_ROLLOUT_MODE
# Type: off

Stop ALL email processing

Disable email routing rules in Cloudflare Dashboard > Email Routing.

Ghost a specific conversation

wrangler d1 execute hermes-db --command "UPDATE conversations SET state = 'GHOSTED' WHERE id = 123"

Prompt Injection Defense

The system prompt structure isolates untrusted content:

[SYSTEM PROMPT - hardcoded persona rules]
[STATE INSTRUCTIONS - from code, not user input]
[KNOWLEDGE BASE - curated by owner, from D1]
[STAR PUSH - from code logic]
---
[USER MESSAGE - recruiter's email body]   <-- Only untrusted content
[ASSISTANT - previous AI responses]
[USER MESSAGE - latest email]             <-- Only untrusted content

The recruiter's email is always in the user role. It is never injected into the system prompt. Knowledge base entries are curated by the owner through the dashboard.

About

Architecture, security, and operational documentation for Hermes — autonomous AI email agent

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors