Security & Safety

Threat Model

flowchart TD
    subgraph "Threats"
        T1["AI Discloses Identity"]
        T2["Real PII Leaked"]
        T3["Legal Threat Received"]
        T4["Conversation Escalates"]
        T5["Dashboard Exposed"]
        T6["Secrets in Code/Logs"]
    end

    subgraph "Mitigations"
        M1["AI Disclosure Detection<br/>30+ blocked terms"]
        M2["PII Pattern Blocking<br/>SSN regex + 9-digit check"]
        M3["Legal Threat Detection<br/>Auto-escalate + stop reply"]
        M4["Message Limit<br/>Escalate after 10 exchanges"]
        M5["API Key Auth<br/>Bearer token or query param"]
        M6[".env.local gitignored<br/>Secrets in Cloudflare"]
    end

    T1 --> M1
    T2 --> M2
    T3 --> M3
    T4 --> M4
    T5 --> M5
    T6 --> M6

Safety Rails

Response Validation Flow

Every AI-generated response passes through safety validation before it is sent.

flowchart TD
    A["AI Generated Response"] --> B{"Empty?"}
    B --> |"Yes"| FAIL["FAIL: Fall back to template"]
    B --> |"No"| C{"Contains AI<br/>disclosure terms?"}
    C --> |"Yes"| FAIL
    C --> |"No"| D{"Matches real<br/>PII patterns?"}
    D --> |"Yes"| FAIL
    D --> |"No"| E{"Agrees to phone<br/>before star?"}
    E --> |"Yes"| FAIL
    E --> |"No"| PASS["PASS: Send response"]
    FAIL --> F{"3 consecutive<br/>failures?"}
    F --> |"Yes"| G["ESCALATE: Stop auto-reply"]
    F --> |"No"| H["Use template instead"]
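
The flow above can be sketched as a single function. This is a minimal illustration, not the project's actual code: the names `validateResponse`, `Check`, and `Outcome` are hypothetical, and it assumes that a passing response resets the consecutive-failure counter, which the flowchart implies but does not state.

```typescript
// Hypothetical sketch of the validation flow. Only the check order and the
// three-failure threshold come from the flowchart; all names are assumptions.

// A check returns true when the response is unsafe.
type Check = (response: string) => boolean;

type Outcome =
  | { action: "send"; text: string }
  | { action: "template" }
  | { action: "escalate" };

const MAX_CONSECUTIVE_FAILURES = 3;

function validateResponse(
  response: string,
  checks: Check[],
  priorFailures: number,
): { outcome: Outcome; failures: number } {
  const isEmpty = response.trim().length === 0;
  const failed = isEmpty || checks.some((isUnsafe) => isUnsafe(response));

  if (!failed) {
    // Assumption: a passing response resets the consecutive-failure counter.
    return { outcome: { action: "send", text: response }, failures: 0 };
  }

  const failures = priorFailures + 1;
  if (failures >= MAX_CONSECUTIVE_FAILURES) {
    // Third failure in a row: stop auto-reply and hand off to the owner.
    return { outcome: { action: "escalate" }, failures };
  }
  // Otherwise fall back to a template for this reply.
  return { outcome: { action: "template" }, failures };
}
```

The caller would persist `failures` per conversation and pass it back in on the next reply.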

AI Disclosure Terms (blocked)

The following terms in an AI response trigger a safety failure:

Category	Blocked Phrases
Direct disclosure	"i am an ai", "i'm an ai", "as an ai"
Technical terms	"language model", "large language model", "llm"
Product names	"chatgpt", "openai", "claude", "anthropic"
Generic terms	"artificial intelligence", "automated response", "automated system"
Bot references	"i am a bot", "i'm a bot", "i'm not a real person"
Meta references	"generated by", "programmed to", "i don't actually exist"
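
A minimal sketch of how such a term check might look, assuming case-insensitive substring matching. The function name `findAiDisclosure` is hypothetical, the list contains only the phrases from the table above, and the apostrophe normalization is an added assumption.

```typescript
// Phrases copied from the table above, all lowercase.
const AI_DISCLOSURE_TERMS: readonly string[] = [
  "i am an ai", "i'm an ai", "as an ai",
  "language model", "large language model", "llm",
  "chatgpt", "openai", "claude", "anthropic",
  "artificial intelligence", "automated response", "automated system",
  "i am a bot", "i'm a bot", "i'm not a real person",
  "generated by", "programmed to", "i don't actually exist",
];

// Returns the first blocked term found, or null if the response is clean.
function findAiDisclosure(response: string): string | null {
  // Assumption: lowercase the text and map curly apostrophes to straight
  // ones, so that "I’m a bot" still matches "i'm a bot".
  const text = response.toLowerCase().replace(/\u2019/g, "'");
  for (const term of AI_DISCLOSURE_TERMS) {
    if (text.includes(term)) return term;
  }
  return null;
}
```

Plain substring matching is deliberately broad: it would also flag a recruiter named Claude or a sentence such as "this report was generated by our team". Whether the real check accepts that trade-off is not stated in the source.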

Real PII Detection

Pattern	Regex	What It Catches
Full SSN	`\d{3}-\d{2}-\d{4}`	123-45-6789 format
Unformatted SSN	`\d{9}`	Nine consecutive digits
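
The two patterns can be applied as follows. This sketch copies the regexes from the table verbatim; the name `matchPiiPattern` is hypothetical, and the source does not describe how fabricated values (see PII Fabrication) are exempted from this check, so that part is left out.

```typescript
// Patterns copied from the table above.
const PII_PATTERNS: ReadonlyArray<{ name: string; regex: RegExp }> = [
  { name: "Full SSN", regex: /\d{3}-\d{2}-\d{4}/ },
  { name: "Unformatted SSN", regex: /\d{9}/ },
];

// Returns the name of the first matching pattern, or null if none match.
function matchPiiPattern(response: string): string | null {
  for (const { name, regex } of PII_PATTERNS) {
    // No global flag is set, so test() keeps no state between calls.
    if (regex.test(response)) return name;
  }
  return null;
}
```

Note that `\d{9}` also fires on any longer run of digits, such as an unformatted ten-digit phone number, so the check errs on the side of blocking.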

Phone Agreement Detection

While `star_verified` is false, responses containing any of these phrases are blocked:

"call you at", "i'll call", "let's schedule a call"
"available for a call", "here's my number"
"my phone number", "call me at"

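A sketch of the phone agreement check under the same assumptions as the other phrase checks (case-insensitive substring matching). The name `agreesToPhoneTooEarly` is hypothetical; the `starVerified` parameter mirrors the `star_verified` flag mentioned above.

```typescript
// Phrases copied from the list above, all lowercase.
const PHONE_AGREEMENT_PHRASES: readonly string[] = [
  "call you at", "i'll call", "let's schedule a call",
  "available for a call", "here's my number",
  "my phone number", "call me at",
];

// Returns true when the response agrees to a call before verification.
function agreesToPhoneTooEarly(response: string, starVerified: boolean): boolean {
  // Once the star is verified, phone agreement is no longer blocked.
  if (starVerified) return false;
  const text = response.toLowerCase().replace(/\u2019/g, "'");
  return PHONE_AGREEMENT_PHRASES.some((phrase) => text.includes(phrase));
}
```
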
Escalation System

flowchart TD
    A["Inbound Email"] --> B{"Legal threat<br/>detected?"}
    B --> |"Yes"| ESC["ESCALATED<br/>Stop all auto-reply"]
    B --> |"No"| C{"Message count<br/>> 10?"}
    C --> |"Yes"| ESC
    C --> |"No"| D{"Sender domain<br/>in allowlist?"}
    D --> |"Yes"| ESC
    D --> |"No"| E["Continue normal processing"]

Legal Threat Terms

lawyer, attorney, legal action, lawsuit, litigation, cease and desist, subpoena, court order, legal counsel, sue you, take legal, filing a complaint, report you, FTC, federal trade commission

Company Allowlist (real opportunities)

Google, Microsoft, Apple, Amazon, Meta, Netflix, Stripe, Cloudflare, GitHub, OpenAI, Anthropic

Emails from these companies are escalated because they might be real opportunities that deserve manual attention.
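
The escalation checks can be sketched in the order the flowchart gives them. Several details here are assumptions rather than documented behavior: the source lists company names, not domains, so the domain set below is a hypothetical mapping; the legal terms are matched on word boundaries (plain substring matching would flag "giftcard" for containing "ftc"); and `senderAddress` is assumed to be a bare email address.

```typescript
const LEGAL_THREAT_TERMS: readonly string[] = [
  "lawyer", "attorney", "legal action", "lawsuit", "litigation",
  "cease and desist", "subpoena", "court order", "legal counsel",
  "sue you", "take legal", "filing a complaint", "report you",
  "ftc", "federal trade commission",
];

// Assumption: whole-word, case-insensitive matching.
const LEGAL_THREAT_PATTERN = new RegExp(
  `\\b(?:${LEGAL_THREAT_TERMS.join("|")})\\b`,
  "i",
);

// Assumption: one plausible domain per company named in the allowlist.
const ALLOWLISTED_DOMAINS: ReadonlySet<string> = new Set([
  "google.com", "microsoft.com", "apple.com", "amazon.com", "meta.com",
  "netflix.com", "stripe.com", "cloudflare.com", "github.com",
  "openai.com", "anthropic.com",
]);

const MESSAGE_LIMIT = 10;

type EscalationReason = "legal_threat" | "message_limit" | "allowlisted_company";

interface InboundEmail {
  senderAddress: string; // bare address, e.g. "jane@example.com"
  body: string;
  messageCount: number;
}

// Returns why the conversation should escalate, or null to continue.
function escalationReason(email: InboundEmail): EscalationReason | null {
  if (LEGAL_THREAT_PATTERN.test(email.body)) return "legal_threat";
  if (email.messageCount > MESSAGE_LIMIT) return "message_limit";
  const at = email.senderAddress.lastIndexOf("@");
  const domain = email.senderAddress.slice(at + 1).toLowerCase();
  if (ALLOWLISTED_DOMAINS.has(domain)) return "allowlisted_company";
  return null;
}
```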

PII Fabrication

When recruiters ask for personal information, Hermes generates fake but deterministic data seeded by conversation ID.

Field	Generation Method	Example
SSN last 4	`conversationId * prime % 10000`	7842
Full SSN	Three seeded segments	483-29-7842
Date of Birth	Seeded year 1985-1995, month, day	03/17/1991
Address	Pool of 12 fake addresses	1247 Oak Ridge Blvd
City/State	Pool of 12 city/state pairs	Austin, TX 78704
DL Number	State prefix + 7 digits	TX-2847193

Why deterministic? If a recruiter asks twice, they get the same answers. Consistency prevents suspicion.

What is NEVER sent: the owner's real SSN, date of birth, address, or driver's license number.
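
A rough sketch of deterministic generation seeded by conversation ID, following the `conversationId * prime % modulus` scheme from the table. The specific primes, the function names, and the cap of 28 on the day of the month are all assumptions made for illustration. The address pool is passed in rather than reproduced, since the source does not list its 12 entries.

```typescript
interface FakeIdentity {
  ssn: string;
  ssnLast4: string;
  dateOfBirth: string; // MM/DD/YYYY
  address: string;
}

// Same inputs always give the same output. Assumes a positive integer ID
// small enough that the product stays below Number.MAX_SAFE_INTEGER.
function seeded(conversationId: number, prime: number, modulus: number): number {
  return (conversationId * prime) % modulus;
}

function pad(value: number, width: number): string {
  return String(value).padStart(width, "0");
}

function fabricateIdentity(
  conversationId: number,
  addressPool: readonly string[],
): FakeIdentity {
  // Hypothetical primes; the source only says "conversationId * prime".
  const last4 = pad(seeded(conversationId, 7919, 10000), 4);
  const area = pad(seeded(conversationId, 104729, 1000), 3);
  const group = pad(seeded(conversationId, 1299709, 100), 2);
  const year = 1985 + seeded(conversationId, 104723, 11); // 1985 to 1995
  const month = 1 + seeded(conversationId, 7907, 12);
  const day = 1 + seeded(conversationId, 7901, 28); // valid in every month
  const address = addressPool[seeded(conversationId, 7883, addressPool.length)];
  if (address === undefined) throw new Error("address pool must not be empty");

  return {
    ssn: `${area}-${group}-${last4}`,
    ssnLast4: last4,
    dateOfBirth: `${pad(month, 2)}/${pad(day, 2)}/${year}`,
    address,
  };
}
```

Because every field is a pure function of the conversation ID, nothing needs to be stored to give a recruiter the same answer twice.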

Dashboard Security

Authentication (3 Methods)

Method	Header/Param	Role	Use Case
API Key	`Authorization: Bearer KEY` or `?key=KEY`	Owner	API access, scripts
GitHub OAuth	Session cookie via `/auth/github`	Owner/Viewer+/Viewer	Browser access
Viewer Token	`?viewer=TOKEN`	Viewer	Interview sharing (4hr TTL)

Role-Based Access Control

Role	How Assigned	Permissions
Owner	API key or GitHub username match	Full CRUD, admin actions, poke/sweep
Viewer	GitHub OAuth non-owner or viewer token	Demo-labeled conversations only, anonymized
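
A minimal sketch of how the three authentication methods might map to roles. All names are hypothetical, the precedence order (API key, then GitHub session, then viewer token) is an assumption, and the "Viewer+" role from the authentication table is omitted because the source does not define it. The constant-time comparison is an added hardening step, not something the source describes.

```typescript
type Role = "owner" | "viewer";

interface AuthInput {
  authorizationHeader: string | null; // e.g. "Bearer KEY"
  queryKey: string | null;            // value of ?key=KEY
  githubUsername: string | null;      // from a valid OAuth session
  viewerTokenValid: boolean;          // ?viewer=TOKEN already checked
}

interface AuthConfig {
  apiKey: string;
  ownerGithubUsername: string;
}

// Compares two strings without exiting early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function resolveRole(input: AuthInput, config: AuthConfig): Role | null {
  const header = input.authorizationHeader;
  const bearer =
    header !== null && header.startsWith("Bearer ")
      ? header.slice("Bearer ".length)
      : null;
  const presentedKey = bearer !== null ? bearer : input.queryKey;
  if (presentedKey !== null && constantTimeEqual(presentedKey, config.apiKey)) {
    return "owner";
  }
  if (input.githubUsername !== null) {
    // GitHub usernames are case-insensitive.
    const isOwner =
      input.githubUsername.toLowerCase() ===
      config.ownerGithubUsername.toLowerCase();
    return isOwner ? "owner" : "viewer";
  }
  return input.viewerTokenValid ? "viewer" : null;
}
```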

Viewer Anonymization

Viewer mode masks PII at runtime (no data duplication):

Emails → Recruiter #N
Phone numbers → <phone-number>
External URLs → <company-url> (safe URLs preserved)
Names stripped
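
The masking rules above could be sketched as a read-time transform. Everything named here is an assumption: the regexes are simplified (the phone pattern only covers North American formats), the `SAFE_HOSTS` set is a hypothetical stand-in for whatever the project treats as safe URLs, and name stripping is left out because the source does not say how names are detected.

```typescript
// Hypothetical list of hosts whose URLs are preserved.
const SAFE_HOSTS: ReadonlySet<string> = new Set(["github.com"]);

// Simplified patterns for illustration only.
const URL_PATTERN = /https?:\/\/([^\s\/:?#<>"']+)[^\s<>"']*/g;
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_PATTERN = /(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;

// recruiterNumbers keeps "Recruiter #N" stable for each address across calls.
function anonymizeForViewer(
  text: string,
  recruiterNumbers: Map<string, number>,
): string {
  return text
    .replace(URL_PATTERN, (url: string, host: string) =>
      SAFE_HOSTS.has(host.toLowerCase()) ? url : "<company-url>",
    )
    .replace(EMAIL_PATTERN, (email: string) => {
      const key = email.toLowerCase();
      let n = recruiterNumbers.get(key);
      if (n === undefined) {
        n = recruiterNumbers.size + 1;
        recruiterNumbers.set(key, n);
      }
      return `Recruiter #${n}`;
    })
    .replace(PHONE_PATTERN, "<phone-number>");
}
```

Because the function only transforms text on the way out, the stored conversation is never modified, which matches the "no data duplication" note above.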

Viewer Tokens

Generated by owner in Admin tab
UUID-based, stored hashed in D1
4-hour TTL, can be revoked
Expired tokens show a "sign in with GitHub" prompt
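
A small sketch of the TTL and revocation check, assuming each token row records a creation time and a revoked flag. The row shape and all names are hypothetical; hashing and the D1 lookup are out of scope here.

```typescript
const VIEWER_TOKEN_TTL_MS = 4 * 60 * 60 * 1000; // 4 hours

// Hypothetical shape of a stored token row.
interface ViewerTokenRow {
  tokenHash: string;   // hash of the UUID, never the raw token
  createdAtMs: number; // creation time in epoch milliseconds
  revoked: boolean;
}

type TokenStatus = "valid" | "expired" | "revoked";

function viewerTokenStatus(row: ViewerTokenRow, nowMs: number): TokenStatus {
  if (row.revoked) return "revoked";
  return nowMs - row.createdAtMs >= VIEWER_TOKEN_TTL_MS ? "expired" : "valid";
}
```

A caller would render the GitHub sign-in prompt for an `"expired"` status.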

GitHub OAuth

Uses GitHub OAuth App (not GitHub App)
Scope: read:user (identifies GitHub username for role assignment)
Session stored in D1 sessions table
Cookie: hermes_session, HttpOnly, Secure, SameSite=Lax, 7-day TTL
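
The cookie attributes listed above translate into a `Set-Cookie` header value along these lines. The function name is hypothetical and `Path=/` is an assumption the source does not mention.

```typescript
const SESSION_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days

// Builds the Set-Cookie header value for a new session.
function sessionCookie(sessionId: string): string {
  return [
    `hermes_session=${sessionId}`,
    "HttpOnly",      // not readable from page JavaScript
    "Secure",        // sent over HTTPS only
    "SameSite=Lax",  // withheld on most cross-site subrequests
    "Path=/",        // assumption: cookie applies to the whole site
    `Max-Age=${SESSION_TTL_SECONDS}`,
  ].join("; ");
}
```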

API Key Requirements

Hex-only characters (a-f, 0-9) -- special characters (+, /, =) get URL-mangled
32 bytes minimum: openssl rand -hex 32
Stored as Cloudflare secret (encrypted at rest)
Local copy in .env.local (gitignored)
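
A quick format check for a candidate key, assuming lowercase hex as produced by `openssl rand -hex 32` (32 bytes is 64 hex characters). The function name is hypothetical.

```typescript
// Accepts lowercase hex only, at least 64 characters (32 bytes).
function isValidApiKey(key: string): boolean {
  return /^[0-9a-f]{64,}$/.test(key);
}

// Why hex: base64 characters change when a key is URL-encoded,
// while hex characters pass through untouched.
const base64Like = "a+b/c=";
const hexLike = "a1b2c3";
console.log(encodeURIComponent(base64Like)); // a%2Bb%2Fc%3D
console.log(encodeURIComponent(hexLike));    // a1b2c3
```

This matters mainly for the `?key=KEY` query parameter, where an unencoded `+` is commonly decoded as a space.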

Secret Management

Secret	Risk If Leaked	Rotation
`DASHBOARD_API_KEY`	Full conversation + KB access	`wrangler secret put DASHBOARD_API_KEY`
`RESEND_API_KEY`	Can send email as martymcenroe.ai	Rotate in Resend dashboard
`GITHUB_CLIENT_ID`	OAuth app identity (low risk)	Regenerate in GitHub Developer Settings
`GITHUB_CLIENT_SECRET`	OAuth token exchange	Regenerate in GitHub Developer Settings
`AI_ROLLOUT_MODE`	Configuration only (low risk)	`wrangler secret put AI_ROLLOUT_MODE`

Kill Switches

Disable AI (keep templates)

wrangler secret put AI_ROLLOUT_MODE
# When prompted for the secret value, enter: off

Stop ALL email processing

Disable email routing rules in Cloudflare Dashboard > Email Routing.

Ghost a specific conversation

wrangler d1 execute hermes-db --command "UPDATE conversations SET state = 'GHOSTED' WHERE id = 123"

Prompt Injection Defense

The system prompt structure isolates untrusted content:

[SYSTEM PROMPT - hardcoded persona rules]
[STATE INSTRUCTIONS - from code, not user input]
[KNOWLEDGE BASE - curated by owner, from D1]
[STAR PUSH - from code logic]
---
[USER MESSAGE - recruiter's email body]   <-- Only untrusted content
[ASSISTANT - previous AI responses]
[USER MESSAGE - latest email]             <-- Only untrusted content

The recruiter's email is always in the user role. It is never injected into the system prompt. Knowledge base entries are curated by the owner through the dashboard.
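
The layout above can be sketched as a message builder. The types and names are hypothetical and no particular model API is assumed; the point is only that recruiter text is never concatenated into the system message.

```typescript
type ChatRole = "system" | "user" | "assistant";

interface ChatMessage {
  role: ChatRole;
  content: string;
}

// Trusted inputs: produced by code or curated by the owner.
interface TrustedContext {
  personaRules: string;
  stateInstructions: string;
  knowledgeBase: readonly string[];
  starPush: string;
}

interface PastExchange {
  recruiterEmail: string; // untrusted
  aiReply: string;
}

function buildMessages(
  ctx: TrustedContext,
  history: readonly PastExchange[],
  latestEmail: string,
): ChatMessage[] {
  // The system message is assembled from trusted parts only.
  const system = [
    ctx.personaRules,
    ctx.stateInstructions,
    ...ctx.knowledgeBase,
    ctx.starPush,
  ].join("\n\n");

  const messages: ChatMessage[] = [{ role: "system", content: system }];
  for (const turn of history) {
    messages.push({ role: "user", content: turn.recruiterEmail });
    messages.push({ role: "assistant", content: turn.aiReply });
  }
  // The latest recruiter email always goes in the user role.
  messages.push({ role: "user", content: latestEmail });
  return messages;
}
```

Role separation limits what an injected instruction can do, but it does not make the model ignore it; the response validation described earlier remains the backstop.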
