```mermaid
flowchart TD
    subgraph "Threats"
        T1["AI Discloses Identity"]
        T2["Real PII Leaked"]
        T3["Legal Threat Received"]
        T4["Conversation Escalates"]
        T5["Dashboard Exposed"]
        T6["Secrets in Code/Logs"]
    end
    subgraph "Mitigations"
        M1["AI Disclosure Detection<br/>30+ blocked terms"]
        M2["PII Pattern Blocking<br/>SSN regex + 9-digit check"]
        M3["Legal Threat Detection<br/>Auto-escalate + stop reply"]
        M4["Message Limit<br/>Escalate after 10 exchanges"]
        M5["API Key Auth<br/>Bearer token or query param"]
        M6[".env.local gitignored<br/>Secrets in Cloudflare"]
    end
    T1 --> M1
    T2 --> M2
    T3 --> M3
    T4 --> M4
    T5 --> M5
    T6 --> M6
```
Every AI-generated response passes through safety validation before sending.
```mermaid
flowchart TD
    A["AI Generated Response"] --> B{"Empty?"}
    B --> |"Yes"| FAIL["FAIL: Fall back to template"]
    B --> |"No"| C{"Contains AI<br/>disclosure terms?"}
    C --> |"Yes"| FAIL
    C --> |"No"| D{"Matches real<br/>PII patterns?"}
    D --> |"Yes"| FAIL
    D --> |"No"| E{"Agrees to phone<br/>before star?"}
    E --> |"Yes"| FAIL
    E --> |"No"| PASS["PASS: Send response"]
    FAIL --> F{"3 consecutive<br/>failures?"}
    F --> |"Yes"| G["ESCALATE: Stop auto-reply"]
    F --> |"No"| H["Use template instead"]
```
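The gate above can be condensed into a single validator. A minimal sketch, with abbreviated term lists; the function name, constants, and return shape are illustrative assumptions, not the actual Hermes source:

```typescript
// Illustrative safety gate; term lists abbreviated, names assumed.
const DISCLOSURE_TERMS = ["i am an ai", "language model", "chatgpt", "i'm a bot"];
const PII_PATTERNS = [/\d{3}-\d{2}-\d{4}/, /\d{9}/];
const PHONE_AGREEMENT = ["call you at", "i'll call", "my phone number"];

function validateResponse(text: string, starVerified: boolean): "PASS" | "FAIL" {
  const lower = text.toLowerCase();
  if (lower.trim().length === 0) return "FAIL";                        // empty response
  if (DISCLOSURE_TERMS.some((t) => lower.includes(t))) return "FAIL";  // AI disclosure
  if (PII_PATTERNS.some((re) => re.test(text))) return "FAIL";         // real-PII shape
  if (!starVerified && PHONE_AGREEMENT.some((t) => lower.includes(t)))
    return "FAIL";                                                     // phone before STAR
  return "PASS";
}
```

A FAIL here falls back to a template; three consecutive FAILs escalate, per the flowchart.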
The following terms in an AI response trigger safety failure:
| Category | Blocked Phrases |
|---|---|
| Direct disclosure | "i am an ai", "i'm an ai", "as an ai" |
| Technical terms | "language model", "large language model", "llm" |
| Product names | "chatgpt", "openai", "claude", "anthropic" |
| Generic terms | "artificial intelligence", "automated response", "automated system" |
| Bot references | "i am a bot", "i'm a bot", "i'm not a real person" |
| Meta references | "generated by", "programmed to", "i don't actually exist" |
| Pattern | Regex | What It Catches |
|---|---|---|
| Full SSN | `\d{3}-\d{2}-\d{4}` | 123-45-6789 format |
| Unformatted SSN | `\d{9}` | Nine consecutive digits |
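Note that a bare `\d{9}` matches any run of nine digits, including runs embedded in longer numbers such as a 10-digit phone number, so the check errs toward blocking. A quick demonstration:

```typescript
// \d{9} is deliberately broad: it matches nine digits anywhere in the text.
const unformattedSsn = /\d{9}/;

console.log(unformattedSsn.test("123456789"));   // true: nine consecutive digits
console.log(unformattedSsn.test("5125550142"));  // true: 9-digit run inside a 10-digit phone
console.log(unformattedSsn.test("512-555-0142")); // false: hyphens break the run
```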
If `star_verified = false`, these phrases are blocked:
- "call you at", "i'll call", "let's schedule a call"
- "available for a call", "here's my number"
- "my phone number", "call me at"
```mermaid
flowchart TD
    A["Inbound Email"] --> B{"Legal threat<br/>detected?"}
    B --> |"Yes"| ESC["ESCALATED<br/>Stop all auto-reply"]
    B --> |"No"| C{"Message count<br/>> 10?"}
    C --> |"Yes"| ESC
    C --> |"No"| D{"Sender domain<br/>in allowlist?"}
    D --> |"Yes"| ESC
    D --> |"No"| E["Continue normal processing"]
```
Legal threat keywords: lawyer, attorney, legal action, lawsuit, litigation, cease and desist, subpoena, court order, legal counsel, sue you, take legal, filing a complaint, report you, FTC, federal trade commission

Notable-company allowlist: Google, Microsoft, Apple, Amazon, Meta, Netflix, Stripe, Cloudflare, GitHub, OpenAI, Anthropic
These trigger escalation because they might be real opportunities that deserve manual attention.
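The three escalation triggers can be sketched as one check. Term lists are abbreviated, and matching sender domains against company names is a simplifying assumption; the function name is illustrative:

```typescript
// Illustrative escalation gate; lists abbreviated, names assumed.
const LEGAL_TERMS = ["lawyer", "attorney", "cease and desist", "subpoena", "ftc"];
const NOTABLE_COMPANIES = ["google", "microsoft", "stripe", "github", "anthropic"];

function shouldEscalate(body: string, senderDomain: string, messageCount: number): boolean {
  const lower = body.toLowerCase();
  if (LEGAL_TERMS.some((t) => lower.includes(t))) return true;  // legal threat: stop replying
  if (messageCount > 10) return true;                           // conversation ran too long
  const domain = senderDomain.toLowerCase();
  return NOTABLE_COMPANIES.some((c) => domain.includes(c));     // might be a real opportunity
}
```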
When recruiters ask for personal information, Hermes generates fake but deterministic data seeded by conversation ID.
| Field | Generation Method | Example |
|---|---|---|
| SSN last 4 | `conversationId * prime % 10000` | 7842 |
| Full SSN | Three seeded segments | 483-29-7842 |
| Date of Birth | Random year 1985-1995, month, day | 03/17/1991 |
| Address | Pool of 12 fake addresses | 1247 Oak Ridge Blvd |
| City/State | Pool of 12 city/state pairs | Austin, TX 78704 |
| DL Number | State prefix + 7 digits | TX-2847193 |
Why deterministic? If a recruiter asks twice, they get the same answers. Consistency prevents suspicion.
What is NEVER sent: the owner's real SSN, DOB, address, or driver's license number.
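The seeded generation above can be sketched as pure functions of the conversation ID. The prime and the address pool here are illustrative placeholders, not the values Hermes actually uses:

```typescript
// Sketch of deterministic honeypot data; prime and pool are assumptions.
const PRIME = 7919;
const FAKE_ADDRESSES = ["1247 Oak Ridge Blvd", "883 Willow Creek Dr", "502 Pinehurst Ln"];

function ssnLast4(conversationId: number): string {
  // Same conversation ID always yields the same four digits
  return String((conversationId * PRIME) % 10000).padStart(4, "0");
}

function fakeAddress(conversationId: number): string {
  return FAKE_ADDRESSES[conversationId % FAKE_ADDRESSES.length];
}
```

Because each value is a pure function of the conversation ID, a recruiter who asks twice gets identical answers with no state to store.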
| Method | Header/Param | Role | Use Case |
|---|---|---|---|
| API Key | `Authorization: Bearer KEY` or `?key=KEY` | Owner | API access, scripts |
| GitHub OAuth | Session cookie via `/auth/github` | Owner/Viewer+/Viewer | Browser access |
| Viewer Token | `?viewer=TOKEN` | Viewer | Interview sharing (4hr TTL) |
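Supporting both the header and the query-param form amounts to a small extraction helper. A minimal sketch using the standard `Request` API; the function name is an assumption:

```typescript
// Illustrative key extraction supporting both auth styles; name is an assumption.
function extractApiKey(req: Request): string | null {
  const auth = req.headers.get("Authorization");
  if (auth?.startsWith("Bearer ")) return auth.slice("Bearer ".length); // header form
  return new URL(req.url).searchParams.get("key");                      // ?key=KEY fallback
}
```

The query-param form exists for scripts and quick curl checks; the header form is preferred since URLs tend to end up in logs.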
| Role | How Assigned | Permissions |
|---|---|---|
| Owner | API key or GitHub username match | Full CRUD, admin actions, poke/sweep |
| Viewer | GitHub OAuth non-owner or viewer token | Demo-labeled conversations only, anonymized |
Viewer mode masks PII at runtime (no data duplication):
- Emails → `Recruiter #N`
- Phone numbers → `<phone-number>`
- External URLs → `<company-url>` (safe URLs preserved)
- Names stripped
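A runtime masking pass along these lines could look like the following. The regexes and the per-call counter are simplified assumptions; the real implementation presumably keeps `Recruiter #N` numbering stable per sender and exempts its safe-URL list:

```typescript
// Simplified viewer-mode masking; patterns are assumptions, not Hermes source.
function maskForViewer(text: string): string {
  let n = 0;
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, () => `Recruiter #${++n}`) // emails
    .replace(/\+?\d[\d\s().-]{8,}\d/g, "<phone-number>")             // phone numbers
    .replace(/https?:\/\/\S+/g, "<company-url>");                    // external URLs
}
```

Masking at render time, rather than storing an anonymized copy, means there is only one source of truth in D1.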
- Generated by owner in Admin tab
- UUID-based, stored hashed in D1
- 4-hour TTL, can be revoked
- Expired tokens show a "sign in with GitHub" prompt
- Uses a GitHub OAuth App (not a GitHub App)
- Scope: `read:user` (identifies the GitHub username for role assignment)
- Session stored in the D1 `sessions` table
- Cookie: `hermes_session`, HttpOnly, Secure, SameSite=Lax, 7-day TTL
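The cookie attributes listed above assemble into a `Set-Cookie` value like this (the helper name is an assumption; the attributes come from the doc):

```typescript
// Builds the Set-Cookie value with the attributes listed above; name assumed.
function sessionCookie(sessionId: string): string {
  const maxAge = 7 * 24 * 60 * 60; // 7-day TTL in seconds
  return `hermes_session=${sessionId}; HttpOnly; Secure; SameSite=Lax; Max-Age=${maxAge}; Path=/`;
}
```

HttpOnly keeps the session ID away from page scripts, and SameSite=Lax blocks it from being sent on cross-site POSTs.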
- Hex-only characters (a-f, 0-9); special characters (+, /, =) get URL-mangled
- 32 bytes minimum: `openssl rand -hex 32`
- Stored as a Cloudflare secret (encrypted at rest)
- Local copy in `.env.local` (gitignored)
| Secret | Risk If Leaked | Rotation |
|---|---|---|
| `DASHBOARD_API_KEY` | Full conversation + KB access | `wrangler secret put DASHBOARD_API_KEY` |
| `RESEND_API_KEY` | Can send email as martymcenroe.ai | Rotate in Resend dashboard |
| `GITHUB_CLIENT_ID` | OAuth app identity (low risk) | Regenerate in GitHub Developer Settings |
| `GITHUB_CLIENT_SECRET` | OAuth token exchange | Regenerate in GitHub Developer Settings |
| `AI_ROLLOUT_MODE` | Configuration only (low risk) | `wrangler secret put AI_ROLLOUT_MODE` |
```sh
wrangler secret put AI_ROLLOUT_MODE
# Type: off
```

Disable email routing rules in Cloudflare Dashboard > Email Routing.

```sh
wrangler d1 execute hermes-db --command "UPDATE conversations SET state = 'GHOSTED' WHERE id = 123"
```

The system prompt structure isolates untrusted content:
```
[SYSTEM PROMPT - hardcoded persona rules]
[STATE INSTRUCTIONS - from code, not user input]
[KNOWLEDGE BASE - curated by owner, from D1]
[STAR PUSH - from code logic]
---
[USER MESSAGE - recruiter's email body] <-- Only untrusted content
[ASSISTANT - previous AI responses]
[USER MESSAGE - latest email] <-- Only untrusted content
```
The recruiter's email is always in the user role. It is never injected into the system prompt. Knowledge base entries are curated by the owner through the dashboard.
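The layering above can be sketched as a message-array builder. Names and signature are illustrative assumptions; the point is structural: trusted content goes into the system message, and recruiter text only ever enters user-role messages:

```typescript
// Sketch of prompt assembly; recruiter email is untrusted and never reaches
// the system message. Names are assumptions, not Hermes source.
type Msg = { role: "system" | "user" | "assistant"; content: string };

function buildMessages(persona: string, stateInstructions: string, kb: string,
                       emails: string[], priorReplies: string[]): Msg[] {
  const messages: Msg[] = [
    // Trusted only: hardcoded persona, code-driven state, owner-curated KB
    { role: "system", content: [persona, stateInstructions, kb].join("\n\n") },
  ];
  emails.forEach((email, i) => {
    messages.push({ role: "user", content: email }); // untrusted, user role only
    if (priorReplies[i] !== undefined) {
      messages.push({ role: "assistant", content: priorReplies[i] });
    }
  });
  return messages;
}
```

Keeping untrusted text out of the system role is the main defense against prompt injection here: instructions embedded in a recruiter's email arrive with no more authority than any other user message.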