Cyrene implements defense-in-depth through multiple security layers:
- Autonomy Policy — risk classification and default controls
- Sandboxing — OS-level confinement to workspace boundary
- Shadow Execution — dry-run before irreversible actions
- Approval Gates — human-in-the-loop for high-stakes decisions
- Receipt Ledger — immutable, signed audit trail
- Injection Scanner — defense against prompt injection attacks
| Risk | Default | Behavior |
|---|---|---|
| Low | Auto | Executes automatically |
| Medium | Approval | Requires user approval |
| High | Blocked | Blocked until autonomy is explicitly raised |
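
The defaults in the table can be read as a simple lookup. The sketch below is illustrative only: the names `Risk`, `Decision`, and `decide` are hypothetical, and the behavior after autonomy is raised (falling back to approval rather than auto-execution) is an assumption, not something the table specifies.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Decision(Enum):
    AUTO = "auto"          # executes automatically
    APPROVAL = "approval"  # requires user approval
    BLOCKED = "blocked"    # blocked until autonomy is explicitly raised


# Defaults copied from the table above.
DEFAULT_POLICY = {
    Risk.LOW: Decision.AUTO,
    Risk.MEDIUM: Decision.APPROVAL,
    Risk.HIGH: Decision.BLOCKED,
}


def decide(risk: Risk, autonomy_raised: bool = False) -> Decision:
    """Return the default control for a risk level.

    Assumption (not from the table): once autonomy is explicitly raised,
    a high-risk action is unblocked but still goes through approval.
    """
    decision = DEFAULT_POLICY[risk]
    if decision is Decision.BLOCKED and autonomy_raised:
        return Decision.APPROVAL
    return decision
```
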
All tool execution runs within an OS-level sandbox:
- Linux: Landlock LSM
- macOS: Seatbelt/sandbox-exec
- Windows: Job Objects/restricted tokens
- Fallback: Docker container
The sandbox confines file and process access to the configured workspace boundary.
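
The real confinement is enforced by the operating system mechanisms listed above, not by application code. As a rough user-space illustration of what "inside the workspace boundary" means, a path check might look like the following; `is_within_workspace` is a hypothetical helper, not part of Cyrene.

```python
from pathlib import Path


def is_within_workspace(workspace: str, candidate: str) -> bool:
    """Return True if `candidate` resolves to a location inside `workspace`.

    Resolving both paths first collapses `..` segments and follows
    symlinks, so a path cannot escape the boundary by indirection.
    An absolute `candidate` replaces the workspace prefix when joined,
    so it is only accepted if it already points inside the workspace.
    """
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

A check like this is only a convenience for early rejection; an OS-level sandbox remains the actual enforcement layer because it also covers child processes and system calls the application never sees.
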
Before any irreversible action, Cyrene:
- Runs the plan in shadow mode (sandbox copy)
- Produces a Projected Outcome Summary
- Presents Approve / Rewrite / Abort options
- Persists pending state across restarts
- Times out and cancels the action if no response arrives
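
The approval flow above can be modeled as a small state machine whose pending state is serializable, so it survives a restart. This is a minimal sketch under assumptions: the class, field names, and JSON persistence format are hypothetical, and the shadow run that produces the summary is not shown.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REWRITE = "rewrite"
    ABORTED = "aborted"
    TIMED_OUT = "timed_out"


@dataclass
class PendingApproval:
    action_id: str
    summary: str        # the Projected Outcome Summary shown to the user
    created_at: float   # seconds since the epoch
    timeout_s: float
    state: State = State.PENDING

    def resolve(self, choice: Optional[State], now: float) -> State:
        """Apply the user's choice, or cancel if the deadline has passed."""
        if self.state is not State.PENDING:
            return self.state  # already decided; later input is ignored
        if now - self.created_at >= self.timeout_s:
            self.state = State.TIMED_OUT
        elif choice in (State.APPROVED, State.REWRITE, State.ABORTED):
            self.state = choice
        return self.state

    def to_json(self) -> str:
        """Serialize so the pending request can be written to disk."""
        data = asdict(self)
        data["state"] = self.state.value
        return json.dumps(data)

    @classmethod
    def from_json(cls, raw: str) -> "PendingApproval":
        """Restore a request that was persisted before a restart."""
        data = json.loads(raw)
        data["state"] = State(data["state"])
        return cls(**data)
```

Checking the deadline before the choice means a late approval is treated as a timeout, which keeps the failure mode on the safe side.
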
Every action produces a signed, hash-chained receipt:
- SHA-256 hash chain detects any tampering
- Ed25519 signatures verify authenticity
- Append-only — no deletions or reordering
- Verifiable: `verify()` walks the chain and reports the first divergence
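
To make the hash-chain idea concrete, here is a minimal sketch of an append-only ledger with a `verify()` that returns the index of the first divergent receipt. The receipt layout, the `GENESIS` value, and the `append` helper are assumptions for illustration. Ed25519 signing is deliberately left out because it needs a third-party library; in a full design each receipt's hash would also be signed.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder for the first receipt's prev_hash


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[dict], payload: dict) -> dict:
    """Add a receipt that commits to the hash of the one before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    receipt = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(receipt)
    return receipt


def verify(chain: List[dict]) -> Optional[int]:
    """Return the index of the first divergent receipt, or None if intact."""
    prev_hash = GENESIS
    for index, receipt in enumerate(chain):
        expected = _digest(prev_hash, receipt["payload"])
        if receipt["prev_hash"] != prev_hash or receipt["hash"] != expected:
            return index
        prev_hash = receipt["hash"]
    return None
```

Because every receipt commits to its predecessor, editing a payload, deleting a receipt, or reordering two receipts all break the link at the point of the change, which is what lets verification report where the chain first diverges.
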
All untrusted content (web pages, tool output, external messages) passes through:
- Heuristic pattern matching for known injection techniques
- Role-switch detection
- Exfiltration pattern detection
- Quarantine and logging on detection
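
A heuristic scanner of this kind is often built from a small set of named patterns. The sketch below shows the general shape only: the pattern names, the regular expressions, and the `ScanResult` type are hypothetical and far smaller than a production rule set, and logging is omitted.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Illustrative rules only; a real scanner would carry many more.
PATTERNS = {
    "instruction_override": re.compile(
        r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I
    ),
    # A line that pretends to open a new conversational role.
    "role_switch": re.compile(
        r"^\s*(system|assistant|developer)\s*:", re.I | re.M
    ),
    # A transfer verb followed closely by a credential-like noun.
    "exfiltration": re.compile(
        r"(send|post|upload|forward)\b.{0,40}\b(api[_ -]?key|password|token|secret)",
        re.I,
    ),
}


@dataclass
class ScanResult:
    quarantined: bool
    findings: List[str] = field(default_factory=list)


def scan(text: str) -> ScanResult:
    """Flag untrusted text that matches any known injection pattern."""
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    return ScanResult(quarantined=bool(findings), findings=findings)
```

Pattern matching like this catches known phrasings but not novel ones, which is one reason it sits alongside the other layers rather than replacing them.
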
Cyrene's knowledge graph is guarded by two independent trust boundaries, so a poisoned web page or a hijacked session can never quietly rewrite what Cyrene remembers:
- Untrusted content is scanned before it can be stored. Anything that does not originate with the user (a fetched web page, tool output, an inbound message) passes through the Injection Scanner before it reaches memory and is neutralized on recall, so it cannot resurface later as a smuggled instruction.
- Memory is owned by the authenticated user. Reads and writes are scoped to the owner, so a spoofed or hijacked session is refused at the write. Only the owner can read or rewrite their facts and user model.
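
The second boundary, owner-scoped memory, amounts to checking the authenticated identity on every read and write. A minimal sketch follows, assuming a hypothetical in-memory `MemoryStore` and an `OwnershipError` type; how the session user is authenticated is out of scope here, and the scanning step from the first boundary is not shown.

```python
from typing import Dict


class OwnershipError(PermissionError):
    """Raised when a session tries to touch memory it does not own."""


class MemoryStore:
    def __init__(self) -> None:
        # owner -> {fact key -> fact value}
        self._facts: Dict[str, Dict[str, str]] = {}

    def _check(self, session_user: str, owner: str) -> None:
        if session_user != owner:
            raise OwnershipError(f"{session_user!r} may not access memory of {owner!r}")

    def write(self, session_user: str, owner: str, key: str, value: str) -> None:
        """Store a fact, refusing the write if the session is not the owner."""
        self._check(session_user, owner)
        self._facts.setdefault(owner, {})[key] = value

    def read(self, session_user: str, owner: str, key: str) -> str:
        """Return a fact, refusing the read if the session is not the owner."""
        self._check(session_user, owner)
        return self._facts[owner][key]
```

Routing both operations through the same check keeps the rule in one place, so a new access path cannot skip it by accident.
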
- Never raise autonomy without reviewing the implications
- Keep the command allowlist minimal
- Regularly run `cyrene doctor` to check security posture
- Review the receipt ledger periodically for unexpected actions