v1.35.0 — Tiered Logging, Crash-Safe Vault, Runtime LLM Control
Tiered Logging, Vault Crash-Safety, Runtime LLM Control, and Security Hardening: deterministic log retention, single-file atomic vault format, operator-controlled backend selection, and a deep sweep of defense-in-depth fixes across dashboard, CLI, SQL, Telegram, and brain.
Highlights
- Vault single-file v2 format (`ALSv2` magic + 16-byte random salt + Fernet token) with embedded random salt and crash-safe atomic migration: zero corrupt-state window between salt and blob writes
- Tiered structured logging with deterministic per-tier retention (long ~30d, short ~6h), hourly cron prune sweep, and unified `*_HOURS` env contract
- Runtime LLM operator control: flip `cli` ↔ `api` backend per session via dashboard or `POST /api/operator/llm` without restart
- Telegram + CLI fail-closed guard: programming tasks on the CLI backend in sandbox-only mode return a deterministic operator-friendly message instead of hanging on an unreachable Claude Code permission prompt
- Headless CLI auto-approve (`AGENT_CLI_AUTO_APPROVE` env var, default detect TTY): agents running as systemd/Docker daemons no longer hang on permission prompts
- mypy 147 errors → 0 across 112 source files (full type safety)
Added
- `agent/logs/retention.py`: `LogRetentionManager` with deterministic `(level, event) → tier` resolver
- `agent/logs/logger.py::setup_tiered_logging`: `_TierRouter` stdlib handler routing each structlog event to the right file sink
- `agent/control/llm_runtime.py`: persistent operator override for LLM backend/provider
- Anti-echo work-queue detector preventing pasted agent suggestions from spawning duplicate jobs
- Per-transaction `asyncio.Lock` in finance tracker against concurrent approve races
- Telegram in-flight task tracking with strong references (no GC mid-execution)
- Nonce cache age-based eviction so replay-protection state cannot grow unbounded
- CI release-readiness skip env (`AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1`)
- `docs/SETUP_LOCAL.md` operator setup guide
- `docs/SECURITY_INCIDENT_2026-04-07.md` post-mortem of credential leak via local conversation logs
- 27+ new regression tests (vault, finance race, telegram cleanup, log retention, brain conversation)
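
The `(level, event) → tier` resolver listed above can be pictured roughly as follows. This is a minimal sketch, not the code in `agent/logs/retention.py`: the function name `resolve_tier`, the rule that warnings and above go to the long tier, and the event names are all assumptions made for illustration.

```python
import logging

# Tier names follow the "long" / "short" wording in these notes.
LONG, SHORT = "long", "short"

# Hypothetical events that are kept long-term regardless of level.
LONG_LIVED_EVENTS = frozenset({"vault.migrated", "operator.llm_override"})


def resolve_tier(level: int, event: str) -> str:
    """Map a (level, event) pair to a retention tier.

    The mapping is a pure function of its inputs, so the same pair
    always lands in the same tier.
    """
    # Assumption: warnings and above are always worth keeping long-term.
    if level >= logging.WARNING:
        return LONG
    # Assumption: selected audit-style events are long-lived even at INFO.
    if event in LONG_LIVED_EVENTS:
        return LONG
    return SHORT
```

Because the resolver has no hidden state, a logging handler and a prune sweep can both call it and agree on where an event belongs.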
Changed
- Vault on-disk format is now v2 single-file. Existing v1 vaults migrate automatically on first open.
- Vault wrong-key writes now fail fast with `VaultDecryptionError`
- `AgentBrain` reads effective LLM backend through `resolve_llm_runtime_state()` so operator overrides actually flip execution path
- Short follow-ups (`simple`/`factual`/`greeting` task types) now inject conversation context, so a one-word reply like "ano" no longer arrives at the model with no history
- LLM provider cache key now includes kwargs (separate instances per `base_url`/`api_key`)
- `agent/build/storage.py` and `agent/review/storage.py` `_ensure_text_column` validate identifiers against allow-list + regex with default literal escaping
- Dashboard XSS escapes for note/updated_by/warnings/settlement_id, Bearer token only (no `?key=` query string fallback)
- Invalid JSON on operator HTTP endpoints returns 400 instead of silently treating body as `{}`
- `setup_tiered_logging` now takes `long_retention_hours` (unified contract with `LogRetentionManager`)
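
As an illustration of the allow-list + regex + literal-escaping approach described for `_ensure_text_column`, here is a hedged sketch against SQLite. The function name `ensure_text_column`, the table and column names, and the exact regex are assumptions; the real helpers in `agent/build/storage.py` and `agent/review/storage.py` may be structured differently.

```python
import re
import sqlite3

# Hypothetical allow-lists; the real table and column names are not in these notes.
ALLOWED_TABLES = frozenset({"builds", "reviews"})
ALLOWED_COLUMNS = frozenset({"note", "updated_by"})
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ensure_text_column(conn: sqlite3.Connection, table: str, column: str,
                       default: str = "") -> None:
    """Add a TEXT column if it is missing, rejecting unknown identifiers."""
    for ident, allowed in ((table, ALLOWED_TABLES), (column, ALLOWED_COLUMNS)):
        # Allow-list first; the regex is a second, independent check.
        if ident not in allowed or not _IDENT_RE.fullmatch(ident):
            raise ValueError(f"identifier not allowed: {ident!r}")
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if column in existing:
        return
    # DDL cannot use bound parameters, so the default literal is escaped
    # by doubling single quotes.
    escaped = default.replace("'", "''")
    conn.execute(
        f"""ALTER TABLE "{table}" ADD COLUMN "{column}" TEXT DEFAULT '{escaped}'"""
    )
```

Identifiers are only interpolated into the statement after both checks pass, so a value such as `note; DROP TABLE builds` is rejected before any SQL is built.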
Fixed
- Headless CLI permission prompt hang (daemon mode)
- Wrong-key vault writes silently destroying the legacy encrypted blob (Codex finding, HIGH)
- v1→v2 vault migration crash window between `salt.bin` write and `os.replace` (Codex finding, MED), eliminated by single-file format
- Tiered logging factory routing: events now actually reach the file sinks
- `AgentBrain` was reading raw `os.environ["LLM_BACKEND"]` (operator overrides ignored)
- Cron prune sweep was scanning a different directory than `__main__` wrote to
- Short follow-ups losing conversation history
- Multi-task work-queue detector spawning duplicate jobs from echoed agent suggestions
Security
- All SQL DDL paths in build + review layers use whitelist + identifier validation + escape
- Dashboard authentication is Bearer-token only
- Operator HTTP endpoints reject invalid JSON with 400
- Vault writes with wrong master key fail-fast
- Vault writes are atomic and crash-safe (single-file v2 format)
- Telegram in-flight task tracking prevents mid-execution GC
- Finance transaction approval race protected by per-tx `asyncio.Lock`
- Request nonce cache has bounded lifetime
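
To make the bounded-lifetime nonce cache concrete, the sketch below shows one way age-based eviction can work. The class name `NonceCache`, the 300-second default, and the injectable clock are assumptions for illustration and are not taken from the agent's code.

```python
import time
from collections import OrderedDict
from typing import Callable


class NonceCache:
    """Remember recently seen nonces and forget them after max_age_seconds."""

    def __init__(self, max_age_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age_seconds
        self._clock = clock
        # Insertion order equals time order because the clock is monotonic.
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Drop entries from the oldest end until one is still within the window.
        while self._seen:
            _, stored_at = next(iter(self._seen.items()))
            if now - stored_at <= self._max_age:
                break
            self._seen.popitem(last=False)

    def check_and_store(self, nonce: str) -> bool:
        """Return True for a fresh nonce, False for a replay."""
        now = self._clock()
        self._evict_expired(now)
        if nonce in self._seen:
            return False
        self._seen[nonce] = now
        return True

    def __len__(self) -> int:
        return len(self._seen)
```

Eviction only stays safe if requests older than the same window are rejected on their timestamp; otherwise a nonce could be replayed once it has aged out of the cache.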
Deprecations
- `AGENT_LOG_LONG_RETENTION_DAYS` is deprecated in favor of `AGENT_LOG_LONG_RETENTION_HOURS`. Both still work; setting only the legacy DAYS variable emits a deprecation warning and internally promotes to hours so the cron prune sweep agrees.
Migration Notes
Vault migration is automatic. When the agent boots on v1.35.0 with an existing vault, it detects the v1 format on first read, decrypts it (using `salt.bin` if post-1.34, or the static legacy salt if pre-1.34), re-encrypts with a fresh random salt in the new v2 format, and removes `salt.bin`. No operator action required.
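
For readers curious why a single file closes the crash window, here is a rough sketch of the container layout and an atomic write. It assumes the fields are simply concatenated in the order given in these notes (`ALSv2` magic, 16-byte salt, Fernet token) with no length prefixes; the helper names are hypothetical, and the token is treated as opaque bytes rather than produced by a real Fernet call.

```python
import contextlib
import os
import tempfile
from typing import Tuple

MAGIC = b"ALSv2"   # magic string named in these notes
SALT_LEN = 16      # 16-byte random salt, per these notes


def pack_vault_v2(salt: bytes, token: bytes) -> bytes:
    """Concatenate magic + salt + token (assumed layout)."""
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be 16 bytes")
    return MAGIC + salt + token


def unpack_vault_v2(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a v2 blob back into (salt, token)."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a v2 vault")
    body = blob[len(MAGIC):]
    if len(body) <= SALT_LEN:
        raise ValueError("truncated vault")
    return body[:SALT_LEN], body[SALT_LEN:]


def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".vault-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```

Since salt and ciphertext travel in one blob and land with one rename, a crash leaves either the old file or the new one, never a salt that does not match its blob.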
For headless deployments (systemd / Docker / nohup), add to `.env`:
```
AGENT_CLI_AUTO_APPROVE=1
```
If you omit it, the agent auto-detects TTY (also works for daemon mode).
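
The fallback described above might look something like the sketch below. The function name `cli_auto_approve`, the accepted truthy spellings, and the choice to probe stdin are assumptions; these notes only document `AGENT_CLI_AUTO_APPROVE=1` and TTY auto-detection.

```python
import os
import sys
from typing import Mapping, Optional


def cli_auto_approve(env: Optional[Mapping[str, str]] = None,
                     stdin_is_tty: Optional[bool] = None) -> bool:
    """Decide whether CLI permission prompts should be auto-approved."""
    env = os.environ if env is None else env
    raw = env.get("AGENT_CLI_AUTO_APPROVE")
    if raw is not None and raw.strip():
        # Explicit setting wins. Spellings other than "1" are an assumption.
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if stdin_is_tty is None:
        # sys.stdin can be None under some daemon launchers.
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()
    # No terminal attached means nobody can answer a prompt.
    return not stdin_is_tty
```

Setting the variable explicitly is still the more predictable choice for systemd and Docker units, since TTY detection depends on how the process was launched.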
For Telegram + CLI backend + sandbox mode: programming tasks now return a deterministic operator message instead of hanging. Two unblock paths: `POST /api/operator/llm` to switch to the API backend, or set `AGENT_SANDBOX_ONLY=0` for explicit host opt-in.
Log retention env: if you have `AGENT_LOG_LONG_RETENTION_DAYS=30`, switch to `AGENT_LOG_LONG_RETENTION_HOURS=720` (30 × 24).
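
The DAYS-to-HOURS promotion can be sketched as below. The function name and the 720-hour default are assumptions (the default mirrors the "long ~30d" figure in these notes), as is the rule that the HOURS variable wins when both are set.

```python
import warnings
from typing import Mapping

DEFAULT_LONG_RETENTION_HOURS = 30 * 24  # assumed default, from "long ~30d"


def resolve_long_retention_hours(env: Mapping[str, str]) -> int:
    """Return long-tier retention in hours, honoring the legacy DAYS variable."""
    hours = env.get("AGENT_LOG_LONG_RETENTION_HOURS")
    if hours:
        # Assumption: the new variable takes precedence when both are set.
        return int(hours)
    days = env.get("AGENT_LOG_LONG_RETENTION_DAYS")
    if days:
        warnings.warn(
            "AGENT_LOG_LONG_RETENTION_DAYS is deprecated; "
            "use AGENT_LOG_LONG_RETENTION_HOURS",
            DeprecationWarning,
            stacklevel=2,
        )
        # Promote days to hours so every consumer sees the same unit.
        return int(days) * 24
    return DEFAULT_LONG_RETENTION_HOURS
```

With `AGENT_LOG_LONG_RETENTION_DAYS=30` this resolves to 720 hours, matching the conversion given above.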
Tests
- 1762 passed, 4 skipped, 0 failures
- 129 security audit tests
- 27+ new regression tests in this release
Code Quality
- mypy: 147 errors → 0 across 112 source files
- ruff: 0 errors
Full changelog: CHANGELOG.md#1350--2026-04-08
🤖 Generated with Claude Code