Release v1.35.0 — Tiered Logging, Crash-Safe Vault, Runtime LLM Control · B2JK-Industry/Agent_Life_Space

Tiered Logging, Vault Crash-Safety, Runtime LLM Control, and Security Hardening: deterministic log retention, a single-file atomic vault format, operator-controlled backend selection, and a deep sweep of defense-in-depth fixes across the dashboard, CLI, SQL, Telegram, and brain layers.

Highlights

Vault single-file v2 format (ALSv2 magic + 16-byte random salt + Fernet token) with crash-safe atomic migration: the salt lives in the same file as the encrypted blob, so there is no corrupt-state window between salt and blob writes
Tiered structured logging with deterministic per-tier retention (long ~30d, short ~6h), hourly cron prune sweep, and unified *_HOURS env contract
Runtime LLM operator control — flip cli ↔ api backend per session via dashboard or POST /api/operator/llm without restart
Telegram + CLI fail-closed guard — programming tasks on the CLI backend in sandbox-only mode return a deterministic operator-friendly message instead of hanging on an unreachable Claude Code permission prompt
Headless CLI auto-approve (AGENT_CLI_AUTO_APPROVE env var; defaults to TTY detection), so agents running as systemd/Docker daemons no longer hang on permission prompts
mypy 147 errors → 0 across 112 source files (full type safety)

Added

agent/logs/retention.py — LogRetentionManager with deterministic (level, event) → tier resolver
agent/logs/logger.py::setup_tiered_logging — _TierRouter stdlib handler routing each structlog event to the right file sink
agent/control/llm_runtime.py — persistent operator override for LLM backend/provider
Anti-echo work-queue detector preventing pasted agent suggestions from spawning duplicate jobs
Per-transaction asyncio.Lock in finance tracker against concurrent approve races
Telegram in-flight task tracking with strong references (no GC mid-execution)
Nonce cache age-based eviction so replay-protection state cannot grow unbounded
CI release-readiness skip env (AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1)
docs/SETUP_LOCAL.md operator setup guide
docs/SECURITY_INCIDENT_2026-04-07.md post-mortem of credential leak via local conversation logs
27+ new regression tests (vault, finance race, telegram cleanup, log retention, brain conversation)
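
A deterministic (level, event) → tier resolver of the kind described for LogRetentionManager could be sketched as follows. The tier names `long` and `short` come from the highlights above; `resolve_tier`, `LONG_LEVELS`, `LONG_EVENTS`, and the event names are hypothetical and do not reflect the actual API.

```python
# Hypothetical sketch, not the actual LogRetentionManager implementation.
LONG_LEVELS = {"warning", "error", "critical"}  # assumption: severe levels are kept longer
LONG_EVENTS = {"vault_migrated", "operator_override"}  # assumed event names


def resolve_tier(level: str, event: str) -> str:
    """Map a (level, event) pair to a retention tier.

    The result depends only on the two inputs, never on time or mutable
    state, so the same event always lands in the same file sink.
    """
    if level.lower() in LONG_LEVELS or event in LONG_EVENTS:
        return "long"
    return "short"
```

Keeping the resolver a pure function means the router that writes events and the prune sweep that deletes them can share one source of truth for tier membership.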

Changed

Vault on-disk format is now v2 single-file. Existing v1 vaults migrate automatically on first open.
Vault wrong-key writes now fail fast with VaultDecryptionError
AgentBrain reads effective LLM backend through resolve_llm_runtime_state() so operator overrides actually flip execution path
Short follow-ups (simple / factual / greeting task types) now inject conversation context, so a one-word reply like "ano" ("yes") no longer arrives at the model with no history
LLM provider cache key now includes kwargs (separate instances per base_url / api_key)
agent/build/storage.py and agent/review/storage.py: _ensure_text_column now validates identifiers against an allow-list and a regex, and escapes the default literal
Dashboard escapes note/updated_by/warnings/settlement_id against XSS and accepts the Bearer token only (no ?key= query-string fallback)
Invalid JSON on operator HTTP endpoints returns 400 instead of silently treating body as {}
setup_tiered_logging now takes long_retention_hours (unified contract with LogRetentionManager)
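
The allow-list + regex + literal-escaping approach described for _ensure_text_column can be illustrated with a minimal sketch. It assumes a SQLite backend, which these notes do not confirm; `ALLOWED_TABLES`, the table names, and the function signature are hypothetical.

```python
import re
import sqlite3

# Assumptions: SQLite storage, and these table names. Neither is confirmed by the release notes.
ALLOWED_TABLES = {"build_jobs", "review_jobs"}
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def ensure_text_column(conn: sqlite3.Connection, table: str, column: str, default: str = "") -> None:
    """Add a TEXT column if it is missing, refusing any unvalidated identifier."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not in allow-list: {table!r}")
    if not _IDENT_RE.fullmatch(column):
        raise ValueError(f"invalid column identifier: {column!r}")
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if column in existing:
        return
    # DDL cannot take bound parameters, so the default literal is escaped by doubling quotes.
    escaped = default.replace("'", "''")
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" TEXT DEFAULT \'{escaped}\'')
```

Identifiers are validated rather than escaped because DDL statements do not accept placeholders; only the default value, which is data, goes through quote doubling.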

Fixed

Headless CLI permission prompt hang (daemon mode)
Wrong-key vault writes silently destroying the legacy encrypted blob (Codex finding, HIGH)
v1→v2 vault migration crash window between salt.bin write and os.replace (Codex finding, MED) — eliminated by single-file format
Tiered logging factory routing — events now actually reach the file sinks
AgentBrain was reading raw os.environ["LLM_BACKEND"] (operator overrides ignored)
Cron prune sweep was scanning a different directory than __main__ wrote to
Short follow-ups losing conversation history
Multi-task work-queue detector spawning duplicate jobs from echoed agent suggestions

Security

All SQL DDL paths in build + review layers use whitelist + identifier validation + escape
Dashboard authentication is Bearer-token only
Operator HTTP endpoints reject invalid JSON with 400
Vault writes with the wrong master key fail fast
Vault writes are atomic and crash-safe (single-file v2 format)
Telegram in-flight task tracking prevents mid-execution GC
Finance transaction approval race protected by per-tx asyncio.Lock
Request nonce cache has bounded lifetime
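
Age-based nonce eviction, as listed above, might look like the following sketch. The class name, method names, and the 300-second default are assumptions made for illustration, not the project's actual code.

```python
import time


class NonceCache:
    """Hypothetical replay-protection cache with age-based eviction."""

    def __init__(self, max_age_seconds: float = 300.0, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock
        # nonce -> first-seen time; dicts keep insertion order, so the oldest entry is first
        self._seen: dict[str, float] = {}

    def _evict_expired(self, now: float) -> None:
        while self._seen:
            oldest = next(iter(self._seen))
            if now - self._seen[oldest] <= self._max_age:
                break
            del self._seen[oldest]

    def check_and_store(self, nonce: str) -> bool:
        """Return True for a fresh nonce, False for a replay within the age window."""
        now = self._clock()
        self._evict_expired(now)
        if nonce in self._seen:
            return False
        self._seen[nonce] = now
        return True

    def __len__(self) -> int:
        return len(self._seen)
```

Eviction of this kind is only safe when requests older than the age window are already rejected on their timestamp; otherwise an evicted nonce could be replayed.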

Deprecations

AGENT_LOG_LONG_RETENTION_DAYS is deprecated in favor of AGENT_LOG_LONG_RETENTION_HOURS. Both still work; setting only the legacy DAYS variable emits a deprecation warning and internally promotes to hours so the cron prune sweep agrees.
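
The promotion from the legacy DAYS variable to hours could be sketched as below. The function name, the 720-hour fallback (taken from the "~30d" figure), and the rule that HOURS wins when both variables are set are assumptions.

```python
import warnings


def resolve_long_retention_hours(env: dict, default_hours: int = 720) -> int:
    """Resolve long-tier retention in hours from an environment mapping."""
    hours = env.get("AGENT_LOG_LONG_RETENTION_HOURS")
    if hours is not None:
        return int(hours)  # assumption: the new variable takes precedence
    days = env.get("AGENT_LOG_LONG_RETENTION_DAYS")
    if days is not None:
        warnings.warn(
            "AGENT_LOG_LONG_RETENTION_DAYS is deprecated; "
            "use AGENT_LOG_LONG_RETENTION_HOURS",
            DeprecationWarning,
            stacklevel=2,
        )
        return int(days) * 24  # promote days to hours so every consumer sees one unit
    return default_hours
```

For example, passing `{"AGENT_LOG_LONG_RETENTION_DAYS": "30"}` yields 720 and emits one deprecation warning.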

Migration Notes

Vault migration is automatic. When the agent boots on v1.35.0 with an existing vault, it detects the v1 format on first read, decrypts it (using salt.bin if post-1.34, or the static legacy salt if pre-1.34), re-encrypts with a fresh random salt in the new v2 format, and removes salt.bin. No operator action required.
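
To illustrate why a single file removes the crash window, here is a minimal sketch of the v2 layout and an atomic write. The exact byte layout (magic bytes followed directly by the salt and then the token) is inferred from the highlights and may differ from the real format; the function names are hypothetical, and key derivation and Fernet encryption are omitted, with the token treated as opaque bytes.

```python
import os
import tempfile

MAGIC = b"ALSv2"  # assumed literal magic bytes
SALT_LEN = 16


def pack_vault(salt: bytes, token: bytes) -> bytes:
    """Concatenate magic, salt, and encrypted token into one blob."""
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be 16 bytes")
    return MAGIC + salt + token


def unpack_vault(blob: bytes) -> tuple[bytes, bytes]:
    """Split a v2 blob back into (salt, token)."""
    if not blob.startswith(MAGIC) or len(blob) < len(MAGIC) + SALT_LEN:
        raise ValueError("not a v2 vault")
    body = blob[len(MAGIC):]
    return body[:SALT_LEN], body[SALT_LEN:]


def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".vault-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Because the salt and token travel in one blob and land with one `os.replace`, a crash leaves either the complete old vault or the complete new one on disk.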

For headless deployments (systemd / Docker / nohup), add to `.env`:
```
AGENT_CLI_AUTO_APPROVE=1
```
If you omit it, the agent falls back to TTY detection, which also covers daemon mode.
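
The decision between the explicit variable and TTY detection might be sketched like this. Only the value `1` is documented above; the other accepted spellings, the function name, and the choice of stdin for detection are assumptions.

```python
import os
import sys

_TRUE = {"1", "true", "yes", "on"}    # only "1" is documented; the rest are assumed
_FALSE = {"0", "false", "no", "off"}  # assumed opt-out spellings


def should_auto_approve(env=None, stdin_is_tty=None) -> bool:
    """Explicit env value wins; otherwise auto-approve only when no TTY is attached."""
    env = os.environ if env is None else env
    raw = env.get("AGENT_CLI_AUTO_APPROVE", "").strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()
    return not stdin_is_tty
```

Under systemd or Docker without an allocated terminal, stdin is not a TTY, so this sketch approves automatically; in an interactive shell it leaves the prompt in place.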

For Telegram + CLI backend + sandbox mode: programming tasks now return a deterministic operator message instead of hanging. Two unblock paths: `POST /api/operator/llm` to switch to the API backend, or set `AGENT_SANDBOX_ONLY=0` for explicit host opt-in.
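
Switching to the API backend over HTTP could look roughly like the sketch below. Only the `POST /api/operator/llm` route comes from these notes; the JSON field name `backend`, the use of a Bearer token on this endpoint, and the base URL are assumptions, so check the endpoint's actual contract before relying on it.

```python
import json
import urllib.request


def build_llm_switch_request(base_url: str, token: str, backend: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks the agent to switch LLM backend."""
    if backend not in ("cli", "api"):
        raise ValueError("backend must be 'cli' or 'api'")
    body = json.dumps({"backend": backend}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/operator/llm",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed to match dashboard auth
            "Content-Type": "application/json",
        },
    )


# Usage (hypothetical host and token):
# req = build_llm_switch_request("http://localhost:8080", "YOUR_TOKEN", "api")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

The body must be valid JSON, since operator endpoints now answer malformed bodies with 400.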

Log retention env: if you have `AGENT_LOG_LONG_RETENTION_DAYS=30`, switch to `AGENT_LOG_LONG_RETENTION_HOURS=720` (30 × 24).

Tests

1762 passed, 4 skipped, 0 failures
129 security audit tests
27+ new regression tests in this release

Code Quality

mypy: 147 errors → 0 across 112 source files
ruff: 0 errors

Full changelog: CHANGELOG.md#1350--2026-04-08

🤖 Generated with Claude Code
