
Releases: B2JK-Industry/Agent_Life_Space

v1.36.0 — Practical Memory + Self-Restart + Paraphrase Intents

09 Apr 22:48
e115a0f


Telegram practical memory, end-to-end self-update + restart, and a comprehensive paraphrase-aware intent layer.

Closes four operator complaints from a real production session: the agent did not remember the previous message after a fast-path reply; "stiahni si novu verziu a nasad to" ("download the new version and deploy it") fell through to a 180s timeout; raw `Chyba: {"type":"result","subtype":"error_max_turns",...}` JSON (Chyba is Slovak for "Error") leaked into the chat; and the operator wanted to update + restart from Telegram without SSH-ing into the server.

Highlights

End-to-end Telegram self-update + restart

A single owner-only Telegram message (stiahni si novu verziu a nasad to, i.e. "download the new version and deploy it"; update yourself; git pull a reštart; or any of ~30 free-form variants) now performs git fetch → git pull --ff-only → graceful drain → os._exit(0). systemd, supervisord, or Docker then brings up a fresh process with the new code.
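A minimal sketch of that sequence, assuming git is on PATH and the working directory is the checkout. `run_self_update`, its injected `run` / `drain` / `exit_fn` callables, and the status strings are hypothetical, not the project's API; injecting them lets the flow be exercised without a real repository or a real exit.

```python
import os
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run a git subcommand in the current directory and return its exit code."""
    return subprocess.run(["git", *args], check=False).returncode


def run_self_update(
    run: Runner = run_git,
    drain: Callable[[], None] = lambda: None,
    exit_fn: Callable[[int], None] = os._exit,
) -> str:
    """Hypothetical fetch -> fast-forward pull -> drain -> exit sequence."""
    if run(["fetch"]) != 0:
        return "fetch_failed"
    # --ff-only refuses to create a merge commit, so a diverged checkout stops here.
    if run(["pull", "--ff-only"]) != 0:
        return "pull_refused"
    drain()  # let in-flight replies finish before the process goes away
    exit_fn(0)  # the supervisor is expected to start a fresh process with the new code
    return "exited"  # only reachable when exit_fn is a test double
```

Because `--ff-only` never merges, a server checkout with local commits ends at `pull_refused` instead of silently producing a merge commit.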

Opt-in via AGENT_SELF_RESTART_AFTER_UPDATE=1 and gated on a detected supervisor: INVOCATION_ID (set by systemd), the AGENT_PROCESS_SUPERVISOR operator override, SUPERVISOR_ENABLED (supervisord), or container environment variables / /.dockerenv (Docker, Kubernetes). The agent refuses to kill itself in an unsupervised environment, because nothing would bring it back.
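The supervisor gate could look roughly like this. The environment variables are the ones listed above; using KUBERNETES_SERVICE_HOST as the "container env" signal is an assumption, and `detect_supervisor` / `may_self_restart` are hypothetical names.

```python
import os
from typing import Callable, Mapping, Optional


def detect_supervisor(
    env: Mapping[str, str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the name of a detected process supervisor, or None."""
    override = env.get("AGENT_PROCESS_SUPERVISOR", "").strip()
    if override:
        return override  # explicit operator override wins
    if env.get("INVOCATION_ID"):
        return "systemd"  # systemd sets INVOCATION_ID for the units it starts
    if env.get("SUPERVISOR_ENABLED"):
        return "supervisord"  # supervisord sets this for its child processes
    if env.get("KUBERNETES_SERVICE_HOST") or path_exists("/.dockerenv"):
        return "container"
    return None


def may_self_restart(
    env: Mapping[str, str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Self-restart needs both the opt-in flag and a supervisor to restart the process."""
    if env.get("AGENT_SELF_RESTART_AFTER_UPDATE") != "1":
        return False
    return detect_supervisor(env, path_exists) is not None
```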

Setup: copy deploy/agent-life-space.service to /etc/systemd/system/, edit user/group/paths (4 lines), systemctl daemon-reload && systemctl enable --now agent-life-space. Full operator handbook in docs/SELF_UPDATE.md.

Practical conversation memory

Every reply path (16 deterministic intents, dispatcher, semantic cache, RAG direct hit, work-queue acknowledgement, deny-guard, main LLM) now writes through one AgentBrain._finalize_reply() so the in-RAM tail and the persistent SQLite store stay in sync. The first message after a process restart hydrates the in-RAM tail from SQLite. This fixes the production complaint that a "hi" → intent reply → "what did I just say?" sequence had zero history when the second message reached the LLM.

In-RAM tail bumped 10 → 20 turns. SQL ordering bug fixed (ORDER BY id DESC instead of timestamp DESC).
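A sketch of the write-through idea, assuming a single `turns` table (the class and schema are hypothetical; the real path is AgentBrain._finalize_reply()). Both turns of an exchange are written with the same timestamp on purpose: among tied timestamps, ORDER BY timestamp DESC has no defined order, while the autoincrement id preserves insertion order.

```python
import sqlite3
from collections import deque
from typing import Deque, Tuple

Turn = Tuple[str, str]  # (role, text)


class ChatMemory:
    """Hypothetical write-through store: in-RAM tail plus SQLite, kept in sync."""

    def __init__(self, conn: sqlite3.Connection, tail_size: int = 20) -> None:
        self.conn = conn
        self.tail_size = tail_size
        self.tail: Deque[Turn] = deque(maxlen=tail_size)
        self.hydrated = False
        conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT, text TEXT, ts REAL)"
        )

    def hydrate(self) -> None:
        """Load the newest turns once after a restart, ordered by id, not timestamp."""
        if self.hydrated:
            return
        rows = self.conn.execute(
            "SELECT role, text FROM turns ORDER BY id DESC LIMIT ?", (self.tail_size,)
        ).fetchall()
        self.tail.extend(reversed(rows))  # rows arrive newest-first
        self.hydrated = True

    def finalize_reply(self, user_text: str, reply: str, ts: float) -> str:
        """Single exit point that every reply path goes through."""
        self.hydrate()
        for role, text in (("user", user_text), ("assistant", reply)):
            self.conn.execute(
                "INSERT INTO turns (role, text, ts) VALUES (?, ?, ?)", (role, text, ts)
            )
            self.tail.append((role, text))
        self.conn.commit()
        return reply
```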

Per-chat reply ordering

TelegramBot._chat_locks keeps the reply ordering deterministic inside any single chat while leaving different chats concurrent. Two messages from the same chat can no longer overtake each other.
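A per-chat lock map in the spirit of TelegramBot._chat_locks might look like this (hypothetical `ChatSerializer`; the bot's actual structure may differ). In the demo the first chat-1 message is slow, yet the second chat-1 message still waits for it, while chat 2 is not blocked and finishes first.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, List


class ChatSerializer:
    """One asyncio.Lock per chat_id: same-chat handlers run one at a time."""

    def __init__(self) -> None:
        self._locks: DefaultDict[int, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, chat_id: int, handler: Callable[[], Awaitable[None]]) -> None:
        # asyncio.Lock wakes waiters in FIFO order, so arrival order is preserved.
        async with self._locks[chat_id]:
            await handler()


async def demo() -> List[str]:
    out: List[str] = []
    serializer = ChatSerializer()

    def reply(tag: str, delay: float) -> Callable[[], Awaitable[None]]:
        async def handler() -> None:
            await asyncio.sleep(delay)
            out.append(tag)
        return handler

    await asyncio.gather(
        serializer.run(1, reply("chat1-first", 0.05)),
        serializer.run(1, reply("chat1-second", 0.0)),
        serializer.run(2, reply("chat2", 0.01)),
    )
    return out
```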

Paraphrase-aware intent detection

Two precision-first heuristic fallbacks (_looks_like_self_update_question, _looks_like_self_update_imperative) catch free-form variants such as "vraj máš novú verziu kde si schopný si aj nasadiť nové veci k sebe je to tak ?" ("apparently you have a new version that can deploy new things to itself, is that right?") or "stiahni si nový kód z githubu a nahoď ho" ("download the new code from GitHub and bring it up") while rejecting look-alike requests. The negative test set includes the most likely false positives ("stiahni mi obrázok", i.e. "download a picture for me"; "pull the milk from the fridge"; "aktualizuj toto pdf", i.e. "update this PDF") to make sure they stay rejected.
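A simplified illustration of the precision-first idea, not the actual _looks_like_self_update_imperative: it fires only when an action verb and a self-update target occur in the same message, after stripping diacritics so "nahoď" and "nahod" match. The word lists are assumptions chosen to cover the examples above.

```python
import re
import unicodedata
from typing import List

# Hypothetical vocabularies; a real detector would use larger, curated lists.
ACTION_WORDS = {"stiahni", "pull", "update", "aktualizuj", "nasad", "nahod", "deploy"}
TARGET_WORDS = {"verziu", "verzia", "kod", "github", "githubu", "yourself", "restart", "restartuj"}


def normalize(text: str) -> List[str]:
    """Lowercase, drop diacritics (NFKD + remove combining marks), split into words."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    ascii_text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", ascii_text)


def looks_like_update_request(text: str) -> bool:
    """Precision first: require an action verb AND a self-update target."""
    words = set(normalize(text))
    return bool(words & ACTION_WORDS) and bool(words & TARGET_WORDS)
```

Requiring the pair is what keeps "pull the milk from the fridge" out: it has an action verb but no self-update target.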

Comprehensive timeout normalization

error_max_turns, tool_use, raw result JSON, CLI timeout after Xs, deadline exceeded, asyncio.TimeoutError, request_timeout=N, and read timed out errors all map to a single friendly user-facing sentence. The normalizer is wired into both the brain path and the legacy _handle_text path, so the production `Chyba: {"type":"result","subtype":"error_max_turns",...}` leak cannot recur.
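One way to express that mapping is a list of compiled patterns checked before anything reaches the chat. The patterns, the function name, and the wording of the friendly sentence are assumptions; unknown errors pass through unchanged here, which a production version would probably also sanitize.

```python
import re
from typing import List, Pattern

FRIENDLY_TIMEOUT = (
    "That took too long or needed too many steps. Please try again or split the request."
)

# Hypothetical patterns mirroring the error shapes named in these notes.
_TIMEOUT_PATTERNS: List[Pattern[str]] = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"error_max_turns",
        r"\btool_use\b",
        r'\{\s*"type"\s*:\s*"result"',  # raw CLI result JSON
        r"CLI timeout after \d+s",
        r"deadline exceeded",
        r"TimeoutError",
        r"request_timeout=\d+",
        r"read timed out",
    )
]


def normalize_error(raw: str) -> str:
    """Map any known timeout / turn-limit shape to one user-facing sentence."""
    if any(p.search(raw) for p in _TIMEOUT_PATTERNS):
        return FRIENDLY_TIMEOUT
    return raw  # unknown errors pass through unchanged in this sketch
```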

New deterministic intents

  • memory_list ("ake su tvoje spomienky?", i.e. "what are your memories?"): reads the live memory store and produces a grounded listing with real counts.
  • context_recall ("prečo si začal s touto temou?", i.e. "why did you bring up this topic?"; "o čom sme sa bavili?", i.e. "what were we talking about?"): reads the in-RAM chat tail and lists the prior turns literally, with no LLM call.

Both run before the LLM and cost no model tokens.
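A sketch of how such intents can short-circuit the LLM: match a trigger, answer from local state, otherwise return None and fall through to the model. The regexes, reply wording, and function names are hypothetical.

```python
import re
from typing import Optional, Sequence, Tuple

Turn = Tuple[str, str]  # (role, text)

# Accents are optional so both "o čom" and "o com" match.
MEMORY_LIST = re.compile(r"\bak[eé] s[uú] tvoje spomienky\b", re.IGNORECASE)
CONTEXT_RECALL = re.compile(r"o [cč]om sme sa bavili|pre[cč]o si za[cč]al", re.IGNORECASE)


def recall_context(tail: Sequence[Turn]) -> str:
    """List prior turns literally; no model call, so nothing can be confabulated."""
    if not tail:
        return "We have not talked about anything yet in this chat."
    return "\n".join(f"{role}: {text}" for role, text in tail)


def route_deterministic(text: str, tail: Sequence[Turn], memory_count: int) -> Optional[str]:
    """Return a grounded reply for a deterministic intent, or None to fall through."""
    if MEMORY_LIST.search(text):
        return f"I currently hold {memory_count} stored memories."
    if CONTEXT_RECALL.search(text):
        return recall_context(tail)
    return None
```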

Stats

  • 1967 tests passing (was 1833 before this release; +134 new)
  • 76 new tests in tests/test_telegram_intents.py — parametrized intent detection, brain integration with provider strict-mock, error normalization, paraphrase positives + negatives
  • 16 new tests in tests/test_brain_persistence.py — hydration, idempotent finalize, tail bound, intent persistence
  • 7 new tests in tests/test_telegram_bot_ordering.py — per-chat lock, same-chat serialization, cross-chat concurrency
  • 11 new tests in tests/test_self_update.py — self-restart opt-in, supervisor detection, dirty / up-to-date / diverged refusal
  • ruff check — All checks passed!
  • mypy — 0 issues across 115 source files

Migration notes

  • No breaking changes. Self-restart is opt-in via AGENT_SELF_RESTART_AFTER_UPDATE=1, default OFF. Existing deployments behave identically until the operator opts in.
  • To enable end-to-end Telegram update + restart: see the systemd / supervisord / docker recipes in docs/SELF_UPDATE.md.
  • Default response language is now English. Set AGENT_DEFAULT_LANGUAGE=Slovak (or any other language) to override. Regardless of the default, the model still replies in the user's language whenever the message makes it clear.

Bootstrap from a v1.35.0 deployment

Self-update is code; the code has to exist on the server before the agent can use it. v1.35.0 cannot pull v1.36.0 by itself. One last manual bootstrap:

ssh your-server
cd ~/Agent_Life_Space
git pull origin main
sudo cp deploy/agent-life-space.service /etc/systemd/system/
# edit User/Group/WorkingDirectory/EnvironmentFile (4 lines)
sudo systemctl daemon-reload
sudo systemctl enable --now agent-life-space

After this, every future update is just one Telegram message: stiahni si novu verziu a nasad to.

Full per-fix breakdown in CHANGELOG.md.

v1.35.0 — Tiered Logging, Crash-Safe Vault, Runtime LLM Control

08 Apr 21:31
e77f1b7


Tiered Logging, Vault Crash-Safety, Runtime LLM Control, and Security Hardening — deterministic log retention, single-file atomic vault format, operator-controlled backend selection, and a deep sweep of defense-in-depth fixes across dashboard, CLI, SQL, telegram, and brain.

Highlights

  • Vault single-file v2 format (ALSv2 magic + 16-byte random salt + Fernet token) with embedded random salt and crash-safe atomic migration — zero corrupt-state window between salt and blob writes
  • Tiered structured logging with deterministic per-tier retention (long ~30d, short ~6h), hourly cron prune sweep, and unified *_HOURS env contract
  • Runtime LLM operator control: switch between the cli and api backends per session via the dashboard or POST /api/operator/llm, without a restart
  • Telegram + CLI fail-closed guard — programming tasks on the CLI backend in sandbox-only mode return a deterministic operator-friendly message instead of hanging on an unreachable Claude Code permission prompt
  • Headless CLI auto-approve (AGENT_CLI_AUTO_APPROVE env var; by default the agent detects whether a TTY is attached): agents running as systemd/Docker daemons no longer hang on permission prompts
  • mypy 147 errors → 0 across 112 source files (full type safety)
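The release does not publish the tier rules, so the following only illustrates the shape of a deterministic (level, event) → tier resolver with the ~30-day and ~6-hour windows. The level set, the event prefixes, and the function names are assumptions.

```python
LONG_RETENTION_HOURS = 720  # ~30 days
SHORT_RETENTION_HOURS = 6

# Hypothetical rules: which events are worth keeping for weeks.
LONG_LEVELS = {"warning", "error", "critical"}
LONG_EVENT_PREFIXES = ("security.", "audit.", "finance.")


def resolve_tier(level: str, event: str) -> str:
    """Deterministic (level, event) -> tier: same input always lands in the same sink."""
    if level.lower() in LONG_LEVELS or event.startswith(LONG_EVENT_PREFIXES):
        return "long"
    return "short"


def is_expired(tier: str, age_hours: float) -> bool:
    """The check an hourly prune sweep would apply to each log file."""
    limit = LONG_RETENTION_HOURS if tier == "long" else SHORT_RETENTION_HOURS
    return age_hours > limit
```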

Added

  • agent/logs/retention.py: LogRetentionManager with a deterministic (level, event) → tier resolver
  • agent/logs/logger.py::setup_tiered_logging: _TierRouter stdlib handler routing each structlog event to the right file sink
  • agent/control/llm_runtime.py — persistent operator override for LLM backend/provider
  • Anti-echo work-queue detector preventing pasted agent suggestions from spawning duplicate jobs
  • Per-transaction asyncio.Lock in finance tracker against concurrent approve races
  • Telegram in-flight task tracking with strong references (no GC mid-execution)
  • Nonce cache age-based eviction so replay-protection state cannot grow unbounded
  • CI release-readiness skip env (AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1)
  • docs/SETUP_LOCAL.md operator setup guide
  • docs/SECURITY_INCIDENT_2026-04-07.md post-mortem of credential leak via local conversation logs
  • 27+ new regression tests (vault, finance race, telegram cleanup, log retention, brain conversation)
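Age-based eviction for a replay-protection nonce cache, as a minimal sketch (hypothetical `NonceCache`). It is only safe if each request also carries a timestamp that is rejected once older than `max_age_s`; otherwise an evicted nonce could be replayed.

```python
import time
from collections import OrderedDict
from typing import Callable


class NonceCache:
    """Replay guard whose memory is bounded by nonce age."""

    def __init__(self, max_age_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age_s = max_age_s
        self.clock = clock
        # nonce -> first-seen time; insertion order == time order, oldest first
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def _evict(self, now: float) -> None:
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self.max_age_s:
                break
            self._seen.popitem(last=False)  # drop the oldest entry

    def check_and_store(self, nonce: str) -> bool:
        """True for a fresh nonce, False for a replay inside the window."""
        now = self.clock()
        self._evict(now)
        if nonce in self._seen:
            return False
        self._seen[nonce] = now
        return True

    def __len__(self) -> int:
        return len(self._seen)
```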

Changed

  • Vault on-disk format is now v2 single-file. Existing v1 vaults migrate automatically on first open.
  • Vault wrong-key writes now fail-fast with VaultDecryptionError
  • AgentBrain reads effective LLM backend through resolve_llm_runtime_state() so operator overrides actually flip execution path
  • Short follow-ups (simple / factual / greeting task types) now inject conversation context — one-word reply like "ano" no longer arrives at the model with no history
  • LLM provider cache key now includes kwargs (separate instances per base_url / api_key)
  • agent/build/storage.py and agent/review/storage.py: _ensure_text_column validates identifiers against an allow-list and a regex, and escapes the default literal
  • Dashboard XSS escapes for note/updated_by/warnings/settlement_id, Bearer token only (no ?key= query string fallback)
  • Invalid JSON on operator HTTP endpoints returns 400 instead of silently treating body as {}
  • setup_tiered_logging now takes long_retention_hours (unified contract with LogRetentionManager)
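A sketch of the allow-list + regex + literal-escaping pattern for a SQLite `ALTER TABLE`, using a hypothetical `ensure_text_column` and allow-list (the real helpers are `_ensure_text_column` in the build and review storage modules). DDL statements cannot take bound parameters, which is why identifiers must be validated and the default literal escaped by hand.

```python
import re
import sqlite3

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Hypothetical allow-list of (table, column) pairs that migrations may add.
ALLOWED_COLUMNS = {("jobs", "description"), ("jobs", "notes")}


def ensure_text_column(conn: sqlite3.Connection, table: str, column: str, default: str = "") -> None:
    """Add a TEXT column if missing, refusing identifiers that are not allow-listed."""
    if (table, column) not in ALLOWED_COLUMNS:
        raise ValueError(f"identifier not allowed: {table}.{column}")
    if not (_IDENT.fullmatch(table) and _IDENT.fullmatch(column)):
        raise ValueError(f"malformed identifier: {table}.{column}")
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return  # idempotent: nothing to do
    literal = "'" + default.replace("'", "''") + "'"  # SQL escapes quotes by doubling
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} TEXT DEFAULT {literal}")
```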

Fixed

  • Headless CLI permission prompt hang (daemon mode)
  • Wrong-key vault writes silently destroying the legacy encrypted blob (Codex finding, HIGH)
  • v1→v2 vault migration crash window between salt.bin write and os.replace (Codex finding, MED) — eliminated by single-file format
  • Tiered logging factory routing — events now actually reach the file sinks
  • AgentBrain was reading raw os.environ["LLM_BACKEND"] (operator overrides ignored)
  • Cron prune sweep was scanning a different directory than __main__ wrote to
  • Short follow-ups losing conversation history
  • Multi-task work-queue detector spawning duplicate jobs from echoed agent suggestions
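The precedence that makes operator overrides take effect can be sketched as: persisted override first, then the LLM_BACKEND env var, then a default. The override shape, the `cli` default, and the function names are assumptions; the project's real entry point is resolve_llm_runtime_state().

```python
import json
from pathlib import Path
from typing import Any, Mapping, NamedTuple, Optional

VALID_BACKENDS = {"cli", "api"}


class LLMRuntimeState(NamedTuple):
    backend: str
    source: str  # where the value came from, handy for the dashboard


def load_override(path: Path) -> Optional[Mapping[str, Any]]:
    """Read the persisted operator override, if one has been saved."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def resolve_backend(
    env: Mapping[str, str],
    override: Optional[Mapping[str, Any]],
    default: str = "cli",
) -> LLMRuntimeState:
    """Persisted operator override > LLM_BACKEND env var > default."""
    if override:
        backend = str(override.get("backend", "")).lower()
        if backend in VALID_BACKENDS:
            return LLMRuntimeState(backend, "operator_override")
    env_backend = env.get("LLM_BACKEND", "").lower()
    if env_backend in VALID_BACKENDS:
        return LLMRuntimeState(env_backend, "env")
    return LLMRuntimeState(default, "default")
```

Resolving on every call, rather than reading os.environ once at import time, is what lets a dashboard change take effect without a restart.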

Security

  • All SQL DDL paths in build + review layers use whitelist + identifier validation + escape
  • Dashboard authentication is Bearer-token only
  • Operator HTTP endpoints reject invalid JSON with 400
  • Vault writes with wrong master key fail-fast
  • Vault writes are atomic and crash-safe (single-file v2 format)
  • Telegram in-flight task tracking prevents mid-execution GC
  • Finance transaction approval race protected by per-tx asyncio.Lock
  • Request nonce cache has bounded lifetime
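Bearer-only authentication reduces to parsing one header and comparing in constant time. A minimal sketch with hypothetical function names; a ?key= query parameter is simply never read.

```python
import hmac
from typing import Mapping, Optional


def extract_bearer(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>', or None."""
    value = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(headers: Mapping[str, str], api_key: str) -> bool:
    """Only the Authorization header is consulted."""
    token = extract_bearer(headers)
    if token is None:
        return False
    # compare_digest does not leak how many leading characters matched via timing
    return hmac.compare_digest(token.encode(), api_key.encode())
```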

Deprecations

  • AGENT_LOG_LONG_RETENTION_DAYS is deprecated in favor of AGENT_LOG_LONG_RETENTION_HOURS. Both still work; setting only the legacy DAYS variable emits a deprecation warning and internally promotes to hours so the cron prune sweep agrees.
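The promotion rule can be expressed in a few lines. The function name is hypothetical, and the 720-hour default is assumed from the ~30-day long tier.

```python
import warnings
from typing import Mapping

DEFAULT_LONG_RETENTION_HOURS = 720  # assumed default (~30 days)


def long_retention_hours(env: Mapping[str, str]) -> int:
    """HOURS wins; legacy DAYS alone is promoted to hours with a DeprecationWarning."""
    hours = env.get("AGENT_LOG_LONG_RETENTION_HOURS")
    if hours:
        return int(hours)
    days = env.get("AGENT_LOG_LONG_RETENTION_DAYS")
    if days:
        warnings.warn(
            "AGENT_LOG_LONG_RETENTION_DAYS is deprecated; use AGENT_LOG_LONG_RETENTION_HOURS",
            DeprecationWarning,
            stacklevel=2,
        )
        return int(days) * 24
    return DEFAULT_LONG_RETENTION_HOURS
```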

Migration Notes

Vault migration is automatic. When the agent boots on v1.35.0 with an existing vault, it detects the v1 format on first read, decrypts it (using salt.bin if post-1.34, or the static legacy salt if pre-1.34), re-encrypts with a fresh random salt in the new v2 format, and removes salt.bin. No operator action required.
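The single-file layout (magic, 16-byte salt, Fernet token) can be framed and written atomically as below. The exact magic bytes `b"ALSv2"` and all function names are assumptions, and the Fernet encryption itself is left out so the sketch needs only the standard library. A legacy v1 file is just a Fernet token, which starts with `gA` (its 0x80 version byte in base64) rather than the magic, so a prefix check is enough to route it to migration.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple

MAGIC = b"ALSv2"  # assumed exact magic bytes
SALT_LEN = 16


def pack_v2(salt: bytes, token: bytes) -> bytes:
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be 16 bytes")
    return MAGIC + salt + token


def is_v2(blob: bytes) -> bool:
    return blob.startswith(MAGIC)


def unpack_v2(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a v2 blob into (salt, fernet_token)."""
    if not is_v2(blob):
        raise ValueError("not a v2 vault")
    body = blob[len(MAGIC):]
    return body[:SALT_LEN], body[SALT_LEN:]


def atomic_write(path: Path, data: bytes) -> None:
    """Temp file in the same directory, fsync, then os.replace over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one, never half of each
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Keeping salt and token in one file is what removes the window where a crash could leave a new salt.bin next to an old blob.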

For headless deployments (systemd / Docker / nohup), add to `.env`:
```
AGENT_CLI_AUTO_APPROVE=1
```
If you omit it, the agent auto-detects whether a TTY is attached, which also covers daemon mode.
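The precedence described here, as a sketch. The function name and the accepted truthy/falsy spellings are assumptions.

```python
import sys
from typing import Mapping, Optional, TextIO


def cli_auto_approve(env: Mapping[str, str], stdin: Optional[TextIO] = None) -> bool:
    """Explicit env value wins; otherwise auto-approve only when no TTY can show a prompt."""
    raw = env.get("AGENT_CLI_AUTO_APPROVE", "").strip().lower()
    if raw in {"1", "true", "yes"}:
        return True
    if raw in {"0", "false", "no"}:
        return False
    stream = stdin if stdin is not None else sys.stdin
    # Daemons often have no stdin at all (None) or a non-TTY one.
    return stream is None or not stream.isatty()
```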

For Telegram + CLI backend + sandbox mode: programming tasks now return a deterministic operator message instead of hanging. Two unblock paths: `POST /api/operator/llm` to switch to the API backend, or set `AGENT_SANDBOX_ONLY=0` for explicit host opt-in.

Log retention env: if you have `AGENT_LOG_LONG_RETENTION_DAYS=30`, switch to `AGENT_LOG_LONG_RETENTION_HOURS=720` (30 × 24).

Tests

  • 1762 passed, 4 skipped, 0 failures
  • 129 security audit tests
  • 27+ new regression tests in this release

Code Quality

  • mypy: 147 errors → 0 across 112 source files
  • ruff: 0 errors

Full changelog: CHANGELOG.md#1350--2026-04-08

🤖 Generated with Claude Code

v1.34.0

02 Apr 12:42
e2ef230


Self-Host Onboarding Closure

  • closes the v1.34.0 self-host onboarding slice with safer runtime defaults and stronger setup diagnostics
  • fresh installs now prefer .agent_runtime while legacy installs with existing runtime data under agent/ stay compatible
  • setup doctor now surfaces project_root, data_dir, identity profile path, pidfile path, and stronger self-host warnings
  • aligns CLI --data-dir and AGENT_DATA_DIR behavior across status, readiness, and operator flows
  • fixes audit-discovered regressions around description-only build fallback, provider/settlement wiring, Telegram operator UX, and release-readiness labeling
  • refreshes README, env example, operator docs, and strategy docs to the v1.34.0 baseline

Verification

  • ruff check .
  • pytest -q -> 1668 passed, 4 skipped
  • targeted self-host regression suite -> 190 passed
  • npm --prefix operator run typecheck
  • python -m agent --setup-doctor --data-dir .agent_runtime
  • python -m agent --release-readiness --release-readiness-release-label v1.34.0 --data-dir .agent_runtime -> ready: true

Residual setup work after deploy

  • set AGENT_NAME
  • set AGENT_SERVER_NAME
  • configure remaining gateway routes for the target environment
  • run live in-network Telegram/dashboard/API smoke on the home server

v1.33.0 — Docker-Isolated Build Execution

01 Apr 21:47


Docker-Isolated Build Execution with Auto-Fix Retry

What's New

Docker Project Executor (agent/build/docker_executor.py)

  • /build now runs generated projects entirely inside Docker containers
  • Phases: pip install (with network) → pytest (no network) → ruff lint
  • Safety: 512MB RAM, 1 CPU, 5min timeout, read-only mount
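A sketch of the isolated test phase as a `docker run` invocation using standard Docker CLI flags. The image name and helper names are hypothetical, the image is assumed to already contain pytest and the project's dependencies (the networked pip install phase is not shown), and the 5-minute `timeout` here only stops the docker client, so a real executor would also need to kill the container by name.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

IMAGE = "agent-build-sandbox:latest"  # hypothetical image with Python, pytest and deps


def docker_test_argv(project: Path, image: str = IMAGE) -> List[str]:
    """Argv for the pytest phase: no network, capped RAM/CPU, read-only source mount."""
    return [
        "docker", "run", "--rm",
        "--network", "none",  # tests get no network access
        "--memory", "512m",  # 512 MB RAM cap
        "--cpus", "1",  # one CPU
        "-v", f"{project.resolve()}:/work:ro",  # source mounted read-only
        "-w", "/work",
        image,
        # no:cacheprovider stops pytest from writing .pytest_cache to the read-only mount
        "python", "-m", "pytest", "-q", "-p", "no:cacheprovider",
    ]


def run_isolated_tests(project: Path, timeout_s: int = 300) -> subprocess.CompletedProcess:
    """Run the test phase with a wall-clock limit on the docker client process."""
    return subprocess.run(
        docker_test_argv(project), capture_output=True, text=True, timeout=timeout_s
    )


def docker_available() -> bool:
    """When False, the pipeline would fall back to host verification."""
    return shutil.which("docker") is not None
```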

Auto-Fix Retry Loop

  • When tests fail, Opus receives test output + source code
  • Generates fixed code → re-runs tests in Docker → up to 2 retries
  • Full cycle each retry: write → install → test
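The retry loop reduces to a small control structure. Here the test runner and the fixer (the Opus call) are injected, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Files = Dict[str, str]  # relative path -> file content


@dataclass
class RunResult:
    passed: bool
    output: str  # test output handed to the fixer on failure


def build_with_autofix(
    files: Files,
    run_tests: Callable[[Files], RunResult],
    fix: Callable[[Files, str], Files],
    max_retries: int = 2,
) -> Tuple[Files, RunResult, int]:
    """Test; on failure let `fix` rewrite the files from the output, at most max_retries times."""
    result = run_tests(files)
    retries = 0
    while not result.passed and retries < max_retries:
        files = fix(files, result.output)  # e.g. a model call given source + failing output
        result = run_tests(files)  # full write -> install -> test cycle again
        retries += 1
    return files, result, retries
```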

Build Pipeline Integration

  • Codegen-produced builds route through Docker executor
  • Falls back to host verification if Docker is unavailable
  • Docker results stored in job metadata

Improved Reporting

  • Telegram/API shows: files, deps, tests, lint, retries, LLM cost
  • Failed builds show test output for debugging

End-to-End Flow

/build . --description "URL shortener with FastAPI"
  → Opus generates 6+ files ($0.15)
  → Docker: pip install deps
  → Docker: pytest (isolated, no network)
  → If fail: Opus fixes → retry
  → Report: files=6 | tests=PASS | lint=PASS

Full Changelog: v1.32.0...v1.33.0

v1.32.0 — LLM Build Pipeline

01 Apr 21:22


LLM Build Pipeline — Description-Driven Code Generation

What's New

LLM Code Generation (agent/build/codegen.py)

  • /build . --description "..." now generates complete implementation files via Opus
  • Bridges natural language descriptions to deterministic WRITE_FILE build operations
  • Robust JSON parser handles markdown fences, newlines, trailing commas
  • Safety: only WRITE_FILE ops, relative paths, max operation cap
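A tolerant parser plus the safety checks listed above might look like this. The `operations` / `type` / `path` schema and the cap of 20 are assumptions, and the trailing-comma regex is deliberately naive: it would also rewrite `,}` inside string values.

```python
import json
import re
from pathlib import PurePosixPath
from typing import Any, Dict, List

MAX_OPS = 20  # hypothetical operation cap

_FENCE = re.compile(r"^```[A-Za-z]*\s*|\s*```$")
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def parse_llm_json(text: str) -> Any:
    """Tolerate a surrounding code fence, raw newlines inside strings and trailing commas."""
    cleaned = _FENCE.sub("", text.strip())
    cleaned = _TRAILING_COMMA.sub(r"\1", cleaned)
    return json.loads(cleaned, strict=False)  # strict=False accepts control chars in strings


def validate_ops(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Accept only WRITE_FILE operations on relative paths, up to MAX_OPS."""
    ops = payload.get("operations", [])
    if len(ops) > MAX_OPS:
        raise ValueError("too many operations")
    for op in ops:
        if op.get("type") != "WRITE_FILE":
            raise ValueError(f"unsupported operation: {op.get('type')!r}")
        raw_path = op.get("path", "")
        path = PurePosixPath(raw_path)
        if not raw_path or path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe path: {raw_path!r}")
    return ops
```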

Bilingual Task Classification

  • All classifier keyword sets include EN + SK equivalents
  • Technical terms + intent verb combo signal for accurate Opus routing
  • Sonnet max_turns: 3 → 5

API & Channel Trust

  • Authenticated API callers get terminal-level trust (no response filtering)
  • API timeout scales with complexity (300s programming, 90s chat)
  • Sandbox-first: AGENT_SANDBOX_ONLY=1 downgrades instead of blocking

Bug Fixes

  • PlanRecordStatus.FAILED added (was crashing on failed builds)
  • Failed builds now show real status, not "blocked"
  • Telegram shows job details on failure

Tests

  • 12 new tests for codegen parsing and validation
  • Full suite: 1643+ tests passing

Full Changelog: v1.31.0...v1.32.0

v1.31.0 — Runtime Contract Closure

01 Apr 16:05
122b152


Runtime Contract Closure

Closes the remaining auth, coupling, and extraction-readiness gaps.

Dashboard Authentication

  • /dashboard now requires API key (header or ?key= query param)
  • Unauthenticated access → minimal login page (no HTML leak)

Public API Discipline

  • Settlement service no longer accesses ._storage private attribute
  • Archival API uses get_storage_for_archival() public method
  • ControlPlaneStateService exposes settlement + archival public methods

Extraction Readiness

  • OperatorReportService gets settlement_service at construction time
  • No post-init private attribute mutation in orchestrator

Stats

  • 1631 tests pass, 4 skipped
  • 14 deployment contract tests (4 new)

Full Phase 4 Summary (v1.25.1 → v1.31.0)

| Version | Focus |
|---|---|
| v1.25.1 | Production hardening (rate limits, persistence, cache) |
| v1.26.0 | CI invariants, retention, policy migration |
| v1.27.0 | Operator REST API + archival |
| v1.28.0, v1.28.1 | Dashboard + settlement foundation + regression fixes |
| v1.29.0 | Settlement workflow closure (persistence, retry, dashboard) |
| v1.30.0 | Deployment contract hardening (deny-by-default, config) |
| v1.31.0 | Runtime contract closure (auth, public API, extraction) |

v1.30.0 — Deployment Contract Hardening

01 Apr 14:45
f2345c0


Deployment Contract Hardening

Makes the agent safer and more predictable for self-host deployment.

Deny-by-Default Enforcement

  • Removed AGENT_DEV_MODE bypass from review and build delivery approval
  • Policy enforcement is no longer environment-dependent — delivery without approval queue is always denied
  • No runtime path allows bypassing approval through env vars

Explicit Configuration

  • paths.py raises RuntimeError if no valid project root (no silent ~/.agent-life-space fallback)
  • Pidfile configurable via AGENT_PIDFILE_PATH env var
  • Vault exposes is_ready property for startup health checks
  • Startup config summary logged on boot: project root, API port, vault state, docker, sandbox mode

Reduced Hidden Coupling

  • Docker availability stored as agent attribute (not env var mutation)
  • Gateway payment callback passed at construction time (not post-init attribute mutation)
  • Sandbox default via setdefault (not bracket assignment)

Stats

  • 1627 tests pass, 4 skipped
  • 10 new deployment contract tests
  • 10 files changed

Full Changelog

See CHANGELOG.md for v1.25.1 through v1.30.0 breakdown.

v1.29.0 — Settlement Workflow Closure

01 Apr 13:29


Settlement Workflow Closure

Turns payment settlement from a foundation-level service into an operator-ready workflow.

What's New

Persistence

  • Settlement requests stored in SQLite — survive agent restart
  • Automatic load-on-init, persist-on-every-state-change

Operator Workflow

  • POST /api/operator/settlements/{id}/approve — approve topup
  • POST /api/operator/settlements/{id}/deny — deny payment
  • POST /api/operator/settlements/{id}/execute — topup + auto-retry original call
  • GET /api/operator/settlements?status=pending — list with filter

Approved Retry Loop

  • Successful wallet topup automatically retries the original API call
  • Original request context preserved through the entire workflow
  • Retry result included in execute response

Gateway Auto-Detection

  • HTTP 402 responses automatically create settlement requests
  • Operator sees pending settlements immediately in API, dashboard, and Telegram

Dashboard UI

  • Settlements section with Approve / Deny / Execute buttons
  • Real-time refresh after actions
  • Status badges (pending → approved → executed)
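The pending → approved → executed lifecycle with persist-on-every-change can be sketched as a transition table over SQLite. The `denied` status name, the schema, and the class are assumptions.

```python
import sqlite3
from typing import Dict, Set

TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"approved", "denied"},
    "approved": {"executed"},
    "denied": set(),
    "executed": set(),
}


class SettlementStore:
    """Every state change is validated, then persisted immediately."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS settlements (id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def create(self, sid: str) -> None:
        self.conn.execute("INSERT INTO settlements (id, status) VALUES (?, 'pending')", (sid,))
        self.conn.commit()

    def status(self, sid: str) -> str:
        row = self.conn.execute("SELECT status FROM settlements WHERE id = ?", (sid,)).fetchone()
        if row is None:
            raise KeyError(sid)
        return row[0]

    def transition(self, sid: str, new_status: str) -> None:
        current = self.status(sid)
        if new_status not in TRANSITIONS[current]:
            raise ValueError(f"{current} -> {new_status} not allowed")
        self.conn.execute("UPDATE settlements SET status = ? WHERE id = ?", (new_status, sid))
        self.conn.commit()  # persisted before the caller sees success
```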

Reporting

  • settlement_attention items appear in operator inbox for pending settlements

Stats

  • 1617 tests pass, 4 skipped
  • 9 new tests for settlement persistence, retry loop, API actions
  • 11 files modified

Exit Criteria Met (from NEXT_BACKLOG.md)

  • ✅ Settlement requests survive restart
  • ✅ Operators can list/approve/deny from API and dashboard
  • ✅ Payment-required failures move through approved retry path
  • ✅ Reporting shows pending settlement attention

Full Changelog

See CHANGELOG.md for detailed breakdown.

v1.28.1 — Phase 4 Enterprise Hardening Complete

01 Apr 10:58
f3f6b80


Phase 4 Enterprise Hardening — Complete Release

This release consolidates all Phase 4 work (v1.25.1 through v1.28.1) into a single verified baseline.

Highlights

Operator Control Plane

  • 12 authenticated REST API endpoints under /api/operator/
  • Self-contained HTML dashboard at /dashboard with real-time metrics
  • Archive CSV export + download with path-traversal protection
  • Settlement Telegram commands (/settlement, approve, deny)
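Path-traversal protection for a download endpoint usually comes down to resolving the requested name and checking that it stays under the archive root. A minimal sketch with a hypothetical function name; `Path.is_relative_to` needs Python 3.9+.

```python
from pathlib import Path


def resolve_archive_download(archive_root: Path, requested: str) -> Path:
    """Map a user-supplied file name to a CSV path that must stay inside archive_root."""
    root = archive_root.resolve()
    # resolve() collapses ".." and follows symlinks, so both escape routes are caught
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root) or candidate.suffix != ".csv":
        raise PermissionError(f"refusing to serve {requested!r}")
    return candidate
```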

Enterprise Hardening

  • 26 CI-enforced architecture invariant tests (blocking gate)
  • Automated retention pruning (6h) + nightly data cleanup
  • Rate limit: 60/min localhost, 10/min external
  • Telemetry auto-recording (hourly snapshots)
  • Workflow + Pipeline SQLite persistence
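A sliding-window limiter with the two limits above, as a sketch. The class is hypothetical, and treating 127.0.0.1 and ::1 as "localhost" is an assumption.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque

LOCAL_HOSTS = {"127.0.0.1", "::1"}


class SlidingWindowLimiter:
    """Per-client limit: 60 requests/min for localhost, 10/min for everyone else."""

    def __init__(self, clock: Callable[[], float] = time.monotonic, window_s: float = 60.0) -> None:
        self.clock = clock
        self.window_s = window_s
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def limit_for(self, client_ip: str) -> int:
        return 60 if client_ip in LOCAL_HOSTS else 10

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop requests that have left the window
        if len(hits) >= self.limit_for(client_ip):
            return False
        hits.append(now)
        return True
```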

Unified Policy Boundary

  • 5/9 callers migrated to evaluate_runtime_action()
  • RuntimePolicyDecision enriched with resolved_policy + policy_metadata
  • Gateway, review, build callers routed through unified boundary

Production Fixes

  • /api/operator/report wiring fix (was passing wrong args to OperatorReportService)
  • Memory injection: provenance/kind filtering, truthful framing
  • Structured 400 responses on all operator API query params
  • Pipeline control_plane_state → control_plane fix
  • Archival: deployment-safe paths, no host filesystem leak

Payment Settlement Foundation

  • Service wired into orchestrator with API endpoint
  • Telegram approve/deny surface
  • Foundation-level: no auto-retry loop, in-memory state, no dashboard UI yet

Stats

  • 1608 tests pass, 4 skipped
  • ~2500 lines of new production code
  • 57 new tests across operator API, dashboard, settlement, archival
  • 15 files modified in core agent

Full Changelog

See CHANGELOG.md for detailed per-version breakdown (v1.25.1 → v1.28.1).

v1.24.1 — Phase 3 complete + runtime bug fixes

31 Mar 07:39


Phase 3 Features (v1.22.0 → v1.24.0)

v1.22.0 — Provider Delivery Workflow + Runtime Telemetry

  • Enriched /deliver with provider outcome, receipt, attention, retry, outcome filters
  • /report delivery sub-command
  • TelemetrySnapshot model with job throughput, latency, cost, delivery health
  • /telemetry [hours] command with trend detection

v1.23.0 — Seller-Side Obolos + Multi-Provider Gateway + Architecture Invariants

  • seller_publish_v1 and wallet_topup_v1 capability routes
  • Multi-provider resolution: list_providers_for_capability(), call_api_across_providers()
  • 22 architecture invariant enforcement tests

v1.24.0 — File Upload + x402 Payment Handling

  • Multipart/form-data support in gateway HTTP layer
  • marketplace_upload_v1 capability for file upload APIs
  • _extract_x402_payment_metadata() for structured 402 response parsing

Bug Fixes (v1.24.1)

  • #1 "Zapamätaj si" ("Remember this") error_max_turns: dispatcher handles memory storage directly
  • #2 Slovak queries bypassing dispatcher — expanded patterns
  • #3 LLM confabulation — runtime facts injection
  • #4 Channel policy blocking owner INTERNAL responses
  • #5 /queue KeyError 'total_processed'
  • #6 /jobs showing "? ?" instead of job name
  • #7 /report showing "0 done"

Stats

  • 1517 tests, 0 failures
  • Phase 3 backlog: fully closed (all P0-P2 items delivered)