Release: v0.6.0 (#998) · Abilityai/trinity

v0.6.0
521c793
Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

vybe tagged this 01 Jun 16:16
* refactor(frontend): sweep auth + channels + top-level views (#554) (#609)

* refactor(frontend): sweep auth + channels + top-level views (#554)

Slice 4 — bundles three subdomains into one PR per request:

  AUTH (3 files, 24 migrations)
    - Login.vue           — error icon
    - SetupPassword.vue   — password match indicator, requirement checklist,
                            error banner, 6-tier strength visualization
                            (red/red/orange/yellow/green/green)
    - MobileAdmin.vue     — logout, status dot, fleet running/high-context
                            counts, chat thinking status

  CHANNELS (5 files, 23 migrations)
    - PublicLinksPanel    — Active badge, expired text, Slack
                            connected/enable/disable/delete icons,
                            form error, delete-confirm modal,
                            success toast
    - WhatsAppChannelPanel — connected dot, sandbox badge, disconnect btn,
                             webhook warning, success/error message
    - TelegramChannelPanel — connected dot, disconnect btn, webhook
                             warning, group-remove, success/error message
    - SlackChannelPanel   — connected dot, disconnect btn, success/error
    - SharingPanel        — Approve button, success/error message,
                            remove button

  TOP-LEVEL VIEWS (4 files, 33 migrations)
    - Dashboard.vue       — running count + dot, message count, clear-tags,
                            connection status dot, history badge,
                            live-feed indicator, message arrow icon
    - PublicChat.vue      — agent online dot, invalid-link icon + bg,
                            agent-unavailable icon + bg, verify error,
                            chat error
    - OperatingRoom.vue   — empty-state success indicator
    - Templates.vue       — error icon

80 token replacements; net diff +80 / -80 (all 1:1 palette aliases).

Deferred (30 raw refs remain; all need new token families):
  - Login: 8 blue (primary action buttons + focus rings)
  - Dashboard: 10 (blue actions, purple tag-cloud button, blue selected-tab)
  - OperatingRoom: 5 (blue selected-tab indicators)
  - PublicChat: 5 (amber AUTO badge, rose READ-ONLY badge, indigo loading)
  - WhatsApp + Sharing: 2 (amber "deployment prerequisite" notices)

These map to the pending `action-primary` and `state-selected` token families
and to an accent expansion tracked as follow-up work under #555.

Tests:
  No new specs — these routes are either auth pages (covered by
  auth.setup), already smoke-tested (Templates), or require fixtures
  (PublicChat needs a public link token; MobileAdmin lives at /m).
  Existing 6 @smoke tests cover the high-traffic routes.

Refs #554

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(vite): tighten /api proxy prefix to /api/

`/api` (no trailing slash) is a path-prefix match in http-proxy-middleware,
so the SPA route `/api-keys` was being captured by the proxy and forwarded
to the backend, returning 404 in dev mode.

This silently broke the `/api-keys` @smoke e2e test on every PR since the
test was added in #597 — that PR's frontend-e2e check failed on merge but
wasn't required, so the failure was missed.

Backend endpoints all live under `/api/...` (with slash), so the tighter
prefix preserves all real proxy traffic and only excludes the SPA route.
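
The over-match is ordinary string-prefix behaviour and can be reproduced without the proxy. A minimal sketch in Python, assuming the dev proxy decides by a plain prefix comparison as described above; the helper name `is_proxied` and the `/api/agents` path are illustrative and not taken from the Vite config:

```python
def is_proxied(path: str, prefix: str) -> bool:
    """Return True when a request path would be forwarded to the backend.

    Assumption: the proxy matches by plain string prefix.
    """
    return path.startswith(prefix)


# Old prefix: the SPA route is captured along with real API calls.
print(is_proxied("/api-keys", "/api"))      # True
print(is_proxied("/api/agents", "/api"))    # True

# Tightened prefix: only paths under /api/ are forwarded.
print(is_proxied("/api-keys", "/api/"))     # False
print(is_proxied("/api/agents", "/api/"))   # True
```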

Verified locally: 7/7 @smoke tests pass after this change (was 6/7 with
/api-keys failing).

Refs #554 #556

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(frontend): sweep cross-cutting + chat + file-mgr + process + misc (#554) (#623)

* refactor(frontend): sweep cross-cutting + chat + file-mgr + process + misc (#554)

Slice 7 stacked on #609. Migrates 24 files spanning the small-domain pool:

  CHAT (4 files)
    ChatBubble  — copy-success checkmark, self-task accent panel (purple)
    ChatPanel   — agent-not-running warning state, error banner
    ChatInput   — file-remove button, voice-active recording indicator
    ChatHistoryDropdown — error state

  FILE MANAGER (4 files)
    FileManager        — notification toast (success/error), no-agents warning,
                         loading error, delete button + modal + confirm action
    FileTreeNode       — search-matched row highlight (file-type icons stay raw
                         decorative; need their own accent palette later)
    FilePreview        — preview-error icon + text
    FileSharingPanel   — Revoke button

  PROCESS (3 files)
    TrendChart        — completed/failed/cost bars (chart series + legend),
                         success-rate threshold ladder
    RoleMatrix        — no-executor row + badge
    TemplateSelector  — category badges (business/devops/support → status-info /
                         accent-purple / status-urgent)

  CROSS-CUTTING / MODALS (10 files)
    NavBar               — Ops critical-pulse + high indicator, WS connected dot
    GitConflictModal     — yellow warning header (×2), all destructive (red) options
    ReplayTimeline       — system-agent purple panel + badge, schedule-marker arrow,
                            live-feed dot, activity-state success rate ladder
    UnifiedActivityPanel — running/success/fail indicators (live + modal)
    OnboardingChecklist  — completed-state styling (ring, bg, indicator, text)
    CreateAgentModal     — templates-error + general error
    ConfirmDialog        — danger/warning variant icons, text, confirm buttons
    ResourceModal        — (no migrations — amber notice, deferred)
    AvatarGenerateModal  — error text, remove-avatar button
    HelpChatWidget       — error banner + retry button

  MISC (4 files)
    YamlEditor          — error and warning banners + counts + success checkmark
    EditorHelpPanel     — required-field indicator
    TerminalPanelContent — restart-required notice, start-agent button
    TagsEditor          — error message

Net diff: 24 files, +106 / -106 (1:1 palette aliases, byte-identical CSS).

Deferred (existing pattern):
  - Indigo / blue primary action buttons (Login & elsewhere)
  - Blue selected-state (NavBar tabs, OperatingRoom tabs)
  - Amber notices (ResourceModal, RoleMatrix amber missing-role marker —
    these still use the amber palette which differs from yellow)
  - File-type icon colors in FileTreeNode (decorative, need accent-yellow /
    accent-purple-blue / etc. — folder ≠ warning, video ≠ accent, etc.)
  - Slack/Telegram/WhatsApp logo brand colors

These map to the pending `action-primary`, `state-selected`, and accent-color-
expansion tickets.

Verified locally:
  - npm run check:tokens                         passes (10 tokens valid)
  - npm run build                                passes
  - npm run test:e2e:smoke (7 tests, 7.9s)       all green

Refs #554

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(frontend): sweep agent surfaces — 22 files (#554) (#614)

* refactor(frontend): sweep agent surfaces — 22 files (#554)

Slice 8 of the design-system migration. Replaces semantic palette refs
with status/state/accent tokens across all per-agent panels, the
agents list, and agent-detail surfaces.

Files migrated:
- 16 panels (Tasks, Git, Dashboard, Playbooks, Nevermined, Info,
  Credentials, Schedules, SystemViewEditor, HostTelemetry, Folders,
  Files, Skills, Observability, Permissions, Metrics)
- 5 agent surfaces (Agents, AgentNode, AgentHeader, AgentTerminal,
  AgentDetail)
- SystemAgentNode

Mappings (palette-equivalent, no visual change):
- yellow → status-warning
- green  → status-success
- red    → status-danger
- orange → status-urgent
- amber  → state-autonomous (token name slightly stretched for tool-call
  / queued-task amber; same palette, future cleanup may rename)
- rose   → state-locked
- purple → accent-purple

Deferred (no token family yet):
- indigo (action-primary)
- blue/sky/cyan (selected-state, category labels)
- teal (category labels)

SystemViewsSidebar untouched — only contains deferred blue/indigo
selected-state references.

Verification:
- npm run check:tokens → 10 tokens equivalent, all references resolve
- npm run build → clean
- npm run test:e2e:smoke → 7/7 passed against live Trinity (HMR)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent-runtime): kill npx MCP orphans outside claude pgid that hold stdout pipe open (#618) (#620)

* fix(voice): orb animation loop dies before voice session starts

The canvas is inside v-if="voice.isActive.value" so canvasEl.value is
null when onMounted fires. renderFrame() exits early without scheduling
the next frame, killing the loop permanently.

Replace onMounted initialization with watch(canvasEl) so the RAF loop
starts when the canvas enters the DOM and stops when it leaves.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(agent-runtime): kill npx MCP orphans outside claude pgid that hold stdout pipe open (#618)

After terminate_process_group kills claude's pgid, npm→node MCP server chains
spawned via npx call setsid() and land in a new session — they survive the
pgid kill and keep the stdout pipe write FD open indefinitely.  The kernel
cannot deliver EOF to our reader thread while any writer FD remains open, so
drain_reader_threads blocked for the full 30s post_kill_grace, then lost the
buffered result line via force-close (HTTP 502).

Add _kill_orphan_pipe_writers(): after terminate_process_group, scan /proc/*/fd
for any process outside our pgid that holds the pipe's write end (detected via
fdinfo flags), and SIGKILL it.  Killing the orphan releases all its FDs
(stdout AND stderr write ends) simultaneously, delivering EOF to both reader
threads so they drain naturally before the post_kill_grace window.
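
The scan described above can be sketched roughly as follows. This is a Linux-only illustration, not the code from the commit: `find_pipe_writers`, `kill_pipe_writers`, and their parameters are hypothetical names, and the real `_kill_orphan_pipe_writers` may differ in detail.

```python
import os
import signal
from typing import List, Optional


def _is_write_end(pid: int, fd: str, target: str) -> bool:
    """True if /proc/<pid>/fd/<fd> is a write-capable FD on the target pipe."""
    try:
        if os.readlink(f"/proc/{pid}/fd/{fd}") != target:
            return False
        with open(f"/proc/{pid}/fdinfo/{fd}") as info:
            for line in info:
                if line.startswith("flags:"):
                    # fdinfo reports the open flags in octal.
                    mode = int(line.split()[1], 8) & os.O_ACCMODE
                    return mode in (os.O_WRONLY, os.O_RDWR)
    except OSError:
        pass  # FD closed or process exited mid-scan
    return False


def find_pipe_writers(pipe_inode: int, skip_pgid: Optional[int] = None) -> List[int]:
    """Return PIDs outside skip_pgid that can still write to the pipe."""
    target = f"pipe:[{pipe_inode}]"
    writers = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            if skip_pgid is not None and os.getpgid(pid) == skip_pgid:
                continue
            fds = os.listdir(f"/proc/{pid}/fd")
        except OSError:
            continue  # process gone, or not ours to inspect
        if any(_is_write_end(pid, fd, target) for fd in fds):
            writers.append(pid)
    return writers


def kill_pipe_writers(pipe_inode: int, skip_pgid: int) -> None:
    """SIGKILL every writer found; the kernel then closes all of its FDs."""
    for pid in find_pipe_writers(pipe_inode, skip_pgid):
        try:
            os.kill(pid, signal.SIGKILL)
        except OSError:
            pass  # already exited
```

The inode to look for can be read from the reader's own end with `os.fstat(read_fd).st_ino`, since both ends of a pipe share one inode.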

New tests (Linux-only, skipped on macOS — /proc required): verify that a
setsid() grandchild is detected and killed, that our own read-end process is
not touched, and that the end-to-end drain path preserves buffered data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(slack): replace slackify-markdown with own renderer (#293) (#622)

Replaces slackify-markdown with a custom renderer fixing 5 compounding bugs (nested lists, headings, blockquotes, tables, horizontal rules). Includes 35 unit tests and updated feature flow doc.

Closes #293

Co-Authored-By: pavshulin <pavshulin@users.noreply.github.com>

* feat(agents): per-agent token usage display in AgentHeader (#250) (#632)

Adds a token usage row to AgentHeader showing 7-day cost sparkline,
today's cost vs 7-day daily average (with trend arrow), and lifetime
totals. Data sourced from schedule_executions in the DB so it persists
across agent restarts.

- New GET /api/agents/{name}/token-stats endpoint
- ScheduleOperations.get_agent_token_stats(): single-pass 24h/7d/lifetime
  aggregation + 7-day daily breakdown with gap-filling
- agentsStore.getAgentTokenStats() action
- TOKEN USAGE ROW in AgentHeader.vue: SparklineChart (amber, 56x16),
  trend indicator (warning/success/gray), lifetime summary
- Hidden for agents with no runs (lifetime_executions == 0)
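
Gap-filling here means the sparkline always receives seven points, even for days with no executions. A minimal sketch of that step, assuming costs arrive keyed by calendar day; `fill_daily_gaps` and its arguments are hypothetical names, not the signature of `get_agent_token_stats()`:

```python
from datetime import date, timedelta


def fill_daily_gaps(cost_by_day, today, days=7):
    """Return one (day, cost) pair per day, oldest first, using 0.0 for missing days."""
    start = today - timedelta(days=days - 1)
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        series.append((day, cost_by_day.get(day, 0.0)))
    return series


# Two days with recorded cost; the other five are filled with zeros.
rows = {date(2025, 5, 29): 0.5, date(2025, 6, 1): 1.5}
for day, cost in fill_daily_gaps(rows, today=date(2025, 6, 1)):
    print(day.isoformat(), cost)
```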

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(site): agent website proxy via /site/{token} endpoint (SITE-001) (#634)

Adds live HTTP reverse-proxy so agents can serve public websites from
their container. A new `type='site'` public link routes requests through
`GET /site/{token}/{path}` → httpx streaming proxy → agent web server at
`http://agent-{name}:3000`. Includes DB migration, nginx routing, rate
limiting (per-IP + per-token), SSRF guard, security header stripping, and
audit event `site_link_visit`. UI adds Chat/Website selector in the link
create modal with a "Website" badge on site links.

Fixes #633

Co-authored-by: Claude <noreply@anthropic.com>

* fix(site): centralize SITE_PORT, atomic rate limit, fire-and-forget audit log, update docs (SITE-001)

- Move SITE_PORT to config.py; import in site.py and public_links.py
- Fix TOCTOU race in _check_site_rate_limit: pipeline INCR+check-after
- Audit log is now asyncio.create_task() so streaming is not delayed
- Add SITE-001 to requirements.md (section 15.1a-3)
- Add site.py to architecture.md router listing + /site/ endpoint table
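
The TOCTOU fix relies on incrementing first and comparing afterwards, so no second request can slip in between a read and a write. A minimal sketch of the idea, using an in-memory stand-in for an atomic INCR; `AtomicCounter` and `allow` are hypothetical names, and the real `_check_site_rate_limit` talks to Redis rather than a local lock:

```python
import threading
from collections import Counter


class AtomicCounter:
    """In-memory stand-in for an atomic INCR (not the real Redis client)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def incr(self, key):
        with self._lock:
            self._counts[key] += 1
            return self._counts[key]


def allow(counter, key, limit):
    """Increment first, then compare: there is no gap between check and update."""
    return counter.incr(key) <= limit


counter = AtomicCounter()
results = []
threads = [
    threading.Thread(target=lambda: results.append(allow(counter, "ip:203.0.113.7", 10)))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))  # 10
```

Because every caller receives a distinct counter value, exactly `limit` of them pass regardless of interleaving. A read-then-increment version can let extra requests through under the same load.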

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(agent-runtime): bound _kill_orphan_pipe_writers to 10s to prevent drain stall (#649) (#650)

/proc scanning can block indefinitely when a process is in D state
(uninterruptible sleep), causing drain_reader_threads to stall for
tens of minutes instead of the expected ~30 seconds.

Run _kill_orphan_pipe_writers in a daemon thread with a 10s cap so
a blocked /proc entry cannot push the drain past its deadline.

Also use wall-clock accounting for the post-kill join timeout so time
spent in terminate + orphan scan doesn't silently erode the budget,
and log actual elapsed time instead of the expected value so future
incidents are easier to diagnose.
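
The bounding pattern can be sketched as below. This is an illustration under assumptions, not the committed code: `run_bounded` and `remaining_budget` are hypothetical helpers, and `time.monotonic()` is used here to measure elapsed time.

```python
import threading
import time


def run_bounded(fn, cap_seconds: float) -> bool:
    """Run fn in a daemon thread; return True if it finished within the cap.

    A daemon thread that is still blocked does not keep the process alive.
    """
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(timeout=cap_seconds)
    return not worker.is_alive()


def remaining_budget(started_at: float, total_seconds: float) -> float:
    """Seconds left of a budget that began at started_at (never negative)."""
    return max(0.0, total_seconds - (time.monotonic() - started_at))


started = time.monotonic()
finished = run_bounded(lambda: time.sleep(0.01), cap_seconds=10.0)
join_timeout = remaining_budget(started, total_seconds=30.0)
print(f"scan finished={finished} elapsed={time.monotonic() - started:.2f}s")
```

Computing the join timeout from the start time, rather than passing a fixed value, is what keeps earlier steps from silently consuming the drain budget.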

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): voice WebSocket + stop endpoint missing ownership check (#600) (#638)

The /ws/voice/{voice_session_id} handler decoded the JWT but threw the
payload away — only the signature was checked. Any authenticated user
holding a valid JWT who learned the 128-bit session id (logs, browser
inspection, XSS) could attach to the audio stream, eavesdrop on the
victim's transcript, and trigger tool calls audit-logged under the
victim's identity.

POST /api/agents/{name}/voice/stop had the same gap: the path agent was
gated via get_authorized_agent, but request.voice_session_id was never
cross-checked against the path agent or the caller's user_id, so the
caller could end and persist a transcript onto another user's session.

Fix:
- WS: extract sub from the decoded JWT, look up the user, and close 4003
  if user.id != session.user_id (admin role bypasses, for support).
- voice_stop: load the session via get_session before mutating, assert
  agent_name == path name AND user_id == current_user.id (admin bypasses),
  raise 403 otherwise.
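
The two checks reduce to small predicates. A sketch under assumptions: the `User` and `VoiceSession` shapes and the function names are hypothetical, and the admin bypass is assumed to apply only to the user comparison, not to the agent-name comparison.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    role: str


@dataclass
class VoiceSession:
    user_id: int
    agent_name: str


def may_attach(user: User, session: VoiceSession) -> bool:
    """WebSocket rule: only the session owner or an admin may attach."""
    return user.role == "admin" or user.id == session.user_id


def may_stop(user: User, session: VoiceSession, path_agent: str) -> bool:
    """voice_stop rule: the session must belong to the path agent and the caller."""
    if session.agent_name != path_agent:
        return False  # assumption: admins cannot cross agents either
    return user.role == "admin" or user.id == session.user_id


owner = User(id=1, role="user")
attacker = User(id=2, role="user")
session = VoiceSession(user_id=1, agent_name="alpha")
print(may_attach(owner, session), may_attach(attacker, session))  # True False
```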

Added tests/unit/test_voice_auth.py covering: missing token, invalid
token, missing sub claim, unknown user, owner happy path, admin bypass,
attacker rejected, plus voice_stop variants. Loads voice.py via
importlib to avoid pulling in the full routers/__init__.py chain.

Reported by /security-review on PR #599 (2026-04-30); origin commit
7d8abe8 (#581 voice tool calls).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auth): split login rate limit into per-account + per-IP buckets (#591) (#621)

Pentest finding AISEC-H2 (CVSS 7.5, CWE-307): the previous design used a
single per-IP bucket at 5 fails / 10 min. Any user behind a corporate
NAT, VPN, or CDN locked out everyone else at the same egress IP after
just five bad attempts. A rotating-proxy attacker could keep an
organisation locked out continuously, so the protection doubled as a
platform-wide DoS primitive.

Replace with two independent buckets:

  * Per-account (tight) — 5 fails / 15 min: limits credential stuffing
    on one targeted account; never affects other accounts.
  * Per-IP (loose)      — 30 fails / 5 min: catches single-source abuse
    but stays well above the legitimate-traffic threshold for users
    sharing a NAT/VPN/CDN egress.

Both buckets are checked on every attempt; 429 fires when either is
exhausted. Successful login clears both. Account names are normalised
(lowercase + strip) before keying. Endpoints without an account context
(public access-request) skip the per-account bucket and rely on the
per-IP one only.
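
The two-bucket behaviour can be sketched with an in-memory sliding window. This illustrates the rules above and is not the backend's implementation: `FailureBucket` and `attempt_login` are hypothetical names, and the real limiter's storage and windowing may differ.

```python
import time
from collections import defaultdict, deque


class FailureBucket:
    """Sliding-window counter of failed attempts per key."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.fails = defaultdict(deque)

    def _prune(self, key):
        cutoff = self.clock() - self.window
        queue = self.fails[key]
        while queue and queue[0] <= cutoff:
            queue.popleft()
        return queue

    def exhausted(self, key):
        return len(self._prune(key)) >= self.limit

    def record_failure(self, key):
        self._prune(key).append(self.clock())

    def clear(self, key):
        self.fails.pop(key, None)


per_account = FailureBucket(limit=5, window_seconds=15 * 60)   # tight
per_ip = FailureBucket(limit=30, window_seconds=5 * 60)        # loose


def attempt_login(account, ip, password_ok):
    """Return the HTTP status this sketch would answer with."""
    key = account.strip().lower()  # normalise before keying
    if per_account.exhausted(key) or per_ip.exhausted(ip):
        return 429
    if password_ok:
        per_account.clear(key)
        per_ip.clear(ip)
        return 200
    per_account.record_failure(key)
    per_ip.record_failure(ip)
    return 401
```

Checking the buckets before evaluating the password is what keeps a locked account locked even when the correct password is supplied.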

Lockout state-changes log a structured WARNING (visible via Vector) so
operators can see when buckets are being exercised.

Live verification on the running backend:
  attempts 1-5 → 401 (counter ticking)
  attempt  6   → 429 "Too many failed attempts for this account..."
  valid pwd    → 429 (account stays locked even with right password)
  other account from same IP → 401 (per-account isolation works)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): require creator role on /api/systems/deploy (#592) (#624)

Auditing all entry points that ultimately call `create_agent_internal`
(per #592 AC #2) turned up a real bypass on `POST /api/systems/deploy`.
The system-manifest deployment route gated on `Depends(get_current_user)`
without a role check, so any authenticated user-role account could spawn
an entire fleet of agents through that path — a strictly stronger
privilege than the single-agent bypass the AISEC-H1 finding originally
named (which `POST /api/agents/deploy-local` had already closed via #150).

Add `Depends(require_role("creator"))` to `deploy_system`, matching the
existing dependencies on `POST /api/agents` and `POST /api/agents/deploy-local`.

Regression test (`tests/unit/test_agent_creation_role_gates.py`) walks
the FastAPI router source AST and asserts that every agent-creation
route uses `Depends(require_role("creator"))`. AST-level so the check is
fast, stable across formatting changes, and fires the moment someone
removes the dependency. Confirmed the test catches the regression by
reverting the change and observing the test fail before re-applying.
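
An AST-level check of this kind can be sketched with the standard `ast` module. The sample source, the `legacy_deploy` route, and the helper names are hypothetical; the real test in `tests/unit/test_agent_creation_role_gates.py` may be structured differently.

```python
import ast

# Hypothetical router source: one gated route, one missing the role gate.
SOURCE = '''
@router.post("/api/systems/deploy")
async def deploy_system(body: dict, user=Depends(require_role("creator"))):
    ...

@router.post("/example/legacy-deploy")
async def legacy_deploy(body: dict, user=Depends(get_current_user)):
    ...
'''


def has_creator_gate(func) -> bool:
    """True if the signature or decorators contain Depends(require_role("creator"))."""
    for root in [func.args, *func.decorator_list]:
        for node in ast.walk(root):
            if not (isinstance(node, ast.Call) and getattr(node.func, "id", None) == "Depends"):
                continue
            for arg in node.args:
                if (
                    isinstance(arg, ast.Call)
                    and getattr(arg.func, "id", None) == "require_role"
                    and arg.args
                    and isinstance(arg.args[0], ast.Constant)
                    and arg.args[0].value == "creator"
                ):
                    return True
    return False


def ungated_routes(source, route_names):
    """Names of the listed route functions that lack the creator gate."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name in route_names
        and not has_creator_gate(node)
    ]


print(ungated_routes(SOURCE, {"deploy_system", "legacy_deploy"}))  # ['legacy_deploy']
```

Because the check parses source instead of importing the app, it needs no running server and is unaffected by formatting changes.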

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): stop mirroring JWT into document.cookie (#188) (#642)

* fix(security): stop mirroring JWT into document.cookie (#188)

UnderDefense pentest 3.3.5 flagged the frontend for mirroring the
authentication token into a `token` cookie without `Secure` or
`HttpOnly` flags. The cookie was set in setupAxiosAuth via
`document.cookie =` so it was readable from JS (HttpOnly is impossible
on JS-set cookies), transmitted over HTTP without the Secure flag, and
auto-attached to every outbound request as a CSRF vector.

The cookie's stated purpose was "for nginx auth_request to validate
agent UI access" — but that nginx directive was never configured in
any committed deployment (`grep -r auth_request -- *.conf` is empty,
git log -S confirms it never existed). The cookie was pure attack
surface with zero functional value.

Per the issue's "Best" remediation, drop the cookie mirror entirely.
API authentication uses the `Authorization: Bearer` header
exclusively; nothing else needs the cookie.

The cookie-clear on logout is intentionally kept so users carrying a
stale cookie from the pre-fix version get cleaned up on their next
logout cycle. The cookie's `max-age=1800` also naturally expires it
within 30 minutes of the upgrade.

The backend's `/api/auth/validate` endpoint still accepts a cookie as
one of three token sources — left untouched as out-of-scope. With the
frontend no longer setting the cookie nothing legitimate sends one,
but the fallback path remains available if a future nginx
auth_request setup is wired up properly (with Secure + HttpOnly flags
set server-side via Set-Cookie, not via document.cookie).

Closes #188.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(admin-login): remove stale references to JWT cookie mirror (#188)

PR #642 removed the document.cookie set in setupAxiosAuth, but the
admin-login feature flow still documented the cookie as live. Update
the code snippet and the storage table to reflect current behaviour.

A note in the snippet explains why the cookie was removed, so readers
browsing the diff history can find the rationale without reading the
PR.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): override git User-Agent on skills library sync (#184) (#646)

UnderDefense pentest 3.3.1 flagged the backend for leaking the
underlying tech stack via outbound User-Agent. Skills sync uses git
subprocess (not httpx), so the leaked UA is `git/<version>
(libcurl/<version> ...)` — verified live with GIT_TRACE_CURL=1.

Add `-c http.useragent=Trinity-Skills-Sync` (positioned correctly
before the subcommand, as git's `-c` requires) to the two HTTP-bearing
git invocations: `_git_clone` and `_git_pull`'s fetch. The local-only
`git reset --hard` and `git rev-parse HEAD` calls intentionally do
not get the flag: they make no HTTP requests, and threading the flag
through would suggest otherwise.

The SSRF allowlist (#179) already locks the destination to github.com
so the practical exposure is small (GitHub already knows what we are),
but defense-in-depth: even if the allowlist is ever loosened the UA
stays generic.

The constant has no version suffix to avoid yet another version string
drifting against VERSION / package.json / pyproject.

Tests in tests/unit/test_skill_service_user_agent.py mock subprocess
and assert the flag is present at the right argv position for clone
and fetch, and absent for the local-only reset and rev-parse calls.
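
The argv placement those tests assert can be sketched as follows. `git_argv` and the repository URL are hypothetical and only illustrate the ordering; the service's `_git_clone` and `_git_pull` are not reproduced here.

```python
USER_AGENT = "Trinity-Skills-Sync"


def git_argv(subcommand, *args, over_http):
    """Build a git argv; -c must precede the subcommand to take effect."""
    argv = ["git"]
    if over_http:
        argv += ["-c", f"http.useragent={USER_AGENT}"]
    return argv + [subcommand, *args]


# HTTP-bearing call: the flag sits between "git" and the subcommand.
clone = git_argv("clone", "https://github.com/example/skills.git", "dest", over_http=True)
# Local-only call: no flag at all.
reset = git_argv("reset", "--hard", "origin/main", over_http=False)
print(clone)
print(reset)
```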

Live verification with `GIT_TRACE_CURL=1 git -c http.useragent=... ls-remote ...`
confirms the wire UA changes from `git/2.43.0` to `Trinity-Skills-Sync`.

Closes #184.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(deploy): document stale-image symptom + recovery in start.sh and DEPLOYMENT.md (#557) (#626)

Self-hosted developers tracking `dev` occasionally pull a commit that
adds a new Python or Node dependency to one of the platform Dockerfiles,
re-run `start.sh`, and end up with new source running against an old
image's Python env. Uvicorn crashes with `ModuleNotFoundError`, compose
keeps respawning the worker, and `start.sh` reports success — leaving
the UI "Disconnected" with no obvious diagnosis.

Adopting Option B from #557 discussion (PR #625 closed): treat this as
a documentation problem rather than building auto-detection into the
critical-path startup script. Auto-detection has clear cost (Python
subprocess + Docker inspect on every cold start) and unclear benefit
(production deploys use `compose pull` and don't hit this; the affected
population is self-hosted devs whose recovery is one command).

Two changes:
- `scripts/deploy/start.sh`: append a 4-line hint after the "Ready!"
  banner naming the symptom (`ModuleNotFoundError`, "Disconnected" UI)
  and the exact recovery command.
- `docs/DEPLOYMENT.md`: add a Troubleshooting entry with full diagnosis
  walkthrough, root-cause explanation, and the rationale for not
  auto-detecting (links to #557).

A future `scripts/deploy/upgrade.sh` is the right place to bundle
backup + rebuild + start + verify for the explicit upgrade path; that
is beyond the scope of #557.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): lock down Redis — auth + ACL + network split (#589)

* fix(security): split Docker compose into platform and agent networks (#589)

Redis at 172.28.0.0/16 was reachable from any agent container. AISEC scan
3aad5469 demonstrated end-to-end exfiltration / cross-user task injection
from a legitimately deployed agent. Network segmentation is the strongest
control — agents now physically cannot route to Redis.

Topology:
- trinity-platform (172.29.0.0/16, NEW) — Redis, scheduler, vector
- trinity-agent (172.28.0.0/16, name preserved) — frontend, agents
- Backend / mcp-server / otel-collector / cloudflared straddle both

Agent-creation sites in services/agent_service/* and system_agent_service.py
need zero changes because the agent-network external name is preserved.

Dev: bind Redis host port to 127.0.0.1:6379 (was 0.0.0.0). Tests connect
from the dev machine; LAN cannot. Auth lands in the next commit.

Refs #589 — acceptance criterion #3 (network segment separation).

* fix(security): mandatory Redis auth, ACL users, auth-aware healthcheck (#589)

Both compose files now enforce two passwords (REDIS_PASSWORD admin /
REDIS_BACKEND_PASSWORD runtime) with the fail-on-missing :? form.
docker compose refuses to render without them.

Per-user ACL via inline --user flags. Additive (start from zero, allow
only what the runtime needs) — never +@all -X, which lets newly added
dangerous commands through. backend + scheduler get standard data
families plus scripting/transactions/pubsub, then -@dangerous, which
removes FLUSHALL, CONFIG, SHUTDOWN, MIGRATE, REPLICAOF, MONITOR.

Verified at runtime against redis:7-alpine: PING/SET/GET work for the
backend user, FLUSHALL and CONFIG GET return NOPERM, unauth requests
return NOAUTH.

REDIS_URL on backend + scheduler now embeds the backend ACL user.
mcp-server: REDIS_URL and depends_on:redis dropped in prod compose
(zero Redis imports in src/mcp-server/).

Healthcheck pings as the backend ACL user so a typo'd ACL keeps redis
unhealthy and gates dependent services. depends_on:redis switches to
service_healthy so backend/scheduler don't race the ACL load.

Refs #589 — acceptance criteria #1, #2, #5.

* fix(scheduler-test-rig): mirror Redis auth posture (#589)

Without this, scheduler container fails fast on startup against the rig
because src/scheduler/config.py requires creds in REDIS_URL after #589.
No ACL or network split here — this is a 2-service standalone debugging
rig, not the production posture.

* fix(security): fail-fast on REDIS_URL missing credentials (#589)

Backend (src/backend/config.py) and scheduler (src/scheduler/config.py)
now raise RuntimeError at import time if REDIS_URL is unset or lacks
credentials.
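
A minimal sketch of such a fail-fast check, assuming "credentials"
means a non-empty password component in the URL. The function name is
hypothetical, and parsing with `urllib.parse` is one way to do it, not
necessarily what the config modules did at this commit.

```python
from urllib.parse import urlparse


def require_redis_credentials(url):
    """Return url unchanged if it carries a password, else raise RuntimeError."""
    if not url:
        raise RuntimeError("REDIS_URL is not set")
    parsed = urlparse(url)
    # parsed.password is None for "redis://host", "redis://@host" and
    # "redis://user@host", and empty for "redis://user:@host".
    if not parsed.password:
        # The URL itself is deliberately left out of the message.
        raise RuntimeError("REDIS_URL must include credentials (user:password@host)")
    return url
```

Calling this at module import time is what turns a misconfigured
deploy into an immediate startup failure rather than a runtime
surprise.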

Removed the splicing fallback in backend config that papered over an
unauth REDIS_URL by joining REDIS_PASSWORD into the URL — single source
of truth (compose) eliminates silent drift.

Tests that import backend modules need a creds-bearing REDIS_URL in their
environment; tests/conftest.py will set a dummy one in the test commit.

Refs #589 — acceptance criterion #5.

* fix(webhooks): use REDIS_URL for rate-limit client (#589)

Webhooks rate-limit was the one Redis client that bypassed REDIS_URL —
it used redis.Redis(host="redis", port=6379) and would silently
fail-open under requirepass. Switching to redis.from_url(REDIS_URL)
picks up the credentialed URL like every other client.

Also: distinguish auth/ACL errors (logged at ERROR with exception class)
from transient errors (WARN). Fail-open behavior preserved so a Redis
blip doesn't 500 legitimate webhooks, but a misconfigured deploy now
surfaces in alerts instead of via a webhook abuse incident.

Drops the now-unused REDIS_HOST/REDIS_PORT env reads.

* feat(deploy): auto-generate Redis passwords on fresh installs (#589)

start.sh ensure_redis_passwords matches the existing
CREDENTIAL_ENCRYPTION_KEY pattern, with one safety guard:

- Fresh install (no redis-data volume) → generate both passwords with
  openssl rand -hex 24 and append to .env. One-command boot keeps
  working.
- Existing volume + missing password → refuse with a loud error pointing
  at docs/migrations/REDIS_AUTH.md. Re-keying a populated Redis would
  lock the backend out of its own data; ops needs to follow the explicit
  upgrade path.

Idempotent — second run is a no-op when both passwords are already set.
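
The decision logic above, sketched in Python for readability (the real
implementation is shell, in start.sh). The function signature and the
dict-based `.env` model are assumptions; `secrets.token_hex(24)`
produces the same 48-hex-character shape as `openssl rand -hex 24`.
Generating only the missing keys on a fresh install is also an
assumption.

```python
import secrets

REQUIRED_KEYS = ("REDIS_PASSWORD", "REDIS_BACKEND_PASSWORD")


def ensure_redis_passwords(env, redis_volume_exists):
    """Return a copy of env with both Redis passwords present.

    env models the parsed .env file as a dict (an assumption for this sketch).
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if not missing:
        # Both passwords already set: second run is a no-op.
        return dict(env)
    if redis_volume_exists:
        # Re-keying a populated Redis would lock the backend out of its data.
        raise RuntimeError(
            "Redis passwords missing but redis-data exists; "
            "see docs/migrations/REDIS_AUTH.md"
        )
    updated = dict(env)
    for key in missing:
        updated[key] = secrets.token_hex(24)  # 24 random bytes as 48 hex chars
    return updated
```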

* docs(security): add Redis auth migration guide + architecture notes (#589)

- docs/migrations/REDIS_AUTH.md: operator upgrade guide. Covers fresh
  installs (auto-generated by start.sh), live upgrades (down
  --remove-orphans + docker network rm + add passwords), production,
  and verification commands.
- docs/memory/architecture.md: new "Network Topology (Issue #589)"
  section above Container Security. Documents the two-network split,
  service membership table, the "agents NEVER on platform network"
  rule, and the three Redis ACL users + their access patterns.

* test(security): network isolation, ACL, fail-fast, webhook rate-limit (#589)

tests/conftest.py: top-level autouse env stub for backend imports.
Backend config now raises at import-time if REDIS_URL lacks credentials;
without this, every test that transitively imports backend modules
breaks. Real Redis tests under tests/security/ override via their own
conftest from .env. Adds the `integration` marker.

tests/unit/test_config_fail_fast.py (new): backend refuses to import
without creds-bearing REDIS_URL. 3 cases — missing env, unauth URL,
URL with creds.

tests/security/test_redis_network_isolation.py (new): 5 integration
tests covering acceptance criteria #1-#3:
  - agent-network container has no route to redis (BLOCKED)
  - unauth client gets NOAUTH on platform network
  - backend ACL user can PING with creds
  - backend ACL user FLUSHALL → NOPERM (no admin)
  - backend ACL user CONFIG GET → NOPERM (no requirepass leak)

tests/security/conftest.py (new): session-scoped fixture loads real
.env values for the integration tests; skips the suite if missing.

tests/integration/test_webhook_rate_limit.py (new): regression for the
from_url switch in webhooks.py. Self-contained — creates agent +
schedule + webhook token inline, hits 11×, expects 429 on the 11th.
Catches the silent fail-open if Redis auth ever regresses.

tests/run-integration.sh (new): pytest -m integration runner. Excluded
from run-smoke.sh per the smoke runner's ~30s no-Docker contract.

* docs(security): detach agents before network rm (#589)

Trinity-managed agent containers are created via the Docker SDK
outside compose, so they store the agent network's UUID, not its
name. After `docker network rm trinity-agent-network` (step 3 of
the upgrade procedure), any later `docker start <agent>` fails:

    Error response from daemon: failed to set up container
    networking: network <old-uuid> not found

Compose-managed services don't hit this — they're recreated with
fresh network refs on `up`. Agent containers aren't, so they keep
the stale UUID until disconnected.

Add an explicit detach loop as step 2, before the network removal.
Verified against a populated install with one running and four
stopped agents: all five reattach cleanly to the new network on
next start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): CSO OBS-1/2/3 follow-ups — webhook rate-limit + healthcheck hardening (#589)

Resolves three observations from the CSO audit
(docs/security-reports/cso-2026-05-04-589-diff.md):

OBS-1 — webhook rate-limit fail-open + connection-per-request DoS amplifier:
* Added in-process secondary rate limiter (3x primary, per-worker) in
  src/backend/routers/webhooks.py. Bounds blast radius during a Redis
  outage without breaking the documented fail-open philosophy.
* Cached the Redis client at module level under threading.Lock with
  double-checked init. _check_webhook_rate_limit resets the cache on
  inner exceptions so stale connections rebuild cleanly. Without
  caching, a flood would open a fresh TCP per request and exhaust
  Redis maxclients — turning the rate limiter into the DoS amplifier.
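
A minimal sketch of an in-process secondary limiter of the kind the
first bullet describes. The class name, the sliding-window approach,
the 60-second window, and the primary limit of 10 are assumptions for
illustration; only the "3x primary, per-worker" sizing comes from this
commit.

```python
import threading
import time
from collections import defaultdict, deque


class InProcessRateLimiter:
    """Per-process sliding-window limiter keyed by webhook token."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, token):
        """Record a hit and return True, or return False if the cap is reached."""
        now = self._clock()
        with self._lock:
            hits = self._hits[token]
            # Drop timestamps that have aged out of the window.
            while hits and now - hits[0] >= self._window:
                hits.popleft()
            if len(hits) >= self._limit:
                return False
            hits.append(now)
            return True


PRIMARY_LIMIT = 10  # assumed primary (Redis-backed) limit per window
secondary = InProcessRateLimiter(limit=3 * PRIMARY_LIMIT, window_seconds=60)
```

Because the state lives in one worker's memory, the effective cap
across N workers is N times the configured limit; that is acceptable
for a backstop whose job is only to bound blast radius while Redis is
unavailable.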

OBS-2 — tightened _TOKEN_RE from {20,60} to {43} matching
secrets.token_urlsafe(32) (verified against db/schedules.py:524).
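
Why 43: `secrets.token_urlsafe(32)` base64url-encodes 32 random bytes
(44 characters including one `=` pad) and strips the padding. A small
sketch follows; the exact character class of the real `_TOKEN_RE` is
not shown in this commit, so the pattern here is an assumption based on
the URL-safe base64 alphabet.

```python
import re
import secrets

# Assumed pattern: URL-safe base64 alphabet, exactly 43 characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{43}")

token = secrets.token_urlsafe(32)


def looks_like_webhook_token(value):
    """True only for strings with the exact token_urlsafe(32) shape."""
    return TOKEN_RE.fullmatch(value) is not None
```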

OBS-3 — switched all three compose healthchecks from
`redis-cli -a $$PASS` to `REDISCLI_AUTH="$$PASS" redis-cli` so the
password no longer appears in /proc/<pid>/cmdline.

Additional #589 hardening (caught while resolving OBS-1):
* src/backend/config.py + src/scheduler/config.py: tightened the
  REDIS_URL credential check from `"@" in url` substring to urlparse
  validation. Catches redis://@redis:6379, redis://user@redis:6379, etc.
* src/scheduler/main.py: redact password from REDIS_URL before logging
  (was leaking via Vector log aggregator).
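
A minimal sketch of redacting the password before a URL reaches the
logs. The function name and the `***` placeholder are hypothetical; the
scheduler's actual redaction code is not shown in this commit.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_redis_url(url):
    """Replace the password in a Redis URL with a placeholder for logging."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    # Split on the last "@" so the host:port part is preserved verbatim.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.partition(":")[0]
    netloc = f"{user}:***@{hostport}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Usage would look like `logger.info("redis: %s", redact_redis_url(url))`,
so the raw URL never appears in a log line.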

Tests:
* tests/unit/test_webhook_rate_limit_inprocess.py — 7 new tests covering
  cap, window expiry, token isolation, runtime-error fallback, regex
  shape, cache hit, cache reset.
* tests/unit/test_config_fail_fast.py — 4 new parametrized cases for
  malformed-credential URL rejection.
* 15/15 unit tests pass.
* Live Redis healthcheck verified — trinity-redis reports healthy with
  the new REDISCLI_AUTH form; `redis-cli ping` returns PONG.

Also adds .gstack/ to .gitignore so future skill artifacts stay local.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dev): update gitea overlay network name after #589 split

trinity-network no longer exists; gitea dev overlay must attach to
trinity-agent-network (the preserved agent-network name).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): use credentialed Redis URL in scheduler test_config fixture (#661)

After #589 hardened Redis auth, SchedulerConfig raises on bare redis:// URLs.
The test_config fixture bypassed the env-level patch in tests/conftest.py by
passing redis_url="redis://localhost:6379" directly.

Fixes #659

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Session tab — `--resume`-default chat surface (Closes #651) (#652)

* docs(planning): add Session tab design — --resume-default chat surface

Adds docs/planning/SESSION_TAB_2026-04.md, the comprehensive plan for a
new "Session" tab living alongside Chat. Sessions reattach to their own
Claude Code JSONL via --resume, preserving tool memory, mid-skill state,
and reasoning state across turns.

Plan covers:
- UI design (tab placement, multi-session model, +New Session, Reset memory)
- Data model (agent_sessions / agent_session_messages — parallel to chat)
- Backend architecture (separate router, single shared change to
  task_execution_service for persist_session plumbing)
- Phased rollout (foundation → backend → frontend → hardening → GA)
- Edge cases & failure-mode lessons baked in from a prior local spike
  (parser bug, --no-session-persistence dependency, cold-turn detection,
  port allocation)
- Test plan including the cross-session contamination test for
  Anthropic claude-code#26964
- Retention/cleanup policy, observability, security checklist
- Local-first workflow: implementation runs entirely on this branch
  until validation passes; only then does the standard SDLC engage
  (issue, push, PR)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): add agent_sessions + agent_session_messages tables

Phase 1.1 of the Session tab plan (docs/planning/SESSION_TAB_2026-04.md).
Schema definitions go in db/schema.py for fresh installs; the matching
idempotent migration agent_sessions_tables in db/migrations.py upgrades
existing databases.

The schema mirrors chat_sessions / chat_messages but is strictly parallel
— no foreign keys, no shared columns, separate index namespace. Three
fields are unique to the session model:

- agent_sessions.cached_claude_session_id — the Claude Code session UUID
  the next turn will pass to ``--resume``
- agent_sessions.consecutive_resume_failures — drives the resume-failure
  fallback (Phase 2.2)
- agent_session_messages.cache_read_tokens — observability for whether
  Anthropic's prompt cache engaged

CASCADE on session delete cleans up message rows automatically.

Verified locally: backend restart applies the migration cleanly, tables
have 15 columns each with correct types/defaults/PKs, all four indexes
created, second restart confirms idempotency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): add SessionOperations for Session tab persistence

Phase 1.2 of the Session tab plan (docs/planning/SESSION_TAB_2026-04.md).

- Adds AgentSession and AgentSessionMessage Pydantic models in db_models.py
  with the new fields the Session tab needs beyond ChatSession/ChatMessage:
  cached_claude_session_id, last_resume_at, consecutive_resume_failures on
  the session row, and cache_read_tokens + claude_session_id on each message.

- Creates db/sessions.py with a SessionOperations class mirroring the
  ChatOperations shape: create_session, get_session, list_sessions,
  delete_session, add_session_message, get_session_messages, plus the
  Claude UUID cache helpers (get/update/clear_cached_claude_session_id)
  and resume health helpers (mark_resume_failure, mark_resume_success).

- Wires the new ops into the DatabaseManager facade alongside the
  existing _chat_ops, with one delegating method per public operation.

No router, no agent-server change, no frontend yet — those land in later
phases. Tables agent_sessions and agent_session_messages already exist
from the prior schema commit.

* feat(session-tab): backend foundation for --resume-default Session surface

Phases 1.3 through 1.7 of the Session tab plan
(docs/planning/SESSION_TAB_2026-04.md). Pure backend / agent-server work
behind a flag — no UI surface yet, no behavior change to Chat or any
existing /task caller.

Agent server (base image):

- Stream-json parser fix (Appendix B). Both parse_stream_json_output and
  process_stream_line now recognize {"type":"system","subtype":"init"}
  for session_id capture, with the result event as a fallback when init
  was missed (truncated streams). The legacy bare-init shape is
  intentionally rejected. This is the same bug that would have made
  Session caching corrupt on every cold turn.

- Same bug in execute_headless_task's permission-mode validation site:
  the check matched the wrong shape, so permission_mode_validated never
  flipped to True and the protective kill-on-misconfigured-permission
  path silently failed open. Now uses type=system + subtype=init.

- New persist_session flag threaded through ParallelTaskRequest →
  routers/chat.py → AgentRuntime ABC → ClaudeCodeRuntime.execute_headless
  → execute_headless_task. When True, --no-session-persistence is
  omitted so the JSONL is written and the next turn's --resume can find
  it. --session-id is still passed for unique cold-turn namespace.
  Default False keeps every existing caller stateless.

- gemini_runtime accepts the parameter for ABC parity and ignores it
  (Gemini CLI has no resume).
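
The session-id capture rule from the first bullet above can be
sketched as follows. The function name is hypothetical, and the legacy
bare-init shape is assumed to be an event whose `type` is `init` with
no `system` wrapper; the commit does not spell that shape out.

```python
import json


def capture_session_id(lines):
    """Return the session id from stream-json lines, or None.

    The system/init event wins; the result event is only a fallback for
    streams where init was missed.
    """
    init_id = None
    result_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        if event.get("type") == "system" and event.get("subtype") == "init":
            init_id = init_id or event.get("session_id")
        elif event.get("type") == "result":
            result_id = result_id or event.get("session_id")
        # Any other shape, including a bare {"type": "init"}, is ignored.
    return init_id or result_id
```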

Backend:

- task_execution_service.execute_task now accepts persist_session: bool
  = False and threads it into the agent payload. All existing callers
  (Chat, schedules, MCP, fan-out, webhooks) keep today's behavior; only
  the future routers/sessions.py (Phase 2) opts in.

- settings_service.is_session_tab_enabled() — feature flag resolving
  system_settings.session_tab_enabled → SESSION_TAB_ENABLED env →
  False. Module-level convenience function exposed.
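
The flag resolution order in the last bullet (DB setting, then env,
then False) can be sketched as below. The function signature and the
set of strings treated as truthy are assumptions; only the precedence
order and the `SESSION_TAB_ENABLED` variable name come from this
commit.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}  # assumed parsing rule


def resolve_session_tab_enabled(db_value, environ=os.environ):
    """Resolve the flag: DB setting first, then env var, then False."""
    if db_value is not None:
        return str(db_value).strip().lower() in _TRUTHY
    env_value = environ.get("SESSION_TAB_ENABLED")
    if env_value is not None:
        return env_value.strip().lower() in _TRUTHY
    return False
```

An explicit DB value of "false" therefore overrides an env var set to
"true", which lets an admin switch the surface off without a redeploy.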

Tests (run inside trinity-backend container — Python 3.11):

- tests/unit/test_session_operations.py — 9 tests against an isolated
  SQLite DB exercising the full SessionOperations CRUD plus the cached
  claude session UUID lifecycle and resume failure / success counters.

- tests/unit/test_claude_code_session_id_parser.py — 8 tests covering
  both parsers (batch + streaming): system/init recognition, result
  fallback, init-wins-over-result, legacy bare-init rejection, and a
  source-level regression guard for the permission-mode validation
  fix.

- tests/unit/test_session_persistence_flag.py — 8 tests pinning the
  contract: signatures across the runtime ABC, ParallelTaskRequest,
  agent chat router, execute_headless_task, and
  task_execution_service.execute_task. Includes the gating regex check
  on --no-session-persistence and a live signature import to catch
  drift AST parsing alone would miss.

Total: 25 passing tests covering every touchpoint of Phase 1.

Base image (trinity-agent-base) rebuilt to embed the agent-server
changes; existing agent containers will pick them up on next recreate.

* feat(session-tab): backend turn endpoint for --resume-default Session surface

Phase 2 of docs/planning/SESSION_TAB_2026-04.md. Six endpoints under
/api/agents/{name}/session{s,...} that mirror routers/chat.py's auth
model and TaskExecutionService usage but persist to the parallel
agent_sessions / agent_session_messages tables and request
persist_session=True on every turn so each call reattaches via
`claude --print --resume <uuid>`.

Surface gated on is_session_tab_enabled() — flag-off default returns
404 from every endpoint.

  POST   /api/agents/{name}/session                  create row
  GET    /api/agents/{name}/sessions                 list (per-user)
  GET    /api/agents/{name}/sessions/{id}            session + messages
  POST   /api/agents/{name}/sessions/{id}/message    THE turn
  POST   /api/agents/{name}/sessions/{id}/reset      clear cached uuid
  DELETE /api/agents/{name}/sessions/{id}            delete row + msgs

Spike-pitfall defenses baked into the turn endpoint:

- L3 (first-turn-has-no-session-id): the agent_sessions row is created
  server-side via POST /session BEFORE the turn endpoint ever calls
  execute_task. No frontend-first model.
- L2 (cold turn writes empty JSONL): persist_session=True is passed
  unconditionally — Phase 1.4 already wired the flag through the agent
  stack; Phase 2 just promises to set it on every turn.
- L1 (parser misses system/init): trust result.session_id directly —
  Phase 1.3 fixed the parser. Scenario A confirms the captured UUID is
  the real Claude UUID end-to-end.

Phase 2.2 resume-failure fallback: when execute_task returns "no
conversation found" on a turn that had a cached UUID, clear the cache,
mark_resume_failure, and retry once with resume_session_id=None. Logs
event=session_resume_fallback with the stale UUID and consecutive
failure count. Anthropic #39667 (cleanupPeriodDays) and #53417 (CLI
upgrade) both produce this signal.
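
A minimal sketch of the retry-once fallback. The `execute` callable,
the dict-shaped result with an `error` field, and the
`on_resume_failure` hook (standing in for clearing the cache and
calling mark_resume_failure) are hypothetical; the "no conversation
found" signal and the single retry with `resume_session_id=None` come
from this commit.

```python
def is_stale_resume(result):
    """True when the agent reports that the resumed conversation is gone."""
    return "no conversation found" in (result.get("error") or "").lower()


def run_turn_with_fallback(execute, message, cached_uuid, on_resume_failure):
    """Run one turn; on a stale cached UUID, retry exactly once as a cold turn."""
    result = execute(message, resume_session_id=cached_uuid)
    if cached_uuid and is_stale_resume(result):
        # Hook for: clear cached UUID, bump the consecutive failure counter, log.
        on_resume_failure(cached_uuid)
        result = execute(message, resume_session_id=None)
    return result
```

The `cached_uuid` guard matters: a cold turn that fails has nothing to
fall back to, so it is returned as-is instead of being retried.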

Phase 2.3 Redis lock: SET NX EX per (agent, claude_uuid) with 5-min TTL
and Lua-script release. Async poll loop (250ms tick) so the event loop
stays free during contention. Cold turns skip the lock (no JSONL to
corrupt). Hard 30s wait ceiling — beyond that the contender gets HTTP
429 with retry hint. Mitigation for Anthropic #20992 (concurrent
--resume JSONL writes corrupt the file).
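
A minimal sketch of a SET NX EX lock with a compare-and-delete Lua
release. It is synchronous for brevity, whereas the commit describes
an async poll loop. `client` is assumed to expose redis-py's
`set(name, value, nx=..., ex=...)` and `eval(script, numkeys, ...)`;
the key format and function names are hypothetical. The TTL, tick, and
wait ceiling mirror the numbers above.

```python
import secrets
import time

# Delete the key only if it still holds our token, so a lock that expired
# and was re-acquired by someone else is never released by mistake.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""

LOCK_TTL_SECONDS = 300        # 5-minute TTL
POLL_INTERVAL_SECONDS = 0.25  # 250ms tick
MAX_WAIT_SECONDS = 30         # hard wait ceiling


def lock_key(agent, claude_uuid):
    """Hypothetical key format for the per-(agent, claude_uuid) lock."""
    return f"session-lock:{agent}:{claude_uuid}"


def acquire(client, key, max_wait=MAX_WAIT_SECONDS, sleep=time.sleep, clock=time.monotonic):
    """Return a lock token, or None if the wait ceiling is reached."""
    token = secrets.token_hex(16)
    deadline = clock() + max_wait
    while True:
        if client.set(key, token, nx=True, ex=LOCK_TTL_SECONDS):
            return token
        if clock() >= deadline:
            return None  # caller would map this to HTTP 429 with a retry hint
        sleep(POLL_INTERVAL_SECONDS)


def release(client, key, token):
    """Release the lock only if this token still owns it."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1
```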

Per-user ownership at the row layer: even agent owners cannot read or
send into another user's session (E6 isolation in the design doc).
Returns 404 for ownership failures so we don't leak session-id existence.

Tests (tests/integration/test_session_turns.py, run inside
trinity-backend container with docker.sock mounted for testfix
recreation + JSONL surgery in Scenario C):

  Scenario A: 3-turn happy path — same Claude UUID across turns
  Scenario B: turn 2 recalls a secret from turn 1, no text-replay
  Scenario C: JSONL deletion mid-session triggers fallback + recovery
  Scenario D: concurrent POSTs serialise via Redis lock
              (asserts finish_gap ≈ winner_work_time, NOT total wall)
  Scenario E: switching sessions A → B → A preserves A's UUID

5 passed in 54.5s against the live agent-testfix container (recreated
onto the rebuilt base image first per L4 in the plan). Phase 1's 25
unit tests still pass — no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(session-tab): frontend Session surface

Phase 3 of docs/planning/SESSION_TAB_2026-04.md. Adds the new "Session"
tab in AgentDetail, gated on the is_session_tab_enabled() platform flag
so it stays invisible until explicit opt-in (default off).

Backend prerequisite — routers/settings.py:

- GET /api/settings/feature-flags exposes a curated allowlist of UI-
  relevant flags to any authed user. The existing /api/settings/{key}
  endpoint is admin-only and would block non-admin frontends from even
  knowing whether to render the Session tab. The new endpoint reads
  through services.settings_service.is_session_tab_enabled() so the
  resolution order (DB → env → False) stays in one place.

Frontend:

- src/frontend/src/stores/sessions.js — Pinia store wrapping the six
  /api/agents/{name}/sessions* endpoints with per-agent state isolation
  and the feature-flag cache. Optimistic user-message insert with
  rollback on send failure.

- src/frontend/src/components/SessionPanel.vue — structural copy of
  ChatPanel reusing ChatMessages + ChatInput + ModelSelector. Differs
  from Chat in three places per the design doc:
    * Sends bare user_message to POST .../sessions/{id}/message — no
      buildContextPrompt text-replay (the agent already has working
      memory via --resume).
    * "Reset memory" button + confirm modal that clears the cached
      Claude UUID without deleting the message log (Phase 3.4).
    * Per-session selector subtitle: turn count, context % used,
      cached-memory dot (emerald/gray), and consecutive_resume_failures
      indicator (Phase 3.5).
  Lean cut for first-visible-surface: voice mic, file upload, and SSE
  dynamic status labels are deferred — those need backend extensions
  (file payload on the turn endpoint, async_mode + SSE on the same).

- src/frontend/src/views/AgentDetail.vue — new Session tab inserted
  between Chat and Dashboard/Schedules, gated on
  sessionsStore.sessionTabEnabled. Layout sites that previously
  branched on activeTab === 'chat' now use a shared isFullscreenTab
  computed so Chat and Session both get the input-pinned-to-bottom flex
  layout. ?tab=session deep-link allowlist updated.

- src/frontend/e2e/session-tab.spec.js — Phase 3.6 Playwright spec.
  Marked @interactive (not @smoke) because each run makes one real
  Claude API call (~10–60s). Snapshots the prior flag value in
  beforeAll, force-enables for the run, restores in afterAll so a
  failed run doesn't leave the platform with the flag dirty. Three
  cases:
    * tab is hidden when flag is off
    * tab appears, "+ New Session" → send turn → reply visible →
      Reset memory modal opens + closes
    * Chat tab still works after Session interaction; switching back
      preserves Session state

Visually verified in the live dev server: tab renders in correct
position, header layout matches Chat's structure, empty state and
placeholder copy match the design doc, "Reset memory" only shown when
an active session exists, full-viewport flex layout pins input to
bottom.

Phase 1 + Phase 2 work behind this change is unchanged: 25 unit tests
+ 5 integration tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(session-tab): hardening + observability — cleanup service, contamination gate, docs

Phase 4 of docs/planning/SESSION_TAB_2026-04.md. Closes the JSONL
disk-growth loop, validates the GA-blocking cross-session contamination
hypothesis empirically, and lands architecture.md / feature-flows
documentation so the surface is discoverable.

Phase 4.3 — cross-session contamination GA gate (the load-bearing one):

- tests/integration/test_session_cross_contamination.py exercises the
  Anthropic #26964 hypothesis end-to-end. Plants a randomly-generated
  secret token in session A with explicit "do not echo" framing, asks
  session B (different UUID, same agent, same cwd) to recall the token.
  Hard-fails if the exact token leaks; soft-fails on partial-prefix
  recall (PURPLE-DRAGON without the random suffix would only be
  knowable from A's JSONL, not from training).
- PASSED in 9.5s on the current Claude Code version → shared-cwd model
  is safe → Phase 5 rollout unblocked. Test stays in the suite as the
  per-version regression guard.

Phase 4.2 — JSONL cleanup service:

- services/session_cleanup_service.py runs a 6h periodic sweep that
  diffs every running agent's
  ~/.claude/projects/-home-developer/<uuid>.jsonl set against
  db.list_active_claude_session_ids(agent) and reaps orphans whose
  mtime is older than the 1h race guard. Race guard prevents the
  cold-turn-vs-cleanup window where a brand-new JSONL exists on disk
  before the backend has updated cached_claude_session_id.
- Same service exposes a synchronous reap_jsonl(agent, uuid) helper
  called best-effort from routers/sessions.py reset/delete handlers so
  the user-perceived disk-reclaim latency is sub-second. Never raises;
  failures are logged and the periodic sweep is the safety net.
- Implementation uses execute_command_in_container — the same primitive
  git_service / ssh_service / scheduler pre-check / agent terminal use.
  No new agent-server endpoint, no base-image rebuild.
- New db.list_active_claude_session_ids(agent) facade method backed by
  SessionOperations.list_active_claude_session_ids querying every
  agent_sessions row whose cached_claude_session_id is non-null for the
  agent.
- main.py wires startup (staggered +7.5s after cleanup_service to
  offset Docker hits) and clean shutdown.
- tests/integration/test_session_cleanup.py: reset reaps synchronously,
  delete reaps synchronously, periodic sweep keeps the active JSONL,
  reaps an aged orphan, respects the 1h race guard for fresh orphans.
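
The sweep's selection rule can be sketched as a pure function. The
name and the dict-of-mtimes input are hypothetical; the rule itself
(not in the active set, and older than the 1h race guard) follows the
description above. Timestamps are assumed to be epoch seconds.

```python
RACE_GUARD_SECONDS = 3600  # 1h race guard


def select_reapable(on_disk, active_ids, now, race_guard_seconds=RACE_GUARD_SECONDS):
    """Return the JSONL UUIDs that are safe to delete.

    on_disk maps uuid -> mtime; active_ids is the set of UUIDs the DB
    still references for this agent.
    """
    return sorted(
        uuid
        for uuid, mtime in on_disk.items()
        if uuid not in active_ids and now - mtime > race_guard_seconds
    )
```

Keeping the rule free of Docker and DB calls makes the race-guard
boundary easy to unit-test separately from the container plumbing.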

Phase 4.4 — architecture.md updates:

- Background Services table gets a Session Cleanup row.
- New "Session Tab" subsection in API Endpoints documenting all six
  /api/agents/{name}/sessions* routes including the per-user ownership
  rule (404 not 403) and the resume-failure fallback / Redis lock.
- New /api/settings/feature-flags row.
- New agent_sessions / agent_session_messages DDL block in Database
  Schema, with the Session-specific fields called out
  (cached_claude_session_id, consecutive_resume_failures,
  cache_read_tokens, and the claude_session_id audit column).

Phase 4.5 — feature-flows/session-tab.md vertical slice:

- Full path from UI → API → DB → Side Effects with the JSONL lifecycle
  table, the spike-pitfall defense map (L1/L2/L3/#20992/#26964), the
  error-handling matrix, and the complete test catalog with the docker
  run command for the integration suite.
- feature-flows.md index updated (Recent Updates row + Chat & Sessions
  section entry).

Test totals: 25 unit + 9 integration = 34 tests, all green. Phase 4.3
serves as both the GA gate and the per-Claude-version regression guard.

Phase 4.1 (cache_read_tokens UI surfacing) deferred — the column is
already populated by the Phase 2 turn endpoint; surfacing is a minor
observability follow-up that doesn't block Phase 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(session-tab): tag Session-tab turns with triggered_by="session" and a gold badge

Previously Session-tab turns went into schedule_executions with
triggered_by="chat", so the Tasks tab couldn't tell them apart from
the Chat tab. The user-visible signal was that every Session turn
showed up under the sky-blue "chat" badge.

Backend (routers/sessions.py): both call sites that invoke
task_execution_service.execute_task — the cold/resume turn and the
resume-failure fallback retry — now pass triggered_by="session".
Existing rows are unchanged; the cutover is per-write.

Frontend (TasksPanel.vue): adds a "Session" option between "Chat" and
"Manual" in the trigger filter dropdown, plus an amber/gold badge
branch (bg-amber-100 dark:bg-amber-900/30 text-amber-700
dark:text-amber-300) — visually distinct from "paid" (bright yellow)
and from the sky-blue "chat" badge.

triggered_by is a free-form TEXT column (no enum constraint at the DB
or service layer), so adding "session" as a new value doesn't require
any migration or downstream consumer updates. Filter, badge, audit
log, activity stream, and dashboards all just see another value and
display it; nothing has to know about it explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(session-tab): correct context-window accounting + raise frontend turn timeout

Five interrelated fixes from manual testing — all about the per-turn
"context %" metric being misleading and the browser timing out before
long-running session turns finished.

1) Agent server (docker/base-image/agent_server/services/claude_code.py)
   process_stream_line's `result` event handler used to overwrite
   metadata.input_tokens, cache_read_tokens, and cache_creation_tokens
   with the values from result.usage. Those values are CUMULATIVE
   across every internal API call the turn made (Claude Code packs
   tool-use loops into a single user turn that maps to N internal API
   calls). For an 18-iteration turn each reading the same 70K cached
   prefix, result.usage.cache_read_input_tokens = 18 * 70K = 1.26M
   tokens — billing-cumulative, not the prompt size of any single call.
   Overwriting per-message values with that aggregate made our
   context-window-pressure metric grow far beyond the 200K limit even
   when no individual API call was anywhere close to the wall.

   Fix: result handler now only extracts model-level facts (cost,
   duration, num_turns, session_id, error info, modelUsage.contextWindow).
   Per-API-call usage stays in the per-assistant-message handler, where
   the LATEST message's values represent the FINAL API call's prompt
   size — exactly what determines whether the next turn will fit.

   Also added a per-message usage-extraction block to the assistant
   branch of process_stream_line (it previously had no usage extraction
   at all, relying entirely on the result handler — which made my
   first attempt at this fix produce zero values). parse_stream_json_output
   already had the equivalent block (lines 211-215).

   Base image rebuilt; agent-testfix recreated onto the new image
   (image sha 0a1e20b40da1).

2) Backend (services/task_execution_service.py)
   Replaced `context_used = metadata.input_tokens` with
   `cache_read + cache_creation` (with input_tokens fallback when
   caching isn't engaged). input_tokens is sometimes the disjoint
   fresh value and sometimes inflated by the agent server's
   modelUsage.inputTokens override on tool-call turns. cache_read
   and cache_creation come straight from Anthropic's usage object
   and (post agent-server fix) are reliable per-call values that
   monotonically reflect the cached conversation prefix.

3) DB (db/sessions.py)
   total_context_used is now a HIGH-WATERMARK (MAX of prior + new),
   not the latest value. Per-turn context naturally oscillates by ~2x
   between text-only and tool-call turns; the watermark gives users
   a stable monotonic upper bound on session pressure that only goes
   up.

   Capped the watermark at total_context_max as a safety belt against
   any future agent-server bug that emits cumulative-billing token
   counts. Genuine per-call peaks should never exceed the model's
   context window — if they do, that's an accounting error not a
   real overflow, and the UI should display 100% rather than 648%.

4) Frontend (stores/sessions.js)
   Bumped the Axios timeout on the session turn endpoint from 305s
   (~5 min) to 7260s (= TIMEOUT-001 cap of 7200s + 60s slack). The
   session turn endpoint is synchronous and may legitimately run for
   the agent's full execution timeout. With the previous 305s ceiling
   the browser threw a misleading "failed" toast on tool-heavy turns
   that ran longer; the response still landed in the DB and the UI
   recovered after a page refresh, but the user saw a phantom error.
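
Items 2 and 3 above reduce to two small formulas, sketched here with
hypothetical function names. Treating "caching isn't engaged" as both
cache counters being zero is an assumption.

```python
def context_used(input_tokens, cache_read_tokens, cache_creation_tokens):
    """Per-call context size: cached prefix if caching engaged, else input_tokens."""
    cached = (cache_read_tokens or 0) + (cache_creation_tokens or 0)
    return cached if cached > 0 else (input_tokens or 0)


def update_watermark(prior, new, context_max):
    """High-watermark of context pressure, capped at the model's window."""
    return min(max(prior, new), context_max)


# The cumulative-billing figure from item 1: 18 iterations over a 70K prefix.
cumulative_cache_read = 18 * 70_000  # 1,260,000 tokens, not a per-call size
```

With the cap in place, feeding the 1.26M cumulative figure into a 200K
window yields 200K (displayed as 100%) instead of an impossible
percentage.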

Verified end-to-end with a 6-turn mixed sequence (text + tool-call):
per-call cache_read now reports ~11636 on text-only turns and ~18000
on tool…