
Feat/UI llm health widget 322 #549

Merged
AbirAbbas merged 7 commits into Agent-Field:main from ivasuy:feat/ui-llm-health-widget-322
May 7, 2026

Conversation

@ivasuy ivasuy commented May 7, 2026

Summary

Implements the dashboard LLM health observability widget requested in #322 by surfacing backend status from GET /api/ui/v1/llm/health in a dedicated card on /ui. The UI now shows overall backend state (healthy/degraded/down/disabled), per-endpoint circuit state (closed/open/half_open), consecutive failure count, last error, last success/check timestamps, and a destructive visual alert when any circuit is open (the key troubleshooting gap called out in #316).
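The mapping from backend status to what the card shows can be sketched as follows. This is a minimal illustration, not the widget's actual code: the response field names (`circuitState`, `consecutiveFailures`, and so on), the `HealthBanner` type, the helper functions, and the sample data are all assumptions. Only the status values, circuit states, and the two user-facing strings come from this PR description.

```typescript
// Hypothetical response shape: these field names are assumptions,
// not the confirmed payload of GET /api/ui/v1/llm/health.
type BackendStatus = "healthy" | "degraded" | "down" | "disabled";
type CircuitState = "closed" | "open" | "half_open";

interface EndpointHealth {
  name: string;
  circuitState: CircuitState;
  consecutiveFailures: number;
  lastError?: string;
  lastSuccessAt?: string;
  lastCheckedAt?: string;
}

interface LLMHealthResponse {
  status: BackendStatus;
  endpoints: EndpointHealth[];
}

// Hypothetical view model for the card's top-level banner.
interface HealthBanner {
  variant: "default" | "destructive";
  title: string;
  message?: string;
}

// True when at least one endpoint's circuit breaker is open.
function hasOpenCircuit(health: LLMHealthResponse): boolean {
  return health.endpoints.some((e) => e.circuitState === "open");
}

// Disabled wins first; any open circuit then triggers the destructive alert.
function toBanner(health: LLMHealthResponse): HealthBanner {
  if (health.status === "disabled") {
    return {
      variant: "default",
      title: "Disabled",
      message: "LLM health monitoring is disabled for this deployment.",
    };
  }
  if (hasOpenCircuit(health)) {
    return { variant: "destructive", title: "Circuit breaker open" };
  }
  return { variant: "default", title: health.status };
}

// Illustrative data only.
const sample: LLMHealthResponse = {
  status: "degraded",
  endpoints: [
    { name: "primary", circuitState: "open", consecutiveFailures: 5, lastError: "connection refused" },
    { name: "fallback", circuitState: "closed", consecutiveFailures: 0 },
  ],
};

console.log(toBanner(sample).title); // prints "Circuit breaker open"
```

Keeping the decision in a pure function like `toBanner` would make the disabled and open-circuit states straightforward to unit test without rendering the card.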

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Docs only
  • Tests only
  • CI / tooling
  • Breaking change

Test plan

  • npm --prefix control-plane/web/client test -- src/test/components/dashboard/LLMHealthWidget.test.tsx src/test/pages/NewDashboardPage.test.tsx src/pages/NewDashboardPage.test.tsx
  • npm --prefix control-plane/web/client run build
  • Manual verification in embedded UI:
    • /ui dashboard shows LLM backend health card
    • Disabled config shows Disabled + “LLM health monitoring is disabled for this deployment.”
    • Open circuit state shows Circuit breaker open alert, endpoint row with Open, failure count, and last error message

Test coverage

  • I ran tests for the surface(s) I changed locally.
  • New code paths are covered by tests in this PR (no bare additions).
  • If I removed code, I updated coverage-baseline.json in this PR only if the removal caused a legitimate regression and I called it out in the summary above.
  • The coverage gate check is green in CI before requesting review.

Checklist

Related issues / PRs

UI screenshots

Dashboard LLM health widget (open circuit state)

Screenshot 2026-05-07 at 17 35 38

@ivasuy ivasuy requested review from a team and AbirAbbas as code owners May 7, 2026 12:52

github-actions Bot commented May 7, 2026

Performance

| SDK | Memory | Δ | Latency | Δ | Tests | Status |
|---|---|---|---|---|---|---|
| Python | 9.4 KB | +4% | 0.44 µs | +26% | | |

✓ No regressions detected


github-actions Bot commented May 7, 2026

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

| Surface | Current | Baseline | Δ |
|---|---|---|---|
| control-plane | 87.40% | 87.30% | ↑ +0.10 pp 🟡 |
| sdk-go | 91.90% | 90.70% | ↑ +1.20 pp 🟢 |
| sdk-python | 93.66% | 93.63% | ↑ +0.03 pp 🟢 |
| sdk-typescript | 92.68% | 92.56% | ↑ +0.12 pp 🟢 |
| web-ui | 89.91% | 90.01% | ↓ -0.10 pp 🟡 |
| aggregate | 88.99% | 89.01% | ↓ -0.02 pp 🟡 |

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.
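To see why these numbers pass, the four thresholds can be applied to the table by hand. The sketch below does so; it is not the repository's gate script, and the type names and `evaluateGate` function are hypothetical. The coverage figures and thresholds are copied from this comment.

```typescript
interface Coverage {
  name: string;
  current: number;  // percent
  baseline: number; // percent
}

interface GateThresholds {
  minSurface: number;             // per-surface floor, percent
  minAggregate: number;           // aggregate floor, percent
  maxSurfaceRegression: number;   // percentage points
  maxAggregateRegression: number; // percentage points
}

// Returns a list of human-readable failures; an empty list means the gate passes.
function evaluateGate(surfaces: Coverage[], aggregate: Coverage, t: GateThresholds): string[] {
  const failures: string[] = [];
  for (const s of surfaces) {
    if (s.current < t.minSurface) {
      failures.push(`${s.name}: ${s.current}% is below the ${t.minSurface}% floor`);
    }
    if (s.baseline - s.current > t.maxSurfaceRegression) {
      failures.push(`${s.name}: regressed more than ${t.maxSurfaceRegression} pp`);
    }
  }
  if (aggregate.current < t.minAggregate) {
    failures.push(`aggregate: ${aggregate.current}% is below the ${t.minAggregate}% floor`);
  }
  if (aggregate.baseline - aggregate.current > t.maxAggregateRegression) {
    failures.push(`aggregate: regressed more than ${t.maxAggregateRegression} pp`);
  }
  return failures;
}

const thresholds: GateThresholds = {
  minSurface: 86,
  minAggregate: 88,
  maxSurfaceRegression: 1.0,
  maxAggregateRegression: 0.5,
};

const surfaces: Coverage[] = [
  { name: "control-plane", current: 87.4, baseline: 87.3 },
  { name: "sdk-go", current: 91.9, baseline: 90.7 },
  { name: "sdk-python", current: 93.66, baseline: 93.63 },
  { name: "sdk-typescript", current: 92.68, baseline: 92.56 },
  { name: "web-ui", current: 89.91, baseline: 90.01 },
];
const aggregate: Coverage = { name: "aggregate", current: 88.99, baseline: 89.01 },
  gateFailures = evaluateGate(surfaces, aggregate, thresholds);

console.log(gateFailures.length === 0 ? "Gate passed" : gateFailures.join("\n"));
```

The two drops (web-ui by 0.10 pp, aggregate by 0.02 pp) sit well inside the allowed regressions, and every value stays above its floor, so the failure list is empty.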


github-actions Bot commented May 7, 2026

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

| Surface | Touched lines | Patch coverage | Status |
|---|---|---|---|
| control-plane | 9 | 100.00% | |
| sdk-go | 0 | ➖ no changes | |
| sdk-python | 0 | ➖ no changes | |
| sdk-typescript | 0 | ➖ no changes | |
| web-ui | 279 | 87.00% | |

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

@ivasuy ivasuy force-pushed the feat/ui-llm-health-widget-322 branch from 6390fc1 to 2a9d56e Compare May 7, 2026 13:02
@ivasuy ivasuy force-pushed the feat/ui-llm-health-widget-322 branch from 2a9d56e to 8f200aa Compare May 7, 2026 13:31
ivasuy and others added 5 commits May 7, 2026 19:08
The test pinned the *boot* status of a freshly-minted replay row
(=replayed), but ReplayEvent kicks off the dispatcher in a goroutine
that immediately marks the row "failed" because the test fixture has
no target agent node registered. Under CI load that goroutine wins
the race against the test's GetInboundEvent and the assertion flips
to "failed", breaking the control-plane coverage job and (cascading)
the coverage-summary check.

Pass nil for the dispatcher in this test — many sibling tests in
triggers_api_contract_test.go already do this — and gate the three
goroutine launch sites in the trigger handler on a nil-check so the
contract is honored consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbirAbbas previously approved these changes May 7, 2026
@AbirAbbas AbirAbbas enabled auto-merge May 7, 2026 15:59
The "filters active triggers..." test queries for the EventRow toggle
button with getByRole synchronously, but the events list is fetched
async by TriggerSheet on open (useEffect → refreshEvents). Under CI
load the fetch hasn't resolved when the assertion runs, so no EventRow
has mounted and the query throws. Locally the fetch is fast enough
that the race never surfaces.

Switch to findByRole so the query waits for the events list to render.
The earlier getAllByText still passes because it matches the trigger's
event_types summary which renders synchronously from the trigger row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
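The difference between the two query styles described in this commit message can be illustrated without any test library. The sketch below is a simplified stand-in, not Testing Library's implementation: `getBy`, `findBy`, the `rows` array, and `simulateFetch` are all hypothetical names chosen for the illustration.

```typescript
// A query returns the matched element, or undefined when nothing matches yet.
type Query<T> = () => T | undefined;

// getBy-style: evaluate once and throw immediately if nothing matches.
function getBy<T>(query: Query<T>, label: string): T {
  const hit = query();
  if (hit === undefined) throw new Error(`Unable to find ${label}`);
  return hit;
}

// findBy-style: keep retrying until the query matches or the timeout elapses.
async function findBy<T>(
  query: Query<T>,
  label: string,
  timeoutMs: number = 1000,
  intervalMs: number = 10,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const hit = query();
    if (hit !== undefined) return hit;
    if (Date.now() >= deadline) throw new Error(`Timed out waiting for ${label}`);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical "screen": the row list stays empty until the async fetch lands.
const rows: string[] = [];
const toggleQuery: Query<string> = () =>
  rows.indexOf("toggle") >= 0 ? "toggle" : undefined;

// Stand-in for the events fetch resolving and the row mounting.
function simulateFetch(): void {
  rows.push("toggle");
}
```

Calling `getBy(toggleQuery, "toggle")` before `simulateFetch()` throws, which mirrors the CI failure; `findBy` would instead poll until the row appears. In the real test the change amounts to awaiting `findByRole` where `getByRole` was previously called synchronously.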
@AbirAbbas AbirAbbas added this pull request to the merge queue May 7, 2026
Merged via the queue into Agent-Field:main with commit 27614d8 May 7, 2026
18 checks passed
Development

Successfully merging this pull request may close these issues.

UI: LLM health status widget in dashboard

2 participants