feat(web): operations-first control plane UI revamp with live updates and node logs #330

Open
santoshkumarradha wants to merge 107 commits into main from feat/ui-revamp-product-research

santoshkumarradha commented Apr 4, 2026

Summary

Refreshes the embedded control plane web UI into an operations-first control surface and now also folds in the full live-update stack plus agent node process logs.

Final integrated branch tip: 5236e69

Included scope

This PR now combines three previously separate lines of work into one final integration PR:

  1. Operations-first control plane UI revamp
  • shell and navigation refresh
  • dashboard, runs, agents, reasoners, settings, access, and provenance UX cleanup
  • tighter table/badge/status patterns
  2. Unified live updates and adaptive polling
  • shared SSESyncProvider at the app shell
  • execution, node, and reasoner event-driven query invalidation
  • adaptive fallback polling when the relevant SSE stream is unavailable
  • health strip and page-level live status aligned with the actual backing stream
  3. Agent node process logs and execution observability
  • control-plane proxy for GET /api/ui/v1/nodes/:id/logs
  • settings for node log proxy tail and timeout limits
  • NodeProcessLogsPanel on Agents and Node Detail pages
  • structured execution logs on the execution detail page
  • advanced raw node log debugging integrated into the execution experience
  • NDJSON process log capture in Go, Python, and TypeScript SDKs
  • execution-context stamping and structured execution log ingestion across SDK/runtime/control-plane layers
  • docs and functional coverage for the observability flow in the shared tests/functional harness
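
As a rough illustration of the adaptive fallback polling described in item 2, the sketch below turns polling off while the relevant SSE stream is open and falls back to a fixed interval otherwise. The names `StreamState`, `PollingPolicy`, and `pollingIntervalFor` are hypothetical and are not taken from this branch; the `number | false` return shape is chosen because TanStack Query's `refetchInterval` option accepts it.

```typescript
// Hypothetical names; the real SSESyncProvider wiring is not shown in this PR text.
type StreamState = "open" | "connecting" | "closed";

interface PollingPolicy {
  /** Interval to use while the SSE stream is unavailable, in milliseconds. */
  fallbackMs: number;
}

/**
 * Returns a polling interval for a query, or false to disable polling.
 * While the stream is open, server events drive invalidation, so polling is off.
 */
function pollingIntervalFor(
  stream: StreamState,
  policy: PollingPolicy,
): number | false {
  return stream === "open" ? false : policy.fallbackMs;
}
```

A query hook could pass the result straight through as its refetch interval, so a dropped stream degrades to polling without page-local wiring.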

Product direction

  • Operations before analytics: health, queues, runs, and recovery actions come first.
  • Layered depth: shell and health strip stay persistent, while runs, DAG, and step detail handle diagnosis.
  • Recovery is normal: retry, cancel, cleanup, and next-step affordances stay close to degraded states.
  • Control-plane mental model: nodes and executions are the moving parts; workflows are the jobs.

Review and integration notes

  • Earlier stacked PRs for live updates and node logs were folded into this branch and superseded.
  • The SSE provider-based design won during final integration; older page-local invalidation wiring was removed.
  • The node logs proxy follows the same control-plane-to-agent trust model already used on execute paths.
  • The draft PR "docs: execution observability RFC" (#342) was merged into this branch as the execution-observability implementation line and should be treated as already integrated here.

Issue linkage

Fixes #324.

Validation status

GitHub CI is the source of truth for this final integrated branch and is running on PR #330 after commit 5236e69.

Local verification on the integrated branch included:

  • npm exec -- tsc --noEmit
  • go test ./internal/handlers/... ./internal/storage/... ./internal/server/... -count=1
  • shared docker functional run for tests/test_go_sdk_cli.py -k observability

Previous review fixes still included

  • audit verification hardening for HTTP verification path
  • DID did:web encoded-port handling fix
  • SSE/CORS hardening already present on this branch
  • TypeScript multimodal helper fixes and tests already present on this branch

santoshkumarradha and others added 30 commits April 1, 2026 19:38
Replace all @phosphor-icons/react imports with lucide-react equivalents.
Rewrote icon-bridge.tsx to re-export Lucide icons under the same names
used throughout the codebase, so no consumer files needed changing.
Updated icon.tsx to use Lucide directly. Removed weight= props from
badge.tsx, segmented-status-filter.tsx, and ReasonerCard.tsx since
Lucide does not support the Phosphor weight API.
…tures being redesigned

- Delete src/components/mcp/ (MCPServerList, MCPServerCard, MCPHealthIndicator, MCPServerControls, MCPToolExplorer, MCPToolTester)
- Delete src/components/authorization/ (AccessRulesTab, AgentTagsTab, ApproveWithContextDialog, PolicyContextPanel, PolicyFormDialog, RevokeDialog)
- Delete src/components/packages/ (AgentPackageCard, AgentPackageList)
- Delete src/components/did/ (DIDIdentityCard, DIDDisplay, DIDStatusBadge, DIDInfoModal, DIDIndicator)
- Delete src/components/vc/ (VCVerificationCard, WorkflowVCChain, SimpleVCTag, SimpleWorkflowVC, VCDetailsModal, VCStatusIndicator, VerifiableCredentialBadge)
- Delete MCP hooks: useMCPHealth, useMCPMetrics, useMCPServers, useMCPTools
- Delete pages: AuthorizationPage, CredentialsPage, DIDExplorerPage, PackagesPage, WorkflowDeckGLTestPage
- Remove MCP Servers, Tools, Performance tabs from NodeDetailPage
- Remove Identity & Trust and Authorization sections from navigation config
- Remove deprecated routes from App.tsx router
- Fix broken imports in WorkflowDetailPage, ReasonerDetailPage
- Trim src/mcp/index.ts barrel to API services + types only (no component re-exports)

API services (vcApi, mcpApi), types, and non-MCP hooks are preserved.
TypeScript check passes with zero errors after cleanup.
… foundation system

- Rewrote src/index.css with clean standard shadcn/ui theme (HSL tokens for light/dark mode)
- Deleted src/styles/foundation.css (custom token system)
- Rewrote tailwind.config.js to minimal shadcn-standard config (removed custom spacing, fontSize, lineHeight, transitionDuration overrides)
- Replaced ~130 component files: bg-bg-*, text-text-*, border-border-*, text-nav-*, bg-nav-*, text-heading-*, text-body*, text-caption, text-label, text-display, interactive-hover, card-elevated, focus-ring, glass, gradient-* with standard shadcn equivalents
- Migrated status sub-tokens (status-success-bg, status-success-light, status-success-border etc.) to opacity modifiers on base status tokens
- Updated lib/theme.ts STATUS_TONES to use standard token classes
- Fixed workflow-table.css status dot and node status colors to use hsl(var(--status-*))
- Zero TypeScript errors after migration
…ariants

- Delete 4 JSON viewer duplicates (JsonViewer, EnhancedJsonViewer x2, AdvancedJsonViewer); all callers already use UnifiedJsonViewer
- Delete 3 execution header duplicates (ExecutionHero, ExecutionHeader, EnhancedExecutionHeader); update RedesignedExecutionDetailPage to use CompactExecutionHeader
- Delete 3 status indicator duplicates (ui/StatusIndicator, ui/status-indicator, reasoners/StatusIndicator); consolidate legacy StatusIndicator into UnifiedStatusIndicator module and create ReasonerStatusDot for reasoner-specific dot display
- Delete RedesignedInputDataPanel and RedesignedOutputDataPanel standalone files; InputDataPanel/OutputDataPanel already export backward-compat aliases
- Delete legacy Navigation/Sidebar, NavigationItem, NavigationSection (unused; SidebarNew is active)
- Delete enterprise-card.tsx (no callers; card.tsx already exports cardVariants)
- Delete animated-tabs.tsx; add AnimatedTabs* re-exports to tabs.tsx and update 5 callers
…dark mode default

- Rewrote navigation config with 5 items: Dashboard, Runs, Agents, Playground, Settings
- Built AppSidebar using shadcn Sidebar with icon-rail collapsed by default (collapsible="icon")
- Built HealthStrip sticky bar showing LLM, Agent fleet, and Queue status placeholders
- Built AppLayout using SidebarProvider/SidebarInset/Outlet pattern with breadcrumb header
- Updated App.tsx to use AppLayout as layout route wrapper, removing old SidebarNew/TopNavigation
- Added placeholder routes for /runs, /playground and their detail pages
- Set defaultTheme="dark" for dark-first UI
- All existing pages (Dashboard, Executions, Workflows, Nodes, Reasoners) preserved under new layout
…ents, health

- Install @tanstack/react-query v5
- Create src/lib/query-client.ts with 30s stale time, 5min GC, retry=1
- Wrap App with QueryClientProvider
- Add src/hooks/queries/ with useRuns, useRunDAG, useStepDetail, useAgents, useLLMHealth, useQueueStatus, useCancelExecution, usePauseExecution, useResumeExecution
- Barrel export via src/hooks/queries/index.ts
- Hooks delegate to existing service functions (workflowsApi, executionsApi, api)
- Polling: agents 10s, system health 5s, active run DAGs 3s
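
For reference, the defaults and polling cadences listed in this commit can be written down as plain data. This is a sketch of the values only, not the contents of src/lib/query-client.ts; `queryDefaults`, `pollingIntervals`, and `runDagInterval` are illustrative names. `staleTime`, `gcTime`, and `retry` are the TanStack Query v5 option names for these settings.

```typescript
const SECOND = 1_000;
const MINUTE = 60 * SECOND;

// Values from the commit message: 30s stale time, 5min GC, retry=1.
const queryDefaults = {
  staleTime: 30 * SECOND,
  gcTime: 5 * MINUTE,
  retry: 1,
} as const;

// Polling cadences from the commit message.
const pollingIntervals = {
  agents: 10 * SECOND,
  systemHealth: 5 * SECOND,
  activeRunDag: 3 * SECOND,
} as const;

/**
 * Only active runs are polled; finished runs return false (no polling).
 * How "active" is decided is left to the caller and is an assumption here.
 */
function runDagInterval(isActive: boolean): number | false {
  return isActive ? pollingIntervals.activeRunDag : false;
}
```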
…lows

Add RunsPage component at /runs with:
- Filter bar: time range, status, and debounced search
- Table with columns: Run ID, Root Reasoner, Steps, Status, Duration, Started
- Checkbox row selection with bulk action bar (Compare / Cancel Running)
- Paginated data via useRuns hook with Load more support
- Status badge using existing badge variants (destructive/default/secondary)
- Duration formatting (Xs, Xm Ys, —)
- Row click navigates to /runs/:runId

Wire RunsPage into App.tsx replacing the placeholder at /runs.
…iew results

Adds a new /playground and /playground/:reasonerId route with:
- Reasoner selector grouped by agent node
- Split-pane JSON input textarea and result display
- Execute button with loading state (Loader2 spinner)
- View as Execution link on successful run
- Recent Runs table (last 5) with Load Input shortcut
- Route-sync: selecting a reasoner updates the URL path
…t runs

Replaces /dashboard with NewDashboardPage — a focused, operations-first
view that answers "Is anything broken? What's happening now?" rather than
displaying metrics charts. The legacy enhanced dashboard is preserved at
/dashboard/legacy.

Key sections:
- Issues banner (conditional): surfaces unhealthy LLM endpoints and
  queue-saturated agents via useLLMHealth / useQueueStatus polling
- Recent Runs table: last 10 runs with reasoner, step count, status
  badge, duration, and relative start time; click navigates to detail
- System Overview: 4 stat cards (Total Runs Today, Success Rate,
  Agents Online, Avg Run Time) backed by dashboardService + TanStack
  Query with auto-refresh
…ner list

Replaces the /agents placeholder with a fully functional page showing
each registered agent node as a collapsible Card. Each card displays
status badge with live dot, last heartbeat, reasoner/skill count,
health score, and an inline reasoner list fetched lazily from
GET /nodes/:id/details. Supports Restart and Config actions. Auto-
refreshes every 10 s via useAgents polling.
…tity, About

Adds NewSettingsPage with four tabs:
- General: placeholder for future system config
- Observability: full webhook config (migrated from ObservabilityWebhookSettingsPage) with live forwarder status and DLQ management
- Identity: DID system status, server DID display, export credentials
- About: version, server URL, storage mode

Updates App.tsx to route /settings to NewSettingsPage and redirect /settings/observability-webhook to /settings.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds RunDetailPage (/runs/:runId) as the primary execution inspection
screen, replacing the placeholder. Features a split-panel layout with
a proportional-bar execution trace tree on the left and collapsible
Input/Output/Notes step detail on the right. Single-step runs skip the
trace and show step detail directly. Includes smart polling for active
runs and a Trace/Graph toggle (graph view placeholder).

New files:
- src/pages/RunDetailPage.tsx — main page, wires useRunDAG + state
- src/components/RunTrace.tsx — recursive trace tree with duration bars
- src/components/StepDetail.tsx — step I/O panel with collapsible sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Replace 4 stat cards with a horizontal stats strip above the table
- Fix duration formatter to handle hours and days (e.g. "31d 6h", "5h 23m")
- Compact table rows: TableHead h-8 px-3 text-[11px], TableCell px-3 py-1.5
- Table text reduced to text-xs for all data columns
- Remove double padding — page container is now plain flex col gap-4
- Remove Separator between CardHeader and table
- Tighten CardHeader to py-3 px-4 with text-sm font-medium title
- Limit recent runs to 15 (up from 10)
- Fix "View All" link to navigate to /runs instead of /workflows
- Remove unused StatCard component and Clock/XCircle imports
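
A minimal sketch of a duration formatter that produces the outputs quoted above ("31d 6h", "5h 23m"), assuming the input is a duration in milliseconds and that only the two largest units are shown. This is not the code from the branch; the signature and the handling of missing values are assumptions.

```typescript
/**
 * Formats a millisecond duration using at most two units.
 * Missing or invalid input yields the em dash placeholder used in the tables.
 */
function formatDuration(ms: number | null | undefined): string {
  if (ms == null || !Number.isFinite(ms) || ms < 0) return "\u2014";
  if (ms < 1000) return `${Math.floor(ms)}ms`;

  const totalSeconds = Math.floor(ms / 1000);
  const days = Math.floor(totalSeconds / 86_400);
  const hours = Math.floor((totalSeconds % 86_400) / 3_600);
  const minutes = Math.floor((totalSeconds % 3_600) / 60);
  const seconds = totalSeconds % 60;

  if (days > 0) return `${days}d ${hours}h`;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```

Dropping the smaller units once hours or days appear keeps the cell narrow enough for the compact table rows.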
…clickable headers

- Cards start collapsed by default (was open): prevents 300+ item flood with 15 agents × 20+ reasoners
- Entire card header row is the expand/collapse trigger (was isolated chevron button on far right)
- Reasoner rows reduced to py-1 ~24px (was ~40px with tree characters)
- Removed tree characters (├──), replaced with clean font-mono list
- Play button always visible (was hidden on hover) with icon + label
- Truncate reasoner list at 5, "Show N more" link to expand
- Removed Config button and Restart text label — icon-only restart button
- Removed redundant "15 TOTAL" badge from page header
- Replaced space-y-* with flex flex-col gap-2 for card list
- Removed Card/CardHeader/CardContent/Collapsible/Separator — plain divs for density
- TableHead height reduced from h-10 to h-8, padding px-4 → px-3, text-[11px]
- TableCell padding reduced from p-4 to px-3 py-1.5 across all row cells
- Table base text changed from text-sm to text-xs for dense data display
- Run ID and Started cells use text-[11px], Reasoner cell uses text-xs font-medium
- Steps and Duration cells use tabular-nums for numeric alignment
- formatDuration now handles ms, seconds, minutes, hours, and days correctly
- space-y-4 → space-y-3 and mb-4 → mb-3 for tighter page layout
…imestamps

- Rewrite AgentsPage from bordered cards to a borderless divide-y list inside a single Card
- Fix formatRelativeTime to guard against bogus/epoch timestamps (was showing '739709d ago')
- Expanded reasoner rows now render inline (bg-muted/30, pl-8, text-[11px]) instead of in a nested Card
- Remove page <h1> heading from AgentsPage — breadcrumb in AppLayout already identifies the page
- Add delayDuration={300} to HealthStrip TooltipProvider so tooltips don't appear immediately
- navigation.ts already correct (5 items, correct icons) — no change needed
- Dashboard already reads runsQuery.data?.workflows and navigates to /runs — no change needed
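
One way to guard a relative-time formatter against bogus or epoch timestamps, sketched under assumptions: timestamps arrive as ISO 8601 strings, anything before the year 2000 is treated as "not set", and the placeholder text is "unknown". The cutoff, the placeholder, and the signature are illustrative and are not taken from the actual formatRelativeTime.

```typescript
// Assumption: no real heartbeat predates this cutoff.
const MIN_PLAUSIBLE_MS = Date.UTC(2000, 0, 1);
const UNKNOWN_TIME = "unknown";

function formatRelativeTime(
  iso: string | null | undefined,
  now: number = Date.now(),
): string {
  if (!iso) return UNKNOWN_TIME;
  const t = Date.parse(iso);
  // Unparseable, zero-value, and epoch timestamps all land here.
  if (Number.isNaN(t) || t < MIN_PLAUSIBLE_MS) return UNKNOWN_TIME;

  // Clamp small clock skew so future timestamps do not render as negative.
  const diffSec = Math.max(0, Math.floor((now - t) / 1000));
  if (diffSec < 60) return `${diffSec}s ago`;
  if (diffSec < 3_600) return `${Math.floor(diffSec / 60)}m ago`;
  if (diffSec < 86_400) return `${Math.floor(diffSec / 3_600)}h ago`;
  return `${Math.floor(diffSec / 86_400)}d ago`;
}
```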
…ips, theme toggle

- Use useSidebar() state to conditionally render logo text vs icon-only in collapsed mode,
  eliminating text overflow/clipping behind the icon rail
- Add SidebarRail for drag-to-resize handle on desktop
- Add SidebarSeparator between header and nav content for visual separation
- Implement ModeToggle in SidebarFooter (sun/moon theme toggle, centered when collapsed)
- Replace bg-primary/text-primary-foreground with bg-sidebar-primary/text-sidebar-primary-foreground
  in logo icon container to use correct semantic sidebar tokens
- Use text-sidebar-foreground and text-sidebar-foreground/60 for logo text
- Add tooltip="AgentField" to logo SidebarMenuButton so collapsed state shows tooltip on hover
- Header bar: use border-sidebar-border and bg-sidebar/30 backdrop-blur instead of border-border
…result linking

- Add cURL dropdown with sync and async variants; clipboard copy with "Copied!" feedback
- Add collapsible schema section showing input_schema and output_schema when a reasoner is selected
- Show status badge and duration in Result card header after execution
- Replace "View as Execution" with "View Run →" linking to /runs/:runId
- Add "Replay" button to re-run with same input
…observability check

- AppLayout: change SidebarProvider defaultOpen from false to true so
  sidebar shows labels on first load (users can collapse via Cmd+B)
- Settings/General: replace empty placeholder with useful content —
  API endpoint display with copy button and quick-start env var snippet
- Settings/Identity: fix Server DID display — was incorrectly showing
  res.message (a status string) as the DID; now fetches the actual DID
  from /api/v1/did/agentfield-server and displays it with a copy button;
  shows "DID system not configured" when unavailable (local mode)
- Settings: default tab remains "general" which is now useful content
- Settings/Observability: tab already has full webhook config, status,
  DLQ management — no changes needed
…uttons

- Dashboard: routes already correct (/runs/:runId and /runs)
- Playground: View Run link already uses /runs/:runId
- HealthStrip: connected to real data (useLLMHealth, useQueueStatus, useAgents)
- RunsPage: added agent filter Select, functional Compare Selected and Cancel Running buttons
- RunDetailPage: removed broken Trace/Graph toggle (Tabs/ViewMode were declared but unused), added Cancel Run button (useCancelExecution) for running runs and Replay button for failed/timeout runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…opy, VC export

- StepDetail: replace plain <pre> blocks with JsonHighlight component
  (regex-based coloring for keys, strings, numbers, booleans, null)
- StepDetail: add copy-action row (Copy cURL, Copy Input, Copy Output)
  with transient check-icon feedback after clipboard write
- RunDetailPage: add Export VC button in header that opens the
  /api/v1/did/workflow/:id/vc-chain endpoint in a new tab
- RunTrace: extend formatDuration to handle hours (Xh Ym) and days (Xd Yh)
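
The regex-based coloring can be pictured as a single pass that classifies each JSON token. The sketch below is not the JsonHighlight component: it returns classified tokens rather than markup, and `tokenizeJson`, `TokenKind`, and the exact pattern are assumptions for illustration.

```typescript
type TokenKind = "key" | "string" | "number" | "boolean" | "null";

interface Token {
  text: string;
  kind: TokenKind;
}

// Alternatives, in order: a quoted string (optionally followed by a colon,
// which marks it as a key), the literals true/false/null, or a number.
const TOKEN_RE =
  /("(?:\\u[a-fA-F0-9]{4}|\\[^u]|[^\\"])*"(\s*:)?|\b(?:true|false|null)\b|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)/g;

function tokenizeJson(source: string): Token[] {
  const tokens: Token[] = [];
  TOKEN_RE.lastIndex = 0; // the regex is global, so reset state between calls
  let m: RegExpExecArray | null;
  while ((m = TOKEN_RE.exec(source)) !== null) {
    const text = m[0];
    let kind: TokenKind;
    if (text.charAt(0) === '"') {
      // Group 2 captures the trailing colon; key tokens include it in `text`.
      kind = m[2] !== undefined ? "key" : "string";
    } else if (text === "true" || text === "false") {
      kind = "boolean";
    } else if (text === "null") {
      kind = "null";
    } else {
      kind = "number";
    }
    tokens.push({ text, kind });
  }
  return tokens;
}
```

A renderer can then map each `kind` to a class name and emit the text between matches unchanged, which avoids building HTML strings from untrusted payloads.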
…order, status dots

- Add click-to-sort on Status, Steps, Duration, and Started headers with
  asc/desc arrow indicators; sort state flows through useRuns to the API
- Reorder columns: Status | Reasoner | Agent | Steps | Duration | Started | Run ID
  (status first for scannability, run ID de-emphasised at the far right)
- Add Agent column showing agent_id / agent_name per row
- Replace Badge with a compact StatusDot (coloured dot + short label) for
  denser status display in table rows
- Update search placeholder to "Search runs, reasoners, agents…" to reflect
  multi-field search capability
- Import cn from @/lib/utils for conditional class merging

Wire up the existing WorkflowDAGViewer component into the Run Detail
page as a proper Graph tab alongside the Trace view. Multi-step runs
show a Trace/Graph toggle in the header; single-step runs skip the
toggle entirely and show step detail directly. Clicking a node in the
graph panel selects the step and populates the right-hand detail panel.
…mpty state

- Add copy button next to each Run ID (copies full ID to clipboard)
- Combine Agent + Reasoner columns into a single "Target" column showing
  agent.reasoner in monospace (agent part muted, reasoner part primary)
- Remove separate Agent column; new order: Status | Target | Steps | Duration | Started | Run ID
- Add HoverCard on reasoner cell that lazily fetches and displays root
  execution input/output preview (only when root_execution_id is present)
- Replace plain "No runs found" cell with a centered empty state using
  Play icon and context-aware helper text
- TypeScript: 0 errors
…n, active sidebar

- RunDetailPage: flex column layout with h-[calc(100vh-8rem)] so trace/step
  panels fill the viewport instead of using fixed 500px heights
- Reorganized header: status badge and DID badge inline with title, subtitle
  shows workflow name + step count + duration
- Added Replay button (navigates to playground with agent/reasoner target)
- Added Copy ID button for quick clipboard access to the run ID
- Replaced single Export VC button with an Export dropdown containing
  "Export VC Chain" and "Export Audit Log" (downloads JSON)
- AppSidebar: active nav item now renders a left-edge accent bar
  (before:w-0.5 bg-sidebar-primary) for clear visual distinction in both
  light and dark mode, supplementing the existing bg-sidebar-accent fill
…roup separators

- Add sequential step numbers (1-based) on every trace row for disambiguation
- Show relative start times per step ("+0:00", "+1:23") anchored to run start
- Color-code duration bars: green=succeeded, red=failed, amber=timeout, blue/pulse=running
- Replace large status icons with compact inline status dots (size-1.5)
- Add group count badge (×N) on first node of consecutive same-reasoner runs
- Add subtle border separator when reasoner_id changes between siblings
- Reduce row height to py-1 (28px) for better visual density
- Pass runStartedAt prop from RunDetailPage down to RunTrace
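
Two of the trace-row computations above can be sketched as pure helpers: the offset label anchored to run start, and the ×N count placed on the first row of a consecutive same-reasoner group. The function names, the ISO-string inputs, and the choice to show no badge for groups of one are assumptions, not the RunTrace implementation.

```typescript
/** Formats a step's start as an offset from run start, e.g. "+1:23". */
function formatStepOffset(stepStartedAt: string, runStartedAt: string): string {
  const deltaMs = Date.parse(stepStartedAt) - Date.parse(runStartedAt);
  if (Number.isNaN(deltaMs)) return "";
  const totalSeconds = Math.max(0, Math.floor(deltaMs / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `+${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
}

/**
 * For each sibling, returns the size of its consecutive same-reasoner group
 * if it is the first row of a group larger than one, otherwise null.
 */
function groupCounts(reasonerIds: string[]): (number | null)[] {
  const counts: (number | null)[] = reasonerIds.map(() => null);
  let start = 0;
  for (let i = 1; i <= reasonerIds.length; i++) {
    if (i === reasonerIds.length || reasonerIds[i] !== reasonerIds[start]) {
      const size = i - start;
      if (size > 1) counts[start] = size;
      start = i;
    }
  }
  return counts;
}
```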
Adds a CommandPalette component using shadcn Command + Dialog, registered
globally via AppLayout. Cmd+K / Ctrl+K toggles the palette; items navigate
to Dashboard, Runs, Agents, Playground, Settings, and filtered run views.
A ⌘K hint badge is shown in the header bar on medium+ screens.
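
The shortcut check itself is small enough to isolate as a pure predicate, which keeps it testable without a DOM. `KeyLike` and `isPaletteShortcut` are hypothetical names; the fields mirror the `key`, `metaKey`, and `ctrlKey` properties of a browser `KeyboardEvent`.

```typescript
// Structural subset of KeyboardEvent, so the predicate works without a DOM.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
}

/** True for Cmd+K (macOS) or Ctrl+K (other platforms). */
function isPaletteShortcut(e: KeyLike): boolean {
  return (e.metaKey || e.ctrlKey) && e.key.toLowerCase() === "k";
}
```

A keydown listener would call this, prevent the default action on a match, and flip the palette's open state.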
santoshkumarradha (Member, Author) commented

@AbirAbbas ready to go after your review.

CC: @Sridhar-Vetrivel / @SivasankaranPSIOG

- Use constant-time comparison (crypto/subtle) for bearer token
  validation in Go SDK process logs endpoint
- Add MaxBytesReader (10 MiB) to execution logs ingestion handler
  to prevent memory exhaustion from oversized payloads
- Remove accidentally committed .cursor/ IDE state files
- Add .cursor/ to .gitignore
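
The fix itself lives in the Go SDK and uses crypto/subtle. Purely to illustrate the idea in TypeScript, the sketch below compares two tokens without returning early at the first mismatched character. `constantTimeEqual` is a hypothetical helper, and JavaScript engines give no hard timing guarantees, so this shows the technique rather than a vetted primitive.

```typescript
/**
 * Compares two strings by accumulating differences across every position,
 * so the loop runs the same number of iterations wherever a mismatch occurs.
 * A length mismatch returns immediately, which reveals only the length.
 */
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```

An ordinary `===` comparison may stop at the first differing character, which is the timing signal this pattern is meant to remove.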

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

  • area:control-plane: Control plane server functionality
  • area:did: DID/VC cryptographic identity
  • area:web-ui: React web UI functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UI: Live execution log viewer (SSE)

3 participants