feat: service detail overview tab with AI brief + MCP-powered panels #45

Merged: WZ merged 23 commits into main from feature/service-detail-overview on Mar 26, 2026

@WZ WZ commented Mar 26, 2026

Summary

  • Add new Overview tab as default landing view on Service Detail page
  • AI Brief (full-width, top) — LLM-generated 2-3 sentence summary correlating recent changes with current health
  • Recent Changes section — deployments, MRs, config changes from GitLab MCP
  • Infrastructure section — K8s pod status, CPU/memory utilization bars, restart counts, warning events from K8s MCP
  • Dependencies — existing graph enhanced with live health badges, "estimated topology" label, accessible table fallback
  • New /api/services/:name/brief aggregator endpoint with per-section caching, timeouts, and graceful degradation
  • Input validation on service name route parameter
  • Prompt sanitization for LLM inputs
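
The last two bullets (route-parameter validation and prompt sanitization) can be illustrated with a minimal sketch. The PR names NAME_PATTERN and sanitizeForPrompt() but does not show them, so the regex, the 200-character cap, the choice to replace control characters with spaces, and the isValidServiceName helper are all assumptions for illustration.

```typescript
// Assumed pattern: lowercase DNS-subdomain style (at most 253 chars), which
// contains no ":" and so cannot collide with a colon-separated cache key.
const NAME_PATTERN = /^[a-z0-9]([a-z0-9.-]{0,251}[a-z0-9])?$/;

// Hypothetical helper: true when the :name route parameter is acceptable.
function isValidServiceName(name: string): boolean {
  return NAME_PATTERN.test(name);
}

// Replace ASCII control characters with spaces, trim, and cap the length
// before the value is embedded in an LLM prompt. maxLength is an assumed default.
function sanitizeForPrompt(input: string, maxLength = 200): string {
  return input
    .replace(/[\u0000-\u001f\u007f]/g, " ")
    .trim()
    .slice(0, maxLength);
}
```

A route handler would presumably reject the request with a 4xx response when isValidServiceName() returns false, before any cache lookup or MCP call is made.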

Architecture

GET /api/services/:name/brief
  ├── Promise.allSettled([changes, infra, deps])
  ├── Per-section timeout (3s) with timer cleanup
  ├── Server-side cache (TTLs: deps 5min, infra 30s, changes 2min)
  ├── In-flight request dedup
  ├── generateText() for AI summary (after data settles)
  └── Graceful degradation (failed sections → null + informative empty state)
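
As a rough illustration of the fan-out and degradation steps in the tree above, the sketch below runs three fetchers through Promise.allSettled and maps each outcome to a section that never throws. The SectionResult and BriefSketch shapes and the buildBriefSketch name are hypothetical; the real types live in src/types/service-brief.ts and are not shown in this PR description.

```typescript
// Hypothetical section shape used only for this sketch.
type SectionResult<T> = { status: "ok" | "error"; data: T | null };

// Convert a settled promise into a section; a rejection becomes null data.
function toSection<T>(result: PromiseSettledResult<T>): SectionResult<T> {
  return result.status === "fulfilled"
    ? { status: "ok", data: result.value }
    : { status: "error", data: null };
}

interface BriefSketch {
  changes: SectionResult<string[]>;
  infra: SectionResult<{ replicas: number }>;
  deps: SectionResult<string[]>;
}

// Run all three fetchers in parallel; one failure does not affect the others.
async function buildBriefSketch(fetchers: {
  changes: () => Promise<string[]>;
  infra: () => Promise<{ replicas: number }>;
  deps: () => Promise<string[]>;
}): Promise<BriefSketch> {
  const [changes, infra, deps] = await Promise.allSettled([
    fetchers.changes(),
    fetchers.infra(),
    fetchers.deps(),
  ]);
  return {
    changes: toSection(changes),
    infra: toSection(infra),
    deps: toSection(deps),
  };
}
```

The AI summary step would run only after this function resolves, using whichever sections came back with data.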

Test Coverage

  • service-brief.test.ts — 8 unit tests covering: happy path, GitLab fail, K8s timeout, all fail, cache hit, stale-while-revalidate, in-flight dedup, AI summary fail
  • Updated ServiceDetail.test.tsx — default tab changed from Metrics to Overview
  • All 60 test files, 717 tests passing

Pre-Landing Review

5 issues found (0 critical, 5 informational) — all fixed:

  • Dead code removed from ServiceDependencyGraph
  • ServiceBrief type guard fixed
  • Shared freshness utility extracted
  • Double-fetch eliminated via initialData props
  • Timer leak fixed in withTimeout

Adversarial Review

16 findings from Claude adversarial review (large tier, 2319 lines):

  • [FIXED] Timer leak in withTimeout — clearTimeout added
  • [FIXED] Input validation on :name route param — regex pattern, all 5 routes
  • [FIXED] LLM prompt injection — sanitizeForPrompt() for service names and MCP data
  • [FIXED] Error state shows infinite skeletons — error state + retry button added
  • [FIXED] initialDepData ref changes — useMemo added
  • [DEFERRED] O(n²) inferDependencyGraph — pre-existing code, not introduced by this PR
  • [DEFERRED] Substring matching false positives — pre-existing, Phase 2 replaces with real topology
  • [DEFERRED] Namespace hardcoded to "default" — TODO comment, needs ServiceConfig schema extension

Plan Completion

19/21 DONE, 0 NOT DONE, 2 CHANGED (equivalent approaches):

Test plan

  • npx tsc --noEmit — type check passes
  • npx vitest run — 60 files, 717 tests, all passing
  • Manual: Overview tab loads as default on service detail
  • Manual: existing Metrics/History/Dependencies tabs unaffected

🤖 Generated with Claude Code

Wilson Li added 23 commits March 26, 2026 11:49
@WZ WZ merged commit 19b1f46 into main Mar 26, 2026
1 check passed
WZ added a commit that referenced this pull request Apr 2, 2026
…45)

* feat(types): add ServiceBrief interfaces and gitlabProject to ServiceSchema

Creates src/types/service-brief.ts with all 10 types required by the
Service Detail Overview tab: BriefDependencyNode/Edge, Deployment,
MergeRequest, ConfigChange, ContainerStatus, K8sEvent, SectionStatus,
AISummary, and ServiceBrief. Adds gitlabProject: z.string().optional()
to ServiceSchema so config.yaml can wire a service to a GitLab project.

* refactor(types): clean up service-brief.ts per code review

- Make SectionStatus.fetchedAt optional (no timestamp for unconfigured/error states)
- Extract ChangesSection and InfrastructureSection as named exported interfaces
- Extract DependencyGraphSource as a named exported type
- Widen Deployment.pipelineStatus to string with doc comment listing known values
- Fix BriefDependencyEdge doc comment (was "DependencyNode concept")

* feat(server): add service-brief aggregator with caching, timeouts, and AI summary

Implements buildServiceBrief() — the core backend aggregator for the
Service Detail Overview tab. Queries changes (GitLab MCP), infrastructure
(K8s MCP), and dependencies in parallel, then generates an AI summary
correlating recent changes with current health.

Key features:
- Per-section timeouts (3s) via Promise.race
- In-memory cache with per-section TTLs and stale-while-revalidate
- In-flight dedup (concurrent callers share the same Promise)
- Graceful degradation (each section independent)
- Extracts inferDependencyGraph for reuse from routes.ts

Includes 8 unit tests covering happy path, partial failures, timeouts,
cache hits, stale-while-revalidate, in-flight dedup, and LLM failure.
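
The caching features listed above (per-section TTL, stale-while-revalidate, in-flight dedup) can be sketched in a small class. This is an illustrative sketch under stated assumptions, not the code from this commit: the SwrCache name, its method names, and the injectable clock are hypothetical.

```typescript
interface CacheEntry<T> {
  value: T;
  fetchedAt: number;
}

class SwrCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private inFlight = new Map<string, Promise<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Start a fetch, or join the one already running for this key (dedup).
  private refresh(key: string, fetcher: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const started = fetcher()
      .then((value) => {
        this.entries.set(key, { value, fetchedAt: this.now() });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, started);
    return started;
  }

  async get(key: string, fetcher: () => Promise<T>): Promise<{ value: T; stale: boolean }> {
    const hit = this.entries.get(key);
    if (!hit) return { value: await this.refresh(key, fetcher), stale: false };
    const stale = this.now() - hit.fetchedAt > this.ttlMs;
    // Serve the stale value immediately; refresh in the background and
    // swallow errors so a failed refresh keeps the old value in place.
    if (stale) void this.refresh(key, fetcher).catch(() => undefined);
    return { value: hit.value, stale };
  }
}
```

Under this design one instance per section (with its own TTL) would give the per-section behaviour described above; a single shared map with composite keys is an equally plausible layout.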

* fix: spec compliance — AbortController timeouts, unconditional parallel fetch, error vs unconfigured, fake timers

- withTimeout() now creates an AbortController per section and aborts it on timeout
- doBuildServiceBrief() uses unconditional Promise.allSettled([fetchChanges, fetchInfra, fetchDeps]);
  each fetcher owns its own cache check internally (stale-while-revalidate stays intact)
- fetchChanges/fetchInfra let getToolsByRole errors propagate (→ "error" status);
  empty tools still returns null (→ "unconfigured" status)
- Test 3 uses vi.useFakeTimers() + vi.advanceTimersByTimeAsync() instead of real 5s wait
- All 8 tests pass in <10ms
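
The "error" versus "unconfigured" distinction described above can be sketched as a mapping from a settled fetcher result to a status string. The sectionStatus name and the three-value Status union are assumptions for illustration; the actual SectionStatus type may carry more states (for example, stale).

```typescript
type Status = "ok" | "error" | "unconfigured";

// A rejected fetcher (for example, tool lookup threw) maps to "error";
// a fetcher that resolved with null (no tools configured) maps to "unconfigured".
function sectionStatus<T>(result: PromiseSettledResult<T | null>): Status {
  if (result.status === "rejected") return "error";
  return result.value === null ? "unconfigured" : "ok";
}
```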

* fix: code review cleanup — extract dependency graph, simplify withTimeout, stale summary refresh

- Fix misleading "unconditional" comment to describe actual conditional fan-out behavior
- Simplify withTimeout to return Promise<T> directly (remove unused AbortController signal)
- Extract inferDependencyGraph to shared src/server/dependency-graph.ts (DRY)
- Add background refresh path for stale AI summaries (fire-and-forget like data sections)
- Add TODO comments for gitlabProject wiring and hardcoded namespace
- Remove unused makeFailingTool helper and unnecessary type casts in tests
- Add stale status assertion to stale-while-revalidate test

* feat: add GET /api/services/:name/brief route handler

Delegates to buildServiceBrief() for aggregated service overview data
including changes, infrastructure, dependencies, and AI summary.

* feat: add ServiceBrief.tsx AI summary card for service overview tab

Full-width card with teal left-border accent, shimmer loading, per-state
rendering (ok/stale/error/unconfigured/null), evidence ref badges,
relative freshness timestamp with warning color on stale, and AI-generated
label. Exports ServiceBriefSkeleton for in-flight placeholder use.

* feat: add RecentChanges.tsx timeline component for service overview

Merges deployments, merge requests, and config changes into a single
time-sorted timeline (capped at 10 items) with per-type color dots,
metadata lines, freshness indicator, and all five UI states (loading,
data, empty, error, unconfigured).
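
A minimal sketch of the merge step described above: combine the three change lists, sort by time, and cap at 10 items. The TimelineItem shape, the mergeTimeline name, and the newest-first ordering are assumptions; the commit only says the timeline is time-sorted.

```typescript
type ChangeKind = "deployment" | "merge_request" | "config_change";

// Hypothetical normalized item; timestamp is an ISO 8601 string.
interface TimelineItem {
  kind: ChangeKind;
  title: string;
  timestamp: string;
}

// Merge any number of change lists into one newest-first timeline.
// concat() builds a new array, so the input lists are not mutated by sort().
function mergeTimeline(groups: TimelineItem[][], limit = 10): TimelineItem[] {
  return ([] as TimelineItem[])
    .concat(...groups)
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
    .slice(0, limit);
}
```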

* feat: add InfrastructureStatus.tsx K8s resource cards

Shows workload type + replica status, per-container CPU/memory utilization
bars with green/warning/destructive thresholds, restart counts, and Warning
K8s events. Handles loading, error, unconfigured, and empty states with
freshness indicator.

* feat: add health badges, estimated topology label, and accessible table to ServiceDependencyGraph

- New optional props: healthMap (per-service health status) and dependencySource
- Health dot (6px circle) rendered left of node label when healthMap is provided, using success/warning/destructive/muted-foreground CSS vars
- "Estimated topology" disclaimer shown below graph when dependencySource is "inferred" or omitted (10px JetBrains Mono, muted-foreground/50)
- Collapsible "Show as table" section below graph with role=table/row/cell markup, keyboard-navigable rows, and health dots in Status column
- onNodeClick fixed to always pass plain service name string (via _labelText) regardless of whether dot label JSX is used

* feat: add ServiceOverview tab composing brief, changes, infra, and dependency sections

Create ServiceOverview.tsx that fetches /api/services/:name/brief and
renders all four section components (ServiceBrief, RecentChanges,
InfrastructureStatus, ServiceDependencyGraph) with skeleton loaders.

Wire Overview as the new default tab in ServiceDetail.tsx via lazy loading,
keeping existing Metrics/History/Dependencies tabs unchanged.

* test: update ServiceDetail test for Overview as default tab

The default tab changed from Metrics to Overview as part of the
Service Brief feature. Update the test assertion to match.

* fix: pre-landing review fixes

- ServiceDependencyGraph: remove dead labelElement HTML-string and nodeById map
- ServiceBrief: remove unreachable summary===undefined check; loading is
  determined by the parent's sectionStatus, not by checking for undefined
- RecentChanges + InfrastructureStatus: extract duplicated STALE_THRESHOLD_MS
  and freshness formatting into src/web/lib/freshness.ts
- ServiceDependencyGraph: add initialData/initialHealthMap props so
  ServiceOverview can pass pre-fetched brief data instead of triggering
  a redundant /api/dependencies fetch
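
For the shared freshness utility mentioned above, a sketch of what src/web/lib/freshness.ts might contain follows. The 5-minute threshold value, the formatFreshness name, and the label wording are all assumptions; only STALE_THRESHOLD_MS and the file path come from the commit message.

```typescript
// Assumed value; the commit names the constant but not its magnitude.
const STALE_THRESHOLD_MS = 5 * 60 * 1000;

// Turn a fetch timestamp into a relative label plus a stale flag.
// Both arguments are epoch milliseconds; passing `now` keeps the function pure.
function formatFreshness(fetchedAtMs: number, nowMs: number): { label: string; stale: boolean } {
  const ageMs = Math.max(0, nowMs - fetchedAtMs);
  const minutes = Math.floor(ageMs / 60000);
  let label: string;
  if (minutes < 1) label = "just now";
  else if (minutes < 60) label = `${minutes}m ago`;
  else label = `${Math.floor(minutes / 60)}h ago`;
  return { label, stale: ageMs > STALE_THRESHOLD_MS };
}
```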

* fix: adversarial review — timer leak, input validation, prompt sanitization, error state

- Fix withTimeout timer leak: clear setTimeout handle via .finally() when
  the underlying promise wins the race
- Add NAME_PATTERN validation on :name route params to reject cache-key
  injection (colon separator) and restrict to K8s-valid names (max 253 chars)
- Add sanitizeForPrompt() to strip control characters and truncate
  user-derived fields before embedding in LLM prompts
- Track fetch error state in ServiceOverview to show error UI instead of
  infinite loading skeletons when /brief fails entirely
- Wrap initialDepData in useMemo to prevent unnecessary re-renders of
  ServiceDependencyGraph
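
The timer-leak fix in the first bullet can be sketched as follows: race the work against a timeout and clear the timer in .finally() so no handle stays pending when the work wins. The signature, including the label parameter, is an assumption; the commit only states that withTimeout returns Promise<T> and clears the setTimeout handle.

```typescript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever side settles first, clear the timer so it cannot fire later
  // or keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that this only stops waiting for the slow call; the underlying request keeps running unless it is cancelled by some other means.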

* fix: MCP output shape validation and cache size bound

Add coerce helpers (coerceContainer, coerceEvent, coerceDeployment, coerceMergeRequest)
that produce typed structs with sensible defaults from unknown MCP tool output, preventing
NaN/undefined from reaching callers. Add MAX_CACHE_ENTRIES=200 constant and insertion-order
eviction (oldest entry first, an LRU-style approximation) in setCache() to bound memory growth.
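
A sketch of both ideas in this commit, under stated assumptions: the ContainerStatus fields and the toFiniteNumber helper are hypothetical, and the real coerceContainer and setCache may differ. The eviction relies on the fact that a JavaScript Map iterates keys in insertion order.

```typescript
// Hypothetical subset of the container fields.
interface ContainerStatus {
  name: string;
  restarts: number;
  cpuPercent: number;
}

// Accept numbers and numeric strings; anything else (or NaN) becomes the fallback.
function toFiniteNumber(value: unknown, fallback = 0): number {
  const n = typeof value === "number" ? value : typeof value === "string" ? Number(value) : NaN;
  return Number.isFinite(n) ? n : fallback;
}

// Build a fully typed struct from unknown MCP tool output.
function coerceContainer(raw: unknown): ContainerStatus {
  const obj = typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
  const name = obj["name"];
  return {
    name: typeof name === "string" ? name : "unknown",
    restarts: toFiniteNumber(obj["restarts"]),
    cpuPercent: toFiniteNumber(obj["cpuPercent"]),
  };
}

const MAX_CACHE_ENTRIES = 200;
const cache = new Map<string, unknown>();

// Insert, then evict the earliest-inserted keys until the bound holds.
function setCache(key: string, value: unknown): void {
  cache.set(key, value);
  while (cache.size > MAX_CACHE_ENTRIES) {
    const oldest = cache.keys().next().value;
    if (oldest === undefined) break;
    cache.delete(oldest);
  }
}
```

Because Map.set() on an existing key keeps its original position, this evicts by first insertion rather than by last access, which matches the "oldest insertion order" wording but is not a true LRU.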

* fix: wire LLM model into service brief endpoint

The AI Brief section showed "Configure an LLM provider" because
the LLM model wasn't passed through registerRoutes to buildServiceBrief.

* fix: instruct LLM to output plain text in AI brief, no markdown

* fix: skip default-zero infra data in AI summary prompt

When K8s MCP returns a service-not-found response with all zeros
(replicas: 0/0, containers: [], events: []), don't feed that to the
LLM. The AI was reporting "0/0 replicas" for services that actually
had pods running (visible via Prometheus metrics).
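
The guard described above might look like the sketch below: detect the all-zero "service not found" shape and omit infrastructure lines from the prompt entirely. The InfraSnapshot fields and both function names are hypothetical.

```typescript
// Hypothetical subset of the infrastructure section.
interface InfraSnapshot {
  replicasReady: number;
  replicasDesired: number;
  containers: unknown[];
  events: unknown[];
}

// True when the snapshot looks like a default "not found" response.
function isDefaultZeroInfra(infra: InfraSnapshot): boolean {
  return (
    infra.replicasReady === 0 &&
    infra.replicasDesired === 0 &&
    infra.containers.length === 0 &&
    infra.events.length === 0
  );
}

// Return prompt lines for the infra section, or none if there is nothing real to report.
function infraPromptLines(infra: InfraSnapshot | null): string[] {
  if (infra === null || isDefaultZeroInfra(infra)) return [];
  return [
    `Replicas: ${infra.replicasReady}/${infra.replicasDesired} ready`,
    `Containers: ${infra.containers.length}`,
    `Warning events: ${infra.events.length}`,
  ];
}
```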

* style: consistent font and button sizes in console panel

Bump console header buttons (Clear, New chat) from caption-tier (10px)
to label-tier (11px) with proper height (h-7) to reduce visual gap
with service detail action buttons. Bump RCA card body text from
10-11px to 11-12px for better readability and DESIGN.md alignment.

* style: polish console & service detail button consistency

- Match console buttons (Clear, New chat) to service detail secondary
  button style: h-9, px-4, 12px font, rounded-lg, border-border/50
- Brighten RCA card text in console (foreground opacity 60-75% → 85-90%)
- Change Investigate button from pill (rounded-full) to rounded-lg

* fix: dependency graph loading forever in Overview tab

Sync initialData when it arrives after mount — the parent fetches
the brief async, so initialData starts undefined then becomes defined.
Also cap fitView maxZoom at 0.8 to prevent single-node graphs from
appearing oversized.

* style: rename Dashboard nav to Operations Desk

Update sidebar tooltip, service detail breadcrumb, and test.

---------

Co-authored-by: Wilson Li <wli02@fortinet.com>
@WZ WZ deleted the feature/service-detail-overview branch April 15, 2026 17:41