
feat(services): Path view + Overview agent charts (deploy roll-up) #27

Open
vaderyang wants to merge 41 commits into main from feat/deploy-services-and-flash

Conversation

@vaderyang
Collaborator

Integration / deploy branch that stacks the open Services-page work
(PR #25) and the LLM-proxy pair-detection work (PR #22), then adds
six new commits on top that make up this PR's reviewable change:

| Commit | What |
|--------|------|
| 6882cd3 | Path view: new tab on `/services` that renders the service→service topology as a directed SVG graph. Backend `GET /api/services/topology` returns `{nodes, edges}` (proxy edges from pair sweeper + synthetic `__clients__` edges into entry-point services). |
| 7b2f0aa | Inferred edges: when an inbound `client_ip` matches the `server_ip` of a known service (e.g. LiteLLM forwarding without a pair-sweeper match), draw the edge from that service instead of the anonymous clients node. Dashed-blue line, distinct from solid-blue proxy edges. |
| bf4887f | Perf: 7d window 10× faster. `arg_max(body, LENGTH(body))` over a 7-day window scanned 5+ GB of bodies (17 s on prod); replaced with a clipped 24 h window + `ROW_NUMBER() OVER (...) WHERE rn <= 5` top-N sampling + body-shape filtering in Rust. Services 17.8 s → 1.5 s; topology 12.9 s → 1.3 s. |
| fea1d83 | Drop weak classifier rule. "uvicorn + ≥ 3 distinct models → litellm" was window-width-sensitive: a vLLM serving Qwen3.5-35B picks up stray model names over 7 d and gets flipped to LiteLLM. Removed; real LiteLLM still detected via `x-litellm-*` header. |
| 3b35166 | Model view as a tab in Services, sidebar tidy. Models entry removed (now reachable as Services → Model tab); `/models` route still resolves. Sidebar "Traffic" relabelled "Usage" (route unchanged). |
| 8e191a1 | Overview agent charts. New endpoints `GET /api/agent-turns/summary` and `GET /api/agent-turns/activity` aggregate `agent_turns` by `agent_kind`. Two recharts on Overview: stacked-area activity timeseries and horizontal-bar distribution. |

Stack note

The 31 commits below 6882cd3 come from the open base PRs:

Once #22 and #25 land, this PR's effective diff against main will be the six commits above.

Verification (live on wuneng)

| Endpoint | Latency | Output |
|----------|---------|--------|
| `GET /api/services?start&end` (7d) | 1.5 s (was 17.8 s) | 22 service rows |
| `GET /api/services/topology?start&end` (7d) | 1.3 s (was 12.9 s) | 10 nodes, 13 edges |
| `GET /api/agent-turns/summary?start&end` (1d) | < 100 ms | 3 agent_kinds (generic 68 538, hermes 105, openclaw 84) |
| `GET /api/agent-turns/activity?start&end` (1d) | < 200 ms | 108 buckets |

Path view edge break-down on prod (last hour):

proxy   (5)  127.0.0.1:4000 (litellm) → multiple sglang/vllm backends
proxy   (1)  172.16.103.81:9000 → 172.17.0.4:30000  (haproxy → docker sglang)
inferred(2)  127.0.0.1:4000 (litellm) → 127.0.0.1:9000 (sglang)   etc.
client  (6)  __clients__ → entry-point services
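
To make the three edge kinds concrete, here is a minimal TypeScript sketch of how an unmatched inbound connection could be resolved to either an inferred edge or a client edge. The `Edge` shape, the `resolveInboundEdge` name, and the self-loop guard are assumptions for illustration; the actual topology code lives in the Rust backend and may differ. Keying known services by IP alone is a simplification (two services on one host would need a tiebreak).

```typescript
// Hypothetical shapes; the real /api/services/topology payload may differ.
type EdgeKind = "proxy" | "inferred" | "client";

interface Edge {
  from: string; // node id, e.g. "127.0.0.1:4000" or "__clients__"
  to: string;
  kind: EdgeKind;
}

const CLIENTS_NODE = "__clients__";

// Resolve the source node for inbound traffic the pair sweeper did not match.
// knownServices maps a service's server_ip to its node id (simplified: one service per IP).
function resolveInboundEdge(
  clientIp: string,
  targetNode: string,
  knownServices: Map<string, string>,
): Edge {
  const sourceService = knownServices.get(clientIp);
  // Assumption: never draw an inferred edge from a service to itself.
  if (sourceService !== undefined && sourceService !== targetNode) {
    // The caller is itself a known service: inferred service-to-service edge.
    return { from: sourceService, to: targetNode, kind: "inferred" };
  }
  // Otherwise fall back to the anonymous clients node.
  return { from: CLIENTS_NODE, to: targetNode, kind: "client" };
}
```

Proxy edges are not produced here because, per the description above, they come from the pair sweeper rather than from IP matching.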

Test plan

  • Open /services → Table view loads in ~1.5 s on 7d window.
  • Switch to Path tab — graph renders with three edge styles + count labels.
  • Switch to Model tab — embedded ModelsPage works as before.
  • Sidebar shows "Usage" (not "Traffic"); Models entry gone.
  • Overview shows Agent Activity (stacked area) and Agent Distribution (horizontal bar) between the latency row and the model panels.
  • cargo test -p ts-storage-duckdb apps passes (21 tests, classifier fallback removed).

🤖 Generated with Claude Code

Vader Yang and others added 30 commits May 15, 2026 17:01
…d link

Builds on the previous PR (selected id in URL): copying a list page
URL like `?preset=15m&selected=<id>` and opening it half an hour
later would compute `start=now-15m, end=now` from scratch — the
selected item is no longer in that window. The detail panel still
loaded (it queries by id) but the list behind it showed an unrelated
slice, the row had no highlight, prev/next disabled.

Fix the window without changing the original tab's behaviour:

1. List pages also write `?selected_at=<unix_s>` when an item is
   selected — taken from the item's start_time (agent turns) or
   request_time (llm calls / http exchanges). Cleared together with
   `selected` when the panel is closed.

2. `useToolbarUrlSync` reads `selected_at` during hydration. If the
   anchor falls outside the preset-derived window, override:
   - keep the preset's *duration* (the original user's "show me this
     much context" signal),
   - slide so `end = anchor + 60s` (small breathing pad keeps the
     item from sitting flush at the edge in a desc-by-time list),
   - promote `preset` to `custom` so subsequent URL writes carry
     absolute start/end and the shift survives navigation.

   No-op when the anchor is inside the window, absent, unparseable,
   or future-dated relative to a window that already includes it.

3. Pure helper `applySelectedAtAnchor` lives in its own module
   (`selected-at-anchor.ts`, no `@/` aliases) so it's directly
   testable under bun without the toolbar-store / react-router
   runtime chain. 7 unit tests cover the no-op cases, the stale-
   preset shift, default-1h fallback, and clock-skew anchors.
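
The override rule in step 2 can be sketched as a pure function. This is a minimal illustration, not the shipped `applySelectedAtAnchor`: the `TimeWindow` shape, the function name, and the argument list are assumptions, and anchors later than the window end are simply treated like any other out-of-window anchor.

```typescript
// Hypothetical shapes; the real helper in selected-at-anchor.ts may differ.
interface TimeWindow {
  preset: string; // e.g. "15m", "1h", or "custom"
  start: number;  // unix seconds
  end: number;    // unix seconds
}

const PAD_S = 60;            // breathing pad after the anchor
const DEFAULT_SPAN_S = 3600; // 1h fallback when the window has no usable span

// Returns a shifted window, or null when no override is needed.
function applySelectedAtAnchorSketch(
  win: TimeWindow,
  selectedAt: string | null,
): TimeWindow | null {
  if (selectedAt === null || selectedAt.trim() === "") return null; // absent
  const anchor = Number(selectedAt);
  if (!Number.isFinite(anchor)) return null;                        // unparseable
  if (anchor >= win.start && anchor <= win.end) return null;        // already visible

  // Keep the preset's duration, slide so the anchor sits just inside the end.
  const span = win.end > win.start ? win.end - win.start : DEFAULT_SPAN_S;
  const end = anchor + PAD_S;
  return { preset: "custom", start: end - span, end };
}
```

Returning `null` for the no-op cases keeps the caller's relative preset untouched, which matches the "original tab" behaviour described below.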

Effects:
- Original tab: relative preset still ticks `now` as usual; no
  surprise switch to `custom`.
- Fresh URL load: window auto-widens / slides to bracket the
  shared item; list, highlight, prev/next all work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ar logo

Three small UI changes batched into one branch — none of them touch
backend or data shapes:

* Overview "Avg TPOT" KPI surfaces as "Avg TPS" with units of tok/s
  (= 1000 / tpot_avg_ms). TPOT itself is what the backend stores;
  the conversion is one division at render time. "Generation speed"
  reads better in a glance than "milliseconds per token".

* Models table column "TPOT" → "Generation TPS", same unit swap.
  Sort key still points at tpot_avg under the hood but getSortValue
  inverts to 1000/tpot_avg so clicking the column desc gives
  fastest-first — matches what someone clicking "Generation TPS"
  expects.

* Agent Turns table column order rewritten around how operators
  actually triage a turn: Time, Agent, Client, Calls, Status, In,
  Out, then the less-frequently-scanned dimensions (Model, Wire
  API, Server, Duration) and the long User Input preview last.

* New TokenScope brand mark replaces the bare panel-toggle button
  at the top-left of the sidebar:
  - Expanded: wordmark on the left, collapse button on the right.
  - Collapsed: icon-only mark; click toggles to expand (the icon
    doubles as the expand affordance — discoverable, saves a row).
  Both variants share the same glyph (rounded "scope" frame
  containing three decreasing token bars) so they line up
  visually as the sidebar opens/closes. Stroke uses currentColor
  for dark-mode and theme inheritance.
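
The TPOT → TPS swap in the first two bullets is a single division plus a sort accessor. A minimal sketch, assuming a row shape with a nullable `tpot_avg` in milliseconds; the helper names and the null handling are illustrative, not the console's actual code.

```typescript
// TPOT is milliseconds per output token; TPS is its reciprocal in tokens per second.
function tpotMsToTps(tpotAvgMs: number | null): number | null {
  // Guard against null, zero, negative, and NaN inputs (assumed handling).
  if (tpotAvgMs === null || !(tpotAvgMs > 0)) return null;
  return 1000 / tpotAvgMs;
}

// Hypothetical sort accessor: the sort key stays tpot_avg, but the value is
// inverted so a descending sort puts the fastest generator first.
function generationTpsSortValue(row: { tpot_avg: number | null }): number {
  const tps = tpotMsToTps(row.tpot_avg);
  return tps === null ? Number.NEGATIVE_INFINITY : tps;
}
```

For example, a `tpot_avg` of 20 ms renders as 50 tok/s, and rows without a TPOT sink to the bottom of a descending sort.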

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every chart had its own copy of:

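  function formatAxisTime(epoch) {
    const d = new Date(epoch * 1000)
    const HH = String(d.getHours()).padStart(2, '0')
    const MM = String(d.getMinutes()).padStart(2, '0')
    return `${HH}:${MM}`
  }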

Result: a 7-day window rendered ticks as a wrapping clock face
("00:00", "12:00", "00:00", "12:00", ...) with no day attached.
Same problem at 24h. Easy to mis-read.

Centralize the formatter in lib/format as `formatAxisTime(epoch, span)`
and have it pick the right shape based on the visible window:

  span < 24h       →  HH:MM           (5m / 15m / 1h / 6h presets)
  24h ≤ span < 7d  →  MM-DD HH:MM     (24h preset)
  span ≥ 7d        →  MM-DD           (7d preset; time-of-day is noise
                                       when ticks come ~daily)

Each chart derives span from its data (last timestamp − first), so the
formatter requires no toolbar dependency and naturally handles partial
ranges (e.g. tail of a 7d window after retention trimmed the head).
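
A minimal sketch of the span-aware formatter described above. It assumes `epoch` and `span` are both in seconds and that local time is used for display; the real `lib/format` implementation may differ in both respects.

```typescript
const DAY_S = 24 * 60 * 60;

// Zero-pad to two digits without relying on newer string helpers.
const pad2 = (n: number): string => (n < 10 ? "0" + n : String(n));

// epoch: unix seconds of the tick; span: last timestamp minus first, in seconds.
function formatAxisTime(epoch: number, span: number): string {
  const d = new Date(epoch * 1000);
  const date = `${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;
  const time = `${pad2(d.getHours())}:${pad2(d.getMinutes())}`;
  if (span < DAY_S) return time;                  // HH:MM
  if (span < 7 * DAY_S) return `${date} ${time}`; // MM-DD HH:MM
  return date;                                    // MM-DD
}
```

Because both boundaries use strict `<`, a span of exactly 24h lands in the `MM-DD HH:MM` bucket and a span of exactly 7d lands in `MM-DD`, matching the table.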

Replaces the inline copies in:
  - timeseries-line-chart  (Overview latency, Models, Performance)
  - request-volume-chart   (Overview)
  - latency-overview-chart (Overview)
  - stacked-bar-chart      (Performance, Traffic)

6 unit tests in lib/format.test.ts cover each duration bucket plus the
24h / 7d inclusive boundaries and the single-point fallback (span = 0
→ HH:MM). Tests assert *shape* not literal values so they pass under
any TZ.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ProxyPair, ProxyRole, PairCandidate, PairAssignment with the
classify_pair / pair_all entry points. No call sites yet — this is the
pure-data foundation that the storage sweeper + API filter will build on.

Pairing rule (verified against the haproxy_glm5 turn pair on wuneng:
turns 019e3a95-bb7c-7eb3-8240-d3ecacb0c583 / d3d6fdd76249, same session
gen-b93380c5210ed98a, 11345/128 tokens, start_gap 2ms / end_gap 1ms):

  - same session_id / agent_kind / wire_api
  - same call_count, total_input_tokens, total_output_tokens
  - same final_finish_reason and primary model
  - differing (client_ip, server_ip) view
  - |start_time gap| ≤ 100ms

Role:
  - mirror (same packet on br0 + docker0) when both start and end times
    agree within 500us
  - strict nesting (real proxy hop) when outer.start ≤ inner.start and
    outer.end ≥ inner.end
  - else: ambiguous, no pair
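
The pairing and role rules above can be read as a small decision function. The sketch below is a TypeScript illustration only: the real `classify_pair` is Rust, and the `PairCandidateSketch` field names and the two-valued role are assumptions made for this example.

```typescript
// Hypothetical TypeScript port for illustration; times are in microseconds.
interface PairCandidateSketch {
  sessionId: string; agentKind: string; wireApi: string;
  callCount: number; inputTokens: number; outputTokens: number;
  finishReason: string; model: string;
  clientIp: string; serverIp: string;
  startUs: number; endUs: number;
}

type PairRoleSketch = "mirror" | "proxy" | null;

const START_GAP_MAX_US = 100000; // 100ms
const MIRROR_TOLERANCE_US = 500; // 500us

function sameContent(a: PairCandidateSketch, b: PairCandidateSketch): boolean {
  return a.sessionId === b.sessionId && a.agentKind === b.agentKind &&
    a.wireApi === b.wireApi && a.callCount === b.callCount &&
    a.inputTokens === b.inputTokens && a.outputTokens === b.outputTokens &&
    a.finishReason === b.finishReason && a.model === b.model;
}

function classifyPairSketch(a: PairCandidateSketch, b: PairCandidateSketch): PairRoleSketch {
  if (!sameContent(a, b)) return null;
  // The two rows must observe the call from different (client_ip, server_ip) views.
  if (a.clientIp === b.clientIp && a.serverIp === b.serverIp) return null;
  const startGap = Math.abs(a.startUs - b.startUs);
  if (startGap > START_GAP_MAX_US) return null;

  const endGap = Math.abs(a.endUs - b.endUs);
  if (startGap <= MIRROR_TOLERANCE_US && endGap <= MIRROR_TOLERANCE_US) return "mirror";

  // Strict nesting in either direction: one leg fully contains the other.
  const aWrapsB = a.startUs <= b.startUs && a.endUs >= b.endUs;
  const bWrapsA = b.startUs <= a.startUs && b.endUs >= a.endUs;
  return aWrapsB || bWrapsA ? "proxy" : null; // otherwise ambiguous, no pair
}
```

With the verified wuneng shape (start gap 2ms, end gap 1ms) the mirror tolerance is exceeded, so the pair is classified by nesting rather than as a mirror.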

10 unit tests cover both real-data scenarios and the non-pair cases
(cross-session, same view, time-gap exceeded, tokens differ, ambiguous
non-nesting).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the background sweeper that scans recently-finalized turns,
classifies pairs via ts_turn::pair_all, and writes pair_id/role/peer
back via update_turn_metadata. Spawned alongside the storage sink in
pipeline.rs — one sweeper per process, owns its own Arc<dyn
StorageBackend>.

StorageBackend trait gains two methods with safe defaults so mock
backends don't need to change:

  - query_pair_candidates(start_us, end_us) → light projection of
    agent_turns rows whose metadata.proxy.role is unset (idempotent
    sweep guard)
  - update_turn_metadata(turn_id, patch) → shallow top-level JSON merge
    into agent_turns.metadata (no schema change; metadata is already a
    VARCHAR holding JSON)

DuckDB implementation:
  - SELECT projects via json_extract_string(metadata, '$.proxy.role')
  - UPDATE is read-modify-write to preserve any pre-existing metadata
    keys; no-op when turn_id is absent (sweeper races finalization)
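
The "shallow top-level JSON merge" semantics of `update_turn_metadata` can be illustrated as follows. This is a TypeScript sketch of the merge step only, under the assumption that unparseable or non-object metadata is treated as empty; the real implementation is a read-modify-write in the Rust DuckDB backend.

```typescript
// Shallow top-level merge of a patch into metadata stored as a JSON string.
// Keys in the patch replace same-named top-level keys; all other keys survive.
function mergeTurnMetadata(
  existing: string | null,
  patch: Record<string, unknown>,
): string {
  let base: Record<string, unknown> = {};
  if (existing !== null && existing !== "") {
    try {
      const parsed: unknown = JSON.parse(existing);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        base = parsed as Record<string, unknown>;
      }
    } catch {
      // Assumption: unparseable metadata is treated as an empty object.
    }
  }
  return JSON.stringify({ ...base, ...patch });
}
```

Because the merge is shallow, patching `proxy` replaces the whole `proxy` object rather than merging its fields.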

Default schedule: 2s interval, 5min lookback. The lookback comfortably
exceeds tracker grace (1s) + storage flush jitter (~100ms) so neither
leg of a pair can land late enough to miss its peer.

Tests:
  - ts-storage pair_sweeper: 3/3 (matched pair, role assignment matches
    real wuneng haproxy_glm5 shape, lone turn ignored)
  - ts-storage-duckdb turns: pair_candidates returns only unpaired,
    update_turn_metadata merges with existing keys, noop on missing
    row

Workspace: 815+ unit tests all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
API surface for pair-folded turn list. By default, /api/agent-turns
hides the leg the pair sweeper marked hidden (proxy_out /
mirror_secondary) — one logical call collapses to one row. Pass
?include_proxy_hops=true to surface every captured row for diagnostics.

  - TurnListItem gains proxy_role + proxy_peer_turn_id (skip_serializing
    when absent → direct turns serialize unchanged)
  - TurnsQuery + TurnsParams gain include_proxy_hops: bool (default
    false)
  - query_turns DuckDB SELECT projects metadata; row reader parses
    metadata.proxy.{role, peer_turn_id}
  - WHERE clause adds the hide-by-default filter via
    json_extract_string(metadata, '$.proxy.role')
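
The hide-by-default behaviour amounts to filtering on the role the sweeper wrote. A minimal in-memory TypeScript sketch follows; the actual filter is a SQL `WHERE` clause in the DuckDB backend, and `TurnRowSketch` and `visibleTurns` are hypothetical names.

```typescript
type HopRole = "proxy_in" | "proxy_out" | "mirror_primary" | "mirror_secondary";

// Hypothetical mirror of the list item; direct turns carry no proxy_role.
interface TurnRowSketch {
  turn_id: string;
  proxy_role?: HopRole;
  proxy_peer_turn_id?: string;
}

// Roles the sweeper marks as the hidden leg of a pair.
const HIDDEN_ROLES: HopRole[] = ["proxy_out", "mirror_secondary"];

function visibleTurns(items: TurnRowSketch[], includeProxyHops = false): TurnRowSketch[] {
  if (includeProxyHops) return items; // diagnostics view: every captured row
  return items.filter(
    (t) => t.proxy_role === undefined || HIDDEN_ROLES.indexOf(t.proxy_role) < 0,
  );
}
```

Calling `visibleTurns(rows)` collapses each logical call to one row, while `visibleTurns(rows, true)` corresponds to `?include_proxy_hops=true`.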

Tests: new query_turns_hides_proxy_hops_by_default_and_surfaces_them_with_flag
exercises both default-hide and include-flag, asserting field
propagation and total-count consistency. Workspace test suite stays
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-visible fold for llmproxy duplicates. The Agent Turns list now:

  - Renders a small inline badge next to the Time column on rows the
    backend marked as proxy_in / mirror_primary (e.g. "↔ via proxy").
    Hover shows the peer turn_id for navigation.
  - Adds a "Show proxy hops" checkbox in the filter bar. Off by default
    (collapsed view = single row per logical call); when on, the hidden
    proxy_out / mirror_secondary peer surfaces too, getting its own
    "proxy hop" / "mirror copy" badge.
  - Sticky in the URL as ?show_hops=1 so a shared link preserves the
    user's view choice.

AgentTurnListItem in types/api.ts gains optional proxy_role /
proxy_peer_turn_id matching the backend additions; useAgentTurns hook
forwards includeProxyHops to the API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Field-tuning the default lookback after deploying on wuneng. The
metadata.proxy.role IS NULL filter keeps already-paired turns out of
every sweep, so a wider lookback has bounded per-tick cost. The only
thing the 30min lookback buys us is backfilling pairs where one leg was
slow to flush from its shard before the peer landed in another. 5min
was tight enough to miss real haproxy_glm5 peers spread across shards
in production traffic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the 2-member pair model with arbitrary-size ProxyGroup so the
haproxy_glm5 case — host-IP view + docker-IP view + real upstream
forward, all three captured under the same session — collapses into ONE
row in the default list. Previously the greedy "closest peer first"
rule paired the 0ms mirror and left the real-hop leg unpaired.

ts-turn proxy_pair rewritten:
  - PairAssignment → ProxyGroup{members: Vec<GroupMember>}
  - pair_all → group_all: bucket by content fingerprint, time-cluster
    within 100ms, pick canonical = widest-span (lex tiebreak), assign
    per-member roles (mirror_secondary for time-tied peers, proxy_out
    for nested peers, ambiguous-time peers dropped). Canonical role
    upgrades to proxy_in whenever the group contains any proxy_out;
    falls back to mirror_primary for pure-mirror groups.
  - metadata_for emits both peer_turn_ids (full list, sorted lex) and
    peer_turn_id (first peer, for pre-N-leg API consumers).
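
The grouping steps above (bucket, time-cluster, pick canonical, assign roles) can be sketched in TypeScript as follows. This is an illustration under assumptions, not the Rust `group_all`: the fingerprint is taken as a precomputed string, clusters are anchored on their earliest start, and the 500us mirror tolerance is assumed to carry over from the earlier pair rule.

```typescript
type MemberRole = "proxy_in" | "mirror_primary" | "mirror_secondary" | "proxy_out";

interface GroupTurn {
  turnId: string;
  fingerprint: string; // content fingerprint, computed elsewhere
  startUs: number;
  endUs: number;
}

interface GroupMemberSketch { turnId: string; role: MemberRole; }

const CLUSTER_WINDOW_US = 100000; // 100ms
const TIE_US = 500;               // assumed mirror tolerance

function assignRoles(cluster: GroupTurn[]): GroupMemberSketch[] {
  if (cluster.length < 2) return [];
  // Canonical = widest span, ties broken lexicographically by turn id.
  const canonical = cluster.reduce((best, t) => {
    const ds = (t.endUs - t.startUs) - (best.endUs - best.startUs);
    return ds > 0 || (ds === 0 && t.turnId < best.turnId) ? t : best;
  });
  const peers: GroupMemberSketch[] = [];
  for (const t of cluster) {
    if (t === canonical) continue;
    const tied = Math.abs(t.startUs - canonical.startUs) <= TIE_US &&
      Math.abs(t.endUs - canonical.endUs) <= TIE_US;
    const nested = canonical.startUs <= t.startUs && canonical.endUs >= t.endUs;
    if (tied) peers.push({ turnId: t.turnId, role: "mirror_secondary" });
    else if (nested) peers.push({ turnId: t.turnId, role: "proxy_out" });
    // Ambiguous-time peers are dropped.
  }
  if (peers.length === 0) return [];
  const hasHop = peers.some((p) => p.role === "proxy_out");
  const head: GroupMemberSketch = {
    turnId: canonical.turnId,
    role: hasHop ? "proxy_in" : "mirror_primary",
  };
  return [head].concat(peers);
}

function groupAllSketch(turns: GroupTurn[]): GroupMemberSketch[][] {
  // 1. Bucket by content fingerprint.
  const buckets = new Map<string, GroupTurn[]>();
  for (const t of turns) {
    const bucket = buckets.get(t.fingerprint) || [];
    bucket.push(t);
    buckets.set(t.fingerprint, bucket);
  }
  const groups: GroupMemberSketch[][] = [];
  buckets.forEach((bucket) => {
    // 2. Time-cluster: split when a start drifts past the window from the cluster's first start.
    const sorted = bucket.slice().sort((a, b) => a.startUs - b.startUs);
    let cluster: GroupTurn[] = [];
    const flush = (): void => {
      const g = assignRoles(cluster);
      if (g.length > 1) groups.push(g);
      cluster = [];
    };
    for (const t of sorted) {
      if (cluster.length > 0 && t.startUs - cluster[0].startUs > CLUSTER_WINDOW_US) flush();
      cluster.push(t);
    }
    flush();
  });
  return groups;
}
```

Applied to a three-leg capture, the widest leg becomes the canonical `proxy_in`, the time-tied copy becomes `mirror_secondary`, and the nested upstream leg becomes `proxy_out`, all in one group.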

ts-storage pair_sweeper: SweepStats now reports both pairs_assigned
(group count = duplicate calls folded) and turns_tagged (per-row
metadata writes — distinguishes "1 fat 3-leg group" from "3 mirror
pairs" in metrics).

API:
  - TurnListItem gains proxy_peer_turn_ids: Option<Vec<String>>;
    proxy_peer_turn_id retained as the first peer for backward compat.
  - DuckDB row reader extracts both forms.

Console:
  - AgentTurnListItem mirrors the schema.
  - ProxyBadge tooltip lists every peer; label shows "(+N hops)" when
    the group has more than one peer.

Tests:
  - ts-turn proxy_pair: 11 unit tests including the verified
    haproxy_three_leg_collapses_into_single_group scenario (a_br0
    canonical = proxy_in, b_dock0 = mirror_secondary, c_hop =
    proxy_out, all sharing one group_id).
  - ts-storage pair_sweeper: 4 unit tests including 3-leg
    metadata-patch correctness.
  - Workspace test suite: green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /api/agent-turns/{id}/proxy-view and a "Proxy View" tab on the
Agent Turn detail panel, gated on the turn being part of a proxy group
(metadata.proxy.role set).

The endpoint aggregates every member of the group:
  - Per-member snapshot (client/server IP, ports, role, e2e latency,
    request_model, wire_api, raw request + response headers parsed
    from the stored JSON blob).
  - Header diff across legs, with three kinds:
      * common — same (name, value) in every leg (collapsed in UI)
      * modified — every leg sent it but the proxy rewrote the value
        (e.g. Host)
      * per_leg — only some legs carry it (e.g. x-litellm-call-id on
        proxy_in, anthropic-request-id on proxy_out)
    Names match case-insensitively; canonical-case spelling preserved.
  - Optional model_rewrite when the canonical and upstream legs'
    request bodies advertise different `model` field values.
  - Latency breakdown: client_observed_ms − upstream_observed_ms =
    proxy_overhead_ms when both are available.
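
The three header-diff kinds can be illustrated with a short TypeScript sketch. It assumes each leg's headers arrive as `(name, value)` pairs, keeps the first-seen spelling of each name, and lets the last value win when a header repeats within a leg; the backend's actual handling of these details is not specified above.

```typescript
type HeaderList = Array<[string, string]>; // (name, value) pairs for one leg
type DiffKind = "common" | "modified" | "per_leg";

interface HeaderDiffEntry {
  name: string;                 // first-seen spelling (assumed convention)
  kind: DiffKind;
  values: Array<string | null>; // one slot per leg, null when the leg lacks it
}

function diffHeadersSketch(legs: HeaderList[]): HeaderDiffEntry[] {
  const displayName = new Map<string, string>(); // lowercased name -> spelling
  const perLeg = legs.map((headers) => {
    const m = new Map<string, string>();
    for (const [name, value] of headers) {
      const key = name.toLowerCase(); // names match case-insensitively
      if (!displayName.has(key)) displayName.set(key, name);
      m.set(key, value); // simplification: last value wins on repeats
    }
    return m;
  });

  const out: HeaderDiffEntry[] = [];
  displayName.forEach((name, key) => {
    const values = perLeg.map((m) => (m.has(key) ? (m.get(key) as string) : null));
    const present = values.filter((v): v is string => v !== null);
    let kind: DiffKind;
    if (present.length < legs.length) kind = "per_leg";           // only some legs carry it
    else if (present.every((v) => v === present[0])) kind = "common";
    else kind = "modified";                                       // every leg, differing values
    out.push({ name, kind, values });
  });
  return out;
}
```

A rewritten `Host` comes out as `modified`, while a header only present on the proxy-side leg comes out as `per_leg`.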

UI (proxy-view-tab.tsx) renders, in order:
  - Topology row per leg with role chip + IP:port + e2e latency
  - Latency breakdown 3-stat card
  - Model rewrite banner when present
  - Response header diff (modified + per-leg expanded by default, common
    collapsed under <details>)
  - Request header diff (secondary; usually just Host rewrite)

Backend tests (7): header diff classification (common/modified/per_leg),
case-insensitive header matching, model rewrite detect/none, latency
breakdown happy + mirror-only-without-overhead path, body model
extraction edge cases, headers JSON parse round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous proxy-view commit added the handler but missed the
.route(...) registration in lib.rs, so the endpoint fell through to
the SPA index. Adds the missing line right next to /calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaces "<N>-leg via proxy" / "mirrored" chip under the duration in
the GanttNav header whenever the turn is part of a proxy group. Tells
the user upfront — without opening the Proxy view tab — that the
timeline they're looking at is one captured vantage point of a larger
group.

Extracted readProxyMeta / proxyGroupSize into lib/proxy-meta.ts so the
same JSON-walking logic serves both the detail panel tab gate and the
GanttNav badge. ProxyBadge in the list page intentionally keeps reading
the flat proxy_role field (it's already projected by the list API; no
need to re-parse metadata).

Tooltip on the chip lists every peer turn_id so the user can copy one
out and navigate to it manually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same logical LLM call captured twice — once at the LiteLLM listener
(client_port:LITELLM_PORT, e.g. :4000) and once at LiteLLM's outbound
to the real upstream (client_port:UPSTREAM_PORT, e.g. :9008) — both
landed in the same agent_turn as separate llm_calls rows. The turn
detail panel rendered all of them, so a 12-call agent run showed 24
steps in the timeline and 24 CallCards on the right.

Adds a client-side grouping in lib/call-pair.ts that mirrors the
backend turn-level rule (same fingerprint + ≤100ms time window +
distinct (client:port, server:port)), surfaces the canonical leg as
the visible row, hides the proxy hops by default. A 'Show proxy hops
(N)' toggle in the tab bar flips back to the raw view. Canonical
CallCards get a small '+N' chip in the header.

State is lifted to AgentTurnDetailPanel so GanttNav and the CallCard
list stay in sync — the timeline bars match the cards.

No backend / schema change: llm_calls has no metadata column today,
and adding one for purely-presentational folding would be heavy. The
trade-off is that agent_turns.call_count still reports the raw count;
surfacing a 'logical' count is a follow-up if it matters.

Tests: 8 unit tests in lib/call-pair.test.ts covering the 2-leg
client→litellm pair (using the user's verified data shape), 3-leg
haproxy br0+docker0+upstream, time-gap rejection, content-fingerprint
rejection, same-view rejection, order preservation, and pure direct
calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…URLs

Live wuneng data showed every captured pair failed to fold because the
client SDK sent /v1/chat/completions to LiteLLM (port 4000) while
LiteLLM forwarded the bare /chat/completions to the upstream (port
9008). Including the path in the content fingerprint dropped the pair
rate to ~0.

Tokens + model + wire_api + status + finish + stream-flag is
sufficient content equivalence — matches what the backend
proxy_pair::group_all rule on turns has always used.

Regression test added in call-pair.test.ts using the exact path-pair
shape (/chat/completions vs /v1/chat/completions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related additions when the turn isn't itself part of a backend
proxy group but its calls were captured at multiple vantages:

GanttNav (Timeline sidebar)
  - Canonical bars with folded hops now carry a thin blue underline
    sized to the same span as the main bar — reads as a 'shadow' of
    the leg.
  - The latency column shows a small Layers icon next to the ms count
    on the canonical row.
  - Border-left flips blue (low-prio relative to slow/error tones) to
    catch the eye in long timelines.

Proxy view tab (re-enabled for in-turn case)
  - Tab gate widened from `proxyRole` only to `proxyRole || hopCount > 0`.
  - ProxyViewTab takes `hasBackendPair` + `canonicalCalls` +
    `hopsByCanonical`. When the backend hasn't paired the turn but
    the client-side fold caught duplicates, it renders the new
    InTurnProxyView instead of fetching /proxy-view.
  - InTurnProxyView lays out one card per canonical-with-hops,
    showing each leg's 5-tuple + e2e latency + per-hop overhead delta
    (canonical e2e − hop e2e) + model-rewrite chip when the model
    field differs.
  - Header-diff (response x-litellm-* etc.) deferred for in-turn —
    would require parsing the stored headers JSON client-side; v1
    surfaces topology + timing + model which covers the user's most
    common question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llmproxy-pair-detection

# Conflicts:
#	console/src/components/layout/sidebar.tsx
In-session clicks on agent-turn rows write ?selected_at=<unix_s> to the
URL so a subsequent share-link recipient can recover the item's window.
But useToolbarUrlSync was running applySelectedAtAnchor on EVERY
searchParams change — every click → URL update → URL→store effect
re-runs → helper sees that 'now' has advanced a few seconds → the
just-clicked item falls outside the (slightly-newer) preset window
→ window auto-shifts → list goes empty.

Gate the anchor with a useRef so it fires once per mount of the
AppLayout (which mounts useToolbarUrlSync). External shared links still
get the rescue behavior — the helper runs on the FIRST hydration of
that fresh load. After that, the URL → store sync no longer touches the
toolbar window in response to selected_at changes.

The existing applySelectedAtAnchor unit tests cover the rescue
semantic and still pass; this fix is purely about when the helper
gets called.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live wuneng data shows a 4-leg topology where LiteLLM advertises
`glm5` (alias) to the client and rewrites it to `GLM-5.1` for the
upstream. Leg 1 carries the alias; legs 2-4 carry the rewritten name.
With `model` in the content key, leg 1 never clusters with the
others — the user still sees the alias-leg as a duplicate row.

Drop `model` from contentKey (same fix as `request_path` earlier).
Tokens + wire_api + finish + status + stream-flag is sufficient
content equivalence. Model rewrite is intentionally NOT pairing-key
material because it's exactly what the Proxy view tab exists to
display per-leg.

Tests: + pairs-even-when-model-differs (the 2-leg shape) and the
full 4-leg topology from the user's reported case
(019e3edf-…/seq=1..4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rint

LiteLLM (and similar LLM proxies) translate API styles across the
client/upstream boundary. Live wuneng setups include the Anthropic
→ OpenAI bridge: client SDK speaks /v1/messages with finish_reason=
end_turn, LiteLLM forwards /v1/chat/completions with finish_reason=
stop. All three of wire_api, final_finish_reason, and primary_model
translate alongside each other, so requiring them to match dropped
the pair rate on those topologies to zero.

Frontend lib/call-pair.ts::contentKey: drop wire_api + finish_reason
(model + request_path were already out). Remaining keys: is_stream,
status_code, input_tokens, output_tokens. Combined with the 100ms
time window and the distinct-5-tuple requirement, false positives
are still effectively nil — these are the API-format-invariant
fields proxies pass through unchanged.
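
After this change the frontend content key reduces to the four format-invariant fields. A minimal sketch, assuming a call shape with these field names and a pipe-joined string key; the real `contentKey` in `lib/call-pair.ts` may encode the key differently.

```typescript
// Hypothetical call shape; only the first four fields take part in the key.
interface CallLike {
  is_stream: boolean;
  status_code: number;
  input_tokens: number;
  output_tokens: number;
  // Deliberately excluded from the key because proxies rewrite them:
  model?: string;
  request_path?: string;
  wire_api?: string;
  finish_reason?: string;
}

function contentKeySketch(c: CallLike): string {
  return [c.is_stream ? 1 : 0, c.status_code, c.input_tokens, c.output_tokens].join("|");
}
```

Two legs of an Anthropic → OpenAI bridge therefore share a key even though their path, wire API, finish reason, and model name all differ; the time window and distinct-view checks still apply on top.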

Backend ts-turn::proxy_pair::content_fingerprint: drop wire_api,
final_finish_reason, primary_model. Remaining keys: session_id (the
strongest signal — agent profiles content-hash on first user
message), agent_kind, call_count, total_input_tokens,
total_output_tokens.

Tests: + frontend pairs-across-api-styles (Anthropic ingress, OpenAI
upstream), + backend pairs_across_api_style_translation matching the
same scenario at turn level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clicking on a turn with hundreds of agentic iterations would freeze
the browser. The /api/agent-turns/{id}/calls endpoint returned every
call's full request_body + response_body + headers; an 878-call turn
on real data lands a 168 MB JSON response that the browser can't
parse or render.

Fix is in two parts that ship together:

**Server** (StorageBackend trait + DuckDB impl + API route)
- `query_turn_calls(turn_id, include_bodies: bool)` and
  `query_calls_by_ids(call_ids, include_bodies: bool)` now accept a
  flag. When false, the SQL projection selects `NULL::VARCHAR` for
  the four heavy fields — DuckDB never reads the body pages off disk
  and they don't transfer to Rust as Strings.
- New `?lite=1` query param on `GET /api/agent-turns/{id}/calls`
  flips `include_bodies = false`. Default behavior unchanged for
  every existing caller.
- `tokens_estimated` derivation falls back to `false` in lite mode
  (it inspects response_body); documented on the trait.

**Console** (auto-opt-in for large turns + lazy-load on expand)
- `useAgentTurnCalls(id, lite)` passes `?lite=1` when caller asks.
- `AgentTurnDetailPanel` watches `turn.call_count`; above 200 it
  flips lite mode on. Renders a small amber banner so the user
  knows bodies are being lazy-loaded.
- `CallCard` lazy-fetches `/api/llm-calls/{id}` only when the user
  expands a card whose inline bodies are null. Gated on `expanded`
  so a mega-turn with 800 collapsed cards doesn't fire 800
  background requests at mount.
- Tools index / classifier already null-safe — no extra changes.
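
The console-side opt-in can be sketched as two small helpers: one decides whether a turn is large enough for lite mode, the other builds the request URL. The helper names are hypothetical; the threshold of 200 and the `?lite=1` parameter come from the description above.

```typescript
const LITE_CALL_THRESHOLD = 200;

// Lite mode kicks in only above the threshold, so small turns are unchanged.
function shouldUseLite(callCount: number): boolean {
  return callCount > LITE_CALL_THRESHOLD;
}

// Build the calls URL; lite mode asks the server to omit the heavy body fields.
function turnCallsUrl(turnId: string, lite: boolean): string {
  const base = `/api/agent-turns/${encodeURIComponent(turnId)}/calls`;
  return lite ? `${base}?lite=1` : base;
}
```

For the 878-call turn mentioned below, `turnCallsUrl(id, shouldUseLite(878))` yields the `?lite=1` variant, and each expanded card then fetches its own bodies separately.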

Real-world impact on the 878-call turn observed in production:
list response shrinks from 168 MB to under 1 MB; detail page now
loads in well under a second; expanding any single call fetches its
~190 KB of bodies independently.

Tests:
- ts-storage-duckdb: extended `query_turn_calls_orders_and_sequences`
  to assert lite mode strips all four heavy fields and preserves
  every other field byte-for-byte.
- console: 111 existing tests pass, no behavior change for
  small-turn workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three Agent Session list hooks (`useAgentSessions`,
`useAgentSessionDetail`, `useSessionTurns`) were missing the
`placeholderData: (prev) => prev` setting that every other list hook
in the app uses (`useAgentTurns`, `useLlmCalls`, `useHttpExchanges`,
`useMetrics`, etc.). Without it, every auto-refresh tick / toolbar
key change wipes the query cache to undefined before the new
response lands — react-query renders the loading skeleton, then the
new data — and the user sees a full-page flash on every refresh
while other list pages do a frame-perfect swap.

Setting `placeholderData: (prev) => prev` keeps the last-known data
visible while a background refetch is in flight. New data drops in
when the response arrives; no skeleton, no blanked-out list.

Caught by user: "The Agent Session page repaints the whole page on
every refresh, instead of refreshing with barely any visible change
like the other pages do."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… perf)

New "Services" page that aggregates llm_calls by the actual serving
endpoint (server_ip, server_port) — answering "what's
172.16.103.81:9000 serving, and how is it performing?".

Why not reuse `llm_metrics`? Its pre-aggregated grouping sets stop
at `server_ip` and don't carry server_port — two vLLM instances on
the same host (port 8000 / 9000) would collapse into one row.

## Backend

- `ts_storage::query::ServiceRow` + `ServicesQuery` (one row per
  endpoint with distinct models, wire APIs, call/error counts,
  TTFT/E2E avg + p95, total tokens, first/last seen).
- `StorageBackend::query_services` trait method + DuckDB impl.
  Query is `GROUP BY (server_ip, server_port)` on `llm_calls`;
  models / wire_apis come back as `list_distinct(array_agg(...))`,
  bridged to Rust as JSON strings (DuckDB rust bindings have no
  `FromSql for Vec<String>`).
- `GET /api/services?start=&end=&sort_by=&sort_order=&limit=`
  serves it. `sort_by` whitelist matches the table column names.

## Console

- Sidebar adds "Services" between "Models" and "Agent Sessions"
  with a `Server` icon.
- `ServicesPage` table: Endpoint • Models (chips) • Wire APIs •
  Calls (+stream %) • Error % • TTFT avg/p95 • E2E avg/p95 •
  In/Out tokens • Last seen (relative). Headers click-to-sort
  in-place — no refetch on resort.
- `useServices` hook follows the same `placeholderData: prev`
  pattern as every other list hook (no flash on refresh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p/litellm)

Adds an App column to the Services page that classifies each
endpoint into one of a fixed enum from cheap wire-traffic signals.

## Signals used (highest-confidence first)

| App             | Signal                                                       |
|-----------------|--------------------------------------------------------------|
| `ollama`        | path `/api/chat` / `/api/generate` / `/api/tags`             |
| `llamacpp`      | path `/completion` / `/tokenize` / `/props` (root-level)     |
| `litellm`       | response header `x-litellm-*` OR `Server: litellm`           |
| `openai`        | request `Host: api.openai.com`                               |
| `anthropic`     | request `Host: api.anthropic.com`                            |
| `gemini`        | request `Host: generativelanguage.googleapis.com`            |
| `openai-compat` | `Server: uvicorn` (vLLM and SGLang both; a body-sample follow-up will disambiguate) |
| `litellm`       | tiebreaker: an `openai-compat` endpoint serving ≥ 3 distinct models (real signal from wuneng's 127.0.0.1:4000) |
| (none)          | nothing matches; UI shows muted "unknown" badge              |
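
The precedence in the table can be read as an ordered chain of checks. The following TypeScript sketch is illustrative only: the shipped classifier is the Rust module in `ts-storage-duckdb/src/apps.rs`, and the `EndpointSignals` shape, exact-path matching, and substring match on `uvicorn` are assumptions made here.

```typescript
type AppLabel =
  | "ollama" | "llamacpp" | "litellm" | "openai"
  | "anthropic" | "gemini" | "openai-compat" | null;

// Hypothetical per-endpoint signal bundle.
interface EndpointSignals {
  requestPaths: string[];
  responseHeaders: Array<[string, string]>;
  requestHost: string | null;
  distinctModels: number;
}

function classifyAppSketch(s: EndpointSignals): AppLabel {
  const hasPath = (candidates: string[]): boolean =>
    s.requestPaths.some((p) => candidates.indexOf(p) >= 0);

  // Path signals win over any header signal.
  if (hasPath(["/api/chat", "/api/generate", "/api/tags"])) return "ollama";
  if (hasPath(["/completion", "/tokenize", "/props"])) return "llamacpp";

  let server = "";
  let hasLitellmHeader = false;
  for (const [name, value] of s.responseHeaders) {
    const n = name.toLowerCase();
    if (n === "server") server = value.toLowerCase();
    if (n.indexOf("x-litellm-") === 0) hasLitellmHeader = true;
  }
  if (hasLitellmHeader || server === "litellm") return "litellm";

  const host = (s.requestHost || "").toLowerCase();
  if (host === "api.openai.com") return "openai";
  if (host === "api.anthropic.com") return "anthropic";
  if (host === "generativelanguage.googleapis.com") return "gemini";

  if (server.indexOf("uvicorn") >= 0) {
    // Tiebreaker from the table: many distinct models suggests a router.
    return s.distinctModels >= 3 ? "litellm" : "openai-compat";
  }
  return null; // UI shows the muted "unknown" badge
}
```

Ordering the checks from most to least specific is what gives the "path wins over uvicorn" precedence mentioned in the implementation notes.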

## Implementation

- `ts-storage-duckdb/src/apps.rs` — pure-function classifier with 12
  unit tests covering each rule + edge cases (Ollama compat mode
  serving `/v1/chat/completions`, multi-model uvicorn tiebreaker,
  path-wins-over-uvicorn precedence, header-absent fallback).
- SQL aggregate now also pulls `arg_min(response_headers, LENGTH(...))`
  and the matching request_headers as a per-group sample plus
  `list_distinct(array_agg(request_path))[1:16]`. `arg_min` picks
  the shortest non-null blob deterministically — small enough that
  streaming it to Rust costs nothing.
- New fields on `ServiceRow`: `app`, `server_header`, `request_paths`.
- Console renders a colored `AppBadge` per row with a `title=Server:`
  tooltip so the user can sanity-check the label.

## What ships vs. follow-up

vLLM and SGLang both run under uvicorn and don't have a distinctive
custom header. Today they both label as `openai-compat`. A follow-up
will pull one small response body per group and look for
`chatcmpl-tool-<hex>` (vLLM's tool_call_id pattern, observed in
production) vs. SGLang's distinct response shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Services-page aggregate uses `arg_min(headers, LENGTH(headers))`
to pick one representative header sample per endpoint. Without a
shape filter it picks ANY shortest non-null value — including rows
where the response parser stashed an empty/corrupted string. That
fed `null` (or similar) to the classifier and dropped four real
endpoints (the GLM-5.1 cluster on port 9000) to `unknown` even
though every other call from those endpoints carries a clean
`Server: uvicorn` blob.

Restrict the sample to JSON arrays of at least 30 chars (`[%`
pattern). The shortest real header list captured in production is
~140 chars; 30 is a comfortable floor that excludes literal `null`,
`[]`, `{}`, and any other malformed short response without losing
genuine samples.
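
The shape filter described above boils down to a small predicate. Below is a minimal Rust sketch of the same check, assuming the sample arrives as an optional string. The commit applies the filter in SQL; the function name `is_plausible_header_sample` and the constant name are hypothetical and only illustrate the rule.

```rust
/// Length floor for a header sample (hypothetical constant name; the
/// 30-char value is the one quoted in the commit message).
const MIN_HEADER_SAMPLE_LEN: usize = 30;

/// Returns true when `sample` looks like a serialized JSON array of headers:
/// it starts with `[` and is at least 30 bytes long. This rejects `null`,
/// `[]`, `{}` and other short malformed values.
/// Note: `str::len` counts bytes, while SQL `LENGTH` counts characters; the
/// two agree for ASCII header blobs.
fn is_plausible_header_sample(sample: Option<&str>) -> bool {
    match sample {
        Some(s) => s.starts_with('[') && s.len() >= MIN_HEADER_SAMPLE_LEN,
        None => false,
    }
}
```

With this predicate, a literal `null` or `[]` is rejected, while a real header list such as `[["server","uvicorn"],["content-type","application/json"]]` passes.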

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`arg_min(headers, LENGTH(headers))` was still returning NULL for
endpoints with mixed-header data (e.g. SSE/streaming calls where the
parser captured something the LIKE filter doesn't catch).

Switch to `MAX(response_headers)`. On a column whose values all start
with `[[`, the lexicographic maximum is a stable, arbitrary pick, and it
avoids arg_min's failure mode of selecting anomalously short malformed
values. Filter to `[%` to guarantee the picked sample is shaped like a
JSON array (drops literal "null", "{}", etc.).
Vader Yang and others added 7 commits May 20, 2026 12:13
Per the user's ask: every endpoint must land on a concrete label.
Replace the `openai-compat` placeholder by stacking up cheap signals
already present in `llm_calls`:

**New SQL aggregates** (alongside the existing header / paths sample):
- `list_distinct(array_agg(finish_reason))[1:32]`        — distinct
  finish_reasons in the window
- `arg_max(request_body, LENGTH(request_body))`           — largest
  captured request body (deepest agentic history; only materialises
  once, length comparison is u64-cheap)
- `arg_max(response_body, LENGTH(response_body))`         — largest
  captured response body (capped at 8 KB so streamed/oversized rows
  don't bloat the read)

**New classifier signals** (in order, highest confidence first):

1. SGLang-specific paths (`/generate`, `/health_generate`,
   `/get_server_info`, `/flush_cache`, `/encode`, profile endpoints).
2. vLLM-specific paths (`/version`, `/v1/score`).
3. SGLang-exclusive finish_reasons (`matched_stop`, `matched_eos`,
   `stop_str`) — works even when responses are SSE-streamed, since
   finish_reason is captured from the final SSE event regardless.
4. Response body fingerprint:
   - `"id":"chatcmpl-tool-…"` (vLLM's tool_call_id format)
   - `"system_fingerprint":"fp_…"` (vLLM only; SGLang leaves it null)
5. Request body fingerprint: `chatcmpl-tool-` substring — agentic
   replays carry assistant.tool_calls history back to the server,
   and the previous round's tool_call_id reveals vLLM.
6. Uvicorn fallback:
   - ≥3 models → LiteLLM (multi-model tiebreaker, real wuneng signal)
   - Model starts with `glm` / `deepseek` → SGLang (reference deployment)
   - Otherwise → vLLM (more common)
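
The precedence chain above can be sketched as a single function that returns on the first matching signal. This is an illustration of the ordering as of this commit, not the code in `apps.rs`: the struct `EndpointSignals`, its field names, and the helper `any_in` are hypothetical, and the case-insensitive model-name match is an assumption. Earlier rules (the `Host` and `x-litellm-*` header checks) are omitted, and a later commit in this PR drops the "≥ 3 models" branch.

```rust
/// Per-endpoint evidence gathered by the SQL aggregates (hypothetical struct;
/// field names are illustrative, not the real `ServiceRow` layout).
struct EndpointSignals<'a> {
    server_header: Option<&'a str>,
    request_paths: Vec<&'a str>,
    finish_reasons: Vec<&'a str>,
    response_body: Option<&'a str>,
    request_body: Option<&'a str>,
    models: Vec<&'a str>, // assumed to be distinct model names
}

const SGLANG_PATHS: [&str; 5] =
    ["/generate", "/health_generate", "/get_server_info", "/flush_cache", "/encode"];
const VLLM_PATHS: [&str; 2] = ["/version", "/v1/score"];
const SGLANG_FINISH_REASONS: [&str; 3] = ["matched_stop", "matched_eos", "stop_str"];

/// True when any observed value is in the known list.
fn any_in(values: &[&str], known: &[&str]) -> bool {
    values.iter().any(|v| known.iter().any(|k| *k == *v))
}

/// Walks the signals from highest to lowest confidence; first hit wins.
fn classify_openai_compat(s: &EndpointSignals) -> Option<&'static str> {
    // 1-2. Engine-specific request paths.
    if any_in(&s.request_paths, &SGLANG_PATHS) {
        return Some("sglang");
    }
    if any_in(&s.request_paths, &VLLM_PATHS) {
        return Some("vllm");
    }
    // 3. SGLang-exclusive finish_reasons.
    if any_in(&s.finish_reasons, &SGLANG_FINISH_REASONS) {
        return Some("sglang");
    }
    // 4. Response body fingerprints (exact substrings, no whitespace handling).
    if let Some(body) = s.response_body {
        if body.contains(r#""id":"chatcmpl-tool-"#)
            || body.contains(r#""system_fingerprint":"fp_"#)
        {
            return Some("vllm");
        }
    }
    // 5. Request body fingerprint from replayed tool-call history.
    if s.request_body.map_or(false, |b| b.contains("chatcmpl-tool-")) {
        return Some("vllm");
    }
    // 6. Uvicorn fallback heuristics.
    let is_uvicorn = s
        .server_header
        .map_or(false, |h| h.to_ascii_lowercase().contains("uvicorn"));
    if !is_uvicorn {
        return None;
    }
    if s.models.len() >= 3 {
        return Some("litellm");
    }
    // Assumption: model-family match is case-insensitive.
    let sglang_family = s.models.iter().any(|m| {
        let m = m.to_ascii_lowercase();
        m.starts_with("glm") || m.starts_with("deepseek")
    });
    Some(if sglang_family { "sglang" } else { "vllm" })
}
```

Because each step returns early, a strong signal such as an SGLang-only finish_reason always beats the model-name heuristic in step 6.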

Console: drop the `openai-compat` badge color since the label is no
longer emitted by the classifier.

22 classifier tests (was 12) covering every new rule + the
beats-the-heuristic precedence cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a tab switcher to the Services page (default Table, alternative
Path). The Path view fetches a new GET /api/services/topology endpoint
and renders a directed SVG graph:

  * Nodes are real (server_ip:server_port) endpoints — colored by app
    class — plus one synthetic "clients" node aggregating all upstream
    callers.
  * Edges come in two kinds:
      - `proxy` (solid blue) — definitive hops confirmed by the
        pair_sweeper (litellm -> sglang, haproxy -> docker backend, …).
      - `client` (dashed grey) — synthetic edges from the clients node
        into every service that receives non-proxy_out traffic. So
        even endpoints without a paired upstream still render
        connected.

Layout is a BFS-by-depth column layout from the clients node; sibling
order within a column is stable (call_count desc). Edge stroke width
scales with turn_count so the hot paths stand out.
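
The column layout described above can be sketched as a breadth-first search that assigns each node its depth from the clients node, followed by a per-column sort. The actual layout lives in the console and may differ; this Rust sketch is illustrative only, and the function names, the string node ids, and the name-based tiebreak are assumptions.

```rust
use std::collections::{HashMap, VecDeque};

/// Assigns each node a column index equal to its BFS depth from `root`.
/// Nodes unreachable from `root` are absent from the result.
fn bfs_columns(root: &str, edges: &[(&str, &str)]) -> HashMap<String, usize> {
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(src, dst) in edges {
        adjacency.entry(src).or_default().push(dst);
    }
    let mut depth: HashMap<String, usize> = HashMap::new();
    depth.insert(root.to_string(), 0);
    let mut queue = VecDeque::from([root]);
    while let Some(node) = queue.pop_front() {
        let d = depth[node];
        if let Some(next) = adjacency.get(node) {
            for &n in next {
                // First visit wins, so each node keeps its shortest depth.
                if !depth.contains_key(n) {
                    depth.insert(n.to_string(), d + 1);
                    queue.push_back(n);
                }
            }
        }
    }
    depth
}

/// Groups nodes into columns by depth and orders siblings by call_count
/// descending (node name as a tiebreak, so the order is stable).
fn layout_columns(
    root: &str,
    edges: &[(&str, &str)],
    call_count: &HashMap<&str, u64>,
) -> Vec<Vec<String>> {
    let depth = bfs_columns(root, edges);
    let width = depth.values().copied().max().map_or(0, |d| d + 1);
    let mut columns: Vec<Vec<String>> = vec![Vec::new(); width];
    for (node, d) in depth {
        columns[d].push(node);
    }
    for col in &mut columns {
        col.sort_by(|a, b| {
            let ca = call_count.get(a.as_str()).copied().unwrap_or(0);
            let cb = call_count.get(b.as_str()).copied().unwrap_or(0);
            cb.cmp(&ca).then_with(|| a.cmp(b))
        });
    }
    columns
}
```

For a graph `__clients__ → litellm → {sglang, vllm}`, this yields three columns, with the busier backend listed first in the last column.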

Backend pieces:

  * ServicesTopologyQuery / TopologyNode / TopologyEdge /
    ServicesTopology types in ts-storage::query.
  * DuckDB impl in metrics.rs — two SQL passes (proxy edges from
    pair_sweeper-written metadata.proxy.pair_id; client entry edges
    from any non-proxy_out turn) joined to llm_calls for server_port
    (agent_turns doesn't carry it).
  * GET /api/services/topology route, time-windowed by ?start&end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When pair_sweeper hasn't paired a turn but the inbound `client_ip`
matches the server_ip of an existing service node (the canonical
"LiteLLM accepts a user call and immediately forwards to a backend"
pattern), draw an "inferred" edge from the caller-service to the
destination instead of routing the traffic into the anonymous
`__clients__` super-node.

Resolution rule when caller_ip has multiple services: prefer
`litellm` first, then any other proxy-class app, then highest
call_count. Skip resolution entirely when the *target* itself is a
proxy (litellm/haproxy/nginx) — co-host vllm is the destination's
neighbour, not its caller, and attributing inbound litellm traffic
to vllm produced backwards edges before this guard was added.
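
The resolution rule and its guard can be sketched as follows. This is a minimal illustration, not the code in `metrics.rs`: the `ServiceNode` struct and `resolve_caller` are hypothetical names, and excluding the target itself from the candidates is an assumption. Only `is_proxy_app` and its three app names come from the PR.

```rust
/// A known service endpoint (hypothetical struct for illustration).
#[derive(Debug, Clone, PartialEq)]
struct ServiceNode {
    ip: String,
    port: u16,
    app: Option<String>,
    call_count: u64,
}

/// Proxy-class apps, as listed in the commit message.
fn is_proxy_app(app: Option<&str>) -> bool {
    matches!(app, Some("litellm" | "haproxy" | "nginx"))
}

/// Picks the service on `client_ip` that most plausibly originated a call to
/// `target`, or `None` when the edge should stay on the clients node.
fn resolve_caller<'a>(
    client_ip: &str,
    target: &ServiceNode,
    services: &'a [ServiceNode],
) -> Option<&'a ServiceNode> {
    // Guard: never infer a caller when the target itself is a proxy.
    if is_proxy_app(target.app.as_deref()) {
        return None;
    }
    services
        .iter()
        .filter(|s| s.ip == client_ip)
        // Assumption: a service is never its own caller.
        .filter(|s| !(s.ip == target.ip && s.port == target.port))
        .min_by_key(|s| {
            // Lower rank wins: litellm, then other proxies, then the rest.
            let rank = match s.app.as_deref() {
                Some("litellm") => 0u8,
                app if is_proxy_app(app) => 1,
                _ => 2,
            };
            // Within a rank, the highest call_count wins.
            (rank, std::cmp::Reverse(s.call_count))
        })
}
```

With services on `127.0.0.1` at ports 4000 (litellm), 9000 (sglang) and 9008 (vllm), a call into port 9000 resolves to the litellm node even if vllm has more traffic, and a call into port 4000 resolves to nothing because the target is a proxy.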

Inferred edges render as a dashed mid-blue line with the count
label; proxy stays solid blue, client stays dashed grey. Legend
updated to spell out which is which.

Live data after deploy:
  * `127.0.0.1:4000 (litellm) → 127.0.0.1:9000 (sglang)`  inferred 11
  * `127.0.0.1:4000 (litellm) → 127.0.0.1:9008 (vllm)`    inferred 1
  * `__clients__:0 → 172.16.103.81:4210 (litellm)`        client 3221
    (the bulk of real user traffic into the main LiteLLM —
     correctly stays a client edge because the *target* is litellm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A 7-day Services / Path query was taking 17 s on prod because the
per-endpoint aggregation included `arg_max(request_body,
LENGTH(request_body)) FILTER (LENGTH(body) BETWEEN ...)` — DuckDB
materializes every body in the window (5+ GB on prod) just to pick
one short sample per (server_ip, server_port).

Split body sampling out of the main aggregation into
`fetch_app_samples`:

* Window clipped to last 24 h of the user's range. App
  classification doesn't change over the wider window, and most
  views are "now" so this is a no-op in practice.
* Top-5 most-recent rows per endpoint via `ROW_NUMBER() OVER
  (PARTITION BY server_ip, server_port ORDER BY request_time DESC)`
  + `WHERE rn <= 5`. DuckDB ranks in place and only emits 5 rows
  per group — no full body scan.
* Body / header shape filtering (`LIKE '{%'`, length bounds) moved
  to Rust. The 5 returned rows give us plenty of candidates to find
  a representative sample.
* Distinct request_paths / finish_reasons come from a separate
  cheap dim query over the same clipped window.
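
Two of the steps above, the window clip and the Rust-side shape filter, can be sketched like this. It is a simplified illustration, not the body of `fetch_app_samples`: the function names, the microsecond `i64` timestamps, and the length bounds passed by the caller are assumptions. Only the `SAMPLE_WINDOW_US` constant and its value are named in the review of this PR.

```rust
/// 24 h in microseconds, the clip applied to the sampling window.
const SAMPLE_WINDOW_US: i64 = 24 * 60 * 60 * 1_000_000;

/// Clips `[start_us, end_us]` to at most the last 24 h of the range.
/// Windows that are already 24 h or shorter are returned unchanged.
fn clip_sample_window(start_us: i64, end_us: i64) -> (i64, i64) {
    (start_us.max(end_us - SAMPLE_WINDOW_US), end_us)
}

/// Picks the first candidate that looks like a JSON object and whose byte
/// length falls within `[min_len, max_len]`. `candidates` is expected in
/// most-recent-first order, as produced by the `rn <= 5` ranking.
fn pick_body_sample<'a>(
    candidates: &[&'a str],
    min_len: usize,
    max_len: usize,
) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .find(|b| b.starts_with('{') && (min_len..=max_len).contains(&b.len()))
}
```

Since at most five rows per endpoint reach this code, scanning them linearly in Rust is negligible compared with filtering every body inside the aggregate.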

Wall-clock on prod (7 d window, 662 k llm_calls rows):

  before: services=17.8 s   topology=12.9 s
  after:  services= 1.5 s   topology= 1.3 s

App classification output unchanged for endpoints with active recent
traffic (verified against the prior 1 h baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "uvicorn endpoint serving ≥ 3 distinct models = LiteLLM" rule
turned out to be window-width-sensitive: a vLLM serving one real
model picks up 2-4 stray model names from misconfigured clients
(`text-embedding-ada-002`, `test`, …) over a 7-day window and gets
flipped to litellm. Verified misclassifications on prod:

  before                     after
  172.17.0.7:8000  vllm   ↛  litellm    →   vllm     ✓ Qwen3.5-35B
  172.17.0.9:9000  sglang ↛  litellm    →   sglang   ✓ GLM-5.1 haproxy
  172.17.0.4:30000 sglang ↛  litellm    →   sglang   ✓ GLM-5.1 docker

Real LiteLLM endpoints (.81:4210, 127.0.0.1:4000) still classify via
the `x-litellm-*` response header rule. The GLM/DeepSeek model-name
heuristic still routes those families to sglang on uvicorn. The
fallback was the weakest signal in the chain — body / path / header
evidence already weighed above is enough.

Removes the rule, its dedicated unit test
(`sglang_via_glm_model_heuristic` already covers the case the test was
using), and the table-of-rules doc-comment row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Services page gets a third tab "Model" that renders the existing
ModelsPage component. The standalone `/models` route still resolves
for shared links, but the sidebar entry is removed. Models is a
service-level cross-cut, so placing it one click deeper under Services
makes the information architecture (IA) more honest.

Sidebar tweaks:
  * Drop Models entry (now reachable via Services → Model tab).
  * Rename "Traffic" → "Usage" — clearer label for what the page
    actually shows (token throughput, byte volume, etc.). Route
    stays /traffic so existing bookmarks keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new charts on the Overview page so the operator can see, at a
glance:

  * Agent Activity — stacked-area timeseries of agent_turn counts,
    bucketed by a server-chosen window (1-min for ~1h ranges, 30-min
    for 1d, 4h for 30d), split by agent_kind.
  * Agent Distribution — horizontal bar chart of total turns per
    agent_kind in the selected window.

Backend:

  * `AgentSummaryQuery` / `AgentKindSummary` and
    `AgentActivityQuery` / `AgentActivityPoint` types in
    `ts-storage::query`.
  * DuckDB impls in metrics.rs: `query_agent_summary` (one row per
    agent_kind), `query_agent_activity` (per-bucket counts split by
    agent_kind, bucket size auto-picked from window width).
  * Routes `GET /api/agent-turns/summary` + `GET
    /api/agent-turns/activity` (with optional `?bucket=` override).
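
Bucket auto-selection of this kind can be sketched as a step function over the window width. The bucket sizes (1 min, 30 min, 4 h) come from the commit message; the thresholds between them, the function names, and the assumption that the `?bucket=` override is given in milliseconds are all hypothetical.

```rust
const MINUTE_MS: i64 = 60_000;
const HOUR_MS: i64 = 60 * MINUTE_MS;
const DAY_MS: i64 = 24 * HOUR_MS;

/// Chooses a bucket width from the window width unless a positive override
/// is supplied. The 2 h and 2 d cut-offs are illustrative assumptions.
fn pick_bucket_ms(start_ms: i64, end_ms: i64, override_ms: Option<i64>) -> i64 {
    if let Some(b) = override_ms.filter(|b| *b > 0) {
        return b;
    }
    let width = (end_ms - start_ms).max(0);
    if width <= 2 * HOUR_MS {
        MINUTE_MS
    } else if width <= 2 * DAY_MS {
        30 * MINUTE_MS
    } else {
        4 * HOUR_MS
    }
}

/// Floors a timestamp to the start of its bucket.
fn bucket_start(ts_ms: i64, bucket_ms: i64) -> i64 {
    ts_ms - ts_ms.rem_euclid(bucket_ms)
}
```

Choosing the bucket on the server keeps the number of points per series bounded regardless of the window the toolbar selects.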

Console:

  * Types `AgentKindSummary` / `AgentActivityPoint` and the two
    response shapes.
  * Hooks `useAgentSummary` / `useAgentActivity` keyed on the
    toolbar window (placeholderData keeps prior data during
    refetch — no flash).
  * Charts `AgentActivityChart` (recharts AreaChart stacked) and
    `AgentDistributionChart` (recharts BarChart horizontal).
  * Inserted as a new row between the existing "middle row" and
    the model panels — same width budget as the rest of Overview.

Live data after deploy (1 d window): generic=68538, hermes=105,
openclaw=84 turns; 108 activity buckets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both list pages (LLM calls and agent turns) now accept a CSV
`Server Port` filter in the header row alongside the existing
`Client IP`. It is URL-serialized as `?server_port=4210,9000` so
shared links carry the filter.

* `llm_calls`: direct `WHERE server_port IN (...)` — fast.
* `agent_turns`: the table has no `server_port` column, so we
  resolve through the turn's first `call_ids` entry against
  `llm_calls` via an EXISTS subquery, the same shortcut the topology
  query uses. A turn's calls almost always hit one endpoint in
  practice, so first-call resolution is a safe approximation.
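
Parsing the CSV value of `?server_port=` into a list of ports can be sketched as below. The function name and the choices to skip empty entries, drop duplicates, and reject out-of-range values are assumptions; the real handler may behave differently.

```rust
/// Parses a value such as "4210,9000" into a list of distinct ports.
/// Empty entries are skipped; anything that is not a valid u16 is an error.
fn parse_server_ports(raw: &str) -> Result<Vec<u16>, String> {
    let mut ports = Vec::new();
    for part in raw.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let port: u16 = part
            .parse()
            .map_err(|_| format!("invalid server_port: {part:?}"))?;
        // Keep first-seen order and drop repeats.
        if !ports.contains(&port) {
            ports.push(port);
        }
    }
    Ok(ports)
}
```

The resulting list maps directly onto the `WHERE server_port IN (...)` clause, ideally bound as parameters rather than interpolated into the SQL string.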

Verified live (1h window, prod):
* `GET /api/llm-calls?server_port=9000` → all returned calls have
  server_port=9000 (GLM-5.1 sglang).
* `GET /api/agent-turns?server_port=4210` → returned turns are
  litellm-bound (server_ip 172.16.103.81).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment

Summary

Agent run failed (exit 1). See workflow logs.


🤖 Reviewed by viviworkflow run

github-actions[bot]
github-actions Bot previously approved these changes May 21, 2026

@github-actions github-actions Bot left a comment

Based on my review of PR #27, here are my findings:

Summary

PR #27 adds a Services Path view with topology graph, Overview agent activity/distribution charts, and delivers a critical 10× performance fix for 7-day window queries. The performance fix correctly replaces the problematic arg_max(body, LENGTH(body)) pattern with a ROW_NUMBER-based top-N approach that clips body sampling to 24h. The window-width-sensitive classifier rule was correctly removed. APPROVE — no blocking issues.

Verified

  • Body-scan fix: fetch_app_samples at metrics.rs:788-955 uses ROW_NUMBER() OVER (PARTITION BY server_ip, server_port ORDER BY request_time DESC) WHERE rn <= 5 + clipped 24h window — avoids the 5+ GB body materialization that caused the 17s stall. This is the canonical fix referenced in the repo's bite-history.

  • Schema mirror: ServiceRow Rust (query.rs:54-90) ↔ TS (api.ts:70-94) matches field-for-field: server_ip, server_port, models, wire_apis, request_paths, call_count, error_count, stream_count, total_input_tokens, total_output_tokens, ttft_avg_ms, ttft_p95_ms, e2e_avg_ms, e2e_p95_ms, first_seen_ms, last_seen_ms, app, server_header all align.

  • Schema mirror: AgentKindSummary Rust (query.rs:153-160) ↔ TS (api.ts:133-141) matches: agent_kind, turn_count, total_input_tokens, total_output_tokens, avg_duration_ms, last_seen_ms.

  • Schema mirror: AgentActivityPoint Rust (query.rs:173-178) ↔ TS (api.ts:149-153) matches: timestamp_ms, agent_kind, turn_count.

  • Route registration: /api/agent-turns/summary, /api/agent-turns/activity, and /api/agent-turns/{id}/proxy-view are all registered in lib.rs:139-157.

  • queryKey correctness: useAgentSummary, useAgentActivity, useServicesTopology all include {start, end} in their queryKeys (use-agent-overview.ts:10,21; use-services-topology.ts:11).

  • Classifier removal: The window-width-sensitive "uvicorn + ≥3 models → litellm" rule is removed with a clear comment at apps.rs:192-198 explaining the misclassification risk at 7d windows.

Suggestions

  • metrics.rs:811 — The 24h sample window clip (SAMPLE_WINDOW_US = 24 * 60 * 60 * 1_000_000) is hardcoded. If app classification ever becomes time-sensitive (e.g., an endpoint that switches serving software), this would need to expand. Currently fine — the comment at 803-804 notes app classification doesn't change over wider windows.

  • metrics.rs:1414-1416 — The is_proxy_app helper matches only litellm, haproxy, nginx. If another proxy type emerges (e.g., envoy), it should be added here to prevent the self-loop attribution issue described at 1420-1426.

Questions

  • Why does the sidebar relabel "Traffic" → "Usage" without changing the route? The commit message says "route unchanged" for bookmark compatibility, which makes sense — but the label "Usage" for /traffic might confuse users who bookmark the old label. Not a merge blocker, just a UX consistency note.

🤖 Reviewed by viviworkflow run

…and-flash

# Conflicts:
#	console/src/hooks/use-llm-calls.ts
#	console/src/hooks/use-url-sync.ts
#	console/src/pages/agent-turns.tsx
#	console/src/pages/llm-calls.tsx
#	console/src/pages/overview.tsx
#	server/ts-api/src/routes/llm_calls.rs
Vader Yang and others added 2 commits May 21, 2026 19:18
…and-flash

# Conflicts:
#	console/src/components/turn-detail/call-card.tsx
#	console/src/pages/agent-turn-detail-panel.tsx
Same fix already on main from PR#22's path — pair_sweeper.rs in
this branch was still missing the method after the latest
main merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>