feat(finance): examples B/C/D + real-PDF + chat overlay + design-kit alignment + ModelCapabilityBadge#206

Merged
HomenShum merged 8 commits into main from claude/heuristic-sammet-ed0e9b
Apr 28, 2026

Conversation

@HomenShum
Owner

Summary

Closes the follow-ups from PR #204 (items 1-5) plus 2 corrections from review (items 6-7):

  1. Vercel deploy hook race fix — 60s wait + Tier-A live-bundle verify poll
  2. Edge cache stickiness — vercel.json sets HTML to no-cache and assets to max-age=31536000, immutable
  3. Inline chat experience: FinancialOperatorOverlay global drawer (URL-param-driven, no FastAgentPanel surgery)
  4. Real PDF reader — Claude PDF input + structured extraction (runRealCostOfDebtFromPdf)
  5. Examples B/C/D — CRM cleanup, covenant compliance, variance analysis (3 new orchestrators)
  6. Design-kit alignment — refactored cards onto .nb-panel / .type-card-title / .type-label with var(--accent-primary); no new design tokens introduced
  7. ModelCapabilityBadge — text/image/pdf/audio/video/web/code/tools with per-icon tooltips, surfaces what the active model can do (OpenRouter / pi-ai / LibreChat pattern)

Verification (per .claude/rules/live_dom_verification.md)

  • npx tsc --noEmit: 0 errors
  • npx vite build: clean (7.71s last run)
  • npx vitest run convex/domains/financialOperator/__tests__/: 19/19
  • npx convex dev --once --typecheck=enable: clean
  • ✅ Browser DOM check: 4 workflows trigger; B/C/D each emit 8-10 typed cards; ModelCapabilityBadge shows 4 supported + 4 unsupported for claude-opus-4-7 with tooltips
  • ⏳ Tier A (post-deploy live curl + grep): will run automatically via the new verification poll in vercel-deploy-hook-backup.yml

Test plan

  • npx tsc --noEmit clean
  • npx vitest run convex/domains/financialOperator/__tests__/ clean
  • npx vite build clean
  • CI: Typecheck / Runtime smoke / Build / Tier B Playwright e2e all green
  • Post-deploy: Tier-A verification poll asserts live bundle hash rotates
  • Spot-check /finance-demo after deploy for kit-aligned cards + model badge

What changed at a glance

14 files changed, 2170 insertions(+),  48 deletions(-)   first commit
12 files changed,  417 insertions(+), 173 deletions(-)   refactor

New convex domain: convex/domains/financialOperator/

  • orchestratorExamples.ts — runCrmCleanupDemo / runCovenantComplianceDemo / runVarianceAnalysisDemo
  • realExtractors.ts — runRealCostOfDebtFromPdf (Claude PDF input + structured output)
  • fixtures/{crm,covenant,variance}Fixture.ts — pinned demo data

New frontend feature: src/features/financialOperator/

  • components/ModelCapabilityBadge.tsx — 8-modality capability grid + curated registry
  • components/FinancialOperatorOverlay.tsx — global URL-param-driven drawer
  • views/FinancialOperatorDemo.tsx — 4-workflow picker (kit-aligned)
  • All cards refactored to use .nb-panel, .type-card-title, .type-label, var(--accent-primary)

Infra:

  • .github/workflows/vercel-deploy-hook-backup.yml — 60s wait + Tier-A poll
  • vercel.json — HTML no-cache + assets immutable cache-control

Docs:

  • docs/architecture/FINANCIAL_OPERATOR_DESIGN_ALIGNMENT.md — surface-by-surface kit alignment for web/mobile/workspace/CLI

🤖 Generated with Claude Code

HShuM and others added 3 commits April 28, 2026 00:41
Build the perception->extraction->validation->sandbox compute->verification
chat experience the spec calls for. Each unit of work renders as a typed
card so users see observable work, not hidden reasoning.

Backend (convex/domains/financialOperator/):
- types.ts: 9 step kinds (run_brief, tool_call, extraction, validation,
  calculation, evidence, artifact, approval_request, result) x 7 statuses.
- sandbox.ts: deterministic JS compute (ETR, after-tax cost of debt,
  leverage, variance, compliance). Throws on NaN/divide-by-zero.
- validators.ts: schema/unit/range/confidence checks. HONEST_SCORES
  counts what was actually checked.
- extractors.ts + attFixture.ts: pinned AT&T 10-K fixture; real-PDF-shape
  interface for swap-in.
- runOps.ts: createRun, appendStep, updateStepStatus, getRun, listSteps.
  BOUND at 200 steps/run.
- orchestrator.ts: runAttCostOfDebtDemo + recordApprovalDecision actions.
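
A minimal sketch of the fail-loud, deterministic compute that the sandbox.ts bullet describes. The function names, signatures, and input figures here are illustrative assumptions, not the contents of sandbox.ts.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the real sandbox.ts API.

/** Throws unless `value` is finite, so NaN and Infinity never reach a card. */
function assertFinite(value: number, label: string): number {
  if (!Number.isFinite(value)) {
    throw new Error(`${label} is not a finite number: ${value}`);
  }
  return value;
}

/** Effective tax rate = income tax expense / pre-tax income. */
function computeEffectiveTaxRate(taxExpense: number, preTaxIncome: number): number {
  assertFinite(taxExpense, "taxExpense");
  assertFinite(preTaxIncome, "preTaxIncome");
  if (preTaxIncome === 0) {
    throw new Error("preTaxIncome is zero: ETR is undefined");
  }
  return assertFinite(taxExpense / preTaxIncome, "effectiveTaxRate");
}

/** After-tax cost of debt = pre-tax cost of debt * (1 - ETR). */
function computeAfterTaxCostOfDebt(preTaxCostOfDebt: number, etr: number): number {
  assertFinite(preTaxCostOfDebt, "preTaxCostOfDebt");
  assertFinite(etr, "etr");
  return assertFinite(preTaxCostOfDebt * (1 - etr), "afterTaxCostOfDebt");
}

// Illustrative inputs only (not the AT&T fixture values).
const sampleEtr = computeEffectiveTaxRate(20, 100);
const sampleAfterTax = computeAfterTaxCostOfDebt(0.05, sampleEtr);
console.log(sampleEtr.toFixed(4), sampleAfterTax.toFixed(4));
```

Because the functions are pure and take no clock or random input, replaying them with the same arguments yields the same result, which is the property the 1000-replay determinism test relies on.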

Schema (convex/schema.ts):
- New financialOperatorRuns + financialOperatorSteps tables (additive).
- Fixed pre-existing data drift in productEventWorkspaces by adding
  activeEventSessionId as optional.

Frontend (src/features/financialOperator/):
- 9 typed card components + StepShell common chrome + StepStatusBadge.
- StepCard switch dispatcher.
- FinancialOperatorTimeline live-streaming parent (Convex useQuery).
- FinancialOperatorDemo standalone view at /finance-demo.
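
A rough sketch of how a switch dispatcher over the nine step kinds can be kept exhaustive at compile time. It returns a component name rather than JSX so it runs without React. `RunBriefCard`, `ExtractionCard`, and `ValidationCard` are assumed names; the other card names appear in this PR's file list.

```typescript
// The nine step kinds listed for types.ts.
type StepKind =
  | "run_brief" | "tool_call" | "extraction" | "validation" | "calculation"
  | "evidence" | "artifact" | "approval_request" | "result";

// Hypothetical dispatcher: the real StepCard renders components, not strings.
function cardNameFor(kind: StepKind): string {
  switch (kind) {
    case "run_brief": return "RunBriefCard";      // assumed name
    case "tool_call": return "ToolCallCard";
    case "extraction": return "ExtractionCard";   // assumed name
    case "validation": return "ValidationCard";   // assumed name
    case "calculation": return "CalculationCard";
    case "evidence": return "EvidenceCard";
    case "artifact": return "ArtifactCard";
    case "approval_request": return "ApprovalCard";
    case "result": return "ResultCard";
    default: {
      // Adding a tenth kind without a case makes this assignment a type error.
      const unreachable: never = kind;
      throw new Error(`Unhandled step kind: ${String(unreachable)}`);
    }
  }
}

console.log(cardNameFor("approval_request")); // "ApprovalCard"
```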

Routing:
- viewRegistry.ts: added financial-operator view at /finance-demo with
  aliases /financial-operator and /finops.
- QuickCommandChips: added optional `navigate` field on chips for
  workspace handoff; "AT&T cost of debt" chip routes to /finance-demo.

Tests (19/19 pass):
- sandbox.scenario.test.ts: 13 tests covering happy path, 1000-replay
  determinism, NaN/divide-by-zero/out-of-range sad paths, compliance
  gate, signed variance formatting.
- validators.scenario.test.ts: 6 tests covering missing required, wrong
  unit, out-of-range, low confidence, scale to 100 fields.
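
One way the "count only what was actually checked" idea could look, as a sketch. The field and rule shapes, the function name, and the sample field names are assumptions; the real validators.ts types are not shown in this PR.

```typescript
// Hypothetical shapes, not the real validators.ts types.
interface ExtractedField {
  name: string;
  value: number | null;
  unit: string;
  confidence: number; // 0..1
}

interface FieldRule {
  name: string;
  unit: string;
  min: number;
  max: number;
  minConfidence: number;
}

interface ValidationReport {
  checksRun: number; // checks that really executed, not checks that could have
  issues: string[];
}

function validateExtractionSketch(fields: ExtractedField[], rules: FieldRule[]): ValidationReport {
  const issues: string[] = [];
  let checksRun = 0;
  for (const rule of rules) {
    const field = fields.find((f) => f.name === rule.name);
    const value = field ? field.value : null;
    checksRun += 1; // presence check
    if (!field || value === null) {
      issues.push(`${rule.name}: missing required field`);
      continue; // unit/range/confidence never ran, so they are not counted
    }
    checksRun += 1; // unit check
    if (field.unit !== rule.unit) issues.push(`${rule.name}: expected unit ${rule.unit}, got ${field.unit}`);
    checksRun += 1; // range check
    if (value < rule.min || value > rule.max) issues.push(`${rule.name}: ${value} is out of range`);
    checksRun += 1; // confidence check
    if (field.confidence < rule.minConfidence) issues.push(`${rule.name}: low confidence ${field.confidence}`);
  }
  return { checksRun, issues };
}
```

With one present field and one missing field, this reports 5 checks (4 + 1) rather than 8, so the score does not claim work that was skipped.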

Verification:
- npx tsc --noEmit: clean
- npx vitest run: 19/19 pass
- npx vite build: clean
- Live browser test: 10 cards stream end-to-end, approve flow produces
  Result card with ETR=16.86%, after-tax cost of debt=4.51%

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eploy hardening

Closes the follow-ups from PR #204:
1. Vercel deploy hook race fix (60s wait + Tier-A verify poll)
2. Edge cache stickiness (no-cache headers on HTML, immutable on assets)
3. Inline chat experience (FinancialOperatorOverlay, no FastAgentPanel surgery)
4. Real PDF reader (Claude PDF input + structured extraction)
5. Examples B/C/D (CRM cleanup, covenant compliance, variance analysis)

## Examples B/C/D — full operator-console workflows

- Example B (financial_data_cleanup): inspect → profile spreadsheet →
  extract entities → dedup → enrich → validate CRM schema → export CSV.
  Sandbox compute: dedup ratio (387 -> 312, 19.4%).
- Example C (covenant_compliance): locate covenant → extract terms +
  inputs → validate → sandbox leverage + compliance gate → memo.
  Sandbox: computeLeverageRatio + checkCompliance (3.55x vs 4.25x cap,
  compliant).
- Example D (variance_analysis): inspect → align CoA → per-line variance
  in sandbox → driver search → CFO memo.
  Sandbox: computeVariance for 6 P&L lines, signed-percent formatting.

All three reuse the same backbone: runOps + sandbox + validators + typed
step kinds. Each emits 8-10 cards, picker on /finance-demo lets the user
choose which workflow to run.
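
The sandbox figures quoted for Examples B/C/D can be reproduced with a few pure functions. This is a sketch under assumptions: the signatures are hypothetical (the real `checkCompliance` and `computeVariance` are not shown), and treating "at the cap" as compliant is a guess. Only the numbers in the usage lines come from the text above.

```typescript
// Hypothetical signatures; only the sample figures come from the commit message.

/** Fraction of rows removed by dedup, e.g. 387 -> 312. */
function computeDedupRatio(before: number, after: number): number {
  if (!Number.isFinite(before) || !Number.isFinite(after) || before <= 0) {
    throw new Error(`invalid row counts: ${before} -> ${after}`);
  }
  return (before - after) / before;
}

/** Compliant when leverage does not exceed the covenant cap (inclusive cap is an assumption). */
function checkComplianceSketch(leverage: number, cap: number): { compliant: boolean; headroom: number } {
  if (!Number.isFinite(leverage) || !Number.isFinite(cap)) {
    throw new Error("leverage and cap must be finite");
  }
  return { compliant: leverage <= cap, headroom: cap - leverage };
}

/** Signed percent with one decimal place, e.g. "+12.5%" or "-3.0%". */
function formatSignedPercent(ratio: number): string {
  const pct = ratio * 100;
  const sign = pct > 0 ? "+" : pct < 0 ? "-" : "";
  return `${sign}${Math.abs(pct).toFixed(1)}%`;
}

console.log((computeDedupRatio(387, 312) * 100).toFixed(1)); // "19.4"
console.log(checkComplianceSketch(3.55, 4.25).compliant);    // true
console.log(formatSignedPercent(0.125));                     // "+12.5%"
```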

## Real PDF reader (production path)

`runRealCostOfDebtFromPdf` action:
- Takes a `_storage` PDF id (any uploader can produce one)
- Sends PDF directly to Claude as a document input (no separate parse step)
- Constrains output to a strict JSON schema with sourceRef + confidence
  per field; instructs Claude to return null + add to unresolvedFields
  rather than fabricate
- Validates extraction with the same `validateExtraction()` used by the
  fixture path; computes ETR + after-tax cost of debt deterministically
- Bounded reads (MAX_PDF_BYTES = 20MB), HONEST_STATUS error path that
  surfaces parse failures verbatim, approval gate when required fields
  unresolved.
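
A sketch of the two guards described above: the bounded read and the approval gate for unresolved required fields. The type shapes, helper names, and the reading of 20MB as 20 * 1024 * 1024 bytes are assumptions; the schema actually sent to Claude is not shown in this PR.

```typescript
// Hypothetical shapes and helpers, not the real realExtractors.ts code.
const MAX_PDF_BYTES = 20 * 1024 * 1024; // 20MB as stated above; binary megabytes is an assumption

interface ExtractedValue {
  value: number | null;     // null means "not found", never a guess
  sourceRef: string | null; // where in the document the value was read
  confidence: number;       // 0..1
}

interface PdfExtraction {
  fields: Record<string, ExtractedValue>;
  unresolvedFields: string[];
}

/** Rejects oversized PDFs before any bytes are sent to the model. */
function assertPdfSize(byteLength: number): void {
  if (byteLength > MAX_PDF_BYTES) {
    throw new Error(`PDF is ${byteLength} bytes, limit is ${MAX_PDF_BYTES}`);
  }
}

/** True when any required field is absent, null, or listed as unresolved. */
function needsApproval(extraction: PdfExtraction, requiredFields: string[]): boolean {
  return requiredFields.some((name) => {
    const field = extraction.fields[name];
    return !field || field.value === null || extraction.unresolvedFields.indexOf(name) !== -1;
  });
}
```

Routing a null through the approval gate, instead of substituting a default, keeps the deterministic compute from ever running on a value the model did not actually find.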

## Inline chat experience (FinancialOperatorOverlay)

Surface-agnostic global drawer. Listens for `?finRun=<runId>` URL param,
mounts `FinancialOperatorTimeline` as a right-side drawer alongside any
chat surface. Collapsible to a corner pill. Mounted in App.tsx so it
works on /, /?surface=ask, /?surface=workspace, etc.

Why a global overlay vs editing FastAgentPanel directly:
- FastAgentPanel.tsx is 3700+ lines; surgical message-bubble edits have
  high blast radius
- URL-param-driven means any caller (chip, button, MCP tool) can
  activate the overlay via `setActiveFinancialRun()` without knowing
  the chat panel internals
- /finance-demo "View in chat" button deep-links to
  `/?surface=ask&finRun=<id>` — overlay mounts beside the chat
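
A minimal sketch of URL-param-driven activation using the standard `URL` and `URLSearchParams` APIs. The helper names are hypothetical; the PR's `setActiveFinancialRun()` implementation is not shown, and it presumably also updates browser history, which this sketch leaves out.

```typescript
// Hypothetical helpers; only the `finRun` parameter name comes from the text above.
const FIN_RUN_PARAM = "finRun";

/** Reads the active run id from a URL, or null when no run is active. */
function getActiveFinancialRun(href: string): string | null {
  return new URL(href).searchParams.get(FIN_RUN_PARAM);
}

/** Returns a new URL string with the run id set, or cleared when runId is null. */
function withActiveFinancialRun(href: string, runId: string | null): string {
  const url = new URL(href);
  if (runId === null) {
    url.searchParams.delete(FIN_RUN_PARAM);
  } else {
    url.searchParams.set(FIN_RUN_PARAM, runId);
  }
  return url.toString();
}

// example.com and run_123 are placeholders.
console.log(withActiveFinancialRun("https://example.com/?surface=ask", "run_123"));
// "https://example.com/?surface=ask&finRun=run_123"
```

Since both helpers only transform strings, a chip, a button, or an MCP tool can build the same deep link without importing anything from the chat panel.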

## Deploy hardening

vercel-deploy-hook-backup.yml:
- 60s wait before firing the deploy hook on push events. Closes the race
  that bit PR #204: the GitHub→Vercel git mirror takes a few seconds to
  catch up after a merge, and deploy hooks pass no commit SHA, so
  immediate-fire deploys can clone the previous HEAD.
- Tier-A verification poll: after the hook fires, watch the live URL for
  up to 7 minutes for the bundle hash to rotate. Non-blocking warning
  if it doesn't (deploy still in progress, or edge cache stuck).
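
The workflow itself does this with curl and grep in GitHub Actions; the same polling logic is sketched here in TypeScript for illustration. The asset-name pattern (`/assets/index-<hash>.js`) and all function names are assumptions about the build output, not taken from the workflow file.

```typescript
// Hypothetical sketch of the Tier-A poll; pattern and names are assumptions.

/** Pulls the content hash out of the first /assets/index-<hash>.js reference. */
function extractBundleHash(html: string): string | null {
  const match = /\/assets\/index-([A-Za-z0-9_-]+)\.js/.exec(html);
  return match ? match[1] : null;
}

/** Polls until the live bundle hash differs from the previous one or the timeout elapses. */
async function waitForBundleRotation(
  fetchHtml: () => Promise<string>,
  previousHash: string,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<{ rotated: boolean; hash: string | null }> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const hash = extractBundleHash(await fetchHtml());
    if (hash !== null && hash !== previousHash) return { rotated: true, hash };
    // Non-blocking outcome: the caller logs a warning instead of failing the job.
    if (Date.now() >= deadline) return { rotated: false, hash };
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

A caller would pass something like `() => fetch(liveUrl).then((r) => r.text())` as `fetchHtml` and a 7-minute `timeoutMs`, matching the window described above.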

vercel.json headers:
- /assets/* → `public, max-age=31536000, immutable` (content-hashed,
  safe for permanent edge cache)
- /(everything else) → `no-cache, no-store, must-revalidate` plus
  CDN-Cache-Control / Vercel-CDN-Cache-Control no-store. Prevents the
  stale-HTML landmine that took 15 minutes to clear post-deploy on PR
  #204. The bundle hashes inside index.html change every deploy, so
  stale HTML points at JS files the new deploy may have evicted.
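
The two header rules amount to a single decision on the request path. The helper below is a hypothetical illustration of that decision, not code from the PR; in the repo the rules live in vercel.json.

```typescript
// Hypothetical helper mirroring the two vercel.json rules described above.
function cacheControlFor(pathname: string): string {
  return pathname.startsWith("/assets/")
    ? "public, max-age=31536000, immutable" // content-hashed files never change in place
    : "no-cache, no-store, must-revalidate"; // HTML must always reflect the latest deploy
}

console.log(cacheControlFor("/assets/index-abc123.js")); // "public, max-age=31536000, immutable"
console.log(cacheControlFor("/index.html"));             // "no-cache, no-store, must-revalidate"
```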

## Design alignment doc

New `docs/architecture/FINANCIAL_OPERATOR_DESIGN_ALIGNMENT.md` walks
through how the cards build on existing UI kit per surface (web,
mobile, workspace, CLI/MCP). Same step-kind enum, same status enum,
same sandbox guarantee everywhere. Workspace + CLI/MCP exposure is
described as concrete next-PR plans.

## Verification

- npx convex dev --once --typecheck=enable: clean (3.17m typecheck)
- npx tsc --noEmit: 0 errors
- npx vitest run convex/domains/financialOperator/__tests__/: 19/19 pass
- npx vite build: clean (42.66s, 211 entries precached)
- Live browser: 4 demo workflows trigger, each renders 8-10 typed cards;
  "View in chat" deep-links to /?surface=ask with overlay mounted (8
  cards in the drawer next to the chat surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apabilityBadge

Two corrections to the prior PR:

## 1. Design-kit alignment (we build on top of the kit, not next to it)

Replaced ad-hoc styling with the kit's canonical utilities (per
docs/architecture/FINANCIAL_OPERATOR_DESIGN_ALIGNMENT.md and the
NodeBench AI Design System reference):

  - StepShell: now uses `.nb-panel` (12px radius + hairline border +
    panel bg, kit canonical) instead of a hand-rolled `.nb-card`-styled
    box. Left accent stripe via `::before` keeps cards distinguishable
    without inventing new chrome.
  - Type: every kicker is `type-label !tracking-[0.18em]` (kit's
    canonical 11px uppercase 0.18em). Titles are `type-card-title`.
    Body is `text-[13px] leading-[1.5]`. Mono numerics use `font-mono`.
  - Color: every raw `#d97757` literal across 7 card files swapped to
    `var(--accent-primary)` (Tailwind arbitrary-value with CSS var).
    Status badges now use `.badge-success/-warn/-fail/-accent` tone
    families with kit-canonical semantic colors (--success, --warning,
    --destructive) — same tones the kit's component-badges.html ships.
  - Demo view: page header uses `type-page-title` + `type-label` +
    `type-body`. Workflow tiles use `.nb-panel` chrome with the kit's
    44px icon container (10px radius, terracotta-12% bg, terracotta
    fg, 20px Lucide stroke icon — exactly the kit's component-panel.html
    pattern).
  - Overlay: drawer chrome uses `--bg-primary`, `--border-color`, and
    `--shadow-xl` instead of inline `#151413` / `border-edge`. Header
    icon buttons are 16px Lucide (kit pill-icon size), rounded-full to
    match the kit's icon-button conventions.

No new design tokens were introduced. Every utility class on these
surfaces already existed in src/index.css before this work shipped.

## 2. ModelCapabilityBadge — surfacing what the active model can do

Pattern lifted from open-source projects that route through unified
LLM providers (OpenRouter, pi-ai, LibreChat, OpenWebUI):
  - OpenRouter exposes `architecture.input_modalities` /
    `output_modalities` per model
  - LibreChat shows per-model capability chips next to the picker
  - pi-ai's `getModel().inputModalities` is the same shape

NodeBench surfaces them as a compact icon-only row:
  - 8 modalities: text, image, pdf, audio, video, web_search, code_exec, tools
  - Each is a 24px round Lucide-icon pill (14px stroke icon — kit's
    pill icon size)
  - Supported: terracotta accent (border + bg + fg via accent CSS vars)
  - Unsupported: 50% opacity + line-through (visible but visually
    receded — agent users see what's missing without it competing)
  - Native title tooltip + role=listitem aria-label per icon

Hand-curated capability registry (`MODEL_CAPABILITIES`) covers the
models NodeBench routes today: Claude Opus/Sonnet/Haiku, GPT-5/4.1/4o,
o1/o3, Gemini 3 Pro/Flash + 2.5 Flash, Grok 4, Kimi k2.6, DeepSeek
v3.5, GLM 4.6V. Unknown models fall back to text-only with a
`(unverified)` tag — HONEST_SCORES, never claim capabilities the model
can't deliver. Long-term path: a Convex action that hits OpenRouter's
/v1/models and caches the modality matrix daily.
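
A sketch of the registry lookup with the text-only fallback. Only the `claude-opus-4-7` row is taken from this PR's verification notes; the resolver name and return shape are assumptions, and the real `MODEL_CAPABILITIES` has many more entries.

```typescript
type Modality =
  | "text" | "image" | "pdf" | "audio" | "video" | "web_search" | "code_exec" | "tools";

const ALL_MODALITIES: readonly Modality[] = [
  "text", "image", "pdf", "audio", "video", "web_search", "code_exec", "tools",
];

// Single illustrative row; the real registry covers every routed model.
const MODEL_CAPABILITIES: Record<string, readonly Modality[]> = {
  "claude-opus-4-7": ["text", "image", "pdf", "tools"],
};

interface ResolvedCapabilities {
  supported: readonly Modality[];
  unsupported: readonly Modality[];
  verified: boolean; // false renders the "(unverified)" tag
}

// Hypothetical resolver: unknown models get text-only and are flagged unverified.
function resolveCapabilities(model: string): ResolvedCapabilities {
  const known = Object.prototype.hasOwnProperty.call(MODEL_CAPABILITIES, model)
    ? MODEL_CAPABILITIES[model]
    : undefined;
  const supported: readonly Modality[] = known !== undefined ? known : ["text"];
  return {
    supported,
    unsupported: ALL_MODALITIES.filter((m) => supported.indexOf(m) === -1),
    verified: known !== undefined,
  };
}

console.log(resolveCapabilities("claude-opus-4-7").unsupported.join(","));
// "audio,video,web_search,code_exec"
```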

Surfaced in two places this PR:
  - /finance-demo header (active orchestrator model)
  - FinancialOperatorOverlay header (visible alongside chat surface)

Future PRs can drop it next to FastAgentPanel's model selector and any
other model-aware surface — it's a self-contained component with one
prop (`model: string`).

## Verification

- npx tsc --noEmit: 0 errors
- npx vite build: clean (7.71s)
- Live browser: 4 demo workflows still render 8-10 typed cards each;
  StepShell now uses .nb-panel + type-label + type-card-title;
  ModelCapabilityBadge shows 4 supported (text/image/pdf/tools) + 4
  unsupported (audio/video/web_search/code) for claude-opus-4-7 with
  per-icon tooltips and aria-labels

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| nodebench-ai | Ready | Preview, Comment | Apr 28, 2026 5:18pm |


…surface

User feedback: "it should actually be built into the existing chat page
or chat agent sidebar, add all the new components to the chat, wired
live and used under a toggle called workspace mode."

## What changed

A new `WorkspaceModeToggle` floats on the chat surface (top-right on
desktop, bottom-right above the mobile bottom nav). Clicking it sets
`?ws=1` in the URL; clicking again clears it.

When `?ws=1` is active, the new `WorkspaceModePane`:
  - Mounts inside the chat content area (fixed, z-55, padded around
    the bottom nav + agent panel so the chat composer below stays live)
  - Renders the 4-workflow picker (AT&T 10-K · CRM cleanup · Covenant
    compliance · Variance analysis) when no run is active
  - Streams the FinancialOperatorTimeline live when a run is active
  - Surfaces ModelCapabilityBadge in its header so the user sees what
    the active model can/can't do
  - Defers to the existing right-side drawer (`FinancialOperatorOverlay`)
    when ws=0 — both modes coexist for users who want a side dock

URL state drives everything (`?ws=1`, `?finRun=<id>`) so deep links work
and the chat composer below stays interactive.

## Why not edit FastAgentPanel.tsx

FastAgentPanel.tsx is 3700+ lines. The toggle + pane sit on top of it
via fixed positioning; no surgery on its render tree. Surface coupling
is via URL params only — the same pattern any future caller (chip,
button, MCP tool) can use to drive workspace mode.

## Visibility rule

Toggle hidden on:
  - /finance-demo (the page IS workspace already)
  - /cli, /pricing, /changelog, /legal, /about, /api-docs (info pages)
  - /share/*, /report/*, /embed/* (public/embedded views)

Toggle shown on the root chat surface and ?surface=ask|home variants.
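
The visibility rule and the `?ws=1` toggle can be sketched as two pure functions. The route lists are copied from the rule above; the helper names and exact-match handling of the info pages are assumptions.

```typescript
// Hypothetical helpers; route lists mirror the visibility rule above.
const TOGGLE_HIDDEN_EXACT = [
  "/finance-demo", "/cli", "/pricing", "/changelog", "/legal", "/about", "/api-docs",
];
const TOGGLE_HIDDEN_PREFIXES = ["/share/", "/report/", "/embed/"];

/** False on the demo page, info pages, and public/embedded views. */
function shouldShowWorkspaceToggle(pathname: string): boolean {
  if (TOGGLE_HIDDEN_EXACT.indexOf(pathname) !== -1) return false;
  return !TOGGLE_HIDDEN_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}

/** Sets ?ws=1 when absent and clears it when present, leaving other params alone. */
function toggleWorkspaceMode(href: string): string {
  const url = new URL(href);
  if (url.searchParams.get("ws") === "1") {
    url.searchParams.delete("ws");
  } else {
    url.searchParams.set("ws", "1");
  }
  return url.toString();
}

// example.com is a placeholder host.
console.log(toggleWorkspaceMode("https://example.com/?surface=home"));
// "https://example.com/?surface=home&ws=1"
```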

## Verification

- npx tsc --noEmit: 0 errors
- npx vite build: clean (210 PWA entries)
- Live browser:
  - Toggle visible on /?surface=home with aria-label "Enter workspace mode"
  - Click → URL gets ?ws=1, pane mounts (role=region "Workspace mode")
  - Pane shows 4 demo tiles + model capability badge + Exit button
  - Click "Covenant compliance" → 9 typed cards stream inline
    (Plan → Tool → Extraction×2 → Validation → Calculation → Evidence
    → Artifact → Result) with the run id in the URL
  - "Back to picker" returns to the 4-tile state
  - "Close" / "Exit workspace" returns to plain chat

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HomenShum enabled auto-merge (squash) April 28, 2026 16:52
HShuM added 2 commits April 28, 2026 09:52
…et-ed0e9b

# Conflicts:
#	convex/_generated/api.d.ts
#	convex/domains/financialOperator/index.ts
#	src/features/financialOperator/components/ApprovalCard.tsx
#	src/features/financialOperator/components/ArtifactCard.tsx
#	src/features/financialOperator/components/CalculationCard.tsx
#	src/features/financialOperator/components/EvidenceCard.tsx
#	src/features/financialOperator/components/ResultCard.tsx
#	src/features/financialOperator/components/StepShell.tsx
#	src/features/financialOperator/components/StepStatusBadge.tsx
#	src/features/financialOperator/components/ToolCallCard.tsx
#	src/features/financialOperator/index.ts
#	src/features/financialOperator/views/FinancialOperatorDemo.tsx
@github-actions

github-actions Bot commented Apr 28, 2026

✅ Dogfood Visual QA Gate: PASSED

| Check | Status |
|---|---|
| Screenshots | 23 captured (pass) |
| Walkthrough | 9 chapters (pass) |
| Key Frames | 9 extracted (pass) |
| Scribe Steps | 8 how-to steps (pass) |
| Build | success |
Artifacts

Download the dogfood-evidence-4b7d1a4 artifact from the Actions tab for full screenshots, frames, and walkthrough video.


Generated by Dogfood QA Gate

… through

User QA caught a broken UI: workspace mode rendered with the home
surface visible behind it (greeting, sidebar, watchlist, search input
all stacking with the operator-console pane).

## Root cause

Tailwind's `/95` opacity modifier does NOT work on CSS-var arbitrary
values without the `color:` prefix. The class
`bg-[var(--bg-app)]/95` resolved to `rgba(0,0,0,0)` — fully transparent.

A second issue compounded it: `--bg-app` is in the kit reference
(colors_and_type.css) but is NOT defined in the live repo's
src/index.css. The repo has `--bg-primary` / `--bg-secondary` only.
So even the unmodified `var(--bg-app)` would have resolved to nothing.

## Fix

- Use `--bg-primary` (defined: #FFFFFF light, dark variant in dark
  mode) as the pane base color, set via inline `style` to bypass any
  Tailwind quirks with CSS-var opacity arbitrary values.
- Bump the pane to `z-[80]` (above modals at z-50, toasts at z-60). The
  toggle is bumped to `z-[85]` so users can dismiss mid-run without
  hunting inside the pane.
- Add `isolate` for a clean stacking context — prevents any future
  z-leak from the home surface beneath.
- Inline-comment the var-opacity gotcha so the next developer doesn't
  re-introduce it.
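
Roughly what the fixed pane props could look like, as a sketch. Only `z-[80]`, `isolate`, and `--bg-primary` come from the fix list; the constant names and the rest of the class list are assumptions.

```typescript
// Hypothetical constants; the real WorkspaceModePane carries more classes than this.
const workspacePaneClassName = "fixed isolate z-[80]";

// The base color is set inline rather than through a Tailwind arbitrary-value
// class with an opacity modifier, the combination that rendered transparent.
// --bg-primary is used because it is defined in src/index.css; --bg-app is not.
const workspacePaneStyle: { backgroundColor: string } = {
  backgroundColor: "var(--bg-primary)",
};

console.log(workspacePaneClassName, workspacePaneStyle.backgroundColor);
```

In a component these would be passed as `className={workspacePaneClassName}` and `style={workspacePaneStyle}`.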

## Verification (per dogfood_verification.md)

- npx tsc --noEmit: 0 errors
- Live browser screenshot: clean opaque pane, header readable, 4 demo
  tiles in 2x2 grid, model capability badge with 4 supported + 4
  unsupported icons, no home-surface bleed-through
- Run flow: clicked AT&T 10-K → 9 typed cards stream inline (Plan →
  Tool×2 → Extraction → Validation → Calculation → Evidence →
  Artifact → Result), all using .nb-panel chrome with proper status
  badges and source-cited fields

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… scroll / composer)

User QA caught: workspace mode was overlaying the chat surface instead
of building on top of the existing chat layout. The kit's canonical
chat shell (ui_kits/nodebench-web/ChatThread.jsx) is:

  header (sticky top, entity icon + title + meta + actions)
  ↓
  scrollable thread (turns / operator console cards)
  ↓
  composer (pinned bottom: pins · field · model + caps · suggested chips)

The model selector + capability indicators belong IN the composer (per
the design board reference + the kit's Composer.jsx), not floating in
the header.

## What changed

WorkspaceModePane now renders as a 3-row CSS grid mirroring the kit:
  - Row 1 (header): kit's .nb-chat-header pattern — entity icon
    (terracotta squircle with sparkle), kicker, title, meta in mono
    font, Picker + Close actions
  - Row 2 (scroll): demo picker (no run) OR FinancialOperatorTimeline
    (active run) inside a max-w-3xl container
  - Row 3 (composer): new WorkspaceComposer component

WorkspaceComposer follows the kit's composer shape exactly:
  - Pin row: "EVENT Ship Demo Day ×" + "+ Add context" (matches design
    board reference)
  - Field row: paperclip + link + mic icons (15px stroke) | textarea
    "Ask, capture, paste, upload, or record…" | terracotta send button
  - Below field: MODEL claude-opus-4-7 + 8 capability icons (text /
    image / pdf / audio / video / web_search / code_exec / tools with
    supported vs muted variants and per-icon tooltips) | Memory-first ·
    0 paid calls in mono
  - Suggested chips: Run AT&T 10-K demo · Run CRM cleanup · Run
    covenant compliance · Run variance analysis

The composer is interactive: typing a prompt that matches a known
workflow regex starts that demo (e.g. "AT&T 10-K cost of debt" →
runAttCostOfDebtDemo). Send falls back to dispatching a custom
`nb:workspace:compose` event for any other panel listening (so future
FastAgentPanel integration can hook in without surgery).
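
A sketch of the composer's routing decision. The action names and the `nb:workspace:compose` event name come from this PR; the regex patterns, type names, and function name are assumptions, since the real patterns are not shown.

```typescript
type ComposeOutcome =
  | { kind: "run_demo"; action: string }
  | { kind: "dispatch_event"; eventName: "nb:workspace:compose"; prompt: string };

// Hypothetical patterns, checked in order; the real regexes are not shown in the PR.
const WORKFLOW_PATTERNS: ReadonlyArray<{ pattern: RegExp; action: string }> = [
  { pattern: /at&t|10-k|cost of debt/i, action: "runAttCostOfDebtDemo" },
  { pattern: /crm|dedup|cleanup/i, action: "runCrmCleanupDemo" },
  { pattern: /covenant|leverage/i, action: "runCovenantComplianceDemo" },
  { pattern: /variance/i, action: "runVarianceAnalysisDemo" },
];

/** Decides whether a prompt starts a known demo or falls through to the custom event. */
function routePrompt(prompt: string): ComposeOutcome {
  const text = prompt.trim();
  for (const { pattern, action } of WORKFLOW_PATTERNS) {
    if (pattern.test(text)) return { kind: "run_demo", action };
  }
  return { kind: "dispatch_event", eventName: "nb:workspace:compose", prompt: text };
}

console.log(routePrompt("AT&T 10-K cost of debt").kind); // "run_demo"
console.log(routePrompt("summarize my notes").kind);     // "dispatch_event"
```

In the browser, the fallback branch would be delivered with `window.dispatchEvent(new CustomEvent(outcome.eventName, { detail: { prompt: outcome.prompt } }))`, so any panel can subscribe without the composer knowing about it.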

## Verification

- npx tsc --noEmit: 0 errors
- npx vite build: clean (210 PWA entries)
- Live browser screenshot (kit-aligned at mobile width):
  - Empty state: header with WORKSPACE MODE / Pick a workflow / "4
    canonical workflows · math sandboxed · approval-gated" meta + Close
    button; scrollable middle with 4 demo tiles in 2x2; composer pinned
    bottom with all canonical pieces (pins, attach, textarea, send,
    model badge with capabilities, Memory-first hint, 4 suggested chips)
  - Active run: header switches to "Live operator-console run", scroll
    area renders the typed-card timeline (RUN header → Plan → 2x Tool →
    Extraction → Validation → Calculation → Evidence → Artifact),
    composer stays pinned and never overlaps content; capability badge
    sits inside the composer where the kit puts it

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HomenShum merged commit 31354b0 into main Apr 28, 2026
14 of 15 checks passed
