feat(finance): examples B/C/D + real-PDF + chat overlay + design-kit alignment + ModelCapabilityBadge by HomenShum · Pull Request #206 · HomenShum/nodebench-ai

HomenShum · 2026-04-28T16:47:51Z

Summary

Closes the 4 follow-ups from PR #204 plus 2 corrections from review:

Vercel deploy hook race fix — 60s wait + Tier-A live-bundle verify poll
Edge cache stickiness — vercel.json sets HTML to no-cache and assets to max-age=31536000, immutable
Inline chat experience — FinancialOperatorOverlay global drawer (URL-param-driven, no FastAgentPanel surgery)
Real PDF reader — Claude PDF input + structured extraction (runRealCostOfDebtFromPdf)
Examples B/C/D — CRM cleanup, covenant compliance, variance analysis (3 new orchestrators)
Design-kit alignment — refactored cards onto .nb-panel / .type-card-title / .type-label with var(--accent-primary); no new design tokens introduced
ModelCapabilityBadge — text/image/pdf/audio/video/web/code/tools with per-icon tooltips, surfaces what the active model can do (OpenRouter / pi-ai / LibreChat pattern)

Verification (per .claude/rules/live_dom_verification.md)

✅ npx tsc --noEmit: 0 errors
✅ npx vite build: clean (7.71s last run)
✅ npx vitest run convex/domains/financialOperator/__tests__/: 19/19
✅ npx convex dev --once --typecheck=enable: clean
✅ Browser DOM check: 4 workflows trigger; B/C/D each emit 8-10 typed cards; ModelCapabilityBadge shows 4 supported + 4 unsupported for claude-opus-4-7 with tooltips
⏳ Tier A (post-deploy live curl + grep): will run automatically via the new verification poll in vercel-deploy-hook-backup.yml
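
The Tier-A check amounts to polling the live HTML until the hashed bundle name changes. The following is a minimal sketch, not the workflow's actual script. It assumes index.html references the bundle as `/assets/index-<hash>.js` (common Vite naming, not confirmed by this PR). `extractBundleHash`, `waitForBundleRotation`, and the injected `fetchHtml` / `sleep` parameters are hypothetical names.

```typescript
// Hypothetical helper: pull the content hash out of a Vite-style bundle reference.
// Assumes the page references /assets/index-<hash>.js; adjust the pattern otherwise.
function extractBundleHash(html: string): string | null {
  const match = html.match(/\/assets\/index-([A-Za-z0-9_-]+)\.js/);
  return match?.[1] ?? null;
}

interface PollOptions {
  previousHash: string;
  fetchHtml: () => Promise<string>; // injected so the sketch runs without a network
  sleep: (ms: number) => Promise<void>; // injected so callers control the wait
  intervalMs?: number;
  timeoutMs?: number; // the commit message describes polling for up to 7 minutes
}

type PollResult =
  | { rotated: true; hash: string; attempts: number }
  | { rotated: false; attempts: number };

async function waitForBundleRotation(opts: PollOptions): Promise<PollResult> {
  const intervalMs = opts.intervalMs ?? 15_000; // assumed interval
  const timeoutMs = opts.timeoutMs ?? 7 * 60_000;
  const maxAttempts = Math.max(1, Math.floor(timeoutMs / intervalMs));
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const hash = extractBundleHash(await opts.fetchHtml());
    if (hash !== null && hash !== opts.previousHash) {
      return { rotated: true, hash, attempts: attempt };
    }
    if (attempt < maxAttempts) await opts.sleep(intervalMs);
  }
  // The PR treats this case as a non-blocking warning, so the caller should warn, not fail.
  return { rotated: false, attempts: maxAttempts };
}
```

Injecting `fetchHtml` and `sleep` keeps the loop testable; in CI they would wrap a real HTTP request and a timer.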

Test plan

npx tsc --noEmit clean
npx vitest run convex/domains/financialOperator/__tests__/ clean
npx vite build clean
CI: Typecheck / Runtime smoke / Build / Tier B Playwright e2e all green
Post-deploy: Tier-A verification poll asserts live bundle hash rotates
Spot-check /finance-demo after deploy for kit-aligned cards + model badge

What changed at a glance

14 files changed, 2170 insertions(+), 48 deletions(-)   first commit
12 files changed, 417 insertions(+), 173 deletions(-)   refactor

New convex domain: convex/domains/financialOperator/

orchestratorExamples.ts — runCrmCleanupDemo / runCovenantComplianceDemo / runVarianceAnalysisDemo
realExtractors.ts — runRealCostOfDebtFromPdf (Claude PDF input + structured output)
fixtures/{crm,covenant,variance}Fixture.ts — pinned demo data
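
The structured-extraction contract behind runRealCostOfDebtFromPdf (a sourceRef and confidence per field, null plus an unresolvedFields entry instead of a guess, and an approval gate when required fields are unresolved) might be shaped roughly like this. It is a sketch under assumptions: the type names, `missingRequiredFields`, and the example field names are hypothetical, and the values are illustrative rather than taken from any filing.

```typescript
interface ExtractedField {
  value: number | null; // null means "not found in the document", never a guess
  unit: string;
  sourceRef: string | null; // citation into the PDF, e.g. a page or section label
  confidence: number; // 0..1
}

interface ExtractionResult {
  fields: Record<string, ExtractedField | undefined>;
  unresolvedFields: string[];
}

// Returns the required field names that cannot be used for computation:
// absent from the result, extracted as null, or flagged as unresolved.
function missingRequiredFields(result: ExtractionResult, required: string[]): string[] {
  return required.filter((name) => {
    const field = result.fields[name];
    return (
      field === undefined ||
      field.value === null ||
      result.unresolvedFields.includes(name)
    );
  });
}

// Illustrative data only (hypothetical field names and values).
const example: ExtractionResult = {
  fields: {
    interestExpense: { value: 100, unit: "USD_millions", sourceRef: "p. 1", confidence: 0.9 },
    incomeTaxExpense: { value: null, unit: "USD_millions", sourceRef: null, confidence: 0 },
  },
  unresolvedFields: ["incomeTaxExpense"],
};

const missing = missingRequiredFields(example, ["interestExpense", "incomeTaxExpense"]);
const needsApproval = missing.length > 0; // gate the run instead of computing on gaps
```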

New frontend feature: src/features/financialOperator/

components/ModelCapabilityBadge.tsx — 8-modality capability grid + curated registry
components/FinancialOperatorOverlay.tsx — global URL-param-driven drawer
views/FinancialOperatorDemo.tsx — 4-workflow picker (kit-aligned)
All cards refactored to use .nb-panel, .type-card-title, .type-label, var(--accent-primary)
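
Because the overlay is driven by the `finRun` URL parameter, activating it reduces to reading and writing one query key. A minimal sketch follows, using only the standard `URL` API. The function names are hypothetical stand-ins; the PR names a `setActiveFinancialRun()` helper but does not show its implementation.

```typescript
const FIN_RUN_PARAM = "finRun";

// Read the active run id from a full URL, or null when the overlay should stay closed.
function getActiveFinancialRun(href: string): string | null {
  const id = new URL(href).searchParams.get(FIN_RUN_PARAM);
  return id !== null && id.length > 0 ? id : null;
}

// Return a new URL string with the run id set (or cleared when runId is null),
// leaving every other query parameter untouched.
function withActiveFinancialRun(href: string, runId: string | null): string {
  const url = new URL(href);
  if (runId === null) {
    url.searchParams.delete(FIN_RUN_PARAM);
  } else {
    url.searchParams.set(FIN_RUN_PARAM, runId);
  }
  return url.toString();
}
```

In a browser, a caller could pass the result to `history.pushState` so the drawer mounts without a full navigation; whether the PR does this is not stated.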

Infra:

.github/workflows/vercel-deploy-hook-backup.yml — 60s wait + Tier-A poll
vercel.json — HTML no-cache + assets immutable cache-control
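
The caching policy can be summarized as a function from request path to response headers. This is an illustration of the rule only, not vercel.json syntax; `cacheHeadersFor` is a hypothetical name, and the header values are the ones quoted in the commit message.

```typescript
// Content-hashed assets are safe to cache permanently; everything else
// (notably index.html, which points at the current bundle hashes) must revalidate.
function cacheHeadersFor(pathname: string): Record<string, string> {
  if (pathname.startsWith("/assets/")) {
    return { "Cache-Control": "public, max-age=31536000, immutable" };
  }
  return {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "CDN-Cache-Control": "no-store",
    "Vercel-CDN-Cache-Control": "no-store",
  };
}
```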

Docs:

docs/architecture/FINANCIAL_OPERATOR_DESIGN_ALIGNMENT.md — surface-by-surface kit alignment for web/mobile/workspace/CLI

🤖 Generated with Claude Code

Build the perception->extraction->validation->sandbox compute->verification chat experience the spec calls for. Each unit of work renders as a typed card so users see observable work, not hidden reasoning.

Backend (convex/domains/financialOperator/):
- types.ts: 9 step kinds (run_brief, tool_call, extraction, validation, calculation, evidence, artifact, approval_request, result) x 7 statuses.
- sandbox.ts: deterministic JS compute (ETR, after-tax cost of debt, leverage, variance, compliance). Throws on NaN/divide-by-zero.
- validators.ts: schema/unit/range/confidence checks. HONEST_SCORES counts what was actually checked.
- extractors.ts + attFixture.ts: pinned AT&T 10-K fixture; real-PDF-shape interface for swap-in.
- runOps.ts: createRun, appendStep, updateStepStatus, getRun, listSteps. BOUND at 200 steps/run.
- orchestrator.ts: runAttCostOfDebtDemo + recordApprovalDecision actions.

Schema (convex/schema.ts):
- New financialOperatorRuns + financialOperatorSteps tables (additive).
- Fixed pre-existing data drift in productEventWorkspaces by adding activeEventSessionId as optional.

Frontend (src/features/financialOperator/):
- 9 typed card components + StepShell common chrome + StepStatusBadge.
- StepCard switch dispatcher.
- FinancialOperatorTimeline live-streaming parent (Convex useQuery).
- FinancialOperatorDemo standalone view at /finance-demo.

Routing:
- viewRegistry.ts: added financial-operator view at /finance-demo with aliases /financial-operator and /finops.
- QuickCommandChips: added optional `navigate` field on chips for workspace handoff; "AT&T cost of debt" chip routes to /finance-demo.

Tests (19/19 pass):
- sandbox.scenario.test.ts: 13 tests covering happy path, 1000-replay determinism, NaN/divide-by-zero/out-of-range sad paths, compliance gate, signed variance formatting.
- validators.scenario.test.ts: 6 tests covering missing required, wrong unit, out-of-range, low confidence, scale to 100 fields.

Verification:
- npx tsc --noEmit: clean
- npx vitest run: 19/19 pass
- npx vite build: clean
- Live browser test: 10 cards stream end-to-end, approve flow produces Result card with ETR=16.86%, after-tax cost of debt=4.51%

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
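
The sandbox idea described in this commit (pure arithmetic that throws instead of returning NaN) can be sketched as below. This is not the contents of sandbox.ts, which the PR does not show. The formulas are the textbook definitions (ETR = income tax expense / pretax income; after-tax cost of debt = pre-tax cost × (1 − ETR)), the function names are hypothetical, and the usage inputs are round illustrative numbers, not AT&T figures.

```typescript
// Reject NaN and ±Infinity up front so bad inputs never flow into a result card.
function assertFinite(name: string, value: number): void {
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a finite number, got ${value}`);
  }
}

// Effective tax rate as a ratio (0.2 means 20%).
function computeEffectiveTaxRate(incomeTaxExpense: number, pretaxIncome: number): number {
  assertFinite("incomeTaxExpense", incomeTaxExpense);
  assertFinite("pretaxIncome", pretaxIncome);
  if (pretaxIncome === 0) {
    throw new Error("pretaxIncome is zero; effective tax rate is undefined");
  }
  return incomeTaxExpense / pretaxIncome;
}

// After-tax cost of debt as a ratio, given a pre-tax cost and a tax rate.
function computeAfterTaxCostOfDebt(preTaxCostOfDebt: number, effectiveTaxRate: number): number {
  assertFinite("preTaxCostOfDebt", preTaxCostOfDebt);
  assertFinite("effectiveTaxRate", effectiveTaxRate);
  return preTaxCostOfDebt * (1 - effectiveTaxRate);
}

// Format a ratio as a percentage string with a fixed number of decimals.
function formatPercent(ratio: number, digits = 2): string {
  return `${(ratio * 100).toFixed(digits)}%`;
}

// Illustrative inputs: tax expense 20 on pretax income 100, pre-tax cost of debt 5%.
const etr = computeEffectiveTaxRate(20, 100);
const afterTax = computeAfterTaxCostOfDebt(0.05, etr);
```

Because the functions are pure and take no clock or random input, replaying them with the same arguments gives the same result, which is the property the 1000-replay determinism test checks.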

…eploy hardening

Closes the 4 follow-ups from PR #204:
1. Vercel deploy hook race fix (60s wait + Tier-A verify poll)
2. Edge cache stickiness (no-cache headers on HTML, immutable on assets)
3. Inline chat experience (FinancialOperatorOverlay, no FastAgentPanel surgery)
4. Real PDF reader (Claude PDF input + structured extraction)
5. Examples B/C/D (CRM cleanup, covenant compliance, variance analysis)

## Examples B/C/D: full operator-console workflows
- Example B (financial_data_cleanup): inspect → profile spreadsheet → extract entities → dedup → enrich → validate CRM schema → export CSV. Sandbox compute: dedup ratio (387 -> 312, 19.4%).
- Example C (covenant_compliance): locate covenant → extract terms + inputs → validate → sandbox leverage + compliance gate → memo. Sandbox: computeLeverageRatio + checkCompliance (3.55x vs 4.25x cap, compliant).
- Example D (variance_analysis): inspect → align CoA → per-line variance in sandbox → driver search → CFO memo. Sandbox: computeVariance for 6 P&L lines, signed-percent formatting.

All three reuse the same backbone: runOps + sandbox + validators + typed step kinds. Each emits 8-10 cards, and the picker on /finance-demo lets the user choose which workflow to run.

## Real PDF reader (production path)
`runRealCostOfDebtFromPdf` action:
- Takes a `_storage` PDF id (any uploader can produce one)
- Sends PDF directly to Claude as a document input (no separate parse step)
- Constrains output to a strict JSON schema with sourceRef + confidence per field; instructs Claude to return null + add to unresolvedFields rather than fabricate
- Validates extraction with the same `validateExtraction()` used by the fixture path; computes ETR + after-tax cost of debt deterministically
- Bounded reads (MAX_PDF_BYTES = 20MB), HONEST_STATUS error path that surfaces parse failures verbatim, approval gate when required fields unresolved.

## Inline chat experience (FinancialOperatorOverlay)
Surface-agnostic global drawer. Listens for `?finRun=<runId>` URL param, mounts `FinancialOperatorTimeline` as a right-side drawer alongside any chat surface. Collapsible to a corner pill. Mounted in App.tsx so it works on /, /?surface=ask, /?surface=workspace, etc.

Why a global overlay vs editing FastAgentPanel directly:
- FastAgentPanel.tsx is 3700+ lines; surgical message-bubble edits have high blast radius
- URL-param-driven means any caller (chip, button, MCP tool) can activate the overlay via `setActiveFinancialRun()` without knowing the chat panel internals
- /finance-demo "View in chat" button deep-links to `/?surface=ask&finRun=<id>`, so the overlay mounts beside the chat

## Deploy hardening
vercel-deploy-hook-backup.yml:
- 60s wait before firing the deploy hook on push events. Closes the race that bit PR #204: the GitHub→Vercel git mirror takes a few seconds to catch up after a merge, and deploy hooks pass no commit SHA, so immediate-fire deploys can clone the previous HEAD.
- Tier-A verification poll: after the hook fires, watch the live URL for up to 7 minutes for the bundle hash to rotate. Non-blocking warning if it doesn't (deploy still in progress, or edge cache stuck).

vercel.json headers:
- /assets/* → `public, max-age=31536000, immutable` (content-hashed, safe for permanent edge cache)
- /(everything else) → `no-cache, no-store, must-revalidate` plus CDN-Cache-Control / Vercel-CDN-Cache-Control no-store. Prevents the stale-HTML landmine that took 15 minutes to clear post-deploy on PR #204. The bundle hashes inside index.html change every deploy, so stale HTML points at JS files the new deploy may have evicted.

## Design alignment doc
New `docs/architecture/FINANCIAL_OPERATOR_DESIGN_ALIGNMENT.md` walks through how the cards build on existing UI kit per surface (web, mobile, workspace, CLI/MCP). Same step-kind enum, same status enum, same sandbox guarantee everywhere. Workspace + CLI/MCP exposure described as concrete next-PR plans.

## Verification
- npx convex dev --once --typecheck=enable: clean (3.17m typecheck)
- npx tsc --noEmit: 0 errors
- npx vitest run convex/domains/financialOperator/__tests__/: 19/19 pass
- npx vite build: clean (42.66s, 211 entries precached)
- Live browser: 4 demo workflows trigger, each renders 8-10 typed cards; "View in chat" deep-links to /?surface=ask with overlay mounted (8 cards in the drawer next to the chat surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
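
The sandbox figures quoted for Examples B and C can be reproduced with a few lines of arithmetic. The sketch below is an assumption-laden illustration: `computeLeverageRatio` and `checkCompliance` are named in the commit but their signatures are not shown, leverage is taken here as total debt divided by EBITDA (a common definition, not confirmed by the PR), `computeDedupRatio` is a hypothetical name, and the debt/EBITDA inputs are made-up values chosen to yield 3.55x.

```typescript
// Share of records removed by deduplication, as a ratio in [0, 1].
function computeDedupRatio(before: number, after: number): number {
  if (!Number.isFinite(before) || !Number.isFinite(after)) {
    throw new Error("record counts must be finite");
  }
  if (before <= 0) throw new Error("before must be positive");
  if (after < 0 || after > before) throw new Error("after must be between 0 and before");
  return (before - after) / before;
}

// Assumed definition: total debt / EBITDA.
function computeLeverageRatio(totalDebt: number, ebitda: number): number {
  if (!Number.isFinite(totalDebt) || !Number.isFinite(ebitda)) {
    throw new Error("inputs must be finite");
  }
  if (ebitda === 0) throw new Error("ebitda is zero; leverage is undefined");
  return totalDebt / ebitda;
}

// Compliance gate: the ratio must not exceed the covenant cap.
function checkCompliance(ratio: number, maxAllowed: number): { compliant: boolean; headroom: number } {
  return { compliant: ratio <= maxAllowed, headroom: maxAllowed - ratio };
}

// Example B: 387 records before, 312 after, so 75 / 387 of the records were duplicates.
const dedupPercent = (computeDedupRatio(387, 312) * 100).toFixed(1); // "19.4"

// Example C: hypothetical inputs giving 3.55x against the 4.25x cap.
const leverage = computeLeverageRatio(355, 100);
const gate = checkCompliance(leverage, 4.25);
```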

…apabilityBadge

Two corrections to the prior PR:

## 1. Design-kit alignment (we build on top of the kit, not next to it)
Replaced ad-hoc styling with the kit's canonical utilities (per docs/architecture/FINANCIAL_OPERATOR_DESIGN_ALIGNMENT.md and the NodeBench AI Design System reference):
- StepShell: now uses `.nb-panel` (12px radius + hairline border + panel bg, kit canonical) instead of a hand-rolled `.nb-card`-styled box. Left accent stripe via `::before` keeps cards distinguishable without inventing new chrome.
- Type: every kicker is `type-label !tracking-[0.18em]` (kit's canonical 11px uppercase 0.18em). Titles are `type-card-title`. Body is `text-[13px] leading-[1.5]`. Mono numerics use `font-mono`.
- Color: every raw `#d97757` literal across 7 card files swapped to `var(--accent-primary)` (Tailwind arbitrary-value with CSS var). Status badges now use `.badge-success/-warn/-fail/-accent` tone families with kit-canonical semantic colors (--success, --warning, --destructive), the same tones the kit's component-badges.html ships.
- Demo view: page header uses `type-page-title` + `type-label` + `type-body`. Workflow tiles use `.nb-panel` chrome with the kit's 44px icon container (10px radius, terracotta-12% bg, terracotta fg, 20px Lucide stroke icon; exactly the kit's component-panel.html pattern).
- Overlay: drawer chrome uses `--bg-primary`, `--border-color`, and `--shadow-xl` instead of inline `#151413` / `border-edge`. Header icon buttons are 16px Lucide (kit pill-icon size), rounded-full to match the kit's icon-button conventions.

No new design tokens were introduced. Every utility class on these surfaces already existed in src/index.css before this work shipped.

## 2. ModelCapabilityBadge: surfacing what the active model can do
Pattern lifted from open-source projects that route through unified LLM providers (OpenRouter, pi-ai, LibreChat, OpenWebUI):
- OpenRouter exposes `architecture.input_modalities` / `output_modalities` per model
- LibreChat shows per-model capability chips next to the picker
- pi-ai's `getModel().inputModalities` is the same shape

NodeBench surfaces them as a compact icon-only row:
- 8 modalities: text, image, pdf, audio, video, web_search, code_exec, tools
- Each is a 24px round Lucide-icon pill (14px stroke icon, the kit's pill icon size)
- Supported: terracotta accent (border + bg + fg via accent CSS vars)
- Unsupported: 50% opacity + line-through (visible but visually receded; agent users see what's missing without it competing)
- Native title tooltip + role=listitem aria-label per icon

Hand-curated capability registry (`MODEL_CAPABILITIES`) covers the models NodeBench routes today: Claude Opus/Sonnet/Haiku, GPT-5/4.1/4o, o1/o3, Gemini 3 Pro/Flash + 2.5 Flash, Grok 4, Kimi k2.6, DeepSeek v3.5, GLM 4.6V. Unknown models fall back to text-only with a `(unverified)` tag: HONEST_SCORES, never claim capabilities the model can't deliver. Long-term path: a Convex action that hits OpenRouter's /v1/models and caches the modality matrix daily.

Surfaced in two places this PR:
- /finance-demo header (active orchestrator model)
- FinancialOperatorOverlay header (visible alongside chat surface)

Future PRs can drop it next to FastAgentPanel's model selector and any other model-aware surface; it's a self-contained component with one prop (`model: string`).

## Verification
- npx tsc --noEmit: 0 errors
- npx vite build: clean (7.71s)
- Live browser: 4 demo workflows still render 8-10 typed cards each; StepShell now uses .nb-panel + type-label + type-card-title; ModelCapabilityBadge shows 4 supported (text/image/pdf/tools) + 4 unsupported (audio/video/web_search/code) for claude-opus-4-7 with per-icon tooltips and aria-labels

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
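
The registry-plus-fallback behaviour of ModelCapabilityBadge can be sketched without any UI code. The `MODEL_CAPABILITIES` name and the claude-opus-4-7 entry (text/image/pdf/tools supported) come from this PR; the data shape, the `resolveCapabilities` helper, and the use of a `Map` are assumptions made for illustration, and the real registry covers many more models.

```typescript
const MODALITIES = [
  "text", "image", "pdf", "audio", "video", "web_search", "code_exec", "tools",
] as const;
type Modality = (typeof MODALITIES)[number];

// Hand-curated registry (one entry shown; the shape is an assumption).
const MODEL_CAPABILITIES = new Map<string, readonly Modality[]>([
  ["claude-opus-4-7", ["text", "image", "pdf", "tools"]],
]);

interface ResolvedCapabilities {
  model: string;
  verified: boolean; // false means "unknown model, text-only fallback"
  items: { modality: Modality; supported: boolean }[];
}

// Unknown models fall back to text-only and are marked unverified,
// so the badge never claims a capability the registry cannot back up.
function resolveCapabilities(model: string): ResolvedCapabilities {
  const known = MODEL_CAPABILITIES.get(model);
  const supported = new Set<Modality>(known ?? ["text"]);
  return {
    model,
    verified: known !== undefined,
    items: MODALITIES.map((modality) => ({ modality, supported: supported.has(modality) })),
  };
}
```

A component would map `items` to icon pills, styling the unsupported ones as receded, and append an "(unverified)" tag when `verified` is false.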

vercel · 2026-04-28T16:47:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
nodebench-ai	Ready	Preview, Comment	Apr 28, 2026 5:18pm

…surface

User feedback: "it should actually be built into the existing chat page or chat agent sidebar, add all the new components to the chat, wired live and used under a toggle called workspace mode."

## What changed
A new `WorkspaceModeToggle` floats on the chat surface (top-right on desktop, bottom-right above the mobile bottom nav). Clicking it sets `?ws=1` in the URL; clicking again clears it.

When `?ws=1` is active, the new `WorkspaceModePane`:
- Mounts inside the chat content area (fixed, z-55, padded around the bottom nav + agent panel so the chat composer below stays live)
- Renders the 4-workflow picker (AT&T 10-K · CRM cleanup · Covenant compliance · Variance analysis) when no run is active
- Streams the FinancialOperatorTimeline live when a run is active
- Surfaces ModelCapabilityBadge in its header so the user sees what the active model can/can't do
- Defers to the existing right-side drawer (`FinancialOperatorOverlay`) when ws=0; both modes coexist for users who want a side dock

URL state drives everything (`?ws=1`, `?finRun=<id>`) so deep links work and the chat composer below stays interactive.

## Why not edit FastAgentPanel.tsx
FastAgentPanel.tsx is 3700+ lines. The toggle + pane sit on top of it via fixed positioning; no surgery on its render tree. Surface coupling is via URL params only, the same pattern any future caller (chip, button, MCP tool) can use to drive workspace mode.

## Visibility rule
Toggle hidden on:
- /finance-demo (the page IS workspace already)
- /cli, /pricing, /changelog, /legal, /about, /api-docs (info pages)
- /share/*, /report/*, /embed/* (public/embedded views)

Toggle shown on the root chat surface and ?surface=ask|home variants.

## Verification
- npx tsc --noEmit: 0 errors
- npx vite build: clean (210 PWA entries)
- Live browser:
  - Toggle visible on /?surface=home with aria-label "Enter workspace mode"
  - Click → URL gets ?ws=1, pane mounts (role=region "Workspace mode")
  - Pane shows 4 demo tiles + model capability badge + Exit button
  - Click "Covenant compliance" → 9 typed cards stream inline (Plan → Tool → Extraction×2 → Validation → Calculation → Evidence → Artifact → Result) with the run id in the URL
  - "Back to picker" returns to the 4-tile state
  - "Close" / "Exit workspace" returns to plain chat

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
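
The toggle behaviour and visibility rule from this commit can be expressed as two small pure functions. This is a sketch, not the component's code: `shouldShowWorkspaceToggle` and `toggleWorkspaceMode` are hypothetical names, and treating every path outside the hidden list as visible is an assumption (the commit only states that the toggle shows on the root chat surface and its `?surface=ask|home` variants).

```typescript
const HIDDEN_EXACT = [
  "/finance-demo", // the page is already the workspace
  "/cli", "/pricing", "/changelog", "/legal", "/about", "/api-docs", // info pages
];
const HIDDEN_PREFIXES = ["/share/", "/report/", "/embed/"]; // public/embedded views

// Assumption: any path not explicitly hidden shows the toggle.
function shouldShowWorkspaceToggle(pathname: string): boolean {
  if (HIDDEN_EXACT.includes(pathname)) return false;
  return !HIDDEN_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}

// Flip ?ws=1 on or off while preserving the other query parameters.
function toggleWorkspaceMode(href: string): string {
  const url = new URL(href);
  if (url.searchParams.get("ws") === "1") {
    url.searchParams.delete("ws");
  } else {
    url.searchParams.set("ws", "1");
  }
  return url.toString();
}
```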

…et-ed0e9b # Conflicts: # convex/_generated/api.d.ts # convex/domains/financialOperator/index.ts # src/features/financialOperator/components/ApprovalCard.tsx # src/features/financialOperator/components/ArtifactCard.tsx # src/features/financialOperator/components/CalculationCard.tsx # src/features/financialOperator/components/EvidenceCard.tsx # src/features/financialOperator/components/ResultCard.tsx # src/features/financialOperator/components/StepShell.tsx # src/features/financialOperator/components/StepStatusBadge.tsx # src/features/financialOperator/components/ToolCallCard.tsx # src/features/financialOperator/index.ts # src/features/financialOperator/views/FinancialOperatorDemo.tsx

github-actions · 2026-04-28T17:01:08Z

✅ Dogfood Visual QA Gate: PASSED

Check	Status
Screenshots	23 captured (pass)
Walkthrough	9 chapters (pass)
Key Frames	9 extracted (pass)
Scribe Steps	8 how-to steps (pass)
Build	success

Artifacts

Download the dogfood-evidence-4b7d1a4 artifact from the Actions tab for full screenshots, frames, and walkthrough video.

Generated by Dogfood QA Gate

… through

User QA caught a broken UI: workspace mode rendered with the home surface visible behind it (greeting, sidebar, watchlist, search input all stacking with the operator-console pane).

## Root cause
Tailwind's `/95` opacity modifier does NOT work on CSS-var arbitrary values without the `color:` prefix. The class `bg-[var(--bg-app)]/95` resolved to `rgba(0,0,0,0)`, fully transparent.

A second issue compounded it: `--bg-app` is in the kit reference (colors_and_type.css) but is NOT defined in the live repo's src/index.css. The repo has `--bg-primary` / `--bg-secondary` only. So even the unmodified `var(--bg-app)` would have resolved to nothing.

## Fix
- Use `--bg-primary` (defined: #FFFFFF light, dark variant in dark mode) as the pane base color, set via inline `style` to bypass any Tailwind quirks with CSS-var opacity arbitrary values.
- Bump pane to `z-[80]` (above modals at z-50, toasts at z-60). The toggle bumped to `z-[85]` so users can dismiss mid-run without hunting inside the pane.
- Add `isolate` for a clean stacking context; prevents any future z-leak from the home surface beneath.
- Inline-comment the var-opacity gotcha so the next developer doesn't re-introduce it.

## Verification (per dogfood_verification.md)
- npx tsc --noEmit: 0 errors
- Live browser screenshot: clean opaque pane, header readable, 4 demo tiles in 2x2 grid, model capability badge with 4 supported + 4 unsupported icons, no home-surface bleed-through
- Run flow: clicked AT&T 10-K → 9 typed cards stream inline (Plan → Tool×2 → Extraction → Validation → Calculation → Evidence → Artifact → Result), all using .nb-panel chrome with proper status badges and source-cited fields

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… scroll / composer)

User QA caught: workspace mode was overlaying the chat surface instead of building on top of the existing chat layout. The kit's canonical chat shell (ui_kits/nodebench-web/ChatThread.jsx) is:

header (sticky top, entity icon + title + meta + actions)
↓
scrollable thread (turns / operator console cards)
↓
composer (pinned bottom: pins · field · model + caps · suggested chips)

The model selector + capability indicators belong IN the composer (per the design board reference + the kit's Composer.jsx), not floating in the header.

## What changed
WorkspaceModePane now renders as a 3-row CSS grid mirroring the kit:
- Row 1 (header): kit's .nb-chat-header pattern (entity icon as a terracotta squircle with sparkle, kicker, title, meta in mono font, Picker + Close actions)
- Row 2 (scroll): demo picker (no run) OR FinancialOperatorTimeline (active run) inside a max-w-3xl container
- Row 3 (composer): new WorkspaceComposer component

WorkspaceComposer follows the kit's composer shape exactly:
- Pin row: "EVENT Ship Demo Day ×" + "+ Add context" (matches design board reference)
- Field row: paperclip + link + mic icons (15px stroke) | textarea "Ask, capture, paste, upload, or record…" | terracotta send button
- Below field: MODEL claude-opus-4-7 + 8 capability icons (text / image / pdf / audio / video / web_search / code_exec / tools with supported vs muted variants and per-icon tooltips) | Memory-first · 0 paid calls in mono
- Suggested chips: Run AT&T 10-K demo · Run CRM cleanup · Run covenant compliance · Run variance analysis

The composer is interactive: typing a prompt that matches a known workflow regex starts that demo (e.g. "AT&T 10-K cost of debt" → runAttCostOfDebtDemo). Send falls back to dispatching a custom `nb:workspace:compose` event for any other panel listening (so future FastAgentPanel integration can hook in without surgery).

## Verification
- npx tsc --noEmit: 0 errors
- npx vite build: clean (210 PWA entries)
- Live browser screenshot (kit-aligned at mobile width):
  - Empty state: header with WORKSPACE MODE / Pick a workflow / "4 canonical workflows · math sandboxed · approval-gated" meta + Close button; scrollable middle with 4 demo tiles in 2x2; composer pinned bottom with all canonical pieces (pins, attach, textarea, send, model badge with capabilities, Memory-first hint, 4 suggested chips)
  - Active run: header switches to "Live operator-console run", scroll area renders the typed-card timeline (RUN header → Plan → 2x Tool → Extraction → Validation → Calculation → Evidence → Artifact), composer stays pinned and never overlaps content; capability badge sits inside the composer where the kit puts it

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
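
The composer's dispatch rule (match the prompt against known workflow patterns, otherwise fall back to the `nb:workspace:compose` event) could look roughly like this. The four action names and the event name come from this PR; the regular expressions, the `routePrompt` helper, and the return shape are hypothetical, since the actual patterns are not shown.

```typescript
type WorkflowId =
  | "runAttCostOfDebtDemo"
  | "runCrmCleanupDemo"
  | "runCovenantComplianceDemo"
  | "runVarianceAnalysisDemo";

// Hypothetical patterns; first match wins. No `g` flag, so test() keeps no state.
const WORKFLOW_PATTERNS: ReadonlyArray<{ pattern: RegExp; workflow: WorkflowId }> = [
  { pattern: /at&t|10-?k|cost of debt/i, workflow: "runAttCostOfDebtDemo" },
  { pattern: /crm|dedup|cleanup/i, workflow: "runCrmCleanupDemo" },
  { pattern: /covenant|leverage/i, workflow: "runCovenantComplianceDemo" },
  { pattern: /variance|budget/i, workflow: "runVarianceAnalysisDemo" },
];

type ComposeRoute =
  | { kind: "workflow"; workflow: WorkflowId }
  | { kind: "event"; eventName: "nb:workspace:compose"; prompt: string };

// Decide what a submitted prompt should do, without touching the DOM.
function routePrompt(prompt: string): ComposeRoute {
  const text = prompt.trim();
  for (const { pattern, workflow } of WORKFLOW_PATTERNS) {
    if (pattern.test(text)) return { kind: "workflow", workflow };
  }
  return { kind: "event", eventName: "nb:workspace:compose", prompt: text };
}
```

In the browser, the `event` branch would typically become `window.dispatchEvent(new CustomEvent("nb:workspace:compose", { detail: { prompt } }))`; the `detail` payload shape is an assumption. Keeping the routing decision pure makes it testable without a DOM.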

HShuM and others added 3 commits April 28, 2026 00:41

vercel Bot deployed to Preview April 28, 2026 16:48 View deployment

HomenShum enabled auto-merge (squash) April 28, 2026 16:52

HShuM added 2 commits April 28, 2026 09:52

tmp: CRLF noise (will revert)

63fcb3c

vercel Bot deployed to Preview April 28, 2026 16:53 View deployment

vercel Bot deployed to Preview April 28, 2026 16:57 View deployment

vercel Bot deployed to Preview April 28, 2026 17:09 View deployment

vercel Bot deployed to Preview April 28, 2026 17:18 View deployment

HomenShum merged commit 31354b0 into main Apr 28, 2026
14 of 15 checks passed

HomenShum mentioned this pull request Apr 28, 2026

fix(chat): restore kit-canonical chat headers (workspace mode swaps thread content only) #207

Open

5 tasks

HomenShum merged 8 commits into main from claude/heuristic-sammet-ed0e9b
