feat(dash): show reasoning above answer in chat (collapsible)#315
Merged
Render `delta.reasoning_content` from the SSE stream as a distinct block above the assistant's answer instead of folding it into `content` when the answer is empty.

- `useChatCompletions.ts`: stop the `content || reasoning` fold; `ChatResponse` now carries `content` + `reasoning` as separate fields, and the streaming `onDelta` already surfaces both buffers.
- `chat.jsx`: new `ReasoningBlock` (greyed `--fg-4` text, 3-line `-webkit-line-clamp`, "thinking ▾/▴" toggle). The block auto-expands while reasoning streams without an answer yet and auto-collapses when the answer starts; the user's manual click sticks. No block renders when the model emits no reasoning.
- `dashboard.css`: `.bubble-reasoning` styling: subtle border-left accent (`--line-strong`), 3-line clamp by default, `.open` removes the clamp.
- `chat-reasoning.spec.ts`: 4-scenario Playwright spec covering renders / doesn't-render / auto-collapse / manual-toggle, against a stubbed SSE body.

Full e2e suite stays green (53 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live screenshots (LXC 105, Qwen3.5-0.8B-GGUF, prompt:
thinmintdev added a commit that referenced this pull request May 28, 2026

…rough + gut installer auth section (#390)

- docs/operate/lemonade.md (new, .md canonical): operator reference for the v0.2 Lemonade runtime: what it is, where state lives, the /v1/* proxy + dispatcher fallthrough (PRs #248/#277), slot ↔ Lemonade model mapping (PRs #281/#282), max_loaded_models = 8 LRU cap (PR #283), per-type LRU eviction per ADR-0008 (supersedes nuclear-evict ADR-0007), OFFLINE-on-eviction (PR #276), and the three known v0.3 caveats (Vulkan KV gauge missing, whisper RUNPATH workaround, GPU cleanup unload hang).
- docs/dashboard/v3.md (new, .md canonical, new docs/dashboard/ dir): page-by-page tour of the v3 React dashboard shipped in v0.3.0-alpha.1 (PR #235). Covers the shell + Mock-badge convention, /dashboard (system overview after #356), /chat (real surface per #309/#314/#315/#351), /slots (sidebar mirror per #357 + #344 UX sweep), /models (#313/#319/#353), /mcp (#304/#300), /agents (Peers per #299), /memory (graph #297, throughput #308), Settings (no Auth tab post-ADR-0012), and the footer journal (Epic #322: PRs #321/#328/#329/#330/#332). Mock-fallback issues linked via the dashboard-v3 label, not enumerated.
- installer/README.md: gut ~95 lines of stale auth prose (Caddy, Bearer-token mint/use/revoke, first-run OTP claim wizard, HAL0_AUTH_ENABLED/HAL0_AUTH_DISABLED, password recovery, basic_auth upgrade path, the TLS recipe). Replace with one paragraph pointing at docs/operate/auth.mdx for the reverse-proxy recipe and docs/agents/identity.md for the X-hal0-Agent identity model. Auth was removed in v0.3.0-alpha.1 per ADR-0012; the README hadn't caught up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Render the model's reasoning (`delta.reasoning_content` from the SSE stream) as a distinct block above the assistant's answer, instead of folding it into `content` when the answer is empty. Reasoning is greyed (`--fg-4`), truncated to 3 lines with a `-webkit-line-clamp`, and expandable via a small `thinking ▾ / ▴` toggle. When the model emits no reasoning, the block does not render at all.
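To illustrate what keeping the two streams apart could look like, here is a minimal sketch of accumulating deltas into separate buffers. Only the field names `content`, `reasoning_content`, and `reasoning` come from this PR; the `ChatDelta` shape and the `applyDelta` / `collect` helpers are hypothetical and are not the actual code in `useChatCompletions.ts`.

```typescript
// Hypothetical delta shape: only the two field names are taken from the PR.
interface ChatDelta {
  content?: string | null;
  reasoning_content?: string | null;
}

// Final response keeps answer and reasoning as separate fields.
interface ChatResponse {
  content: string;
  reasoning: string;
}

// Append one streamed delta to the running buffers without mixing them.
function applyDelta(acc: ChatResponse, delta: ChatDelta): ChatResponse {
  return {
    content: acc.content + (delta.content ?? ""),
    reasoning: acc.reasoning + (delta.reasoning_content ?? ""),
  };
}

// Reduce a whole stream of deltas into the final response object.
// Note there is no `content || reasoning` fallback: an empty answer stays empty.
function collect(deltas: ChatDelta[]): ChatResponse {
  return deltas.reduce(applyDelta, { content: "", reasoning: "" });
}
```

With this shape, a stream that contains only reasoning yields an empty `content`, so the UI can decide on its own how to present the reasoning.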
Design behaviour (4 scenarios)
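1. Reasoning is streaming and there is no answer yet: the block auto-expands so the user knows the model is thinking. Greyed text grows in place.
2. The answer starts: the block auto-collapses to the 3-line clamp; the answer fills its own bubble in white below.
3. The user clicks the toggle: the manual choice sticks, and later auto-expand/auto-collapse attempts are ignored for that message.
4. The model emits no reasoning: no block renders, and the answer bubble looks like a normal assistant turn.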
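The scenarios above can be read as a small pure function from message state to view state. The sketch below is one possible formulation under that assumption; the `ReasoningState` shape, the `userChoice` field, and the function names are hypothetical and do not describe how `ReasoningBlock` in `chat.jsx` is actually written.

```typescript
// Hypothetical per-message state; names are illustrative only.
interface ReasoningState {
  reasoning: string;          // accumulated reasoning text
  answer: string;             // accumulated answer text
  userChoice: boolean | null; // null until the user clicks the toggle
}

type BlockView = "hidden" | "open" | "clamped";

// Decide how the reasoning block should render for one assistant message.
function reasoningView(s: ReasoningState): BlockView {
  // No reasoning emitted: render nothing at all.
  if (s.reasoning.length === 0) return "hidden";
  // A manual click sticks and overrides the automatic behaviour.
  if (s.userChoice !== null) return s.userChoice ? "open" : "clamped";
  // Automatic: open while only reasoning has arrived, clamped once an answer exists.
  return s.answer.length === 0 ? "open" : "clamped";
}

// Flip the block relative to what is currently shown and remember the choice.
function toggle(s: ReasoningState): ReasoningState {
  return { ...s, userChoice: reasoningView(s) !== "open" };
}
```

Keeping the decision in a pure function makes each scenario testable without rendering a component.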
Files touched
- `ui/src/api/hooks/useChatCompletions.ts`: drop the `content || reasoning` fold; `ChatResponse` now exposes `content` + `reasoning` as separate fields. Streaming `onDelta` already surfaced both, but the final response object did not.
- `ui/src/dash/chat.jsx`: new `ReasoningBlock` component (auto-expand while streaming + no answer, auto-collapse once answer arrives, user click sticks). Wired into `MessageList`; assistant turn renders reasoning above the answer bubble.
- `ui/src/dashboard.css`: `.bubble-reasoning` styling (subtle `--line-strong` border-left, `--fg-4` greyed text, 3-line clamp by default, `.open` removes the clamp). Existing tokens only, no new palette.
- `ui/tests/e2e/specs/chat-reasoning.spec.ts`: 4 Playwright specs covering each scenario against a stubbed SSE body.
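As a rough idea of what a stubbed SSE body might contain, the helper below serialises a list of deltas into `data:` events. It is a sketch under assumptions: the `choices[0].delta` envelope and the `[DONE]` sentinel follow the common OpenAI-compatible streaming convention, and `buildSseBody` is a hypothetical name, not something taken from `chat-reasoning.spec.ts`.

```typescript
// Hypothetical stub delta; field names match the ones this PR mentions.
interface StubDelta {
  content?: string;
  reasoning_content?: string;
}

// Serialise deltas as SSE events, one `data:` line per event, blank-line separated.
// Assumption: OpenAI-style `choices[0].delta` envelope and a final `[DONE]` sentinel.
function buildSseBody(deltas: StubDelta[]): string {
  const events = deltas.map(
    (delta) => `data: ${JSON.stringify({ choices: [{ delta }] })}\n\n`,
  );
  return events.join("") + "data: [DONE]\n\n";
}
```

In a Playwright spec, a string like this could be served from a `page.route` handler via `route.fulfill` with a `text/event-stream` content type; the route pattern to intercept depends on the app and is not specified here.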
Live LXC verification
Branch deployed to LXC 105 (10.0.1.142, primary slot `Qwen3.5-0.8B-GGUF`), then restored to main after capture.
Prompt 1: `Reply with the exact word OK`

Sent end-to-end against the live primary slot (Qwen3.5 with thinking enabled). The model produced a ~600-char chain-of-thought followed by the single word `OK`. Observed:

- `THINKING ▾` block clamped to 3 lines, showing only the start of the thought trace ("Thinking Process: / 1. Analyze the Request:…").
- Answer bubble: `OK`.
- Clicking the toggle switched it to `THINKING ▴` and revealed all four numbered steps unbounded.
Prompt 2: `Why is the sky blue? Be brief.`

The longer prompt streams for ~30s with substantial reasoning. The captured collapsed state shows the `sending…` chip + waiting bubble during the empty-stream phase, as designed. The proxy buffers most of the SSE stream, so the deltas land in one burst at the end; this is a backend characteristic, not a UI issue, and the parser handles either pattern.
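Why a parser can be indifferent to burst versus incremental delivery is easiest to see in code: if it buffers input and only emits events once the blank-line terminator arrives, chunk boundaries stop mattering. The following is a simplified sketch of that idea, not the parser used in this PR; `createSseParser` is a hypothetical name, and it assumes `\n` line endings and one `data:` line per event.

```typescript
// Returns a feed function; each call yields the `data:` payloads completed so far.
// Simplifications (assumptions): "\n" line endings only, no multi-line data joining,
// and non-data fields such as `event:` or `id:` are ignored.
function createSseParser(): (chunk: string) => string[] {
  let buffer = "";
  return (chunk: string): string[] => {
    buffer += chunk;
    const payloads: string[] = [];
    let end = buffer.indexOf("\n\n");
    while (end !== -1) {
      const event = buffer.slice(0, end);
      buffer = buffer.slice(end + 2); // keep any partial event for the next call
      for (const line of event.split("\n")) {
        // Strip the field name and the single optional space after the colon.
        if (line.startsWith("data:")) payloads.push(line.slice(5).replace(/^ /, ""));
      }
      end = buffer.indexOf("\n\n");
    }
    return payloads;
  };
}
```

Feeding the whole stream in one call or one character at a time produces the same sequence of payloads, which is the property the burst behaviour described above relies on.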
Verification commands run
LXC restored to main after verification:
Test plan
no-render-when-absent, 3-line clamp by default, click expands +
reverts on second click.
Out of scope
`reasoning` enable/disable toggles (backend concern).

🤖 Generated with Claude Code