feat(dash): show reasoning above answer in chat (collapsible) by thinmintdev · Pull Request #315 · Hal0ai/hal0

thinmintdev · 2026-05-25T19:47:14Z

Summary

Render the model's reasoning (delta.reasoning_content from the SSE stream)
as a distinct block above the assistant's answer, instead of folding it
into content when the answer is empty. Reasoning is greyed (--fg-4),
truncated to 3 lines with a -webkit-line-clamp, and expandable via a
small thinking ▾ / ▴ toggle. When the model emits no reasoning, the
block does not render at all.

Design behaviour (4 scenarios)

1. Reasoning streams first, answer empty. The block auto-expands so
   the user knows the model is thinking. Greyed text grows in place.
2. Answer starts streaming. The block auto-collapses back to the
   3-line clamp; the answer fills its own bubble in white below.
3. User clicks the toggle. Their choice sticks: further auto-collapse
   attempts are ignored for that message.
4. Model emits no reasoning_content. No reasoning block renders;
   the answer bubble looks like a normal assistant turn.
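
The four scenarios can be read as a small state machine. The sketch below is one way to express them as pure functions; it is not the code in chat.jsx, and every name in it (ReasoningView, StreamSnapshot, shouldRender, onStreamUpdate, onToggleClick) is hypothetical.

```typescript
// Hypothetical names throughout: the PR text does not show ReasoningBlock's internals.
interface ReasoningView {
  open: boolean;        // true = clamp removed (the ".open" state)
  userToggled: boolean; // true once the user has clicked the toggle
}

interface StreamSnapshot {
  reasoning: string; // accumulated delta.reasoning_content
  content: string;   // accumulated answer text
}

const initialView: ReasoningView = { open: false, userToggled: false };

// Scenario 4: no reasoning means no block at all.
function shouldRender(s: StreamSnapshot): boolean {
  return s.reasoning.length > 0;
}

// Scenarios 1 and 2: follow the stream unless the user has taken over.
function onStreamUpdate(view: ReasoningView, s: StreamSnapshot): ReasoningView {
  if (view.userToggled) return view; // Scenario 3: the user's choice sticks
  const thinkingOnly = s.reasoning.length > 0 && s.content.length === 0;
  return { ...view, open: thinkingOnly };
}

// Scenario 3: a click flips the state and disables further automatic changes.
function onToggleClick(view: ReasoningView): ReasoningView {
  return { open: !view.open, userToggled: true };
}
```

Keeping the transitions pure would make the "click sticks" rule easy to check without a browser, although the Playwright specs in this PR exercise the behaviour through the rendered DOM instead.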

Files touched

- ui/src/api/hooks/useChatCompletions.ts: drop the
  content || reasoning fold; ChatResponse now exposes content +
  reasoning as separate fields. Streaming onDelta already surfaced
  both, but the final response object did not.
- ui/src/dash/chat.jsx: new ReasoningBlock component (auto-expand
  while streaming + no answer, auto-collapse once answer arrives, user
  click sticks). Wired into MessageList; assistant turn renders
  reasoning above the answer bubble.
- ui/src/dashboard.css: .bubble-reasoning styling (subtle
  --line-strong border-left, --fg-4 greyed text, 3-line clamp by
  default, .open removes the clamp). Existing tokens only, no new
  palette.
- ui/tests/e2e/specs/chat-reasoning.spec.ts: 4 Playwright specs
  covering each scenario against a stubbed SSE body.
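
To illustrate the change in useChatCompletions.ts, here is a minimal sketch of accumulating streamed deltas into a response that keeps content and reasoning apart. The field names content, reasoning and reasoning_content come from this PR; the Delta shape and the applyDelta and collect helpers are assumptions for illustration, and the real ChatResponse may carry more fields.

```typescript
// Assumed delta shape: only the two fields this PR mentions.
interface Delta {
  content?: string | null;
  reasoning_content?: string | null;
}

// Simplified stand-in for the hook's ChatResponse.
interface ChatResponse {
  content: string;
  reasoning: string;
}

// Hypothetical helper: append one delta to the running buffers.
function applyDelta(acc: ChatResponse, delta: Delta): ChatResponse {
  return {
    content: acc.content + (delta.content ?? ""),
    reasoning: acc.reasoning + (delta.reasoning_content ?? ""),
  };
}

// Hypothetical helper: build the final response without the old
// `content || reasoning` fold, so an empty answer stays empty.
function collect(deltas: Delta[]): ChatResponse {
  return deltas.reduce(applyDelta, { content: "", reasoning: "" });
}
```

With this shape, a reasoning-only stream yields an empty content string rather than having the reasoning copied into it, which is what lets the UI decide on its own where each buffer is rendered.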

Live LXC verification

Branch deployed to LXC 105 (10.0.1.142, primary slot Qwen3.5-0.8B-GGUF),
then restored to main after capture.

ssh hal0 \
  'cd /opt/hal0 && git checkout feat/chat-thinking-separator \
     && cd ui && rm -rf dist node_modules/.vite && npm run build \
     && systemctl restart hal0-api'

Prompt 1 — Reply with the exact word OK

Sent end-to-end against the live primary slot (Qwen3.5 with thinking
enabled). The model produced a ~600-char chain-of-thought followed by
the single word OK. Observed:

- Greyed THINKING ▾ block clamped to 3 lines, showing only the start of
  the thought trace ("Thinking Process: / 1. Analyze the Request:…").
- Below: separate white answer bubble containing just OK.
- Clicking the toggle flipped to THINKING ▴ and revealed all four
  numbered steps unbounded.

Prompt 2 — Why is the sky blue? Be brief.

The longer prompt streams for ~30s with substantial reasoning. The
captured collapsed state shows the sending… chip and the waiting bubble
during the empty-stream phase, as designed. The proxy buffers most of the
SSE stream, so the deltas land in one burst at the end; this is a backend
characteristic, not a UI issue, and the parser handles either pattern.
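
As a rough illustration of why burst and incremental delivery can be treated alike, the sketch below buffers incoming text and only emits complete `data:` lines. It is not the parser in this PR: createSseParser is a hypothetical helper, and it assumes one `data:` line per event and a `[DONE]` end sentinel, as is common for OpenAI-style /v1/chat/completions streams.

```typescript
// Hypothetical helper; the PR text does not show the actual parser.
function createSseParser(onData: (payload: string) => void) {
  let buffer = ""; // holds any trailing partial line between chunks
  return {
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        // Take one complete line and tolerate CRLF line endings.
        const line = buffer.slice(0, newline).replace(/\r$/, "");
        buffer = buffer.slice(newline + 1);
        if (line.startsWith("data:")) {
          const payload = line.slice(5).trim();
          // Assumed end-of-stream sentinel; skipped rather than emitted.
          if (payload !== "[DONE]") onData(payload);
        }
        newline = buffer.indexOf("\n");
      }
    },
  };
}
```

Because only complete lines are emitted, the callback sees the same payloads in the same order whether the proxy delivers the stream as many small chunks or as a single burst.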

Verification commands run

cd /tmp/hal0-chat-thinking/ui
npm run typecheck   # clean
npx playwright test # 53 passed (full suite incl. 4 new specs)

LXC restored to main after verification:

ssh hal0 \
  'cd /opt/hal0 && git checkout main \
     && cd ui && rm -rf dist node_modules/.vite && npm run build \
     && systemctl restart hal0-api'
# systemctl is-active hal0-api → active

Test plan

- Unit/E2E: 4 Playwright specs assert render-when-present,
  no-render-when-absent, 3-line clamp by default, click expands +
  reverts on second click.
- Live LXC end-to-end with the real Qwen3.5 primary slot.
- Full e2e suite green (53 passed / 16 unrelated skipped).
- Typecheck green.
- Clean vite build (no stale CSS chunks).

Out of scope

- Markdown rendering inside reasoning or answer (deferred).
- Persisting expand state across page reload.
- Per-model reasoning enable/disable toggles (backend concern).
- "Copy reasoning" button.

🤖 Generated with Claude Code

Render `delta.reasoning_content` from the SSE stream as a distinct block above the assistant's answer instead of folding it into `content` when the answer is empty.

- `useChatCompletions.ts`: stop the content||reasoning fold; ChatResponse now carries `content` + `reasoning` as separate fields and the streaming `onDelta` already surfaces both buffers.
- `chat.jsx`: new `ReasoningBlock` (greyed `--fg-4` text, 3-line `-webkit-line-clamp`, "thinking ▾/▴" toggle). Block auto-expands while reasoning streams without an answer yet, auto-collapses when the answer starts; the user's manual click sticks. No block renders when the model emits no reasoning.
- `dashboard.css`: `.bubble-reasoning` styling with a subtle border-left accent (`--line-strong`), 3-line clamp by default, `.open` removes the clamp.
- `chat-reasoning.spec.ts`: 4-scenario Playwright spec covering renders / doesn't-render / auto-collapse / manual-toggle, against a stubbed SSE body.

Full e2e suite stays green (53 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

thinmintdev · 2026-05-25T19:47:53Z

Live screenshots (LXC 105, Qwen3.5-0.8B-GGUF, prompt: `Reply with the exact word OK`)

Captured to /tmp/hal0-live-chat-ok-collapsed.png (~99 KB) and
/tmp/hal0-live-chat-ok-expanded.png (~125 KB) on hal0-dev — not
committed (out of scope per brief). Observed:

Collapsed. Greyed THINKING ▾ block with the model's chain-of-thought
truncated to 3 lines:

THINKING ▾
Thinking Process:
1. **Analyze the Request:**…

Below it, the answer bubble in primary white: OK.

Expanded (after clicking the toggle). THINKING ▴, full 4-step thought
trace unbounded:

THINKING ▴
Thinking Process:

1. **Analyze the Request:**
   * Task: Reply with the exact word "OK".
   * Constraint: The user wants a direct, unambiguous response.
   * Input: "Reply with the exact word OK"

2. **Determine the Output:**
   * The instruction is clear: The response must be the string "OK".
   * There are no additional constraints (e.g., "in a specific format",
     "no explanation", "use a specific tone").

3. **Verify the Output:**
   * Does the output match the constraint exactly? Yes.
   * Is there any hidden trap? Sometimes prompts like this test strict
     adherence. Since the word is "OK", I must output "OK".

4. **Formulate the Response:**
   * Output: OK

Answer bubble below it: OK.

Both the auto-collapse and the manual-override behaviour worked. The
backend SSE stream came from the unmodified /v1/chat/completions endpoint
on the live primary slot; the client now reads delta.reasoning_content
separately and feeds it to the new <ReasoningBlock>.

LXC was switched back to main immediately after capture
(systemctl is-active hal0-api → active).

…rough + gut installer auth section (#390)

- docs/operate/lemonade.md (new, .md canonical): operator reference for the v0.2 Lemonade runtime, covering what it is, where state lives, the /v1/* proxy + dispatcher fallthrough (PRs #248/#277), slot ↔ Lemonade model mapping (PRs #281/#282), max_loaded_models = 8 LRU cap (PR #283), per-type LRU eviction per ADR-0008 (supersedes nuclear-evict ADR-0007), OFFLINE-on-eviction (PR #276), and the three known v0.3 caveats (Vulkan KV gauge missing, whisper RUNPATH workaround, GPU cleanup unload hang).
- docs/dashboard/v3.md (new, .md canonical, new docs/dashboard/ dir): page-by-page tour of the v3 React dashboard shipped in v0.3.0-alpha.1 (PR #235). Covers the shell + Mock-badge convention, /dashboard (system overview after #356), /chat (real surface per #309/#314/#315/#351), /slots (sidebar mirror per #357 + #344 UX sweep), /models (#313/#319/#353), /mcp (#304/#300), /agents (Peers per #299), /memory (graph #297, throughput #308), Settings (no Auth tab post-ADR-0012), and the footer journal (Epic #322: PRs #321/#328/#329/#330/#332). Mock-fallback issues linked via the dashboard-v3 label, not enumerated.
- installer/README.md: gut ~95 lines of stale auth prose (Caddy, Bearer-token mint/use/revoke, first-run OTP claim wizard, HAL0_AUTH_ENABLED/HAL0_AUTH_DISABLED, password recovery, basic_auth upgrade path, the TLS recipe). Replace with one paragraph pointing at docs/operate/auth.mdx for the reverse-proxy recipe and docs/agents/identity.md for the X-hal0-Agent identity model. Auth was removed in v0.3.0-alpha.1 per ADR-0012; the README hadn't caught up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

thinmintdev merged commit abd39f3 into main May 25, 2026
4 checks passed

thinmintdev deleted the feat/chat-thinking-separator branch May 25, 2026 19:53
