
feat(dash): show reasoning above answer in chat (collapsible) #315

Merged
thinmintdev merged 1 commit into main from feat/chat-thinking-separator
May 25, 2026


@thinmintdev
Contributor

Summary

Render the model's reasoning (delta.reasoning_content from the SSE stream)
as a distinct block above the assistant's answer, instead of folding it
into content when the answer is empty. Reasoning is greyed (--fg-4),
truncated to 3 lines with a -webkit-line-clamp, and expandable via a
small thinking ▾ / ▴ toggle. When the model emits no reasoning, the
block does not render at all.
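The separation described above can be sketched as follows. This is a minimal illustration, not the code in `useChatCompletions.ts`: the `StreamDelta` and `SplitResponse` shapes and the function names are assumptions made for the example, and only the `content` / `reasoning_content` / `reasoning` field names come from this PR.

```typescript
// Hypothetical shapes for illustration; the real types live in
// ui/src/api/hooks/useChatCompletions.ts and are not shown in this PR text.
interface StreamDelta {
  content?: string;
  reasoning_content?: string;
}

interface SplitResponse {
  content: string;
  reasoning: string;
}

// Accumulate streamed deltas into two independent buffers so the final
// response keeps reasoning and answer apart.
function accumulateDeltas(deltas: StreamDelta[]): SplitResponse {
  const out: SplitResponse = { content: "", reasoning: "" };
  for (const d of deltas) {
    if (d.content) out.content += d.content;
    if (d.reasoning_content) out.reasoning += d.reasoning_content;
  }
  return out;
}

// The fold this PR removes, shown only for contrast: when the answer is
// empty, the reasoning text is returned as if it were the answer.
function foldedContent(r: SplitResponse): string {
  return r.content || r.reasoning;
}
```

With the fold gone, a turn that has reasoning but no answer yet reports an empty `content` rather than showing the thought trace inside the answer bubble.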

Design behaviour (4 scenarios)

  1. Reasoning streams first, answer empty. The block auto-expands so
    the user knows the model is thinking. Greyed text grows in place.
  2. Answer starts streaming. The block auto-collapses back to the
    3-line clamp; the answer fills its own bubble in white below.
  3. User clicks the toggle. Their choice sticks — further auto-collapse
    attempts are ignored for that message.
  4. Model emits no reasoning_content. No reasoning block renders;
    the answer bubble looks like a normal assistant turn.
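The four scenarios above can be modelled as a small state reducer. This is a sketch under assumptions, not the `ReasoningBlock` implementation: the state shape, the event names, and the rule that an empty string counts as "no reasoning" are all hypothetical.

```typescript
// Hypothetical state for one assistant message's reasoning block.
interface ReasoningState {
  open: boolean;        // true = clamp removed, full trace visible
  userToggled: boolean; // true once the user has clicked the toggle
}

type ReasoningEvent =
  | { type: "stream"; hasReasoning: boolean; hasAnswer: boolean }
  | { type: "toggle" };

const initialReasoningState: ReasoningState = { open: false, userToggled: false };

function reduceReasoning(state: ReasoningState, ev: ReasoningEvent): ReasoningState {
  if (ev.type === "toggle") {
    // Scenario 3: a manual click flips the block and marks the choice sticky.
    return { open: !state.open, userToggled: true };
  }
  // Scenario 3 (continued): automatic changes are ignored after a click.
  if (state.userToggled) return state;
  // Scenario 1: open while reasoning streams with no answer yet.
  // Scenario 2: collapse as soon as the answer starts.
  return { ...state, open: ev.hasReasoning && !ev.hasAnswer };
}

// Scenario 4: render nothing when the model emitted no reasoning.
// Treating only the empty string as "no reasoning" is an assumption here.
function shouldRenderReasoning(reasoning: string): boolean {
  return reasoning.length > 0;
}
```

Keeping the sticky flag in the same state object as `open` means the auto-collapse path needs a single guard and no separate effect to track the user's choice.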

Files touched

  • ui/src/api/hooks/useChatCompletions.ts — drop the
    content || reasoning fold; ChatResponse now exposes content +
    reasoning as separate fields. Streaming onDelta already surfaced
    both, but the final response object did not.
  • ui/src/dash/chat.jsx — new ReasoningBlock component (auto-expand
    while streaming + no answer, auto-collapse once answer arrives, user
    click sticks). Wired into MessageList; assistant turn renders
    reasoning above the answer bubble.
  • ui/src/dashboard.css — .bubble-reasoning styling (subtle
    --line-strong border-left, --fg-4 greyed text, 3-line clamp by
    default, .open removes the clamp). Existing tokens only — no new
    palette.
  • ui/tests/e2e/specs/chat-reasoning.spec.ts — 4 Playwright specs
    covering each scenario against a stubbed SSE body.
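A stubbed SSE body for specs like these could be built with a helper along the following lines. The helper name and the chunk envelope (`choices[0].delta`, terminated by a `[DONE]` frame) are assumptions based on the OpenAI-style `/v1/chat/completions` streaming format; the actual fixtures in `chat-reasoning.spec.ts` are not shown in this PR text.

```typescript
// Hypothetical helper: serialise a list of deltas into an SSE body string.
interface StubDelta {
  content?: string;
  reasoning_content?: string;
}

function buildSseBody(deltas: StubDelta[]): string {
  // One "data:" frame per delta, each terminated by a blank line.
  const frames = deltas.map(
    (delta) => `data: ${JSON.stringify({ choices: [{ index: 0, delta }] })}\n\n`,
  );
  // Assumed end-of-stream sentinel.
  return frames.join("") + "data: [DONE]\n\n";
}

// Example body for the "reasoning present" scenario.
const reasoningThenAnswer = buildSseBody([
  { reasoning_content: "Thinking Process: " },
  { reasoning_content: "1. Analyze the Request" },
  { content: "OK" },
]);
```

In a Playwright spec, a string like `reasoningThenAnswer` would typically be served from a `page.route` handler via `route.fulfill` with `contentType: "text/event-stream"`; omitting every `reasoning_content` delta gives the no-reasoning scenario.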

Live LXC verification

Branch deployed to LXC 105 (10.0.1.142, primary slot Qwen3.5-0.8B-GGUF),
then restored to main after capture.

ssh hal0 \
  'cd /opt/hal0 && git checkout feat/chat-thinking-separator \
     && cd ui && rm -rf dist node_modules/.vite && npm run build \
     && systemctl restart hal0-api'

Prompt 1 — Reply with the exact word OK

Sent end-to-end against the live primary slot (Qwen3.5 with thinking
enabled). The model produced a ~600-char chain-of-thought followed by
the single word OK. Observed:

  • Greyed THINKING ▾ block clamped to 3 lines, showing only the start of
    the thought trace ("Thinking Process: / 1. Analyze the Request:…").
  • Below: separate white answer bubble containing just OK.
  • Clicking the toggle flipped to THINKING ▴ and revealed all four
    numbered steps unbounded.

Prompt 2 — Why is the sky blue? Be brief.

The longer prompt streams for ~30s with substantial reasoning. The captured
collapsed state shows the sending… chip and the waiting bubble during the
empty-stream phase, as designed. The proxy buffers most of the SSE stream, so
the deltas land in one burst at the end; this is a backend characteristic,
not a UI issue, and the parser handles either pattern.
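To illustrate why a buffered burst and an incremental stream can be handled alike, here is a minimal sketch of a frame-buffering SSE parser. It is not the parser in this PR: the class name, the `choices[0].delta` envelope, and the `[DONE]` sentinel are assumptions, and it only recognises `\n\n` frame separators (not `\r\n\r\n`).

```typescript
interface ParsedDelta {
  content?: string;
  reasoning_content?: string;
}

// Hypothetical parser: buffers raw text and emits a delta only once its
// frame is complete, so chunk boundaries do not affect the result.
class SseDeltaParser {
  private buffer = "";

  // Feed one network chunk; returns the deltas completed by this chunk.
  push(chunk: string): ParsedDelta[] {
    this.buffer += chunk;
    const out: ParsedDelta[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const frame = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      for (const line of frame.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const payload = line.slice(5).trim();
        if (payload === "" || payload === "[DONE]") continue;
        const parsed = JSON.parse(payload) as {
          choices?: { delta?: ParsedDelta }[];
        };
        const delta = parsed.choices?.[0]?.delta;
        if (delta) out.push(delta);
      }
    }
    return out;
  }
}
```

Because incomplete frames stay in the buffer until their terminating blank line arrives, feeding the whole body in one call or one character at a time yields the same sequence of deltas.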

Verification commands run

cd /tmp/hal0-chat-thinking/ui
npm run typecheck   # clean
npx playwright test # 53 passed (full suite incl. 4 new specs)

LXC restored to main after verification:

ssh hal0 \
  'cd /opt/hal0 && git checkout main \
     && cd ui && rm -rf dist node_modules/.vite && npm run build \
     && systemctl restart hal0-api'
# systemctl is-active hal0-api → active

Test plan

  • Unit/E2E: 4 Playwright specs assert render-when-present,
    no-render-when-absent, 3-line clamp by default, click expands +
    reverts on second click.
  • Live LXC end-to-end with the real Qwen3.5 primary slot.
  • Full e2e suite green (53 passed / 16 unrelated skipped).
  • Typecheck green.
  • Clean vite build (no stale CSS chunks).

Out of scope

  • Markdown rendering inside reasoning or answer (deferred).
  • Persisting expand state across page reload.
  • Per-model reasoning enable/disable toggles (backend concern).
  • "Copy reasoning" button.

🤖 Generated with Claude Code

Render `delta.reasoning_content` from the SSE stream as a distinct block
above the assistant's answer instead of folding it into `content` when
the answer is empty.

- `useChatCompletions.ts`: stop the content||reasoning fold; ChatResponse
  now carries `content` + `reasoning` as separate fields and the streaming
  `onDelta` already surfaces both buffers.
- `chat.jsx`: new `ReasoningBlock` (greyed `--fg-4` text, 3-line
  `-webkit-line-clamp`, "thinking ▾/▴" toggle). Block auto-expands while
  reasoning streams without an answer yet, auto-collapses when the answer
  starts; the user's manual click sticks. No block renders when the
  model emits no reasoning.
- `dashboard.css`: `.bubble-reasoning` styling — subtle border-left
  accent (`--line-strong`), 3-line clamp by default, `.open` removes
  the clamp.
- `chat-reasoning.spec.ts`: 4-scenario Playwright spec covering renders
  / doesn't-render / auto-collapse / manual-toggle, against a stubbed
  SSE body. Full e2e suite stays green (53 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thinmintdev
Contributor Author

Live screenshots (LXC 105, Qwen3.5-0.8B-GGUF, prompt: Reply with the exact word OK)

Captured to /tmp/hal0-live-chat-ok-collapsed.png (~99 KB) and
/tmp/hal0-live-chat-ok-expanded.png (~125 KB) on hal0-dev — not
committed (out of scope per brief). Observed:

Collapsed. Greyed THINKING ▾ block with the model's chain-of-thought
truncated to 3 lines:

THINKING ▾
Thinking Process:
1. **Analyze the Request:**…

Below it, the answer bubble in primary white: OK.

Expanded (after clicking the toggle). THINKING ▴, full 4-step thought
trace unbounded:

THINKING ▴
Thinking Process:

1. **Analyze the Request:**
   * Task: Reply with the exact word "OK".
   * Constraint: The user wants a direct, unambiguous response.
   * Input: "Reply with the exact word OK"

2. **Determine the Output:**
   * The instruction is clear: The response must be the string "OK".
   * There are no additional constraints (e.g., "in a specific format",
     "no explanation", "use a specific tone").

3. **Verify the Output:**
   * Does the output match the constraint exactly? Yes.
   * Is there any hidden trap? Sometimes prompts like this test strict
     adherence. Since the word is "OK", I must output "OK".

4. **Formulate the Response:**
   * Output: OK

Answer bubble below it: OK.

Both the auto-collapse and the manual-override behaviours worked. The backend
SSE was the unmodified /v1/chat/completions from the live primary slot; the
client now reads delta.reasoning_content separately and feeds it to the
new <ReasoningBlock>.

LXC was switched back to main immediately after capture
(systemctl is-active hal0-api → active).

@thinmintdev thinmintdev merged commit abd39f3 into main May 25, 2026
4 checks passed
@thinmintdev thinmintdev deleted the feat/chat-thinking-separator branch May 25, 2026 19:53
thinmintdev added a commit that referenced this pull request May 28, 2026
…rough + gut installer auth section (#390)

- docs/operate/lemonade.md (new, .md canonical): operator reference for
  the v0.2 Lemonade runtime — what it is, where state lives, the /v1/*
  proxy + dispatcher fallthrough (PRs #248/#277), slot ↔ Lemonade
  model mapping (PRs #281/#282), max_loaded_models = 8 LRU cap (PR
  #283), per-type LRU eviction per ADR-0008 (supersedes nuclear-evict
  ADR-0007), OFFLINE-on-eviction (PR #276), and the three known v0.3
  caveats (Vulkan KV gauge missing, whisper RUNPATH workaround, GPU
  cleanup unload hang).

- docs/dashboard/v3.md (new, .md canonical, new docs/dashboard/ dir):
  page-by-page tour of the v3 React dashboard shipped in
  v0.3.0-alpha.1 (PR #235). Covers the shell + Mock-badge convention,
  /dashboard (system overview after #356), /chat (real surface per
  #309/#314/#315/#351), /slots (sidebar mirror per #357 + #344 UX
  sweep), /models (#313/#319/#353), /mcp (#304/#300), /agents (Peers
  per #299), /memory (graph #297, throughput #308), Settings (no Auth
  tab post-ADR-0012), and the footer journal (Epic #322 — PRs
  #321/#328/#329/#330/#332). Mock-fallback issues linked via the
  dashboard-v3 label, not enumerated.

- installer/README.md: gut ~95 lines of stale auth prose (Caddy,
  Bearer-token mint/use/revoke, first-run OTP claim wizard,
  HAL0_AUTH_ENABLED/HAL0_AUTH_DISABLED, password recovery, basic_auth
  upgrade path, the TLS recipe). Replace with one paragraph pointing
  at docs/operate/auth.mdx for the reverse-proxy recipe and
  docs/agents/identity.md for the X-hal0-Agent identity model. Auth
  was removed in v0.3.0-alpha.1 per ADR-0012; the README hadn't
  caught up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>