Improve v9 tests #66

Closed
Bojan131 wants to merge 3 commits into main from v9-tests


Conversation

Contributor

@Bojan131 Bojan131 commented Apr 1, 2026

  • Audit and harden all existing tests across 14 test files in 8 packages to eliminate false positives, silent skips, and loose assertions
  • Replace if (skipSuite) return with ctx.skip() in chain/agent E2E tests so skipped tests are visible in CI reports instead of silently passing
  • Fix mcp-server/test/tools.test.ts — removed dead import that would hang tests by triggering stdio connection
  • Fix evm-module/test/unit/Ask.test.ts — replace gte(0n) with exact value assertions, remove .catch(() => {}) swallowed errors, use deterministic inputs instead of Math.random()
  • Tighten core test assertions: toBeFalsy() → toBe('') in proto, exact canonical output checks in crypto, snapshot-based genesis integrity hashes, edge-case paranet ID tests in constants, fix conditional guard in oracle-verify that could skip tamper detection
  • Fix cli/config.test.ts — remove flawed process.chdir('/tmp') test that always produced false negatives, add explicit skip warnings
  • Expand attested-assets/gossip-handler.test.ts with full AKAGossipHandler coverage (subscribe/unsubscribe/publish/event dispatch)
  • Add genesis snapshot file to lock down integrity hashes and detect accidental modifications
  • Update genesis quad count assertion (34 → 40) to match current main

const tstake = await AskStorage.totalActiveStake();
expect(wsum).to.be.gte(0n);
expect(tstake).to.be.gte(0n);
expect(await AskStorage.weightedActiveAskSum()).to.equal(remainingAfterPartial * newAsk);

🔴 Bug: Ask.recalculateActiveSet() does not keep every node in the aggregate after updateAsk(). It filters by askLowerBoundFactor/askUpperBoundFactor around the previous weighted average, so the first newAsk = 500n here is already outside the allowed band after 300n and the active set should drop out instead of equaling remainingAfterPartial * newAsk. Please derive these expectations from the contract’s bound logic, or choose ask values that stay inside the active-set window. The same assumption shows up again in the later ask-change / multi-node exact-sum cases below.

const total3 = await AskStorage.totalActiveStake();
expect(weighted3).to.be.gte(0n);
expect(total3).to.be.lte(stAmount);
expect(total3).to.equal(remainingStake);

🔴 Bug: after withdrawing 79_999 from 80_000, this node is below ParametersStorage.minimumStake() (50,000 ether). requestWithdrawal() removes it from the sharding table and Ask.recalculateActiveSet() excludes it, so totalActiveStake and weightedActiveAskSum should both be 0 here, not the 1-ether remainder.

function createTestServer(): McpServer {
async function createServerAndClient(): Promise<{ client: Client }> {
// src/index.ts calls main() at module scope (connects stdio), so we can't
// import the real server directly. Instead we replicate the tool registration

🟡 Issue: this test no longer exercises the real server registration path; it reimplements a second copy of the production logic instead. That means drift in src/index.ts (tool names, SPARQL_ONLY gating, formatting, adapter loading, etc.) can ship while this suite still passes. A safer fix is to extract tool registration into an exported helper and have both src/index.ts and this test call that shared implementation.

const config = await loadNetworkConfig();
if (!config) return;
expect(config.networkName).toBeDefined();
if (!config) {

🟡 Issue: returning early here turns a missing network/testnet.json into a passing test, so a path-resolution regression in loadNetworkConfig() will no longer fail CI. If this case is truly optional, mark it skipped via Vitest; otherwise, since this suite is meant to run from the monorepo, treat null as a failure.

expect(paranetFinalizationTopic('testing')).toBe('dkg/paranet/testing/finalization');
});

it('handles empty string paranet ID', () => {

🟡 Issue: these assertions bake malformed paranet IDs ('' here, and a/b below) into the supported contract even though the rest of the system treats the ID as a single topic/URI segment. That makes future validation look like a breaking change and normalizes outputs like did:dkg:paranet: and dkg/paranet/a/b/.... Prefer limiting coverage to documented valid IDs, or add explicit validation/rejection tests instead.

}));

function createTestServer(): McpServer {
async function createServerAndClient(): Promise<{ client: Client }> {

🟡 Issue: This helper now re-implements the MCP server instead of exercising src/index.ts, so the suite can stay green while production drifts. dkg_file_summary is already simplified here compared with the real handler. Please extract a server factory / registerTools() from production and instantiate that in the test rather than maintaining a second copy of the tool logic.
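
A minimal sketch of the shared-registration idea suggested above, under stated assumptions: `ToolRegistrar`, `RegisterOptions`, `RecordingRegistrar`, and the `dkg_query` tool name are hypothetical stand-ins, and the real MCP server API and the exact meaning of the SPARQL_ONLY gate may differ. Only `dkg_file_summary` is a name taken from the review.

```typescript
// Hypothetical minimal surface; the real McpServer API is richer than this.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

interface ToolRegistrar {
  registerTool(name: string, handler: ToolHandler): void;
}

interface RegisterOptions {
  // Stands in for the SPARQL_ONLY gating mentioned in the review (assumed semantics).
  sparqlOnly: boolean;
}

// Single source of truth: production entry point and tests both call this,
// so tool names and gating cannot drift between two copies.
function registerTools(server: ToolRegistrar, opts: RegisterOptions): string[] {
  const registered: string[] = [];
  const add = (name: string, handler: ToolHandler): void => {
    server.registerTool(name, handler);
    registered.push(name);
  };

  // 'dkg_query' is a hypothetical tool name used for illustration only.
  add('dkg_query', async (args) => `query: ${String(args.sparql ?? '')}`);

  if (!opts.sparqlOnly) {
    add('dkg_file_summary', async (args) => `summary of ${String(args.path ?? '')}`);
  }
  return registered;
}

// In-memory registrar a test can inspect without opening a stdio connection.
class RecordingRegistrar implements ToolRegistrar {
  readonly tools = new Map<string, ToolHandler>();

  registerTool(name: string, handler: ToolHandler): void {
    this.tools.set(name, handler);
  }
}
```

With this shape, the production module would build the real server and pass it to `registerTools`, while the test passes a `RecordingRegistrar` (or a real server wired to an in-memory transport) and asserts on what was registered.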

onMessage: vi.fn((topic: string, handler: GossipMessageHandler) => {
handlers.set(topic, handler);
}),
offMessage: vi.fn((topic: string) => {

🟡 Issue: This mock offMessage ignores the handler argument and deletes by topic only. That means the unsubscribe tests would still pass even if AKAGossipHandler unregisters a different callback than the one it subscribed. Mirror the real GossipSubManager.offMessage(topic, handler) behavior here and assert the same function reference is removed.
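
One way the mock could mirror per-handler removal is sketched below. The `GossipMessageHandler` signature and the `createFakeGossip` helper are assumptions made for illustration; the real `GossipSubManager` types are not shown in this thread.

```typescript
// Assumed handler shape; the real GossipMessageHandler type may differ.
type GossipMessageHandler = (topic: string, data: Uint8Array) => void;

function createFakeGossip() {
  // Track every handler per topic so removal can be checked by reference.
  const handlers = new Map<string, Set<GossipMessageHandler>>();

  return {
    onMessage(topic: string, handler: GossipMessageHandler): void {
      const set = handlers.get(topic) ?? new Set<GossipMessageHandler>();
      set.add(handler);
      handlers.set(topic, set);
    },
    // Removes only the exact function reference that was registered.
    offMessage(topic: string, handler: GossipMessageHandler): void {
      const set = handlers.get(topic);
      if (!set) return;
      set.delete(handler);
      if (set.size === 0) handlers.delete(topic);
    },
    handlerCount(topic: string): number {
      return handlers.get(topic)?.size ?? 0;
    },
    // Delivers a message to every handler still registered for the topic.
    emit(topic: string, data: Uint8Array): void {
      for (const handler of handlers.get(topic) ?? []) handler(topic, data);
    },
  };
}
```

An unsubscribe test built on this fake fails if the code under test passes a different callback to `offMessage`, because the original handler stays registered and `handlerCount` does not drop to zero.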

expect(mockGossip.publish).toHaveBeenCalledWith(topic, expect.any(Uint8Array));
});

it('incoming gossip message dispatches to registered event handlers', () => {

🟡 Issue: The test name says it covers dispatch, but it only exercises the malformed-payload drop path. The happy path from decodeAKAEvent() to the registered onEvent handler is still untested, so a decode/dispatch regression would slip through. Add a valid encoded event assertion here and keep the malformed case as a separate test.

});

it('Repeated distributing rewards, then partial withdraw that puts node below min stake, verifying Ask excludes node', async () => {
it('Restake operator fees then partial withdraw: sums remain consistent', async () => {

🟡 Issue: The withdrawal half of this scenario was removed, so the test no longer covers the accounting transition it claims to verify. As written it only checks restake math; a regression in requestWithdrawal updating weightedActiveAskSum / totalActiveStake would now pass. Reintroduce the partial withdrawal and assert the post-withdraw sums.

branarakic pushed a commit that referenced this pull request Apr 3, 2026
Cherry-pick test improvements from Bojan131's PR #66, adapted for V10 naming:
- Replace silent `if (skip) return` with ctx.skip() in chain/agent E2E tests
- Fix mcp-server dead import that could hang tests
- Tighten Ask.test.ts: exact values instead of gte(0n), deterministic inputs
- Tighten core assertions: toBeFalsy() → toBe(''), exact canonical outputs
- Fix oracle-verify conditional guard that could skip tamper detection
- Expand attested-assets gossip-handler coverage (subscribe/unsubscribe/publish)
- Add genesis snapshot for integrity hash detection
- Update topic expectations from dkg/paranet/ to dkg/context-graph/ (V10)
- Update genesis snapshots for V10 URI scheme

Skipped: lift-job source deletion and publish-flow.md removal (kept for RC1 stability).
Made-with: Cursor
@Bojan131 Bojan131 closed this Apr 14, 2026
@Bojan131 Bojan131 deleted the v9-tests branch April 14, 2026 14:47
Jurij89 pushed a commit that referenced this pull request May 14, 2026
…h items

UI-lead's WCAG sweep on PR #516 surfaced three blocking contrast
fails plus two pre-existing items that PR4 made more visible. All
five are single-property swaps — same recipe as the `.v10-md-hr`
fix that already landed.

**Blockers (all bubble-removal regressions)**

Root cause: PR4 dropped the assistant bubble, so markdown surfaces
now sit on `--bg-panel` directly. Three markdown containers were
styled against the previous `--bg-surface` bubble interior and used
`--border-subtle` / `--border-strong` — both fail WCAG 1.4.11 (3:1
non-text) against `--bg-panel` in either theme.

- Task #65 `.v10-md-pre` (code block) — outline `--border-subtle` →
  `--border-prominent`. The `--bg-elevated` fill is barely a lift
  over `--bg-panel` (~1.05-1.11:1), so the border is the only visual
  cue defining the block.

- Task #66 `.v10-md-table-scroll` (table container) — same swap.
  Outer border was 1.14:1 dark / 1.16:1 light against panel.

- Task #67 `.v10-md-blockquote` (left rail) — `--border-strong` →
  `--border-prominent`. The rail is the only visual cue; the 5%
  text-tertiary wash inside is imperceptible on panel. Strong was
  1.57:1 dark / 2.31:1 light; prominent clears 5.58 / 7.17.

**Pre-existing polish, surfaced by PR4**

- Task #68 `.v10-local-agent-msg-time` — `--text-tertiary` →
  `--text-secondary`. Tertiary on bg-panel clears only 2.31-2.66:1
  (fails AA). Pre-existing fail, but PR4's expanded "May 14, 2026,
  10:05 PM" format makes the strings more prominent so it's worth
  fixing now. Secondary clears 5.58 / 7.17.

- Task #69 `var(--panel-elevated)` (in-flight message attachment
  chip inline style at PanelRight.tsx:1326) — token is undefined;
  correct name is `--bg-elevated`. The silent fallback used to
  coincidentally land near `--bg-surface` and look right; with the
  bubble removed it now falls back to the panel and the chip blends
  into its parent. One-token fix.

All 546 unit tests still pass. No new tests added — these are
visual/CSS-only changes with the contrast math already verified by
UI-lead. PR4 contrast story: assistant text 16.4:1 dark / 16.2:1
light; non-text surfaces all ≥3:1 in both themes.
Jurij89 added a commit that referenced this pull request May 15, 2026
… states + timestamp expansion (#516)

* feat(node-ui): PR4 chat panel polish — full-width assistant + send-button states + timestamp expansion

Consolidation pass on the chat panel after the 3-PR revamp landed
(#503 / #504 / #505). Five distinct improvements rolled into one PR:

1. Drop the assistant bubble — full-width content.
   User messages stay as a right-aligned pill (unchanged). Assistant
   replies now render full-width without a background, border, or
   max-width constraint — matching Claude Desktop / ChatGPT / VS Code
   Copilot. The `.v10-chat-msg.assistant` wrapper switches to
   `align-items: stretch`; `.v10-chat-bubble.assistant` keeps only
   typography rhythm. As a side benefit the dark-mode contrast
   complaint disappears — assistant text inherits --text-primary on
   --bg-panel directly (~16.5:1 dark, ~13.8:1 light, well above
   WCAG AAA).

2. Send button state machine: idle / uploading / streaming.
   - Idle (default): ArrowUp icon, normal send semantics.
   - Uploading attachments: lucide `Loader2` spinner with a CSS spin
     animation; button is informational and disabled until upload
     settles. Honors `prefers-reduced-motion`.
   - Streaming an assistant reply: lucide `Square` (stop) icon; click
     re-binds to `onStopLocalStream` and aborts the in-flight
     AbortController. The existing `catch (err: any)` in
     `sendLocalMessage` already handles `err?.name === 'AbortError'`
     by setting the assistant bubble to "Request cancelled.", so a
     single `.abort()` is enough — no extra teardown.
   New `stopLocalStream` callback exposed from the host through a new
   `onStopLocalStream` prop on `ConnectedAgentsTab`.

3. Expand the inline timestamp format to include the date.
   `formatLocalTimestamp` now uses
   `toLocaleString({ dateStyle: 'medium', timeStyle: 'short' })` so
   each bubble reads e.g. "May 14, 2026, 10:05 PM" instead of just
   "10:05 PM". The two inline `new Date().toLocaleTimeString(...)`
   call-sites (user-send + assistant-complete) now route through the
   same helper for a single consistent format across history-loaded,
   live-sent, and stream-completed timestamps.

4. WCAG 1.4.11 (3:1 non-text) polish.
   - `.v10-attachment-chip-remove`: bumped from `--text-tertiary` to
     `--text-secondary` (~4.07:1 dark / ~7.1:1 light against the
     chip's --bg-elevated background). Hover still promotes to
     --text-primary.
   - `.v10-md-hr`: bumped from `--border-subtle` to
     `--border-prominent`. Subtle / default / strong all failed 3:1
     against --bg-panel in both themes; --border-prominent is the same
     chip-outline token already in use and clears 3:1 comfortably.

5. Tests.
   - `formatLocalTimestamp` test pins the new date+time output
     (locale-resilient — asserts year + colon/AM/PM rather than the
     exact string).
   - Two new send-button state tests: uploading mode shows the
     spinner SVG + "Uploading attachments" aria-label; streaming mode
     shows the stop button + "Stop reply" aria-label.
   - Bubble-removal test pins down that `.v10-chat-bubble.assistant`
     carries no inline background/border attributes.
   - 4 test fixtures gain `onStopLocalStream: noop` for the new prop.

Out of scope (deferred separately): Select typeahead, hover-only
timestamps. Industry-aligned no-bubble layout pattern referenced
from Claude Desktop / ChatGPT / VS Code Copilot.
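
The send-button state machine from item 2 can be sketched as a pure function. This is an illustration under assumptions, not the component code: the `SendButtonView` shape and the `sendButtonView` name are hypothetical, and giving streaming precedence over uploading is a guess the commit message does not confirm.

```typescript
type SendButtonMode = 'idle' | 'uploading' | 'streaming';

interface SendButtonView {
  mode: SendButtonMode;
  icon: 'ArrowUp' | 'Loader2' | 'Square'; // lucide icon names from the commit message
  disabled: boolean;
  action: 'send' | 'none' | 'stop';
}

// Assumption: a streaming reply takes precedence over an in-flight upload.
function sendButtonView(state: { isUploading: boolean; isStreaming: boolean }): SendButtonView {
  if (state.isStreaming) {
    // Click re-binds to the stop callback and aborts the in-flight request.
    return { mode: 'streaming', icon: 'Square', disabled: false, action: 'stop' };
  }
  if (state.isUploading) {
    // Informational spinner; the button stays disabled until the upload settles.
    return { mode: 'uploading', icon: 'Loader2', disabled: true, action: 'none' };
  }
  return { mode: 'idle', icon: 'ArrowUp', disabled: false, action: 'send' };
}
```

Keeping the mapping in one function makes each state testable without rendering the component.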

* fix(node-ui): PR4 UX-lead round-2 — stop-button distinction, ARIA wording, time semantic

Three P1 fixes from UX-lead's review of PR #516:

- P1-B: distinguish the streaming Stop button from the idle Send
  button so a user typing a follow-up mid-stream doesn't reflexively
  click the same-looking surface and accidentally abort the reply.
  Switch the filled silhouette for an outlined-square treatment
  (`--bg-active` surface + `--border-prominent` outline) — same
  shape, different visual reading. Matches Claude / ChatGPT's
  stop-button pattern. `--border-prominent` against `--bg-active`
  clears WCAG 1.4.11 3:1 in both themes.

- P1-C: aria-label / title wording. WAI-ARIA APG: button labels
  describe the action (or its current unavailability), not narrate
  state. `"Uploading attachments"` reads as narration; the new
  `"Send message (attachments uploading)"` reads as role + reason —
  matches what screen readers expect.

- P1-A (minimum version): wrap the inline timestamp in `<time
  dateTime={tsRaw}>{ts}</time>` for screen-reader / machine-parseable
  semantics. Added a companion `toIsoTimestamp` helper and a new
  `tsRaw` field on `LocalAgentMessage`; the three timestamp-creation
  sites (history-load, user-send, assistant-complete) now write both
  `ts` (display) and `tsRaw` (ISO) so they always point at the same
  instant. The full "X minutes ago" + hover-only relative-time
  treatment is deferred per user direction — a separate PR will
  layer it on top of this semantic foundation.
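
A minimal sketch of writing `ts` and `tsRaw` from one `Date`, so both fields always describe the same instant. The helper signatures and the `stampMessage` wrapper are assumptions; the real `formatLocalTimestamp` and `toIsoTimestamp` may accept different inputs.

```typescript
interface MessageTimestamps {
  ts: string;    // human-readable display string
  tsRaw: string; // ISO 8601 string for <time dateTime={tsRaw}>
}

// Same options the commit message describes: medium date plus short time.
const DISPLAY_OPTIONS: Intl.DateTimeFormatOptions = { dateStyle: 'medium', timeStyle: 'short' };

function formatLocalTimestamp(date: Date): string {
  // `undefined` selects the runtime's default locale.
  return date.toLocaleString(undefined, DISPLAY_OPTIONS);
}

function toIsoTimestamp(date: Date): string {
  return date.toISOString();
}

// Hypothetical wrapper: derive both fields from a single Date instance.
function stampMessage(date: Date = new Date()): MessageTimestamps {
  return { ts: formatLocalTimestamp(date), tsRaw: toIsoTimestamp(date) };
}
```

Because the display string depends on locale and time zone, assertions are safer against `tsRaw` and against the same `Intl` options than against a hard-coded display string.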

Affected test updated: send-button uploading-state assertion now
pins down the new aria-label format ("Send message (attachments
uploading)"). 546 / 38 skipped, 0 failed.

* fix(node-ui): PR4 UI-lead audit — three contrast blockers + two polish items

UI-lead's WCAG sweep on PR #516 surfaced three blocking contrast
fails plus two pre-existing items that PR4 made more visible. All
five are single-property swaps — same recipe as the `.v10-md-hr`
fix that already landed.

**Blockers (all bubble-removal regressions)**

Root cause: PR4 dropped the assistant bubble, so markdown surfaces
now sit on `--bg-panel` directly. Three markdown containers were
styled against the previous `--bg-surface` bubble interior and used
`--border-subtle` / `--border-strong` — both fail WCAG 1.4.11 (3:1
non-text) against `--bg-panel` in either theme.

- Task #65 `.v10-md-pre` (code block) — outline `--border-subtle` →
  `--border-prominent`. The `--bg-elevated` fill is barely a lift
  over `--bg-panel` (~1.05-1.11:1), so the border is the only visual
  cue defining the block.

- Task #66 `.v10-md-table-scroll` (table container) — same swap.
  Outer border was 1.14:1 dark / 1.16:1 light against panel.

- Task #67 `.v10-md-blockquote` (left rail) — `--border-strong` →
  `--border-prominent`. The rail is the only visual cue; the 5%
  text-tertiary wash inside is imperceptible on panel. Strong was
  1.57:1 dark / 2.31:1 light; prominent clears 5.58 / 7.17.

**Pre-existing polish, surfaced by PR4**

- Task #68 `.v10-local-agent-msg-time` — `--text-tertiary` →
  `--text-secondary`. Tertiary on bg-panel clears only 2.31-2.66:1
  (fails AA). Pre-existing fail, but PR4's expanded "May 14, 2026,
  10:05 PM" format makes the strings more prominent so it's worth
  fixing now. Secondary clears 5.58 / 7.17.

- Task #69 `var(--panel-elevated)` (in-flight message attachment
  chip inline style at PanelRight.tsx:1326) — token is undefined;
  correct name is `--bg-elevated`. The silent fallback used to
  coincidentally land near `--bg-surface` and look right; with the
  bubble removed it now falls back to the panel and the chip blends
  into its parent. One-token fix.

All 546 unit tests still pass. No new tests added — these are
visual/CSS-only changes with the contrast math already verified by
UI-lead. PR4 contrast story: assistant text 16.4:1 dark / 16.2:1
light; non-text surfaces all ≥3:1 in both themes.

* fix(node-ui): PR4 Codex round-1 — per-conversation abort + unified canSend gate

Four critical Codex comments on PR #516, two root causes:

**Per-conversation abort controllers (CIV4a / CIcaM / CIlg0)**

Three independent reports flagged the same bug: `localAbortRef` was a
single global `useRef<AbortController | null>`, but `localSending` is
tracked per conversation. Concurrent streams or a quick switch between
conversations would silently overwrite the ref — clicking Stop in
conversation A could then abort conversation B's request (or no-op
if A's stream had finished and cleared the ref).

Replaced with `useRef<Map<string, AbortController>>` keyed by
`conversationKey`. Three call-sites updated:
- `sendLocalMessage` stores `controller` under its `conversationKey`.
- `finally` does a compare-and-delete so a late teardown from a
  prior request can't wipe a newer same-key entry on retry.
- `stopLocalStream` looks up the controller for
  `selectedConversationKey` and aborts only that one.
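
The three call-sites above can be sketched outside React as a small registry. This is an illustration under assumptions: the `ConversationAborts` class and its method names are hypothetical, and the real code keeps the `Map` inside a `useRef`.

```typescript
class ConversationAborts {
  private readonly controllers = new Map<string, AbortController>();

  // Called when a request starts: store a fresh controller under its conversation key.
  start(conversationKey: string): AbortController {
    const controller = new AbortController();
    this.controllers.set(conversationKey, controller);
    return controller;
  }

  // Called from `finally`: compare-and-delete, so a late teardown from an
  // older request cannot remove a newer controller stored under the same key.
  finish(conversationKey: string, controller: AbortController): void {
    if (this.controllers.get(conversationKey) === controller) {
      this.controllers.delete(conversationKey);
    }
  }

  // Called by the Stop button: abort only the selected conversation's request.
  stop(conversationKey: string): boolean {
    const controller = this.controllers.get(conversationKey);
    if (!controller) return false;
    controller.abort();
    return true;
  }

  has(conversationKey: string): boolean {
    return this.controllers.has(conversationKey);
  }
}
```

Keying by conversation means Stop in conversation A can never reach conversation B's controller, and the identity check in `finish` covers the retry case described above.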

**Unified `canSend` gate (CIlgu)**

The button correctly disabled itself when any draft was `uploading`,
but the textarea Enter / Cmd+Enter handlers still consulted only the
original "inputDisabled / sendable drafts" gate. A user pressing Enter
mid-upload would race `prepareAttachmentDraftsForSend`, which treats
`uploading` drafts as sendable — either starting a second import for
the same file or pushing the turn before the first upload finished.

Added a single `canSend` flag computed from `!inputDisabled &&
!isUploadingAttachments && has-text-or-sendable-drafts`. Both the
button's `disabled` prop and the two Enter handlers consult it, so the
two surfaces stay in lockstep.
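
A minimal sketch of such a shared gate, assuming a hypothetical `ComposerState` shape; trimming the text before checking it is also an assumption, not something the commit message states.

```typescript
interface ComposerState {
  inputDisabled: boolean;
  isUploadingAttachments: boolean;
  text: string;
  sendableDraftCount: number;
}

// One predicate consulted by both the button and the keyboard handlers.
function canSend(state: ComposerState): boolean {
  const hasContent = state.text.trim().length > 0 || state.sendableDraftCount > 0;
  return !state.inputDisabled && !state.isUploadingAttachments && hasContent;
}

// Enter / Cmd+Enter handler: a no-op unless the shared gate is open.
function handleEnter(state: ComposerState, send: () => void): boolean {
  if (!canSend(state)) return false;
  send();
  return true;
}
```

The button would use `disabled={!canSend(state)}`, so the click path and the keyboard path cannot disagree about whether a send is allowed.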

Coverage: two new composer-autosize tests pin Enter and Cmd+Enter both
becoming no-ops while a draft is `uploading`. 548 / 38 skipped / 0
failed.

* fix(node-ui): PR4 round-2 — un-escape literal \n on history-load (refresh regression)

Live-streamed agent text arrives with real newline characters and renders
markdown correctly. The DKG-memory persistence path, though, encodes
those newlines as literal `\n` (backslash + the letter n). On panel
refresh / history reload the literal characters survived into the
React state and the markdown renderer treated the entire content as
one long paragraph — code fences didn't open, paragraphs ran together,
table separators stayed as `|---|`-as-text.

Root cause: PR3 round-17 (Codex CHWpS) removed the global
`replace(/\\n/g, '\n')` from `normalizeMessageContent` because it
corrupted legitimate `\n` content in live-stream code samples (JSON
like `{"text":"a\nb"}`, shell like `echo -e "a\nb"`). That fix is
still right for live content. The mistake was extending it to history
content, where the persistence layer has already encoded the newlines.

Symmetric fix at the transport boundary — exactly the place Codex
itself recommended ("the right place is the transport boundary, not
the renderer"). New `unescapeNewlinesFromHistory` helper applied
only in `mapHistoryMessage`, leaving the live-stream path
(`sendLocalMessage`) untouched so CHWpS stays addressed for typing-
during-stream and other live transports.

Known tradeoff: a code sample that intentionally contains literal
`\n` AND was later persisted via history will get its escape
unwrapped on reload. Fixing it cleanly requires the persistence
layer to round-trip strings faithfully (emit raw UTF-8 with real
newlines instead of JSON-escaped). Worth a daemon-side follow-up;
meanwhile the markdown-broken-after-refresh regression was the worse
user-visible problem and is what this PR is meant to polish.

Test impact:
- `openclaw-bridge.test.ts`'s static-text assertion updated for the
  new `decodedText` variable name and the new helper call.
- All 548 tests still pass.

* fix(node-ui): PR4 Codex round-2 — narrow history newline-decode heuristic (CLWmd)

Round-1 of PR4 round-2 fixed the "markdown breaks after refresh"
regression by blanket-decoding literal `\n` (backslash + n) into real newlines in history-loaded text.
Codex CLWmd flagged the lossy side of that — same concern PR3
round-17 (CHWpS) raised about the original global rewrite: code /
JSON samples containing intentional literal `\n` got mangled.
Windows `\r\n` also half-decoded to `\r` + real newline.

Detection heuristic: persisted-and-escaped content has zero real
newlines (the persistence layer replaced them all). Live content
that round-tripped correctly keeps its real newlines. So:

- If `text` already contains ANY real newline or carriage return,
  treat the literal `\n` sequences as intentional and skip the decode.
- Otherwise decode literal `\r\n` first (to avoid the half-decode
  Codex flagged), then literal `\n`.

False-positive scope: a single-line agent message that intentionally
contains literal `\n` AND zero real newlines AND was persisted —
that combination still gets unwrapped. Rare enough to accept;
clean fix is daemon-side (persistence should round-trip strings
faithfully, emit raw UTF-8 with real newlines).
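
The round-2 heuristic described above can be sketched as follows. This is an illustration only: the `decodeIfFullyEscaped` name is hypothetical, and later rounds in this same commit narrow and then revert the approach.

```typescript
// Sketch of the round-2 heuristic: decode only when the text contains
// no real newline or carriage return at all.
function decodeIfFullyEscaped(text: string): string {
  if (/[\n\r]/.test(text)) {
    // Real line breaks present: treat any literal backslash-n as intentional.
    return text;
  }
  return text
    .replace(/\\r\\n/g, '\n') // literal \r\n first, to avoid a half-decoded \r
    .replace(/\\n/g, '\n');   // then the remaining literal \n sequences
}
```

The accepted false positive is visible directly: a single-line string such as `{"text":"a\nb"}` (with a literal backslash-n) has no real newline, so its escape is decoded even though the sender meant it literally.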

New test pins down 4 categories: persisted-escaped decodes, live
content with literals is preserved, CRLF decodes to LF without
half-decode, edge cases (undefined / empty / no-escapes / plain
multi-line). 549 tests passing.

* fix(node-ui): PR4 Codex round-3 — narrow history-decode to markdown-structure markers (CNGB8)

Codex CNGB8 flagged that round-2's "no real newlines anywhere"
heuristic still mangled single-line JSON examples that intentionally
contain literal `\n` — `{"text":"a\nb"}`, `echo -e "a\nb"`,
prompts discussing escape sequences.

Narrowed the decode further: now requires the text to also carry an
unambiguous markdown-structure marker AFTER a `\n`. Seven markers
checked:
  - `\n\n`          paragraph break
  - `\n#`            heading
  - `\n- ` / `\n* `       bullet list
  - `\n[0-9]+. `     ordered list
  - `\n` + three backticks     fenced code block
  - `\n|`            table row
  - `\n>`            blockquote

CRLF content (`\r\n` between paragraphs) is handled by
pre-collapsing `\r\n` → `\n` for the marker probe only — keeps
the regex set compact and lets both LF and CRLF payloads share one
detection path. The actual decode still runs `\r\n` first so we
don't leave a half-decoded `\r` + real newline (Codex CLWmd's
secondary concern).

Single-line JSON / code samples without markdown markers now survive
unchanged. The remaining false-negative — a persisted plain
"line1\nline2" two-line message — gets shown with its literal `\n`
visible. Rare enough to accept; the proper fix is daemon-side
(persistence should round-trip strings as raw UTF-8 with real
newlines), tracked separately.

Test impact: existing `unescapeNewlinesFromHistory` test expanded to
8 categories — persisted markdown (paragraph break, heading, bullet
list, ordered list, fenced code, table, blockquote) decodes; single-
line JSON / code literal preserves; live content with real newlines
untouched; CRLF folds to LF without half-decode; edge cases
(undefined / empty / no escapes / plain multi-line). 549 tests
passing locally.

* fix(node-ui): PR4 Codex round-4 — boundary-only decode + per-conversation abort regression test

Two Codex follow-ups on PR #516:

**CSI-f (Critical) — decode only at structural boundaries**

Round-3 still ran a blanket `replace(/\\n/g, '\n')` once the
markdown-marker gate opened, which corrupted `\n` literals INSIDE
fenced code blocks. A persisted

  Here is JSON:\n```json\n{"text":"a\nb"}\n```

would reload with `a\nb` turned into `a` + real newline + `b`,
breaking the rendered JSON example.

Replaced the blanket replace with seven targeted boundary replaces —
each only fires when `\n` is immediately followed by a specific
markdown marker (`\n`, `#`, `- ` / `* `, `digit. `, ` ``` `, `|`,
`>`). CRLF variants paired with each LF variant. The `\n` between
alphanumerics inside a code sample matches no rule and stays
literal — exactly the inline-JSON case Codex flagged.

Tradeoff: multi-line code blocks whose internal lines were joined
with `\n` will now show literal `\n` between code lines (the
internal `\n`s have no marker after them). Faithful display,
strictly better than the corruption blanket-decode would produce.
The proper fix is daemon-side — persistence should round-trip
strings as raw UTF-8 with real newlines, not escape-encode them.

**CSI-j (Issue) — per-conversation abort regression test**

Round-1's fix to the `localAbortRef` race (CIV4a/CIcaM/CIlg0) added
per-conversation abort controllers but had no regression cover.
Added a focused unit test that pins down the four invariants the
`useRef<Map<conversationKey, AbortController>>` implementation must
hold:

  1. Two conversations can hold separate controllers concurrently.
  2. Aborting the selected conversation's controller does NOT affect
     the other's.
  3. The `finally` compare-and-delete cleanup removes only its OWN
     controller — a late teardown from a stale request can't wipe a
     newer same-key entry on retry.
  4. Cleanup on the OTHER conversation leaves the selected one
     untouched.

24 panel-right logic tests passing, all 120 affected node-ui tests
pass in isolation. (Full-suite vitest run shows parallel-worker
flakes in panel-right.component / attachment-chip / markdown-message
/ openclaw-bridge / panel-right.refresh-history — pre-existing
infrastructure flakes documented across the prior 17 rounds; all
pass when their files run in isolation.)

* revert(node-ui): PR4 round-5 — remove UI-side history newline decode

Codex CSqGa correctly caught a fourth false-positive (and counting):
the markdown-marker gate still rewrites legitimate literals that
happen to contain a marker-shaped escape sequence, e.g.
`{"pattern":"\n- item"}` or `The token is \n#`. After history
reload those messages would render with real newlines the sender
never sent.

This is the fourth round of catching corruption in this heuristic:
  - CLWmd (round-2): blanket decode mangled JSON / shell snippets
  - CNGB8 (round-3): "no real newlines anywhere" gate still
    mangled single-line JSON examples
  - CSI-f (round-4): "markdown-marker" gate did a blanket decode
    inside the gated branch, corrupting code-block-internal `\n`
  - CSqGa (round-5): even boundary-only decode rewrites literals
    that happen to contain a marker-shaped sequence

The fundamental issue is that the UI cannot reliably distinguish
"agent intended a literal `\n`" from "persistence encoded a
newline as `\n`" without a richer signal. Reverted the helper
entirely; documented the known issue + proper fix path
(persistence-side: round-trip raw UTF-8 with real newlines, or
carry an explicit "escaped" marker on encoded payloads) in a
comment on `mapHistoryMessage`.

User-visible: persisted markdown turns will continue to show
literal `\n` characters on history reload until the persistence
layer fix lands. That's strictly less broken than the corruption
the UI-side guess introduced.

CSqGe (Issue) — separately, the openclaw-bridge.test.ts source-text
assertion is restored to its pre-PR4 form (`content: message.text
|| buildAttachmentSummary(...)`). Codex's broader concern about
source-string pinning vs. observable bridge behavior is valid and
applies to several pre-existing assertions in that file; addressing
it cleanly is a separate test-refactor and is tracked but not part
of this revert.

Tests: 24 (logic) + 4 (component) + 77 (openclaw-bridge) + 22
(markdown) + 14 (composer) = 142 passing locally. The CSI-j
per-conversation abort regression test added in round-4 stays.

* fix(node-ui): PR4 round-6 — dark-mode plain-text contrast + markdown-after-refresh

Bug 1: global `p { color: var(--text-secondary) }` won over `.v10-md-p`
(no color) once PR4 removed the assistant bubble whose own
`--text-primary` had masked it. Add explicit `--text-primary` to
`.v10-md-p`/`.v10-md-li`/`.v10-md-td` (class specificity beats bare `p`).

Bug 2: chat text is persisted via `JSON.stringify` but read back through
`stripRdfLiteral`, which only strips the literal wrapper and leaves JSON
escapes intact — so reloaded markdown showed literal `\n`. Add
`decodeRdfStringLiteral`, the exact deterministic inverse (re-quote +
`JSON.parse`), scoped to the two chat-text read sites only. Shared
`stripRdfLiteral` is untouched (nested-JSON / scalar call sites). Adds
round-trip unit tests.
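
A minimal sketch of the "re-quote + `JSON.parse`" inverse described above. The function names here are hypothetical stand-ins for `stripRdfLiteral` and `decodeRdfStringLiteral`, the wrapper handling is simplified to plain double quotes (no language tags or datatypes), and the fallback on parse failure is an assumption.

```typescript
// Simplified stand-in for the wrapper-only strip: removes the outer quotes
// and leaves JSON escapes such as backslash-n intact.
function stripWrapperQuotes(literal: string): string {
  return literal.startsWith('"') && literal.endsWith('"') && literal.length >= 2
    ? literal.slice(1, -1)
    : literal;
}

// Deterministic inverse of the write-side JSON.stringify: put the quotes
// back and let JSON.parse resolve every escape exactly once.
function decodeJsonEscapedText(stripped: string): string {
  try {
    const parsed: unknown = JSON.parse(`"${stripped}"`);
    return typeof parsed === 'string' ? parsed : stripped;
  } catch {
    // Assumed fallback: malformed input is returned unchanged.
    return stripped;
  }
}
```

Because `JSON.stringify` writes a real newline as `\n` and a literal backslash-n as `\\n`, the parse step restores both correctly without guessing, which is what the earlier heuristics could not do.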

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(node-ui): PR4 r6 follow-up — re-dim blockquote body + combined round-trip fixture

ui-lead review of 6264e14: react-markdown renders blockquote body as
`<blockquote><p class="v10-md-p">`, so the new explicit
`.v10-md-p { color: var(--text-primary) }` won directly over the
`--text-secondary` the inner `<p>` previously inherited from
`.v10-md-blockquote`, defeating the intentional dim-quote affordance the
plan required preserving. Add a scoped `.v10-md-blockquote .v10-md-p/li`
re-dim rule (specificity 0,2,0 beats 0,1,0; clears AA 5.45:1 dark /
6.92:1 light) and correct the now-precise comment.

Add the ux/qa-lead-suggested fixture: one string holding both a real
newline and a literal backslash-n token, locking both halves of the
JSON.stringify inverse against any future heuristic regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(node-ui): PR4 r6 — qa-lead hardening cases for decodeRdfStringLiteral

Add the two belt-and-suspenders cases qa-lead suggested in review:
lone single backslash (`a\b`, distinct from the `\n` token) and an
astral/surrogate-pair char (emoji + 𝕏). Both are lossless by
construction since JSON.parse is the exact inverse of the write-side
JSON.stringify; they pin the behavior against future regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(node-ui): PR4 r6 — Codex round-6: decode stored-transition reply + delocalize timestamp test

🔴 chat-memory.ts: the stored-transition overwrite used wrapper-only
stripRdfLiteral, so a persisted (stored) assistant turn — the dominant
path on reload — re-broke markdown with literal `\n` despite the base
schema:text decode. The transition assistantReply is written via the
same JSON.stringify (opts.assistantReply quad), so decode it with
decodeRdfStringLiteral too. Adds a multi-line stored-transition
regression test.

🟡 panel-right.logic.test.ts: formatLocalTimestamp test hard-coded
en-US / Gregorian traits (/2026/, /:|AM|PM/) that fail under non-English
or non-Gregorian runtime locales. Pin the actual PR4 contract by
comparing against the same Intl options the helper uses (medium date +
short time), assert date-present via full-vs-time-only inequality, and
match the locale's own numeric year — all locale-agnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jurij Skornik <jurij.skornik@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>