
fix(artifacts): close remaining streaming parser leak paths#104

Merged
hqhq1025 merged 2 commits into main from
parser-leak-audit
Apr 19, 2026

@hqhq1025
Collaborator

Summary

Audit follow-up to #95. Re-checked the streaming artifact parser on 8 leak axes; one real leak class found and fixed, the other 7 were already robust and now have regression tests.

Real leak fixed: > inside a quoted attribute value

OPEN_TAG_RE = /<artifact\s+([^>]*?)>/ cannot tell a real > from one inside a quoted attribute value, so input like:

'<artifact identifier="a1" type="html" title="a > b">body</artifact>'

silently dropped title, corrupted the body, and (when the stream split landed mid-attribute) leaked raw <artifact ... markup into a text event:

['<artifact identifier="a1" type="html" title="a >', ' b">body</artifact>']
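
To make the failure concrete, the snippet below runs the regex quoted above against the sample input. Only `OPEN_TAG_RE` and the input string come from this PR; the variable names `attrs` and `rest` are illustrative.

```typescript
// The regex quoted in the summary above.
const OPEN_TAG_RE = /<artifact\s+([^>]*?)>/;

const input = '<artifact identifier="a1" type="html" title="a > b">body</artifact>';
const match = OPEN_TAG_RE.exec(input);

// Capture group 1 stops at the first `>`, which sits inside the quoted title.
const attrs = match ? match[1] : null;
// Everything after the premature match would be treated as artifact body.
const rest = match ? input.slice(match.index + match[0].length) : null;

console.log(attrs); // identifier="a1" type="html" title="a 
console.log(rest);  //  b">body</artifact>
```

The title attribute is cut off at its opening quote, and the remainder of the tag (` b">`) ends up at the front of the body.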

Replaced both OPEN_TAG_RE and findSafeFlushPoint with a single quote-aware scanner findOpenTag returning complete / partial / none. The partial case subsumes the previous prefix-holdback logic.
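
A minimal sketch of what a quote-aware scanner of this kind might look like. This is not the code from `parser.ts`: the result shape (`start`, `end`, `attrs`) is an assumption, since the PR only names the three kinds, and the sketch requires whitespace after `<artifact`.

```typescript
// Hypothetical result shape; the PR only names the three kinds.
type OpenTagResult =
  | { kind: 'complete'; start: number; end: number; attrs: string }
  | { kind: 'partial'; start: number }
  | { kind: 'none' };

const PREFIX = '<artifact';

function findOpenTag(buf: string): OpenTagResult {
  let from = 0;
  while (from < buf.length) {
    const idx = buf.indexOf('<', from);
    if (idx === -1) return { kind: 'none' };
    const tail = buf.slice(idx);
    // Buffer ends inside (or exactly at) `<artifact`: cannot decide yet.
    if (tail.length <= PREFIX.length) {
      if (PREFIX.startsWith(tail)) return { kind: 'partial', start: idx };
      from = idx + 1;
      continue;
    }
    // Not `<artifact` followed by whitespace: treat this `<` as plain text.
    if (!tail.startsWith(PREFIX) || !/\s/.test(tail[PREFIX.length])) {
      from = idx + 1;
      continue;
    }
    // Scan for the closing `>` while tracking whether we are inside quotes.
    let quote: string | null = null;
    for (let i = idx + PREFIX.length; i < buf.length; i++) {
      const ch = buf[i];
      if (quote !== null) {
        if (ch === quote) quote = null;
      } else if (ch === '"' || ch === "'") {
        quote = ch;
      } else if (ch === '>') {
        const attrs = buf.slice(idx + PREFIX.length, i).trim();
        return { kind: 'complete', start: idx, end: i + 1, attrs };
      }
    }
    // Ran out of input before an unquoted `>`: hold the tag back.
    return { kind: 'partial', start: idx };
  }
  return { kind: 'none' };
}
```

In this sketch a caller would flush text up to `start` on `partial`, and wait for more input before deciding, which is how a single pass can cover both tag detection and prefix holdback.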

Already robust (regression tests added)

| Axis | Status |
| --- | --- |
| </artifact> close-tag split across deltas | already robust (10-char holdback) |
| Close tag arriving one char at a time | already robust |
| Multiple <artifact> blocks in one stream | already robust (state resets in loop) |
| Back-to-back artifacts with close+open in same chunk | already robust |
| Multi-line tag declaration (<artifact\n identifier=...) | already robust (\s+) |
| Truncated artifact at stream end | already robust (flush() emits final end) |
| Literal < and literal <artifact in body content | already robust (only </artifact> exits inside-mode) |
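
The 10-char holdback mentioned for the close-tag axes can be illustrated with a small sketch. `</artifact>` is 11 characters, so keeping the last 10 characters of the buffer unflushed means a close tag split across deltas is never partly emitted as body. The helper name `streamBody` and its return shape are hypothetical and only approximate what the parser does in inside-mode.

```typescript
const CLOSE_TAG = '</artifact>'; // 11 characters
const HOLDBACK = CLOSE_TAG.length - 1; // 10

// Hypothetical helper: feed body deltas, collect the emitted body text.
function streamBody(deltas: string[]): { body: string; closed: boolean; rest: string } {
  let buf = '';
  let body = '';
  for (let i = 0; i < deltas.length; i++) {
    buf += deltas[i];
    const close = buf.indexOf(CLOSE_TAG);
    if (close !== -1) {
      body += buf.slice(0, close);
      // Anything after the close tag is back in text mode.
      const rest = buf.slice(close + CLOSE_TAG.length) + deltas.slice(i + 1).join('');
      return { body, closed: true, rest };
    }
    // Emit everything except the last 10 chars, which might start a close tag.
    const safe = Math.max(0, buf.length - HOLDBACK);
    body += buf.slice(0, safe);
    buf = buf.slice(safe);
  }
  // Stream ended without a close tag: flush what is left (truncated artifact).
  return { body: body + buf, closed: false, rest: '' };
}

console.log(streamBody(['hello </art', 'ifact>tail']));
// { body: 'hello ', closed: true, rest: 'tail' }
```

Because only the full `</artifact>` sequence ends the body, a literal `<` or `<artifact` inside the body passes through unchanged in this sketch.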

Not fixed (out of scope, exceptional)

  • Literal </artifact> inside body — not disambiguable without a delimiter; models do not emit this in practice.
  • HTML entity decoding inside attribute values — models do not escape attrs; would only matter if they did.

Principles check

  • Compatibility: green — no schema/IPC change
  • Upgradeability: green — pure parser internals
  • No bloat: green — net delta is one helper function; no new deps
  • Elegance: green — findOpenTag collapses the regex + the prefix-holdback helper into one quote-aware pass

Test plan

  • pnpm --filter @open-codesign/artifacts test — 17 / 17 pass (including 3 new tests that failed before the fix)
  • pnpm test — all packages green
  • pnpm lint — no new warnings (parser.ts complexity suppressed in line with existing convention in feed)
  • pnpm typecheck — green

Contributor

@github-actions github-actions Bot left a comment

Findings

  • [Major] Bare <artifact> is now treated as a valid artifact open tag, which can convert plain text into artifact events and drop visible text unexpectedly (data-loss/behavior regression). Evidence: packages/artifacts/src/parser.ts:202.
    Suggested fix:
    // Keep requiring whitespace after `<artifact` so malformed/bare tags stay text.
    if (!/\s/.test(next)) {
      from = afterPrefix;
      continue;
    }

Summary

  • Review mode: initial
  • 1 Major issue found in the latest diff.

Testing

  • Not run (automation)

open-codesign Bot

Comment thread packages/artifacts/src/parser.ts Outdated
// Buffer ends exactly at `<artifact`; can't yet decide if it's a real tag.
return { kind: 'partial', start: idx };
}
if (next !== '>' && !/\s/.test(next)) {
Contributor

[Major] Bare <artifact> is now accepted as an opening artifact tag because next === '>' passes the guard here. This changes prior behavior (which required attributes) and can reinterpret literal text as artifact markup.

Suggested fix:

if (!/\s/.test(next)) {
  from = afterPrefix;
  continue;
}

The streaming open-tag scanner is now quote-aware. Previously the
`<artifact ...>` regex used `[^>]*?` which cannot tell a real `>` from
one inside a quoted attribute value, so a title like `title="a > b"`
silently dropped the title, corrupted the body, and (when the stream
split landed mid-attribute) leaked raw `<artifact ...` markup into a
text event.

Replaces the regex with a small quote-aware scanner returning
`complete` / `partial` / `none`, which also subsumes the previous
`findSafeFlushPoint` partial-prefix logic.

Audited the other seven streaming-leak axes (close-tag splits,
char-by-char splits, multi-artifact streams, multi-line tag
declarations, literal `<` and literal `<artifact` inside body,
truncated streams) — all already robust; added regression tests.

@hqhq1025 hqhq1025 force-pushed the parser-leak-audit branch from 4ce3f17 to 2bcd95e on April 19, 2026 09:28
@hqhq1025
Collaborator Author

Addressed Codex Major: tightened findOpenTag to require whitespace after <artifact, so bare <artifact> (and any prose mentioning the literal token) is no longer treated as an open tag. Real Claude artifacts always carry identifier/type. Added 2 regression tests; full pnpm test and pnpm lint green.
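
The difference between the guard flagged in review and the tightened one comes down to a single character class. A small sketch, with hypothetical predicate names; `next` stands for the character immediately after the literal `<artifact`, as in the review snippet:

```typescript
// Guard as first submitted: `>` or whitespace after `<artifact` counts as a tag.
const looseGuardAccepts = (next: string): boolean => next === '>' || /\s/.test(next);

// Guard after the review fix: only whitespace counts, so bare `<artifact>` stays text.
const strictGuardAccepts = (next: string): boolean => /\s/.test(next);

console.log(looseGuardAccepts('>'));  // true
console.log(strictGuardAccepts('>')); // false
console.log(strictGuardAccepts(' ')); // true
```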

Contributor

@github-actions github-actions Bot left a comment

Findings

  • None.

Summary

  • Review mode: follow-up after new commits
  • No issues found in the current added/modified lines of packages/artifacts/src/parser.ts and packages/artifacts/src/parser.test.ts.
  • Residual risks/testing gaps: no integration/e2e parser coverage is included in this diff; behavior for literal </artifact> in artifact body remains an edge-case not addressed here.

Testing

  • Not run (automation): pnpm is unavailable in this runner (pnpm: command not found).

open-codesign Bot

@hqhq1025 hqhq1025 merged commit 2e6aabd into main Apr 19, 2026
5 of 6 checks passed
@hqhq1025 hqhq1025 deleted the parser-leak-audit branch April 19, 2026 09:33