fix(artifacts): close remaining streaming parser leak paths#104
Merged
fix(artifacts): close remaining streaming parser leak paths#104
Conversation
Contributor
There was a problem hiding this comment.
Findings
- [Major] Bare
<artifact>is now treated as a valid artifact open tag, which can convert plain text into artifact events and drop visible text unexpectedly (data-loss/behavior regression). Evidence:packages/artifacts/src/parser.ts:202.
Suggested fix:// Keep requiring whitespace after `<artifact` so malformed/bare tags stay text. if (!/\s/.test(next)) { from = afterPrefix; continue; }
Summary
- Review mode: initial
- 1 Major issue found in the latest diff.
Testing
- Not run (automation)
open-codesign Bot
| // Buffer ends exactly at `<artifact`; can't yet decide if it's a real tag. | ||
| return { kind: 'partial', start: idx }; | ||
| } | ||
| if (next !== '>' && !/\s/.test(next)) { |
Contributor
There was a problem hiding this comment.
[Major] Bare <artifact> is now accepted as an opening artifact tag because next === '>' passes the guard here. This changes prior behavior (which required attributes) and can reinterpret literal text as artifact markup.
Suggested fix:
if (!/\s/.test(next)) {
from = afterPrefix;
continue;
}The streaming open-tag scanner is now quote-aware. Previously the `<artifact ...>` regex used `[^>]*?` which cannot tell a real `>` from one inside a quoted attribute value, so a title like `title="a > b"` silently dropped the title, corrupted the body, and (when the stream split landed mid-attribute) leaked raw `<artifact ...` markup into a text event. Replaces the regex with a small quote-aware scanner returning `complete` / `partial` / `none`, which also subsumes the previous `findSafeFlushPoint` partial-prefix logic. Audited the other seven streaming-leak axes (close-tag splits, char-by-char splits, multi-artifact streams, multi-line tag declarations, literal `<` and literal `<artifact` inside body, truncated streams) — all already robust; added regression tests.
4ce3f17 to
2bcd95e
Compare
Collaborator
Author
|
Addressed Codex Major: tightened |
Contributor
There was a problem hiding this comment.
Findings
- None.
Summary
- Review mode: follow-up after new commits
- No issues found in the current added/modified lines of
packages/artifacts/src/parser.tsandpackages/artifacts/src/parser.test.ts. - Residual risks/testing gaps: no integration/e2e parser coverage is included in this diff; behavior for literal
</artifact>in artifact body remains an edge-case not addressed here.
Testing
- Not run (automation):
pnpmis unavailable in this runner (pnpm: command not found).
open-codesign Bot
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Audit follow-up to #95. Re-checked the streaming artifact parser on 8 leak axes; one real leak class found and fixed, the other 7 were already robust and now have regression tests.
Real leak fixed:
>inside a quoted attribute valueOPEN_TAG_RE = /<artifact\s+([^>]*?)>/cannot tell a real>from one inside a quoted attribute value, so input like:'<artifact identifier="a1" type="html" title="a > b">body</artifact>'silently dropped
title, corrupted the body, and (when the stream split landed mid-attribute) leaked raw<artifact ...markup into atextevent:Replaced both
OPEN_TAG_REandfindSafeFlushPointwith a single quote-aware scannerfindOpenTagreturningcomplete/partial/none. Thepartialcase subsumes the previous prefix-holdback logic.Already robust (regression tests added)
</artifact>close-tag split across deltas<artifact>blocks in one stream<artifact\n identifier=...)\s+)flush()emits final end)<and literal<artifactin body content</artifact>exits inside-mode)Not fixed (out of scope, exceptional)
</artifact>inside body — not disambiguable without a delimiter; models do not emit this in practice.Principles check
findOpenTagcollapses the regex + the prefix-holdback helper into one quote-aware passTest plan
pnpm --filter @open-codesign/artifacts test— 17 / 17 pass (3 new failing tests now pass)pnpm test— all packages greenpnpm lint— no new warnings (parser.ts complexity suppressed in line with existing convention infeed)pnpm typecheck— green