Skip to content

fix(visualsign): reject JSON short-form control escapes in validate_charset#332

Merged
prasanna-anchorage merged 6 commits into
mainfrom
pepefigueira/prs-231-charset-validator-misses-json-short-form-control-character
May 29, 2026
Merged

fix(visualsign): reject JSON short-form control escapes in validate_charset#332
prasanna-anchorage merged 6 commits into
mainfrom
pepefigueira/prs-231-charset-validator-misses-json-short-form-control-character

Conversation

@pepe-anchor
Copy link
Copy Markdown
Contributor

@pepe-anchor pepe-anchor commented May 25, 2026

Summary

SignablePayload::validate_charset() was only blocking \u substrings in the serialized JSON, then leaning on is_ascii() && (is_ascii_graphic() || is_ascii_whitespace()). serde_json::CompactFormatter emits short-form escapes (\b, \t, \n, \f, \r, \", \\) for control bytes and structural chars; all of those (except \n, see carve-out below) slip past the ASCII checks while still smuggling real control bytes into the decoded display string. The Tron converter forwarding parameter.type_url straight into a text field is the concrete sink; an attacker can break single-line wallet UI or splice display lines via \r, \t, etc.

What changed

  • src/visualsign/src/lib.rs::validate_charset: replaced the single "\\u" substring check with a list covering the escapes the serializer can emit that we want to reject: \u, \t, \r, \b, \f, \", \\, plus \/ defensively (the compact formatter does not currently emit \/, but the rule documents the intent in case a future serializer or replacement ever does). \/ is listed ahead of \\ so the defensive guard actually fires first on any future input containing bare \/.
  • Newline carve-out: \n is intentionally not in the deny-list. It is the wallet's documented line separator for multi-line field text (e.g. "Program ID: X\nData: Y") and 30+ trusted call sites across the chain converters emit it deliberately. Untrusted strings flowing into field text must be sanitized at the insertion site instead. test_validate_charset_accepts_newline_in_text pins this contract.
  • Added regression tests in the same file:
    • test_validate_charset_rejects_short_form_control_escapes (4 cases: \t, \r, \b, \f; each pins the exact short-form escape the serializer emits)
    • test_validate_charset_rejects_quote_and_backslash_escapes
    • test_validate_charset_rejects_backslash_slash_combination
    • test_validate_charset_still_rejects_unicode_escape
    • test_validate_charset_accepts_plain_ascii (positive case)
    • test_validate_charset_accepts_newline_in_text (positive case, pins carve-out)

No callers changed. The fix is purely an extra deny-list inside the validator, so any payload that already passed continues to pass; payloads that snuck control bytes through (other than \n) now get rejected with the same ValidationError("Restricted Characters Detected") already used for \u.

Rollback

Revert this commit. No data migrations, no protobuf changes, no on-disk state. The validator returns to its previous (looser) behavior; nothing downstream depends on the new rejections beyond the new tests in this PR.

Backwards compatibility

Strictly tightens validation on data that should never have been allowed past it. No public API changed. Any caller currently producing payloads that contain ", \, or control bytes other than \n in text fields was already producing JSON that broke the validator's intent; those callers need to sanitize their input. None of the existing chain converters in this repo exercise that path with the current fixture corpus (visualsign unit + doctest suite is green; cross-chain fixture verification is left for the PR-level CI per scope guidance on the ticket).

Test plan

Verification scoped to visualsign, ran from src/:

  • cargo fmt, no diff
  • cargo clippy -p visualsign --all-targets -- -D warnings, clean
  • cargo test -p visualsign, all passing (incl. new regression tests and the pre-existing test_signable_payload_to_json / charset suite)

Cross-chain CI on this PR will exercise the remaining converters end-to-end.

pepe-anchor and others added 2 commits May 25, 2026 14:12
…harset

`SignablePayload::validate_charset()` used to only block `\u` substrings in
the serialized JSON, then rely on `is_ascii() && (is_ascii_graphic() ||
is_ascii_whitespace())`. The default `serde_json::CompactFormatter` emits
short-form escapes for 0x08, 0x09, 0x0A, 0x0C, 0x0D as `\b`, `\t`, `\n`,
`\f`, `\r`, which are pure ASCII graphic + whitespace bytes on the wire.
Those escapes survived the validator while still smuggling real control
bytes into the decoded field text once a downstream JSON parser ran,
letting an attacker break a single-line wallet UI or splice display lines.
The Tron converter forwarding `parameter.type_url` straight into a text
field is the concrete sink called out on the ticket.

Extend the substring-rejection list to cover every escape the serializer
can emit: `\u`, `\n`, `\t`, `\r`, `\b`, `\f`, `\"`, `\\`, plus `\/`
defensively. Adds regression tests for each escape, for the existing `\u`
rule, and a positive case for plain ASCII.

Closes PRS-231

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PRS-231 deny-list rejected `\n` along with the other short-form
control escapes, but `\n` is the wallet's documented multi-line
separator. 30+ trusted call sites across the chain converters emit
strings like "Program ID: X\nData: Y" deliberately. The original deny
broke every chain's instruction display in workspace integration tests.

The other short-form escapes stay rejected (`\t`, `\r`, `\b`, `\f`,
plus the existing `\u`, `\"`, `\\`, `\/`). Attacker-controlled strings
flowing into field text must be sanitized at the insertion site; the
remaining sinks (e.g. Tron parameter.type_url) are tracked as a
follow-up to PRS-231.

Adds a regression test pinning the `\n` carve-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pepe-anchor pepe-anchor marked this pull request as ready for review May 25, 2026 15:09
Copilot AI review requested due to automatic review settings May 25, 2026 15:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Tightens SignablePayload::validate_charset() to reject serde_json short-form escape sequences (and other escape forms) that can otherwise pass ASCII validation while decoding into control/structural bytes in downstream display strings.

Changes:

  • Expanded validation from a single \\u check to a deny-list of JSON escape sequences emitted by serde_json (plus a defensive \\/).
  • Added regression tests covering rejection of control-byte escapes, quotes/backslashes, and unicode escapes, plus positive tests for plain ASCII and the newline carve-out.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/visualsign/src/lib.rs Outdated
Comment thread src/visualsign/src/lib.rs Outdated
Comment thread src/visualsign/src/lib.rs Outdated
Comment thread src/visualsign/src/lib.rs Outdated
Reorder FORBIDDEN_JSON_ESCAPES so `\/` is checked before `\\`. With `\\`
listed first, any input containing a literal backslash would short-circuit
on the backslash rule and the defensive `\/` entry could never attribute a
rejection, masking whether it does anything. The new order lets a
hypothetical future serializer emitting bare `\/` (the compact formatter
does not) be rejected first by the dedicated rule.

Rename the slash-related regression test to honestly describe what it
covers (a backslash-slash input combination, not a serializer-emitted `\/`),
and document the unreachable-with-current-serializer caveat inline.

Sharpen the short-form-control-escape test so the sanity check pins the
specific escape serde_json emits (`\t`, `\r`, `\b`, `\f`) per case instead
of just asserting any backslash, which would also pass if the serializer
ever switched to `\uXXXX` long-form escapes.

No behavioural change to the validator beyond the reordering (the deny-list
is unchanged); the carve-out for `\n` is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/visualsign/src/lib.rs Outdated
Drop the "list order matters for attribution" rationale from the
FORBIDDEN_JSON_ESCAPES comment. All matches return the same generic
ValidationError, so which entry fires first is not observable. Keep the
defensive-first ordering as a cosmetic choice. Also corrected the
parallel claim in test_validate_charset_rejects_backslash_slash_combination's
doc-comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/visualsign/src/lib.rs Outdated
Replace the literal tab character inside the sanity-check comment with an
explicit "U+0009" reference so the intent is readable in source and not
mangled by editors/linters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@prasanna-anchorage
Copy link
Copy Markdown
Contributor

Heads up: this repo is public, so internal Linear refs (PRS-xxx) shouldn't leak into PR-visible surface — external readers can't resolve them.

This PR's leaks:

  • Body links linear.app/anchorlabs/issue/PRS-231 — swap for a one-line bug-class summary or a public GH issue reference.
  • 5 code references / 5 inline comments mention PRS-231 — rename to describe the bug class.
  • Branch pepefigueira/prs-231-... also exposes the ID (optional).

Suggested convention: describe the bug in the name (e.g., test_short_form_control_escapes_rejected). For persistent traceability, file a public GitHub issue.

Removes PRS-NNN references from comments so external readers of this
public repo don't see internal triage IDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pepe-anchor
Copy link
Copy Markdown
Contributor Author

Scrubbed PRS-231 refs from body and code/comments. Branch left as-is since it'll disappear on merge.

@prasanna-anchorage
Copy link
Copy Markdown
Contributor

Deep-review verdict: ship as-is.

Verified the validator/serde contract empirically against serde_json::CompactFormatter:

Input byte serde emits Caught by
0x08, 0x09, 0x0C, 0x0D \b, \t, \f, \r new substring rules ✓
0x0A LF \n carve-out (allowed)
0x22 ", 0x5C \ (in string) \", \\ new substring rules ✓
0x7F DEL literal byte pre-existing per-char is_ascii_graphic/is_ascii_whitespace
U+202E, ZWSP, non-ASCII literal UTF-8 bytes pre-existing !is_ascii()
0x2F / literal / \/ rule is dead defensive code (harmless)

No emit/validate mismatch — the most important property for a string-pattern validator. Tests inject real control bytes and route them through to_json(), which pins the validator-emitter contract so a future serde change can't silently move the goalposts.

Residual gap (follow-up, not blocking): the \n carve-out is a UI-line-spoof primitive. An attacker who controls a string field can embed \n\n to fake a second wallet UI line ("Approved spender\n\nRecipient: 0xMallory"). Closing this needs per-sink sanitization, not a stricter central gate — per the in-source comment at lines 922-924. Tracking issue suggested: a sanitize_user_text(&str) -> String helper in visualsign that strips \n for use at every place where caller-controlled metadata flows into a field body.

#331 is now merged so the gate is closed on every chain; this PR makes the validator strict enough to catch the short-form escapes that previously slipped past is_ascii_whitespace. Together they're complete coverage modulo the per-sink follow-up above.

Copy link
Copy Markdown
Contributor

@prasanna-anchorage prasanna-anchorage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving per deep-review verdict above — ship as-is.

@prasanna-anchorage prasanna-anchorage merged commit a4d1db7 into main May 29, 2026
9 checks passed
@prasanna-anchorage prasanna-anchorage deleted the pepefigueira/prs-231-charset-validator-misses-json-short-form-control-character branch May 29, 2026 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants