Add JSON-inherited must-reject vectors #3

Merged
AndrianBdn merged 1 commit into master from json-inherited-must-reject
May 26, 2026

@peo-machine peo-machine commented May 26, 2026

Summary

Adds nine must-reject/ vectors covering token-level RFC 8259 errors that
real CSV producers sometimes emit. Per §1's foundation decision, CSVJ values
are JSON values per RFC 8259, so any input that violates JSON tokenization
must be rejected by a conforming CSVJ reader.

| Vector | What it tests |
| --- | --- |
| 10-leading-zeros.csvj | 0123: leading zeros forbidden on integers |
| 11-bare-dot-number.csvj | .5: integer part required |
| 12-trailing-dot-number.csvj | 1.: fractional digit required |
| 13-nan.csvj | NaN: not a JSON value |
| 14-infinity.csvj | Infinity: not a JSON value |
| 15-uppercase-true.csvj | True: keywords are lowercase only |
| 16-uppercase-null.csvj | Null: keywords are lowercase only |
| 17-single-quoted-string.csvj | 'hello': strings require double quotes |
| 18-unescaped-control-char.csvj | literal U+0001 inside a string: RFC 8259 §7 forbids unescaped U+0000–U+001F |
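
As a quick illustration of why these tokens fail, the sketch below feeds each payload listed above to Go's encoding/json via json.Valid. The table and the isJSONValue helper are illustrative names, not part of gocsvj, and the strings are the bare tokens rather than the contents of the vector files.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rejectedTokens lists the bare tokens described above. The names are
// labels only; the real vector files may wrap these tokens in a full row.
var rejectedTokens = []struct{ name, token string }{
	{"10-leading-zeros", "0123"},
	{"11-bare-dot-number", ".5"},
	{"12-trailing-dot-number", "1."},
	{"13-nan", "NaN"},
	{"14-infinity", "Infinity"},
	{"15-uppercase-true", "True"},
	{"16-uppercase-null", "Null"},
	{"17-single-quoted-string", "'hello'"},
	{"18-unescaped-control-char", "\"a\x01b\""}, // raw U+0001 inside a string
}

// isJSONValue reports whether token is a single well-formed JSON value
// according to encoding/json (hypothetical helper for this sketch).
func isJSONValue(token string) bool {
	return json.Valid([]byte(token))
}

func main() {
	for _, v := range rejectedTokens {
		fmt.Printf("%-28s accepted=%v\n", v.name, isJSONValue(v.token))
	}
}
```

Each entry should report accepted=false, while well-formed counterparts such as 123, 0.5, true, null, and "hello" pass the same check.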

Why BOM is not in this set

PLAN.md's task list mentions "BOM at file start" but §1 explicitly adopts
RFC 8259 §8.1's BOM rule: writers MUST NOT add a BOM, but parsers MAY
ignore one. A parser that strips a leading BOM is conforming, so BOM at
file start cannot live in must-reject/. (gocsvj happens to strip BOM
via strings.TrimSpace, which is also conforming.)

Numbering

These vectors are numbered 10+ to avoid collision with #2's §1-decisions
reject set (05–09).

Reference impl

All nine vectors are confirmed rejected by gocsvj@master, which delegates
value tokenization to encoding/json. CI on this branch should land green
without needing entries in #2's referencePending skip-list.

Test plan

  • go test ./... passes locally with the new vectors
  • CI green on Go 1.23 and stable

CSVJ values are JSON values per RFC 8259 (spec §1, foundation rule).
Add nine vectors covering token-level RFC 8259 errors that CSV producers
sometimes emit but that conforming CSVJ readers must reject:

- 10-leading-zeros: 0123 — JSON forbids leading zeros on integers
- 11-bare-dot-number: .5 — JSON requires an integer part
- 12-trailing-dot-number: 1. — JSON requires a fractional digit
- 13-nan: NaN — not a JSON value
- 14-infinity: Infinity — not a JSON value
- 15-uppercase-true: True — JSON keywords are lowercase only
- 16-uppercase-null: Null — JSON keywords are lowercase only
- 17-single-quoted-string: 'hello' — JSON strings require double quotes
- 18-unescaped-control-char: literal U+0001 inside string — RFC 8259
  §7 forbids unescaped U+0000–U+001F in strings

BOM-at-file-start from PLAN.md's list is intentionally NOT included:
§1's foundation decision adopts RFC 8259 §8.1, which says parsers MAY
ignore a leading BOM rather than treat it as an error. A reader that
strips the BOM is conforming, so BOM cannot live in must-reject.

Vectors numbered 10+ to avoid collision with the §1-decisions PR's
05–09 reject set (#2).

All nine vectors are confirmed rejected by gocsvj@master, which
delegates value parsing to encoding/json.
@AndrianBdn AndrianBdn merged commit 787f089 into master May 26, 2026
2 checks passed
@AndrianBdn AndrianBdn deleted the json-inherited-must-reject branch May 26, 2026 14:49