Add JSON-inherited must-reject vectors #3
Merged
CSVJ values are JSON values per RFC 8259 (spec §1, foundation rule). Add nine vectors covering token-level RFC 8259 errors that CSV producers sometimes emit but that conforming CSVJ readers must reject:

- 10-leading-zeros: 0123 (JSON forbids leading zeros on integers)
- 11-bare-dot-number: .5 (JSON requires an integer part)
- 12-trailing-dot-number: 1. (JSON requires a fractional digit)
- 13-nan: NaN (not a JSON value)
- 14-infinity: Infinity (not a JSON value)
- 15-uppercase-true: True (JSON keywords are lowercase only)
- 16-uppercase-null: Null (JSON keywords are lowercase only)
- 17-single-quoted-string: 'hello' (JSON strings require double quotes)
- 18-unescaped-control-char: literal U+0001 inside string (RFC 8259 §7 forbids unescaped U+0000–U+001F in strings)

BOM-at-file-start from PLAN.md's list is intentionally NOT included: §1's foundation decision adopts RFC 8259 §8.1, which says parsers MAY ignore a leading BOM rather than treat it as an error. A reader that strips the BOM is conforming, so BOM cannot live in must-reject.

Vectors are numbered 10+ to avoid collision with the §1-decisions PR's 05–09 reject set (#2).

All nine vectors are confirmed rejected by gocsvj@master, which delegates value parsing to encoding/json.
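As a rough illustration of the two claims above, the sketch below feeds each of the nine tokens to encoding/json's `json.Valid` and then shows why a leading BOM is a reader-policy question rather than a must-reject case. It assumes each vector boils down to the bare token shown in the list; the actual file layout of the vectors is not reproduced here. The names `rejectTokens`, `isJSONValue`, and `stripBOM` are hypothetical helpers for this example, not part of gocsvj, and `bytes.TrimPrefix` is simply one way to implement the lenient BOM policy.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// rejectTokens pairs each vector name with the token it is assumed to carry.
// The last entry embeds a raw U+0001 byte inside a double-quoted string.
var rejectTokens = []struct{ name, value string }{
	{"10-leading-zeros", "0123"},
	{"11-bare-dot-number", ".5"},
	{"12-trailing-dot-number", "1."},
	{"13-nan", "NaN"},
	{"14-infinity", "Infinity"},
	{"15-uppercase-true", "True"},
	{"16-uppercase-null", "Null"},
	{"17-single-quoted-string", "'hello'"},
	{"18-unescaped-control-char", "\"a\x01b\""},
}

// isJSONValue reports whether b is exactly one well-formed JSON value
// according to encoding/json.
func isJSONValue(b []byte) bool {
	return json.Valid(b)
}

// stripBOM drops a single leading UTF-8 BOM if present. This is the lenient
// policy that RFC 8259 §8.1 permits for parsers (hypothetical helper).
func stripBOM(b []byte) []byte {
	return bytes.TrimPrefix(b, utf8BOM)
}

func main() {
	for _, t := range rejectTokens {
		fmt.Printf("%-26s valid=%v\n", t.name, isJSONValue([]byte(t.value)))
	}

	// A BOM followed by an otherwise valid value.
	withBOM := append(append([]byte{}, utf8BOM...), []byte("123")...)
	fmt.Println("BOM kept:    ", isJSONValue(withBOM))
	fmt.Println("BOM stripped:", isJSONValue(stripBOM(withBOM)))
}
```

All nine tokens should report valid=false. The BOM-prefixed input is rejected by `json.Valid` as-is and accepted once the BOM is stripped, so two conforming readers can legitimately disagree on it, which is why it does not fit a must-reject set.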
Summary
Adds nine must-reject/ vectors covering token-level RFC 8259 errors that real CSV producers sometimes emit. Per §1's foundation decision, CSVJ values are JSON values per RFC 8259, so any input that violates JSON tokenization must be rejected by a conforming CSVJ reader.
- 10-leading-zeros.csvj: 0123 (leading zeros forbidden on integers)
- 11-bare-dot-number.csvj: .5 (integer part required)
- 12-trailing-dot-number.csvj: 1. (fractional digit required)
- 13-nan.csvj: NaN (not a JSON value)
- 14-infinity.csvj: Infinity (not a JSON value)
- 15-uppercase-true.csvj: True (keywords are lowercase only)
- 16-uppercase-null.csvj: Null (keywords are lowercase only)
- 17-single-quoted-string.csvj: 'hello' (strings require double quotes)
- 18-unescaped-control-char.csvj: U+0001 inside a string (RFC 8259 §7 forbids unescaped U+0000–U+001F)

Why BOM is not in this set
PLAN.md's task list mentions "BOM at file start", but §1 explicitly adopts RFC 8259 §8.1's BOM rule: writers MUST NOT add a BOM, but parsers MAY ignore one. A parser that strips a leading BOM is conforming, so BOM at file start cannot live in must-reject/. (gocsvj happens to strip BOM via strings.TrimSpace, which is also conforming.)

Numbering
These vectors are numbered 10+ to avoid collision with #2's §1-decisions
reject set (05–09).
Reference impl
All nine vectors are confirmed rejected by gocsvj@master, which delegates value tokenization to encoding/json. CI on this branch should land green without needing entries in #2's referencePending skip-list.

Test plan
go test ./... passes locally with the new vectors
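For readers unfamiliar with how a must-reject directory is typically exercised, here is a minimal sketch of such a check. It is not the repository's test code: `acceptedVectors`, `standInParse`, and `demo` are hypothetical names, the parser is injected so nothing is assumed about gocsvj's API, and the sample files are written as bare tokens, which may not match the real vector layout.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// acceptedVectors runs parse over every *.csvj file in dir and returns the
// base names of files that parsed without error. For a must-reject directory
// the returned slice should be empty.
func acceptedVectors(dir string, parse func([]byte) error) ([]string, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.csvj"))
	if err != nil {
		return nil, err
	}
	var accepted []string
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return nil, err
		}
		if parse(data) == nil {
			accepted = append(accepted, filepath.Base(p))
		}
	}
	return accepted, nil
}

// standInParse is a placeholder for a real CSVJ reader (assumption): it
// treats the whole file as a single JSON value and defers to encoding/json.
func standInParse(data []byte) error {
	if !json.Valid(data) {
		return errors.New("not a well-formed JSON value")
	}
	return nil
}

// demo writes two sample vectors into a temporary directory and reports
// which of them the given parser wrongly accepts.
func demo(parse func([]byte) error) ([]string, error) {
	dir, err := os.MkdirTemp("", "must-reject")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	samples := map[string]string{
		"10-leading-zeros.csvj": "0123",
		"13-nan.csvj":           "NaN",
	}
	for name, body := range samples {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(body), 0o644); err != nil {
			return nil, err
		}
	}
	return acceptedVectors(dir, parse)
}

func main() {
	accepted, err := demo(standInParse)
	if err != nil {
		panic(err)
	}
	fmt.Printf("wrongly accepted: %d %v\n", len(accepted), accepted)
}
```

Passing the parser as a function keeps the walker independent of any particular reader, so the same loop could be pointed at a reference implementation or at a third-party one.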