Language-agnostic conformance test vectors for the CSVJ format.
An implementation claims CSVJ conformance by accepting every input under
inputs/ and reproducing the matching JSON shape under expected/, and by
rejecting every input under must-reject/. Vectors are deliberately
language-agnostic — bytes on disk, no harness assumed.
inputs/<case>.csvj        bytes the parser MUST accept
expected/<case>.json      parsed result as JSON: array-of-arrays, header row first
must-reject/<case>.csvj   bytes the parser MUST reject (parse error)
For every inputs/<case>.csvj there is an expected/<case>.json with the same
stem. The expected JSON encodes the parsed table as an array of rows, where
each row is an array of cell values. The first inner array is the header row
(strings). Subsequent rows hold values per the CSVJ value grammar (string,
number, true, false, null).
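The shape rules above can be checked mechanically. The following is a minimal Go sketch, not part of the repo: the function name validateExpected and the sample data are hypothetical. It only checks the JSON shape described here (array of rows, string-only header, scalar cells) and does not implement any other CSVJ rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateExpected checks that data is a JSON array of arrays, that the
// first row (the header) holds only strings, and that every later cell is a
// string, number, boolean, or null. An empty table has nothing to check;
// whether CSVJ permits one is a question for the spec, not this sketch.
func validateExpected(data []byte) error {
	var top []any
	if err := json.Unmarshal(data, &top); err != nil {
		return fmt.Errorf("not a JSON array: %w", err)
	}
	for r, v := range top {
		row, ok := v.([]any)
		if !ok {
			return fmt.Errorf("row %d is not an array", r)
		}
		for c, cell := range row {
			switch cell.(type) {
			case string:
				// Strings are valid in both header and data rows.
			case float64, bool, nil:
				if r == 0 {
					return fmt.Errorf("header cell %d is not a string", c)
				}
			default:
				// Nested arrays and objects are outside the value grammar.
				return fmt.Errorf("row %d, cell %d: nested value", r, c)
			}
		}
	}
	return nil
}

func main() {
	// Hypothetical sample, not one of the shipped vectors.
	sample := []byte(`[["name","age","active"],["Ada",36,true],["Bob",null,false]]`)
	if err := validateExpected(sample); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("ok")
}
```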
must-reject/ cases have no companion file — the only assertion is "parser
fails."
The repo ships a Go runner (conformance_test.go) that exercises every
vector against the reference Go implementation (gocsvj). CI runs it on
every push and PR.
go test ./...
Implementers in other languages should run the same vectors using their own parser. The reference flow is:
- For each inputs/<case>.csvj, parse the file; serialize the parsed table as a JSON array-of-arrays; compare it for semantic equivalence to expected/<case>.json. (JSON whitespace and key order do not matter; values and structure do.)
- For each must-reject/<case>.csvj, parse the file; the parse MUST fail.
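The flow above could be sketched as follows. This is an illustration under stated assumptions, not the shipped runner: ParseFunc, semanticEqual, and runVectors are hypothetical names, and the parser under test is supplied by the caller. Semantic equivalence is approximated by decoding both documents with encoding/json and comparing the results, which ignores whitespace but decodes numbers as float64, so very large integers may lose precision.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"reflect"
	"strings"
)

// ParseFunc is a hypothetical hook for the parser under test. It returns the
// table as rows of cells, header row first.
type ParseFunc func(input []byte) ([][]any, error)

// semanticEqual reports whether two JSON documents decode to the same value.
func semanticEqual(a, b []byte) (bool, error) {
	var x, y any
	if err := json.Unmarshal(a, &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &y); err != nil {
		return false, err
	}
	return reflect.DeepEqual(x, y), nil
}

// runVectors walks inputs/ and must-reject/ under root and returns one
// message per failing vector.
func runVectors(root string, parse ParseFunc) []string {
	var failures []string
	accept, _ := filepath.Glob(filepath.Join(root, "inputs", "*.csvj"))
	for _, in := range accept {
		stem := strings.TrimSuffix(filepath.Base(in), ".csvj")
		input, err := os.ReadFile(in)
		if err != nil {
			failures = append(failures, in+": "+err.Error())
			continue
		}
		want, err := os.ReadFile(filepath.Join(root, "expected", stem+".json"))
		if err != nil {
			failures = append(failures, in+": no expected file: "+err.Error())
			continue
		}
		rows, err := parse(input)
		if err != nil {
			failures = append(failures, in+": valid input rejected: "+err.Error())
			continue
		}
		got, err := json.Marshal(rows)
		if err != nil {
			failures = append(failures, in+": "+err.Error())
			continue
		}
		if ok, err := semanticEqual(got, want); err != nil || !ok {
			failures = append(failures, in+": output differs from expected")
		}
	}
	reject, _ := filepath.Glob(filepath.Join(root, "must-reject", "*.csvj"))
	for _, in := range reject {
		input, err := os.ReadFile(in)
		if err != nil {
			failures = append(failures, in+": "+err.Error())
			continue
		}
		if _, err := parse(input); err == nil {
			failures = append(failures, in+": invalid input accepted")
		}
	}
	return failures
}

func main() {
	// Whitespace and 1 versus 1.0 do not affect the comparison.
	same, err := semanticEqual(
		[]byte(`[["a","b"],[1,null]]`),
		[]byte("[ [\"a\", \"b\"],\n  [1.0, null] ]"),
	)
	fmt.Println(same, err)
}
```

To exercise a real parser, pass it to runVectors together with the path to a checkout of the vectors and report the returned messages in whatever form the host test framework expects.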
The vectors are normative; the Go runner is a tool. Where the reference
implementation does not yet enforce a spec rule, the corresponding vectors
are listed in the referencePending map at the top of conformance_test.go
and skipped with a reason. The map is the inventory of remaining gocsvj
gaps — retiring an entry requires the matching reader/writer fix in
gocsvj. Other implementations are expected to pass every vector
regardless of this list.
Each vector should target one observable behavior. Name files with a
two-digit prefix plus a short hyphen-separated slug describing what's being
exercised (07-ragged-row-short.csvj). Keep input files minimal — one row of
data is usually enough.
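The naming convention can be checked with a small pattern. This is a sketch with a hypothetical helper, validVectorName; restricting the slug to lowercase letters and digits is an assumption, since the text above only requires a two-digit prefix and a hyphen-separated slug.

```go
package main

import (
	"fmt"
	"regexp"
)

// vectorName matches a two-digit prefix, a hyphen, one or more
// hyphen-separated slug words, and the .csvj extension. Lowercase
// alphanumeric slug words are an assumption of this sketch.
var vectorName = regexp.MustCompile(`^[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*\.csvj$`)

// validVectorName reports whether name follows the convention.
func validVectorName(name string) bool {
	return vectorName.MatchString(name)
}

func main() {
	names := []string{"07-ragged-row-short.csvj", "ragged-row.csvj", "7-ragged.csvj"}
	for _, n := range names {
		fmt.Printf("%-28s %v\n", n, validVectorName(n))
	}
}
```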
Open questions, spec ambiguities, or anything that would require a normative decision belong on csvj.org (the spec) first, not here.
MIT. See LICENSE.