The one CSV format to rule them all - simple, expressive, precise.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
CNAME
LICENSE
README.md
_config.yml
csvjson-line.png
json-array.png

README.md

Introducing CSVJSON

The comma separated values (CSV) format is a simple textual data format that has been used for exchanging tabluar data between programs and systems for many years. There is even an RFC attempting to standardize this format - RFC 4180.

Still, the CSV format suffers from major shotcomings which make it hard to use reliably across programs and systems even today. These shotcommings include:

  • Character encoding of string values is not specified
  • There is no standard way to specify null values
  • There are no standard escaping rules, in particular for values containing newlines
  • There is no way to have object values, array values

All of that and more is solved by a trivially specified format CSVJSON

The CSVJSON format is based on the well-known and highly popular JSON format. For 'simple' data, CSVJSON is even compatible with the regular CSV format.

The definition of the CSVJSON format is trivial. Given the definition of a JSON array:

Reference to the JSON array definition

Each line in a CSVJSON file is defined as follows:

The JSON array definition without left bracket and with newline replacing right bracket

That is, each line in a CSVJSON file is actually just like a JSON array, stripped of the openning left square bracket ([) and with a newline replacing the closing right bracket (]). Newline may only appear after the last value while non-new-line whitespaces around values are ignored. Lines with nothing but whitespaces are ignored.

Samples

Below are examples of valid CSVJSON content

Regular CSV

1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false

CSV with headers row

"id","name","address","regular"
1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false

CSV with data containing quotes and commas

"id","name","address","regular"
1,"John","12 Totem Rd., Aspen",true
2,"Bob",null,false
3,"Sue","\"Bigsby\", 345 Carnival, WA 23009",false

CSV with complex headers

{"field":"id","type":"int"},{"field":"name","type":"string"},{"field":"address","type":"string"},{"field":"regular","type":"boolean"}
1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false

CSV with array data

1,"directions",["north","south","east","west"]
2,"colors",["red","green","blue"]
3,"drinks",["soda","water","tea","coffe"]
4,"spells",[]

CSV with all kinds of data

"index","value1","value2"
"number",1,2
"boolean",false,true
"null",null,"non null"
"array of numbers",[1],[1,2]
"simple object",{"a": 1},{"a":1, "b":2}
"array with mixed objects",[1,null,"ball"],[2,{"a": 10, "b": 20},"cube"]
"string with quotes","a\"b","alert(\"Hi!\")"
"string with bell&newlines","bell is \u0007","multi\nline\ntext"

Use Cases

Common textual database table export format

CSVJSON is an ideal format for exporting database tables to text files. Here are some benefits:

  1. Being based on UTF-8 it can reliably maintain text from different languages.
  2. It has a standard concept of nulls.
  3. It can deal with modern database features like rows with embedded objects and arrays.
  4. Being based on JSON, there is large variety of high quality formatters and parsers in virtually every programming language.
  5. The CSVJSON header line can be used to store each column definition like: {"name":"ID","type":"number"},{"name":"TITLE","type":"string"}}

Questions and Answers

How is CSVJSON related to JSON Lines JSON Lines.
The JSON Lines format is a textual format for tabular data where each line is a valid JSON value. The CSVJSON format is a textual format for tabular data where each line has zero or more valid JSON values separated by commas. One can view JSON Lines as a special case of CSVJSON. CSVJSON has the benefit over JSON Lines in that for numbers and 'simple' strings, it aligns with the regular CSV format.
What is the recommended file type extension for a CSVJSON file?
The recommended file format for CSVJSON files is .csvj although one can also use the .csv file extension because a valid CSVJSON file is also a comma-seperated-values file (with JSON rules).
Can CSVJSON data contain complex JSON objects?
Sure. As long as the included JSON object does not have newlines. Note that when including JSON objects in a CSVJSON line, there is little chance a regular CSV parser would be able to read it.
Is there a recommended mime type for CSVJSON?
The recommended file format for CSVJSON is text/csvjson. Note that this mime type is not yet formally registered.
How are newlines defined?
Both \n and \r\n can serve as newlines meaning the format should work regardless of whether the file was generated on a Windows or a Linux/Unix based system.
Is there actually software out there that support CSVJSON?
I am not currently aware of parsers/serializers that support the CSVJSON variant - as such parsers, serializers and software tools supporting CSVJSON appear, I will list them below. If you have created or are aware of such software, please open an issue on Github issues.
What configuration items may be expected in CSVJSON parsers?
The CSVJSON is well defined without complex parsing instructions a typical CSV parser needs. However it would be useful to have several common CSVJSON-related configuration options:
  • csvjson={true|false} - for a general CSV parser, this option will configure it to parse CSVJSON input
  • withObjects={true|false} - parsing JSON object and array values in a CSVJSON file can complicate the parser. For high-performance parsers where there is no need for parsing objects or array values, this option can tell the parser it does not need to deal with that additional complexity, or otherwise let the parser throw an error in case it does not support parsing objects or arrays but it is still asked to.
  • withHeader={true|false} - indicate whether the first line contain comma-seperated column names

Software that supports CSVJSON

CSVJSON is more expressive than CSV (whose common use is documented by RFC-4180). As a result, there are many cases where products and libraries that can read CSV would fail to read CSVJSON due, for example, to escaping rules and embedded objects. Given CSVJSON's simplicity and utility more tools and libraries will support it over time.

csvjson.com - Online tools to convert popular data formats
The csvjson.com web application offers conversions between JSON and CSV, and offers support for the CSVJSON format as well. Thanks Martin Drapeau.