TSV for arbitrary data #5
Comments
Linear TSV looks like an interesting piece in this area. In particular, the Linear TSV motivation captures the reasons why CSV is not an appropriate solution to this problem.
cf. https://github.com/w3c/csvw. Would it be useful to contribute our use cases to the CSV on the Web activity? Do we need a "TSV on the Web" activity?
I think the CSV-on-the-web WG leans quite heavily towards turning CSV into RDF triples and sometimes JSON, so it may not suit the 'processed by naive command-line utilities' use-case. Linear TSV looks very interesting, but I wonder whether it would break non-naive tools (such as Python's csv module) that do know how to work with TSV but wouldn't know to interpret the \t escape codes in the output (as they would be 2 individual characters, not a representation of 0x09). For example, with a file x.txt that contains the literal text \tvalue, in the Python REPL:
>>> f = open('x.txt', 'r')
>>> s = f.read()
>>> s
'\\tvalue\n'
>>> s[0]
'\\'
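To make the concern concrete, here is a minimal sketch of what Python's csv module does with such a line, and the extra decoding step a consumer would need. It assumes the escape set is \t, \n, \r and \\ (my reading of Linear TSV, not checked against the spec), and `unescape_field` is a hypothetical helper, not part of any library.

```python
import csv
import io
import re

# Assumed escape set (an assumption, not taken from a spec): \t, \n, \r, \\
_UNESCAPES = {'t': '\t', 'n': '\n', 'r': '\r', '\\': '\\'}

def unescape_field(field):
    """Hypothetical helper: map two-character escapes back to raw characters."""
    # A single left-to-right pass, so an escaped backslash followed by 't'
    # is not mistaken for an escaped tab.
    return re.sub(r'\\([tnr\\])', lambda m: _UNESCAPES[m.group(1)], field)

# Two fields; the first holds an escaped tab (backslash + 't'), not 0x09.
data = 'a\\tb\tvalue\n'

reader = csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE)
raw_row = next(reader)                          # csv leaves the escape untouched
decoded_row = [unescape_field(f) for f in raw_row]

print(raw_row)
print(decoded_row)
```

So the csv module splits the fields correctly, but the first field still contains the two characters backslash and t until something decodes them.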
There is also http://dataprotocols.org/tabular-data-package, which recommends simple CSV with some additional metadata. It was the basis for the W3C CSV effort but is more generic (e.g. Data Package can wrap arbitrary data) and, I would say, is simpler and easier to use.
I'd definitely prefer to use something like tabular-data-package that isn't laden with RDF baggage, and has already seen some use in the real world. @rgrp, do you have a feeling about how it might fit the original use-case above for working with it in a command-line environment?
I've had a look at csvw's published documents (the tabular data model and the metadata vocabulary for tabular data), and it looks like it's not possible to use these standards to describe Linear TSV, or more generally any tabular data model which uses escaping but no quoting. In particular, the dialect descriptions section of the metadata vocabulary explicitly states that if you have no quote character, then you also have no escape character. So there is no way to use an escape character to escape newlines and other whitespace directly without also using a quote character.
Superseded by #40
The IANA definition for TSV is great but somewhat limited. In particular it says:
One of the benefits of TSV is that it is sufficiently simple that it can be processed by naive command-line utilities: splitting a TSV line on a tab character is more robust than splitting a CSV line on a comma, because CSV has all sorts of quoting rules.
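As a rough illustration of that robustness claim, the sketch below splits the same record naively in both formats. The sample record is made up for the example.

```python
import csv
import io

# The same made-up record in both formats; one field contains a comma.
csv_line = 'id,"Smith, John",London'
tsv_line = 'id\tSmith, John\tLondon'

# Naive split on the delimiter, roughly what cut or awk -F would do.
naive_csv = csv_line.split(',')
naive_tsv = tsv_line.split('\t')

# CSV needs a quoting-aware parser to recover the intended fields.
parsed_csv = next(csv.reader(io.StringIO(csv_line)))

print(naive_csv)    # the quoted field is cut in two
print(naive_tsv)
print(parsed_csv)
```

The naive TSV split agrees with the quoting-aware CSV parse, while the naive CSV split does not.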
However we may sometimes have a need to actually render data that contains tabs, newlines, and other such things. Is there a good way of doing this?
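One possible approach, sketched here under assumptions, is backslash-escaping in the style of Linear TSV, so that every record stays on one line and a raw tab only ever means "field separator". The escape set (\t, \n, \r, \\) is assumed, and `escape_field` and `encode_row` are hypothetical names used for illustration only.

```python
def escape_field(value):
    """Hypothetical helper: escape characters that would break tab splitting."""
    # Escape the backslash first so the backslashes added by the later
    # replacements are not escaped a second time.
    return (value.replace('\\', '\\\\')
                 .replace('\t', '\\t')
                 .replace('\n', '\\n')
                 .replace('\r', '\\r'))

def encode_row(fields):
    """Join escaped fields with tabs and terminate the record with a newline."""
    return '\t'.join(escape_field(f) for f in fields) + '\n'

row = ['name', 'line one\nline two', 'col\tumn']
line = encode_row(row)

# Naive splitting still yields one record with three fields.
fields = line.rstrip('\n').split('\t')
print(fields)
```

A naive consumer can still split on tabs and newlines safely; it only needs to unescape if it cares about the original field contents.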