TSV for arbitrary data #5

Closed
philandstuff opened this issue Dec 14, 2015 · 7 comments
Comments

@philandstuff

The IANA definition for tsv is great but somewhat limited. In particular it says:

Note that fields that contain tabs are not allowable in this encoding.

One of the benefits of TSV is that it is sufficiently simple that it can be processed by naive command-line utilities: splitting a TSV line on a tab character is more robust than splitting a CSV line on a comma, because CSV has all sorts of quoting rules.
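A minimal sketch of that contrast, using made-up sample rows: splitting on a tab recovers the TSV fields, while splitting on a comma mis-handles a quoted CSV field, so a CSV-aware parser is needed.

```python
import csv
import io

# Hypothetical sample rows carrying the same three fields.
tsv_line = "Smith, John\t42\tLondon"
csv_line = '"Smith, John",42,London'

# Naive split on tab recovers the three fields.
tsv_fields = tsv_line.split("\t")

# Naive split on comma cuts the quoted first field in two
# and leaves the quote characters in place.
naive_csv_fields = csv_line.split(",")

# A CSV-aware parser is needed to get the intended fields back.
parsed_csv_fields = next(csv.reader(io.StringIO(csv_line)))

print(tsv_fields)
print(naive_csv_fields)
print(parsed_csv_fields)
```

The naive approach only stays correct for TSV as long as no field contains a tab or newline, which is the limitation this issue is about.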

However, we may sometimes need to render data that contains tabs, newlines, and other such characters. Is there a good way of doing this?

@philandstuff
Author

linear tsv looks like an interesting piece in this area. In particular, the linear tsv motivation captures reasons why CSV is not an appropriate solution to this problem.
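A rough sketch of the escaping idea, assuming the escape set `\t`, `\n`, `\r` and `\\` as I understand the Linear TSV description. The function names are hypothetical, and the format's null marker is not handled here.

```python
# Assumed escape set: backslash, tab, newline, carriage return.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value: str) -> str:
    """Replace each special character with its two-character escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(text: str) -> str:
    """Reverse escape_field, scanning left to right."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes pass through unchanged (an assumption of this sketch).
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def encode_row(fields):
    """Join escaped fields with literal tabs; the result has no raw newline."""
    return "\t".join(escape_field(f) for f in fields)


def decode_row(line):
    """Split on literal tabs first, then unescape each field."""
    return [unescape_field(f) for f in line.rstrip("\n").split("\t")]


row = ["a\tb", "line1\nline2", "back\\slash"]
line = encode_row(row)
print(line)
print(decode_row(line) == row)
```

Because every raw tab and newline inside a field is escaped, a line can still be split on tabs by simple tools such as `cut`; only the final unescaping step needs format-specific knowledge.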

@torgo
Contributor

torgo commented Dec 14, 2015

Compare https://github.com/w3c/csvw (CSV on the Web). Would it be useful to contribute our use cases to that activity? Do we need a "TSV on the web" activity?

@rossjones
Contributor

I think the CSV-on-the-web WG leans quite heavily towards turning CSV into RDF triples and sometimes JSON, so it may not suit the 'processed by naive command-line utilities' use-case.

Linear TSV looks very interesting, but I wonder whether it would break non-naive tools (such as Python's csv module) that do know how to work with TSV but wouldn't know how to interpret the \t escape codes in the output (as they would be two individual characters, not a representation of 0x09).

x.txt

\tvalue

python repl

>>> f = open('x.txt', 'r')
>>> s = f.read()
>>> s
'\\tvalue\n'
>>> s[0]
'\\'
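The same concern can be sketched with Python's csv module directly. This assumes the file content shown above (a backslash, the letter t, then "value") and uses an in-memory buffer in place of x.txt.

```python
import csv
import io

# Same content as x.txt above: backslash, 't', then 'value'.
data = io.StringIO("\\tvalue\n")

# Read as TSV with the default dialect settings apart from the delimiter.
reader = csv.reader(data, delimiter="\t")
row = next(reader)

# The escape sequence is not decoded: the row has one field that still
# begins with a literal backslash, and it contains no tab character.
print(row)
print("\t" in row[0])
```

A consumer would need an extra unescaping pass over each field to recover the original tab.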

@rufuspollock

There is also http://dataprotocols.org/tabular-data-package which recommends simple CSV with some additional metadata. It was the basis for the W3C CSV effort but more generic (e.g. Data Package can wrap arbitrary data) and, I would say, is simpler and easier to use.

@rossjones
Contributor

I'd definitely prefer to use something like tabular-data-package that isn't laden with RDF baggage, and has already seen some use in the real world.

@rgrp Do you have a feeling about how it might fit the original use-case above for working with it in a command-line environment?

@philandstuff
Author

I've had a look at csvw's published documents, the tabular data model and the metadata vocabulary for tabular data, and it looks like it's not possible to use these standards to describe linear tsv, or more generally any tabular data format that uses escaping but no quoting. In particular, the dialect descriptions section of the metadata vocabulary explicitly states that if you have no quote character, then you also have no escape character. So there is no way to use an escape character for newlines and other whitespace without also using a quote character.

@Lawrence-G added question and removed Web labels Mar 7, 2017
@edent
Contributor

edent commented May 19, 2017

Superseded by #40

@edent closed this as completed May 19, 2017
6 participants