Comments on 'Writing Data' #10

Open
BillMills opened this Issue Mar 6, 2016 · 3 comments

BillMills commented Mar 6, 2016

Let me know your thoughts on my Writing Data post here.

khinsen commented Mar 10, 2016

A really nice overview!

I'd add a few sentences about XML to the "hierarchical data" section. Depending on which field of science you are in, XML can be much more important than JSON, so it deserves to be mentioned if only to point out that it's basically the same idea.

I'd also like to see the good general advice to use community-standard formats as much as possible, but no further. In other words, if you can't use an existing format without torturing it (very good reminder at the end of your post), it is better to make up your own.
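
To make the "basically the same idea" point concrete, here is a minimal sketch that writes one nested record as both JSON and XML using only the Python standard library. The record and the helpers `dict_to_xml` and `xml_to_dict` are made up for illustration, and the conversion only handles the simple case of nested dicts with string leaves (no attributes, no repeated tags).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
record = {"experiment": {"name": "run-1", "detector": {"id": "A", "gain": "2.5"}}}


def dict_to_xml(tag, value):
    """Recursively turn nested dicts into nested XML elements."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_xml(key, child))
    else:
        elem.text = str(value)
    return elem


def xml_to_dict(elem):
    """Inverse of dict_to_xml for this simple case (no attributes, no repeated tags)."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: xml_to_dict(child) for child in children}


# The same hierarchy, serialized two ways.
as_json = json.dumps(record)
as_xml = ET.tostring(dict_to_xml("experiment", record["experiment"]), encoding="unicode")

# Both round-trip back to the same nested structure.
from_json = json.loads(as_json)
from_xml = {"experiment": xml_to_dict(ET.fromstring(as_xml))}
```

Real XML has features this sketch ignores (attributes, namespaces, mixed content), so a general mapping to JSON is less tidy than it looks here.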

Owner

BillMills commented Mar 10, 2016

@khinsen thanks for the comment! re: torturous standards, I think it depends on the torture :) - if there's something really pathological about the standard, then you're right - time to leave the mistakes of the past in the past. But sometimes excruciating data formats (see character stanzas) can be hidden behind good tooling; taking the time to write a nice, standalone packer/unpacker is (a) something you'll mostly have to do anyway to consume the data you got from your colleagues, and (b) something that might help the standard be less excruciating in future. All that said, I only cling so fiercely to community standards because it is so darn hard to get people to agree on standards in the first place; if that mountain of diplomacy has already been scaled, it might be worth a bit of pain to keep the kingdom from fragmenting again.

khinsen commented Mar 11, 2016

@BillMills I fully agree with the intention of maintaining good community standards. What I meant by "torture" is the kind of misuse that actually causes technical fragmentation, although superficially a single standard is maintained. My favorite example is the PDB file format. It looks like a well-defined standard, and plenty of programs claim to support it, but interoperability is low except for the most basic use cases. One reason for the fragmentation is that the old format from the 1970s was becoming insufficient, so people worked around its limitations. But the main reason was that few software developers even cared to look at its definition. People just took a few examples and wrote a subroutine that produced something that looked similar.

The PDB format is also an interesting case study because of its unintended longevity. Its inventor, the PDB, designed and published a replacement (mmCIF) more than 15 years ago, because the limitations were already a problem. But the definitive transition to the new format was announced only last year. The reason was an enormous resistance to the change, because of the many programs that needed fixing, many of which are no longer actively maintained.
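
The difference between imitating examples and following the format's definition can be sketched with a small packer/unpacker for PDB `ATOM` records. This is a simplified illustration, not a complete implementation: the function names `pack_atom` and `unpack_atom` are hypothetical, only a subset of fields is handled, and the atom name is simply left-justified in its four columns. The column positions are those of the fixed-column `ATOM` record layout (coordinates in columns 31-38, 39-46 and 47-54).

```python
def pack_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Write one ATOM record using fixed column positions (subset of fields)."""
    return (
        f"{'ATOM':<6}"      # columns 1-6: record name
        f"{serial:>5} "     # columns 7-11: serial number, column 12 blank
        f"{name:<4} "       # columns 13-16: atom name, column 17 (altLoc) blank
        f"{res_name:>3} "   # columns 18-20: residue name, column 21 blank
        f"{chain:1}"        # column 22: chain identifier
        f"{res_seq:>4}"     # columns 23-26: residue sequence number
        f"    "             # columns 27-30: insertion code and padding, left blank
        f"{x:8.3f}{y:8.3f}{z:8.3f}"  # columns 31-54: coordinates
    )


def unpack_atom(line):
    """Read the same fields back by column position, not by whitespace."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Coordinates that fill their full 8-character fields leave no space between them.
line = pack_atom(1, "CA", "ALA", "A", 1, -100.123, -200.456, 12.0)
atom = unpack_atom(line)

# A reader written from "typical" examples might split on whitespace instead,
# which fuses the x and y coordinates into a single token here.
naive_tokens = line.split()
```

The column-based reader recovers all three coordinates, while the whitespace-based one sees `-100.123-200.456` as one field. That is the kind of divergence that stays invisible on typical example files.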
