TOML vs YAML vs JSON #25

Closed
ysbaddaden opened this issue Sep 3, 2015 · 19 comments

Comments

@ysbaddaden
Contributor

Continuing from crystal-lang/crystal#1357 and crystal-lang/crystal#220, we may decide to use a format other than YAML. I've tried to summarize the pros and cons:

  • YAML:
    • pro: stable standard + available in Crystal's stdlib
    • pro: simple to write/read manually
    • pro: canonical serialization for automated tooling
      • con: but re-serializing is likely to wreck the original formatting
    • pro: yaml_mapping is likely to happen
    • con: some strings must be quoted (e.g. ~>)
    • con: the format is much more complex than required
  • TOML:
    • pro: used by Rust's Cargo
    • con: standard not quite stable + not in Crystal's stdlib (though there is crystal-toml)
    • con: format can be weird for the neophyte
    • con: toml_mapping is unlikely to happen
    • con: all strings must be quoted
  • JSON:
    • pro: simple enough format
    • pro: standard serialization for automated tooling (unlikely to wreck the original formatting)
    • pro: json_mapping
    • con: complex to write/read manually
@trans

trans commented Sep 3, 2015

I think it helps to understand that YAML is a superset of JSON designed for human readability/writability. If we expect humans to read these files, and especially write these files, then it makes abundant sense to use YAML over JSON.

As for the additional complexity of YAML, most of that is hardly even known to the lay YAML user so it makes little difference to them. And the parser can (and should) parse in a safe mode which only recognizes the basic node types, which helps keep things simpler.

Plus, YAML has seen widespread adoption and I expect that only to grow in the future.

@jtarchie

jtarchie commented Sep 4, 2015

I work on a rather large project where YAML is used for manifest files (well over 2000 lines). The difficulty with using a data language for large configurations is validation.

A Ruby Gemfile benefits from the syntax checker, along with the runtime, to validate what you've written.

With YAML, you have to load it, validate it, then interpret it. It can work, but if the argument is to be made for human readability/writability, I will take code (Crystal) over data (YAML).

I think it would be possible to solve the problem in Crystal with a DSL.
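The load, validate, interpret flow described above can be sketched roughly like this: parsing only catches syntax errors, so a separate pass has to check the structure. It is a Python sketch using JSON from the stdlib; `validate_manifest` and its rules (required `name`/`version`, each dependency needing a `github`, `git` or `path` source) are hypothetical and are not the shards SPEC.

```python
import json

# Hypothetical rules for illustration only; these are not the shards SPEC.
REQUIRED_KEYS = ("name", "version")
SOURCE_KEYS = ("github", "git", "path")


def validate_manifest(data: object) -> list[str]:
    """Return a list of problems; an empty list means the manifest passed."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]
    errors = [f"missing required key {k!r}" for k in REQUIRED_KEYS if k not in data]
    deps = data.get("dependencies", {})
    if not isinstance(deps, dict):
        return errors + ["'dependencies' must be a mapping"]
    for dep_name, spec in deps.items():
        # Each dependency must say where to fetch it from.
        if not isinstance(spec, dict) or not any(k in spec for k in SOURCE_KEYS):
            errors.append(f"dependency {dep_name!r} needs one of {', '.join(SOURCE_KEYS)}")
    return errors


# Step 1, load: only syntax errors surface here.
manifest = json.loads('{"name": "example", "dependencies": {"foo": {"version": "1.0"}}}')
# Step 2, validate: semantic errors only surface in this separate pass.
problems = validate_manifest(manifest)
# Step 3, interpret: only proceed when the list is empty.
```

A compiled DSL would get much of step 2 from the compiler instead, which is the trade-off being argued here.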

@oprypin
Member

oprypin commented Sep 5, 2015

Let's not tag YAML as "simple" and TOML as "weird" just because one is typical in Ruby's ecosystem and the other one isn't.

I am of the opposite opinion.

@ysbaddaden
Contributor Author

Ruby can use Ruby code because it's all dynamic, but we can't with Crystal: that would require compiling every single file for every single version, which quickly adds up during conflict resolution.

@kostya

kostya commented Sep 5, 2015

Why can't you write the manifest in Crystal? When you release a package, it would be converted to YAML and packed with the other sources into one gzip archive.

@ysbaddaden
Contributor Author

That would be doable, but it would require building, hosting, signing and distributing archives, which we'd like to avoid by leveraging a VCS like Git.

It would also require building into a canonical format... So why not use that format directly?

@oprypin
Member

oprypin commented Sep 5, 2015

Using Crystal for config just isn't an option (unless it's a specific tiny subset that can be parsed like a config file).

YAML works fine. The format scares me a bit; writing it by hand seems error-prone. But I don't think there are unarguably better options. The discussion of config format is mostly bikeshedding...

Supporting multiple formats is just obviously not a good idea.

@ysbaddaden
Contributor Author

I'm closing this question. I think it has been debated enough here, in crystal-lang/crystal#220 and in crystal-lang/crystal#1357.

I think YAML does the job without too much burden. See SPEC for details on the format. Please do not hesitate to open issues/pull requests if you have concerns or think some things need to be clarified.

@asterite
Member

asterite commented Sep 9, 2015

@ysbaddaden 👍

@j8r
Contributor

j8r commented Dec 4, 2018

Also being dissatisfied with the current status quo of JSON/YAML/TOML, I've created a new data serialization format: CON.
The spec is mostly finalized; a few details remain that I'm unsure about (like exponents and \u escapes).
It's more in line with the language: not indentation-based, with bracket arrays and brace hashes.

Independently of this, if a pure Crystal solution is adopted for data serialization, shards won't depend on external libraries like libyaml anymore.
Three years later, I understand if the format can't be changed.

@straight-shoota
Member

YAML is totally fine. It's easy for humans to read/write and, most importantly, has a large user base and is supported by many languages and toolkits. I don't see any reasonable incentive to even consider something else.

@j8r
Contributor

j8r commented Dec 4, 2018

A bit old now, but there are CVEs for libyaml.
Sure, JSON/CON implemented in Crystal can have CVEs too, but they are less likely thanks to the language itself and the formats' simple designs.

Edit: see also https://github.com/yaml/libyaml/issues?utf8=%E2%9C%93&q=stack+OR+leak

@jtarchie

jtarchie commented Dec 4, 2018

To be fair, CVEs aren’t bad. It means someone found a problem and corrected it. The fact that one hasn’t been found in your custom format does not imply it is inherently safer.

@j8r
Contributor

j8r commented Dec 4, 2018

@jtarchie DoS vulnerabilities are bad. Still, the attack surface is smaller when using stdlib's JSON than LibYAML because the implementation is much smaller; furthermore, Crystal is safer than C.

@j8r
Contributor

j8r commented Dec 4, 2018

The main point is to get rid of LibYAML, because it's written in C and this prevents shards from being self-hosted: we need this library's dev package installed on our system. C is also unsafe by nature.
Possible solutions are:

  • using another serialization format implemented in Crystal, like INI, JSON...
  • implementing a subset of YAML
  • implementing YAML in Crystal (good luck)

To me, only the first two options are doable in a limited amount of time.
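For a sense of scale for the second option, a parser for a tiny YAML subset (nested block mappings of string scalars, roughly the shape of the `dependencies` section of shard.yml) fits in a few dozen lines. This Python sketch is purely illustrative: `parse_yaml_subset` and the subset boundaries are hypothetical, it does almost no validation, it rejects tab indentation, and it does not handle sequences (such as `authors`), flow style, anchors, multi-line scalars, escapes or trailing comments.

```python
def parse_yaml_subset(text: str) -> dict:
    """Parse a hypothetical YAML subset: nested block mappings of string scalars.

    Unsupported: sequences, flow style, anchors, multi-line scalars,
    escape sequences and trailing comments.
    """
    root: dict = {}
    # Stack of (indent, mapping); the root sits at indent -1 so it is never popped.
    stack: list[tuple[int, dict]] = [(-1, root)]
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        leading = raw[: len(raw) - len(raw.lstrip())]
        if "\t" in leading:
            raise ValueError(f"line {lineno}: tabs are not allowed for indentation")
        indent = len(leading)
        key, sep, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected 'key: value' or 'key:'")
        # Pop finished mappings until we reach the one that owns this line.
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]
        if value == "":
            # 'key:' with nothing after it opens a nested mapping.
            child: dict = {}
            parent[key] = child
            stack.append((indent, child))
        else:
            # Strip one pair of matching quotes; no escape handling.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            parent[key] = value
    return root


manifest = parse_yaml_subset("""
name: example
version: 0.1.0
dependencies:
  foo:
    github: someone/foo
    version: "~> 1.0"
""")
```

All scalars stay strings here (`0.1.0` is not turned into a number), which sidesteps YAML's implicit typing rules entirely.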

@straight-shoota
Member

Why is getting rid of libyaml a goal? At least in the short term, I see no strong reason. Yes, it's written in C not Crystal, but who cares? It's a battle tested and widely used library, available almost anywhere.

Btw. implementing a YAML parser in Crystal shouldn't actually be too complicated. Prominent Rust and Go implementations are about 4-5k LoC. That's doable. It's just not the most important thing to do at the moment. We can live nicely with libyaml for now.

@jtarchie

jtarchie commented Dec 5, 2018

@j8r, I'm not familiar with the evidence that Crystal is safer than C. Can you provide any literature so that I can read up on it more? This will help educate me and others in the future.

@j8r
Contributor

j8r commented Dec 5, 2018

@jtarchie see https://crystal-lang.org/docs/syntax_and_semantics/unsafe.html.
Crystal has safe abstractions, C doesn't.
Maybe I'm too Go-minded on this topic (avoid unnecessary complexity when possible, KISS), which isn't really the Crystal mindset, since Crystal is full-featured 😅

@ysbaddaden
Contributor Author

This issue is closed. shard.yml is now ubiquitous.

Please implement a pure Crystal YAML parser instead of pushing yet another format, however fun and interesting it must have been to invent and implement.


8 participants