Parsing JSON which isn't conforming to RFC #100

Closed
akamaus opened this Issue Dec 12, 2012 · 14 comments


akamaus commented Dec 12, 2012

There are a bunch of places on the web where you can find JSON which deviates from the standard.
A rather typical example would be { a: 42, b: 'hello world' }

I'd like to see Aeson's parser relaxed to cover such bad examples. Your thoughts?

Contributor

nikita-volkov commented Dec 12, 2012

+1

Owner

bos commented Dec 17, 2012

Is the problem you are describing the fact that the keys don't have quotes around them?

akamaus commented Dec 18, 2012

Yes. Also, the string is enclosed in single quotes, not double.

I'd like this not to be the default. I wasn't aware of the strict RFC requirements, and aeson refusing to accept input that violated a few of them is what made my app RFC-compliant.

Please make this an option somehow so that we can turn it on for legacy stuff, but have the strict compliance for code we write ourselves.

An alternative could be to provide a clean-up function:

jsonishToJson :: ByteString -> ByteString

Contributor

nikita-volkov commented Feb 21, 2013

@alexanderkjeldaas Why would anyone want the application to break when it can safely and unambiguously solve the problem? There is no doubt that Aeson should render strict RFC-compliant JSON, but the more variants it can parse without a performance penalty, the better.

My vote is: update the parser without introducing any new functions in the API and leave the renderer as it is.

Owner

bos commented Feb 21, 2013

I might be willing to consider this if you can tell me a bit more about these broken JSON generators. Presumably since they're generating bad JSON in the first place, they could generate keys that contain embedded colon or quote characters? It would be helpful if you had some examples to indicate exactly what kind of brokenness we should be considering.

@nikita-volkov In my case I was generating invalid json on the client side.

Contributor

nikita-volkov commented Feb 21, 2013

@bos
You can't expect to get a valid result from parsing an arbitrary HTML page with a strict XML parser. IMO the same applies here: we can't rely on specific generators and standards, because in most use cases the JSON input comes from the network, where it may well have been generated by custom libraries or even typed in by hand, so you simply can't trust it. Therefore I believe we need to find a middle ground where the parser accepts as many variants as possible without any radical sacrifice in performance, and I think I have an idea.

Most problems of parsing JSON stem from the fact that people apply the more relaxed JavaScript syntax rules to it, which simply boil down to the following:

  1. Single quotation marks can be used to specify string values, e.g.: 'a\'b"c' is the same as "a'b\"c"
  2. The same applies to object keys
  3. Object keys which make up valid JavaScript identifiers may be specified without quotation marks, e.g. {a_21: 0} is valid, {a-21: 0} is invalid

That's it. We get simple and standard rules to follow - i.e., JavaScript syntax. And I don't think these changes would noticeably reduce parser performance.
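
The three rules above can be sketched as a small text-to-text clean-up pass, in the spirit of the `jsonishToJson` idea mentioned earlier in the thread. This is only an illustration and not part of Aeson: the name `relaxedToStrict` is hypothetical, it works on `String` rather than `ByteString` to stay dependency-free, it approximates "valid JavaScript identifier" as a letter, `_` or `$` followed by letters, digits, `_` or `$`, and it does no validation of its own, so anything it does not recognise is passed through for a strict parser to reject.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- | Hypothetical helper (not an Aeson function): rewrite JavaScript-style
-- object syntax into strict JSON text. Input is not validated.
relaxedToStrict :: String -> String
relaxedToStrict = go
  where
    -- Outside of any string literal.
    go [] = []
    go ('"' : rest) = '"' : doubleQuoted rest
    go ('\'' : rest) = '"' : singleQuoted rest   -- rules 1 and 2
    go s@(c : rest)
      | isIdentChar c =
          let (word, after) = span isIdentChar s
          in if isIdentStart c && startsWithColon after
               then '"' : word ++ '"' : go after  -- rule 3: bare key
               else word ++ go after              -- number, true, false, null
      | otherwise = c : go rest

    -- Inside a double-quoted string: copy verbatim, keeping escapes intact.
    doubleQuoted [] = []
    doubleQuoted ('\\' : c : rest) = '\\' : c : doubleQuoted rest
    doubleQuoted ('"' : rest) = '"' : go rest
    doubleQuoted (c : rest) = c : doubleQuoted rest

    -- Inside a single-quoted string: swap which quote character is escaped.
    singleQuoted [] = []
    singleQuoted ('\\' : '\'' : rest) = '\'' : singleQuoted rest
    singleQuoted ('\\' : c : rest) = '\\' : c : singleQuoted rest
    singleQuoted ('"' : rest) = '\\' : '"' : singleQuoted rest
    singleQuoted ('\'' : rest) = '"' : go rest
    singleQuoted (c : rest) = c : singleQuoted rest

    isIdentStart c = isAlpha c || c == '_' || c == '$'
    isIdentChar c = isAlphaNum c || c == '_' || c == '$'

    -- A bare word is treated as a key only if the next non-space
    -- character is a colon.
    startsWithColon s = case dropWhile isSpace s of
      (':' : _) -> True
      _         -> False
```

With this sketch, `{ a: 42, b: 'hello world' }` becomes `{ "a": 42, "b": "hello world" }` and `'a\'b"c'` becomes `"a'b\"c"`, while `{a-21: 0}` is left unchanged and would still be rejected by a strict parser. Keeping the rewrite separate from the parser is one way to leave the default parsing path untouched.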

P.S. Need examples? Here's a partial example of crap I get from a quite serious player - www.vk.com:

{"all":[['126257070','150084772','http://cs1-4.userapi.com/d33/859470469ef948.mp3','221','3:41','Michel Teló','Bara Bará Bere Berê','24208242','26444338','0','1','0',''],['30439310','169406762','http://cs1-34.userapi.com/d7/ac07b26bf0acb6.mp3','289','4:49','Adele','Skyfall (OST 007: Координаты «Скайфолл»)'

@nikita-volkov You have a good point and parsing that broken json is important.
Maybe it should be the default mode.

I just want to be able to access a strict mode that can yell when I generate broken json myself.

Here is a good list of JSON deviations supported by Jackson
http://wiki.fasterxml.com/JacksonFeaturesNonStandard

And the philosophy
http://www.cowtowncoder.com/blog/archives/2009/08/entry_310.html

Collaborator

hvr commented Mar 30, 2013

@nikita-volkov

I've tried feeding the relaxed JSON you described to the following parsers

and neither accepted it; so even the popular Python and Ruby platforms don't support "relaxed JSON" out of the box...

However, I'm with @alexanderkjeldaas: should support for relaxed JSON be implemented in Aeson, we'd also want an RFC 4627-compliant strict mode for those applications where we want to enforce RFC compliance...

akamaus commented Apr 1, 2013

I think one should do the right thing instead of imitating others. In reality you sometimes stumble upon non-standard JSON snippets and have to do something about them. The last thing I want to do is write my own parser just because someone discovered that his JS object can be fed to eval even without quoting the member names.

Owner

bos commented Apr 3, 2013

So here's my position on this.

IF:

  • someone else submits a patchset (I am not going to do the work myself);
  • and the code is well written and minimally intrusive;
  • and the patches do not try to make "accept any old crap" the default;
  • and the code doesn't affect performance of the default parsers;
  • and it's got test coverage;

then I'd be willing to review, and consider accepting, the patch(es).

Why take a somewhat hardish line like this? Because unless it's done carefully, handling junk input can make the code much less maintainable, and I care far more about my sanity than I do about data being generated by some clown who couldn't read the JSON spec.

Owner

bos commented Apr 11, 2014

Closing due to lack of activity.

bos closed this Apr 11, 2014

jsdw commented Nov 16, 2015

I know this issue is closed, but just my 2 pence: I am using Aeson's JSON parser directly to parse JSON passed in as arguments to commands in a CLI I'm writing. In this case, it's simply a little nicer for the users typing the JSON not to have to worry about quotes around keys when providing the relevant input.

This feature isn't super high on my current todo list but I may look to submit a patch when I get around to it if I can see a simple enough solution!
