New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "strict mode" for RFC7159 parsing #646

Open
jskeet opened this Issue Sep 10, 2015 · 16 comments

Comments

Projects
None yet
8 participants
@jskeet

jskeet commented Sep 10, 2015

Currently Json.NET parses in a more lenient manner than RFC7159. For example:

{
  "foo": Infinity
}

... will be parsed by Json.NET, but isn't valid under RFC7159.

I'm sure that's generally useful, but it makes it harder to build on Json.NET when strict parsing is required. Other features such as date detection can be disabled already, but I don't think there's currently any way of disabling this leniency.

This is only one example - there may well be others that James is aware of, given that he's rather more deeply "in" this world than I am :) (Comments, possibly the quote characters used, etc.)

@JamesNK

This comment has been minimized.

Show comment
Hide comment
@JamesNK

JamesNK Sep 10, 2015

Owner

What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read(). If QuoteChar is ' then throw an error, if TokenType is a Float and Value is double.Infinity or double.NaN then throw an error, if TokenType is Comment then throw an error, etc.

Owner

JamesNK commented Sep 10, 2015

What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read(). If QuoteChar is ' then throw an error, if TokenType is a Float and Value is double.Infinity or double.NaN then throw an error, if TokenType is Comment then throw an error, etc.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Sep 10, 2015

Thanks, that's definitely a practical workaround. I think this would be generally useful functionality to be in Json.NET itself at some point, but I'll give that a try for now. (I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...)

jskeet commented Sep 10, 2015

Thanks, that's definitely a practical workaround. I think this would be generally useful functionality to be in Json.NET itself at some point, but I'll give that a try for now. (I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...)

@JamesNK JamesNK closed this Nov 28, 2015

@CyrusNajmabadi

This comment has been minimized.

Show comment
Hide comment
@CyrusNajmabadi

CyrusNajmabadi Jan 9, 2018

(I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...

@jskeet I'm adding JSON editor support for Roslyn over here: dotnet/roslyn#24110

I've been using json.net to base how parsing will work (as that's likely the json parser that most of them wil be using in practice). However, i think it would be nice to allow a strict mode as well. There are very obvious ways that json.net deviates from standard JSON. But i'm concerned i may miss other cases. Do you happen to have the list that identifies all the ways Json.Net is more lenient? That would be very helpful for ensuring i don't miss anything.

CyrusNajmabadi commented Jan 9, 2018

(I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...

@jskeet I'm adding JSON editor support for Roslyn over here: dotnet/roslyn#24110

I've been using json.net to base how parsing will work (as that's likely the json parser that most of them wil be using in practice). However, i think it would be nice to allow a strict mode as well. There are very obvious ways that json.net deviates from standard JSON. But i'm concerned i may miss other cases. Do you happen to have the list that identifies all the ways Json.Net is more lenient? That would be very helpful for ensuring i don't miss anything.

@JamesNK

This comment has been minimized.

Show comment
Hide comment
@JamesNK

JamesNK Jan 9, 2018

Owner

Off the top of my head:

  • Properties with single quotes instead of double quotes
  • Properties without quotes
  • NaN, Infinity, -Infinity
  • JavaScript constructors, e.g. new DateTime(12233434)
  • Single and multi-line comments
  • Trailing commas
  • Empty commas, e.g. [,,,]
  • Octal numbers (number starts with a zero)
  • Hexadecimal numbers (I think...)
  • New lines in strings
  • undefined

Probably the most commonly used "invalid" JSON would be properties with single quotes or no quotes, single and multi-line comments, new lines in strings, and trailing commas.

Owner

JamesNK commented Jan 9, 2018

Off the top of my head:

  • Properties with single quotes instead of double quotes
  • Properties without quotes
  • NaN, Infinity, -Infinity
  • JavaScript constructors, e.g. new DateTime(12233434)
  • Single and multi-line comments
  • Trailing commas
  • Empty commas, e.g. [,,,]
  • Octal numbers (number starts with a zero)
  • Hexadecimal numbers (I think...)
  • New lines in strings
  • undefined

Probably the most commonly used "invalid" JSON would be properties with single quotes or no quotes, single and multi-line comments, new lines in strings, and trailing commas.

@CyrusNajmabadi

This comment has been minimized.

Show comment
Hide comment
@CyrusNajmabadi

CyrusNajmabadi Jan 10, 2018

@JamesNK Thanks for the list. Fortunately, reading the actual json spec was simple enough that i was able to find all that as well. Other things i can think of are leniency around numbers and whitespace. Also, there are stricter rules around what is allowed in a string and what needs to be escaped.

CyrusNajmabadi commented Jan 10, 2018

@JamesNK Thanks for the list. Fortunately, reading the actual json spec was simple enough that i was able to find all that as well. Other things i can think of are leniency around numbers and whitespace. Also, there are stricter rules around what is allowed in a string and what needs to be escaped.

@CyrusNajmabadi

This comment has been minimized.

Show comment
Hide comment
@CyrusNajmabadi

CyrusNajmabadi Jan 10, 2018

Interestingly enough, i haven't found an actual .net json parser that tries to match the spec precisely. I've found deviations in several (including MS's own DataContractJsonSerializer and JavaScriptSerializer).

@JamesNK
Note: i'm a big fan of json.net and how it's addressed this space (including accepting a superset of json). Do you think there's a place in json.net to have an option for more strict processing of json? If so, i'd be happy to contribute. But i don't know if such an option fits your vision for the project.

Thanks!

CyrusNajmabadi commented Jan 10, 2018

Interestingly enough, i haven't found an actual .net json parser that tries to match the spec precisely. I've found deviations in several (including MS's own DataContractJsonSerializer and JavaScriptSerializer).

@JamesNK
Note: i'm a big fan of json.net and how it's addressed this space (including accepting a superset of json). Do you think there's a place in json.net to have an option for more strict processing of json? If so, i'd be happy to contribute. But i don't know if such an option fits your vision for the project.

Thanks!

@CyrusNajmabadi

This comment has been minimized.

Show comment
Hide comment
@CyrusNajmabadi

CyrusNajmabadi Jan 10, 2018

Hexadecimal numbers (I think...)

Yup. Not allowed in json. Octal isn't allowed either. Leading zeros are also verboten. Fun stuff.

CyrusNajmabadi commented Jan 10, 2018

Hexadecimal numbers (I think...)

Yup. Not allowed in json. Octal isn't allowed either. Leading zeros are also verboten. Fun stuff.

@JamesNK

This comment has been minimized.

Show comment
Hide comment
@JamesNK

JamesNK Jan 11, 2018

Owner

Do you think there's a place in json.net to have an option for more strict processing of json?

I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.

Owner

JamesNK commented Jan 11, 2018

Do you think there's a place in json.net to have an option for more strict processing of json?

I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.

@imPRAGMA

This comment has been minimized.

Show comment
Hide comment
@imPRAGMA

imPRAGMA Jan 13, 2018

+1 This is needed. The reason case on my end is that I use a "Repository" for json files that the public can upload and download. PHP uses the proper RFC implementation so if the user makes a JSON that works fine with the C# tester application, they think there wont be any issues, but in fact, there will be when its trying to be parsed and used on my PHP upload script.

So a strict mode here, would be insanely useful to tell the user on my C# end that its not proper syntax, and they can google or lint check to figure out there issue. Otherwise they just say my PHP script is broken plz fix m8 bla bla annoying shit when in fact, its there JSON thats broken.

imPRAGMA commented Jan 13, 2018

+1 This is needed. The reason case on my end is that I use a "Repository" for json files that the public can upload and download. PHP uses the proper RFC implementation so if the user makes a JSON that works fine with the C# tester application, they think there wont be any issues, but in fact, there will be when its trying to be parsed and used on my PHP upload script.

So a strict mode here, would be insanely useful to tell the user on my C# end that its not proper syntax, and they can google or lint check to figure out there issue. Otherwise they just say my PHP script is broken plz fix m8 bla bla annoying shit when in fact, its there JSON thats broken.

@JonHanna

This comment has been minimized.

Show comment
Hide comment
@JonHanna

JonHanna Jan 13, 2018

Contributor

I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options.

I think there's a different dimension of change between "valid by the standard" and "valid by the standard except for in the following cases…" that makes this request different to a request like that.

What I could see happening though is a request for validity according to some RFC 8259bis that replaces RFC 8259 (indeed, the relevant RFC was 7159 when this issue was opened, though I think the only change in validity was that UTF-8 is required for interchange which is at a different level than parsing anyway) and/or a later version of ECMA 404 / ISO/IEC 21778 (presumably the three standards will continue to track each other, but the timings of versions could differ). It might be wise future-proofing to define validity through an enum that could have later not-fully-compatible standard versions added to it, but not to offer such validation for anything other than a published standard.

But a validating reader could also be built as a separate package.

Contributor

JonHanna commented Jan 13, 2018

I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options.

I think there's a different dimension of change between "valid by the standard" and "valid by the standard except for in the following cases…" that makes this request different to a request like that.

What I could see happening though is a request for validity according to some RFC 8259bis that replaces RFC 8259 (indeed, the relevant RFC was 7159 when this issue was opened, though I think the only change in validity was that UTF-8 is required for interchange which is at a different level than parsing anyway) and/or a later version of ECMA 404 / ISO/IEC 21778 (presumably the three standards will continue to track each other, but the timings of versions could differ). It might be wise future-proofing to define validity through an enum that could have later not-fully-compatible standard versions added to it, but not to offer such validation for anything other than a published standard.

But a validating reader could also be built as a separate package.

@imPRAGMA

This comment has been minimized.

Show comment
Hide comment
@imPRAGMA

imPRAGMA Jan 13, 2018

Just add a bool flag, honestly if people ask for more options, we can discuss if it will be required or not.

imPRAGMA commented Jan 13, 2018

Just add a bool flag, honestly if people ask for more options, we can discuss if it will be required or not.

@TylerBrinkley

This comment has been minimized.

Show comment
Hide comment
@TylerBrinkley

TylerBrinkley Jan 13, 2018

Contributor

You could add a flag enum to represent the allowable deviations. If you make it int based it would support up to 32 different options but 64 different options if you make it long based.

Contributor

TylerBrinkley commented Jan 13, 2018

You could add a flag enum to represent the allowable deviations. If you make it int based it would support up to 32 different options but 64 different options if you make it long based.

@CyrusNajmabadi

This comment has been minimized.

Show comment
Hide comment
@CyrusNajmabadi

CyrusNajmabadi Jan 14, 2018

Personally, i'd have no problem just adding a strict mode without any flexibility. If you want any form of looseness, you don't ask for strict. If you want strict, you get it. But i'm not the project owner :)

CyrusNajmabadi commented Jan 14, 2018

Personally, i'd have no problem just adding a strict mode without any flexibility. If you want any form of looseness, you don't ask for strict. If you want strict, you get it. But i'm not the project owner :)

@sixlettervariables

This comment has been minimized.

Show comment
Hide comment
@sixlettervariables

sixlettervariables Feb 9, 2018

@CyrusNajmabadi Manatee.Json only supports strict mode (I believe).

sixlettervariables commented Feb 9, 2018

@CyrusNajmabadi Manatee.Json only supports strict mode (I believe).

@elgonzo

This comment has been minimized.

Show comment
Hide comment
@elgonzo

elgonzo Mar 5, 2018

I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.

You don't need to add lots of fields just because someone is asking. ;)

Introducing a option for enabling strict mode would help many users. It makes sense, as it allows users to read and validate RFC-conform Json data without the hassle of building and maintaining unwieldy JsonTextReader.Read() overrides.

For anyone else wanting an individual blend of a little strictness here and a little looseness there, just refer them to your comment from Sept. 2015:

What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read().

elgonzo commented Mar 5, 2018

I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.

You don't need to add lots of fields just because someone is asking. ;)

Introducing a option for enabling strict mode would help many users. It makes sense, as it allows users to read and validate RFC-conform Json data without the hassle of building and maintaining unwieldy JsonTextReader.Read() overrides.

For anyone else wanting an individual blend of a little strictness here and a little looseness there, just refer them to your comment from Sept. 2015:

What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment