Add a decoding flag to get all numbers as strings #10

Open
akheron opened this Issue Dec 20, 2010 · 11 comments

Owner

akheron commented Dec 20, 2010

All numbers or only overflowing numbers, or maybe two separate flags?

keenerd commented Sep 1, 2013

Bump.

I've gotten a bug report in my program (jshon) from this one. For example

# test case
jshon -e test <<< '{"test":200.123456789}'
# ver >= 2.1 (JSON_ENCODE_ANY)
200.12345678899999
# ver < 2.1
200.123457

Neither of these outcomes is good.
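For reference, both outputs can be reproduced outside jshon with plain `snprintf`. This is a minimal sketch: the `"%f"` format for the older output is an assumption inferred from the six decimals shown above, and `format_both` is a hypothetical helper, not part of Jansson or jshon.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format one double in two ways:
 *   fixed: "%f"    -> six digits after the decimal point (assumed old behaviour)
 *   full:  "%.17g" -> 17 significant digits, enough to round-trip any double
 * Both buffers must hold at least `size` bytes. */
static void format_both(double value, char *fixed, char *full, size_t size)
{
    snprintf(fixed, size, "%f", value);
    snprintf(full, size, "%.17g", value);
}
```

Calling `format_both(200.123456789, a, b, sizeof a)` should leave the truncated form in `a` and the 17-digit form in `b`. The 17-digit form parses back to the same double, but it is not the text that appeared in the input.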

Owner

akheron commented Sep 2, 2013

So you'd like to decode the value as a string? Or is the imprecise decimal representation the problem?

keenerd commented Sep 2, 2013

I would like to skip decoding altogether and access the original string. The imprecise decimal notation is only a symptom of why it is needed. Jshon does not do any math - it is only supposed to extract and spit out chunks of json. (Arguably one could attach a precision field to the json_t struct and dig this value out when it comes time to printf and generate a printf format string on the fly. But that seems silly.)

Owner

akheron commented Sep 2, 2013

Ok. But this decoding flag would give you a string, i.e. "200.123456789", not a special value that's a number but whose contained value is a string. So you would lose the information that the value in JSON input was a number.

keenerd commented Sep 2, 2013

That seems not good. And it breaks a chunk of the json spec, since the type information would be screwed up. Two ideas off the top of my head...

Two more int fields in the structs. These int fields would contain the index and length of substring from the original input that the decoded value was derived from. Probably accessed by a json_get_source() function? Hacky, but less hacky than denying that numbers exist at all.

A new non-numerical number type. It would be neither int nor float, just javascript-style "number". Internally represented as a string, up to the user to decode however they want. This also makes the people who want a single JS-style numerical type happy. Probably enabled by a parse flag?
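The first idea (an index and a length pointing back into the input) could look roughly like the sketch below. All names here are hypothetical, including `real_with_source` and `copy_source`; nothing like this exists in Jansson, and the proposed `json_get_source()` is only imagined in the comment above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical number value that remembers where it came from. */
typedef struct {
    double value;       /* decoded value, as today */
    size_t src_offset;  /* index of the first character of the number token */
    size_t src_length;  /* length of the number token in bytes */
} real_with_source;

/* Copy the original token text into `out` as a NUL-terminated string.
 * Returns 0 on success, -1 if `out` is too small. */
static int copy_source(const char *input, const real_with_source *num,
                       char *out, size_t out_size)
{
    if (num->src_length + 1 > out_size)
        return -1;
    memcpy(out, input + num->src_offset, num->src_length);
    out[num->src_length] = '\0';
    return 0;
}
```

One consequence of this design is that the input buffer has to outlive the decoded value, or the token has to be copied at decode time, which brings back the memory cost.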

Owner

akheron commented Sep 2, 2013

Adding a precision field to the json_real_t struct sounds like the best option so far, only used by the decoder and encoder (if set).
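A minimal sketch of the precision-field idea, assuming the decoder counts the significant digits of the input token and the encoder passes that count to `"%.*g"`. The struct and function names are hypothetical and do not match Jansson's actual `json_real_t`.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical real value carrying the precision seen in the input. */
typedef struct {
    double value;
    int precision;  /* significant digits in the input; 0 means unknown */
} real_with_precision;

/* Count significant digits in the mantissa of a JSON number token:
 * leading zeros are skipped, the exponent part is ignored. */
static int count_significant_digits(const char *token)
{
    int count = 0, seen_nonzero = 0;
    for (const char *p = token; *p && *p != 'e' && *p != 'E'; p++) {
        if (!isdigit((unsigned char)*p))
            continue;
        if (*p != '0')
            seen_nonzero = 1;
        if (seen_nonzero)
            count++;
    }
    return count;
}

static real_with_precision decode_real(const char *token)
{
    real_with_precision r;
    r.value = strtod(token, NULL);
    r.precision = count_significant_digits(token);
    return r;
}

/* Encode with the remembered precision, falling back to 17 digits. */
static int encode_real(const real_with_precision *r, char *out, size_t size)
{
    int precision = (r->precision > 0 && r->precision <= 17) ? r->precision : 17;
    return snprintf(out, size, "%.*g", precision, r->value);
}
```

The `*` in `"%.*g"` takes the precision as an argument, so no format string has to be built at run time. Note that `%g` drops trailing zeros, so an input such as `1.50` would still come back as `1.5`.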

keenerd commented Sep 2, 2013

Darn. I was hoping for a cleaner fix, something that did not involve throwing more patches onto a string -> number -> string conversion, by way of using sprintf to dynamically generate a format string for printf. Yuck.

To that end, I've got half of the code put together for a new SNUMBER type at https://github.com/keenerd/jansson

Only the code for basic loading and dumping has been written. It is missing almost all the helper functions (setting, deleting, etc) and is a little kludgy because the parse flags are not available in the lexing stage (can't have a TOKEN_SNUMBER). It does build, but I have not been able to test it properly because cmake refuses to build shared .so libraries. Figuring out how to make cmake build these was more complicated than writing the prototype SNUMBER code.

Owner

akheron commented Sep 3, 2013

Introducing a new type for this use case doesn't sound so good. The ultimately correct fix would be to replace sprintf("%.17g") with the algorithm in David Gay's dtoa.c or similar.
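As a rough illustration of what "shortest representation that round-trips" means, the sketch below simply tries increasing precisions until `strtod` returns the original value. This is a brute-force stand-in, not David Gay's algorithm, and `format_shortest` is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write the shortest "%.*g" rendering of `value` that parses back to
 * exactly the same double. Returns the string length, or -1 if `out`
 * is too small. Brute force: up to 17 snprintf/strtod pairs per call. */
static int format_shortest(double value, char *out, size_t size)
{
    int len = -1;
    for (int precision = 1; precision <= 17; precision++) {
        len = snprintf(out, size, "%.*g", precision, value);
        if (len < 0 || (size_t)len >= size)
            return -1;
        if (strtod(out, NULL) == value)
            break;  /* shortest precision that round-trips */
    }
    return len;
}
```

For the value from the test case earlier in the thread, this yields `200.123456789` rather than `200.12345678899999`. It recovers the input text only when the input was itself the shortest form; something like `1.50` would still be printed as `1.5`.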

keenerd commented Jan 31, 2016

For an example of how another library does this: yajl-tree simply stores the original string and lets you access it directly: https://github.com/lloyd/yajl/blob/master/src/api/yajl_tree.h#L81

Owner

akheron commented Feb 1, 2016

Yeah, this is also an option. But allocating the extra memory to store input strings of every number doesn't sound good for people using this on embedded devices. It could of course be enabled by a decoding flag to make it optional.

Would it be beneficial to do this also for strings? They have many possible input forms because any Unicode code point can be escaped with \u or represented directly in UTF-8.

Because of the decimal precision issue we made a change to our copy of Jansson to allow all numbers to be treated as strings. This might not be efficient for many use cases but for ours it's actually more efficient. In brief:

  • New flag JSON_DECODE_NUMBER_AS_STRING causes json_load* to decode all numbers as strings.
  • New int field in json_string_t to indicate if a string is a number. Set to true when decoding if the source field is a number. When encoding, if true, json_dump* will encode the string as a number (no quotes).
  • New function json_string_is_number() returns true if a string is a number.
  • New function json_string_set_is_number() to set the "string is a number" indicator.

If there's interest in adding this to Jansson let me know and I'll create a pull request.
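The encoding side of the change described above can be sketched as follows. This is not the actual patch: `string_value` and `dump_string` are hypothetical stand-ins for json_string_t and json_dump*, and string escaping is left out to keep the example short.

```c
#include <stdio.h>

/* Hypothetical stand-in for a string value with the extra indicator. */
typedef struct {
    const char *value;  /* text of the string, or the original number token */
    int is_number;      /* nonzero if the JSON input held a number here */
} string_value;

/* Encode the value: numbers are written verbatim without quotes,
 * ordinary strings are quoted. Escaping is deliberately omitted. */
static int dump_string(const string_value *s, char *out, size_t size)
{
    if (s->is_number)
        return snprintf(out, size, "%s", s->value);
    return snprintf(out, size, "\"%s\"", s->value);
}
```

With this layout the number token is never converted to a double, so it is written back exactly as it was read, and the indicator keeps the type information that a plain string would lose.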
