Erroneously rounds long integer numbers #26

mpnovikova · 2015-10-05T15:53:44Z

SO that
{"organizerKey": 7161093205057351174} becomes {"organizerKey": 7161093205057352000}

creationix · 2015-10-05T16:26:34Z

I'm afraid this is a limitation of JavaScript itself. The number type only had 53 bits of precision total.

Try to simply type the number in a node repl or the browser console and you'll see the echoed back value is the same thing jsonparse gives you.

I would recommend storing the key as a string (maybe hex encoded to save space) if you wish to keep all the digits exactly.

mpnovikova · 2015-10-05T16:31:04Z

I am aware of those limitations in JavaScript. The problem is I am reading this data from the 3-rd party API Citrix to be precise. And this is how they give it to users. Would you be able to track the scenario when a sequence on numbers is longer than JavaScript can correctly parse as a Int and parse that sequence as a string?

creationix · 2015-10-05T17:20:54Z

It is possible to check the number after parsing and if it's greater than the highest possible 53-bit integer, I could return a string or something, but that would break for people expecting a number. There are many cases where an imprecise number is better than a precise string.

Do you feel confident enough to add the check to the number parser? We could maybe add it as a flag so that you could opt-in to the new behavior? The full list of characters would need to be recorded like is done for strings and then released if the number is used, or used if the number is dropped.

creationix · 2015-10-05T17:27:19Z

Actually, it's probably a good idea to buffer the number and then use parseFloat(str). The memory overhead of the buffered number is minimal, and surely parseFloat is much faster than my manual implementation. It would shrink the code a bit too.

Once the new number parser is in, it would be trivial to store the string in case the parsed version is above the 53-bit threshold.

mpnovikova · 2015-10-05T17:29:54Z

We use JSONStream library that relies on this parser. There are many cases where it just uses a number. However in this particular case these long Int IDs are critical for the business functionality. The workaround I had was to omit the JSONStream all together, collect the data in the buffer, then use JSONbig.parse() to parse long int's; but then I run into performance issues and timeouts because instead of streaming the data I am waiting for it.

To answer you question, for the purposes of our project I could fork this repo and add your workaround. I would appreciate if you could point where and how this fix should be done.

Thank you for being so response, highly appreciate it

creationix · 2015-10-05T17:40:22Z

So the basic idea is to accumulate the characters in the number as a string. You can re-use the same property string parsing uses https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L176

You don't need to worry about utf-8 decoding in the number code, all the digits are in the ascii 7-bit range.

Then at all the lines that emit NUMBER tokens (look like this.onToken(NUMBER...), you'll need to check if you should store the string or the number (probably factor this out into a common helper function).

creationix · 2015-10-05T17:42:04Z

The parseFloat idea is separate, but does depend on the string being buffered. If you add parseFloat in the emitting helper, you can remove all the lines that calculate the value piecemeal. The many number states are still required to know when the number has ended however.

mpnovikova · 2015-10-05T17:43:50Z

Thank you, I will give it a try

mpnovikova · 2015-10-06T15:34:22Z

This fragment https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L183-L307 can be replaced by this

 } else if (this.tState === NUMBER1 || this.tState == NUMBER2 || this.tState === NUMBER3) {
        n = buffer[i];
        switch (n) {
          case 0x30: // 0
          case 0x31: // 1
          case 0x32: // 2
          case 0x33: // 3
          case 0x34: // 4
          case 0x35: // 5
          case 0x36: // 6
          case 0x37: // 7
          case 0x38: // 8
          case 0x39: // 9
          case 0x2e: // .
          case 0x65: // e
          case 0x45: // E
          case 0x2b: // +
          case 0x2d: // -
            this.magnatude += String.fromCharCode(n);
            this.tState = NUMBER3;
            break;
          default:
            this.tState = START;
            if (this.negative) {
              this.magnatude = '-' + this.magnatude;
            }
            this.onToken(NUMBER, parseFloat(this.magnatude));
            this.magnatude = undefined;
            i--;
            break;
        }
}

without loosing any existing functionality. Is that what you had in mind when you talked about parseFloat?

creationix · 2015-10-06T20:30:13Z

This is close, great work. Just a couple suggestions:

By merging the 3 states into one, you will be also allowing illegal patterns like --EE31e. It would be best I think to keep the separate states so as to keep the parser a little stricter. If parseFloat weren't so liberal, we could just depend on that to catch illegal patterns, but it will happily parse strings like "123this is not a number".
You can store the initial - in the string instead of setting the this.negative, then you don't need to prepend a "-" at the end.
I was initially confused by this.magnatude being used to store the string version of the number, I'd rather re-use the this.string property that string parsing uses. The parser is never parsing strings and numbers at the same time so there is no conflict and it makes sense semantically as well since we are storing a string.

You should send a pull request with the proposed changes and my suggested fixes. That way I can discuss inline and once it's good I can merge the PR and you'll get credit for the change and be listed as a collaborator in the repo.

mpnovikova · 2015-10-07T14:18:13Z

On point #1
"123this is not a number" won't go through 0-9eE+- mesh all parseFloat has to do after that is to give you NaN on cases like --EE31e; In case of 123Ee it will give you 123 both results are fine with me. Here is the branch fee free to comment inline https://github.com/mpnovikova/jsonparse/blob/bignum/jsonparse.js
Some unit tests are failing not but I am working on them

jeromew · 2016-07-25T09:56:31Z

I understand that this issue can be closed after 044b268

mmikhalko · 2019-09-13T08:40:39Z

The problem is still presenting if use a negative big int number like -9223372036854775808.
It had better to add in the RegExp '-' symbol in the line
https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L270

There is a pull request for it case #38

mpnovikova changed the title ~~Erroneously rounds long integers numbers~~ Erroneously rounds long integer numbers Oct 5, 2015

wesleytodd mentioned this issue Dec 15, 2016

Status of this Library #30

Closed

mmikhalko mentioned this issue Sep 16, 2019

Fix parsing negative big int numbers as a string #38

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Erroneously rounds long integer numbers #26

Erroneously rounds long integer numbers #26

mpnovikova commented Oct 5, 2015

creationix commented Oct 5, 2015

mpnovikova commented Oct 5, 2015

creationix commented Oct 5, 2015

creationix commented Oct 5, 2015

mpnovikova commented Oct 5, 2015

creationix commented Oct 5, 2015

creationix commented Oct 5, 2015

mpnovikova commented Oct 5, 2015

mpnovikova commented Oct 6, 2015

creationix commented Oct 6, 2015

mpnovikova commented Oct 7, 2015

jeromew commented Jul 25, 2016

mmikhalko commented Sep 13, 2019 •

edited

Erroneously rounds long integer numbers #26

Erroneously rounds long integer numbers #26

Comments

mpnovikova commented Oct 5, 2015

creationix commented Oct 5, 2015

mpnovikova commented Oct 5, 2015

creationix commented Oct 5, 2015

creationix commented Oct 5, 2015

mpnovikova commented Oct 5, 2015

creationix commented Oct 5, 2015

creationix commented Oct 5, 2015

mpnovikova commented Oct 5, 2015

mpnovikova commented Oct 6, 2015

creationix commented Oct 6, 2015

mpnovikova commented Oct 7, 2015

jeromew commented Jul 25, 2016

mmikhalko commented Sep 13, 2019 • edited

mmikhalko commented Sep 13, 2019 •

edited