Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON precision loss on copyCurrentEvent() for floats that require greater than double precision #730

Closed
htmldoug opened this issue Dec 18, 2021 · 10 comments

Comments

@htmldoug
Copy link
Contributor

htmldoug commented Dec 18, 2021

Feature Request

I'd like a way to determine if getNumberType() is guaranteed to return an exact response or might lose precision if parsed to the corresponding native type.

Motivation

The JSON JsonParsers always return NumberType.DOUBLE -- never FLOAT or BIGDECIMAL. Consequently, I've been (incorrectly) parsing all JSON JsonToken.VALUE_NUMBER_FLOAT as doubles, losing precision for anything that doesn't fit. These comments suggest that the reason for always returning DOUBLE is that BigDecimal parsing is expensive, so it's up to the jackson-core consumer to ask for one if they want it.

As a user of the JsonParser I don't know if I want a BigDecimal at the point I interact with the JsonParser. I'm pushing the value downstream to be interpreted later. I'd like to get either a primitive float/double or else a CharSequence. This is the approach I'm taking in rallyhealth/weePickle#102.

However, converting all floats to strings is suboptimal when used with jackson-dataformats-binary. SmileParser knows that a double-tagged value (0x29) is 100% guaranteed to fit in a primitive double, but I have to convert it to a base10 string (and probably back to a base2 double later), since I can't trust getNumberType's answer of DOUBLE.

I've found #631, but for JSON, it appears always to return BigDecimal even for values like 0.0 that would fit in a double. I've benched about a 50% throughput penalty for sending all floats through BigDecimal.

AFAICT, it is not possible to determine whether or not to trust getNumberType() to avoid precision loss for any given JsonParser without hardcoding to its known behavior ahead of time (which may change in subsequent releases).

Proposal

I don't see a way to update the JSON JsonParser to return exact getNumberType() values without increasing upfront parsing cost, but if I had a signal that getNumberType() is trustworthy for a Smile JsonParser, but not for a JSON JsonParser, I could work around it. I would be able to ask for Strings for only non-exact values.

I think some method like boolean isNumberTypeGuaranteedExact() would satisfy my needs. Being able to ask for each token seems ideal. For the JSON JsonParser, it might return false for all JsonToken.VALUE_NUMBER_FLOAT.

WDYT? And especially, is there some other existing option that I've overlooked?

Thanks!

@htmldoug
Copy link
Contributor Author

It just occurred to me that this causes precision loss in jackson-core as well when going from JSON => JSON (e.g. when pretty printing). Failing test in #731.

@htmldoug
Copy link
Contributor Author

htmldoug commented Dec 18, 2021

A simpler but more disruptive option might be to introduce a new NumberType.FLOAT_UNKNOWN_PRECISION.

@cowtowncoder
Copy link
Member

Not commenting on earlier parts, but I will say that addition of a new NumberType enum value would lead to significant compatibility breakage.

I also thought I had added something in this space, but seems there's only JsonParser.getNumberValueExact() which is not quite it.

I think that since nature of exactness is mostly binary (exact usually) vs textual (usually not), it'd be per-parser feature: if so, could add new StreamReadCapability so caller could check if parser can provide exact value or not.
Or something along those lines.

@htmldoug
Copy link
Contributor Author

caller could check if parser can provide exact value or not.

That'd work.

@htmldoug htmldoug changed the title Efficient way to determine if getNumberType() returns guaranteed exact values? JSON precision loss on copyCurrentEvent() for floats that require greater than DOUBLE precision Dec 19, 2021
@cowtowncoder
Copy link
Member

@pjfanning Forgot this one, related to work to try to retain best representation. Failing test accidentally referenced #733 (PR) so changed that name.

@pjfanning
Copy link
Member

@cowtowncoder I'm not up to speed on this issue. Have we broken something?

@cowtowncoder
Copy link
Member

@pjfanning No, not at all -- it's an older issue; basically value that is outside of double gets converted to infinity and not retained in its full accuracy. I do not remember too many details but there is a reproduction. So was hoping it might have been resolved by other work, but does not seem to be the case.

@cowtowncoder cowtowncoder changed the title JSON precision loss on copyCurrentEvent() for floats that require greater than DOUBLE precision JSON precision loss on copyCurrentEvent() for floats that require greater than double precision Apr 6, 2023
@cowtowncoder
Copy link
Member

Hmmh. @pjfanning I think I may need to consider reverting functional change here, due to concerns wrt CBOR backend (and generally binary ones). As long as TokenBuffer can reliably retain accuracy, it is not 100% necessary for copyCurrentEvent() to try to do that -- I'll see how we changed that (with "Deferred" access, I think).

I am also thinking of adding new "copyCurrentEventExact()" method that would use the new mechanism, but then leave "copyCurrentEvent()" to use the old/existing (2.14) and before logic.
It is too bad we don't have tests that check for write sequences (although ones would be easy enough to add, either by Mocking or just sub-classing JsonGeneratorDelegate).

@cowtowncoder
Copy link
Member

Replaced by #984; closing as duplicate.

@cowtowncoder
Copy link
Member

Note: besides 2.15 adds lots of functionality to help (including #984), it's worth noting there is JsonParser.getNumberValueExact() (added in 2.12) that can help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants