support LazyNumber #893

pjfanning · 2023-01-14T19:42:11Z

needs some tests
relates to Add support in TokenBuffer for lazily decoded (big) numbers jackson-databind#3730

src/main/java/com/fasterxml/jackson/core/io/LazyBigDecimal.java

pjfanning · 2023-01-20T17:26:31Z

@cowtowncoder could you re-review?

cowtowncoder · 2023-01-23T00:48:51Z

src/main/java/com/fasterxml/jackson/core/base/ParserBase.java

+        LazyNumber lazyNumber = null;
+        if ((_numTypesValid & NR_BIGDECIMAL) == 0) {
+            if (_numTypesValid == NR_UNKNOWN) {
+                _parseNumericValue(NR_BIGDECIMAL);


This seems wrong: if the underlying number is integer, it should not be coerced into BigDecimal?
Or do I misremember logic in _parseNumericValue
I guess I should have a look at unit test here first. :)

the if stmt is if ((_numTypesValid & NR_BIGDECIMAL) == 0)

Hmmh ok. What is needed here are unit tests of seeing which LazyNumber is actually returned in various cases. They should match getNumber(), 1-to-1 I think (contents that is, Float/LazyFloat, etc etc).

While there are various possible edge cases (if number value is first accessed with non-lazy, then lazy), but the most important one to me is the one where nextToken() is immediately followed by getLazyNumber call -- that should work in a way that decoding is deferred if not yet done, but eagerly decoded value (for int and long) is used otherwise.
And I think we can actually count on "short" (int, long) integers to be eagerly decoded in that way I think.

Ok, from later comments it does look like getLazyNumber() would always give LazyBigInteger for VALUE_NUMBER_INT token. If so, test should verify that (assuming we don't want to add LazyInteger / LazyLong).

@pjfanning Right, but I think that just means that no BigDecimal value has been decoded for current token -- but we do not automatically always want BigDecimal.

I think this should become clear with unit tests checking expected behavior: what I am worried about is that what is a short integral number (int) would become LazyBigDecimal here.

src/main/java/com/fasterxml/jackson/core/base/ParserBase.java

cowtowncoder · 2023-01-23T01:19:31Z

@pjfanning Looks good wrt lazy value types. I think the important part to add would be unit tests to check behavior for basic json backend (can just test against String parser as logic is the same across backends), so that, at things point:

If getNumber() would return Integer, Long or BigInteger, getLazyNumber() will return LazyBigInteger
For getNumber() return of Float / Double / BigDecimal, getLazyNumber() will return matching LazyXxx

For second part, setting that determines if BigDecimal or Double is may need to be tested.
Actually I don't think JSON parser would ever return Float at all, only either Double or BigDecimal...

But now that I write this I realize that maybe we do not even have JsonReadFeature/StreamReadFeature for deciding "natural" type for floating-point values: there is only DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS.
Perhaps we need StreamReadFeature.USE_BIG_DECIMAL_FOR_FLOATS to actually force this behavior from low level; it would affect getNumberType() and so on.

(there is probably also need to then copy and modify this test for binary backends, later on -- but that can wait for now)

Hmmh. Ok, Double vs BigDecimal is tricky. I wonder if we really only should specifically have one LazyDecimal value which would store one of:

BigDecimal (if already decoded, perhaps for binary format)
Double (ditto)
String -- for decoding into one (or even both) of above

This would be more complex, but would actually (I think!) defer decision on "is it Double or BigDecimal" until caller indicates their wish by calling specific accessor.
And so FINALLY @JsonCreator annotated BigDecimal (and Double) would be decoded from String only when actually needed.

WDYT? I know it sounds complicated but... I think that'd work.

btw also HUGE thank you for tackling this: I have thought about this for long time, but without getting started.

…d though

pjfanning · 2023-01-23T01:59:37Z

@cowtowncoder I've made some of the changes you suggested. It does break a jackson-databind test case.

getNumberValue (and now getLazyNumber) start with:

        if (_numTypesValid == NR_UNKNOWN) {
            _parseNumericValue(NR_UNKNOWN); // will also check event type
        }

_parseNumericValue(NR_UNKNOWN) calls _parseSlowFloat and if you pass in NR_UNKNOWN, that code defaults to the lossy Double parsing approach.

Can I suggest that _parseSlowFloat needs some logic for the NR_UNKNOWN where the length of the _textBuffer.contentsAsString() matters - so we would use Doubles for small strings and lazily parsed BigDecimals for longer strings?

cowtowncoder · 2023-01-23T02:12:41Z

@cowtowncoder I've made some of the changes you suggested. It does break a jackson-databind test case.

getNumberValue (and now getLazyNumber) start with:
        if (_numTypesValid == NR_UNKNOWN) {
            _parseNumericValue(NR_UNKNOWN); // will also check event type
        }
_parseNumericValue(NR_UNKNOWN) calls _parseSlowFloat and if you pass in NR_UNKNOWN, that code defaults to the lossy Double parsing approach.

Can I suggest that _parseSlowFloat needs some logic for the NR_UNKNOWN where the length of the _textBuffer.contentsAsString() matters - so we would use Doubles for small strings and lazily parsed BigDecimals for longer strings?

While I can see why such handling would have benefits, I think it would be confusing for users, when actual type we get is unpredictable. So I would be hesitant to add more specialized logic here with default settings.
However, as usual if there was a StreamReadFeature for opt-in, that would be fine.

I guess there are different use cases here, and I have been more focused in making sure that explicit target type will work with buffering as well as directly: that is, if target is indicated as BigDecimal, then that must be the handling; if Double, then that.

But handling of JsonNode, Object and Number is different now that I think about it.
Whether lazy/deferred or eager, there's the problem to consider -- Double or BigDecimal? It should be somewhat dynamic -- to use exact type from Binary formats -- but then again configurable for Textual case where there is no exact type.

This is why I was thinking a StreamReadFeature would make sense, to sort of fix low-level number type.

But I would be open to alternate Enum, say, that could have 3 states:

Always expose Double if no native type known (textual formats)
Always expose BigDecimal (same as ^^^)
Use heuristics to decide ("small" FP as Double, "big" as BigDecimal)

and would only apply to Textual formats (or if binary formats do not have type, store FPs as text? If such format exists).

WDYT?

cowtowncoder · 2023-01-23T06:41:45Z

Looking a bit at existing implementation, I noticed that handling in most cases is divided first for integral numbers (JsonToken.VALUE_NUMBER_INT) vs floating point (VALUE_NUMBER_FLOAT); and:

For integrals, getNumberType() used to distinguish and then use specific accessor
For floats, see if access is forced to coerce into BigDecimal -- and if so, use getDecimalValue() -- otherwise use getNumberValue(). In some cases tho alternatively use JsonParser.getNumberValueExact() (which is either native type from binary or BigDecimal for textual)

This for "untyped" (Object), JsonNode, and to some degree TokenBuffer.

Not sure what design would keep it all working and improve... just collecting observations so far.

pjfanning · 2023-01-23T15:25:15Z

@cowtowncoder I was able to work around the test failure in jackson-databind by making a change in TokenBuffer.

I had started another jackson-core change that still might be worth merging into this PR. BigDecimal/BigInteger parsing is lazy in ParserBase. For instance, _parseSlowFloat just updates a _numberString variable instead of actually parsing the number and the actual parse is done later when needed. Locally, I've made Float/Double work the same way. Is that something that would be useful?

I still need to add more tests to this PR anyway.

cowtowncoder · 2023-01-26T02:25:26Z

@cowtowncoder I was able to work around the test failure in jackson-databind by making a change in TokenBuffer.

I had started another jackson-core change that still might be worth merging into this PR. BigDecimal/BigInteger parsing is lazy in ParserBase. For instance, _parseSlowFloat just updates a _numberString variable instead of actually parsing the number and the actual parse is done later when needed. Locally, I've made Float/Double work the same way. Is that something that would be useful?

Yes, I think that lazy parsing of float and double would make sense too, regardless of "lazy"-work here.
So I could merge separate PR and we could verify it works fine with textual formats.

cowtowncoder · 2023-01-26T04:06:24Z

Ok, so going with postpone decoding of double/Double as per above is a good addition (and maybe ditto for float?).

But beyond that I think that fundamentally handling of lazy wrappers is to be driven from TokenBuffer perspective, when it is buffering content without knowing eventual target types.
Method is TokenBuffer._copyBufferValue().

Process for individual number is divided between integers (JsonToken.VALUE_NUMBER_INT) and floating-point numbers (JsonToken.VALUE_NUMBER_FLOAT). Former is much simpler so I'll start with that.
(note: for some "untyped" formats like XML, CSV, no number tokens are found; they will buffer Strings etc; it's not problematic... or at least solution needs to be different for any problems).

Checking JsonParser.getNumberType() is called to figure out, reliably, whether we get Integer, Long or BigInteger. In first 2 cases there's no point in building wrapper and we should just leave code as-is.
But BigInteger is different: here we do not want to force decoding. But we also should not call JsonParser.getText() since CBOR/Smile/Protobuf/Avro will have binary representation to give,
So this is where we would want to call getLazyNumber() -- there's a question if we want common base type, or separate method for getLazyIntegralNumber() -- but for this discussion I'll assume shared base type.
Value we get, LazyBigInteger etc, only ever contains that type, containing either BigInteger or String that contains one, to be decoded on-demand. No need to have accessors for anything else but BigInteger value (and I guess getText() because why not).

The other part is more interesting: I think we must have "polymorphic" value type LazyFPNumber, from which float/double/BigDecimal can be decoded on-demand: not separate implementation types, but one that has String, Float (or float), Double (or double). This is trickier to implement and has state (wrt what has been decoded). It might need to contain NumberType if strict type (from binary formats) is to be detected?
TokenBuffer should always just call getLazyNumber() for all JsonToken.VALUE_NUMBER_FLOATs: JsonParser will create and return LazyFPNumber.

Accessors from TokenBuffer will now need check for Number (for eager numbers from binary codecs, and for int/long from JSON, YAML).

I don't know if this is helpful, but basically I think that LazyNumber subtypes from this PR should be combined into just 2 implementations, and doing that will allow solving the problem efficiently and reliably.
I'll try to think it through some more and might add some more notes. :)

cowtowncoder · 2023-01-26T05:41:59Z

Ok, here's another thought that occurred to me: instead of adding wrapper type(s) in here, we could alternatively add method like:

     public Object getNumberDeferred() { 
          // return either eagerly decoded `Number` OR `String` for lazy/deferred decoding
          // default impl in base class simply does:
          return getNumber(); // that is, if not overridden calls eager access method
     }

and let caller (TokenBuffer) handle it the best it seems, using optimal wrapper.

This may seem crude but with logic outlined above (with TokenBuffer separating calls for integral vs floating-point types) would work and could be slightly simpler.

cowtowncoder · 2023-01-27T05:49:02Z

@pjfanning sorry I ask all these things but... would it be possible to have the "lazier double/float parsing" (with _numberString) as a separate PR? I could review and merge that sooner -- I am not sure LazyNumber functionality here is yet ready. But would be good to get the other part in.

pjfanning · 2023-01-27T10:21:00Z

@cowtowncoder I created #899

pjfanning · 2023-01-27T13:23:14Z

@cowtowncoder would it be possible to go back to a TokenBuffer internal solution and abandon putting any of these changes in jackson-core (other than #899)?

I can build on FasterXML/jackson-databind#3751 (which is useful in its own right) and do a follow up to 3751 which makes the TokenBuffer code that writes numbers to the buffer to store them as strings (ints and longs would be left to work as is) - TokenBuffer has the code to convert strings to numbers when reading them back off the buffer.

cowtowncoder · 2023-01-28T01:04:34Z

@pjfanning One thing I could do is show what I mean by getNumberDeferred() by... well, implementing it.

The basic idea is just to return:

Number just like getNumber() if (and only if!) number has already been decoded (or is in format from which one might as well do that, like byte array or buffer for binary formats)
String otherwise: caller will know whether this integral or floating-point from JsonToken type

and then on TokenBuffer side there is need for handling deferred operation, using new wrapper type that can distinguish between deferred BigInteger (int and long are never deferred I think; I think this can be considered a hard constraint at this point in time) or one of floating-point types.
I think it is reasonable to fix NumberType on typed access (if specific number type is requested), but for getNumberType() to use heuristics otherwise, using DeserializationFeature to determine between DOUBLE and BIG_DECIMAL (FLOAT would only be indicated if pre-decode value gotten, or getFloatValue() called).

pjfanning · 2023-01-29T10:16:17Z

closing due to #903

pjfanning added 4 commits January 14, 2023 20:41

support LazyNumber

cf07830

Update ParserBase.java

309533f

Update ParserBase.java

9e2acef

Update ParserBase.java

a4991f8

pjfanning marked this pull request as draft January 17, 2023 14:04

cowtowncoder reviewed Jan 19, 2023

View reviewed changes

src/main/java/com/fasterxml/jackson/core/io/LazyBigDecimal.java Show resolved Hide resolved

pjfanning added 11 commits January 19, 2023 14:10

more lazy types

eccf61d

Update ParserBase.java

dba596c

Update ParserBase.java

25caadd

Create TestLazyNumbers.java

1958e55

Update TestLazyNumbers.java

767a905

Update AsyncReaderWrapper.java

3f58f0a

issue with ints

136ef78

Update ParserBase.java

0f449b1

Update ParserBase.java

8cfe786

extend tests

55b4a52

Update NumberParsingTest.java

377e531

pjfanning changed the title ~~WIP: support LazyNumber~~ support LazyNumber Jan 20, 2023

pjfanning marked this pull request as ready for review January 20, 2023 17:22

use switch stmt

eeb2fae

pjfanning added 2 commits January 20, 2023 17:27

Update JsonParser.java

aa6f970

Update LazyBigInteger.java

36dbb8e

cowtowncoder reviewed Jan 23, 2023

View reviewed changes

src/main/java/com/fasterxml/jackson/core/base/ParserBase.java Outdated Show resolved Hide resolved

cowtowncoder reviewed Jan 23, 2023

View reviewed changes

src/main/java/com/fasterxml/jackson/core/base/ParserBase.java Outdated Show resolved Hide resolved

pjfanning marked this pull request as draft January 23, 2023 01:31

pjfanning added 2 commits January 23, 2023 01:48

refactor code based on review - breaks a test case in jackson-databin…

4be7298

…d though

Merge branch '2.15' into lazy-num

cf85132

pjfanning added 3 commits January 23, 2023 02:50

Update ParserBase.java

6a4c17e

Update ParserBase.java

4181a06

Update ParserBase.java

83d1fcc

Update ParserBase.java

964aa6e

pjfanning added 3 commits January 23, 2023 15:36

Update ParserBase.java

0c09f35

extend tests

b3a671a

Merge branch '2.15' into lazy-num

d2e1bc3

pjfanning added 2 commits January 26, 2023 06:22

Merge branch 'lazy-num' into lazier-num

2b18941

Merge branch '2.15' into lazy-num

2a2d930

pjfanning marked this pull request as ready for review January 26, 2023 18:43

pjfanning marked this pull request as draft January 27, 2023 13:23

Merge branch '2.15' into lazy-num

adec11a

This was referenced Jan 28, 2023

Add Object JsonParser.getNumberValueDeferred() method to allow for deferred decoding in some cases #902

Closed

Implement JsonParser.getNumberValueDeferred(), add tests #903

Merged

pjfanning closed this Jan 29, 2023

pjfanning deleted the lazy-num branch February 3, 2023 19:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support LazyNumber #893

support LazyNumber #893

pjfanning commented Jan 14, 2023 •

edited

pjfanning commented Jan 20, 2023

cowtowncoder Jan 23, 2023

pjfanning Jan 23, 2023

cowtowncoder Jan 23, 2023

cowtowncoder Jan 23, 2023

cowtowncoder Jan 23, 2023

cowtowncoder commented Jan 23, 2023

pjfanning commented Jan 23, 2023

cowtowncoder commented Jan 23, 2023

cowtowncoder commented Jan 23, 2023

pjfanning commented Jan 23, 2023

cowtowncoder commented Jan 26, 2023

cowtowncoder commented Jan 26, 2023

cowtowncoder commented Jan 26, 2023

cowtowncoder commented Jan 27, 2023

pjfanning commented Jan 27, 2023 •

edited

pjfanning commented Jan 27, 2023

cowtowncoder commented Jan 28, 2023

pjfanning commented Jan 29, 2023

support LazyNumber #893

support LazyNumber #893

Conversation

pjfanning commented Jan 14, 2023 • edited

pjfanning commented Jan 20, 2023

cowtowncoder Jan 23, 2023

Choose a reason for hiding this comment

pjfanning Jan 23, 2023

Choose a reason for hiding this comment

cowtowncoder Jan 23, 2023

Choose a reason for hiding this comment

cowtowncoder Jan 23, 2023

Choose a reason for hiding this comment

cowtowncoder Jan 23, 2023

Choose a reason for hiding this comment

cowtowncoder commented Jan 23, 2023

pjfanning commented Jan 23, 2023

cowtowncoder commented Jan 23, 2023

cowtowncoder commented Jan 23, 2023

pjfanning commented Jan 23, 2023

cowtowncoder commented Jan 26, 2023

cowtowncoder commented Jan 26, 2023

cowtowncoder commented Jan 26, 2023

cowtowncoder commented Jan 27, 2023

pjfanning commented Jan 27, 2023 • edited

pjfanning commented Jan 27, 2023

cowtowncoder commented Jan 28, 2023

pjfanning commented Jan 29, 2023

pjfanning commented Jan 14, 2023 •

edited

pjfanning commented Jan 27, 2023 •

edited