support LazyNumber #893
@cowtowncoder could you re-review?
LazyNumber lazyNumber = null;
if ((_numTypesValid & NR_BIGDECIMAL) == 0) {
    if (_numTypesValid == NR_UNKNOWN) {
        _parseNumericValue(NR_BIGDECIMAL);
This seems wrong: if the underlying number is an integer, it should not be coerced into BigDecimal? Or do I misremember the logic in _parseNumericValue?
I guess I should have a look at the unit tests here first. :)
the if stmt is if ((_numTypesValid & NR_BIGDECIMAL) == 0)
Hmmh ok. What is needed here are unit tests that check which LazyNumber is actually returned in various cases. They should match getNumber() 1-to-1, I think (contents, that is: Float/LazyFloat, etc.).
There are various possible edge cases (for example, the number value is first accessed with a non-lazy accessor, then with the lazy one), but the most important one to me is where nextToken() is immediately followed by a getLazyNumber call: that should work so that decoding is deferred if not yet done, but an eagerly decoded value (for int and long) is used otherwise.
And I think we can actually count on "short" integers (int, long) being eagerly decoded in that way.
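The behavior asked for in this comment can be illustrated without Jackson at all. The following is only a sketch under assumptions: `DeferredNumber`, `eager` and `deferred` are hypothetical names, not types or methods from this PR or from jackson-core. It shows a holder that reuses an already decoded `int`/`long` value when one exists and otherwise keeps the raw text and decodes it on first access.

```java
import java.math.BigInteger;

// Hypothetical holder (not part of jackson-core): wraps either an eagerly
// decoded Number or the raw token text that still needs decoding.
final class DeferredNumber {
    private final String text;  // raw text; null when the value was decoded eagerly
    private Number decoded;     // cached value; null until decoded

    private DeferredNumber(String text, Number decoded) {
        this.text = text;
        this.decoded = decoded;
    }

    // Short integers (int, long) are assumed to be decoded already.
    static DeferredNumber eager(Number value) {
        return new DeferredNumber(null, value);
    }

    // Longer integers keep their text until a caller asks for the value.
    static DeferredNumber deferred(String text) {
        return new DeferredNumber(text, null);
    }

    boolean isDecoded() {
        return decoded != null;
    }

    Number getNumber() {
        if (decoded == null) {
            decoded = new BigInteger(text); // decoding happens here, once
        }
        return decoded;
    }
}
```

A unit test along the lines requested would then assert both the runtime type of `getNumber()` and whether decoding had already happened before the call.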
Ok, from later comments it does look like getLazyNumber() would always give LazyBigInteger for a VALUE_NUMBER_INT token. If so, the test should verify that (assuming we don't want to add LazyInteger / LazyLong).
@pjfanning Right, but I think that just means that no BigDecimal value has been decoded for the current token, and we do not automatically always want BigDecimal.
I think this should become clear with unit tests checking expected behavior: what I am worried about is that a short integral number (int) would become LazyBigDecimal here.
@pjfanning Looks good wrt lazy value types. I think the important part to add would be unit tests to check behavior for basic json backend (can just test against
For second part, setting that determines if But now that I write this I realize that maybe we do not even have (there is probably also need to then copy and modify this test for binary backends, later on -- but that can wait for now) Hmmh. Ok,
This would be more complex, but would actually (I think!) defer the decision on "is it Double or BigDecimal" until the caller indicates their wish by calling a specific accessor. WDYT? I know it sounds complicated but... I think that'd work. Btw, also a HUGE thank you for tackling this: I have thought about this for a long time, but without getting started.
@cowtowncoder I've made some of the changes you suggested. It does break a jackson-databind test case. getNumberValue (and now getLazyNumber) start with:
_parseNumericValue(NR_UNKNOWN) calls _parseSlowFloat, and if you pass in NR_UNKNOWN, that code defaults to the lossy Double parsing approach. Can I suggest that _parseSlowFloat needs some logic for the NR_UNKNOWN case where the length of _textBuffer.contentsAsString() matters, so that we would use Doubles for small strings and lazily parsed BigDecimals for longer strings?
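To make the suggestion concrete, here is a minimal sketch of a length-based choice in plain Java. Everything in it is an assumption for illustration: the class name, the method name and the cut-off value are hypothetical, and the lazy part is left out so that only the selection rule is visible. It is not how `_parseSlowFloat` works.

```java
import java.math.BigDecimal;

// Hypothetical illustration of "Double for short text, BigDecimal for long text".
final class LengthBasedFloatChoice {
    // Assumed cut-off, picked only for this example.
    static final int MAX_DOUBLE_TEXT_LENGTH = 16;

    static Number decode(String text) {
        if (text.length() <= MAX_DOUBLE_TEXT_LENGTH) {
            return Double.valueOf(text);   // short text: lossy but cheap
        }
        return new BigDecimal(text);       // long text: keeps every digit
    }
}
```

Note that the runtime type of the result depends on the length of the input, which is the kind of unpredictability the reply that follows is concerned about.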
While I can see why such handling would have benefits, I think it would be confusing for users, when actual type we get is unpredictable. So I would be hesitant to add more specialized logic here with default settings. I guess there are different use cases here, and I have been more focused in making sure that explicit target type will work with buffering as well as directly: that is, if target is indicated as But handling of This is why I was thinking a But I would be open to alternate Enum, say, that could have 3 states:
and would only apply to Textual formats (or if binary formats do not have type, store FPs as text? If such format exists). WDYT?
Looking a bit at existing implementation, I noticed that handling in most cases is divided first for integral numbers (JsonToken.VALUE_NUMBER_INT) vs floating point (VALUE_NUMBER_FLOAT); and:
This for "untyped" (Object), JsonNode, and to some degree Not sure what design would keep it all working and improve... just collecting observations so far.
@cowtowncoder I was able to work around the test failure in jackson-databind by making a change in TokenBuffer. I had started another jackson-core change that still might be worth merging into this PR. BigDecimal/BigInteger parsing is lazy in ParserBase: for instance, _parseSlowFloat just updates a _numberString variable instead of actually parsing the number, and the actual parse is done later when needed. Locally, I've made Float/Double work the same way. Is that something that would be useful? I still need to add more tests to this PR anyway.
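The "store the text, parse later" idea described above can be sketched in isolation. This is an illustration only: `LazyFloatState` and its members are hypothetical and merely echo the `_numberString` idea; they are not the actual ParserBase fields or methods.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for parser state: the number text is stored first,
// and each representation is parsed only when it is first requested.
final class LazyFloatState {
    private String numberString;     // pending text of the current number
    private Double asDouble;         // cached double, null until requested
    private BigDecimal asBigDecimal; // cached BigDecimal, null until requested
    private int parseCount;          // counts actual parses, for demonstration

    void setNumberText(String text) {
        numberString = text;         // no parsing happens here
        asDouble = null;
        asBigDecimal = null;
    }

    double getDoubleValue() {
        if (asDouble == null) {
            asDouble = Double.parseDouble(numberString);
            parseCount++;
        }
        return asDouble;
    }

    BigDecimal getDecimalValue() {
        if (asBigDecimal == null) {
            // Parsed from the original text, not from the double, so no digits are lost.
            asBigDecimal = new BigDecimal(numberString);
            parseCount++;
        }
        return asBigDecimal;
    }

    int getParseCount() {
        return parseCount;
    }
}
```

Because the BigDecimal is built from the stored text rather than from a previously parsed double, a caller that asks for the exact value first never goes through the lossy path.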
Yes, I think that lazy parsing of
Ok, so going with postpone decoding of But beyond that I think that fundamentally handling of lazy wrappers is to be driven from Process for individual number is divided between integers ( Checking The other part is more interesting: I think we must have "polymorphic" value type Accessors from I don't know if this is helpful, but basically I think that
Ok, here's another thought that occurred to me: instead of adding wrapper type(s) in here, we could alternatively add a method like:

    public Object getNumberDeferred() {
        // return either eagerly decoded `Number` OR `String` for lazy/deferred decoding
        // default impl in base class simply does:
        return getNumber(); // that is, if not overridden calls eager access method
    }

and let caller ( This may seem crude but with logic outlined above (with
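As a rough illustration of the caller side of such a method, the sketch below accepts an `Object` that is either a decoded `Number` or a `String` and decides the target type itself. The class `DeferredNumberResolver`, the `resolve` method and the `preferBigDecimal` flag are all hypothetical; the thread does not specify how a caller would actually make this decision.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical caller-side helper for a getNumberDeferred()-style result.
final class DeferredNumberResolver {
    static Number resolve(Object deferred, boolean preferBigDecimal) {
        if (deferred instanceof Number) {
            return (Number) deferred;          // already decoded eagerly
        }
        String text = (String) deferred;       // otherwise: raw number text
        boolean floating = text.indexOf('.') >= 0
                || text.indexOf('e') >= 0
                || text.indexOf('E') >= 0;
        if (!floating) {
            return new BigInteger(text);       // deferred integers are assumed to be big
        }
        if (preferBigDecimal) {
            return new BigDecimal(text);       // exact decoding from text
        }
        return Double.valueOf(text);           // lossy but cheaper decoding
    }
}
```

The point of the design is that the choice between Double and BigDecimal moves to the code that knows the target type, instead of being made while tokenizing.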
@pjfanning sorry to ask all these things but... would it be possible to have the "lazier double/float parsing" (with _numberString) as a separate PR? I could review and merge that sooner. I am not sure the LazyNumber functionality here is ready yet, but it would be good to get the other part in.
@cowtowncoder I created #899
@cowtowncoder would it be possible to go back to a TokenBuffer internal solution and abandon putting any of these changes in jackson-core (other than #899)? I can build on FasterXML/jackson-databind#3751 (which is useful in its own right) and do a follow-up to 3751 that makes the TokenBuffer code that writes numbers to the buffer store them as strings (ints and longs would be left to work as is). TokenBuffer already has the code to convert strings to numbers when reading them back off the buffer.
@pjfanning One thing I could do is show what I mean by The basic idea is just to return:
and then on
Closing due to #903
TokenBuffer for lazily decoded (big) numbers jackson-databind#3730