Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the default float/double parser as fallback #779

Closed
wants to merge 1 commit into from

Conversation

grcevski
Copy link

@grcevski grcevski commented Jun 24, 2022

The FastDoubleParser introduced via #747 and #766 will fail on parsing certain esoteric OpenJDK Double/Float supported inputs.

This PR tries to remedy this by introducing a try/catch around the FastDoubleParser and then using the Java default Double/Float parsers as a fall-back.

Relates to #778 and wrandelshofer/FastDoubleParser#19

@plokhotnyuk
Copy link

@grcevski Have you measured slowdown of parsing additional formats with turned on fast parse routines?

I bet that for deep stacks it could be vulnerable for DoS/DoW attacks when non-trusted input is used.

@grcevski
Copy link
Author

I don't have performance numbers for the invalid formats, obviously the exception handling will introduce some overhead, but assuming these are exceptional cases the overall impact on performance should be minimal.

The number of frames when an exception is hit for these unsupported formats is 5 at most, which isn't deeper than what FloatingDecimal would do in OpenJDK to parse the numbers.

@pjfanning
Copy link
Member

@grcevski could you report this issue to https://github.com/wrandelshofer/FastDoubleParser ?

My gut feeling is that jackson-core shouldn't have the try-catch. As @plokhotnyuk, there are DoS/DoW risks with relying on exceptions. The fast parser support is optional and disabled by default. So users who want to parse JSON from untrusted sources would be better off not enabling the fast parser support.

@grcevski
Copy link
Author

@grcevski could you report this issue to https://github.com/wrandelshofer/FastDoubleParser ?

My gut feeling is that jackson-core shouldn't have the try-catch. As @plokhotnyuk, there are DoS/DoW risks with relying on exceptions. The fast parser support is optional and disabled by default. So users who want to parse JSON from untrusted sources would be better off not enabling the fast parser support.

Sure thing, I'll open an issue there and link back.

@grcevski
Copy link
Author

If the first example is fixed for 1.1e-23f, I guess an alternative fix would be to avoid using the FastDoubleParser for hexadecimal floating point numbers. I'll double check to see if I can find other non-hexadecimal error patterns.

@pjfanning
Copy link
Member

If the first example is fixed for 1.1e-23f, I guess an alternative fix would be to avoid using the FastDoubleParser for hexadecimal floating point numbers. I'll double check to see if I can find other non-hexadecimal error patterns.

I understand your concern but every check in the code will slow down the throughput. So far, the issues appear to be with edge cases that are unlikely to appear in real data sets - at least ones that have not been manipulated by a malicious actor.

We may still accept this PR or a something similar but I would be interested to see if @wrandelshofer has any ideas on a solution to supporting these edge cases in his parser code.

@grcevski
Copy link
Author

My gut feeling is that jackson-core shouldn't have the try-catch. As @plokhotnyuk, there are DoS/DoW risks with relying on exceptions. The fast parser support is optional and disabled by default. So users who want to parse JSON from untrusted sources would be better off not enabling the fast parser support.

I do understand the general concern about try and catch, but I'm not sure if it's an issue here, the default OpenJDK implementation for parseDouble will try catch:

https://github.com/openjdk/jdk/blob/160944bc6bd94d2927f398cf7732027c1b836a42/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L1837

So I don't think this approach is making things worse in this respect.

@pjfanning
Copy link
Member

@grcevski the use of try/catch in the existing JDK parser code is one of the reasons it is slower than the fast parser support.

@grcevski
Copy link
Author

I apologize, I wasn't clear in what I meant to say. I meant to say that the default OpenJDK implementation will involve the exception handler for bad input, meaning that any concerns about DoS/DoW risks with relying on exceptions are not made worse in any way by wrapping the fast double parser code in an exception handler. The fast double parser doesn't internally try and catch, but the default one already is, not in Jackson's code, but downstream.

@plokhotnyuk
Copy link

plokhotnyuk commented Jun 24, 2022

@grcevski Please, just measure ;)

The problem is not in try/catch itself but in the frequency of throwing of exceptions that will unwind a whole stack trace.

Here is a benchmark that can be easily modified to have hexadecimal mantissas.

You can run it with the following command for a locally published artifacts from your branch and compare results for the latest code from 2.14 branch:

sbt 'jsoniter-scala-benchmarkJVM/jmh:run -p size=128 ArrayOf(Doubles|Floats)Reading.jacksonScala'

Good w/a would be using of some static exception (see https://shipilev.net/blog/2014/exceptional-performance/) or special NaN value with encoded error type in the internal representation.

@ChrisHegarty
Copy link

@plokhotnyuk To clarify, is the concern related to the additional cost of materialising the stack, with fillInStackTrace, for the initial NFE that will be swallowed? Or is the concern related to the general runtime performance of the JIT'ed code for the regular non-throwing cases (which should not be negatively affected by this change)?

@grcevski
Copy link
Author

Based on the comment by @wrandelshofer wrandelshofer/FastDoubleParser#19 (comment) on the upstream issue this is likely a non-issue with jackson, since the number parsing differences are only when format specifiers are used. I'll run the full suite of the JDK float tests without format specifiers and if it's all good I'll close this PR and the related issue.

@wrandelshofer
Copy link

I can easily refactor the back-end classes of FastDoubleParser to better suit your needs.

The back-end classes are DoubleFromByteArray, DoubleFromCharArray
DoubleFromCharSequence, FloatFromByteArray, FloatFromCharArray,
FloatFromCharSequence.

They internally return a primitive long-value that they will then convert into a double or a float using Double.longBitsToDouble(long) or Float.intBitsToFloat(int).

So, instead of throwing an Exception, I can make them encode the error message in the long-value, by using some unused NaN bits.
(Note: unused NaN bits can not be reliably used in a primitive double-value or a float-value, because Double.longBitsToDouble(long) and Float.intBitsToFloat(int) may throw them away! Quote from Double.longBitsToDouble(long): "Note that this method may not be able to return a double NaN with exactly same bit pattern as the long argument."

Also, it might be possible to provide multiple entry points in the back-end classes. For example one entry point that accepts the full lexical grammar of java.util.Double.parseDouble, one that accepts XML Schema grammar, one for JSON grammar, and so on.

@pjfanning
Copy link
Member

@wrandelshofer thanks for the offer to refactor your code. At the moment, all jackson-core needs are parseDouble and parseFloat methods that are as close as possible to being drop-in replacements for the JDK methods. Whatever you choose to focus on will be of great interest to us nonetheless.

@wrandelshofer
Copy link

I bet that for deep stacks it could be vulnerable for DoS/DoW attacks when non-trusted input is used.

I honestly haven't looked at this yet from a vulnerability perspective.

In an earlier version of the FastDoubleParser library I had a complete code path for all cases in there, which would have allowed to fully assess and control this. Then - for the sake of code size (and performance) - I decided to fall back to the JDK code for the "slow path". 🤔
For example, we never need to pass more than 768 mantissa digits to the JDK to get the correct results. However, the current implementation of FastDoubleParserFromByteArray/CharArray, will naïvely allocate a String of the same size as the input... The maximal supported input size is 2^31 - 1 characters (!)

@cowtowncoder
Copy link
Member

I know that the Javadocs in helper methods do not explanation rationale well, but note that these parse methods are only meant to support floating-point values that Jackson parser/decoder (JsonParser) accepts -- which by default is only JSON spec fps, although with various JsonReadFeature flags potentially with some deviations. But trailing type qualifier (f and d) are not accepted ever as far I know.

So, it'd first be necessary to have a problem to fix; I am not sure this PR makes sense at jackson-core level.

There may be other related questions about jackson-databind which may (... unfortunately) allow wider selection of floating-point values for coercion cases -- that is, JSON String that contains number representation may allow more lenient handling. This could be something we have to address, although even there the expected behavior is probably not documented at this point.
(... which means users will invariably assume that the observed behavior is the Correct and Expected behavior :) ).

@grcevski
Copy link
Author

I'm closing this PR, I ran all OpenJDK Double tests with stripped trailing 'f' and 'd' characters, no problems at all, except for other invalid values like '+inifinity' and lowercase 'nan'.

@grcevski grcevski closed this Jun 24, 2022
@grcevski grcevski deleted the fastdoubleparse/try_catch branch June 24, 2022 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants