Catch numeric overflow errors and parse the numbers as strings instead. #152

jgallagher · 2016-03-24T16:46:56Z

Closes #76.

Makes no changes to the "fast path" of numbers that don't overflow. If a numeric overflow is detected (via a thrown NumberOverflow, which is now a private error), we go back to the beginning of the number and re-parse it for validity only, returning it as a .String.

The good: Returns overflows as strings (e.g., see the unit tests) without slowing down the parser.

~~The bad: We would now have two nearly identical groups of methods that parse numbers - one accumulates values and the other just looks for the end.~~

If we want to close #76 as part of 2.1, we could merge this and open a separate issue to clean up the internals. Not sure how to do that without sacrificing performance at the moment, but I'm sure we can come up with something.

Update from last couple commits - extracted number parsing out into a common struct with an internal state machine. Much less duplication between parsing numbers as numbers and parsing numbers as strings. Minimal slowdown (~1% on benchmarking branch).

Also also:

Avoid fatalError when json.int is used with a Double that exceeds Int.max.
Avoid overflow crash when using NSJSONSerialization on a 32-bit platform.
Ensure NSJSONSerialization large number handling is consistent across 32-bit and 64-bit (you get Double).
- NOTE: This differs from Freddy's handling, which promotes a number whose JSON representation had no decimal part to String rather than lose precision by promoting to Double.

jgallagher · 2016-03-24T16:48:53Z

Also, consider the ignoring-whitespace diff which kills most of the apparent changes to JSONParser.swift.

jeremy-w · 2016-04-26T15:32:11Z

Sources/JSONParser.swift

+        // This would be more natural as `while true { ... }` with a meaningful .Done case,
+        // but that causes compile time explosion in Swift 2.2. :-|
+        while parser.state != .Done {
+            switch parser.state {


It seems really weird to me that we're switching on parser.state directly here, but none of the cases directly updates parser.state. It feels like we've got half the parser implementation in this loop, and the other half inside the parser itself.

Is anything preventing moving all the number parsing logic into the parser itself?

jeremy-w · 2016-04-26T19:13:05Z

This fails to address issues with 32-bit overflow on 32-bit platforms:

diff --git a/Tests/JSONParserTests.swift b/Tests/JSONParserTests.swift
index 8aeee46..1141157 100644
--- a/Tests/JSONParserTests.swift
+++ b/Tests/JSONParserTests.swift
@@ -44,6 +44,14 @@ private func ==(lhs: JSONParser.Error, rhs: JSONParser.Error) -> Bool {

 class JSONParserTests: XCTestCase {

+    func testParsingOutALargeNumberEspeciallyOn32BitPlatforms() {
+        let val = 1466719200000.0
+        let str = "{\"startDate\": 1466719200000}"
+        let data = str.dataUsingEncoding(NSUTF8StringEncoding)!
+        let json = try? (sizeof(Int) == sizeof(Int64)) ? JSON(data: data) : JSON(data: data, usingParser: NSJSONSerialization.self)
+        XCTAssertEqual(try? json!.double("startDate"), val)
+    }
+
     private func JSONFromString(s: String) throws -> JSON {
         var parser = JSONParser(string: s)
         return try parser.parse()

That's checking the path with NSJSONSerialization parser, but if you swap to true ?, it exhibits the same error with our custom parser.

jeremy-w · 2016-04-26T19:22:00Z

Failure in the custom parser case is because it gets treated as a string, so that's expected behavior. It's only with the NSJSONSerialization parser that things still go awry:

diff --git a/Tests/JSONParserTests.swift b/Tests/JSONParserTests.swift
index 8aeee46..f8c0412 100644
--- a/Tests/JSONParserTests.swift
+++ b/Tests/JSONParserTests.swift
@@ -44,6 +44,33 @@ private func ==(lhs: JSONParser.Error, rhs: JSONParser.Error) -> Bool {

 class JSONParserTests: XCTestCase {

+    func test32BitOverflowResultsInDoubleWithNSJSONSerializationParser() {
+        let jsonString = "{\"startDate\": 1466719200000}"
+        let expectedValue = 1466719200000.0
+
+        let data = jsonString.dataUsingEncoding(NSUTF8StringEncoding)!
+        guard let json = try? JSON(data: data, usingParser: NSJSONSerialization.self) else {
+            XCTFail("Failed to even parse JSON: \(jsonString)")
+            return
+        }
+
+        XCTAssertEqual(try? json.double("startDate"), expectedValue)
+    }
+
+    func test32BitOverflowResultsInStringWithFreddyParser() {
+        let jsonString = "{\"startDate\": 1466719200000}"
+        let expectedValue = "1466719200000"
+
+        let data = jsonString.dataUsingEncoding(NSUTF8StringEncoding)!
+        guard let json = try? JSON(data: data) else {
+            XCTFail("Failed to even parse JSON: \(jsonString)")
+            return
+        }
+
+        XCTAssertEqual(try? json.double("startDate"), nil)
+        XCTAssertEqual(try? json.string("startDate"), expectedValue)
+    }
+
     private func JSONFromString(s: String) throws -> JSON {
         var parser = JSONParser(string: s)
         return try parser.parse()
@@ -224,7 +251,7 @@ class JSONParserTests: XCTestCase {
                     let json = try JSONFromString(string)

                     // numbers overflow, but we should be able to get them out as strings
-                    XCTAssertEqual(try json.string(), string)
+                    XCTAssertEqual(try? json.string(), string)
                 } catch {
                     XCTFail("Unexpected error \(error)")
                 }

Prior to this, we would not crash when using the NSJSONSerialization parser, but we would mangle the value by incorrectly assuming "not a char and not a float means it fits in an Int", even when it was properly an Int64. We might want to force this overflow case to String for parity with the Freddy parser.

jeremy-w · 2016-04-26T19:43:07Z

I've resolved that. We might want to push overflowing 32-bit values into String rather than the "promote to Double" approach I've taken here. That would be a simple fix after this.

I had to switch from a switch/case to a series of ifs because Swift Build Configuration statements are statements in their own right, so I couldn't just conditionalize a case - it had to be a statement in its own right.

jeremy-w · 2016-04-26T20:48:50Z

Travis failed the build because my tests depend on architecture, and I only wrote them for 32-bit.

TODO:

Switch to promoting to String, so NSJSONSerialization parser acts just like Freddy
- Ended up switching back to Double, for consistency in output type between 32-bit and 64-bit builds with NSJSONSerialization.
Switch to using the string version of LONG_MAX + 1 as the test value, so it might also be useful for 64-bit builds – I suspect that NSJSONSerialization might use a long long value if it's spec'd as an integer rather than float, which we would also mis-handle on 64-bit, the same as we mishandled "not floating but bigger than Int" on 32-bit.
- This avoids the architecture-specific-ness of the tests, which is a cleaner fix than compiling them out on 64-bit builds.

This standardizes behavior across the Freddy parser and the NSJSONSerialization parser. The new tests also pass under 64-bit devices, rather than just 32-bit devices, which should make Travis significantly happier. A commented out assertion captures an interesting fatal error, however, which should also be fixed as part of this bounds-handling escapade.

Otherwise, you get a String on 32-bit, and a Double on 64-bit. So our parser differ in overflowing value handling, but are at least now consistent across architectures.

zwaldowski · 2016-04-27T19:15:55Z

Sources/JSONParsing.swift

@@ -57,20 +57,49 @@ extension NSJSONSerialization: JSONParserType {
    private static func makeJSON(object: AnyObject) -> JSON {
        switch object {
        case let n as NSNumber:
-            switch n {
-            case _ where CFNumberGetType(n) == .CharType || CFGetTypeID(n) == CFBooleanGetTypeID():
+            let numberType = CFNumberGetType(n)


At least as of 2.2, CFNumberType is an enum. We should switch over it exhaustively.

Might get ugly with the way the #if stuff works, but I'll give it a shot.

I think wrapping an entire case in #if is still valid.

@zwaldowski

Once, this wasn't imported as an enum type, but it is now! [Thanks @zwaldowski.](https://github.com/bignerdranch/Freddy/pull/152/files/8e1e0626215d3c2a689619199020d4429e280c5f#r61319108)

The CFNumber switch is exhaustive, so there should be no need for it. Plus, after nuking it, no tests failed, so we must be good to go. Right?

kas-kad · 2016-08-16T11:33:39Z

TL;DR
Am I getting it right that it is not a bug to parse the number as string if the number overflows?
How do I parse the millisecond timestamp on 32-bit devices on which the following test fails?

func testLongIntParsing() {
        let val = 1470438000000.0
        let jsonStr = "{\"big\":1470438000000}"
        let json = try? JSON(jsonString: jsonStr)
        XCTAssertEqual(try? json!.double("big"), val)
}

should I use this workaround?

let json = try? (sizeof(Int) == sizeof(Int64)) ? JSON(data: data) : JSON(data: data, usingParser: NSJSONSerialization.self)

or this is better?

do {
    try self.reallyBigIdNumber = json.int("big")
} catch {
   try self.reallyBigIdNumber = json.string("big")
}

Either way, it is weird, that I have to use any workarounds.

jgallagher · 2016-08-16T14:37:57Z

The first technique will let you pull it out as a double, but will be slower than the second technique (whether it's noticeably slower depends on your data).

As far as having to work around it - we had to make a choice about what to do with numbers that don't fit in Int, without giving up the niceness of having Int be one of the cases of the JSON enum. NSJSONSerialization chooses to give them back as Doubles, but that will give you the wrong answer if the number is large enough. For example:

let bigNum = "36028797018963969".dataUsingEncoding(NSUTF8StringEncoding)!
let json = try! JSON(data: bigNum)
let double = try! json.double() // gives you 36028797018963968, off by one

Other large numbers can be off by more. We chose to go the route that requires slightly more work but will not give you incorrect values; e.g., if you're hitting an API that returns large numbers as IDs where loss of precision is unacceptable, you still have the full original number available to do with what you will (e.g., putting it into an Int64 instead of a Double, putting it into some big number type, or even keeping it as a String).

Catch numeric overflow errors and parse the numbers as strings instead.

9329ad6

jgallagher added the bug label Mar 24, 2016

jgallagher added this to the 2.1 milestone Mar 24, 2016

hydhknn and others added 6 commits March 24, 2016 15:06

Extract common number parsing logic.

8d8b3ee

Use NumberParser for backtracking overflow-number-as-string too

ccd90d3

Forgot trailing nul byte for string decoding

5d0426a

Work around insane Swift 2.2 compile times.

aa8a970

Merge branch 'master' into jgallagher/return-overflows-as-strings

6eac798

Merge branch 'master' into jgallagher/return-overflows-as-strings

ef1673b

jgallagher mentioned this pull request Apr 26, 2016

NumberOverflow error when parsing JSON with big numbers on 32bit devices #159

Closed

zwaldowski mentioned this pull request Apr 26, 2016

Large integers cause overflows, especially on 32-bit hosts #76

Closed

jeremy-w reviewed Apr 26, 2016
View reviewed changes

jeremy-w added 3 commits April 27, 2016 14:35

NSJSONSerialization: Switch back to Double for cross-arch consistency

cdbacd8

Otherwise, you get a String on 32-bit, and a Double on 64-bit. So our parser differ in overflowing value handling, but are at least now consistent across architectures.

Avoid fatalError when converting out of range Double to Int

8e1e062

zwaldowski reviewed Apr 27, 2016
View reviewed changes

jeremy-w added 3 commits April 27, 2016 15:32

Switch over CFNumberType enum

406e06f

Once, this wasn't imported as an enum type, but it is now! [Thanks @zwaldowski.](https://github.com/bignerdranch/Freddy/pull/152/files/8e1e0626215d3c2a689619199020d4429e280c5f#r61319108)

Fold Boolean TypeID check into number switch

e7d479d

Eliminate oddball CFBoolean TypeID check

19b1539

The CFNumber switch is exhaustive, so there should be no need for it. Plus, after nuking it, no tests failed, so we must be good to go. Right?

zwaldowski merged commit c608942 into master Apr 27, 2016

zwaldowski deleted the jgallagher/return-overflows-as-strings branch April 27, 2016 20:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Catch numeric overflow errors and parse the numbers as strings instead. #152

Catch numeric overflow errors and parse the numbers as strings instead. #152

jgallagher commented Mar 24, 2016 •

edited by jeremy-w

Loading

jgallagher commented Mar 24, 2016

jeremy-w Apr 26, 2016

jeremy-w commented Apr 26, 2016

jeremy-w commented Apr 26, 2016

jeremy-w commented Apr 26, 2016

jeremy-w commented Apr 26, 2016 •

edited

Loading

zwaldowski Apr 27, 2016

jeremy-w Apr 27, 2016

zwaldowski Apr 27, 2016

kas-kad commented Aug 16, 2016 •

edited

Loading

jgallagher commented Aug 16, 2016

Catch numeric overflow errors and parse the numbers as strings instead. #152

Catch numeric overflow errors and parse the numbers as strings instead. #152

Conversation

jgallagher commented Mar 24, 2016 • edited by jeremy-w Loading

jgallagher commented Mar 24, 2016

jeremy-w Apr 26, 2016

Choose a reason for hiding this comment

jeremy-w commented Apr 26, 2016

jeremy-w commented Apr 26, 2016

jeremy-w commented Apr 26, 2016

jeremy-w commented Apr 26, 2016 • edited Loading

zwaldowski Apr 27, 2016

Choose a reason for hiding this comment

jeremy-w Apr 27, 2016

Choose a reason for hiding this comment

zwaldowski Apr 27, 2016

Choose a reason for hiding this comment

kas-kad commented Aug 16, 2016 • edited Loading

jgallagher commented Aug 16, 2016

jgallagher commented Mar 24, 2016 •

edited by jeremy-w

Loading

jeremy-w commented Apr 26, 2016 •

edited

Loading

kas-kad commented Aug 16, 2016 •

edited

Loading