Skip to content
This repository has been archived by the owner on Sep 19, 2018. It is now read-only.

Catch numeric overflow errors and parse the numbers as strings instead. #152

Merged
merged 14 commits into from
Apr 27, 2016

Conversation

jgallagher
Copy link
Contributor

@jgallagher jgallagher commented Mar 24, 2016

Closes #76.

Makes no changes to the "fast path" of numbers that don't overflow. If a numeric overflow is detected (via a thrown NumberOverflow, which is now a private error), we go back to the beginning of the number and re-parse it for validity only, returning it as a .String.

The good: Returns overflows as strings (e.g., see the unit tests) without slowing down the parser.

The bad: We would now have two nearly identical groups of methods that parse numbers - one accumulates values and the other just looks for the end.

If we want to close #76 as part of 2.1, we could merge this and open a separate issue to clean up the internals. Not sure how to do that without sacrificing performance at the moment, but I'm sure we can come up with something.

Update from last couple commits - extracted number parsing out into a common struct with an internal state machine. Much less duplication between parsing numbers as numbers and parsing numbers as strings. Minimal slowdown (~1% on benchmarking branch).

Also also:

  • Avoid fatalError when json.int is used with a Double that exceeds Int.max.
  • Avoid overflow crash when using NSJSONSerialization on a 32-bit platform.
  • Ensure NSJSONSerialization large number handling is consistent across 32-bit and 64-bit (you get Double).
    • NOTE: This differs from Freddy's handling, which promotes a number whose JSON representation had no decimal part to String rather than lose precision by promoting to Double.

@jgallagher jgallagher added the bug label Mar 24, 2016
@jgallagher jgallagher added this to the 2.1 milestone Mar 24, 2016
@jgallagher
Copy link
Contributor Author

Also, consider the ignoring-whitespace diff which kills most of the apparent changes to JSONParser.swift.

// This would be more natural as `while true { ... }` with a meaningful .Done case,
// but that causes compile time explosion in Swift 2.2. :-|
while parser.state != .Done {
switch parser.state {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems really weird to me that we're switching on parser.state directly here, but none of the cases directly updates parser.state. It feels like we've got half the parser implementation in this loop, and the other half inside the parser itself.

Is anything preventing moving all the number parsing logic into the parser itself?

@jeremy-w
Copy link
Member

This fails to address issues with 32-bit overflow on 32-bit platforms:

diff --git a/Tests/JSONParserTests.swift b/Tests/JSONParserTests.swift
index 8aeee46..1141157 100644
--- a/Tests/JSONParserTests.swift
+++ b/Tests/JSONParserTests.swift
@@ -44,6 +44,14 @@ private func ==(lhs: JSONParser.Error, rhs: JSONParser.Error) -> Bool {

 class JSONParserTests: XCTestCase {

+    func testParsingOutALargeNumberEspeciallyOn32BitPlatforms() {
+        let val = 1466719200000.0
+        let str = "{\"startDate\": 1466719200000}"
+        let data = str.dataUsingEncoding(NSUTF8StringEncoding)!
+        let json = try? (sizeof(Int) == sizeof(Int64)) ? JSON(data: data) : JSON(data: data, usingParser: NSJSONSerialization.self)
+        XCTAssertEqual(try? json!.double("startDate"), val)
+    }
+
     private func JSONFromString(s: String) throws -> JSON {
         var parser = JSONParser(string: s)
         return try parser.parse()

That's checking the path with NSJSONSerialization parser, but if you swap to true ?, it exhibits the same error with our custom parser.

@jeremy-w
Copy link
Member

Failure in the custom parser case is because it gets treated as a string, so that's expected behavior. It's only with the NSJSONSerialization parser that things still go awry:

diff --git a/Tests/JSONParserTests.swift b/Tests/JSONParserTests.swift
index 8aeee46..f8c0412 100644
--- a/Tests/JSONParserTests.swift
+++ b/Tests/JSONParserTests.swift
@@ -44,6 +44,33 @@ private func ==(lhs: JSONParser.Error, rhs: JSONParser.Error) -> Bool {

 class JSONParserTests: XCTestCase {

+    func test32BitOverflowResultsInDoubleWithNSJSONSerializationParser() {
+        let jsonString = "{\"startDate\": 1466719200000}"
+        let expectedValue = 1466719200000.0
+
+        let data = jsonString.dataUsingEncoding(NSUTF8StringEncoding)!
+        guard let json = try? JSON(data: data, usingParser: NSJSONSerialization.self) else {
+            XCTFail("Failed to even parse JSON: \(jsonString)")
+            return
+        }
+
+        XCTAssertEqual(try? json.double("startDate"), expectedValue)
+    }
+
+    func test32BitOverflowResultsInStringWithFreddyParser() {
+        let jsonString = "{\"startDate\": 1466719200000}"
+        let expectedValue = "1466719200000"
+
+        let data = jsonString.dataUsingEncoding(NSUTF8StringEncoding)!
+        guard let json = try? JSON(data: data) else {
+            XCTFail("Failed to even parse JSON: \(jsonString)")
+            return
+        }
+
+        XCTAssertEqual(try? json.double("startDate"), nil)
+        XCTAssertEqual(try? json.string("startDate"), expectedValue)
+    }
+
     private func JSONFromString(s: String) throws -> JSON {
         var parser = JSONParser(string: s)
         return try parser.parse()
@@ -224,7 +251,7 @@ class JSONParserTests: XCTestCase {
                     let json = try JSONFromString(string)

                     // numbers overflow, but we should be able to get them out as strings
-                    XCTAssertEqual(try json.string(), string)
+                    XCTAssertEqual(try? json.string(), string)
                 } catch {
                     XCTFail("Unexpected error \(error)")
                 }

Prior to this, we would not crash when using the NSJSONSerialization
parser, but we would mangle the value by incorrectly assuming "not
a char and not a float means it fits in an Int", even when it was
properly an Int64.

We might want to force this overflow case to String for parity with the
Freddy parser.
@jeremy-w
Copy link
Member

I've resolved that. We might want to push overflowing 32-bit values into String rather than the "promote to Double" approach I've taken here. That would be a simple fix after this.

I had to switch from a switch/case to a series of ifs because Swift Build Configuration statements are statements in their own right, so I couldn't just conditionalize a case - it had to be a statement in its own right.

@jeremy-w
Copy link
Member

jeremy-w commented Apr 26, 2016

Travis failed the build because my tests depend on architecture, and I only wrote them for 32-bit.

TODO:

  • Switch to promoting to String, so NSJSONSerialization parser acts just like Freddy
    • Ended up switching back to Double, for consistency in output type between 32-bit and 64-bit builds with NSJSONSerialization.
  • Switch to using the string version of LONG_MAX + 1 as the test value, so it might also be useful for 64-bit builds – I suspect that NSJSONSerialization might use a long long value if it's spec'd as an integer rather than float, which we would also mis-handle on 64-bit, the same as we mishandled "not floating but bigger than Int" on 32-bit.
    • This avoids the architecture-specific-ness of the tests, which is a cleaner fix than compiling them out on 64-bit builds.

This standardizes behavior across the Freddy parser and the
NSJSONSerialization parser.

The new tests also pass under 64-bit devices, rather than just 32-bit
devices, which should make Travis significantly happier.

A commented out assertion captures an interesting fatal error, however,
which should also be fixed as part of this bounds-handling escapade.
Otherwise, you get a String on 32-bit, and a Double on 64-bit.

So our parser differ in overflowing value handling, but are at least now
consistent across architectures.
@@ -57,20 +57,49 @@ extension NSJSONSerialization: JSONParserType {
private static func makeJSON(object: AnyObject) -> JSON {
switch object {
case let n as NSNumber:
switch n {
case _ where CFNumberGetType(n) == .CharType || CFGetTypeID(n) == CFBooleanGetTypeID():
let numberType = CFNumberGetType(n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least as of 2.2, CFNumberType is an enum. We should switch over it exhaustively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might get ugly with the way the #if stuff works, but I'll give it a shot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wrapping an entire case in #if is still valid.

The CFNumber switch is exhaustive, so there should be no need for it.

Plus, after nuking it, no tests failed, so we must be good to go.

Right?
@zwaldowski zwaldowski merged commit c608942 into master Apr 27, 2016
@zwaldowski zwaldowski deleted the jgallagher/return-overflows-as-strings branch April 27, 2016 20:23
@kas-kad
Copy link

kas-kad commented Aug 16, 2016

TL;DR
Am I getting it right that it is not a bug to parse the number as string if the number overflows?
How do I parse the millisecond timestamp on 32-bit devices on which the following test fails?

func testLongIntParsing() {
        let val = 1470438000000.0
        let jsonStr = "{\"big\":1470438000000}"
        let json = try? JSON(jsonString: jsonStr)
        XCTAssertEqual(try? json!.double("big"), val)
}

should I use this workaround?

let json = try? (sizeof(Int) == sizeof(Int64)) ? JSON(data: data) : JSON(data: data, usingParser: NSJSONSerialization.self)

or this is better?

do {
    try self.reallyBigIdNumber = json.int("big")
} catch {
   try self.reallyBigIdNumber = json.string("big")
}

Either way, it is weird, that I have to use any workarounds.

@jgallagher
Copy link
Contributor Author

The first technique will let you pull it out as a double, but will be slower than the second technique (whether it's noticeably slower depends on your data).

As far as having to work around it - we had to make a choice about what to do with numbers that don't fit in Int, without giving up the niceness of having Int be one of the cases of the JSON enum. NSJSONSerialization chooses to give them back as Doubles, but that will give you the wrong answer if the number is large enough. For example:

let bigNum = "36028797018963969".dataUsingEncoding(NSUTF8StringEncoding)!
let json = try! JSON(data: bigNum)
let double = try! json.double() // gives you 36028797018963968, off by one

Other large numbers can be off by more. We chose to go the route that requires slightly more work but will not give you incorrect values; e.g., if you're hitting an API that returns large numbers as IDs where loss of precision is unacceptable, you still have the full original number available to do with what you will (e.g., putting it into an Int64 instead of a Double, putting it into some big number type, or even keeping it as a String).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large integers cause overflows, especially on 32-bit hosts
5 participants