
JSONSerialization: Use NSDecimalNumber for parsing JSON numbers. #1655

Closed

Conversation

@spevans (Collaborator) commented Aug 6, 2018

  • Replace strtol()/strtod() parsing with Scanner.scanDecimal().
    This allows parsing of integers > Int64.max (up to UInt64.max)
    without losing information due to the fallback parsing provided
    by parsing as a Double.

  • Also allows parsing larger integers into a Decimal.

Requires PR #1653 for some NSDecimalNumber and NSNumber fixes

@spevans requested a review from @itaiferber on August 6, 2018 06:48
@alblue (Contributor) commented Aug 7, 2018

@swift-ci please test

@spevans (Collaborator, Author) commented Aug 7, 2018

@alblue This will currently fail tests until #1653 is merged

@itaiferber (Contributor) left a comment

I'm concerned about three main things here:

  1. The cost of parsing as NSDecimalNumber every time, especially by spinning up a new Scanner for every single number read

  2. Potential new differences in precision and representation. I haven't followed the implementation of NSDecimalNumber on swift-corelibs-foundation, but is its implementation significantly different from NSDecimalNumber on Darwin? Because on Darwin, NSDecimalNumber has a significantly lower possible magnitude than Double does (IIRC 10^±127 vs. 10^±308), with the tradeoff being that it allows for higher precision. On Darwin, we prefer to parse as a Double if the value fits both within the magnitude and precision of a Double — the C standard guarantees this to be true if the string representation of the mantissa is less than DBL_DECIMAL_DIG in length, so this is what we check. If we cannot fit the number in a Double without loss of precision, then we prefer NSDecimalNumber if the value fits within its magnitude; otherwise we're willing to lose precision to fit in a Double. If the number is too large to fit in either, we bail. Here we'd be giving up on the more efficient representation in an Int/Double, along with the greater range that Double provides (unless I'm missing something and NSDecimalNumber is more capable here than it is on Darwin)

  3. @millenomi — this one is more for you. With all of the bridging work you did, do we have any additional testing/coverage for NSDecimalNumber and how that works out? Folks probably have a lot of code which currently checks as? Int and as? Double since for a long time those were the only two supported options here — I'd hate for an NSDecimalNumber to accidentally slip through with loss of precision or representation or something. Even so, the representation in NSDecimalNumber is not good enough for Ints. On Darwin, the following code prints nil due to loss of precision making the number no longer "cleanly" representable in an Int:

    let dec = Decimal(1123123123)
    let num = NSDecimalNumber(decimal: dec)
    let i = (num as NSNumber) as? Int
    print(i)

Overall I think the changes here are good in spirit but we'll likely need to maintain some of the existing representations we already have unless my concerns are unfounded due to my lack of experience here (which I would happily accept!). What we might be able to do is transition over to more of what we have on Darwin — prefer Double unless we can get away with an NSDecimalNumber.
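
A minimal sketch of that Darwin-style policy, assuming a DBL_DECIMAL_DIG (17) digit cutoff and using Decimal(string:) as a stand-in for the NSDecimalNumber parsing path; the function name and thresholds are illustrative, not the actual implementation:

    import Foundation

    // Sketch: prefer Double when the mantissa is short enough to round-trip
    // losslessly; otherwise try Decimal for its extra precision; otherwise
    // fall back to a possibly lossy Double.
    func preferredNumber(for string: String) -> Any? {
        // Count significant digits in the mantissa (ignore sign, '.', exponent).
        let mantissa = string.prefix { $0 != "e" && $0 != "E" }
        let digitCount = mantissa.filter { ("0"..."9").contains($0) }.count

        // Fewer than DBL_DECIMAL_DIG (17) digits round-trip through a Double.
        if digitCount < 17, let d = Double(string), d.isFinite {
            return d
        }
        // Needs more precision: prefer Decimal if the magnitude fits
        // (roughly 10^±127 for Decimal vs. 10^±308 for Double).
        if let dec = Decimal(string: string), dec.isFinite {
            return dec
        }
        // Too precise for Double and too large for Decimal: accept the loss.
        if let d = Double(string), d.isFinite {
            return d
        }
        return nil
    }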

// Validate the first few characters look like a JSON encoded number:
// Optional '-' sign at start; only 1 leading zero if followed by a decimal point.
var index = input
func nextDigit() -> UInt8? {
@itaiferber (Contributor):

Should this be more clearly called nextASCII(), nextCharacter(), or similar?

let MINUS = UInt8(ascii: "-")

func parseTypedNumber(_ string: String) throws -> (Any, IndexDistance)? {
let scan = Scanner(string: string)
@itaiferber (Contributor):

It's worth testing the performance of creating a new Scanner for every single number we try to parse — I suspect that this is a potentially prohibitively expensive operation.
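
One rough way to measure that cost (a hypothetical micro-benchmark; scanDecimal(_:) here is the Darwin-style API and the iteration count is arbitrary):

    import Foundation

    // Rough micro-benchmark: create a fresh Scanner for every number scanned,
    // the way the proposed parser would, and time the whole loop.
    let iterations = 100_000
    let input = "12345.6789"

    let start = Date()
    for _ in 0..<iterations {
        let scanner = Scanner(string: input)   // new Scanner per number
        var decimal = Decimal()
        _ = scanner.scanDecimal(&decimal)
    }
    print("\(iterations) scans took \(Date().timeIntervalSince(start))s")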

@itaiferber (Contributor) commented:

Note also that this conflicts with the work in #1654 so we might need to coordinate something here.

@spevans (Collaborator, Author) commented Aug 7, 2018

@itaiferber Thanks for your feedback. I hadn't actually considered the cost of the Scanner startup at all, to be honest, so that may be a point against this solution.

As to the difference between Double and Decimal: I was mostly aiming to hold more information for integers > UInt64.max and also to allow parsing of numbers into Decimal as well as Double.
Of course, these types both have their advantages and disadvantages and no one type is better overall, and I hadn't considered the large range of Double being more important.

As to the solution in #1654, I did originally implement that same solution and there are a number of issues with it. Firstly, it can involve up to 3 separate string parses. Also, care must be taken with strtoull, as it will return positive values for numbers between Int64.min and -UInt64.max, i.e. it effectively negates the value modulo 2^64, yielding a positive result, unless a '-' sign is checked for separately.
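
To illustrate (a small snippet; the printed values assume 64-bit UInt64 and follow the C standard's strtoull semantics):

    import Foundation   // pulls in strtoull via Darwin/Glibc

    // strtoull accepts a leading '-' and negates in unsigned (mod 2^64)
    // arithmetic, so negative inputs silently come back as positive values.
    print(strtoull("-5", nil, 10))                     // 18446744073709551611 (2^64 - 5)
    print(strtoull("-18446744073709551615", nil, 10))  // 1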

As for the bridging issues, my tests showed up some problems but these should be resolved in #1653 .

However, thinking about this problem further, I realised the real issue is that we parse the string before knowing how we want to use it. So I came up with the following solution, which is to store the number as a String and only parse it when it is unboxed:

class NSJSONNumber: NSNumber {
    let value: String

    init?(_ str: String) {
        var _value = ""
        for ch in str {
            // Validate this matches the JSON number format and return nil otherwise
            // ...
            _value.append(ch)
        }
        value = _value
    }

    var valueAsDouble: Double? { return Double(value) }
    var valueAsDecimal: Decimal? { return Decimal(string: value) }
    var valueAsUInt64: UInt64? { return UInt64(value) }
    // etc.
}
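
For example (hypothetical usage of the NSJSONNumber sketch above):

    // The string is validated once at construction, then parsed only for
    // the specific type requested.
    if let num = NSJSONNumber("18446744073709551615") {   // UInt64.max, > Int64.max
        print(num.valueAsUInt64 as Any)   // Optional(18446744073709551615)
        print(num.valueAsDouble as Any)   // Optional(1.8446744073709552e+19), lossy
    }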

This would ensure that full JSON number validation is performed, which currently doesn't happen, and the value is only parsed once using the correct parser. Does this seem reasonable, or am I missing anything here?

@itaiferber (Contributor) commented:

@spevans I was quickly putting together an approach similar to #1654 — I agree that it can involve up to 3 parses but they should be relatively short and quick as the input is limited in length. Otherwise, though: how do we know how we want to use a parsed number? Aren't these values just returned to the developer directly? Or are you proposing a further deferral that will only parse the value once it's read and cast out (e.g. a number that totally defers evaluation until you try to access it)?

(If so, it doesn't necessarily improve things, as it's totally valid to write if let i = num as? Int /* fail */ { ... } else if let u = num as? UInt64 /* fail */ { ... } else if let d = num as? Double /* success */ { ... }, which doesn't decrease the number of attempted parses.)

@spevans (Collaborator, Author) commented Aug 7, 2018

I was thinking of not parsing until it's accessed, and with Codable doesn't it know the type it wants? With the #1654 solution every Double is parsed 3 times, which only matches the worst case you listed above.

Also, if it is a subclass of NSNumber the other properties are still available (e.g. intValue), which is currently lossy anyway for out-of-range numbers.

@spevans (Collaborator, Author) commented Aug 7, 2018

FYI, here is the version I had using the strto* functions:

func parseNumber(_ input: Index, options opt: JSONSerialization.ReadingOptions) throws -> (Any, Index)? {
        func parseTypedNumber(_ address: UnsafePointer<UInt8>, count: Int) throws -> (Any, IndexDistance)? {
            let temp_buffer_size = 64
            var temp_buffer = [CChar](repeating: 0, count: temp_buffer_size)

            return try temp_buffer.withUnsafeMutableBufferPointer { (buffer: inout UnsafeMutableBufferPointer<CChar>) throws -> (Any, IndexDistance)? in
                memcpy(buffer.baseAddress!, address, min(count, temp_buffer_size - 1)) // ensure null termination

                var chIndex = 0
                var ch = buffer[chIndex]
                let isNegative: Bool
                if ch == CChar(UInt8(ascii: "-")) {
                    isNegative = true
                    chIndex = chIndex + 1
                    ch = buffer[chIndex]
                } else {
                    isNegative = false
                }

                guard ch >= CChar(UInt8(ascii: "0")) && ch <= CChar(UInt8(ascii: "9")) else {
                    return nil
                }

                // Only allow 1 leading zero and it must be followed by a decimal point
                if ch == CChar(UInt8(ascii: "0")) && buffer[chIndex + 1] == CChar(UInt8(ascii: "0")) {
                    throw NSError(domain: NSCocoaErrorDomain, code: CocoaError.propertyListReadCorrupt.rawValue,
                                  userInfo: ["NSDebugDescription" : "Number with leading zero around character 16." ])
                }

                let startPointer = buffer.baseAddress!
                let endPointer = UnsafeMutablePointer<UnsafeMutablePointer<Int8>?>.allocate(capacity: 1)
                defer { endPointer.deallocate() }

                func parseAsDouble() -> (Double, IndexDistance) {
                    let doubleResult = strtod(startPointer, endPointer)
                    let distance = startPointer.distance(to: endPointer[0]!)
                    return (doubleResult, distance)
                }

                func parseAsInt64() -> (Int64?, IndexDistance) {
                    errno = 0
                    let int64Result = strtoll(startPointer, endPointer, 10)
                    let outOfRange = ((int64Result == Int64.max || int64Result == Int64.min) && errno == ERANGE)
                    let invalid = (int64Result == 0 && errno == EINVAL)
                    let distance = startPointer.distance(to: endPointer[0]!)

                    if distance == 0 || outOfRange || invalid {
                        return (nil, distance)
                    } else {
                        return (int64Result, distance)
                    }
                }

                func parseAsUInt64() -> (UInt64?, IndexDistance) {
                    // strtoull applies a leading '-' by negating in unsigned arithmetic,
                    // always returning a positive value, so check against the flag set earlier
                    guard !isNegative else { return (nil, 0) }
                    errno = 0
                    let uint64Result = strtoull(startPointer, endPointer, 10)
                    let outOfRange = (uint64Result == UInt64.max && errno == ERANGE)
                    let invalid = (uint64Result == 0 && errno == EINVAL)
                    let distance = startPointer.distance(to: endPointer[0]!)

                    if distance == 0 || outOfRange || invalid {
                        return (nil, distance)
                    } else {
                        return (uint64Result, distance)
                    }
                }

                // Try parsing as a Double, checks that it is some sort of number
                let (doubleResult, doubleDistance) = parseAsDouble()
                guard doubleDistance > 0 else { return nil }

                let (int64Result, int64distance) = parseAsInt64()
                if let result = int64Result {
                    if int64distance == doubleDistance {
                        // The parsed number can be represented as an Int64
                        return (NSNumber(value: result), int64distance)
                    } else {
                        return (NSNumber(value: doubleResult), doubleDistance)
                    }
                }

                // Int64 wasn't sufficient, try parsing as a UInt64
                let (uint64Result, uint64distance) = parseAsUInt64()
                if let result = uint64Result {
                    return (NSNumber(value: result), uint64distance)
                }

                // This double result will be 0 or infinity for out of range values. How should they be
                // treated?
                return (NSNumber(value: doubleResult), doubleDistance)
            }
        }

        if source.encoding == .utf8 {
            return try parseTypedNumber(source.buffer.baseAddress!.advanced(by: input), count: source.buffer.count - input).map { return ($0.0, input + $0.1) }
        }
        else {
            var numberCharacters = [UInt8]()
            var index = input
            while let (ascii, nextIndex) = source.takeASCII(index), JSONReader.numberCodePoints.contains(ascii) {
                numberCharacters.append(ascii)
                index = nextIndex
            }
            numberCharacters.append(0)

            return try numberCharacters.withUnsafeBufferPointer {
                try parseTypedNumber($0.baseAddress!, count: $0.count)
                }.map { return ($0.0, index) }
        }
    }

@spevans (Collaborator, Author) commented Aug 8, 2018

Closed in favour of #1657
