Skip to content

3 Digging Deeper: JSON Parser

davedufresne edited this page Nov 25, 2015 · 3 revisions

Introduction

In this tutorial we will build a JSON parser using a tokenizer. SwiftParsec provides a type, GenericTokenParser, that helps us to build tokenizers. The JSON format is defined in RFC 4627. The parser will generate a data structure built from an enum representing the different JSON values. Here is the complete code (you can skip to the explanation if you feel dazed and confused):

import SwiftParsec

public enum JSONValue {
    
    case JString(String)
    case JNumber(Double)
    case JBool(Bool)
    case JNull
    case JArray([JSONValue])
    case JObject([String: JSONValue])
    case Error
    
    public static let parser: GenericParser<String, (), JSONValue> = {
        
        let json = LanguageDefinition<()>.json
        let lexer = GenericTokenParser(languageDefinition: json)
        
        let symbol = lexer.symbol
        let stringLiteral = lexer.stringLiteral
        
        let jstring = JSONValue.JString <^> stringLiteral
        let jnumber = JSONValue.JNumber <^>
            (lexer.float.attempt <|> lexer.integerAsFloat)
        
        let trueValue = symbol("true") *> GenericParser(result: true)
        let falseValue = symbol("false") *> GenericParser(result: false)
        let jbool = JSONValue.JBool <^> (trueValue <|> falseValue)
        
        let jnull = symbol("null") *> GenericParser(result: JSONValue.JNull)
        
        var jarray: GenericParser<String, (), JSONValue>!
        var jobject: GenericParser<String, (), JSONValue>!
        
        GenericParser.recursive { (jvalue: GenericParser<String, (), JSONValue>) in
            
            let jarrayValues = lexer.commaSeparated(jvalue)
            jarray = JSONValue.JArray <^> lexer.brackets(jarrayValues)
            
            let nameValue: GenericParser<String, (), (String, JSONValue)> =
            stringLiteral >>- { name in
                
                symbol(":") *> jvalue.map { value in (name, value) }
                
            }
            
            let dictionary: GenericParser<String, (), [String: JSONValue]> =
            (symbol(",") *> nameValue).manyAccumulator { (assoc, var dict) in
                
                let (name, value) = assoc
                dict[name] = value
                
                return dict
                
            }
            
            let jobjectDict: GenericParser<String, (), [String: JSONValue]> =
            nameValue >>- { assoc in
                
                dictionary >>- { (var dict) in
                    
                    let (name, value) = assoc
                    dict[name] = value
                    
                    return GenericParser(result: dict)
                    
                }
                
            }
            
            let jobjectValues = jobjectDict <|> GenericParser(result: [:])
            jobject = JSONValue.JObject <^> lexer.braces(jobjectValues)
            
            return jstring <|> jnumber <|> jbool <|> jnull <|> jarray <|> jobject
            
        }
        
        return lexer.whiteSpace *> (jobject <|> jarray)
        
    }()
    
    public init(data: String) throws {
        
        try self = JSONValue.parser.run(sourceName: "", input: data)
        
    }
    
    public var string: String? {
        
        guard case .JString(let str) = self else { return nil }
        
        return str
        
    }
    
    public var double: Double? {
        
        guard case .JNumber(let dbl) = self else { return nil }
        
        return dbl
        
    }
    
    public var bool: Bool? {
        
        guard case .JBool(let b) = self else { return nil }
        
        return b
        
    }
    
    public var isNull: Bool {
        
        if case .JNull = self { return true }
        
        return false
        
    }
    
    public subscript(name: String) -> JSONValue {
        
        guard case .JObject(let dict) = self,
            let value = dict[name] else { return .Error }
        
        return value
        
    }
    
    public subscript(index: Int) -> JSONValue {
        
        guard case .JArray(let arr) = self where
            index >= 0 && index < arr.count else { return .Error }
        
        return arr[index]
        
    }
    
}

Following is an explanation of the parser line by line.

let json = LanguageDefinition<()>.json
let lexer = GenericTokenParser(languageDefinition: json)

We start by selecting the JSON language definition that will be used to parameterize the tokenizer. SwiftParsec provides other language definitions and an empty definition. The empty definition is used as a the basis for all other definitions.

let symbol = lexer.symbol
let stringLiteral = lexer.stringLiteral

symbol parses symbols and skip any trailing white space. stringLiteral applies to strings and takes care of any escaped characters.

let jstring = JSONValue.JString <^> stringLiteral

This line of code builds a parser that will parse a string literal and return a value of JSONValue.JString associated with the parsed string. The operator '<^>' is a synonym for the map function of the GenericParser type. We could have achieved the same goal with stringLiteral.map { JSONValue.JString($0) }.

let jnumber = JSONValue.JNumber <^>
    (lexer.float.attempt <|> lexer.integerAsFloat)

The representation of numbers in JSON contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. We comply to this requirement by using two parsers provided by GenericTokenParser: float and integerAsFloat. float parses the string representation of a floating point number and converts the result to a double. integerAsFloat parses the string representation of an integer and converts the result to a double. We combine these two parsers with the '<|>' operator and map the result by applying the JSONValue.JNumber initializer.

let trueValue = symbol("true") *> GenericParser(result: true)
let falseValue = symbol("false") *> GenericParser(result: false)
let jbool = JSONValue.JBool <^> (trueValue <|> falseValue)

This parser applies to Bool values and is constructed in a similar way to the previous one. Two parsers, one applying to "true" and another one applying to "false", are combined and mapped using JSONValue.JBool.

let jnull = symbol("null") *> GenericParser(result: JSONValue.JNull)

This is a simple one. It parses the string "null" and return a JSONValue.JNull as result.

var jarray: GenericParser<String, (), JSONValue>!
var jobject: GenericParser<String, (), JSONValue>!

Objects and arrays are trickier because they can contain all the possible JSON values, including themselves. That is why we will have to create recursive parsers. For now we only define the variables for these two parsers and we will use them later.

GenericParser.recursive { (jvalue: GenericParser<String, (), JSONValue>) in

The GenericParser.recursive method can be used to build recursive parsers. It takes as parameter a function that will be passed a placeholder parser. This placeholder is initialized with the result returned by the function.

let jarrayValues = lexer.commaSeparated(jvalue)
jarray = JSONValue.JArray <^> lexer.brackets(jarrayValues)

Our first recursive parser will parse JSON arrays. We first use the commaSeparated parser to parse zero or more occurrence of JSONValues separated by commas. Any trailing white space after each comma is skipped. Then we use the brackets parser to enclose jarrayValues in brackets.

let nameValue: GenericParser<String, (), (String, JSONValue)> =
stringLiteral >>- { name in

    symbol(":") *> jvalue.map { value in (name, value) }

}

JSON objects are a bit more complicated to parse. We start by defining a parser that will take care of name/value pairs. stringLiteral parses the name, then we use the '>>-' operator (pronounce 'bind') to combine it with another parser. '>>-' is an infix operator for GenericParser.flatMap. It takes a parser on its left side and a function returning a parser on its right side. The function receives the result of the left parser as a parameter value. Our unnamed closure returns a parser applying to a colon followed by any JSON value mapped to a tuple containing the name and the value. In a few words, the nameValue parser will parse a name/value pair and return a tuple containing the name/value as result.

let dictionary: GenericParser<String, (), [String: JSONValue]> =
(symbol(",") *> nameValue).manyAccumulator { (assoc, var dict) in

    let (name, value) = assoc
    dict[name] = value

    return dict

}

Here we build a parser that will parse all the name/value pairs of a JSON object except the first pair. The manyAccumulator combinator is passed a function that process the result of the combined parser. This function has two parameters, the first is the result of the current parse and the second the accumulated result of the previous parses returned by the function itself.

let jobjectDict: GenericParser<String, (), [String: JSONValue]> =
nameValue >>- { assoc in

        dictionary >>- { (var dict) in

        let (name, value) = assoc
        dict[name] = value

        return GenericParser(result: dict)

    }

}

This parser will apply to a name/value pair followed by zero or more name/value pairs.

let jobjectValues = jobjectDict <|> GenericParser(result: [:])
jobject = JSONValue.JObject <^> lexer.braces(jobjectValues)

The JSON specification states that an object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). But jobjectDict parses one or more name/value pairs. We fix this problem by combining it with a parser that always succeeds and returns an empty JSON object.

return jstring <|> jnumber <|> jbool <|> jnull <|> jarray <|> object

Finally, the closure passed to the recursive function returns a parser that parses all the possible JSON values. This is the parser assigned to the jvalue placeholder.

return lexer.whiteSpace *> (jobject <|> array)

Another requirement of the specification is that a JSON text is a serialized object or array. That is why we parse either an object or an array after having skipped any white space.

Given the following JSON text:

{
    "Image": {
        "Width":  800,
        "Height": 600,
        "Title":  "View from 15th Floor",
        "Thumbnail": {
            "Url":    "http://www.example.com/image/481989943",
            "Height": 125,
            "Width":  "100"
        },
        "IDs": [116, 943, 234, 38793]
    }
}

The parser will output:

JObject([
    "Image": JSONValue.JObject([
        "Title": JSONValue.JString("View from 15th Floor"),
        "Height": JSONValue.JNumber(600.0),
        "Thumbnail": JSONValue.JObject([
            "Width": JSONValue.JString("100"),
            "Height": JSONValue.JNumber(125.0),
            "Url": JSONValue.JString("http://www.example.com/image/481989943")
        ]),
        "Width": JSONValue.JNumber(800.0),
        "IDs": JSONValue.JArray([
            JSONValue.JNumber(116.0),
            JSONValue.JNumber(943.0),
            JSONValue.JNumber(234.0),
            JSONValue.JNumber(38793.0)
        ])
    ])
])

And to retrieve individual values you could do something like:

if let thumbnailHeight = result["Image"]["Thumbnail"]["Height"].double {

    print(thumbnailHeight)

}

Of course, you can add many functionalities to the data structure representing the JSON text. As an example: better error reporting when a path does not exist, properties to retrieve values as NSURL, Int, float, etc.

SwiftParsec provides many more parsers than what we have seen. It also provides an expression parser and a library for parsing permutation phrases. You are encouraged to browse the code and the unit tests to understand how to use them.

Clone this wiki locally