Proposal: Unified JSON API #268

Open
@samoconnor

This issue follows Discourse comments about unifying the APIs of JSON.jl, LazyJSON.jl and JSON2.jl.

1. Define Julia types for JSON values

The current API uses Base.String to represent encoded JSON and Base.Dict etc. to represent decoded JSON. The functions JSON.parse and JSON.json are used to convert between the two representations. This API forces the implementation to be non-lazy: because JSON.parse returns a Base.Dict, it must decode the whole document before returning. It also precludes short-cut methods for JSON-derived values (e.g. re-encoding a value by returning the encoded text it was parsed from).

JavaScript Object Notation (JSON) defines six value types: object, array, number, string, boolean (true/false) and null.
Defining Julia types to represent these JSON value types will enable us to hide the implementation details (lazy vs eager parsing, encoded string representation vs decoded AST representation, etc.).

Treating JSON values as first class types (rather than as something that must be converted to a Base type) allows dispatch on these types and transparent implementation of short-cut methods as needed for efficiency.

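```julia
# Defined inside `module JSON`, so this binding is JSON.Value.
# JSON.Array, JSON.String etc. are new types owned by the JSON module,
# distinct from Base.Array, Base.String etc.
const Value = Union{
    JSON.Object,
    JSON.Array,
    JSON.Number,
    JSON.String,
    JSON.Bool,
    JSON.Null
}
```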
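To make the dispatch argument concrete, here is a minimal sketch using hypothetical stand-in types (JObject, JArray, JNumber, JString, JBool, JNull, jsontype, encode and decode are illustrative names, not an existing package API). It shows ordinary dispatch on the value type, plus one short-cut method: a number that keeps its encoded text can be re-encoded without a parse/print round trip.

```julia
# Hypothetical stand-ins for the proposed JSON value types (illustrative only).
struct JNull end
struct JBool; value::Bool; end
struct JNumber; text::String; end            # keeps the encoded digits
struct JString; value::String; end
struct JArray; items::Vector{Any}; end
struct JObject; fields::Vector{Pair{String,Any}}; end

const JValue = Union{JObject, JArray, JNumber, JString, JBool, JNull}

# Ordinary dispatch on the value type.
jsontype(::JObject) = "object"
jsontype(::JArray)  = "array"
jsontype(::JNumber) = "number"
jsontype(::JString) = "string"
jsontype(::JBool)   = "boolean"
jsontype(::JNull)   = "null"

# Short-cut method: re-encoding returns the stored text unchanged.
encode(n::JNumber) = n.text

# General path: decode the text (Float64 only, for brevity).
decode(n::JNumber) = parse(Float64, n.text)
```

Note that string(decode(JNumber("1.50"))) gives "1.5", while encode returns the original "1.50"; the short-cut preserves the source text and skips the work.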

2. Construct JSON Value objects from strings

The implementation might immediately parse encoded strings into Julia collection types, or it might parse to intermediate AST types, or it might just lazily wrap the encoded string. That implementation detail would be hidden from the user (unless there are compelling use cases where user-supplied implementation hints are a big performance win, e.g. a lazy=false option). It seems likely that a combination of sensible defaults and heuristics can achieve good performance in most cases without any need for the user to fiddle with options.

```julia
"""
    JSON.Value(::AbstractString)::JSON.Value

Create a JSON value from a JSON-formatted string.
"""
```

```julia
julia> x = JSON.Value("""{
           "object": {"field": "value"},
           "array": [1,2,3],
           "number": 43,
           "bool": true,
           "null": null
       }""")

julia> x.object.field
"value"

julia> x.array[1]
1
```
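The "lazily wrap the encoded string" option can be illustrated with a single number type. This is a minimal sketch under assumptions: LazyNumber and value are hypothetical names, and only Float64 decoding is handled. The value keeps its encoded text, decodes it on first use and caches the result.

```julia
# Hypothetical lazy JSON number: stores the encoded text and decodes it
# only the first time a decoded value is requested.
mutable struct LazyNumber
    text::String
    parsed::Union{Nothing,Float64}   # cache; `nothing` until first use
end

LazyNumber(text::AbstractString) = LazyNumber(String(text), nothing)

function value(n::LazyNumber)
    if n.parsed === nothing
        n.parsed = parse(Float64, n.text)   # decode once, then reuse
    end
    return n.parsed::Float64
end

# Re-encoding needs no work: the encoded form is already stored.
Base.string(n::LazyNumber) = n.text
```

Whether a value is decoded eagerly or on demand is invisible to callers of value and string, which is the property the proposal relies on.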

2. Construct JSON Value objects from Julia objects.

The implementation might immediately encode the julia objects to a JSON string, or it might
just wrap them and do nothing, or it might convert them to an intermediate representation.
That detail is hidden from the user.

```julia
"""
    JSON.Value(o)::JSON.Value

Create a JSON value from a Julia object.
"""
```

```julia
julia> x = JSON.Value(Dict(
           "object" => Dict("field" => "value"),
           "array" => [1,2,3],
           "number" => 43,
           "bool" => true,
           "null" => nothing
       ))

julia> x.object.field
"value"

julia> x.array[1]
1
```
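The "just wrap them" option, together with the x.object.field syntax shown above, can be sketched by overloading Base.getproperty. WrappedObject and wrap are hypothetical names for illustration. The wrapper keeps the Julia Dict as-is and wraps nested dicts only when they are accessed.

```julia
# Hypothetical wrapper that keeps a Julia Dict as-is and exposes its
# keys as properties, so that x.object.field works.
struct WrappedObject
    dict::Dict{String,Any}
end

wrap(v::AbstractDict) = WrappedObject(Dict{String,Any}(string(k) => w for (k, w) in v))
wrap(v) = v   # arrays, numbers, strings, Bool and nothing pass through unchanged

function Base.getproperty(o::WrappedObject, k::Symbol)
    d = getfield(o, :dict)   # getfield avoids recursing into getproperty
    return wrap(d[String(k)])
end
```

For example, wrap(Dict("object" => Dict("field" => "value"), "array" => [1, 2, 3])).object.field returns "value" without converting anything up front.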

3. Use Base.string to produce JSON-encoded strings.

Rather than using JSON.json to produce encoded strings, just use Base.string.
Depending on the JSON.Value implementation, string might just return a preexisting encoded string, or it might have to produce an encoded string from an internal representation.

```julia
"""
    Base.string(o::JSON.Value)::AbstractString

JSON-formatted string representation of a JSON value.
"""
```

```julia
julia> x = JSON.Value(Dict(
           "object" => Dict("field" => "value"),
           "array" => [1,2,3],
           "number" => 43,
           "bool" => true,
           "null" => nothing
       ))

julia> string(x)  # key order follows Dict iteration order, which is not guaranteed to match insertion order
"{\"object\":{\"field\":\"value\"},\"array\":[1,2,3],\"number\":43,\"bool\":true,\"null\":null}"
```
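In Julia, string(x) is built by calling print(io, x), so a value type only needs a Base.print method to make string return its JSON encoding. Below is a minimal sketch for the "produce an encoded string from an internal representation" case. Encoded and writejson are hypothetical names, and the string escaping is deliberately incomplete: only backslash and double quote are handled, with no control characters.

```julia
# Hypothetical wrapper whose string(...) is the JSON encoding of the wrapped value.
struct Encoded
    value::Any
end

writejson(io::IO, ::Nothing) = print(io, "null")
writejson(io::IO, b::Bool) = print(io, b ? "true" : "false")
# Assumes finite numbers; NaN and Inf are not valid JSON.
writejson(io::IO, n::Union{Integer,AbstractFloat}) = print(io, n)

function writejson(io::IO, s::AbstractString)
    # Escape backslashes first, then double quotes (control characters not handled).
    escaped = replace(replace(s, "\\" => "\\\\"), "\"" => "\\\"")
    print(io, '"', escaped, '"')
end

function writejson(io::IO, v::AbstractVector)
    print(io, '[')
    for (i, x) in enumerate(v)
        i > 1 && print(io, ',')
        writejson(io, x)
    end
    print(io, ']')
end

function writejson(io::IO, d::AbstractDict)
    print(io, '{')
    for (i, (k, x)) in enumerate(d)   # key order = iteration order of d
        i > 1 && print(io, ',')
        writejson(io, string(k))
        print(io, ':')
        writejson(io, x)
    end
    print(io, '}')
end

# string(e) calls print(io, e), so this makes string return the JSON text.
Base.print(io::IO, e::Encoded) = writejson(io, e.value)
```

A lazy value that already holds its encoded text would instead define print to write that text directly.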

4. Use Base.convert to do direct-to-struct parsing.

This mirrors the direct-to-struct parsing feature first implemented in JSON2:

```julia
julia> struct MyType
           field
       end

julia> convert(MyType, JSON.Value("""{"field": "value"}"""))
MyType("value")
```

The convert methods would be @generated functions, so the field-by-field decoding code is generated once for each target struct type.
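A minimal sketch of what such a generated method could look like, under assumptions: the decoded JSON object is stood in for by a Dict{String,Any}, and the function is given the hypothetical name fromjson instead of being a Base.convert method, to avoid defining convert for every type. The generator body runs once per concrete target type and emits straight-line code with the field names baked in.

```julia
struct MyType
    field
end

struct Point
    x::Float64
    y::Float64
end

# Hypothetical direct-to-struct decoder. For each concrete T, the generator
# emits T(obj["f1"], obj["f2"], ...) using T's default positional constructor.
@generated function fromjson(::Type{T}, obj::AbstractDict) where {T}
    args = [:(obj[$(String(name))]) for name in fieldnames(T)]
    return :($T($(args...)))
end
```

The default constructor handles conversion, so fromjson(Point, Dict{String,Any}("x" => 1, "y" => 2.5)) yields Point(1.0, 2.5). A missing key raises a KeyError.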

5. Use Base.convert in cases when specific Base types are needed.

Most of the time JSON.Value types that implement AbstractDict, AbstractArray, Base.Real, AbstractString etc. are all that the user will need.

In cases where the user wants a specific type, they can use convert:

```julia
julia> convert(Vector{Float64}, JSON.Value("[0.25, 0.5, 1, 2, 4, 8]"))
6-element Array{Float64,1}:
 0.25
 0.5
 1.0
 2.0
 4.0
 8.0
```
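If a JSON array value implements the AbstractArray interface, this conversion needs no JSON-specific code. Base's generic AbstractArray-to-Array conversion already handles convert(Vector{Float64}, a). The sketch below uses a hypothetical JNumberArray that keeps its elements as encoded text and decodes each element when it is indexed.

```julia
# Hypothetical JSON array of numbers whose elements are still encoded text.
struct JNumberArray <: AbstractVector{Float64}
    items::Vector{String}   # encoded elements, e.g. ["0.25", "0.5", "1"]
end

# The minimal AbstractArray interface: size and getindex.
Base.size(a::JNumberArray) = (length(a.items),)
Base.IndexStyle(::Type{<:JNumberArray}) = IndexLinear()
Base.getindex(a::JNumberArray, i::Int) = parse(Float64, a.items[i])   # decoded per access
```

With a = JNumberArray(["0.25", "0.5", "1", "2", "4", "8"]), both convert(Vector{Float64}, a) and generic functions such as sum(a) work unchanged.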

6. Backwards compatibility

The existing API could be maintained as follows:

```julia
JSON.parse(x; kw...) = JSON.Value(x; kw...)
JSON.json(x) = string(JSON.Value(x))
```

We could implement parse so that it produces lazy value objects by default and produces non-lazy values only when the dicttype= or inttype= options are supplied. Or we could disable laziness entirely for the JSON.parse interface and say "if you want the new lazy thing, use JSON.Value".
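The first option above can be sketched as a keyword check. This is only an illustration of the decision logic: compat_parse, eager_parse and lazy_value are hypothetical placeholders, and the placeholders just report which path was taken.

```julia
# Placeholders standing in for a non-lazy parser and for JSON.Value.
eager_parse(str; kw...) = (mode = :eager, text = str)
lazy_value(str) = (mode = :lazy, text = str)

function compat_parse(str::AbstractString; kw...)
    # Legacy options that only make sense for a fully materialized result.
    if haskey(kw, :dicttype) || haskey(kw, :inttype)
        return eager_parse(str; kw...)
    end
    return lazy_value(str)   # default: lazy result
end
```

Existing callers that pass dicttype= or inttype= keep the eager behaviour, and everyone else gets the lazy default.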

If returning an AbstractDict from parse instead of a Dict causes breakage (or performance regression) in existing code, then we should start out by returning Dict.

7. Implementation

We can cherry-pick implementation details from the various existing JSON codebases, e.g. use the fast float decoder from over here, but use the more robust UTF-16 decoder from there.

It might turn out that using the lazy parser is just as fast as the non-lazy one for doing a full non-lazy parse. In that case we may only need one parser. Or, if there are cases where the existing non-lazy parser has big wins, we can keep both. The user should not be able to tell the difference.
