Allow for parsing multiple JSON objects in a single string/stream #344

Open
mcognetta opened this issue May 10, 2022 · 2 comments

@mcognetta

Some APIs that accept batch requests return a sequence of separate JSON objects with no delimiter between them. The boundaries can still be recovered while parsing: once one complete JSON object has been parsed, the next non-whitespace character starts the next object.

For example, you might see a string like {"name":"Marco"} {"name":"Julia"}, representing two distinct JSON objects.

Currently, JSON.jl does not parse this correctly. It errors for the string case, and only parses the first object in the streaming case (without any indication that the stream was not exhausted).

julia> s = "{\"name\":\"Marco\"} {\"name\":\"Julia\"}"
"{\"name\":\"Marco\"} {\"name\":\"Julia\"}"

julia> JSON.parse(s)
ERROR: Expected end of input
Line: 0
Around: ...{"name":"Marco"} {"name":"Julia"}...
                            ^

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] _error(message::String, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/packages/JSON/QXB8U/src/Parser.jl:140
 [3] parse(str::String; dicttype::Type, inttype::Type{Int64}, allownan::Bool, null::Nothing)
   @ JSON.Parser ~/.julia/packages/JSON/QXB8U/src/Parser.jl:453
 [4] parse(str::String)
   @ JSON.Parser ~/.julia/packages/JSON/QXB8U/src/Parser.jl:448
 [5] top-level scope
   @ REPL[8]:1

julia> JSON.parse(IOBuffer(s))
Dict{String, Any} with 1 entry:
  "name" => "Marco"

Under the assumption that all JSON objects in the string have the same dicttype, I believe this can be extended to return a list of parsed objects. My first attempt is:

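# Proposed addition to JSON.Parser; it relies on the module's internal helpers.
function parsemany(str::AbstractString;
               dicttype=Dict{String,Any},
               inttype::Type{<:Real}=Int64,
               allownan::Bool=true,
               null=nothing)
    # Assumes every top-level value is an object of type `dicttype`.
    out = Vector{dicttype}()
    pc = _get_parsercontext(dicttype, inttype, allownan, null)
    ps = MemoryParserState(str, 1)
    # Parse the first value, then skip any whitespace that follows it.
    v = parse_value(pc, ps)
    push!(out, v)
    chomp_space!(ps)
    # Keep parsing from the current position until the input is exhausted.
    while hasmore(ps)
        v = parse_value(pc, ps)
        push!(out, v)
        chomp_space!(ps)
    end
    out
end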

Example:

julia> JSON.parsemany(s)
2-element Vector{Dict{String, Any}}:
 Dict("name" => "Marco")
 Dict("name" => "Julia")

# correctly errors on a malformed JSON object
julia> JSON.parsemany(s[1:end-1])
ERROR: Unexpected end of input
Line: 0
Around: ...":"Julia"...
                    ^

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] _error(message::String, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:140
 [3] byteat
   @ ~/.julia/dev/JSON/src/Parser.jl:49 [inlined]
 [4] parse_object(pc::JSON.Parser.ParserContext{Dict{String, Any}, Int64, true, nothing}, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:233
 [5] parse_value(pc::JSON.Parser.ParserContext{Dict{String, Any}, Int64, true, nothing}, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:166
 [6] parsemany(str::String; dicttype::Type, inttype::Type{Int64}, allownan::Bool, null::Nothing)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:472
 [7] parsemany(str::String)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:464
 [8] top-level scope
   @ REPL[10]:1

# notice the first object is not properly closed
julia> s = "{\"name\":\"Marco\" {\"name\":\"Julia\"}"
"{\"name\":\"Marco\" {\"name\":\"Julia\"}"

# fails to parse
julia> JSON.parsemany(s)
ERROR: Expected ',' here
Line: 0
Around: ...{"name":"Marco" {"name":"Julia"}...
                           ^

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] _error(message::String, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:140
 [3] _error_expected_char(c::UInt8, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:83
 [4] skip!
   @ ~/.julia/dev/JSON/src/Parser.jl:80 [inlined]
 [5] parse_object(pc::JSON.Parser.ParserContext{Dict{String, Any}, Int64, true, nothing}, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:234
 [6] parse_value(pc::JSON.Parser.ParserContext{Dict{String, Any}, Int64, true, nothing}, ps::JSON.Parser.MemoryParserState)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:166
 [7] parsemany(str::String; dicttype::Type, inttype::Type{Int64}, allownan::Bool, null::Nothing)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:467
 [8] parsemany(str::String)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:464
 [9] top-level scope
   @ REPL[16]:1

# note the second one is not properly opened
julia> s = "{\"name\":\"Marco\"} \"name\":\"Julia\"}"
"{\"name\":\"Marco\"} \"name\":\"Julia\"}"

# fails, though this case should have a better error message in the final version
julia> JSON.parsemany(s)
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Dict{String, Any}
Closest candidates are:
  convert(::Type{T}, ::T) where T<:AbstractDict at abstractdict.jl:520
  convert(::Type{T}, ::AbstractDict) where T<:AbstractDict at abstractdict.jl:522
  convert(::Type{T}, ::T) where T at essentials.jl:205
  ...
Stacktrace:
 [1] push!(a::Vector{Dict{String, Any}}, item::String)
   @ Base ./array.jl:932
 [2] parsemany(str::String; dicttype::Type, inttype::Type{Int64}, allownan::Bool, null::Nothing)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:473
 [3] parsemany(str::String)
   @ JSON.Parser ~/.julia/dev/JSON/src/Parser.jl:464
 [4] top-level scope
   @ REPL[18]:1

# works even with no space
julia> s = "{\"name\":\"Marco\"}{\"name\":\"Julia\"}"
"{\"name\":\"Marco\"}{\"name\":\"Julia\"}"

julia> JSON.parsemany(s)
2-element Vector{Dict{String, Any}}:
 Dict("name" => "Marco")
 Dict("name" => "Julia")

Is this an acceptable addition to JSON.jl? One argument in its favor is that, while users could split the string themselves, doing so is basically the same as writing a JSON parser: they have to correctly handle all of the edge cases, nesting, etc. in order to determine where the outermost opening and closing brackets are. Without access to the internal helper methods of JSON.jl, that is a big ask.

@mcognetta
Author

I have noticed that wrapping multiple JSON objects in [ ] with a comma separator causes this to parse correctly:

julia> s = "[{\"name\":\"Marco\"}, {\"name\":\"Julia\"}]"
"[{\"name\":\"Marco\"}, {\"name\":\"Julia\"}]"

julia> JSON.parse(s)
2-element Vector{Any}:
 Dict{String, Any}("name" => "Marco")
 Dict{String, Any}("name" => "Julia")

This still leaves the problem of converting a non-delimited multiple-object JSON string into a comma-separated one that can be wrapped in brackets.

@mcognetta
Author

Sorry, one more thing. It works if you repeatedly read from a stream:

julia> s = "{\"name\":\"Marco\"} {\"name\":\"Julia\"}"
"{\"name\":\"Marco\"} {\"name\":\"Julia\"}"

julia> stream = IOBuffer(s)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=33, maxsize=Inf, ptr=1, mark=-1)

julia> JSON.parse(stream)
Dict{String, Any} with 1 entry:
  "name" => "Marco"

julia> JSON.parse(stream)
Dict{String, Any} with 1 entry:
  "name" => "Julia"

I will open a PR to add an example like this to the docs.
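
The repeated-read pattern above can be wrapped in a loop that keeps calling JSON.parse until the stream is exhausted. This is only a minimal sketch: `parse_all` is a hypothetical helper name, not part of JSON.jl, and it assumes the stream behavior shown in this thread, where JSON.parse(io) consumes exactly one top-level value. It also assumes the top-level values are objects or arrays, and that the stream supports stepping back one character (an IOBuffer or a file), which `skipchars` needs.

```julia
using JSON

# Hypothetical helper (not part of JSON.jl): collect every top-level JSON
# value in `io` by calling JSON.parse repeatedly until the stream is empty.
function parse_all(io::IO; kwargs...)
    out = Any[]
    skipchars(isspace, io)        # drop leading whitespace
    while !eof(io)
        # Keyword arguments (dicttype, inttype, ...) are forwarded unchanged.
        push!(out, JSON.parse(io; kwargs...))
        skipchars(isspace, io)    # drop whitespace between and after values
    end
    return out
end

# Convenience method for strings: wrap the string in an IOBuffer.
parse_all(str::AbstractString; kwargs...) = parse_all(IOBuffer(str); kwargs...)
```

Because it only uses the public JSON.parse entry point, this avoids the internal helpers that parsemany depends on. The result is a Vector{Any}, so a stray top-level string would be collected rather than raising the `convert` MethodError shown earlier.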
