
option to allow trailing characters while parsing #439

Merged: quinnj merged 3 commits into master from tan/allowtrailing on Mar 17, 2026
tanmaykm (Member) commented Mar 10, 2026

This exposes the existing internal isroot parameter of _lazy as a keyword argument on JSON.lazy (and by extension JSON.parse).

By default, isroot=true, which means the parser expects the entire buffer to be a single valid JSON value — trailing characters after the root value will raise an error. Setting isroot=false parses only the first JSON value from the buffer and silently ignores any trailing characters.

Useful for:

  • Parsing buffers that contain multiple concatenated JSON objects without a delimiter, e.g. {"a":1}{"b":2}
  • Parsing JSON followed by non-JSON content, e.g. {"a":1} : some annotation...
  • Restoring the pre-1.x behavior of this package

julia> JSON.parse("{\"hello\": \"world\"} extra stuff", isroot=false)
JSON.Object with 1 entry:
  "hello" => "world"
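
A minimal sketch contrasting the two modes described above, assuming a JSON.jl release that includes this change. The variable names (`input`, `strict_failed`, `obj`) are illustrative only, and the exact exception type raised in strict mode is not stated in this PR, so the sketch catches any error.

```julia
using JSON  # assumption: a JSON.jl version that accepts the isroot keyword

input = "{\"hello\": \"world\"} extra stuff"

# Default (isroot=true): the whole buffer must be a single JSON value,
# so the trailing text is expected to raise an error.
strict_failed = try
    JSON.parse(input)
    false
catch
    true
end

# isroot=false: parse only the first JSON value and ignore what follows.
obj = JSON.parse(input; isroot=false)

println("strict parse failed: ", strict_failed)
println("hello => ", obj["hello"])
```

Note that `isroot=false` returns only the first value; this PR does not describe a way to learn where that value ended in the buffer.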

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.26%. Comparing base (f4fbb5a) to head (5b0ddcd).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #439   +/-   ##
=======================================
  Coverage   90.26%   90.26%           
=======================================
  Files           7        7           
  Lines        1366     1366           
=======================================
  Hits         1233     1233           
  Misses        133      133           


This adds an option `allowtrailing` to tolerate trailing characters in the buffer while parsing JSON. It is off by default, which keeps the parser strict: the entire buffer must parse as JSON. When it is switched on, the parser reads one valid JSON value from the beginning of the buffer and ignores any characters that follow.

This is useful when a buffer contains multiple JSON objects without a delimiter, e.g. `{"name": "value"}{"name": "value"}`, or a JSON value followed by other characters, e.g. `{"name": "value"} : this is...`.

This also matches the pre-1.x behavior of this package.
tanmaykm force-pushed the tan/allowtrailing branch from 8b6b362 to 3c7c03c on March 10, 2026 03:16
tanmaykm requested a review from quinnj on March 15, 2026 03:47
quinnj (Member) commented Mar 16, 2026

I don't love introducing a new keyword argument/option for this, especially when we have the isroot property right there. I think I'd prefer allowing passing isroot=false as a keyword arg that would get passed down and then we'd have:

function lazy(buf::Union{AbstractVector{UInt8}, AbstractString}; isroot::Bool=true, kw...)
    if !applicable(pointer, buf, 1) || (buf isa AbstractVector{UInt8} && !isone(only(strides(buf))))
        if buf isa AbstractString
            buf = String(buf)
        else
            buf = Vector{UInt8}(buf)
        end
    end
    len = getlength(buf)
    if len == 0
        error = UnexpectedEOF
        pos = 0
        @goto invalid
    end
    pos = 1
    # detect and error on UTF-16LE BOM
    if len >= 2 && getbyte(buf, pos) == 0xff && getbyte(buf, pos + 1) == 0xfe
        error = InvalidUTF16
        @goto invalid
    end
    # detect and error on UTF-16BE BOM
    if len >= 2 && getbyte(buf, pos) == 0xfe && getbyte(buf, pos + 1) == 0xff
        error = InvalidUTF16
        @goto invalid
    end
    # detect and ignore UTF-8 BOM
    pos = (len >= 3 && getbyte(buf, pos) == 0xef && getbyte(buf, pos + 1) == 0xbb && getbyte(buf, pos + 2) == 0xbf) ? pos + 3 : pos
    @nextbyte
    return _lazy(buf, pos, len, b, LazyOptions(; kw...), isroot)

@label invalid
    invalid(error, buf, pos, Any)
end

The main difference is that we "capture" the isroot::Bool=true keyword arg (so it isn't passed down to _lazy) and then we construct the LazyValue w/ the user-provided isroot.

tanmaykm (Member, Author) commented

Thanks @quinnj, this is much cleaner. I have updated the PR with your suggestion.

quinnj (Member) commented Mar 17, 2026

Looking pretty good; will you also add this into the docs for JSON.parse in the parse.jl file? (we have several of the JSON.lazy keyword args repeated there). Then if you include a minor version bump, we can merge and release.

tanmaykm (Member, Author) commented

Done. Thanks!

quinnj merged commit 4286a94 into master on Mar 17, 2026
11 checks passed
quinnj deleted the tan/allowtrailing branch on March 17, 2026 12:58