option to allow trailing characters while parsing #439
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff, master vs. #439 (no change): coverage 90.26%, files 7, lines 1366, hits 1233, misses 133.
|
This adds an option `allowtrailing` to tolerate trailing characters in the buffer while parsing JSON. It is off by default, which keeps the parser strict: it tries to parse the entire buffer as JSON. When switched on, it parses a valid JSON value from the beginning of the buffer and ignores any characters that follow.
This is useful when the input contains multiple JSON objects without a delimiter, e.g. `{"name": "value"}{"name": "value"}`, or a JSON value followed by other characters, e.g. `{"name": "value"} : this is...`.
This also matches the pre-1.x behavior of this package.
8b6b362 to 3c7c03c
|
I don't love introducing a new keyword argument/option for this, especially when we have the

    function lazy(buf::Union{AbstractVector{UInt8}, AbstractString}; isroot::Bool=true, kw...)
        if !applicable(pointer, buf, 1) || (buf isa AbstractVector{UInt8} && !isone(only(strides(buf))))
            if buf isa AbstractString
                buf = String(buf)
            else
                buf = Vector{UInt8}(buf)
            end
        end
        len = getlength(buf)
        if len == 0
            error = UnexpectedEOF
            pos = 0
            @goto invalid
        end
        pos = 1
        # detect and error on UTF-16LE BOM
        if len >= 2 && getbyte(buf, pos) == 0xff && getbyte(buf, pos + 1) == 0xfe
            error = InvalidUTF16
            @goto invalid
        end
        # detect and error on UTF-16BE BOM
        if len >= 2 && getbyte(buf, pos) == 0xfe && getbyte(buf, pos + 1) == 0xff
            error = InvalidUTF16
            @goto invalid
        end
        # detect and ignore UTF-8 BOM
        pos = (len >= 3 && getbyte(buf, pos) == 0xef && getbyte(buf, pos + 1) == 0xbb && getbyte(buf, pos + 2) == 0xbf) ? pos + 3 : pos
        @nextbyte
        return _lazy(buf, pos, len, b, LazyOptions(; kw...), isroot)
        @label invalid
        invalid(error, buf, pos, Any)
    end

the main differences being that we "capture" the |
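
For readers unfamiliar with the byte-order-mark checks at the top of the `lazy` function quoted above, here is a minimal standalone sketch of the same byte comparisons using only Base. `classify_bom` is a hypothetical helper written for illustration; it is not part of JSON.jl, which handles these cases inline and raises an error for the UTF-16 ones.

```julia
# Hypothetical helper, not part of JSON.jl: classify a leading byte-order mark.
# Returns a symbol naming the BOM and the 1-based index of the first byte after it.
function classify_bom(bytes::AbstractVector{UInt8})
    len = length(bytes)
    if len >= 2 && bytes[1] == 0xff && bytes[2] == 0xfe
        return (:utf16le, 3)   # the quoted code treats this as an error
    elseif len >= 2 && bytes[1] == 0xfe && bytes[2] == 0xff
        return (:utf16be, 3)   # the quoted code treats this as an error
    elseif len >= 3 && bytes[1] == 0xef && bytes[2] == 0xbb && bytes[3] == 0xbf
        return (:utf8, 4)      # the quoted code skips these three bytes
    end
    return (:none, 1)          # no BOM: parsing starts at the first byte
end

# "\ufeff" is encoded as the bytes EF BB BF in a Julia (UTF-8) string.
kind, start = classify_bom(codeunits("\ufeff{\"a\":1}"))
```

The UTF-16 checks come first because they need only two bytes, while the UTF-8 BOM needs three.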
|
Thanks @quinnj , this is much cleaner. I have updated the PR with your suggestion. |
|
Looking pretty good; will you also add this into the docs for JSON.parse in the parse.jl file? (we have several of the JSON.lazy keyword args repeated there). Then if you include a minor version bump, we can merge and release. |
|
Done. Thanks! |
This exposes the existing internal `isroot` parameter of `_lazy` as a keyword argument on `JSON.lazy` (and by extension `JSON.parse`).

By default, `isroot=true`, which means the parser expects the entire buffer to be a single valid JSON value; trailing characters after the root value will raise an error. Setting `isroot=false` parses only the first JSON value from the buffer and silently ignores any trailing characters.

Useful for:
- `{"a":1}{"b":2}`
- `{"a":1} : some annotation...`
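
A short usage sketch of the behavior described above. It assumes a JSON.jl release that includes this PR, so that `JSON.parse` accepts the `isroot` keyword. The variable names are illustrative only.

```julia
using JSON  # assumption: a JSON.jl version that includes this PR's `isroot` keyword

buf = "{\"a\":1}{\"b\":2}"

# Default (isroot=true): the whole buffer must be a single JSON value,
# so the second object is expected to make this call throw.
strict_failed = try
    JSON.parse(buf)
    false
catch
    true
end

# isroot=false: only the first value is parsed; the trailing `{"b":2}` is ignored.
first_obj = JSON.parse(buf; isroot=false)
first_obj["a"]
```

Note that with `isroot=false` the trailing bytes are dropped silently, so reading the second object would need a separate pass over the rest of the buffer.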