JSONStream is a streaming JSON parser for Go. It's useful if you want to search through a JSON input stream without parsing all of it, or if you want precise control over how the input stream is parsed.
Streaming parsers are more difficult to use than parsers which automatically construct a data structure from the input. This library is not recommended for general-purpose JSON processing.
- Simple iterator-based API.
- Line and column info for all tokens.
- Extensive test suite (including fuzz tests and the JSONTestSuite).
- Choice of behavior for numeric literals outside the range of
float64
orint
. - Optional support for JavaScript-style comments and trailing commas.
- Reports errors for all invalid JSON. You need only verify that the JSON has the required structure.
- Simple path API that can be used to search for values at a given path.
- Assumes UTF-8 input.
- Surrogate pair escape sequences decoded correctly (e.g.
"\uD834\uDD1E"
decodes to UTF-8[]byte{0xf0, 0x9d, 0x84, 0x9e}
, i.e. '𝄞').
Create a parser:
var p jsonstream.Parser
Optionally change the default configuration (default is strict JSON):
p.AllowComments = true
p.AllowTrailingCommas = true
Call the Tokenize
method with a byte slice to obtain an
iterator over a sequence of tokens:
for tok := range p.Tokenize(input) {
...
}
If you would prefer to pull tokens one-by-one rather than looping, you can use
iter.Pull
.
Errors are reported via error tokens, for which IsError(token.Kind)
is true
and token.AsError()
returns a non-nil error
value. These tokens have their
ErrorMsg
field set. JSONStream does not automatically halt on errors.
JSONStream always yields at least one error token for any input that is not
valid JSON. This includes input with mismatched {}[]
.
The JSON standard specifies only the syntactic format of numeric literals. The interpretation of very large and very small values may therefore vary. JSONStream does not automatically parse numeric literals and so does not force any particular handling of out of range literals or other edge cases.
The convenience methods AsInt
, AsInt32
, AsInt64
, AsFloat32
, and
AsFloat64
are provided for parsing numeric values. These methods add decode
errors to the associated Parser
object if a value is out of range. Decode
errors can be accessed and manipulated via the PopDecodeErrorIf
,
DecodeError
, and LastDecodeError
methods of Parser
.
If none of the As*
methods has the desired behavior, the Value
field of a
Token
struct may be accessed directly in order to implement custom parsing of
numeric values.
The sequence of tokens for the array [1,2,3]
is as follows:
Token{Kind: ArrayStart, ...}
Token{Kind: Number, Value: []byte("1"), ...}
Token{Kind: Number, Value: []byte("2"), ...}
Token{Kind: Number, Value: []byte("3"), ...}
Token{Kind: ArrayEnd, ...}
Within an object each token represents a value. The associated key is
obtained via the Key
field. The sequence of tokens for the object
{"foo": "bar", "baz": "amp"}
is as follows:
Token{Kind: ObjectStart, ...}
Token{Kind: String, Key: []byte("foo"), Value: []byte("bar"), ...}
Token{Kind: String, Key: []byte("baz"), Value: []byte("amp"), ...}
Token{Kind: ObjectEnd, ...}
The KeyAsString
method can be used to obtain a token's key as a string.
Each token has Line
and Col
fields for the start of the token, and Start
and End
fields giving the indices of the first and last byte of the token in
the input.
JSONStream is written in a simple and straightforward style. It should perform
acceptably for most purposes, but it is not intended to be an ultra high
performance parsing library (such as e.g.
json-iterator). Informal benchmarking
suggests that performance is a little better than encoding/json
(though much
depends on whether and how you construct a parsed representation of the input).
import (
"errors"
"github.com/addrummond/jsonstream"
)
func parseIntArray(input []byte) ([]int, error) {
state := 0
ints := make([]int, 0)
var p jsonstream.Parser
for t := range p.Tokenize(input) {
if err := t.AsError(); err != nil {
return nil, err
}
if state == 0 {
state++
if t.Kind != jsonstream.ArrayStart {
return nil, errors.New("Expected opening '['")
}
continue
}
if t.Kind == jsonstream.ArrayEnd {
return ints, nil
}
if t.Kind == jsonstream.Number {
ints = append(ints, t.AsInt())
continue
}
return nil, errors.New("Expected integer or closing ']'")
}
return ints, p.DecodeError()
}
import (
"errors"
"github.com/addrummond/jsonstream"
)
func parseObjectWithStringValues(input []byte) (map[string]string, error) {
state := 0
var p jsonstream.Parser
dict := make(map[string]string)
for t := range p.Tokenize(input) {
if err := t.AsError(); err != nil {
return nil, err
}
if state == 0 {
state++
if t.Kind != jsonstream.ObjectStart {
return nil, errors.New("Expected opening '{'")
}
continue
}
if t.Kind == jsonstream.ObjectEnd {
return dict, nil
}
if t.Kind == jsonstream.String {
dict[t.KeyAsString()] = t.AsString()
continue
}
return nil, errors.New("Expected string or closing '}'")
}
return dict, p.DecodeError()
}
import (
"errors"
"github.com/addrummond/jsonstream"
)
// Example call:
//
// findByPath(
// []byte(`{"a": {"b": {"c": 1}}}`),
// []any{"a", "b", "c"}
// ) // returns "1", nil
func findByPath(input []byte, path []any) (string, error) {
var p jsonstream.Parser
for twp := range jsonstream.WithPaths(p.Tokenize(input)) {
if err := twp.Token.AsError(); err != nil {
return "", err
}
if jsonstream.PathEquals(twp.Path, path) {
return string(twp.Token.Value), nil
}
}
return "", errors.New("path not found")
}