jsonlex

Fast JSON lexer (tokenizer) with a minimal memory footprint and no garbage collector pressure (zero heap allocations). Roughly 5x faster than Go's default encoding/json tokenizer.

Installation

go get -u github.com/dtgorski/jsonlex

Important

Using an io.Reader that performs a system call per read (e.g. an os.File) will result in poor performance. Wrap your input reader in a bufio.Reader or, better, a bytes.Reader to achieve the best results.
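For example, a file-backed stream could be buffered like this. This is a minimal sketch: document.json is a hypothetical input file, and the lexer construction follows Usage B below.

package main

import (
    "bufio"
    "os"

    "github.com/dtgorski/jsonlex"
)

func main() {
    f, err := os.Open("document.json") // hypothetical input file
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // bufio.Reader batches the underlying syscalls, so the lexer
    // pulls bytes from memory instead of hitting the kernel per read.
    lexer := jsonlex.NewLexer(
        func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
            println(pos, kind, string(load))
            return true
        },
    )

    lexer.Scan(bufio.NewReader(f))
}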

Usage A - iterating behaviour (Cursor)

package main

import (
    "bytes"
    "github.com/dtgorski/jsonlex"
)

func main() {
    reader := bytes.NewReader(
        []byte(`{ "foo": "bar", "baz": 42 }`),
    )

    cursor := jsonlex.NewCursor(reader, nil)

    println(cursor.Curr().String()) // prints the current token
    println(cursor.Next().String()) // advances and prints the next token

    if !cursor.Next().Is(jsonlex.TokenEOF) {
        println("there is more ...")
    }
}
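To drain the whole stream with the cursor, a loop along these lines should work. This is a sketch that assumes, as the calls above suggest, that Next() advances the cursor and returns the next token:

package main

import (
    "bytes"

    "github.com/dtgorski/jsonlex"
)

func main() {
    reader := bytes.NewReader(
        []byte(`{ "foo": "bar", "baz": 42 }`),
    )

    cursor := jsonlex.NewCursor(reader, nil)

    // Consume token by token until the EOF token appears.
    for tok := cursor.Curr(); !tok.Is(jsonlex.TokenEOF); tok = cursor.Next() {
        println(tok.String())
    }
}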

Usage B - emitting behaviour (Yield)

package main

import (
    "bytes"
    "github.com/dtgorski/jsonlex"
)

func main() {
    reader := bytes.NewReader(
        []byte(`{ "foo": "bar", "baz": 42 }`),
    )

    lexer := jsonlex.NewLexer(
        func(kind jsonlex.TokenKind, load []byte, pos uint) bool {

            // The load payload lives in a buffer owned by the lexer,
            // so copy it when it must outlive this callback.
            save := make([]byte, len(load))
            copy(save, load)

            println(pos, kind, string(save))
            return true // returning false stops the scan
        },
    )

    lexer.Scan(reader)
}

Please note that the Scan() function is reentrant: subsequent invocations will continue to consume the available byte stream, as long as the provided reader implements UnreadByte() error (e.g. bufio.Reader or bytes.Reader) and the Lexer is configured with the LexerOptEnableUnreadBuffer option activated.
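A sketch of this resumable behaviour follows. How LexerOptEnableUnreadBuffer is applied is an assumption here (shown as a variadic option to NewLexer); the README only names the option.

package main

import (
    "bytes"

    "github.com/dtgorski/jsonlex"
)

func main() {
    // bytes.Reader provides the required UnreadByte() error method.
    reader := bytes.NewReader(
        []byte(`{ "foo": "bar" }`),
    )

    lexer := jsonlex.NewLexer(
        func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
            println(pos, kind, string(load))
            return false // pause the scan after each token
        },
        jsonlex.LexerOptEnableUnreadBuffer, // ASSUMPTION: applied as a variadic option
    )

    lexer.Scan(reader) // emits the first token, then stops
    lexer.Scan(reader) // continues with the remaining byte stream
}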

Emitted tokens

jsonlex    Representation
--------   -----------------------------
TokenEOF   signals end of file/stream
TokenERR   error string (other than EOF)
TokenLIT   literal (true, false, null)
TokenNUM   float number
TokenSTR   "...\"..." string
TokenCOL   : colon
TokenCOM   , comma
TokenLSB   [ left square bracket
TokenRSB   ] right square bracket
TokenLCB   { left curly brace
TokenRCB   } right curly brace
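To make the token kinds concrete, here is a small sketch that tallies strings and numbers with the Yield callback from Usage B; the kind constants are the ones listed above:

package main

import (
    "bytes"
    "fmt"

    "github.com/dtgorski/jsonlex"
)

func main() {
    reader := bytes.NewReader(
        []byte(`{ "foo": "bar", "baz": 42 }`),
    )

    strs, nums := 0, 0

    lexer := jsonlex.NewLexer(
        func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
            switch kind {
            case jsonlex.TokenSTR:
                strs++ // counts keys and string values alike
            case jsonlex.TokenNUM:
                nums++
            }
            return true // keep scanning
        },
    )

    lexer.Scan(reader)
    fmt.Printf("strings: %d, numbers: %d\n", strs, nums)
}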

Artificial benchmarks

Each benchmark consists of the complete tokenization of a JSON document of a given size (2kB, 20kB, 200kB and 2000kB) using one CPU core. The unit doc/s means tokenized documents per second, so higher is better. The comparison candidate is Go's encoding/json.Decoder.Token() implementation.

                   2kB           20kB         200kB        2000kB
encoding/json      9910 doc/s    1152 doc/s   126 doc/s    14 doc/s
dtgorski/jsonlex   71880 doc/s   7341 doc/s   753 doc/s    85 doc/s
cpus: 1 core (~8000 BogoMIPS)
goos: linux
goarch: amd64
pkg: github.com/dtgorski/jsonlex/bench

Benchmark_encjson_2kB              9910     120475 ns/op      36528 B/op      1963 allocs/op
Benchmark_encjson_20kB             1152    1040771 ns/op     318432 B/op     18231 allocs/op
Benchmark_encjson_200kB             126    9494534 ns/op    2877968 B/op    164401 allocs/op
Benchmark_encjson_2000kB             14   77593586 ns/op   23355856 B/op   1319126 allocs/op

Benchmark_jsonlex_lexer_2kB       71880      16691 ns/op          0 B/op         0 allocs/op
Benchmark_jsonlex_lexer_20kB       7341     163210 ns/op          0 B/op         0 allocs/op
Benchmark_jsonlex_lexer_200kB       753    1594025 ns/op          0 B/op         0 allocs/op
Benchmark_jsonlex_lexer_2000kB       85   14107866 ns/op          0 B/op         0 allocs/op

Benchmark_jsonlex_cursor_2kB      38002      31776 ns/op       3680 B/op       592 allocs/op
Benchmark_jsonlex_cursor_20kB      4058     300490 ns/op      25168 B/op      5446 allocs/op
Benchmark_jsonlex_cursor_200kB      422    2777058 ns/op     248816 B/op     49141 allocs/op
Benchmark_jsonlex_cursor_2000kB      50   23559879 ns/op    2254896 B/op    396298 allocs/op

Disclaimer

The implementation and features of jsonlex follow the YAGNI principle. There is no claim of completeness or reliability.

@dev

Try make:

$ make

 make help       Displays this list
 make clean      Removes build/test artifacts
 make test       Runs integrity test with -race
 make bench      Executes artificial benchmarks
 make prof-cpu   Creates CPU profiler output
 make prof-mem   Creates memory profiler output
 make sniff      Checks format and runs linter (void on success)
 make tidy       Formats source files, cleans go.mod

License

MIT - © dtg [at] lengo [dot] org
