Skip to content

andreyorst/cheat-json

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cheat JSON

If only edn/read was faster

A library that cheats and parses JSON as EDN.

Here's a typical JSON:

{"foo": "bar", "baz": [{"quux": 42}]}

And here's an EDN, describing the same thing:

{"foo" "bar", "baz" [{"quux" 42}]}

See the difference?

This library simply reads a stream of bytes and removes all : outside of the strings, then parses the result as EDN with clojure.edn/read.

Wait, that's illegal!

Yes! And also kinda slow. Some benchmarks show that this library is slightly faster than org.clojure/data.json version 2.5.0, but that's not the reason to use this library over any other implementation. I'd advise against using it in general.

Moreover, some things aren't parsed conventionally, e.g. null in JSON is parsed as a symbol null, and not as nil. Null handling can be implemented, but it requires more logic in the reader wrapper. Apart from what the EDN parser does, no validation or error handling is done by this library. A fun side effect of this approach, is that invalid JSON files, such as ones that are missing a colon or a comma are still parsed by this library.

Development

The library implements a thin wrapper around any given reader that tracks the state of whether the parser is currently in string or not. While outside the string, all : and spacious characters are ignored by the reader. All other characters are fed to the EDN reader. The wrapper is implemented in Java because doing the same via proxy is almost twice as slow, even with a mutable Java array holding state flags.

Before the library can be used in the REPL, the Java sources must be compiled with clojure -T:build compile-java.

The :dev alias provides some extra libraries, such as other JSON parsing libraries.

Benchmarks

You can run the benchmarks across many libraries in the andreyorst.cheat-json-benchmark namespace. Here are the results for parsing a 58M JSON on AMD Ryzen™ 7 3750H with Clojure 1.11.1, Java 17.0.9 via criterium 0.4.6. The JSON is generated from EDN, created with clojure.test.check generators, and transformed to JSON via the clojure.data.json library (all parsers produce the same EDN as a result):

Library/function Mean time Version Commentary
cheshire.core/parse-stream-strict 665.739489 ms 5.12.0 parse-stream is lazy, so the -strict version is used
jsonista.core/read-value 670.450012 ms 0.3.8
clj-json.core/parse-string 1.090168 sec 0.5.3 doesn't seem to have a stream parser
charred.api/read-json 1.141876 sec 1.034
clojure.edn/read 1.751539 sec 1.11.1 Reading raw EDN is faster than the conversion
cheat-json/read 2.598094 sec 0.2.3 cheat-json converts JSON to EDN.
clojure.data.json/read 3.656397 sec 2.5.0
pjson.core/read-str - 1.0.0 can't handle the generated JSON

A more conventional, and less nested JSON files show a bit different results:

Library/function Mean time (64KB JSON) Mean time (5MB JSON)
cheshire.core/parse-stream-strict 792.361051 µs 61.476305 ms
jsonista.core/read-value 720.986034 µs 51.473312 ms
clj-json.core/parse-string 317.119725 µs 27.490830 ms
charred.api/read-json 970.481054 µs 78.123860 ms
clojure.edn/read 2.414766 ms 207.465807 ms
cheat-json/read 2.548120 ms 230.911907 ms
clojure.data.json/read 1.773132 ms 159.568319 ms
pjson.core/read-str 88.381119 µs 7.027914 ms

The difference between raw EDN read and Cheat-JSON wrapper is less dramatic.

License

Copyright © 2022 Andrey Listopadov

Distributed under the Eclipse Public License version 1.0.

About

Cheating JSON parser

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published