MercuryJson: Multi-Threaded JSON Parsing with SIMD
This repository contains the source code for our course project in CMU 15-618: Parallel Computer Architecture and Programming.
The project is still a proof-of-concept. The currently supported functions are parsing/validating and pretty-printing. We plan to add an iterator interface soon.
MercuryJson is a fast JSON parser optimized for parsing very large documents. The idea is based mainly on the two-stage parsing framework of simdjson. Our main contribution is that we parallelized the second stage using multi-threading.
Benchmarks show that we achieve considerable speedup on large (> 500MB) documents and comparable performance on most small (< 3MB) documents.
For a detailed description of the algorithms and benchmarks, please refer to our report.
To build MercuryJson, you will need:
- CMake version 3.0 and after.
- C++ compiler supporting the C++17 standard.
- Linux or macOS. Windows is not yet supported.
- An Intel CPU supporting the AVX2 instruction set.
Building commands are:
git clone https://github.com/Somefive/MercuryJson cd MercuryJson mkdir build && cd build cmake .. make
This will generate a binary named
main under the
build directory. This program is used for benchmarking: it reports timing for parsing of the given document. Here is an example output:
$ ./main ../data/large/citylots.json File size: 189778220 Structural characters: 33395428 Iteration 0: stage 1 runtime: 0.197731 s, stage 2 runtime: 0.179965 s Iteration 1: stage 1 runtime: 0.208775 s, stage 2 runtime: 0.173601 s Iteration 2: stage 1 runtime: 0.196363 s, stage 2 runtime: 0.171210 s Iteration 3: stage 1 runtime: 0.199372 s, stage 2 runtime: 0.171221 s Iteration 4: stage 1 runtime: 0.207167 s, stage 2 runtime: 0.173756 s Iteration 5: stage 1 runtime: 0.194635 s, stage 2 runtime: 0.173653 s Iteration 6: stage 1 runtime: 0.208866 s, stage 2 runtime: 0.175466 s Iteration 7: stage 1 runtime: 0.196667 s, stage 2 runtime: 0.169261 s Iteration 8: stage 1 runtime: 0.193073 s, stage 2 runtime: 0.171227 s Iteration 9: stage 1 runtime: 0.192630 s, stage 2 runtime: 0.168801 s Average runtime: 0.372344 s, speed: 486.07 MB/s Average stage 1 runtime: 0.199528 s (53.59 %), stage 2 runtime: 0.172816 s (46.41 %) Best runtime: 0.361431 s, speed: 500.75 MB/s
All configurable flags are stored in
src/flags.h. Note that the number of threads to use is hardcoded at compile time.
The following features are not yet supported by our parser:
- Null characters (
'\0') within strings; currently we use null-terminated C-style strings.
- Conversion & validation of escaped Unicode characters.
- Comments (
The following incorrect JSON fragments are accepted by our parser:
- Unescaped control characters within strings.
- Invalid escape sequences.
- Escaped characters outside strings.
For detailed discussion on JSON standards, please see JSON Test Suite.