Performance comparison from readme seems a bit unfair #29

PawelTroka · 2022-02-21T22:37:04Z

Hi!

First of all thanks for this library! This sounds like a very good idea.

However, I noticed that since it evaluates fields lazily, comparing it directly to other JSON libraries that give you the full dictionary right away is a bit unfair.

Assuming you use the whole dict anyway, IMHO a fairer comparison would include the .export() call.

>>> _json_string = '{"a fairly": "expensive", "json": "goes-in", "here": 121}'
>>> timeit(lambda: cysimdjson_parser.parse_string(_json_string).export(), number=100000)
3.6677306999990833
>>> timeit(lambda: orjson_parser.loads(_json_string), number=100000)
2.9754124999963096

With that, however, it is slower than orjson.

It gets a lot faster if you do not use the whole dictionary.

>>> timeit(lambda: cysimdjson_parser.loads(_json_string)[7]['revisionNumber'], number=100000)
0.4328116000033333
>>> timeit(lambda: orjson_parser.loads(_json_string)[7]['revisionNumber'], number=100000)
3.0126906000004965

However, in my experience this is rarely the case.
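
To make the two scenarios above easy to compare side by side, here is a minimal benchmark sketch. It times "parse and materialize everything" separately from "parse and read one field" for each library that happens to be installed. The names `DOC`, `KEY`, `build_cases` and `run` are hypothetical helpers made up for this illustration; the cysimdjson calls (`JSONParser().parse(...)` and `.export()`) are assumed to behave as described in the project README, and the sample document should be replaced with one that is representative of your workload.

```python
import json
from timeit import timeit

# Hypothetical sample document; swap in a payload that matches your workload.
DOC = b'{"a fairly": "expensive", "json": "goes-in", "here": 121}'
KEY = "here"


def build_cases():
    """Return {label: (full_fn, single_field_fn)} for every available library."""
    cases = {
        "json (stdlib)": (
            lambda: json.loads(DOC),          # full Python dict
            lambda: json.loads(DOC)[KEY],     # full dict, then one lookup
        )
    }
    try:
        import orjson  # optional; skipped if not installed
        cases["orjson"] = (
            lambda: orjson.loads(DOC),
            lambda: orjson.loads(DOC)[KEY],
        )
    except ImportError:
        pass
    try:
        import cysimdjson  # optional; skipped if not installed
        parser = cysimdjson.JSONParser()
        cases["cysimdjson"] = (
            lambda: parser.parse(DOC).export(),  # force conversion to Python objects
            lambda: parser.parse(DOC)[KEY],      # lazy access to a single field
        )
    except ImportError:
        pass
    return cases


def run(number=10_000):
    """Time both scenarios for each library; returns seconds per `number` calls."""
    results = {}
    for label, (full, single) in build_cases().items():
        results[label] = {
            "full": timeit(full, number=number),
            "single_field": timeit(single, number=number),
        }
    return results


if __name__ == "__main__":
    for label, t in run().items():
        print(f"{label:15s} full={t['full']:.4f}s  single_field={t['single_field']:.4f}s")
```

Reporting both columns for every library keeps the comparison honest: the "full" column shows the cost when the whole dictionary is needed, and the "single_field" column shows what lazy access can save when it is not.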


ateska · 2022-03-03T21:52:54Z

Hi,
yes, this is because the conversion to a Python dict (and other types) is the most expensive part of the whole JSON parsing.
The idea behind this is to harness the raw power of SIMDJSON in Python, not to race against orjson.

It is also, as you correctly point out, not a universal replacement; you need to make some trade-offs (the parsing output is read-only and is not a true Python dictionary).
The message is: (1) these speeds are possible in Python, and (2) you need to adjust your design if you want to be in this performance range.

In our case, we parse rather big (10 kB) JSONs at very high frequency (>50000 per second), we don't need to access all attributes (by far), and we don't need to modify the dictionary.
For this, SIMDJSON is an ideal choice.
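
To put the numbers in this thread on a common scale, the following back-of-the-envelope sketch converts the quoted `timeit` totals to microseconds per call and works out the per-document time budget implied by 50000 documents per second. It assumes "10 kB" means 10,000 bytes and that all parsing happens on a single core; both are assumptions made for illustration. The timings come from the reporter's machine and a different document, so they are not directly comparable to the 10 kB workload.

```python
# Totals quoted earlier in this thread, each for 100000 calls.
CALLS = 100_000
timings_s = {
    "cysimdjson parse + export()": 3.6677306999990833,
    "orjson loads": 2.9754124999963096,
    "cysimdjson single field": 0.4328116000033333,
    "orjson single field": 3.0126906000004965,
}

DOCS_PER_SECOND = 50_000  # ">50000 per second", taken at its lower bound
DOC_SIZE_BYTES = 10_000   # "10 kB", assumed to mean 10,000 bytes


def per_call_us(total_s, calls=CALLS):
    """Convert a timeit total (seconds for `calls` runs) to microseconds per call."""
    return total_s / calls * 1e6


# Time available per document if one core handles the whole stream (assumption).
budget_us = 1e6 / DOCS_PER_SECOND

# Raw input volume implied by the stated rate and size.
throughput_mb_s = DOCS_PER_SECOND * DOC_SIZE_BYTES / 1e6

if __name__ == "__main__":
    for label, total in timings_s.items():
        print(f"{label:30s} {per_call_us(total):6.2f} us/call")
    print(f"per-document budget: {budget_us:.1f} us")
    print(f"input volume: {throughput_mb_s:.0f} MB/s")
```

Under these assumptions the budget is 20 µs per document at 500 MB/s of input, which gives a sense of why skipping the conversion to Python objects matters for this kind of workload.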

I'll try to highlight that in the README.

ateska · 2022-03-03T21:59:20Z

https://github.com/TeskaLabs/cysimdjson/blob/main/README.md#trade-offs

ateska self-assigned this Mar 3, 2022

ateska closed this as completed Mar 14, 2022
