Performance comparison from readme seems a bit unfair #29

Closed
PawelTroka opened this issue Feb 21, 2022 · 2 comments

@PawelTroka

Hi!

First of all, thanks for this library! It sounds like a very good idea.

However, I noticed that because it evaluates fields lazily, comparing it directly with JSON libraries that give you the full dictionary right away is a bit unfair.

Assuming you will use the whole dict anyway, IMHO a fairer comparison would include the .export() call:

>>> _json_string = '{"a fairly": "expensive", "json": "goes-in", "here": 121}'
>>> timeit(lambda: cysimdjson_parser.parse_string(_json_string).export(), number=100000)
3.6677306999990833
>>> timeit(lambda: orjson_parser.loads(_json_string), number=100000)
2.9754124999963096
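A fair benchmark times the same end result for every parser. The sketch below uses only the standard library: it first checks that every loader returns an equal, fully built Python object, and only then times each one with `timeit`. The function name `compare_loaders` is hypothetical, and the two stdlib decoders are stand-ins; you would add e.g. `orjson.loads`, or a lambda that calls `.export()` on the cysimdjson result as in the snippet above.

```python
import json
import timeit
from typing import Any, Callable, Dict

# The placeholder document from this thread; substitute a real payload.
PAYLOAD = '{"a fairly": "expensive", "json": "goes-in", "here": 121}'


def compare_loaders(
    loaders: Dict[str, Callable[[str], Any]],
    payload: str,
    number: int = 10_000,
) -> Dict[str, float]:
    """Return total seconds per loader for `number` calls on `payload`.

    Each loader must return fully materialized Python objects; results are
    checked for equality first so every timing covers the same work.
    """
    if not loaders:
        raise ValueError("need at least one loader")

    results = {name: load(payload) for name, load in loaders.items()}
    reference = next(iter(results.values()))
    for name, value in results.items():
        if value != reference:
            raise ValueError(f"{name!r} returned a different result")

    # Bind `load` as a default argument so each lambda times its own loader.
    return {
        name: timeit.timeit(lambda load=load: load(payload), number=number)
        for name, load in loaders.items()
    }


if __name__ == "__main__":
    # Both stdlib loaders build the same dict; third-party loaders go here too.
    timings = compare_loaders(
        {"json.loads": json.loads, "JSONDecoder.decode": json.JSONDecoder().decode},
        PAYLOAD,
    )
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f} s")
```

The equality check is what enforces fairness: a lazy parser only enters the comparison once it has been made to produce the same finished object as the eager ones.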

With that included, however, it is slower than orjson.

It is a lot faster if you don't need the whole dictionary:

>>> timeit(lambda: cysimdjson_parser.loads(_json_string)[7]['revisionNumber'], number=100000)
0.4328116000033333
>>> timeit(lambda: orjson_parser.loads(_json_string)[7]['revisionNumber'], number=100000)
3.0126906000004965
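Why partial access is so much cheaper can be illustrated with a toy lazy container. This is only an analogy, not how cysimdjson works internally: the input is assumed to be newline-delimited JSON (one record per line), indexing decodes just the requested record, and `export()` decodes every record into plain Python objects. The class `LazyRecords` and its `parse_count` counter are hypothetical names.

```python
import json
from typing import Any, List


class LazyRecords:
    """Read-only sequence over newline-delimited JSON, decoded on demand."""

    def __init__(self, text: str) -> None:
        # Splitting into lines is cheap compared with building Python objects.
        self._lines: List[str] = [ln for ln in text.splitlines() if ln.strip()]
        self.parse_count = 0  # number of records decoded so far

    def __len__(self) -> int:
        return len(self._lines)

    def __getitem__(self, index: int) -> Any:
        line = self._lines[index]  # raises IndexError before any decoding
        self.parse_count += 1
        return json.loads(line)  # decode only the requested record

    def export(self) -> List[Any]:
        # Materialize everything, as an eager json.loads of the whole input would.
        self.parse_count += len(self._lines)
        return [json.loads(line) for line in self._lines]


if __name__ == "__main__":
    text = "\n".join(
        json.dumps({"id": i, "revisionNumber": i * 10}) for i in range(1000)
    )
    records = LazyRecords(text)
    print(records[7]["revisionNumber"])  # 70
    print(records.parse_count)           # 1
    records.export()
    print(records.parse_count)           # 1001
```

Reading one field costs one record's worth of decoding, while `export()` pays for all of them, which is the gap the two sets of timings above reflect.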

However, in my experience this is rarely the case.

@ateska
Contributor

ateska commented Mar 3, 2022

Hi,
yes, this is because converting to a Python dict (and other Python types) is the most expensive part of the whole JSON parsing.
The idea is to harness the raw power of SIMDJSON in Python, not to race against orjson.

As you correctly point out, it is also not a universal replacement: you need to make some trade-offs (the parsed output is read-only and is not a true Python dictionary).
The message is: (1) these speeds are possible in Python, and (2) you need to adjust your design if you want to be in this performance range.
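Adjusting the design largely means not assuming you get a mutable `dict` back. As a standard-library analogy only (this is not the object type cysimdjson returns), `types.MappingProxyType` shows the same trade-off: reads work, writes raise `TypeError`, `isinstance(..., dict)` is false, and an explicit copy is needed when you really must mutate. The helper `get_revision` is hypothetical.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def get_revision(doc: Mapping) -> Any:
    # Accept any read-only mapping instead of requiring a real dict.
    return doc["revisionNumber"]


if __name__ == "__main__":
    view = MappingProxyType({"revisionNumber": 3, "author": "someone"})

    print(get_revision(view))         # 3
    print(isinstance(view, dict))     # False
    print(isinstance(view, Mapping))  # True

    try:
        view["revisionNumber"] = 4    # read-only: item assignment is rejected
    except TypeError as exc:
        print("read-only:", exc)

    editable = dict(view)             # explicit copy when mutation is required
    editable["revisionNumber"] = 4
    print(editable["revisionNumber"])  # 4
```

Typing consumers against `collections.abc.Mapping` rather than `dict` lets the same code accept either kind of object, and the cost of a full copy is paid only where it is actually needed.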

In our case, we parse rather big (10 kB) JSON documents at very high frequency (more than 50,000 per second); we need only a small fraction of the attributes, and we don't need to modify the dictionary.
For this, SIMDJSON is the ideal choice.

I'll try to highlight that in the README.

@ateska
Contributor

ateska commented Mar 3, 2022

ateska self-assigned this on Mar 3, 2022
ateska closed this as completed on Mar 14, 2022