use json instead of ujson in tests #1255

Merged
merged 5 commits into from
Jul 17, 2017

elelay
Contributor

@elelay elelay commented Jun 5, 2017

Fixes #1238

  • Add documentation.
  • Add tests.
  • Add a changelog entry.
  • Add your name in the contributors file.
  • not changed the HTTP API

r? @glasserc @Natim

Use json instead of ujson in tests to catch bytes passed to storage instead of str.
Fixes Kinto#1238.

A new boolean setting, `storage_strict_json`, is introduced
to select json instead of ujson in tests while keeping ujson
in prod. `strict_json` is True by default in the `StorageBase`
constructor, but False by default in `load_from_config`.
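
As a rough illustration of the two defaults described above, here is a minimal sketch. `StorageSketch` and `load_from_config_sketch` are hypothetical stand-ins, not Kinto's actual `StorageBase` or `load_from_config`. The fallback to the standard library when ujson is not installed is an assumption, made only so the snippet runs anywhere.

```python
import json as stdlib_json

try:
    import ujson as fast_json  # third-party and optional
except ImportError:
    fast_json = stdlib_json  # assumption: fall back so the sketch still runs


class StorageSketch:
    """Hypothetical stand-in for a storage backend (not Kinto's StorageBase)."""

    def __init__(self, strict_json=True):
        # Strict by default when constructed directly, as the description states.
        self.strict_json = strict_json
        self.json = stdlib_json if strict_json else fast_json


def load_from_config_sketch(settings):
    """Hypothetical loader: strict mode stays off unless the setting enables it.

    Assumes the setting value is already a Python bool, not a string.
    """
    strict = settings.get("storage_strict_json", False)
    return StorageSketch(strict_json=strict)


# Tests would opt in explicitly; production settings would leave the key out.
test_storage = load_from_config_sketch({"storage_strict_json": True})
prod_storage = load_from_config_sketch({})
print(test_storage.strict_json, prod_storage.strict_json)
```

With this shape, code that builds a backend by hand gets the strict encoder, while anything loaded from settings keeps the fast one unless it asks otherwise.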

strict_json is still not in effect in:
 - test_main.py,
 - tests/core/test_initialization.py
 - tests/plugins/test_{history,quotas}.py#test_a_statsd...
@elelay elelay changed the title from "1238 json in tests" to "use json instead of ujson in tests" Jun 5, 2017
Contributor

@leplatrem leplatrem left a comment

I haven't followed the whole conversation about this, but the approach here adds a bit of complexity (new setting, new storage behaviour, ...).

Are we still sure that ujson brings us enough benefit compared to python 3.6 json native module? Maybe we could drop it completely.

What do you think?

CHANGELOG.rst Outdated
@@ -40,6 +40,7 @@ This document describes changes between each past release.

 - Make memory storage consistent with PostgreSQL with regard to bytes (#1237)
 - Some minor cleanups about the use of kinto.readonly (#1241)
+- use json instead of ujson in storage in tests (#1255)
Contributor

nitpick: Use (capitalized)

@@ -17,7 +17,8 @@ class PluginSetup(unittest.TestCase):
     def test_a_statsd_timer_is_used_for_history_if_configured(self):
         settings = {
             "statsd_url": "udp://127.0.0.1:8125",
-            "includes": "kinto.plugins.history"
+            "includes": "kinto.plugins.history",
+            "storage_strict_json": True
Contributor

I don't think it's necessary to change this for the history plugin tests?

Member

@Natim Natim Jun 6, 2017

Why not? The goal is to make sure the JSON used by the history plugin is serializable. It should not hurt to activate it.

@@ -122,6 +122,8 @@ Storage
+------------------------------+-------------------------------+--------------------------------------------------------------------------+
| kinto.storage_max_backlog | ``-1`` | Number of threads that can be in the queue waiting for a connection. |
+------------------------------+-------------------------------+--------------------------------------------------------------------------+
| kinto.storage_strict_json | ``False`` | (test only) validate that records are serializable using the json module.|
+------------------------------+-------------------------------+--------------------------------------------------------------------------+
Contributor

If it is meant to be used in tests only I don't think it's necessary to document it here. No?

@Natim
Member

Natim commented Jun 6, 2017

> Are we still sure that ujson brings us enough benefit compared to python 3.6 json native module? Maybe we could drop it completely.

Kinto's speed benefits heavily from the use of ujson. We should definitely keep it.

@Natim
Member

Natim commented Jun 6, 2017

> I haven't followed the whole conversation about this

The rationale is that the PostgreSQL backend is using ujson while the memory backend was not. We discovered that ujson sometimes doesn't behave exactly like the json module.

@elelay upgraded the memory backend to use ujson so that both backends behave the same way with regard to JSON handling.

To be honest I would just use ujson everywhere without adding a new setting.

@leplatrem
Contributor

leplatrem commented Jun 6, 2017

> Kinto speed is due to the use of ujson. We should definitely keep it.

Do we have numbers? We did benchmarks 2 years ago with Python 2.7. I continuously read about python 3.6 super performances. We should probably confirm that with updated figures.

> To be honest I would just use ujson everywhere without adding a new setting.

+1 (all or nothing)

@Natim
Member

Natim commented Jun 6, 2017

> We should probably confirm that with updated figures.

Sure if you feel like doing so.

> I continuously read about python 3.6 super performances.

We are not using Python 3.6 yet in production as far as I know. But I really doubt JSON handling in Python could be faster than in libc.

@glasserc
Contributor

glasserc commented Jun 6, 2017

The problem is that the behavior of ujson is arguably broken and lets a developer accidentally serialize things that shouldn't be serializable. Any arbitrary object will be serialized by using its __dict__. But what I'm most worried about is that we would serialize a bytestring that seems OK until by chance it contains some non-UTF-8 encoded text and then failing.
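
To make the concern concrete, here is a minimal sketch that uses only the standard library. `check_strict_json` is a hypothetical helper, not part of Kinto, and the snippet does not exercise ujson itself. It only shows that the standard `json` module rejects bytes and arbitrary objects up front, and that decoding bytes succeeds until the input stops being valid UTF-8.

```python
import json


def check_strict_json(record):
    """Raise TypeError unless the standard json module can encode `record`."""
    json.dumps(record)


class Arbitrary:
    """An object that was never meant to be stored as JSON."""

    def __init__(self):
        self.secret = "internal state"


# Both records are refused immediately, whatever the bytes happen to contain.
rejected = []
for bad in ({"title": b"plain ascii"}, {"obj": Arbitrary()}):
    try:
        check_strict_json(bad)
    except TypeError as exc:
        rejected.append(str(exc))
print(rejected)

# The latent failure: decoding bytes works for valid UTF-8 input...
decoded = b"caf\xc3\xa9".decode("utf-8")

# ...and breaks as soon as the bytes use another encoding (here Latin-1).
try:
    b"caf\xe9".decode("utf-8")
    latent_failure = False
except UnicodeDecodeError:
    latent_failure = True
print(latent_failure)
```

Running a check like this in tests turns a data-dependent failure into one that shows up on the first record containing bytes.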

Personally I don't think this is too much complexity to avoid those kinds of bugs. (The PR only changes ~75ish lines of code.) The only thing that I might consider is trying to localize the complexity to the utils module.

@elelay
Contributor Author

elelay commented Jun 6, 2017

@leplatrem

> We should probably confirm that with updated figures.

I see up to a 5x speedup for ujson over json (decoding 256 UTF-8 strings), and about 2x on average for encode and decode.
The following table is the output of `python tests/benchmark.py` with Python 3.6.1. It lists calls/s (higher is better).

               ujson        yajl  simplejson        json
Array with 256 doubles
  encode    22485.14     8507.37     5752.49     6314.80
  decode    35739.87    15104.25    14935.88    15758.40
Array with 256 UTF-8 strings
  encode     3941.41     3788.01     2562.49     2573.70
  decode     2675.51      918.95      536.55      522.48
Array with 256 strings
  encode    51644.13    21904.67    23306.93    25884.61
  decode    33265.51    25833.36    40359.65    29960.73
Medium complex object
  encode    15259.40     8426.39     5482.92     7910.24
  decode    14702.96     8594.63     8479.10    10029.45
Array with 256 True values
  encode   159535.24   172716.43    78707.41    96372.96
  decode   234235.84   109203.62   142965.52   168322.68
Array with 256 dict{string, int} pairs
  encode    20273.73    15388.58     4463.77     9060.82
  decode    17670.78    11974.63     9352.05    12167.52
Dict with 256 arrays with 256 dict{string, int} pairs
  encode       76.93       56.22       15.78       33.29
  decode       44.10       34.39       27.66       33.44
Complex object
  encode  644.05 554.48 586.59
  decode      591.66      307.53      226.03      219.96
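
For readers who want to produce a comparable calls/s figure on their own machine, here is a minimal sketch. It is not the `tests/benchmark.py` script mentioned above. The `calls_per_second` helper and the sample data are assumptions, and only the standard `json` module is timed; another encoder could be passed in the same way.

```python
import json
import timeit


def calls_per_second(func, arg, number=2000):
    """Time `number` calls of func(arg) and return the resulting calls/s rate."""
    elapsed = timeit.timeit(lambda: func(arg), number=number)
    return number / max(elapsed, 1e-9)  # guard against a zero reading


# Roughly mirrors the "Array with 256 doubles" row.
doubles = [i * 1.5 for i in range(256)]
encoded = json.dumps(doubles)

print("encode calls/s:", round(calls_per_second(json.dumps, doubles), 2))
print("decode calls/s:", round(calls_per_second(json.loads, encoded), 2))
```

Absolute numbers depend heavily on hardware and Python version, so only ratios between libraries measured on the same machine are meaningful.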

@elelay
Contributor Author

elelay commented Jun 6, 2017

@glasserc

> But what I'm most worried about is that we would serialize a bytestring that seems OK until by chance it contains some non-UTF-8 encoded text and then failing.

+1, this is why I took the trouble to implement this PR.

Now, choosing between ujson and json could maybe be implemented more elegantly. Is it possible to mock an attribute in any future instance of a class or something? Then we could switch the json attribute of StorageBase before tests and not have to explicitly configure it.
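
One way to do what the question describes is to patch the attribute on the class rather than on an instance, using `unittest.mock.patch.object` from the standard library. The sketch below assumes the encoder is stored as a class attribute. `StorageSketch` and `LenientJSON` are hypothetical, and `LenientJSON` is only a stand-in for a permissive encoder, not ujson.

```python
import json
from unittest import mock


class LenientJSON:
    """Stand-in for a permissive encoder (NOT ujson): silently decodes bytes."""

    @staticmethod
    def dumps(obj):
        return json.dumps(obj, default=lambda o: o.decode("utf-8"))


class StorageSketch:
    """Hypothetical backend whose encoder is looked up on the class."""

    json = LenientJSON

    def save(self, record):
        return self.json.dumps(record)


record = {"title": b"abc"}
print(StorageSketch().save(record))  # the lenient encoder accepts bytes

# Patching the class attribute affects every instance used while the patch
# is active, including instances created later, with no setting involved.
with mock.patch.object(StorageSketch, "json", json):
    try:
        StorageSketch().save(record)
    except TypeError as exc:
        print("strict encoder rejected bytes:", exc)

# Outside the block the original attribute is restored.
print(StorageSketch.json is LenientJSON)
```

The trade-off is that the switch becomes implicit: nothing in the test settings shows which encoder is in use.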

@elelay
Contributor Author

elelay commented Jun 7, 2017

Meanwhile, I'm modifying ujson to add a 'reject_bytes' parameter.
If it's merged, it would allow us to always use ujson, with this parameter always True.

@glasserc
Contributor

Refs ultrajson/ultrajson#266.

I tried "simplifying" this by centralizing the logic to dispatch between JSON flavors in utils.json, but that didn't work because there's nothing to dispatch according to. (This isn't flask; I don't have global access to the request or the config.) I think it would be possible to implement @elelay's suggestion of mocking/overriding the utils.json flavor using a monkeypatch, but that seems much uglier to me than just configuring it explicitly (as has been done here).

@leplatrem @Natim Would you veto if I wanted to land it? I see a +1 for using ujson on everything, but no strong objection against the patch as it stands now. Do you have an alternative implementation in mind that would be cleaner?

This PR needs a rebase in the meantime, plus there are a few remaining nits from the review.

@Natim
Member

Natim commented Jul 12, 2017

> Would you veto if I wanted to land it?

No, go ahead.

> This PR needs a rebase

I am not confident handling the conflicts myself.

Contributor

@glasserc glasserc left a comment

Resolved the merge conflict using the fact that I was the person who wrote the conflicting code. Otherwise LGTM with a couple remaining nits that @leplatrem pointed out.

@elelay
Contributor Author

elelay commented Jul 17, 2017

Thanks @glasserc. I think all nits are addressed.

@glasserc glasserc merged commit 5c84af2 into Kinto:master Jul 17, 2017
@glasserc
Contributor

Thanks for your work and for bearing with us while we sorted this out!

@elelay elelay deleted the 1238-json-in-tests branch November 4, 2017 16:06
leplatrem added a commit that referenced this pull request Aug 12, 2020
Since ultrajson 3.0, ``json.dumps()`` will raise a ``TypeError`` if keys
are bytes.
This is now equivalent to the standard behaviour, and there is no
reason to have this ``strict_json`` option anymore.

This PR undoes some of the work done in:

* #1255
* #2573