use json instead of ujson in tests #1255

elelay · 2017-06-05T16:26:15Z

The goal is to catch bytes passed to storage instead of str. Fixes Kinto#1238.

A new boolean setting, `storage_strict_json`, is introduced so that tests use json instead of ujson while prod keeps ujson. `strict_json` is True by default in the `StorageBase` constructor, but False by default in `load_from_config`.

strict_json is still not in effect in:

- test_main.py
- tests/core/test_initialization.py
- tests/plugins/test_{history,quotas}.py#test_a_statsd...
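For readers skimming the thread, here is a minimal sketch of how the two defaults described above could fit together. It is not the code from this PR: the class body, the `json` attribute, and the fallback when ujson is missing are assumptions made for illustration. Only the names `StorageBase`, `load_from_config`, `strict_json` and `storage_strict_json` come from the description.

```python
import json


class StorageBase:
    """Hypothetical stand-in for the storage base class (not Kinto's code)."""

    def __init__(self, strict_json=True):
        # Strict by default when constructed directly, as in the description.
        if strict_json:
            self.json = json
        else:
            try:
                import ujson  # optional third-party dependency
                self.json = ujson
            except ImportError:
                # Assumption: fall back so the sketch runs without ujson.
                self.json = json


def load_from_config(settings):
    """Hypothetical loader: the setting defaults to False when absent."""
    strict = settings.get("storage_strict_json", False)
    return StorageBase(strict_json=strict)
```

With this shape, a test that sets `storage_strict_json` to True gets the standard encoder, which raises `TypeError` when a record contains bytes.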

leplatrem

I haven't followed the whole conversation about this, but the approach here adds a bit of complexity (new setting, new storage behaviour, ...).

Are we still sure that ujson brings us enough benefit compared to Python 3.6's native json module? Maybe we could drop it completely.

What do you think?

leplatrem · 2017-06-06T07:58:19Z

CHANGELOG.rst

@@ -40,6 +40,7 @@ This document describes changes between each past release.

 - Make memory storage consistent with PostgreSQL with regard to bytes (#1237)
 - Some minor cleanups about the use of kinto.readonly (#1241)
+- use json instead of ujson in storage in tests (#1255)


nitpick: Use (capitalized)

leplatrem · 2017-06-06T08:00:34Z

tests/plugins/test_history.py

@@ -17,7 +17,8 @@ class PluginSetup(unittest.TestCase):
    def test_a_statsd_timer_is_used_for_history_if_configured(self):
        settings = {
            "statsd_url": "udp://127.0.0.1:8125",
-            "includes": "kinto.plugins.history"
+            "includes": "kinto.plugins.history",
+            "storage_strict_json": True


I don't think it's necessary to change this for the history plugin tests?

Why not? The goal is to make sure the JSON used by the history plugin is serializable. It should not hurt to activate it.

leplatrem · 2017-06-06T08:06:44Z

docs/configuration/settings.rst

@@ -122,6 +122,8 @@ Storage
 +------------------------------+-------------------------------+--------------------------------------------------------------------------+
 | kinto.storage_max_backlog    | ``-1``                        | Number of threads that can be in the queue waiting for a connection.     |
 +------------------------------+-------------------------------+--------------------------------------------------------------------------+
+| kinto.storage_strict_json    | ``False``                     | (test only) validate that records are serializable using the json module.|
+------------------------------+-------------------------------+--------------------------------------------------------------------------+


If it is meant to be used in tests only I don't think it's necessary to document it here. No?

Natim · 2017-06-06T08:13:43Z

> Are we still sure that ujson brings us enough benefit compared to Python 3.6's native json module? Maybe we could drop it completely.

Kinto's speed benefits heavily from the use of ujson. We should definitely keep it.

Natim · 2017-06-06T08:18:24Z

> I haven't followed the whole conversation about this

The rationale is that the PostgreSQL backend uses ujson while the memory backend did not. We discovered that ujson sometimes doesn't behave exactly like the json module.

@elelay upgraded the memory backend to use ujson so that both backends behave the same way with regard to JSON handling.

To be honest I would just use ujson everywhere without adding a new setting.

leplatrem · 2017-06-06T08:20:44Z

> Kinto speed is due to the use of ujson. We should definitely keep it.

Do we have numbers? We did benchmarks 2 years ago with Python 2.7. I continuously read about python 3.6 super performances. We should probably confirm that with updated figures.

> To be honest I would just use ujson everywhere without adding a new setting.

+1 (all or nothing)

Natim · 2017-06-06T08:22:46Z

> We should probably confirm that with updated figures.

Sure if you feel like doing so.

> I continuously read about python 3.6 super performances.

We are not using Python 3.6 yet in production as far as I know. But I really doubt JSON handling in Python could be faster than in libc.

glasserc · 2017-06-06T14:16:33Z

The problem is that the behavior of ujson is arguably broken and lets a developer accidentally serialize things that shouldn't be serializable. Any arbitrary object will be serialized by using its `__dict__`. But what I'm most worried about is that we would serialize a bytestring that seems OK until, by chance, it contains some non-UTF-8 encoded text and then fails.

Personally I don't think this is too much complexity to avoid those kinds of bugs. (The PR only changes ~75ish lines of code.) The only thing that I might consider is trying to localize the complexity to the utils module.
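To make the contrast concrete, here is a small sketch of what the standard library does with the two cases mentioned above. It only exercises the stdlib `json` module; the lenient ujson behaviour is as described in this comment and is not reproduced here. `Record` and `is_json_serializable` are hypothetical names used for illustration.

```python
import json


class Record:
    """Arbitrary object with instance attributes (hypothetical example)."""

    def __init__(self):
        self.id = "abc"


def is_json_serializable(value):
    """Return True if the standard json module accepts the value."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


print(is_json_serializable({"id": "abc"}))   # plain str value
print(is_json_serializable({"id": b"abc"}))  # bytes value
print(is_json_serializable(Record()))        # arbitrary object
```

The standard encoder rejects both the bytes value and the arbitrary object with a `TypeError`, which is the failure the tests are meant to surface early.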

elelay · 2017-06-06T17:30:29Z

@leplatrem

> We should probably confirm that with updated figures.

I see up to a 5x speed improvement of ujson over json (decoding 256 UTF-8 strings), with an average of 2x for encode and decode.
The following table is the output of `python tests/benchmark.py` with Python 3.6.1. It lists calls/s (higher is better).

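| Test | Operation | ujson | yajl | simplejson | json |
|---|---|---|---|---|---|
| Array with 256 doubles | encode | 22485.14 | 8507.37 | 5752.49 | 6314.80 |
| Array with 256 doubles | decode | 35739.87 | 15104.25 | 14935.88 | 15758.40 |
| Array with 256 UTF-8 strings | encode | 3941.41 | 3788.01 | 2562.49 | 2573.70 |
| Array with 256 UTF-8 strings | decode | 2675.51 | 918.95 | 536.55 | 522.48 |
| Array with 256 strings | encode | 51644.13 | 21904.67 | 23306.93 | 25884.61 |
| Array with 256 strings | decode | 33265.51 | 25833.36 | 40359.65 | 29960.73 |
| Medium complex object | encode | 15259.40 | 8426.39 | 5482.92 | 7910.24 |
| Medium complex object | decode | 14702.96 | 8594.63 | 8479.10 | 10029.45 |
| Array with 256 True values | encode | 159535.24 | 172716.43 | 78707.41 | 96372.96 |
| Array with 256 True values | decode | 234235.84 | 109203.62 | 142965.52 | 168322.68 |
| Array with 256 dict{string, int} pairs | encode | 20273.73 | 15388.58 | 4463.77 | 9060.82 |
| Array with 256 dict{string, int} pairs | decode | 17670.78 | 11974.63 | 9352.05 | 12167.52 |
| Dict with 256 arrays with 256 dict{string, int} pairs | encode | 76.93 | 56.22 | 15.78 | 33.29 |
| Dict with 256 arrays with 256 dict{string, int} pairs | decode | 44.10 | 34.39 | 27.66 | 33.44 |
| Complex object | encode | 644.05 | | 554.48 | 586.59 |
| Complex object | decode | 591.66 | 307.53 | 226.03 | 219.96 |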
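For anyone who wants a quick local sanity check without the full benchmark script, a rough sketch along these lines measures calls/s for one of the cases above (an array of 256 doubles). This is not `tests/benchmark.py`; the helper name, the iteration count and the payload are assumptions, and ujson is only measured if it happens to be installed.

```python
import json
import timeit


def calls_per_second(func, arg, number=200):
    """Rough throughput of func(arg) in calls per second."""
    elapsed = timeit.timeit(lambda: func(arg), number=number)
    return number / elapsed


doubles = [i * 1.5 for i in range(256)]
encoded = json.dumps(doubles)

results = {
    "json encode": calls_per_second(json.dumps, doubles),
    "json decode": calls_per_second(json.loads, encoded),
}

try:
    import ujson  # optional; skipped when not installed

    results["ujson encode"] = calls_per_second(ujson.dumps, doubles)
    results["ujson decode"] = calls_per_second(ujson.loads, encoded)
except ImportError:
    pass

for name, rate in sorted(results.items()):
    print(f"{name}: {rate:.2f} calls/s")
```

Absolute numbers will differ from the table depending on machine, Python version and library versions, so only the ratios are worth comparing.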

elelay · 2017-06-06T19:00:52Z

@glasserc

> But what I'm most worried about is that we would serialize a bytestring that seems OK until by chance it contains some non-UTF-8 encoded text and then failing.

+1 this is why I took the trouble to implement this PR.

Now, choosing between ujson and json could maybe be implemented more elegantly. Is it possible to mock an attribute in any future instance of a class, or something like that? Then we could switch the json attribute of StorageBase before tests and not have to explicitly configure it.
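One way this could look, sketched with `unittest.mock.patch.object` on a class attribute so that every instance created while the patch is active sees the strict encoder. It assumes the encoder is looked up through a class-level `json` attribute, which may not match the real `StorageBase`. `FakeFastJson`, `serialize` and `strict_json_rejects_bytes` are hypothetical names for illustration.

```python
import json
from unittest import mock


class FakeFastJson:
    """Hypothetical stand-in for a lenient encoder such as ujson."""

    @staticmethod
    def dumps(value):
        # Lenient on purpose: anything unknown is stringified with repr().
        return json.dumps(value, default=repr)


class StorageBase:
    """Hypothetical storage class; `json` is a class attribute here."""

    json = FakeFastJson

    def serialize(self, record):
        return self.json.dumps(record)


def strict_json_rejects_bytes():
    """Patch the class attribute so new instances use the strict encoder."""
    with mock.patch.object(StorageBase, "json", json):
        storage = StorageBase()  # created while the patch is active
        try:
            storage.serialize({"id": b"abc"})
        except TypeError:
            return True
    return False
```

The patch is undone when the `with` block exits, so production code paths keep the lenient encoder. Instances that copy the encoder into an instance attribute before the patch is applied would not be affected.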

elelay · 2017-06-07T07:08:56Z

Meanwhile, I'm modifying ujson to add a 'reject_bytes' parameter.
If it's merged, it would allow us to always use ujson, with this parameter always True.

glasserc · 2017-06-20T18:44:18Z

Refs ultrajson/ultrajson#266.

I tried "simplifying" this by centralizing the logic to dispatch between JSON flavors in utils.json, but that didn't work because there's nothing to dispatch according to. (This isn't flask; I don't have global access to the request or the config.) I think it would be possible to implement @elelay's suggestion of mocking/overriding the utils.json flavor using a monkeypatch, but that seems much uglier to me than just configuring it explicitly (as has been done here).

@leplatrem @Natim Would you veto if I wanted to land it? I see a +1 for using ujson on everything, but no strong objection against the patch as it stands now. Do you have an alternative implementation in mind that would be cleaner?

This PR needs a rebase in the meantime, plus there are a few remaining nits from the review.

Natim · 2017-07-12T11:23:24Z

> Would you veto if I wanted to land it?

No, go ahead.

> This PR needs a rebase

I am not confident handling the conflicts myself.

glasserc

Resolved the merge conflict using the fact that I was the person who wrote the conflicting code. Otherwise LGTM with a couple remaining nits that @leplatrem pointed out.

elelay · 2017-07-17T20:43:06Z

Thanks @glasserc. I think all nits are addressed.

glasserc · 2017-07-17T20:54:17Z

Thanks for your work and for bearing with us while we sorted this out!

See #1255 And ultrajson/ultrajson#266

Since ultrajson 3.0, ``json.dumps()`` will raise a ``TypeError`` if keys are bytes. This is now equivalent to the standard behaviour, and there is no reason to have this ``strict_json`` option anymore. This PR undoes some of the work done in: * #1255 * #2573

elelay added 2 commits June 5, 2017 18:22

set strict json for plugin tests (not mandatory)

697144c

elelay changed the title ~~1238 json in tests~~ use json instead of ujson in tests Jun 5, 2017

changelog for Kinto#1255

2209a6d

leplatrem reviewed Jun 6, 2017

Merge branch 'master' into 1238-json-in-tests

83a1f63

glasserc approved these changes Jul 17, 2017

address remaining review remarks from @leplatrem

f5b21e5

glasserc merged commit 5c84af2 into Kinto:master Jul 17, 2017

glasserc mentioned this pull request Aug 14, 2017

Preparing release 7.3.2 #1313

Merged

elelay deleted the 1238-json-in-tests branch November 4, 2017 16:06

leplatrem added a commit that referenced this pull request Aug 10, 2020

ujson also raises on receiving bytes now

ff0dd9c

See #1255 And ultrajson/ultrajson#266

leplatrem mentioned this pull request Aug 10, 2020

ujson also raises on receiving bytes now #2573

Merged
