Optimize is_dict_like performance when called with a dict instance #207

gstarnberger · 2017-10-30T19:26:32Z

isinstance(something, Mapping) is very slow in Python (see https://stackoverflow.com/questions/42378726/why-is-checking-isinstancesomething-mapping-so-slow) and most of the time is_dict_like in schema.py is called with an actual dict instance instead of some other type of mapping.

This change optimizes for that common use-case by doing a cheap dict is_instance check and by only executing the more expensive check if it's not a dictionary. This yields an approx 25% performance improvement for some of my test cases that use very large (around 400 kb) JSON responses.

coveralls · 2017-10-30T19:30:49Z

Coverage remained the same at 98.007% when pulling 3e8e818 on gstarnberger:optimize_is_dict_like into 2b9bb67 on Yelp:master.

macisamuele

Could you add a comment here to explain the reason of this "redundant" logic condition? This will prevent (or at least help to prevent) removing isinstance(spec, dict) in the future

coveralls · 2017-10-31T05:16:32Z

Coverage remained the same at 98.007% when pulling ad30090 on gstarnberger:optimize_is_dict_like into 2b9bb67 on Yelp:master.

macisamuele · 2017-10-31T18:56:21Z

@gstarnberger I've merged this CR ... but I've tried to time isinstance calls with different configurations.

(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, Mapping)'
raw times: 1.58 1.47 1.5
1000000 loops, best of 3: 1.47 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, (dict, Mapping))'
raw times: 1.6 1.62 1.62
1000000 loops, best of 3: 1.6 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, dict) or isinstance(d, Mapping)'
raw times: 1.68 1.66 1.66
1000000 loops, best of 3: 1.66 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, dict)'
raw times: 0.158 0.17 0.168
1000000 loops, best of 3: 0.158 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$

According to the above results isinstance(d, (dict, Mapping)) has no improvements respect isinstance(d, Mapping), instead isinstance(d, dict) is about 10 times faster.
So in order to improve performances in case of instance is dict (majority of the cases) we should use a code similar to

def is_dict_like(spec):
    if isinstance(spec, dict):
        return True
    return isinstance(spec, Mapping)

gstarnberger · 2017-10-31T19:48:12Z

@macisamuele Thanks for merging the PR.

As for the tests above: The output of large_pets(200) is a list so your isinstance check with a dict type will always fail and fallback to the more expensive Mapping isinstance check. When Bravado reads a response object most of the arguments to is_dict_like are of type dict and therefore short-circuiting that check is able to increase performance significantly - in some of my tests with larger responses I see an approx 25% reduction in overall time.

isinstance(d, (dict, Mapping)) is evaluated as isinstance(d, dict) or isinstance(d, Mapping) and therefore also will also lead to the same evaluations (and a very similar response time) as your example above.

In [11]: timeit.timeit(lambda: isinstance({}, dict), number=100000)
Out[11]: 0.021892070770263672

In [12]: timeit.timeit(lambda: isinstance({}, (dict, Mapping)), number=100000)                                                                                                                               
Out[12]: 0.022366046905517578

In [13]: timeit.timeit(lambda: isinstance({}, Mapping), number=100000)                                                                                                                                       
Out[13]: 0.13477492332458496

In [14]: timeit.timeit(lambda: is_dict_like({}), number=100000)
Out[14]: 0.0332639217376709

macisamuele requested changes Oct 31, 2017

View reviewed changes

Optimize is_dict_like performance when called with a dict instance

ad30090

gstarnberger force-pushed the optimize_is_dict_like branch from 3e8e818 to ad30090 Compare October 31, 2017 03:05

macisamuele approved these changes Oct 31, 2017

View reviewed changes

macisamuele merged commit 6d0e547 into Yelp:master Oct 31, 2017

gstarnberger deleted the optimize_is_dict_like branch October 31, 2017 19:48

macisamuele mentioned this pull request Jan 3, 2018

2013 query params list sjaensch/bravado-asyncio#3

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize is_dict_like performance when called with a dict instance #207

Optimize is_dict_like performance when called with a dict instance #207

gstarnberger commented Oct 30, 2017

coveralls commented Oct 30, 2017

macisamuele left a comment

coveralls commented Oct 31, 2017

macisamuele commented Oct 31, 2017

gstarnberger commented Oct 31, 2017

Optimize is_dict_like performance when called with a dict instance #207

Optimize is_dict_like performance when called with a dict instance #207

Conversation

gstarnberger commented Oct 30, 2017

coveralls commented Oct 30, 2017

macisamuele left a comment

Choose a reason for hiding this comment

coveralls commented Oct 31, 2017

macisamuele commented Oct 31, 2017

gstarnberger commented Oct 31, 2017