Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize is_dict_like performance when called with a dict instance #207

Merged
merged 1 commit into from Oct 31, 2017

Conversation

gstarnberger
Copy link
Member

isinstance(something, Mapping) is very slow in Python (see https://stackoverflow.com/questions/42378726/why-is-checking-isinstancesomething-mapping-so-slow) and most of the time is_dict_like in schema.py is called with an actual dict instance instead of some other type of mapping.

This change optimizes for that common use-case by doing a cheap dict is_instance check and by only executing the more expensive check if it's not a dictionary. This yields an approx 25% performance improvement for some of my test cases that use very large (around 400 kb) JSON responses.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 98.007% when pulling 3e8e818 on gstarnberger:optimize_is_dict_like into 2b9bb67 on Yelp:master.

Copy link
Collaborator

@macisamuele macisamuele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here to explain the reason of this "redundant" logic condition? This will prevent (or at least help to prevent) removing isinstance(spec, dict) in the future

@coveralls
Copy link

Coverage Status

Coverage remained the same at 98.007% when pulling ad30090 on gstarnberger:optimize_is_dict_like into 2b9bb67 on Yelp:master.

@macisamuele macisamuele merged commit 6d0e547 into Yelp:master Oct 31, 2017
@macisamuele
Copy link
Collaborator

@gstarnberger I've merged this CR ... but I've tried to time isinstance calls with different configurations.

(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, Mapping)'
raw times: 1.58 1.47 1.5
1000000 loops, best of 3: 1.47 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, (dict, Mapping))'
raw times: 1.6 1.62 1.62
1000000 loops, best of 3: 1.6 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, dict) or isinstance(d, Mapping)'
raw times: 1.68 1.66 1.66
1000000 loops, best of 3: 1.66 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$ python -m timeit -v -n 1000000 -s 'from tests.profiling.conftest import large_pets; from collections import Mapping; d = large_pets(200)' 'isinstance(d, dict)'
raw times: 0.158 0.17 0.168
1000000 loops, best of 3: 0.158 usec per loop
(benchmark) C02QJ1JPG8WL:bravado-core maci$

According to the above results isinstance(d, (dict, Mapping)) has no improvements respect isinstance(d, Mapping), instead isinstance(d, dict) is about 10 times faster.
So in order to improve performances in case of instance is dict (majority of the cases) we should use a code similar to

def is_dict_like(spec):
    if isinstance(spec, dict):
        return True
    return isinstance(spec, Mapping)

@gstarnberger
Copy link
Member Author

@macisamuele Thanks for merging the PR.

As for the tests above: The output of large_pets(200) is a list so your isinstance check with a dict type will always fail and fallback to the more expensive Mapping isinstance check. When Bravado reads a response object most of the arguments to is_dict_like are of type dict and therefore short-circuiting that check is able to increase performance significantly - in some of my tests with larger responses I see an approx 25% reduction in overall time.

isinstance(d, (dict, Mapping)) is evaluated as isinstance(d, dict) or isinstance(d, Mapping) and therefore also will also lead to the same evaluations (and a very similar response time) as your example above.

In [11]: timeit.timeit(lambda: isinstance({}, dict), number=100000)
Out[11]: 0.021892070770263672

In [12]: timeit.timeit(lambda: isinstance({}, (dict, Mapping)), number=100000)                                                                                                                               
Out[12]: 0.022366046905517578

In [13]: timeit.timeit(lambda: isinstance({}, Mapping), number=100000)                                                                                                                                       
Out[13]: 0.13477492332458496

In [14]: timeit.timeit(lambda: is_dict_like({}), number=100000)
Out[14]: 0.0332639217376709

@gstarnberger gstarnberger deleted the optimize_is_dict_like branch October 31, 2017 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants