Enhance unmarshaling performances #336

macisamuele · 2019-05-23T13:26:14Z

NOTE: This CR replaces #331, because the history of that PR had become quite confusing.

The goal of this PR, as the title says, is to improve unmarshaling performance.
NOTE: A similar PR applying the same approach to the marshaling process will follow. I would like to have this one verified first, to avoid having two massive branches open at the same time.

To achieve this performance boost I've rewritten the whole bravado_core.unmarshal module.

Before this PR, the unmarshaling logic is:

1. extract the type from the schema
2. if the schema is of type x, call the appropriate unmarshaling function:
   - if the schema is of type object: call the top-level unmarshaling function (point 1) for each property
   - if the schema is of type array: call the top-level unmarshaling function (point 1) for each item of the list
   - if the schema is of a primitive type: identify, each time, the right formatting function to use
3. once the unmarshaling function is identified, perform the actual unmarshaling

This PR caches the results of points 1 and 2 so that they can be reused later, "cost-free".
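A minimal sketch of the caching idea, assuming a simplified schema model (plain dicts with `type`, `properties`, `items`, `format`) and hypothetical names (`build_unmarshaler`, `FORMATTERS`); it is not the bravado_core code. Type detection and function selection (points 1 and 2) run once per schema object; afterwards unmarshaling a value (point 3) only calls the prebuilt function. Recursive schemas are deliberately ignored here.

```python
import datetime
from typing import Any, Callable, Dict, Optional, Tuple

Unmarshaler = Callable[[Any], Any]

# Hypothetical format registry: format name -> conversion function.
FORMATTERS: Dict[Optional[str], Unmarshaler] = {
    "date": datetime.date.fromisoformat,
}

# id(schema) -> (schema, function). The schema is stored as well so that it
# stays alive and its id() cannot be reused by another object.
_CACHE: Dict[int, Tuple[dict, Unmarshaler]] = {}


def build_unmarshaler(schema: dict) -> Unmarshaler:
    """Resolve the unmarshaling function for `schema` once (points 1 and 2)."""
    cached = _CACHE.get(id(schema))
    if cached is not None:
        return cached[1]  # later calls: just a dictionary lookup

    schema_type = schema.get("type")
    if schema_type == "object":
        prop_fns = {
            name: build_unmarshaler(sub)
            for name, sub in schema.get("properties", {}).items()
        }

        def fn(value: dict) -> dict:
            return {k: prop_fns[k](v) if k in prop_fns else v for k, v in value.items()}
    elif schema_type == "array":
        item_fn = build_unmarshaler(schema.get("items", {}))

        def fn(value: list) -> list:
            return [item_fn(item) for item in value]
    else:
        # Primitive: pick the format function now, not on every value.
        fn = FORMATTERS.get(schema.get("format"), lambda value: value)

    _CACHE[id(schema)] = (schema, fn)
    return fn


person = {
    "type": "object",
    "properties": {
        "born": {"type": "string", "format": "date"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
unmarshal_person = build_unmarshaler(person)  # resolved once, reused afterwards
```

The expensive dispatch happens while building the closures; every later call on the same schema skips it entirely.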

While making these changes I noticed that there is no strong reason to keep all the unmarshaling functions public: users of the library commonly use bravado_core.unmarshal.unmarshal_schema_object.
To simplify future maintenance I've marked all the other functions as deprecated; they will be removed in the next major release. This will allow us, if needed, to change the unmarshaling internals further without worrying about breaking backward compatibility.
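One common way to mark public functions as deprecated is a decorator that emits a `DeprecationWarning` and then delegates. The sketch below assumes a hypothetical `deprecated` helper and a stand-in `unmarshal_model` body; it does not claim to be how bravado_core implements the deprecation.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str) -> Callable[[F], F]:
    """Hypothetical helper: warn callers and point them at `replacement`."""
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                "{} is deprecated and will be removed in the next major release; "
                "use {} instead".format(func.__name__, replacement),
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper  # type: ignore[return-value]
    return decorator


@deprecated("unmarshal_schema_object")
def unmarshal_model(swagger_spec: Any, model_spec: Any, value: Any) -> Any:
    # Stand-in body for illustration; a real version would delegate to
    # unmarshal_schema_object.
    return value
```

The deprecated function keeps working, so existing callers are not broken until the next major release.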

Let's get to the juicy part 😄
Raw data: https://gist.github.com/macisamuele/9ff23d3e5e81c40cd884425a7eae5751

This PR provides a massive performance improvement:

- worst case: ~46% less run time (a run that takes 1 time unit on this branch takes 1.87 on master)
- best case: ~70% less run time (1 on this branch, 3.33 on master)
- average: ~60% less run time (1 on this branch, 2.57 on master)
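The percentages follow from the timing ratios: if master takes `ratio` times as long as the branch, the branch saves `1 - 1/ratio` of the run time. A small helper to check the arithmetic (the function names are illustrative only):

```python
def time_reduction(ratio: float) -> float:
    """Fraction of run time saved when master takes `ratio` times as long as the branch."""
    return 1 - 1 / ratio


def speedup(reduction: float) -> float:
    """Inverse: master/branch time ratio for a given fractional time reduction."""
    return 1 / (1 - reduction)


for label, ratio in [("worst", 1.87), ("best", 3.33), ("average", 2.57)]:
    print("{}: {:.1f}% less time".format(label, 100 * time_reduction(ratio)))
```

This prints 46.5%, 70.0% and 61.1%, close to the rounded figures above.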

Be aware that:

- bravado_core.unmarshal.get_unmarshaling_method is wrapped by the _decorators.wrap_recursive_call_exception and memoize_by_id decorators.
  - The _decorators.wrap_recursive_call_exception decorator is needed because models can be recursive, so invoking get_unmarshaling_method could end up in an infinite recursion. To deal with this proactively I've modified the @memoize_by_id decorator to raise a controlled exception when an unbounded recursion is detected (ℹ️ the interpreter would raise anyway, with RuntimeError: maximum recursion depth exceeded).
  - The memoize_by_id decorator is needed to speed up evaluation of the unmarshaling method for the same spec and schema. This is effectively one of the major performance boosts (basically, the expensive points 1 and 2 of the original flow become, more or less, a dictionary lookup).
- All the test changes were made before the refactoring (except tests/unmarshal/unmarshal_object_test.py) to guarantee that nothing changed functionally ;)
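A minimal sketch of how the two decorators can cooperate on a recursive schema. It is not the code in bravado_core/_decorators.py: `RecursiveCallException`, the sentinel, the single-argument `get_unmarshaling_method` and the demo schema are assumptions for illustration. The memoizer marks a key as "in progress" and raises a controlled exception on re-entry; the outer wrapper turns that exception into a thunk that reads the finished result from the cache at call time.

```python
import functools
from typing import Any, Callable, Dict, Tuple


class RecursiveCallException(Exception):
    """Raised when a memoized function is re-entered with the same arguments."""


_IN_PROGRESS = object()  # sentinel: result for this key is still being computed


def memoize_by_id(func: Callable[..., Any]) -> Callable[..., Any]:
    # Keys are built from id() of the arguments; this assumes the arguments
    # (long-lived spec and schema objects) outlive the cache.
    cache: Dict[Tuple[int, ...], Any] = {}

    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        key = tuple(id(arg) for arg in args)
        if key in cache:
            if cache[key] is _IN_PROGRESS:
                raise RecursiveCallException()  # controlled, instead of RecursionError
            return cache[key]
        cache[key] = _IN_PROGRESS
        try:
            result = func(*args)
        except BaseException:
            del cache[key]  # do not leave a stale sentinel behind
            raise
        cache[key] = result
        return result

    return wrapper


def wrap_recursive_call_exception(func: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        try:
            return func(*args)
        except RecursiveCallException:
            # The result for these args is being built further up the stack;
            # return a thunk that fetches it (from the cache) when called.
            return lambda value: func(*args)(value)

    return wrapper


@wrap_recursive_call_exception
@memoize_by_id
def get_unmarshaling_method(schema: dict) -> Callable[[Any], Any]:
    if schema.get("type") == "object":
        prop_fns = {
            name: get_unmarshaling_method(sub)
            for name, sub in schema.get("properties", {}).items()
        }
        return lambda value: {
            k: prop_fns[k](v) if k in prop_fns else v for k, v in value.items()
        }
    return lambda value: value


# A recursive schema: "child" points back to the node schema itself.
node: Dict[str, Any] = {"type": "object", "properties": {"name": {"type": "string"}}}
node["properties"]["child"] = node
unmarshal_node = get_unmarshaling_method(node)
```

The decorator order matters: the exception is raised by the memoizer and caught by the wrapper around the inner call, so the outer computation for the same schema completes and populates the cache.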


coveralls · 2019-05-23T13:31:45Z

Coverage decreased (-0.5%) to 97.771% when pulling c804bc1 on macisamuele:maci-enhance-unmarshaling-performances into c48be8a on Yelp:master.


sjaensch · 2019-05-25T13:35:09Z

bravado_core/model.py

-        from bravado_core.unmarshal import unmarshal_model
-        return unmarshal_model(cls._swagger_spec, cls._model_spec, val)
+        from bravado_core.unmarshal import unmarshal_schema_object
+        return unmarshal_schema_object(cls._swagger_spec, cls._model_spec, val)


Why are you changing this?

Since unmarshal_model will be marked as deprecated, it's better to use the single unmarshal_schema_object entry point.
An alternative would be to cache the result of bravado_core.unmarshal.get_unmarshaling_method here. The latter approach would require caching additional information just to save a single function call, so I preferred to call the general entry point, as it is stable.

NOTE: The main difference (which might lead us to revert this change) is that Model._unmarshal will not honour the use_models configuration.


sjaensch · 2019-05-25T13:51:56Z

bravado_core/unmarshal.py

+    properties_to_default_value,
+    additional_properties_unmarshaling_function,
+    model_value,
+):


This function could use a bit more documentation. For example, why do we need model_to_unmarshaling_function_mapping? It has to do with discriminated models, right? Why do we need it? Can't we look that up in here? And why do we need model-specific unmarshaling functions at all?

This function could use a bit more documentation

It's fair to assume that the whole module deserves more documentation 😉
I'll take care of that soon-ish (adding typing support would probably also help clarify what type of object to expect).

why do we need model-specific unmarshaling functions at all

It might be an unfortunate name... I'll fix that by renaming this function to _unmarshal_object, as it actually takes care of unmarshaling a value of type: object. Knowing the model is important in order to return Model instances (if use_models is enabled) instead of plain dictionaries.

why do we need model_to_unmarshaling_function_mapping

Yes, this was added to deal with polymorphic objects. We could try to simplify it by calling get_unmarshaling_method on the discriminated model.
The downside of this approach is that determining the referenced schema might have a cost (we might be forced to dereference).
I will investigate this a bit more to make sure we're not pre-evaluating functions that might be evaluated later (especially if we can guarantee that they will be evaluated only once).

To avoid performance penalties while unmarshaling polymorphic schemas, it's better to have the definitions of the models the object could be discriminated into already available.

We don't actually need the unmarshaling function to be already present, as obtaining it is fast once the function has been evaluated. So I'm reducing the amount of information we cache, with a negligible performance difference, and, more importantly, I'm renaming the parameter, as the original name was certainly not as helpful as I expected.

NOTE: Usage of partials has been removed due to typing issues. More details can be found in python/mypy#1484.

macisamuele

Thanks a lot for the valuable feedback.
For now I'm going to:

- add PEP 561 typing annotations (at least for the rewritten module), which help in following the flow
- remove unneeded cached attributes
- address the feedback

Documentation will be added soon.

macisamuele replied at 2019-05-25T17:13:46Z (bravado_core/model.py) and at 2019-05-25T17:23:40Z (bravado_core/unmarshal.py); both replies are quoted in full above.


macisamuele added 8 commits May 23, 2019 14:35

Add dedicated unmarshaling performance tests

a68ab75

Define helper method to detect type from schema object

942bb5f

Add TODO to remove manual propagation of x-nullable

baa3382

Define handle_null_value_decorator and wrap_recursive_call_exception decorators

401697a

Update tests and update unmarshaling object logic to allow unmarshaling of arrays without schema

8e45885

Tests updates are done before the refactor to ensure confidence of the changes

Refactor unmarshaling functions to allow boost performances

dcf7059

Reduce test verbosity due to bravado-core deprecated functions

fc107f4

Cache bravado_core.swagger20_validator.get_validator_type to avoid continous creation of jsonschema validator instances

cc16bf2

macisamuele requested a review from sjaensch May 23, 2019 13:26

macisamuele added 2 commits May 23, 2019 20:18

Postpone object initialisation and fix typos

dcc16a6

Add explict Model._from_dict test

6743a1d

sjaensch suggested changes May 25, 2019


macisamuele added 4 commits May 26, 2019 13:20

PR Feedbacks - Part 1

1e3d277

Setup typing accordingly to PEP 561

3869e80

Force typing on bravado_core.unmarshal and bravado_core._decorators

bfa6cee

NOTE: Usage of partials has been removed due to typing issues. More details could be found on python/mypy#1484

Remove unneeded model cached properties

52a2bd4

macisamuele commented May 26, 2019


macisamuele added 4 commits May 26, 2019 18:38

Disable branch coverage and ignore bravado_core/_compat_typing.py

751cc4e

get_unmarshaling_method is a private method of the module

9f7f68f

Update _unmarshal_object parameters and use keyword arguments

35c3edc

Add extensive documentation

c36fb14

macisamuele force-pushed the maci-enhance-unmarshaling-performances branch from 57dde01 to c36fb14 May 26, 2019 17:02

sjaensch suggested changes May 27, 2019


.coveragerc (resolved)

bravado_core/_compat_typing.py (outdated, resolved)

mypy.ini (resolved)

setup.py (outdated, resolved)

PR feedbacks (typing)

de32f85

sjaensch approved these changes May 27, 2019


CR Feedbacks 2 (typing)

c804bc1

sjaensch approved these changes May 27, 2019


macisamuele merged commit 815f95e into Yelp:master May 27, 2019

macisamuele deleted the maci-enhance-unmarshaling-performances branch May 27, 2019 12:27

This was referenced May 27, 2019

Use partials as they do not introduce the overhead of wrapping functions #338

Merged

Enhance marshaling performances #339

Merged

Ensure that unmarshal uses the format only if defined #340

Merged

This was referenced Jun 11, 2019

change unmarshal array #307

Closed

Memoization #306

Closed

Unmarshal array inline #305

Closed
