Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance unmarshaling performances #336

Merged

Conversation

macisamuele
Copy link
Collaborator

NOTE: This CR replaces #331 . This is done due to the fact that history on the other PR is quite confusing.

The goal of this PR is, as mentioned in the title, unmarshaling performance enhancement.
NOTE: A similar PR will be published for applying the same approach described here for the marshaling process. I would like to have this verified first in order to avoid having two massive branches open at the same time

In order to achieve unmarshaling performance boost I've rewritten the whole bravado_core.unmarshal module.

The current, before this PR, logic of unmarshaling is:

  1. extract the type from the schema
  2. if the schema if of type x then call the appropriate unmarshaling function
    • if the schema is of type object: call the top unmarshaling function for the property (point 1)
    • if the schema is of type object: call the top unmarshaling function for each item of the list (point 1)
    • if the schema is of primitive type: identify, each time, the right formatting function to use
  3. once the unmarshaling function is identified then perform the real unmarshaling

This PR aims to cache results of point 1 and 2 for later "cost-free" usage.

While making this changes I've noticed that there might be no real good reason to have all the unmarshaling functions public, users of the library commonly use bravado_core.unmarshal.unmarshal_schema_object.
In order to simplify future maintenance work I've marked all the other functions as deprecated and will be removed in the next major release. Doing so will allows us, if needed, to modify more the internals of unmarshaling without worrying about breaking backward compatibility.

Let's get to the juicy part 😄
Raw data on: https://gist.github.com/macisamuele/9ff23d3e5e81c40cd884425a7eae5751

This PR provides a massive performance improvement:

  • the worst performance improvement is ~46% (if in the branch takes 1 then on master takes 1.87)
  • the best performance improvement is ~70% (if in the branch takes 1 then on master takes 3.33)
  • the average performance improvement is ~60% (if the branch takes 1 then on master takes 2.57)

Be aware that:

  • bravado_core.unmarshal.get_unmarshaling_method is wrapped by _decorators.wrap_recursive_call_exception and memoize_by_id decorators.
    _decorators.wrap_recursive_call_exception decorator is needed as models could be recursive and so invoking get_unmarshaling_method might end up on an infinite recursion.
    In order to proactively deal with it I've modified @memoize_by_id decorator to raise a controlled exception if an unbounded recursion is identified (ℹ️ the interpreter would raise anyway with a RuntimeError: maximum recursion depth exceeded)
    memoize_by_id decorator is needed to speed up evaluation of the marshaling method for the same spec and schema. This effectively is one of the major performance boost (basically the expensive point 1 and 2 of the original flow become, more or less, a dictionary lookup)
  • All the test changes have been made before the refactoring (except tests/unmarshal/unmarshal_object_test.py) to guarantee that functionally nothing changed ;)

@macisamuele macisamuele requested a review from sjaensch May 23, 2019 13:26
@coveralls
Copy link

coveralls commented May 23, 2019

Coverage Status

Coverage decreased (-0.5%) to 97.771% when pulling c804bc1 on macisamuele:maci-enhance-unmarshaling-performances into c48be8a on Yelp:master.

bravado_core/_decorators.py Outdated Show resolved Hide resolved
bravado_core/model.py Outdated Show resolved Hide resolved
bravado_core/model.py Outdated Show resolved Hide resolved
from bravado_core.unmarshal import unmarshal_model
return unmarshal_model(cls._swagger_spec, cls._model_spec, val)
from bravado_core.unmarshal import unmarshal_schema_object
return unmarshal_schema_object(cls._swagger_spec, cls._model_spec, val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As unmarshal_model will be marked as deprecated it will be better to use the single unmarshal_schema_object access point.
An alternative might be to cache the result of bravado_core.unmarshal. get_unmarshaling_method in here. The later approach would require us to cache an additional information in order to reduce a single function call, so I preferred to call the general entry point as it is stable.

NOTE: The main difference (that might be lead us to revert this change) is related to the fact that Model._unmarshal will not honour the use_modules configuration.

tests/_decorators_test.py Outdated Show resolved Hide resolved
bravado_core/unmarshal.py Show resolved Hide resolved
properties_to_default_value,
additional_properties_unmarshaling_function,
model_value,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could use a bit more documentation. For example, why do we need model_to_unmarshaling_function_mapping? It has to do with discriminated models, right? Why do we need it? Can't we look that up in here? And why do we need model-specific unmarshaling functions at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could use a bit more documentation

It's fair to assume that the whole module deserves more documentation 😉
I'll take of that soon-ish (maybe adding some support for typing would also help to understand what tyoe of object to expect)

why do we need model-specific unmarshaling functions at all

It might be an unfortunate name ... I'll fix that by renaming this function into _unmarshal_object as this function actually takes care of performing the unmarshaling of a type: object value. The knowledge of the model is important in order to provide Model instances (if use_models is enabled) instead of plain dictionaries

why do we need model_to_unmarshaling_function_mapping

Yes this was set in order to deal with polymorphic objects. We might try to simplify this by calling get_unmarshaling_method on the discriminated model.
The downside of this approach is that determining the referenced schema might have a cost (we might be forced to dereference).
I will investigate a bit more on this to ensure that we're not pre-evaluating functions that might be evaluated later (especially if we can guarantee that they will be evaluated only once).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to avoid performance penalties while unmarshaling polymorphic schemas it would be better to already have the possible models definitions that the objected could be discriminated with.

We don't actually need to have the unmarshaling function already present, as obtaining it would be fast (if the function is already evaluated), so I'm basically reducing the amount of info that we're caching with a negligible performance different and more importantly I'm updating the parameter name as for sure the one that was defined was not as helpful as I did expect.

Copy link
Collaborator Author

@macisamuele macisamuele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the valuable feedback.
As for now I'm going to:

  • add PEP-561 typing annotation (at least for the rewritten module) -> helps following the flow
  • removing un-needed cached attributes
  • address feedbacks

Documentation will be added soon

bravado_core/model.py Outdated Show resolved Hide resolved
bravado_core/model.py Outdated Show resolved Hide resolved
from bravado_core.unmarshal import unmarshal_model
return unmarshal_model(cls._swagger_spec, cls._model_spec, val)
from bravado_core.unmarshal import unmarshal_schema_object
return unmarshal_schema_object(cls._swagger_spec, cls._model_spec, val)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As unmarshal_model will be marked as deprecated it will be better to use the single unmarshal_schema_object access point.
An alternative might be to cache the result of bravado_core.unmarshal. get_unmarshaling_method in here. The later approach would require us to cache an additional information in order to reduce a single function call, so I preferred to call the general entry point as it is stable.

NOTE: The main difference (that might be lead us to revert this change) is related to the fact that Model._unmarshal will not honour the use_modules configuration.

properties_to_default_value,
additional_properties_unmarshaling_function,
model_value,
):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could use a bit more documentation

It's fair to assume that the whole module deserves more documentation 😉
I'll take of that soon-ish (maybe adding some support for typing would also help to understand what tyoe of object to expect)

why do we need model-specific unmarshaling functions at all

It might be an unfortunate name ... I'll fix that by renaming this function into _unmarshal_object as this function actually takes care of performing the unmarshaling of a type: object value. The knowledge of the model is important in order to provide Model instances (if use_models is enabled) instead of plain dictionaries

why do we need model_to_unmarshaling_function_mapping

Yes this was set in order to deal with polymorphic objects. We might try to simplify this by calling get_unmarshaling_method on the discriminated model.
The downside of this approach is that determining the referenced schema might have a cost (we might be forced to dereference).
I will investigate a bit more on this to ensure that we're not pre-evaluating functions that might be evaluated later (especially if we can guarantee that they will be evaluated only once).

bravado_core/unmarshal.py Show resolved Hide resolved
tests/_decorators_test.py Outdated Show resolved Hide resolved
@macisamuele macisamuele force-pushed the maci-enhance-unmarshaling-performances branch from 57dde01 to c36fb14 Compare May 26, 2019 17:02
.coveragerc Show resolved Hide resolved
bravado_core/_compat_typing.py Outdated Show resolved Hide resolved
mypy.ini Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@macisamuele macisamuele merged commit 815f95e into Yelp:master May 27, 2019
@macisamuele macisamuele deleted the maci-enhance-unmarshaling-performances branch May 27, 2019 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants