New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance unmarshaling performances #336
Enhance unmarshaling performances #336
Conversation
…ng of arrays without schema Tests updates are done before the refactor to ensure confidence of the changes
…ntinous creation of jsonschema validator instances
from bravado_core.unmarshal import unmarshal_model | ||
return unmarshal_model(cls._swagger_spec, cls._model_spec, val) | ||
from bravado_core.unmarshal import unmarshal_schema_object | ||
return unmarshal_schema_object(cls._swagger_spec, cls._model_spec, val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As unmarshal_model
will be marked as deprecated it will be better to use the single unmarshal_schema_object
access point.
An alternative might be to cache the result of bravado_core.unmarshal. get_unmarshaling_method
in here. The later approach would require us to cache an additional information in order to reduce a single function call, so I preferred to call the general entry point as it is stable.
NOTE: The main difference (that might be lead us to revert this change) is related to the fact that Model._unmarshal
will not honour the use_modules
configuration.
properties_to_default_value, | ||
additional_properties_unmarshaling_function, | ||
model_value, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function could use a bit more documentation. For example, why do we need model_to_unmarshaling_function_mapping
? It has to do with discriminated models, right? Why do we need it? Can't we look that up in here? And why do we need model-specific unmarshaling functions at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function could use a bit more documentation
It's fair to assume that the whole module deserves more documentation 😉
I'll take of that soon-ish (maybe adding some support for typing would also help to understand what tyoe of object to expect)
why do we need model-specific unmarshaling functions at all
It might be an unfortunate name ... I'll fix that by renaming this function into _unmarshal_object
as this function actually takes care of performing the unmarshaling of a type: object
value. The knowledge of the model is important in order to provide Model
instances (if use_models
is enabled) instead of plain dictionaries
why do we need
model_to_unmarshaling_function_mapping
Yes this was set in order to deal with polymorphic objects. We might try to simplify this by calling get_unmarshaling_method
on the discriminated model.
The downside of this approach is that determining the referenced schema might have a cost (we might be forced to dereference).
I will investigate a bit more on this to ensure that we're not pre-evaluating functions that might be evaluated later (especially if we can guarantee that they will be evaluated only once).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to avoid performance penalties while unmarshaling polymorphic schemas it would be better to already have the possible models definitions that the objected could be discriminated with.
We don't actually need to have the unmarshaling function already present, as obtaining it would be fast (if the function is already evaluated), so I'm basically reducing the amount of info that we're caching with a negligible performance different and more importantly I'm updating the parameter name as for sure the one that was defined was not as helpful as I did expect.
NOTE: Usage of partials has been removed due to typing issues More details could be found on python/mypy#1484
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the valuable feedback.
As for now I'm going to:
- add PEP-561 typing annotation (at least for the rewritten module) -> helps following the flow
- removing un-needed cached attributes
- address feedbacks
Documentation will be added soon
from bravado_core.unmarshal import unmarshal_model | ||
return unmarshal_model(cls._swagger_spec, cls._model_spec, val) | ||
from bravado_core.unmarshal import unmarshal_schema_object | ||
return unmarshal_schema_object(cls._swagger_spec, cls._model_spec, val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As unmarshal_model
will be marked as deprecated it will be better to use the single unmarshal_schema_object
access point.
An alternative might be to cache the result of bravado_core.unmarshal. get_unmarshaling_method
in here. The later approach would require us to cache an additional information in order to reduce a single function call, so I preferred to call the general entry point as it is stable.
NOTE: The main difference (that might be lead us to revert this change) is related to the fact that Model._unmarshal
will not honour the use_modules
configuration.
properties_to_default_value, | ||
additional_properties_unmarshaling_function, | ||
model_value, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function could use a bit more documentation
It's fair to assume that the whole module deserves more documentation 😉
I'll take of that soon-ish (maybe adding some support for typing would also help to understand what tyoe of object to expect)
why do we need model-specific unmarshaling functions at all
It might be an unfortunate name ... I'll fix that by renaming this function into _unmarshal_object
as this function actually takes care of performing the unmarshaling of a type: object
value. The knowledge of the model is important in order to provide Model
instances (if use_models
is enabled) instead of plain dictionaries
why do we need
model_to_unmarshaling_function_mapping
Yes this was set in order to deal with polymorphic objects. We might try to simplify this by calling get_unmarshaling_method
on the discriminated model.
The downside of this approach is that determining the referenced schema might have a cost (we might be forced to dereference).
I will investigate a bit more on this to ensure that we're not pre-evaluating functions that might be evaluated later (especially if we can guarantee that they will be evaluated only once).
57dde01
to
c36fb14
Compare
NOTE: This CR replaces #331 . This is done due to the fact that history on the other PR is quite confusing.
The goal of this PR is, as mentioned in the title, unmarshaling performance enhancement.
NOTE: A similar PR will be published for applying the same approach described here for the marshaling process. I would like to have this verified first in order to avoid having two massive branches open at the same time
In order to achieve unmarshaling performance boost I've rewritten the whole
bravado_core.unmarshal
module.The current, before this PR, logic of unmarshaling is:
This PR aims to cache results of point 1 and 2 for later "cost-free" usage.
While making this changes I've noticed that there might be no real good reason to have all the unmarshaling functions public, users of the library commonly use
bravado_core.unmarshal.unmarshal_schema_object
.In order to simplify future maintenance work I've marked all the other functions as deprecated and will be removed in the next major release. Doing so will allows us, if needed, to modify more the internals of unmarshaling without worrying about breaking backward compatibility.
Let's get to the juicy part 😄
Raw data on: https://gist.github.com/macisamuele/9ff23d3e5e81c40cd884425a7eae5751
This PR provides a massive performance improvement:
Be aware that:
bravado_core.unmarshal.get_unmarshaling_method
is wrapped by_decorators.wrap_recursive_call_exception
andmemoize_by_id
decorators._decorators.wrap_recursive_call_exception
decorator is needed as models could be recursive and so invokingget_unmarshaling_method
might end up on an infinite recursion.In order to proactively deal with it I've modified
@memoize_by_id
decorator to raise a controlled exception if an unbounded recursion is identified (ℹ️ the interpreter would raise anyway with aRuntimeError: maximum recursion depth exceeded
)memoize_by_id
decorator is needed to speed up evaluation of the marshaling method for the same spec and schema. This effectively is one of the major performance boost (basically the expensive point 1 and 2 of the original flow become, more or less, a dictionary lookup)tests/unmarshal/unmarshal_object_test.py
) to guarantee that functionally nothing changed ;)