
Memory leak with oneOf in pypy #408

Closed
funous opened this issue May 18, 2018 · 3 comments


funous commented May 18, 2018

When jsonschema is used with pypy and the schema contains a oneOf rule, memory consumption grows indefinitely over time. The leak only seems to occur when a subschema matches but is not the last subschema in the oneOf list, and it only happens under pypy:

```
$ pypy3 --version
Python 3.5.3 (fdd60ed87e94, Apr 24 2018, 06:10:04)
[PyPy 6.0.0 with GCC 6.2.0 20160901]
```

See the code below, together with the JSON schema (schema.json):

```python
import json
import jsonschema

with open("schema.json", encoding="utf-8") as schema_file:
    schema = jsonschema.Draft4Validator(json.load(schema_file))
    obj = {"field": 3}  # matches the first subschema - causes a leak
    # obj = {"field": True}  # matches the last subschema - does not cause a leak
    for j in range(1000000):
        schema.is_valid(obj)
```

schema.json:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "oneOf": [
    {
      "properties": {
        "field": { "type": "number" }
      }
    },
    {
      "properties": {
        "field": { "type": "boolean" }
      }
    }
  ]
}
```

Running this in pypy quickly shows the memory consumption going up to a few gigabytes.

This might be a pypy bug, but I was not able to reproduce the leak without using jsonschema.

If it helps, as a workaround I use modified jsonschema with this is_valid function in jsonschema/validators.py:

```python
def is_valid(self, instance, _schema=None):
    # Fully iterate through the generator to avoid a pypy memory leak
    errors = list(self.iter_errors(instance, _schema))
    return not errors
```
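A lighter variant of the same idea (an illustrative sketch, not part of jsonschema; the `iter_errors` stand-in below is a toy, not the real validator): instead of draining every error, take at most one and then explicitly `close()` the generator, which releases its frame deterministically even without CPython-style reference counting.

```python
def iter_errors(instance):
    # Toy stand-in for a validator's iter_errors: lazily yields
    # error messages; a real validator would inspect a schema here.
    if not isinstance(instance, (int, float)):
        yield "not a number"

def is_valid_closing(instance):
    # Take at most one error, then explicitly close the generator
    # so its frame is freed immediately rather than waiting for GC.
    errors = iter_errors(instance)
    try:
        first = next(errors, None)
    finally:
        errors.close()
    return first is None

print(is_valid_closing(3))      # True
print(is_valid_closing("abc"))  # False
```

This keeps the short-circuit behavior of the original `is_valid` (stop at the first error) while still cleaning up the generator under pypy.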

Julian commented May 19, 2018

This seems almost certainly to be a bug in pypy3 (or, less likely, in pypy 6 generally, but I can't test pypy2 6 yet because it isn't in homebrew yet...).

For comparison, pypy2 5.10.0 settles just fine at about 84MB of memory usage here, while pypy3 grows to about 25GB after 2 or 3 minutes.


funous commented May 22, 2018

I've managed to make an example independent of jsonschema and opened an issue on the pypy bug tracker: https://bitbucket.org/pypy/pypy/issues/2833/memory-leak-with-non-exhausted-nested

Although the pypy documentation recommends manually closing iterators in finally blocks (which would be here: https://github.com/Julian/jsonschema/blob/4f2eb1f533070300dff6853b176db490b3e14b88/jsonschema/validators.py#L283), doing so doesn't seem to actually help in this case. Let's see what the feedback on the pypy issue is.
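For illustration only (this is not the actual reproduction filed on the pypy tracker, just a sketch of the general pattern the issue title describes): a nested generator that is abandoned before being exhausted. Under CPython, reference counting frees the suspended frames promptly; under pypy's GC, explicitly calling `close()` in a finally block releases them deterministically.

```python
def inner():
    # Innermost generator; left suspended if the consumer stops early.
    yield 1
    yield 2

def outer():
    # Nested generator: delegates to inner(), so abandoning outer()
    # also abandons the suspended inner() frame.
    for item in inner():
        yield item

# Pull a single item, then close instead of exhausting.
gen = outer()
try:
    first = next(gen)
finally:
    gen.close()
print(first)  # 1
```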


Julian commented May 28, 2018

Great! Closing this out but will keep an eye on the upstream ticket.

@Julian Julian closed this as completed May 28, 2018