Memory leak in yajl2_c backend #28
This problem was initially reported in #28. Since memory leaks are still checked manually (i.e., there are no automatic unit tests that check memory consumption remains constant throughout an ijson execution), this leak was somehow missed until now. The problem was *not* present in the 2.x series; this Py_DECREF() call got lost during the refactoring for version 3.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
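The missing automated check the commit message mentions could be approximated with a memory-regression test. Below is a minimal, hedged sketch of such a test using the stdlib `tracemalloc` module, with `json.loads` standing in for an ijson backend (the function names and thresholds are illustrative, not part of ijson's test suite):

```python
# Sketch of an automated memory-regression check. `parse_many` and the 64 KiB
# tolerance are hypothetical; json.loads stands in for an ijson parse.
import gc
import json
import tracemalloc

def parse_many(n):
    doc = json.dumps([{"value": i} for i in range(100)])
    for _ in range(n):
        json.loads(doc)  # stand-in for consuming ijson.items(...)

def test_memory_stays_flat():
    parse_many(1)  # warm up caches and the allocator
    gc.collect()
    tracemalloc.start()
    base, _ = tracemalloc.get_traced_memory()
    parse_many(100)
    gc.collect()
    now, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Allow a small tolerance; a real leak grows linearly with iterations.
    assert now - base < 64 * 1024

test_memory_stays_flat()
print("ok")
```

A leak like the one fixed here would make `now - base` grow with the iteration count, so the assertion would fail reliably rather than depending on a manual check.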
Thanks a lot @tomplex for the report. The problem wasn't specific to … I'm actually a bit surprised that this problem wasn't reported earlier, given that keeping memory usage low is the whole point of ijson. On the other hand, the leak was only triggered by array structures, so if your JSON document had few arrays you wouldn't see your memory grow much, while in array-rich documents (like yours) it became more obvious. BTW, the error wasn't present in the 2.x ijson versions; the leak was introduced by a line of code that got lost during the code refactoring for version 3. Please give the new code a try (it's already on the …).
Oh, and BTW, the garbage collector bit is unnecessary: memory was leaked not because of circular references, but simply because reference counts were never decreased. In other words, users shouldn't ever need to call the garbage collector. I take it you added it to your tests to ensure you were doing all you could to keep memory usage down, but with the fix in place you shouldn't see your memory grow in the first place.
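The distinction above can be shown in a few lines: `gc.collect()` only reclaims reference *cycles*, whereas the ijson bug was plain reference counts never being decremented. A minimal sketch (the names are illustrative, not ijson code):

```python
# gc.collect() cannot reclaim objects that still have live strong references.
import gc
import sys

item = ["payload"]           # a runtime-allocated object, like the "item" string
before = sys.getrefcount(item)
leaked = [item, item, item]  # simulate three Py_INCREFs with no matching Py_DECREF
gc.collect()                 # no cycles are involved, so this reclaims nothing
after = sys.getrefcount(item)
print(after - before)        # 3: the extra references survive collection
```

This is why calling the collector in the reporter's test script could never have freed the leaked memory.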
Thanks for the quick response @rtobar! I'm running my test again right now; I'll let you know how it looks.
Awesome, it looks good! Here's the output:

```
using backend yajl2_c
starting memory usage: 0.008796 GB
memory usage after reading file: 0.010052 GB
memory usage after garbage collection: 0.010052 GB
```

As for the garbage collector, you're right, it was just to make sure I was doing everything possible to clean up memory. I appreciate the clarification, though. Again, thanks for the very quick response. I'm glad I'll be able to switch back to the yajl2_c backend for work tomorrow, because it is way faster. =) Thanks for your great work on this project!!
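The thread never shows the `main.py` that produced these figures. A stdlib-only sketch of how such output could be printed is below; `mem_gb`, the allocation loop, and the use of `resource.getrusage` (Unix-only, reporting *peak* RSS, in KiB on Linux and bytes on macOS) are all assumptions, not the reporter's actual script:

```python
# Hypothetical reconstruction of a memory-reporting script like the one in
# the thread. resource.getrusage gives the peak resident set size.
import gc
import resource

def mem_gb():
    # ru_maxrss is KiB on Linux; assume Linux (the reporter used Debian buster)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 ** 2)

print(f"starting memory usage: {mem_gb():.6f} GB")
data = [bytearray(1024) for _ in range(10_000)]  # stand-in for parsing the file
print(f"memory usage after reading file: {mem_gb():.6f} GB")
del data
gc.collect()
print(f"memory usage after garbage collection: {mem_gb():.6f} GB")
```

Because `ru_maxrss` is a high-water mark, the last figure can never decrease; a current-usage probe (e.g. `tracemalloc`) would be needed to observe memory actually being returned.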
No worries, I'll close this issue for the time being then, and will leave a note once a new version is up on PyPI.
ijson 3.0.2 is now available on PyPI.
This problem was caused by calling Py_DECREF() more than once on the static, global PyObject* "item" variable that we use to keep the "item" string literal in memory. The situation only arose when the top-level JSON value was an array, and therefore did not necessarily hit all users. More than one of our tests use arrays as their top-level JSON values and go through parse_basecoro, so in principle we should have hit the bug, but it still went unnoticed. This might have been just luck, with the Python memory allocator playing in our favor and placing valid objects back into the memory previously used by the "item" variable before we decreased its reference count again. In order to reproduce the error more reliably, more tests need to be added to try and hit the problem with a higher chance.

This problem was originally reported in #29, and was a direct consequence of the fix introduced for #28.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
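The failure mode described in this commit message (an extra Py_DECREF on a long-lived object) can be modeled outside of C. The sketch below is illustrative only, a tiny Python stand-in for CPython's reference counting rather than the real ijson code:

```python
# Model of the double-Py_DECREF bug: one extra decref drives a live object's
# refcount to zero early, so the legitimate final decref hits "freed" memory.
class Obj:
    def __init__(self):
        self.refcount = 1
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # stands in for free(); further use is undefined

item = Obj()      # like the static, global "item" string object
item.incref()     # Py_INCREF when a reference is handed out
item.decref()     # first Py_DECREF: fine, one owner remains
item.decref()     # the *extra* Py_DECREF the fix removes; object is "freed"
item.decref()     # the legitimate final Py_DECREF now touches freed memory
print(item.freed, item.refcount)  # True -1
```

In real CPython the premature free leaves a dangling pointer, which explains both the silent corruption (when the allocator happens to reuse the slot for a valid object) and the segfault reported below.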
Hello,

Thank you for maintaining ijson. I wanted to say that this also seems to fix a segfault happening when doing something like:

```python
import ijson

with open("test.json", "rb") as fp:
    l = list(ijson.items(fp, "item"))
    print(len(l))

with open("test.json", "rb") as fp:
    l = list(ijson.items(fp, "item"))
    print(len(l))
```

Where the second call to …
@lopter interesting. Out of curiosity, are you sure it was fixed in 3.0.2 and not in 3.0.3? In any case it's good to hear the problem is not present anymore.
Ok, now it makes more sense :-). Yes, that's exactly the type of situation that would have triggered the segfault reported at #29, provided that the top-level JSON value was an array.
Opening a 1GB JSON file and iterating over it using `ijson.items` takes up a large amount of memory which cannot be reclaimed by running the garbage collector.

Software & environment: ijson==3.0.1, Python 3.8, Debian buster.

Included are a Dockerfile and a Python file (main.py) which fully demonstrate the issue. Create a new directory with the Dockerfile and main.py and then run:

```
$ docker build -t test-ijson .
$ docker run -it test-ijson
using backend yajl2_c
starting memory usage: 0.010604 GB
memory after reading file: 5.824332 GB
memory usage after garbage collection: 5.824332 GB
```

If you change main.py to use the `python` backend, there is no issue. Same with `yajl2_cffi`, though they are, of course, much slower to test than `yajl2_c`.