-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak with yajl2_c backend #34
Comments
Tested with
|
@mhugo thanks for spotting this. I indeed managed to reproduce the error in all versions since 3.0. The rate of growth is thankfully very slow, or at least much slower than memory leaks found due to the parsing itself -- so probably is a memory leak on the generator objects themselves. |
@mhugo could you try the Note that this affected both sync and async generators (but not the coros). There were also a few different problems in different places, so all of https://gist.github.com/rtobar/4ee73e673b999d40c7c13381f1bf5b51 |
The C backend had a couple of memory leaks in the initialization routines of some of the generators it implements. Because these leaks occurred when generators were initialized, they did not show up easily, unless many such objects were created. This is unlike memory leaks in the parsing process, which are apparent with a full pass over a single parsing operation. Two of these leaks were solved by adding the corresponding decrease in the reference count of the objects leaked. The rest were solved by improving how we built tuples out of sequence slices, changing our previous manual, memory-leaking approach to use a simpler GetSlice operation, which is easier to ready anyway. This commit addresses #34. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@rtobar Thank you very much for your fast answer ! I'll have a look at your branch and let you know. |
I confirm the (I am on debian 9, sorry for the old python versions) With yajl2_c, python 2.7.13
With yajl_cffi, python 2.7.13:
With yajl2_c, python 3.5.3
With yajl_cffi, python 3.5.3:
So there is still a small leak with python 2 (confirmed by increasing the number of calls), and it happens whatever the backend ... but yeah I know Python 2 ... Side note: I've tried your test script. It works well on the first function (and confirms there is no memory leak). However it generates infinite exceptions on
|
@mhugo Thanks for reporting back. Sounds a bit suspicious that both backends leak in 2, so that could be yet something else... I might have some time during the weekend to experiment, but otherwise this will have to wait until next week. I wrote the modified code in my gist in a hurry and tried it only with 3.8. With some dedication I could probably add something like that to unit test suite though, which has been on my mind for a while. |
Further testing shows that the python 2.7 memory growth is independent from ijson. https://gist.github.com/rtobar/b1a83a4712f810e6aa227cff988609fc Running the code above gives me:
This seems like a deep, core python behavior that changed in 3. This SO answer seems to hit the nail. Also note that using Let me know if his makes sense and if you can see the same results. If you do then we can declare this solved. |
Thank you very much for your investigation !
I confirm that (and learned something). So this has nothing to do with ijson. Problem solved ! :-) Thanks again for your fast reaction. Do you plan a patch release ? Or do you have more to include before a release ? |
@mhugo I leaned something new too, so that's too of us :-) Yes, I'll make a patch release containing this fix and will release it to PyPI. I'll update this ticket once that's done. |
A new 3.1.2 version of ijson containing the fix for this problem is now available on PyPI. |
Thank you very much for this effort @rtobar! ijson is a great piece of software and has saved a lot of memory and time for me. |
@LuisAlejandro I'm glad this package has been helpful :-). And we both have @isagalaev to thank, as he created it and came up with an intuitive design that makes it easy to use. |
@rtobar thanks again for your quick reaction and the fix release ! |
In #34 memory leaks were reported on the yajl2_c backend that happened at generator construction time. After investigation, we discovered one of these in the "chain" utility method, which leaked the temporary coroutines it chained together. The fix in a8159e4 (first present in 3.1.2) attempted to fix this particular problem, along with the rest of the identified memory leaks. The fix for the "chain" method was not correct though: the condition we used to identify whether the internal "coro" variable had to be decref'd was incorrect, and led to our code incorrect decref'ing the "sink" parameter given by the caller, which was meant to be left untouched. This opened the door to potential issues when the caller decref'd its reference to "sink". The problem was not immediately obvious, as depending on the program's activity, memory allocator and other factors, the memory used by the destroyed "sink" object could still be valid. In fact our unit tests never caught this error, and this problem was only reported to us about a year after 3.1.2 was released by users who saw this problem only after creating and destroying ijson.parser objects thousands of times. To reproduce this issue locally I wrote a small script that created a list of N ijson.parser generators, and then depleted them in order. Only when running with N=10k I was able to get a failure every 2 or 3 executions. After applying this patch I saw no failures, so I'm fairly confident the issue is gone. This problem was reported originally in #66, which this commit should fix. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
In #34 memory leaks were reported on the yajl2_c backend that happened at generator construction time. After investigation, we discovered one of these in the "chain" utility method, which leaked the temporary coroutines it chained together. The fix in a8159e4 (first present in 3.1.2) attempted to fix this particular problem, along with the rest of the identified memory leaks. The fix for the "chain" method was not correct though: the condition we used to identify whether the internal "coro" variable had to be decref'd was incorrect, and led to our code incorrect decref'ing the "sink" parameter given by the caller, which was meant to be left untouched. This opened the door to potential issues when the caller decref'd its reference to "sink". The problem was not immediately obvious, as depending on the program's activity, memory allocator and other factors, the memory used by the destroyed "sink" object could still be valid. In fact our unit tests never caught this error, and this problem was only reported to us about a year after 3.1.2 was released by users who saw this problem only after creating and destroying ijson.parser objects thousands of times. To reproduce this issue locally I wrote a small script that created a list of N ijson.parser generators, and then depleted them in order. Only when running with N=10k I was able to get a failure every 2 or 3 executions. After applying this patch I saw no failures, so I'm fairly confident the issue is gone. This problem was reported originally in #66, which this commit should fix. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Hi,
Thank you very much for the ijson library.
I think I may have found a memory leak when using the
yajl2_c
backend. I've reused a code similar to the one found in a previous issue:https://gist.github.com/mhugo/dec469223e578ea7ec94946edcd43e6f
With yajl2_c:
With yajl2_cffi:
The text was updated successfully, but these errors were encountered: