-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible memory leaks in Cython 3.0a1—a4 #3578
Comments
It might be possible to get a more detailed diagnosis by setting |
Yeah, without knowing the code base at all, this is going to be really difficult to debug. |
FWIW, I tried building gevent on my side, noticed that it has non-trivial dependencies, tried to follow the travis build script for setting those up, and failed. Miserably. |
gevent should be pretty easy to build; the Travis script is complicated to allow caching. Here's how I ran with the refnanny just now. I didn't see any complaints. $ mktmpenv -p /opt/local/bin/python2.7 # clean virtual env; leak tests are (still) most heavily tested under Python 2.7
Running virtualenv with interpreter /opt/local/bin/python2.7
New python executable in //tmp-cbef1ea9989eef/bin/python
Installing setuptools, pip, wheel...
done.
...
$ git clone https://github.com/gevent/gevent
$ cd ./gevent
$ pip install -U 'cython>=3.0a4' cffi setuptools # needed to build
Collecting cython
Using cached Cython-3.0a4-cp27-cp27m-macosx_10_9_x86_64.whl (1.9 MB)
Collecting cffi
Using cached cffi-1.14.0-cp27-cp27m-macosx_10_9_x86_64.whl (173 kB)
...
$ CPPFLAGS="-DCYTHON_REFNANNY=1" python setup.py develop
Compiling src/gevent/resolver/cares.pyx because it changed.
[1/1] Cythonizing src/gevent/resolver/cares.pyx
Compiling src/gevent/libev/corecext.pyx because it changed.
…
building 'gevent.__greenlet_primitives' extension
ccache /usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -pipe -Ofast -pipe -ffast-math -msse -msse2 -msse3 -fomit-frame-pointer -march=native -flto -ffunction-sections -Wno-parentheses-equality -isysroot/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Ofast -pipe -ffast-math -msse -msse2 -msse3 -fomit-frame-pointer -march=native -flto -ffunction-sections -Wno-parentheses-equality -DCYTHON_REFNANNY=1 -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I//include/python2.7 -I//gevent/deps -Isrc/gevent -Isrc/gevent/libev -Isrc/gevent/resolver -I. -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/gevent/_greenlet_primitives.c -o build/temp.macosx-10.15-x86_64-2.7/src/gevent/_greenlet_primitives.o -Wno-unreachable-code -Wno-deprecated-declarations -Wno-incompatible-sysroot -Wno-tautological-compare -Wno-implicit-function-declaration -Wno-unused-value -Wno-macro-redefined
…
Finished processing dependencies for gevent==20.5.1.dev0
$ pip install objgraph # needed for the leak checks
Collecting objgraph
Using cached objgraph-3.4.1-py2.py3-none-any.whl (17 kB)
Collecting graphviz
Downloading graphviz-0.14-py2.py3-none-any.whl (18 kB)
Installing collected packages: graphviz, objgraph
Successfully installed graphviz-0.14 objgraph-3.4.1
$ GEVENTTEST_LEAKCHECK=1 python -m gevent.tests
…
GEVENTTEST_LEAKCHECK=1 //bin/python -u -mgevent.tests.test__queue
....FF...FF...FF...FF...FF..........FF..FF..FF..FF..FF............
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestGetInterrupt)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1109, 1, 1, 1]
Timeout 5 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterrupt)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1111, 2, 2, 2]
Timeout 10 +1
RuntimeError 6 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptChannel)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1118, 1, 1, 1]
Timeout 15 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptChannel)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1121, 2, 2, 2]
Timeout 20 +1
RuntimeError 11 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptJoinableQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1128, 1, 1, 1]
Timeout 25 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptJoinableQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1131, 2, 2, 2]
Timeout 30 +1
RuntimeError 16 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptLifoQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1138, 1, 1, 1]
Timeout 35 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptLifoQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1141, 2, 2, 2]
Timeout 40 +1
RuntimeError 21 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptPriorityQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1148, 1, 1, 1]
Timeout 45 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptPriorityQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1151, 2, 2, 2]
Timeout 50 +1
RuntimeError 26 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestPutInterrupt)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1172, 1, 1, 1]
Timeout 55 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterrupt)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1175, 2, 2, 2]
Timeout 60 +1
RuntimeError 31 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptChannel)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1180, 1, 1, 1]
Timeout 65 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptChannel)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1183, 2, 2, 2]
Timeout 70 +1
RuntimeError 36 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptJoinableQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [3, 1190, 1, 1, 1]
Timeout 75 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptJoinableQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [5, 1194, 2, 2, 2]
Timeout 80 +1
RuntimeError 41 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptLifoQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [4, 1200, 1, 1, 1]
Timeout 85 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptLifoQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [6, 1203, 2, 2, 2]
Timeout 90 +1
RuntimeError 46 +1
======================================================================
FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptPriorityQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [4, 1208, 1, 1, 1]
Timeout 95 +1
======================================================================
FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptPriorityQueue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
return _RefCountChecker(self, method)(args, kwargs)
File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
while self._check_deltas(growth):
File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
AssertionError: refcount increased by [6, 1211, 2, 2, 2]
Timeout 100 +1
RuntimeError 51 +1
----------------------------------------------------------------------
Ran 66 tests in 20.825s
FAILED (failures=20) |
Hmm, too bad. It prints stuff like |
Yes, good point. stdout is buffered and hidden unless the test fails. If the test fails, stdout is printed. There is a verbose option that skips the buffering (or an individual test can be run directly; they're just straight forward The passing tests that print are all testing imports, so they import or execute a bunch of modules without actually doing much else. The output for all of them looks like this:
|
Yeah, the crucial part is to find out where these objects originate and at which point we loose the reference. We already knew which type of objects were left over in the end. |
It's looking like it has to do with keyword argument handling. I've simplified and added some print debugging. A successful run (no leaks) outputs this; notice the begin and end ref counts of the parameter object are the same:
The code is basically this: def top():
timeout = SomeObject()
print('begin; id=', id(timeout), 'RC', sys.getrefcount(timeout))
middle(timeout)
print('end; RC', sys.getrefcount(timeout))
def middle(timeout):
print(' before wait', …)
bottom(timeout)
print(' after wait', …)
def bottom(timeout):
print("\tcalling get on', …)
get(True, timeout)
print("\tcalled get', …) And And here's failing output; notice the ending ref count is too high.
The only difference is how the timeout is finally passed: def bottom(timeout):
print("\tcalling get on', …)
get(timeout=timeout)
print("\tcalled get', …) Of course, if there's a .pxd involved, the signature has to change; for the keyword case it's |
I think the problem is in here: static CYTHON_INLINE PyObject * __Pyx_PyDict_GetItemStrWithError(PyObject *dict, PyObject *name) {
PyObject *res = PyObject_GetItem(dict, name);
if (res == NULL && PyErr_ExceptionMatches(PyExc_KeyError))
PyErr_Clear();
return res;
} That first line should be |
If I make that change in ModuleSetupCode.c, all gevent's leak tests pass. |
The bug was on Python 2, or PyPy < 7.2 Fixes cython#3578
…Py2 to get the semantics right of returning a borrowed reference with non-KeyError exceptions left in place. Closes #3578
Thanks for the report and the investigations @jamadden. |
…0.5.2 Jason Madden (58): Back to development: 20.5.1 Avoid closing the same libuv watcher object twice. Add faulthandler. Fix compilation with Cython 3. Go back to pinning Cython < 3. Add link to Cython issue cython/cython#3578 Working on an improved loop exit protocol. Clean hub resources in a compatible way. Calling debug() on a destroyed libuv loop would raise TypeError More safely handle getting a callback from libuv once we're destroyed. More safely handle getting a callback from libuv once we're destroyed. Properly clean up native resources in gevent.signal_handler() when cancel is called. Update to PyPy 7.3.1 on Travis. Fixes for PyPy 7.3.1 Make waiters on Events that arrive while the event is ready and notifying greenlets that had to block run after they do. Testrunner: Ignore warnings from dns.zone Fix calling gevent.wait() twice on a set Event. fixup Missed trailing colon. Add postrelease hook to stop zest.releaser from modifying CHANGES.rst in its last step. Tweak documentation for baseserver.max_accept. [skip ci] Rename C accelerator modules to avoid clashes. Tweak a URL in the docs. [skip ci] Add failing test for #1437 Progress on cross-thread locks. Missing import. Adjust when and how Semaphore captures the hub. Cleanup to pass leak tests. More fixes for leakchecks. test__threading_native_before_monkey was depending on a side-effect. Fix benchmark, and make AsyncResult use the provided API. Update to Cython 3.0a5. Fixes #1599 Make zope.interface/event and setuptools required dependencies. Update to libuv 1.38 Try configuring with bash. win/udp.c is incompatible with Visual Studio 9 Debugging output for libuv/libuv#2862 Also automake version. Update from Ubuntu 16.04 to 18.04 Try one line? m4/ax_pthread.m4 was being ignored by m4/.gitignore. Revert update to bionic. Bump c-ares to 1.16.1. Fixes #1624. whitespace manpage patch Restore config.guess/config.sub to c-ares. Use latest xcode Add libresolv to the libraries needed to link c-ares on macOS. distutils cflags may be None, e.g., on Win Tweak links in readme. [skip ci] Preparing release 20.5.1 Back to development: 20.5.2 Try to only upload gevent wheels on appveyor. Catch greenlet.error when destroying a hub from another thread. Attempt better warnings handling. default action should be first. Don't get a warning about ResourceWarning on Py2. Preparing release 20.5.2
I'm testing gevent with Cython 3. gevent has a combination of Cython code in .pyx/pxd and pure-Python code annotated and compiled with .pxd files. (Which is awesome, BTW. Thanks again for a great project!)
gevent's test suite has a job that runs all the tests and watches for memory leaks. It basically does this by using
gc.get_objects()
to count all the objects extent, run a particular test, and count the objects again. If the counts don't match, this process is repeated a few times to see if a pattern emerges. When it's been run enough times to see that a particular type of object is always gaining new instances after a test run, the test is declared to leak and is failed.Using Cython 0.29.17, this test job passes.
The test job with Cython 3.0a4 fails; many tests report leaks. For example:
That's showing that the last three runs of this test resulted in two new objects each time; one new
RuntimeError
and one newTimeout
(a kind of exception). I thought at first that maybe it was just exceptions leaking since the first reports are just of exceptions, but later on we see a variety of objects leaking:I've been using
language_level=3str
in most places for some time already. The only other thing that caught my eye as a possible cause was the change to makebinding
True by default. But I changed that directive in mysetup.py
and tested locally and still get the failures.Locally I've tested 3.0a1 through 3.0a4, inclusive, and get the failures on all versions. (I also tried git master as-of now—I think—and have the same issues.)
Next, I tried not using Cython compiled versions of the pure-Python modules, keeping just the compiled .pyx files, and locally the tests pass again (let me know if that wasn't clear). So it appears to have something to do with that mode?
I haven't tried to create a smaller example yet because I was hoping the problem might just be obvious :) Please let me know what other info I can provide or how I can help track this down.
The text was updated successfully, but these errors were encountered: