Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leaks in Cython 3.0a1—a4 #3578

Closed
jamadden opened this issue May 6, 2020 · 12 comments · Fixed by #3593
Closed

Possible memory leaks in Cython 3.0a1—a4 #3578

jamadden opened this issue May 6, 2020 · 12 comments · Fixed by #3593

Comments

@jamadden
Copy link
Contributor

jamadden commented May 6, 2020

I'm testing gevent with Cython 3. gevent has a combination of Cython code in .pyx/pxd and pure-Python code annotated and compiled with .pxd files. (Which is awesome, BTW. Thanks again for a great project!)

gevent's test suite has a job that runs all the tests and watches for memory leaks. It basically does this by using gc.get_objects() to count all the objects extent, run a particular test, and count the objects again. If the counts don't match, this process is repeated a few times to see if a pattern emerges. When it's been run enough times to see that a particular type of object is always gaining new instances after a test run, the test is declared to leak and is failed.

Using Cython 0.29.17, this test job passes.

The test job with Cython 3.0a4 fails; many tests report leaks. For example:

  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGet)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/travis/.runtimes/snakepit/2.7.d/lib/python2.7/site-packages/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "/home/travis/.runtimes/snakepit/2.7.d/lib/python2.7/site-packages/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "/home/travis/.runtimes/snakepit/2.7.d/lib/python2.7/site-packages/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [9, 1168, 2, 2, 2]
  RuntimeError        6        +1
  Timeout            10        +1

That's showing that the last three runs of this test resulted in two new objects each time; one new RuntimeError and one new Timeout (a kind of exception). I thought at first that maybe it was just exceptions leaking since the first reports are just of exceptions, but later on we see a variety of objects leaking:

840  FAIL: test_unordered (__main__.TestErrorInIterator)
841  ----------------------------------------------------------------------
842  Traceback (most recent call last):
843    File "/home/travis/.runtimes/snakepit/2.7.d/lib/python2.7/site-packages/gevent/testing/leakcheck.py", line 214, in wrapper
844      return _RefCountChecker(self, method)(args, kwargs)
845    File "/home/travis/.runtimes/snakepit/2.7.d/lib/python2.7/site-packages/gevent/testing/leakcheck.py", line 192, in __call__
846      while self._check_deltas(growth):
847    File "/home/travis/.runtimes/snakepit/2.7.d/lib/python2.7/site-packages/gevent/testing/leakcheck.py", line 168, in _check_deltas
848      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
849  AssertionError: refcount increased by [17, 1187, 10, 10, 10]
850  list               2061        +2
851  deque                 5        +1
852  ThreadPool            5        +1
853  instancemethod      116        +1
854  set                 291        +1
855  Queue                 5        +1
856  Semaphore             5        +1
857  fork                  5        +1
858  _Condition            7        +1

I've been using language_level=3str in most places for some time already. The only other thing that caught my eye as a possible cause was the change to make binding True by default. But I changed that directive in my setup.py and tested locally and still get the failures.

Locally I've tested 3.0a1 through 3.0a4, inclusive, and get the failures on all versions. (I also tried git master as-of now—I think—and have the same issues.)

Next, I tried not using Cython compiled versions of the pure-Python modules, keeping just the compiled .pyx files, and locally the tests pass again (let me know if that wasn't clear). So it appears to have something to do with that mode?

I haven't tried to create a smaller example yet because I was hoping the problem might just be obvious :) Please let me know what other info I can provide or how I can help track this down.

@da-woods
Copy link
Contributor

da-woods commented May 6, 2020

It might be possible to get a more detailed diagnosis by setting CYTHON_REFNANNY as a cflag when you run the tests - https://github.com/cython/cython/wiki/enhancements-refnanny. That's Cython's own mechanism for trying to identify memory leaks. It'd usually give you some line numbers for the generated C code.

@scoder
Copy link
Contributor

scoder commented May 7, 2020

Yeah, without knowing the code base at all, this is going to be really difficult to debug.
+1 for giving the refnanny a run over it. You just need an installed Cython version and then define the C macro CYTHON_REFNANNY=1 (with -D in your CFLAGS, or in your setup.py).

@scoder
Copy link
Contributor

scoder commented May 8, 2020

FWIW, I tried building gevent on my side, noticed that it has non-trivial dependencies, tried to follow the travis build script for setting those up, and failed. Miserably.
@jamadden it would be nice if you could try a test run with the refnanny on your side and if it complains somewhere, paste the respective C code snippets here.

@jamadden
Copy link
Contributor Author

jamadden commented May 8, 2020

gevent should be pretty easy to build; the Travis script is complicated to allow caching. Here's how I ran with the refnanny just now. I didn't see any complaints.

$ mktmpenv -p /opt/local/bin/python2.7  # clean virtual env; leak tests are (still) most heavily tested under Python 2.7
Running virtualenv with interpreter /opt/local/bin/python2.7
New python executable in //tmp-cbef1ea9989eef/bin/python
Installing setuptools, pip, wheel...
done.
...
$ git clone https://github.com/gevent/gevent
$ cd ./gevent
$ pip install -U 'cython>=3.0a4' cffi setuptools # needed to build
Collecting cython
  Using cached Cython-3.0a4-cp27-cp27m-macosx_10_9_x86_64.whl (1.9 MB)
Collecting cffi
  Using cached cffi-1.14.0-cp27-cp27m-macosx_10_9_x86_64.whl (173 kB)
...
$ CPPFLAGS="-DCYTHON_REFNANNY=1" python setup.py develop
Compiling src/gevent/resolver/cares.pyx because it changed.
[1/1] Cythonizing src/gevent/resolver/cares.pyx
Compiling src/gevent/libev/corecext.pyx because it changed.

building 'gevent.__greenlet_primitives' extension
ccache /usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -pipe -Ofast -pipe -ffast-math -msse -msse2 -msse3 -fomit-frame-pointer -march=native -flto -ffunction-sections -Wno-parentheses-equality -isysroot/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Ofast -pipe -ffast-math -msse -msse2 -msse3 -fomit-frame-pointer -march=native -flto -ffunction-sections -Wno-parentheses-equality -DCYTHON_REFNANNY=1 -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I//include/python2.7 -I//gevent/deps -Isrc/gevent -Isrc/gevent/libev -Isrc/gevent/resolver -I. -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/gevent/_greenlet_primitives.c -o build/temp.macosx-10.15-x86_64-2.7/src/gevent/_greenlet_primitives.o -Wno-unreachable-code -Wno-deprecated-declarations -Wno-incompatible-sysroot -Wno-tautological-compare -Wno-implicit-function-declaration -Wno-unused-value -Wno-macro-redefined

Finished processing dependencies for gevent==20.5.1.dev0
$ pip install objgraph # needed for the leak checks
Collecting objgraph
  Using cached objgraph-3.4.1-py2.py3-none-any.whl (17 kB)
Collecting graphviz
  Downloading graphviz-0.14-py2.py3-none-any.whl (18 kB)
Installing collected packages: graphviz, objgraph
Successfully installed graphviz-0.14 objgraph-3.4.1
$ GEVENTTEST_LEAKCHECK=1 python -m gevent.tests

GEVENTTEST_LEAKCHECK=1 //bin/python -u -mgevent.tests.test__queue
  ....FF...FF...FF...FF...FF..........FF..FF..FF..FF..FF............
  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestGetInterrupt)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1109, 1, 1, 1]
  Timeout        5        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterrupt)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1111, 2, 2, 2]
  Timeout            10        +1
  RuntimeError        6        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptChannel)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1118, 1, 1, 1]
  Timeout       15        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptChannel)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1121, 2, 2, 2]
  Timeout            20        +1
  RuntimeError       11        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptJoinableQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1128, 1, 1, 1]
  Timeout       25        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptJoinableQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1131, 2, 2, 2]
  Timeout            30        +1
  RuntimeError       16        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptLifoQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1138, 1, 1, 1]
  Timeout       35        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptLifoQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1141, 2, 2, 2]
  Timeout            40        +1
  RuntimeError       21        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestGetInterruptPriorityQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1148, 1, 1, 1]
  Timeout       45        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestGetInterruptPriorityQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1151, 2, 2, 2]
  Timeout            50        +1
  RuntimeError       26        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestPutInterrupt)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1172, 1, 1, 1]
  Timeout       55        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterrupt)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1175, 2, 2, 2]
  Timeout            60        +1
  RuntimeError       31        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptChannel)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1180, 1, 1, 1]
  Timeout       65        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptChannel)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1183, 2, 2, 2]
  Timeout            70        +1
  RuntimeError       36        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptJoinableQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [3, 1190, 1, 1, 1]
  Timeout       75        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptJoinableQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [5, 1194, 2, 2, 2]
  Timeout            80        +1
  RuntimeError       41        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptLifoQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [4, 1200, 1, 1, 1]
  Timeout       85        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptLifoQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [6, 1203, 2, 2, 2]
  Timeout            90        +1
  RuntimeError       46        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout (__main__.TestPutInterruptPriorityQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [4, 1208, 1, 1, 1]
  Timeout       95        +1

  ======================================================================
  FAIL: test_raises_timeout_Timeout_exc_customized (__main__.TestPutInterruptPriorityQueue)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "//gevent/src/gevent/testing/leakcheck.py", line 214, in wrapper
      return _RefCountChecker(self, method)(args, kwargs)
    File "//gevent/src/gevent/testing/leakcheck.py", line 192, in __call__
      while self._check_deltas(growth):
    File "//gevent/src/gevent/testing/leakcheck.py", line 168, in _check_deltas
      raise AssertionError('refcount increased by %r\n%s' % (deltas, diff))
  AssertionError: refcount increased by [6, 1211, 2, 2, 2]
  Timeout           100        +1
  RuntimeError       51        +1

  ----------------------------------------------------------------------
  Ran 66 tests in 20.825s

  FAILED (failures=20)

@scoder
Copy link
Contributor

scoder commented May 8, 2020

Hmm, too bad. It prints stuff like REFNANNY: References leaked: to stdout from the running code (because we use doctests in Cython, where stdout stands out). Is that maybe captured and hidden in your tests? Maybe there's a "verbose" option to show test outputs?

@jamadden
Copy link
Contributor Author

jamadden commented May 8, 2020

Is that maybe captured and hidden in your tests? Maybe there's a "verbose" option to show test outputs?

Yes, good point. stdout is buffered and hidden unless the test fails. If the test fails, stdout is printed. There is a verbose option that skips the buffering (or an individual test can be run directly; they're just straight forward unittest modules run with unittest.main). If I do that, I see three passing tests print some refnanny lines apparently during interpreter shutdown; none of the tests that fail with leaks print anything.

The passing tests that print are all testing imports, so they import or execute a bunch of modules without actually doing much else. The output for all of them looks like this:

| GEVENTTEST_LEAKCHECK=1 //bin/python -u -mgevent.tests.test__execmodules
  .....................................................s..............
  ----------------------------------------------------------------------
  Ran 68 tests in 0.357s

  OK (skipped=1)
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'
  refnanny raised an exception: 'NoneType' object has no attribute 'append'

- GEVENTTEST_LEAKCHECK=1 //bin/python -u -mgevent.tests.test__execmodules [Ran 68 tests in 0.5s]

@jamadden
Copy link
Contributor Author

jamadden commented May 8, 2020

Asking objgraph to produce a graph of the references to the leaking objects shows a bunch of objects hanging out with refcounts > 0, but no apparent objects still referencing them.
objgraph-RpbewR

Or, objects with a reference count of 2, but only one identifiable reference.
objgraph-2uLH1b

@scoder
Copy link
Contributor

scoder commented May 8, 2020

Yeah, the crucial part is to find out where these objects originate and at which point we loose the reference. We already knew which type of objects were left over in the end.
But if the refnanny doesn't complain, then the problem is almost certainly in some hand written helper function and not in generated code. Humans get things wrong much more easily than machines. :) I'll see if I can find anything suspicious.

@jamadden
Copy link
Contributor Author

jamadden commented May 8, 2020

It's looking like it has to do with keyword argument handling.

I've simplified and added some print debugging. A successful run (no leaks) outputs this; notice the begin and end ref counts of the parameter object are the same:

begin; id= 4382381952 RC 2
  before wait; id= 4382381952 RC 5
		calling get on <Queue> id= 4382381952 rc= 7
			Entered get; id= 4382381952 rc= 8 
			Exited get; id= 4382381952 rc= 8
		called get <Queue> id= 4382381952 rc= 7
  after wait; id= 4382381952 RC 5
end; RC 2

The code is basically this:

def top():
    timeout = SomeObject()
    print('begin; id=', id(timeout), 'RC', sys.getrefcount(timeout))
    middle(timeout)
    print('end; RC', sys.getrefcount(timeout))

def middle(timeout):
    print('  before wait', …)
    bottom(timeout)
    print('  after wait', …)

def bottom(timeout):
    print("\tcalling get on', …)
    get(True, timeout)
    print("\tcalled get', …)

And get is a method of a class in a pure-python module compiled with Cython that does nothing but print those two 'Entered'/'Exited' lines; it doesn't seem to matter if the class is declared as a cdef class in a .pxd or not; both cases behave the same.

And here's failing output; notice the ending ref count is too high.

begin; id= 4539311680 RC 2
  before wait; id= 4539311680 RC 5
		calling get on <Queue> id= 4539311680 rc= 7
			Entered get; id= 4539311680 rc= 8
			Exited get; id= 4539311680 rc= 8
		called get <Queue> id= 4539311680 rc= 8
  after wait; id= 4539311680 RC 6
end; RC 3

The only difference is how the timeout is finally passed:

def bottom(timeout):
    print("\tcalling get on', …)
    get(timeout=timeout)
    print("\tcalled get', …)

Of course, if there's a .pxd involved, the signature has to change; for the keyword case it's cpdef get(self, block=*, timeout=*).

@jamadden
Copy link
Contributor Author

jamadden commented May 8, 2020

I think the problem is in here:

static CYTHON_INLINE PyObject * __Pyx_PyDict_GetItemStrWithError(PyObject *dict, PyObject *name) {
    PyObject *res = PyObject_GetItem(dict, name);
    if (res == NULL && PyErr_ExceptionMatches(PyExc_KeyError))
        PyErr_Clear();
    return res;
}

That first line should be PyDict_GetItem, shouldn't it? To match all the other versions of the function? If I make that change, this test, at least, passes.

@jamadden
Copy link
Contributor Author

jamadden commented May 8, 2020

If I make that change in ModuleSetupCode.c, all gevent's leak tests pass.

jamadden added a commit to NextThought/cython that referenced this issue May 8, 2020
The bug was on Python 2, or PyPy < 7.2

Fixes cython#3578
scoder added a commit that referenced this issue May 9, 2020
…Py2 to get the semantics right of returning a borrowed reference with non-KeyError exceptions left in place.

Closes #3578
scoder added a commit that referenced this issue May 9, 2020
…y leak (GH-3593)

Reimplement __Pyx_PyDict_GetItemStrWithError() as a hacky version in Py2 to get the semantics right of returning a borrowed reference with non-KeyError exceptions left in place.

Closes #3578
@scoder scoder added this to the 3.0 milestone May 9, 2020
@scoder
Copy link
Contributor

scoder commented May 9, 2020

Thanks for the report and the investigations @jamadden.

clrpackages pushed a commit to clearlinux-pkgs/gevent that referenced this issue Jun 8, 2020
…0.5.2

Jason Madden (58):
      Back to development: 20.5.1
      Avoid closing the same libuv watcher object twice.
      Add faulthandler.
      Fix compilation with Cython 3.
      Go back to pinning Cython < 3.
      Add link to Cython issue cython/cython#3578
      Working on an improved loop exit protocol.
      Clean hub resources in a compatible way.
      Calling debug() on a destroyed libuv loop would raise TypeError
      More safely handle getting a callback from libuv once we're destroyed.
      More safely handle getting a callback from libuv once we're destroyed.
      Properly clean up native resources in gevent.signal_handler() when cancel is called.
      Update to PyPy 7.3.1 on Travis.
      Fixes for PyPy 7.3.1
      Make waiters on Events that arrive while the event is ready and notifying greenlets that had to block run after they do.
      Testrunner: Ignore warnings from dns.zone
      Fix calling gevent.wait() twice on a set Event.
      fixup
      Missed trailing colon.
      Add postrelease hook to stop zest.releaser from modifying CHANGES.rst in its last step.
      Tweak documentation for baseserver.max_accept. [skip ci]
      Rename C accelerator modules to avoid clashes.
      Tweak a URL in the docs. [skip ci]
      Add failing test for #1437
      Progress on cross-thread locks.
      Missing import.
      Adjust when and how Semaphore captures the hub.
      Cleanup to pass leak tests.
      More fixes for leakchecks.
      test__threading_native_before_monkey was depending on a side-effect.
      Fix benchmark, and make AsyncResult use the provided API.
      Update to Cython 3.0a5. Fixes #1599
      Make zope.interface/event and setuptools required dependencies.
      Update to libuv 1.38
      Try configuring with bash.
      win/udp.c is incompatible with Visual Studio 9
      Debugging output for libuv/libuv#2862
      Also automake version.
      Update from Ubuntu 16.04 to 18.04
      Try one line?
      m4/ax_pthread.m4 was being ignored by m4/.gitignore.
      Revert update to bionic.
      Bump c-ares to 1.16.1. Fixes #1624.
      whitespace
      manpage patch
      Restore config.guess/config.sub to c-ares.
      Use latest xcode
      Add libresolv to the libraries needed to link c-ares on macOS.
      distutils cflags may be None, e.g., on Win
      Tweak links in readme. [skip ci]
      Preparing release 20.5.1
      Back to development: 20.5.2
      Try to only upload gevent wheels on appveyor.
      Catch greenlet.error when destroying a hub from another thread.
      Attempt better warnings handling.
      default action should be first.
      Don't get a warning about ResourceWarning on Py2.
      Preparing release 20.5.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants