Reference Counting Guide
This is a quick guide to the rules for reference counting that I wrote for myself while trying to track down some refcounting issues in the closures code. It's probably also useful to anyone who's trying to modify the parts of Cython that generate reference-counting code.
Of course, we only refcount Python objects. In all of the text below, everything is assumed to be a PyObject *.
Where's the Code?
Most of the reference counting code is concentrated in two places:
- FuncDefNode emits all the code for reference counting that goes at the top or bottom of a C function.
- NameNode does all the decision-making for reference counting around an assignment statement.
What are the reference counting operations?
There are four reference counting operations available in Cython: two correspond to macros from the Python/C API, and two more exist only for the benefit of the refnanny. All of these take a single argument, which should be a non-NULL PyObject *. Each of them also has a variant with an X in the name (such as __Pyx_XDECREF), which is identical except that it accepts NULL as an argument (and does nothing in that case).
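To make the NULL rule concrete, here's a toy model in plain C. It deliberately avoids the real Python headers: ToyObject, toy_incref, toy_decref and toy_xdecref are hypothetical stand-ins for PyObject, __Pyx_INCREF, __Pyx_DECREF and __Pyx_XDECREF, and the toy deallocator only counts how often it runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: a refcount plus a deallocator hook. */
typedef struct ToyObject {
    long refcnt;
    void (*dealloc)(struct ToyObject *self);
} ToyObject;

/* Counts how many times a deallocator ran (for the demo only). */
static int toy_deallocs = 0;

static void toy_dealloc(ToyObject *self) {
    /* A real deallocator may run arbitrary code here, e.g. a __del__ method. */
    (void)self;
    toy_deallocs++;
}

/* Mirrors __Pyx_INCREF: the argument must not be NULL. */
static void toy_incref(ToyObject *obj) {
    assert(obj != NULL);
    obj->refcnt++;
}

/* Mirrors __Pyx_DECREF: the argument must not be NULL, and dropping the
   last reference deallocates the object. */
static void toy_decref(ToyObject *obj) {
    assert(obj != NULL);
    if (--obj->refcnt == 0)
        obj->dealloc(obj);
}

/* Mirrors the X variant (__Pyx_XDECREF): NULL is accepted and ignored. */
static void toy_xdecref(ToyObject *obj) {
    if (obj != NULL)
        toy_decref(obj);
}
```

Passing NULL to toy_xdecref is harmless, while passing it to toy_decref trips the assertion, which is the toy equivalent of crashing in Py_DECREF.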
- __Pyx_INCREF(obj): This corresponds to Py_INCREF in the Python/C API. It simply increments the reference count of obj by 1.
- __Pyx_DECREF(obj): This corresponds to Py_DECREF in the Python/C API. It decrements the reference count of obj by 1. As noted in the Python docs, if the count drops to zero the object is deallocated, and deallocation can execute arbitrary code, e.g. anything in a __del__ method. (This is worth keeping in mind if you're seeing strange issues with control flow after a decref.)
- __Pyx_GOTREF(obj): This tells the refnanny that we now own a reference to obj whose count was incremented by some other function. A common example is creating an object via a tp_new call: the new object is created with a refcount of 1, so we do a __Pyx_GOTREF afterward to tell the refnanny that we own that one reference.
- __Pyx_GIVEREF(obj): This tells the refnanny that one of our references to obj is now owned by someone else, such as another Python object. A return statement is a typical case: the reference owned by our local variable is now owned by whoever received the return value.
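One way to picture what the refnanny checks: for each function it keeps its own tally of the references that function owns, separate from the object's real count. The sketch below is a toy ledger built on that assumption, not the actual refnanny implementation, and every name in it is hypothetical. INCREF and DECREF change both the real count and the tally; GOTREF and GIVEREF change only the tally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only the real reference count. */
typedef struct { long refcnt; } ToyObject;

/* Toy per-function ledger: references this function owns, per object.
   Assumes at most TOY_LEDGER_MAX distinct objects per function. */
#define TOY_LEDGER_MAX 8
typedef struct {
    ToyObject *obj[TOY_LEDGER_MAX];
    long owned[TOY_LEDGER_MAX];
    int errors;   /* e.g. giving away or dropping a reference we never owned */
} ToyLedger;

/* Find (or create) the ledger entry for o. */
static long *ledger_slot(ToyLedger *L, ToyObject *o) {
    for (int i = 0; i < TOY_LEDGER_MAX; i++)
        if (L->obj[i] == o) return &L->owned[i];
    for (int i = 0; i < TOY_LEDGER_MAX; i++)
        if (L->obj[i] == NULL) {
            L->obj[i] = o;
            L->owned[i] = 0;
            return &L->owned[i];
        }
    assert(!"ledger full");
    return NULL;
}

static void ledger_add(ToyLedger *L, ToyObject *o) { (*ledger_slot(L, o))++; }

static void ledger_drop(ToyLedger *L, ToyObject *o) {
    long *n = ledger_slot(L, o);
    if (*n > 0) (*n)--; else L->errors++;
}

/* INCREF/DECREF: our code changes the real count, and the ledger follows. */
static void nanny_incref(ToyLedger *L, ToyObject *o) { o->refcnt++; ledger_add(L, o); }
static void nanny_decref(ToyLedger *L, ToyObject *o) { ledger_drop(L, o); o->refcnt--; }

/* GOTREF/GIVEREF: the real count is left alone; only ownership moves. */
static void nanny_gotref(ToyLedger *L, ToyObject *o)  { ledger_add(L, o); }
static void nanny_giveref(ToyLedger *L, ToyObject *o) { ledger_drop(L, o); }

/* At function exit: references still attributed to the function (leaks). */
static long nanny_outstanding(const ToyLedger *L) {
    long total = 0;
    for (int i = 0; i < TOY_LEDGER_MAX; i++)
        if (L->obj[i] != NULL) total += L->owned[i];
    return total;
}

/* Stand-in for a tp_new call: a new object with a count of 1. */
static ToyObject *toy_new(ToyObject *storage) { storage->refcnt = 1; return storage; }

/* Models  obj = SomeType(); return obj */
static ToyObject *make_and_return(ToyLedger *L, ToyObject *storage) {
    ToyObject *obj = toy_new(storage);
    nanny_gotref(L, obj);    /* tp_new gave us a reference */
    nanny_giveref(L, obj);   /* the caller now owns it */
    return obj;
}
```

In make_and_return the real count stays at 1 throughout; the ledger goes up on the GOTREF and back down on the GIVEREF, so nothing is left outstanding when the function finishes. Giving away a reference the function never owned is flagged as an error.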
When the refnanny is turned off, the INCREF and DECREF calls simply become the corresponding Python/C API calls, and GOTREF and GIVEREF disappear entirely. When the refnanny is turned on, all four become calls into the refnanny, which updates the reference counts it maintains and then calls the Python/C API where necessary.
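The on/off switch can be pictured as two sets of macros chosen at compile time. The flag name TOY_REFNANNY, the macros and the single global tally below are hypothetical simplifications, meant only to show which operations vanish when the refnanny is off.

```c
#include <assert.h>

/* Hypothetical compile-time switch; build with -DTOY_REFNANNY=0 for "off". */
#ifndef TOY_REFNANNY
#define TOY_REFNANNY 1
#endif

typedef struct { long refcnt; } ToyObject;

/* One global tally of references the toy nanny attributes to our code. */
static long nanny_owned = 0;

#if TOY_REFNANNY
/* On: every operation goes through the nanny's bookkeeping, and the two
   counting operations also change the real count. */
#define TOY_INCREF(o)  do { nanny_owned++; (o)->refcnt++; } while (0)
#define TOY_DECREF(o)  do { nanny_owned--; (o)->refcnt--; } while (0)
#define TOY_GOTREF(o)  do { (void)(o); nanny_owned++; } while (0)
#define TOY_GIVEREF(o) do { (void)(o); nanny_owned--; } while (0)
#else
/* Off: plain count changes (cf. Py_INCREF / Py_DECREF); GOTREF and
   GIVEREF expand to nothing. */
#define TOY_INCREF(o)  ((o)->refcnt++)
#define TOY_DECREF(o)  ((o)->refcnt--)
#define TOY_GOTREF(o)  ((void)0)
#define TOY_GIVEREF(o) ((void)0)
#endif

/* Models  obj = SomeType(); y = obj; del y; return obj */
static ToyObject *demo(ToyObject *storage) {
    storage->refcnt = 1;     /* stand-in for tp_new: new object, count 1 */
    TOY_GOTREF(storage);     /* we own that reference */
    TOY_INCREF(storage);     /* a second name bound to the object ... */
    TOY_DECREF(storage);     /* ... which goes away again */
    TOY_GIVEREF(storage);    /* returned: the caller owns the reference */
    return storage;
}
```

Either way the real count ends at 1; only the "on" build does any extra bookkeeping.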
Another way of thinking about the INCREF/DECREF operations vs. the GOTREF/GIVEREF operations is this: if your own code does something that changes the number of references to an object, you're responsible for the matching incref or decref. If instead a call into the Python/C API changes an object's reference count (or who owns a reference) for you, you do a gotref or giveref to tell the refnanny what happened.
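Here's that distinction applied to building a two-element tuple, using a toy container whose setter steals a reference the way PyTuple_SET_ITEM does. The toy types, the helper names and the single nanny_owned tally are hypothetical; the point is which operation goes where.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long refcnt; } ToyObject;

/* Toy container whose setter "steals" a reference, as PyTuple_SET_ITEM
   does: it stores the pointer without touching the count. */
typedef struct { ToyObject *items[2]; } ToyTuple;

static void toy_tuple_set_item(ToyTuple *t, int i, ToyObject *o) { t->items[i] = o; }

/* One global tally of references the toy nanny attributes to our code. */
static long nanny_owned = 0;

static void incref(ToyObject *o)  { o->refcnt++; nanny_owned++; } /* our code adds a reference */
static void gotref(ToyObject *o)  { (void)o; nanny_owned++; }     /* a callee added one for us */
static void giveref(ToyObject *o) { (void)o; nanny_owned--; }     /* someone else takes ours */

/* Stand-in for an API call that returns a new reference. */
static ToyObject *toy_api_new(ToyObject *storage) { storage->refcnt = 1; return storage; }

/* Models  t = (SomeType(), existing) */
static void build_pair(ToyTuple *t, ToyObject *storage, ToyObject *existing) {
    ToyObject *fresh = toy_api_new(storage);
    gotref(fresh);                      /* the API changed the count: tell the nanny */
    giveref(fresh);                     /* the tuple is about to own it */
    toy_tuple_set_item(t, 0, fresh);

    incref(existing);                   /* a new reference our own code creates */
    giveref(existing);                  /* ... handed straight to the tuple */
    toy_tuple_set_item(t, 1, existing);
}
```

The fresh object never needs an incref, because the API call already produced the reference; only the nanny is told. The existing object does need one, because putting it in the tuple creates a new reference that our code is responsible for.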
When should reference counts change?
The general rule is the obvious one: whenever we make a new reference to some Python object, we do an incref, and whenever we set that variable to point to something else, we do a decref. There are reasons to break this rule, most commonly optimization: if we would generate both an incref and a decref, and we don't have to worry about anything suspect happening in between, we can simply eliminate both. In practice, plenty of things can happen in between; here are two standard examples to keep in mind:
- Variables can be set in only one branch of an if statement, or of any other branching construct; in this case, it's usually easier to just include the reference counting ops than to try to determine that it's safe to eliminate them.
- Variables can get reassigned in the body of a loop. In this case we'd often want different behavior in the first iteration of the loop than in later ones, so again it's easier to just include a few extra increfs or decrefs.
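A toy version of the conservative approach, assuming a local that starts out unbound (NULL), takes an incref on every assignment, drops its old value with an X-decref, and gets a final X-decref at the end. The names are hypothetical. Because the same code runs whether or not the branch was taken and on every iteration of the loop, nothing needs to be special-cased.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long refcnt; } ToyObject;

static void incref(ToyObject *o)  { o->refcnt++; }
static void xdecref(ToyObject *o) { if (o != NULL) o->refcnt--; }   /* NULL-tolerant */

/* Assign value to *slot, correct whether or not *slot was already bound.
   The new value is increfed first (so value == *slot is safe), and the
   variable is updated before the old value is released. */
static void assign(ToyObject **slot, ToyObject *value) {
    incref(value);
    ToyObject *old = *slot;
    *slot = value;
    xdecref(old);
}

/* Models:  if flag: x = a
            for v in items: x = v        */
static void body(int flag, ToyObject *a, ToyObject **items, int n) {
    ToyObject *x = NULL;          /* not yet bound */
    if (flag)
        assign(&x, a);            /* may or may not run */
    for (int i = 0; i < n; i++)
        assign(&x, items[i]);     /* identical code on every iteration */
    xdecref(x);                   /* end of function: fine even if x is NULL */
}
```

Updating the variable before releasing the old value matters because, as noted for __Pyx_DECREF, the decref may run arbitrary code, which shouldn't be able to observe the variable still pointing at the object being released.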
Rules for Cython refcounting code generation
Here are the rules for what reference counting code is generated.
- A reference held by the scope object associated with a closure is one we have to count ourselves, and it's owned by that scope object. These references are released by the decrefs in the scope object's own tp_dealloc method.
- Objects created by a constructor (i.e. a call to a tp_new method) come back with a reference count of 1.
- If a variable comes in as an argument, is never reassigned, and is not captured in a closure, then by rights we should incref at the start and decref at the end. However, in this case it's safe to just eliminate both.
- If a variable comes in as an argument, and is reassigned, we emit an incref at the start of the function, and a decref at the point where it's reassigned.
- Local variables get an incref when first used, and a decref at the end. (Local variables that don't get used shouldn't exist.)
- The scope object associated with a closure owns a reference to each of the entries inside it. When first setting up a scope object, we do an incref and a giveref for each of those entries. The incref is for the reference owned by the scope object, which only disappears once the scope object itself is destroyed (i.e. in its tp_dealloc method). Since that happens outside the purview of the refnanny, we also do a giveref, to tell the refnanny not to expect the corresponding decref.
- The rules for entries inside a scope object at the end of a function are a bit trickier. If it's a variable in the body of the function, then there's one reference represented by that name. We want a giveref in this case, since that reference is now owned by the scope object, and the corresponding decref happens when tp_dealloc is called on the scope object. If it came in as an argument, we still need a giveref; and if it came in as an argument and wasn't reassigned, we also need the incref we skipped at the beginning of the function. Quick recap for scope-related objects:
  - local variable: giveref
  - argument, no assignments: incref + giveref
  - argument, reassigned: giveref
  - scope object itself: decref
- It's easy to check whether or not a name gets reassigned. The corresponding entry in the symbol table has a field called self.assignments, which is a list of all assignment statements where it's the left-hand side. So it's reassigned iff
- Yes, you do need to keep accurate reference counts for Py_None. (I remember hearing that this would change in Python 3, but the online docs suggest otherwise. Since Python 3.12, PEP 683 makes None immortal, so its count no longer really changes, but C code is still expected to incref and decref it like any other object.) This is probably easier to work with from the code-generation point of view (i.e. you don't want special cases every time you generate an incref statement), but it strikes me as humorous that cycles get spent reference-counting a unique object. Since a few central methods generate the reference counting statements in the output, we could easily eliminate much of this code in Cython (e.g. all the literal __Pyx_INCREF(Py_None) calls), but increfs on a variable that merely happens to hold None at runtime (e.g. when a is None) would be harder to spot. I doubt this is much of a gain anyway.
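Here's the closure recap run through a toy model, assuming a function f(a1, a2) where a1 is never reassigned, a2 is reassigned, and a local loc is created in the body, with all three names living in the scope object. To keep it small, nothing else holds the scope, so the function's final decref tears it down immediately and its toy tp_dealloc releases the entries outside the nanny's view. All names here are hypothetical, and this only replays the rules listed above rather than Cython's generated code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyObject {
    long refcnt;
    void (*dealloc)(struct ToyObject *self);
} ToyObject;

static long nanny_owned = 0;   /* references the toy nanny attributes to f */

static void plain_dealloc(ToyObject *self) { (void)self; }

static void incref(ToyObject *o)  { o->refcnt++; nanny_owned++; }
static void decref(ToyObject *o)  { nanny_owned--; if (--o->refcnt == 0) o->dealloc(o); }
static void gotref(ToyObject *o)  { (void)o; nanny_owned++; }
static void giveref(ToyObject *o) { (void)o; nanny_owned--; }

/* Hypothetical closure scope holding three captured names. */
typedef struct {
    ToyObject base;            /* first member, so the two pointer types convert */
    ToyObject *a1, *a2, *loc;
} ToyScope;

/* Plays the role of the scope's tp_dealloc: outside the refnanny's view,
   so it changes the real counts directly. */
static void raw_decref(ToyObject *o) { if (--o->refcnt == 0) o->dealloc(o); }
static void scope_dealloc(ToyObject *self) {
    ToyScope *s = (ToyScope *)self;
    raw_decref(s->a1);
    raw_decref(s->a2);
    raw_decref(s->loc);
}

/* Stand-in for tp_new: a new object with a count of 1. */
static ToyObject *toy_new(ToyObject *storage) {
    storage->refcnt = 1;
    storage->dealloc = plain_dealloc;
    return storage;
}

/* Models  def f(a1, a2): loc = SomeType(); a2 = SomeType()
   with a1, a2 and loc captured by some inner closure (not modelled). */
static void f(ToyScope *scope, ToyObject *a1, ToyObject *a2,
              ToyObject *loc_storage, ToyObject *new_a2_storage) {
    toy_new(&scope->base);              /* create the scope object ... */
    scope->base.dealloc = scope_dealloc;
    gotref(&scope->base);               /* ... and own its one reference */

    scope->a1 = a1;                     /* argument, never reassigned: no incref */
    scope->a2 = a2;
    incref(scope->a2);                  /* argument that gets reassigned */

    scope->loc = toy_new(loc_storage);  /* loc = SomeType() */
    gotref(scope->loc);

    ToyObject *old = scope->a2;         /* a2 = SomeType() */
    scope->a2 = toy_new(new_a2_storage);
    gotref(scope->a2);
    decref(old);                        /* decref where it's reassigned */

    /* End of function, following the recap. */
    giveref(scope->loc);                /* local variable: giveref */
    incref(scope->a1);                  /* argument, no assignments: incref ... */
    giveref(scope->a1);                 /* ... + giveref */
    giveref(scope->a2);                 /* argument, reassigned: giveref */
    decref(&scope->base);               /* scope object itself: decref */
}
```

After f returns, both arguments are back at the counts the caller gave them, the two objects created inside f have been freed along with the scope, and the nanny's tally for f is zero.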