New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference stealing #2722

Open
jakirkham opened this Issue Nov 16, 2018 · 2 comments

Comments

Projects
None yet
3 participants
@jakirkham
Contributor

jakirkham commented Nov 16, 2018

Opening this issue to discuss how Cython handles reference stealing. Have noted that some functions account for this by returning PyObject* instead of object. However with arguments this strategy doesn't seem to work correctly. Namely the PyObject* will still result in the object's reference count decremented and may require an explicit cast. Given this, am raising this issue to figure out how we should handle cases where the reference is stolen. This comes up with some Python and NumPy API functions that have this behavior (e.g. in-place operations and macros).

@scoder

This comment has been minimized.

Contributor

scoder commented Nov 16, 2018

Good question. :) Let me collect a couple of thoughts on this.

First of all, regarding the examples at hand, the obvious solution for users is to not use PyList_SetItem(lst, i, obj) but lst[i] = obj. For a typed list, Cython can optimise the latter but not the first (so the assignment is likely faster than the C-API call), and the code readability also suffers a lot from such misoptimisations.

Then, it feels to me like this is not really a bug in Cython, because Cython does the right thing, seen from the point of the caller. It's the function itself that misbehaves, in a way. It is still a usability issue, because Cython can't help with this kind of contract and does not guard against accidental misuse.

Changing the argument type would be a signal to the user that something unusual is going on, and that it's up to the user to take care of it. It's not perfect, because PyObject* is just a clumsy way to say object, and users could just replace func(obj) with func(<PyObject*>obj) to make it compile again, but not a single bit safer. It would raise the bar, at least. But it's also something that will be difficult to get away from again if we later decide that we want Cython to be more helpful.

But then, what could Cython do? Automatically increase the reference count of the argument when it gets passed? That would go against the intention of these functions, which specifically want to avoid useless reference counting.

What Cython could do is to set the reference to NULL after the call (without decrefing it) and then generating NULL checking code wherever necessary when the variable is used. Note that the call might happen inside of a loop, if-block, try-except/finally and whatnot, and the variable might also be part of a closure, so further usage might happen in various places. We have a mechanism for this, it's used for closure variables, since they may or may not have been initialised in other places that use it. But it also increases the runtime overhead, meaning that calling a function on a variable could introduce totally unpredictable slow downs in other parts of the code. Not great at all.

Personally, I think we'd be better off entirely without such functions. They are just too error prone by design.

I'd be happy to hear other thoughts on this, but could live with just raising the bar for users of such functions by requiring an explicit cast to make them aware that they are on their own.

@robertwb

This comment has been minimized.

Contributor

robertwb commented Nov 16, 2018

I agree that all reference-stealing function should be marked with a large warning.

The problem is that there's not really a "correct" way to indicate this with argument types as the way to "properly" use these functions is all about what you do (or don't do) after passing the argument. This makes it hard (impossible?) to check via types.

It's worth noting that there is a certain convenience here for writing against the raw C API, as one can pass new references to these functions without worrying about cleaning them up. Increfing before the call might not be that bad (and try to optimize it away if it matters).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment