Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear lifetimes of borrowed references #5

Open
encukou opened this issue May 9, 2023 · 14 comments
Open

Unclear lifetimes of borrowed references #5

encukou opened this issue May 9, 2023 · 14 comments

Comments

@encukou
Copy link
Contributor

encukou commented May 9, 2023

Lots of functions return borrowed references, but in many cases it's not clear what's lending the reference, and how to ensure that lender keeps the reference alive as long as necessary.

For example, PyList_GetItem is very tricky to use 100% correctly -- it's relatively obvious that the list is the lender (so you need to keep it around), but it's not obvious that arbitrary Python code (like a finalizer) could clear the list and drop the reference.

@markshannon
Copy link

To me reading, this implies that borrowed references can be safely returned, and that we just need some additional mechanism/documentation to make them easier to use safely.

I think it would be simpler to say that borrowed references returned from functions cannot be safe in general, so should be treated as always unsafe.

@encukou
Copy link
Contributor Author

encukou commented May 10, 2023

I agree (with caveats), but that's a solution rather than the problem. This repo is for problem statements.

@markshannon
Copy link

Returning borrowed references is, in general, unsafe. So it is a problem that we return them.

What I'm saying is that the problem of unclear lifetimes is more simply stated as the more general problem that returning borrowed references is unsafe.
Unclear lifetimes is one example of why it is unsafe, but there are others, e.g. consistency and lack of ownership.

@encukou
Copy link
Contributor Author

encukou commented May 10, 2023

Could you file a separate issue (or more)?
It's perfectly fine to borrow a tuple element in current CPython. Or, extending the discussion to memory in general, the utf8 buffer from a string. Both may make future changes harder, but since borrowing is faster, people want it quite strongly.

@gvanrossum
Copy link

I'm with Mark here. While initially it appears convenient to return a borrowed value, the bugs it can cause are too subtle, and we've fixed many bugs in CPython core related to this over the years.

@markshannon
Copy link

since borrowing is faster, people want it quite strongly.

Let's correct this to "since borrowing is believed to be faster, people want it quite strongly."

Performance is a subtle thing. Consistency makes for easier reasoning about behavior, which makes for easier optimization, which makes for better performance.
Regular, consistent behavior (from the VM's point of view) is often what gives the best performance.

@encukou
Copy link
Contributor Author

encukou commented May 17, 2023

We could maybe¹⁾ also correct it to “since borrowing is currently faster, people want it quite strongly.”

In the long run, you're right. But people want speed now, and will complain if you take things away without an immediate/certain replacement.

¹⁾ I haven't run the benchmarks so I might be wrong.

@vstinner
Copy link
Contributor

IMO we should design a migration plan to move away from borrowed references. The goal would be to get ride of them by default in the long term. A C extension should be able to define a macro which hides any function returning borrowed references.

The number of affected functions is big, and some of them are widely used like Py_TYPE() or PyDict_GetItem(). A first item would be to ease formatting an object type name in an error message without having to use the common Py_TYPE(obj)->tp_name pattern. I made a change related to that but it got reverted.

Also I failed to find a generic replacement for "get n-th item" API where the index is a C integer: PyTuple_GetItem() and PyList_GetItem() are only for tuple or list, and return a borrowed reference. The HPy API has a "GetItemi" API for that if i recall correctly.

@gvanrossum
Copy link

This might also deserve a separate issue:

Also I failed to find a generic replacement for "get n-th item" API where the index is a C integer: PyTuple_GetItem() and PyList_GetItem() are only for tuple or list, and return a borrowed reference. The HPy API has a "GetItemi" API for that if i recall correctly.

See #30 (comment), where I call such APIs convenient. We might want a generic PyObject_GetItemIndex that takes a sequence and an integer index and returns a new reference to the corresponding item. (There are some additional design questions.)

@davidhewitt
Copy link

We might want a generic PyObject_GetItemIndex that takes a sequence and an integer index and returns a new reference to the corresponding item. (There are some additional design questions.)

What would the difference be between PyObject_GetItemIndex and the existing PySequence_GetItem ?

@gvanrossum
Copy link

What would the difference be between PyObject_GetItemIndex and the existing PySequence_GetItem ?

Nothing. Sorry, I thought the latter would take a PyLong argument but it just takes a C integer. I'm guessing @vstinner (who originated this a few comments above) forgot about PySequence_GetItem too.

@vstinner
Copy link
Contributor

Oh right, there is already PySequence_GetItem(). I forgot it.

@vstinner
Copy link
Contributor

See also Function must not return a pointer to content without an explicit resource management: issue #57.

@iritkatriel iritkatriel added the v label Jul 20, 2023
@encukou
Copy link
Contributor Author

encukou commented Oct 23, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants