Array export and mass incref/decref #15

encukou · 2023-05-10T09:51:31Z

For a stable ABI -- of any flavor -- incref and decref will need to be functions, and so they'll be relatively expensive. We should allow minimizing their usage.

We're missing ways to:

Export Python containers to an array of PyObject* (strong references)
- Preferably in batches (one by one iteration is too slow, everything at once eats memory)
- Preferably with an option to convert to raw integers
Mass decref an array of PyObject*
Mass incref an array, to pass it on

Discussion in https://discuss.python.org/t/15993 (confusingly mixed with other topics -- please make a new topic rather than reply)

The text was updated successfully, but these errors were encountered:

gvanrossum · 2023-05-16T20:30:12Z

Can you sketch the batching export API?

Separately, what are the use cases for the mass incref/decref operations? When does one encounter an array of PyObject* that all need to be incref'ed?

encukou · 2023-05-17T09:46:19Z

See the linked discussion for use cases & possible solutions. I might summarize when I have more spare cycles.

what are the use cases for the mass incref/decref operations

If we add batch export for containers so users can avoid a call per each item, we need batch decref for the same reason.

If we need decref, IMO we should add incref simply for symmetry. The use cases I saw are pretty niche, but there's not much point arguing their validity :)

vstinner · 2023-07-11T23:38:47Z

Export Python containers to an array of PyObject* (strong references)

While holding a strong reference is appealing in terms of safety, I fear that it can have a big performance overhead.

In issue python/cpython#106593 I propose a different approach: proposed PyObject_AsObjectArray() gives a "read-only" access to an array of objects with borrowed references (the sequence holds references, not the view), but the proposed PyResource API makes sure that the sequence remains alive while the view is being used. The implementation only increases (Py_INCREF) the sequence reference count while the view is used, and then decreases it.

It's a similar idea than another idea of a Py_HOLD_REF() macro which keeps an object valid while it's being accessed: issue python/cpython#99481 But here the PyResource API is more generic and gives the type more freedom on how to make sure that the view remains valid while being used (it can copy memory, increment the refcount of items, etc.).

Preferably in batches (one by one iteration is too slow, everything at once eats memory)

Proposed PyObject_AsObjectArray() creates a whole view at once to then gives a raw access to a PyObject** array in C code for best performance: item = array[index].

Other approaches are discussed in issue python/cpython#106593

encukou · 2023-07-12T10:08:07Z

Resizable containers need to reallocate the underlying buffer, so keeping the container alive isn't enough.
Something like NumPy array, which has a fixed size, can have mutable contents and give you a reasonably safe pointer to its buffer. But a PyList can't.

vstinner · 2023-07-12T10:18:42Z

If we provide a read-only view, if the view contains the whole sequence or just a short slice, we must provide a way to warranty that it's safe to access safely the view under some conditions.

One conditon can be that the loop consuming the view must not modify directly or indirectly the sequence. If there is a risk that the sequence is modified, the safest option is to copy the sequence.

The contraint is that API is used for efficiency. So we need some kind of tradeoffs for performance. Obvisously, there is the safest implementation: always copy the sequence. But you loose the benefits of best performance. Or another safe option is to iterate on the list with PyIter_Next(), but here what is wanted is a direct access to an array of objects to let the compiler does its magic optimizations like vectorization, loop unrolling, etc.

encukou · 2023-07-12T11:13:26Z

Let's add the safe API first. If we need maximum efficiency API with raw access, we should add it as an alternative.

vstinner · 2023-07-12T11:48:55Z

There is no unique API fitting all use cases: #9

In python/cpython#106593 I propose to have an unsafe but efficient view by default, and safe but slow as an opt-in option.

In the majority of use cases, the unsafe option is safe in practice. And that's why people write C code, for performance.

pitrou · 2023-10-16T21:22:49Z

I agree that batching is the way forward to reconcile performance, safety and abstraction.

However I would suggest that batching be done on a caller-specified slice as in this example.

iritkatriel added the theme: object references label May 10, 2023

encukou mentioned this issue Jun 6, 2023

Evolving the Stable ABI to use functions for INCREF, DECREF, TYPE #45

Open

vstinner mentioned this issue Jul 11, 2023

C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array python/cpython#106593

Closed

iritkatriel added the v label Jul 20, 2023

encukou mentioned this issue Oct 23, 2023

C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions python/cpython#111140

Closed

iritkatriel removed the v label Oct 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Array export and mass incref/decref #15

Array export and mass incref/decref #15

encukou commented May 10, 2023

gvanrossum commented May 16, 2023 •

edited

encukou commented May 17, 2023

vstinner commented Jul 11, 2023

encukou commented Jul 12, 2023

vstinner commented Jul 12, 2023

encukou commented Jul 12, 2023

vstinner commented Jul 12, 2023

pitrou commented Oct 16, 2023

Array export and mass incref/decref #15

Array export and mass incref/decref #15

Comments

encukou commented May 10, 2023

gvanrossum commented May 16, 2023 • edited

encukou commented May 17, 2023

vstinner commented Jul 11, 2023

encukou commented Jul 12, 2023

vstinner commented Jul 12, 2023

encukou commented Jul 12, 2023

vstinner commented Jul 12, 2023

pitrou commented Oct 16, 2023

gvanrossum commented May 16, 2023 •

edited