enhancements array
- Status: Main idea accepted; details open for tuning
- Implementation started in Kurt Smith's Google Summer of Code: https://github.com/cython/cython/tree/gsoc-kurt-regraft
PEP-3118 introduces a standard way of accessing binary array-like data of Python objects in a variety of layouts: Strided data, indirect data (pointers-to-arrays), and so on.
We provide native language support to access such buffers. This also provides a nice builtin array type in Cython, as well as a unified way of accessing data in Python objects, C pointers, and builtin arrays.
Examples:
```
cdef int[:, :] x
x = np.zeros((10, 10), dtype=np.intc)  # acquire from Python object
x = int[:10, :10]()  # create instance of builtin array type; we do our own non-GIL refcounting for this
```
We also allow casting from C pointers, although we need to insert a bit of shape/stride information:
```
x = <int[:10:1, :10:10]>malloc(10 * 10 * sizeof(int))
...
free(x.data); x = None  # both needed!
```
No matter how `x` is assigned, it can be accessed through the same native code. Under the hood, `x` consists of a struct with information about the pointer, strides, shape, access mode, etc.
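For comparison, Python's built-in `memoryview` already carries the same PEP-3118 metadata that this struct unpacks. A small plain-Python illustration (not Cython):

```python
# Python's built-in memoryview exposes the same PEP-3118 metadata
# (data pointer, ndim, shape, strides) that the generated struct unpacks.
raw = bytes(range(12))                 # 12 contiguous bytes
m = memoryview(raw).cast('B', (3, 4))  # view them as a 3x4 array

print(m.ndim)     # 2
print(m.shape)    # (3, 4)
print(m.strides)  # (4, 1) -- strides are counted in bytes
print(m[2, 3])    # 11: row 2, column 3 of the flat buffer
```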
- Advantages:
  - Unifies access to array-like data across C and Python -- the same compiled code can operate on both
  - Convenient way to get a native array type and get rid of malloc/free for native code
  - Cleaner interface for computational work than the current `np.ndarray[int]`
  - Necessary for efficient slicing and SIMD operations

Note: Throughout this document we use `int` as a placeholder for any primitive type or struct.
- Can be assigned to `None` in the same way as cdef classes.
- Fast indexing: `x[3, 4]`
- Fast non-GIL slicing: `y = x[3:10, None, :]`. Passing a `None` slice has the same effect as in NumPy; a length-1 dimension is added.
- Convenient copying emits an optimal loop: `x[...] = y`. With SIMD we could go beyond this.
- Fields: `x.ndim`, `x.shape`, `x.strides`, `x.suboffsets` export information from PEP-3118. Strides are given in number of bytes. `x.object` returns the PEP-3118 object that holds a refcount to the memory. However, `x` may be a sub-slice of the buffer returned, or `None` in the case of pointing to a C pointer or the builtin array type.
- Access to the base data pointer: `x.data`, which has type `char*`. The reason for making this `char*` is that strides are in bytes and do not have to be a multiple of the size of the base type, so pointer arithmetic is in general a bad idea and one needs to think.
- If the buffer has a corresponding "sane" C pointer type, `x.typeddata` is available. Examples:

```
cdef int[::1] x              # x.typeddata has type int*
cdef int[::1, ::follow] x    # x.typeddata has type int* (strides guaranteed to be a multiple of sizeof(int))
cdef int[::indirect, ::1] x  # x.typeddata has type int**

# But neither of these have typeddata, due to the presence of strides:
cdef int[:] x
cdef int[::indirect, :] x
```
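Python's built-in `memoryview` exposes an analogous distinction through its contiguity flags: only in the `::1` case does the stride equal the item size. A plain-Python illustration:

```python
from array import array

m = memoryview(array('i', range(10)))
print(m.c_contiguous)                  # True: the ::1 case, stride == itemsize

s = m[::2]                             # take every other element
print(s.c_contiguous)                  # False: now merely strided
print(s.strides[0] == 2 * s.itemsize)  # True: the stride grew to two items
```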
All operations honor the `@wraparound` and `@boundscheck` directives. They can operate in nogil mode unless otherwise noted; if bounds-checking needs to raise an exception, the GIL will be acquired using `with gil`.
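The index semantics these directives control can be sketched in plain Python (the helper name is hypothetical):

```python
def wraparound_index(i, n):
    """Hypothetical helper: normalize an index the way @wraparound
    allows, then check it the way @boundscheck does."""
    if i < 0:
        i += n          # wraparound: x[-1] means x[n - 1]
    if not 0 <= i < n:  # boundscheck: raise rather than read out of bounds
        raise IndexError("index out of bounds: %d" % i)
    return i

print(wraparound_index(-1, 10))  # 9
print(wraparound_index(3, 10))   # 3
```

With both directives disabled, the generated code skips the two `if` branches entirely, which is what makes indexing in tight loops fast.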
Resizing might never happen, as PEP-3118 does not support it (though we could make custom extensions to the API...). But we could provide utility functions to concatenate arrays etc. into new buffers.
It seems clear that we want to have full flexibility in passing typed memory views around the code. One should be free to say
```
cdef class A:
    cdef int[:] x

    def f(self, obj):
        cdef int[:] buf = obj
        self.x = buf[:10]
```
and this should be very efficient:
```
cdef f(int[:] x): ...
```
To make slices efficient, these don't enter Python space -- `obj` will never know that a slice was taken.
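The behavior described mirrors how Python's built-in `memoryview` slices: the result is a window onto the same memory, and the owning object is never consulted again. A small illustration:

```python
from array import array

a = array('i', range(10))
view = memoryview(a)
sub = view[2:6]    # slicing the view never re-enters the owner object
a[3] = 99          # mutate the underlying buffer directly
print(sub[1])      # 99 -- the slice is a window onto the same memory, not a copy
```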
Since we need to call `PyBuffer_Release` once our last reference disappears, we need our own reference counting -- or acquisition counting, to keep the two apart.
The levels are then:
- An object containing the `Py_buffer` as well as an acquisition count. This is allocated on the heap and acquisition counted, and `PyBuffer_Release` is called when done. In the current implementation this is a Cython cdef class.
- A stack/field-allocated struct `__pyx_memviewslice` that contains the unpacked shape, strides and buffer pointer, possibly re-sliced since acquisition. This is copied by value (taking care to update the acquisition count in the base object) when passing it to somebody else.
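The acquisition-counted level can be sketched in plain Python. This is purely illustrative: the class name and the release callback are hypothetical stand-ins for the real cdef class holding a `Py_buffer` and calling `PyBuffer_Release`:

```python
class AcquisitionCounted:
    """Hypothetical sketch of the heap-allocated level: it owns the
    Py_buffer (modeled here as a release callback) plus a count."""

    def __init__(self, release):
        self._release = release
        self.count = 1            # the initial acquisition

    def acquire(self):
        self.count += 1           # copying a slice bumps the count

    def release(self):
        self.count -= 1
        if self.count == 0:
            self._release()       # last holder triggers PyBuffer_Release

released = []
base = AcquisitionCounted(lambda: released.append(True))
base.acquire()    # a second slice of the same buffer
base.release()    # one slice goes away: buffer must stay alive
print(released)   # []
base.release()    # the last slice goes away
print(released)   # [True]
```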
This remains to be hashed out. We want to support reassigning slice objects in situations such as
```
cdef int[:] obj = ...
cdef int[:] curslice
with nogil:
    for i in ...:
        curslice = obj[i:i+10]
```
Since `curslice` could hold our sole reference to a `Py_buffer`, it may be necessary to call `PyBuffer_Release`. One option is to automatically acquire the GIL in such cases, which are rather unlikely to crop up in user code. Another is to only allow this if we can infer that it is GIL-safe (because `curslice` was assigned to `None` above).
If we forget about Python buffers and only consider the builtin Cython array type, it would be very nice to allow using it in nogil mode without restrictions. We are then forced to do our own locking (one way or another) for the acquisition count; for instance, it may be common for multiple threads to take separate slices off the same master view. Still, a custom lock for this purpose is probably a lot faster than acquiring the GIL.
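As an illustration of that idea, here is a minimal Python sketch of a lock-protected acquisition count. The names are hypothetical, and a real implementation would use a cheap native lock or atomic operations rather than Python-level threading:

```python
import threading

class LockedCount:
    """Illustrative only: a private lock guarding the acquisition count,
    standing in for a cheap native lock (or atomics) in generated C."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 1

    def incref(self):
        with self._lock:
            self.count += 1

    def decref(self):
        with self._lock:
            self.count -= 1
            return self.count == 0  # True tells the caller to release the buffer

c = LockedCount()
# e.g. several threads taking separate slices off the same master view:
threads = [threading.Thread(target=c.incref) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.count)  # 9: the initial acquisition plus eight slices
```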
A typed memory view should coerce to either a Python `memoryview` or a more powerful Cython equivalent. We should not coerce back to the original object, because we may have re-sliced it. E.g., one is required to do `np.asarray(mytypedview)` to get a NumPy array.
The number of dimensions is, at least for the time being, hard-coded (we could use an ellipsis, `int[...]`, to support arbitrary-dimensional arrays). Each dimension is given by a slice, which can specify the data access mode.

Assume `from cython.view import contig, indirect, strided, follow`.
Examples:
- First dimension is contiguous: `int[::contig, :]` or `int[::1, :]`
- Fortran-contiguous data (that is, `x.strides[1] == x.shape[0] * sizeof(int)`): `int[::1, ::follow]`
- `int**`-style "ragged" arrays with a contiguous pointer-array and contiguous data-array (although the shape must be rectangular, at least according to PEP-3118): `int[::indirect & contig, ::1]`
- `int**`-style arrays where both the pointer-array and the data-array are strided: `int[::indirect, :]`
- A 2D strided array containing pointers to single values: `int[:, ::indirect]`
- There are two orthogonal questions for each dimension:
  - Does the array (along the dimension in question) contain the data (/the rest of the array), or a pointer to the data?
  - Is the array strided or not?
We can, e.g., have a strided array of pointers, a strided array of values, and so on.
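The direct-vs-indirect distinction can be mimicked in plain Python, using a flat list for the direct (strided) layout and nested lists for the indirect, `int**`-style layout (the helper names are hypothetical):

```python
# Direct layout: one flat block of values; strides do the addressing.
direct = list(range(12))            # a 3x4 array, row stride 4

def get_direct(i, j):
    return direct[i * 4 + j]        # address arithmetic only

# Indirect layout: the first dimension holds pointers (here, sublists)
# to separate data blocks -- the int**-style arrangement.
indirect = [list(range(r * 4, r * 4 + 4)) for r in range(3)]

def get_indirect(i, j):
    return indirect[i][j]           # one extra dereference through the pointer

print(get_direct(2, 3), get_indirect(2, 3))  # 11 11
```

Either layout can additionally be strided or contiguous along each dimension, which is why the two questions are orthogonal.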
Current state: there's `contig`, `strided`, `follow`, `direct`, `indirect`, and `full`. Of these, `full` means to determine `direct` vs. `indirect` at runtime, while `strided` is a superset of `contig`.
These can then be combined using the `&` operator.
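One way to model this combination in plain Python (purely illustrative; the real constants live in `cython.view`) is to treat each constant as a set of per-dimension properties whose `&` demands both:

```python
class AccessFlag(frozenset):
    """Purely illustrative model: each constant is a set of per-dimension
    properties, and `a & b` means the dimension has both."""

    def __and__(self, other):
        return AccessFlag(frozenset(self) | frozenset(other))

contig = AccessFlag({'contig'})
indirect = AccessFlag({'indirect'})

combined = indirect & contig  # as in int[::indirect & contig, ::1]
print(sorted(combined))       # ['contig', 'indirect']
```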
In some ways, this is less than ideal. Perhaps we can make sane constants for every possible combination...
The syntax is chosen partly because `cython.int[:, :]` can work in pure Python mode. This could, e.g., be a wrapper around a NumPy array, or we could fairly easily write something in Cython to do the same, if we can accept a Cython-compiled shadow module. Obviously there would be no speed benefit, but it would be nice if one could write optionally compiled pure Python code.
The `np.ndarray[int]` support could either be kept fully or deprecated with a rather long deprecation cycle. Either way, it is better reimplemented as syntactic sugar on top of passing around both a typed memory view and an associated object. For `cdef` functions we could turn an argument of this type into two arguments.