Proposal for a new buffer syntax
- Author: Dag Seljebotn
This proposal is obsolete; please see the Cython array type CEP.
What is the problem?
The current syntax is e.g.
    cdef object[int, mode="fortran"] x
    cdef np.ndarray[double, ndim=2] y
This comes from viewing the buffer syntax as an optimization of the Python [] operator in certain special cases. Any non-optimizable operations are passed to the underlying object. In addition, the type name controls the default access mode ("strided" vs. "indirect").
- **Advantages:**
  - It allows Python/NumPy syntax (except in variable declarations), so one can simply add types to existing code.
- **Disadvantages:**
  - It gives no clear way of bringing about efficient slicing, efficient arithmetic operations, etc. E.g., with

        cdef object[int] a = np.arange(10)
        cdef object[int] b
        b = a[5:] # made efficient
        print b[0] # prints 5

    Now, which Python object does b refer to? The same one as a?

        print b[<object>(0)] # huh, prints 0?

    Or, perhaps None?

        print b.foo() # crashes program

  - Often one forgets to type the indices, or writes the indices in the wrong way (arr[i][j]) or similar, without any warning at all that the code will be slowed down by a factor of hundreds.
  - The mechanism itself is rather crude (it only optimizes one specific case), yet the syntax doesn't show this, and so one gets a "magic" feel to it.
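The question of what a sliced buffer refers to can be illustrated with Python's built-in memoryview, which is only an analogy here and not the Cython buffer type under discussion. In this minimal sketch the slice is a new view onto the same memory, and both views keep the same exporting object alive; the variable names are chosen for illustration.

```python
from array import array

# A typed view over a 10-element int buffer (array.array exports PEP 3118 buffers).
a = memoryview(array("i", range(10)))

# Slicing a memoryview creates a new view onto the same memory; nothing is copied.
b = a[5:]

first = b[0]                  # element 5 of the original buffer
same_owner = b.obj is a.obj   # both views refer to the same exporting object

# A write through the slice is visible through the original view.
b[0] = 50
seen_through_a = a[5]
```

With a view-only type like this, there is no ambiguity about forwarding operations to an underlying object: the view exposes indexing and slicing, and the owner is reachable only explicitly through `obj`.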
The proposed solution would be introducing buffers as a first class native type with a new syntax.
    cdef int[:] buf = obj
    print buf # fast access
    print obj.some_method()
    print buf.some_method() # NOT ALLOWED!
The syntax would embed everything needed to know for optimizing PEP 3118 buffer access without knowing anything about the underlying object type (like NumPy arrays) at all, or allowing operations on the object owning the buffer directly.
PEP 3118 allows for a very wide class of buffer layouts; restricting this is possible in a lot of ways and almost any restriction can give a lot of speedup.
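As a rough illustration of what "everything needed to know" amounts to, the sketch below reads the PEP 3118 metadata of two unrelated exporters through Python's built-in memoryview. The helper name `describe` is hypothetical; the attributes it reads are standard memoryview attributes.

```python
from array import array


def describe(obj):
    """Return the PEP 3118 metadata that is sufficient to index obj's buffer.

    Hypothetical helper for illustration; it never inspects the type of obj.
    """
    with memoryview(obj) as view:
        return {
            "format": view.format,      # element type as a struct-style code
            "itemsize": view.itemsize,  # bytes per element
            "ndim": view.ndim,          # number of dimensions
            "shape": view.shape,        # extent of each dimension
            "strides": view.strides,    # byte step per dimension
            "readonly": view.readonly,  # whether writes are permitted
        }


info_bytes = describe(b"abcd")
info_array = describe(array("d", [1.0, 2.0, 3.0]))
```

A declaration such as int[:] would fix the format and the number of dimensions at compile time, leaving only shape and strides to be read at run time.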
It could work like this. Assume
from cython import strided, contig, full, ptr:
- int[:,:,:] is 3D strided mode.
- int[::strided, ::strided, ::strided] is the same written out in full.
- int[:,:,::1] is 3D C-contiguous mode.
- int[::contig, ::contig, ::strided(1)] is again the same in full.
- int[::1,:,:] is 3D Fortran-contiguous mode.
- int[::full, ::full] is a generic 2D buffer supporting any layout (adds lots of branches...).
- int[::ptr, ::1] is a strided array of pointers to contiguous arrays.
- int[::ptr(1), ::1] is a contiguous array of pointers to contiguous arrays.
Of course, not all of this needs to be supported at once.
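To make the difference between the direct (non-pointer) layouts concrete, here is a minimal Python sketch of the strides that C-contiguous and Fortran-contiguous buffers imply, and of how a shape/strides pair could be classified. All function names are hypothetical and not part of any proposed Cython API; the indirect ("ptr") and "full" modes are not modelled.

```python
def c_strides(shape, itemsize):
    """Byte strides of a C-contiguous buffer: the last axis varies fastest."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def f_strides(shape, itemsize):
    """Byte strides of a Fortran-contiguous buffer: the first axis varies fastest."""
    strides = []
    step = itemsize
    for extent in shape:
        strides.append(step)
        step *= extent
    return tuple(strides)


def classify(shape, strides, itemsize):
    """Name the most restrictive layout that matches the given strides.

    For 1D buffers the C and Fortran layouts coincide; this reports "c_contiguous".
    """
    if tuple(strides) == c_strides(shape, itemsize):
        return "c_contiguous"
    if tuple(strides) == f_strides(shape, itemsize):
        return "fortran_contiguous"
    return "strided"


def byte_offset(index, strides):
    """Byte offset of one element in a direct (non-pointer) strided buffer."""
    return sum(i * s for i, s in zip(index, strides))
```

When the layout is known at compile time, the stride of the contiguous axis is a constant (the item size), so the corresponding multiplication in `byte_offset` can be folded away; that is the kind of speedup a restricted mode can give.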
The existing use case
An alternative to the existing syntax would be code like this:
    from cython import shape # or cython.buffer

    def mysum(int[:] arr):
        cdef int s = 0 # must be initialized before accumulating
        for i in range(shape(arr, 0)):
            s += arr[i]
        return s
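For comparison, the same function can be sketched in plain Python on top of the built-in memoryview. This is only an analogy to the proposed syntax (it performs no compile-time optimization), and the 1D check stands in for what the int[:] declaration would enforce.

```python
from array import array


def mysum(buf):
    """Sum the elements of any object exporting a 1D PEP 3118 buffer."""
    view = memoryview(buf)
    if view.ndim != 1:
        raise ValueError("expected a one-dimensional buffer")
    total = 0
    # view.shape[0] plays the role of shape(arr, 0) in the Cython version.
    for i in range(view.shape[0]):
        total += view[i]
    return total


total_ints = mysum(array("i", [1, 2, 3, 4]))
total_bytes = mysum(b"\x01\x02")
```

The function never looks at the type of its argument; an array.array and a bytes object are handled identically because only the buffer is accessed.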
int[:] is an alternative to
int[:,:,:] is three-dimensional and so on. This makes a clear distinction from the C array syntax and it looks more Pythonic. Also it is within the Python grammar.
int[:] accesses only the buffer, not the corresponding Python object. Coercion from an object acquires a buffer view, while coercion to an object is disallowed in earlier Python versions and gives a standard Python memoryview in newer versions (backports could also be done, though e.g. a __frombuffer__ operator in numpy.pxd for efficient numpy.ndarray(buf) construction would likely work better with less effort).
- Read-only vs. read-write is automatic like today
- Mode is passed as a string, e.g. int[:,:,"fortran"] for a 2D array with Fortran-contiguous ordering. The default mode is "strided"; one can pass "indirect" to get indirect indexing.
- Negative indices can be allowed or disallowed per dimension, e.g. int[:,0:]; this is slightly more featureful than today (disallowing negative wraparound on the second dimension only).
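Two of the points in this list, automatic read-only handling and negative-index wraparound, can be observed on Python's built-in memoryview. The sketch below is an analogy only; it says nothing about how Cython itself would implement either behaviour.

```python
# Read-only vs. read-write follows from the exporter, not from the declaration.
ro = memoryview(b"abc")              # bytes exports a read-only buffer
rw = memoryview(bytearray(b"abc"))   # bytearray exports a writable buffer

rw[0] = ord("x")                     # allowed: the buffer is writable

try:
    ro[0] = ord("x")                 # rejected: the buffer is read-only
    write_rejected = False
except TypeError:
    write_rejected = True

# Negative indices wrap around from the end, which costs a check per access.
last = rw[-1]
```

Disabling wraparound for a dimension, as the proposal allows, would let generated code skip that sign check on every access.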
Main differences from today, in the context of NumPy:
    def f():
        cdef int[:] a = ..., b = ..., c
        c = a + b # would not work before new features are implemented
        c = a[2:110] # would not work before new features are implemented
        print a.flags # nope
int[:] represents only the buffer and not the NumPy array object. Slicing and arithmetic on these