enhancements arraytypes

robertwb edited this page Jun 18, 2009 · 14 revisions

C arrays deserve better language support

Note: This CEP should possibly be merged into the Cython array CEP.

Original proposal by Brian Granger.

Things that could be improved:

  • dynamically allocated and deallocated arrays. Example:

    {{{#!python
    def func(size):
        cdef int a[size]
    }}}

    The generated code would call malloc() internally to allocate an array of the requested size and call free() at function exit.

  • array iteration and Python type coercion. Example:

    {{{#!python
    cdef int a[10]
    # fill a
    l = list(a)
    }}}

    This might require some kind of generic iteration support, possibly an extension class that coerces each array entry to the respective Python type.

    One could also support the array type as a more integral part of the language, with

    {{{#!python
    def foo(L):
        cdef int a[len(L)]
        a = L          # unpacking code generated
        for x in a:    # for loop and indexing generated
            print a
        L = a          # python list created and filled
    }}}

    StefanBehnel: while I like the rest, the last assignment should rather be list(a)



  • In functions that hold the GIL (i.e. that are not declared "nogil"), this could use Python's own memory allocation functions instead of a plain system malloc(), as they are usually faster for small amounts of memory.
  • Arrays declared in this manner are allocated at declaration time (which currently means not inside a block) and cannot be resized.
  • The behavior of such arrays is as if they were allocated on the stack. This makes life much simpler, and is consistent with constant-sized arrays, but means they cannot be returned, assigned, etc.
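As a rough illustration of the first bullet, the C code generated for a function containing `cdef int a[size]` might look something like the sketch below. This is not actual Cython output: the function name `sum_first_n`, the error convention (returning -1 where real generated code would raise MemoryError) and the loop bodies are all assumptions made for the example. The point is only that the allocation happens at the declaration and that a single exit path frees the array on both success and failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical C equivalent of
 *     def func(size):
 *         cdef int a[size]
 *         ...
 * sum_first_n is an illustrative name, not something Cython emits. */
int sum_first_n(size_t size, long *sum_out)
{
    int status = -1;   /* pessimistic default: report an error */
    int *a = NULL;
    long sum = 0;
    size_t i;

    assert(sum_out != NULL);

    /* Allocation at the point of declaration; guard against overflow. */
    if (size > (size_t)-1 / sizeof *a)
        goto done;
    a = malloc(size > 0 ? size * sizeof *a : 1);
    if (a == NULL)
        goto done;     /* real generated code would raise MemoryError */

    /* Function body: fill the array, then use it. */
    for (i = 0; i < size; i++)
        a[i] = (int)i;
    for (i = 0; i < size; i++)
        sum += a[i];

    *sum_out = sum;
    status = 0;

done:
    /* Single exit point, so the array is released on every path. */
    free(a);           /* free(NULL) is a no-op */
    return status;
}
```

Calling `sum_first_n(5, &s)` fills the temporary array with 0..4, stores their sum in `s`, and frees the array before returning, which matches the stack-like lifetime described in the last bullet.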

The general consensus is that, especially as we support constant-sized arrays with this syntax, this will be a good thing to have.

See discussion at http://codespeak.net/pipermail/cython-dev/2008-April/000411.html and http://codespeak.net/pipermail/cython-dev/2008-April/000385.html

Array objects

It is natural to want to pass arrays around, assign them to variables, etc. as if they were atomic objects rather than pointers. Currently one has to manually use malloc and free (which is often done incorrectly, especially in the case of error recovery). The obvious way to do this is to tie into the Python memory management system. An example of this is given by Lisandro Dalcin:

{{{#!python
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char*, Py_ssize_t)
    char* PyString_AS_STRING(object)

cdef inline object pyalloc_i(int size, int **i):
    # Allocate room for `size` ints inside a Python string object, so the
    # buffer is released when the last reference to that object goes away.
    if size < 0: size = 0
    cdef Py_ssize_t n = size * sizeof(int)
    cdef object ob = PyString_FromStringAndSize(NULL, n)
    i[0] = <int*> PyString_AS_STRING(ob)
    return ob

def foo(sequence):
    cdef int size = len(sequence)
    cdef int *buf = NULL
    # buf stays valid for as long as tmp is alive
    cdef object tmp = pyalloc_i(size, &buf)
}}}


It may be worthwhile to add such functionality to the language itself, or via a provided extension type (with possible syntactic sugar). The returned object would be a special array type, assignable only to arrays of the same type, refcounted, with fast (1-dimensional) indexing, iteration, etc. It would be much more lightweight (and potentially faster, with fewer dependencies) than using, for example, NumPy. Maybe we could support a fast append too.
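To make the idea of a reference-counted, one-dimensional array object more concrete, here is a minimal C sketch of one possible layout. Everything in it is hypothetical: the `int_carray` struct, its field names and the `int_carray_*` helpers are invented for illustration and are not part of Cython or CPython. A real implementation would more likely be a proper Python extension type so that the interpreter's own reference counting applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: header and elements live in one allocation. */
typedef struct {
    size_t refcount;
    size_t length;
    int data[];        /* C99 flexible array member */
} int_carray;

/* Create a zero-filled array with one reference; NULL on failure. */
int_carray *int_carray_new(size_t length)
{
    int_carray *arr;

    if (length > ((size_t)-1 - sizeof(int_carray)) / sizeof(int))
        return NULL;   /* requested size would overflow */
    arr = calloc(1, sizeof *arr + length * sizeof(int));
    if (arr == NULL)
        return NULL;
    arr->refcount = 1;
    arr->length = length;
    return arr;
}

/* Take an additional reference, e.g. when assigning to another variable. */
int_carray *int_carray_incref(int_carray *arr)
{
    assert(arr != NULL);
    arr->refcount++;
    return arr;
}

/* Drop a reference; returns 1 if the array was freed, 0 otherwise. */
int int_carray_decref(int_carray *arr)
{
    assert(arr != NULL && arr->refcount > 0);
    if (--arr->refcount == 0) {
        free(arr);
        return 1;
    }
    return 0;
}
```

With this layout, indexing is a plain `arr->data[i]` access, and passing or assigning the array only touches the reference count, so the buffer is freed exactly once when the last holder releases it.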


What syntax would we use? In light of the recent buffer interface, what about carray[int] for the type and carray[int](size) as a constructor? Another alternative is [int].


What about:

cdef inline void* __malloc__(int bytes): return sage_malloc(bytes)
cdef inline void __free__(void* ptr): sage_free(ptr)

at the top-level Cython module scope in order to override the default allocator and use sage_malloc instead?
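At the C level, a per-module allocator override could be approximated as in the sketch below. Note the swapped-in technique: the proposal resolves the override at compile time through inline functions, whereas this sketch uses function pointers set at run time, purely to keep the example small. All names (`module_malloc`, `set_module_allocator`, `counting_malloc`, and so on) are hypothetical, and the counting allocator merely stands in for something like sage_malloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hooks; they default to the C library. */
typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

static alloc_fn module_malloc = malloc;
static free_fn module_free = free;

/* Install a replacement allocator pair for this module. */
void set_module_allocator(alloc_fn m, free_fn f)
{
    module_malloc = m;
    module_free = f;
}

/* Stand-in for a custom allocator: counts live blocks. */
long live_blocks = 0;

void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* What array allocation would call instead of malloc()/free() directly. */
int *alloc_int_array(size_t n)
{
    if (n > (size_t)-1 / sizeof(int))
        return NULL;   /* requested size would overflow */
    return module_malloc(n * sizeof(int));
}

void release_int_array(int *a)
{
    module_free(a);
}
```

After `set_module_allocator(counting_malloc, counting_free)`, every array allocated through `alloc_int_array` goes through the replacement pair, which is the effect the module-scope `__malloc__`/`__free__` definitions are meant to achieve.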