DagSverreSeljebotn soc details

DagSverreSeljebotn edited this page May 11, 2008 · 30 revisions
Clone this wiki locally

Implementation plan: Developing Cython for easy NumPy integration

This is the details page for my [:DagSverreSeljebotn/soc: SoC application] which should be read first.

Note: I do not propose to implement everything here fully within the scope of the project, but it was convenient to list the full plan towards well integrated NumPy support on one page. Therefore some items are markes as prerequisites or bonus. The point I plan to leave off will give a good foundation and several valuable features, but will lack a final optimization regarding NumPy performance. Also, should even this prove to be too much, there is a plan B listed seperately.

The main goal is described here: [:enhancements/numpy: Better NumPy integration]

As can be seen in the examples on that page, the code for the easy-to-use approach can without too much trouble be automatically converted to code for the high-performance approach. A string-based preprocessor named PEX (not published) does this, and I plan to spend about one day studying what PEX has done so that I am sure I understand all the issues involved and don't reinvent the wheel. However there are compelling reasons for building analogous functionality directly into Cython without relying on yet another preprocessor. PEX also uses a little intuitive curly-brace syntax for array access to make the string transformation easier.

Support for external C libraries, and for accessing the internals of Python extensions for quicker access, is in Cython done by shipping a pxd-file containing the necesarry declarations for accessing the C code. Accessing numpy efficientely using for instance [http://new.scipy.org/Wiki/Cookbook/Pyrex_and_NumPy: this pxd] is not difficult for somebody who knows a bit about NumPy and C, but it is inconvenient.

The plan is to increase the power of pxd files so that it is possible to ship a more powerful numpy.pxd that contains the necesarry syntax candy and inlineable functions to make NumPy access convenient for the end-user.

For that to happen, the following wishlist items should be implemented in Cython. Note that if time gets short or some unforeseen obstacles crop up there is a plan B outlined at the bottom that I can follow instead (but this is not expected).

I have already contributed a patch to Cython that makes it possible to introduce new functionality through reasonably independent code tree transforms. The patch was well accepted in the community and will be very useful to most of these features. It will make it possible to introduce new functionality to Cython without breaking existing Cython code (http://wiki.cython.org/enhancements/parsetreetransforms).

Note: Any new syntax in the different drafts below are mostly to convey ideas rather than specifying the final syntax; final syntax must be discussed and decided upon with the Cython community.

Prerequisite - 1: Work on parse tree transforms and split type analysis and type coercion

  • [:enhancements/refactorcoercion: Details]

This subtask is well recognized between Cython developers and will hopefully be a joint effort between Cython developers during [http://wiki.sagemath.org/dev1 SAGE Developer days 1] (which I plan to attend). If this subtask cannot be accomplished within that timeframe, plan B will be seriously considered.

Some new features for Cython are best implemented as [:enhancements/parsetreetransforms: code tree transforms], essentially a stand-alone piece of code sits in a pipeline that modifies the code tree to get the desired effect. While this is a very convenient way of introducing certain functionality, there's currently a showstopper: It is not possible to insert new transforms between type analyzis and type coercion.

For NumPy implementation (and, in fact, most interesting transformations) having a transform inserted between these two is crucial, as whatever one does:
  • One must some way or another attach the NumPy-specific functionality to the type of the variable being a NumPy array
  • The information extracted from and put into the array must have support for being coerced.

2. Inlineable code in pxd files

  • [:enhancements/inlineable: Details]

As the NumPy pxd will contain inlineable code, Cython must be changed to allow code in pxd files (for the same reasons that most C and C++ compilers require inlineable code and templates to be declared in header files).

3. Types with arguments, overloading and method templates

  • [:enhancements/typeparameters: Type arguments]
  • [:enhancements/overloading: Function overloading]
  • [:enhancements/methodtemplates: Method templates]

Currently Cython doesn't support the numpy.ndarray(numpy.uint8, 2) syntax mentioned in the [:enhancements/numpy: NumPy/Cython-spec]. For NumPy, the type attributes will be used as a convenient way to make assumptions about the type of the contents in the array and its number of dimensions (which is dynamic for NumPy arrays); this is needed in order to generate efficient C code.

Function overloading is likely to be a prerequisite for method templates (or at least share enough code for it to be wise to tackle it first). Within this project this will definitely be restricted in scope to "cdef" functions (C-only, not callable from Python), which makes the task much more manageable.

Type arguments have many uses, but the one relevant for NumPy is method templates: Allowing the compiler to generate multiple versions of a method for different compile-time arguments. This is one step below full C++ template functionality, in that class/type templates are not supported. Only method templates are wanted for NumPy support as one wants to integrate nicely with the existing Python runtime object for NumPy arrays, and so does not want to create new types for different modes of access. Rather different compile-time views of the same runtime object is wanted.

4. Compile-time operator overloading

  • [:enhancements/operators: Details]

This will be the final Cython language enhancment needed for NumPy. Once this is in place, one can implement the needed operators for NumPy as templates.

5. Write NumPy pxd file

Once these features are in place, NumPy functionality can be implemented as a pxd-file.

The project can be regarded complete with this subtask. The NumPy integration will be fully working. The performance is likely to be a little bit better than the current Python access method, but not yet on the level of raw C access. However one is now in a much better position than before, only the extra steps outlined below are needed to improve the performance (which I hope to have time for within the scope of the project, but might not).

Bonus - 6. Compile-time code unroller

  • [:enhancements/inlining: Details]

This is an optional step that I hope to have time for, but it should not be considered part of the project.

The __getitem__ and __setitem__ operators takes an argument consisting of Python tuples. The main structure of this argument is known compile time and the packing and unpacking into Python tuples can thus be avoided.

This subtask is about creating a transform that does the minimum of compile-time computation and unrolling of some compile-time known bits of functions (the functions must be marked with the @cython.unroll decorator for this to happen). There are other and purer approaches to doing this kind of optimization, but this will provide just enough power to make template methods with different behaviour depending on type arguments collapse into the right version already at compile-time; so it is just what is needed for NumPy.

After this step, each array access will be one or two assembler instructions slower per dimension than the hand-written optimal code, which is deemed acceptable for now. We know that further Cython compiler optimizations to remove this overhead is possible, but there will probably not be time for doing this (if there is then great!)

Possible future work: Removing those last assembler instructions is all about caching certain data lookups outside of any for-loops in the calling code. To do this, inlining of the overloaded operator must happen Cython side; and then one can either mark up the expressions that can be cached and the dependencies of the expressions with a psuedo-function (simpler), or automatically recognize such expressions globally (advanced and depends on (wanted) Cython improvements in other areas).

Plan B: Direct NumPy transformation

This is a backup solution if something in the plan above explodes. The requirements of this approach is only the prerequisite item 1 and type arguments support in the parser (small part of subproject 3).

The backup plan is to write a plugin transformation in the Cython compilation pipeline could look specifically for the use of operators on NumPy variables and rewrite these directly in the parse tree. This is somewhat similar to what PEX is doing, but would remove the inconvenient curly braces syntax and potentially be more solid and powerful as it operates on the parse tree rather than on strings.

This is not very complicated and can be accomplished rather quickly. It is not a preferred solution because it means that the code written will be much more specific to only NumPy, and no new features are added to Cython itself along the way. This will be seriously considered if progress according to the above plan can't be made quickly enough.

Example code

See [:enhancements/numpy: Better NumPy integration] for example code accessing NumPy from Cython.

In the end, NumPy support should be possible by declaring something similar to the following pxd file:

cdef extern from "numpy/arrayobject.h":
    ctypedef class numpy.ndarray [object PyArrayObject]:
        cdef char *data
        cdef int nd
        cdef Py_intptr_t *dimensions
        cdef Py_intptr_t *strides

            cython.types.type dtype
            int nd = 1
            bool checkbounds = True

        @cython.template("rtype", "dtype", "nd", "checkbounds")
        cdef inline rtype __getitem__(numpy.ndarray(dtype, nd, checkbounds) self, object index):
            # This big function will be unrolled/computed into much simpler statements by the
            # unroller at compile time.
            # Not all cases are handled (ellipsis etc.) but it should give an idea.

            if len(index) != nd: raise ValueError("Needs exactly %d indices" % nd)
            cdef int bufidx
            if not isinstance(index, tuple):
                # Access in 1-D mode
                if checkbounds:
                    if index < 0:
                        index = self.dimensions[0] + index
                        if index < 0: raise IndexError("Array index out of range")
                    elif index >= self.dimensions[0]:
                        raise IndexError("Array index out of range")
                bufidx = index
                for subindex, dim in zip(index, range(len(index)):
                    if len(subindex) > 1:
                       ...return a slice through NumPy API...
                        if checkbounds:
                            if subindex < 0:
                                subindex = self.dimensions[dim] + index
                                if subindex < 0: raise IndexError("Array index out of range")
                            elif subindex >= self.dimensions[dim]:
                                raise IndexError("Array index out of range")
                        bufidx += self.strides[dim] * subindex
            return (<dtype*>(self.data + bufidx))[0]

       # Corresponding code for __setitem__