
enhancements overlaypythonmodules

robertwb edited this page · 11 revisions

CEP 513 - pxd overlays on Python modules

  • Status: Idea

This proposal can basically be boiled down to (1) auto cimport and (2) function overloading (covered elsewhere). It will be cleaned up and simplified.

The problem

Currently, users not only have to start adding types to their code; they also have to learn to use the C library. For instance, naive code like this:

import math
cdef double my_sqrt(double x):
   return math.sqrt(x)

ends up being very inefficient because the Python sqrt is called, meaning that conversions back and forth to Python objects must happen. When Cython ships with a standard library, something like this must be done instead:

cimport libc.math # math.h
cdef double my_sqrt(double x):
   return libc.math.sqrt(x)

...meaning that the user must start to learn the C library. (Before Cython starts shipping such a library, the situation is even worse.)

robertwb: I'm not sure it's worse, because there is much more documentation on math.h than there will ever be on "libc.math"

Note that while in many situations requiring the user to get to know the "libc module" is perfectly ok, this is all about gradually adding user-friendliness one function after another.

A possible solution

In addition to the current operation, one is also allowed to use pxd files as Cython "overlays" to Python modules. This does not replace current operation modes but is an additional feature. This means:

  • When an import statement is encountered, Cython searches its path for a pxd file that matches the package (using the same directory structure) and, if one is found, cimports it as well. For instance, if the file os/path.pxd or os/path/__init__.pxd exists in CYTHONPATH (or equivalent), then import os.path automatically cimports this pxd file as well.

robertwb: This is an interesting idea. One drawback may be that any change to a cimported .pxd (direct or indirect) requires a recompile of the module, and if this graph is auto-generated to be maximal it may greatly increase compile times. (In other words, there are times when I want to import even though I could cimport.)

DagSverreSeljebotn: I don't think this is a blocker though -- one can have command-line options or #pragma-like pseudo-functions to disable auto-pxd-importing, and with time automatic precompiled headers. C++ and to some extent C already have this problem and have their solutions.
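
The search rule described above can be sketched in plain Python. This is only an illustration of the proposed lookup, not code from the Cython compiler; the function name find_overlay_pxd and the search_path argument (standing in for CYTHONPATH or its equivalent) are hypothetical.

```python
import os


def find_overlay_pxd(module_name, search_path):
    """Return the overlay .pxd path for a dotted module name, or None.

    Hypothetical helper: search_path stands in for CYTHONPATH (or equivalent).
    """
    parts = module_name.split(".")
    for root in search_path:
        candidates = [
            # e.g. os/path.pxd
            os.path.join(root, *parts) + ".pxd",
            # e.g. os/path/__init__.pxd
            os.path.join(root, *parts, "__init__.pxd"),
        ]
        for candidate in candidates:
            if os.path.isfile(candidate):
                return candidate
    # No overlay found: the import behaves exactly as it does today.
    return None
```

Under this sketch, import os.path would trigger an automatic cimport only when find_overlay_pxd("os.path", ...) returns a path; a command-line switch to disable the feature, as suggested in the discussion, would simply skip the call.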


Such overlays can contain typed overloads of functions present in the Python library. In addition I propose that simply replacing any existing function (by taking and returning object parameters) should be allowed, though not recommended, in order to keep the system simple. Basically, a function is looked up in the pxd first, and then in the Python module.

(Note: cimported pxds already work like this; i.e., any declared extern C functions will shadow existing Python functions. This is pretty much automatic from Cython operation. So this is about allowing inline functions in pxd files.)

Only cdef functions are allowed, and will usually be defined "inline" (might make this mandatory too...*shrug*). For instance, "math.pxd" could look like this:

import libc.math # math.h

cdef inline double sqrt(double x):
   return libc.math.sqrt(x)

and then the first code snippet in the introduction will work as the user expected it to. Any call that doesn't match the signature (which takes a double) will fall through to the sqrt in the Python module instead.
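
The "pxd first, then the Python module" rule can be modelled roughly as follows. In the proposal the decision would be made at compile time from static types; this sketch does it at run time purely for illustration. The names OVERLAY, typed_sqrt and resolve are hypothetical, and typed_sqrt is a stand-in for the inline C version.

```python
import math


def typed_sqrt(x):
    """Stand-in for the cdef inline sqrt(double) declared in math.pxd."""
    return x ** 0.5


# Hypothetical overlay table: (function name, argument types) -> typed version.
OVERLAY = {("sqrt", (float,)): typed_sqrt}


def resolve(module, overlay, name, arg_types):
    """Pick the overlay function if the signature matches, else the module's."""
    typed = overlay.get((name, tuple(arg_types)))
    if typed is not None:
        return typed
    # No matching typed signature: fall back to the ordinary Python function.
    return getattr(module, name)
```

A call whose argument is known to be a double resolves to the typed version, while a call with an untyped object argument still reaches math.sqrt, so existing code keeps its behaviour.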

__builtin__.pxd can contain optimizations for lots of builtins:

cdef inline int max(int a, int b): return a if a >= b else b
cdef inline double max(double a, double b): return a if a >= b else b

# Or, if templates get added to Cython, simply
cdef inline T max(T a, T b) [T]: return a if a >= b else b

Which basically means that, without any explicit fiddling in the Cython compiler core, ordinary uses of the Python min and max functions (given declarations like those above) will end up as C "? :" constructs rather than packing the arguments into Python objects and calling a Python function.


Some support might be added for the same regarding classes.

does not seem to be considered here.

For instance, optimizations for the list type might be done like this in __builtin__.pxd:

import cpython.listobject # Cython distribution of listobject.h

cdef inline int len(list l):
    cdef cpython.listobject.PyListObject raw = l # Might want to think more about this
    return raw.ob_size

typemixin list:
    cdef inline int append(list self, object o):
        return cpython.listobject.PyList_Append(self, o)

Again, a call to a member not found in the typemixin would be looked up the usual way (through the type dict). Also, nothing in the typemixin is considered if it is not safe (the conditions under which it is safe are a subject for [:enhancements/builtins: CEP 507]; basically, one must have a typed variable that has the exact type, without the methods in question being overridden in the dict).

However, classes implemented in pure Python can also have a "Cython optimization layer" in the same way, as a means of providing Cython optimizations when they are used from Cython.

There is a problem here with inheritance. One way to do it is to declare that any functions declared like this will always take precedence (i.e. they will be like "final" methods in Java, except that there are no compiler errors...). The other option is to check for overridden functions in the object first, and if the function is not overridden, call the overlay function. If doing the latter, then the former must be allowed through special syntax, because classes like numpy.ndarray need all the speed they can get (one could have a "notoverridden" keyword or decorator for the method, for instance, meaning that the coder takes the consequences and Cython just generates direct calls).
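
The two options can be contrasted with a small run-time model. This is a sketch under assumptions, not generated Cython code: overlay_append stands in for the direct C call, LoggedList is a hypothetical subclass that overrides append, and the check only compares the attribute found on the object's type with the one on the base type.

```python
def overlay_append(lst, item):
    """Stand-in for the direct C-level call an overlay would generate."""
    list.append(lst, item)


class LoggedList(list):
    """Hypothetical subclass that overrides a method covered by the overlay."""

    def append(self, item):
        super().append(("logged", item))


def is_overridden(obj, base_type, name):
    """True if obj's type supplies its own version of the named method."""
    return getattr(type(obj), name) is not getattr(base_type, name)


def call_checked(obj, base_type, name, overlay_func, *args):
    """Second option: use the overlay only when the method is not overridden."""
    if is_overridden(obj, base_type, name):
        getattr(obj, name)(*args)
        return "python"
    overlay_func(obj, *args)
    return "overlay"


def call_final(obj, overlay_func, *args):
    """First option (or 'notoverridden'): always call the overlay directly."""
    overlay_func(obj, *args)
    return "overlay"
```

With call_checked, a LoggedList keeps its custom behaviour at the cost of one extra check per call; with call_final, the override is silently bypassed, which is the consequence the coder would accept by asking for direct calls.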

In time: Helping type inference

In the same pxd files one could add some hints to help Cython type inference. Below, refers to the Python int object in a type context.


# Make the type system know that len returns an int object
def len(object o)

Note that in the current function overloading proposal "def" cannot be overloaded, only "cdef", so that this doesn't represent an overload, but adding type information to functions already present in the Python libraries.

I fail to find a useful example (so perhaps this really is useless :-)), but what I am thinking is that it could provide some help for type inference, making it possible to know what the type of x = f(g(h(y))) is even if f, g and h are all in the standard Python library (which could in turn provide more optimizations -- knowing that is used opens the way for using a pure C int, even though overflows must be watched...this is a bit unclear, I'll admit. Basic type inference will have to come first.)
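
As a rough illustration of how declared return types could feed inference through a nested call such as f(g(h(y))), here is a toy model built on Python's ast module. The table DECLARED_RETURNS and the function infer_chain are hypothetical; the table plays the role of the hints a pxd overlay would supply, and anything not declared is treated as a plain object.

```python
import ast

# Hypothetical return-type hints, as an overlay pxd might declare them.
DECLARED_RETURNS = {"len": int, "repr": str, "sorted": list}


def _walk(node, trace):
    """Infer the type of a call expression, recording each call visited."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        for arg in node.args:
            _walk(arg, trace)
        inferred = DECLARED_RETURNS.get(node.func.id, object)
        trace.append((node.func.id, inferred))
        return inferred
    # Plain names and anything else are untyped objects in this sketch.
    return object


def infer_chain(source):
    """Return (function name, inferred type) pairs, innermost call first."""
    trace = []
    _walk(ast.parse(source, mode="eval").body, trace)
    return trace
```

For the expression len(sorted(repr(y))), every call in the chain has a declared return type, so the outermost result is known to be an int even though y itself is untyped.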


The advantages to NumPy support from all of this should be quite evident:
  • Finding the C math functions through the normal Python namespace is a big bonus for this audience
  • Also, typemixin could be used in order to mix in the Cython operators for ndarray. numpy.pxd would basically declare "typemixin ndarray" after the pattern above, making it clear that the ndarray object also has lots of Python behaviour that is not optimized.

Not sorted out

The relationship with the underlying struct for builtins and extension types is a bit unclear. Perhaps the "typemixin" should approach what the "struct" keyword gives today in some way.
