
tutorials simplecdef

ChrisBarker edited this page Jul 8, 2013 · 2 revisions

Writing a simple cdef function

A cdef function is a function in Cython that is only callable from Cython code (not from Python). cdef functions can be used to write small functions that run at C speed, because they are compiled to plain C functions.

After looking at the profiling example in the docs:

http://docs.cython.org/src/tutorial/profiling_tutorial.html

I tried to see how close I could get to a "pure" C function; that is, I looked at the generated code and tried to get as little Python/Cython code in there as possible. First, the punchline:

You probably don't need to bother: Cython and the C compiler do a pretty good job right off the bat. Some lessons:

  • If you're looking at the generated code (which can be useful for experts), textual length is a poor indicator of actual runtime overhead, since much of the auto-generated boilerplate compiles away or costs very little.
  • Don't worry about getting rid of every little bit of extra code in cdef functions.
  • It's actually easier to get top performance (e.g. compiler inlining) if you write small cdef functions rather than calling external C ones.

The example

This is the example from the profiling tutorial:

cdef double recip_square1(double i):
    return 1./(i*i)

Simple and straightforward, but Cython generates more boilerplate for it than you'd expect (or at least than I expected), so I thought I'd try to clean that out.

C division

Python and C have different division rules in some cases: integer division and modulo round differently with negative operands, and Python raises an exception on division by zero where C does not check. Cython injects some code into the C so that you get the same results from Cython as you do from Python. This adds some boilerplate to the generated code (and, I thought, maybe a performance hit), so I tried turning that off with the cdivision directive:
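To make the difference concrete, here is a small Python sketch (an addition, not from the original page). c_int_div and c_int_mod are hypothetical helpers that emulate C99's truncating integer division and remainder, shown next to Python's floor semantics. For the double-precision recip_square functions on this page, the difference that matters is the zero check: without cdivision, Cython has to test for a zero divisor so it can raise ZeroDivisionError the way Python does.

```python
import math

def c_int_div(a, b):
    """Hypothetical helper: emulate C99 integer '/', which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def c_int_mod(a, b):
    """Hypothetical helper: emulate C99 integer '%'; the result takes the sign of the dividend."""
    return a - b * c_int_div(a, b)

print(-7 // 2, c_int_div(-7, 2))          # -4 -3   (Python floors, C truncates)
print(-7 % 2, c_int_mod(-7, 2))           # 1 -1
print(-7.0 % 2.0, math.fmod(-7.0, 2.0))   # 1.0 -1.0 (math.fmod follows C semantics)

try:
    1.0 / 0.0  # C on IEEE 754 hardware would typically give inf here instead
except ZeroDivisionError as exc:
    print("Python raises ZeroDivisionError:", exc)
```

This is the kind of Python-compatible behavior that the cdivision(True) directive tells Cython to skip.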

cimport cython

## second version: turn on cdivision
@cython.cdivision(True)
cdef inline double recip_square2(double i):
    return 1./(i*i)

Indeed, this results in very clean generated C code:

static CYTHON_INLINE double __pyx_f_10calc_pi_cy_recip_square2(double __pyx_v_i) {
  double __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("recip_square2", 0);

  __pyx_r = (1. / (__pyx_v_i * __pyx_v_i));
  goto __pyx_L0;

  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

Note that the __Pyx_RefNanny lines are macros that expand to nothing unless Cython's "refnanny" reference-count debugging is enabled, so they essentially go away when compiled.

For comparison, I also wrote an external function in C and called that from Cython (recip_square3.c):

// pure C function for recip_square3
double recip_square3(double i) {
    return 1. / (i*i);
}

(recip_square3.h):

// header for pure C function for recip_square3
double recip_square3(double i);

and the declaration in Cython:

## third version: call an external C function
cdef extern from "recip_square3.h":
    double recip_square3 (double i)

I also tried turning inlining on and off, and hand-inlining the code directly in the Cython loop. Here are all the versions I tried (calc_pi_cy.pyx):

# File: calc_pi_cy.pyx
#
# Test of making a "pure C" cdef function
#
# Borrowed from examples given in:
#
#  http://docs.cython.org/src/tutorial/profiling_tutorial.html
#
# Chris Barker: Chris.Barker@noaa.gov
# July 1, 2013

cimport cython

## first version: simple inlined cdef
cdef inline double recip_square1(double i):
    return 1./(i*i)

def approx_pi1(int n):
    cdef int k
    cdef double val = 0.

    for k in range(1, n+1):
        val += recip_square1( k )

    return (6 * val)**0.5

## second version: turn on cdivision
@cython.cdivision(True)
cdef inline double recip_square2(double i):
    return 1./(i*i)

def approx_pi2(int n):
    cdef int k
    cdef double val = 0.

    for k in range(1, n+1):
        val += recip_square2( k )

    return (6 * val)**0.5


## third version: call an external C function
cdef extern from "recip_square3.h":
    double recip_square3 (double i)

def approx_pi3(int n):
    cdef int k
    cdef double val = 0.

    for k in range(1, n+1):
        val += recip_square3( k )

    return (6 * val)**0.5


## fourth version: completely inline the function in cython

def approx_pi4(int n):
    cdef int k
    cdef double val = 0.

    for k in range(1, n+1):
        val += 1./(<double>k * <double>k)

    return (6 * val)**0.5


## fifth version: regular cdef, no inline, cdivision
@cython.cdivision(True)
cdef double recip_square5(double i):
    return 1./(i*i)

def approx_pi5(int n):
    cdef int k
    cdef double val = 0.

    for k in range(1, n+1):
        val += recip_square5( k )

    return (6 * val)**0.5

And some timing code (time_calc_pi.py):

#!/usr/bin/env python

"""
timing script for calc_pi examples
"""

from __future__ import print_function  # lets the same script run on Python 2.7 and 3

import timeit

from calc_pi_cy import *

N = 100000

def timer(version):
    time_number = 1000
    seconds = timeit.timeit("approx_pi%i(N)" % version,
                            number=time_number,
                            setup="from __main__ import approx_pi%i, N" % version)
    print(seconds, "seconds")

for i in range(1, 6):
    print("cython version %i:" % i)
    timer(i)
    # and test the result (look the function up by name instead of using eval):
    print(globals()["approx_pi%i" % i](N))
    print()
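The timing script also prints each version's result. As an independent reference value (an addition here, not part of the original page), below is a minimal pure-Python sketch of the same computation; approx_pi_py is a hypothetical name. It sums the Basel series, where the sum of 1/k² over all k is π²/6, so sqrt(6·sum) approaches π; dropping the tail after N terms (about 1/N) leaves a truncation error of roughly 3/(πN).

```python
import math

def approx_pi_py(n):
    """Pure-Python version of approx_piN: sqrt(6 * sum of 1/k**2 for k = 1..n)."""
    val = 0.0
    for k in range(1, n + 1):
        # k*k is an exact int at these sizes, so each term equals the C double result
        val += 1.0 / (k * k)
    return (6 * val) ** 0.5

N = 100000
approx = approx_pi_py(N)
print(approx)                 # compare with the value each Cython version prints
print(math.pi - approx)       # actual truncation error
print(3 / (math.pi * N))      # first-order estimate of that error
```

On an IEEE 754 platform this should agree with the value the Cython versions print to the digits shown, and the last two lines should both come out around 9.5e-6.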

The Results!

Here is a run of the timing code:

$ ./time_calc_pi.py
cython version 1:
0.410186052322 seconds
3.14158310433

cython version 2:
0.404766082764 seconds
3.14158310433

cython version 3:
1.44341897964 seconds
3.14158310433

cython version 4:
0.403806209564 seconds
3.14158310433

cython version 5:
0.403886079788 seconds
3.14158310433

So they all take essentially the same amount of time to run, except version 3, which takes a LOT longer (roughly three and a half times as long). Version 3 is the one that calls an external C function. I haven't looked at the compiler output to confirm it, but I'm pretty sure what's happening is that the compiler can auto-inline this simple function when it's all in the same C module; when calling an external C function compiled in a separate file, it can't inline it, so you pay the C function-call overhead on every iteration. That overhead is small, but it adds up when the function itself does so little work.
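A back-of-the-envelope check on that explanation, using the numbers from the run above (an addition, not from the original page): each timing covers 1000 runs of approx_piN(100000), so version 3 makes 10^8 calls to recip_square3. If you attribute all of version 3's extra time to those calls (an assumption; losing inlining can also block other optimizations), the overhead works out to roughly 10 ns per call.

```python
# Seconds for 1000 runs of approx_piN(100000), copied from the output above.
timings = {1: 0.410186052322, 2: 0.404766082764, 3: 1.44341897964,
           4: 0.403806209564, 5: 0.403886079788}
calls = 1000 * 100000  # version 3 calls recip_square3 once per loop iteration, per run

fastest = min(timings.values())
ratio = timings[3] / fastest
# Assumes *all* of the extra time is call overhead (a rough upper-bound estimate).
extra_ns_per_call = (timings[3] - fastest) / calls * 1e9

print("version 3 takes %.1f times as long as the fastest version" % ratio)
print("about %.1f ns extra per call" % extra_ns_per_call)
```

These figures only describe that one 2013 machine and compiler; other platforms will differ.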

So, the moral of the story (see above): don't bother! Cython and the C compiler do a fine job as it is.

[Note: tested with Cython 0.19.1 and 32-bit Python 2.7 on OS X 10.7 (gcc 4.2).]

I've attached the Cython code, timing code, and a setup.py to build it all. I'd be interested to know if there are differences on other platforms/compilers.
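If you want to report results from another platform, a small Python sketch like the following (an addition here; build_info is a hypothetical helper) collects the same details as the note above. The sysconfig "CC" entry is the default compiler command setuptools uses for extensions on Unix-like systems; it may be None on Windows.

```python
import platform
import struct
import sysconfig

def build_info():
    """Hypothetical helper: Cython version, Python version, bitness, OS and compiler."""
    try:
        import Cython
        cython_version = Cython.__version__
    except ImportError:
        cython_version = "not installed"
    return {
        "cython": cython_version,
        "python": platform.python_version(),
        # pointer size of the running interpreter: 32-bit or 64-bit build
        "bits": "%d-bit" % (struct.calcsize("P") * 8),
        "os": platform.platform(),
        # default C compiler for extensions on Unix-like systems; may be None on Windows
        "compiler": sysconfig.get_config_var("CC"),
    }

for key, value in build_info().items():
    print("%-8s %s" % (key, value))
```

Posting this alongside the timing output makes results from different machines easier to compare.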

A setup.py to build it all (build in place with: python setup.py build_ext --inplace):

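#!/usr/bin/env python

# setuptools replaces distutils, which was removed from the standard library in Python 3.12
from setuptools import setup, Extension
from Cython.Build import cythonize

setup(
    ext_modules=cythonize([
        # compile the Cython module together with the external C function used by version 3
        Extension('calc_pi_cy', ['calc_pi_cy.pyx', 'recip_square3.c']),
    ])
)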

(All code attached to this page in a zip file)
