tutorials simplecdef
A cdef function is a function in Cython that is only callable from Cython (not Python). Such functions are a way to write small functions that will run at C speed (because they are C functions).
After looking at the profiling example in the docs:
http://docs.cython.org/src/tutorial/profiling_tutorial.html
I tried to see what could be done to get as "pure" a C function as I could, i.e. I looked at the generated code and tried to get as little Python/Cython code in there as possible. First the punchline:
You probably don't need to bother; Cython, and the C compiler, do a pretty good job off the bat. Some lessons:
- If you're looking at the generated code (which can be useful for experts), textual length is a poor indicator of actual runtime overhead for much of the auto-generated boilerplate.
- Don't worry about getting rid of every little bit of extra code in cdef functions.
- It's actually easier to get top performance (i.e. compiler inlining, etc) if you write small cdef functions rather than use external C ones.
This is the example from the profiling tutorial:
cdef double recip_square1(double i):
    return 1./(i*i)
Simple and straightforward -- but there is more Cython boilerplate generated than you'd expect (or than I expected), so I thought I'd try to clean that out.
Python and C have different rules for division in some cases: the rounding of results with negative numbers, raising an exception on division by zero, etc. Cython injects some code into the C so that you'll get the same results from Cython as you do from Python. This adds some boilerplate to the generated code, and I thought maybe a performance hit, so I tried turning that off:
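To make the difference concrete, here is a small pure-Python sketch (written for Python 3) of the two integer-division conventions. The helper names `c_div` and `c_mod` are made up for this illustration; they mimic C99 behavior, where the quotient is truncated toward zero, whereas Python's `//` and `%` round toward negative infinity. The last lines show the other difference: Python raises `ZeroDivisionError` on a zero divisor.

```python
def c_div(a, b):
    """Integer quotient the way C99 computes it: truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def c_mod(a, b):
    """Integer remainder the way C99 computes it: sign follows the dividend."""
    return a - b * c_div(a, b)


# Python floors the quotient; C truncates it.
print(-7 // 2, -7 % 2)              # Python semantics
print(c_div(-7, 2), c_mod(-7, 2))   # C semantics (emulated)

# Python raises on a zero divisor, for floats as well as ints.
try:
    1.0 / 0.0
except ZeroDivisionError as exc:
    print("Python raises:", exc)
```

For the `double` function in this example only the zero-divisor check is relevant; the sign rules matter for integer `//` and `%`.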
## second version: turn on cdivision
@cython.cdivision(True)
cdef inline double recip_square2(double i):
    return 1./(i*i)
Indeed, this results in very clean generated C code:
static CYTHON_INLINE double __pyx_f_10calc_pi_cy_recip_square2(double __pyx_v_i) {
  double __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("recip_square2", 0);
  __pyx_r = (1. / (__pyx_v_i * __pyx_v_i));
  goto __pyx_L0;

  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
Note that the __Pyx_RefNanny items are macros that essentially go away when compiled.
For comparison, I also wrote an external function in C, and called that from Cython (recip_square3.c):
// pure C function for recip_square3
double recip_square3(double i) {
    return 1. / (i*i);
}
(recip_square3.h):
// header for pure C function for recip_square3
double recip_square3(double i);
and the declaration in Cython:
## third version: call an external C function
cdef extern from "recip_square3.h":
    double recip_square3 (double i)
I also tried turning inlining on and off, and hand-inlining the code directly in the Cython. Here are all the versions I tried (calc_pi_cy.pyx):
# File: calc_pi_cy.pyx
#
# Test of making a "pure C" cdef function
#
# Borrowed from examples given in:
#
# http://docs.cython.org/src/tutorial/profiling_tutorial.html
#
# Chris Barker: Chris.Barker@noaa.gov
# July 1, 2013
cimport cython
## first version: simple inlined cdef
cdef inline double recip_square1(double i):
    return 1./(i*i)

def approx_pi1(int n):
    cdef int k
    cdef double val = 0.
    for k in range(1, n+1):
        val += recip_square1( k )
    return (6 * val)**0.5

## second version: turn on cdivision
@cython.cdivision(True)
cdef inline double recip_square2(double i):
    return 1./(i*i)

def approx_pi2(int n):
    cdef int k
    cdef double val = 0.
    for k in range(1, n+1):
        val += recip_square2( k )
    return (6 * val)**0.5

## third version: call an external C function
cdef extern from "recip_square3.h":
    double recip_square3 (double i)

def approx_pi3(int n):
    cdef int k
    cdef double val = 0.
    for k in range(1, n+1):
        val += recip_square3( k )
    return (6 * val)**0.5

## fourth version: completely inline the function in cython
cimport cython
def approx_pi4(int n):
    cdef int k
    cdef double val = 0.
    for k in range(1, n+1):
        val += 1./(<double>k * <double>k)
    return (6 * val)**0.5

## fifth version: regular cdef, no inline, cdivision
@cython.cdivision(True)
cdef double recip_square5(double i):
    return 1./(i*i)

def approx_pi5(int n):
    cdef int k
    cdef double val = 0.
    for k in range(1, n+1):
        val += recip_square5( k )
    return (6 * val)**0.5
and some timing code:
#!/usr/bin/env python
"""
timing script for calc_pi examples
"""
import timeit
N = 100000
def timer(version):
    time_number = 1000
    print timeit.timeit("approx_pi%i(N)"%version,
                        number=time_number,
                        setup="from __main__ import approx_pi%i, N"%version),
    print "seconds"

from calc_pi_cy import *

for i in range(1, 6):
    print "cython version %i:"%i
    timer(i)
    # and test result:
    print eval("approx_pi%i(%i)"%(i,N))
    print
Here is a run of the timing code:
$ ./time_calc_pi.py
cython version 1:
0.410186052322 seconds
3.14158310433
cython version 2:
0.404766082764 seconds
3.14158310433
cython version 3:
1.44341897964 seconds
3.14158310433
cython version 4:
0.403806209564 seconds
3.14158310433
cython version 5:
0.403886079788 seconds
3.14158310433
So they all take essentially the same amount of time to run, except version 3 -- which takes a LOT longer (roughly 3.5 times as long). Version 3 is the one that calls an external C function. I haven't looked at the compiler output to see for sure, but I'm pretty sure what's happening is that the compiler can auto-inline this simple function when it's all in the same C module. When calling an external C function, the call can't be inlined, and you pay the C function call overhead -- small, but a lot when the function itself does so little work.
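A rough pure-Python analogy, not a measurement of the C code: CPython does not inline calls, so comparing a helper-function version against a hand-inlined loop gives a feel for what a per-iteration call costs when the function body is a single division. The function names below are invented for this sketch (Python 3), and the absolute numbers say nothing about what gcc does with the Cython output.

```python
import timeit


def recip_square(i):
    """Pure-Python counterpart of the small helper function."""
    return 1.0 / (i * i)


def approx_pi_call(n):
    """One function call per term, like calling a helper that is not inlined."""
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(float(k))
    return (6 * val) ** 0.5


def approx_pi_inline(n):
    """Same arithmetic with the helper body written out in the loop."""
    val = 0.0
    for k in range(1, n + 1):
        x = float(k)
        val += 1.0 / (x * x)
    return (6 * val) ** 0.5


n = 100000
for func in (approx_pi_call, approx_pi_inline):
    seconds = timeit.timeit(lambda: func(n), number=5)
    print("%-18s %.4f seconds, result %.11f" % (func.__name__, seconds, func(n)))
```

Both functions perform the same floating-point operations in the same order, so they return the same value; only the time differs.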
So: the moral of the story (see above) -- don't bother! Cython and the C compiler do a fine job as it is.
[Note: tested with Cython 0.19.1, Python 2.7 32 bit on OS-X 10.7 (gcc 4.2)]
I've attached the cython code, timing code, and a setup.py to build it all. I'd be interested to know if there is a difference with other platforms/compilers.
- Chris Barker (Chris.Barker@noaa.gov)
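The timing script above is written for Python 2. For anyone repeating the measurement on Python 3, the loop could be sketched as follows. This is an untested-against-the-original port, not the attached script: the `time_version` helper is a name introduced here, it looks the functions up with `getattr` instead of `eval`, and it passes a callable to `timeit.timeit`, which adds a small constant lambda-call cost to every measurement.

```python
import timeit

N = 100000


def time_version(module, version, number=1000):
    """Return the seconds taken by `number` calls of module.approx_pi<version>(N)."""
    func = getattr(module, "approx_pi%i" % version)
    return timeit.timeit(lambda: func(N), number=number)


def main():
    try:
        import calc_pi_cy  # compiled extension built from calc_pi_cy.pyx
    except ImportError:
        print("calc_pi_cy is not built; run the setup.py first")
        return

    for version in range(1, 6):
        print("cython version %i:" % version)
        print(time_version(calc_pi_cy, version), "seconds")
        # and test result:
        print(getattr(calc_pi_cy, "approx_pi%i" % version)(N))
        print()


if __name__ == "__main__":
    main()
```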
A setup.py to build it:
#!/usr/bin/env python
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

setup(
    ext_modules = cythonize([Extension('calc_pi_cy', ['calc_pi_cy.pyx', 'recip_square3.c'])])
)
(All code attached to this page in a zip file)