NumpyPointerToC
A more up-to-date tutorial can be found in the Cython documentation.
One of the strengths of numpy arrays is that they are essentially wrappers around a regular C pointer (C array). This means that you can easily use Cython code to pass the data from a numpy array into C or C++ code, and manipulate it there, without any data copying.
In this case, the goal is to manipulate the data in a numpy array, such that there is no data copying, and the changes are seen in the numpy array on the Python side. This can be very useful, as you can then let Python/numpy handle all the memory management, while still leveraging C code that takes pointers to "standard" C arrays.
There are a number of ways to get the pointer from a numpy array -- the approach shown here seemed to be the consensus "best" way, at least as of June 2012 (see the Cython-users thread).
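Before diving into Cython, the core idea -- that a numpy array is a thin wrapper around a C buffer that can be modified in place -- can be demonstrated in pure Python. This sketch (not from the original tutorial) uses numpy's ctypes interface to grab the raw data pointer and write through it directly:

```python
import ctypes
import numpy as np

# A numpy array wraps a plain C buffer: grab the raw data pointer
# via numpy's ctypes interface and write through it directly.
a = np.zeros(4, dtype=np.float64)
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

ptr[2] = 42.0  # write through the C pointer -- no copy involved

print(a)  # the change shows up on the numpy side: element 2 is now 42.0
```

This is exactly the relationship the Cython code below exploits, just expressed without any compilation step.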
A trivial C function (for example's sake) that multiplies all the elements of a 2-d array of doubles by a passed-in value:
/*
 * c_multiply.c
 *
 * simple C function that alters data passed in via a pointer;
 * used to see how we can do this with Cython/numpy
 */

void c_multiply(double* array, double multiplier, int m, int n)
{
    int i, j;
    int index = 0;

    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            array[index] = array[index] * multiplier;
            index++;
        }
    }
}
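The single flat `index` works because the function assumes the `m x n` doubles are laid out in row-major (C) order. A pure-Python mirror of the same logic (the `py_multiply` name is hypothetical, just for illustration) makes the traversal explicit:

```python
def py_multiply(array, multiplier, m, n):
    """Pure-Python mirror of c_multiply: walk the m*n values in
    row-major (C) order through a single flat index."""
    index = 0
    for i in range(m):
        for j in range(n):
            array[index] = array[index] * multiplier
            index += 1

data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # a 2x3 array, flattened row by row
py_multiply(data, 2.0, 2, 3)
print(data)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```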
This code takes a numpy array, and passes its data pointer to the C function to do the real work.
"""
multiply.pyx
simple cython test of accessing a numpy array's data
the C function: c_multiply multiplies all the values in a 2-d array by a scalar, in place.
"""
import cython
# import both numpy and the Cython declarations for numpy
import numpy as np
cimport numpy as np
# declare the interface to the C code
cdef extern void c_multiply (double* array, double value, int m, int n)
@cython.boundscheck(False)
@cython.wraparound(False)
def multiply(np.ndarray[double, ndim=2, mode="c"] input not None, double value):
"""
multiply (arr, value)
Takes a numpy array as input, and multiplies each element by value, in place
param: array -- a 2-d numpy array of np.float64
param: value -- a number that will be multiplied by each element in the array
"""
cdef int m, n
m, n = input.shape[0], input.shape[1]
c_multiply (&input[0,0], value, m, n)
return None
def multiply2(np.ndarray[double, ndim=2, mode="c"] input not None, double value):
"""
this method works fine, but is not as future-proof the numpy API might change, etc.
"""
cdef int m, n
m, n = input.shape[0], input.shape[1]
c_multiply (<double*> input.data, value, m, n)
return None
The np.ndarray[double, ndim=2, mode="c"] declaration assures that you get a C-contiguous numpy array of doubles -- this is key, as it's important that the data pointer point to a standard C array of doubles.

The &input[0,0] passes in the address of the beginning of the data array.

Note that if you wanted to, for example, iterate through the rows in Cython, but process each row as a 1-d C array, you could pass in the address of a row with &input[i,0], and similarly for other sub-parts of the array.
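You can check C-contiguity, and see why mode="c" matters, from plain Python. This sketch (not part of the original tutorial) also shows how a row's address is just the base pointer plus the row stride -- the same arithmetic &input[i,0] performs:

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape((3, 4))
print(a.flags['C_CONTIGUOUS'])    # True: the rows are laid out end-to-end

# A transpose is a view with swapped strides and is *not* C-contiguous,
# so a mode="c" argument would reject it (Cython raises ValueError):
print(a.T.flags['C_CONTIGUOUS'])  # False

# The address of row i is the base pointer plus i * (row stride in bytes):
row1 = a.ctypes.data + 1 * a.strides[0]
print(row1 - a.ctypes.data)       # 32 == 4 doubles * 8 bytes per double
```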
The decorators:

@cython.boundscheck(False)
@cython.wraparound(False)

tell Cython not to add code that checks whether the indexes are out of the bounds of the array, and that you won't be using negative indexes (the Python syntax for indexing from the end). This keeps a bunch of error-checking code from being generated -- not a huge deal if there's a fair bit of work being done in the function you're passing the pointer to, but still noticeably more than a simple pointer pass.

This is totally safe if you are hard-coding the indexes to zero, as in this case -- you may want to be a bit more careful if you are using variables for the indexes.
Another way to do this is to pass in the data member of the numpy array:

c_multiply(<double*> input.data, value, m, n)

However, this relies on the current numpy data structure, so it will break if numpy changes. From the mailing list:

  I think &input[0, 0] should still be preferred, as accessing the .data attribute is deprecated in numpy and the rewrite to PyArray_DATA() is not yet merged. Taking the pointer to the first element is also more consistent with memoryviews.
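At the Python level, the non-deprecated route to the same buffer is the ndarray.ctypes attribute. A small sketch (not from the original) showing that it reaches the same memory that &input[0,0] points to in Cython:

```python
import ctypes
import numpy as np

a = np.arange(6, dtype=np.float64).reshape((2, 3))

# The supported Python-level route to the buffer is ndarray.ctypes,
# not the deprecated .data attribute:
addr = a.ctypes.data                                   # integer address of element [0, 0]
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

print(ptr[0], ptr[5])  # 0.0 5.0 -- the same buffer Cython's &input[0,0] sees
```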
Here is the setup.py to build the Cython code and the extension.
#!/usr/bin/env python

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("multiply",
                           sources=["multiply.pyx", "c_multiply.c"],
                           include_dirs=[numpy.get_include()])],
)
It can be built with python setup.py build_ext --inplace
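Note that newer Cython releases recommend the Cython.Build.cythonize helper (and setuptools, since distutils is deprecated) over the build_ext cmdclass shown above. A sketch of the equivalent setup.py under that approach:

```python
#!/usr/bin/env python

# Equivalent build using the newer recommended cythonize() helper
# and setuptools instead of the Cython.Distutils build_ext cmdclass.
from setuptools import setup, Extension
from Cython.Build import cythonize

import numpy

setup(
    ext_modules=cythonize([
        Extension("multiply",
                  sources=["multiply.pyx", "c_multiply.c"],
                  include_dirs=[numpy.get_include()]),
    ]),
)
```

The build command is the same: python setup.py build_ext --inplace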
Some trivial test code (not a real unit test):
#!/usr/bin/env python

"""
simple test of the multiply.pyx and c_multiply.c test code
"""

import numpy as np
import multiply

a = np.arange(12, dtype=np.float64).reshape((3, 4))

print(a)

multiply.multiply(a, 3)

print(a)
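For reference, the pure-numpy equivalent of that in-place call is a *= 3, which is also a convenient way to build the expected result when writing tests (this check is an illustration, not part of the original script):

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape((3, 4))
expected = a * 3   # what multiply.multiply(a, 3) should produce

a *= 3             # numpy's own in-place multiply

print(np.array_equal(a, expected))  # True
```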
A py.test-compliant unit test (it might work with nose too, with minor modification):
#!/usr/bin/env python

"""
multiply.pyx and c_multiply.c test code

designed to be run-able with py.test
"""

import pytest
import numpy as np

import multiply


def test_basic():
    a = np.arange(12, dtype=np.float64).reshape((3, 4))
    b = a * 3
    multiply.multiply(a, 3)
    assert np.array_equal(a, b)


def test_wrong_dims():
    a = np.arange(12, dtype=np.float64).reshape((3, 2, 2))
    with pytest.raises(ValueError):
        multiply.multiply(a, 3)


def test_wrong_type():
    a = np.arange(12, dtype=np.float32).reshape((3, 4))
    with pytest.raises(ValueError):
        multiply.multiply(a, 3)


def test_zero_dims():
    """
    this shouldn't crash!
    """
    a = np.ones((3, 0), dtype=np.float64)
    b = a.copy()
    multiply.multiply(a, 3)  # zero size, shouldn't do anything
    assert np.array_equal(a, b)