
Possible speedups using Cython #3

Closed
ForceBru opened this issue Feb 8, 2021 · 12 comments

Comments

@ForceBru

ForceBru commented Feb 8, 2021

Hello! I saw this project on Reddit and was skeptical about whether compiling code that mainly uses NumPy with Cython provides any speedup.

It seems that it doesn't. I ran a couple of tests using this function:

def r2_score(y_pred, y_test):
    y_pred, y_test = np.array(y_pred), np.array(y_test)
    num = np.sum(np.power(y_test - y_pred, 2))
    denum = np.sum(np.power(y_test - np.mean(y_test), 2))
    return 1 - num / denum

I created two versions: one compiled with Cython (r2_score) and another pasted straight into Python (r2_score_python). Both contain exactly the same code. I then ran a simple test in a Jupyter notebook:

y1, y2 = np.random.rand(2, 10_000_000)

%timeit r2_score(y1, y2)  # compiled with Cython
# 902 ms ± 4.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit r2_score_python(y1, y2)  # just pasted into Python
# 897 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Not much of a difference, it seems (902 ms vs 897 ms). Somewhat surprisingly, Python + NumPy (r2_score_python) is actually a hair faster than Cython + NumPy (r2_score). This makes sense: virtually all the time is spent inside NumPy's own C routines, so compiling the thin Python wrapper with Cython leaves almost nothing to speed up.
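For reference, the compiled version can be produced with Cython's Jupyter cell magic. A minimal sketch of that setup (my assumption; the function body is identical to the Python version above):

%load_ext cython

%%cython
import numpy as np

def r2_score(y_pred, y_test):
    y_pred, y_test = np.array(y_pred), np.array(y_test)
    num = np.sum(np.power(y_test - y_pred, 2))
    denum = np.sum(np.power(y_test - np.mean(y_test), 2))
    return 1 - num / denum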

Another solution: ugly but fast

I quickly hacked up this code:

%%cython -a --verbose
# cython: language_level=3, boundscheck=False, wraparound=False

import numpy as np
cimport numpy as np

cdef arr_sub_arr(double[:] dst, double[:] x, double[:] y):
    for i in range(x.shape[0]):
        dst[i] = x[i] - y[i]
        
cdef arr_sub_float(double[:] dst, double[:] x, double y):
    for i in range(x.shape[0]):
        dst[i] = x[i] - y
        
cdef arr_square(double[:] dst, double[:] src):    
    for i in range(src.shape[0]):
        dst[i] = src[i] * src[i]
        
cdef arr_mean(double[:] src):
    cdef Py_ssize_t n_elem = src.shape[0]
    
    return arr_sum(src) / n_elem
        
cdef arr_sum(double[:] src):
    cdef double _sum = 0.0
    
    for i in range(src.shape[0]):
        _sum += src[i]
        
    return _sum

cdef __r2_score_cython(double[:] y_pred, double[:] y_test):
    arr_sub_arr(y_pred, y_pred, y_test)
    arr_square(y_pred, y_pred)
    
    cdef double num = arr_sum(y_pred)
    cdef double y_test_mean = arr_mean(y_test)
    
    arr_sub_float(y_pred, y_test, y_test_mean)
    arr_square(y_pred, y_pred)
    cdef double denum = arr_sum(y_pred)
    
    return 1 - num / denum
    

cpdef r2_score_cython(np.ndarray y_pred, np.ndarray y_test):
    assert y_pred.ndim == y_test.ndim == 1, "Invalid number of dimensions"
    
    cdef Py_ssize_t sh1 = y_pred.shape[0]
    cdef Py_ssize_t sh2 = y_test.shape[0]
    assert sh1 == sh2, "Shape mismatch"
    
    return __r2_score_cython(
        np.array(y_pred),  # make a copy!
        y_test
    )

Nothing fancy - just for loops, with a single copy of y_pred reused to store all intermediate computations, so apart from that one copy it's memory efficient and allocates no temporary arrays at all.

Here, code like double[:] is a memoryview, not a NumPy array. See this Cython tutorial.
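A quick illustration (hypothetical, not part of the benchmark): a double[:] parameter accepts any object exposing a compatible buffer, not just NumPy arrays.

%%cython
def head(double[:] data):
    # typed memoryview: indexing compiles to direct C memory access
    return data[0]

import numpy as np, array
head(np.arange(3.0))           # NumPy float64 array -> 0.0
head(array.array('d', [7.0]))  # stdlib array.array works too -> 7.0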

The speedup

%timeit r2_score_cython(y1, y2)
# 85.2 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Your original code ran in 897 ms. The code I quickly wrote with simple loops (but with types and compiled via Cython!) ran in 85 ms, which is 10.5 times faster!

That is to say, you can get way more speed out of Cython than you're currently getting. The trade-off is that you now have to think about memory allocation and write the for loops yourself. Also, this version only works for floating-point numbers - not integers or any other data types - though I think that can be solved using fused types (see the sketch below).
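A minimal fused-types sketch (hypothetical, not benchmarked): one definition generates a specialization per element type, so the same loop handles both integer and floating-point arrays.

ctypedef fused numeric:
    long
    double

cpdef double arr_sum_fused(numeric[:] src):
    # Cython compiles one version of this loop per type listed in `numeric`
    cdef double _sum = 0.0
    cdef Py_ssize_t i
    for i in range(src.shape[0]):
        _sum += src[i]
    return _sum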

So, if you're into Cython, you could probably think of other ways to speed up your code.

@anish-lakkapragada
Owner

could you please explain a little more about what Py_ssize_t is? I'm not that experienced in C or Cython. Thank you for spending the time to write this.

@anish-lakkapragada
Owner

also 10x is a massive improvement!!!

@anish-lakkapragada
Owner

I have just updated the source code in version 4.0.7. Thank you for your help, and feel free to keep on giving me feedback if you have the time. Please give me the green light to close this issue.

@ForceBru
Author

ForceBru commented Feb 9, 2021

could you please explain a little more about what Py_ssize_t is? I'm not that experienced in C or Cython. Thank you for spending the time to write this.

Py_ssize_t is the signed integer type that CPython uses for indexing and sizes. It's big enough to hold any index into an in-memory array, and since arrays can get pretty large, I used this type to make sure their sizes can always be represented correctly.
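A tiny illustration (hypothetical): on most 64-bit platforms a C int is only 32 bits and could overflow past ~2.1 billion elements, while Py_ssize_t always matches the platform's pointer size.

cdef double last_element(double[:] src):
    cdef Py_ssize_t n = src.shape[0]  # safe for any array that fits in memory
    return src[n - 1]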

Also, I could've written the last function in a cleaner way using memory views instead of np.array:

cpdef r2_score_cython(double[:] y_pred, double[:] y_test):
    assert tuple(y_pred.shape) == tuple(y_test.shape), "Shape mismatch"
    
    return __r2_score_cython(
        np.array(y_pred),  # make a copy!
        y_test
    )

BTW, it would be really useful if your library provided the EM algorithm for working with mixture models. I've been messing around with it for some time and found that sklearn.mixture.GaussianMixture can be sped up significantly: my simple implementation in the Julia language is already about 2 times faster. Maybe Cython can give the same speedup for Python. Good luck with your project!

@ForceBru closed this as completed Feb 9, 2021
@ghost

ghost commented Sep 3, 2021

Hello! I saw this project on Reddit and was skeptical about whether compiling code that mainly uses NumPy with Cython provides any speedup.

Just curious, since I'm about as skeptical: what is the point of using this weird syntax? I mean, wouldn't it be better if the performance-critical components were written in C/C++ and exposed through a Python API?

@anish-lakkapragada
Owner

yeah, writing it in C(++) and providing a Python API like the big ML libraries do is probably what I should have done, if I had known at the time what C(++) was, how to use it, and how to build it. unfortunately I didn't, so I used Cython.

BTW, I built this project for fun and of course there's a thousand things that SWEs can point out I should have done better. thanks!

@ghost

ghost commented Sep 3, 2021

Good job anyway - you always get to learn something as you go. I understand this being a fun project for learning purposes; I'm questioning the practice of using Cython rather than criticizing your project, which is probably a great job at your level of experience. I just never saw the point of Cython.

@anish-lakkapragada
Owner

oh yeah, 100% agree with you. using Cython isn't as good as C++

@anish-lakkapragada
Owner

would have been better to write the core natively and bind it to a Python API, but at the time Cython was all I knew

@anish-lakkapragada
Owner

but because something like 55% of the code in this project runs on Cython, I might as well keep it in there. unfortunately I'm not planning to update this project anymore.

@ForceBru
Author

ForceBru commented Sep 3, 2021

Hello! I saw this project on Reddit and was skeptical about whether compiling code that mainly uses NumPy with Cython provides any speedup.

Just curious because I think I am as skeptical, what is the point of using this weird syntax? I mean wouldn't it be better if performance critical components are written in c/c++ and provide a python API?

Is the syntax that weird, though? It's basically Python syntax with slight modifications that let you specify types. (I think Cython should switch to Python's type annotations, like cdef my_var: double = 3.141, at some point.) Writing something in C or C++ means that you'll need to know C or C++, which are quite different from Python (and from each other).

Furthermore, Cython already compiles to C, so before rewriting anything in C or C++ by hand, you need to be sure it'd be worth it - that your code can actually beat the code Cython generates. IMO, Cython is a middle ground between Python and "proper" compiled, statically typed languages like C, C++ and Rust. Which is great: it's similar enough to Python that you can jump right in and get noticeable speedups without much trouble, whereas using C or C++ means learning the language first and being able to wield it confidently, which is pretty difficult.
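For what it's worth, Cython's "pure Python mode" already allows annotation-style typing. A minimal sketch, assuming a reasonably recent Cython (the function and its names are my own illustration):

import cython

@cython.cfunc
def mean(data: cython.double[:]) -> cython.double:
    # annotations replace cdef declarations; the file stays valid Python
    s: cython.double = 0.0
    i: cython.Py_ssize_t
    for i in range(data.shape[0]):
        s += data[i]
    return s / data.shape[0]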

@ghost

ghost commented Sep 3, 2021

It's not that weird; I'm only calling it "weird" because I've written a few things in C++, which makes C++ more familiar to me than Cython. Generally I agree: learning C/C++ is more difficult than simply changing defs to cdefs, declaring additional stuff every now and then, and so on.
