### Vectorize your function

Two NumPy functions cover most cases:
1. `np.where`: for a single condition (if/else), applied element-wise to an array of any size
2. `np.select`: for multiple conditions (if/elif/.../else)
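
As a minimal sketch of how the two functions differ, the toy example below applies each to a small integer array. The array `x` and the names `sign_label`, `conditions`, `choices` and `result` are illustrative assumptions and do not come from the notebook.

```python
import numpy as np

# Hypothetical toy input, used only for illustration
x = np.array([-2, -1, 0, 1, 2])

# np.where: one condition, two outcomes (if / else)
sign_label = np.where(x >= 0, "non-negative", "negative")

# np.select: conditions are checked in order (if / elif / else);
# the first condition that is True picks the matching choice
conditions = [x < 0, x == 0]
choices = [-x, x + 100]
result = np.select(conditions, choices, default=x * 10)

print(sign_label)
print(result)
```

Elements that match none of the conditions fall through to `default`, which plays the role of the final `else`.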


In [3]:
%pylab
import pandas as pd

Using matplotlib backend: Qt5Agg
Populating the interactive namespace from numpy and matplotlib


#### Generate fake data


In [4]:
n = 1_000_000
df = pd.DataFrame({'a':np.random.randn(n),
                  'b':np.random.randn(n),
                  'N':np.random.randint(100,1000,size=(n,)),
                  'cat': np.random.choice(['type1','type2','type3'], size=n)})


In [25]:
def myFun(a,b,N,cat):
    if cat in ['type1','type2']:
        return (a+b)/N
    return (a-b)*N

**First**, we time a row-wise `apply` to establish a baseline.

In [61]:
_ = %timeit df.apply(lambda x: myFun(x['a'], x['b'],x['N'],x['cat']), axis=1)

9.53 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

**Next**, we express the same logic as a single vectorized `np.where` call.


In [60]:
_ = %timeit np.where(df.cat.isin(['type1','type2']), (df.a+df.b)/df.N, (df.a-df.b)*df.N)

34.7 ms ± 531 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


That is roughly a 275× speedup (9.53 s down to 34.7 ms).
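
Before trusting a faster version, it is worth checking that it returns the same values as the row-wise one. The sketch below does this on a smaller sample; the seed, the sample size and the names `rng`, `small`, `slow`, `fast` and `same` are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def myFun(a, b, N, cat):
    if cat in ['type1', 'type2']:
        return (a + b) / N
    return (a - b) * N

# Smaller sample than the benchmark so the row-wise version finishes quickly
rng = np.random.default_rng(0)
n = 1_000
small = pd.DataFrame({
    'a': rng.standard_normal(n),
    'b': rng.standard_normal(n),
    'N': rng.integers(100, 1000, size=n),
    'cat': rng.choice(['type1', 'type2', 'type3'], size=n),
})

# Row-wise reference result
slow = small.apply(lambda x: myFun(x['a'], x['b'], x['N'], x['cat']), axis=1)

# Vectorized result: both branches are computed for every row,
# then np.where picks one value per row
fast = np.where(small['cat'].isin(['type1', 'type2']),
                (small['a'] + small['b']) / small['N'],
                (small['a'] - small['b']) * small['N'])

same = np.allclose(slow.to_numpy(), fast)
print(same)
```

Note that `np.where` evaluates both branch expressions over the whole column, so this pattern suits cheap arithmetic like the one here.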

### Compared to Cython:

The same function written in Cython, with an explicit loop over the rows, processes the same data in less than half the time of the `np.where` version (13.7 ms vs. 34.7 ms). Something to keep in mind.

In [29]:
%load_ext cython

In [255]:
%%cython
cimport cython
import numpy as np
cimport numpy as np

ctypedef np.ndarray array

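# Scalar kernel: same branching logic as myFun, with C-typed numeric arguments
cdef double myfun_cy(double a, double b, int N, str cat):
    if cat in ['type1','type2']:
        return (a+b)/N
    return (a-b)*N

# Turn off bounds checking and negative-index wraparound for the indexing below
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef array[double] apply_cy(array[double] col_a,
                                 array[double] col_b,
                                 array[long] col_N,
                                 array[str] col_cat):
    cdef int i, n = len(col_a)
    # Preallocate the output, then fill it one row at a time
    cdef array[double] z = np.empty(n)
    for i in range(n):
        z[i] = myfun_cy(col_a[i],col_b[i],col_N[i], col_cat[i])
    return z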

In [256]:
_ = %timeit apply_cy(df['a'].values, df['b'].values,df['N'].values,df['cat'].values)

13.7 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
