# Vectorization in Python

In [1]:
import numpy as np
from timeit import Timer

In [2]:
my_list = [53, 1, 16, 66, 56, 36, 99, 36, 90, 13]

result = []
for number in my_list:
    result.append(number * 2 + 1)


print(result)

[107, 3, 33, 133, 113, 73, 199, 73, 181, 27]


Above is the simplest way to multiply every number in a list by 2 and then add 1: the for loop. At this point, you should be very familiar with the for loop. You can use it to iterate over items in a list or to count to a specific number. The abstraction behind a for loop is iteration: applying the same computation repeatedly with different variable values.

In [3]:
for i in range(10):

    my_number = 3 % (i + 1) + i / 2
    print(my_number)

0.0
1.5
1.0
4.5
5.0
5.5
6.0
6.5
7.0
7.5


While it is simple to understand, iteration is not always the most efficient model for repeated computation. Iteration is serialized: the CPU processes the iterations one after the other. When the calculation of one item depends on the results of previous items, this wait is necessary. However, in other cases, like the one in the first cell, the computation in each iteration is independent.
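To make the distinction concrete, here is a small sketch (the list and variable names are illustrative, not part of the original cells). The running total needs the previous iteration's result, while the element-wise update does not.

```python
values = [53, 1, 16, 66, 56]

# Dependent iterations: each running total needs the previous one,
# so the steps have to happen in order.
running_total = []
total = 0
for v in values:
    total += v
    running_total.append(total)

# Independent iterations: each output uses only its own input,
# so the order of evaluation does not matter.
doubled_plus_one = [v * 2 + 1 for v in values]

print(running_total)
print(doubled_plus_one)
```

Only the second kind of loop is a direct candidate for the vectorization described next.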

## Vectorization

Vectorization is the abstraction of applying the same instructions to multiple data entries. Because applying the same instruction to multiple data entries is such a common pattern, modern computers are highly optimized for this operation, from low-level hardware (CPU architecture and memory) to high-level programming languages and libraries (e.g. Python, NumPy). When writing a for loop, we restrict the potential of these optimizations by forcing the repeated computations to be performed one by one, serialized in time.

![Vectorization](./resources/vectorization.png)
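As a minimal sketch of the idea, the loop from the first cell can be written as a single array expression. The name `result_vectorized` is introduced here for illustration only.

```python
import numpy as np

my_list = [53, 1, 16, 66, 56, 36, 99, 36, 90, 13]

# Convert once to an array, then apply the arithmetic to every element at once
my_array = np.array(my_list)
result_vectorized = my_array * 2 + 1

print(result_vectorized.tolist())
```

There is no explicit loop in the Python code; NumPy performs the iteration internally in compiled code.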


In a compiled language like C, the compiler can often detect the unnecessary serialization and compile a for loop into vectorized machine code. Python, however, is a dynamically typed, interpreted language, and the standard interpreter does not perform such optimizations.

NumPy arrays are optimized for vectorized calculations. Let's see the performance difference in action:

## Example - Adding a constant value to every element of an array

In [6]:
import numpy as np
 
# Creating a large array of size 10**6
array = np.random.randint(1000, size=10**6)
 
# method that adds elements using for loop
def add_forloop():
    new_array = [element + 1 for element in array]

# method that adds elements using vectorization
def add_vectorized():
    new_array = array + 1
    
# Finding execution time using timeit
computation_time_forloop = Timer(add_forloop).timeit(1)
computation_time_vectorized = Timer(add_vectorized).timeit(1)
 
print("Computation time is %0.9f using for-loop" % computation_time_forloop)
print("Computation time is %0.9f using vectorization" % computation_time_vectorized)

Computation time is 0.316398045 using for-loop
Computation time is 0.000706227 using vectorization
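A single `timeit(1)` run can be noisy, since other activity on the machine affects the measurement. One possible refinement, sketched below with a smaller array and illustrative variable names, is to repeat the measurement with `timeit.repeat` and keep the fastest run.

```python
import numpy as np
from timeit import repeat

array = np.random.randint(1000, size=10**5)

def add_forloop():
    return [element + 1 for element in array]

def add_vectorized():
    return array + 1

# Run each function several times and keep the fastest run,
# which is usually the one least disturbed by other processes
best_forloop = min(repeat(add_forloop, number=1, repeat=5))
best_vectorized = min(repeat(add_vectorized, number=1, repeat=5))

print("Best of 5 runs: %0.9f seconds using for-loop" % best_forloop)
print("Best of 5 runs: %0.9f seconds using vectorization" % best_vectorized)
```

The exact numbers depend on the machine, so none are shown here.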


## Example - Computing a dot product of two vectors

In [7]:
import numpy as np
from timeit import Timer

# Create 2 vectors of same length
length = 100000
vector1 = np.random.randint(1000, size=length)
vector2 = np.random.randint(1000, size=length)

# Finds dot product of vectors using for loop
def dotproduct_forloop(vector1, vector2, length):
    dot = 0.0
    for i in range(length):
        dot += vector1[i] * vector2[i]
    return dot
        
# Finds dot product of vectors using numpy vectorization
def dotproduct_vectorize(vector1, vector2):
    dot = np.dot(vector1, vector2)
    return dot

# Finding execution time using timeit - lambda needed for wrapping function
# https://stackoverflow.com/questions/54135771/timeit-valueerror-stmt-is-neither-a-string-nor-callable
time_forloop = Timer(lambda: dotproduct_forloop(vector1, vector2, length)).timeit(1)
time_vectorize = Timer(lambda: dotproduct_vectorize(vector1, vector2)).timeit(1)

print("Finding dot product takes %0.9f units using for loop" % time_forloop)
print("Finding dot product takes %0.9f units using vectorization" % time_vectorize)

Finding dot product takes 0.052461013 units using for loop
Finding dot product takes 0.000090627 units using vectorization


## Exercise: Compute matrix multiplication

In [18]:
A = np.random.rand(50,50)
B = np.random.rand(50,50)

# sanity check code using identity matrix
# A = np.eye(50)
# B = np.eye(50)

# hint: dot products written above are useful!
def matrixmultiply_forloop(A, B):
    # The result has one row per row of A and one column per column of B
    n_rows, n_cols = A.shape[0], B.shape[1]
    C = np.zeros((n_rows, n_cols))
    for row in range(n_rows):
        for col in range(n_cols):
            # Each entry is the dot product of a row of A and a column of B
            C[row, col] = dotproduct_vectorize(A[row, :], B[:, col])
    print("for loop result", np.diag(C))
    return C

def matrixmultiply_vectorize(A, B):
    # np.matmul computes the whole product in a single call
    C = np.matmul(A, B)
    print("vectorized result", np.diag(C))
    return C

# Finding execution time using timeit
time_forloop = Timer(lambda: matrixmultiply_forloop(A, B)).timeit(1)
time_vectorize = Timer(lambda: matrixmultiply_vectorize(A, B)).timeit(1)

print("Matrix multiplication takes %0.9f units using for loop" % time_forloop)
print("Matrix multiplication takes %0.9f units using vectorization" % time_vectorize)

for loop result [10.81994943 13.09087427 13.28849131 11.550471   12.88346917 10.99210207
 15.25002995 13.83446563 11.44676556  9.84855326 16.29869727 13.60173938
 10.48849111 13.46463879 12.20116113 12.29011789 12.82444705 14.47932006
 14.75065828 11.94768796 13.64930969 11.00256349 11.83196185 12.63783396
 13.3352612   9.68964075 13.61566355 12.55615047 13.9797612  12.50123373
 14.43553393 13.6233058  13.64953387 11.04324044 12.6568047  15.19460165
 11.81450982 13.29917693 10.75951075 13.01355334 10.44806695 10.18374684
 13.65681951  9.91227222 12.1298882  12.61055798 12.22752986 11.76190842
 14.94340249 12.27026349]
vectorized result [10.81994943 13.09087427 13.28849131 11.550471   12.88346917 10.99210207
 15.25002995 13.83446563 11.44676556  9.84855326 16.29869727 13.60173938
 10.48849111 13.46463879 12.20116113 12.29011789 12.82444705 14.47932006
 14.75065828 11.94768796 13.64930969 11.00256349 11.83196185 12.63783396
 13.3352612   9.68964075 13.61566355 12.55615047 13.9797612  1
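Comparing printed diagonals by eye is error-prone. As a sketch of a more direct check, using smaller matrices and a fixed seed chosen here purely for illustration, the two results can be compared with `np.allclose`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5))
B = rng.random((5, 5))

# Loop version: every entry is a row-by-column dot product
C_loop = np.zeros((5, 5))
for row in range(5):
    for col in range(5):
        C_loop[row, col] = np.dot(A[row, :], B[:, col])

# Vectorized version: for 2-D arrays the @ operator is equivalent to np.matmul
C_vec = A @ B

# Compare with a tolerance, since floating-point sums can differ in the last bits
print(np.allclose(C_loop, C_vec))
```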

## Example - Count the number of elements less than K in the array

In [25]:
# try changing the scale of X to make the difference due to vectorization more apparent
X = np.arange(20)
# X = np.arange(2000)
# X = np.arange(200000)

def lessthank_forloop(k=10):
    count = 0
    for i in range(len(X)):
        if X[i] < k:
            count = count + 1
    print("for loop result", count)
    return count

def lessthank_vectorize(k=10):
    num_lessthan_k = np.count_nonzero((X < k))
    print("vectorized result", num_lessthan_k)
    return num_lessthan_k

# Finding execution time using timeit
time_forloop = Timer(lessthank_forloop).timeit(1)
time_vectorize = Timer(lessthank_vectorize).timeit(1)

print("Finding < k takes %0.9f units using for loop" % time_forloop)
print("Finding < k takes %0.9f units using vectorization" % time_vectorize)

for loop result 10
vectorized result 10
Finding < k takes 0.000194947 units using for loop
Finding < k takes 0.000091530 units using vectorization
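The vectorized version works because the comparison `X < k` is itself applied element-wise and yields a boolean array. A short sketch of the intermediate steps follows (the names `mask`, `count_a`, and `count_b` are illustrative).

```python
import numpy as np

X = np.arange(20)
k = 10

mask = X < k                      # boolean array, True where the condition holds
count_a = np.count_nonzero(mask)  # counts the True entries
count_b = int(mask.sum())         # True counts as 1 when summed

print(mask[:12])
print(count_a, count_b)
```

Both counting approaches give the same answer; `np.count_nonzero` states the intent more directly.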


# How do we vectorize a function if the computation we want is more complicated and not already available in numpy? Use Numba @vectorize decorators!

## But first: What are Python "decorators"?

A decorator is a function that takes another function and extends the behavior of the latter function without explicitly modifying it.

Example - smart_divide() decorator function checks whether the inputs to divide() are safe or not

![Decorator example](./resources/decorator_example.jpg)

Source: https://www.programiz.com/python-programming/decorator

We need the inner(a,b) function inside smart_divide() since decorators must output a callable rather than a value. The idea of a decorator is to return a function you can call as needed, with enhanced functionality.
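In text form, a decorator along these lines might look like the sketch below. The printed message and the choice to return `None` on a zero divisor are assumptions made for illustration and may differ from the figure.

```python
def smart_divide(func):
    # inner() wraps the original function and adds the safety check
    def inner(a, b):
        if b == 0:
            print("Whoops! Cannot divide by zero")
            return None
        return func(a, b)
    # Return the wrapper itself (a callable), not the result of calling it
    return inner

@smart_divide
def divide(a, b):
    return a / b

print(divide(6, 3))
print(divide(6, 0))
```

Writing `@smart_divide` above `divide` is shorthand for `divide = smart_divide(divide)`.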

## Exercise: Decorator that times the execution of a function

In [26]:
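from timeit import Timer

def timer(func):
    # wrapper_timer replaces the decorated function when it is called
    def wrapper_timer():
        # Run func once and report the elapsed time in seconds
        time_elap = Timer(func).timeit(1)
        print(time_elap)
    return wrapper_timer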

@timer
def waste_some_time():
    for _ in range(100):
        sum([i**2 for i in range(10000)])
        
waste_some_time()

0.39337135599635076
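The `timer` above only works for functions that take no arguments, and it discards the return value. One possible generalization, sketched here with `time.perf_counter` instead of `Timer` and a hypothetical `sum_of_squares` function, forwards arguments and passes the result through.

```python
import functools
import time

def timer(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper_timer(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)   # forward any arguments
        elapsed = time.perf_counter() - start
        print("%s took %0.9f seconds" % (func.__name__, elapsed))
        return result                    # pass the return value through
    return wrapper_timer

@timer
def sum_of_squares(n):
    return sum(i**2 for i in range(n))

total = sum_of_squares(10000)
print(total)
```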


## Numba @vectorize decorator - specify the element-wise operation and let Numba handle the vectorization

Read [this Numba @vectorize decorator tutorial](https://numba.readthedocs.io/en/stable/user/vectorize.html)

### In essence: ...Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs....

Suppose the problem at hand calls for the following element-wise computation:

In [27]:
from numba import vectorize, float32, float64

@vectorize([float32(float32, float32),
            float64(float64, float64)])
def f(x, y):
    if x < 10:
        return 2*np.log(y)
    else:
        return np.sqrt(1 + x*10)

A = np.random.rand(30)
B = np.random.rand(30)

time_vectorize = Timer(lambda: f(A, B)).timeit(1)
print("Custom computation takes %0.9f units using numba @vectorize" % time_vectorize)

Custom computation takes 0.000028788 units using numba @vectorize


In [28]:
f.types

['ff->f', 'dd->d']
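For comparison, a simple piecewise function like this one can also be written in plain NumPy with `np.where`. This is a sketch, not the approach used in the cell above: `f_where` and the sample inputs are illustrative, and unlike the Numba version it evaluates both branches for every element before selecting.

```python
import numpy as np

def f_where(x, y):
    # np.where picks element-wise between the two precomputed branches
    return np.where(x < 10, 2 * np.log(y), np.sqrt(1 + x * 10))

x = np.array([1.0, 5.0, 12.0, 20.0])
y = np.array([1.0, 2.0, 3.0, 4.0])

print(f_where(x, y))
```

When the branches become more involved, or one branch is invalid for some inputs, writing the scalar logic once and letting Numba generate the loop tends to be easier to read.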

## There are certain benefits that numba @vectorize decorated functions enjoy automatically ...

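A function decorated with @vectorize and explicit signatures, as above, is compiled into a NumPy ufunc, so it gets the standard ufunc methods:

.reduce() - applies the user-defined f() repeatedly along an array axis, which reduces the array dimension by 1. More info - https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduce.html#numpy.ufunc.reduce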

.accumulate() - accumulates results of f() along an array axis. More info - https://numpy.org/doc/stable/reference/generated/numpy.ufunc.accumulate.html#numpy.ufunc.accumulate

Additional benefits - https://numpy.org/doc/stable/reference/ufuncs.html#ufunc

In [29]:
from numba import vectorize, float64, int32, int64, float32

@vectorize([int32(int32, int32),
            int64(int64, int64),
            float32(float32, float32),
            float64(float64, float64)])
def f(x, y):
    return x + y


A = np.arange(12).reshape(3, 4)
print(A, A.shape, "\n-----")

a = f.reduce(A, axis=0, keepdims=True)
print(a, a.shape, "\n-----")

b = f.reduce(A, axis=1, keepdims=True)
print(b, b.shape, "\n-----")

c = f.accumulate(A) # axis=0 by default
print(c, c.shape, "\n-----")

d = f.accumulate(A, axis=1)
print(d, d.shape, "\n-----")

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]] (3, 4) 
-----
[[12 15 18 21]] (1, 4) 
-----
[[ 6]
 [22]
 [38]] (3, 1) 
-----
[[ 0  1  2  3]
 [ 4  6  8 10]
 [12 15 18 21]] (3, 4) 
-----
[[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]] (3, 4) 
-----
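Since `f` here is plain element-wise addition, it behaves like NumPy's built-in `np.add` ufunc. The sketch below (variable names are illustrative) shows the same methods on `np.add` and how they relate to the more familiar `sum` and `cumsum`.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

# np.add is a built-in ufunc, so it has the same reduce/accumulate methods
col_sums = np.add.reduce(A, axis=0, keepdims=True)
running = np.add.accumulate(A, axis=1)

# These match the more familiar sum and cumsum
print(np.array_equal(col_sums, A.sum(axis=0, keepdims=True)))
print(np.array_equal(running, A.cumsum(axis=1)))
```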


In [30]:
f.types

['ii->i', 'll->l', 'ff->f', 'dd->d']