# Vectorization

In [2]:
import numpy as np
import time

## Why is Vectorization important? 

Machine learning models consume large amounts of data, and that data is mostly represented as matrices. Operating on large matrices element by element is computationally costly, and it also forces you to write many explicit "for loops" to cover every case. With vectorization, you can express a standard matrix operation in a single line of code, and it runs much faster than the _for loop version_.
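As a minimal sketch of the idea, here is the same element-wise product written once with an explicit loop and once as a single vectorized expression. The array values and the names `a`, `b`, `looped`, and `vectorized` are made up for illustration.

```python
import numpy as np

# Small illustrative arrays (values are arbitrary).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Loop version: visit every index explicitly.
looped = np.zeros_like(a)
for i in range(a.shape[0]):
    looped[i] = a[i] * b[i]

# Vectorized version: one expression covers all elements.
vectorized = a * b

print(looped)
print(vectorized)
print(np.array_equal(looped, vectorized))
```

Both versions produce the same values. The vectorized one simply hands the loop over to NumPy's compiled code.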

In [3]:
array_size = 1_000_000  # one million elements per array
x1_numpy = np.random.rand(1, array_size)  # row vector of shape (1, array_size)
x2_numpy = np.random.rand(1, array_size)

### For Loop Version: 

In [4]:
tic = time.process_time()  # CPU time before the loop
dot_product = 0
for i in range(x1_numpy.shape[1]):  # accumulate one product per element
    dot_product += x1_numpy[0, i] * x2_numpy[0, i]
toc = time.process_time()  # CPU time after the loop
print("dot product = " + str(dot_product))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

dot product = 250121.23293094424
Computation time = 383.8790000000001ms
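The tic/toc pattern used in these cells can be wrapped in a small helper. The sketch below is only one way to do it: `time_call` and `loop_dot` are hypothetical names, and the array size is kept small so that it runs quickly. It reports both `time.process_time()` (CPU time of the process, which is what this notebook measures) and `time.perf_counter()` (elapsed wall-clock time). The two can differ, for example when the process waits or when a library spreads work over several threads.

```python
import time
import numpy as np

def time_call(fn, *args):
    """Run fn(*args) once and return (result, cpu_ms, wall_ms)."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    result = fn(*args)
    wall_ms = 1000 * (time.perf_counter() - wall_start)
    cpu_ms = 1000 * (time.process_time() - cpu_start)
    return result, cpu_ms, wall_ms

def loop_dot(u, v):
    """Dot product of two 1-D arrays with an explicit Python loop."""
    total = 0.0
    for i in range(u.shape[0]):
        total += u[i] * v[i]
    return total

# Hypothetical small inputs, seeded so the run is repeatable.
rng = np.random.default_rng(0)
u = rng.random(10_000)
v = rng.random(10_000)

result, cpu_ms, wall_ms = time_call(loop_dot, u, v)
print("dot product = " + str(result))
print("CPU time = " + str(cpu_ms) + "ms, wall time = " + str(wall_ms) + "ms")
```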


### Vectorized Version:

In [5]:
tic = time.process_time()
dot_product = np.dot(x1_numpy, x2_numpy.T)  # (1, n) dot (n, 1) gives a (1, 1) array
toc = time.process_time()
print("dot product = " + str(dot_product))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

dot product = [[250121.23293094]]
Computation time = 5.665000000000031ms
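The vectorized result prints as `[[250121.23293094]]` rather than as a bare number because multiplying a `(1, n)` array by an `(n, 1)` array gives a `(1, 1)` array. The small sketch below, with made-up three-element inputs, shows two ways to get a plain scalar that matches the loop version.

```python
import numpy as np

x1 = np.array([[1.0, 2.0, 3.0]])  # shape (1, 3), same layout as x1_numpy
x2 = np.array([[4.0, 5.0, 6.0]])  # shape (1, 3), same layout as x2_numpy

as_matrix = np.dot(x1, x2.T)            # (1, 3) dot (3, 1) -> shape (1, 1)
as_scalar = as_matrix.item()            # extract the single value as a Python float
flat = np.dot(x1.ravel(), x2.ravel())   # 1-D inputs return a scalar directly

print(as_matrix.shape)
print(as_scalar, flat)
```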


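Let's see what happens when we increase the array size to 1 billion! Keep in mind that each array of one billion float64 values takes about 8 GB of memory, so the two arrays together need about 16 GB.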

### 1B For Loop Version:

In [6]:
A = np.random.rand(1, 1_000_000_000)  # one billion elements, about 8 GB
B = np.random.rand(1, 1_000_000_000)
tic = time.process_time()
dot_product = 0
for i in range(A.shape[1]):  # accumulate one product per element
    dot_product += A[0, i] * B[0, i]
toc = time.process_time()
print("dot product = " + str(dot_product))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

dot product = 249995743.31640688
Computation time = 447705.865ms


### Note: 447705.865 ms is about 7.46 minutes!

### 1B Vectorized Version: 

In [7]:
tic = time.process_time()
dot_product = np.dot(A, B.T)  # same dot product, computed inside NumPy
toc = time.process_time()
print("dot product = " + str(dot_product))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

dot product = [[2.49995743e+08]]
Computation time = 41866.602999999996ms


### Note: 41866.603 ms is about 0.70 minutes (roughly 42 seconds). Not even a minute!
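To put the four measurements side by side, the short sketch below computes the speedup at each size and the time saved at one billion elements. The timings are copied from the cell outputs in this notebook, and the variable names are illustrative.

```python
# Timings in milliseconds, copied from the outputs in this notebook.
loop_1m_ms, vec_1m_ms = 383.8790000000001, 5.665000000000031
loop_1b_ms, vec_1b_ms = 447705.865, 41866.602999999996

speedup_1m = loop_1m_ms / vec_1m_ms  # how many times faster at 1 million elements
speedup_1b = loop_1b_ms / vec_1b_ms  # how many times faster at 1 billion elements
saved_1b_min = (loop_1b_ms - vec_1b_ms) / 60_000  # minutes saved at 1 billion

print(f"1M elements: vectorized is {speedup_1m:.1f}x faster")
print(f"1B elements: vectorized is {speedup_1b:.1f}x faster")
print(f"1B elements: {saved_1b_min:.2f} minutes saved")
```

In these particular runs the ratio is roughly 68x at one million elements and roughly 11x at one billion, while the absolute time saved is far larger at one billion.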

## NOTE: I ran this experiment on a Mac, and it took a very long time! The machine even started making some noise, lol. Not surprisingly, the absolute gap between the two approaches grows with the array size: from about 0.38 seconds at one million elements to about 6.8 minutes at one billion.

I will leave this section here for now, so that I can experiment with other deep learning frameworks and understand how these libraries make our lives much easier.