# Vectorization

Goal: Understand how vectorization can make learning faster.

In [1]:
import numpy as np

## Multivariate Linear Regression

Univariate Linear Regression : $ f_{w, b}(x^{(i)}) = w x^{(i)} + b $

Multivariate Linear Regression : $ f_{\vec{w}, b}(\vec{x}^{(i)}) = \vec{w} \cdot \vec{x}^{(i)} + b $

$ \vec{w} = (w_{1}, w_{2}, w_{3}, \dots, w_{n}) $

$ \vec{x} = (x_{1}, x_{2}, x_{3}, \dots, x_{n}) $
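
As a minimal sketch of the multivariate model above, the cell below evaluates $ \vec{w} \cdot \vec{x} + b $ for a single example. The function name `predict` and the three-element values are made up for illustration only; they are not part of the original notebook.

```python
import numpy as np

def predict(w, x, b):
    # f_{w,b}(x) = w . x + b for one example x (hypothetical helper name)
    return np.dot(w, x) + b

# Illustrative values only, chosen so the arithmetic is easy to check by hand
w_small = np.array([1.0, 2.0, 3.0])
x_small = np.array([4.0, 5.0, 6.0])
b_small = 0.5

prediction = predict(w_small, x_small, b_small)
print(prediction)  # 1*4 + 2*5 + 3*6 + 0.5 = 32.5
```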

Problem: How do we make learning faster when dealing with a multivariate task?

Yep, **vectorization!**

We could solve the multivariate task with a naive loop, but a loop processes the elements sequentially, one at a time. The idea of vectorization is to exploit parallelism, so that many multiplications can be processed at once instead of one by one.

### Dot Product
In multivariate linear regression, we need to compute $ \vec{w} \cdot \vec{x} $. Mathematicians call this operation the *dot product* of two vectors.

In mathematics, given two vectors $\vec{w}$ and $\vec{x}$, where $ \vec{w} = (w_{1}, w_{2}, w_{3}, \dots, w_{n}) $ and $ \vec{x} = (x_{1}, x_{2}, x_{3}, \dots, x_{n}) $, the dot product of $ \vec{w} $ and $ \vec{x} $ is defined as:

<br>
<center>
    $ \vec{w} \cdot \vec{x} = \sum_{i=1}^{n} w_{i} x_{i} $
</center>
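
To make the definition concrete, here is a small worked example that follows the sum term by term and compares it with NumPy's `np.dot`. The vectors `w_demo` and `x_demo` are illustrative values, not data from this notebook.

```python
import numpy as np

# Illustrative three-element vectors
w_demo = np.array([0.5, -1.0, 2.0])
x_demo = np.array([2.0, 3.0, 4.0])

# Individual terms w_i * x_i of the sum
products = w_demo * x_demo          # [1.0, -3.0, 8.0]

# Adding the terms gives the dot product
dot_by_sum = products.sum()         # 1.0 - 3.0 + 8.0 = 6.0

# np.dot computes the same quantity in one call
dot_by_numpy = np.dot(w_demo, x_demo)

print(dot_by_sum, dot_by_numpy)
```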

#### Loop Version

In [2]:
w = np.random.rand(1000000)
x = np.random.rand(1000000)
b = np.random.rand(1)[0]
print(f"w shape: {w.shape}")
print(f"x shape: {x.shape}")
print(f"b: {b}")

w shape: (1000000,)
x shape: (1000000,)
b: 0.4125991685472208


In [3]:
def dot_product_loop_version(w, x):
    # Accumulate w[i] * x[i] one element at a time (sequential)
    w_dot_x = 0.0
    n = x.shape[0]  # number of elements in each vector
    for i in range(n):
        w_dot_x += w[i] * x[i]
    return w_dot_x

In [5]:
%%time
temp = dot_product_loop_version(w, x)

CPU times: user 291 ms, sys: 2.59 ms, total: 293 ms
Wall time: 292 ms


#### Vectorized Version

We can use the NumPy library to demonstrate the vectorized approach.

In [6]:
def dot_product_vectorize_version(w, x):
    return np.dot(w, x)

In [7]:
%%time
temp = dot_product_vectorize_version(w, x)

CPU times: user 3.41 ms, sys: 1.92 ms, total: 5.33 ms
Wall time: 1.45 ms


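From the results above, you can see how vectorization makes the computation faster: the loop version took 292 ms of wall time, while the vectorized version took 1.45 ms, roughly 200 times faster on this run.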
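
Speed only matters if both versions compute the same value. The sketch below is a sanity check under a few assumptions: it redefines the two functions so it runs on its own, uses smaller arrays, and picks an arbitrary seed. The two results are compared with `np.isclose` rather than `==`, because the additions happen in a different order and floating-point sums can differ in the last digits.

```python
import numpy as np

def dot_product_loop_version(w, x):
    # Sequential accumulation, as in the loop version
    w_dot_x = 0.0
    for i in range(x.shape[0]):
        w_dot_x += w[i] * x[i]
    return w_dot_x

def dot_product_vectorize_version(w, x):
    # Single NumPy call, as in the vectorized version
    return np.dot(w, x)

# Smaller arrays and an arbitrary seed, chosen only for this check
rng = np.random.default_rng(0)
w_check = rng.random(10_000)
x_check = rng.random(10_000)

loop_result = dot_product_loop_version(w_check, x_check)
vectorized_result = dot_product_vectorize_version(w_check, x_check)

# Compare with a tolerance instead of exact equality
results_match = bool(np.isclose(loop_result, vectorized_result))
print(results_match)
```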