# Python and Vectorization

## 1. What is Vectorization?

**Vectorization** is the art of getting rid of explicit ```for``` loops in code.

In the deep learning era, you often find yourself training on relatively large datasets, because that is when deep learning algorithms tend to shine. So, it is important that your code runs very quickly, and the ability to perform vectorization has become a key skill.

In logistic regression, we have 

$$
z = w^T x + b,
$$

where $w, x \in \mathbb{R}^{n_x}$.

To compute $z$, a non-vectorized way is:

```python
z = 0
for i in range(n_x):
    z += w[i] * x[i]
z += b
```

which can be very slow. In contrast, the vectorized version

```python
z = np.dot(w, x) + b
``` 

is much faster.

```python
import numpy as np
import time

length = 1000000
a = np.random.RandomState(1234).rand(length) # set local seed 1234
b = np.random.RandomState(12345).rand(length) # set local seed 12345
tic = time.time()
c = np.dot(a, b)
toc = time.time()

print("Vectorized version: " + str(1000*(toc-tic)) + "ms")
print(c)
```

Output:

```
Vectorized version: 24.733304977416992ms
249928.1877257526
```


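```python
c = 0
tic = time.time()
for i in range(length):
    c += a[i] * b[i]
toc = time.time()

print("For loop: " + str(1000*(toc-tic)) + "ms")
print(c)
```

Output:

```
For loop: 376.6160011291504ms
249928.18772574305
```

The two results differ only in the last few digits, because the floating-point additions are performed in a different order. The loop, however, is roughly 15 times slower in this run.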
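Each timing above comes from a single `time.time()` measurement, which is noisy because it depends on whatever else the machine is doing. Below is a minimal sketch using the standard-library `timeit` module, which repeats the measurement (here 3 runs, keeping the fastest) and uses the high-resolution `time.perf_counter` clock by default. The function names `dot_vectorized` and `dot_loop` are illustrative, and the timings you see will depend on your hardware.

```python
import timeit

import numpy as np

length = 1_000_000
a = np.random.RandomState(1234).rand(length)
b = np.random.RandomState(12345).rand(length)


def dot_vectorized():
    # One call into NumPy's compiled dot-product routine
    return np.dot(a, b)


def dot_loop():
    # One interpreted Python iteration per element
    c = 0.0
    for i in range(length):
        c += a[i] * b[i]
    return c


# Best of 3 runs, each run calling the function once
t_vec = min(timeit.repeat(dot_vectorized, number=1, repeat=3))
t_loop = min(timeit.repeat(dot_loop, number=1, repeat=3))
print(f"Vectorized: {1000 * t_vec:.2f} ms, for loop: {1000 * t_loop:.2f} ms")

# Both versions compute the same sum, up to floating-point rounding
c_vec = dot_vectorized()
c_loop = dot_loop()
print(np.isclose(c_vec, c_loop))  # True
```

Taking the minimum rather than the mean is a common choice, since slower runs usually reflect interference from other processes rather than the code itself.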


## 2. Why Does Vectorization Make Code Faster?

Many scalable deep learning implementations run on GPUs or CPUs, both of which provide parallel instructions (**SIMD: Single Instruction Multiple Data**). When you call a built-in function such as ```np.dot()```, NumPy executes the whole loop in compiled code that can take advantage of this parallelism, instead of running one interpreted Python step per element, so the computation finishes much faster.

## 3. More Vectorization Examples

*Whenever possible, avoid explicit ```for``` loops.*

### 3.1 Multiplication

To compute $u = A v$, where $v \in \mathbb{R}^{n}$ and $A \in \mathbb{R}^{m \times n}$, the non-vectorized version is

```python
u = np.zeros((m, 1))
for i in range(m):
    for j in range(n):
        u[i] += A[i][j] * v[j]
```

while the vectorized version is

```python
u = np.dot(A, v)
```
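A small check, with arbitrary sizes `m = 3`, `n = 4` and random entries, that the double loop and `np.dot` produce the same vector. Here `v` is stored as an `(n, 1)` column so that both versions return an `(m, 1)` array.

```python
import numpy as np

m, n = 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
v = rng.standard_normal((n, 1))  # column vector, shape (n, 1)

# Non-vectorized: accumulate each entry of u one product at a time
u_loop = np.zeros((m, 1))
for i in range(m):
    for j in range(n):
        u_loop[i] += A[i, j] * v[j]

# Vectorized: a single matrix-vector product
u_vec = np.dot(A, v)

print(u_vec.shape)                 # (3, 1)
print(np.allclose(u_loop, u_vec))  # True
```

If `v` were a 1-D array of shape `(n,)` instead, `np.dot(A, v)` would return a 1-D array of shape `(m,)`, so it is worth keeping track of which convention your code uses.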

### 3.2 Exponentiation

Say you need to apply the exponential function to every element of a matrix or vector.

Suppose 

$$
v = \begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix}
\quad
\text{and}
\quad
u = \begin{bmatrix}
e^{v_1} \\ 
e^{v_2} \\ 
\vdots \\ 
e^{v_n}
\end{bmatrix}
$$

To implement this elementwise exponential operation, the non-vectorized way is 

```python
import math

u = np.zeros((n,1))
for i in range(n):
    u[i] = math.exp(v[i])
```

while the vectorized way is 

```python
u = np.exp(v)
```

There are some other functions similar to ```np.exp```, such as
- ```np.log``` performs elementwise $\log$;
- ```np.abs``` computes elementwise absolute values;
- ```np.maximum``` compares two arrays and returns a new array containing the elementwise maxima;
- ```v ** 2``` squares each element of ```v```;
- ```1 / v``` takes the elementwise reciprocal of ```v```.
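A short sketch applying each of these operations to a hypothetical 3-element vector `v` (and a second vector `w` for `np.maximum`), with the non-vectorized `math.exp` loop included for comparison. Since `np.log` is only defined for positive inputs, the example takes `np.abs(v)` first.

```python
import math

import numpy as np

v = np.array([1.0, -2.0, 4.0])
w = np.array([3.0, -5.0, 0.5])

# Non-vectorized: one math.exp call per element
u_loop = np.zeros(len(v))
for i in range(len(v)):
    u_loop[i] = math.exp(v[i])

# Vectorized elementwise operations
u = np.exp(v)              # e^{v_i} for every i
log_v = np.log(np.abs(v))  # log|v_i|; np.log of a negative entry would give nan
abs_v = np.abs(v)          # |v_i|
max_vw = np.maximum(v, w)  # max(v_i, w_i), compared position by position
sq_v = v ** 2              # v_i squared
inv_v = 1 / v              # 1 / v_i

print(np.allclose(u, u_loop))  # True
print(max_vw)
```

Note that `np.maximum` compares two arrays position by position, whereas `np.max` returns the single largest entry of one array.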

### 3.3 Logistic Regression Derivatives

The non-vectorized computation of the logistic regression cost and gradients over $m$ training examples with $n_x = 2$ features is

```python
# X: (2, m) with one example per column; Y: length-m array of labels
J, dw1, dw2, db = 0, 0, 0, 0
for i in range(m):
    z_i = np.dot(w.T, X[:, i]) + b
    a_i = sigmoid(z_i)  # sigmoid is a user-defined sigmoid function
    J += -(Y[i] * np.log(a_i) + (1 - Y[i]) * np.log(1 - a_i))
    dz_i = a_i - Y[i]
    dw1 += X[0, i] * dz_i  # Python indexes features from 0
    dw2 += X[1, i] * dz_i
    db += dz_i
J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
```

in which the snippet 

```python
dw1 += X[0, i] * dz_i
dw2 += X[1, i] * dz_i
```

can be generalized to a ```for``` loop over all $n_x$ features:

```python
for j in range(n_x):
    dw[j] += X[j,i] * dz_i # dw = np.zeros((n_x, 1))
```

Notice that this snippet can be vectorized into

```python
# initialization: dw = np.zeros((n_x, 1))
dw += X[:, i:i+1] * dz_i  # X[:, i:i+1] keeps the (n_x, 1) column shape of dw
# dw = dw/m after the loop over all training examples
```
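A sketch on hypothetical random data (`n_x = 3` features, `m = 5` examples) checking that the one-loop version, vectorized over features, gives the same `dw` as the two nested loops. The slice `X[:, i:i+1]` is used rather than `X[:, i]`: the latter has shape `(n_x,)`, and adding it in place to an `(n_x, 1)` array raises a broadcasting error.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


n_x, m = 3, 5
rng = np.random.default_rng(1)
X = rng.standard_normal((n_x, m))  # one training example per column
Y = rng.integers(0, 2, size=m)     # labels in {0, 1}
w = rng.standard_normal((n_x, 1))
b = 0.5

# Two loops: over examples i and over features j
dw_two_loops = np.zeros((n_x, 1))
# One loop: over examples only, features handled as a vector
dw_one_loop = np.zeros((n_x, 1))

for i in range(m):
    z_i = (np.dot(w.T, X[:, i:i+1]) + b).item()  # scalar z for example i
    dz_i = sigmoid(z_i) - Y[i]
    for j in range(n_x):
        dw_two_loops[j] += X[j, i] * dz_i
    dw_one_loop += X[:, i:i+1] * dz_i            # (n_x, 1) column update

dw_two_loops /= m
dw_one_loop /= m
print(np.allclose(dw_two_loops, dw_one_loop))  # True
```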

## 4. Vectorizing Logistic Regression

### 4.1 Vectorizing Forward Propagation

In forward propagation, for each training example $(x^{(i)}, y^{(i)})$, we need to compute

\begin{align*}
z^{(i)} = & w^T x^{(i)} + b \\
a^{(i)} = & \sigma(z^{(i)})
\end{align*}

Recall that we defined the design matrix $X$ as

$$
X = \begin{bmatrix}
| & | & & | \\ 
x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ 
| & | & & | 
\end{bmatrix}_{n_x \times m}
$$

So if we define $Z$ as 

$$
Z = \begin{bmatrix}
z^{(1)} & 
z^{(2)} &
\cdots &
z^{(m)}
\end{bmatrix}_{1 \times m}
$$

then 

$$
Z = \begin{bmatrix}
w^T x^{(1)} + b &
w^T x^{(2)} + b &
\cdots &
w^T x^{(m)} + b 
\end{bmatrix} = w^T X + 
b \begin{bmatrix}
1 & 1 & \cdots & 1 
\end{bmatrix}.
$$

The corresponding Python code is

```python
Z = np.dot(w.T, X) + b
```

in which NumPy broadcasts the scalar ```b```, adding it to every entry of the $1 \times m$ row ```np.dot(w.T, X)```.
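A small check, on hypothetical random `X` and `w`, that the single broadcast expression matches computing $z^{(i)} = w^T x^{(i)} + b$ one column at a time:

```python
import numpy as np

n_x, m = 3, 4
rng = np.random.default_rng(2)
X = rng.standard_normal((n_x, m))  # columns are x^(1), ..., x^(m)
w = rng.standard_normal((n_x, 1))
b = 0.1                            # a Python scalar

# Vectorized: (1, n_x) @ (n_x, m) gives (1, m); b is added to every column
Z = np.dot(w.T, X) + b

# Same thing one column at a time: z^(i) = w^T x^(i) + b
Z_loop = np.array([[float(np.dot(w[:, 0], X[:, i])) + b for i in range(m)]])

print(Z.shape)                 # (1, 4)
print(np.allclose(Z, Z_loop))  # True
```

Broadcasting here is equivalent to the explicit $b \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}$ term in the formula, without materializing the row of ones.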

Let

$$
A = [ a^{(1)} \quad a^{(2)} \quad \cdots \quad a^{(m)} ]
$$

Then we need a vectorized ```sigmoid``` so that ```A = sigmoid(Z)``` computes all of ```A``` at once:

```python
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))
```
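Because `np.exp` works elementwise, this `sigmoid` accepts a scalar or an array of any shape and returns a result of the same shape. A quick check on a hypothetical `(1, 3)` row `Z`:

```python
import numpy as np


def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))


Z = np.array([[-2.0, 0.0, 2.0]])
A = sigmoid(Z)

print(A.shape)  # (1, 3)
print(A[0, 1])  # 0.5, since e^0 = 1
# sigma(-z) = 1 - sigma(z), so the two outer entries sum to 1
print(np.isclose(A[0, 0] + A[0, 2], 1.0))  # True
```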

### 4.2 Vectorizing Backward Propagation

In backward propagation, for each training example,

$$
dz^{(i)} = a^{(i)} - y^{(i)}
$$

Let 

$$
dZ = \begin{bmatrix}
dz^{(1)} & dz^{(2)} & \cdots & dz^{(m)}
\end{bmatrix}
$$

and 

$$
Y = \begin{bmatrix}
y^{(1)} & y^{(2)} & \cdots & y^{(m)}
\end{bmatrix}
$$

Then 

$$
dZ = A - Y
$$

Since each iteration of the loop performs $dw \leftarrow dw + x^{(i)} dz^{(i)}$ and $db \leftarrow db + dz^{(i)}$, and both sums are divided by $m$ at the end, we can write $dw$ and $db$ as

$$
dw = \frac{1}{m} X (dZ)^T
$$

and

$$
db = \frac{1}{m} \sum_{i=1}^m dz^{(i)}
$$

In Python, the corresponding code is 

```python
dw = np.dot(X, dZ.T) / m
db = np.sum(dZ) / m
```
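A sketch on hypothetical random data that compares the vectorized gradients with the per-example accumulation they replace. Here `Y` is stored as a `(1, m)` row, matching the definition of $Y$ above.

```python
import numpy as np


def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))


n_x, m = 3, 6
rng = np.random.default_rng(3)
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m))  # labels as a (1, m) row
w = rng.standard_normal((n_x, 1))
b = -0.2

A = sigmoid(np.dot(w.T, X) + b)
dZ = A - Y                           # (1, m)

# Vectorized gradients
dw = np.dot(X, dZ.T) / m             # (n_x, m) @ (m, 1) -> (n_x, 1)
db = np.sum(dZ) / m                  # scalar

# Loop versions: dw <- dw + x^(i) dz^(i), db <- db + dz^(i), then divide by m
dw_loop = np.zeros((n_x, 1))
db_loop = 0.0
for i in range(m):
    dw_loop += X[:, i:i+1] * dZ[0, i]
    db_loop += dZ[0, i]
dw_loop /= m
db_loop /= m

print(np.allclose(dw, dw_loop), np.isclose(db, db_loop))  # True True
```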

### 4.3 Summary

One step of vectorized gradient descent, with learning rate ```alpha```, is

```python
Z = np.dot(w.T, X) + b
A = sigmoid(Z)
dZ = A - Y
dw = np.dot(X, dZ.T) / m
db = np.sum(dZ) / m

w = w - alpha * dw
b = b - alpha * db
```

And the full version is

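```python
# X: (n_x, m) design matrix, Y: (1, m) labels,
# alpha: learning rate, max_iter: number of iterations,
# sigmoid: the vectorized function defined in Section 4.1
w = np.zeros((n_x, 1))
b = 0
for _ in range(max_iter):
    # forward propagation
    Z = np.dot(w.T, X) + b
    A = sigmoid(Z)
    # backward propagation
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    # gradient-descent update
    w = w - alpha * dw
    b = b - alpha * db
```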
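An end-to-end sketch on synthetic data, to see the loop run. Everything here is an assumption for illustration: the labels are sampled from a logistic model with a hypothetical `true_w` and `true_b`, and `alpha = 0.5`, `max_iter = 500` are arbitrary. The helper `cost` is the vectorized form of the cross-entropy `J` accumulated in the loop of Section 3.3, averaged over the `m` examples.

```python
import numpy as np


def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))


def cost(A, Y):
    # Average cross-entropy over the m columns
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m


# Hypothetical synthetic data: labels drawn from a known logistic model
rng = np.random.default_rng(0)
n_x, m = 2, 200
X = rng.standard_normal((n_x, m))
true_w, true_b = np.array([[2.0], [-1.0]]), 0.5
P = sigmoid(np.dot(true_w.T, X) + true_b)     # (1, m) true probabilities
Y = (rng.random((1, m)) < P).astype(float)    # (1, m) row of 0/1 labels

alpha, max_iter = 0.5, 500
w = np.zeros((n_x, 1))
b = 0.0
costs = []
for _ in range(max_iter):
    A = sigmoid(np.dot(w.T, X) + b)
    costs.append(cost(A, Y))
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db

accuracy = np.mean((sigmoid(np.dot(w.T, X) + b) > 0.5) == Y)
print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}, training accuracy: {accuracy:.2f}")
```

Sampling the labels, rather than thresholding them, keeps the data from being perfectly separable; on separable data the weights grow without bound and `np.log(1 - A)` can eventually hit `log(0)`. Since `w` and `b` start at zero, the first cost is exactly $\log 2$.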