# Vectorization

## 1. Multiple Features

<div style="text-align:center">
    <img src="b_1_multiFeature.png" alt="Image" width="500"/>
</div>

<div style="text-align:center">
    <img src="b_1_compare.png" alt="Image" width="500"/>
</div>

<div style="text-align:center">
    <img src="b_1_gradient.png" alt="Image" width="500"/>
</div>

## 2. Dot Product with np.dot()

### 2.1 Shape

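Shape: the size of an array along each dimension. It is a tuple in which each element gives the length of the array along the corresponding dimension. For example, a 2-D array has shape (m, n), where m is the number of rows and n is the number of columns.

__np.shape(x) returns the size of array x along each dimension.__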

In [1]:
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([1, 2, 3])
z = np.array([[[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]],

              [[13, 14, 15, 16],
               [17, 18, 19, 20],
               [21, 22, 23, 24]]])

result = np.dot(x, y)
print(result)
print("Shape of x: " + str(x.shape))
print("Shape of y: " + str(y.shape))
print("Shape of z: " + str(z.shape))
print("Shape of result: " + str(result.shape))

[14 32]
Shape of x: (2, 3)
Shape of y: (3,)
Shape of z: (2, 3, 4)
Shape of result: (2,)


(3,) means y is a 1-D array with 3 elements.

(2, 3) means x is a 2-D array with 2 rows and 3 columns.

(2, 3, 4) means z is a 3-D array made of 2 blocks, each with 3 rows and 4 columns.
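As a small supplementary sketch (not part of the original notebook), the snippet below relates the shape tuple to two other standard NumPy attributes: `ndim` is the length of the shape tuple, and `size` is the product of its entries. The array `z` is rebuilt here with `np.arange` purely for illustration.

```python
import numpy as np

# Rebuild a (2, 3, 4) array holding the values 1..24 (illustrative construction)
z = np.arange(1, 25).reshape(2, 3, 4)

print(z.shape)                 # size along each dimension
print(z.ndim)                  # number of dimensions, equal to len(z.shape)
print(z.size)                  # total number of elements (product of the shape entries)
print(z.shape[0])              # length along the first dimension
print(np.shape(z) == z.shape)  # np.shape(z) and z.shape give the same tuple
```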

### 2.2 Dot Product

#### 2.2.1 2-D Matrix dot 2-D Matrix

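Given two 2-D arrays x1 and x2, np.dot(x1, x2) is defined only if the number of columns of x1 equals the number of rows of x2.

For example, if x.shape = (2, 3) and y.shape = (3, 4), then result.shape = (2, 4).

Algorithm: multiply rows by columns. Entry (i, j) of the result is the dot product of row i of x1 with column j of x2.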

In [2]:
x = np.array([[1,2,3],[4,5,6]])
y = np.array([[3,2,4,5],[4,5,6,7],[8,0,2,1]])
print(x.shape)
print(y.shape)

result = np.dot(x,y)
print(result)
print(result.shape)

(2, 3)
(3, 4)
[[35 12 22 22]
 [80 33 58 61]]
(2, 4)


<div style="text-align:center">
    <img src="b_1_dot.png" alt="Image" width="500"/>
</div>
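The row-by-column rule in this subsection can be spelled out with explicit loops. The sketch below is only an illustration of what the 2-D case computes, not how NumPy implements it; `matmul_loops` is a hypothetical helper name introduced here.

```python
import numpy as np

def matmul_loops(x1, x2):
    """Row-by-column product of two 2-D arrays (hypothetical helper for illustration)."""
    m, n = x1.shape
    n2, p = x2.shape
    if n != n2:
        raise ValueError(f"shapes {x1.shape} and {x2.shape} are not aligned")
    out = np.zeros((m, p), dtype=np.result_type(x1, x2))
    for i in range(m):          # each row of x1
        for j in range(p):      # each column of x2
            for k in range(n):  # walk along the shared dimension
                out[i, j] += x1[i, k] * x2[k, j]
    return out

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[3, 2, 4, 5], [4, 5, 6, 7], [8, 0, 2, 1]])

result = matmul_loops(x, y)
print(result)
print(np.array_equal(result, np.dot(x, y)))  # compare against np.dot
```

The innermost loop runs over the shared dimension, which is why the column count of the first array must match the row count of the second.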

#### 2.2.2 2-D Matrix dot 1-D Array

In [3]:
x = np.array([[1,2,3],[4,5,6]])
y = np.array([3,4,5])
print(x.shape)
print(y.shape)

result = np.dot(x,y)
print(result)
print(result.shape)

(2, 3)
(3,)
[26 62]
(2,)
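One way to read this result: each row of x is dotted with y, giving one number per row. The following sketch, added for illustration, checks that reading against np.dot; the name `row_by_row` is introduced here and is not from the original notebook.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
y = np.array([3, 4, 5])               # shape (3,)

# Dot each row of x with y, one row at a time
row_by_row = np.array([np.dot(row, y) for row in x])

print(row_by_row)
print(np.array_equal(row_by_row, np.dot(x, y)))  # same values as the single np.dot call
```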


#### 2.2.3 Error: 1-D Array dot 2-D Matrix with Mismatched Shapes

In [4]:
x = np.array([[1,2,3],[4,5,6]])
y = np.array([3,4,5])
print(x.shape)
print(y.shape)

try:
    result = np.dot(y,x)
except Exception as e:
    print("The error message you'll see is:")
    print(e)
# print(result)
# print(result.shape)

(2, 3)
(3,)
The error message you'll see is:
shapes (3,) and (2,3) not aligned: 3 (dim 0) != 2 (dim 0)
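The error occurs because, with a 1-D first argument, np.dot sums over the only axis of y and the first axis (the rows) of x, so the length of y must equal x.shape[0]. Here that is 3 versus 2. The sketch below shows two possible ways to make the shapes line up; which one is appropriate depends on what you intend to compute, and the vector `w` is a hypothetical example.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
y = np.array([3, 4, 5])               # shape (3,)

# Option 1: transpose x so that its first dimension has length 3
fixed = np.dot(y, x.T)                # (3,) dot (3, 2) -> (2,)
print(fixed, fixed.shape)

# Option 2: use a vector whose length matches x.shape[0]
w = np.array([1, 2])                  # hypothetical vector of length 2
combo = np.dot(w, x)                  # (2,) dot (2, 3) -> (3,)
print(combo, combo.shape)
```

Option 1 gives the same values as np.dot(x, y) from the previous subsection.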


#### 2.2.4 1-D Array dot 1-D Array

In [5]:
x = np.array([1,2,3])
y = np.array([3,4,5])
print(x.shape)
print(y.shape)

result = np.dot(x,y)
print(result)
print(result.shape)

(3,)
(3,)
26
()


The empty shape () means the result is a scalar: it has no dimensions.
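For two 1-D arrays of equal length, the dot product is the sum of the elementwise products. As a supplementary sketch, the snippet below computes it three equivalent ways; the variable names `d1`, `d2`, and `d3` are introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 4, 5])

d1 = np.dot(x, y)    # dot product
d2 = np.sum(x * y)   # elementwise multiply, then sum
d3 = x @ y           # the @ operator gives the same result for 1-D arrays

print(d1, d2, d3)
print(np.ndim(d1))   # number of dimensions of the result
```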

## 3. Vectorization is faster!

In [6]:
def my_dot(a, b):
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)):  input vector
      b (ndarray (n,)):  input vector with same dimension as a

    Returns:
      x (scalar): dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

In [7]:
import numpy as np    # it is an unofficial standard to use np for numpy
import time

np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del a, b  # remove these big arrays from memory

np.dot(a, b) =  2501072.5817
Vectorized version duration: 5.9714 ms 
my_dot(a, b) =  2501072.5817
loop version duration: 2110.5132 ms 
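A single time.time() measurement can be noisy, so the exact numbers above will vary from machine to machine and run to run. As a hedged alternative, the sketch below repeats each measurement with the standard-library `timeit` module and keeps the best run. It uses smaller arrays than the cell above to keep the run short, and a seeded `default_rng` generator, so its values differ from the ones printed above; `loop_dot` is a stand-in for `my_dot`.

```python
import timeit
import numpy as np

def loop_dot(a, b):
    """Plain Python loop version of the dot product (same idea as my_dot)."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

rng = np.random.default_rng(1)  # seeded generator; values differ from np.random.seed(1)
a = rng.random(100_000)         # smaller than the notebook's arrays to keep the run short
b = rng.random(100_000)

# Best of 3 runs for each version, in seconds
t_vec = min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=3))
t_loop = min(timeit.repeat(lambda: loop_dot(a, b), number=1, repeat=3))

print(f"vectorized: {1000 * t_vec:.4f} ms")
print(f"loop:       {1000 * t_loop:.4f} ms")
print(f"results agree: {np.isclose(np.dot(a, b), loop_dot(a, b))}")
```

Taking the minimum over several runs reduces the influence of one-off delays such as other processes competing for the CPU.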


# References

[1] https://github.com/kaieye/2022-Machine-Learning-Specialization/blob/main/Supervised%20Machine%20Learning%20Regression%20and%20Classification/week2/1.Multiple%20linear%20regression/C1_W2_Lab01_Python_Numpy_Vectorization_Soln.ipynb

[2] https://www.coursera.org/specializations/machine-learning-introduction

[3] https://www.bilibili.com/video/BV1Pa411X76s?p=23&spm_id_from=pageDriver&vd_source=8c32dd2bfbfecb1eaa9b0b9c4fb4d83e