**Purpose**  
* Implement backpropagation for regression using only NumPy.  

**Dataset**  
* [perch_full.csv](https://raw.githubusercontent.com/rickiepark/hg-mldl/master/perch_full.csv)

**Dependencies**  
* Python: Version `3.9.19`
* Numpy: Version `1.26.4`


In [26]:
# Load the dataset
import pandas as pd
import numpy as np

df = pd.read_csv('https://raw.githubusercontent.com/rickiepark/hg-mldl/master/perch_full.csv')
df

# In this dataset, length and height are used as the features and width as the label.

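|   | length | height | width |
|---|--------|--------|-------|
| 0 | 8.4 | 2.11 | 1.41 |
| 1 | 13.7 | 3.53 | 2.0 |
| 2 | 15.0 | 3.82 | 2.43 |
| 3 | 16.2 | 4.59 | 2.63 |
| 4 | 17.4 | 4.59 | 2.94 |
| 5 | 18.0 | 5.22 | 3.32 |
| 6 | 18.7 | 5.2 | 3.12 |
| 7 | 19.0 | 5.64 | 3.05 |
| 8 | 19.6 | 5.14 | 3.04 |
| 9 | 20.0 | 5.08 | 2.77 |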


In [19]:
df.columns

Index(['length', ' height', ' width'], dtype='object')

In [23]:
# Split the dataset into features and label
# then convert df to numpy
X = df.drop(' width', axis=1).to_numpy()
y = df[' width'].to_numpy()

print(type(X), type(y))

<class 'numpy.ndarray'> <class 'numpy.ndarray'>


In [25]:
print(X[:3])
print(y[:3])

[[ 8.4   2.11]
 [13.7   3.53]
 [15.    3.82]]
[1.41 2.   2.43]


In [36]:
# Initialize the weights and bias to arbitrary starting values
# y = b_0 + w_1*x_1 + w_2*x_2
w_1 = 1.2 # weight of x_1 (will be updated)
w_2 = 2.1 # weight of x_2 (will be updated)
b_0 = 5.5 # bias (will be updated)
lr = 0.01 # learning rate (fixed)

W = np.array([w_1, w_2]) # Weights vector
y_pred = np.dot(X, W) + b_0 # np.dot(X, W) is the matrix-vector product: one prediction per row of X

(Explanation of the next cell)  
Left-hand side: $b_0 + x^1_1w_1 + x^1_2w_2$ == right-hand side: the value computed with NumPy

In [32]:
b_0 + X[0][0] * w_1 + X[0][1] * w_2 == y_pred[0] # Output: True

True
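
The exact `==` comparison returns `True` here, but equality between floats is fragile in general because the two sides may be summed in a different order. A tolerance-based check is a common alternative. The sketch below is only an illustration: `X_demo`, `W_demo`, and `b_demo` are hypothetical names holding the first three feature rows and the initial parameters shown earlier, not variables from the notebook.

```python
import numpy as np

# First three (length, height) rows from the dataset shown above
X_demo = np.array([[8.4, 2.11], [13.7, 3.53], [15.0, 3.82]])
W_demo = np.array([1.2, 2.1])  # initial w_1, w_2
b_demo = 5.5                   # initial b_0

# Vectorized prediction, same form as np.dot(X, W) + b_0
y_pred_demo = X_demo @ W_demo + b_demo

# The same prediction written out row by row
manual = np.array([b_demo + x1 * W_demo[0] + x2 * W_demo[1] for x1, x2 in X_demo])

# Compare within a small tolerance instead of with ==
all_close = np.allclose(manual, y_pred_demo)
print(all_close)
```

`np.allclose` compares every element at once, so the check covers all rows rather than only `y_pred[0]`.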

In [33]:
y_pred

array([20.011, 29.353, 31.522, 34.579, 36.019, 38.062, 38.86 , 40.144,
       39.814, 40.168, 42.649, 43.132, 42.649, 44.458, 44.731, 43.744,
       44.731, 44.248, 43.492, 44.806, 46.759, 45.235, 44.062, 46.888,
       49.609, 47.698, 49.153, 49.024, 49.996, 52.357, 55.732, 53.557,
       53.305, 53.788, 55.522, 55.879, 57.502, 65.923, 68.446, 71.629,
       72.148, 70.981, 72.664, 72.097, 75.694, 75.694, 78.403, 78.553,
       78.133, 79.498, 76.894, 82.78 , 82.153, 83.371, 84.16 , 84.529])

In [35]:
y
# The goal is to update w_1, w_2, and b_0 so that y_pred gets close to y.

array([1.41, 2.  , 2.43, 2.63, 2.94, 3.32, 3.12, 3.05, 3.04, 2.77, 3.56,
       3.31, 3.67, 3.53, 3.41, 3.52, 3.52, 3.52, 4.  , 3.62, 3.62, 3.63,
       3.63, 3.72, 3.72, 3.82, 4.17, 3.68, 4.24, 4.14, 5.14, 4.34, 4.34,
       4.57, 4.2 , 4.64, 4.77, 6.02, 6.39, 7.8 , 6.86, 6.74, 6.26, 6.37,
       7.49, 6.  , 7.35, 7.11, 7.22, 7.46, 6.63, 6.87, 7.28, 7.42, 8.14,
       7.6 ])

In [40]:
# example. np.sum()
print(np.sum([1, 2, 3])) # Output: 6
# example. np.mean()
print(np.mean([1, 2, 3])) # Output: 2.0

6
2.0


In [41]:
# Loss function - MSE; Mean Squared Error
loss = np.mean((y_pred-y)**2)
print(loss)

2797.4598648928563


* $Loss = \frac{1}{n}\sum^n_{i=1}(y_{predict} - y_{actual})^2$

* $Loss = \frac{1}{n}\sum^n_{i=1}(\hat{y}_i - y_i)^2$

* $Loss(b, w) = \frac{1}{n}\sum^n_{i=1}(y_i-(b_0+w_1x^i_1+w_2x^i_2))^2$
  - $y_i$, the actual y, is treated as a constant. It does not depend on $b_0, w_1, w_2$, the parameters whose optimal values we are looking for.

* $\frac{\partial}{\partial b_0}Loss(b, w) = (-1)\cdot 2\cdot\frac{1}{n}\sum^n_{i=1}(y_i-(b_0+w_1x^i_1+w_2x^i_2))$  
  = $-\frac{2}{n}\sum^n_{i=1}(y_i-(b_0+w_1x^i_1+w_2x^i_2))$
  - The 2 comes from differentiating the square; the $(-1)$ comes from the chain rule, as the derivative of the inner term $-b_0$.

* $\frac{\partial}{\partial w_1}Loss(b, w) = -\frac{2}{n}\sum^n_{i=1}(y_i-(b_0+w_1x^i_1+w_2x^i_2))\,x^i_1$

* $\frac{\partial}{\partial w_2}Loss(b, w) = -\frac{2}{n}\sum^n_{i=1}(y_i-(b_0+w_1x^i_1+w_2x^i_2))\,x^i_2$

* $b_{0(new)} = b_{0(old)} - \eta \frac{\partial}{\partial b_0}Loss(b, w)$
  - $\eta$ (eta) is the learning rate
  - The same mechanism applies to $w_1, w_2$.
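
The derivatives and update rule above could be turned into NumPy code roughly as follows. This is a minimal sketch, not the notebook's own implementation: `X_demo` and `y_demo` hold only the first ten rows shown earlier, `mse` and `gradients` are hypothetical helper names, and the learning rate of `0.001` is an assumption. It is smaller than the `0.01` set earlier because, with unscaled features of this size, a step of `0.01` makes the updates overshoot on this subset. The sketch also compares the analytic $b_0$ gradient with a finite-difference estimate as a sanity check.

```python
import numpy as np

# First ten rows of the dataset shown earlier: (length, height) -> width
X_demo = np.array([
    [8.4, 2.11], [13.7, 3.53], [15.0, 3.82], [16.2, 4.59], [17.4, 4.59],
    [18.0, 5.22], [18.7, 5.20], [19.0, 5.64], [19.6, 5.14], [20.0, 5.08],
])
y_demo = np.array([1.41, 2.0, 2.43, 2.63, 2.94, 3.32, 3.12, 3.05, 3.04, 2.77])


def mse(X, y, W, b):
    """Mean squared error of the linear model X @ W + b."""
    return np.mean((X @ W + b - y) ** 2)


def gradients(X, y, W, b):
    """Analytic gradients of the MSE with respect to b_0 and (w_1, w_2)."""
    residual = y - (X @ W + b)                 # y_i - (b_0 + w_1*x_1 + w_2*x_2)
    grad_b = -2.0 * np.mean(residual)          # dLoss/db_0
    grad_W = -2.0 * (X.T @ residual) / len(y)  # [dLoss/dw_1, dLoss/dw_2]
    return grad_b, grad_W


W = np.array([1.2, 2.1])  # initial w_1, w_2 from the notebook
b = 5.5                   # initial b_0 from the notebook
lr = 0.001                # assumption: smaller step than the notebook's 0.01

# Sanity check: central finite difference for dLoss/db_0
eps = 1e-6
numeric_grad_b = (mse(X_demo, y_demo, W, b + eps)
                  - mse(X_demo, y_demo, W, b - eps)) / (2 * eps)
analytic_grad_b, _ = gradients(X_demo, y_demo, W, b)

# Gradient descent: new = old - lr * gradient, for b_0 and both weights
initial_loss = mse(X_demo, y_demo, W, b)
for _ in range(1000):
    grad_b, grad_W = gradients(X_demo, y_demo, W, b)
    b = b - lr * grad_b
    W = W - lr * grad_W
final_loss = mse(X_demo, y_demo, W, b)

print(initial_loss, final_loss)
```

Standardizing the features before training is one common way to make larger learning rates usable; that step is left out here to stay close to the derivation.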