## Cost function for regularized linear regression

The cost function for regularized linear regression is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2  + \frac{\lambda}{2m}  \sum_{j=0}^{n-1} w_j^2 $$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  $$ 

The only difference from the unregularized linear regression cost is the added regularization term, <span style="color:blue">
    $\frac{\lambda}{2m}  \sum_{j=0}^{n-1} w_j^2$ </span> 
    
Note that the bias $b$ is not regularized.

In [1]:
import numpy as np

In [2]:
def model(x, w, b):
    return np.matmul(x, w) + b

In [8]:
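def computeCost(x, y, w, b, lambda_):
    m = len(x)  # number of training examples
    pred = model(x, w, b)
    # Mean squared error term plus the L2 penalty on w (b is not penalized)
    cost = np.mean((pred - y) ** 2) / 2 + (lambda_ / (2 * m)) * np.sum(w ** 2)
    return cost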

In [9]:
np.random.seed(1)
x = np.random.rand(5, 6)
y = np.array([0, 1, 0, 1, 0])
w = np.random.rand(x.shape[1]).reshape(-1, ) - 0.5
b = 0.5
lambda_ = 0.7

cost = computeCost(x, y, w, b, lambda_)
print("Regularized cost:", cost)

Regularized cost: 0.07917239320214275
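
To make the role of the penalty concrete, the cost can be split into its two terms. The following is a minimal sketch rather than part of the original notebook: `cost_terms` is a hypothetical helper, the data is arbitrary, and `model` is repeated so the block runs on its own.

```python
import numpy as np

def model(x, w, b):
    # Linear prediction, same definition as earlier in this notebook.
    return np.matmul(x, w) + b

def cost_terms(x, y, w, b, lambda_):
    # Hypothetical helper: returns the data-fit term and the penalty separately.
    m = len(x)
    fit = np.sum((model(x, w, b) - y) ** 2) / (2 * m)
    penalty = lambda_ / (2 * m) * np.sum(w ** 2)
    return fit, penalty

rng = np.random.default_rng(0)  # arbitrary illustrative data
x = rng.random((5, 3))
y = np.array([0, 1, 0, 1, 0])
w = rng.random(3) - 0.5

fit_a, pen_a = cost_terms(x, y, w, 0.5, 0.7)
fit_b, pen_b = cost_terms(x, y, w, 2.0, 0.7)
print("penalty unchanged when b changes:", pen_a == pen_b)
print("penalty with lambda_ = 0:", cost_terms(x, y, w, 0.5, 0.0)[1])
```

Changing `b` moves only the data-fit term, and setting `lambda_` to 0 removes the penalty, which recovers the unregularized cost.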


### Computing the Gradient with regularization (both linear/logistic)
The regularized gradient has the same form for linear regression and logistic regression; the only difference is how $f_{\mathbf{w},b}$ is computed.
$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)}  +  \frac{\lambda}{m} w_j \tag{2} \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3} 
\end{align*}$$

      
* For the <span style="color:blue"> **linear** </span> regression model  
    $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$  
* For the <span style="color:blue"> **logistic** </span> regression model  
    $z = \mathbf{w} \cdot \mathbf{x} + b$  
    $f_{\mathbf{w},b}(\mathbf{x}) = g(z)$  
    where $g(z)$ is the sigmoid function:  
    $g(z) = \frac{1}{1+e^{-z}}$   
    
Regularization adds only the term <span style="color:blue">$\frac{\lambda}{m} w_j$</span> to the gradient with respect to $w_j$.
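
As a short worked step, this extra term can be obtained by differentiating the penalty in the cost function with respect to a single weight $w_j$. The summation index is renamed to $k$ here to avoid a clash with $j$; only the $k = j$ summand depends on $w_j$:

$$\frac{\partial}{\partial w_j}\left(\frac{\lambda}{2m} \sum_{k=0}^{n-1} w_k^2\right) = \frac{\lambda}{2m} \cdot 2 w_j = \frac{\lambda}{m} w_j$$

The penalty does not involve $b$, so $\frac{\partial J(\mathbf{w},b)}{\partial b}$ is the same as in the unregularized case.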

In [15]:
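def computeGradient(x, y, w, b, lambda_):
    pred = model(x, w, b)
    err = pred - y                        # shape (m,)
    err = np.expand_dims(err, axis = 1)   # shape (m, 1), broadcasts against x of shape (m, n)
    # Average over examples, then add the regularization term (lambda_ / m) * w
    gradW = np.mean(err * x, axis = 0) + lambda_ / len(x) * w
    gradB = np.mean(err)                  # b is not regularized
    return gradW, gradB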
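
One way to gain confidence in an analytic gradient like this is to compare it against central finite differences of the cost. The sketch below is an illustration under stated assumptions, not part of the original notebook: `numericalGradient` is a hypothetical helper, the data and step size `eps` are arbitrary, and the notebook's functions are repeated so the block runs on its own.

```python
import numpy as np

def model(x, w, b):
    return np.matmul(x, w) + b

def computeCost(x, y, w, b, lambda_):
    pred = model(x, w, b)
    return np.mean((pred - y) ** 2) / 2 + (lambda_ / (2 * len(x))) * np.sum(w ** 2)

def computeGradient(x, y, w, b, lambda_):
    err = np.expand_dims(model(x, w, b) - y, axis=1)
    gradW = np.mean(err * x, axis=0) + lambda_ / len(x) * w
    gradB = np.mean(err)
    return gradW, gradB

def numericalGradient(x, y, w, b, lambda_, eps=1e-6):
    # Hypothetical helper: central differences of computeCost, one parameter at a time.
    numW = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        numW[j] = (computeCost(x, y, w + step, b, lambda_)
                   - computeCost(x, y, w - step, b, lambda_)) / (2 * eps)
    numB = (computeCost(x, y, w, b + eps, lambda_)
            - computeCost(x, y, w, b - eps, lambda_)) / (2 * eps)
    return numW, numB

rng = np.random.default_rng(0)  # arbitrary illustrative data
x = rng.random((5, 3))
y = np.array([0, 1, 0, 1, 0])
w = rng.random(3)
b = 0.5

gradW, gradB = computeGradient(x, y, w, b, 0.7)
numW, numB = numericalGradient(x, y, w, b, 0.7)
print("gradW matches:", np.allclose(gradW, numW, atol=1e-6))
print("gradB matches:", np.isclose(gradB, numB, atol=1e-6))
```

Because the regularized linear cost is quadratic in $\mathbf{w}$ and $b$, central differences agree with the analytic gradient up to floating-point rounding.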

In [17]:
np.random.seed(1)
x = np.random.rand(5, 3)
y = np.array([0, 1, 0, 1, 0])
w = np.random.rand(x.shape[1])
b = 0.5
lambda_ = 0.7
gradW, gradB =  computeGradient(x, y, w, b, lambda_)

print(f"gradB: {gradB}")
print(f"Regularized gradW:\n {gradW.tolist()}")

gradB: 0.6648774569425726
Regularized gradW:
 [0.29653214748822276, 0.4911679625918033, 0.21645877535865857]
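
Since the gradient formulas are shared by both models, a logistic version needs only a different prediction. The following is a minimal sketch, assuming the sigmoid model described above: `sigmoid` and `computeGradientLogistic` are hypothetical names that do not appear in the original notebook, and the data is arbitrary.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def computeGradientLogistic(x, y, w, b, lambda_):
    # Hypothetical counterpart of computeGradient: only the prediction changes.
    pred = sigmoid(np.matmul(x, w) + b)
    err = np.expand_dims(pred - y, axis=1)
    gradW = np.mean(err * x, axis=0) + lambda_ / len(x) * w
    gradB = np.mean(err)  # b is not regularized
    return gradW, gradB

rng = np.random.default_rng(0)  # arbitrary illustrative data
x = rng.random((5, 3))
y = np.array([0, 1, 0, 1, 0])
w = rng.random(3)
b = 0.5
lambda_ = 0.7

gradW, gradB = computeGradientLogistic(x, y, w, b, lambda_)
gradW0, gradB0 = computeGradientLogistic(x, y, w, b, 0.0)

# The two results should differ only by the regularization term (lambda_ / m) * w.
print("gradW difference is (lambda_ / m) * w:",
      np.allclose(gradW - gradW0, lambda_ / len(x) * w))
print("gradB unaffected by lambda_:", gradB == gradB0)
```

Comparing the result against a run with `lambda_ = 0` isolates the regularization contribution, which affects `gradW` but not `gradB`.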
