# Gradient Descent

$$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$$

$$\hat{y}^{(i)} = \sigma(w^Tx^{(i)}+b)$$

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(y^{(i)}, \hat{y}^{(i)}) = - \frac{1}{m}\sum_{i=1}^{m} \big[ y^{(i)} * log(\hat{y}^{(i)}) + (1 - y^{(i)}) * log(1 - \hat{y}^{(i)}) \big]$$
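As a rough illustration, the cost $J(w, b)$ can be computed with NumPy as sketched below. The function names `sigmoid` and `cost`, the toy data, and the layout (one example per column of `X`, labels in a row vector `Y`) are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """Mean cross-entropy J(w, b) over the m columns of X."""
    m = X.shape[1]
    Y_hat = sigmoid(w.T @ X + b)  # predictions, shape (1, m)
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return float(np.sum(losses) / m)

# Hypothetical toy data: 2 features, 3 examples (one per column).
X = np.array([[0.5, -1.0,  2.0],
              [1.5,  0.0, -0.5]])
Y = np.array([[1.0, 0.0, 1.0]])

w = np.zeros((2, 1))
b = 0.0

# With w = 0 and b = 0 every prediction is 0.5, so J = ln 2 (about 0.693).
J0 = cost(w, b, X, Y)
print(J0)
```

Starting from zero parameters gives a convenient sanity check, because the expected cost is known in closed form.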

Repeat {

$\hspace{1cm} \text{ w := w - $\alpha \frac{\partial J}{\partial w}$ }$

$\hspace{1cm} \text{ b := b - $\alpha \frac{\partial J}{\partial b}$ }$

}

where $\alpha$ is the learning rate.
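The update rule above can be sketched in a few lines of Python. To keep the example self-contained it uses a hypothetical stand-in cost, $J(w, b) = (w - 3)^2 + (b + 1)^2$, rather than the logistic regression cost. The function name `grad_J`, the step count, and the value of `alpha` are likewise assumptions for illustration.

```python
def grad_J(w, b):
    """Gradient of the toy cost J(w, b) = (w - 3)^2 + (b + 1)^2."""
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)

w, b = 0.0, 0.0   # initial parameters
alpha = 0.1       # learning rate (assumed value)

for _ in range(200):          # "Repeat { ... }"
    dw, db = grad_J(w, b)     # dJ/dw and dJ/db at the current point
    w = w - alpha * dw        # w := w - alpha * dJ/dw
    b = b - alpha * db        # b := b - alpha * dJ/db

print(w, b)  # approaches the minimizer (3, -1)
```

For this toy cost each step shrinks the distance to the minimizer by a constant factor, so the loop converges quickly. A learning rate that is too large would make the iterates diverge instead.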

# Gradient Descent of Logistic Regression

$$\begin{aligned}
    z &= w_1 x_1 + w_2 x_2 + b \\
    a &= \sigma(z) \\
    \mathcal{L}(a, y) &= - \big[ y * log(a) + (1 - y) * log(1 - a) \big] \\
\end{aligned}$$

<br/>

$$z = w_1 x_1 + w_2 x_2 + b$$

$$\frac{\partial z}{\partial w_1} = x_1$$

$$\frac{\partial z}{\partial w_2} = x_2$$

$$\frac{\partial z}{\partial b} = 1$$

<br/>

$$a = \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$\frac{\partial a}{\partial z} = \frac{0 - (-e^{-z})}{(1 + e^{-z})^2} = \frac{e^{-z}}{(1 + e^{-z})^2} = a^2 e^{-z} = a(1 - a)$$
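The closed form $a^2 e^{-z}$, which is the same quantity as $a(1 - a)$ because $1 - a = a e^{-z}$, can be checked numerically. The sketch below compares both forms with a central finite difference; the grid of `z` values and the step `eps` are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)   # sample points (assumed range)
a = sigmoid(z)

analytic = a * (1 - a)          # da/dz written as a(1 - a)
alt_form = a**2 * np.exp(-z)    # da/dz written as a^2 e^{-z}

eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference

print(np.max(np.abs(analytic - numeric)))  # small discretization error
```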

<br/>

$$\mathcal{L}(a, y) = - \big[ y * log(a) + (1 - y) * log(1 - a) \big]$$

$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}$$

<br/>

$$\frac{\partial \mathcal{L}}{\partial w_1} = \frac{\partial \mathcal{L}}{\partial a} \times \frac{\partial a}{\partial z} \times \frac{\partial z}{\partial w_1} = \big[-\frac{y}{a} + \frac{1 - y}{1 - a}\big] \times \big[a^2 e^{-z}\big] \times x_1 = (a - y) x_1$$

$$\frac{\partial \mathcal{L}}{\partial w_2} = \frac{\partial \mathcal{L}}{\partial a} \times \frac{\partial a}{\partial z} \times \frac{\partial z}{\partial w_2} = \big[-\frac{y}{a} + \frac{1 - y}{1 - a}\big] \times \big[a^2 e^{-z}\big] \times x_2 = (a - y) x_2$$

$$\frac{\partial \mathcal{L}}{\partial b} = \frac{\partial \mathcal{L}}{\partial a} \times \frac{\partial a}{\partial z} \times \frac{\partial z}{\partial b} = \big[-\frac{y}{a} + \frac{1 - y}{1 - a}\big] \times \big[a^2 e^{-z}\big] = (a - y)$$
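A quick way to gain confidence in these three results is a finite-difference check on a single example. The parameter and input values below are hypothetical, and `numeric_grad` is a helper introduced only for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    """Cross-entropy loss for one example with two features."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def numeric_grad(f, params, k, eps=1e-6):
    """Central difference of f with respect to params[k]."""
    up = list(params)
    down = list(params)
    up[k] += eps
    down[k] -= eps
    return (f(*up) - f(*down)) / (2 * eps)

# Hypothetical values for one training example and the parameters.
w1, w2, b = 0.3, -0.2, 0.1
x1, x2, y = 1.5, -2.0, 1.0

a = sigmoid(w1 * x1 + w2 * x2 + b)
analytic = np.array([(a - y) * x1, (a - y) * x2, (a - y)])

f = lambda p1, p2, p3: loss(p1, p2, p3, x1, x2, y)
numeric = np.array([numeric_grad(f, (w1, w2, b), k) for k in range(3)])

print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places. A large mismatch would point to an error in the derivation or in the code.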

<br/>

$$\begin{aligned}
    dz &= \frac{\partial \mathcal{L}}{\partial z} = (a - y) & \rightarrow & dz^{(i)} = a^{(i)} - y^{(i)}\\
    dw &= \frac{\partial \mathcal{L}}{\partial w} = (a - y) \begin{bmatrix}x_1 \\ x_2\end{bmatrix} = x dz & \rightarrow & dw^{(i)} =  x^{(i)} dz^{(i)}\\
\end{aligned}$$

<br/>

$$\begin{aligned}
    X  &= \begin{bmatrix} x^{(1)} \dots x^{(m)} \end{bmatrix} & X &= \text{a matrix with one column per example} \\
    Y  &= \begin{bmatrix} y^{(1)} \dots y^{(m)} \end{bmatrix} & Y &= \text{a row vector} \\
    A  &= \begin{bmatrix} a^{(1)} \dots a^{(m)} \end{bmatrix} & A &= \text{a row vector} \\
    dZ &= \begin{bmatrix} dz^{(1)} \dots dz^{(m)} \end{bmatrix} = A - Y & dZ &= \text{a row vector}\\
    dW &= \begin{bmatrix} x^{(1)} dz^{(1)} \dots x^{(m)} dz^{(m)} \end{bmatrix}
\end{aligned}$$


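```
import numpy as np  # X: (n_x, m); A, Y: (1, m); m: number of examples

dw = 1/m * np.dot(X, (A-Y).T)  # gradient of J w.r.t. w, shape (n_x, 1)
db = 1/m * np.sum(A-Y)         # gradient of J w.r.t. b, a scalar
```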

$$J(w, b) = \frac{1}{m}\sum\mathcal{L}(a^{(i)}, y^{(i)})$$

$$dw = \frac{1}{m} \sum x^{(i)} dz^{(i)} = \frac{1}{m} X dZ^T$$

$$db = \frac{1}{m} \sum dz^{(i)} = \frac{1}{m} \, \text{np.sum}(dZ)$$
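To see that the vectorized expressions match the per-example sums, the sketch below computes `dw` and `db` both ways and then runs a few gradient descent steps. The function names, the toy data, the learning rate, and the step count are assumptions for illustration. `X` holds one example per column, as in the definitions above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients_loop(w, b, X, Y):
    """Accumulate dw and db one example at a time."""
    n_x, m = X.shape
    dw = np.zeros((n_x, 1))
    db = 0.0
    for i in range(m):
        x_i = X[:, i:i + 1]                     # column vector, shape (n_x, 1)
        a_i = sigmoid((w.T @ x_i).item() + b)   # scalar activation a^(i)
        dz_i = a_i - Y[0, i]                    # dz^(i) = a^(i) - y^(i)
        dw += x_i * dz_i                        # dw^(i) = x^(i) dz^(i)
        db += dz_i
    return dw / m, db / m

def gradients_vectorized(w, b, X, Y):
    """Same gradients without an explicit loop over examples."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)    # shape (1, m)
    dZ = A - Y
    return X @ dZ.T / m, np.sum(dZ) / m

def cost(w, b, X, Y):
    A = sigmoid(w.T @ X + b)
    return float(np.mean(-(Y * np.log(A) + (1 - Y) * np.log(1 - A))))

# Hypothetical toy data: 2 features, 3 examples.
X = np.array([[0.5, -1.0,  2.0],
              [1.5,  0.0, -0.5]])
Y = np.array([[1.0, 0.0, 1.0]])
w = np.zeros((2, 1))
b = 0.0
alpha = 0.1  # learning rate (assumed value)

dw_loop, db_loop = gradients_loop(w, b, X, Y)
dw_vec, db_vec = gradients_vectorized(w, b, X, Y)

cost_before = cost(w, b, X, Y)
for _ in range(100):
    dw, db = gradients_vectorized(w, b, X, Y)
    w = w - alpha * dw
    b = b - alpha * db
cost_after = cost(w, b, X, Y)

print(cost_before, cost_after)  # the cost should decrease
```

The loop version mirrors the per-example formulas directly, while the vectorized version is what one would normally run, since it avoids a Python loop over the $m$ examples.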
