## Cost function
We've seen that the cost function for logistic regression is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^m[y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))]$$

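Where $h_\theta(x^{(i)}) = \sigma(\theta^T x^{(i)})$ is the sigmoid function applied to a linear combination of the features. To get the update rule, we need to differentiate $J(\theta)$. First, recall that the derivative of the sigmoid is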

$$\frac{\partial}{\partial x} \sigma(x) = \sigma(x) \cdot (1 - \sigma(x))$$
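As a quick sanity check of this identity, the minimal Python sketch below (standard library only; the helper names are our own, not from any library) compares it against a central finite difference at a few points:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Analytic derivative via the identity sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(f"x={x:+.1f}  analytic={sigmoid_grad(x):.6f}  numeric={central_difference(sigmoid, x):.6f}")
```

At $x = 0$ both columns agree on $0.25$, since $\sigma(0) = 0.5$.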

In our case, we want the partial derivatives with respect to the parameters $\theta$. Since the sigmoid's argument $z$ depends on $\theta$, the chain rule gives

$$\frac{\partial}{\partial\theta} \sigma(z) = \sigma(z) \cdot (1 - \sigma(z)) \frac{\partial}{\partial\theta}z$$

Where,

$$z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots$$

Therefore,

$$\frac{\partial}{\partial\theta_0}z = 1$$

And,

$$\frac{\partial}{\partial\theta_1}z = x_1$$

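And so on for the remaining parameters, with $\frac{\partial z}{\partial\theta_j} = x_j$. Thus, for any parameter $\theta_j$,

$$\frac{\partial}{\partial\theta_j} \sigma(z) = \sigma(z) \cdot (1 - \sigma(z)) \cdot x_j$$

Where we define $x_0 = 1$, so the formula also holds for $\theta_0$ and $z$ can be written compactly as $\theta^T x$.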
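The same kind of finite-difference check works for the partial derivatives with respect to the parameters. In this sketch the parameter and feature values are made up for illustration, and `x[0] = 1.0` plays the role of $x_0$:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def linear(theta: list[float], x: list[float]) -> float:
    """z = theta_0 * x_0 + theta_1 * x_1 + ..., where x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))

def analytic_partial(theta: list[float], x: list[float], j: int) -> float:
    """d sigma(z) / d theta_j = sigma(z) * (1 - sigma(z)) * x_j."""
    s = sigmoid(linear(theta, x))
    return s * (1.0 - s) * x[j]

def numeric_partial(theta: list[float], x: list[float], j: int, h: float = 1e-6) -> float:
    """Central difference in theta_j only; all other parameters are held fixed."""
    plus = list(theta)
    plus[j] += h
    minus = list(theta)
    minus[j] -= h
    return (sigmoid(linear(plus, x)) - sigmoid(linear(minus, x))) / (2.0 * h)

theta = [0.5, -1.2, 2.0]  # hypothetical theta_0, theta_1, theta_2
x = [1.0, 0.3, -0.7]      # x_0 = 1, followed by two made-up features
for j in range(len(theta)):
    print(j, analytic_partial(theta, x, j), numeric_partial(theta, x, j))
```

For $j = 0$ the factor $x_0 = 1$ leaves just $\sigma(z)(1 - \sigma(z))$, matching $\frac{\partial z}{\partial\theta_0} = 1$.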

### Differentiating the negative log-likelihood

This form of the cost function, obtained by taking logs, is called the negative log-likelihood (here averaged over the $m$ training examples). We now differentiate it to obtain the update rule.
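To see where the name comes from, treat each label as a Bernoulli variable with $P(y^{(i)} = 1 \mid x^{(i)}; \theta) = h_\theta(x^{(i)})$, the usual probabilistic reading of logistic regression. The likelihood of a single example and its logarithm are then:

$$
\begin{aligned}
p\left(y^{(i)} \mid x^{(i)};\theta\right) &= h_\theta(x^{(i)})^{\,y^{(i)}}\left(1-h_\theta(x^{(i)})\right)^{1-y^{(i)}} \\
\log p\left(y^{(i)} \mid x^{(i)};\theta\right) &= y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)
\end{aligned}
$$

Assuming the $m$ examples are independent, the log-likelihood of the training set is the sum of these terms, so $J(\theta)$ is its negative divided by $m$.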

Below, we differentiate with respect to a single parameter $\theta_j$ and write $\sigma_i$ as shorthand for $h_\theta(x^{(i)}) = \sigma(\theta^T x^{(i)})$.

$$
\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta) &= \frac{\partial}{\partial\theta_j}\left(-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log\sigma_i+(1-y^{(i)})\log(1-\sigma_i)\right]\right) \\
&= -\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\frac{\partial}{\partial\theta_j}\log\sigma_i+(1-y^{(i)})\frac{\partial}{\partial\theta_j}\log(1-\sigma_i)\right] \\
&\quad\text{(chain rule for the logarithm)} \\
&= -\frac{1}{m}\sum_{i=1}^m\left[\frac{y^{(i)}}{\sigma_i}\frac{\partial\sigma_i}{\partial\theta_j}-\frac{1-y^{(i)}}{1-\sigma_i}\frac{\partial\sigma_i}{\partial\theta_j}\right] \\
&\quad\text{(factor out } \partial\sigma_i/\partial\theta_j\text{)} \\
&= -\frac{1}{m}\sum_{i=1}^m\left(\frac{y^{(i)}}{\sigma_i}-\frac{1-y^{(i)}}{1-\sigma_i}\right)\frac{\partial\sigma_i}{\partial\theta_j} \\
&\quad\text{(common denominator)} \\
&= -\frac{1}{m}\sum_{i=1}^m\frac{y^{(i)}-y^{(i)}\sigma_i-\sigma_i+y^{(i)}\sigma_i}{\sigma_i(1-\sigma_i)}\,\frac{\partial\sigma_i}{\partial\theta_j} \\
&\quad\text{(the } \pm y^{(i)}\sigma_i \text{ terms cancel)} \\
&= -\frac{1}{m}\sum_{i=1}^m\frac{y^{(i)}-\sigma_i}{\sigma_i(1-\sigma_i)}\,\frac{\partial\sigma_i}{\partial\theta_j} \\
&\quad\text{(use } \partial\sigma_i/\partial\theta_j = \sigma_i(1-\sigma_i)\,x^{(i)}_j\text{)} \\
&= -\frac{1}{m}\sum_{i=1}^m\frac{y^{(i)}-\sigma_i}{\sigma_i(1-\sigma_i)}\cdot\sigma_i(1-\sigma_i)\,x^{(i)}_j \\
&\quad\text{(the } \sigma_i(1-\sigma_i) \text{ factors cancel)} \\
&= -\frac{1}{m}\sum_{i=1}^m\left(y^{(i)}-\sigma_i\right)x^{(i)}_j
\end{aligned}
$$
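The final expression translates directly into code. The sketch below (plain Python, hypothetical helper names, a made-up four-example dataset) implements $J(\theta)$ and its gradient, written with the sign moved inside as $\frac{1}{m}\sum_i(\sigma(\theta^T x^{(i)}) - y^{(i)})\,x^{(i)}_j$, and checks the gradient against finite differences of $J$:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta: list[float], x: list[float]) -> float:
    """h_theta(x) = sigma(theta^T x); each x already starts with x_0 = 1."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def cost(theta, X, y) -> float:
    """J(theta) = -(1/m) * sum_i [y_i log h_i + (1 - y_i) log(1 - h_i)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict(theta, xi)
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / len(X)

def gradient(theta, X, y) -> list[float]:
    """dJ/dtheta_j = (1/m) * sum_i (h_i - y_i) * x_ij, the result derived above."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = predict(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    return [g / len(X) for g in grad]

def numeric_gradient(theta, X, y, h: float = 1e-6) -> list[float]:
    """Central differences of J in each theta_j separately."""
    grad = []
    for j in range(len(theta)):
        plus = list(theta)
        plus[j] += h
        minus = list(theta)
        minus[j] -= h
        grad.append((cost(plus, X, y) - cost(minus, X, y)) / (2 * h))
    return grad

# Made-up data; the first column of each row is x_0 = 1.
X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, -0.3]]
y = [1, 0, 1, 0]
theta = [0.1, -0.4]
print(gradient(theta, X, y))
print(numeric_gradient(theta, X, y))
```

If the two printed lists disagree beyond rounding, the usual culprit is a sign error, for example writing $(y^{(i)} - \sigma_i)$ without the leading minus.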

## Update rule for the parameters

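Gradient descent moves each parameter against its gradient, $\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial\theta_j}J(\theta)$, where $\alpha$ is the learning rate. Moving the negative sign of the last equation inside the sum turns $(y^{(i)} - \sigma(\theta^T x^{(i)}))$ into $(\sigma(\theta^T x^{(i)}) - y^{(i)})$, which gives our update rule: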

$$
\begin{aligned}
&\text{Repeat } \lbrace \\
&\qquad \theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^m\left(\sigma(\theta^T x^{(i)})-y^{(i)}\right)x^{(i)}_j \qquad \text{(simultaneously for every } j\text{)} \\
&\rbrace
\end{aligned}
$$
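A minimal batch gradient-descent loop applying this rule might look like the following. It is a sketch rather than a tuned implementation: the learning rate, iteration count and toy dataset are arbitrary choices, and there is no convergence test or regularization.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta: list[float], x: list[float]) -> float:
    """h_theta(x) = sigma(theta^T x); x must start with x_0 = 1."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def cost(theta, X, y) -> float:
    """Average negative log-likelihood J(theta)."""
    return -sum(yi * math.log(predict(theta, xi)) + (1 - yi) * math.log(1 - predict(theta, xi))
                for xi, yi in zip(X, y)) / len(X)

def train(X, y, alpha: float = 0.1, iterations: int = 1000) -> list[float]:
    """Batch gradient descent starting from theta = 0."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # sigma(theta^T x_i) - y_i for every example, using only the current theta.
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        # theta_j := theta_j - (alpha / m) * sum_i error_i * x_ij, for all j at once.
        theta = [theta[j] - (alpha / m) * sum(e * xi[j] for e, xi in zip(errors, X))
                 for j in range(n)]
    return theta

# Toy one-feature dataset (made up); the leading 1.0 in each row is x_0.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 1, 0, 1, 1]
theta = train(X, y)
print("cost before:", cost([0.0, 0.0], X, y))
print("cost after: ", cost(theta, X, y))
print("theta:", theta)
```

Computing every error from the current $\theta$ before changing any parameter is what makes the update simultaneous; updating $\theta_j$ in place one at a time would mix old and new parameter values within a single step.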