# Cheatsheet
Pick what you need

# Sigmoid

\begin{align}
\sigma(z) &= \frac{1}{1+e^{-z}} \\
\sigma'(z) &= \sigma(z)\,(1-\sigma(z))
\end{align}

```
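import numpy as np

def sigmoid(z):
    # Logistic function, applied element-wise
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)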
```
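As a quick sanity check, the analytic derivative can be compared against a central finite difference. This is a minimal sketch: the helper `numeric_grad`, the step size `h`, and the test points are illustrative assumptions, not part of the formulas above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_grad(f, z, h=1e-5):
    # Central-difference approximation of f'(z) (hypothetical helper)
    return (f(z + h) - f(z - h)) / (2.0 * h)

z = np.linspace(-5.0, 5.0, 11)          # arbitrary test points
max_diff = np.max(np.abs(sigmoid_grad(z) - numeric_grad(sigmoid, z)))
print(max_diff)                          # should be tiny if the formula is right
```

If the parentheses in `s * (1 - s)` are dropped, the difference is no longer small, so this check catches the most common slip.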


# Mean Square Error

Naive (full-batch) MSE, averaged over all n training examples:
\begin{align}
C(w,b)&=\frac{1}{2n}\sum_x\|y-a^L\|^2 \\
\frac{\partial C}{\partial a^L}&=\frac{1}{n}(a^L - y)
\end{align}


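Stochastic (mini-batch) MSE, averaged over the m examples of one mini-batch:
\begin{align}
C(w,b)&=\frac{1}{2m}\sum_x\|y-a^L\|^2 \\
\frac{\partial C}{\partial a^L}&=\frac{1}{m}(a^L - y)
\end{align}

where n is the size of the full training set and m is the mini-batch size, so n > m.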

```
grad_C_a_L = (a_L - y) / m  # m is the mini-batch size
```
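The cost and its gradient can be sketched as two small functions. This assumes one example per row, so `m` is the number of rows; the function names and the sample values are made up for illustration.

```python
import numpy as np

def mse_cost(a_L, y):
    # C = 1/(2m) * sum over the batch of ||y - a_L||^2
    m = y.shape[0]
    return np.sum((y - a_L) ** 2) / (2.0 * m)

def mse_grad(a_L, y):
    # dC/da_L = (a_L - y) / m, same shape as a_L
    m = y.shape[0]
    return (a_L - y) / m

# Two examples, two output units (illustrative values)
a_L = np.array([[0.8, 0.1], [0.3, 0.6]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])

cost = mse_cost(a_L, y)   # (0.04 + 0.01 + 0.09 + 0.16) / 4 = 0.075
grad = mse_grad(a_L, y)
```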


# Backpropagation

\begin{align}
w^L \rightarrow (w^L)' = w^L - \eta \frac{\partial C}{\partial w^L}
\end{align}

```
w_L -= eta * grad_C_w_L  # eta is the learning rate
```
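To see the update rule in isolation, here is a minimal sketch on a toy quadratic cost rather than a network. The cost, the `target` vector, and the value of `eta` are assumptions chosen only so the behaviour is easy to follow.

```python
import numpy as np

eta = 0.5                            # learning rate (assumed value)
target = np.array([1.0, -2.0])       # minimiser of the toy cost
w = np.zeros(2)                      # initial weights

def grad_C(w):
    # Gradient of the toy cost C(w) = 0.5 * ||w - target||^2
    return w - target

for _ in range(20):
    w -= eta * grad_C(w)             # w <- w - eta * dC/dw
```

Each step halves the distance to `target` here, so after 20 steps `w` is very close to it. In a real network the gradient comes from backpropagation instead of a closed form.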


# Stochastic Gradient Descent

Rate of change of the cost with respect to any weight
\begin{align}
\frac{\partial C}{\partial w^L} = a^{L-1}\delta^L
\end{align}

```
grad_C_w_L = np.dot(a_Lm1.T, err_L)  # rows are examples; result has the shape of w_L
```
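The matrix product above sums the per-example contributions over the mini-batch. The sketch below checks that against an explicit loop, assuming one example per row and weights of shape (inputs, outputs); the sizes and random values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_in, n_out = 4, 3, 2                  # assumed sizes for illustration
a_Lm1 = rng.normal(size=(m, n_in))        # activations of layer L-1
err_L = rng.normal(size=(m, n_out))       # delta of layer L
w_L = rng.normal(size=(n_in, n_out))      # weights of layer L

# Batched form: one matrix product
grad_C_w_L = np.dot(a_Lm1.T, err_L)

# Same quantity as a sum of per-example outer products a^{L-1} delta^L
explicit = sum(np.outer(a_Lm1[i], err_L[i]) for i in range(m))
```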

Error in the output layer, for a sigmoid layer:

\begin{align}
\delta^L = \frac{\partial C}{\partial a^L} \odot \sigma'(z^L)
\end{align}

```
err_L = (a_L - y) / m * sigmoid_grad(z_L)  # m is the mini-batch size
```

Error in a non-output layer, for a sigmoid layer:

\begin{align}
\delta^{L-1} = ((w^L)^T \delta^L) \odot \sigma'(z^{L-1})
\end{align}

```
err_Lm1 = np.dot(err_L, w_L.T) * sigmoid_grad(z_Lm1)
```
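Putting the pieces together, the following sketch runs one backward pass through a tiny two-layer sigmoid network and compares one analytic gradient entry with a finite difference. Several things are assumed for brevity: biases are omitted, examples are stored one per row, and the layer sizes, random data, and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, w1, w2):
    # Two sigmoid layers, no biases (simplifying assumption)
    z1 = np.dot(x, w1)
    a1 = sigmoid(z1)
    z2 = np.dot(a1, w2)
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def cost(x, y, w1, w2):
    # Mini-batch MSE: 1/(2m) * sum ||y - a2||^2
    a2 = forward(x, w1, w2)[-1]
    return np.sum((y - a2) ** 2) / (2.0 * x.shape[0])

def backward(x, y, w1, w2):
    m = x.shape[0]
    z1, a1, z2, a2 = forward(x, w1, w2)
    err2 = (a2 - y) / m * sigmoid_grad(z2)        # output-layer error
    err1 = np.dot(err2, w2.T) * sigmoid_grad(z1)  # hidden-layer error
    return np.dot(x.T, err1), np.dot(a1.T, err2)  # dC/dw1, dC/dw2

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))       # 5 examples, 3 inputs
y = rng.uniform(size=(5, 2))      # 2 targets per example
w1 = rng.normal(size=(3, 4))
w2 = rng.normal(size=(4, 2))

g1, g2 = backward(x, y, w1, w2)

# Central-difference check on a single entry of w1
h = 1e-6
w1_plus = w1.copy()
w1_plus[0, 0] += h
w1_minus = w1.copy()
w1_minus[0, 0] -= h
numeric = (cost(x, y, w1_plus, w2) - cost(x, y, w1_minus, w2)) / (2.0 * h)
print(abs(numeric - g1[0, 0]))    # should be close to zero
```

A gradient check like this is only a debugging aid; it is far too slow to use in place of backpropagation during training.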
