## Cross entropy/softmax


$$ CE(p, y) = \sum_{i}{(- y_i * log(p_i) - (1 - y_i) * log(1-p_i))} $$

When every label $y_i = 0$, this reduces to:

$$ \begin{align} &\sum_{i}{(- 0 * log(p_i) - (1 - 0) * log(1-p_i))} \\
=  &\sum_{i}{- log(1-p_i)} \end{align}$$
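The cross-entropy sum above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `binary_cross_entropy` and the sample values are made up for the example, and every $p_i$ is assumed to lie strictly between 0 and 1 so that both logarithms are defined.

```python
import math

def binary_cross_entropy(p, y):
    """Sum over i of -y_i * log(p_i) - (1 - y_i) * log(1 - p_i).

    Assumes 0 < p_i < 1 for every i (otherwise math.log raises ValueError).
    """
    return sum(-yi * math.log(pi) - (1 - yi) * math.log(1 - pi)
               for pi, yi in zip(p, y))

# Hypothetical predictions, chosen only for illustration.
p = [0.9, 0.2, 0.4]

mixed_labels = binary_cross_entropy(p, [1, 0, 0])
all_zero_labels = binary_cross_entropy(p, [0, 0, 0])

# With all labels 0, only the -log(1 - p_i) terms survive.
only_second_terms = sum(-math.log(1 - pi) for pi in p)
print(mixed_labels, all_zero_labels, only_second_terms)
```

The last two printed values should agree, which is the special case where every label is 0.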

$$ S(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix})  = \begin{bmatrix} \frac{e^{x_1}}{e^{x_1} + e^{x_2} + e^{x_3} + \ldots + e^{x_n}} \\ 
\frac{e^{x_2}}{e^{x_1} + e^{x_2} + e^{x_3} + \ldots + e^{x_n}} \\
\frac{e^{x_3}}{e^{x_1} + e^{x_2} + e^{x_3} + \ldots + e^{x_n}} \\
\vdots \\
\frac{e^{x_n}}{e^{x_1} + e^{x_2} + e^{x_3} + \ldots + e^{x_n}}
\end{bmatrix} $$

$$ S(\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}) = \begin{bmatrix} \frac{e^{p_1}}{e^{p_1} + e^{p_2}} \\
\frac{e^{p_2}}{e^{p_1} + e^{p_2}} \\
\end{bmatrix} $$
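As a quick sanity check on the definition, here is a minimal pure-Python sketch of the softmax function. The function name `softmax` and the inputs are illustrative assumptions; the code exponentiates directly, so it is only meant for small inputs.

```python
import math

def softmax(xs):
    """Return e^{x_i} / sum_j e^{x_j} for each x_i in xs."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# General n-element case.
s = softmax([1.0, 2.0, 3.0])

# Two-element case, matching the 2x1 example above.
two = softmax([0.5, -0.5])

print(s, sum(s))
print(two)
```

Each output is positive, and the outputs sum to 1. In the two-element case the first output equals $\frac{1}{1 + e^{-(p_1 - p_2)}}$, the logistic function of the difference of the inputs.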

## Derivation of Softmax Cross Entropy derivative

$$ C(p, y) = - y * \text{log}(p) - (1 - y) * \text{log}(1-p) $$

$$
C(p,y)=
\begin{cases}
-log(1-p) & \text{if }  y = 0\\
-log(p) & \text{if }  y = 1
\end{cases}
$$

$$ SC(p_1, p_2, y_1) = - y_1 * log(\frac{e^{p_1}}{e^{p_1} + e^{p_2}}) - (1 - y_1) * log(1-\frac{e^{p_1}}{e^{p_1} + e^{p_2}}) $$

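Structure: rename the inputs $p_1, p_2$ to $x_1, x_2$, and treat everything that does not depend on $x_1$ as a constant:

$ y_1 = a $, $e^{x_2} = b$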

$$ SC(x_1) = - a * log(\frac{e^{x_1}}{e^{x_1} + b}) - (1 - a) * log(1-\frac{e^{x_1}}{e^{x_1} + b}) $$

**Quotient rule:**

$$ f(x) = \frac{g(x)}{h(x)} $$ 

$$ f'(x) = \frac{g'(x) * h(x) - g(x) * h'(x)}{(h(x))^2} $$

If

$$ f(x) = \frac{e^x}{e^x + b} $$ 

$$ \begin{align} f'(x) =& \frac{e^x * (e^x + b) - (e^x * e^x)}{(e^x + b)^2} \\ 
=& \frac{e^x * (e^x + b - e^ x)}{(e^x + b)^2} \\
=& \frac{e^x * b}{(e^x + b)^2}\end{align} $$
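The quotient-rule result can be checked numerically. The sketch below compares the closed form $\frac{e^x * b}{(e^x + b)^2}$ against a central finite difference; the helper names and the test point ($x = 0.3$, $b = 2$) are arbitrary choices for illustration.

```python
import math

def f(x, b):
    """f(x) = e^x / (e^x + b)."""
    return math.exp(x) / (math.exp(x) + b)

def f_prime(x, b):
    """Closed form from the quotient rule: e^x * b / (e^x + b)^2."""
    return math.exp(x) * b / (math.exp(x) + b) ** 2

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) as (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x, b = 0.3, 2.0
numeric = central_difference(lambda t: f(t, b), x)
analytic = f_prime(x, b)
print(numeric, analytic)
```

The two values should agree to many decimal places; a mismatch would point to an algebra error in the derivation.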

And if:

$$ g(x) = - a * log(f(x)) - (1 - a) * log(1-f(x)) $$

then:

$$ g'(x) = - a * \frac{f'(x)}{f(x)} - (1 - a) * \frac{-1 * f'(x)}{1-f(x)} $$

First, we'll compute $\frac{f'(x)}{f(x)}$:

$$ \begin{align} 
\frac{f'(x)}{f(x)} =& \frac{\frac{e^x * b}{(e^x + b)^2}}{\frac{e^x}{e^x + b}} \\
=& \frac{e^x * b}{(e^x + b)^2} * \frac{e^x + b}{e^x} \\
=& \frac{b}{e^x + b} \end{align} $$

Now, in this next part, we'll use the fact that:

$$ \frac{b}{e^x + b} = 1 - \frac{e^x}{e^x + b} $$

$$ \begin{align} 
\frac{-1 * f'(x)}{1 - f(x)} =& \frac{-1 * \frac{e^x * b}{(e^x + b)^2}}{1 - \frac{e^x}{e^x + b}} \\
=& \frac{\frac{-e^x * b}{(e^x + b)^2}}{\frac{b}{e^x + b}} \\
=& \frac{-e^x * b}{(e^x + b)^2} * \frac{e^x + b}{b} \\
=& \frac{-e^x}{e^x + b} \end{align} $$
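Both ratios can be restated in terms of $f$ itself: $\frac{f'(x)}{f(x)} = 1 - f(x)$ and $\frac{-1 * f'(x)}{1 - f(x)} = -f(x)$. A small Python sketch, with an arbitrary test point, confirms this numerically; the helper names are illustrative.

```python
import math

def f(x, b):
    """f(x) = e^x / (e^x + b)."""
    return math.exp(x) / (math.exp(x) + b)

def f_prime(x, b):
    """f'(x) = e^x * b / (e^x + b)^2."""
    return math.exp(x) * b / (math.exp(x) + b) ** 2

x, b = -0.7, 1.5  # arbitrary test point with b > 0

ratio_pos = f_prime(x, b) / f(x, b)          # expected: b / (e^x + b)
ratio_neg = -f_prime(x, b) / (1 - f(x, b))   # expected: -e^x / (e^x + b)

print(ratio_pos, b / (math.exp(x) + b))
print(ratio_neg, -math.exp(x) / (math.exp(x) + b))
```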

Finally, putting these pieces together:

$$ \begin{align} SC'(x) =& - a * \frac{f'(x)}{f(x)} - (1 - a) * \frac{-1 * f'(x)}{1-f(x)} \\ 
=& -a * \frac{b}{e^x + b} - (1 - a) * \frac{-e^x}{e^x + b} \\ 
=& -a * \frac{b}{e^x + b} + \frac{e^x}{e^x + b} - a * \frac{e^x}{e^x + b} \\ 
=& -a * (1 - \frac{e^x}{e^x + b}) + \frac{e^x}{e^x + b} - a * \frac{e^x}{e^x + b} \\
=& -a + a * \frac{e^x}{e^x + b} + \frac{e^x}{e^x + b} - a * \frac{e^x}{e^x + b} \\
=& -a + \frac{e^x}{e^x + b}
\end{align} $$

That's right: substituting back $a = y_1$ and $x = p_1$ (and repeating the same argument for $p_2$ with $y_2 = 1 - y_1$), the derivative to be sent backward from the softmax layer is simply:

$$ S - Y = S(\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}) - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \frac{e^{p_1}}{e^{p_1} + e^{p_2}} - y_1 \\ \frac{e^{p_2}}{e^{p_1} + e^{p_2}} - y_2 \end{bmatrix} $$
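The result can be verified with a numerical gradient check. The sketch below implements $SC(p_1, p_2, y_1)$ exactly as written earlier, differentiates it with central differences with respect to both inputs, and compares against $S - Y$ with $y_2 = 1 - y_1$. The function names and the test point are assumptions made for the example.

```python
import math

def sc(p1, p2, y1):
    """SC(p1, p2, y1) = -y1 * log(s1) - (1 - y1) * log(1 - s1)."""
    s1 = math.exp(p1) / (math.exp(p1) + math.exp(p2))
    return -y1 * math.log(s1) - (1 - y1) * math.log(1 - s1)

def numeric_grad(p1, p2, y1, h=1e-6):
    """Central-difference estimate of [dSC/dp1, dSC/dp2]."""
    d1 = (sc(p1 + h, p2, y1) - sc(p1 - h, p2, y1)) / (2 * h)
    d2 = (sc(p1, p2 + h, y1) - sc(p1, p2 - h, y1)) / (2 * h)
    return [d1, d2]

p1, p2, y1 = 0.8, -0.4, 1.0  # arbitrary test point
y2 = 1.0 - y1

s1 = math.exp(p1) / (math.exp(p1) + math.exp(p2))
s2 = 1.0 - s1

analytic = [s1 - y1, s2 - y2]      # the claimed gradient S - Y
numeric = numeric_grad(p1, p2, y1)
print(analytic)
print(numeric)
```

The two printed vectors should match closely. Their components sum to zero, as expected when both the softmax outputs and the labels sum to 1.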

This makes sense:

* The softmax output will always be between 0 and 1.
* If $y_1$ is 0, then $ S(x_1) - y_1 $ will be a positive number: indeed, if we increase the value of $x_1$, the loss will increase. Conversely, if $y_1$ is 1, then $ S(x_1) - y_1 $ will be negative, and increasing $x_1$ decreases the loss.
* Note that this loss function only makes sense because $ S(x_i) $ is always between 0 and 1: otherwise $log(S(x_i))$ or $log(1 - S(x_i))$ would be undefined.

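This, by the way, is presumably one reason TensorFlow has a function called `tf.nn.softmax_cross_entropy_with_logits`: it takes the raw inputs (the logits), so the softmax and the cross entropy are handled together as a single operation.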
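To illustrate what a fused "softmax plus cross entropy on logits" operation can look like, here is a minimal pure-Python sketch. It is not TensorFlow's implementation. The function name `softmax_cross_entropy_from_logits` is hypothetical, the labels are assumed to sum to 1, and the log-sum-exp shift (subtracting the maximum logit) is a standard stability trick added here that does not appear in the derivation above.

```python
import math

def softmax_cross_entropy_from_logits(logits, labels):
    """Return (loss, gradient with respect to the logits).

    loss = -sum_i labels_i * log(softmax(logits)_i)
    grad = softmax(logits) - labels   (assumes the labels sum to 1)
    """
    # Shift by the max logit so that exp never overflows.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    loss = -sum(y * lp for y, lp in zip(labels, log_probs))
    grad = [math.exp(lp) - y for lp, y in zip(log_probs, labels)]
    return loss, grad

# A large logit: computing math.exp(1000.0) directly would overflow.
loss, grad = softmax_cross_entropy_from_logits([1000.0, 0.0], [0.0, 1.0])
print(loss, grad)

# A small case for comparison with the S - Y formula.
small_loss, small_grad = softmax_cross_entropy_from_logits([0.8, -0.4], [1.0, 0.0])
print(small_loss, small_grad)
```

Working from the logits lets the loss be computed from log-probabilities directly, without first forming a softmax output that may have rounded to 0 or 1.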

### Logistic normalization

$$ \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ \vdots \\ p_n \end{bmatrix} \Rightarrow \begin{bmatrix} p_1 & 1-p_1 \\ p_2 & 1-p_2 \\ p_3 & 1-p_3 \\ \vdots & \vdots \\ p_n & 1-p_n \end{bmatrix} $$ 

Opposite:

$$ \begin{bmatrix} p_1 & 1-p_1 \\ p_2 & 1-p_2 \\ p_3 & 1-p_3 \\ \vdots & \vdots \\ p_n & 1-p_n \end{bmatrix} \Rightarrow \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ \vdots \\ p_n \end{bmatrix} $$ 
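The two mappings above can be sketched in Python as follows. The helper names `to_two_column` and `from_two_column` are hypothetical, and the sample values are for illustration only.

```python
def to_two_column(ps):
    """Map each probability p_i to the row [p_i, 1 - p_i]."""
    return [[p, 1 - p] for p in ps]

def from_two_column(rows):
    """Recover p_i by keeping only the first column of each row."""
    return [row[0] for row in rows]

ps = [0.9, 0.25, 0.5]
rows = to_two_column(ps)
recovered = from_two_column(rows)
print(rows)
print(recovered)
```

Each row sums to 1, so it can be read as a two-class probability distribution, and the reverse mapping discards only the redundant second column.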