# Softmax Classification with Cross-Entropy

In [1]:
import numpy as np

The softmax function takes a $C$-dimensional vector $z$ as input and outputs a $C$-dimensional vector $y$ of real values between 0 and 1. It is a normalized exponential, defined as:
$$y_c = f(z)_c = \frac{e^{z_c}}{\sum_{d = 1}^C e^{z_d}}$$

for $c = 1, \dots, C$.

The denominator acts as a normalizer that ensures the outputs sum to one: $\sum_{c = 1}^C y_c = 1$.

In [2]:
def softmax(z):
    """Normalized exponential of a 1-D vector z."""
    exp_z = np.exp(z)  # compute the exponentials once
    return exp_z / np.sum(exp_z)
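
As a quick check of the normalization property, the sketch below applies a softmax to a small vector and confirms that the outputs are positive and sum to 1. The helper name `stable_softmax` and the example values are assumptions made for illustration. It subtracts the maximum before exponentiating, a common trick that leaves the result mathematically unchanged while avoiding overflow for large inputs.

```python
import numpy as np

def stable_softmax(z):
    """Softmax of a 1-D vector (hypothetical helper, not part of the original notebook)."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)      # shifting by a constant does not change the softmax
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

z = np.array([1.0, 2.0, 3.0])
y = stable_softmax(z)
print(y)        # three probabilities, largest for the largest input
print(y.sum())  # 1 up to floating-point rounding

# Inputs this large overflow in a plain np.exp(z); after shifting they do not.
y_large = stable_softmax([1000.0, 1001.0, 1002.0])
print(np.allclose(y, y_large))
```

Because adding a constant to every $z_c$ cancels between numerator and denominator, `y` and `y_large` should agree.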

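The derivative of the output $y_i$ with respect to the input $z_j$ depends on whether the indices match:

If $i = j$: $\frac{\partial y_i}{\partial z_i} = y_i(1 - y_i)$ (the same form as the derivative of the logistic function)  
If $i \ne j$: $\frac{\partial y_i}{\partial z_j} = -y_i y_j$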
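
The two cases can be combined into a single Jacobian matrix, $J_{ij} = y_i(\delta_{ij} - y_j)$. The following is a minimal sketch, not a definitive implementation, that builds this matrix and compares it against central finite differences. The function names `softmax_jacobian` and `numerical_jacobian`, the step size `eps`, and the test vector are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D vector, shifted by max(z) for numerical stability."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

def softmax_jacobian(z):
    """Analytic Jacobian: J[i, j] = dy_i/dz_j = y_i * (delta_ij - y_j)."""
    y = softmax(z)
    return np.diag(y) - np.outer(y, y)

def numerical_jacobian(f, z, eps=1e-6):
    """Central finite differences, used here only as a sanity check."""
    z = np.asarray(z, dtype=float)
    J = np.zeros((z.size, z.size))
    for j in range(z.size):
        step = np.zeros_like(z)
        step[j] = eps
        J[:, j] = (f(z + step) - f(z - step)) / (2 * eps)
    return J

z = np.array([0.5, -1.0, 2.0])
J_analytic = softmax_jacobian(z)
J_numeric = numerical_jacobian(softmax, z)
print(np.allclose(J_analytic, J_numeric, atol=1e-7))
```

The diagonal of `J_analytic` holds the $i = j$ case and the off-diagonal entries hold the $i \ne j$ case. Each column sums to zero because the outputs always sum to 1.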

###### Likelihood Function
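The parameters $\theta$ are chosen to maximize the likelihood of the targets $t$ and inputs $z$:
$$\underset{\theta}{\operatorname{argmax}}\ L(\theta \mid t, z)$$

The likelihood is the joint probability of generating $t$ and $z$ given the parameters $\theta$, which factors as: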
$$P(t, z | \theta) = P(t | z, \theta)P(z | \theta)$$

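Since we are not interested in modeling the probability of $z$ itself, the factor $P(z | \theta)$ can be dropped, leaving $P(t | z, \theta)$. Only one class is active in $t$, so this probability is $\prod_{c = 1}^C y_c^{t_c}$, and the negative log-likelihood becomes:
$$-\log L(\theta \mid t, z) = -\log \prod_{c = 1}^C y_c^{t_c} = -\sum_{c = 1}^C t_c \log(y_c)$$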
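
To make the negative log-likelihood concrete, here is a small sketch for a single sample with three classes. The values of `y` and `t` are made up for illustration, with `t` assumed to be one-hot. Because only the true class has $t_c = 1$, the product collapses to the probability assigned to that class, and the sum collapses to the negative log of that probability.

```python
import numpy as np

# Hypothetical example: softmax output y and one-hot target t (true class is index 1).
y = np.array([0.2, 0.7, 0.1])
t = np.array([0.0, 1.0, 0.0])

likelihood = np.prod(y ** t)                  # product of y_c^{t_c}; only the true class contributes
neg_log_likelihood = -np.sum(t * np.log(y))   # sum form of the same quantity

print(likelihood)           # equals y[1]
print(neg_log_likelihood)   # equals -log(y[1])
```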

###### Cross-Entropy Error 
The cross-entropy error over a batch of $n$ samples is:
$$\xi(T, Y) = \sum_{i = 1}^n \xi(t_i, y_i) = -\sum_{i = 1}^n \sum_{c = 1}^C t_{ic} \log(y_{ic})$$

where $t_{ic} = 1$ if sample $i$ belongs to class $c$ (and $0$ otherwise), and $y_{ic}$ is the output probability that sample $i$ belongs to class $c$.
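
The batch formula can be sketched in NumPy as follows. This is one possible implementation under stated assumptions: `softmax_rows` and `cross_entropy` are hypothetical helper names, inputs are arranged with one sample per row (shape `(n, C)`), and probabilities are clipped at a small `eps` to avoid `log(0)`, which is a practical safeguard rather than part of the formula.

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise softmax for a batch of shape (n, C)."""
    Z = np.asarray(Z, dtype=float)
    shifted = Z - Z.max(axis=1, keepdims=True)   # per-row shift for numerical stability
    exp_Z = np.exp(shifted)
    return exp_Z / exp_Z.sum(axis=1, keepdims=True)

def cross_entropy(T, Y, eps=1e-12):
    """Sum over samples i and classes c of -t_ic * log(y_ic)."""
    Y = np.clip(Y, eps, 1.0)   # assumption: clip to avoid log(0)
    return -np.sum(T * np.log(Y))

# Hypothetical batch of n = 2 samples and C = 3 classes.
Z = np.array([[2.0, 1.0, 0.1],
              [0.5, 2.5, 0.3]])
T = np.array([[1, 0, 0],      # sample 0 belongs to class 0
              [0, 1, 0]])     # sample 1 belongs to class 1

Y = softmax_rows(Z)
loss = cross_entropy(T, Y)
print(loss)
```

With one-hot targets, `loss` reduces to the sum of `-log` of the probability each sample assigns to its true class. Dividing by `n` would give the mean error per sample, a common variant that the formula above does not include.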