## A List of Cost Functions Used in Neural Networks

### Feedforward Neural Network

A feedforward neural network is made of several layers of neurons connected together, with each layer's outputs feeding the next layer.

#### Notation

$a_j^i$ is the activation (output) of the $j$th neuron in the $i$th layer, where $a_j^1$ is the $j$th element in the input vector.

Each layer's activations are computed from those of the previous layer:
$$a_j^i = \sigma\left( \sum_k\ ( w_{jk}^i \cdot a_k^{i - 1} ) + b_j^i \right)$$

where:
- $\sigma$ is the activation function
- $w_{jk}^i$ is the weight from the $k$th neuron in the $(i - 1)$th layer to the $j$th neuron in the $i$th layer
- $b_j^i$ is the bias of the $j$th neuron in the $i$th layer
- $a_j^i$ is the activation value of the $j$th neuron in the $i$th layer

Sometimes we write $z_j^i$ for $\sum_k\ ( w_{jk}^i \cdot a_k^{i - 1} ) + b_j^i$, in other words the value of a neuron before the activation function is applied, so that $a_j^i = \sigma( z_j^i )$.

A more concise notation, with $w^i$ as the weight matrix and $a^{i - 1}$, $b^i$ as vectors for a whole layer:
$$a^i = \sigma( w^i \times a^{i - 1} + b^i )$$

Think of $I \in \mathbb{R}^n$ as the input vector and set $a^1 = I$. The goal is to compute $a^1, \dots, a^m$, where $m$ is the number of layers.
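The layer-by-layer recurrence above can be sketched in plain Python. This is only an illustration under stated assumptions: the logistic sigmoid stands in for $\sigma$, and the function names, the nested-list layout of the weights, and the 2-3-1 example network are all hypothetical choices, not something the notes prescribe.

```python
import math

def sigmoid(z):
    """Logistic function, used here as one possible choice of sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, biases, inputs, activation=sigmoid):
    """Return the activations of every layer, starting with the input.

    weights[n][j][k] is the weight from neuron k of one layer to neuron j
    of the next layer; biases[n][j] is the bias of that neuron j.
    """
    activations = [list(inputs)]          # a^1 = I
    for w, b in zip(weights, biases):
        prev = activations[-1]
        # z_j = sum_k w_jk * a_k + b_j for every neuron j in this layer
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, prev)) + b_j
             for row, b_j in zip(w, b)]
        activations.append([activation(z_j) for z_j in z])
    return activations

# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],
    [[0.3, -0.1, 0.2]],
]
biases = [[0.0, 0.1, -0.1], [0.05]]
layers = forward(weights, biases, [1.0, 0.5])
output = layers[-1]                       # a^m, the network's output
```

Keeping every layer's activations, rather than only the last, mirrors the goal of computing $a^1, \dots, a^m$ and is what backpropagation would need later.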

#### Introduction

A cost function measures how well a neural network did with respect to a given training sample and its expected output. Its value also depends on the network's weights and biases. A cost function produces a single value, not a vector, because it rates how well the network did as a whole.

$$C( W, B, S^r, E^r )$$

where:
- $W$ is the network's weights
- $B$ is the network's biases
- $S^r$ is the input of a single training sample
- $E^r$ is the desired output of that training sample

###### Backpropagation

The cost function is used in backpropagation to compute the error of the output layer $L$:
$$\delta_j^L = \frac{ \partial\ C }{ \partial\ a_j^L }\ \sigma'( z_j^L )$$
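As a rough numerical illustration of the output-layer error, the sketch below multiplies the cost gradient by $\sigma'$ for each output neuron. It assumes a logistic sigmoid activation and uses the gradient of the quadratic cost ($a_j^L - E_j^r$) as the example; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the logistic sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def quadratic_grad(a_out, expected):
    """dC/da_j for the quadratic cost: a_j - E_j."""
    return [a - e for a, e in zip(a_out, expected)]

def output_error(z_out, expected, cost_grad):
    """delta_j = dC/da_j * sigma'(z_j) for each output neuron j."""
    a_out = [sigmoid(z) for z in z_out]
    grad = cost_grad(a_out, expected)
    return [g * sigmoid_prime(z) for g, z in zip(grad, z_out)]

# Two output neurons with pre-activation values 0.0 and 2.0,
# both expected to output 1.0.
delta = output_error([0.0, 2.0], [1.0, 1.0], quadratic_grad)
```

For the first neuron, $a = 0.5$, so the gradient is $-0.5$ and $\sigma'(0) = 0.25$, giving an error of $-0.125$. Passing the gradient in as a function keeps the sketch independent of any particular cost.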

###### Cost Function Requirements

- The cost function must be expressible as an average $C = \frac{1}{n}\ \sum_x\ C_x$ over the cost functions $C_x$ of individual training samples $x$
- The cost function must not depend on any activation values of the network other than the output values $a^L$
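Both requirements can be read off a small sketch: the per-sample cost only ever sees the output activations and the expected output, and the total cost is the plain average of those per-sample values. The helper names and sample data below are made up for illustration, and the quadratic cost is used only as a stand-in for $C_x$.

```python
def quadratic(a_out, expected):
    """Per-sample cost C_x; it sees only the output layer and the target."""
    return 0.5 * sum((a - e) ** 2 for a, e in zip(a_out, expected))

def average_cost(per_sample_cost, outputs, targets):
    """C = (1/n) * sum over samples x of C_x."""
    costs = [per_sample_cost(a, e) for a, e in zip(outputs, targets)]
    return sum(costs) / len(costs)

# Hypothetical network outputs and expected outputs for two samples.
outputs = [[0.8, 0.1], [0.3, 0.9]]
targets = [[1.0, 0.0], [0.0, 1.0]]
total = average_cost(quadratic, outputs, targets)
```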

#### Other Costs

###### Quadratic Cost
$$C = \frac{1}{2}\ \sum_j\ ( a_j^L - E_j^r )^2$$
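A direct transcription of the quadratic cost for one training sample might look like the following. The gradient function is included because differentiating the formula with respect to $a_j^L$ gives $a_j^L - E_j^r$; the function names and sample values are illustrative assumptions.

```python
def quadratic_cost(a_out, expected):
    """C = 1/2 * sum_j (a_j - E_j)^2 for a single training sample."""
    return 0.5 * sum((a - e) ** 2 for a, e in zip(a_out, expected))

def quadratic_cost_gradient(a_out, expected):
    """dC/da_j = a_j - E_j, one entry per output neuron."""
    return [a - e for a, e in zip(a_out, expected)]

a_out = [0.8, 0.1]       # hypothetical network output
expected = [1.0, 0.0]    # desired output for the same sample
cost = quadratic_cost(a_out, expected)
grad = quadratic_cost_gradient(a_out, expected)
```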

###### Cross-Entropy Cost
$$C = -\sum_j\ [ E_j^r \ln a_j^L + ( 1 - E_j^r ) \ln ( 1 - a_j^L ) ]$$
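The cross-entropy cost can be sketched as below, written with the conventional leading minus sign so that the cost is non-negative and shrinks as the output approaches the expected value. The `eps` clipping is an added assumption to keep the logarithm away from zero; it is not part of the formula, and the function name is hypothetical.

```python
import math

def cross_entropy_cost(a_out, expected, eps=1e-12):
    """C = -sum_j [E_j ln a_j + (1 - E_j) ln(1 - a_j)] for one sample."""
    total = 0.0
    for a, e in zip(a_out, expected):
        # Assumption: clip a into (0, 1) so math.log never receives 0.
        a = min(max(a, eps), 1.0 - eps)
        total += e * math.log(a) + (1.0 - e) * math.log(1.0 - a)
    return -total

# Outputs close to the target cost little; confident wrong outputs cost a lot.
good = cross_entropy_cost([0.9, 0.1], [1.0, 0.0])
bad = cross_entropy_cost([0.1, 0.9], [1.0, 0.0])
```

The formula assumes every $a_j^L$ lies in $(0, 1)$, which holds for sigmoid outputs.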

###### Exponential Cost
$$C = t\ \exp\left( \frac{ 1 }{ t }\ \sum_j\ ( a_j^L - E_j^r )^2 \right)$$

Here $t$ is a scalar parameter of the cost.
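A small sketch of the exponential cost follows, treating $t$ as a positive parameter chosen by the user (the default of `1.0` is an arbitrary assumption). One property worth seeing numerically: when the output matches the expected output exactly, the cost equals $t$ rather than zero, because $\exp(0) = 1$.

```python
import math

def exponential_cost(a_out, expected, t=1.0):
    """C = t * exp((1/t) * sum_j (a_j - E_j)^2) for one sample."""
    squared_error = sum((a - e) ** 2 for a, e in zip(a_out, expected))
    return t * math.exp(squared_error / t)

# With a perfect match the cost bottoms out at t, not at 0.
perfect = exponential_cost([1.0, 0.0], [1.0, 0.0], t=2.0)
off = exponential_cost([0.8, 0.1], [1.0, 0.0], t=2.0)
```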

###### Hellinger Distance
$$C = \frac{ 1 }{ \sqrt{ 2 } }\ \sum_j\ \left( \sqrt{ a_j^L } - \sqrt{ E_j^r } \right)^2$$
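The Hellinger distance can be transcribed the same way. Because of the square roots, the formula only makes sense for non-negative outputs and targets, so this sketch rejects negative values; that check and the function name are assumptions added for illustration.

```python
import math

def hellinger_cost(a_out, expected):
    """C = (1/sqrt(2)) * sum_j (sqrt(a_j) - sqrt(E_j))^2 for one sample."""
    if any(v < 0 for v in list(a_out) + list(expected)):
        raise ValueError("Hellinger distance needs non-negative values")
    total = sum((math.sqrt(a) - math.sqrt(e)) ** 2
                for a, e in zip(a_out, expected))
    return total / math.sqrt(2)

# sqrt(0.25) = 0.5, so the only non-zero term is (0.5 - 1.0)^2 = 0.25.
cost = hellinger_cost([0.25, 0.0], [1.0, 0.0])
```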