# Introduction to Neural Networks

An artificial neural network (ANN, or just NN) is:
* a __graph__ where
* the __nodes__ process inputs (from the "outside world" or from other nodes)
  * each node processes its inputs according to an __activation function__, which is typically nonlinear (e.g. a Gaussian)
* the __edges__ indicate the flow of data between nodes
  * each edge has a __weight__
  
When the ANN is trained, the edge weights, and any parameters of the activation functions, are fit to the training data.

A neural network is typically trained by minimizing a loss function, such as the mean squared error (MSE), using some form of gradient descent.
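
To make the training idea concrete, here is a minimal sketch of gradient descent on a single linear node. It assumes the loss is the mean squared error, and the toy data, the learning rate and the names `mse` and `fit_linear_node` are all made up for illustration.

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((predictions - targets) ** 2))

def fit_linear_node(x, y, learning_rate=0.1, steps=500):
    """Fit y ~ w * x + b by plain (batch) gradient descent on the MSE."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = (w * x + b) - y
        # Gradients of mean(error ** 2) with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 3x + 1 (an assumption, purely for illustration).
x = np.linspace(-1.0, 1.0, 50)
y = 3.0 * x + 1.0
w, b = fit_linear_node(x, y)
final_loss = mse(w * x + b, y)
```

On this toy data the fitted `w` and `b` should end up very close to 3 and 1. Real networks apply the same update to every weight at once, usually with a more elaborate variant of gradient descent.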

There are different types of ANN. For example:
* feedforward - the information flow in this type of NN is always from input to output
* recurrent - a node may feed *back into itself*
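
The difference between the two types can be sketched with a single node. This is only an illustration: the `tanh` activation, the weights and the function names are assumptions, not something fixed by the definitions above.

```python
import math

def feedforward_node(x, w_in):
    """The output depends only on the current input."""
    return math.tanh(w_in * x)

def recurrent_node(inputs, w_in, w_back):
    """The output at each step also depends on the node's own previous output."""
    previous_output = 0.0
    outputs = []
    for x in inputs:
        # The node's last output is fed back in alongside the new input.
        previous_output = math.tanh(w_in * x + w_back * previous_output)
        outputs.append(previous_output)
    return outputs

inputs = [1.0, 0.0, 0.0]
feedforward_outputs = [feedforward_node(x, w_in=1.0) for x in inputs]
recurrent_outputs = recurrent_node(inputs, w_in=1.0, w_back=0.5)
```

With these inputs the feedforward node outputs zero as soon as the input is zero, while the recurrent node still "remembers" the first input through its feedback edge.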

In a neural network, nodes that read directly from the outside world are typically in an "input layer"; those that write directly to the outside world are in an "output layer"; and all the rest are in one or more "hidden layers". For example, in DALL-E there are 64 self-attention layers.

## Let's look at one node in a neural network

Node $j$ in a neural network takes inputs $I_j$ and produces a single output $O_j$, by calculating $f(I_j)$. 

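Typically, $I_j = \sum_{i=0}^{N-1} w_{ij}O_i$, where $O_i$ is the output of node $i$, $w_{ij}$ is the weight on the edge from node $i$ to node $j$, and $N$ is the number of inputs to node $j$ (counting the bias).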

Each node typically has a *bias term*, which is fit during training just like the edge weights. In the sum above, the $i = 0$ term plays this role: $O_0$ is a constant input (conventionally 1) and its weight $w_{0j}$ is the value that is fit.

The function $f$ is the activation function. It could be any type of function; for example, linear, or ReLU, or Gaussian. For more, see the resources below.
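
As a rough sketch, the computation done by one node can be written in a few lines. It assumes the bias is handled as a constant input $O_0 = 1$ with its own weight; the example numbers and the helper names (`node_output`, `relu`, `gaussian`, `linear`) are invented for illustration.

```python
import numpy as np

def linear(x):
    return x

def relu(x):
    return np.maximum(0.0, x)

def gaussian(x):
    return np.exp(-x ** 2)

def node_output(upstream_outputs, weights, activation):
    """Compute O_j = f(I_j), where I_j is the weighted sum of the inputs.

    upstream_outputs[0] is assumed to be the constant bias input (1.0),
    so weights[0] acts as the bias.
    """
    total_input = float(np.dot(weights, upstream_outputs))  # I_j
    return float(activation(total_input))                   # O_j

upstream = np.array([1.0, 0.5, -2.0])  # O_0 = 1 (bias input), O_1, O_2
weights = np.array([0.2, 0.4, 0.3])    # w_0j (bias), w_1j, w_2j

# I_j = 0.2 * 1 + 0.4 * 0.5 + 0.3 * (-2) = -0.2
out_linear = node_output(upstream, weights, linear)
out_relu = node_output(upstream, weights, relu)
out_gaussian = node_output(upstream, weights, gaussian)
```

The same weighted sum gives three different outputs here, which is the point: the choice of $f$ determines how the node responds to its inputs.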

## Let's play with neural networks

Open https://playground.tensorflow.org

# Radial Basis Functions

A radial basis function is "a real-valued function whose value depends only on the distance between the input and some fixed point" (https://en.wikipedia.org/wiki/Radial_basis_function).

What might look like that?
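
One function that fits this description, offered as a sketch rather than the only answer, is a Gaussian of the Euclidean distance to a center. The function name `gaussian_rbf` and the `width` parameter are assumptions made for this example.

```python
import numpy as np

def gaussian_rbf(point, center, width=1.0):
    """Gaussian of the Euclidean distance between point and center."""
    difference = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    distance = np.linalg.norm(difference)
    return float(np.exp(-(distance ** 2) / (2.0 * width ** 2)))

center = [0.0, 0.0]
a = gaussian_rbf([3.0, 4.0], center, width=5.0)   # distance 5 from the center
b = gaussian_rbf([-5.0, 0.0], center, width=5.0)  # also distance 5, different direction
c = gaussian_rbf([0.0, 0.0], center, width=5.0)   # distance 0
```

Here `a` and `b` are equal because only the distance matters, not the direction, and `c` is the largest value the function can take.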

# Radial Basis Function Networks

An RBF network is a feedforward network with one hidden layer. Each node in the hidden layer represents a prototype, or prototypical point, fit from the training data. Any new point (e.g. in the test data) that is close enough to the prototype will activate that node.

The input layer in an RBF network will have one node for each dimension in the input data.

The output layer will have one node for each class into which the data may be classified. A well-trained RBF network will activate only one node in the output layer for any input data point.

## Training an RBF network

Training an RBF network consists of:
* Finding prototypes
* Selecting the activation function for the hidden nodes
* Selecting the activation function for the output nodes
* Setting the weights for the edges and biases

To find prototypes we can select training data points at random, but it will work better if we use one of the analysis methods we already know.
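
Both options can be sketched as follows. The text does not say which analysis method is meant, so the use of k-means clustering here is an assumption, as are the toy data and the function names `random_prototypes` and `kmeans_prototypes`.

```python
import numpy as np

def random_prototypes(data, n_prototypes, rng):
    """Pick n_prototypes distinct training points at random."""
    indices = rng.choice(len(data), size=n_prototypes, replace=False)
    return data[indices].copy()

def kmeans_prototypes(data, n_prototypes, rng, n_iterations=20):
    """Refine randomly chosen prototypes with a basic k-means loop."""
    prototypes = random_prototypes(data, n_prototypes, rng)
    for _ in range(n_iterations):
        # Squared distance from every point to every prototype,
        # shape (n_points, n_prototypes).
        distances = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
        nearest = distances.argmin(axis=1)
        for j in range(n_prototypes):
            members = data[nearest == j]
            if len(members) > 0:  # an empty cluster keeps its old prototype
                prototypes[j] = members.mean(axis=0)
    return prototypes

rng = np.random.default_rng(0)
# Two made-up, well-separated blobs of 2-D points.
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2)),
])
prototypes = kmeans_prototypes(data, n_prototypes=2, rng=rng)
```

Random selection can place several prototypes in the same region of the data, whereas the clustering step moves each prototype to the middle of a group of nearby points.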

A typical activation function for the hidden nodes is the Gaussian, so something like $\exp\left(-\frac{\lVert\vec{d}-\vec{\mu}_j\rVert^2}{2\delta_j^2 + \epsilon}\right)$, where $\vec{d}$ is the data point, $\vec{\mu}_j$ is the prototype for hidden node $j$, $\delta_j$ is the hidden unit's standard deviation, $\epsilon$ is a small constant that keeps the denominator away from zero, and $\lVert\cdot\rVert^2$ is the squared Euclidean distance.
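
The Gaussian activation above could be computed for a whole batch of points like this. It is a sketch under assumptions: the array shapes, the default value of `epsilon` and the name `hidden_activations` are choices made for the example.

```python
import numpy as np

def hidden_activations(data, prototypes, deltas, epsilon=1e-8):
    """Gaussian activation of every hidden node for every data point.

    data:       shape (n_points, n_dims)
    prototypes: shape (n_prototypes, n_dims), one row per hidden node
    deltas:     shape (n_prototypes,), one standard deviation per hidden node
    returns:    shape (n_points, n_prototypes)
    """
    differences = data[:, None, :] - prototypes[None, :, :]
    squared_distances = (differences ** 2).sum(axis=2)
    return np.exp(-squared_distances / (2.0 * deltas ** 2 + epsilon))

prototypes = np.array([[0.0, 0.0], [4.0, 4.0]])
deltas = np.array([1.0, 1.0])
points = np.array([[0.0, 0.0], [4.0, 3.0]])
H = hidden_activations(points, prototypes, deltas)
```

The first point sits exactly on the first prototype, so `H[0, 0]` is 1 and `H[0, 1]` is close to 0. The second point is near the second prototype, so `H[1, 1]` is the larger entry in its row.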

A good activation function for the output nodes is $w_{bias,k} + \sum_{j=1}^{N_p}(w_{j,k}H_j)$ where $w_{bias,k}$ is the weight from the bias node (which has a value of 1) to the $k$th output node, $N_p$ is the number of prototypes (hidden nodes), $w_{j,k}$ is the weight on the edge from the $j$th hidden node to the $k$th output node and $H_j$ is the activation level (output) of the $j$th hidden node.
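
The output formula, and one way of setting its weights, can be sketched as below. Because the output is linear in the hidden activations, this example fits the weights by ordinary least squares against one-hot class labels. That is a substitution made for brevity, not the gradient descent mentioned earlier, and the hidden activations, labels and function names are made up.

```python
import numpy as np

def output_activations(H, weights, bias_weights):
    """Compute w_bias,k + sum_j w_jk * H_j for every point and output node.

    H:            shape (n_points, n_prototypes), hidden-node outputs
    weights:      shape (n_prototypes, n_classes)
    bias_weights: shape (n_classes,)
    """
    return bias_weights + H @ weights

def fit_output_weights(H, one_hot_targets):
    """Least-squares fit of the output weights and bias weights."""
    # Prepend a column of ones: the bias node, whose value is always 1.
    design = np.hstack([np.ones((H.shape[0], 1)), H])
    solution, *_ = np.linalg.lstsq(design, one_hot_targets, rcond=None)
    return solution[1:], solution[0]  # (weights, bias_weights)

# Made-up hidden activations for four points and two prototypes.
H = np.array([[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.0, 0.8]])
# One-hot labels: the first two points are class 0, the last two class 1.
targets = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

weights, bias_weights = fit_output_weights(H, targets)
outputs = output_activations(H, weights, bias_weights)
predicted_classes = outputs.argmax(axis=1)
```

Taking the `argmax` over the output nodes picks the class whose node is most strongly activated for each point.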

# Resources

* https://playground.tensorflow.org
* https://mathworld.wolfram.com/RadialFunction.html
* http://www.ideal.ece.utexas.edu/papers/agogino_1999ijcnn.ps.gz
* https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
* https://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons