# The plot

- Neural Nets are simple mathematical models defining a function.
- A Neural Net with a fixed structure is characterized by its parameters: the unit biases and the connection weights.
- Thus a prior over biases and weights implies a prior over functions.
- However, the meaning of weights and biases in neural nets is obscure.
- Moreover, the number of hidden units (in a multilayer perceptron with a single hidden layer) limits the set of functions representable by the neural net.
- By letting the number of hidden units go to infinity, and by selecting appropriate priors for the neural net parameters, the corresponding prior over functions converges to a Gaussian Process.

# Background on Neural Nets
- [X] Mathematical formulation
- [X] Multilayer Perceptron
- [X] Representation of a single hidden-layer perceptron as a DAG, with labels for unit biases and connection weights
- [X] Mathematical expression for the corresponding perceptron with $\tanh$ activation function

Neural networks are mathematical models for defining a function mapping inputs $X$ to outputs $Y$, $f: X \rightarrow Y$.

A neural network is composed of a number of connected **units**. The way that the units are connected to each other determines the *network structure*. In this tutorial, we will focus on **multilayer perceptrons**. The multilayer perceptron has three main properties:

1. *Feedforward connections* - connections between units are unidirectional and there are no cycles.
2. *Layered units* - the network is organized in layers, with each unit being connected only to units in the previous and next layers.
3. *Multiple layers* - there are more than two layers.

![Multilayer perceptron with one hidden layer](multilayer_perceptron.png)

A multilayer perceptron with $I$ **input** and $O$ **output** units takes in a set of real inputs, $\mathbf{x} := \{x_i\}_{i=1}^I$, and computes the real outputs, $\mathbf{y}:=\{f_k(\mathbf{x})\}_{k=1}^O$, using one or more layers of **hidden** units. In a network with one hidden layer, as in the figure above, the computations can be summarized as 

$$
\begin{aligned}
    f_k(\mathbf{x}) &= b_k + \sum_j v_{jk} h_j(\mathbf{x})
    \\
    h_j(\mathbf{x}) &= K\left(a_j + \sum_i u_{ij}x_i\right)
\end{aligned}
$$

Here, $u_{ij}$ is the connection weight from the input unit $i$ to the hidden unit $j$, and $v_{jk}$ is the connection weight from hidden unit $j$ to output unit $k$. Each output unit $k$ and hidden unit $j$ is associated with a *unit bias*, $b_k$ and $a_j$, respectively. Each hidden unit passes its input values through an **activation function**, $K$, which is usually a nonlinear function, such as the [hyperbolic tangent](https://en.wikipedia.org/wiki/Hyperbolic_function#Standard_analytic_expressions). 
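The two equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `mlp_forward`, the argument order, and the array layout (`u` of shape `(I, H)`, `v` of shape `(H, O)`, with `H` the number of hidden units) are assumptions made here for the example.

```python
import numpy as np

def mlp_forward(x, u, a, v, b, activation=np.tanh):
    """Forward pass of a perceptron with one hidden layer (illustrative sketch).

    x: inputs, shape (I,) or (N, I)
    u: input-to-hidden weights u_ij, shape (I, H)
    a: hidden unit biases a_j, shape (H,)
    v: hidden-to-output weights v_jk, shape (H, O)
    b: output unit biases b_k, shape (O,)
    Returns the outputs f_k(x), shape (N, O).
    """
    x = np.atleast_2d(x)            # treat a single input vector as a batch of one
    h = activation(a + x @ u)       # h_j(x) = K(a_j + sum_i u_ij x_i)
    return b + h @ v                # f_k(x) = b_k + sum_j v_jk h_j(x)
```

Passing a different callable as `activation` swaps the function $K$ without touching the rest of the computation.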

# Putting priors on Neural Nets
- All weights and biases get zero-mean Gaussian priors
- The number of hidden units increases to infinity
- Prove that the joint distribution of the values of the function at any finite number of points is multivariate Gaussian
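
A sketch of the argument the last bullet asks for, under assumptions that are not stated above and are introduced here only for illustration: all priors are independent, $b_k \sim N(0, \sigma_b^2)$, $v_{jk} \sim N(0, \sigma_v^2)$, the input-to-hidden parameters $a_j$ and $u_{ij}$ are identically distributed across hidden units, the activation $K$ is bounded (as $\tanh$ is), and the hidden-to-output scale shrinks with the number of hidden units $H$ as $\sigma_v = \omega_v H^{-1/2}$ for a fixed $\omega_v$.

$$
\begin{aligned}
E\left[v_{jk} h_j(\mathbf{x})\right] &= E\left[v_{jk}\right] E\left[h_j(\mathbf{x})\right] = 0
\\
\mathrm{Var}\left[v_{jk} h_j(\mathbf{x})\right] &= \sigma_v^2 \, E\left[h_j(\mathbf{x})^2\right]
\\
\mathrm{Var}\left[f_k(\mathbf{x})\right] &= \sigma_b^2 + H \sigma_v^2 \, E\left[h_j(\mathbf{x})^2\right] = \sigma_b^2 + \omega_v^2 \, E\left[h_j(\mathbf{x})^2\right]
\\
\mathrm{Cov}\left[f_k(\mathbf{x}^{(p)}), f_k(\mathbf{x}^{(q)})\right] &= \sigma_b^2 + \omega_v^2 \, E\left[h_j(\mathbf{x}^{(p)}) \, h_j(\mathbf{x}^{(q)})\right]
\end{aligned}
$$

For a fixed input, the $H$ terms $v_{jk} h_j(\mathbf{x})$ are independent and identically distributed with finite variance, so by the central limit theorem their sum tends to a Gaussian as $H \rightarrow \infty$, and adding the Gaussian bias $b_k$ keeps it Gaussian. Applying the multivariate central limit theorem to the vectors $\left(v_{jk} h_j(\mathbf{x}^{(1)}), \dots, v_{jk} h_j(\mathbf{x}^{(n)})\right)$ gives joint Gaussianity at any finite set of points, which is the defining property of a Gaussian Process.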

## Graph for generating smooth functions with Neural Nets
- Make it interactive: let the user pick the number of hidden units
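
One possible starting point for this graph is a function that draws sample functions from the network prior on a one-dimensional input grid. Everything below is a sketch under assumptions: the name `sample_nn_prior`, the default standard deviations, and the scaling of the hidden-to-output weights by `1 / sqrt(n_hidden)` (so that the output variance stays bounded as the number of hidden units grows) are illustrative choices, not values fixed by this outline.

```python
import numpy as np

def sample_nn_prior(x, n_hidden, n_samples=5, sigma_u=5.0, sigma_a=5.0,
                    omega_v=1.0, sigma_b=0.1, rng=None):
    """Draw functions from a zero-mean Gaussian prior over a tanh network.

    x: 1-D array of N input locations (single input unit, single output unit).
    Returns an array of shape (n_samples, N), one sampled function per row.
    All default scales are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float).reshape(1, -1, 1)            # (1, N, 1)
    u = rng.normal(0.0, sigma_u, size=(n_samples, 1, n_hidden))  # input-to-hidden weights
    a = rng.normal(0.0, sigma_a, size=(n_samples, 1, n_hidden))  # hidden biases
    # Assumed scaling: standard deviation omega_v / sqrt(H) for hidden-to-output weights.
    v = rng.normal(0.0, omega_v / np.sqrt(n_hidden), size=(n_samples, n_hidden, 1))
    b = rng.normal(0.0, sigma_b, size=(n_samples, 1, 1))         # output bias
    h = np.tanh(a + x * u)                                       # (n_samples, N, n_hidden)
    f = b + h @ v                                                # (n_samples, N, 1)
    return f[..., 0]
```

The argument `n_hidden` is the natural quantity to expose to the user, for example through a slider, and each row of the returned array can be plotted against `x` as one sampled function.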

## Graph for generating smooth functions with Gaussian Processes
- Use an RBF kernel, similar to GP_tutorial_one
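
A minimal sketch of how sample functions could be drawn from a zero-mean GP prior with an RBF kernel. The helper names `rbf_kernel` and `sample_gp_prior`, the hyperparameter defaults, and the jitter value are assumptions made for this example; they are not taken from GP_tutorial_one.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """RBF covariance between two sets of 1-D inputs, shape (len(x1), len(x2))."""
    x1 = np.asarray(x1, dtype=float).reshape(-1, 1)
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * (x1 - x2) ** 2 / lengthscale ** 2)

def sample_gp_prior(x, n_samples=5, jitter=1e-6, rng=None, **kernel_args):
    """Draw functions from a zero-mean GP prior evaluated at the points x.

    Returns an array of shape (n_samples, len(x)), one sampled function per row.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    cov = rbf_kernel(x, x, **kernel_args)
    # A small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + jitter * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_samples))
    return (chol @ z).T
```

Multiplying standard normal draws by the Cholesky factor gives samples whose covariance is the kernel matrix (plus the jitter), so each returned row is one smooth function evaluated on the grid `x`.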

## Show the relationship between GP and Neural Net covariance for smooth functions

# Extra

- Sample Brownian motion functions from a GP and from a NN (with increasing number of hidden units)
- Show relationship between covariance functions

# Possible extra

- Talk about priors for networks with more than one hidden layer?