[View in Colaboratory](https://colab.research.google.com/github/arturzeitler/Bayes-and-MC/blob/master/Bayesian_NN_Example.ipynb)

A Bayesian neural network is a neural network with a prior distribution on its weights (Neal, 2012).

**Regression Model Example:**

Consider a data set $\{(\mathbf{x}_n, y_n)\}$, where each data point comprises features $\mathbf{x}_n\in\mathbb{R}^D$ and an output $y_n\in\mathbb{R}$. Define the likelihood for each data point as
\begin{aligned} p(y_n \mid \mathbf{w}, \mathbf{x}_n, \sigma^2) &= \text{Normal}(y_n \mid \mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}), \sigma^2),\end{aligned}

where $\mathrm{NN}$ is a neural network whose weights and biases form the latent variables $\mathbf{w}$. Assume $\sigma^2$ is a known variance.
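Fix $\mathbf{w}$. The log-likelihood of the whole data set is then the sum of the per-point Gaussian log densities:

$\sum_n \big[-\tfrac{1}{2}\log(2\pi\sigma^2) - (y_n - \mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}))^2/(2\sigma^2)\big]$

Below is a minimal plain-Python sketch. It assumes the network outputs for that $\mathbf{w}$ have already been computed. `gaussian_log_likelihood` is a hypothetical helper, not part of Edward.

```python
import math

def gaussian_log_likelihood(y, nn_out, sigma2):
    """Sum over n of log Normal(y_n | nn_out_n, sigma2), with sigma2 a known variance."""
    assert len(y) == len(nn_out)
    total = 0.0
    for y_n, mu_n in zip(y, nn_out):
        # log density of one Gaussian observation centred at the network output
        total += -0.5 * math.log(2.0 * math.pi * sigma2) - (y_n - mu_n) ** 2 / (2.0 * sigma2)
    return total

# Perfect predictions: only the normalising constants remain.
ll = gaussian_log_likelihood([1.0, 2.0], [1.0, 2.0], sigma2=0.01)
```

With perfect predictions the value is $-\tfrac{N}{2}\log(2\pi\sigma^2)$. Each residual then lowers it by $(y_n-\mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}))^2/(2\sigma^2)$.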

Define the prior on the weights and biases $\mathbf{w}$ to be the standard normal
\begin{aligned} p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \mathbf{I}).\end{aligned}
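Under this prior, every weight and bias is an independent standard normal. The log prior therefore factorises as $\log p(\mathbf{w}) = \sum_k \big[-\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}w_k^2\big]$.

The plain-Python sketch below evaluates this sum for one prior draw. The helper names are hypothetical. The parameter count assumes the architecture of the Edward model in this notebook: $D$ inputs, two hidden layers of 10 units, and one output.

```python
import math
import random

def log_standard_normal_prior(w):
    """log Normal(w | 0, I) for a flat list of weights and biases."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * w_k ** 2 for w_k in w)

def num_params(D, H=10):
    """Weights and biases of a D -> H -> H -> 1 network."""
    return D * H + H + H * H + H + H * 1 + 1

D = 1
K = num_params(D)  # 141 latent variables for D = 1
rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(K)]  # one draw w ~ p(w)
lp = log_standard_normal_prior(w)
```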

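Let’s build the model in Edward (here Edward2, imported from TensorFlow Probability). We define a 3-layer Bayesian neural network: two hidden layers of 10 units with $\tanh$ nonlinearities, followed by a linear output layer.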

```python
import tensorflow as tf
# Edward2 as bundled with older (TF1-era) TensorFlow Probability releases
from tensorflow_probability import edward2 as ed

def neural_network(x):
    """Two tanh hidden layers of 10 units each, followed by a linear output layer."""
    h = tf.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.tanh(tf.matmul(h, W_1) + b_1)
    h = tf.matmul(h, W_2) + b_2
    return tf.reshape(h, [-1])  # flatten [N, 1] -> [N]

N = 40  # number of data points
D = 1   # number of features

# Standard normal priors p(w) = Normal(0, I) on all weights and biases
W_0 = ed.Normal(loc=tf.zeros([D, 10]), scale=tf.ones([D, 10]))
W_1 = ed.Normal(loc=tf.zeros([10, 10]), scale=tf.ones([10, 10]))
W_2 = ed.Normal(loc=tf.zeros([10, 1]), scale=tf.ones([10, 1]))
b_0 = ed.Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_1 = ed.Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_2 = ed.Normal(loc=tf.zeros(1), scale=tf.ones(1))

# Use existing training inputs x_train (shape [N, D]) if defined;
# otherwise fall back to synthetic inputs drawn uniformly from [10, 40).
try:
    x = tf.cast(x_train, dtype=tf.float32)
except NameError:
    x = tf.random_uniform([N, D], minval=10, maxval=40, dtype=tf.float32)

# Likelihood: y_n ~ Normal(NN(x_n; w), sigma^2) with known sigma = 0.1
y = ed.Normal(loc=neural_network(x), scale=0.1 * tf.ones(N))
```
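Every weight is a random variable, so each draw of $\mathbf{w}$ from the prior defines a different regression function. $y$ then adds Gaussian noise with $\sigma = 0.1$ on top.

The same generative process (prior predictive sampling) can be sketched in plain Python without TensorFlow. A few assumptions apply:
- The helpers `sample_layer`, `dense` and `nn_forward` are hypothetical.
- The input grid on $[-3, 3]$ is an arbitrary choice for illustration.

```python
import math
import random

def sample_layer(rng, n_in, n_out):
    """Draw an (n_in x n_out) weight matrix and an n_out bias vector from Normal(0, 1)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return W, b

def dense(h, W, b):
    """Affine map h @ W + b for a single input vector h."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) + b[j] for j in range(len(b))]

def nn_forward(x, params):
    """Two tanh hidden layers and a linear output, matching the architecture above."""
    (W_0, b_0), (W_1, b_1), (W_2, b_2) = params
    h = [math.tanh(v) for v in dense(x, W_0, b_0)]
    h = [math.tanh(v) for v in dense(h, W_1, b_1)]
    return dense(h, W_2, b_2)[0]

rng = random.Random(42)
N, D, H, sigma = 40, 1, 10, 0.1
xs = [[-3.0 + 6.0 * k / (N - 1)] for k in range(N)]  # N inputs on a grid over [-3, 3]

# Three prior predictive draws: w ~ p(w), then y_n ~ Normal(NN(x_n; w), sigma^2)
draws = []
for _ in range(3):
    params = [sample_layer(rng, D, H), sample_layer(rng, H, H), sample_layer(rng, H, 1)]
    draws.append([rng.gauss(nn_forward(x, params), sigma) for x in xs])
```

Plotting each list in `draws` against `xs` shows how varied the functions are before any data is observed.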

**Classification Model Example:**

Consider a data set $\{(\mathbf{x}_n, y_n)\}$, where each data point comprises features $\mathbf{x}_n\in\mathbb{R}^D$ and a binary output $y_n\in\{0, 1\}$. Define the likelihood for each data point as
\begin{aligned} p(y_n \mid \mathbf{w}, \mathbf{x}_n) &= \mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w})^{y_n} (1-\mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}))^{1-y_n},\end{aligned}

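where $\mathrm{NN}$ denotes the output of a neural network whose final layer uses the logistic sigmoid activation, i.e. $\mathrm{NN}(\mathbf{x}_n\;;\;\mathbf{w}) = \sigma(a) = 1/(1+\exp(-a))$. Here $a$ is the pre-activation of the output unit. The sigmoid maps $a$ into $(0, 1)$, so the output can be read as $p(y_n = 1 \mid \mathbf{w}, \mathbf{x}_n)$.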
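Taking logs gives $\log p(y_n \mid \mathbf{w}, \mathbf{x}_n) = y_n\log\sigma(a) + (1-y_n)\log(1-\sigma(a))$. This simplifies to $y_n a - \log(1+e^{a})$. The simplified form avoids evaluating $\log 0$ when $\sigma(a)$ rounds to 0 or 1 in floating point.

Below is a plain-Python sketch comparing both forms. The function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softplus(a):
    """log(1 + exp(a)), computed without overflow for large |a|."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def bernoulli_log_likelihood(y, a):
    """Sum over n of y_n log sigma(a_n) + (1 - y_n) log(1 - sigma(a_n)), via y_n a_n - softplus(a_n)."""
    return sum(y_n * a_n - softplus(a_n) for y_n, a_n in zip(y, a))

def bernoulli_log_likelihood_naive(y, a):
    """Direct transcription of the likelihood; fails with log(0) if sigma(a) rounds to 0 or 1."""
    return sum(y_n * math.log(sigmoid(a_n)) + (1 - y_n) * math.log(1.0 - sigmoid(a_n))
               for y_n, a_n in zip(y, a))

y = [1, 0, 1]
a = [2.0, -1.0, 0.5]  # output pre-activations for three data points
stable = bernoulli_log_likelihood(y, a)
naive = bernoulli_log_likelihood_naive(y, a)
```

Now take $y_n = 0$ and $a = 40$. `sigmoid(40.0)` rounds to exactly `1.0`, so the naive version raises `ValueError` from `math.log(0.0)`. The stable version returns `-40.0`. TensorFlow Probability's `Bernoulli` distribution likewise accepts `logits` in place of `probs`. That would let a model pass the output pre-activation directly.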

Weights and biases form the latent variables $\mathbf{w}$. 

Define the prior on the weights and biases $\mathbf{w}$ to be the standard normal
\begin{aligned} p(\mathbf{w}) &= \text{Normal}(\mathbf{w} \mid \mathbf{0}, \mathbf{I}).\end{aligned}

Let’s build this model in Edward. Again, we define a 3-layer Bayesian neural network with two $\tanh$ hidden layers of 10 units; this time the output layer applies the sigmoid.

```python
import tensorflow as tf
from tensorflow_probability import edward2 as ed

def neural_network_bi(x):
    """Two tanh hidden layers; the sigmoid output gives p(y_n = 1 | x_n, w)."""
    h = tf.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.tanh(tf.matmul(h, W_1) + b_1)
    h = tf.nn.sigmoid(tf.matmul(h, W_2) + b_2)
    return tf.reshape(h, [-1])  # flatten [N, 1] -> [N]

N = 40  # number of data points
D = 1   # number of features

# Standard normal priors p(w) = Normal(0, I) on all weights and biases
W_0 = ed.Normal(loc=tf.zeros([D, 10]), scale=tf.ones([D, 10]))
W_1 = ed.Normal(loc=tf.zeros([10, 10]), scale=tf.ones([10, 10]))
W_2 = ed.Normal(loc=tf.zeros([10, 1]), scale=tf.ones([10, 1]))
b_0 = ed.Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_1 = ed.Normal(loc=tf.zeros(10), scale=tf.ones(10))
b_2 = ed.Normal(loc=tf.zeros(1), scale=tf.ones(1))

# Use existing training inputs x_train (shape [N, D]) if defined;
# otherwise fall back to synthetic inputs drawn uniformly from [10, 40).
try:
    x = tf.cast(x_train, dtype=tf.float32)
except NameError:
    x = tf.random_uniform([N, D], minval=10, maxval=40, dtype=tf.float32)

# Likelihood: y_n ~ Bernoulli(NN(x_n; w))
y = ed.Bernoulli(probs=neural_network_bi(x))
```
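Under the Bayesian treatment, a prediction averages the network's output over the weight distribution: $p(y=1 \mid \mathbf{x}) = \mathbb{E}_{\mathbf{w}}[\mathrm{NN}(\mathbf{x}\;;\;\mathbf{w})]$. Before any data is seen, the expectation is over the prior $p(\mathbf{w})$, and Monte Carlo sampling can estimate it.

Below is a plain-Python sketch with hypothetical helper names. It uses the same architecture as `neural_network_bi`.

```python
import math
import random

def sample_layer(rng, n_in, n_out):
    """Draw an (n_in x n_out) weight matrix and an n_out bias vector from Normal(0, 1)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return W, b

def dense(h, W, b):
    """Affine map h @ W + b for a single input vector h."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) + b[j] for j in range(len(b))]

def nn_forward_bi(x, params):
    """Two tanh hidden layers; the sigmoid output is p(y = 1 | x, w)."""
    (W_0, b_0), (W_1, b_1), (W_2, b_2) = params
    h = [math.tanh(v) for v in dense(x, W_0, b_0)]
    h = [math.tanh(v) for v in dense(h, W_1, b_1)]
    a = dense(h, W_2, b_2)[0]  # output pre-activation
    return 1.0 / (1.0 + math.exp(-a))

def prior_predictive_prob(x, n_samples, rng, D=1, H=10):
    """Monte Carlo estimate of E_{w ~ p(w)}[NN(x; w)], i.e. p(y = 1 | x) under the prior."""
    total = 0.0
    for _ in range(n_samples):
        params = [sample_layer(rng, D, H), sample_layer(rng, H, H), sample_layer(rng, H, 1)]
        total += nn_forward_bi(x, params)
    return total / n_samples

rng = random.Random(0)
p_near = prior_predictive_prob([0.5], n_samples=2000, rng=rng)
p_far = prior_predictive_prob([-2.0], n_samples=2000, rng=rng)
```

$W_2$ and $b_2$ have priors symmetric about zero, so the output pre-activation $a$ is symmetric as well. Since $\sigma(a) + \sigma(-a) = 1$, this gives $\mathbb{E}[\sigma(a)] = 1/2$ for every $\mathbf{x}$: the prior alone expresses no preference between the two classes. After inference, replacing the prior draws with approximate posterior draws gives the posterior predictive probability.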

# References

Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science & Business Media.