## 6. Bayesian Modelling

This notebook is part of a larger effort to offer an approachable introduction to models of the mind and the brain for the course “Foundations of Neural and Cognitive Modelling”, offered at the University of Amsterdam by [Jelle (aka Willem) Zuidema](https://staff.fnwi.uva.nl/w.zuidema/). The notebook in this present form is the result of the combined work of Iris Proff, [Marianne de Heer Kloots](http://mdhk.net/), and [Simone Astarita](https://www.linkedin.com/in/simone-astarita-4499b11b5/).

### Instructions
Please hand in the following:
- A copy of this notebook with the **code**, and the results of running it, filled in where required. Every section you need to complete starts as follows:

<code>### YOUR CODE HERE ###</code>

- A separate pdf file with the answers to the **homework exercises**. These can be identified by the following formatting, where **n** is the number of points (out of 10) that question **m** is worth:
<br>

>***Homework exercise m***: question(s) **(npt)**.

### Introduction

In this lab we are looking at the basics of Bayesian modelling: probability distributions, priors, posteriors, and Bayes’ rule.

### 1. Probability distributions

Probability distributions are used to describe random processes, such as tossing a coin or randomly sampling people from a population. A probability distribution is a function that maps all possible values of a random process to their respective probabilities. Probability distributions can take many different shapes. We will discuss some common probability distributions and how to work with them.
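
As a minimal illustration of this definition, the sketch below writes the distribution of a fair six-sided die as a mapping from outcomes to probabilities. The die and the name `die_pmf` are illustrative assumptions and not part of the lab exercises.

```python
# Hypothetical example: a fair six-sided die as a discrete probability distribution
die_pmf = {outcome: 1 / 6 for outcome in range(1, 7)}

# the probabilities of all possible outcomes must sum to 1
total = sum(die_pmf.values())

# the probability of an event is the sum of the probabilities of its outcomes
p_even = sum(p for outcome, p in die_pmf.items() if outcome % 2 == 0)

print('total probability:', round(total, 10))
print('P(even outcome):', round(p_even, 10))
```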

#### 1.1 Uniform distribution

The simplest probability distribution is the uniform distribution: every value in a given range is equally likely. We can use <code>np.random.uniform()</code> to sample from the uniform distribution.

> ***Homework exercise 1:***  The cell below samples values from a uniform distribution between $0$ and $1$ and plots them as a histogram. What does the cumulative histogram express? What happens if you change the number of sampled values and why? **(2pt)**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import math

# sample 10 values from a uniform distribution, going from 0 to 1
x = np.random.uniform(0,1,10)
n_bins = 20

# plot histogram
fig, ax = plt.subplots()

n, bins, patches = ax.hist(x, n_bins, cumulative=False, label='Empirical',rwidth=0.9)
ax.set_title('Histogram')

# cumulative histogram
fig, ax = plt.subplots()
n, bins, patches = ax.hist(x, n_bins, cumulative=True, label='Empirical', color ='lightgreen',rwidth=0.9)
ax.set_title('Cumulative histogram')

#### 1.2 Geometric distribution

We now sample from the geometric distribution which is given by:

$$
\begin{equation*}
p(X|\theta)= \theta(1-\theta)^{X-1}
\end{equation*}
$$

The distribution gives the probability that a coin has to be tossed $X$ times until heads appears for the first time. The parameter $\theta$ is the bias of the coin, i.e. the probability of heads on a single toss: a fair coin has a bias of $0.5$. In that case, both outcomes are equally likely.
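
To make the formula concrete, here is a small sketch that evaluates it directly for a fair coin. The helper name `geometric_pmf` and the range of $X$ values are assumptions chosen for illustration.

```python
import numpy as np

def geometric_pmf(x, theta):
    # probability that the first head appears on toss number x
    return theta * (1 - theta) ** (x - 1)

theta = 0.5              # fair coin
xs = np.arange(1, 11)    # X = 1, ..., 10
probs = geometric_pmf(xs, theta)

for x, p in zip(xs[:3], probs[:3]):
    print('P(X = {} | theta = {}) = {}'.format(x, theta, p))

# the remaining probability mass lies on X > 10
print('probability mass on X = 1..10:', probs.sum())
```

For a fair coin each additional toss halves the probability, so most of the probability mass sits on small values of $X$.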

Given a value $\theta$ (“theta”), flipping the coin until heads comes up can be simulated using a while-loop.

> Play around with parameter theta and observe the effect.

In [None]:
x = 0         # x counts number of coin tosses
head = False  # head tracks if we already reached outcome "head"
theta = 0.5   # define the bias

# repeat until we throw "head"
while not head:
    x = x+1
    head = np.random.uniform(0,1) < theta # throw the coin with bias theta (draws a single number)
    
print(x,'coin tosses until head')
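
A single run of this loop is noisy. As a rough check, the sketch below repeats the same procedure many times and compares the average number of tosses with $1/\theta$, the mean of the geometric distribution. The helper `tosses_until_head`, the seed and the number of runs are assumptions made for this illustration.

```python
import numpy as np

def tosses_until_head(theta):
    # same procedure as the while-loop above, wrapped in a hypothetical helper
    x = 0
    head = False
    while not head:
        x = x + 1
        head = np.random.uniform(0, 1) < theta
    return x

np.random.seed(0)   # fixed seed, only to make the run repeatable
theta = 0.5
n_runs = 10000
counts = np.array([tosses_until_head(theta) for _ in range(n_runs)])

print('average number of tosses:', counts.mean())
print('mean of the geometric distribution, 1/theta:', 1 / theta)
```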

Our scenario gets more interesting if the bias $\theta$ itself is not a fixed value, but drawn from a probability distribution. This distribution over $\theta$ is the **prior distribution**: it biases the outcomes $X$ into a certain direction.

The function <code>coin_tosses()</code> in the cell below samples a pair $(\theta, X)$ from the geometric distribution (as we did above, but now in a function). 

> Add a line of code to sample a value for $\theta$ from a uniform distribution between 0 and 1.

In [None]:
def coin_tosses():
    
    ### YOUR CODE HERE ### 
    # draw theta from a uniform distribution between 0 and 1
    theta = ...

    x = 0
    head = False
    
    # repeat until we throw "head"
    while not head:
        x = x+1
        head = np.random.uniform(0,1) < theta # throw the coin with bias theta (draws a single number)
        
    return theta, x # return bias and number of coin tosses

The next cell uses the function <code>coin_tosses</code> to draw 200 $(\theta,x)$ pairs and plots them in a scatterplot.

In [None]:
# initialize variables
n = 200
xs = np.zeros([1,n])
thetas = np.zeros([1,n])

# draw 200 (theta, x) pairs using function coin_tosses
for i in range(0,n):
    thetas[0,i], xs[0,i] = coin_tosses() # store theta and x in the arrays "thetas" and "xs"

# make a scatter plot
fig, ax = plt.subplots()
plt.scatter(thetas,xs,marker = '*')
ax.set_xlabel('theta')
ax.set_ylabel('number of coin tosses')

Congratulations, you have just played around with your first **hierarchical Bayesian model**, where one stochastic process (selecting $\theta$) determines the parameters of another stochastic process (producing $X$s).

#### 1.3 Sampling using Python

Besides the uniform distribution we have been using already, NumPy provides sampling functions for many standard probability distributions, for instance:

<code>numpy.random.binomial()</code> 
<code>numpy.random.normal()</code> 
<code>numpy.random.poisson()</code> 
<code>numpy.random.geometric()</code>

You can use these functions to draw samples from these distributions.
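
As a minimal sketch of how such calls look, the example below draws a few samples from a binomial and a normal distribution. The parameter values and the seed are arbitrary assumptions; the other functions in the list follow the same pattern of distribution parameters plus an optional `size` argument.

```python
import numpy as np

np.random.seed(1)  # fixed seed, only to make the run repeatable

# 5 samples: number of heads in 10 tosses of a fair coin
binomial_samples = np.random.binomial(n=10, p=0.5, size=5)

# 5 samples from a normal distribution with mean 0 and standard deviation 1
normal_samples = np.random.normal(loc=0.0, scale=1.0, size=5)

print('binomial samples:', binomial_samples)
print('normal samples:', normal_samples)
```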

> ***Homework exercise 2:*** Reproduce the scatterplot we created in the cell above by using the function <code>numpy.random.geometric()</code> instead of a for-loop. You need to generate a sequence of $200$ random values for $\theta$. Do the plots look more or less the same? **(0.5pt)**

In [None]:
### YOUR CODE HERE ###
# create a sequence of 200 thetas
my_thetas = ...

# create a sequence of 200 coin tosses
my_xs = np.random.geometric(my_thetas)

fig, ax = plt.subplots()
plt.scatter(my_thetas,my_xs,marker = '*')
ax.set_xlabel('theta')
ax.set_ylabel('number of coin tosses')

### 2. Deriving the posterior

To develop a bit of an intuition about how probability distributions let us model interesting phenomena in cognitive science, we consider the slightly more complex **Poisson distribution** and use it as a model of neural spike trains.

Neurons are believed to encode relevant information in the firing rate of a spike train. For instance, the brightness of a visual stimulus $s$ can be encoded through some function $r=f(s)$ that yields the rate $r$ of the resulting spike train.
Such spike trains can be modeled with a Poisson distribution, where spikes are generated randomly with rate $r$; we will assume that the neuron has a constant rate of firing. The distribution of the spike count $X$ in a time interval of length $T$ is then given by:

$$
\begin{equation*}
    P(X|r) = \frac{(rT)^X}{X!}e^{-rT}
\end{equation*}
$$

We call this probability distribution the **likelihood**. It gives us the probability of observing data $X$ given a hypothesis: in this case, a spike rate of $r$.
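
The formula can also be evaluated directly, without sampling. The sketch below does this for an arbitrary rate and interval length and checks that the probabilities sum to approximately 1. The helper name `poisson_pmf` and the values of `r` and `T` are assumptions chosen for illustration.

```python
import math

def poisson_pmf(x, r, T=1.0):
    # P(X = x | r) for spike count x, firing rate r and interval length T
    return (r * T) ** x / math.factorial(x) * math.exp(-r * T)

r = 5.0   # hypothetical firing rate (spikes per unit of time)
T = 2.0   # hypothetical interval length

# evaluate the likelihood for spike counts 0, ..., 59
probs = [poisson_pmf(x, r, T) for x in range(0, 60)]

print('P(X = 0 | r) =', probs[0])          # equals exp(-r*T)
print('sum over X = 0..59:', sum(probs))   # approximately 1
```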


> ***Homework exercise 3:*** Pick some values for $r$ and generate plots of $P(X|r)$ for these values using <code>np.random.poisson()</code>. You need to generate a sequence of $20000$ random values from the Poisson distribution. Where does this function have most of its probability mass? We assume $T = 1$. **(0.5pt)**

In [None]:
### YOUR CODE HERE ###
# define spike rate
r = ...

# draw 20000 samples from the Poisson distribution
xs_poisson = ...

n_bins = 10

# histogram
fig, ax = plt.subplots()

n, bins, patches = ax.hist(xs_poisson, n_bins,rwidth=0.9)
plt.xlabel('number of spikes')
plt.ylabel('samples')
plt.legend(['r = {}'.format(r)])

We will now look a bit closer at the concepts of prior and posterior probabilities by looking at an imaginary neuron that responds differently to different stimuli. Say our imaginary neuron will respond to stimulus $A$ with a firing rate of $r=3$, whereas stimulus $B$ will elicit a spiking response with rate $r=8$. We will try to infer what stimulus is being presented to the neuron by looking at the response of the neuron. A priori, stimulus $A$ is a lot more likely to occur than stimulus $B$: $P(A) = 0.7$, whereas $P(B) = 0.3$; we call the probability distribution over $A$ and $B$ the **prior**.

We now measure the response of our imaginary neuron; over a period $T$ we measure $X$ spikes. We call this the **data**. 

Bayes’ rule allows us to use data, likelihood and prior to compute the **posterior probability** of each hypothesis ($A$: $r=3$; $B$: $r=8$) given the data:

$$
\begin{equation*}
P(r | X) = \frac{P(X | r) P(r)}{\sum_{r'}P(r')P(X|r')}
\end{equation*}
$$

The posterior is the probability of our hypothesis, given the data that we observed. This is an extremely useful value, because it allows us to directly compare different hypotheses about the data against each other. The denominator is a normalization term and it is the same for each hypothesis. Since $A$ and $B$ are the only two hypotheses ($P(A) + P(B) = 1$), the denominator, in our case, can be rewritten as the sum of the two numerators, the one for $A$ and the one for $B$.
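
To see the mechanics of Bayes' rule on a different problem, the sketch below returns to the coin from section 1.2 and compares two hypotheses about its bias. All numbers (the two biases, the equal priors and the observation) are assumptions made up for this illustration.

```python
# Hypothetical example: is a coin fair (theta = 0.5) or biased (theta = 0.2)?
def geometric_likelihood(x, theta):
    # P(X = x | theta): the first head appears on toss number x
    return theta * (1 - theta) ** (x - 1)

x_observed = 3                           # data: first head on the third toss
priors = {'fair': 0.5, 'biased': 0.5}    # assumed prior probabilities
thetas = {'fair': 0.5, 'biased': 0.2}    # assumed bias under each hypothesis

# numerator of Bayes' rule for each hypothesis: prior times likelihood
numerators = {h: priors[h] * geometric_likelihood(x_observed, thetas[h]) for h in priors}

# the denominator is the sum of the numerators over all hypotheses
denominator = sum(numerators.values())
posteriors = {h: numerators[h] / denominator for h in priors}

for h in posteriors:
    print('P({} | X = {}) = {:.3f}'.format(h, x_observed, posteriors[h]))
```

Because the denominator is shared, the posteriors always sum to 1, and the ratio of the two posteriors equals the ratio of the two numerators.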

With the following function <code>eval_poiss</code> we can compute the likelihood $P(X|r)$ for the Poisson distribution.

In [None]:
# Evaluates the Poisson distribution at a single integer spike count, i.e. computes P(data|r), assuming T = 1

def eval_poiss(r, data):
    p = r**data/math.factorial(data)*math.exp(-r)
    return p

> ***Homework exercise 4:*** Compute the posterior probability for each of the two rates ($A$: $r=3$, $P(A) = 0.7$; $B$: $r=8$, $P(B) = 0.3$) given an observation $X=6$ in the cell below. Which stimulus has most likely caused the observed spikes? How can you explain how close the two posteriors are? **(1pt)**

In [None]:
### YOUR CODE HERE ###
data = ...
priorA = ...
priorB = ...
rA = ...
rB = ...

numerator_A = priorA * eval_poiss(rA, data)
numerator_B = priorB * eval_poiss(rB, data)

### YOUR CODE HERE ###
denominator = ...
posterior_A = ...
posterior_B = ...
###

print('posterior A =', posterior_A)
print('posterior B =', posterior_B)

> ***Homework exercise 5:*** The following cell creates a bar plot of the posterior probability over $r$ that would result from each of the observations $X=1$ to $X=10$. What do you observe? Explain it. **(2pt)**

In [None]:
# initialize variables (one entry per observation X = 1, ..., 10)
posteriors_A = np.zeros([1,10])
posteriors_B = np.zeros([1,10])

# compute posterior for each observation 1 - 10 and each hypothesis A and B
for data in range(1,11):
    numerator_A = priorA*eval_poiss(rA,data)
    numerator_B = priorB*eval_poiss(rB,data)
    denominator = numerator_A+numerator_B
    posteriors_A[0,data-1] = numerator_A/denominator
    posteriors_B[0,data-1] = numerator_B/denominator

# plot 
fig, ax = plt.subplots()
plt.bar(np.arange(1,11,1),posteriors_A[0,:])
plt.bar(np.arange(1,11,1),posteriors_B[0,:], width = 0.4)
plt.xlabel('number of spikes')
plt.ylabel('probability')
plt.legend(['stimulus A','stimulus B'])