# Model 1 - grid approximation

This notebook outlines how to use Model 1 as described in Vincent (2015). The paper should drive one's reading; this notebook is here to supplement one's understanding and to bridge the gap between the mathematical description and the algorithmic implementation.

The generative capability of the model (Figure 6, left) can be used both to generate example data and to generate predictions. We also use the model (Figure 6, right) to conduct inference, updating our beliefs about the parameter given some observed data.

![Model1](img/model1.png)
Taken from Figure 6 of Vincent (2015).

### Proportion correct (Equation 2 in the paper)
From the paper, the proportion of correct responses is given as

$PC_c = \Phi\left(\frac{\Delta\mu_c}{\sqrt{2\sigma^2}}\right)$

which we can define with a pair of Julia functions:

In [1]:
using StatsFuns

# define Φ() as the cumulative normal distribution function
Φ(x) = normcdf.(0, 1, x)

# define the deterministic node PC, corresponds to Equation 2 in Vincent (2015)
PC(Δμ, σ²::Number) = Φ(Δμ/sqrt(2σ²));
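
As a quick sanity check of Equation 2, here is a worked evaluation for the illustrative values $\Delta\mu = 1$ and $\sigma^2 = 1$ (chosen for convenience, not taken from the paper):

$PC = \Phi\left(\frac{1}{\sqrt{2 \times 1}}\right) = \Phi(0.7071) \approx 0.760$

So an observer with unit internal variance would be expected to respond correctly on roughly 76% of trials at this signal intensity. At $\Delta\mu = 0$ the argument is zero and $PC = \Phi(0) = 0.5$, which is chance performance.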

### Generative model (Equation 3 in the paper)
The number of correct responses $k$ will be Binomially distributed, so the generative model for this task is

$k \sim \mathrm{Binomial}(PC_c,T)$

The function `generative`, defined in cell In [3] below, draws random samples from $P(k|T,PC)$. Note that we define two methods of this function: one for when $\Delta\mu$ is a collection of values and one for when the inputs are scalars.

### Define a prior over our parameter (Equation 4 from the paper)

In [2]:
# flat prior: the same log density, log(1/1000), for every value of σ²
log_prior(σ²) = log(1/1000)

log_prior (generic function with 1 method)
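
The `log_prior` above returns the same value for any input, including values that a bounded uniform prior would exclude. A minimal sketch of a bounded version follows, assuming the prior is uniform on the interval from 0 to 1000 (inferred here from the `log(1/1000)` constant; check Equation 4 in the paper). The name `log_prior_bounded` and the `lower`/`upper` keywords are hypothetical and not part of the notebook.

```julia
# Hypothetical bounded variant of log_prior: uniform density inside
# [lower, upper], and zero density (log density of -Inf) outside it.
# The default bounds are an assumption based on the log(1/1000) constant.
function log_prior_bounded(σ²::Real; lower::Real = 0.0, upper::Real = 1000.0)
    if lower <= σ² <= upper
        return -log(upper - lower)   # same as log(1 / (upper - lower))
    else
        return -Inf                  # outside the assumed support
    end
end
```

On the grid used later in this notebook (0.5 to 3) this would give the same values as the constant `log_prior`, so it only matters if the grid is extended beyond the assumed support.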

In [3]:
using Distributions

function generative(T::Int64, Δμ, σ²)
    # vector method: one sample from P(k|T,PC) for each value in Δμ
    return [rand(Binomial(T, p)) for p ∈ PC(Δμ, σ²)]
end

function generative(T::Int64, Δμ::Number, σ²::Number)
    # scalar method: a single sample from P(k|T,PC), or equivalently P(k|T,Δμ,σ²)
    return rand(Binomial(T, PC(Δμ, σ²)))
end

generative (generic function with 2 methods)
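
To make the sampling step concrete, a binomial draw can be viewed as counting the successes in `T` independent trials, each correct with probability `PC`. The sketch below uses only Julia's standard library. `simulate_k` is a hypothetical name and is not part of the notebook, which relies on `Binomial` from Distributions instead.

```julia
using Random

# Hypothetical helper: simulate T independent trials, each correct with
# probability p, and count the correct ones. The count is one draw from
# a Binomial(T, p) distribution.
function simulate_k(rng::AbstractRNG, T::Integer, p::Real)
    return count(rand(rng, T) .< p)
end

rng = MersenneTwister(1)          # fixed seed so the draw is reproducible
k = simulate_k(rng, 100, 0.76)    # number correct out of 100 trials
```

Calling `simulate_k` repeatedly gives a different `k` each time, just as repeated calls to `generative` do.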

### Define the likelihood and joint distribution (Equation 5 in the paper)
First we define the likelihood (part of Equation 5). We work with log probabilities, so the likelihood over all stimulus levels becomes a sum of log probabilities.

In [4]:
function log_likelihood(σ²::Number, Δμ::Array{Float64,1}, kvals::Array{Int64,1}, T::Integer)
    # sum the binomial log probabilities over all stimulus levels
    return sum(binomlogpdf.(T, PC(Δμ, σ²), kvals))
end

log_likelihood (generic function with 1 method)
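
Working with log probabilities avoids numerical underflow when many small probabilities are multiplied together. As an illustration of the quantity being summed, the sketch below writes out the binomial log probability by hand using only Base Julia. The names `log_choose`, `log_binomial_pmf` and `log_likelihood_by_hand` are hypothetical, and the sketch assumes $0 < p < 1$ (it does not handle `p` of exactly 0 or 1).

```julia
# log of the binomial coefficient "T choose k", computed as a sum of logs
# so that large T does not overflow integer arithmetic
log_choose(T::Integer, k::Integer) = sum(log.((T - k + 1):T)) - sum(log.(1:k))

# log P(k | T, p) for a binomial distribution, assuming 0 < p < 1
function log_binomial_pmf(T::Integer, p::Real, k::Integer)
    return log_choose(T, k) + k * log(p) + (T - k) * log1p(-p)
end

# summing over stimulus levels gives the log-likelihood of the whole data set
function log_likelihood_by_hand(T::Integer, pc, kvals)
    return sum(log_binomial_pmf(T, p, k) for (p, k) in zip(pc, kvals))
end
```

Here `pc` stands for the vector of proportion-correct values, one per stimulus level, that `PC(Δμ, σ²)` would return in the notebook.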

We then define the log posterior as the sum of the log prior and the log likelihood. Strictly, this is the log of the joint $P(\sigma^2, \Delta\mu, k, T)$, which equals the posterior over $\sigma^2$ up to a normalising constant.

In [5]:
log_posterior(σ², Δμ, k, T::Integer) = log_prior(σ²) + log_likelihood(σ², Δμ, k, T)

log_posterior (generic function with 1 method)

To plot the posterior we need to convert from log density back to a normalised posterior density, so we define the utility function below.

In [6]:
function logpost2post(log_posterior_density::Array{Float64,1})
    posterior_density = exp.(log_posterior_density)
    return posterior_density ./ sum(posterior_density)
end

logpost2post (generic function with 1 method)
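
Exponentiating log densities directly can underflow to zero when the log values are very negative, for example with many more trials or stimulus levels than in this notebook. A common safeguard, sketched here as an optional variant rather than something the notebook does, is to subtract the maximum log density before exponentiating. The normalised result is unchanged because the subtracted constant cancels in the division. `logpost2post_stable` is a hypothetical name.

```julia
# Hypothetical, numerically safer variant of logpost2post: shift the log
# densities so the largest is 0 before exponentiating, then normalise.
function logpost2post_stable(log_posterior_density::AbstractVector{<:Real})
    shifted = log_posterior_density .- maximum(log_posterior_density)
    posterior_density = exp.(shifted)
    return posterior_density ./ sum(posterior_density)
end

# exp(-1000) underflows to 0.0 in Float64, but the shifted version still works
post = logpost2post_stable([-1000.0, -1001.0, -1002.0])
```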

## STEP 1: define data
If we wanted to generate example data from this observer, we can simply sample from $P(k|T,\Delta\mu,\sigma^2)$ using the `generative` function defined above. The example below shows how you would do this for $T=100$ and a set of $\Delta\mu$ values, with $\sigma^2=1$.

    generative(100, [0.0100, 0.0215, 0.0464, 0.1000, 0.2154, 0.4642, 1.0000, 2.1544, 4.6416, 10.0000], 1)
    
If you try running this code a number of times, you'll see that we get a different set of possible experimental results $k$ each time we sample from $P(k|T,\Delta\mu,\sigma^2)$.

In [7]:
generative(100, [0.0100, 0.0215, 0.0464, 0.1000, 0.2154, 0.4642, 1.0000, 2.1544, 4.6416, 10.0000], 1)

10-element Array{Int64,1}:
  56
  55
  50
  48
  60
  59
  82
  96
 100
 100

But for this example, we will define our own observed data where:
- `Δμ` = stimulus intensities
- `k` = number of correct responses for each stimulus intensity (could range from 0 to `T`)
- `T` = number of trials per stimulus intensity, 100 in this example

Proportion correct (per stimulus level) is given by `k./T`. We'll store this data in a dictionary:

In [8]:
data = Dict()
data["T"] = 100
data["k"] = [50, 51, 57, 55, 63, 62, 82, 94, 99, 100]
data["Δμ"] = [0.0100, 0.0215, 0.0464, 0.1000, 0.2154, 0.4642, 1.0000, 2.1544, 4.6416, 10.0000];

We can now visualise this data in a plot (see Figure 8 in the paper):

In [9]:
using Plots

plot_data_space = scatter(log.(data["Δμ"]), data["k"]./data["T"],
    xlabel = "log signal intensity, Δμ", #xscale = :log,
    ylabel = "proportion correct, k/T",
    legend = false)

# TODO: xaxis should be log scale
# TODO: get latex axis labels

## STEP 2: parameter recovery
Evaluate the posterior over a range of $\sigma^2$ values. We are using grid approximation here, so we define a fine grid of many $\sigma^2$ values.

In [10]:
σ² = linspace(0.5, 3, 10.0^3)  # on Julia 1.0 or later: range(0.5, stop=3, length=10^3)

0.5:0.0025025025025025025:3.0

### Do grid approximation
This is straightforward: we iterate over the vector of $\sigma^2$ values and evaluate the log posterior for each.

In [11]:
function gridApprox(σ²_vector, data)
    σ²_log_posterior_density = [log_posterior(σ², data["Δμ"], data["k"], data["T"]) for σ² ∈ σ²_vector]
end

σ²_log_posterior_density = gridApprox(σ², data)
σ²_posterior_density = logpost2post(σ²_log_posterior_density);
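
Once the posterior has been evaluated on the grid, point summaries follow directly from the grid values and their normalised weights. The sketch below computes the posterior mode (the MAP estimate) and the posterior mean for a small made-up grid. The function name `summarise_grid` and the toy numbers are hypothetical; in this notebook the inputs would be `σ²` and `σ²_posterior_density`.

```julia
# Hypothetical helper: summarise a grid-approximated posterior.
function summarise_grid(grid::AbstractVector{<:Real}, weights::AbstractVector{<:Real})
    w = weights ./ sum(weights)        # make sure the weights sum to 1
    map_estimate = grid[argmax(w)]     # grid value with the highest posterior weight
    posterior_mean = sum(grid .* w)    # weighted average over the grid
    return (map_estimate = map_estimate, posterior_mean = posterior_mean)
end

# toy example, not real results
toy_grid = [1.0, 2.0, 3.0]
toy_weights = [0.2, 0.5, 0.3]
result = summarise_grid(toy_grid, toy_weights)
```

The resolution of both summaries is limited by the grid spacing, which is one reason to use a fine grid.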

### Plot our posterior over the parameter (see Figure 8 in the paper)

In [12]:
plot_parameter_space = plot(σ², σ²_posterior_density,
    xlabel = "sigma^2",
    ylabel = "posterior density",
    legend=false)
# TODO: get latex axis labels

## STEP 3: Predictive distribution
Now that we have our posterior over parameters, we can use the generative ability of the model to sample predicted behaviour $k$ given our knowledge of $\sigma^2$. Below, we generate a set of predictions for many interpolated signal intensity levels (`Δμ_interp_vals`) for which we do not have data. This is a useful demonstration of making and visualising the model's predictions.

In [13]:
Δμ_interp_vals = logspace(-2, 1, 20);  # on Julia 1.0 or later: exp10.(range(-2, stop=1, length=20))

Draw many samples from the posterior distribution of the internal variance $\sigma^2$:

In [14]:
using StatsBase # for sampling
nSamples = 10^2
σ²_samples = sample(σ², ProbabilityWeights(σ²_posterior_density), nSamples)

100-element Array{Float64,1}:
 0.735235
 1.06306 
 1.45095 
 1.40841 
 0.882883
 0.917918
 1.02052 
 1.08058 
 1.32332 
 1.48849 
 1.23323 
 1.07808 
 0.94044 
 ⋮       
 1.32082 
 1.1957  
 1.21572 
 1.37838 
 0.945445
 0.867868
 1.06557 
 0.79029 
 0.755255
 1.003   
 1.40591 
 1.1031  
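
Sampling grid values in proportion to their posterior weights can also be done by hand with inverse-CDF sampling, one standard way to draw from a discrete distribution. The following is a sketch using only the standard library, not a description of how StatsBase implements `sample`. `sample_grid` is a hypothetical name.

```julia
using Random

# Hypothetical helper: draw n values from `grid`, with probability
# proportional to the matching entry of `weights`.
function sample_grid(rng::AbstractRNG, grid::AbstractVector,
                     weights::AbstractVector{<:Real}, n::Integer)
    cdf = cumsum(weights)                      # unnormalised cumulative weights
    total = cdf[end]
    samples = Vector{eltype(grid)}(undef, n)
    for i in 1:n
        u = (1 - rand(rng)) * total            # uniform on (0, total]
        idx = searchsortedfirst(cdf, u)        # first index whose cumulative weight reaches u
        samples[i] = grid[min(idx, length(grid))]
    end
    return samples
end

# all of the weight sits on the middle grid value, so every draw is 1.0
draws = sample_grid(MersenneTwister(42), [0.5, 1.0, 1.5], [0.0, 1.0, 0.0], 5)
```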

Now we will generate many predicted data points $k$ for each interpolated $\Delta\mu$ value, with $\sigma^2$ sampled from our posterior.

In [15]:
# do the posterior prediction
predk = [generative(data["T"], Δμ, σ²) for Δμ ∈ Δμ_interp_vals, σ² ∈ σ²_samples]

20×100 Array{Int64,2}:
  49   46   53   49   36   57   56   45  …   42   45   42   45   44   49   53
  58   47   53   45   52   43   49   55      52   57   55   54   49   52   56
  56   46   47   61   40   53   48   51      65   52   42   37   59   50   48
  53   49   50   56   54   48   45   53      62   57   55   57   49   53   49
  52   46   43   48   48   48   50   53      50   53   51   62   48   49   53
  50   55   62   48   48   44   45   52  …   47   54   54   53   44   52   51
  44   50   50   55   47   54   60   48      49   60   52   48   44   53   49
  57   55   47   62   59   61   47   52      50   56   56   53   49   55   51
  54   56   61   50   53   51   56   52      58   57   51   48   47   61   52
  57   66   59   56   52   57   62   63      49   54   52   57   53   58   58
  66   65   54   61   64   65   64   66  …   49   57   66   63   58   58   62
  63   67   66   60   66   64   69   64      65   72   61   71   69   63   62
  80   72   70   70   77   77   80   66  

# ---- CODE BELOW IS IN PROGRESS ----

In [16]:
# # Calculate 95% CI's for each signal level
# #CI = prctile(predk,[5 95]) ./ data.T;
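
The commented-out line above is MATLAB syntax. A possible Julia equivalent is sketched below, assuming `predk` has one row per signal level and one column per posterior sample (as in the output of In [15]) and using `quantile` from the Statistics standard library. Note that the 5th and 95th percentiles bound a 90% interval; a 95% interval corresponds to the 2.5th and 97.5th percentiles, which are the defaults assumed here. `predictive_interval` is a hypothetical name.

```julia
using Statistics

# Hypothetical helper: lower and upper bounds of the predicted proportion
# correct at each signal level (each row of predk), divided by T trials.
function predictive_interval(predk::AbstractMatrix{<:Real}, T::Integer;
                             lower::Real = 0.025, upper::Real = 0.975)
    nlevels = size(predk, 1)
    lo = [quantile(predk[i, :], lower) for i in 1:nlevels] ./ T
    hi = [quantile(predk[i, :], upper) for i in 1:nlevels] ./ T
    return lo, hi
end

# toy matrix (2 signal levels, 5 samples each), not real predictions
toy_predk = [1 2 3 4 5; 10 10 10 10 10]
lo, hi = predictive_interval(toy_predk, 10; lower = 0.0, upper = 1.0)
```

With `lower = 0.0` and `upper = 1.0` the bounds are simply the minimum and maximum of each row, which makes the toy example easy to check by eye.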

Save

In [17]:
# save('output/m1MAPestimate.mat', 'vMode')
# save(['~/Dropbox/tempModelOutputs/tempModel1run_gridApprox.mat'], '-v7.3')

In [18]:
plot(plot_data_space)

Overall, the results should look something like this:
![Model1](img/model1results.png)
Taken from Figure 8 of Vincent (2015).

## References
Vincent, B. T. (2015). A tutorial on Bayesian models of perception. Journal of Mathematical Psychology, 66, 103–114. http://doi.org/10.1016/j.jmp.2015.02.001