## Problem 6.1: Modeling and parameter estimation for Boolean data

For the purposes of this problem, assume that we can pool the results from the three years to have 13/126 reversals for wild type, 39/124 reversals for ASH, and 91/124 reversals for AVA.

Our goal is to estimate θ, the probability of reversal for each strain. That is to say, we want to compute g(θ∣n,N), where n is the number of reversals in N trials.

**a)** Develop a generative model (that is, specify the joint distribution π(n,θ∣N)=f(n∣θ,N)g(θ)) for the observed reversals. Be sure to do prior predictive checks and justify why you chose the model you did. Biological hint: C. elegans have no mode of sensing light at all. So, a wild type worm without any Channelrhodopsin has no means of detecting light. Modeling hint: The Beta distribution is very useful for modeling probabilities of probabilities, like θ in this problem.

**b)** Plot the posterior probability density function for each of the three strains. What can you conclude from this?

In [1]:
import numpy as np
import pandas as pd
import scipy.special
import scipy.stats as st

import bebi103

import altair as alt
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()

We model the likelihood as a binomial distribution and the prior as a Beta distribution. For wild type, the prior should be peaked at 0 and drop off quickly, since these worms have no means of detecting light. For ASH, the prior should be peaked somewhere around 0.3, and for AVA it should be peaked close to 0.75.
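Written out, the model described above is the following, where α and β are the Beta prior parameters chosen separately for each strain:

```latex
\begin{align}
\theta &\sim \text{Beta}(\alpha, \beta), \\
n \mid \theta, N &\sim \text{Binom}(N, \theta).
\end{align}
```

Because the Beta prior is conjugate to the binomial likelihood, the posterior for this model has a closed form, θ ∣ n, N ∼ Beta(α + n, β + N − n), so no sampling is strictly required to compute g(θ∣n,N).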

In [2]:
# WT
alpha, beta = 0.2, 20.
theta = np.linspace(0,1)
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta',
                          y_axis_label='g(theta)')
p.line(theta, scipy.stats.beta.pdf(theta, alpha, beta), line_width=2)
bokeh.io.show(p)

In [3]:
# ASH
alpha, beta = 4.5, 10.
theta = np.linspace(0,1)
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta',
                          y_axis_label='g(theta)')
p.line(theta, scipy.stats.beta.pdf(theta, alpha, beta), line_width=2)
bokeh.io.show(p)

In [4]:
# AVA
alpha, beta = 8., 3.
theta = np.linspace(0,1)
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta',
                          y_axis_label='g(theta)')
p.line(theta, scipy.stats.beta.pdf(theta, alpha, beta), line_width=2)
bokeh.io.show(p)
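As a quick numerical check on where the three plotted priors peak, the mode and mean of each Beta distribution can be computed directly. This is a minimal sketch; the helper name `beta_mode` and the `priors` dictionary are introduced here for illustration and are not part of the original notebook.

```python
def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) density on [0, 1] (hypothetical helper)."""
    if alpha > 1 and beta > 1:
        # Unique interior mode
        return (alpha - 1) / (alpha + beta - 2)
    if alpha <= 1 and beta > 1:
        # Density is monotonically decreasing, so it peaks at 0
        return 0.0
    if alpha > 1 and beta <= 1:
        # Density is monotonically increasing, so it peaks at 1
        return 1.0
    # Remaining cases (both parameters <= 1) are not handled in this sketch
    raise ValueError("No unique mode handled for alpha <= 1 and beta <= 1.")


# Prior parameters used in the three plots above
priors = {"WT": (0.2, 20.0), "ASH": (4.5, 10.0), "AVA": (8.0, 3.0)}

modes = {strain: beta_mode(a, b) for strain, (a, b) in priors.items()}
means = {strain: a / (a + b) for strain, (a, b) in priors.items()}

for strain in priors:
    print(f"{strain}: mode = {modes[strain]:.3f}, mean = {means[strain]:.3f}")
```

The wild-type prior peaks at 0, the ASH prior at 0.28, and the AVA prior at 7/9 ≈ 0.78, consistent with the targets stated above.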

### Wild type

In [36]:
n_ppc_samples = 1000
N = 126

# Draw parameters out of the prior
alpha, beta = 0.25, 15.
theta = np.random.beta(alpha, beta, size=n_ppc_samples)

# Draw data sets out of the likelihood for each set of prior params
ell = np.array([np.random.binomial(N, t, size=500) for t in theta])
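The list comprehension above can also be written as a single broadcast call. The sketch below assumes NumPy's `Generator` API (`np.random.default_rng`) in place of the legacy `np.random` functions; the seed and the name `n_draws` are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)  # arbitrary seed, for reproducibility only

n_ppc_samples = 1000  # number of draws of theta from the prior
n_draws = 500         # number of simulated data sets per theta
N = 126               # number of wild-type trials
alpha, beta = 0.25, 15.0

# Draw parameters out of the prior
theta = rng.beta(alpha, beta, size=n_ppc_samples)

# Broadcast theta as a column so that row i holds the draws for theta[i]
ell = rng.binomial(N, theta[:, np.newaxis], size=(n_ppc_samples, n_draws))
```

The resulting `ell` has the same shape and meaning as the array built by the loop, with one row per prior draw.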

In [37]:
p = bebi103.viz.ecdf(ell[0], 
                     x_axis_label='number of reversals', 
                     alpha=0.01, 
                     line_alpha=0)
for ell_vals in ell[9::10]:
    p = bebi103.viz.ecdf(ell_vals, alpha=0.02, p=p, line_alpha=0)

bokeh.io.show(p)


In [38]:
data = np.hstack((np.expand_dims(theta, 1), ell))
columns = ['theta'] + [f'ell[{i+1}]' for i in range(len(ell[0]))]

# Make data frame to match output of Stan
df_ppc = pd.DataFrame(data=data, columns=columns)
df_ppc['warmup'] = 0
df_ppc['chain'] = 0
df_ppc['chain_idx'] = np.arange(1, n_ppc_samples+1)

bokeh.io.show(
    bebi103.viz.predictive_ecdf(df_ppc, 
                                'ell', 
                                x_axis_label='number of reversals'))

Open question: does this prior predictive distribution look reasonable for wild type?
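One way to put numbers on this is to note that, for a Beta prior with a binomial likelihood, marginalizing over θ gives a Beta-Binomial distribution for a single observed count n. The sketch below uses `scipy.stats.betabinom` (available in SciPy 1.4 and later) to summarize the wild-type prior predictive analytically; the variable names are chosen here for illustration.

```python
import numpy as np
import scipy.stats as st

N = 126                   # number of wild-type trials
alpha, beta = 0.25, 15.0  # wild-type prior parameters used above

# Analytic prior predictive distribution for the number of reversals
prior_pred = st.betabinom(N, alpha, beta)

n = np.arange(N + 1)
pmf = prior_pred.pmf(n)

prior_pred_mean = prior_pred.mean()  # equals N * alpha / (alpha + beta)
p_zero = prior_pred.pmf(0)           # prior probability of seeing no reversals
q95 = prior_pred.ppf(0.95)           # 95th percentile of the reversal count

print(f"mean = {prior_pred_mean:.2f}, P(n = 0) = {p_zero:.2f}, 95th percentile = {q95:.0f}")
```

These summaries describe the prior predictive pooled over all draws of θ, so they complement the per-θ ECDFs plotted above rather than replace them.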

### ASH

In [None]:
n_ppc_samples = 1000
N = 124  # ASH: 39/124 reversals in the pooled data

# Draw parameters out of the prior
alpha, beta = 5.5, 20.
theta = np.random.beta(alpha, beta, size=n_ppc_samples)

# Draw data sets out of the likelihood for each set of prior params
ell = np.array([np.random.binomial(N, t, size=500) for t in theta])

p = bebi103.viz.ecdf(ell[0], 
                     x_axis_label='number of reversals', 
                     alpha=0.01, 
                     line_alpha=0)
for ell_vals in ell[9::10]:
    p = bebi103.viz.ecdf(ell_vals, alpha=0.02, p=p, line_alpha=0)

bokeh.io.show(p)

In [None]:
data = np.hstack((np.expand_dims(theta, 1), ell))
columns = ['theta'] + [f'ell[{i+1}]' for i in range(len(ell[0]))]

# Make data frame to match output of Stan
df_ppc = pd.DataFrame(data=data, columns=columns)
df_ppc['warmup'] = 0
df_ppc['chain'] = 0
df_ppc['chain_idx'] = np.arange(1, n_ppc_samples+1)

bokeh.io.show(
    bebi103.viz.predictive_ecdf(df_ppc, 
                                'ell', 
                                x_axis_label='number of reversals'))

### AVA

In [None]:
n_ppc_samples = 1000
N = 124  # AVA: 91/124 reversals in the pooled data

# Draw parameters out of the prior
alpha, beta = 8., 3.
theta = np.random.beta(alpha, beta, size=n_ppc_samples)

# Draw data sets out of the likelihood for each set of prior params
ell = np.array([np.random.binomial(N, t, size=500) for t in theta])

p = bebi103.viz.ecdf(ell[0], 
                     x_axis_label='number of reversals', 
                     alpha=0.01, 
                     line_alpha=0)
for ell_vals in ell[9::10]:
    p = bebi103.viz.ecdf(ell_vals, alpha=0.02, p=p, line_alpha=0)

bokeh.io.show(p)

This looks a bit broad, but mostly reasonable.

In [None]:
data = np.hstack((np.expand_dims(theta, 1), ell))
columns = ['theta'] + [f'ell[{i+1}]' for i in range(len(ell[0]))]

# Make data frame to match output of Stan
df_ppc = pd.DataFrame(data=data, columns=columns)
df_ppc['warmup'] = 0
df_ppc['chain'] = 0
df_ppc['chain_idx'] = np.arange(1, n_ppc_samples+1)

bokeh.io.show(
    bebi103.viz.predictive_ecdf(df_ppc, 
                                'ell', 
                                x_axis_label='number of reversals'))
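For part (b), one possible route is to use the conjugacy of the Beta prior with the binomial likelihood: with n reversals in N trials, the posterior is Beta(α + n, β + N − n). The sketch below is not necessarily the intended solution; it assumes the prior parameters from the prior predictive checks above and the pooled counts from the problem statement, and the dictionary names are chosen here for illustration.

```python
import numpy as np
import scipy.stats as st

# (n reversals, N trials) from the pooled counts in the problem statement
data = {"WT": (13, 126), "ASH": (39, 124), "AVA": (91, 124)}

# Prior parameters (alpha, beta) used in the prior predictive checks
priors = {"WT": (0.25, 15.0), "ASH": (5.5, 20.0), "AVA": (8.0, 3.0)}

theta = np.linspace(0, 1, 401)
posteriors = {}
for strain, (n, N) in data.items():
    a, b = priors[strain]
    # Conjugate update: Beta prior + binomial likelihood gives a Beta posterior
    post = st.beta(a + n, b + N - n)
    posteriors[strain] = {
        "pdf": post.pdf(theta),
        "mean": post.mean(),
        "ci95": post.interval(0.95),  # central 95% credible interval
    }

for strain, summary in posteriors.items():
    lo, hi = summary["ci95"]
    print(f"{strain}: mean = {summary['mean']:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Each `posteriors[strain]["pdf"]` array can be plotted against `theta` with `p.line`, in the same way as the prior plots at the top of the notebook.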