## Problem 6.1: Modeling and parameter estimation for Boolean data

For the purposes of this problem, assume that we can pool the results from the three years to have 13/126 reversals for wild type, 39/124 reversals for ASH, and 91/124 reversals for AVA.

Our goal is to estimate θ, the probability of reversal for each strain. That is to say, we want to compute g(θ∣n,N), where n is the number of reversals in N trials.
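By Bayes's theorem, the posterior is the likelihood times the prior, divided by the evidence f(n ∣ N). The evidence does not depend on θ, so for studying the shape of the posterior it can be dropped:

$$
g(\theta \mid n, N) = \frac{f(n \mid N, \theta)\, g(\theta)}{f(n \mid N)} \propto f(n \mid N, \theta)\, g(\theta).
$$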

Attribution: Maddie did the coding and writing up of this problem, and Zhiyang and Maddie discussed how to choose distributions for the prior and likelihood together.

In [1]:
import numpy as np
import pandas as pd
import scipy.special
import scipy.stats as st

import tqdm

import bebi103

import altair as alt
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()

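**a)** We would like to develop a generative model for the observed reversals of each strain of C. elegans, so we need to specify our likelihood and prior. Each trial has a Boolean outcome (the worm either reverses or it does not), so if the trials are independent with a common reversal probability θ, the number of reversals n in N trials is binomially distributed. For the prior on θ we use a beta distribution, which is defined on [0, 1] and can be tuned to peak wherever we expect θ to lie. Let's plot our priors first.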
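Written out, the generative model for each strain is the following, where α and β are the strain-specific prior hyperparameters chosen below:

$$
\theta \sim \text{Beta}(\alpha, \beta), \qquad n \mid N, \theta \sim \text{Binom}(N, \theta).
$$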

For the wild-type, the prior should be peaked around 0 and drop off quickly, since the worms have no means of detecting light.

In [2]:
# WT
alpha, beta = 0.2, 8.
theta = np.linspace(0,1)
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta', 
                          y_axis_label='g(theta)')
p.line(theta, st.beta.pdf(theta, alpha, beta), line_width=2)
bokeh.io.show(p)

For ASH, the prior should be peaked somewhere around 0.2 to 0.3, since stimulating ASH alone often does not give the downstream AVA neuron enough input to bring it past the threshold potential to fire.

In [3]:
# ASH
alpha, beta = 5.5, 18.
theta = np.linspace(0,1)
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta', 
                          y_axis_label='g(theta)')
p.line(theta, st.beta.pdf(theta, alpha, beta), line_width=2)
bokeh.io.show(p)

The prior for AVA should be peaked close to 0.8 or 0.9, since we can stimulate the neuron directly, making it fire and trigger a reversal.

In [4]:
# AVA
alpha, beta = 8., 3.
theta = np.linspace(0,1)
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta', 
                          y_axis_label='g(theta)')
p.line(theta, st.beta.pdf(theta, alpha, beta), line_width=2)
bokeh.io.show(p)
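The "peaked around" statements for the three priors can also be checked numerically. The sketch below computes the mode, mean, and central 95% interval of each Beta prior with `scipy.stats`; the helper `beta_mode` and the dict `priors` are illustrative names, not part of the original notebook.

```python
import scipy.stats as st

# Prior hyperparameters (alpha, beta) chosen above for each strain
priors = {"WT": (0.2, 8.0), "ASH": (5.5, 18.0), "AVA": (8.0, 3.0)}


def beta_mode(alpha, beta):
    """Location of the maximum of a Beta(alpha, beta) density.

    Assumes alpha and beta are not both <= 1 (that case is U-shaped or flat).
    """
    if alpha <= 1:
        return 0.0  # density diverges (or is largest) at theta = 0
    if beta <= 1:
        return 1.0  # density diverges (or is largest) at theta = 1
    return (alpha - 1) / (alpha + beta - 2)


summary = {}
for strain, (a, b) in priors.items():
    lo, hi = st.beta.interval(0.95, a, b)
    summary[strain] = {"mode": beta_mode(a, b),
                       "mean": st.beta.mean(a, b),
                       "interval": (lo, hi)}
    print(f"{strain}: mode = {summary[strain]['mode']:.3f}, "
          f"mean = {summary[strain]['mean']:.3f}, "
          f"central 95% interval = ({lo:.3f}, {hi:.3f})")
```

With these hyperparameters the ASH prior has its mode at 4.5/21.5 ≈ 0.21 and the AVA prior at 7/9 ≈ 0.78, while the WT density is largest at θ = 0.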

Now let's do some prior predictive checks to make sure our parameters are reasonable. We will generate samples out of this generative model starting with the wild-type.

#### WT

In [5]:
n_ppc_samples = 1000
N = 126

# Draw parameters out of the prior
alpha, beta = 0.2, 8.
theta = np.random.beta(alpha, beta, size=n_ppc_samples)

# Draw data sets out of the likelihood for each set of prior params
ell_wt = np.array([np.random.binomial(N, t, size=500) for t in theta])
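The loop above draws 500 binomial counts for every θ. Since each strain's experiment yields a single count n out of N trials, a minimal alternative prior predictive sketch draws one count per prior draw. `binomial` accepts an array of probabilities, so no list comprehension is needed. This assumes NumPy ≥ 1.17 for `default_rng` (newer than the version in the watermark); the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)  # seed chosen arbitrarily

n_ppc_samples = 1000
N = 126                 # number of WT trials
alpha, beta = 0.2, 8.0  # WT prior hyperparameters

# One theta per prior draw ...
theta_ppc = rng.beta(alpha, beta, size=n_ppc_samples)

# ... and one simulated number of reversals (out of N trials) per theta.
# The array of probabilities is broadcast, so no Python loop is needed.
n_ppc = rng.binomial(N, theta_ppc)

# Median and central 95% range of the simulated reversal counts
print(np.percentile(n_ppc, [2.5, 50, 97.5]))
```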

Now let's make a percentile plot of the ECDFs of the generated data sets and check if they're reasonable.

In [6]:
data = np.hstack((np.expand_dims(theta, 1), ell_wt))
columns = ['theta'] + [f'ell[{i+1}]' for i in range(len(ell_wt[0]))]

# Make data frame to match output of Stan
df_ppc = pd.DataFrame(data=data, columns=columns)
df_ppc['warmup'] = 0
df_ppc['chain'] = 0
df_ppc['chain_idx'] = np.arange(1, n_ppc_samples+1)

# Plot
bokeh.io.show(
    bebi103.viz.predictive_ecdf(df_ppc, 
                                'ell', 
                                x_axis_label='number of reversals'))

This looks reasonable: most of the generated data sets have close to 0 reversals, which is what we expect. Occasionally a worm reverses, but the number of such reversals stays small.

#### ASH

We can also generate datasets using our prior for the ASH strain.

In [7]:
n_ppc_samples = 1000
N = 124

# Draw parameters out of the prior
alpha, beta = 5.5, 18.
theta = np.random.beta(alpha, beta, size=n_ppc_samples)

# Draw data sets out of the likelihood for each set of prior params
ell_ash = np.array([np.random.binomial(N, t, size=500) for t in theta])


Now we can plot the predictive ECDF.

In [8]:
data = np.hstack((np.expand_dims(theta, 1), ell_ash))
columns = ['theta'] + [f'ell[{i+1}]' for i in range(len(ell_ash[0]))]

# Make data frame to match output of Stan
df_ppc = pd.DataFrame(data=data, columns=columns)
df_ppc['warmup'] = 0
df_ppc['chain'] = 0
df_ppc['chain_idx'] = np.arange(1, n_ppc_samples+1)

bokeh.io.show(
    bebi103.viz.predictive_ecdf(df_ppc, 
                                'ell', 
                                x_axis_label='number of reversals'))

Looking at the predictive ECDF, the number of reversals is centered around 30, which seems reasonable, with a fairly wide spread of roughly 20 to 50 reversals.

#### AVA

Now we can do the same with our AVA strain.

In [9]:
n_ppc_samples = 1000
N = 124

# Draw parameters out of the prior
alpha, beta = 8., 3.
theta = np.random.beta(alpha, beta, size=n_ppc_samples)

# Draw data sets out of the likelihood for each set of prior params
ell_ava = np.array([np.random.binomial(N, t, size=500) for t in theta])

# Make dataframe for plotting
data = np.hstack((np.expand_dims(theta, 1), ell_ava))
columns = ['theta'] + [f'ell[{i+1}]' for i in range(len(ell_ava[0]))]

# Make data frame to match output of Stan
df_ppc = pd.DataFrame(data=data, columns=columns)
df_ppc['warmup'] = 0
df_ppc['chain'] = 0
df_ppc['chain_idx'] = np.arange(1, n_ppc_samples+1)

# Plot
bokeh.io.show(
    bebi103.viz.predictive_ecdf(df_ppc, 
                                'ell', 
                                x_axis_label='number of reversals'))

Most generated data sets have a high number of reversals, which is what we expect from a prior peaked near 0.8.
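For a beta prior and a binomial likelihood, the prior predictive distribution of n is available in closed form: integrating θ out gives a beta-binomial distribution. The sketch below uses it to cross-check the sampled ECDFs for all three strains. It assumes SciPy ≥ 1.4, which added `scipy.stats.betabinom` (newer than the SciPy 1.1 listed in the watermark).

```python
import scipy.stats as st

# (alpha, beta, N) for each strain: prior hyperparameters and number of trials
strains = {"WT": (0.2, 8.0, 126), "ASH": (5.5, 18.0, 124), "AVA": (8.0, 3.0, 124)}

prior_pred = {}
for strain, (a, b, N) in strains.items():
    # theta ~ Beta(a, b) and n ~ Binom(N, theta)  =>  n ~ BetaBinom(N, a, b)
    dist = st.betabinom(N, a, b)
    mean = dist.mean()
    lo, hi = dist.interval(0.95)
    prior_pred[strain] = (mean, lo, hi)
    print(f"{strain}: mean = {mean:.1f}, central 95% interval = [{lo:.0f}, {hi:.0f}]")
```

The prior predictive mean is N α / (α + β), about 29 reversals for ASH and about 90 for AVA.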

**b)** Now we want to plot the posterior probability density function for each of the three strains. 

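First we define a function that computes the unnormalized posterior (prior times likelihood) for a given θ. We aren't using the log posterior here because it produced NaNs. Those NaNs most likely come from the grid endpoint θ = 0 for the wild type: there the Beta(0.2, 8) prior density is infinite while the binomial likelihood of 13 reversals is zero, so the log prior is +∞, the log likelihood is −∞, and their sum is undefined. The linear-scale product ∞ × 0 is also NaN, so the real fix is to avoid evaluating at θ = 0.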
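One common way around the NaNs is to stay in log space but evaluate on a grid that excludes θ = 0 and θ = 1, then normalize with log-sum-exp. The sketch below assumes a midpoint grid (an illustrative choice); `grid_posterior` is a hypothetical helper, not part of the original notebook. Unlike `post_indep_size`, it returns a density normalized so that its Riemann sum over the grid is 1.

```python
import numpy as np
import scipy.special
import scipy.stats as st


def grid_posterior(alpha, beta, n, N, n_points=200):
    """Normalized posterior density of theta on a grid, computed in log space.

    The grid uses bin midpoints, so it never touches theta = 0 or 1, where the
    log prior or log likelihood can be infinite (and their sum NaN).
    """
    # Midpoints of n_points equal-width bins on [0, 1]
    theta = (np.arange(n_points) + 0.5) / n_points
    dtheta = 1.0 / n_points

    # Unnormalized log posterior: log prior + log likelihood
    log_post = st.beta.logpdf(theta, alpha, beta) + st.binom.logpmf(n, N, theta)

    # Normalize stably so that sum(density) * dtheta == 1
    log_post -= scipy.special.logsumexp(log_post) + np.log(dtheta)
    return theta, np.exp(log_post)


# Wild type: Beta(0.2, 8) prior, 13 reversals in 126 trials
theta, post_wt = grid_posterior(0.2, 8.0, 13, 126)
```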

In [10]:
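def post_indep_size(params, ell):
    """Unnormalized posterior (prior times likelihood) for the number of reversals.

    params = (alpha, beta, theta, N); ell = observed number(s) of reversals.
    """
    # Unpack parameters
    alpha, beta, theta, N = params

    # Make sure parameters are physical. This is a density, not a log
    # density, so unphysical values get probability 0 rather than -inf.
    if (params < 0).any() or theta > 1:
        return 0.0

    # Beta prior for theta
    prior = st.beta.pdf(theta, alpha, beta)

    # Binomial likelihood of the observed reversal count(s)
    likelihood = np.prod(st.binom.pmf(ell, N, theta))

    return prior * likelihood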
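The product that `post_indep_size` evaluates can also be simplified by hand, because the beta prior is conjugate to the binomial likelihood. Dropping factors that do not depend on θ (the binomial coefficient and the beta function B(α, β)):

$$
g(\theta \mid n, N) \propto \theta^{n}(1-\theta)^{N-n} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+n-1}(1-\theta)^{\beta+N-n-1},
$$

so the posterior is exactly θ ∣ n, N ~ Beta(α + n, β + N − n), and its normalization constant is known without any numerical integration.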

#### WT

Now let's use this function to compute the posterior for the wild-type strain, using our chosen prior and the data that there were 13 reversals in 126 trials.

In [11]:
# Set up plotting range, omitting theta = 0, where the WT prior density is infinite
theta = np.linspace(0.0, 1.0, 201)[1:]

# Set up distribution variables
alpha, beta, N, n = 0.2, 8., 126., 13.

# Compute posterior
POST = np.empty_like(theta)

for i in tqdm.tqdm(range(len(theta))):
    POST[i] = post_indep_size(np.array([alpha, beta, theta[i], N]), n)


Now let's plot the posterior.

In [12]:
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta', 
                          y_axis_label='posterior')
p.line(theta, POST, line_width=2)
bokeh.io.show(p)

The posterior probability of reversal is peaked around 0.1. Even though the worms have no means of detecting light, they reversed in 13 of 126 trials. A reversal probability around 0.1 accounts for these spontaneous reversals during the experiment.

#### ASH

Let's plot the posterior for ASH, using our prior and knowing that there were 39/124 reversals.

In [13]:
# Set up plotting range
theta = np.linspace(0.0, 1.0, 200)

# Set up distribution variables
alpha, beta, N, n = 5.5, 18., 124., 39

# Compute posterior
POST = np.empty_like(theta)

for i in tqdm.tqdm(range(len(theta))):
    POST[i] = post_indep_size(np.array([alpha, beta, theta[i], N]), n)
        
p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta', 
                          y_axis_label='posterior')
p.line(theta, POST, line_width=2)
bokeh.io.show(p)


The posterior probability of reversal is peaked around 0.3, which seems reasonable. Stimulating the ASH sensory neuron excites the AVA command interneuron and depolarizes it. However, the depolarization may not always be enough for AVA to reach threshold and fire an action potential. Most of the time AVA needs additional simultaneous inputs to reach threshold, but sometimes stimulating ASH alone is enough to cause a reversal.

#### AVA

Now let's plot the posterior for AVA, using our prior and the data that there were 91/124 reversals.

In [14]:
# Set up plotting range
theta = np.linspace(0.0, 1.0, 200)

# Set up distribution variables
alpha, beta, N, n = 8., 3., 124., 91

# Compute posterior
POST = np.empty_like(theta)

for i in tqdm.tqdm(range(len(theta))):
    POST[i] = post_indep_size(np.array([alpha, beta, theta[i], N]), n)
        
# Scale so the peak is 1 (for display only; this is not a normalized density)
POST = POST / POST.max()

p = bokeh.plotting.figure(width=300, height=200, 
                          x_axis_label='theta', 
                          y_axis_label='posterior')

p.line(theta, POST, line_width=2)
bokeh.io.show(p)


The posterior probability of reversal is peaked near 0.75. It makes sense that the AVA strain has the highest probability of reversal, since we directly stimulate the neuron that drives reversal. The posterior has essentially no weight near 1, which means that even when the command interneuron is stimulated, the worm doesn't always reverse. Additional inputs may be needed, or the stimulation may not always be strong enough to cause a reversal.
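Because the beta prior is conjugate to the binomial likelihood, each strain's posterior is exactly Beta(α + n, β + N − n). The sketch below uses that result to report the posterior mode and a central 95% credible interval for each strain; the dict `strains` is an illustrative name, and the data are the pooled counts from the problem statement.

```python
import scipy.stats as st

# (alpha, beta, n, N) for each strain: prior hyperparameters and pooled data
strains = {
    "WT": (0.2, 8.0, 13, 126),
    "ASH": (5.5, 18.0, 39, 124),
    "AVA": (8.0, 3.0, 91, 124),
}

posteriors = {}
for strain, (a, b, n, N) in strains.items():
    # Conjugate update: Beta(a, b) prior + Binom(N, theta) -> Beta(a + n, b + N - n)
    a_post, b_post = a + n, b + N - n
    # Mode formula is valid because both posterior parameters exceed 1 here
    mode = (a_post - 1) / (a_post + b_post - 2)
    lo, hi = st.beta.interval(0.95, a_post, b_post)
    posteriors[strain] = (mode, lo, hi)
    print(f"{strain}: mode = {mode:.3f}, 95% credible interval = ({lo:.3f}, {hi:.3f})")
```

The modes, 12.2/132.2 ≈ 0.09 for WT, 43.5/145.5 ≈ 0.30 for ASH, and 98/133 ≈ 0.74 for AVA, agree with the peak locations described above.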

We can see from the posteriors that the probability of reversal is highest for the AVA strain, lower for the ASH strain, and lowest for the wild type. Thus, shining light on an AVA worm will most likely cause it to reverse, while shining light on a wild-type worm will probably not.
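To put a number on this ordering, one can sample from each strain's posterior and estimate the probability that θ_WT < θ_ASH < θ_AVA. This is a Monte Carlo sketch that assumes the three θ's are independent a posteriori (each strain was modeled separately) and uses the conjugate Beta(α + n, β + N − n) posteriors; it requires NumPy ≥ 1.17 for `default_rng`, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed
n_samples = 100_000

# Independent conjugate posteriors Beta(alpha + n, beta + N - n) for each strain
theta_wt = rng.beta(0.2 + 13, 8.0 + 126 - 13, size=n_samples)
theta_ash = rng.beta(5.5 + 39, 18.0 + 124 - 39, size=n_samples)
theta_ava = rng.beta(8.0 + 91, 3.0 + 124 - 91, size=n_samples)

# Monte Carlo estimate of P(theta_WT < theta_ASH < theta_AVA)
p_ordered = np.mean((theta_wt < theta_ash) & (theta_ash < theta_ava))
print(p_ordered)
```

Because the three posteriors barely overlap, the estimate comes out essentially equal to 1.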

In [1]:
%load_ext watermark

In [2]:
%watermark -v -p numpy,scipy,bokeh,jupyterlab

CPython 3.7.0
IPython 7.0.1

numpy 1.15.2
scipy 1.1.0
bokeh 0.13.0
jupyterlab 0.35.0
