# Week 8: Parameter Estimation

In [None]:
# Loading the libraries
import numpy as np
#import sympy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import scipy.stats as stats
import scipy.optimize as opt
from scipy.integrate import quad

## Day 3: Bayesian Parameter Estimation

We have already used the Maximum Likelihood method to estimate the value of a distribution parameter. To do this, we made an assumption about the shape of the distribution and obtained some data. One thing we did not do is make any assumptions about the parameters themselves. In other words, we were ignorant about the possible range of values that the parameter might *prefer* (except for explicit mathematical limitations).

Bayesian parameter estimation is, as the name suggests, a process that uses Bayes' theorem to estimate parameters. Unlike the maximum likelihood approach, however, we now have to make an assumption about the **distribution of the parameter** we are estimating before we even start the estimation. The goal is to combine this information with the data to obtain a new distribution for the values of the parameter *after* the data have been observed.

## How does it work?
Let $X$ be a random variable with parameter $\theta \in \Theta$ where $\Theta$ is the set of possible values for $\theta$.

First, we assume that $\theta$ has a certain **prior** distribution on $\Theta$. Let's denote it by $p_\Theta (\theta)$.

Next, we collect some data for $X$. Since $\theta$ is a parameter of the distribution of $X$, the **likelihood** of observing $X=x$ for a given value of $\theta$ is the conditional density (or mass function) $p_{X\mid\Theta} (x \mid \theta)$.

Now comes the key part: how do we combine the prior information about $\theta$ with the likelihood of the observed $x$ to gain a better understanding of the values of $\theta$? In simple words: how does the **evidence** (the observed data $x$) update our **assumption** about $\theta$ (encoded in the prior)?

In a nutshell, the answer is Bayes' theorem. Our updated belief about $\theta$ after the evidence has been collected is the density $p_{\Theta \mid X} (\theta \mid x)$, called the **posterior** distribution of the parameter $\theta$ given the observed value(s) $x$. Mathematically, the prior, the likelihood and the posterior are related by:
$$
p_{\Theta \mid X} \left( \theta \mid x \right) = \frac{p_{X \mid \Theta} \left( x \mid \theta \right) \cdot p_\Theta (\theta) }{p_X (x)}
$$
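The denominator $p_X (x)$ does not depend on $\theta$: it is the normalizing constant that makes the posterior integrate (or sum) to 1. In practice it may be computationally expensive to calculate, and it is given by: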
* If $\theta$ is a continuous random variable
$$
p_X(x) = \int_{\theta_\min}^{\theta_\max} {p_{X \mid \Theta} \left( x \mid \theta \right) \cdot p_\Theta (\theta)}\, d\theta
$$
* If $\theta$ is a discrete random variable
$$
p_X(x) = \sum_{\text{all } \theta} {p_{X \mid \Theta} \left( x \mid \theta \right) \cdot p_\Theta (\theta)}
$$
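To see the normalization at work in the continuous case, here is a minimal sketch with hypothetical data (7 successes in 10 Bernoulli trials, so a binomial likelihood) and a uniform prior on $\theta \in [0, 1]$. The evidence $p_X(x)$ is computed with SciPy's `quad`; the function names are illustrative only. For this particular prior the posterior is known to be a $\text{Beta}(k+1, n-k+1)$ density, which gives us something to check the numerical result against.

```python
import scipy.stats as stats
from scipy.integrate import quad

n, k = 10, 7  # hypothetical data: 7 successes in 10 trials

def prior(theta):
    # Uniform prior density on [0, 1]
    return stats.uniform.pdf(theta, loc=0, scale=1)

def likelihood(theta):
    # Binomial likelihood P(X = k | theta)
    return stats.binom.pmf(k, n, theta)

# Evidence p_X(x): integral of likelihood * prior over all values of theta
evidence, _ = quad(lambda th: likelihood(th) * prior(th), 0, 1)

def posterior(theta):
    # Bayes' theorem: likelihood * prior / evidence
    return likelihood(theta) * prior(theta) / evidence

print(posterior(0.6), stats.beta.pdf(0.6, k + 1, n - k + 1))  # the two values agree
```

In the discrete case the integral is simply replaced by a sum over the possible values of $\theta$.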

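Once we have constructed the posterior distribution, we can also compute a point estimate of the parameter. Strictly speaking, the **Maximum A Posteriori** (**MAP**) estimate is the value of $\theta$ that maximizes the posterior, i.e. its mode. In this notebook we follow the convention of using the **mean** of the posterior as the point estimate and calling it the MAP; the two coincide, for example, for symmetric unimodal posteriors, but differ in general. We can also construct an interval estimate (in Bayesian terms, a *credible* interval); in the examples below we do this by drawing a sample from the posterior and bootstrapping it.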
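The difference between the posterior mean and the posterior mode is easy to see on a concrete, hypothetical example: a $\text{Beta}(8, 4)$ posterior (which is what a uniform prior and 7 successes in 10 binomial trials would produce). The sketch below computes both, and, as an alternative to bootstrapping, an equal-tailed 95% credible interval read directly from the posterior quantiles with SciPy's `ppf`.

```python
import scipy.stats as stats
import scipy.optimize as opt

# Hypothetical posterior: Beta(8, 4)
a, b = 8, 4
post = stats.beta(a, b)

post_mean = post.mean()  # equals a / (a + b)
# Posterior mode (the MAP in the strict sense), found numerically;
# analytically it is (a - 1) / (a + b - 2)
post_mode = opt.minimize_scalar(lambda th: -post.pdf(th), bounds=(0, 1),
                                method='bounded').x
# Equal-tailed 95% credible interval from the posterior quantiles
lower, upper = post.ppf([0.025, 0.975])

print(post_mean, post_mode, (lower, upper))
```

Here the mean is $8/12 \approx 0.667$ while the mode is $7/10 = 0.7$, so the choice of point estimate matters for skewed posteriors.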

## Example 1
It is believed that cross-fertilized plants produce taller offspring than self-fertilized plants. To estimate the proportion $\theta$ of pairs in which the cross-fertilized plant is taller, a researcher observes a random sample of $n=15$ pairs of plants of exactly the same age. Each pair consists of one cross-fertilized and one self-fertilized plant grown under the same conditions.

Based on previous experience, the experimenter believes that the only possible values of $\theta$ are those listed below, with the prior probabilities $p_\Theta (\theta)$ given in the table:

| $\theta$            | 0.80 | 0.82 | 0.84 | 0.86 | 0.88 | 0.90 |
|--------------------:|------|------|------|------|------|------|
| $p_\Theta (\theta)$ | 0.08 | 0.17 | 0.25 | 0.25 | 0.17 | 0.08 |

From the experiment, it is observed that in 13 of the 15 pairs, cross-fertilized is taller. Find the posterior distribution of $\theta$.
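With a discrete prior, the whole computation is a weighted sum. One possible sketch, assuming $X \sim B(15, \theta)$ and using `scipy.stats.binom.pmf` for the likelihood, is shown below; the column names mirror those of the exercise table, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

theta = np.array([0.80, 0.82, 0.84, 0.86, 0.88, 0.90])
prior = np.array([0.08, 0.17, 0.25, 0.25, 0.17, 0.08])
n, k = 15, 13

lik = stats.binom.pmf(k, n, theta)   # P(X = 13 | theta) for each candidate theta
joint = lik * prior                  # numerator of Bayes' theorem
post = joint / joint.sum()           # divide by p_X(13), the sum over all theta

table = pd.DataFrame({'theta': theta, 'prior': prior, 'lik': lik,
                      'lik*prior': joint, 'post': post})
prior_mean = np.sum(theta * prior)
post_mean = np.sum(theta * post)
print(table)
print(prior_mean, post_mean)
```

Because the observed proportion $13/15 \approx 0.867$ lies above the prior mean of 0.85, the posterior shifts probability toward larger values of $\theta$.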

In [None]:
# Given stuff:
t = np.array([0.80, 0.82, 0.84, 0.86, 0.88, 0.90])
p_t = np.array([0.08, 0.17, 0.25, 0.25, 0.17, 0.08])
n = 15
k = 13

# X = number of pairs (out of 15) in which the cross-fertilized plant is taller
# X ~ B(n, theta)

# Build the dataframe
df = pd.DataFrame(columns=['theta', 'prior', 'lik', 'lik*prior', 'post'])
df['theta'] = t
df['prior'] = p_t
df['lik'] = ...        # likelihood P(X = k | theta)
df['lik*prior'] = ...  # numerator of Bayes' theorem
df['post'] = ...       # normalized posterior

display(df)

# Visualize the prior and the posterior
ticks = [str(t) for t in df['theta']]
plt.figure()
plt.bar(..., label='Prior', alpha=0.5)
plt.bar(..., label='Posterior', alpha=0.5)
plt.legend()
plt.show()

In [None]:
#calculate the prior and the posterior mean
prior_mean = ...
post_mean = ...

print('Prior mean = ', prior_mean)
print('Posterior mean = ', post_mean)

## Example 2
A certain random process produces outcomes according to a Poisson distribution. If $X$ is the random variable that models the outcomes, then $X \sim Po(\lambda)$. The value of the parameter $\lambda$ is thought to be uniformly distributed in the interval $(0, 5]$.
* Upon observation, we obtain an outcome $X = 4$. Construct the posterior distribution of $\lambda$.
* A second observation is then made, $X = 2$. Find the posterior after the two observations.
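A minimal sketch of one way to set up this posterior numerically, assuming SciPy's `quad` for the normalizing integral; the helpers `unnormalized` and `make_posterior` are hypothetical names, not part of the notebook's template. It also illustrates a useful fact: updating sequentially (using the posterior after $X = 4$ as the prior for $X = 2$) gives the same result as a single update with both observations.

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad

LAM_MAX = 5.0  # upper end of the support of the U(0, 5] prior

def uniform_prior(lmbd):
    # density of U(0, 5]: 1/5 inside the interval, 0 outside
    return stats.uniform.pdf(lmbd, loc=0, scale=LAM_MAX)

def unnormalized(lmbd, data, prior_fn):
    # independent Poisson observations: likelihood = product of the pmfs
    return np.prod(stats.poisson.pmf(data, lmbd)) * prior_fn(lmbd)

def make_posterior(data, prior_fn=uniform_prior):
    # p_X(x): integrate likelihood * prior over the prior's support
    evidence, _ = quad(unnormalized, 0, LAM_MAX, args=(data, prior_fn))
    return lambda lmbd: unnormalized(lmbd, data, prior_fn) / evidence

post_1 = make_posterior([4])                       # after observing X = 4
post_2 = make_posterior([4, 2])                    # after X = 4 and X = 2 at once
post_2_seq = make_posterior([2], prior_fn=post_1)  # X = 2, with post_1 as the prior

print(post_2(3.0), post_2_seq(3.0))  # the two posteriors agree
```

With a flat prior the posterior is proportional to the likelihood, so after both observations it is proportional to $\lambda^6 e^{-2\lambda}$ on $(0, 5]$, which peaks at $\lambda = 3$.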

In [None]:
# Prior: lambda ~ U(0, 5]; likelihood: X ~ Po(lambda)
def prior(x):
    return ...

# posterior = likelihood * prior / (normalizing integral of likelihood * prior)
def posterior(lmbd, data):
    numerator = lambda x, data: ...
    denominator = ...
    return ...

In [None]:
# Visualize the prior and the posterior to see how things have changed
xs = np.linspace(0, 5, 100)
ys = ... # prior
zs = ... # posterior

plt.figure()
plt.title('Posterior after the first observation')
plt.plot(xs, ys, label='prior')
plt.plot(xs, zs, label='posterior')
plt.legend()
plt.show()

In [None]:
# Two observations
obs = [4, 2]

# The posterior now is: posterior(x, obs)
ws = ...

plt.figure()
plt.title('Posterior after the second observation')
plt.plot(xs, ys, label='prior')
plt.plot(xs, zs, label='posterior')
plt.plot(xs, ws, label='second posterior')
plt.legend()
plt.show()

## Example 2a
For the second posterior you constructed in **Example 2**, find the maximum a posteriori estimate (**MAP**) $\hat\lambda$ of the parameter $\lambda$, and using bootstrap construct a 95% confidence interval for it.

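**Note:** following the convention of this notebook, take the MAP to be the mean (expected value) of the posterior distribution of $\lambda$. Strictly speaking, the maximum a posteriori value is the *mode* of the posterior, which in general differs from its mean.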
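The cells below sample from the posterior by accept-reject (random points in the box $[0, 5] \times [0, 0.5]$) and then bootstrap the sample. Here is one possible sketch of the whole pipeline, assuming the two-observation posterior with the $U(0, 5]$ prior. The helper `rejection_sample` is hypothetical, and it uses NumPy's `default_rng` instead of the notebook's `np.random.seed`, so the numbers will not match the cells exactly. The interval is an *empirical bootstrap*: the spread of $\bar{x}^* - \bar{x}$ across resamples is used to build $(\bar{x} - \delta_{0.975},\ \bar{x} - \delta_{0.025})$.

```python
import numpy as np
import scipy.stats as stats
import scipy.optimize as opt
from scipy.integrate import quad

obs = [4, 2]

def unnorm(lmbd):
    # Poisson likelihood of both observations times the U(0, 5] prior density (1/5)
    return np.prod(stats.poisson.pmf(obs, lmbd)) * 0.2

Z, _ = quad(unnorm, 0, 5)

def posterior(lmbd):
    return unnorm(lmbd) / Z

# Posterior mean: the point estimate this notebook calls "MAP"
post_mean, _ = quad(lambda lam: lam * posterior(lam), 0, 5)
# Posterior mode: the maximizer of the posterior (the MAP in the strict sense)
post_mode = opt.minimize_scalar(lambda lam: -posterior(lam), bounds=(0.01, 5),
                                method='bounded').x

def rejection_sample(pdf, lo, hi, bound, size, rng):
    """Accept-reject sampling; assumes pdf(x) <= bound for all x in [lo, hi]."""
    out = []
    while len(out) < size:
        x = rng.uniform(lo, hi)    # random point (x, y) in the box
        y = rng.uniform(0, bound)  # [lo, hi] x [0, bound]
        if y < pdf(x):             # keep x if the point falls under the curve
            out.append(x)
    return np.array(out)

rng = np.random.default_rng(123)
sample = rejection_sample(posterior, 0, 5, 0.5, 100, rng)

# Empirical bootstrap: distribution of (resample mean - sample mean)
m = 1000
x_bar = sample.mean()
deltas = np.array([rng.choice(sample, size=sample.size, replace=True).mean() - x_bar
                   for _ in range(m)])
d_lo, d_hi = np.percentile(deltas, [2.5, 97.5])
ci = (x_bar - d_hi, x_bar - d_lo)  # 95% interval around the sample mean

print(post_mean, post_mode, ci)
```

The box height 0.5 only works because the posterior never exceeds it on $(0, 5]$; for a different posterior the box must be chosen to enclose the whole curve.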

In [None]:
# Getting the mean of the posterior
def MAP(x, data):
    return ...

lmbd_hat = ...

print('MAP = ', lmbd_hat)

In [None]:
# Constructing the interval
# Choose a random sample from the posterior
n = 100 # the sample size
sample = np.zeros(n) #the empty sample
i = 0 # the counter

np.random.seed(123)
while i < n:
    # generate a random point (x, y) uniformly in [0, 5] x [0, 0.5];
    # if y < posterior(x, obs), accept it: store x in sample[i] and increase i
    

In [None]:
# check visually
plt.figure()
plt.hist(sample, density=True, edgecolor = 'k', label='the sample')
plt.plot(xs, ws, label='the posterior', c='r')
plt.legend()
plt.show()

In [None]:
# constructing the interval by bootstrapping
m = 1000 # of bootstraps
deltas = np.zeros(m)
sample_mean = np.mean(sample)

np.random.seed(999)
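for i in range(m):
    # resample `sample` with replacement and store (resample mean - sample_mean) in deltas[i]
    ...

# constructing the interval from the percentiles of deltas
l, u = ..., ...

print(f'The 95% "confidence" interval is ({l}, {u})')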

## Example 3:
In a study done at the *National Institute of Science and Technology* in 1980, asbestos fibers on filters were counted as part of a project to develop measurement standards for asbestos concentration. Asbestos dissolved in water was spread on a filter, and 3-mm diameter punches were taken from the filter and mounted on a transmission electron microscope. An operator counted the number of fibers in each of 23 grid squares; the results are given in the cell below.

Let $X$ be the random variable that counts how many fibers there are in one grid square. Then $X \sim Po(\lambda)$ where $\lambda$ represents the mean number of fibers per square. Assuming that the prior distribution of $\lambda$ is a **Gamma distribution** with parameters $\alpha = 15$ and $\theta = 1.2$, construct the posterior distribution for $\lambda$ given the observed experimental data. Find the MAP estimate $\hat\lambda$ and the 90% bootstrapped confidence interval for $\lambda$.
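A Gamma prior combined with a Poisson likelihood is a conjugate pair, which gives a closed form to check a numerical posterior against. The sketch below **assumes $\theta = 1.2$ is the scale** of the Gamma prior (mean $\alpha\theta = 18$); under that assumption the posterior is a Gamma distribution with shape $\alpha + \sum_i x_i$ and scale $\theta / (n\theta + 1)$. If $\theta$ were meant as a rate, both the prior and the formula would change. The grid posterior is computed in log space because the product of 23 Poisson probabilities is very small; the grid limits are an arbitrary choice.

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import trapezoid

asbestos = np.array([31, 29, 19, 18, 31, 28,
                     34, 27, 34, 30, 16, 18,
                     26, 27, 27, 18, 24, 22,
                     28, 24, 21, 17, 24])
alpha, theta = 15.0, 1.2  # ASSUMPTION: theta is the *scale* of the Gamma prior

# Numerical posterior on a grid: sum of Poisson log-pmfs + log prior
lam = np.linspace(0.01, 60, 20001)
log_unnorm = (stats.poisson.logpmf(asbestos[:, None], lam).sum(axis=0)
              + stats.gamma.logpdf(lam, a=alpha, scale=theta))
unnorm = np.exp(log_unnorm - log_unnorm.max())  # subtract the max to avoid underflow
post_grid = unnorm / trapezoid(unnorm, lam)     # normalize to integrate to 1

# Closed form from Gamma-Poisson conjugacy
n = asbestos.size
post_exact = stats.gamma(a=alpha + asbestos.sum(), scale=theta / (n * theta + 1))
post_mean = post_exact.mean()

print(post_mean, trapezoid(lam * post_grid, lam))  # the two means agree
```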

In [None]:
# The asbestos counts
asbestos = np.array([31, 29, 19, 18, 31, 28,
                    34, 27, 34, 30, 16, 18,
                    26, 27, 27, 18, 24, 22,
                    28, 24, 21, 17, 24])

#constructing the prior
alpha = 15.0
theta = 1.2

def prior(x, alpha, theta):
    return ...  # Gamma(alpha, theta) prior density at x

In [None]:
# visualize the prior, just for fun
xs = np.linspace(0, 40, 500)
ys = ...  # prior evaluated at xs

plt.figure()
plt.plot(xs, ys, label='prior')
plt.legend()
plt.show()

In [None]:
# Constructing the posterior
def posterior(lmbd, data):
    # Poisson likelihood of all counts * Gamma prior, divided by the normalizing integral
    ...


In [None]:
#plot both prior and posterior
zs = np.array([posterior(x, asbestos) for x in xs])

plt.figure()
plt.plot(xs, ys, label='prior')
plt.plot(xs, zs, label='posterior')
plt.legend()
plt.show()

In [None]:
# Getting the mean of the posterior
def MAP(x, data):
    return ...

map_est = ...

print('MAP = ', map_est)

In [None]:
# Choose a random sample from the posterior
n = 100 # the sample size
sample = np.zeros(n) #the empty sample
i = 0 # the counter

np.random.seed(123)
while i < n:
    # accept-reject sampling as in Example 2a, using a box that encloses the posterior curve


In [None]:
# check visually
plt.figure()
plt.hist(sample, density=True, edgecolor = 'k', label='the sample')
plt.plot(xs, zs, label='the posterior', c='r')
plt.legend()
plt.xlim(20, 30)
plt.show()

In [None]:
# constructing the interval
m = 1000 # of bootstraps
deltas = np.zeros(m)
sample_mean = np.mean(sample)

np.random.seed(333)
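for i in range(m):
    # same empirical bootstrap as in Example 2a
    ...

l, u = ..., ...  # 90% interval from the percentiles of deltas
print(f'The 90% "confidence" interval is ({l}, {u})')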