# <font color="darkblue"> Binomial Beta Model

A Bernoulli trial is a random experiment with exactly two possible outcomes: typically "success" (with probability $\theta$) and "failure" 
(with probability $1-\theta$). Each trial is independent of others.

**Examples**

1. Quality Control in Manufacturing: Each light bulb tested is either **defective (failure) or non-defective (success)**.

1. Customer Purchase Decision: A customer either makes a **purchase (success) or does not (failure)** during a store visit.

1. Medical Test Result: A medical test result is either **positive (success) or negative (failure)**, indicating the presence of a condition.

---

Consider $n$ independent and identically distributed (**i.i.d.**) Bernoulli random variables $Y_1, Y_2, \dots, Y_n$:

$$
Y_i \sim \text{Bernoulli}(\theta), \quad \theta \in (0,1), \quad Y_i \in \{0,1\}
$$

Hence the range of each $Y_i$ is $\mathscr{A}_Y = \{0,1\}$ and that of the parameter $\theta$ is $(0,1)$, or equivalently $0 < \theta < 1$.

Each $Y_i$ takes values in $\{0,1\}$ with probability mass function:

$$Pr(Y_i = y_i \mid \theta) = \theta^{y_i} (1 - \theta)^{1 - y_i}$$
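The Bernoulli PMF can be evaluated directly. The minimal sketch below also checks that the joint probability of an i.i.d. sample depends on the data only through the number of successes. The function name `bernoulli_pmf`, the value `theta = 0.3`, and the sample are illustrative assumptions, not part of the notes.

```python
def bernoulli_pmf(y, theta):
    """Pr(Y = y | theta) for y in {0, 1}."""
    return theta**y * (1 - theta)**(1 - y)

theta = 0.3               # assumed success probability
sample = [1, 0, 0, 1, 0]  # hypothetical observed outcomes

# Joint probability of the sample: product of the individual PMFs
joint = 1.0
for y in sample:
    joint *= bernoulli_pmf(y, theta)

# The same quantity written in terms of the number of successes s
s = sum(sample)
closed_form = theta**s * (1 - theta)**(len(sample) - s)

print(joint, closed_form)
```

The two printed values should agree up to floating-point rounding.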

When you perform $n$ independent Bernoulli trials, the **sum of successes** across these trials follows a Binomial distribution. Specifically, if $X = \sum_{i=1}^{n} Y_i$ represents the number of successes, then $X \mid \theta \sim \text{Binomial}(n, \theta)$.

The Probability Mass Function (PMF) of $X \mid \theta$ is

$$
Pr[X = x \mid \theta] = \binom{n}{x} \theta^x (1-\theta)^{n-x}
$$

where $\theta \in (0,1)$ and $x \in \{0, 1, 2, \dots, n\}$, that is, $\mathscr{A}_X = \{0, 1, 2, \dots, n\}$.
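As a quick check of the Binomial PMF, the following minimal sketch uses only the Python standard library. It verifies that the PMF sums to one and compares it with simulated sums of Bernoulli trials. The function name `binomial_pmf` and the values `n = 10`, `theta = 0.3` are illustrative assumptions.

```python
import math
import random

def binomial_pmf(x, n, theta):
    """Pr[X = x | theta] for X ~ Binomial(n, theta)."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

# Illustrative values (assumed for this example)
n, theta = 10, 0.3

# The PMF should sum to 1 over x = 0, 1, ..., n
pmf = [binomial_pmf(x, n, theta) for x in range(n + 1)]
total = sum(pmf)

# Simulate X as a sum of n Bernoulli(theta) trials and record frequencies
random.seed(0)
reps = 20000
counts = [0] * (n + 1)
for _ in range(reps):
    x = sum(1 for _ in range(n) if random.random() < theta)
    counts[x] += 1
empirical = [c / reps for c in counts]

print("sum of PMF:", total)
for x in range(n + 1):
    print(f"x={x:2d}  pmf={pmf[x]:.4f}  simulated={empirical[x]:.4f}")
```

The simulated frequencies should be close to the PMF values, with the gap shrinking as `reps` grows.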


The aim of statistical inference is to make conclusions about a population based on a sample of data. Given a sample, it aims to estimate unknown parameters (such as the population mean or proportion) and assess the uncertainty around these estimates. Statistical inference uses methods like point estimation, confidence intervals, and hypothesis testing to draw valid conclusions and make decisions based on the data.

---

In the context of a binomial proportion, Bayesian statistical inference aims to estimate the proportion of successes in a population, given a sample. Using Bayes' theorem, prior beliefs about the proportion are updated with observed data to produce a posterior distribution for the proportion, capturing uncertainty and incorporating prior knowledge into the estimate.

The **Beta distribution** is a natural choice for modeling the **binomial proportion** in a Bayesian context because it is a **conjugate prior** for the binomial likelihood. This means that when you update a Beta prior with binomial data, the resulting posterior distribution is also a Beta distribution.

Additionally, the Beta distribution is flexible and defined on the interval [0, 1], which matches the possible values of a proportion. Its shape can be adjusted through its two parameters ($a$ and $b$ in the notation used below) to reflect different prior beliefs about the proportion.

---

## <font color="darkgreen"> **Likelihood Function**

Given $x$ successes in $n$ observations, the likelihood function of $\theta$ is

$$
\mathscr{L}(\theta)=\binom{n}{x} \theta^x (1-\theta)^{n-x}
$$
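To see how the likelihood behaves as a function of $\theta$ for fixed data, the sketch below evaluates it on a grid and locates the grid point where it is largest. The data (7 successes in 20 trials) are hypothetical, and the comparison with the sample proportion $x/n$ relies on the standard result that $x/n$ maximizes the binomial likelihood, which is not derived in these notes.

```python
import math

def likelihood(theta, x, n):
    """Binomial likelihood of theta given x successes in n trials."""
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

# Hypothetical data: 7 successes in 20 trials
n, x = 20, 7

# Evaluate the likelihood on a grid strictly inside (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
values = [likelihood(t, x, n) for t in grid]

# Grid point with the largest likelihood, compared with x / n
theta_hat = grid[values.index(max(values))]
print(theta_hat, x / n)
```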

## <font color="darkblue"> **Beta Prior for $\theta$**

Assume a **Beta prior** distribution for the parameter $\theta$:

$$
\theta \sim \text{Beta}(a, b)
$$

The Probability Density Function (PDF) of the Beta distribution is:

$$
Pr(\theta) = \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{B(a, b)}
$$

where $B(a, b)$ is the **Beta function**:

$$
B(a, b) = \int_0^1 \theta^{a - 1} (1 - \theta)^{b - 1} \, d\theta
$$
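The Beta function and the prior density can be computed with the standard library alone. This sketch assumes the well-known identity $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, which the notes do not state, and compares it with a numerical approximation of the integral definition above. The helper names and the parameters `a = 2`, `b = 5` are illustrative.

```python
import math

def beta_function(a, b):
    """B(a, b) via the Gamma-function identity, using log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1)."""
    return theta**(a - 1) * (1 - theta)**(b - 1) / beta_function(a, b)

def midpoint_integral(f, m=100_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / m
    return h * sum(f((i + 0.5) * h) for i in range(m))

a, b = 2.0, 5.0  # illustrative prior parameters (assumed)

# B(a, b) from the integral definition vs. the Gamma identity
numeric_B = midpoint_integral(lambda t: t**(a - 1) * (1 - t)**(b - 1))
print(numeric_B, beta_function(a, b))

# The density should integrate to 1
total_mass = midpoint_integral(lambda t: beta_pdf(t, a, b))
print(total_mass)
```

The midpoint rule is used because it never evaluates the integrand at the endpoints 0 and 1, where the density can be unbounded when $a < 1$ or $b < 1$.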

---


## <font color="darkblue"> **Posterior Distribution via Bayes' Theorem**

Applying Bayes' theorem:

$$
Pr(\theta \mid \mathbf{X}) = \frac{\mathscr{L}(\theta) \, Pr(\theta)}{m(\mathbf{X})}
$$

where $m(\mathbf{X})$ is the marginal likelihood:

$$
m(\mathbf{X}) = \int_0^1 \mathscr{L}(\theta) \, Pr(\theta) \, d\theta
$$

The **numerator** is:

$$
\mathscr{L}(\theta) \, Pr(\theta) = \left[ \binom{n}{x} \theta^x (1 - \theta)^{n - x} \right] \left[ \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{B(a, b)} \right]
$$

Simplifying:

$$
\mathscr{L}(\theta) \, Pr(\theta) = \binom{n}{x} \frac{\theta^{x + a - 1} (1 - \theta)^{n - x + b - 1}}{B(a, b)}
$$

The **denominator** is the integral of this expression over $\theta$:

$$
m(\mathbf{X}) = \binom{n}{x} \frac{1}{B(a, b)} \int_0^1 \theta^{x + a - 1} (1 - \theta)^{n - x + b - 1} \, d\theta
$$

The remaining integral can be recognized as the Beta function $B(x + a, n - x + b)$, so

$$
m(\mathbf{X}) = \binom{n}{x} \frac{B(x + a, n - x + b)}{B(a, b)}
$$

In the ratio of numerator to denominator, the factors $\binom{n}{x}$ and $B(a, b)$ cancel.
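The marginal likelihood can be checked numerically. The sketch below keeps the factor $\binom{n}{x}$ carried by the likelihood, so that $m(x)$ is a proper probability mass function over $x = 0, 1, \dots, n$ and should sum to one. The helper names and the values `n = 12`, `a = 2`, `b = 3` are illustrative assumptions.

```python
import math

def log_beta(a, b):
    """log B(a, b) computed from log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal_likelihood(x, n, a, b):
    """m(x) = C(n, x) * B(x + a, n - x + b) / B(a, b)."""
    return math.comb(n, x) * math.exp(log_beta(x + a, n - x + b) - log_beta(a, b))

n, a, b = 12, 2.0, 3.0  # assumed sample size and prior parameters

# Summing m(x) over all possible counts should give 1
marginals = [marginal_likelihood(x, n, a, b) for x in range(n + 1)]
print(sum(marginals))

# With a uniform prior Beta(1, 1), every count is equally likely a priori
uniform = [marginal_likelihood(x, n, 1.0, 1.0) for x in range(n + 1)]
print(uniform[0], 1 / (n + 1))
```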

Thus, the **posterior distribution** is:

$$
Pr(\theta \mid \mathbf{X}) = \frac{\mathscr{L}(\theta) \, Pr(\theta)}{m(\mathbf{X})} = \frac{\theta^{x + a - 1} (1 - \theta)^{n - x + b - 1}}{B(x + a, n - x + b)}
$$

This is the PDF of a **Beta distribution**:

$$
\theta \mid \mathbf{X} \sim \text{Beta}(x + a, n - x + b)
$$
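The conjugate update amounts to adding counts to the prior parameters. The following sketch applies the update and then compares the closed-form posterior density with a brute-force normalization of likelihood times prior on a grid. The function names, the data (7 successes in 20 trials), and the prior Beta(2, 2) are hypothetical choices made for illustration.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1), computed on the log scale."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_B)

def posterior_params(x, n, a, b):
    """Conjugate update: Beta(a, b) prior with x successes in n trials."""
    return x + a, n - x + b

# Hypothetical data and prior
n, x = 20, 7
a, b = 2.0, 2.0
a_post, b_post = posterior_params(x, n, a, b)

# Brute-force check: normalize likelihood * prior with the midpoint rule
m = 20_000
h = 1.0 / m
grid = [(i + 0.5) * h for i in range(m)]
unnorm = [math.comb(n, x) * t**x * (1 - t)**(n - x) * beta_pdf(t, a, b) for t in grid]
evidence = h * sum(unnorm)
numeric_post = [u / evidence for u in unnorm]

# Largest pointwise gap between the numerical and closed-form posterior
max_gap = max(abs(p - beta_pdf(t, a_post, b_post)) for p, t in zip(numeric_post, grid))
print(a_post, b_post, max_gap)
```

The gap should be negligible, which is the practical benefit of conjugacy: no numerical integration is needed to obtain the posterior.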

### <font color="darkblue"> **Final Notes**

- The **posterior distribution** is a Beta distribution.
  
- The **updated parameters** are:
  - **Posterior first shape parameter**: $a' = x + a$
  - **Posterior second shape parameter**: $b' = n - x + b$
    
- This demonstrates that the **Beta distribution is a conjugate prior** for the Binomial likelihood.

---

## <font color="darkred">**Posterior Mean and Variance**

The mean and variance of the Beta posterior are:

$$
E[\theta \mid \mathbf{X}] = \frac{x + a}{n + a + b}
$$

$$
\text{Var}[\theta \mid \mathbf{X}] = \frac{(x + a)(n - x + b)}{(n + a + b)^2 (n + a + b + 1)}
$$
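These two formulas translate directly into code. The sketch below evaluates them for hypothetical data (7 successes in 20 trials) under an assumed Beta(2, 2) prior. It also checks a standard identity not stated in the notes: the posterior mean is a weighted average of the sample proportion $x/n$ and the prior mean $a/(a+b)$, with weight $n/(n+a+b)$ on the data.

```python
def posterior_mean_var(x, n, a, b):
    """Mean and variance of the Beta(x + a, n - x + b) posterior."""
    a_post, b_post = x + a, n - x + b
    total = a_post + b_post  # equals n + a + b
    mean = a_post / total
    var = a_post * b_post / (total**2 * (total + 1))
    return mean, var

# Hypothetical data and prior
n, x = 20, 7
a, b = 2.0, 2.0

mean, var = posterior_mean_var(x, n, a, b)
print(mean, var)

# Posterior mean as a compromise between the data and the prior
w = n / (n + a + b)
weighted = w * (x / n) + (1 - w) * (a / (a + b))
print(weighted)
```

As $n$ grows, the weight `w` approaches one, so the posterior mean moves toward the sample proportion and the posterior variance shrinks.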
