# <font color="darkblue"> Bayesian Models_Closed Form

## <font color="darkgreen">Introduction

Let us begin with a data model represented by a probability distribution $f(\mathbf{X}|\theta)$. Viewed as a function of $\theta$ for the observed data, it gives the likelihood $\mathscr{L}(\theta|\mathbf{X})=\prod_{i=1}^n f(x_i|\theta)$. The form of the likelihood is known, but our interest is in the unknown $\theta$. The main feature of the Bayesian procedure is that $\theta$ is treated as a random variable in the model, and the main objective of building the model is to infer about this unknown parameter.

The Bayesian approach assumes a distribution for the parameters, so it works with two distributions for $\theta$: the prior (before seeing data) and the posterior (after seeing data). Bayes' Theorem provides a simple way to move from prior to posterior; nevertheless, Bayesian methods and Bayes' Theorem are not the same thing. A nice treatment of this distinction can be found in many texts, such as **Statistical Rethinking: A Bayesian Course with Examples in R and Stan** by Richard McElreath.

These notes explain three basic Bayesian models that follow the structure of Bayes' Theorem. It may help to recall a few terms first.

### <font color="darkred"> Keywords

1. Abstraction of a situation - Random variable
1. Distributions for random variables
1. Parameters
1. Summaries from a distribution
1. Probability computations about a random variable
1. Likelihood function

### <font color="darkred"> Conventional Symbols

$\theta:$ Parameter

$f(\mathbf{X}|\theta):$  Probability density/mass function

$\mathscr{L}(\theta|\mathbf{X}):$  Likelihood function

$l(\theta|\mathbf{X}) = \ln[\mathscr{L}(\theta|\mathbf{X})]:$ Log Likelihood function (useful in a technical context; for quick reading, please ignore)

### <font color="darkred"> Three Basic Models

In most data modeling, three types of data are predominant: qualitative, quantitative, and count. The first is usually represented by a categorical variable, the second by a metric/numeric variable, and the third by a non-negative integer variable. Here, we consider fundamental distributions for these three cases.

**Binary categorical variable:** The parameter is the proportion of successes, and the data are the number of times the success event happened out of a fixed number of trials

**Metric (Numeric) variable:** A measure that is quantitative in nature and may be summarized by relevant quantities such as the mean, percentiles, etc. These summary measures may be considered as parameters of interest

**Count variable:** A variable that involves counting a quantity; the event may occur very sparsely, or it may not be possible to predict the interval between two consecutive occurrences of the event

When a data set from a study involves unknown parameters, the whole idea of building a model is to learn about the parameter through its posterior distribution. When the mathematical form of the posterior distribution is known, summaries and probabilistic statements about the parameter follow directly. One advantage of knowing the mathematical form is that closed-form formulas can be used to find the mean, variance, and so on.

### <font color="darkred"> Binomial Distribution

To begin with, we consider one-dimensional binary categorical data with two levels. Here, the quantity of interest could be the proportion of these two levels in the population. If we have a sample of fixed size $n$, then $\theta$ could be the proportion of the success event (related to one level) and $1-\theta$ the proportion of failure (the other level of the binary variable).

Such problems are modelled as Bernoulli trials; that is, if $\mathbf{X}$ is the number of successes out of $n$ Bernoulli trials, then we have

**Data model:**

$$\mathbf{X} \sim \text{Binomial}(n,\theta)$$ where the range of $\mathbf{X}$ is $\mathscr{A}_X=\{0,1,2,\cdots,n\}$ and the parameter space is $0<\theta<1$.

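Since $\theta$ is a continuous random variable that ranges between $0$ and $1$, we choose a Beta distribution as the prior for $\theta$; this choice is conjugate to the Binomial model, so the posterior is again a Beta distribution.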

**Prior distribution:**

$$\theta\,\sim \text{Beta}(a,b)$$ where $a,b>0$

**Posterior distribution:**

$$\theta|\mathbf{x} \,\sim \text{Beta}(a+x,n-x+b)$$ a Beta distribution with parameters $a+x$ and $n-x+b$
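
As a brief sketch of where this posterior comes from, one can multiply the likelihood by the prior density and keep only the factors that involve $\theta$ (the notation $\pi(\theta|x)$ for the posterior density is assumed here; the omitted constant is whatever normalizes the density):

$$\pi(\theta|x) \propto \underbrace{\theta^{x}(1-\theta)^{n-x}}_{\text{likelihood}} \times \underbrace{\theta^{a-1}(1-\theta)^{b-1}}_{\text{prior}} = \theta^{a+x-1}(1-\theta)^{n-x+b-1}$$

The right-hand side is the kernel of a Beta density with parameters $a+x$ and $n-x+b$.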

Once the posterior is known in a mathematically tractable form, we can find its summaries using the respective closed-form expressions.

Mean of the posterior distribution is given by

$$E[\theta|\mathbf{X}]=\frac{a+x}{n+a+b}$$

This mean can be decomposed further:

$$E[\theta|\mathbf{X}]=\frac{x+a}{n+a+b}$$

$$=\frac{x}{n+a+b}+\frac{a}{n+a+b}$$

$$=\frac{n}{n+a+b}\frac{x}{n}+\frac{a+b}{n+a+b}\frac{a}{a+b}$$

$$=[\frac{n}{n+a+b}]\frac{x}{n}+[1-\frac{n}{n+a+b}]\frac{a}{a+b}$$

$$=\alpha \frac{x}{n}+(1-\alpha)\frac{a}{a+b}$$  where $0 \le \alpha \le 1$

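Hence, the posterior mean is a **weighted Arithmetic Mean** of the sample proportion $\frac{x}{n}$ and the prior mean $\frac{a}{a+b}$, with weights $\alpha=\frac{n}{n+a+b}$ and $1-\alpha$. In the absence of data ($n = 0$), the weight $\alpha$ is zero and the posterior mean equals the prior mean; as $n$ grows, $\alpha$ approaches 1 and the sample proportion dominates.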

**Other summaries:**

Posterior Variance is 

$$V[\theta|\mathbf{X}]=\frac{(x+a)(n−x+b)}{(n+a+b)^2(n+a+b+1)}$$

Posterior Mode (MAP) is

$$\frac{x+a-1}{n+a+b-2}$$

This formula gives a mode inside $(0,1)$ only when both posterior parameters exceed 1, that is, $a+x>1$ and $n-x+b>1$; this constrains the choice of the prior parameters $a$ and $b$.
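
The closed-form summaries above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the data ($x=7$ successes in $n=20$ trials with a Beta(2, 2) prior) are hypothetical and chosen purely for illustration.

```python
def beta_binomial_posterior(x, n, a, b):
    """Return the Beta posterior parameters (a + x, n - x + b)."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    if a <= 0 or b <= 0:
        raise ValueError("prior parameters a and b must be positive")
    return a + x, n - x + b


def beta_summaries(a_post, b_post):
    """Mean, variance and interior mode (if any) of Beta(a_post, b_post)."""
    total = a_post + b_post
    mean = a_post / total
    variance = a_post * b_post / (total ** 2 * (total + 1))
    # The interior-mode formula is valid only when both parameters exceed 1.
    mode = (a_post - 1) / (total - 2) if a_post > 1 and b_post > 1 else None
    return mean, variance, mode


# Hypothetical data: 7 successes in 20 trials, Beta(2, 2) prior.
x, n, a, b = 7, 20, 2.0, 2.0
a_post, b_post = beta_binomial_posterior(x, n, a, b)
mean, variance, mode = beta_summaries(a_post, b_post)

# Weighted-mean form: weight on the sample proportion is n / (n + a + b).
weight = n / (n + a + b)
weighted = weight * (x / n) + (1 - weight) * (a / (a + b))

print(a_post, b_post)
print(mean, weighted)   # the two forms of the posterior mean agree
print(variance, mode)
```

Both expressions for the posterior mean give the same value, which is a quick way to confirm the weighted-mean decomposition.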

### <font color="darkred">Normal Distribution

Assume a measurable variable whose quantity of interest (parameter) is its mean. This may be modelled with a Gaussian (Normal) distribution.

**Data model:**

$$\mathbf{X} \sim \text{Normal}(\mu,\sigma^2)$$ where the range of $\mathbf{X}$ is $\mathscr{A}_X=(-\infty,\,\infty)$ and the parameter space is $-\infty<\mu<\infty$.

Assumption of this model: $\sigma^2$ is known 

The parameter $\mu$ is a continuous random variable that ranges between $-\infty$ and $\infty$. We choose a Normal distribution as the prior for $\mu$.

**Prior distribution:**

Normal distribution with parameters $\eta,\tau^2$

$$\mu\sim \,\text{Normal}(\eta,\tau^2)$$ where $-\infty <\eta < \infty$ and $\tau^2>0$

**Posterior distribution:**

$$\mu|\mathbf{X} \sim\,\text{Normal}(\frac{B}{A},\frac{1}{A})$$ where $A=\frac{n}{\sigma^2}+\frac{1}{\tau^2}; \,B=(\frac{n\bar{x}}{\sigma^2}+\frac{\eta}{\tau^2})$
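
A short sketch of why $A$ and $B$ take this form: multiplying the Normal likelihood by the Normal prior and collecting the terms in $\mu$ (everything free of $\mu$ is absorbed into the proportionality constant; $\pi(\mu|\mathbf{X})$ denotes the posterior density) gives

$$\pi(\mu|\mathbf{X}) \propto \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2\right\}\exp\left\{-\frac{1}{2\tau^2}(\mu-\eta)^2\right\}$$

$$\propto \exp\left\{-\frac{1}{2}\left(A\mu^2-2B\mu\right)\right\} \propto \exp\left\{-\frac{A}{2}\left(\mu-\frac{B}{A}\right)^2\right\}$$

which is the kernel of a Normal density with mean $\frac{B}{A}$ and variance $\frac{1}{A}$.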

Posterior Mean:

$$E(\mu|\mathbf{X})=\frac{B}{A}=\frac{\frac{1}{\frac{\sigma^2}{n}}\bar{x}+\frac{1}{\tau^2}\eta}{\frac{1}{\frac{\sigma^2}{n}}+\frac{1}{\tau^2}}$$

$$=\frac{w_1}{w_1+w_2}\bar{x}+\frac{w_2}{w_1+w_2}\eta$$

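That is, the posterior mean is a weighted mean of the sample mean $\bar{x}$ and the prior mean $\eta$, with weights given by the data precision $(w_1=\frac{1}{\frac{\sigma^2}{n}}=\frac{n}{\sigma^2})$ and the prior precision $(w_2=\frac{1}{\tau^2})$.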

Posterior Variance:

$$V(\mu|\mathbf{X})=\frac{1}{A}=\frac{1}{\frac{n}{\sigma^2}+\frac{1}{\tau^2}}$$

Equivalently, posterior precision is the sum of prior precision and data precision

$\frac{1}{\frac{\sigma^2}{n}}+\frac{1}{\tau^2}$
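
The precision-weighted update can be sketched in a few lines of Python. The function name and the numbers below (sample mean 10 from 25 observations, known variance 4, prior Normal(8, 1)) are hypothetical and serve only as an illustration.

```python
def normal_posterior(xbar, n, sigma2, eta, tau2):
    """Posterior mean and variance of mu for a Normal model with known sigma2."""
    w1 = n / sigma2      # data precision, 1 / (sigma^2 / n)
    w2 = 1.0 / tau2      # prior precision
    A = w1 + w2          # posterior precision
    B = w1 * xbar + w2 * eta
    return B / A, 1.0 / A


# Hypothetical inputs for illustration only.
post_mean, post_var = normal_posterior(xbar=10.0, n=25, sigma2=4.0, eta=8.0, tau2=1.0)

print(post_mean)     # lies between the prior mean 8 and the sample mean 10
print(post_var)      # smaller than both sigma2 / n and tau2
print(1 / post_var)  # posterior precision = data precision + prior precision
```

Because the data precision (6.25) is much larger than the prior precision (1) in this example, the posterior mean sits close to the sample mean.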

### <font color="darkred">Poisson Distribution

Consider an iid sample $x_1,x_2,\cdots,x_n$ of a count variable $X$ that takes only non-negative integer values $\{0,1,2,3,\cdots\}$. The main interest is to infer the mean count: each observation is a realized count of the random variable $X$, and the quantity of interest (parameter) is the average count $\theta$. We model the data with the Poisson distribution, one of the standard distributions for count data.

**Data Model:**

$$\mathbf{X} \sim \text{Poisson}(\theta)$$ where the range of $\mathbf{X}$ is $\mathscr{A}_X=\{0,1,2,\cdots\}$ and the parameter space is $\theta>0$.

The parameter $\theta$ is a continuous random variable with range $(0,\infty)$. We choose a Gamma distribution as the prior for $\theta$.

**Prior distribution:**

$$\theta\sim\text{Gamma}(\alpha,\beta)$$ where $\alpha>0$ is the shape parameter, $\beta>0$ is the scale parameter 

**Posterior distribution:**

$$\theta|\mathbf{X}\sim\text{Gamma}(a, b)$$ where $a = \sum_{i=1}^n x_i+\alpha$ is the shape parameter and $b=\frac{1}{n+\frac{1}{\beta}}$ is the scale parameter 
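
As a sketch of the derivation, multiplying the Poisson likelihood by the Gamma prior (in the shape and scale parameterization used above) and dropping the factors that do not involve $\theta$ gives, with $\pi(\theta|\mathbf{X})$ denoting the posterior density,

$$\pi(\theta|\mathbf{X}) \propto \underbrace{\theta^{\sum_{i=1}^n x_i}e^{-n\theta}}_{\text{likelihood}} \times \underbrace{\theta^{\alpha-1}e^{-\theta/\beta}}_{\text{prior}} = \theta^{\sum_{i=1}^n x_i+\alpha-1}\,e^{-\theta\left(n+\frac{1}{\beta}\right)}$$

which is the kernel of a Gamma density with shape $\sum_{i=1}^n x_i+\alpha$ and scale $\frac{1}{n+\frac{1}{\beta}}$.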

Posterior mean:

$$E[\theta|\mathbf{X}]= \frac{\sum_{i=1}^n x_i+\alpha} {n+\frac{1}{\beta}}$$

$$= \frac{n}{n+\frac{1}{\beta}}\bar{x}+\frac{\frac{1}{\beta}}{n+\frac{1}{\beta}}\alpha\beta$$


Hence, the posterior mean is a weighted mean of the sample mean $\bar{x}$ (data) and the prior mean $\alpha\beta$, with weights proportional to the sample size $n$ and the prior rate $\frac{1}{\beta}$.
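
The Gamma update and its weighted-mean form can be sketched as follows. The function name, the counts and the prior values (shape 2, scale 0.5) are hypothetical; the shape and scale parameterization follows the text.

```python
def gamma_poisson_posterior(counts, alpha, beta):
    """Posterior (shape, scale) for a Poisson mean with a Gamma(shape, scale) prior."""
    n = len(counts)
    shape = sum(counts) + alpha
    scale = 1.0 / (n + 1.0 / beta)
    return shape, scale


# Hypothetical counts and prior, for illustration only.
counts = [3, 0, 2, 4, 1]
alpha, beta = 2.0, 0.5

shape, scale = gamma_poisson_posterior(counts, alpha, beta)
post_mean = shape * scale          # mean of a Gamma(shape, scale) distribution
post_var = shape * scale ** 2      # variance of a Gamma(shape, scale) distribution

# Weighted-mean form: weights proportional to n and to the prior rate 1 / beta.
n = len(counts)
xbar = sum(counts) / n
weight = n / (n + 1.0 / beta)
weighted = weight * xbar + (1 - weight) * (alpha * beta)

print(shape, scale)
print(post_mean, weighted)         # the two forms of the posterior mean agree
print(post_var)
```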

## <font color="darkgreen"> Probability Computations

If $X$ is a random variable, it is customary to calculate probabilities involving $X$ or functions of $X$, in addition to standard summaries such as the mean and variance. In the Bayesian context, since a parameter $\theta$ is treated as a random variable, it is possible to compute probabilities about $\theta$ using its prior or posterior distribution.

Probabilities involving a discrete or a continuous random variable (using the PMF or PDF, respectively) are computed differently:

1. In the case of a discrete random variable (DRV), the PMF itself gives the probability of the RV at a given value

2. If the RV is a continuous random variable (CRV), then the probability at an exact point is zero. Probability is calculated as the **Area Under the Probability curve between two values of the RV**

- Assume X is a 1-D CRV in the range $(L, H)$ with pdf $f(x|\theta)$; let $a$ and $b$ be two numbers in $(L, H)$ then we can compute 

  - $Pr[X>a]=\int_a^H f(x|\theta)dx$

  - $Pr[X<b]=\int_L^b f(x|\theta)dx$

  - $Pr[a<X<b]=\int_a^b f(x|\theta)dx$
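
The three integrals above can be illustrated numerically. The sketch below assumes, purely for illustration, that $X$ is standard Normal (so $(L, H)$ is the whole real line) and compares the CDF-based probability with a trapezoidal approximation of the area under the density; `area_under_pdf` is a hypothetical helper, while `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist


def area_under_pdf(pdf, lower, upper, steps=10_000):
    """Trapezoidal approximation of the integral of pdf over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width


# Illustrative choice: X ~ Normal(0, 1).
X = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 2.0

p_greater_a = 1.0 - X.cdf(a)                   # Pr[X > a]
p_less_b = X.cdf(b)                            # Pr[X < b]
p_between = X.cdf(b) - X.cdf(a)                # Pr[a < X < b]
p_between_area = area_under_pdf(X.pdf, a, b)   # same probability, as an area

print(p_greater_a, p_less_b)
print(p_between, p_between_area)
```

The last two printed values agree to several decimal places, which is the "area under the curve" interpretation in action.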

For example, one might be interested in the probability that $\theta$, a proportion parameter in a Binomial model, exceeds some desired value, say $\theta_0$


$$Pr[(\theta|X)>\theta_0]=\int_{\theta_0}^1 \pi(\theta|X) d\theta$$

Since the posterior $\pi(\theta|X)$ of $\theta$ is a Beta distribution, this probability is

$$\int_{\theta_0}^1 \frac{\theta^{a+x-1}(1-\theta)^{n-x+b-1}}{B(a+x,n-x+b)} ~d\theta$$ where $a,x,n,b$ are known and $B(\cdot,\cdot)$ is the Beta function.

This integral is the complement of the regularized **Incomplete Beta Function**, that is, $1-I_{\theta_0}(a+x,\,n-x+b)$. See https://mathworld.wolfram.com/IncompleteBetaFunction.html and https://en.wikipedia.org/wiki/Beta_function for details.
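
As a rough illustration, the tail probability can be approximated by numerical integration of the Beta posterior density. The sketch below uses only the standard library and composite Simpson's rule; the helper names, the data ($x=7$, $n=20$, Beta(2, 2) prior) and the threshold $\theta_0=0.5$ are hypothetical, and the density helper assumes both posterior parameters exceed 1.

```python
import math


def beta_pdf(theta, p, q):
    """Beta(p, q) density, evaluated on the log scale for numerical stability.

    Assumes p > 1 and q > 1, so the density is zero at both endpoints.
    """
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    log_norm = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)  # log B(p, q)
    log_kernel = (p - 1) * math.log(theta) + (q - 1) * math.log(1 - theta)
    return math.exp(log_kernel - log_norm)


def simpson(f, lower, upper, steps=2000):
    """Composite Simpson's rule on [lower, upper]; steps is forced to be even."""
    if steps % 2:
        steps += 1
    h = (upper - lower) / steps
    total = f(lower) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3


# Hypothetical data: x = 7 successes in n = 20 trials with a Beta(2, 2) prior.
x, n, a, b = 7, 20, 2.0, 2.0
p, q = a + x, n - x + b          # posterior is Beta(9, 15)
theta0 = 0.5

prob_exceeds = simpson(lambda t: beta_pdf(t, p, q), theta0, 1.0)
print(prob_exceeds)              # approximates Pr[theta > theta0 | x]
```

In practice a library routine would normally be preferred over hand-written quadrature; for example, `scipy.stats.beta.sf(theta0, p, q)` computes the same upper-tail probability.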

Similarly, the goal may be to find the **Probability** that $\theta$ lies between two chosen values $\theta_1$ and $\theta_2$ using the posterior distribution, that is, to find

$$Pr[\theta_1 < (\theta|X) < \theta_2]$$

In Bayesian theory, the parameter is treated as a random variable, allowing for **Interval Estimation** to be defined directly in terms of the parameter. The preceding computation provides a method for finding **Credible Intervals**.


That is, for a pre-specified (small) tail probability $\alpha$, we need to find two values $a$ and $b$ such that $Pr[a < \theta < b]=1-\alpha$; for example, $\alpha=0.05$ gives a 95% credible interval.

One way is to use the inverse CDF of the posterior to find $a$ and $b$ (the equal-tailed interval); that is,

1. Compute the lower-tail probability $\frac{\alpha}{2}$

1. Compute the cumulative probability $1-\frac{\alpha}{2}$

1. Use the inverse CDF with these two values to find $a$ and $b$, which solve

$$\int_{-\infty}^a\pi(\theta|\mathbf{X})\, d\theta=\frac{\alpha}{2}$$

$$\int_{-\infty}^{b}\pi(\theta|\mathbf{X})\, d\theta=1-\frac{\alpha}{2}$$

Here the parameter $\theta$ is assumed to be a 1-D continuous random variable; the limits of integration can be adjusted according to the parameter space $\mathscr{A}_{\theta}$.
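
A minimal sketch of the inverse-CDF approach for an equal-tailed interval is given below. It is written in terms of the desired coverage probability (for example 0.95), so that each tail carries half of the remaining probability. The Normal posterior used here (mean 9.72, variance 0.138) is a hypothetical example, and `equal_tailed_interval` is an illustrative helper; `NormalDist.inv_cdf` is the inverse CDF from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def equal_tailed_interval(posterior, coverage=0.95):
    """Equal-tailed credible interval from a posterior exposing inv_cdf."""
    tail = (1.0 - coverage) / 2.0          # probability left in each tail
    lower = posterior.inv_cdf(tail)        # quantile at tail
    upper = posterior.inv_cdf(1.0 - tail)  # quantile at 1 - tail
    return lower, upper


# Hypothetical Normal posterior for a mean parameter.
posterior = NormalDist(mu=9.72, sigma=math.sqrt(0.138))
lower, upper = equal_tailed_interval(posterior, coverage=0.95)

print(lower, upper)
print(posterior.cdf(upper) - posterior.cdf(lower))  # recovers the coverage
```

For posteriors without a built-in inverse CDF, the same two quantiles can be found by solving the CDF equations numerically.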

### <font color="darkviolet"> End Notes

It can be observed that computations are necessary in all probability calculations involving the posterior (and, in some cases, prior) distribution of the parameter. These calculations may arise in summaries such as the median or mode, among others. Most of them do not have closed-form solutions. This highlights the need to study suitable numerical methods as part of Bayesian modeling.
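
As one small illustration of such a numerical method, the posterior median of a Beta distribution (which has no simple closed form) can be approximated by combining numerical integration with bisection. This is only a sketch under stated assumptions: the helper names are hypothetical, the posterior Beta(9, 15) is an illustrative choice, and both parameters are assumed to exceed 1.

```python
import math


def beta_pdf(theta, p, q):
    """Beta(p, q) density; assumes p > 1 and q > 1 (zero at the endpoints)."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    log_norm = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_kernel = (p - 1) * math.log(theta) + (q - 1) * math.log(1 - theta)
    return math.exp(log_kernel - log_norm)


def beta_cdf(t, p, q, steps=2000):
    """Pr[theta <= t] by composite Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = beta_pdf(0.0, p, q) + beta_pdf(t, p, q)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, p, q)
    return total * h / 3


def beta_quantile(prob, p, q, tol=1e-8):
    """Solve beta_cdf(t) = prob for t by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, p, q) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative posterior: Beta(9, 15).
p, q = 9.0, 15.0
median = beta_quantile(0.5, p, q)
print(median)                   # posterior median
print(beta_cdf(median, p, q))   # close to 0.5
```

The same `beta_quantile` helper, called with other probabilities, yields the endpoints of an equal-tailed credible interval.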

### <font color="darkblue">Final Remarks

An attempt is made to exemplify Bayesian modelling through three basic models that have a closed-form posterior distribution when a conjugate prior is used. However, articulating a prior distribution is a separate learning curve that requires fluent modeling practice. Many examples are available in standard texts and in research papers. An enthusiastic Bayesian practitioner could explore these in their own Bayesian modeling. Nevertheless, the following keywords are of primary importance in Bayesian modeling:

1. Prior and its variants
1. Bayes formula
1. Posterior
1. Bayesian Estimator
1. Bayesian Estimate

Note also that it may not always be possible to obtain a tractable posterior. Even when one exists, its summaries may not be available in closed form. Hence, we need to seek suitable alternatives.