# Prior Probability


Bayesian statistical methods rely on good choices of priors. A prior probability distribution lets the scientist or analyst bring in information that the analysis needs. Sometimes, however, the scientist does not have enough insight into the problem to supply prior information.

There are many types of priors, but the two situations above give rise to two broad types:

1. Informative priors

2. Noninformative priors

Informative priors are sometimes built from beliefs about what the parameter values should be. Noninformative priors can sometimes be flat, where the scientist does not try to encode any meaningful information. In pursuing noninformative priors, scientists sometimes end up with improper priors.



## Choosing a Prior

There are three ways to choose a prior:

1. Subjective

 - The scientist chooses a distribution that expresses their personal beliefs.

2. Objective and informative

 - The scientist may have historical information or data that can be used to help formulate a prior.
 
3. Noninformative

 - A noninformative prior is one that expresses ignorance as to the value of $\theta$. Other terms for a noninformative prior are *reference prior*, *diffuse prior*, and *vague prior*.
 
 - In general, a noninformative prior produces a posterior distribution whose information is dominated by the likelihood function.
 
 - A noninformative prior typically produces estimates close to those of maximum likelihood estimation.
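The link between a flat prior and maximum likelihood can be checked numerically. The sketch below is only an illustration: it assumes binomial data with made-up counts (`y` and `n` are hypothetical) and evaluates the log-posterior on a grid of $\theta$ values under a flat prior. Because the flat prior adds a constant, the grid point that maximizes the posterior coincides with the maximum likelihood estimate.

```python
import math

# Hypothetical data: y successes in n trials.
y, n = 7, 20

# Grid of candidate theta values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]

def log_likelihood(theta):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return y * math.log(theta) + (n - y) * math.log(1 - theta)

def log_flat_prior(theta):
    """A flat prior contributes the same constant for every theta."""
    return 0.0

# Unnormalized log-posterior = log-likelihood + log-prior.
log_posterior = [log_likelihood(t) + log_flat_prior(t) for t in grid]

posterior_mode = grid[log_posterior.index(max(log_posterior))]
mle = y / n

print("posterior mode:", posterior_mode)
print("MLE:           ", mle)
```

With an informative prior in place of `log_flat_prior`, the two values would generally differ.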



## Conjugate Priors

A prior is said to be a conjugate prior for a family of distributions if the prior and posterior distributions are from the same family; that is, the posterior has the same distributional form as the prior.

For example, given data $y = (y_1, \ldots, y_n)^T$, where $y_i \sim \text{Bin}(n, \theta)$, the likelihood is binomial. A conjugate prior on $\theta$ is the beta distribution. It follows that the posterior distribution of $\theta$ is also a beta distribution.
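As a minimal sketch of the beta/binomial case: with a $\text{Beta}(a, b)$ prior, the posterior is $\text{Beta}(a + \text{successes},\ b + \text{failures})$, where the counts are summed over all observations. The function name `beta_binomial_update`, the prior parameters, and the data below are illustrative assumptions, not part of the original text.

```python
def beta_binomial_update(a, b, ys, trials):
    """Posterior Beta parameters for a Beta(a, b) prior on theta.

    ys     : observed success counts, each assumed Binomial(trials, theta)
    trials : number of trials behind each count
    """
    successes = sum(ys)
    failures = trials * len(ys) - successes
    return a + successes, b + failures

# Hypothetical prior and data: three counts, each out of 10 trials.
a_post, b_post = beta_binomial_update(2.0, 2.0, ys=[3, 5, 4], trials=10)

# Mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = a_post / (a_post + b_post)

print(a_post, b_post, posterior_mean)
```

No integration is needed: conjugacy reduces the update to adding counts to the prior parameters.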

Other commonly used conjugate prior/likelihood combinations include the normal/normal, gamma/Poisson, gamma/gamma, and gamma/beta cases. See this [Wikipedia page](https://en.wikipedia.org/wiki/Conjugate_prior#Table_of_conjugate_distributions) for a table of conjugate priors.

## Informative Prior

An informative prior is a prior that is not dominated by the likelihood and that has an impact on the posterior distribution. If a prior distribution dominates the likelihood, it is clearly an informative prior. These types of distributions must be specified with care in actual practice. 
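To illustrate how much an informative prior can matter, the sketch below compares posterior means for a binomial proportion under a flat $\text{Beta}(1, 1)$ prior and a fairly strong $\text{Beta}(30, 70)$ prior. All numbers are made up for illustration, and the helper `posterior_mean` is a hypothetical name. It uses the beta/binomial result that the posterior mean is $(a + y) / (a + b + n)$.

```python
def posterior_mean(a, b, y, n):
    """Posterior mean of theta for a Beta(a, b) prior and y successes in n trials."""
    return (a + y) / (a + b + n)

# Hypothetical priors and data sets.
priors = {"flat Beta(1, 1)": (1, 1), "informative Beta(30, 70)": (30, 70)}
datasets = {"small": (7, 10), "large": (700, 1000)}

for data_name, (y, n) in datasets.items():
    for prior_name, (a, b) in priors.items():
        mean = posterior_mean(a, b, y, n)
        print(f"{data_name:5s} data, {prior_name}: {mean:.3f}")
```

With the small data set the two posterior means are far apart (about 0.67 versus 0.34), so the informative prior dominates. With the large data set they are much closer (about 0.70 versus 0.66), because the likelihood carries most of the information.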


## Noninformative Prior

Roughly speaking, a prior distribution is noninformative if the prior is *flat* relative to the likelihood function. Thus, a prior $p(\theta)$ is noninformative if it has minimal impact on the posterior distribution of $\theta$.

Many statisticians favor noninformative priors because they appear to be more objective. 

However, it is unrealistic to expect that noninformative priors represent total ignorance about the parameter of interest. In some cases, noninformative priors can lead to improper posteriors (nonintegrable posterior density). You cannot make inferences with improper posterior distributions. 
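A standard illustration, not taken from the references listed here, is the improper prior $p(\theta) \propto \theta^{-1}(1-\theta)^{-1}$ for a binomial proportion, often called Haldane's prior. With $y$ successes in $n$ trials, the posterior is

$$
p(\theta \mid y) \propto \theta^{y-1}(1-\theta)^{n-y-1}
$$

which is integrable over $(0, 1)$ only when $0 < y < n$. If $y = 0$, the density behaves like $1/\theta$ near $\theta = 0$ (and like $1/(1-\theta)$ near $\theta = 1$ if $y = n$), so its integral diverges and the posterior is improper.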

A common choice for a noninformative prior is the flat prior, which is a prior distribution that assigns equal density to all possible values of the parameter.

## Improper Priors

An example of an improper prior for $\theta \sim p(\theta)$ is simply $p(\theta) = 1$ over the whole real line.

Why is a flat prior like $p(\theta) = c$, where $c$ is a positive constant, improper?

Because $\int_{-\infty}^{\infty} p(\theta) \, d\theta = \infty$, which violates a defining property of a probability distribution: the density must integrate to 1.
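To see the divergence concretely, integrate the constant density over a finite window $[-T, T]$ and let $T$ grow:

$$
\int_{-T}^{T} c \, d\theta = 2cT \to \infty \quad \text{as } T \to \infty
$$

No positive constant $c$ can rescale this integral to 1, so a flat prior on the whole real line cannot be normalized. On a bounded parameter space such as $[0, 1]$, by contrast, a flat prior is proper.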


## Jeffreys' Priors

Sir Harold Jeffreys proposed a class of priors for Bayesian problems (the idea dates to his 1946 paper and appears in his 1961 book *Theory of Probability*). Many people view them as being "noninformative". This class of priors satisfies the local uniformity property: a prior that does not change much over the region in which the likelihood is significant and does not assume large values outside that range. It is based on the [Fisher information matrix](https://arxiv.org/pdf/1705.01064.pdf). Jeffreys’ prior is defined as

$$
p(\theta) \propto \left| I(\theta) \right|^{1/2}
$$

where $| \cdot |$ denotes the determinant and $I(\theta)$ is the Fisher information matrix based on the likelihood function $L(\theta|y)$.


$$
I(\theta)  = - E \left[ \frac{\partial^2 \log L(\theta|y)}{\partial\theta^2} \right]
$$
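As a worked check for a single parameter, take $y \sim \text{Bin}(n, \theta)$. The Fisher information is the negative expected second derivative of the log-likelihood, which for this model works out to $I(\theta) = n / (\theta(1-\theta))$, so Jeffreys' prior is $p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, the kernel of a $\text{Beta}(1/2, 1/2)$ distribution. The sketch below verifies the closed form by summing over all possible outcomes; the function names and the choice of `n` are illustrative assumptions.

```python
import math

def fisher_information_binomial(theta, n):
    """-E[d^2/dtheta^2 log L(theta | y)] for y ~ Binomial(n, theta), by direct summation."""
    total = 0.0
    for y in range(n + 1):
        pmf = math.comb(n, y) * theta**y * (1 - theta) ** (n - y)
        # Second derivative of y*log(theta) + (n - y)*log(1 - theta) in theta.
        second_derivative = -y / theta**2 - (n - y) / (1 - theta) ** 2
        total += pmf * (-second_derivative)
    return total

def jeffreys_prior_unnormalized(theta, n=1):
    """Square root of the scalar Fisher information (determinant of a 1x1 matrix)."""
    return math.sqrt(fisher_information_binomial(theta, n))

n = 10  # hypothetical number of trials
for theta in (0.1, 0.5, 0.9):
    numeric = fisher_information_binomial(theta, n)
    closed_form = n / (theta * (1 - theta))
    print(theta, numeric, closed_form)
```

The resulting prior is not flat in $\theta$: it places more density near 0 and 1, where the Fisher information is largest.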

References:

1. [Informative and noninformative priors](http://andrewgelman.com/2007/07/18/informative_and/)

2. [Prior Distributions](https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introbayes_sect004.htm)