# Bayesian Inference

In this unit, I will discuss Bayesian posterior inference. I will
explain Bayes rule and how it can be used in data applications.
Then, using data generated from a normal distribution, I will demonstrate
how to use Bayes rule to calculate the posterior distribution of the mean.
I will list the major assumptions of the approach at the end.

Bayes rule, which follows from the definition of conditional probability, is given by:
\begin{align*}
Pr(A|B)= & \frac{Pr(B|A)Pr(A)}{Pr(B)}
\end{align*}

Consider two random events $A$ and $B$. $Pr(A|B)$ is called the
conditional probability of $A$ given $B$. The rule states that if event
$B$ happened, I can calculate the probability of $A$ happening, $Pr(A|B)$,
as follows: I multiply the probability of $B$ happening given
that $A$ happened, $Pr(B|A)$, by the probability of event $A$
happening in general, $Pr(A)$. I then divide the product by the
probability of $B$ happening in general, $Pr(B)$, which must be positive.
Bayes rule seems abstract at first, but it is useful when I am interested
in knowing $Pr(A|B)$ and I already have information on $Pr(B|A)$,
$Pr(A)$, and $Pr(B)$. It turns out that this is the case
in many statistical inference applications.
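
To make the rule concrete, here is a minimal Python sketch. The probabilities are hypothetical values chosen only for illustration, and the helper name `bayes_rule` is my own. $Pr(B)$ is obtained from the law of total probability, which assumes I also know $Pr(B|\text{not }A)$.

```python
def bayes_rule(p_b_given_a, p_a, p_b):
    """Return Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)."""
    if p_b <= 0:
        raise ValueError("Pr(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, chosen only for illustration.
p_a = 0.3              # Pr(A)
p_b_given_a = 0.8      # Pr(B|A)
p_b_given_not_a = 0.2  # Pr(B|not A)

# Law of total probability: Pr(B) = Pr(B|A)Pr(A) + Pr(B|not A)Pr(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes_rule(p_b_given_a, p_a, p_b)
print(f"Pr(B)   = {p_b:.4f}")
print(f"Pr(A|B) = {p_a_given_b:.4f}")
```

With these made-up inputs, observing $B$ raises the probability of $A$ from $0.3$ to $0.24/0.38$, roughly $0.63$.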

Similar to the previous unit, suppose $X=\left\{ x_{1},x_{2},x_{3},\ldots,x_{15}\right\} $
is a data set of $15$ observations generated from a normal distribution
with mean $1$ and variance $1$:
\begin{align*}
X\sim & \boldsymbol{N}(\mu=1,\sigma^{2}=1)
\end{align*}
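
A sample like this could be simulated as follows. This is only a sketch using Python's standard library; the seed is an arbitrary assumption added for reproducibility and is not part of the original setup.

```python
import random
import statistics

random.seed(0)  # arbitrary seed (assumption), used only for reproducibility

n = 15
mu_true = 1.0
sigma = 1.0  # standard deviation; the variance sigma**2 is also 1

# Draw n independent observations from N(mu_true, sigma^2).
x = [random.gauss(mu_true, sigma) for _ in range(n)]

x_bar = statistics.fmean(x)
print(f"n = {len(x)}, sample mean = {x_bar:.3f}")
```
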
As a researcher, I am interested in knowing something about the unknown
data generating process. In the previous unit, I was interested in
hypothesis testing. Namely, I was interested in knowing whether the mean
is different from a particular hypothesized value. Instead of hypothesis
testing, I could take a more direct approach and ask: what is $\mu$?
Answering this question exactly is generally difficult, and maybe impossible.
Bayes rule, however, allows me to gain valuable insight into the distribution
of $\mu$ given the observed data, $X$. This distribution is precisely $Pr(\mu|X)$,
and it can be written with Bayes rule as follows:
\begin{align*}
Pr(\mu|X)= & \frac{Pr(X|\mu)Pr(\mu)}{Pr(X)}
\end{align*}
where,

$Pr(\mu|X)$ is the posterior, and it is what Bayes rule will ultimately
answer: what is the distribution of $\mu$ given the drawn data, $X$?

$Pr(\mu)$ is the prior. It is the distribution of $\mu$ before
any data are observed. Here, I assume $\mu\sim\boldsymbol{N}(\mu_{0}=1,v_{0}^{2}=1)$.

$Pr(X|\mu)$ is the likelihood function. It answers: what is the
probability of obtaining $X$ given a certain mean, $\mu$?
I assume independence between different $X$ values.

$Pr(X)$ is the probability of obtaining $X$ across all possible
values of $\mu$. In practice it is obtained by integrating over all values of $\mu$,
so the term does not depend on $\mu$ and acts only as a normalizing constant.
It is conventional to drop it for simplicity, which leaves the posterior
known up to proportionality:

\begin{align*}
Pr(\mu|X)\propto & Pr(X|\mu)Pr(\mu)
\end{align*}
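
The proportionality can be illustrated numerically with a grid approximation. The sketch below is not a definitive implementation. It assumes the normal likelihood with known variance $1$ and the $\boldsymbol{N}(1,1)$ prior stated above. The seed, the grid bounds, and the step size are arbitrary choices of mine. I work on the log scale only for numerical stability, and the final normalization plays the role of dividing by $Pr(X)$.

```python
import math
import random

random.seed(0)  # arbitrary seed (assumption)

sigma2 = 1.0           # known data variance
mu0, v0_sq = 1.0, 1.0  # prior mean and prior variance from the text
x = [random.gauss(1.0, math.sqrt(sigma2)) for _ in range(15)]


def log_likelihood(mu):
    # log Pr(X|mu) up to an additive constant, assuming independent draws
    return -sum((xi - mu) ** 2 for xi in x) / (2 * sigma2)


def log_prior(mu):
    # log Pr(mu) up to an additive constant
    return -((mu - mu0) ** 2) / (2 * v0_sq)


# Grid of candidate values for mu, from -3 to 5 (arbitrary bounds).
step = 0.01
grid = [-3.0 + step * i for i in range(801)]

# Unnormalized posterior = likelihood * prior, computed on the log scale.
log_post = [log_likelihood(m) + log_prior(m) for m in grid]
max_lp = max(log_post)
unnorm = [math.exp(lp - max_lp) for lp in log_post]

# Normalize with a Riemann sum so the density integrates to one.
total = sum(unnorm) * step
posterior = [u / total for u in unnorm]

post_mean = sum(m * p for m, p in zip(grid, posterior)) * step
print(f"grid posterior mean = {post_mean:.3f}")
```

A grid works here because $\mu$ is a single parameter. The cost grows quickly with the number of parameters.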

To keep the example simple, I assume it is known that $X$ comes from a normal
distribution with unknown $\mu$ but known $\sigma^{2}$. The expression above
then becomes, more concretely,
\begin{align*}
\xi\left(\mu|X\right)\propto & f_{n}\left(X|\mu\right)\xi\left(\mu\right)
\end{align*}

where,

\begin{align*}
f_{n}\left(X|\mu\right)\propto & \exp\left(\frac{-1}{2\sigma^{2}}\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}\right)\\
\\
\xi\left(\mu\right)\propto & \exp\left(-\frac{\left(\mu-\mu_{0}\right)^{2}}{2v_{0}^{2}}\right)
\end{align*}
Given the above, it turns out that,
\begin{align*}
\mu|X\sim & \boldsymbol{N}(\mu_{1},v_{1}^{2})
\end{align*}
where,
\begin{align*}
\mu_{1}= & \frac{\sigma^{2}\mu_{0}+nv_{0}^{2}\bar{x}_{n}}{\sigma^{2}+nv_{0}^{2}}\\
\\
v_{1}^{2}= & \frac{\sigma^{2}v_{0}^{2}}{\sigma^{2}+nv_{0}^{2}}
\end{align*}
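
For reference, the algebra behind these expressions can be sketched as follows, under the same assumptions (independent normal data with known $\sigma^{2}$ and a normal prior). The sketch uses the standard decomposition of the sum of squares around the sample mean $\bar{x}_{n}=\frac{1}{n}\sum_{i=1}^{n}x_{i}$:
\begin{align*}
\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}= & n\left(\mu-\bar{x}_{n}\right)^{2}+\sum_{i=1}^{n}\left(x_{i}-\bar{x}_{n}\right)^{2}
\end{align*}
The last sum does not depend on $\mu$, so it is absorbed into the proportionality constant. Collecting the terms in $\mu$ and completing the square gives,
\begin{align*}
\xi\left(\mu|X\right)\propto & \exp\left(-\frac{n\left(\mu-\bar{x}_{n}\right)^{2}}{2\sigma^{2}}-\frac{\left(\mu-\mu_{0}\right)^{2}}{2v_{0}^{2}}\right)\\
\propto & \exp\left(-\frac{1}{2}\left[\left(\frac{n}{\sigma^{2}}+\frac{1}{v_{0}^{2}}\right)\mu^{2}-2\left(\frac{n\bar{x}_{n}}{\sigma^{2}}+\frac{\mu_{0}}{v_{0}^{2}}\right)\mu\right]\right)\\
\propto & \exp\left(-\frac{\left(\mu-\mu_{1}\right)^{2}}{2v_{1}^{2}}\right)
\end{align*}
with
\begin{align*}
\frac{1}{v_{1}^{2}}= & \frac{n}{\sigma^{2}}+\frac{1}{v_{0}^{2}}\\
\\
\mu_{1}= & v_{1}^{2}\left(\frac{n\bar{x}_{n}}{\sigma^{2}}+\frac{\mu_{0}}{v_{0}^{2}}\right)
\end{align*}
Multiplying the numerators and denominators by $\sigma^{2}v_{0}^{2}$ recovers the forms of $\mu_{1}$ and $v_{1}^{2}$ given above.
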
This can be proved by multiplying $f_{n}\left(X|\mu\right)$ and $\xi\left(\mu\right)$,
then completing the square in $\mu$ in the exponent. Here $\bar{x}_{n}$ is the
sample mean, so the posterior mean $\mu_{1}$ is a weighted average of the prior
mean $\mu_{0}$ and $\bar{x}_{n}$, with the weight on the data growing with $n$.
The specific choice of a normal prior makes it conjugate to the normal likelihood
function. A conjugate prior ensures a closed-form posterior in the same family as
the prior, namely normal. In more complex applications, a closed-form
solution is often not available. With the theory in hand, I will go ahead
and demonstrate the example numerically.
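
As a first minimal sketch of that calculation, the closed-form expressions for $\mu_{1}$ and $v_{1}^{2}$ can be evaluated directly. The function name `normal_posterior` and the seed are my own assumptions, and the simulated sample stands in for the data $X$.

```python
import math
import random

random.seed(0)  # arbitrary seed (assumption)

n = 15
sigma2 = 1.0           # known data variance
mu0, v0_sq = 1.0, 1.0  # prior mean and prior variance from the text
x = [random.gauss(1.0, math.sqrt(sigma2)) for _ in range(n)]
x_bar = sum(x) / n


def normal_posterior(x_bar, n, sigma2, mu0, v0_sq):
    """Posterior mean and variance of mu for a normal likelihood
    with known variance sigma2 and a N(mu0, v0_sq) prior."""
    denom = sigma2 + n * v0_sq
    mu1 = (sigma2 * mu0 + n * v0_sq * x_bar) / denom
    v1_sq = sigma2 * v0_sq / denom
    return mu1, v1_sq


mu1, v1_sq = normal_posterior(x_bar, n, sigma2, mu0, v0_sq)
print(f"sample mean        = {x_bar:.3f}")
print(f"posterior mean     = {mu1:.3f}")
print(f"posterior variance = {v1_sq:.4f}")
```

With $\sigma^{2}=v_{0}^{2}=1$ and $n=15$, the posterior variance is $1/16=0.0625$ regardless of the data, and the posterior mean is $(\mu_{0}+15\bar{x}_{n})/16$.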
