# Notations

## Bayesian Inference

Bayesian inference is a method of statistical inference in which we update our beliefs about a parameter as new data becomes available. The key components are:

- **Prior**: The initial belief or probability distribution about the parameter $\mu$ before observing data.
- **Likelihood**: The probability of observing the data given a particular value of the parameter $\mu$.
- **Posterior**: The updated belief about the parameter $\mu$ after incorporating the observed data.

Bayes' theorem provides the foundation for Bayesian inference:

$$
P(\mu|D) = \frac{P(D|\mu) P(\mu)}{P(D)}
$$

where:
- $P(\mu|D)$ is the **posterior** distribution of the parameter $\mu$ given the data $D$.
- $P(D|\mu)$ is the **likelihood** of the data given the parameter.
- $P(\mu)$ is the **prior** distribution of $\mu$.
- $P(D)$ is the **marginal likelihood**, which normalizes the posterior distribution.

By ignoring the denominator (which does not depend on $\mu$), we can express this relationship as:

$$
P(\mu|D) \propto P(D|\mu) P(\mu)
$$
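The proportionality can be checked numerically. The sketch below is an illustration only: it evaluates prior times likelihood on a grid of candidate $\mu$ values and normalizes the product so that it integrates to one. The helper names (`normal_pdf`, `grid_posterior`), the normal prior and likelihood, the single observation, and the grid limits are all assumptions chosen for the example.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def grid_posterior(grid, prior, likelihood):
    """Normalize likelihood * prior on an evenly spaced grid so it integrates to 1."""
    unnormalized = likelihood * prior
    dx = grid[1] - grid[0]
    return unnormalized / (unnormalized.sum() * dx)

# Assumed example: one observation x = 1.2 with known sd 1, and a N(0, 2^2) prior on mu.
mu_grid = np.linspace(-10.0, 10.0, 4001)
prior = normal_pdf(mu_grid, 0.0, 2.0)        # P(mu)
likelihood = normal_pdf(1.2, mu_grid, 1.0)   # P(D | mu), viewed as a function of mu
posterior = grid_posterior(mu_grid, prior, likelihood)

dx = mu_grid[1] - mu_grid[0]
posterior_mean = np.sum(mu_grid * posterior) * dx
print(posterior_mean)
```

The division inside `grid_posterior` plays the role of $P(D)$: it rescales the product without changing its shape as a function of $\mu$.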


**(double click here for the version for multiple samples)**
<!-- ## Normal Mean Model

Let’s now consider the normal mean model, where we assume the data points $x_1, x_2, \dots, x_n$ are drawn independently from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$:

$$
x_i \sim N(\mu, \sigma^2)
$$

In this case:

1. **Prior**: The prior distribution for the mean $\mu$ is often assumed to be normal with mean $\mu_0$ and variance $\sigma_0^2$. It is convenient to work with the precisions $\tau=\frac{1}{\sigma^2}$ and $\tau_0=\frac{1}{\sigma_0^2}$ instead of the variances $\sigma^2$ and $\sigma_0^2$.

$$
\begin{aligned}
P(\mu) &= \frac{1}{\sqrt{2 \pi \sigma_0^2}} \exp\left(-\frac{(\mu - \mu_0)^2}{2 \sigma_0^2}\right) \\
% &\propto \exp\left(-0.5\tau_0\mu^2+\tau_0\mu_0\mu\right) \\
&\propto \exp\left(-0.5\tau_0(\mu-\mu_0)^2\right)
\end{aligned}
$$

In other words, the prior is $\mu \sim N(\mu_0, \sigma_0^2)$, i.e., a normal distribution with mean $\mu_0$ and precision $\tau_0 = \frac{1}{\sigma_0^2}$.
(Here the 0 subscript is being used to indicate that $\mu_0$, $\sigma_0$ are parameters in the prior.)


2. **Likelihood**: The likelihood of observing the data given $\mu$ is:

$$
\begin{aligned}
P(D|\mu) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2 \sigma^2}\right) \\
&\propto \prod_{i=1}^{n}\exp\left(-\frac{(x_i - \mu)^2}{2 \sigma^2}\right) \\
% &\propto \prod_{i=1}^{n}\exp\left(-0.5\tau x_i^2+\tau x_i\mu\right) \\
&\propto \prod_{i=1}^{n} \exp\left(-0.5 \tau (x_i - \mu)^2\right)
\end{aligned}
$$

3. **Posterior**: The posterior distribution of $\mu$ given the data $D = \{x_1, x_2, \dots, x_n\}$ combines the prior and likelihood using Bayes' theorem. The formula for the posterior is proportional to:

$$
\begin{aligned}
P(\mu|D) &\propto P(D|\mu) P(\mu) \\
&\propto \prod_{i=1}^{n} \exp\left(-0.5 \tau (x_i - \mu)^2\right) \times \exp\left(-0.5 \tau_0 (\mu - \mu_0)^2\right)\\
&\propto \exp\left[-0.5 \tau \sum_{i=1}^{n} (x_i - \mu)^2 - 0.5 \tau_0 (\mu - \mu_0)^2\right]\\
&\propto \exp\left[-0.5 (n\tau + \tau_0) \mu^2 + \left( \tau \sum_{i=1}^{n} x_i + \tau_0 \mu_0 \right) \mu \right]
\end{aligned}
$$

From the result in “Preliminaries” above we see that
$$
\mu | D \sim N\left(\mu_1, \frac{1}{\tau_1}\right)
$$
where:
- $\tau_1 = n\tau + \tau_0$ is the posterior precision (i.e., $\sigma_1^2 = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$ is the posterior variance).
- $\mu_1 = \frac{\tau \sum_{i=1}^{n} x_i + \tau_0 \mu_0}{n\tau + \tau_0}$ is the posterior mean. -->
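For the multiple-sample normal mean model (normal prior on $\mu$, known data variance), the update reduces to adding precisions and taking a precision-weighted average of the prior mean and the data. The sketch below illustrates that computation; the function name `normal_mean_posterior`, its argument names, and the example data are assumptions made for this illustration.

```python
import numpy as np

def normal_mean_posterior(x, sigma, mu0, sigma0):
    """Posterior mean and variance of mu for x_i ~ N(mu, sigma^2) with prior mu ~ N(mu0, sigma0^2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    tau = 1.0 / sigma**2      # data precision (known)
    tau0 = 1.0 / sigma0**2    # prior precision
    tau1 = n * tau + tau0     # posterior precision
    mu1 = (tau * x.sum() + tau0 * mu0) / tau1  # precision-weighted mean
    return mu1, 1.0 / tau1

# Assumed example data: three observations, unit data variance, N(0, 1) prior.
data = [1.0, 2.0, 3.0]
mu1, var1 = normal_mean_posterior(data, sigma=1.0, mu0=0.0, sigma0=1.0)
print(mu1, var1)
```

With these inputs the posterior mean lies between the prior mean (0) and the sample mean (2), pulled toward the data because three observations carry more precision than the prior.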