## Chapter 1: Introduction

Statistical induction is the process of learning about the general characteristics of a population from a subset of members of that population. Numerical values of population characteristics are typically expressed in terms of a parameter $\theta$, and numerical descriptions of the subset make up a dataset $y$. Before a dataset is obtained, the numerical values of both the population characteristics and the dataset are uncertain. After a dataset $y$ is obtained, the information it contains can be used to *decrease* our uncertainty about the population characteristics. *Quantifying this change in uncertainty is the purpose of Bayesian inference.*

The sample space $\mathcal{Y}$ is the set of all possible datasets, from which a single dataset $y$ will result. The parameter space $\Theta$ is the set of possible parameter values, from which we hope to identify the value that best represents the true population characteristics.

*Bayes' rule*: $$p(\theta|y)=\frac{p(y|\theta)p(\theta)}{\int_{\Theta}p(y|\tilde{\theta})p(\tilde{\theta})d\tilde{\theta}}$$
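To make the formula concrete, here is a minimal sketch that applies Bayes' rule on a finite grid of $\theta$ values, so the integral in the denominator becomes a sum. The binomial sampling model, the uniform prior, the data ($y=3$ successes in $n=10$ trials), and the function name `grid_posterior` are all assumptions chosen for illustration; they are not taken from the notes.

```python
from math import comb

def grid_posterior(y, n, grid, prior):
    """Posterior over a finite grid of theta values, given y successes in n trials."""
    # Numerator of Bayes' rule: likelihood p(y | theta) times prior p(theta).
    unnorm = [comb(n, y) * t**y * (1 - t)**(n - y) * p
              for t, p in zip(grid, prior)]
    # Denominator: a sum over the grid stands in for the integral over Theta.
    evidence = sum(unnorm)
    return [u / evidence for u in unnorm]

grid = [i / 10 for i in range(11)]        # theta in {0.0, 0.1, ..., 1.0}
prior = [1 / len(grid)] * len(grid)       # uniform prior (illustrative assumption)
posterior = grid_posterior(y=3, n=10, grid=grid, prior=prior)

# Grid value with the highest posterior probability.
best = grid[posterior.index(max(posterior))]
print(best, sum(posterior))
```

With a uniform prior the posterior is proportional to the likelihood, so the most probable grid value is the one closest to the sample proportion $y/n$.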

*Sensitivity analysis*: an exploration of how posterior information is affected by differences in prior opinion.

If $\theta$ has a $\text{beta}(a,b)$ distribution, then the expectation
of $\theta$ is $a/(a+b)$ and, provided $a>1$ and $b>1$, the most probable
value of $\theta$ is $(a-1)/(a-1+b-1)$.
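The two summaries can be computed directly, as in the short sketch below. The function names `beta_mean` and `beta_mode` and the example values $a=2$, $b=8$ are illustrative assumptions. The mode formula only gives an interior maximum when both parameters exceed 1, so the sketch rejects other inputs.

```python
def beta_mean(a, b):
    """Expectation of a beta(a, b) random variable: a / (a + b)."""
    return a / (a + b)

def beta_mode(a, b):
    """Most probable value of a beta(a, b) random variable, for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("the mode formula requires a > 1 and b > 1")
    return (a - 1) / (a - 1 + b - 1)

mean_value = beta_mean(2, 8)
mode_value = beta_mode(2, 8)
print(mean_value, mode_value)
```

For $a=2$, $b=8$ the mean is $2/10$ and the mode is $1/8$, so the mode sits below the mean for this right-skewed distribution.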

## Chapter 2: Belief, probability and exchangeability

In Bayesian inference, $\{H_{1},\ldots,H_{K}\}$ often refer to disjoint hypotheses or states of nature and $E$ refers to the outcome of a survey, study or experiment. To compare hypotheses post-experimentally, we often calculate the following ratio:
$$\frac{Pr(H_{i}|E)}{Pr(H_{j}|E)}=\frac{Pr(E|H_{i})}{Pr(E|H_{j})}\times\frac{Pr(H_{i})}{Pr(H_{j})}=\text{"Bayes factor"}\times\text{prior beliefs}$$
This calculation reminds us that Bayes' rule does not determine what our
beliefs should be after seeing the data; it only tells us how they should change
after seeing the data.
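A small numerical sketch of the ratio above: two observers see the same evidence $E$ but hold different prior beliefs about $H_i$ and $H_j$. The probabilities used here and the function name `posterior_odds` are made up for illustration.

```python
def posterior_odds(prior_i, prior_j, lik_i, lik_j):
    """Return Pr(H_i | E) / Pr(H_j | E) as Bayes factor times prior odds."""
    bayes_factor = lik_i / lik_j      # Pr(E | H_i) / Pr(E | H_j)
    prior_odds = prior_i / prior_j    # Pr(H_i) / Pr(H_j)
    return bayes_factor * prior_odds

# Same evidence (same likelihoods), two different prior opinions.
neutral = posterior_odds(prior_i=0.5, prior_j=0.5, lik_i=0.8, lik_j=0.2)
skeptic = posterior_odds(prior_i=0.2, prior_j=0.8, lik_i=0.8, lik_j=0.2)
print(neutral, skeptic)
```

Both observers multiply their prior odds by the same Bayes factor (about 4 here), yet they end with different posterior odds because they started from different priors.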

Two events $F$ and $G$ are conditionally **independent** given $H$ if $Pr(F\cap G|H)=Pr(F|H)Pr(G|H)$.

In Bayesian inference a **random variable** is defined as an unknown numerical
quantity about which we make probability statements.

For a discrete random variable $Y$, the *probability density function (pdf)* is $p(y)=Pr(Y=y)$. For continuous random variables, we start instead from the *cumulative distribution function (cdf)*: $F(y)=Pr(Y\leq y)$. Note that $F(\infty)=1$, $F(-\infty)=0$, and $F(b)\leq F(a)$ if $b<a$.
A theorem from mathematics says that for every continuous cdf $F$
there exists a nonnegative function $p(y)$ such that $F(a)=\int_{-\infty}^{a}p(y)dy$.
This function is called the *probability density function* of
$Y$, and its properties are similar to those of a pdf for a discrete
random variable: $0\leq p(y)$ for all $y$; $\int_{y\in\mathbb{R}}p(y)dy=1$.
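The relationship $F(a)=\int_{-\infty}^{a}p(y)dy$ can be checked numerically. The sketch below assumes one particular density, $p(y)=e^{-y}$ for $y\geq 0$ (chosen only because its cdf has the closed form $1-e^{-a}$), and approximates the integral with a midpoint rule. The function names and the step count are illustrative choices.

```python
from math import exp

def density(y):
    """Illustrative density: p(y) = exp(-y) for y >= 0, and 0 otherwise."""
    return exp(-y) if y >= 0 else 0.0

def cdf(a, steps=10_000):
    """Approximate F(a), the integral of p(y) up to a, with a midpoint rule.

    This density is zero below 0, so the integration starts at 0.
    """
    if a <= 0:
        return 0.0
    width = a / steps
    return sum(density((i + 0.5) * width) for i in range(steps)) * width

approx = cdf(1.0)
exact = 1 - exp(-1.0)   # closed-form cdf of this particular density
print(approx, exact)
```

The numerical and closed-form values should agree closely, and evaluating `cdf` at increasing arguments gives non-decreasing values, as the cdf properties require.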

The **mean** or **expectation** of an unknown quantity
$Y$ is given by $E[Y]=\sum_{y\in\mathcal{Y}}yp(y)$ if $Y$ is discrete;
$E[Y]=\int_{y\in\mathcal{Y}}yp(y)dy$ if $Y$ is continuous.

The most popular measure of spread is the **variance** of a distribution:
$Var[Y]=E[(Y-E[Y])^{2}]=E[Y^{2}]-E[Y]^{2}.$
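As a quick check of these definitions in the discrete case, the sketch below computes the mean and both forms of the variance for a fair six-sided die. The die example and the function names are assumptions for illustration; exact fractions are used so the two variance formulas can be compared without rounding error.

```python
from fractions import Fraction

def mean(pdf):
    """E[Y]: sum of y * p(y) over the sample space."""
    return sum(y * p for y, p in pdf.items())

def variance(pdf):
    """Var[Y] from the definition E[(Y - E[Y])^2]."""
    mu = mean(pdf)
    return sum((y - mu) ** 2 * p for y, p in pdf.items())

def variance_shortcut(pdf):
    """Var[Y] from the identity E[Y^2] - E[Y]^2."""
    second_moment = sum(y * y * p for y, p in pdf.items())
    return second_moment - mean(pdf) ** 2

# Fair die: p(y) = 1/6 for y in {1, ..., 6}.
die = {y: Fraction(1, 6) for y in range(1, 7)}
print(mean(die), variance(die), variance_shortcut(die))
```

For the fair die the mean is $7/2$ and both variance formulas give $35/12$.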

### Joint distribution
Let $\mathcal{Y}_{1},\mathcal{Y}_{2}$ be two countable sample spaces, and let
$Y_{1},Y_{2}$ be two random variables, taking values in $\mathcal{Y}_{1},\mathcal{Y}_{2}$
respectively.

The **joint pdf or joint density** of $Y_{1}$ and $Y_{2}$ is defined as   
$p_{Y_{1}Y_{2}}(y_{1},y_{2})=Pr(\{Y_{1}=y_{1}\}\cap\{Y_{2}=y_{2}\})$
for $y_{1}\in\mathcal{Y}_{1},y_{2}\in\mathcal{Y}_{2}$.

The **marginal density** of $Y_1$ can be computed from the joint density:
$p_{Y_{1}}(y_{1})=Pr(Y_{1}=y_{1})=\sum_{y_{2}\in\mathcal{Y}_{2}}p_{Y_{1}Y_{2}}(y_{1},y_{2})$. For continuous random variables, the sum is replaced by an integral:   
$p_{Y_{1}}(y_{1})=\int_{-\infty}^{\infty}p_{Y_{1}Y_{2}}(y_{1},y_{2})dy_{2}$.

The **conditional density** of $Y_{2}$ given $\{Y_{1}=y_{1}\}$ can be
computed from the joint density and the marginal density: $p_{Y_{2}|Y_{1}}(y_{2}|y_{1})=\frac{Pr(\{Y_{1}=y_{1}\}\cap\{Y_{2}=y_{2}\})}{Pr(Y_{1}=y_{1})}=\frac{p_{Y_{1}Y_{2}}(y_{1},y_{2})}{p_{Y_{1}}(y_{1})}$.
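The joint, marginal, and conditional densities can be illustrated with a small table. In the sketch below the joint probabilities for two binary variables are invented for the example, and the function names are hypothetical; the computations follow the discrete formulas above.

```python
from fractions import Fraction

# Joint density p(y1, y2) for two binary variables (made-up values summing to 1).
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def marginal_y1(joint, y1):
    """p(y1): sum the joint density over all values of y2."""
    return sum(p for (a, _), p in joint.items() if a == y1)

def conditional_y2_given_y1(joint, y2, y1):
    """p(y2 | y1): joint density divided by the marginal density of y1."""
    return joint[(y1, y2)] / marginal_y1(joint, y1)

p_y1_is_1 = marginal_y1(joint, 1)
p_y2_is_1_given_y1_is_1 = conditional_y2_given_y1(joint, 1, 1)
print(p_y1_is_1, p_y2_is_1_given_y1_is_1)
```

Here $p_{Y_1}(1)=3/5$ and $p_{Y_2|Y_1}(1|1)=2/3$, and for each fixed $y_1$ the conditional probabilities sum to 1.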

Roughly speaking, $Y_1, \dots, Y_n$ are **exchangeable** if the subscript labels convey
no information about the outcomes.

$Y_{1},\ldots,Y_{n}|\theta$ are i.i.d. and $\theta\sim p(\theta)$
$\Leftrightarrow$ $Y_{1},\ldots,Y_{n}$ are exchangeable for all
$n$.
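One direction of this statement can be checked numerically: if binary outcomes are i.i.d. given $\theta$ and $\theta$ has a prior, then every reordering of an outcome sequence has the same joint probability. The sketch below assumes a two-point prior on $\theta$ (values $1/4$ and $3/4$, each with probability $1/2$) purely for illustration; the function name `joint_prob` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

thetas = [Fraction(1, 4), Fraction(3, 4)]   # possible values of theta (assumed)
prior = [Fraction(1, 2), Fraction(1, 2)]    # prior p(theta) on those values (assumed)

def joint_prob(ys):
    """p(y_1, ..., y_n): average the i.i.d. likelihood over the prior on theta."""
    total = Fraction(0)
    for theta, weight in zip(thetas, prior):
        likelihood = Fraction(1)
        for y in ys:
            likelihood *= theta if y == 1 else 1 - theta
        total += weight * likelihood
    return total

sequence = (1, 0, 0)
# Every reordering of the sequence should have the same joint probability.
probs = {joint_prob(perm) for perm in permutations(sequence)}
print(probs)

# Exchangeable does not mean independent: compare p(1, 1) with p(1) * p(1).
p_one = joint_prob((1,))
p_one_one = joint_prob((1, 1))
print(p_one_one, p_one * p_one)
```

All orderings of `(1, 0, 0)` give the same probability, so the labels carry no information. At the same time $p(1,1)\neq p(1)p(1)$: the outcomes are exchangeable but not marginally independent, because observing one outcome changes what we believe about $\theta$.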

## Chapter 3: One-parameter models

