# Part 1 Fundamentals of Bayesian Inference

Bayesian inference is the process of fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model and on unobserved quantities such as predictions for new observations. In Chapters 1-3, we introduce several useful families of models and illustrate their application in the analysis of relatively simple data structures. Some mathematics arises in the analytical manipulation of the probability distributions, notably in transformation and integration in multiparameter problems. We differ somewhat from other introductions to Bayesian inference by emphasizing stochastic simulation, and the combination of mathematical analysis and simulation, as general methods for summarizing distributions. Chapter 4 outlines the fundamental connections between Bayesian and other approaches to statistical inference. The early chapters focus on simple examples to develop the basic ideas of Bayesian inference; examples in which the Bayesian approach makes a practical difference relative to more traditional approaches begin to appear in Chapter 3. The major practical advantages of the Bayesian approach appear in Chapter 5, where we introduce hierarchical models, which allow the parameters of a prior, or population, distribution themselves to be estimated from data.

## 1.1 The three steps of Bayesian data analysis

The process of Bayesian data analysis can be idealized by dividing it into the following three steps:

1. Setting up a full probability model: a joint probability distribution for all observable and unobservable quantities in a problem. The model should be consistent with knowledge about the underlying scientific problem and the data collection process.

2. Conditioning on observed data: calculating and interpreting the appropriate posterior distribution, that is, the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.

3. Evaluating the fit of the model and the implications of the resulting posterior distribution: how well does the model fit the data, are the substantive conclusions reasonable, and how sensitive are the results to the modeling assumptions in step 1? In response, one can alter or expand the model and repeat the three steps.
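
The three steps can be sketched on a very small problem. This is only an illustration under stated assumptions, not a general recipe: it assumes a binomial sampling model with a uniform prior on the success probability, made-up data (7 successes in 20 trials), and a grid approximation to the posterior. All function and variable names are hypothetical.

```python
import math
import random

# Hypothetical data: 7 successes in 20 trials (made up for illustration).
n, y_obs = 20, 7

# Step 1: full probability model. Assumed here: theta ~ Uniform(0, 1)
# and y | theta ~ Binomial(n, theta).
def prior(theta):
    return 1.0

def sampling_density(y, theta):
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

# Step 2: condition on the observed data. The posterior is approximated
# on a grid of theta values and normalized to sum to 1.
grid = [(i + 0.5) / 1000 for i in range(1000)]
unnorm = [prior(t) * sampling_density(y_obs, t) for t in grid]
total = sum(unnorm)
posterior = [u / total for u in unnorm]
post_mean = sum(t * p for t, p in zip(grid, posterior))

# Step 3: a crude check of fit. Simulate replicated data from the fitted
# model and see how often a replicate is at least as large as the observed count.
rng = random.Random(0)
thetas = rng.choices(grid, weights=posterior, k=2000)
replicates = [sum(rng.random() < t for _ in range(n)) for t in thetas]
tail_fraction = sum(r >= y_obs for r in replicates) / len(replicates)

print(f"posterior mean of theta: {post_mean:.3f}")
print(f"fraction of replicates >= observed: {tail_fraction:.2f}")
```

A tail fraction very close to 0 or 1 would suggest that the model has trouble reproducing the observed data, which would be a reason to return to step 1.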

A primary motivation for Bayesian thinking is that it facilitates a common-sense interpretation of statistical conclusions. For instance, a Bayesian probability interval for an unknown quantity of interest can be directly regarded as having a high probability of containing the unknown quantity, in contrast to a frequentist interval, which may strictly be interpreted only in relation to a sequence of similar inferences that might be made in repeated practice.
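
The direct interpretation of a Bayesian interval can be illustrated by simulation. The sketch below assumes a binomial model with a uniform prior, for which the posterior of the success probability is Beta(y + 1, n - y + 1). The data values and variable names are invented for illustration.

```python
import random

# Hypothetical data: y successes in n trials (invented for illustration).
n, y = 20, 7

# Under a uniform prior and a binomial sampling model (an assumption of this
# sketch), the posterior for theta is Beta(y + 1, n - y + 1).
rng = random.Random(1)
draws = sorted(rng.betavariate(y + 1, n - y + 1) for _ in range(10000))

# Central 95% posterior interval from the 2.5% and 97.5% sample quantiles.
lower = draws[len(draws) * 25 // 1000]
upper = draws[len(draws) * 975 // 1000]

# About 95% of the posterior draws fall inside the interval, so the statement
# "theta lies in [lower, upper] with probability 0.95" is a direct statement
# about theta, given the data and the model.
inside = sum(lower <= d <= upper for d in draws) / len(draws)
print(f"95% posterior interval: ({lower:.3f}, {upper:.3f})")
print(f"share of posterior draws inside: {inside:.3f}")
```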

## 1.2 General notation for statistical inference

Statistical inference is concerned with drawing conclusions, from numerical data, about quantities that are not observed. For example, a clinical trial of a new cancer drug might be designed to compare the five-year survival probability in a population given the new drug with that in a population under standard treatment. These survival probabilities refer to a large population of patients, and it is neither feasible nor ethically acceptable to experiment on an entire population. Therefore, inferences about the true probabilities and, in particular, their differences must be based on a sample of patients.

We distinguish between two kinds of estimands, that is, unobserved quantities for which statistical inferences are made: first, potentially observable quantities, such as future observations of a process, or the outcome under the treatment not received in the clinical trial example; and second, quantities that are not directly observable, that is, parameters that govern the hypothetical process leading to the observed data. The distinction between these two kinds of estimands is not always precise, but is generally useful as a way of understanding how a statistical model for a particular problem fits into the real world.

<b>Parameters, data, and predictions</b>

As general notation, we let θ denote unobservable vector quantities or population parameters of interest (such as the probabilities of survival under each treatment for randomly chosen members of the population in the example of the clinical trial), y denote the observed data (such as the numbers of survivors and deaths in each treatment group), and ỹ denote unknown, but potentially observable, quantities (such as the outcomes of the patients under the other treatment, or the outcome under each of the treatments for a new patient similar to those already in the trial). We generally use Greek letters for parameters, lower case Roman letters for observed or observable scalars and vectors, and upper case Roman letters for observed or observable matrices.

<b>Observational units and variables</b>

In many studies, data are gathered on each of a set of n objects or units, and we can write the data as a vector, y = (y1, ..., yn).

<b>Exchangeability</b>

The usual starting point of a statistical analysis is the assumption that the n values yi may be regarded as exchangeable, meaning that we express uncertainty as a joint probability density that is invariant to permutations of the indexes. A nonexchangeable model would be appropriate if information relevant to the outcome were conveyed in the unit indexes rather than by explanatory variables. The idea of exchangeability is fundamental to statistics, and we return to it repeatedly throughout the book.
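
A small numerical check can make the permutation invariance concrete. The sketch below assumes the simplest exchangeable model, in which the yi are independent draws from a common normal density given fixed parameters. This is a special case, since exchangeable models need not be independent. The data values and the parameters `mu` and `sigma` are hypothetical.

```python
import itertools
import math

def normal_pdf(value, mu, sigma):
    z = (value - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def joint_density(ys, mu=0.0, sigma=1.0):
    # Independent, identically distributed model: the joint density is a
    # product of identical factors, so reordering ys cannot change it.
    return math.prod(normal_pdf(v, mu, sigma) for v in ys)

y = (0.3, -1.2, 2.1, 0.8)  # hypothetical observations
densities = [joint_density(perm) for perm in itertools.permutations(y)]

# All orderings of the four values give the same joint density,
# up to floating-point rounding.
spread = max(densities) - min(densities)
print(f"joint density: {densities[0]:.6g}")
print(f"spread across {len(densities)} permutations: {spread:.2e}")
```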

<b>Explanatory variables</b>

It is common to have observations on each unit that we do not bother to model as random, such as age and previous health status. We call this second class of variables explanatory variables, or covariates, and label them x. We use X to denote the entire set of explanatory variables. Explanatory variables are discussed in detail in Chapter 8.

<b>Hierarchical modeling</b>

Hierarchical models are used when information is available on several different levels of observational units. It is possible to speak of exchangeability at each level of units. Hierarchical models are discussed in Chapter 5 and subsequent chapters.

## 1.3 Bayesian inference

Bayesian statistical conclusions about a parameter θ are made in terms of probability statements. These probability statements are conditional on the observed value of y, and in our notation are written simply as p(θ|y). We also implicitly condition on the known values of any covariates, x. It is at the fundamental level of conditioning on observed data that Bayesian inference departs from the approach to statistical inference described in many textbooks, which is based on a retrospective evaluation of the procedure used to estimate θ over the distribution of possible y values conditional on the true unknown value of θ.

<b>Probability notation</b>

1. p(.|.) denotes a conditional probability density, with the arguments determined by the context.
2. p(.) denotes a marginal distribution.

<b>Bayes' rule</b>

In order to make probability statements about θ given y, we must begin with a model providing a joint probability distribution for θ and y. The joint probability mass or density function can be written as a product of two densities that are often referred to as the prior distribution p(θ) and the sampling distribution p(y|θ): p(θ, y) = p(θ)p(y|θ).

Simply conditioning on the known value of the data y, using the property of conditional probability known as Bayes' rule, yields the posterior density:

p(θ|y) = p(θ)p(y|θ) / p(y)

where p(y) = Σ_θ p(θ)p(y|θ), with the sum taken over all possible values of θ. When θ is continuous, the sum is replaced by an integral: p(y) = ∫ p(θ)p(y|θ) dθ.
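
As a concrete instance of the discrete form of Bayes' rule, consider a parameter θ that takes one of two values. The prior and sampling probabilities below are invented for illustration, and the dictionary names are hypothetical.

```python
# Hypothetical two-valued parameter: theta = 1 or theta = 0.
prior = {1: 0.5, 0: 0.5}

# Assumed sampling probabilities p(y | theta) for the observed data y.
likelihood = {1: 0.25, 0: 1.0}

# p(y) is the sum of p(theta) * p(y | theta) over all values of theta.
p_y = sum(prior[t] * likelihood[t] for t in prior)

# Bayes' rule: p(theta | y) = p(theta) * p(y | theta) / p(y).
posterior = {t: prior[t] * likelihood[t] / p_y for t in prior}

print(f"p(y) = {p_y}")
print(f"p(theta = 1 | y) = {posterior[1]:.3f}")
print(f"p(theta = 0 | y) = {posterior[0]:.3f}")
```

With these numbers, p(y) = 0.125 + 0.5 = 0.625, so the posterior probability of θ = 1 is 0.125 / 0.625 = 0.2: the data shift probability away from the value of θ under which they were less likely.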

<b>Prediction</b>

To make inferences about an unknown observable, often called predictive inferences, we follow a similar logic. Before the data y are considered, the distribution of the unknown but observable y is

p(y) = ∫ p(y, θ) dθ = ∫ p(θ)p(y|θ) dθ

This is often called the marginal distribution of y, but a more informative name is the prior predictive distribution: prior because it is not conditional on a previous observation of the process, and predictive because it is the distribution for a quantity that is observable.
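
The prior predictive distribution can also be approximated by simulation: draw θ from the prior, then draw y from the sampling distribution given that θ. The sketch below assumes a uniform prior on θ and a binomial sampling model with 5 trials, both chosen only for illustration. Under these assumptions the exact prior predictive distribution is uniform on 0, 1, ..., n, which gives a value to compare against.

```python
import random
from collections import Counter

n = 5                 # number of trials (hypothetical)
num_draws = 60000
rng = random.Random(2)

def draw_prior_predictive():
    # theta ~ p(theta), assumed Uniform(0, 1) in this sketch
    theta = rng.random()
    # y ~ p(y | theta), assumed Binomial(n, theta) in this sketch
    return sum(rng.random() < theta for _ in range(n))

counts = Counter(draw_prior_predictive() for _ in range(num_draws))
prior_predictive = {y: counts[y] / num_draws for y in range(n + 1)}

# Each estimated probability should be close to 1 / (n + 1).
for y, prob in prior_predictive.items():
    print(y, round(prob, 3))
```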

<b>Likelihood</b>

Using Bayes' rule with a chosen probability model means that the data y affect the posterior inference only through p(y|θ), which, when regarded as a function of θ for fixed y, is called the likelihood function. In this way Bayesian inference obeys what is sometimes called the likelihood principle.
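
One way to see the likelihood principle at work is that two sampling models whose likelihoods are proportional as functions of θ lead to the same posterior under the same prior. The sketch below compares a binomial model (number of trials fixed in advance) with a negative binomial model (sampling until a fixed number of successes), assuming a uniform prior and a grid approximation. The data values and function names are hypothetical.

```python
import math

successes, trials = 3, 12  # hypothetical data

def binomial_likelihood(theta):
    # Number of trials fixed in advance; the number of successes is random.
    coef = math.comb(trials, successes)
    return coef * theta**successes * (1 - theta)**(trials - successes)

def negative_binomial_likelihood(theta):
    # Sampling continues until a fixed number of successes; the number of
    # trials is random.
    coef = math.comb(trials - 1, successes - 1)
    return coef * theta**successes * (1 - theta)**(trials - successes)

def grid_posterior(likelihood, grid):
    # Uniform prior (an assumption of this sketch), so the posterior is
    # proportional to the likelihood.
    unnorm = [likelihood(t) for t in grid]
    total = sum(unnorm)
    return [u / total for u in unnorm]

grid = [(i + 0.5) / 200 for i in range(200)]
post_binomial = grid_posterior(binomial_likelihood, grid)
post_negative_binomial = grid_posterior(negative_binomial_likelihood, grid)

# The two likelihoods differ only by a constant factor, which cancels
# when the posterior is normalized.
max_diff = max(abs(a - b) for a, b in zip(post_binomial, post_negative_binomial))
print(f"largest difference between the two posteriors: {max_diff:.2e}")
```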

<b>Likelihood and odds ratios</b>

The ratio of the posterior density p(θ|y) evaluated at the points θ1 and θ2 under a given model is called the posterior odds for θ1 compared to θ2. By Bayes' rule, the posterior odds equal the prior odds, p(θ1)/p(θ2), multiplied by the likelihood ratio, p(y|θ1)/p(y|θ2). The most familiar application of this concept is with discrete parameters, with θ2 taken to be the complement of θ1.
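
The posterior odds can be checked numerically against the product of the prior odds and the likelihood ratio, since the normalizing constant p(y) cancels in the ratio. The probabilities below are made up for two parameter values θ1 and θ2, and the variable names are hypothetical.

```python
# Hypothetical prior probabilities and likelihood values at theta1 and theta2.
prior_theta1, prior_theta2 = 0.2, 0.8
lik_theta1, lik_theta2 = 0.9, 0.3   # p(y | theta1), p(y | theta2)

# Posterior probabilities from Bayes' rule (theta takes only these two values here).
p_y = prior_theta1 * lik_theta1 + prior_theta2 * lik_theta2
post_theta1 = prior_theta1 * lik_theta1 / p_y
post_theta2 = prior_theta2 * lik_theta2 / p_y

posterior_odds = post_theta1 / post_theta2
prior_odds = prior_theta1 / prior_theta2
likelihood_ratio = lik_theta1 / lik_theta2

# p(y) cancels in the ratio, so the two printed values agree
# up to floating-point rounding.
print(f"posterior odds:                {posterior_odds:.4f}")
print(f"prior odds x likelihood ratio: {prior_odds * likelihood_ratio:.4f}")
```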


## 1.4 Discrete examples: genetics and spell checking

## 1.5 Probability as a measure of uncertainty

## 1.6 Example: probabilities from football point spreads

## 1.7 Example: calibration for record linkage

## 1.8 Some useful results from probability theory

## 1.9 Computation and software

## 1.10 Bayesian inference in applied statistics

## 1.11