## Classical or frequentist

- Probabilities refer to relative frequencies of events. They are objective properties of the real world.

- Parameters (such as the fraction of coin flips, for a certain coin, that are heads) are fixed, unknown constants. Because they are not fluctuating, probability statements about parameters are meaningless.

- Statistical procedures should have well-defined long-run frequency properties. For example, a 95% confidence interval should bracket the true value of the parameter with a limiting frequency of at least 95%.
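
The long-run frequency property of a confidence interval can be checked by simulation. The sketch below is only an illustration under stated assumptions: the data are Gaussian, $\sigma$ is known, and the values of `mu`, `sigma`, `n` and `n_trials` are hypothetical choices, not taken from the text. It repeats the experiment many times and counts how often the interval brackets the true mean.

```python
import math
import random

def coverage(n_trials=2000, n=25, mu=10.0, sigma=2.0, z=1.96, seed=0):
    """Fraction of repeated experiments whose 95% interval contains mu."""
    rng = random.Random(seed)
    # Half-width of the known-sigma interval: z * sigma / sqrt(n).
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Does this experiment's interval bracket the true value?
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / n_trials

frac = coverage()
print(f"empirical coverage: {frac:.3f}")
```

The printed fraction should be close to 0.95, and it fluctuates slightly with the seed and the number of trials. Note that the probability statement is about the procedure (the interval changes from experiment to experiment), not about the fixed parameter `mu`.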

## Bayesian


- Probability describes the degree of subjective belief, not the limiting frequency. Probability statements can be made about things other than data, including model parameters and models themselves.

- Inferences about a parameter are made by producing its probability distribution, which quantifies the uncertainty of our knowledge about that parameter. Various point estimates, such as the expectation value, may then be readily extracted from this distribution.

**Example**: A group of scientists have developed a serum that turns sunflowers into roses. They want to study the rate at which this process occurs. All 30 sunflowers in the study are given the serum, and 18 show signs of changing into roses.

The observed success rate is thus 18/30 = 0.6.

Both the frequentist and the Bayesian approach deal with the uncertainty of estimates. The biggest difference is whether we may assign a probability to a fixed but unknown aspect of the universe, such as the true success rate of the serum.
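
To make the contrast concrete, here is a minimal sketch for the serum example, using the counts behind the 0.6 success rate quoted above (18 successes in 30 trials). The frequentist summary is the observed relative frequency. The Bayesian summary is a full distribution over the rate, here computed on a grid under a flat prior. The flat prior, the grid size `m`, and the helper `quantile` are assumptions made for illustration, not part of the original example.

```python
k, n = 18, 30  # successes and trials behind the 0.6 rate quoted above

# Frequentist point estimate: the observed relative frequency.
rate_hat = k / n

# Bayesian: evaluate likelihood x flat prior on a grid of candidate rates.
m = 10001
grid = [i / (m - 1) for i in range(m)]
unnorm = [t**k * (1 - t) ** (n - k) for t in grid]
total = sum(unnorm)
post = [u / total for u in unnorm]  # posterior mass on each grid point

# A point estimate extracted from the posterior: its expectation value.
post_mean = sum(t * p for t, p in zip(grid, post))

def quantile(q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    acc = 0.0
    for t, p in zip(grid, post):
        acc += p
        if acc >= q:
            return t
    return grid[-1]

# Central 95% credible interval for the rate.
lo, hi = quantile(0.025), quantile(0.975)

print(f"frequentist estimate: {rate_hat:.3f}")
print(f"posterior mean:       {post_mean:.3f}")
print(f"95% credible region:  ({lo:.3f}, {hi:.3f})")
```

With a flat prior the posterior mean is $(k+1)/(n+2)$ rather than $k/n$, so the two point estimates are close but not identical. The credible interval is a probability statement about the rate itself, which is exactly the kind of statement the frequentist view does not permit.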

## Maximum likelihood estimation (MLE)

### MLE approach

1. **The model**: Define a model $M$ with likelihood $p(D|M)$, which is a hypothesis about how the data $D$ are generated. The accuracy of the resulting inferences relies heavily on the quality of this hypothesis, that is, on how well the model describes the actual data-generating process. Models are denoted $M(\theta)$, where $\theta$ is a set of model parameters.

2. **Parameter estimates**: Search for the model parameters $\theta$ that *maximize* $p(D|M(\theta))$. This yields the MLE *point estimates*.

3. **Confidence intervals**: Determine the confidence region for the model parameters.

4. **Hypothesis testing**: Perform *hypothesis tests* as needed to draw conclusions about models and point estimates.
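
The first three steps can be sketched for a simple case. The example below assumes a Bernoulli model for coin flips, where $\theta$ is the probability of heads. The data in `flips` are hypothetical, the grid search stands in for whatever optimizer one would actually use, and the interval in step 3 is the normal-approximation (Wald) interval, which is only one common choice. Step 4 is omitted.

```python
import math

# Hypothetical data: 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(theta, data):
    """Step 1: ln p(D | M(theta)) for independent Bernoulli(theta) flips."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

# Step 2: brute-force search for the theta that maximizes the likelihood.
# The grid excludes 0 and 1 so that log() is always defined.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, flips))

# Step 3: approximate 95% confidence interval (normal approximation).
n = len(flips)
se = math.sqrt(theta_hat * (1.0 - theta_hat) / n)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)

print(f"MLE of theta: {theta_hat:.3f}")
print(f"approx. 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

For this model the maximum can also be found analytically as the fraction of heads, so the grid search is purely illustrative. Working with the logarithm of the likelihood does not move the maximum, because the logarithm is monotonic.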

### Likelihood function

Given a known or assumed data-generating process (i.e., the distribution from which our sample was drawn), we can calculate the probability, or likelihood, of observing any given value. For example, if our data $\{x_i\}$ are drawn from a Gaussian parent distribution, the likelihood of a given value $x_i$ is

$$p(x_i|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right).$$

We assume $x_1, \ldots, x_n$ are independent and identically distributed random samples from a pmf or pdf (they would **not** be independent if, for example, they were drawn without replacement from a small parent sample). This assumption allows us to define the likelihood of the entire data set, $L$, as the *product* of the likelihoods of the individual values,

$$L \equiv p(\{x_i\}|M(\theta)) = \prod \limits_{i=1}^n p(x_i|M(\theta)),$$

where $M$ is the model, $\theta$ is the unknown parameter (or set of parameters), and $x_1, \ldots, x_n$ are the random samples. Essentially, the likelihood function is the joint pdf or pmf of $x_1, \ldots, x_n$, viewed as a function of $\theta$.
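
As a minimal sketch of the product form, the code below evaluates the Gaussian likelihood of a small data set both as a product of pdf values and as a sum of log-pdf values. The data in `data`, the value of `sigma`, and the function names are hypothetical and chosen only for illustration; $\sigma$ is assumed known.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian pdf p(x | mu, sigma)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / norm

def likelihood(data, mu, sigma):
    """L = product over i of p(x_i | mu, sigma), assuming i.i.d. samples."""
    L = 1.0
    for x in data:
        L *= gaussian_pdf(x, mu, sigma)
    return L

def log_likelihood(data, mu, sigma):
    """ln L = sum over i of ln p(x_i | mu, sigma)."""
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-log_norm - (x - mu) ** 2 / (2.0 * sigma**2) for x in data)

data = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements
sigma = 0.25                      # assumed known
mean = sum(data) / len(data)

# Compare the likelihood at the sample mean with two nearby values of mu.
for mu in (mean - 0.1, mean, mean + 0.1):
    print(f"mu={mu:.2f}  L={likelihood(data, mu, sigma):.4g}  "
          f"lnL={log_likelihood(data, mu, sigma):.4f}")
```

Of the three values of `mu` tried, the sample mean gives the largest likelihood. In practice the log form is usually preferred, since a product of many small pdf values can underflow in floating point.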



### Applied to homoscedastic Gaussian likelihood
### Properties of maximum likelihood estimators
### Confidence intervals
### Applied to heteroscedastic Gaussian likelihood
### Case of truncated or censored data
### Other cost functions and robustness

## Goodness of fit and model selection
### The goodness of fit for a model