# In Brief Notes

This notebook will give a quick summary of all the material in the chapters. 

## Chapter 1: Basic concepts of Bayes

* Backward looking - exploratory data analysis (descriptive stats, visualisation)
* Forward looking - Inferential stats

Here we use inferential statistics (IS) methods, then EDA to summarise, interpret, check and communicate the results.

Generating data is a **stochastic process**: there is always uncertainty involved.

Bayesian modelling has 3 steps
1. Design a model by combining probability distributions like Lego bricks
2. Use Bayes' theorem to condition the model on the data (combine model with data)
3. Criticise the model

Some basic stuff on probabilities, which should be familiar by now. To summarise:

> Probabilities are used to measure the uncertainty we have about parameters, and Bayes' theorem is the mechanism to correctly update those probabilities in light of new data, hopefully reducing our uncertainty.
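
As a rough illustration of that updating, here is a minimal sketch of Bayes' theorem on a grid for a coin's bias. The flat prior, the grid size and the function names are assumptions made for this example, not something taken from the chapter.

```python
def grid_posterior(heads, tails, grid_size=101):
    """Posterior over a coin's bias theta on a grid (flat prior assumed)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size                          # flat prior: an assumption
    likelihood = [t ** heads * (1 - t) ** tails for t in grid]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)                          # normalising constant
    return grid, [u / total for u in unnormalised]


def posterior_sd(grid, posterior):
    """Standard deviation of a discrete posterior: a measure of uncertainty."""
    mean = sum(t * p for t, p in zip(grid, posterior))
    var = sum((t - mean) ** 2 * p for t, p in zip(grid, posterior))
    return var ** 0.5


# Same proportion of heads, but ten times more data in the second case.
grid, post_small = grid_posterior(heads=3, tails=1)
_, post_large = grid_posterior(heads=30, tails=10)
sd_small = posterior_sd(grid, post_small)
sd_large = posterior_sd(grid, post_large)
```

With more data the posterior gets narrower, so `sd_large` comes out smaller than `sd_small`.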

You can use Kruschke diagrams to represent models.

The Posterior is the outcome of Bayes' theorem. Usually this is what you report on, giving various averages and spreads.

The Highest Posterior Density (HPD) interval is a common summary. It's the shortest possible interval of the x-axis which contains X% of the probability density. It allows you to make a statement like 'we think the parameter theta is between 2 and 5, with a probability of 0.95'. Don't confuse this with a frequentist confidence interval, which does not support that kind of probability statement about the parameter.
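
The 'shortest interval' idea can be sketched directly on posterior samples. This is a minimal version that assumes a unimodal posterior; `hpd_interval` is a hypothetical helper written for these notes, and the Beta draws are just stand-in samples.

```python
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, int(round(mass * n)))   # number of samples inside the interval
    # Slide a window of that many samples along the sorted draws
    # and keep the narrowest one.
    best_start = min(range(n - window + 1),
                     key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + window - 1]


rng = random.Random(0)
samples = [rng.betavariate(8, 4) for _ in range(5000)]   # stand-in posterior draws
low, high = hpd_interval(samples, mass=0.95)
```

In practice you would let ArviZ compute this for you; the sketch is only meant to show what the interval is.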

You can use the posterior to generate the **posterior predictive**. You can then use this to conduct **posterior predictive checks**, PPCs.
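
A minimal sketch of the idea, assuming a coin-flip model with a flat Beta(1, 1) prior so that the posterior is available in closed form (the data and variable names here are made up for illustration):

```python
import random

rng = random.Random(1)
n_flips, observed_heads = 10, 7

# Assumed conjugate setup: flat Beta(1, 1) prior + binomial likelihood
# gives a Beta(1 + heads, 1 + tails) posterior for theta.
posterior_draws = [
    rng.betavariate(1 + observed_heads, 1 + n_flips - observed_heads)
    for _ in range(2000)
]

# Posterior predictive: for each theta drawn, simulate a new dataset.
predicted_heads = [
    sum(rng.random() < theta for _ in range(n_flips))
    for theta in posterior_draws
]

# A simple check: how often does simulated data show at least
# as many heads as the observed data?
p_at_least_observed = (
    sum(h >= observed_heads for h in predicted_heads) / len(predicted_heads)
)
```

If the observed data sat far out in the tails of `predicted_heads`, that would be a hint the model is missing something.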

## Chapter 2: PyMC3 (and ArviZ)

PyMC3 is a Probabilistic Programming Language (PPL) - a language for creating models and running inference on them numerically. It acts as an *Inference Button*.

A typical pattern
1. Specify priors for your parameters
2. Specify your likelihood and pass it your data
3. Create a trace with `sample`
4. Run ArviZ `plot_trace` and `summary` on the trace
5. Plot the posterior with its HPD using `az.plot_posterior`

**Region of practical equivalence (ROPE)** is an interval you choose around a value of interest, within which you consider any value practically equivalent to it (you might say a coin should be considered fair if its probability of coming up heads is between 0.45 and 0.55). You can compare your HPD with your ROPE to judge your posterior.
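
One way to write down the HPD versus ROPE comparison is the three-way rule below. The function name and the labels it returns are assumptions for this sketch; the rule itself is the usual 'inside / outside / overlapping' reading.

```python
def compare_hpd_to_rope(hpd, rope):
    """Classify an HPD interval against a ROPE; both are (low, high) tuples."""
    hpd_low, hpd_high = hpd
    rope_low, rope_high = rope
    if hpd_high < rope_low or hpd_low > rope_high:
        return "outside"      # no overlap: the value of interest is not credible
    if rope_low <= hpd_low and hpd_high <= rope_high:
        return "inside"       # HPD fully within ROPE: practically equivalent
    return "undecided"        # partial overlap: no firm conclusion


rope = (0.45, 0.55)           # the 'fair coin' ROPE from the text
verdict_narrow = compare_hpd_to_rope((0.47, 0.53), rope)
verdict_biased = compare_hpd_to_rope((0.60, 0.80), rope)
verdict_wide = compare_hpd_to_rope((0.50, 0.70), rope)
```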

**Loss functions** are another way to analyse the posterior. The idea is to capture how different the true parameter value and the estimated value are. A larger loss value means a worse estimate.
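
A small sketch of how a loss function turns a posterior into a point estimate: average the loss over posterior samples and pick the estimate that minimises it. The skewed Beta draws and the candidate grid are assumptions for illustration. For these two losses the minimisers should land near the posterior median (absolute loss) and mean (quadratic loss).

```python
import random
import statistics

rng = random.Random(2)
samples = [rng.betavariate(2, 8) for _ in range(2000)]   # stand-in, skewed posterior


def absolute_loss(true_value, estimate):
    return abs(true_value - estimate)


def quadratic_loss(true_value, estimate):
    return (true_value - estimate) ** 2


def expected_loss(estimate, samples, loss):
    """Average loss of `estimate`, treating each sample as the 'true' value."""
    return sum(loss(s, estimate) for s in samples) / len(samples)


candidates = [i / 200 for i in range(201)]               # grid of point estimates
best_abs = min(candidates, key=lambda c: expected_loss(c, samples, absolute_loss))
best_quad = min(candidates, key=lambda c: expected_loss(c, samples, quadratic_loss))
```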

The Gaussian is useful because of the central limit theorem (CLT). But the Gaussian can be oversensitive to outliers. It can be desensitised using Student's t distribution (where the degrees of freedom parameter is also assigned a prior).
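
To see why the heavier tails help, this sketch compares how strongly each likelihood penalises a point 6 scale units from the centre. The log-density formulas are the standard ones; the choice of 3 degrees of freedom and of the outlier position are assumptions for the example.

```python
import math


def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log density of a Gaussian."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Log density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


outlier = 6.0
# Drop in log-likelihood when a point moves from the centre to the outlier.
penalty_normal = normal_logpdf(0.0) - normal_logpdf(outlier)
penalty_t = student_t_logpdf(0.0, nu=3) - student_t_logpdf(outlier, nu=3)
```

The Gaussian penalty is much larger, so a Gaussian model has to shift its mean (or inflate its standard deviation) to accommodate the outlier, while the t model can largely ignore it.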

>***No notes from comparing groups, effect size and hierarchical models and shrinkage - sure I did this?!?***

## Chapter 3: Linear Regression

Data in the form of pairs of observations: $\{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\}$.

If x and y are both 1d lists, then you have **Simple Linear Regression**

If x is multidimensional you have **Multiple Linear Regression**

### Simple linear regression
$y_i = \alpha + x_i \beta$ is the core of LinReg - the expression of a 'linear' relationship between x and y. The goal is to estimate the params alpha and beta. A traditional method is least squares fitting. Another (used here) is to build a probabilistic model.

$y \sim N\left(\mu = \alpha + x\beta, \epsilon\right)$, where alpha and beta have normal priors and epsilon has a half-Cauchy or uniform prior.

With these models alpha and beta tend to be highly correlated in the posterior, which makes sampling inefficient (the trace shows high *autocorrelation*). This is a logical consequence of the model: the fitted line has to pass through the 'center' of the data (the mean of x and the mean of y), so the line effectively pivots around that point. If you think about what happens to the slope and y-intercept when it pivots, it's easy to see that changes in alpha and beta must be correlated. A way around this is to center your x data by subtracting its mean, so the pivot point sits at x = 0 and the line always crosses the y-axis at around the same value.
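
The effect of centering can be checked without any sampling, under simplifying assumptions: with flat priors and a known noise scale, the posterior covariance of (alpha, beta) is proportional to the inverse of the matrix [[n, sum x], [sum x, sum x^2]], so the correlation depends only on x. The helper below is a sketch under those assumptions, not the PyMC3 model from the chapter.

```python
import random


def intercept_slope_correlation(x):
    """Posterior correlation of (alpha, beta), assuming flat priors and a
    known noise scale. Comes from inverting [[n, sum x], [sum x, sum x^2]]."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    return -sum_x / (n * sum_x2) ** 0.5


rng = random.Random(3)
x = [rng.uniform(10, 20) for _ in range(50)]    # x values far from zero
x_mean = sum(x) / len(x)
x_centered = [v - x_mean for v in x]            # subtract the mean

corr_raw = intercept_slope_correlation(x)
corr_centered = intercept_slope_correlation(x_centered)
```

For the uncentered data the correlation is close to -1, and after centering it is essentially zero.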

**Correlation coefficients** measure the degree of linear dependence between two variables. The *Pearson* coefficient, r, is common. For simple linear regression, the *coefficient of determination* is r^2.

You can estimate r by the formula

$$r = \beta \frac{\sigma_x}{\sigma_y}$$
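
A quick numerical check of this relation, using the least squares slope as a stand-in for beta (in the Bayesian model you would plug in posterior values instead). The synthetic data and seed are assumptions for the example.

```python
import random
import statistics

rng = random.Random(4)
x = [rng.gauss(0, 2) for _ in range(200)]
y = [1.5 + 0.8 * xi + rng.gauss(0, 1) for xi in x]

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

beta = s_xy / s_xx                            # least squares slope
r_direct = s_xy / (s_xx * s_yy) ** 0.5        # Pearson r from its definition
r_from_beta = beta * statistics.stdev(x) / statistics.stdev(y)
r_squared = r_direct ** 2                     # coefficient of determination
```

Both routes give the same r, up to floating point error.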

You can also estimate the Pearson coefficient by estimating the values in the covariance matrix.

$$\Sigma = \begin{bmatrix}
    \sigma_{x1}^2 & \rho\sigma_{x1}\sigma_{x2} \\
    \rho\sigma_{x1}\sigma_{x2} & \sigma_{x2}^2 
    \end{bmatrix}$$
    
where $\rho$ is the Pearson correlation coefficient between the two variables (you would have one rho for each pair of variables).
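
As a small sketch of how the matrix is assembled from the two standard deviations and rho, and how rho can be read back out of it (function names are hypothetical, chosen for this example):

```python
def covariance_matrix(sigma_1, sigma_2, rho):
    """2x2 covariance matrix from two standard deviations and a correlation."""
    off_diagonal = rho * sigma_1 * sigma_2
    return [[sigma_1 ** 2, off_diagonal],
            [off_diagonal, sigma_2 ** 2]]


def correlation_from_covariance(cov):
    """Recover rho by dividing the covariance by both standard deviations."""
    return cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5


cov = covariance_matrix(sigma_1=2.0, sigma_2=0.5, rho=0.6)
rho_recovered = correlation_from_covariance(cov)
```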

To estimate this in PyMC3 you would use a Multivariate Normal as your likelihood:

$y \sim MvN(\mu, \Sigma)$

$\mu \sim N([\bar{x},\bar{y}], 10)$

$\sigma_1 \sim HalfN(10)$

$\sigma_2 \sim HalfN(10)$

$\rho \sim U(-1,1)$

As with single variable problems, you can use Student's t in place of the Normal to desensitise the model to outliers.

As with single variable problems, you can create hierarchical models with hyper-priors.

### Polynomial Linear Regression
Simple linear regression fits a straight line $\alpha + \beta x$. Polynomial lin reg looks at higher order equations, i.e. curves, generalising the formula to

$$\mu = \beta_0 x^0 + \beta_1 x^1 + \dots + \beta_m x^m$$

The modelling is basically the same as simple linear regression - you just have more parameters (more $\beta$'s). The interpretation is not so simple as slope and y-intercept.
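
A minimal sketch of the polynomial mean function, with made-up coefficients. `design_row` is a hypothetical helper showing the powers of x that each beta multiplies.

```python
def polynomial_mean(x, betas):
    """Evaluate mu = beta_0 x^0 + beta_1 x^1 + ... + beta_m x^m."""
    return sum(b * x ** power for power, b in enumerate(betas))


def design_row(x, order):
    """Powers of x from 0 up to `order`: one row of the design matrix."""
    return [x ** power for power in range(order + 1)]


betas = [1.0, -2.0, 0.5]                  # illustrative values, order-2 polynomial
mu_at_3 = polynomial_mean(3.0, betas)     # 1 - 2*3 + 0.5*9
row = design_row(2.0, order=3)
```

The model is still linear in the betas, which is why the modelling barely changes: only the columns fed into it do.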

### Multiple linear regression
So far we've looked at an x and y pair, where we want to use x to predict y. Call x your independent variable, IV, y your dependent variable, DV.

If your model has several IVs, you have **multiple lin reg**. If you have several DVs it's **multivariate lin reg**.

$$\mu = \beta_1 x_1 + \dots + \beta_m x_m = \sum_{i=1}^m \beta_i x_i $$

From a modelling perspective it just looks like simple LinR, vectorised.
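
'Vectorised' here just means each observation's mean is a dot product between its row of IVs and the vector of betas. A plain-Python sketch with made-up numbers:

```python
def linear_mean(X, betas):
    """mu for each row of X, as the dot product of that row with betas."""
    return [sum(b * x for b, x in zip(betas, row)) for row in X]


X = [[1.0, 2.0, 0.0],      # each row is one observation with three IVs
     [0.5, -1.0, 3.0]]
betas = [2.0, 1.0, -1.0]   # illustrative coefficients
mu = linear_mean(X, betas)
```

With NumPy or PyMC3 the same thing is usually written as a single matrix-vector product.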

Be careful of multicollinearity with multiple LinR - if you have x1, x2 IVs, and y DV, but x1 and x2 are very highly correlated with each other, then you won't be able to pin down beta1 and beta2 (they will be *indeterminate*). If x1 and x2 are basically the same variable x, then $\mu = \beta_1 x_1 + \beta_2 x_2 = (\beta_1 + \beta_2) x$, so only the sum is constrained by the data: beta1 can take any real value, as long as beta2 shifts by an equal and opposite amount to compensate. In this case you should probably drop one of the IVs.
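
The extreme case is easy to check numerically: with two identical predictors, any pair of betas with the same sum gives exactly the same mean, so the data cannot tell them apart. The values below are made up for illustration.

```python
x = [0.5, 1.0, 2.0, 3.5]
x1, x2 = x, list(x)              # two identical predictors (the extreme case)


def mean_line(beta_1, beta_2):
    """mu = beta_1 * x1 + beta_2 * x2 for every observation."""
    return [beta_1 * a + beta_2 * b for a, b in zip(x1, x2)]


mu_a = mean_line(1.0, 2.0)       # beta_1 + beta_2 = 3
mu_b = mean_line(-4.0, 7.0)      # beta_1 + beta_2 = 3
mu_c = mean_line(3.0, 0.0)       # same as dropping x2 entirely
```

All three give the same `mu`, which is why dropping one of the IVs loses nothing here.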

