

# What is statistics?
The tasks that unify all research are:

- to decide on the type and structure of data to collect in order to best prove (or disprove) your hypotheses
- to infer conclusions from the data that has been collected
- to present your data and its support for your conclusions

>**Statistics** is the subject invented to deal with data and tackle sampling, inference and presentation in a logical way.

The workshop discusses only a small selection of ideas behind the inference arm of statistics. Its central aims are:

- To discuss the vices and virtues of Bayesian and frequentist methodologies
- To teach the fundamentals of constructing Bayesian models and the algorithms that solve them
- To introduce the Stan probabilistic programming language, specifically its Python interface

<img src="graphs/rippingIntoStudents.png" width="500">

# Bayesian vs Frequentist



We will never be able to collect the ideal data set; pragmatism will always be the order of the day. The underlying usefulness of statistical inference is its ability to quantify the uncertainty in our decisions and conclusions given an imperfect data set. Quantifying what you do not know is, understandably, a non-trivial task, and there are two main schools of thought on how to do it.

Frequentist statistics, otherwise known as classical statistics, is based on the assumption that an ideal distribution actually exists for any given data problem. If we were able to sample an infinite number of data sets under identical conditions, we would reveal this 'population' distribution exactly. Any variation between data sets is simply due to the finite nature of our sampling. In the frequentist framework “the parameters are fixed and the data is random”.

Obviously, you cannot sample an infinite number of times. Therefore, for a frequentist, if you do not already know the parameters defining the population distribution, you can *never* know them. You can only make statements about your samples, not about the population distribution and its parameters. Whether you are doing a hypothesis test, computing a maximum likelihood estimate or constructing a confidence interval, the result only gives you information about your sample, **not** the population.

- You can come up with a hypothesis for the population parameter(s) and ask how probable a sample like yours would be if it came from that distribution.

You conduct a binomial or $\chi^2$ test and, let us say, the resulting p-value is low ($\leq 0.05$). This means that if you drew 100 samples from the hypothesised population distribution, only around 5 would be at least as extreme as the sample you actually got. Typical logic leads you to say that "since it was so unlikely to get a sample of this type from this distribution, our hypothesis is wrong". However, this is still only a statement about samples, not about the hypothesis. It is impossible to know whether your sample is one of those rare 5%, and therefore, ultimately, whether your hypothesis is right or wrong.
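To make the "out of 100 samples" reading concrete, here is a minimal simulation sketch in plain Python. It assumes a hypothetical coin-flip experiment (20 flips, 15 heads observed, null hypothesis p = 0.5, one-sided test); none of these numbers come from the workshop. It estimates the p-value as the fraction of samples drawn under the null that are at least as extreme as the observed one, and compares that with the exact binomial tail probability.

```python
import random
from math import comb

# Hypothetical experiment (illustrative numbers): 20 coin flips, 15 heads observed.
N_FLIPS = 20
OBSERVED_HEADS = 15
P_NULL = 0.5  # null hypothesis: the coin is fair


def exact_p_value(k, n, p):
    """One-sided binomial p-value: P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def simulated_p_value(k, n, p, n_sims=20_000, seed=1):
    """Fraction of samples drawn under the null that are at least as extreme as k."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        heads = sum(rng.random() < p for _ in range(n))  # one simulated experiment
        if heads >= k:
            extreme += 1
    return extreme / n_sims


exact = exact_p_value(OBSERVED_HEADS, N_FLIPS, P_NULL)
simulated = simulated_p_value(OBSERVED_HEADS, N_FLIPS, P_NULL)
print(f"exact p-value:     {exact:.4f}")  # 0.0207
print(f"simulated p-value: {simulated:.4f}")
```

Both numbers land close to 0.02: only about 2 in 100 fair-coin experiments produce 15 or more heads. Note that the simulation only ever samples *under* the hypothesis, so it says nothing about whether the hypothesis itself is true.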

- You could take a sample and estimate the value(s) of the population parameter(s) that make this sample most likely (maximum likelihood estimation, MLE).

This time you do get an estimate of the population distribution from a sample. Unfortunately, you have no way of knowing whether your MLE is anything like the actual population parameter(s), because you have no way of telling whether your sample is a typical one. You have to assume the sample you have is representative in order to accept that your MLE is close to the actual value.
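A minimal sketch of maximum likelihood estimation, assuming a hypothetical binomial sample of 4 successes in 10 trials: it evaluates the log-likelihood on a grid of candidate values of p and keeps the one with the largest value. For the binomial the maximum is known analytically to be k/n, so the grid search should agree with 4/10. The output is a single point; nothing in it says how far that point is from the true population value.

```python
from math import comb, log

# Hypothetical sample (illustrative numbers): k successes out of n trials.
k, n = 4, 10


def binomial_log_likelihood(p, k, n):
    """Log of P(k successes | n trials, success probability p)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)


# Candidate values strictly inside (0, 1) so the logarithms are defined.
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))

print(f"grid-search MLE:  {p_mle:.3f}")  # 0.400
print(f"analytic MLE k/n: {k / n:.3f}")  # 0.400
```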

- Finally, you could take a sample of measurements and produce a confidence interval, such as "we conclude that the mean height of UK males has a 95% confidence interval of $1.72m \leq \mu \leq 1.77m$".

Most would interpret this as a 95% chance that the population mean of male heights lies within $1.72m \leq \mu \leq 1.77m$. They would be wrong, because that is a statement about the population parameters, something a frequentist cannot make. We can comment only on the samples we have taken. A 95% confidence interval means that if we took numerous samples and calculated a confidence interval for each, 95% of those intervals would contain the true mean. Again, you have no way of knowing whether your particular interval is among the 95% of correct intervals or the 5% of useless ones.
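The repeated-sampling meaning of a 95% confidence interval can be checked by simulation. The sketch below assumes a hypothetical normal population of heights with a known mean of 1.75 m and standard deviation of 0.07 m (illustrative values, not real survey data). It draws many samples of 50 people, builds a z-interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ for each, and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import fmean

# Hypothetical population (illustrative values, not real UK data).
TRUE_MEAN = 1.75  # metres
TRUE_SD = 0.07    # metres, assumed known so a z-interval can be used
SAMPLE_SIZE = 50
Z_95 = 1.96       # two-sided 95% critical value of the standard normal


def confidence_interval(sample, sd, z=Z_95):
    """z-interval for the mean when the population standard deviation is known."""
    centre = fmean(sample)
    half_width = z * sd / sqrt(len(sample))
    return centre - half_width, centre + half_width


def coverage(n_repeats=5_000, seed=42):
    """Fraction of intervals, one per simulated sample, that contain TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
        low, high = confidence_interval(sample, TRUE_SD)
        if low <= TRUE_MEAN <= high:
            hits += 1
    return hits / n_repeats


rate = coverage()
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The rate comes out close to 0.95, yet any single interval either contains 1.75 m or it does not; the simulation cannot tell you which kind your one real interval is.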

<img src="graphs/frequentistTests.jpg" width="500">

Bayesian statistics rejects the frequentist assumption that, unless all the information is known (i.e. infinite sampling), we cannot talk about the population distribution itself. Instead, the parameter(s) of the population distribution are treated as random variables rather than permanently fixed quantities. We can then define a probability distribution expressing our 'belief' that the parameters take particular values. Bayesian methods start by collating all the information known before the experiment and then update it with the information gained from the newly collected data, to determine whether the hypotheses are now more or less likely than before. In the Bayesian framework “the data is fixed but the parameters are random”.
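The updating step described above is Bayes' theorem. Writing $\theta$ for the parameter(s) and $D$ for the observed data (notation chosen here for illustration):

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \propto p(D \mid \theta)\, p(\theta)
$$

Here $p(\theta)$ is the prior belief, $p(D \mid \theta)$ is the likelihood of the data and $p(\theta \mid D)$ is the updated (posterior) belief; $p(D)$ does not depend on $\theta$ and only normalises the result.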

Textbook statistics problem: 

You have a die and are testing whether or not it is biased. You roll it ten times and get four sixes. Is it a fair die?

*Frequentist Approach*: 

Under the null hypothesis the die is fair, so the number of sixes X in ten rolls follows a binomial distribution with parameter p fixed at 1/6. Given this null distribution, what is the probability of getting a sample with 4 sixes or something even more extreme?
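The frequentist question can be answered exactly with a few lines of Python. This sketch assumes a one-sided test (only "too many sixes" counts as extreme; a two-sided test would also count unusually few sixes) and uses `fractions.Fraction` so the binomial tail probability is computed without rounding error.

```python
from fractions import Fraction
from math import comb

ROLLS = 10
SIXES = 4
P_FAIR = Fraction(1, 6)  # null hypothesis: a fair die


def upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), computed with fractions."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_value = upper_tail(SIXES, ROLLS, P_FAIR)
print(f"P(X >= {SIXES} | p = 1/6) = {float(p_value):.4f}")  # 0.0697
```

The one-sided p-value is about 0.07, so at the conventional 0.05 threshold the frequentist would not reject the fair-die hypothesis, which, as discussed above, is still not a statement that the die is fair.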

*Bayesian Approach*:

You have a sample of 10 rolls with 4 sixes, plus a prior estimate of the binomial parameter p, centred somewhere around 1/6. Given this data, how probable is it that p is 1/6 or close to it?
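A minimal sketch of the Bayesian version, using a conjugate Beta prior so the posterior can be written down without any sampling. The prior Beta(2, 10) is an assumption chosen for illustration: its mean is 2/12 = 1/6, encoding a loose belief that the die is fair. With a binomial likelihood, observing 4 sixes and 6 non-sixes turns Beta(a, b) into Beta(a + 4, b + 6). The helper `beta_cdf_int` is hypothetical; it relies on the standard identity between the Beta CDF with integer parameters and a binomial tail, which keeps the example to the standard library.

```python
from fractions import Fraction
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive integer a and b.

    Uses I_x(a, b) = P(Y >= a) with Y ~ Binomial(a + b - 1, x).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


# Hypothetical prior centred on a fair die: mean = 2 / (2 + 10) = 1/6.
PRIOR_A, PRIOR_B = 2, 10
ROLLS, SIXES = 10, 4

# Conjugate update: successes (sixes) add to a, failures add to b.
post_a = PRIOR_A + SIXES
post_b = PRIOR_B + (ROLLS - SIXES)

posterior_mean = Fraction(post_a, post_a + post_b)
# Posterior probability that the chance of a six exceeds the fair value 1/6.
prob_above_fair = 1 - beta_cdf_int(Fraction(1, 6), post_a, post_b)

print(f"posterior: Beta({post_a}, {post_b})")               # Beta(6, 16)
print(f"posterior mean of p: {float(posterior_mean):.3f}")  # 0.273
print(f"P(p > 1/6 | data): {float(prob_above_fair):.3f}")
```

The answer is a whole distribution over p rather than a single accept/reject decision; any probability statement about p (such as the one printed on the last line) can be read straight off it.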

# Why swap to Bayesian?

There is nothing inherently wrong with using frequentist methods! They are generally faster, well documented and, 9 times out of 10, give the same answer as a Bayesian method. There are cases where only a Bayesian method will do: fitting a model with ~10,000 parameters by maximum likelihood, for example, would be hell. Bayesian methods are also easily modularised, so different sources of error/randomness can be combined in one model. However, the biggest pitfall of frequentist statistics is detecting the 1 in 10 times the method gets it wrong.

# Why don't we already use Bayesian Statistics?

1) **Intimidation**

The endless pursuit of abstraction and increasing generality makes mathematics very difficult to digest. The Bayesian method has a solid logical base in probability theory, unlike frequentist methods. This is great! However, it means Bayesian textbooks tend to get bogged down with the mathematical justification before hitting the applications. This can be very difficult for practitioners, often non-mathematicians, to overcome. 

2) **Computation**

The Bayesian quest for a distribution across the whole parameter space, rather than point statistics that summarise it, means a lot of calculation. Historically this limited Bayesian methods to cases where the model was analytically solvable, which left very few practical choices of model. The development of fast, cheap computer processors has changed that. (Now the lack of coding skills gets in the way.)

3) **Subjectivity**

There is a strong critique of Bayesian inference based on the effect of ‘prior’ distributions inside the model. Since you define these before you input any data into your model, some people are uneasy about letting information that does not come from the data influence the results. They prefer to wrangle with the numerous caveats of frequentist methods instead.
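How much the prior matters can be checked directly. The sketch below reuses the die example (4 sixes in 10 rolls) with three hypothetical Beta priors: flat, weakly centred on 1/6 and strongly centred on 1/6 (the strengths are illustrative assumptions). With a conjugate Beta(a, b) prior and k successes in n trials, the posterior mean of p is (a + k) / (a + b + n).

```python
from fractions import Fraction

ROLLS, SIXES = 10, 4

# Hypothetical priors, given as the (a, b) parameters of a Beta distribution.
PRIORS = {
    "flat Beta(1, 1)": (1, 1),
    "weak Beta(2, 10)": (2, 10),        # mean 1/6, pseudo-count a + b = 12
    "strong Beta(20, 100)": (20, 100),  # mean 1/6, pseudo-count a + b = 120
}


def posterior_mean(a, b, k, n):
    """Posterior mean of p for a Beta(a, b) prior and k successes in n trials."""
    return Fraction(a + k, a + b + n)


for name, (a, b) in PRIORS.items():
    mean = posterior_mean(a, b, SIXES, ROLLS)
    print(f"{name:>22}: posterior mean of p = {float(mean):.3f}")
# flat -> 0.417, weak -> 0.273, strong -> 0.185
```

The flat prior stays close to the sample proportion, while the strong prior keeps the estimate near 1/6: with only ten rolls the prior has a visible effect, which is exactly what the critique is about. As more rolls are added, the k and n terms dominate and the priors' influence shrinks.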

<img src="graphs/fullBayesian.jpg" width="500">
