# Classical vs Frequentist vs Bayesian
- Language of probability
- P(X=x), where X is a random variable that can take discrete or continuous values
- x is a particular value (realization) of X; the event of interest is X taking the value x

## Classical
- Outcomes that are equally likely have equal probabilities. For a fair die, P(X=x) = 1/6 for each face x
- Enumerate all of the possible outcomes, then count the outcomes (or combinations of well-defined outcomes) that make up the event and divide by the total
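
A minimal sketch of the enumeration idea, assuming two fair six-sided dice (the helper name `classical_probability` and the two events are made up for illustration): list every equally likely outcome, then count the ones that satisfy the event.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair six-sided dice: 36 equally likely ordered pairs.
outcomes = list(product(range(1, 7), repeat=2))

def classical_probability(event):
    """Favourable outcomes divided by total outcomes (equally likely case)."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

# Event 1: the two dice sum to 7.
p_sum_seven = classical_probability(lambda o: o[0] + o[1] == 7)

# Event 2: at least one die shows a six.
p_at_least_one_six = classical_probability(lambda o: 6 in o)

print(p_sum_seven, p_at_least_one_six)
```

This only works because every outcome in `outcomes` is equally likely; with a loaded die the counting argument no longer applies.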

## Frequentist
- Imagine a hypothetical infinite sequence of trials, then consider how often an event occurs
- Think about rolling a die an infinite number of times
- Probability is the relative frequency: the number of times the event occurred divided by the total number of trials
- Usable only when the uncertain statement can be framed in terms of a hypothetical infinite sequence
- P(rain tomorrow)? Does this make sense under a relative-frequency interpretation? Tomorrow only happens once
- P(we live in an infinitely expanding universe)? Under this interpretation it is either 0 or 1, since the statement is simply true or false
- In terms of statistical modeling, frequentists wish to perform repeated experiments (or get really large samples) in an effort to exploit the central limit theorem and estimate the "true" value of some parameter (typically the conditional mean).
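
The die-rolling thought experiment above can be approximated by simulation. This is only a sketch: a finite number of pseudo-random rolls stands in for the hypothetical infinite sequence, and the function name `relative_frequency` and the seed are arbitrary choices.

```python
import random

def relative_frequency(n_rolls, face=6, seed=0):
    """Fraction of n_rolls simulated fair-die rolls that land on `face`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == face)
    return hits / n_rolls

# The relative frequency should drift toward 1/6 as the number of rolls grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few rolls the estimate can be far from 1/6; the frequentist probability is the value the ratio settles on as the number of rolls grows without bound.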

https://stats.stackexchange.com/questions/232356/who-are-frequentists

## Bayesian
- Probability quantifies a state of belief, informed by prior information.
- Suppose the statement is P(die is loaded). Under Bayesian inference you can mathematically incorporate prior beliefs about the truth of a statement.
- In terms of statistical modeling, the parameters of the model are treated as random. You have varying degrees of certainty that the parameter you have estimated is the case. Your posterior expectations will consequently have a distribution. Some frequentists will wish to do something similar by bootstrapping parametric estimates, but this is an entirely different exercise than computing a Bayesian credible interval.
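
As a rough illustration of the P(die is loaded) example, here is a sketch of updating a prior belief one roll at a time. Every number in it is an assumption made up for the example: the prior of 1/10, the loaded die showing a six half the time, and the observed rolls.

```python
from fractions import Fraction

P_SIX_FAIR = Fraction(1, 6)
P_SIX_LOADED = Fraction(1, 2)  # assumption: a loaded die shows six half the time

def update(prior_loaded, rolled_six):
    """One step of Bayes' rule: posterior P(loaded) after observing a roll."""
    like_loaded = P_SIX_LOADED if rolled_six else 1 - P_SIX_LOADED
    like_fair = P_SIX_FAIR if rolled_six else 1 - P_SIX_FAIR
    evidence = like_loaded * prior_loaded + like_fair * (1 - prior_loaded)
    return like_loaded * prior_loaded / evidence

belief = Fraction(1, 10)       # assumption: prior belief that the die is loaded
for roll in [6, 6, 3, 6]:      # hypothetical observed rolls
    belief = update(belief, roll == 6)
    print(roll, belief)
```

Each six pushes the belief up and each non-six pulls it down; a different prior would give a different posterior from the same rolls, which is the point of the Bayesian view.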

https://stats.stackexchange.com/questions/167051/who-are-the-bayesians

--> Sometimes Bayesian vs Frequentist interpretations are framed as Subjective vs Objective. This is philosophical and out of the scope of this series. But consider a Bayesian counterargument: all frequentist interpretations implicitly rely on prior beliefs about things such as what parametric form a regression equation should take, normality assumptions, sample size assumptions, etc. In that sense they are implicitly Bayesian.

--> Bayesian approaches to statistical inference are heavily used in machine learning. Many frequentist approaches are seen in economic research, randomized trials, medical research, etc. (A notion of infinitely many experiments underlies many of the test statistics used in these; they will be seen later when we look at the Bootstrap.)

--> comparison of the two: https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading20.pdf

--> great thread, criticizing the p-value: https://stats.stackexchange.com/questions/225002/are-we-frequentists-really-just-implicit-unwitting-bayesians

Stack Exchange will come in handy for quick questions: https://stats.stackexchange.com/

## Basic Review of Probability and Bayes Rule
- P(A|B) = P(A and B) / P(B) --> Probability of A given that B occurred
- P(A and B) = P(A)*P(B), equivalently P(A|B) = P(A) and P(B|A) = P(B) --> Independence
- P(A|B) = P(B|A)*P(A) / (P(B|A)*P(A) + P(B|notA)*P(notA)) = P(A and B) / P(B) --> Bayes' rule, with P(B) expanded over A and notA
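
A quick numerical check that the definition of conditional probability and Bayes' rule give the same answer. The 2x2 table of counts below is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical counts for two events A and B in a population of 100.
counts = {
    ("A", "B"): 10,
    ("A", "notB"): 30,
    ("notA", "B"): 20,
    ("notA", "notB"): 40,
}
total = sum(counts.values())

p_a_and_b = Fraction(counts[("A", "B")], total)
p_a = Fraction(counts[("A", "B")] + counts[("A", "notB")], total)
p_b = Fraction(counts[("A", "B")] + counts[("notA", "B")], total)

# Definition: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Bayes' rule: the same quantity built from P(B|A), P(B|notA) and P(A)
p_b_given_a = p_a_and_b / p_a
p_b_given_not_a = Fraction(counts[("notA", "B")], total) / (1 - p_a)
bayes = p_b_given_a * p_a / (p_b_given_a * p_a + p_b_given_not_a * (1 - p_a))

# Independence would require P(A and B) == P(A) * P(B)
independent = p_a_and_b == p_a * p_b

print(p_a_given_b, bayes, independent)
```

In this made-up table A and B are not independent, so P(A|B) differs from P(A).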

In [9]:
totalPop = 30
totalFemale = 9
totalMale = 21

totalCompsciMajors = 12
totalCompsciFemale = 4
totalCompsciMale = 8

NoCs = 18
femaleNotCs = 5   # totalFemale - totalCompsciFemale = 9 - 4
maleNotCs = 13    # totalMale - totalCompsciMale = 21 - 8

In [8]:
# Probability that someone is female given they're a computer science major: P(F|CS)
# Looking at a subset of the initial population, and asking about the probability in that subset
(totalCompsciFemale/totalPop)/(totalCompsciMajors/totalPop)

0.3333333333333333

In [18]:
# The same question on the complementary subset (non-CS majors): P(F|notCs)
(femaleNotCs/totalPop)/(NoCs/totalPop)

0.2777777777777778

In [34]:
# P(CS|F) via Bayes' rule:
# P(F|CS)*P(CS) / (P(F|CS)*P(CS) + P(F|notCs)*P(notCs))
numerator = ((totalCompsciFemale/totalCompsciMajors))*(totalCompsciMajors/totalPop)
denominator = numerator + (femaleNotCs/NoCs)*(NoCs/totalPop)
numerator/denominator

0.4444444444444445
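
As a sanity check, P(CS|F) can also be read straight off the counts by restricting attention to the female students. The sketch below reuses the totals from the cells above, derives the non-CS counts by subtraction, and uses exact fractions so the two routes can be compared without rounding.

```python
from fractions import Fraction

# Totals copied from the cells above.
totalPop = 30
totalFemale = 9
totalCompsciMajors = 12
totalCompsciFemale = 4

# Derived by subtraction so the table stays internally consistent.
NoCs = totalPop - totalCompsciMajors
femaleNotCs = totalFemale - totalCompsciFemale

# Direct count: among females, what fraction are CS majors?
direct = Fraction(totalCompsciFemale, totalFemale)

# Bayes' rule with exact fractions.
p_f_given_cs = Fraction(totalCompsciFemale, totalCompsciMajors)
p_cs = Fraction(totalCompsciMajors, totalPop)
p_f_given_not_cs = Fraction(femaleNotCs, NoCs)
p_not_cs = Fraction(NoCs, totalPop)
bayes = p_f_given_cs * p_cs / (p_f_given_cs * p_cs + p_f_given_not_cs * p_not_cs)

print(direct, bayes)
```

Both routes agree, which is expected: the denominator in Bayes' rule is just P(F) written out over CS and non-CS majors.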

## Common Distributions
- https://mathcs.clarku.edu/~djoyce/ma217/distributions2.pdf
- https://www.stat.purdue.edu/~fmliang/STAT610/st610lect3.pdf
- http://www.utstat.toronto.edu/mikevans/jeffrosenthal/AppendixC.pdf
- https://scholar.harvard.edu/files/charlescywang/files/basic_statistics_and_probability_for_econometrics_econ_270a.pdf

There are entire handbooks for these things, so there is no need to know them all
- http://www.stat.rice.edu/~dobelman/textfiles/DistributionsHandbook.pdf

The basic idea is that, depending on the type of event, a random variable can have a PDF (or PMF, if discrete) that is one of these shapes, or a mixture of them (mixtures are common in Bayesian work, especially when combining non-conjugate priors)
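
To make "mixture" concrete, here is a small sketch of a two-component normal mixture density. The weights, means and standard deviations are arbitrary illustrative values, and `normal_pdf` / `mixture_pdf` are helper names invented for this example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def mixture_pdf(x, components):
    """Weighted sum of normal densities; components are (weight, mean, sd) tuples."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)

# Hypothetical mixture: 30% of the mass near -2, 70% near 1. Weights sum to 1.
components = [(0.3, -2.0, 0.5), (0.7, 1.0, 1.0)]

# Crude Riemann sum over [-10, 10] to check the mixture still integrates to about 1.
step = 0.01
area = sum(mixture_pdf(-10 + i * step, components) * step for i in range(2001))
print(round(area, 4))
```

Because the weights sum to 1 and each component is a valid density, the mixture is a valid density too, but it can have two humps where a single normal cannot.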

The choice of a discrete or continuous distribution depends on how you're conceptualizing the variables and how they are measured.