# Bayesian Statistics vs. Frequentist Statistics

---

# What are Frequentist Statistics
**Frequentist statistics** tests whether an event (hypothesis) occurs or not. It defines the probability of an event as its long-run frequency, i.e., the frequency observed when the experiment is repeated many times under the same conditions.

Here, samples of a **fixed size** are taken. In theory the experiment is repeated an **infinite number of times**, but in practice it is run with a stopping intention. For example, I might decide in advance to stop once the coin has been tossed 1000 times, or once I have seen at least 300 heads.

Let's look at the example of a coin toss, where we are trying to estimate the fairness of the coin.

![freq%20table.png](attachment:freq%20table.png)

* We know that the probability of getting a head when tossing a fair coin is 0.5.
* No. of heads is the actual number of heads obtained.
* Difference is the gap between the expected and actual number of heads: 0.5 × (No. of tosses) - No. of heads.

It is important to note that although the difference between the actual number of heads and the expected number of heads (50% of the number of tosses) increases as the number of tosses increases, the proportion of heads to total tosses approaches 0.5 (for a fair coin).
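To see this behaviour for ourselves, here is a minimal simulation sketch. The function name `toss_summary` and the fixed seed are our own choices for illustration, and a single simulated run will not reproduce the numbers in the table above; it only shows the general pattern.

```python
import random


def toss_summary(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times and summarize the result."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    expected = 0.5 * n_tosses
    return {
        "tosses": n_tosses,
        "heads": heads,
        "difference": abs(expected - heads),  # absolute gap in counts
        "proportion": heads / n_tosses,       # relative frequency of heads
    }


for n in (10, 100, 1_000, 10_000, 100_000):
    row = toss_summary(n)
    print(f"{row['tosses']:>7} tosses: {row['heads']:>6} heads, "
          f"difference {row['difference']:>7.1f}, "
          f"proportion {row['proportion']:.4f}")
```

In any one run the difference column can wobble up and down, but it tends to grow with the number of tosses, while the proportion column settles near 0.5.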

This experiment illustrates a common criticism of the frequentist approach: **the result of an experiment depends on the number of times the experiment is repeated.**

# Flaws in Frequentist Statistics
### 1) p-values depend on sample size
The p-value is the probability of obtaining an effect at least as extreme as the one in your data, *if* the null hypothesis is true. A p-value is computed from a sample of fixed size under some stopping intention, so it changes when the intention or the sample size changes. If two people work on the same data with different stopping intentions, they may get two different p-values for the same data, which is undesirable.

For example, person A may choose to stop tossing a coin when the total count reaches 100, while person B stops at 1000. For different sample sizes, we get different t-scores and different p-values. Similarly, the stopping intention may change from a fixed number of flips to a total duration of flipping. In this case too, we are bound to get different p-values.
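A small sketch can make the role of the stopping intention concrete. The data here are hypothetical (12 tosses giving 9 heads and 3 tails) and are not taken from the table above. We compute exact one-sided p-values from the binomial counts rather than t-scores, once assuming the plan was "toss exactly 12 times" and once assuming the plan was "toss until the 3rd tail". The function names are our own.

```python
from math import comb


def p_fixed_n(n_tosses, n_heads):
    """One-sided p-value when the plan was to toss exactly n_tosses times.

    This is P(at least n_heads heads in n_tosses tosses of a fair coin).
    """
    ways = sum(comb(n_tosses, k) for k in range(n_heads, n_tosses + 1))
    return ways / 2 ** n_tosses


def p_stop_at_tails(n_tosses, n_tails):
    """One-sided p-value when the plan was to toss until the n_tails-th tail.

    Needing n_tosses or more tosses means the first n_tosses - 1 tosses
    contained at most n_tails - 1 tails.
    """
    first = n_tosses - 1
    ways = sum(comb(first, k) for k in range(n_tails))
    return ways / 2 ** first


print(f"fixed 12 tosses:     p = {p_fixed_n(12, 9):.4f}")        # 0.0730
print(f"stop at the 3rd tail: p = {p_stop_at_tails(12, 3):.4f}")  # 0.0327
```

The same 12 tosses give a p-value above 0.05 under one intention and below 0.05 under the other, which is exactly the kind of disagreement described above.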

### 2) Confidence Interval (C.I) depends on sample size
Like the p-value, the confidence interval depends heavily on the sample size. This makes the dependence on the stopping intention hard to defend, since no matter how many people perform the tests on the same data, the results should be consistent.
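As a rough sketch of this dependence, we can compute a normal-approximation (Wald) 95% interval for the proportion of heads at two sample sizes. The counts (55 heads in 100 tosses, 550 heads in 1000 tosses) are hypothetical, and the Wald interval is just one common approximation that we picked for simplicity.

```python
from math import sqrt


def wald_interval(n_heads, n_tosses, z=1.96):
    """Approximate 95% confidence interval for the proportion of heads."""
    p_hat = n_heads / n_tosses
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_tosses)
    return p_hat - half_width, p_hat + half_width


# Same observed proportion (0.55), two different sample sizes.
for heads, tosses in ((55, 100), (550, 1000)):
    low, high = wald_interval(heads, tosses)
    print(f"{heads}/{tosses}: ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```

With the same observed proportion, the interval for 100 tosses still contains 0.5, while the narrower interval for 1000 tosses does not.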

### 3) Confidence Intervals are not probability distributions
Therefore, they do not tell you which values of a parameter are most probable. A 95% confidence interval only says that, if the experiment were repeated many times, about 95% of the intervals constructed this way would contain the real value.

# Bayesian Statistics
“Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It provides people the tools to update their beliefs in the evidence of new data.”

Let's look at an example to delve into this.

Suppose that, out of 4 F1 championship races between Niki Lauda and James Hunt, Niki won 3 times while James managed only 1 victory.

So, if you were to bet on the winner of the next race, who would it be? I bet you would say Niki Lauda.

Here’s the twist. What if you are told that it rained once when James won and once when Niki won, and that it will definitely rain on the next race date? Who would you bet your money on now?

By intuition, it is easy to see that James's chances of winning have increased drastically. But the question is: how much?

To answer that question, we need to familiarize ourselves with some concepts of probability!

# Probability Background
The goal of this probability background is to gain a good intuition for the principles at hand, so that they are not a roadblock to our understanding when we use them in the next section for Bayesian Inference. We will start with an example, and then dive into a few specific rules.

---

# Bayes Theorem Example
Let's say we are faced with the following test scenario:
* 1% of women have breast cancer (and therefore 99% do not).
* 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
* 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).

We can put this into a table that looks like this:

![prob%20table.png](attachment:prob%20table.png)

Here is how we read the table: 
* 1% of women have cancer
* If you already have cancer, you are in the first column. There’s an 80% chance you will test positive. There’s a 20% chance you will test negative.
* If you don’t have cancer, you are in the second column. There’s a 9.6% chance you will test positive, and a 90.4% chance you will test negative.

How could we represent this graphically?

![test.png](attachment:test.png)

Keep that diagram in mind as we go through this example; it is invaluable for thinking about the situation intuitively.

### How accurate is the test? 
Now suppose you get a positive test result. What are the chances you have cancer? 80%? 99%? 1%?

Let's walk through it.
* Ok, we got a positive result. It means we’re somewhere in the top row of our table. Let’s not assume anything — it could be a true positive or a false positive.
* The chances of a true positive = chance you have cancer * chance the test caught it = 1% * 80% = 0.008
* The chances of a false positive = chance you don’t have cancer * chance the test came back positive anyway = 99% * 9.6% = 0.09504

Let's update our table with this information:

![update%20table.png](attachment:update%20table.png)

What was our question again? **What’s the chance we really have cancer if we get a positive result?** Well, the chance of an event is the number of ways it could happen divided by the number of all possible outcomes. This leads us to some general intuition about probability!

### $$Probability = \frac{desired \; event}{all \; possibilities}$$

What does that look like in our case? Well we have:
* desired event = given a positive test, we have cancer
* all possibilities = a positive test and having cancer + a positive test and having no cancer

In other words:

### $$Probability \; of \;cancer \;given\;positive\;test = \frac{positive\;test\;and\;cancer}{positive\;test\;and\;cancer+positive\;test\;and\;no\;cancer}$$

And if we fill in our values from the table we arrive at:

### $$Probability = \frac{0.01*0.8}{0.01*0.8+0.99*0.096}\approx 0.0776\approx 7.8\%$$

**Interesting!** — a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It might seem strange at first but it makes sense: the test gives a false positive 9.6% of the time (quite high), so there will be many false positives in a given population. For a rare disease, most of the positive test results will be wrong.
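If you would like to check the arithmetic, here is a short sketch of the same calculation. The function and parameter names (`posterior_given_positive`, `sensitivity`, `false_positive_rate`) are our own labels for the three numbers in the scenario.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Chance of having the condition given a positive test result."""
    true_positive = prior * sensitivity                   # has it and test is positive
    false_positive = (1 - prior) * false_positive_rate    # does not have it, test is positive
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.4f}")  # 0.0776
```

Try raising `prior` to 0.10 to see how quickly the answer changes when the disease is less rare.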

Let’s test our intuition by drawing a conclusion from simply eyeballing the table. If you take 100 people, only 1 person will have cancer (1%), and they’re most likely going to test positive (80% chance). Of the 99 remaining people, about 10% will test positive, so we’ll get roughly 10 false positives. Considering all the positive tests, just 1 in 11 is correct, so there’s a 1/11 chance of having cancer given a positive test. The real number is 7.8% (closer to 1/13, computed above), but we found a reasonable estimate without a calculator.

# Probability Rules 
Let's look at a few probability rules, then come back and solve the above example.
## Conditional Probability
This defines the probability of event A occurring, given that event B occurs. It is most informative when A and B are dependent, since knowing that B occurred then changes the probability of A. Mathematically it looks like:
### $$P(A|B) = \frac{P(A\cap B)}{P(B)}$$
and we can use the multiplication rule:
### $$P(A \cap B) = P(A)*P(B|A)$$
to get:
### $$P(A|B)=\frac{P(A)*P(B|A)}{P(B)}$$
This is known as **Bayes Theorem!!!**

An example: Given that you draw a red card, what is the probability that it is a 4?
### $$P(4|red) = \frac{P(4)*P(red|4)}{P(red)}=\frac{\frac{1}{13}*\frac{1}{2}}{\frac{1}{2}}=\frac{1}{13}$$

## Joint Probability
This defines the probability of event A occurring and event B occurring, otherwise defined as the intersection of A and B. Mathematically it looks like:
### $$P(A\;and\;B)=P(A \cap B)$$
Example: If you draw a card at random, what is the probability that it is 4 and red?
### $$P(4\cap red) = \frac{4}{52}*\frac{26}{52} = \frac{1}{26}$$

## Conditional vs joint
The main difference between conditional and joint probability is that with conditional probability the sample space changes. Because you have been given new information, some outcomes can be eliminated and certain probabilities may update!
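One way to see the sample space shrink is to enumerate a standard 52-card deck and count. This is only a sketch for the card examples above; the helper names (`probability`, `is_four`, `is_red`) are ours.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLORS = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# Every card is a (rank, suit) pair: 13 ranks x 4 suits = 52 cards.
deck = [(rank, suit) for rank in RANKS for suit in SUIT_COLORS]


def probability(event, space):
    """Fraction of outcomes in `space` for which `event` is true."""
    return Fraction(sum(1 for card in space if event(card)), len(space))


def is_four(card):
    return card[0] == "4"


def is_red(card):
    return SUIT_COLORS[card[1]] == "red"


# Joint: count cards that are 4 AND red, out of the whole deck.
joint = probability(lambda card: is_four(card) and is_red(card), deck)

# Conditional: we are told the card is red, so the sample space
# shrinks to the 26 red cards before we count the 4s.
red_cards = [card for card in deck if is_red(card)]
conditional = probability(is_four, red_cards)

print(joint)        # 1/26
print(conditional)  # 1/13
```

The event being counted is almost the same in both cases; what changes is the denominator, 52 cards for the joint probability and 26 red cards for the conditional one.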

# Back to our example, using Bayes Theorem
Okay, armed with our new knowledge, let's try to solve our original problem and see if we end up with the original answer!

Remember, Bayes theorem is defined as: 
### $$P(A|B)=\frac{P(A)*P(B|A)}{P(B)}$$
In our example that would look like:
### $$P(cancer\;|\;positive\;test)=\frac{P(cancer)*P(positive\;test\;|\;cancer)}{P(positive\;test)}$$
### $$\frac{0.01*0.8}{0.01*0.8+0.99*0.096}=7.8\%$$
This yields the same result as our general intuition equation! Maybe Bayes theorem isn't so hard after all...
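The same formula also lets us put a number on the earlier F1 question. The sketch below treats the four race results as if their frequencies were probabilities, which is a strong assumption for such a tiny sample, and the `races` list is simply our encoding of the story: Niki won 3 races (one in the rain) and James won 1 (in the rain).

```python
from fractions import Fraction

# (winner, did_it_rain) for each of the four races in the story.
races = [
    ("Niki", True),    # Niki's one win in the rain
    ("Niki", False),
    ("Niki", False),
    ("James", True),   # James's only win, also in the rain
]

n_races = len(races)
p_james = Fraction(sum(winner == "James" for winner, _ in races), n_races)
p_rain = Fraction(sum(rain for _, rain in races), n_races)

# P(rain | James wins): look only at the races James won.
rain_in_james_wins = [rain for winner, rain in races if winner == "James"]
p_rain_given_james = Fraction(sum(rain_in_james_wins), len(rain_in_james_wins))

# Bayes theorem: P(James | rain) = P(James) * P(rain | James) / P(rain)
p_james_given_rain = p_james * p_rain_given_james / p_rain

print(p_james)             # 1/4
print(p_james_given_rain)  # 1/2
```

Under these assumptions, knowing that it will rain moves James from a 1 in 4 chance to an even bet.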

# Bayesian Inference
The above example is very helpful for explaining where Bayesian Inference comes from and for showing the mechanics in action. However, in Data Science applications it is generally used to interpret data! By pulling in prior knowledge about what we know, we can draw stronger conclusions with small data sets!
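To get a feel for how prior knowledge and new data combine, here is a sketch that reuses the mammogram numbers and applies Bayes theorem twice, with the result of the first update serving as the prior for the second. It assumes, purely for illustration, that a second test is independent of the first given the patient's true status; the function name `update` is ours.

```python
def update(prior, p_data_given_h, p_data_given_not_h):
    """Return P(hypothesis | data) from a prior and the two likelihoods."""
    numerator = prior * p_data_given_h
    return numerator / (numerator + (1 - prior) * p_data_given_not_h)


belief = 0.01  # prior: 1% of women have breast cancer
for test_number in (1, 2):
    # Each positive test: 80% chance if cancer, 9.6% chance if no cancer.
    belief = update(belief, p_data_given_h=0.80, p_data_given_not_h=0.096)
    print(f"after positive test {test_number}: {belief:.3f}")
```

Under that independence assumption, one positive test moves the belief from 1% to about 7.8%, and a second positive test moves it to roughly 41%.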

# Probability Distributions 
