# Must-read Guide to Hypothesis Tests You Will Never Use
## Theory + Code
<img src='images/cotton.jpg'></img>
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/@cottonbro?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Cottonbro</a>
        on 
        <a href='https://www.pexels.com/photo/persons-hand-on-black-typewriter-6143834/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Pexels</a>
    </strong>
</figcaption>

### Setup

```python
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_context('talk')
```

### Demotivation

Let me get your hopes down right from the beginning. You won't be using the concepts from this article until you get a job in data science. You may practice them a few times on public datasets, and that's probably it.

Then why learn them? Well, I don't want you to be like a newly hatched bird thrown from Everest when you *do* get a real job. So, what are we talking about here?

We are talking about the icing on the cake🎂, the thing that concludes any data-related project in business and scientific research - **hypothesis testing**. It checks whether the results of a survey or experiment could have happened by chance or are actually meaningful and significant.

For example, a medical team might assess the effectiveness of a new drug💊 by giving it to two different test groups and comparing metrics such as recovery time and side effects. There is an important reason why the results are not accepted "as is".

Any data collected for a survey or study is only a sample of the group of interest (also called the population). A medical team can only test a new drug on a few hundred patients because it is impossible to test it on the whole population of patients diagnosed with a particular illness. Obviously, doctors have to make sure that a new drug works for patients in general, not just for the small subset at a single hospital.

In statistical terms, this is the same as making inferences about population parameters by looking at sample statistics. In other words, we want to make sure that the results are not specific to the small sample obtained but apply to a wide range of individuals.

Generally, hypothesis testing combines many concepts from statistics and probability, such as conditional probability, probability distributions, and confidence intervals. I am not trying to scare you with fancy terms, but it generally takes a while to wrap your head around hypothesis testing. So, in this article we will focus mainly on theory with intuitive examples, and I will leave a link to a good source for implementing the theory in code.

### Real-World Example

Honestly, it took me a long time to put the pieces together but the resource that made everything click for me was Khan Academy's hypothesis testing playlist. So, to give you the initial idea, I will use one of the examples from there. Let's begin!

There are four brothers in the family: Jon, Bruce, Harry, and Bob. To decide who does the dishes each day, their eldest brother, Jon, randomly draws a name from a bag.

For four nights in a row, Jon does not get selected, so Bruce starts to get suspicious. He hypothesizes that Jon is cheating, but to make sure, he decides to wait a few days longer. On the tenth day, Jon still hasn't been picked. Now Bruce is convinced that Jon is cheating and wants to test this hypothesis with his probability skills.

To be safe, he starts with the assumption that Jon is innocent. But if the probability of what he observed turns out to be less than 20% under that assumption, he will tell his parents that Jon is cheating.

Given that everyone has an equal chance of getting picked, which is 25% on any day, the probability of not getting selected on a given day is 75%:

$P(\text{not picked}) = 1 - 0.25 = 0.75$

Bruce calculates the probability of Jon not getting picked for 10 days in a row using joint probability:

$P(\text{not picked for 10 days}) = 0.75^{10} \approx 0.056$

The result is about 5.6%. In other words, if the draws were fair, a streak like this would show up in only about 56 out of every 1,000 ten-day stretches (do the math to check). The result is lower than Bruce's 20% threshold, so he tells his parents that Jon was cheating.
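If you want to check Bruce's arithmetic yourself, here is a minimal sketch. The variable names are my own, and the only inputs are the numbers from the story: four brothers, ten days, and a 20% threshold.

```python
# Probability that a given brother is NOT picked on a single fair draw.
p_not_picked = 1 - 1 / 4  # 0.75

# The draws are independent, so the probabilities multiply across the 10 days.
n_days = 10
p_streak = p_not_picked ** n_days

# Bruce's threshold from the story (20%).
threshold = 0.20

print(f"P(not picked {n_days} days in a row) = {p_streak:.4f}")
print(f"Below Bruce's threshold? {p_streak < threshold}")
```

The key assumption is that the draws are fair and independent, which is exactly what "Jon is innocent" means here.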

In real life, hypothesis tests generally follow this pattern. First, two competing ideas are generated (1. Jon is innocent and 2. Jon is cheating). Then, using the sample data (the observed results of 10 days of random picking), we check some test statistic (in our case, a probability) to see whether the result could plausibly have happened by chance or reflects a real effect.

At this point, a question arises: how do we decide whether the result is down to chance or a real effect? That's where significance levels come into play. In the brothers' example, Bruce set a significance level of 20%. Because the probability he calculated was below that, he rejected the hypothesis that Jon is innocent and concluded that he is cheating.
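The same decision can be reached by simulation instead of a formula. The sketch below is only an illustration under my own assumptions: brother `0` stands in for Jon, the draws are fair, and the number of simulated stretches (`n_trials`) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_brothers = 4
n_days = 10
n_trials = 100_000  # number of simulated 10-day stretches (arbitrary)
alpha = 0.20        # Bruce's significance level from the story

# Each row is one simulated 10-day stretch of fair draws.
# Brother 0 stands in for Jon (an arbitrary labelling).
draws = rng.integers(low=0, high=n_brothers, size=(n_trials, n_days))

# True for every stretch in which Jon was never picked.
jon_never_picked = (draws != 0).all(axis=1)

# Share of fair stretches that look like what Bruce observed.
simulated_p = jon_never_picked.mean()

# Reject "Jon is innocent" if such a streak is rarer than alpha.
reject_null = simulated_p < alpha

print(f"Simulated probability: {simulated_p:.4f}")
print(f"Reject the Null at alpha = {alpha}? {reject_null}")
```

The simulated share should land close to $0.75^{10}$, and it only tells us how surprising the streak would be if the draws were fair.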

In business and research, the significance level is usually set to 5%. In medical fields, a stricter level such as 1% is common. We will look at each step of hypothesis testing in detail in the coming sections.

### Setting Up Hypotheses

The first step of hypothesis testing is to state the competing hypotheses beforehand. They should be opposite, non-overlapping hypotheses. As an excellent example, take the judicial system, where each case starts with the presumption "innocent until proven guilty".

In statistical terminology, the competing hypotheses are called the **Null** and **Alternative** hypotheses. The Null hypothesis is what we believe before looking at any data or evidence. In court, everyone is considered innocent until the evidence suggests otherwise. That's why Bruce also started with the assumption that his elder brother is innocent.

As the name suggests, the **Alternative Hypothesis** is the opposite of the Null. In practice, it is also the idea that we want to prove or achieve. Bruce wants to prove that his brother is cheating, so the Alternative is 'Jon is cheating'. In court, the Alternative is 'the charged individual is guilty'.

Moving on to notation, the Null is written as $H_{0}$, while the Alternative is $H_{1}$ or, sometimes, $H_{a}$. In mathematical form, the Null usually comes with some kind of 'equal' sign: $=, \leq, \geq$. The Alternative involves the complementary symbols: $\neq, >, <$.

Let me explain the above with another example. A website is testing a new UI (user interface) design to check if it drives more traffic than the old one. The owner wants to check this by comparing the mean amount of traffic for a single month. 

As I said, the Null hypothesis is what we believe to be true before collecting any data. In this case, before trying out the new UI for a month, our Null will be "The mean traffic from the old design is as good as or better than that of the new design". We want to show that the new UI brings more traffic, so the Alternative will be "The mean traffic of the new UI is greater than that of the old UI". In notation, this will be:

$H_{0}: \mu_{old} \geq \mu_{new}$

$H_{1}: \mu_{old} < \mu_{new}$
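To make this pair of hypotheses a bit more concrete, here is a rough sketch of how they could be checked with a permutation test. This is a technique I am swapping in for illustration, not the only way to do it. The traffic numbers are synthetic (drawn from normal distributions I made up), and the 30 daily observations per design are an assumption too.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic daily visit counts, made up purely for illustration.
n_obs = 30
old_ui = rng.normal(loc=100, scale=15, size=n_obs)
new_ui = rng.normal(loc=120, scale=15, size=n_obs)

# Test statistic: how much higher the new UI's mean traffic is.
observed_diff = new_ui.mean() - old_ui.mean()

# If the new design had no advantage, the "old"/"new" labels would be
# interchangeable, so we shuffle them and recompute the difference many times.
pooled = np.concatenate([old_ui, new_ui])
n_perm = 10_000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[n_obs:].mean() - shuffled[:n_obs].mean()

# One-sided probability: how often does shuffling alone produce a
# difference at least as large as the one we observed?
p_value = (perm_diffs >= observed_diff).mean()

print(f"Observed difference in means: {observed_diff:.2f}")
print(f"One-sided permutation p-value: {p_value:.4f}")
```

Just like Bruce did with his 20% threshold, we would compare this probability with a significance level chosen beforehand and reject $H_{0}$ only if it falls below that level.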

For this case, we could also use another set of hypotheses, for example that the mean is greater than some number:

$H_{0}: \mu_{old} \leq 70$

$H_{1}: \mu_{old} > 70$

Here, we want to achieve a mean traffic greater than 70, so we start with the assumption that the traffic is equal to or lower than this value. As a final variant, we can check whether the mean is equal to a given value or not:

$H_{0}: \mu_{old} = 70$

$H_{1}: \mu_{old} \neq 70$

The last version is case-specific. For example, some medical tests need to have a certain accuracy score to be accepted by the government. Using the above formula, we can formulate a relevant hypothesis to check that.

### Types of Errors

### p-values, true nightmare

### Simulating With Python