## Making a hypothesis

Now we come to what is often the main paradigm in science: hypothesis testing. 

A [hypothesis](https://en.wikipedia.org/wiki/Hypothesis) is a proposed explanation for some phenomenon. A hypothesis is not a mere guess, but usually a reasoned argument that explains why something occurs. Science works by making observations, thinking about what might be causing them, and then coming up with a reasonable hypothesis. That hypothesis then needs evidence to support it. How do we quantify this process of "evidence support" for a hypothesis?

In hypothesis testing, we first establish something called a **Null Hypothesis**. The Null Hypothesis, referred to as $\mathbf{H}_0$, is the default or prevailing hypothesis about a phenomenon. For example, let us say we are developing a drug for malaria. The null hypothesis here would be that the drug doesn't work. That is the default assumption. The job of science is to "disprove" the null hypothesis in favor of the scientists' hypothesis (called the **alternative hypothesis**, referred to as $\mathbf{H}_1$). This way, the process of showing that the drug works is reduced to disproving that it doesn't work!

(NOTE: It is not exactly correct to use "disprove" since things cannot usually be proven or disproven outside of mathematics. Evidence can only support or oppose hypotheses.)

#### Questions!

**Q**: Why can we not say prove or disprove? Does that make mathematics different from observational science?

**Q**: Why do we care about gathering evidence against a null hypothesis rather than directly evidence supporting the alternative? Think of trying to prove that all flamingoes are pink. Does it help to see more and more pink flamingoes or look for a white one?

### Thinking about error

Let us say we have a Null Hypothesis that tortoises are as fast as rabbits (why should we think otherwise?). We go to the woods and make observations and think that maybe that isn't true. Maybe rabbits are faster. So we come up with our alternative hypothesis that rabbits are faster.

We do some statistical magic and reach one of two possible outcomes. Either we reject the null hypothesis (in favor of the alternative) or we fail to reject the null hypothesis (in which case we stick with it for now). Out in the real world, the null hypothesis is either true or false, irrespective of what kind of statistics we do. Therefore, we can summarize the possible outcomes in the form of a 2 by 2 table:

||Null is True|Null is False|
|:--:|:--:|:--:|
|We rejected the Null|ERROR|AWESOME!!|
|We failed to reject the Null|AWESOME!!|ERROR|

The two different errors are often called Type 1 and Type 2 Errors. Look at the meme below and try to figure out which of the errors is type 1 and which is type 2!

![](img/type-i-and-type-ii-errors.jpg)


## How exactly can we do this?

In our example, we want to compare the population distribution of rabbit speeds to the population distribution of tortoise speeds. How do we do this? We might need to make some assumptions. What if we assume that both animals' speeds follow a Normal distribution? We can then make quantitative comparisons and see whether the two distributions (or their means) are indeed separate. Also remember that we studied Atlantis temperature, and that a change in variance may be another way two distributions differ from one another. Have a look at the examples below to visualize how this can be achieved.

![](img/dist1.png)

![](img/climatedist.png)
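To make the Normal-distribution assumption concrete, here is a minimal sketch that draws simulated speeds for both animals and compares their sample means and variances. The population means and standard deviations used here are hypothetical values chosen only for illustration, not real measurements.

```python
import numpy as np

# Fixed seed so the simulated samples are reproducible
rng = np.random.default_rng(seed=0)

# Hypothetical population parameters (km/h), assumed for illustration only
rabbit_speeds = rng.normal(loc=40.0, scale=5.0, size=200)
tortoise_speeds = rng.normal(loc=0.3, scale=0.1, size=200)

# Two distributions can differ in their means, their variances, or both
for name, speeds in [("rabbit", rabbit_speeds), ("tortoise", tortoise_speeds)]:
    print(f"{name}: mean = {speeds.mean():.2f}, variance = {speeds.var(ddof=1):.4f}")
```

With numbers like these the two distributions barely overlap, but the same comparison becomes much harder when the means are close together, which is where a formal test helps.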

## Testing in Statistics
In summary, this will be our logic and workflow.

1. We assume a null hypothesis and an alternative hypothesis.
2. We calculate a statistic from data (maybe a sample mean, or something else) whose distribution we know if the null hypothesis is true.
3. If the value we get for this statistic is very, very unlikely under that distribution, we conclude that the null hypothesis is probably not true (i.e. the null hypothesis can be rejected)!
4. If not, we fail to reject the null hypothesis.

Let's take an example of one such test. 
1. To compare the rabbit and tortoise speeds, we will first assume the null hypothesis that they both come from the same distribution (i.e. tortoises are as fast as rabbits).
2. Now, if this null hypothesis were true, we can calculate that the sample mean of the speeds of tortoises should follow a normal distribution with mean = population mean of the rabbit speeds and some specific variance. Remember, given the mean and the variance for a normal distribution, we can exactly calculate the probability of different measurements.
3. Now, we plug the calculated sample mean of tortoise speeds into our distribution and see how likely we are to get such a sample mean for tortoises if the null hypothesis is actually true (i.e. they are as fast as rabbits). We will find that it is extremely unlikely to get such a result. So we conclude that the null hypothesis is probably not true and reject it!
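
The three steps above can be sketched in code. This is only an illustration under stated assumptions: the rabbit population mean, the population standard deviation, and the five tortoise measurements are all made-up numbers, and the standard deviation is assumed to be known so that the sample mean follows a Normal distribution exactly.

```python
import numpy as np
from scipy import stats as st

# Hypothetical values, assumed for illustration only (km/h)
rabbit_population_mean = 40.0   # mean of the sample mean under the null
population_sd = 5.0             # assumed known population standard deviation
tortoise_sample = np.array([0.25, 0.31, 0.28, 0.35, 0.30])  # made-up measurements

n = len(tortoise_sample)
sample_mean = tortoise_sample.mean()

# Standard deviation of the sample mean (the "specific variance" is its square)
standard_error = population_sd / np.sqrt(n)

# If the null is true, the sample mean follows
# Normal(rabbit_population_mean, standard_error).
# Probability of a sample mean this low or lower under the null:
p_value = st.norm.cdf(sample_mean, loc=rabbit_population_mean, scale=standard_error)

print(f"sample mean = {sample_mean:.3f}")
print(f"probability under the null = {p_value:.3g}")
```

Because the alternative here is that rabbits are faster, only unusually low tortoise sample means count as evidence against the null, which is why the sketch uses the lower tail of the distribution.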

## Chi-Squared, A statistical test for categorical data

Finally, we are ready to think about how to look at M&Ms. To do so we will use a statistical test, which is where all of our previous work was going. The Chi-Squared (χ²) test we will use is calculated as:

$$\chi^2 = \sum_{\text{categories}} \frac{(\text{number of actual observations} - \text{number of expected observations})^2}{\text{number of expected observations}}$$

The sum runs over every category (for example, every M&M color).

This χ² is now our statistic (like the sample mean in the example above). We calculate it from data and compare it with its distribution under the null hypothesis, to get an idea of how likely such a value is if the null hypothesis is actually true. If it is very unlikely, we reject the null.

For example, let's assume in a classroom of 100 students, we expect to see 50 males and 50 females. How would we calculate the χ² value for a classroom where we saw 51 males and 49 females?

Males:
Observed number ($O$) = 51
Expected number ($E$) = 50
- $(O - E) = 1$
- $(O - E)^2 = 1$
- $\frac{(O - E)^2}{E} = 0.02$

Females:
Observed number ($O$) = 49
Expected number ($E$) = 50
- $(O - E) = -1$
- $(O - E)^2 = 1$
- $\frac{(O - E)^2}{E} = 0.02$

Adding the contributions of both categories gives $0.02 + 0.02 = 0.04$, so 0.04 is our χ² value. What does this mean? Nothing yet.

We need to know two more things. The first is easy: we need to know how many choices we could have had. In our classroom, how many possible categories did we choose from? In our case, only two: Male or Female. We take our number of choices (n), and always subtract 1, to give us `df`, our 'degrees of freedom'. Second, we take the 0.04 value to the Chi-squared [chart](https://people.richland.edu/james/lecture/m170/tbl-chi.html). According to this chart with df = 1, the 0.04 value lies between the columns for 0.9 and 0.1, and much closer to the 0.9 column. This is the probability of getting a value at least this large if the null hypothesis were true. It is very likely (0.9 means a 90% chance!). Therefore, we do not reject the null hypothesis. If you look at the values, you can perhaps guess this. The null hypothesis (as many males as females in the class) is not violated by observing 2 more males than females! That can easily happen by chance. We will talk more about this later, but let's try an example with our M&Ms.
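
Before moving on, note that the chart lookup can also be done in code. The sketch below assumes `scipy` is installed. It sums the $\frac{(O - E)^2}{E}$ contributions of both categories (males and females) for the classroom example, then uses `scipy.stats.chi2.sf` to get the probability of a value at least that large and `scipy.stats.chi2.ppf` to get the cutoff that a 5% threshold corresponds to.

```python
from scipy import stats as st

# Classroom example: 51 males and 49 females observed, 50 of each expected
observed = [51, 49]
expected = [50, 50]

# Sum (O - E)^2 / E over every category
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1
df = len(observed) - 1

# Probability of a chi-squared value at least this large if the null is true
p_value = st.chi2.sf(chi_squared, df)

# Chi-squared value that leaves 5% of the distribution in the upper tail
critical_value = st.chi2.ppf(0.95, df)

print(f"chi-squared = {chi_squared:.2f}, df = {df}")
print(f"p-value = {p_value:.3f}")
print(f"5% cutoff = {critical_value:.3f}")
```

A statistic below the cutoff, as here, means we fail to reject the null hypothesis at the 5% level.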

## M&Ms and Chi-Squared

According to published information, here are the probabilities for M&M colors:

|Color|Probability|
|-----|-----------|
|Blue|0.24|
|Brown|0.13|
|Green|0.16|
|Yellow|0.14|
|Red|0.13|
|Orange|0.20|

#### What is our null hypothesis? What is our alternative?
Take a guess below. Are the M&Ms you have in your tube the same as any random sample of M&Ms? (i.e. are they a representative sample)?


Given the above, complete the following Python code by placing your M&M counts in the code block below:

In [None]:
#Create a dictionary to hold the observations
observed_mnm = {'Brown': ,
                'Blue': ,
                'Red': ,
                'Orange': ,
                'Yellow': ,
                'Green': }

In the above code block, we used a new data structure called a dictionary. A dictionary works like a list in many ways, except that you look up items by their name (called a key) rather than by an index number:

In [None]:
my_favs = {'color': 'red',
           'day': 'monday',
           'fruit': 'banana'}

print(my_favs['color'])
print(type(my_favs))
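
A few more dictionary operations are sketched below, in case they are useful for the cells that follow. The `pet_counts` dictionary is a hypothetical example and is not part of the M&M exercise.

```python
# Hypothetical dictionary, for illustration only
pet_counts = {'cat': 2, 'dog': 1}

pet_counts['fish'] = 5   # assigning to a new key adds an entry
pet_counts['dog'] = 3    # assigning to an existing key replaces its value

# Loop over the keys and values together
for animal, count in pet_counts.items():
    print(animal, count)

# Check whether a key is present
print('cat' in pet_counts)

# List all of the keys
print(list(pet_counts.keys()))
```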

In [None]:
#import some special python functions
import scipy.stats as st

In [None]:
colors = ['Blue', 'Brown', 'Green', 'Yellow', 'Red', 'Orange']
# compute the total number of M&Ms
total_mnm = 0
for color in colors:
    # get the color value from the observed_mnm dict
    # add it to total

In [None]:
# In order to do hypothesis testing, we need the "expected" frequencies
# i.e. given the null hypothesis, how many mnms of each color would we expect to get
# we create another dictionary and assign to each color its probability under the null hypothesis (see above)
# complete the dictionary below
expected_probabilities = {
    'Blue': ,

}

In [None]:
# build lists of the expected and observed counts for each of the colors, in order
expected_values = []
observed_values = []
# we will use list.append to slowly build these lists one color at a time
# the order of the colors in the two lists will be the same
for color in colors:
    observed_values.append(observed_mnm[color])  # add it to the observed list
    expected_value_for_this_color = expected_probabilities[color] * total_mnm
    # now append this value in the appropriate list

The stats package will do the actual test for us. It will return both the statistic (the $\chi^2$ statistic, which is compared to a known distribution) and something called a p-value (the probability of making observations at least as extreme as ours if the null hypothesis is true).

In [None]:
# Use the stats package to do the Chi-squared test
result = st.chisquare(observed_values, expected_values)
print(f"Chi-squared statistic is {result[0]}")
print(f"p-value is: {result[1]}")
# note: the p-value is NOT the probability that the null hypothesis is true
print(f"Chance (in %) of a result at least this extreme if the null is true: {result[1] * 100}")


if (result[1] * 100) > 5:  # it is likely to get such a result by chance if null is true
    print("You should fail to reject the null hypothesis!")
else:  # very unlikely to get such a result by chance if null is true, so we reject the null
    print("You should reject the null hypothesis!")

In [None]:
#generate the plot to visually compare expected and observed probabilities

import matplotlib.pyplot as plt

data = 
# plot it using the plt.bar function we learnt about earlier
plt.show()
data2 = 
# plot it using the plt.bar function we learnt about earlier
plt.show()

## How testing allows us to evaluate a hypothesis
![](img/pval1.png)
![](img/rejecth0.jpg)