# HYPOTHESIS TESTING INTUITION
Determine if a change we made had a meaningful impact or not

**SIGNIFICANCE TESTING**  
- Statistics helps us determine if the difference in the weight lost between the 2 groups is because of random chance or because of an actual difference in the outcomes.   
- If there is a meaningful difference, we say that the results are statistically significant. We'll dive into what this means exactly over the course of this mission.  

**SET UP NULL HYPOTHESIS**
- We first set up a null hypothesis that describes the status quo. 
- We then state an alternative hypothesis, which we compare with the null hypothesis to decide which describes the data better.
- In the end, we either:  
  - reject the null hypothesis and accept the alternative hypothesis, or  
  - fail to reject the null hypothesis (often loosely described as accepting it) and reject the alternative hypothesis.

**Null Hypothesis** 
- Our null hypothesis should describe the default position of skepticism, which is that there's no statistically significant difference between the outcomes of the 2 groups. Put another way, it should state that any difference is due to random chance. 

**Alternative Hypothesis**
- Our alternative hypothesis should state that there is in fact a statistically significant difference between the outcomes of the 2 groups.

**TEST STATISTIC**
- To decide which hypothesis more accurately describes the data, we need to frame the hypotheses more quantitatively. 
- The first step is to decide on a test statistic, which is a numerical value that summarizes the data and that we can use in statistical formulas. 
- We use this test statistic to run a statistical test that will determine how likely it is that the difference between the groups was due to random chance.

**Difference in Means**  
- Since we want to know if the amount of weight lost between the groups is meaningfully different, we will use the difference in the means, also known as the mean difference, of the amount of weight lost for each group as the test statistic.
- The sample mean is represented in statistics by the symbol $\bar{x}$ ("x bar"). Using it, we can state both hypotheses as equations:

**Null hypothesis:**   
$\bar{x}_b - \bar{x}_a = 0$  

**Alternative hypothesis:**   
$\bar{x}_b - \bar{x}_a > 0$
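
As a minimal sketch of the test statistic, the snippet below computes the difference in means for two small groups. The weight-loss values are hypothetical (they are not the study's data), the helper name `mean_difference` is made up for illustration, and it assumes group A took the placebo and group B took the pill.

```python
from statistics import mean

# Hypothetical weight-loss values, for illustration only.
group_a = [3, 5, 2, 4, 6]  # assumed: placebo group
group_b = [6, 7, 5, 8, 4]  # assumed: weight loss pill group


def mean_difference(a, b):
    """Test statistic: mean of group B minus mean of group A."""
    return mean(b) - mean(a)


observed = mean_difference(group_a, group_b)
print(observed)
```

A positive value here only says that group B's mean is higher in this sample; the statistical test is what tells us whether a difference of that size is surprising under the null hypothesis.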

---
**STATISTICAL TESTS**  
Note that while we've stated our hypotheses as equations, we're not simply calculating the difference and matching the result to a hypothesis. We're instead using a statistical test to determine which of these statements better describes the data.

**PERMUTATION TEST**  
Now that we have a test statistic, we need to decide on a statistical test. The purpose of a statistical test is to work out the likelihood that the result we achieved was due to random chance.

The permutation test is a statistical test that involves simulating rerunning the study many times and recalculating the test statistic for each iteration. The goal is to calculate a distribution of the test statistics over these many iterations. 

This distribution is called the sampling distribution and it approximates the full range of possible test statistics under the null hypothesis. We can then benchmark the test statistic we observed in the data (a mean difference of 2.52) to determine how likely it is to observe this mean difference under the null hypothesis. 

If the null hypothesis is true (that is, the weight loss pill doesn't help people lose more weight), then a mean difference of 2.52 should be quite common in the sampling distribution. If it's extremely rare instead, we reject the null hypothesis and accept the alternative hypothesis.

To simulate rerunning the study, we randomly reassign each data point (weight lost) to either group A or group B. We keep track of the recalculated test statistics as a separate list. By re-randomizing the groups that the weight loss values belong to, we're simulating what randomly generated groupings of these weight loss values would look like. We then use these randomly generated groupings to understand how rare the groupings in our actual data were.

Ideally, the number of times we re-randomize the groups that each data point belongs to matches the total number of possible permutations. Usually, the number of total permutations is too high for even powerful supercomputers to calculate within a reasonable time frame. While we'll use 1000 iterations for now since we'll get the results back quickly, in later missions we'll learn how to quantify the tradeoff we make between accuracy and speed to determine the optimal number of iterations.

Since we'll be randomizing the groups each value belongs to, we created a list named `all_values` that contains just the weight loss values.
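
The re-randomization described above can be sketched as follows. This is one possible version, not necessarily the mission's exact code: it shuffles `all_values` and splits it back into two groups of the original sizes, which is a close variant of assigning each value to a group at random. The data, the function name `permutation_mean_differences`, and the `seed` parameter are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_mean_differences(group_a, group_b, iterations=1000, seed=None):
    """Simulate rerunning the study and collect the recalculated mean differences."""
    rng = random.Random(seed)
    # Pool every weight-loss value, ignoring its original group label.
    all_values = list(group_a) + list(group_b)
    n_a = len(group_a)
    mean_differences = []
    for _ in range(iterations):
        # Shuffle, then split: the first n_a values become the new group A.
        rng.shuffle(all_values)
        new_a = all_values[:n_a]
        new_b = all_values[n_a:]
        mean_differences.append(mean(new_b) - mean(new_a))
    return mean_differences


# Hypothetical weight-loss values, for illustration only.
group_a = [3, 5, 2, 4, 6]
group_b = [6, 7, 5, 8, 4]
mean_differences = permutation_mean_differences(
    group_a, group_b, iterations=1000, seed=0
)
print(len(mean_differences))
```

The returned list plays the role of the sampling distribution: each entry is a mean difference that arose purely from a random grouping of the same values.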

**P-VALUE**  
We can now use the sampling distribution to determine the number of times a value of 2.52 or higher appeared in our simulations. If we then divide that frequency by 1000, we'll have the probability of observing a mean difference of 2.52 or higher purely due to random chance.

This probability is called the p value. If this value is high, it means that the difference in the amount of weight both groups lost could have easily happened randomly and the weight loss pills probably didn't play a role. On the other hand, a low p value implies that a mean difference this large would rarely arise from random chance alone.
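
The counting step can be sketched in a few lines. The list of simulated mean differences below is hypothetical and far shorter than the 1000 iterations used in the text; the function name `p_value` is an illustrative choice.

```python
def p_value(mean_differences, observed):
    """Fraction of simulated mean differences at least as large as the observed one."""
    count = sum(1 for d in mean_differences if d >= observed)
    return count / len(mean_differences)


# Hypothetical simulated mean differences, for illustration only.
simulated = [-1.2, 0.4, 2.6, 0.0, 3.1, -0.8, 1.9, 2.52, 0.3, -2.0]
print(p_value(simulated, 2.52))
```

Dividing by `len(mean_differences)` rather than a hard-coded 1000 keeps the function correct if the number of iterations changes.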

In general, it's good practice to set the p value threshold before conducting the study:

**if the p value is less than the threshold, we:**

- reject the null hypothesis that there's no difference in mean amount of weight lost by participants in both groups,
- accept the alternative hypothesis that the people who consumed the weight loss pill lost more weight,
- conclude that the weight loss pill does affect the amount of weight people lost.

**if the p value is greater than the threshold, we:**

- accept the null hypothesis that there's no difference in the mean amount of weight lost by participants in both groups,
- reject the alternative hypothesis that the people who consumed the weight loss pill lost more weight,
- conclude that the weight loss pill doesn't seem to be effective in helping people lose more weight.
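
The two branches above amount to a single comparison. A minimal sketch, assuming a threshold of 0.05 and treating a p value exactly equal to the threshold as not significant (the text doesn't specify that case); `decide` is a hypothetical helper name.

```python
def decide(p_value, threshold=0.05):
    """Return the decision for a p value under a threshold fixed before the study."""
    if p_value < threshold:
        return "reject the null hypothesis"
    # This is the outcome the text calls accepting the null hypothesis.
    return "fail to reject the null hypothesis"


print(decide(0.01))
print(decide(0.30))
```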

**P-value Thresholds**  
The most common p value threshold is 0.05 or 5%, which is what we'll use in this mission. Although 0.05 is an arbitrary threshold, it means that if the null hypothesis is true, there's only a 5% chance of seeing a result this extreme purely by random chance, a risk most researchers are comfortable with.

**Errors**   
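- Type I: accept the alternative hypothesis incorrectly (reject a null hypothesis that is actually true)  
- Type II: reject the alternative hypothesis incorrectly (fail to reject a null hypothesis that is actually false)  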

---

**CHI-SQUARED TESTS FOR CATEGORICAL DATA**  
Be able to figure out if the association between two columns of categorical data is statistically significant or not

Difference between observed and expected values:  
$\text{test statistic} = \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$


What we really want to find is one number that can tell us how much all of our observed counts deviate from all of their expected counterparts. This will let us figure out if our difference in counts is statistically significant.   

We can get one step closer to this by squaring the top term in our difference formula:  

$\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(1100 - 1000)^2}{1000} = 10$  

Squaring the difference will ensure that all the differences don't sum to zero (you can't have negative squares), giving us a non-zero number we can use to assess statistical significance.

We can calculate $\chi^2$, the chi-squared value, by adding up these squared differences, each divided by its expected value, across all categories.  
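
As a small sketch of this sum, the function below adds up the per-category terms. The first category uses the 1100 observed versus 1000 expected counts from the example above; the second category (900 versus 1000) is a hypothetical value added so that there is something to sum.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [1100, 900]   # second count is hypothetical
expected = [1000, 1000]
print(chi_squared(observed, expected))
```

Each category contributes 10 here, so the two terms add up to a chi-squared value of 20.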

**GENERATING A DISTRIBUTION**

Now that we have a chi-squared value for our observed and expected gender counts, we need a way to figure out what the chi-squared value represents. We can translate a chi-squared value into a statistical significance value using a chi-squared sampling distribution.   

If you recall, we covered statistical significance and p-values in the last mission. A p-value allows us to determine whether the difference between two values is due to chance, or due to an underlying difference.  

We can generate a chi-squared sampling distribution using our expected probabilities. If we repeatedly generate random samples of 32561 observations each, and graph the chi-squared value of each sample, we'll be able to generate a distribution.  

By comparing our chi-squared value to the distribution, and seeing what percentage of the distribution is greater than our value, we'll get a p-value. For instance, if 5% of the values in the distribution are greater than our chi-squared value, the p-value is .05.
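
The simulation described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the mission's code: the expected probabilities of 0.5 each, the function names, and the `seed` parameter are all hypothetical, and the demo uses a sample size of 1000 rather than 32561 so that it runs quickly.

```python
import random


def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_chi_squared(probabilities, sample_size, iterations=1000, seed=None):
    """Draw random samples from the expected probabilities and score each one."""
    rng = random.Random(seed)
    categories = range(len(probabilities))
    expected = [p * sample_size for p in probabilities]
    values = []
    for _ in range(iterations):
        # Each draw is a category index chosen according to the expected probabilities.
        draws = rng.choices(categories, weights=probabilities, k=sample_size)
        observed = [draws.count(c) for c in categories]
        values.append(chi_squared(observed, expected))
    return values


def p_value(simulated, observed_value):
    """Fraction of simulated chi-squared values at least as large as the observed one."""
    return sum(1 for v in simulated if v >= observed_value) / len(simulated)


# Assumed probabilities (0.5 each) and a reduced sample size, for illustration only.
simulated = simulate_chi_squared([0.5, 0.5], sample_size=1000, iterations=500, seed=0)
print(p_value(simulated, 20.0))
```

Passing `sample_size=32561` would match the sample size in the text, at the cost of a longer run time in pure Python.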

Chi-squared tests can only be applied in the case where the possibilities within a category are mutually exclusive. For instance, the Census counts individuals as either Male or Female, not both.  

Chi-squared tests are more valid when the numbers in each cell of the cross table are larger. So if each number is 100, great -- if each number is 1, you may need to gather more data.