### Part 6 - Chi Square Test ###

The Pearson Chi-Square test can be used to test for significant differences between groups when both the grouping and the outcome are categorical, e.g. a comparison of survey responses ("A" vs "B") between people who have been given one of *two* sets of information (such as "true" info and "false" info).

The Chi-Square test takes a contingency table as input, similar to the following:

```
      | A   |  B
-----------------
true  | 30  | 10
false | 20  | 20
```

... so it looks like giving people false information shifts the share of "A" responses from 30/40 (75%) to 20/40 (50%), a difference of 25 percentage points in this example.  
  
The Contingency Table can have as many columns and rows as necessary.

+ Rows = Conditions (here "true" and "false")
+ Columns = Outcomes (here "A" and "B")
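
The test compares the observed counts with the counts that would be expected if the rows and columns were independent. As a rough sketch in plain Python, those expected counts follow from the row and column totals alone (the helper name `expected_counts` is made up for illustration and is not part of SciPy):

```python
def expected_counts(table):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Each expected cell is (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[30, 10],
            [20, 20]]
expected = expected_counts(observed)
print(expected)  # [[25.0, 15.0], [25.0, 15.0]]
```

The further the observed table sits from this expected table, the larger the chi-square statistic and the smaller the p-value.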

Use the `chi2_contingency` function from `scipy.stats` to perform a Chi-Square test:

```python
from scipy.stats import chi2_contingency

X = [[30,10],
     [20,20]]

chi2, pval, dof, expected = chi2_contingency(X)
print(pval.round(5))
```

```
0.03767
```


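With p ≈ 0.038, this falls just below the conventional 0.05 significance threshold, so at that level we would *Reject the Null Hypothesis*. The evidence is weak, though: at a stricter 0.01 threshold we would *Fail to Reject the Null Hypothesis* and say this variation could have occurred by chance.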
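
To see where 0.03767 comes from, the following sketch recomputes it in plain Python. It assumes SciPy's documented default for `chi2_contingency` of applying Yates' continuity correction when there is one degree of freedom (as in any 2x2 table); the helper names `chi2_stat` and `pvalue_dof1` are made up for illustration.

```python
import math

observed = [[30, 10],
            [20, 20]]
# Expected counts under independence: row total * column total / grand total
expected = [[25.0, 15.0],
            [25.0, 15.0]]

def chi2_stat(observed, expected, yates=False):
    """Sum over all cells of (|O - E| - correction)^2 / E."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                # Yates' correction shrinks each difference by 0.5 (never below 0)
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total

def pvalue_dof1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

p_corrected = pvalue_dof1(chi2_stat(observed, expected, yates=True))
p_uncorrected = pvalue_dof1(chi2_stat(observed, expected, yates=False))
print(round(p_corrected, 5))    # 0.03767, the value reported by SciPy
print(round(p_uncorrected, 5))  # smaller, since no correction is applied
```

Passing `correction=False` to `chi2_contingency` should return the uncorrected figure instead.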
  
What if we got the same ratio difference for a larger sample?

```python
X = [[300,100],
     [200,200]]

chi2, pval, dof, expected = chi2_contingency(X)
print(pval)
```

```
4.832154459214226e-13
```


...we can be pretty sure that this is a significant difference and we definitely *Reject the Null Hypothesis*.
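
One way to see why the p-value collapses: for fixed proportions, the uncorrected Pearson statistic grows in direct proportion to the sample size. The sketch below illustrates this in plain Python. It leaves out the continuity correction, so its numbers differ slightly from the `chi2_contingency` output, and `pearson_chi2` is a hypothetical helper rather than a SciPy function.

```python
import math

def pearson_chi2(table):
    """Uncorrected Pearson chi-square statistic for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )

base = [[30, 10],
        [20, 20]]
for scale in (1, 10):
    scaled = [[cell * scale for cell in row] for row in base]
    stat = pearson_chi2(scaled)
    # Closed-form upper tail for 1 degree of freedom (valid for 2x2 tables only)
    p = math.erfc(math.sqrt(stat / 2))
    print(scale, round(stat, 3), p)
```

Ten times the data gives ten times the statistic, while the threshold for significance stays where it was.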

#### Example from Codecademy ####

The management at the VeryAnts ant store wants to know if their two most popular species of ants, the Leaf Cutter and the Harvester, vary in popularity between 1st, 2nd, and 3rd graders.
  
We have created a table representing the different ants bought by the children in grades 1, 2, and 3 after the last big field trip to VeryAnts. Run the code to see what happens when we enter this table into SciPy's chi-square test.
   
Does the resulting p-value mean that we should reject or accept the null hypothesis?

```python
from scipy.stats import chi2_contingency

# Contingency table
#        | harvester | leaf cutter
# -------+-----------+------------
# 1st gr | 30        | 10
# 2nd gr | 35        | 5
# 3rd gr | 28        | 12

X = [[30, 10],
     [35, 5],
     [28, 12]]
chi2, pval, dof, expected = chi2_contingency(X)
print(pval)
```

```
0.15508230807673704
```


On the basis of this we *Fail to Reject the Null Hypothesis* (p ≈ 0.155 is well above 0.05) - we tell management that there is *no significant difference in popularity for these ants between 1st, 2nd, and 3rd grade pupils*.
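
For a table larger than 2x2 the degrees of freedom are (rows - 1) x (columns - 1), which is 2 here. As a sanity check, the sketch below recomputes the statistic and p-value in plain Python. It relies on the fact that the chi-square upper tail has the closed form exp(-chi2 / 2) when there are exactly 2 degrees of freedom, so it is not a general replacement for `chi2_contingency`.

```python
import math

observed = [[30, 10],
            [35, 5],
            [28, 12]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Degrees of freedom for an r x c table
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

chi2 = 0.0
for row, r in zip(observed, row_totals):
    for obs, c in zip(row, col_totals):
        expected_count = r * c / grand_total
        chi2 += (obs - expected_count) ** 2 / expected_count

# Closed form valid only when dof == 2
pval = math.exp(-chi2 / 2)
print(dof, round(chi2, 4), round(pval, 4))  # 2 3.7276 0.1551
```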