# Testing Multiple Proportions, Independence, and Goodness of Fit

## Testing the Equality of Three or More Population Proportions
$H_0: p_1 = p_2 = \cdots = p_k$

$H_a: $ not all population proportions are equal

where $k \geq 3$ is the total number of populations and $p_i$ is the proportion for population $i$, $1 \leq i \leq k$.

### Example
JD Power uses the proportion of owners likely to repurchase a particular automobile as an indication of customer loyalty for the automobile. An automobile with a greater proportion of owners likely to repurchase is concluded to have greater customer loyalty. Suppose that in a particular study we want to compare the customer loyalty for three automobiles: Chevrolet Impala, Ford Fusion, and Honda Accord. The current owners of each of the three automobiles form the three populations for the study.

Let $p_1, p_2, p_3$ be the proportions likely to repurchase Impala, Fusion, and Accord for their respective populations.

The hypotheses are stated as follows:

$H_0: p_1 = p_2 = p_3$

$H_a: $ not all population proportions are equal

To conduct this hypothesis test we begin by taking a sample of owners from each of the three populations. 

#### The Observed Frequencies

|                        | Impala Owners | Fusion Owners | Accord Owners | Total  | 
|:-----------------------|:-------------:|:-------------:|:-------------:|:------:|
| likely to repurchase   | 69            | 120           |   123         |  312   |
| unlikely to repurchase | 56            | 80            |    52         |  188   |
|    total               | 125           | 200           |   175         |   500  |


#### Expected Frequencies under the Assumption That $H_0$ Is True
1. In the combined sample, $\frac{312}{500} = .624$ of the owners are likely to repurchase.
2. If $H_0$ is true, then 62.4% of the owners of each automobile (Impala, Fusion, and Accord) should be likely to repurchase.
3. The expected number of Impala owners who are likely to repurchase is thus $.624 \times 125 = 78$, and the expected number who are unlikely to repurchase is $125 - 78 = 47$. The expected frequencies for Fusion and Accord owners are computed the same way.


|                        | Impala Owners | Fusion Owners | Accord Owners | Total  | 
|:-----------------------|:-------------:|:-------------:|:-------------:|:------:|
| likely to repurchase   | 78            | 124.8         |   109.2       |  312   |
| unlikely to repurchase | 47            | 75.2          |    65.8       |  188   |
|    total               | 125           | 200           |   175         |   500  |
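The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the original notebook; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are chosen here for readability.

```python
# Observed counts from the table above.
# Rows: likely / unlikely to repurchase. Columns: Impala, Fusion, Accord.
observed = [
    [69, 120, 123],
    [56, 80, 52],
]

row_totals = [sum(row) for row in observed]        # [312, 188]
col_totals = [sum(col) for col in zip(*observed)]  # [125, 200, 175]
n = sum(row_totals)                                # 500

# Under H0 every column shares the pooled row proportions, so
# expected count = (row total) * (column total) / (total sample size).
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

The printed rows should agree with the expected-frequency table above.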


### Formulas for Computing Expected Frequencies and Chi-Square Test Statistic

<img src="Fig12-1.bmp">

<img src="Fig12-2.bmp">
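For readers who cannot view the figures, the standard textbook forms of these two formulas are written out here. They are assumed to match the figures because they reproduce the arithmetic of this example; the symbols $f_{ij}$ (observed frequency in row $i$, column $j$) and $e_{ij}$ (the corresponding expected frequency) are notation chosen for this sketch.

$$
e_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{total sample size}}
$$

$$
\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}
$$

For instance, $e_{11} = \frac{312 \times 125}{500} = 78$. With $k$ populations and two response categories, the test statistic is compared against a chi-square distribution with $k - 1$ degrees of freedom, which is why the code in this section uses 2 degrees of freedom for $k = 3$.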

In [1]:
from scipy.stats import chi2

In [2]:
observed = [69, 120, 123, 56, 80, 52]
expected = [78, 124.8, 109.2, 47, 75.2, 65.8]

In [3]:
chi_square = 0

for i in range(len(observed)):
    chi_square += (observed[i]-expected[i])**2./expected[i]

crit = chi2.ppf(0.95, 2)  # critical value at the 0.05 significance level, df = k - 1 = 2

p_value = chi2.sf(chi_square, 2)  # upper-tail area beyond the test statistic

print(chi_square, crit, p_value)

7.89104512509 5.99146454711 0.0193411067905
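Because the test statistic 7.891 exceeds the critical value 5.991 (equivalently, the p-value 0.0193 is below 0.05), $H_0$ is rejected at the 0.05 level: the sample gives evidence that the three repurchase proportions are not all equal.

As a cross-check on the manual loop, SciPy's `chi2_contingency` accepts the observed table directly and returns the statistic, p-value, degrees of freedom, and expected frequencies. The sketch below assumes SciPy is installed; the variable names and the `alpha = 0.05` threshold are illustrative choices.

```python
from scipy.stats import chi2_contingency

# Observed counts: rows are likely / unlikely, columns are Impala, Fusion, Accord.
table = [
    [69, 120, 123],
    [56, 80, 52],
]

# Yates' continuity correction is applied only when there is one degree of
# freedom (a 2x2 table), so the default settings do not alter this 2x3 result.
stat, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05  # significance level assumed for this example
reject_h0 = p_value < alpha

print(stat, p_value, dof)
print(expected)
print("Reject H0" if reject_h0 else "Do not reject H0")
```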


### Summary of Procedure for Testing the Equality of Three or More Population Proportions

<img src="Fig12-3.bmp">

### **A chi-square test for equal population proportions will always be an upper tail test.**
* If $H_0$ is true, then the differences between the observed frequencies and the expected frequencies will be relatively small; thus the chi-square test statistic will be relatively small.
* If, on the other hand, $H_0$ is false, then the differences will tend to be larger and the chi-square test statistic will be relatively large. Because the differences are squared, only large values of the statistic count as evidence against $H_0$.
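To make the upper-tail behavior concrete, the sketch below compares the statistic for two tables. The helper `chi_square_statistic` and the `proportional` table are hypothetical, introduced only for this illustration: the latter has exactly 60% "likely" responses in every column, so it matches $H_0$ perfectly.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (f - e) ** 2 / e
    return stat

# Hypothetical table with identical proportions (60% likely) in every column.
proportional = [
    [60, 120, 90],
    [40, 80, 60],
]

# Observed table from the automobile example.
sample = [
    [69, 120, 123],
    [56, 80, 52],
]

print(chi_square_statistic(proportional))
print(chi_square_statistic(sample))
```

The statistic is zero when observed and expected frequencies coincide, and it can only increase as the column proportions drift apart, so evidence against $H_0$ always appears in the upper tail.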

### Concluding Remarks
1. In this example, we are essentially testing whether $P(\text{repurchase} \mid \text{Impala}) = P(\text{repurchase} \mid \text{Fusion}) = P(\text{repurchase} \mid \text{Accord})$.
2. If this hypothesis is true, then customer loyalty is independent of the automobile brand.
3. Thus, the chi-square test can also be used to test whether two categorical variables are independent.
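The conditional probabilities in remark 1 can be estimated directly from the sample counts. A minimal sketch, with variable names chosen for this illustration:

```python
brands = ["Impala", "Fusion", "Accord"]
likely = [69, 120, 123]         # owners likely to repurchase, per brand
sample_sizes = [125, 200, 175]  # owners sampled, per brand

# Sample estimate of P(repurchase | brand) for each brand.
conditional = {b: l / n for b, l, n in zip(brands, likely, sample_sizes)}

# Pooled estimate of P(repurchase), ignoring brand.
pooled = sum(likely) / sum(sample_sizes)

for brand, p in conditional.items():
    print(f"P(repurchase | {brand}) = {p:.3f}")
print(f"P(repurchase) = {pooled:.3f}")
```

Under independence, each conditional proportion would be close to the pooled proportion; the chi-square test measures whether the observed gaps are larger than sampling variation alone would explain.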