# Comparing Multiple Proportions with the Chi-Squared Test

Import necessary libraries.

In [2]:
from scipy import stats
import numpy as np

## Optimizely Example

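The observed number of conversions in condition 1 was 280 and in condition 2 it was 399. These conditions respectively contained $n_1$ = 8872 and $n_2$ = 8642 experimental units. This is all the information we need to perform the chi-squared test in Python. We don't need the raw data itself, just these summaries. We will do so with the `chi2_contingency` function from `scipy.stats`, which takes a contingency table: the first row holds the conversions in each condition and the second row holds the non-conversions (8872 - 280 = 8592 and 8642 - 399 = 8243). Setting `correction = False` turns off Yates' continuity correction, which the function would otherwise apply to a 2x2 table.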

For illustration we test $$H_0:\pi_1=\pi_2 \text{ vs. }H_A:\pi_1\neq\pi_2$$

In [4]:
tab = [[280, 399], [8592, 8243]]
t, pv, df, expected = stats.chi2_contingency(tab, correction = False)
print("t =", t)
print("p-value =", pv)

t = 25.074540269210264
p-value = 5.515630872662502e-07


Note that this p-value is identical to the one from the two-sided Z-test for proportions we did last day, and that the test statistic here is the square of that Z statistic. This is not a coincidence. The Z-test for two proportions and the two-sample chi-squared test (without continuity correction) are statistically equivalent.
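
The equivalence can be checked numerically. The sketch below assumes that the Z-test in question is the usual pooled two-sample Z-test for proportions with a two-sided alternative. The variable names (`x1`, `pooled`, `se`, and so on) are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Summary counts from the Optimizely example
x1, n1 = 280, 8872
x2, n2 = 399, 8642

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)  # pooled conversion rate under H0

# Pooled two-sample Z statistic and its two-sided p-value
se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
pv_z = 2 * stats.norm.sf(abs(z))

# Chi-squared test on the same data, without continuity correction
tab = [[x1, x2], [n1 - x1, n2 - x2]]
t, pv_chi2, df, expected = stats.chi2_contingency(tab, correction=False)

print("z^2 =", z**2)
print("t =", t)
print("p-values:", pv_z, pv_chi2)
```

Up to floating-point error, `z**2` should match `t` and the two p-values should agree.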

Note also that we could have computed the chi-squared test statistic and p-value manually:

In [5]:
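# Observed counts: conversions in conditions 1 and 2, then non-conversions
O = np.array([280, 399, 8592, 8243])
n1 = 8872
n2 = 8642
# Pooled conversion rate under H0
pihat = (280 + 399) / (n1 + n2)
# Expected counts under H0, in the same order as O
E = np.multiply([n1, n2, n1, n2], [pihat, pihat, 1-pihat, 1-pihat])
# Pearson chi-squared statistic
t = sum((O-E)**2 / E)
print("t =", t)
# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
pv = stats.chi2.sf(t, df = 1)
print("p-value =", pv)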

t = 25.074540269210242
p-value = 5.515630872662567e-07


## Nike SB Example

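The observed numbers of views in conditions 1 to 5 were, respectively, 160, 95, 141, 293, and 197. These conditions contained $n_1$ = 5014, $n_2$ = 4971, $n_3$ = 5030, $n_4$ = 5007, and $n_5$ = 4980 experimental units. Here we test $H_0:\pi_1=\pi_2=\cdots=\pi_5$ against the alternative that at least two of the proportions differ. Using this and only this information we can now perform the chi-squared test in Python. We do so in a manner analogous to the two-sample version: the first row of the table holds the views and the second row holds the non-views ($n_j$ minus the views in condition $j$).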

In [4]:
tab = [[160, 95, 141, 293, 197], [4854, 4876, 4889, 4714, 4783]]
t, pv, df, expected = stats.chi2_contingency(tab, correction = False)
print("t =", t)
print("p-value =", pv)

t = 129.16856308361378
p-value = 5.864117639139824e-27
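
As in the two-sample case, this calculation can also be reproduced by hand. The following is a minimal sketch that generalizes the manual calculation to five conditions, using a chi-squared reference distribution with $5 - 1 = 4$ degrees of freedom. The names `views`, `n`, `t_manual`, `df_manual`, and `pv_manual` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Views and sample sizes for the five conditions
views = np.array([160, 95, 141, 293, 197])
n = np.array([5014, 4971, 5030, 5007, 4980])

# Observed table: views in row 1, non-views in row 2
O = np.vstack([views, n - views])

# Pooled view rate under H0 and the expected counts it implies
pihat = views.sum() / n.sum()
E = np.vstack([n * pihat, n * (1 - pihat)])

# Pearson chi-squared statistic with (number of conditions - 1) degrees of freedom
t_manual = ((O - E) ** 2 / E).sum()
df_manual = len(n) - 1
pv_manual = stats.chi2.sf(t_manual, df=df_manual)

print("t =", t_manual)
print("df =", df_manual)
print("p-value =", pv_manual)
```

The statistic and p-value should agree with the `chi2_contingency` output above up to floating-point error.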


Note that `chi2_contingency` also returns the expected (as opposed to observed) contingency table, i.e. the counts we would expect to see if $H_0$ were true:

In [5]:
print("Expected Contingency Table: \n", expected)

Expected Contingency Table: 
 [[ 177.68194544  176.15814735  178.24894008  177.43388529  176.47708183]
 [4836.31805456 4794.84185265 4851.75105992 4829.56611471 4803.52291817]]
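
Each expected count is the product of its row total and column total, divided by the grand total. The sketch below recomputes the table this way and compares it with the one returned by `chi2_contingency`. The names `row_totals`, `col_totals`, and `expected_manual` are illustrative.

```python
import numpy as np
from scipy import stats

# Observed table from the Nike SB example
tab = np.array([[160, 95, 141, 293, 197],
                [4854, 4876, 4889, 4714, 4783]])

row_totals = tab.sum(axis=1)  # total views and total non-views
col_totals = tab.sum(axis=0)  # sample size of each condition

# Expected count in cell (i, j) = row total i * column total j / grand total
expected_manual = np.outer(row_totals, col_totals) / tab.sum()

t, pv, df, expected = stats.chi2_contingency(tab, correction=False)
print(np.allclose(expected_manual, expected))
```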
