# Pearson's chi-squared test

Pearson's chi-squared test is a statistical hypothesis test used to determine whether the observed frequencies in one or more categories of a contingency table differ significantly from the frequencies expected under the null hypothesis. The formula below gives the test statistic, which measures the discrepancy between the observed and expected counts in a table of values.

$\chi^2 = \sum \limits_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \mathbf{N} \sum \limits_{i=1}^{n} \frac{(O_i / \mathbf{N} - p_i)^2}{p_i}$

Where:

$\chi^2$ = Pearson's cumulative test statistic.<br>
$O _i$ = the number of observations of type i.<br>
$\mathbf{N}$ = total number of observations.<br>
$E_i = \mathbf{N} p_i$ = the expected (theoretical) count of type i, asserted by the null hypothesis that the fraction of type i in the population is $p_i$.<br>
$n$ = the number of cells in the table
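
The formula can be checked numerically. The sketch below uses hypothetical counts from 60 rolls of a six-sided die (illustrative data, not taken from this document) and assumes a fair die as the null hypothesis, so every $p_i = 1/6$. It evaluates the statistic in the proportion form and in the equivalent count form $\sum (O_i - E_i)^2 / E_i$.

```python
import numpy as np

# Hypothetical observed counts for faces 1..6 (illustrative only).
observed = np.array([5, 8, 9, 8, 10, 20], dtype=float)
N = observed.sum()            # total number of observations
p = np.full(6, 1 / 6)         # null hypothesis: fair die
expected = N * p              # E_i = N * p_i

# Proportion form: N * sum((O_i / N - p_i)^2 / p_i)
chi2_proportions = N * np.sum((observed / N - p) ** 2 / p)

# Count form: sum((O_i - E_i)^2 / E_i)
chi2_counts = np.sum((observed - expected) ** 2 / expected)

print(chi2_proportions, chi2_counts)
```

Both forms evaluate to 13.4 for these counts, up to floating-point rounding.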

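The chi-squared statistic can then be used to calculate a p-value by comparing it to a chi-squared distribution: the p-value is the probability, under the null hypothesis, of obtaining a statistic at least as large as the one observed. The number of degrees of freedom equals the number of cells $n$ minus the reduction in degrees of freedom, that is, the number of constraints imposed when computing the expected counts. For a goodness-of-fit test with fully specified $p_i$ this gives $n - 1$; for a test of independence on a table with $r$ rows and $c$ columns it gives $(r - 1)(c - 1)$.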
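
A minimal sketch of the p-value step, assuming SciPy is available. It uses the survival function `scipy.stats.chi2.sf`, which returns the upper-tail probability, and plugs in the statistic reported for this document's example. The 6 degrees of freedom rest on the assumption that the example is a test of independence on a 3 × 4 table.

```python
from scipy.stats import chi2

chi2_stat = 24.5712028585826   # statistic reported for the example in this document
rows, cols = 3, 4              # assumed table shape
dof = (rows - 1) * (cols - 1)  # degrees of freedom for a test of independence

# Upper-tail probability P(X >= chi2_stat) for X ~ chi-squared(dof).
p_value = chi2.sf(chi2_stat, dof)
print(dof, p_value)
```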

Applied to this table of values:

![Table](./images/table_of_values.png)

We obtain a chi-squared value of **24.5712028585826** and a p-value of **0.0004098425861096696**, using SciPy's `chi2_contingency`:

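```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts as a 3 x 4 array (rows x columns).
obs = np.array([[90, 60, 104, 95], [30, 50, 51, 20], [30, 40, 45, 35]])

# lambda_="pearson" selects Pearson's chi-squared statistic.
# The call returns the statistic, the p-value, the degrees of freedom,
# and the array of expected counts.
chi2_stat, p, dof, expected = chi2_contingency(obs, lambda_="pearson")

# Chi-squared statistic and p-value.
print(chi2_stat, p)
```

```
24.5712028585826 0.0004098425861096696
```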
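
As a cross-check, the same result can be reproduced from the formula without `chi2_contingency`. This sketch assumes the null hypothesis is independence between rows and columns, so each expected count is the product of its row and column totals divided by the grand total. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

obs = np.array([[90, 60, 104, 95], [30, 50, 51, 20], [30, 40, 45, 35]])

row_totals = obs.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = obs.sum(axis=0, keepdims=True)   # shape (1, 4)
total = obs.sum()                             # grand total N

# Expected counts under independence: E_ij = row_i * col_j / N.
expected = row_totals * col_totals / total

# Pearson statistic summed over all cells.
chi2_stat = ((obs - expected) ** 2 / expected).sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
p_value = chi2.sf(chi2_stat, dof)
print(chi2_stat, dof, p_value)
```

The printed statistic and p-value should agree with the `chi2_contingency` output up to floating-point rounding.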
