# 🎯 Purpose of the Chi-Squared Goodness-of-Fit Test

**Question: Does a single categorical variable follow a theoretical distribution?**

**It tests whether observed categorical frequencies differ significantly from what we would expect under a theoretical distribution (e.g., a fair die).** 

**Hypotheses:**  

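- *Null hypothesis ($H_0$):* The variable follows the theoretical distribution, so the observed frequencies differ from the expected ones only by random chance.  
- *Alternative hypothesis ($H_1$):* The variable does not follow the theoretical distribution, so the true proportion of at least one category differs from the theoretical one.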

## Step 1 — Chi-squared Statistic Formula

- $O_i = \text{observed frequency for category} \ i$  
- $E_i = \text{expected frequency for category} \ i$
- $k = \text{number of categories}$

The chi-squared statistic is computed as:

$$\chi^2 = \sum_{i=1}^k\dfrac{(O_i - E_i)^2}{E_i}$$

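This sums the squared deviations between observed and expected frequencies, each scaled by its expected count. A value near 0 means the observed counts sit close to the expected ones; larger values indicate a poorer fit.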

Example:

Let's say you roll a die 600 times. If the die is fair, each face is expected 600 / 6 = 100 times, and you get:

In [1]:
observed = [90, 100, 110, 100, 105, 95]
expected = [100, 100, 100, 100, 100, 100]

Apply the formula:  

$\chi^2 = \dfrac{(90-100)^2}{100} + \dfrac{(100-100)^2}{100} + \dfrac{(110-100)^2}{100} + \ldots + \dfrac{(95-100)^2}{100}$


$\chi^2 = \dfrac{100}{100} + \dfrac{0}{100} + \dfrac{100}{100} + \dfrac{0}{100} + \dfrac{25}{100} + \dfrac{25}{100} = 2.5$
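The hand calculation above can be checked with a few lines of plain Python. This is a minimal sketch that reuses the `observed` and `expected` lists from the example; the variable name `chi2_stat` is only an illustrative choice.

```python
# Observed and expected counts from the die example.
observed = [90, 100, 110, 100, 105, 95]
expected = [100, 100, 100, 100, 100, 100]

# Sum of (O_i - E_i)^2 / E_i over all k categories.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(chi2_stat)  # 2.5
```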

## Step 2 — Compute the p-value

Once you have the chi-squared statistic $\chi^2 = 2.5$, you calculate the p-value using the chi-squared distribution with: 

- degrees of freedom (df) = k − 1 = 6 − 1 = 5  

In Python:

In [2]:
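from scipy.stats import chi2
# Upper-tail probability P(X >= 2.5) with df = 5; sf(x) equals 1 - cdf(x) but stays accurate for very small p-values
p_value = chi2.sf(2.5, df=5)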

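This gives the probability of seeing a chi-squared value $\ge 2.5$ purely by chance, assuming the null hypothesis is true. Here the p-value is roughly 0.78, far above the usual 0.05 threshold, so these rolls give no evidence against a fair die.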
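As a cross-check, SciPy also provides `scipy.stats.chisquare`, which computes the statistic and the p-value in one call. The sketch below assumes the same die counts as the example; with the default `ddof=0`, the function uses k − 1 degrees of freedom, which matches the manual calculation.

```python
from scipy.stats import chisquare

# Same die counts as in the worked example.
observed = [90, 100, 110, 100, 105, 95]
expected = [100, 100, 100, 100, 100, 100]

# With the default ddof=0, the p-value is based on df = k - 1 = 5.
result = chisquare(f_obs=observed, f_exp=expected)
statistic, p_value = result.statistic, result.pvalue

print(f"chi2 = {statistic:.2f}, p = {p_value:.2f}")
```

Note that `chisquare` expects the observed and expected counts to add up to the same total, which they do here (600 each).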

📌 **Summary**

|Step|What it does|
|------|------------|
|1. Chi-squared stat | Measures how far observed counts deviate from expected|
|2. Degrees of freedom | Number of categories minus 1|
|3. p-value| Probability of seeing that deviation (or worse) by chance|
|4. Decision| Reject $H_0$ if p-value < threshold (e.g., 0.05)|
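The four steps in the table can be put together in one small function. This is only a sketch: the name `goodness_of_fit` and its return layout are hypothetical, and the 0.05 default threshold is the example value from the table, not a fixed rule.

```python
from scipy.stats import chi2


def goodness_of_fit(observed, expected, alpha=0.05):
    """Hypothetical helper: return (statistic, df, p_value, reject_h0)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    # Step 1: chi-squared statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 2: degrees of freedom = number of categories minus 1.
    df = len(observed) - 1
    # Step 3: upper-tail p-value under the chi-squared distribution.
    p_value = float(chi2.sf(statistic, df))
    # Step 4: reject H0 only if the p-value is below the threshold.
    reject_h0 = p_value < alpha
    return statistic, df, p_value, reject_h0


statistic, df, p_value, reject_h0 = goodness_of_fit(
    [90, 100, 110, 100, 105, 95],
    [100, 100, 100, 100, 100, 100],
)
print(statistic, df, reject_h0)
```

For the die example, `reject_h0` comes out `False`: the p-value is well above 0.05, so the data are consistent with a fair die.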
