# 📘 Chi-Square Goodness of Fit Test

---

### 🔹 **Purpose**
The Chi-Square Goodness of Fit Test is used to determine whether  
the **observed frequencies (actual data)** differ significantly from  
the **expected frequencies (theoretical or assumed data)**.

---

### 🔹 **Formula**

$$
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$

Where:  
- \( O_i \) = Observed frequency for category *i*  
- \( E_i \) = Expected frequency for category *i*  
- \( \chi^2 \) = Chi-Square statistic  
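
The formula translates almost directly into code. The snippet below is a minimal sketch in plain Python; the function name `chi_square_statistic` and the three-category counts are hypothetical, chosen only to illustrate the sum.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustrative only)
observed_counts = [18, 22, 20]
expected_counts = [20, 20, 20]

statistic = chi_square_statistic(observed_counts, expected_counts)
print(statistic)  # (4 + 4 + 0) / 20
```

A statistic near zero means the observed counts sit close to the expected ones; larger values mean larger relative deviations.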

---

### 🔹 **Hypotheses**

- **Null Hypothesis (H₀):** Observed distribution = Expected distribution  
- **Alternative Hypothesis (H₁):** Observed distribution ≠ Expected distribution  

---

### 🔹 **Decision Rule**

1. Calculate \( \chi^2 \) using the formula.  
2. Find the **critical value** from the Chi-Square distribution table  
   based on your significance level (for example, α = 0.05)  
   and degrees of freedom \( df = k - 1 \) (where *k* = number of categories).  
3. If \( \chi^2_{\text{calculated}} > \chi^2_{\text{critical}} \):  
   → **Reject H₀** (there is a significant difference).  
4. Otherwise:  
   → **Fail to Reject H₀** (no significant difference).
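
The decision rule above can be sketched with `scipy.stats.chi2.ppf`, which stands in for the printed table. The significance level, the six-category setup, and the helper name `decide` are assumptions made for illustration.

```python
from scipy.stats import chi2

alpha = 0.05   # significance level (assumed for this sketch)
k = 6          # number of categories, e.g. the six faces of a die
df = k - 1     # degrees of freedom

# Critical value: the point with probability alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, df)


def decide(chi_square_calculated, critical_value):
    """Apply the decision rule: reject H0 only if the statistic exceeds the critical value."""
    if chi_square_calculated > critical_value:
        return "reject H0"
    return "fail to reject H0"


print(round(critical_value, 4))
print(decide(3.5, critical_value))
```

Comparing the statistic with the critical value is equivalent to comparing the p-value `chi2.sf(statistic, df)` with `alpha`.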

***


In [1]:
import pandas as pd
import random
import numpy as np
from scipy.stats import chi2

In [2]:
# Simulate die rolls (random experiment)
# Roll a fair die 120 times; each face (1 to 6) is equally likely on every roll.
dice_roll = [random.randint(1, 6) for _ in range(120)]


In [3]:
#Count the frequency of each face (1 to 6)
observed = pd.Series(dice_roll).value_counts().sort_index()
observed

1    26
2    22
3    17
4    21
5    16
6    18
Name: count, dtype: int64

In [4]:
# Expected frequency for a fair die: total rolls split evenly across the 6 faces
expected = np.full(6, len(dice_roll) / 6)
expected

array([20., 20., 20., 20., 20., 20.])
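
As a check on the code, the statistic for the counts shown above can be worked out by hand. This is only a worked instance of the formula for this particular run; a fresh simulation will give different counts.

$$
\chi^2 = \frac{(26-20)^2 + (22-20)^2 + (17-20)^2 + (21-20)^2 + (16-20)^2 + (18-20)^2}{20}
= \frac{36 + 4 + 9 + 1 + 16 + 4}{20} = \frac{70}{20} = 3.5
$$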

In [5]:
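def goodness_of_fit(observed, expected):
    """Return the chi-square statistic and its p-value for a goodness-of-fit test."""
    chi_square = 0
    for o, e in zip(observed, expected):
        chi_square += ((o - e) ** 2) / e

    # Degrees of freedom: number of categories minus one
    dof = len(observed) - 1
    # p-value = P(X >= chi_square): the upper tail (survival function), not the CDF
    p_value = chi2.sf(chi_square, dof)
    return chi_square, p_value

In [6]:
chi_square, p_value = goodness_of_fit(observed, expected)
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("❌ Die may not be fair (not uniform).")
else:
    print("✅ No evidence that the die is unfair (consistent with a uniform distribution).")

chi-square = 3.50, p-value = 0.6234
✅ No evidence that the die is unfair (consistent with a uniform distribution).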
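
The hand-rolled function can be cross-checked against SciPy's built-in `scipy.stats.chisquare`, which assumes equal expected frequencies when `f_exp` is omitted. The sketch below hard-codes the counts from the run shown above; the variable names are illustrative.

```python
from scipy.stats import chisquare

# Observed counts for faces 1 to 6, copied from the run above
observed_counts = [26, 22, 17, 21, 16, 18]

# With no f_exp argument, chisquare tests against a uniform distribution
result = chisquare(observed_counts)
print(f"chi-square = {result.statistic:.2f}, p-value = {result.pvalue:.4f}")
```

The p-value it reports is the upper-tail probability of the statistic, that is, 1 minus the CDF value at the statistic.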


## 🧾 Chi-Square Test Example: Shopkeeper's Sales Analysis

---

### 🎯 Problem Statement

A shopkeeper sells three items: **shirts**, **pants**, and **T-shirts**.  
He claims that all three items sell equally well.  

We will test his claim using **one day's sales data** to see  
whether the sales follow a **uniform distribution** (i.e., all items sell equally) or not.

---

In [51]:
products = (['shirt'] * 80) + (['pants'] * 70) + (['t-shirt'] * 50)

In [52]:
data = pd.DataFrame({'Products' : products})
data.head()

Unnamed: 0,Products
0,shirt
1,shirt
2,shirt
3,shirt
4,shirt


In [53]:
data['Products'].value_counts()

Products
shirt      80
pants      70
t-shirt    50
Name: count, dtype: int64

In [54]:
df = data['Products'].value_counts().reset_index()
df.columns = ['Products', 'Observed Sales']

In [55]:
# 200 sales split evenly across 3 products; keep the fraction so the expected counts sum to 200
df['Expected'] = len(data) / len(df)

In [56]:
df.head()

Unnamed: 0,Products,Observed Sales,Expected
0,shirt,80,66.666667
1,pants,70,66.666667
2,t-shirt,50,66.666667


In [None]:
def goodness_of_fit(observed, expected):
    """Return the chi-square statistic and its p-value for a goodness-of-fit test."""
    chi_square = 0
    for o, e in zip(observed, expected):
        chi_square += ((o - e) ** 2) / e

    # p-value = P(X >= chi_square): the upper tail (survival function), not the CDF
    p_value = chi2.sf(chi_square, len(observed) - 1)
    return chi_square, p_value


chi_square, p_value = goodness_of_fit(df['Observed Sales'], df['Expected'])
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("❌ The shopkeeper's claim is rejected.")
else:
    print("✅ The shopkeeper's claim is not rejected.")

chi-square = 7.00, p-value = 0.0302
❌ The shopkeeper's claim is rejected.
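
One detail worth checking in this example is how the expected count is computed. Truncating 200 / 3 to the integer 66 makes the expected counts sum to 198 instead of 200, which slightly inflates the statistic. The sketch below compares the two choices using only the standard library. The helper names are hypothetical, and the closed-form p-value `exp(-x / 2)` is valid only for 2 degrees of freedom.

```python
import math

observed_sales = [80, 70, 50]   # shirt, pants, t-shirt (from the data above)
total = sum(observed_sales)     # 200 sales in total
k = len(observed_sales)         # 3 categories, so df = 2


def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(statistic):
    # For df = 2 only, the chi-square upper-tail probability is exp(-x / 2)
    return math.exp(-statistic / 2)


exact_expected = [total / k] * k            # about 66.67 each, sums to 200
truncated_expected = [int(total / k)] * k   # 66 each, sums to 198

results = {}
for label, expected in [("exact", exact_expected), ("truncated", truncated_expected)]:
    stat = chi_square_statistic(observed_sales, expected)
    results[label] = (stat, p_value_df2(stat))
    print(f"{label}: chi-square = {stat:.3f}, p-value = {results[label][1]:.4f}")
```

Both versions fall below 0.05 here, so the conclusion does not change, but with borderline data the truncation could tip the decision.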
