# Understanding the Chi-Square Test ($\chi^2$)

The Chi-Square test is used when you are dealing with labels or categories (e.g., Gender, Color, Location, Choice) rather than continuous numbers. It works on **counts**: how many observations fall into each category.

---

## 1. Types of Chi-Square Tests

There are two primary versions used in data science:

### A. Goodness of Fit Test
Determines whether sample data for a single categorical variable match a hypothesized distribution.
* **Example:** Testing whether a six-sided die is fair (does each face come up 1/6 of the time?).
* **Hypothesis:** $H_0$: The data follows the expected distribution.
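The die example can be run with `scipy.stats.chisquare`. The sketch below uses hypothetical counts from 120 rolls, not real data. When `f_exp` is omitted, `chisquare` assumes every category is equally likely, which here means an expected count of 120 / 6 = 20 per face.

```python
from scipy.stats import chisquare

# Hypothetical counts for faces 1..6 from 120 rolls of a die
observed_rolls = [18, 22, 16, 25, 20, 19]

# f_exp is omitted, so each face is expected 120 / 6 = 20 times under H0
result = chisquare(observed_rolls)

print(f"Chi-Square Statistic: {result.statistic:.4f}")  # (4+4+16+25+0+1)/20 = 2.5
print(f"P-value: {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Reject H0: the die does not appear fair.")
else:
    print("Fail to reject H0: no significant evidence that the die is unfair.")
```

For a non-uniform hypothesis, pass the expected counts explicitly through `f_exp` (they should sum to the same total as the observed counts).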



### B. Test of Independence
Determines whether two categorical variables measured on the same sample are associated with each other.
* **Example:** Is there a relationship between "Gender" and "Streaming Service Preference"? 
* **Hypothesis:** $H_0$: The two variables are independent (no relationship).
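The test of independence runs on a contingency table of counts, not on raw records. A minimal sketch of building that table from individual responses, assuming hypothetical survey records and hypothetical service names, with only NumPy:

```python
import numpy as np

# Hypothetical raw survey records: (gender, preferred streaming service)
records = [
    ("F", "Netflix"), ("M", "Hulu"), ("F", "Netflix"), ("M", "Netflix"),
    ("F", "Disney+"), ("M", "Hulu"), ("F", "Hulu"), ("M", "Disney+"),
]

# Sorted category labels define the row and column order
rows = sorted({gender for gender, _ in records})
cols = sorted({service for _, service in records})

# Count how many records fall into each (gender, service) cell
table = np.zeros((len(rows), len(cols)), dtype=int)
for gender, service in records:
    table[rows.index(gender), cols.index(service)] += 1

print(rows, cols)
print(table)
```

This toy sample is far too small for an actual chi-square test (see the expected-frequency assumption below); it only illustrates the table layout. If pandas is available, `pd.crosstab` produces the same kind of table in one call.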



---

## 2. How it Works (The Formula)
The test compares **Observed** counts ($O$) to **Expected** counts ($E$), which are the counts we would expect to see if the Null Hypothesis were true.

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
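For a test of independence on an $r \times c$ table, the expected count of cell $(i, j)$ comes from its row total $R_i$, column total $C_j$, and grand total $N$; for goodness of fit with $k$ categories and hypothesized proportions $p_i$ (no parameters estimated from the data), it is $n p_i$. The degrees of freedom differ accordingly. As a worked check, the Tech/Satisfied cell of the department table in Section 4 has $R = 100$, $C = 150$, $N = 300$:

$$E_{ij} = \frac{R_i \, C_j}{N}, \qquad df_{\text{independence}} = (r-1)(c-1), \qquad E_i = n\,p_i, \qquad df_{\text{fit}} = k - 1$$

$$E_{\text{Tech, Satisfied}} = \frac{100 \times 150}{300} = 50, \qquad \frac{(O - E)^2}{E} = \frac{(70 - 50)^2}{50} = 8$$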

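* **Small $\chi^2$:** Observed and Expected counts are close, so the data are consistent with $H_0$ (no evidence of a relationship, which is not proof that none exists).
* **Large $\chi^2$:** Observed and Expected counts differ substantially, which is evidence against $H_0$. What counts as "large" depends on the degrees of freedom: the statistic is compared against a chi-square distribution to obtain a p-value.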
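To make "small" versus "large" concrete, the sketch below computes the statistic by hand for the department table used in Section 4, then compares it with the 5% critical value and the p-value from `scipy.stats.chi2`. The final line cross-checks against `chi2_contingency`, which applies Yates' continuity correction only when there is 1 degree of freedom, so the two agree for this 3×3 table.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Department x satisfaction counts (rows: Sales, Tech, HR;
# columns: Satisfied, Neutral, Dissatisfied)
observed = np.array([
    [50, 30, 20],
    [70, 20, 10],
    [30, 40, 30],
])

n = observed.sum()
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)

# Expected counts under independence: row total * column total / N
expected = np.outer(row_totals, col_totals) / n

# Sum of (O - E)^2 / E over every cell
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for an r x c table
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

critical_value = chi2.ppf(0.95, dof)  # cutoff for alpha = 0.05
p_value = chi2.sf(chi2_stat, dof)     # P(X >= chi2_stat) under H0

print(f"Chi-Square Statistic: {chi2_stat:.4f}")
print(f"Critical value (alpha=0.05, dof={dof}): {critical_value:.4f}")
print(f"P-value: {p_value:.2e}")

# Cross-check the hand computation against SciPy
scipy_stat, scipy_p, _, _ = chi2_contingency(observed)
assert np.isclose(chi2_stat, scipy_stat) and np.isclose(p_value, scipy_p)
```

A statistic above the critical value is equivalent to a p-value below 0.05.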



---

## 3. Assumptions for Chi-Square
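1. **Categorical Data:** The variables must be nominal or ordinal, and the test is run on raw counts (frequencies), not percentages or means.
2. **Independence:** Each observation must be independent of others.
3. **Large Sample Size:** Every "cell" in your table should have an expected frequency of at least **5** (a common rule of thumb). When expected counts are too small, the chi-square approximation used to compute the p-value becomes unreliable.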
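The expected-frequency rule can be checked directly, since `scipy.stats.chi2_contingency` returns the expected table as its fourth value. In the sketch below, `min_expected_count` is a hypothetical helper name, and `small_table` is an invented 2×2 example chosen to fail the check.

```python
import numpy as np
from scipy.stats import chi2_contingency

def min_expected_count(observed):
    """Hypothetical helper: smallest expected cell count under independence."""
    _, _, _, expected = chi2_contingency(observed)
    return expected.min()

department_table = np.array([[50, 30, 20], [70, 20, 10], [30, 40, 30]])
small_table = np.array([[3, 1], [1, 3]])  # invented counts, N = 8

for name, table in [("department", department_table), ("small", small_table)]:
    smallest = min_expected_count(table)
    status = "OK" if smallest >= 5 else "too small; chi-square may be unreliable"
    print(f"{name}: min expected count = {smallest:.2f} ({status})")
```

For small 2×2 tables like `small_table`, `scipy.stats.fisher_exact` is a common alternative that does not rely on the chi-square approximation.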

---

## 4. Python Implementation: Test of Independence
Using `scipy.stats.chi2_contingency` to test whether "Department" is associated with "Job Satisfaction."

```python
import numpy as np
from scipy.stats import chi2_contingency

# 1. Create a contingency table (observed frequencies)
# Rows: Sales, Tech, HR
# Columns: Satisfied, Neutral, Dissatisfied
observed = np.array([
    [50, 30, 20],  # Sales
    [70, 20, 10],  # Tech
    [30, 40, 30],  # HR
])

# 2. Perform the chi-square test of independence
chi2, p_val, dof, expected = chi2_contingency(observed)

print(f"Chi-Square Statistic: {chi2:.4f}")
print(f"P-value: {p_val:.2e}")  # scientific notation avoids a misleading 0.0000
print(f"Degrees of Freedom: {dof}")

# 3. Decision at alpha = 0.05
alpha = 0.05
if p_val < alpha:
    print("Conclusion: Reject H0. Department and Satisfaction are significantly associated.")
else:
    print("Conclusion: Fail to reject H0. No significant evidence of an association between Department and Satisfaction.")
```

```
Chi-Square Statistic: 32.6667
P-value: 1.40e-06
Degrees of Freedom: 4
Conclusion: Reject H0. Department and Satisfaction are significantly associated.
```
