# Chi-Square Test (Test of Independence)

---

## Sample Data
We will use the following contingency table for demonstration:

| Group   | Success | Failure |
|---------|---------|---------|
| A       |   12    |    5    |
| B       |   20    |   10    |
| C       |   18    |    7    |

---

## Definition
The **Chi-Square Test of Independence** is a statistical test used to determine whether there is a significant association between two categorical variables.  

It tests the **null hypothesis (H₀)**: the variables are independent (no relationship).  
Alternative hypothesis (H₁): the variables are dependent (there is a relationship).  

---

## Mathematical Formula

The Chi-square statistic is given by:

$$
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
$$

Where:

- \(O_{ij}\) = observed frequency in cell \((i, j)\)  
- \(E_{ij}\) = expected frequency in cell \((i, j)\) under independence, computed as:

$$
E_{ij} = \frac{(\text{row total}_i)(\text{column total}_j)}{N}
$$

Degrees of freedom:  

$$
df = (r - 1)(c - 1)
$$

where \(r\) = number of rows, \(c\) = number of columns, and \(N\) = total sample size.  
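
The formulas above can be traced step by step on the sample table. The following is a minimal NumPy sketch rather than a substitute for a library routine; the variable names (`observed`, `row_totals`, `col_totals`, and so on) are illustrative choices, not names defined elsewhere in this document.

```python
import numpy as np

# Observed counts from the sample table
# (rows: groups A, B, C; columns: Success, Failure)
observed = np.array([
    [12,  5],
    [20, 10],
    [18,  7],
])

row_totals = observed.sum(axis=1)   # one total per row, shape (r,)
col_totals = observed.sum(axis=0)   # one total per column, shape (c,)
n = observed.sum()                  # total sample size N

# E_ij = (row total_i * column total_j) / N
expected = np.outer(row_totals, col_totals) / n

# chi^2 = sum over all cells of (O_ij - E_ij)^2 / E_ij
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# df = (r - 1)(c - 1)
r, c = observed.shape
dof = (r - 1) * (c - 1)

print("Expected frequencies:\n", expected)
print("Chi-square statistic:", chi2_stat)
print("Degrees of freedom:", dof)
```

For this table the row totals are 17, 30, and 25, the column totals are 50 and 22, and \(N = 72\), so for example \(E_{11} = 17 \times 50 / 72 \approx 11.81\).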

---

## Usage
1. Test whether two categorical variables are related or independent.

2. Analyze data summarized as frequency tables (contingency tables).

3. Commonly applied in classification problems, survey analysis, and categorical data modeling.
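
In practice the data often arrive as one record per observation rather than as a ready-made frequency table. The sketch below shows one possible way to tally such records into a contingency table. The `records` list is hypothetical: it is simply the sample table expanded back into individual (group, outcome) pairs, so the resulting counts are known in advance.

```python
from collections import Counter

import numpy as np

# Hypothetical raw records: one (group, outcome) pair per observation,
# expanded here from the sample table.
records = (
    [("A", "Success")] * 12 + [("A", "Failure")] * 5
    + [("B", "Success")] * 20 + [("B", "Failure")] * 10
    + [("C", "Success")] * 18 + [("C", "Failure")] * 7
)

groups = ["A", "B", "C"]            # row categories
outcomes = ["Success", "Failure"]   # column categories

# Count how often each (group, outcome) combination occurs
counts = Counter(records)

# Arrange the counts as a rows-by-columns contingency table
table = np.array([[counts[(g, o)] for o in outcomes] for g in groups])

print(table)
```

The resulting array has one row per group and one column per outcome, which is the layout a chi-square test of independence expects.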

## Applications
1. Market Research: To see if customer preferences depend on gender, age, or location.

2. Medical Studies: To check if a treatment outcome is related to patient groups.

3. Machine Learning (Feature Selection): To test independence between categorical features and target labels.

4. Sociology & Psychology: To study relationships between demographics and behaviors.

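## Computerized Formula (Programming Perspective)

In Python, we can use `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Example contingency table (columns: Success, Failure)
table = np.array([
    [12,  5],   # Group A
    [20, 10],   # Group B
    [18,  7]    # Group C
])

# Returns the test statistic, p-value, degrees of freedom, and expected counts
chi2_stat, p_value, dof, expected = chi2_contingency(table)

print("Chi-square statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of freedom:", dof)
print("Expected frequencies:\n", expected)

# Compare the p-value with the significance level
alpha = 0.05
if p_value < alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")
```

Output:

```
Chi-square statistic: 0.19651764705882366
p-value: 0.9064142720501266
Degrees of freedom: 2
Expected frequencies:
 [[11.80555556  5.19444444]
 [20.83333333  9.16666667]
 [17.36111111  7.63888889]]
Fail to reject H0
```

Because the p-value (about 0.906) is far above α = 0.05, we fail to reject H₀: the data give no evidence that the outcome depends on the group. Note that failing to reject H₀ is not the same as proving the variables are independent.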
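
To see where the p-value comes from, the reported statistic can be compared against the chi-square distribution directly. This is a small sketch under the assumption that the statistic and degrees of freedom are the values printed above; it uses `scipy.stats.chi2`, whose `sf` method gives the upper-tail probability and whose `ppf` method gives the quantile (critical value).

```python
from scipy.stats import chi2

chi2_stat = 0.19651764705882366  # statistic reported above
dof = 2                          # (3 - 1) * (2 - 1)
alpha = 0.05

# p-value: probability of a statistic at least this large if H0 is true
p_value = chi2.sf(chi2_stat, dof)

# Critical value: the statistic must exceed this to reject H0 at level alpha
critical_value = chi2.ppf(1 - alpha, dof)

print("p-value:", p_value)
print("Critical value:", critical_value)
print("Statistic exceeds critical value:", chi2_stat > critical_value)
```

Comparing the statistic with the critical value and comparing the p-value with `alpha` are two equivalent ways of making the same decision.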
