
### ✅ What is the **Chi-Square Test (χ² Test)?**

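The **Chi-Square test** is a **non-parametric** statistical test for **categorical data**. It compares observed counts with the counts expected under a null hypothesis, either to check whether data fit a specified distribution or to determine whether there is a **significant association** between **categorical variables**.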

---

## 🔍 Types of Chi-Square Tests:

| Type                        | Purpose                                                  | Example                                 |
| --------------------------- | -------------------------------------------------------- | --------------------------------------- |
| **1. Goodness of Fit Test** | Checks if data fits a **specific expected distribution** | Is a die fair?                          |
| **2. Test of Independence** | Tests if **two categorical variables** are related       | Is gender related to voting preference? |

---

## 📐 **Formula (Both Tests Use the Same Core Formula):**

$$
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$

* $O_i$: Observed frequency
* $E_i$: Expected frequency

---

## 🔢 1. **Chi-Square Goodness of Fit Test**

### 🎯 Use Case:

Does the observed frequency match the **expected (theoretical) distribution**?

---

### 📌 Example:

You roll a die 60 times. Are the outcomes (1–6) **equally likely**?

| Face | Observed (O) | Expected (E = 60/6 = 10) |
| ---- | ------------ | ------------------------ |
| 1    | 8            | 10                       |
| 2    | 12           | 10                       |
| …    | …            | …                        |

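Apply the formula and compare the statistic with the chi-square critical value for $\text{df} = k - 1 = 5$ degrees of freedom (where $k = 6$ is the number of faces), or use the p-value.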
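
As a rough sketch of this procedure, `scipy.stats.chisquare` returns the goodness-of-fit statistic and its p-value. The counts below are hypothetical and chosen only for illustration; they are not the values elided from the table above.

```python
import numpy as np
from scipy import stats

# Hypothetical counts for faces 1-6 over 60 rolls (illustrative only,
# not the rows elided from the table above).
observed = np.array([5, 8, 9, 8, 10, 20])
expected = np.full(6, observed.sum() / 6)  # fair die: 60 / 6 = 10 per face

# Manual statistic: sum of (O - E)^2 / E over all faces.
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# scipy equivalent; the degrees of freedom are k - 1 = 5.
result = stats.chisquare(f_obs=observed, f_exp=expected)

print("Chi-square (manual):", chi2_manual)
print("Chi-square (scipy):", result.statistic)
print("P-value:", result.pvalue)
```

With these made-up counts the statistic is 13.4, which exceeds the 5% critical value of about 11.07 for 5 degrees of freedom, so the fair-die hypothesis would be rejected.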

---

## 🔢 2. **Chi-Square Test of Independence**

### 🎯 Use Case:

Are **two categorical variables** related?

---

### 📌 Example:

Do **gender** and **product preference** have a relationship?

|        | Product A | Product B | Total |
| ------ | --------- | --------- | ----- |
| Male   | 20        | 30        | 50    |
| Female | 30        | 20        | 50    |
| Total  | 50        | 50        | 100   |

1. Compute expected frequencies:

$$
E_{ij} = \frac{(\text{Row Total}) \cdot (\text{Column Total})}{\text{Grand Total}}
$$

2. Use $\chi^2 = \sum \frac{(O - E)^2}{E}$

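3. Compare the statistic with the chi-square distribution with the following degrees of freedom, where $r$ is the number of rows and $c$ is the number of columns: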

   $$
   \text{df} = (r - 1)(c - 1)
   $$
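
The three steps can be sketched by hand with NumPy for the gender and product table above. This is a minimal illustration and applies no continuity correction; the variable names are my own.

```python
import numpy as np
from scipy import stats

# Observed counts from the table above
# (rows: Male, Female; columns: Product A, Product B).
observed = np.array([[20, 30],
                     [30, 20]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Step 1: expected count per cell = row total * column total / grand total.
expected = row_totals * col_totals / grand_total

# Step 2: chi-square statistic, without any continuity correction.
chi2_stat = np.sum((observed - expected) ** 2 / expected)

# Step 3: degrees of freedom and the upper-tail p-value.
r, c = observed.shape
dof = (r - 1) * (c - 1)
p_value = stats.chi2.sf(chi2_stat, dof)

print("Expected:\n", expected)
print("Chi-square:", chi2_stat, "df:", dof, "p-value:", p_value)
```

The uncorrected statistic here is 4.0 with 1 degree of freedom. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so it reports a smaller statistic for the same data unless `correction=False` is passed.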

---

## 🧠 Hypotheses:

| Test                 | Null Hypothesis (H₀)               | Alternative Hypothesis (H₁)                |
| -------------------- | ---------------------------------- | ------------------------------------------ |
| Goodness of Fit      | Data follows expected distribution | Data does not follow expected distribution |
| Test of Independence | Variables are independent          | Variables are dependent (associated)       |

---

## 📌 Requirements:

* Data must be **counts**, not percentages or means.
* **Expected frequencies** should be ≥ 5 in most cells.
* Observations must be **independent**.
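
One way to check the expected-frequency requirement before running the test is to compute the expected counts directly. The sketch below uses `scipy.stats.contingency.expected_freq` on the 2x2 gender and product counts; the threshold of 5 is the rule of thumb from the list above, and `share_ok` is a name introduced here for illustration.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Same 2x2 counts as the gender / product example.
observed = np.array([[20, 30],
                     [30, 20]])

# Expected counts under independence, from the row and column totals.
expected = expected_freq(observed)

# Share of cells whose expected count is at least 5.
share_ok = np.mean(expected >= 5)

print("Expected frequencies:\n", expected)
print("Share of cells with expected count >= 5:", share_ok)
```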

---



In [None]:
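import scipy.stats as stats

# 2x2 contingency table (rows: Male, Female; columns: Product A, Product B)
data = [[20, 30],
        [30, 20]]

# For 2x2 tables (dof = 1), chi2_contingency applies Yates' continuity
# correction by default, so the statistic is 3.24 rather than the
# uncorrected 4.0; pass correction=False to get the uncorrected value.
chi2, p, dof, expected = stats.chi2_contingency(data)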

print("Chi-square:", chi2)
print("Degrees of Freedom:", dof)
print("P-value:", p)
print("Expected Frequencies:\n", expected)

Chi-square: 3.24
Degrees of Freedom: 1
P-value: 0.07186063822585143
Expected Frequencies:
 [[25. 25.]
 [25. 25.]]


In [None]:
import numpy as np
import scipy.stats as stats

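# Observed counts; the interpretation further down treats rows as age groups
# and columns as products
observed_data = np.array([[50, 30, 20],
                          [30, 40, 30],
                          [20, 30, 50]])

# Perform the chi-square test of independence; dof = (3 - 1)(3 - 1) = 4
chi2_stats, p_value, dof, expected = stats.chi2_contingency(observed_data)

chi2_stats, p_value, dof, expected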



(np.float64(30.0),
 np.float64(4.894437128029217e-06),
 4,
 array([[33.33333333, 33.33333333, 33.33333333],
        [33.33333333, 33.33333333, 33.33333333],
        [33.33333333, 33.33333333, 33.33333333]]))

In [None]:
alpha = 0.05  # significance level

if p_value < alpha:
    print('Reject the null hypothesis: age group and product preference are associated')
else:
    print('Fail to reject the null hypothesis: no significant evidence that age group and product preference are associated')

Reject the null hypothesis: age group and product preference are associated


## **A/B Testing**

**A/B testing**, also known as **split testing**, is a method of comparing two versions of a webpage, app, or other digital asset to see which performs better based on a specific metric.

It involves showing different users different versions of an element (like a button color or headline) and tracking which version leads to a higher conversion rate or other desired outcome.

Examples of where A/B testing is applied:

- Websites
- Email marketing
- Content marketing
- Ads
- Click-through rates on headers or ads
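
When the tracked metric is a conversion (converted or not), one common way to analyze an A/B test is a chi-square test of independence on a 2x2 table of variant against outcome. The sketch below uses hypothetical visitor and conversion counts, invented purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical results: [conversions, non-conversions] for each variant.
ab_counts = np.array([[100, 900],   # variant A: 1000 visitors
                      [150, 850]])  # variant B: 1000 visitors

conversion_rates = ab_counts[:, 0] / ab_counts.sum(axis=1)

# H0: conversion is independent of the variant shown.
# (Yates' continuity correction is applied by default for 2x2 tables.)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(ab_counts)

alpha = 0.05
print("Conversion rates (A, B):", conversion_rates)
print("Chi-square:", chi2_stat, "p-value:", p_value)
print("Significant difference" if p_value < alpha
      else "No significant difference detected")
```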