## Hypothesis Testing

Hypothesis testing is the decision-making framework of **Statistics**.  
In **Data Science**, we use it to judge whether an observed result is unlikely to be explained by chance alone.

---

## The Setup

### **Null Hypothesis ($H_0$)**
- The default assumption  
- “Nothing special is happening”  
- **Example:** The new website design has the same conversion rate as the old one

### **Alternative Hypothesis ($H_1$)**
- The claim we are looking for evidence of  
- **Example:** The new website design performs better

---

## Key Concept

### **P-value**
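- The probability of seeing results at least as extreme as the observed ones, **assuming $H_0$ is true**
- It is **not** the probability that $H_0$ is true, nor the probability that the result "happened by chance"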

### **Rule of Thumb**
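- If **p-value < 0.05 (5%)**, a conventional threshold, the result is called **statistically significant**
- We then **reject the Null Hypothesis ($H_0$)**; otherwise we *fail to reject* it, which is not proof that $H_0$ is true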
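
The definition and the rule can be written compactly. This is a sketch for a one-sided test; the symbols $T$ (the test statistic), $t_{\text{obs}}$ (its observed value), and $\alpha$ (the significance threshold) are notation assumed here for illustration:

$$
p = P\left(T \ge t_{\text{obs}} \mid H_0\right), \qquad \text{reject } H_0 \text{ if } p < \alpha, \quad \alpha = 0.05
$$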

---

## Analogy

Imagine you flip a coin **10 times** and get **10 heads**.

- **$H_0$:** The coin is fair  
- **Observation:** If the coin is fair, 10 heads in 10 flips has probability $0.5^{10} = 1/1024$  
  - one-sided p-value ≈ **0.001**
- **Conclusion:**  
  - Reject **$H_0$** (p-value < 0.05)  
  - The data are strong evidence that the coin is **rigged** (**$H_1$**)
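
The coin p-value can be checked directly from the binomial distribution. The following is a minimal sketch using only the standard library; the helper name `binomial_tail_p_value` is hypothetical, and the two-sided value assumes that 10 tails would count as equally extreme.

```python
from math import comb

def binomial_tail_p_value(n_flips, n_heads, p_heads=0.5):
    """Return P(X >= n_heads) for X ~ Binomial(n_flips, p_heads)."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )

# One-sided test: H1 says the coin is biased toward heads.
p_one_sided = binomial_tail_p_value(10, 10)

# Two-sided test: under a fair coin, 10 tails is just as extreme as 10 heads.
p_two_sided = 2 * p_one_sided

print(f"One-sided p-value: {p_one_sided:.5f}")
print(f"Two-sided p-value: {p_two_sided:.5f}")
```

Either way the p-value is well below 0.05, so the conclusion of the analogy does not depend on which version of the test is used.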


```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

### ***T-test***

```python
from scipy import stats

# 1. Generate Synthetic Data
np.random.seed(42) # Ensures we get the same random numbers every time
class_A_scores = np.random.normal(75, 10, 30)  # Mean 75, Std 10
class_B_scores = np.random.normal(82, 10, 30)  # Mean 82, Std 10 (Slightly better)

# 2. Perform T-test
t_stat, p_value = stats.ttest_ind(class_A_scores, class_B_scores)

print(f"Class A Mean: {np.mean(class_A_scores):.2f}")
print(f"Class B Mean: {np.mean(class_B_scores):.2f}")
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4f}")

# 3. Interpret
alpha = 0.05
if p_value < alpha:
    print("Result: Reject Null Hypothesis. The difference is statistically significant.")
else:
    print("Result: Fail to reject Null Hypothesis. The difference could be due to chance.")
```

```
Class A Mean: 73.12
Class B Mean: 80.79
T-statistic: -3.24
P-value: 0.0020
Result: Reject Null Hypothesis. The difference is statistically significant.
```


```python
from scipy import stats

# 1. Generate Synthetic Data (same setup, but with a much smaller true difference)
np.random.seed(42) # Ensures we get the same random numbers every time
class_A_scores = np.random.normal(75, 10, 30)  # Mean 75, Std 10
class_B_scores = np.random.normal(76, 10, 30)  # Mean 76, Std 10 (Only marginally better)

# 2. Perform T-test
t_stat, p_value = stats.ttest_ind(class_A_scores, class_B_scores)

print(f"Class A Mean: {np.mean(class_A_scores):.2f}")
print(f"Class B Mean: {np.mean(class_B_scores):.2f}")
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4f}")

# 3. Interpret
alpha = 0.05
if p_value < alpha:
    print("Result: Reject Null Hypothesis. The difference is statistically significant.")
else:
    print("Result: Fail to reject Null Hypothesis. The difference could be due to chance.")
```

```
Class A Mean: 73.12
Class B Mean: 74.79
T-statistic: -0.71
P-value: 0.4828
Result: Fail to reject Null Hypothesis. The difference could be due to chance.
```
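
To see what the T-statistic measures, it can help to compute it by hand. With its default `equal_var=True`, `stats.ttest_ind` reports the pooled two-sample t statistic: the difference in sample means divided by its standard error. The following is a minimal sketch using only the standard library; the function name `pooled_t_statistic` and the toy samples are hypothetical and are not the class scores used above.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic assuming both groups share one variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical toy samples, chosen only to keep the arithmetic easy to follow.
toy_a = [1.0, 2.0, 3.0]
toy_b = [2.0, 3.0, 4.0]
t_toy = pooled_t_statistic(toy_a, toy_b)
print(f"T-statistic: {t_toy:.4f}")
```

The sign follows the order of the arguments: the statistic is negative whenever the first sample has the lower mean, which is why both runs above report a negative T-statistic for Class A versus Class B.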
