# **Hypothesis Testing in Python**
**Author**: Your Name  
**Date**: 2024-06-10

## **Table of Contents**  
1. [Introduction](#Introduction)  
2. [Key Concepts](#Key-Concepts)  
3. [Types of Hypothesis Tests](#Types-of-Hypothesis-Tests)  
4. [Choosing the Right Test](#Choosing-the-Right-Test)  
5. [Conclusion](#Conclusion)

## **1. Introduction**  
Hypothesis testing is a fundamental statistical technique for deciding whether a sample provides enough evidence to support a claim about a population. This notebook explains seven common hypothesis tests and demonstrates each one in Python with `scipy.stats` on simulated data.


## **2. Key Concepts**  
- **Null Hypothesis (H₀)**: The default assumption (e.g., no difference or effect).  
- **Alternative Hypothesis (H₁)**: Indicates the presence of an effect or difference.  
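- **p-value**: The probability, assuming H₀ is true, of obtaining a result at least as extreme as the one observed.  
- **α (alpha)**: Significance level, the threshold below which a p-value leads to rejecting H₀. It is the accepted probability of rejecting a true H₀ (Type I error) and is commonly set to 0.05.  
- **Test Statistic**: A value computed from the sample (e.g., t, F, or χ²) whose distribution under H₀ is used to obtain the p-value.  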
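
To make these definitions concrete, the sketch below computes a one-sample t statistic and its two-sided p-value by hand and then applies the α decision rule. The sample, the seed, and the names `mu_0`, `t_manual`, and `p_manual` are illustrative assumptions, not part of the notebook's own examples.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized mean, chosen only for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=30)
mu_0 = 55      # population mean claimed by H0
alpha = 0.05   # significance level

# Test statistic: distance of the sample mean from mu_0 in standard-error units.
n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - mu_0) / standard_error

# Two-sided p-value: probability under H0 of a |t| at least this large.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Decision rule: reject H0 when the p-value falls below alpha.
reject_h0 = bool(p_manual < alpha)
print(f"t = {t_manual:.3f}, p = {p_manual:.4f}, reject H0: {reject_h0}")

# Cross-check against SciPy's built-in one-sample t-test.
t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=mu_0)
```

The manual values should agree with `stats.ttest_1samp` up to floating-point error, which shows that the library call is simply packaging the statistic and p-value defined above.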


## **3. Types of Hypothesis Tests**

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# Fix the seed so the simulated samples are reproducible.
np.random.seed(42)

# 1. One-sample t-test: does the sample mean differ from a hypothesized value (55)?
print("1. One-Sample t-test:")
data = np.random.normal(loc=50, scale=10, size=30)
t_stat, p_val = stats.ttest_1samp(data, popmean=55)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_val:.3f}\n")

# 2. Two-sample independent t-test: do two unrelated groups have different means?
# The default (equal_var=True) assumes both groups share the same variance.
print("2. Two-Sample Independent t-test:")
group1 = np.random.normal(loc=60, scale=8, size=30)
group2 = np.random.normal(loc=65, scale=8, size=30)
t_stat, p_val = stats.ttest_ind(group1, group2)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_val:.3f}\n")

# 3. Paired t-test: did the same subjects change between two measurements?
# The statistic is based on before - after, so an increase gives a negative t.
print("3. Paired t-test:")
before = np.random.normal(loc=70, scale=5, size=20)
after = before + np.random.normal(loc=2, scale=2, size=20)
t_stat, p_val = stats.ttest_rel(before, after)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_val:.3f}\n")

# 4. One-way ANOVA: do three or more independent groups share the same mean?
print("4. ANOVA (Analysis of Variance):")
group_a = np.random.normal(loc=55, scale=5, size=30)
group_b = np.random.normal(loc=60, scale=5, size=30)
group_c = np.random.normal(loc=65, scale=5, size=30)
f_stat, p_val = stats.f_oneway(group_a, group_b, group_c)
print(f"F-statistic: {f_stat:.3f}, p-value: {p_val:.3f}\n")

# 5. Chi-square test of independence on a 2x3 table of observed counts.
print("5. Chi-Square Test of Independence:")
obs = np.array([[10, 20, 30], [6, 9, 17]])
chi2, p_val, dof, expected = stats.chi2_contingency(obs)
print(f"Chi2 Statistic: {chi2:.3f}, p-value: {p_val:.3f}, Degrees of Freedom: {dof}\n")

# 6. Wilcoxon signed-rank test: non-parametric counterpart of the paired t-test.
print("6. Wilcoxon Signed-Rank Test:")
before = np.random.normal(loc=100, scale=10, size=25)
after = before + np.random.normal(loc=1, scale=5, size=25)
stat, p_val = stats.wilcoxon(before, after)
print(f"Wilcoxon Statistic: {stat:.3f}, p-value: {p_val:.3f}\n")

# 7. Mann-Whitney U test: non-parametric counterpart of the independent t-test.
print("7. Mann-Whitney U Test:")
group1 = np.random.normal(loc=80, scale=10, size=40)
group2 = np.random.normal(loc=85, scale=10, size=40)
u_stat, p_val = stats.mannwhitneyu(group1, group2)
print(f"U Statistic: {u_stat:.3f}, p-value: {p_val:.3f}\n")

# Plot the distribution of group1 from the Mann-Whitney example.
plt.figure(figsize=(5, 5))
sns.histplot(group1, kde=True, stat="density", linewidth=0)
plt.title('Histogram with Density Plot')
plt.show()
```
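
One way to see what the paired t-test does is to note that it is a one-sample t-test on the per-subject differences. The sketch below checks this equivalence numerically; the seed and the variable names `paired` and `on_differences` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Simulated before/after measurements on the same 20 subjects (illustrative).
rng = np.random.default_rng(1)
before = rng.normal(loc=70, scale=5, size=20)
after = before + rng.normal(loc=2, scale=2, size=20)

# Paired t-test on the two related samples.
paired = stats.ttest_rel(before, after)

# One-sample t-test asking whether the mean difference is zero.
on_differences = stats.ttest_1samp(before - after, popmean=0)

print(f"ttest_rel:   t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"ttest_1samp: t = {on_differences.statistic:.3f}, p = {on_differences.pvalue:.4f}")
```

Both calls should report the same statistic and p-value. The argument order sets the sign: `ttest_rel(before, after)` works with `before - after`, so a negative t means the `after` values are larger on average.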


## **4. Choosing the Right Test**  
| Test | When to Use | Parametric? | `scipy.stats` Function |
|------|-------------|-------------|------------------------|
| One-Sample t-test | Compare a sample mean to a known or hypothesized value | Yes | `ttest_1samp` |
| Two-Sample t-test | Compare the means of 2 independent groups | Yes | `ttest_ind` |
| Paired t-test | Compare the means of 2 related (paired) samples | Yes | `ttest_rel` |
| ANOVA | Compare the means of 3 or more independent groups | Yes | `f_oneway` |
| Chi-Square | Test for association between two categorical variables (contingency table of counts) | No | `chi2_contingency` |
| Wilcoxon | Non-parametric alternative to the paired t-test | No | `wilcoxon` |
| Mann-Whitney U | Non-parametric alternative to the two-sample t-test | No | `mannwhitneyu` |
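
The table can be turned into a simple decision rule for the two-group case. The sketch below is one possible approach, not a prescribed workflow: `compare_two_groups` is a hypothetical helper that screens both samples with the Shapiro-Wilk normality test and then picks `ttest_ind` or `mannwhitneyu`. Using a normality pre-test to choose is itself a simplifying assumption, and the example data are simulated.

```python
import numpy as np
from scipy import stats

def compare_two_groups(x, y, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from a normality screen."""
    # Shapiro-Wilk: H0 is that the sample was drawn from a normal distribution.
    looks_normal = True
    for group in (x, y):
        _, shapiro_p = stats.shapiro(group)
        if shapiro_p <= alpha:
            looks_normal = False

    if looks_normal:
        name = "ttest_ind"
        statistic, p_value = stats.ttest_ind(x, y)
    else:
        name = "mannwhitneyu"
        statistic, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
    return name, statistic, p_value

# Strongly skewed (exponential) samples should fail the normality screen.
rng = np.random.default_rng(2)
skewed_a = rng.exponential(scale=2.0, size=200)
skewed_b = rng.exponential(scale=3.0, size=200)
test_name, statistic, p_value = compare_two_groups(skewed_a, skewed_b)
print(f"Chosen test: {test_name}, statistic = {statistic:.3f}, p = {p_value:.4f}")
```

With small samples the Shapiro-Wilk test has little power to detect non-normality, so a histogram like the one in Section 3 is a useful complement to this kind of automated check.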


## **5. Conclusion**  
This notebook gave an overview of hypothesis testing in Python by running seven common tests from `scipy.stats` on simulated data. Before choosing a test, check its assumptions, such as normality and adequate sample size.
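
As an example of a sample-size check, the sketch below reuses the contingency table from the chi-square example in Section 3 and inspects the expected counts returned by `chi2_contingency`. The threshold of 5 is a widely used rule of thumb, treated here as an assumption rather than a hard requirement, and `min_expected_count` is an illustrative name.

```python
import numpy as np
from scipy import stats

# Contingency table from the chi-square example in Section 3.
obs = np.array([[10, 20, 30], [6, 9, 17]])
chi2, p_val, dof, expected = stats.chi2_contingency(obs)

# Expected counts under independence: (row total * column total) / grand total.
manual_expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

# Rule-of-thumb threshold for the chi-square approximation (an assumption here).
min_expected_count = 5
counts_ok = bool((expected >= min_expected_count).all())
print(f"Smallest expected count: {expected.min():.2f}, adequate: {counts_ok}")
```

If some expected counts fall below the threshold, the chi-square approximation becomes less reliable and the p-value should be interpreted with caution.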
