# Hypothesis Testing


## 1. Check if two categorical variables are independent using a contingency table.


In [2]:
import pandas as pd
from scipy.stats import chi2_contingency

# Example data
data = {'Gender': ['Male', 'Male', 'Female', 'Female'],
        'Preference': ['Pizza', 'No Pizza', 'Pizza', 'No Pizza'],
        'Count': [30, 10, 25, 35]}

df = pd.DataFrame(data)

# Pivot into contingency table
contingency_table = df.pivot(index='Gender', columns='Preference', values='Count')

# Chi-square test of independence
# (for a 2x2 table, SciPy applies Yates' continuity correction by default)
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(f"Chi2 Statistic = {chi2}")
print(f"P-Value = {p}")

if p < 0.05:
    print("Reject null hypothesis → Variables are dependent.")
else:
    print("Fail to reject null → No evidence that the variables are dependent.")


Chi2 Statistic = 9.469696969696969
P-Value = 0.0020889387721520552
Reject null hypothesis → Variables are dependent.
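
To see where the statistic comes from, the sketch below recomputes it by hand from the same four counts. It is an illustration rather than part of the original cell: the names `observed`, `expected_manual`, `chi2_plain` and `chi2_yates` are introduced here, and the rows are assumed to be ordered Female, Male and the columns No Pizza, Pizza (the alphabetical order that `pivot` produces).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same counts as the contingency table above.
# Rows: Female, Male. Columns: No Pizza, Pizza.
observed = np.array([[35, 25],
                     [10, 30]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic, without and with Yates' continuity correction
diff = np.abs(observed - expected_manual)
chi2_plain = (diff ** 2 / expected_manual).sum()
chi2_yates = ((diff - 0.5) ** 2 / expected_manual).sum()

# SciPy: corrected by default for 2x2 tables, uncorrected on request
chi2_scipy, p_scipy, dof, expected_scipy = chi2_contingency(observed)
chi2_uncorrected, p_uncorrected, _, _ = chi2_contingency(observed, correction=False)

print("Expected counts:\n", expected_manual)
print("Uncorrected:", chi2_plain, "vs SciPy:", chi2_uncorrected)
print("Yates-corrected:", chi2_yates, "vs SciPy default:", chi2_scipy)
```

The corrected value is the one reported in the output above. The uncorrected statistic is somewhat larger, so the two versions can lead to different decisions when a result is close to the 0.05 threshold.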


## 2. Compare variances of two or more groups.

Question: do two or more groups have the same variability (spread) in their data?
* Levene's Test is preferred when the data may not be normally distributed.
* Bartlett’s Test is more powerful when the data are normal, but it is sensitive to departures from normality.



📌 Null Hypothesis (H₀):
All groups have equal variances.

📌 Alternative Hypothesis (H₁):
At least one group has different variance.

In [3]:
import numpy as np
from scipy.stats import levene, bartlett

# Simulated data from 3 groups (no seed is set, so the numbers change on every run)
group1 = np.random.normal(50, 10, 100)  # mean=50, std=10
group2 = np.random.normal(60, 15, 100)  # mean=60, std=15 (larger spread)
group3 = np.random.normal(55, 10, 100)  # mean=55, std=10

# Levene's test (more robust)
stat, p = levene(group1, group2, group3)
print("Levene’s test → stat:", stat, "p-value:", p)

# Bartlett’s test (assumes normality)
stat2, p2 = bartlett(group1, group2, group3)
print("Bartlett’s test → stat:", stat2, "p-value:", p2)

Levene’s test → stat: 18.929674266359328 p-value: 1.8292041082089777e-08
Bartlett’s test → stat: 54.79473324888809 p-value: 1.2632079676909706e-12


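Interpretation:
If p < 0.05 → reject H₀ → at least one group's variance differs

If p ≥ 0.05 → fail to reject H₀ → no evidence that the variances differ (this does not prove they are equal)

Here both p-values are far below 0.05, so both tests reject H₀. That matches the simulation: group2 was drawn with std = 15, while the other two groups use std = 10.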
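
The decision rule above can be wrapped in a small function and reused for each test. The sketch below is a minimal illustration under a few assumptions: `decide` is a hypothetical helper, not a SciPy function, and the seed `0` is arbitrary and only there to make the run repeatable. It also tries both `center` options of `levene`: `'median'` is SciPy's default (the Brown-Forsythe variant) and `'mean'` is the original Levene test.

```python
import numpy as np
from scipy.stats import levene

def decide(p_value, alpha=0.05):
    """Hypothetical helper: turn a p-value into a test decision."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Seeded generator so the numbers are reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)
group1 = rng.normal(50, 10, 100)
group2 = rng.normal(60, 15, 100)  # larger spread than the other groups
group3 = rng.normal(55, 10, 100)

# Compare the two centering choices for Levene's test
for center in ("median", "mean"):
    stat, p = levene(group1, group2, group3, center=center)
    print(f"center={center}: stat={stat:.3f}, p={p:.4g} -> {decide(p)}")
```

Median centering is generally the safer choice for skewed data, which is presumably why SciPy uses it as the default.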

## 3. Perform a hypothesis test and write conditional logic to "Reject" or "Fail to reject" based on p-value.

### One sample T-test

In [5]:
import numpy as np
from scipy import stats


# H0: the population mean equals the hypothesised mean (75)
sample_data = np.array([90, 70, 93, 85, 75, 98, 35, 45, 70, 85,
                        56, 76, 89, 94, 54, 36, 78, 77, 89, 60])
hypothesised_mean = 75

# Two-sided one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_data, hypothesised_mean)
print(f"p_value is {p_value}")

if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

p_value is 0.6072402083246451
Fail to reject the null hypothesis
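
For reference, the same result can be reproduced by hand, which shows what `ttest_1samp` computes. This is a sketch for illustration; the names `t_manual` and `p_manual` are introduced here. The statistic is the distance between the sample mean (72.75 for this data) and the hypothesised mean, measured in standard errors, and the two-sided p-value comes from a t distribution with n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

sample_data = np.array([90, 70, 93, 85, 75, 98, 35, 45, 70, 85,
                        56, 76, 89, 94, 54, 36, 78, 77, 89, 60])
hypothesised_mean = 75

n = sample_data.size
sample_mean = sample_data.mean()
sample_std = sample_data.std(ddof=1)       # sample std (n - 1 in the denominator)
standard_error = sample_std / np.sqrt(n)

# t = (sample mean - hypothesised mean) / standard error
t_manual = (sample_mean - hypothesised_mean) / standard_error

# Two-sided p-value: probability of a |t| at least this large under H0
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample_data, hypothesised_mean)
print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```

Both lines should print the same values, since the manual steps follow the textbook formula for the one-sample t-test.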


## 4. Test whether the average height of a group is different from a given population mean.

In [6]:
import numpy as np
from scipy import stats

# Sample data: 30 simulated heights drawn from a normal distribution (mean=168, std=5).
# No seed is set, so the numbers change on every run.
heights = np.random.normal(loc=168, scale=5, size=30)

# Population mean (assumed or given)
population_mean = 165

# Perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(heights, population_mean)

print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")


T-statistic: 2.926
P-value: 0.007


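If p-value < 0.05 → Reject H₀ → Sample mean is significantly different from 165

If p-value ≥ 0.05 → Fail to reject H₀ → Sample mean is not significantly different from 165

In the run shown above, p = 0.007 < 0.05, so H₀ is rejected: the average height of the sample differs significantly from 165.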
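
The rule above can be written as explicit conditional logic and cross-checked with a confidence interval. The sketch below is one possible version, not the original cell: the seed `1` is arbitrary, so its numbers will differ from the output shown earlier, and `alpha`, `decision`, `ci_low` and `ci_high` are names introduced here. For a two-sided test, H₀ is rejected at level `alpha` exactly when the population mean lies outside the (1 − `alpha`) confidence interval for the mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
heights = rng.normal(loc=168, scale=5, size=30)
population_mean = 165
alpha = 0.05

# Two-sided one-sample t-test
t_stat, p_value = stats.ttest_1samp(heights, population_mean)

# Confidence interval for the mean, based on the t distribution
n = heights.size
standard_error = heights.std(ddof=1) / np.sqrt(n)
ci_low, ci_high = stats.t.interval(1 - alpha, n - 1,
                                   loc=heights.mean(), scale=standard_error)

if p_value < alpha:
    decision = "Reject H0: mean differs from the population mean"
else:
    decision = "Fail to reject H0: no significant difference"

print(f"T-statistic: {t_stat:.3f}, P-value: {p_value:.3f}")
print(f"{(1 - alpha):.0%} CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print(decision)
```

Reporting the interval next to the p-value also shows how far the sample mean is from 165, not just whether the difference is significant.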