# Hypothesis Testing

In [34]:
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.weightstats import ztest
from statsmodels.formula.api import ols
import math
import numpy as np
import pandas as pd

## Z-test

### **Note**


**Z-Test Quick Revision Note**

**Definition**:  

A Z-test is a statistical method used to determine whether a sample mean differs significantly from a known population mean, or whether the means of two groups differ from each other. It applies when the population variance is known or the sample size is large (typically \( n > 30 \)), and it assumes that the data, or the sampling distribution of the mean, is approximately normal.

**Equation**:

For a one-sample Z-test, the test statistic is:

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

where:
-  \( \bar{X} \) = sample mean,
-  \( \mu \) = population mean,
-  \( \sigma \) = population standard deviation,
-  \( n \) = sample size.


**Uses**:
1. **Testing Population Mean**: Used when comparing a sample mean to a known population mean.
2. **Comparing Two Means**: To test the difference between two sample means when population variance is known.
3. **Large Samples**: Effective for large sample sizes, as it relies on the Central Limit Theorem for normal approximation.

### **Practical**

One-sample Z-test (two-tailed)

**Problem** <br> Suppose a factory claims that the average weight of a pack of chips is 200 grams. A customer suspects that the actual average weight might be different. To test this, the customer randomly selects 50 packs and finds that their average weight is 198 grams with a standard deviation of 10 grams. <br>


State the Hypotheses:

Null hypothesis (𝐻0): The average weight is 200 grams (𝜇=200). <br>
Alternative hypothesis (𝐻1): The average weight is not 200 grams (𝜇≠200).

In [35]:
sample_mean = 198
population_mean = 200
sample_std_dev = 10
sample_size = 50
alpha = 0.05

z_value = (sample_mean - population_mean) / (sample_std_dev / math.sqrt(sample_size))
print(f" Z value is: {z_value}")
p_value = 2 * (1 - stats.norm.cdf(abs(z_value)))
print(f" P value is: {p_value}")

if p_value < alpha:
    print(" We reject the null hypothesis because the p-value is less than the significance level (alpha). The data suggest that the average weight differs from 200 grams.")
else:
    print(" We fail to reject the null hypothesis because the p-value is not less than the significance level (alpha). There is not enough evidence that the average weight differs from 200 grams.")

 Z value is: -1.4142135623730951
 P value is: 0.157299207050285
 We fail to reject the null hypothesis because the p-value is not less than the significance level (alpha). There is not enough evidence that the average weight differs from 200 grams.
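The same decision can also be reached by comparing the test statistic with a critical value rather than using the p-value. The sketch below reuses the numbers from the chips problem; the names `z_critical` and `reject` are illustrative and do not appear elsewhere in this notebook.

```python
import math
from scipy import stats

# Values from the chips problem above
sample_mean, population_mean = 198, 200
sample_std_dev, sample_size, alpha = 10, 50, 0.05

z_value = (sample_mean - population_mean) / (sample_std_dev / math.sqrt(sample_size))

# Two-tailed test: alpha is split across both tails of the standard normal
z_critical = stats.norm.ppf(1 - alpha / 2)

# Reject H0 only if |z| exceeds the critical value
reject = abs(z_value) > z_critical
print(f"|z| = {abs(z_value):.4f}, critical value = {z_critical:.4f}, reject H0: {reject}")
```

Both approaches always agree for a given alpha, since the p-value falls below alpha exactly when |z| exceeds the critical value.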


Using the statsmodels `ztest` function

In [36]:
random_values = np.random.normal(loc=sample_mean,scale=sample_std_dev, size=50)
z_statistic, p_value = ztest(random_values, value=population_mean)
z_statistic, p_value

(-1.8309127460224979, 0.06711356756485325)
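The result above differs from the manual calculation because the simulated sample is random: its mean and standard deviation are close to, but not exactly, 198 and 10 (and no seed is set, so the numbers change on every run). As a rough check, the sketch below rescales a random sample so that its mean and sample standard deviation match the problem exactly; the seed and the name `scaled` are arbitrary choices made for illustration.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
raw = rng.normal(size=50)

# Standardize, then rescale so the sample mean is 198 and the
# sample standard deviation (ddof=1) is 10, as in the problem statement
scaled = (raw - raw.mean()) / raw.std(ddof=1) * 10 + 198

z_check, p_check = ztest(scaled, value=200)
print(f"z = {z_check:.4f}, p = {p_check:.4f}")
```

With the sample statistics pinned this way, `ztest` should agree with the hand-computed Z value of about -1.4142.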

Two-sample Z-test

**Problem**

 
Suppose you work for a company that manufactures two different types of light bulbs. The company wants to compare the average lifespans of the two light bulbs.
<br> You take two independent random samples:

Sample 1: 100 bulbs of type A, with an average lifespan of 1500 hours and a standard deviation of 300 hours. <br>
Sample 2: 120 bulbs of type B, with an average lifespan of 1450 hours and a standard deviation of 320 hours. <br>
You want to test at a 5% significance level (α=0.05) whether there is a significant difference between the lifespans of the two types of bulbs.

H0: There is no difference between the mean lifespans (mu1 = mu2) <br>
H1: There is a difference between the mean lifespans of the two bulbs (mu1 ≠ mu2)<br>
This is a two-tailed test, so p = 2 * (1 - cdf(abs(z))), where cdf is the standard normal CDF.

In [37]:
# sample 1
n1 = 100
mean1 = 1500
std1 = 300
# sample 2
n2 = 120
mean2 = 1450
std2 = 320
alpha = 0.05

In [38]:
z_value = (mean1 - mean2) / math.sqrt(((std1 ** 2) / n1) + ((std2 ** 2) / n2))
print(f"Z value is: {z_value}")
p_value = 2 * (1 - stats.norm.cdf(abs(z_value)))
print(f"P value is: {p_value}")
if p_value < alpha:
    print("Reject Null hypothesis")
else:
    print("Fail to reject null hypothesis")

Z value is: 1.1940919199575821
P value is: 0.2324420135643246
Fail to reject null hypothesis


In [39]:
sample1 = np.random.normal(loc=mean1,scale=std1,size=n1)
sample2 = np.random.normal(loc=mean2,scale=std2,size=n2)

In [40]:
z_score_two_sample, p_value_two_sample = ztest(sample1, sample2)
z_score_two_sample, p_value_two_sample

(1.2604319692904795, 0.2075135750405187)
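Besides the randomness of the simulated samples, there is a second reason the `ztest` result need not match the manual one: the manual formula uses the unpooled standard error, while `ztest` uses a pooled variance estimate by default (`usevar='pooled'`). The sketch below computes both versions from the summary statistics of the bulb problem; the variable names are illustrative.

```python
import math

# Summary statistics from the light bulb problem
n1, mean1, std1 = 100, 1500, 300
n2, mean2, std2 = 120, 1450, 320

# Unpooled standard error (used in the manual calculation above)
se_unpooled = math.sqrt(std1**2 / n1 + std2**2 / n2)
z_unpooled = (mean1 - mean2) / se_unpooled

# Pooled standard error (variance estimate shared by both groups)
pooled_var = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
z_pooled = (mean1 - mean2) / se_pooled

print(f"Unpooled z: {z_unpooled:.4f}")
print(f"Pooled z:   {z_pooled:.4f}")
```

For these numbers the two statistics are close, because the group sizes and standard deviations are similar.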

**Paired test**

A paired Z-test is used for comparing the means of two related groups, such as measurements taken on the same subjects before and after an intervention.

Define hypotheses: <br>
H0: 𝜇𝑑 = 0 (The mean of the differences is zero) <br>
H1: 𝜇𝑑 ≠ 0 (The mean of the differences is not zero)

In [41]:
before = np.array([20, 18, 19, 22, 20, 23, 21, 22])
after = np.array([18, 17, 16, 20, 19, 21, 20, 19])

differences = before - after
mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=1)
n = len(differences)

z_score_paired = mean_diff / (std_diff / np.sqrt(n))

# Calculate the p-value (two-tailed)
p_value_paired = 2 * (1 - stats.norm.cdf(abs(z_score_paired)))

print(f"Paired Z-test Z-score: {z_score_paired}")
print(f"Paired Z-test p-value: {p_value_paired}")


Paired Z-test Z-score: 6.354889093022426
Paired Z-test p-value: 2.085771555471183e-10
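With only 8 pairs and a standard deviation estimated from the sample, the normal approximation is likely optimistic here, and a paired t-test is usually the safer choice. The sketch below runs `stats.ttest_rel` on the same data for comparison; the statistic is the same, but the p-value comes from a t-distribution with n - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

before = np.array([20, 18, 19, 22, 20, 23, 21, 22])
after = np.array([18, 17, 16, 20, 19, 21, 20, 19])

# Paired t-test on the same observations
t_stat, p_value_t = stats.ttest_rel(before, after)

# Normal-approximation p-value, as computed in the cell above
differences = before - after
z_score = differences.mean() / (differences.std(ddof=1) / np.sqrt(len(differences)))
p_value_z = 2 * stats.norm.sf(abs(z_score))

print(f"statistic: {t_stat:.4f}")
print(f"p-value (t, df={len(differences) - 1}): {p_value_t:.6f}")
print(f"p-value (normal): {p_value_z:.2e}")
```

The t-based p-value is larger because the t-distribution has heavier tails for small samples, although both lead to rejecting H0 at the 5% level in this example.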


## T-test

### Note

T-Test: Quick Revision
A T-test is a statistical test used to compare the means of one or two groups, helping determine if there is a statistically significant difference between them. It's commonly used when the sample size is small or when the population standard deviation is unknown.

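For a one-sample test: \( t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}} \) <br>
For a two-sample test (pooled variance): \( t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \), where \( s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \) <br>
For a paired-sample test: \( t = \dfrac{\bar{D}}{s_d / \sqrt{n}} \)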

**Uses of T-Tests** <br>
Hypothesis Testing: Used to test hypotheses about population means.<br>
Small Samples: Suitable for small samples (generally \( n < 30 \)).<br>
Unknown Population Variance: Often used when population variance is unknown. <br>


**Assumptions**<br>
Normality: Data should follow a normal distribution (especially important for small samples).<br>
Independence: Observations should be independent.<br>
Equal Variances (for two-sample t-tests): Known as homogeneity of variance; applies when comparing two groups.
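
These assumptions can be checked informally before running a t-test. The sketch below uses the Shapiro-Wilk test for normality and Levene's test for equal variances from `scipy.stats`; the data in `group_a` and `group_b` is simulated purely for illustration and is not part of this notebook's examples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
group_a = rng.normal(loc=6.0, scale=0.8, size=20)  # hypothetical sample
group_b = rng.normal(loc=6.3, scale=0.8, size=20)  # hypothetical sample

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
_, p_normal_a = stats.shapiro(group_a)
_, p_normal_b = stats.shapiro(group_b)

# Levene: H0 = the groups have equal variances
_, p_equal_var = stats.levene(group_a, group_b)

print(f"Shapiro-Wilk p-values: {p_normal_a:.3f}, {p_normal_b:.3f}")
print(f"Levene p-value: {p_equal_var:.3f}")
```

A small p-value in either test is a hint that the corresponding assumption may not hold; with very small samples these checks have little power, so they are a guide rather than proof.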

### Practical

**One sample test**

In [42]:
sample_data = [5.3, 6.8, 7.2, 5.7, 6.1, 5.9, 6.3]
population_mean = 6.0

Scratch Implementation

In [43]:
mean = np.mean(sample_data)
std = np.std(sample_data,ddof=1)
n = len(sample_data)
t_value = (mean - population_mean) / (std/math.sqrt(n))
df = n -1
p_value = 2 * stats.t.sf(abs(t_value), df)
print("t-statistic:", t_value)
print("p-value:", p_value)

t-statistic: 0.7568892626614537
p-value: 0.477774677079275


Using stats

In [44]:
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)

print("One-Sample t-Test")
print("t-statistic:", t_stat)
print("p-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

One-Sample t-Test
t-statistic: 0.7568892626614537
p-value: 0.477774677079275
Fail to reject the null hypothesis.


**Two sample**

In [45]:
sample_group1 = np.array([5.1, 6.2, 7.4, 6.9, 5.8])
sample_group2 = np.array([5.9, 6.3, 6.8, 6.2, 6.5])

Scratch Implementation

In [46]:
# Pooled variance: sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

In [47]:
# Two-sample t-test with pooled variance:
# t = (x1_mean - x2_mean) / sqrt(sp2 * (1/n1 + 1/n2))
x1_mean = np.mean(sample_group1)
x2_mean = np.mean(sample_group2)
s1 = np.std(sample_group1,ddof=1)
s2 = np.std(sample_group2,ddof=1)
n1 = len(sample_group1)
n2 = len(sample_group2)
# Pooled standard deviation: use n1 (not n), and divide by the
# degrees of freedom inside the square root
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_value = (x1_mean - x2_mean) / (sp * np.sqrt(1/n1 + 1/n2))
print(f"t-statistic: {t_value:.6f}")

t-statistic: -0.139122

Using stats

In [48]:
t_stat, p_value = stats.ttest_ind(sample_group1, sample_group2)

print("\nTwo-Sample t-Test")
print("t-statistic:", t_stat)
print("p-value:", p_value)

# Decision
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")



Two-Sample t-Test
t-statistic: -0.13912166872804954
p-value: 0.892792491799274
Fail to reject the null hypothesis.
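
`stats.ttest_ind` assumes equal variances by default. When that assumption is doubtful, Welch's t-test is a common alternative, available through `equal_var=False`. The sketch below compares the two on the same data; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

sample_group1 = np.array([5.1, 6.2, 7.4, 6.9, 5.8])
sample_group2 = np.array([5.9, 6.3, 6.8, 6.2, 6.5])

# Pooled-variance (Student) t-test, the default
t_pooled, p_pooled = stats.ttest_ind(sample_group1, sample_group2)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(sample_group1, sample_group2, equal_var=False)

print(f"Pooled: t = {t_pooled:.4f}, p = {p_pooled:.4f}")
print(f"Welch:  t = {t_welch:.4f}, p = {p_welch:.4f}")
```

Because both groups have the same size here, the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ.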


**Paired t-test**

In [49]:
before_treatment = np.array([5.7, 6.2, 5.8, 6.1, 6.0])
after_treatment = np.array([6.1, 6.4, 6.0, 6.2, 6.3])

In [50]:
n = len(after_treatment)
d = before_treatment - after_treatment
d_bar = np.mean(d)
d_std = np.std(d,ddof=1)
t_value = d_bar / (d_std / math.sqrt(n))
df = n- 1
p_value = 2 * stats.t.sf(abs(t_value), df)
print(f"Test Statistics: {t_value}")
print(f"P Value is: {p_value}")

Test Statistics: -4.706787243316435
P Value is: 0.0092616967595143


In [51]:
t_stat, p_value = stats.ttest_rel(before_treatment, after_treatment)

print("\nPaired t-Test")
print("t-statistic:", t_stat)
print("p-value:", p_value)

if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")


Paired t-Test
t-statistic: -4.706787243316435
p-value: 0.0092616967595143
Reject the null hypothesis.


## Chi-Square Test

### Note

Definition:
The chi-square test is a statistical method used to assess whether there is a significant association between categorical variables. It compares the observed frequencies with the frequencies that would be expected if there were no relationship.

Types of Chi-Square Tests:

- Test of Independence: Determines if two categorical variables (e.g., gender and movie preference) are independent.
- Goodness of Fit Test: Tests if observed data matches an expected distribution.

Formula:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

<br>
O: Observed frequency <br>
E: Expected frequency

<br>

Use Case: In a survey of voters by region, the chi-square test of independence can reveal if voting preferences are associated with specific regions.

### Practical

**Chi-Square Test of Independence**

Create data

In [52]:
# Sample data (contingency table)
# Rows represent genders, columns represent product preference (Product A, Product B)
# data = np.array([[30, 10],[29, 15]])
data = np.array([[30, 15],[15, 25]])
df_observed = pd.DataFrame(data, index=["Male", "Female"], columns=["Product A", "Product B"])
df_observed

| | Product A | Product B |
|---|---|---|
| Male | 30 | 15 |
| Female | 15 | 25 |


**From Scratch Implementation**

Degree of freedom

In [53]:
# df = (number of rows - 1) * (number of columns - 1)
df = (df_observed.shape[0]- 1) * (df_observed.shape[1] - 1)
df

1

In [54]:
df_observed_1 = df_observed.copy()

In [55]:
df_observed_1['Row_Sum'] = df_observed.sum(axis=1)
df_observed_1.loc['Col_Sum'] = df_observed.sum(axis=0)

In [56]:
df_observed_1

| | Product A | Product B | Row_Sum |
|---|---|---|---|
| Male | 30.0 | 15.0 | 45.0 |
| Female | 15.0 | 25.0 | 40.0 |
| Col_Sum | 45.0 | 40.0 | NaN |


In [57]:
expected_df= pd.DataFrame()

In [58]:
row_totals = df_observed.sum(axis=1)
column_totals = df_observed.sum(axis=0)
grand_total = df_observed.values.sum()

In [59]:
for row in df_observed.index:
    for col in df_observed.columns:
        expected_df.loc[row, col] = (row_totals[row] * column_totals[col]) / grand_total

In [60]:
expected_df

| | Product A | Product B |
|---|---|---|
| Male | 23.823529 | 21.176471 |
| Female | 21.176471 | 18.823529 |
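
The nested loop above can also be written as a single vectorized expression. A minimal sketch, assuming the same 2x2 table, uses `np.outer` to multiply every row total by every column total; the array names here are local to the example.

```python
import numpy as np

observed = np.array([[30, 15], [15, 25]])

row_totals = observed.sum(axis=1)      # one total per gender
column_totals = observed.sum(axis=0)   # one total per product
grand_total = observed.sum()

# Expected count for each cell = (row total * column total) / grand total
expected = np.outer(row_totals, column_totals) / grand_total
print(expected)
```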


In [61]:
chi_square_stat = ((df_observed - expected_df) ** 2 / expected_df).sum().sum()
chi_square_stat

7.230902777777776

In [62]:
p_value = 1 - stats.chi2.cdf(chi_square_stat, df)
p_value

0.0071659163049716534

Conclusion

In [63]:
if p_value < 0.05:
    print("Reject the null hypothesis: There is an association between gender and product preference.")
else:
    print("Fail to reject the null hypothesis: No association between gender and product preference.")

Reject the null hypothesis: There is an association between gender and product preference.


**Chi-square test using `scipy.stats`**

In [64]:
# Perform Chi-Square Test of Independence
chi2, p, dof, expected = stats.chi2_contingency(data)

print(f"Chi-Square Statistic: {chi2}")
print(f"p-value: {p}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
print(expected)

Chi-Square Statistic: 6.107571373456789
p-value: 0.01346039651302516
Degrees of Freedom: 1
Expected Frequencies:
[[23.82352941 21.17647059]
 [21.17647059 18.82352941]]
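
The statistic reported here (about 6.11) is lower than the from-scratch value (about 7.23) because `stats.chi2_contingency` applies Yates' continuity correction by default for 2x2 tables (one degree of freedom). The sketch below turns the correction off with `correction=False` to show that the uncorrected result matches the manual calculation.

```python
import numpy as np
from scipy import stats

data = np.array([[30, 15], [15, 25]])

# Default: Yates' continuity correction is applied for 2x2 tables
chi2_corrected, p_corrected, _, _ = stats.chi2_contingency(data)

# Without the correction: plain sum of (O - E)^2 / E
chi2_plain, p_plain, _, _ = stats.chi2_contingency(data, correction=False)

print(f"With correction:    chi2 = {chi2_corrected:.4f}, p = {p_corrected:.4f}")
print(f"Without correction: chi2 = {chi2_plain:.4f}, p = {p_plain:.4f}")
```

Both versions reject the null hypothesis at the 5% level for this table, so the conclusion is unchanged.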


Conclusion

In [65]:
if p < 0.05:
    print("Reject the null hypothesis: There is an association between gender and product preference.")
else:
    print("Fail to reject the null hypothesis: No association between gender and product preference.")

Reject the null hypothesis: There is an association between gender and product preference.


Expected data

In [66]:
# Create a DataFrame for the expected frequencies
df_expected = pd.DataFrame(expected, index=["Male", "Female"], columns=["Product A", "Product B"])
df_expected

| | Product A | Product B |
|---|---|---|
| Male | 23.823529 | 21.176471 |
| Female | 21.176471 | 18.823529 |


**Chi-Square Goodness of Fit Test**

Data Creation

Problem: Check whether a die is fair.

In [67]:
observed = np.array([11, 1, 17, 9, 10, 12])

expected = np.array([10, 10, 10, 10, 10, 10])

Scratch implementation

In [68]:
df = len(observed) - 1

In [69]:
chi_value = (((observed - expected) ** 2) / expected).sum()
p_value = 1 - stats.chi2.cdf(chi_value,df)
print(f"P value is {p_value}")
print(f"Chi Square: {chi_value}")

P value is 0.01836019640951947
Chi Square: 13.6


In [70]:
if p_value < 0.05:
    print("Reject the null hypothesis: The die is not fair.")
else:
    print("Fail to reject the null hypothesis: The die could be fair.")

Reject the null hypothesis: The die is not fair.


Chi Square test using stats

In [71]:
chi2, p = stats.chisquare(observed, f_exp=expected)

print(f"Chi-Square Statistic: {chi2}")
print(f"p-value: {p}")

if p < 0.05:
    print("Reject the null hypothesis: The die is not fair.")
else:
    print("Fail to reject the null hypothesis: The die could be fair.")

Chi-Square Statistic: 13.6
p-value: 0.01836019640951945
Reject the null hypothesis: The die is not fair.


## ANOVA

### Note

Definition: ANOVA (Analysis of Variance) is a statistical technique used to compare the means of three or more groups to determine if there is a statistically significant difference between them. It tests the hypothesis that the means of different groups are equal, based on the variance within each group and between the groups.

Formula: The basic formula for the F-statistic in ANOVA is:

\[ F = \frac{\text{MSB}}{\text{MSW}} = \frac{\text{variance between groups}}{\text{variance within groups}} \]

Where: <br>
MSB (Mean Square Between) = SSB / dfB
<br>
MSW (Mean Square Within) = SSW / dfW
 
Where:

- SSB (Sum of Squares Between): Measures the variation due to the difference in group means.
- SSW (Sum of Squares Within): Measures the variation within each group.
- dfB: Degrees of freedom between groups (k - 1 for k groups).
- dfW: Degrees of freedom within groups (N - k for N total observations).

Steps:

Calculate the mean of each group.<br>
Compute the sum of squares (SSB and SSW).<br>
Compute the mean squares (MSB and MSW).<br>
Calculate the F-statistic.<br>
Compare the F-statistic with the critical value from the F-distribution table (based on the chosen significance level).<br>
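
The steps above can be sketched from scratch as follows. The scores in `groups` are made-up values used only for illustration, and the p-value is taken from the F-distribution with `stats.f.sf` instead of a table lookup. The result is compared against `stats.f_oneway` as a sanity check.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups (illustrative values only)
groups = [
    np.array([82.0, 75.0, 90.0, 68.0, 77.0]),
    np.array([88.0, 92.0, 79.0, 85.0, 91.0]),
    np.array([70.0, 74.0, 81.0, 66.0, 72.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Steps 1-2: group means and sums of squares
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Step 3: mean squares
msb = ssb / (k - 1)
msw = ssw / (n_total - k)

# Step 4: F-statistic (between-group variance over within-group variance)
f_manual = msb / msw

# Step 5: p-value from the F-distribution with (k - 1, N - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"Manual: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```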

Use Case:<br>
*Example*: Testing if different teaching methods affect student performance.<br>
Groups: Students taught using different methods (e.g., traditional, online, hybrid).<br>
Objective: Compare the mean scores of students in different teaching methods.<br>
If the p-value corresponding to the F-statistic is less than the significance level (e.g., 0.05), the null hypothesis is rejected, indicating that at least one group mean is significantly different from the others.<br>

*Key Points:* <br>
ANOVA is used for multiple group comparisons.
<br>
It assumes that the groups are independent, have normal distributions, and the variances are equal (homogeneity of variances).
<br>
If ANOVA shows significant differences, post-hoc tests (like Tukey’s HSD) can be performed to identify which groups differ.

### Practical

**One-Way ANOVA**

Example Problem: Suppose we have data on student scores from three different teaching methods and want to check if the mean scores differ among the groups.

In [None]:
np.random.seed(42)
group1 = np.random.normal(75, 10, 30)
group2 = np.random.normal(80, 10, 30)
group3 = np.random.normal(78, 10, 30)

f_stat, p_value = stats.f_oneway(group1, group2, group3)

print("F-statistic:", f_stat)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the group means.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the group means.")

F-statistic: 3.251721004064245
P-value: 0.04345972421308328
Reject the null hypothesis: There is a significant difference between the group means.
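
A significant F-test only says that at least one group mean differs; it does not say which. As a possible follow-up, the sketch below runs Tukey's HSD with `pairwise_tukeyhsd` on the same simulated groups. The label strings and the names `scores`, `labels`, and `tukey_result` are chosen here for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same simulated data as in the one-way ANOVA cell
np.random.seed(42)
group1 = np.random.normal(75, 10, 30)
group2 = np.random.normal(80, 10, 30)
group3 = np.random.normal(78, 10, 30)

# Tukey's HSD expects one array of values and one array of group labels
scores = np.concatenate([group1, group2, group3])
labels = ['group1'] * 30 + ['group2'] * 30 + ['group3'] * 30

tukey_result = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)
print(tukey_result.summary())
```

Each row of the summary compares one pair of groups, and the `reject` column indicates whether that pair differs at the chosen alpha.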


**Two-way test**

Imagine we have a dataset representing the effects of two different factors on test scores:

Factor 1: Teaching Method (Method A and Method B)<br>
Factor 2: Study Environment (Quiet and Noisy)<br>
We want to determine:

Whether there is a significant difference in test scores based on the teaching method.<br>
Whether there is a significant difference based on the study environment.<br>
Whether there is an interaction effect between the teaching method and the study environment.

In [None]:
np.random.seed(42)
n = 15

method_a_quiet = np.random.normal(85, 5, n)
method_a_noisy = np.random.normal(80, 5, n)
method_b_quiet = np.random.normal(78, 5, n)
method_b_noisy = np.random.normal(70, 5, n)

data = pd.DataFrame({
    'Score': np.concatenate([method_a_quiet, method_a_noisy, method_b_quiet, method_b_noisy]),
    'Method': ['A'] * (2 * n) + ['B'] * (2 * n),
    'Environment': ['Quiet'] * n + ['Noisy'] * n + ['Quiet'] * n + ['Noisy'] * n
})

print(data.head())
data.shape

       Score Method Environment
0  87.483571      A       Quiet
1  84.308678      A       Quiet
2  88.238443      A       Quiet
3  92.615149      A       Quiet
4  83.829233      A       Quiet


(60, 3)

In [None]:
model = ols('Score ~ C(Method) + C(Environment) + C(Method):C(Environment)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


                               sum_sq    df          F        PR(>F)
C(Method)                 1000.027450   1.0  47.902002  4.671675e-09
C(Environment)             680.629864   1.0  32.602638  4.484464e-07
C(Method):C(Environment)     0.928770   1.0   0.044489  8.337126e-01
Residual                  1169.085516  56.0        NaN           NaN


In [None]:
def print_conclusions(anova_table, data, alpha=0.05):
    print("\nConclusions:")
    
    if anova_table.loc['C(Method)', 'PR(>F)'] < alpha:
        print(f"- The teaching method has a significant effect on the scores (p-value = {anova_table.loc['C(Method)', 'PR(>F)']:.4f}).")

    else:
        print(f"- The teaching method does not have a significant effect on the scores (p-value = {anova_table.loc['C(Method)', 'PR(>F)']:.4f}).")
        
    if anova_table.loc['C(Environment)', 'PR(>F)'] < alpha:
        print(f"- The study environment has a significant effect on the scores (p-value = {anova_table.loc['C(Environment)', 'PR(>F)']:.4f}).")
    else:
        print(f"- The study environment does not have a significant effect on the scores (p-value = {anova_table.loc['C(Environment)', 'PR(>F)']:.4f}).")
    
    if anova_table.loc['C(Method):C(Environment)', 'PR(>F)'] < alpha:
        print(f"- There is a significant interaction effect between the teaching method and the environment (p-value = {anova_table.loc['C(Method):C(Environment)', 'PR(>F)']:.4f}).")
        print("  This suggests that the effect of one factor depends on the level of the other.")
    else:
        print(f"- There is no significant interaction effect between the teaching method and the environment (p-value = {anova_table.loc['C(Method):C(Environment)', 'PR(>F)']:.4f}).")

print_conclusions(anova_table, data)



Conclusions:
- The teaching method has a significant effect on the scores (p-value = 0.0000).
- The study environment has a significant effect on the scores (p-value = 0.0000).
- There is no significant interaction effect between the teaching method and the environment (p-value = 0.8337).
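
To see what these conclusions look like in the data, it can help to inspect the mean score in each Method x Environment cell. The sketch below rebuilds the same simulated dataset and tabulates the cell means with `groupby`; the name `cell_means` is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the simulated dataset from the two-way ANOVA example
np.random.seed(42)
n = 15
method_a_quiet = np.random.normal(85, 5, n)
method_a_noisy = np.random.normal(80, 5, n)
method_b_quiet = np.random.normal(78, 5, n)
method_b_noisy = np.random.normal(70, 5, n)

data = pd.DataFrame({
    'Score': np.concatenate([method_a_quiet, method_a_noisy, method_b_quiet, method_b_noisy]),
    'Method': ['A'] * (2 * n) + ['B'] * (2 * n),
    'Environment': ['Quiet'] * n + ['Noisy'] * n + ['Quiet'] * n + ['Noisy'] * n
})

# Mean score per cell: rows are methods, columns are environments
cell_means = data.groupby(['Method', 'Environment'])['Score'].mean().unstack()
print(cell_means.round(2))
```

If the gap between Quiet and Noisy is roughly the same for both methods, that is consistent with the non-significant interaction term in the ANOVA table.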
