### **1 Sample Z Test**

**Test 1 : Z-test for a population mean (variance known)**

**Data**

    H0: μ0 = 4.0
    n = 9, x̄ = 4.6
    σ = 1.0
    ∴ Z = 1.8

**Hypotheses and alternatives**
1. H0: μ = μ0 , H1: μ != μ0 # Two tailed
2. H0: μ = μ0 , H1: μ>μ0 # one tailed

**Conclusion**
1. |Z| = 1.8 < 1.96 (α = 0.05, two tailed). Do not reject H0.
2. Z = 1.8 > 1.645 (α = 0.05, one tailed). Reject H0.

In [None]:
# 1. H0: μ = μ0 , H1: μ != μ0
# Two tailed z test
import numpy as np
from scipy.stats import norm

population_mean = 4.0
n = 9
sample_mean = 4.6
pop_std_dev = 1.0
alpha = 0.05

z_score = (sample_mean - population_mean)/(pop_std_dev/np.sqrt(n)) # 1.8 according to the data given; floating point gives 1.7999999999...
p_value = 2*(1 - norm.cdf(abs(z_score)))

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

Failed to Reject H0


In [None]:
# 2. H0: μ = μ0 , H1: μ>μ0
# one tailed
from scipy.stats import norm
import numpy as np

population_mean = 4.0
n = 9
sample_mean = 4.6
pop_std_dev = 1.0
alpha = 0.05

z_score = (sample_mean - population_mean)/(pop_std_dev/(np.sqrt(n)))
p_value = 1 - norm.cdf(z_score)  # right-tailed: P(Z >= z), no abs()

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

Reject H0
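
The two cells above differ only in which tail they use. As a minimal sketch, both cases can be covered by one helper; the function name `one_sample_z_test` and its `alternative` argument are illustrative assumptions and not part of the original notebook. It uses `norm.sf` (the survival function, equal to `1 - cdf`) for upper-tail probabilities.

```python
import math
from scipy.stats import norm


def one_sample_z_test(sample_mean, pop_mean, pop_std_dev, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test with known population std dev.

    Hypothetical helper for illustration only.
    """
    z = (sample_mean - pop_mean) / (pop_std_dev / math.sqrt(n))
    if alternative == "two-sided":
        p_value = 2 * norm.sf(abs(z))   # both tails
    elif alternative == "greater":
        p_value = norm.sf(z)            # upper tail, H1: mean > pop_mean
    elif alternative == "less":
        p_value = norm.cdf(z)           # lower tail, H1: mean < pop_mean
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


alpha = 0.05
z_two, p_two = one_sample_z_test(4.6, 4.0, 1.0, 9)
z_gt, p_gt = one_sample_z_test(4.6, 4.0, 1.0, 9, alternative="greater")

print(f"two-sided : z = {z_two:.3f}, p = {p_two:.4f}, reject = {p_two < alpha}")
print(f"greater   : z = {z_gt:.3f}, p = {p_gt:.4f}, reject = {p_gt < alpha}")
```

With the data of Test 1 the two-sided p-value lands above 0.05 and the one-sided p-value below it, which agrees with the two conclusions listed for this test.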


### **2 Sample Z Test**

**Test 2 : Z-test for two population means (variances known and unequal)**

**Hypotheses and alternatives :**
1. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 != μ0 # Two tailed
2. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 > μ0 # one tailed

**Data**

    H0: μ1 − μ2 = 0
    n1 = 9, n2 = 16
    x̄1 = 1.2, x̄2 = 1.7
    σ1² = 1, σ2² = 4
    ∴ Z = −0.832

**Conclusion**
1. Do not reject H0.
2. Do not reject H0.

In [None]:
# 1. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 != μ0
# Two tailed
from scipy.stats import norm
import numpy as np

n1 = 9
n2 = 16

sample_mean_1 = 1.2
sample_mean_2 = 1.7

pop_std_dev_1 = np.sqrt(1)
pop_std_dev_2 = np.sqrt(4)
alpha = 0.05  # significance level

z_score = (sample_mean_1 - sample_mean_2) / np.sqrt((pop_std_dev_1**2 / n1) + (pop_std_dev_2**2 / n2))
print(f"z score : {z_score}")

p_value = 2 * (1 - norm.cdf(abs(z_score)))
print(f"p value : {p_value}")

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

z score : -0.8320502943378437
p value : 0.40538055645894233
Failed to Reject H0


In [None]:
# 2. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 > μ0
# one tailed
from scipy.stats import norm
import numpy as np

n1 = 9
n2 = 16

sample_mean_1 = 1.2
sample_mean_2 = 1.7

pop_std_dev_1 = np.sqrt(1)
pop_std_dev_2 = np.sqrt(4)
alpha = 0.05  # significance level

z_score = (sample_mean_1 - sample_mean_2) / np.sqrt((pop_std_dev_1**2 / n1) + (pop_std_dev_2**2 / n2))
print(f"z score : {z_score}")

# Right-tailed test: P(Z >= z). Taking abs() here would test the wrong tail when z is negative.
p_value = 1 - norm.cdf(z_score)
print(f"p value : {p_value:.4f}")

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

z score : -0.8320502943378437
p value : 0.7973
Failed to Reject H0
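
The conclusions for Test 2 can also be reached by comparing Z with critical values instead of p-values. The sketch below does that for the data above, assuming α = 0.05 as in the other cells; the variable names are illustrative.

```python
import math
from scipy.stats import norm

alpha = 0.05
n1, n2 = 9, 16
mean_1, mean_2 = 1.2, 1.7
var_1, var_2 = 1.0, 4.0  # known population variances

# Standard error of the difference between the two sample means
std_error = math.sqrt(var_1 / n1 + var_2 / n2)
z_score = (mean_1 - mean_2) / std_error

# Critical values of the standard normal distribution
z_crit_two_tailed = norm.ppf(1 - alpha / 2)   # about 1.960
z_crit_right_tailed = norm.ppf(1 - alpha)     # about 1.645

# Two tailed: reject when |Z| exceeds the critical value
reject_two_tailed = abs(z_score) > z_crit_two_tailed
# Right tailed (H1: mu1 - mu2 > mu0): reject when Z exceeds the critical value
reject_right_tailed = z_score > z_crit_right_tailed

print(f"z score               : {z_score:.3f}")
print(f"two-tailed critical   : ±{z_crit_two_tailed:.3f}, reject = {reject_two_tailed}")
print(f"right-tailed critical : {z_crit_right_tailed:.3f}, reject = {reject_right_tailed}")
```

Both comparisons leave H0 standing, matching the conclusions listed for this test.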


### **Z Test for Proportions**

In [None]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Data
vegetarian_count = 960
# vegetarian_count = 180
total_population = 1860

# Sample proportion
p_hat = vegetarian_count/total_population

alpha = 0.01

# H0: p = 0.50 , H1: p != 0.50
z_stat, p_value = proportions_ztest(count=vegetarian_count,
                                   nobs=total_population,     # number of observations
                                   value=0.50,
                                   alternative="two-sided")



if p_value < alpha:
    print("Reject H0 : the proportions of Veg and NonVeg differ")
else:
    print("Failed to reject H0 : no evidence that the proportions of Veg and NonVeg differ")

Failed to reject H0 : no evidence that the proportions of Veg and NonVeg differ
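
For reference, the statistic behind a one-sample proportion z-test can be sketched by hand. The standard error can be built from the sample proportion or from the null proportion; to my understanding `proportions_ztest` uses the sample proportion unless `prop_var` is set, but treat that as an assumption and check the statsmodels documentation. Both variants are shown, and for this data they give almost the same result.

```python
import math
from scipy.stats import norm

count, nobs, p0 = 960, 1860, 0.50
alpha = 0.01
p_hat = count / nobs

# Standard error based on the sample proportion (assumed statsmodels default)
se_sample = math.sqrt(p_hat * (1 - p_hat) / nobs)
# Standard error based on the null proportion (common textbook variant)
se_null = math.sqrt(p0 * (1 - p0) / nobs)

z_sample = (p_hat - p0) / se_sample
z_null = (p_hat - p0) / se_null

# Two-sided p-values from the standard normal distribution
p_sample = 2 * norm.sf(abs(z_sample))
p_null = 2 * norm.sf(abs(z_null))

print(f"p_hat = {p_hat:.4f}")
print(f"sample-proportion SE : z = {z_sample:.3f}, p = {p_sample:.3f}")
print(f"null-proportion SE   : z = {z_null:.3f}, p = {p_null:.3f}")
print("Reject H0" if p_sample < alpha else "Failed to reject H0")
```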


### **1 Sample T Test**

**Test 3 : t-test for a population mean (variance unknown)**

**Hypotheses and alternatives**
1. H0: μ = μ0 , H1: μ != μ0 # two tailed
2. H0: μ = μ0 , H1: μ < μ0 # one tailed (left-hand side)

**Data**

    H0: μ0 = 4.0
    n = 9, x̄ = 3.1
    s = 1.0
    ∴ t = −2.7

**Conclusion**
1. t(8; 0.025) = ±2.306. Reject H0.
2. t(8; 0.05) = −1.860 (left-hand side). Reject H0.

In [None]:
# 1. H0: μ = μ0 , H1: μ != μ0
# two tailed
import numpy as np
from scipy.stats import t

pop_mean = 4.0
sample_mean = 3.1
n = 9
sample_std_dev = 1.0
df = n - 1
ci = 0.95
alpha = 1 - ci

t_critical = t.ppf(1-alpha/2,df)
print(f"t critical : ±{t_critical}")

t_stats = (sample_mean - pop_mean)/(sample_std_dev/np.sqrt(n))
print(f"t stats : {t_stats}")

p_value = 2*(1 - t.cdf(abs(t_stats), df))
print(f"p value : {p_value}")

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

t critical : ±2.306004135204166
t stats : -2.6999999999999997
p value : 0.027074035047965905
Reject H0


In [None]:
# 2. H0: μ = μ0 , H1: μ < μ0
# one tailed (left-hand side, as in the conclusion for this test)
from scipy.stats import t
import numpy as np

pop_mean = 4.0
sample_mean = 3.1
n = 9
sample_std_dev = 1.0
df = n - 1
ci = 0.95
alpha = 1 - ci

t_critical = t.ppf(alpha, df)  # lower-tail critical value
print(f"t critical : {t_critical:.3f}")

t_stats = (sample_mean - pop_mean)/(sample_std_dev/np.sqrt(n))
print(f"t stats : {t_stats}")

p_value = t.cdf(t_stats, df)  # left-tailed: P(T <= t), no abs()
print(f"p value : {p_value:.4f}")

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

t critical : -1.860
t stats : -2.6999999999999997
p value : 0.0135
Reject H0
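
The two-tailed decision for Test 3 can also be read off a confidence interval: H0 is rejected at level α exactly when μ0 falls outside the (1 − α) interval around x̄. A minimal sketch with the same data follows; the variable names are illustrative.

```python
import math
from scipy.stats import t

pop_mean = 4.0        # mu0 under H0
sample_mean = 3.1
sample_std_dev = 1.0
n = 9
confidence = 0.95

df = n - 1
std_error = sample_std_dev / math.sqrt(n)
t_critical = t.ppf(1 - (1 - confidence) / 2, df)

# Confidence interval: sample mean plus/minus t_critical standard errors
ci_lower = sample_mean - t_critical * std_error
ci_upper = sample_mean + t_critical * std_error
mu0_inside = ci_lower <= pop_mean <= ci_upper

print(f"95% CI for the mean : ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"mu0 = {pop_mean} inside interval : {mu0_inside}")
```

Here μ0 = 4.0 lies above the upper limit, which is consistent with rejecting H0 in the two-tailed test.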


### **2 Sample T Test**

**Test 4 : t-test for two population means (variance unknown but equal)**

**Hypotheses and alternatives :**
1. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 != μ0 # Two tailed
2. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 > μ0 # one tailed

**Data**

    H0: μ1 − μ2 = 0
    n1 = 16, n2 = 16
    x̄1 = 5.0, x̄2 = 4
    s = 2.0
    ∴ t = 1.414

**Conclusion**
1. t(30; 0.025) = ±2.042. Do not reject H0.
2. t(30; 0.05) = 1.697. Do not reject H0.

In [None]:
# 1. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 != μ0
# Two tailed
import numpy as np
from scipy.stats import t
n1 = 16
n2 = 16
sample_mean_1 = 5.0
sample_mean_2 = 4
sample_std_dev = 2.0
ci = 0.95
alpha = 1 - ci

df = n1 + n2 - 2

t_critical = t.ppf(1-alpha/2,df)
print(f"t critical : ±{t_critical}")

t_stats = (sample_mean_1 - sample_mean_2)/(sample_std_dev*np.sqrt((1/n1)+(1/n2)))
print(f"t stats : {t_stats}")

p_value = 2*(1 - t.cdf(abs(t_stats), df))
print(f"p value : {p_value}")

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

t critical : ±2.0422724563012373
t stats : 1.414213562373095
p value : 0.16759410801934616
Failed to Reject H0


In [None]:
# 2. H0: μ1 − μ2 = μ0 , H1: μ1 − μ2 > μ0
# one tailed
import numpy as np
from scipy.stats import t
n1 = 16
n2 = 16
sample_mean_1 = 5.0
sample_mean_2 = 4
sample_std_dev = 2.0
ci = 0.95
alpha = 1 - ci

df = n1 + n2 - 2

t_critical = t.ppf(1-alpha,df)  # upper-tail critical value
print(f"t critical : {t_critical}")

t_stats = (sample_mean_1 - sample_mean_2)/(sample_std_dev*np.sqrt((1/n1)+(1/n2)))
print(f"t stats : {t_stats}")

p_value = 1 - t.cdf(t_stats, df)  # right-tailed: P(T >= t), no abs()
print(f"p value : {p_value}")

if p_value >= alpha:
  print("Failed to Reject H0")
if p_value < alpha :
  print("Reject H0")

t critical : 1.6972608865939574
t stats : 1.414213562373095
p value : 0.08379705400967308
Failed to Reject H0
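
The data for Test 4 give only a common standard deviation s = 2.0. As a sketch of where such a value comes from, the block below pools two sample standard deviations and cross-checks the hand-computed statistic against `scipy.stats.ttest_ind_from_stats`. Setting both sample standard deviations to 2.0 is an assumption made here so that the pooled value matches the given s.

```python
import math
from scipy.stats import ttest_ind_from_stats

n1, n2 = 16, 16
mean_1, mean_2 = 5.0, 4.0
std_1, std_2 = 2.0, 2.0  # assumption: each sample has s = 2.0

# Pooled standard deviation for the equal-variance t-test
pooled_std = math.sqrt(((n1 - 1) * std_1**2 + (n2 - 1) * std_2**2) / (n1 + n2 - 2))
t_manual = (mean_1 - mean_2) / (pooled_std * math.sqrt(1 / n1 + 1 / n2))

# Same test from summary statistics (two-sided p-value by default)
result = ttest_ind_from_stats(mean_1, std_1, n1, mean_2, std_2, n2, equal_var=True)

print(f"pooled std : {pooled_std}")
print(f"t (manual) : {t_manual:.6f}")
print(f"t (scipy)  : {result.statistic:.6f}")
print(f"p (scipy, two-sided) : {result.pvalue:.6f}")
```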


### **Paired T Test**

**Test 5 : Method of paired comparisons**

**Data**

    n1 = 16, d = 1.0 (mean of the paired differences)
    s = 1.0
    ∴ t = 4.0

**Hypotheses and alternatives**
1. H0: μd = 0 , H1: μd != 0
2. H0: μd = 0 , H1: μd > 0

**Conclusion**
1. t(15; 0.025) = ±2.131. Reject H0.
2. t(15; 0.05) = 1.753. Reject H0.

In [None]:
# 1. H0: μd = 0 , H1: μd != 0
# 2 tailed
import numpy as np
from scipy.stats import t

n1 = 16
d = 1.0
sample_std_dev = 1.0
ci = 0.95
alpha = 1 - ci
df = n1 - 1

t_critical = t.ppf(1-alpha/2,df)
print(f"t critical : ±{t_critical}")

t_stats = (d)/(sample_std_dev/np.sqrt(n1))
print(f"t stats : {t_stats}")

p_value = 2*(1 - t.cdf(abs(t_stats), df))
print(f"p value : {p_value}")

if p_value >= alpha:
  print("Failed to Reject H0")
  print("No difference")
if p_value < alpha :
  print("Reject H0")
  print("Difference")

t critical : ±2.131449545559323
t stats : 4.0
p value : 0.0011593168497612272
Reject H0
Difference


In [None]:
# 2. H0: μd = 0 , H1: μd > 0
# 1 tailed
import numpy as np
from scipy.stats import t

n1 = 16
d = 1.0
sample_std_dev = 1.0
ci = 0.95
alpha = 1 - ci
df = n1 - 1

t_critical = t.ppf(1-alpha,df)  # upper-tail critical value
print(f"t critical : {t_critical}")

t_stats = (d)/(sample_std_dev/np.sqrt(n1))
print(f"t stats : {t_stats}")

p_value = 1 - t.cdf(t_stats, df)  # right-tailed: P(T >= t), no abs()
print(f"p value : {p_value}")

if p_value >= alpha:
  print("Failed to Reject H0")
  print("No difference")
if p_value < alpha :
  print("Reject H0")
  print("Difference")

t critical : 1.7530503556925547
t stats : 4.0
p value : 0.0005796584248806136
Reject H0
Difference
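
Test 5 starts from summary values (the mean difference and its standard deviation). With raw paired measurements the same statistic can be built from the differences and checked against `scipy.stats.ttest_rel`. The `before` and `after` arrays below are hypothetical numbers invented for illustration; they are not the data behind Test 5.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired measurements, invented purely for illustration
before = np.array([10, 12, 9, 11, 13, 10, 12, 11], dtype=float)
after = np.array([11, 13, 9, 13, 14, 10, 14, 12], dtype=float)

# The paired test is a one-sample t-test on the differences
differences = after - before
n = len(differences)
d_bar = differences.mean()
s_d = differences.std(ddof=1)   # sample standard deviation of the differences

t_manual = d_bar / (s_d / np.sqrt(n))
result = ttest_rel(after, before)   # two-sided by default

print(f"mean difference : {d_bar}")
print(f"t (manual) : {t_manual:.4f}")
print(f"t (scipy)  : {result.statistic:.4f}")
print(f"p (scipy, two-sided) : {result.pvalue:.4f}")
```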


### **Test 6: Two-sample F-test for equality of variances**

In [None]:
from scipy.stats import f

# Test 6: Two-sample F-test for equality of variances

# Hypotheses:
# 1. H0: σ1^2 = σ2^2, H1: σ1^2 != σ2^2 (Two-tailed)
# 2. H0: σ1^2 = σ2^2, H1: σ1^2 > σ2^2 (One-tailed)

# Data:
# s1^2 = 10, n1 = 21
# s2^2 = 8,  n2 = 25

# Calculations:
s1_squared = 10
n1 = 21
s2_squared = 8
n2 = 25
alpha = 0.05

df1 = n1 - 1
df2 = n2 - 1

# Calculate F-statistic
f_statistic = s1_squared / s2_squared
print(f"F-statistic: {f_statistic}")

# 1. Two-tailed F-test
f_critical_lower = f.ppf(alpha / 2, df1, df2)
f_critical_upper = f.ppf(1 - alpha / 2, df1, df2)
print(f"Two-tailed F-critical values: {f_critical_lower:.3f}, {f_critical_upper:.3f}")

p_value_two_tailed = 2 * min(f.cdf(f_statistic, df1, df2), 1 - f.cdf(f_statistic, df1, df2))
print(f"Two-tailed p-value: {p_value_two_tailed:.3f}")

if p_value_two_tailed < alpha:
  print("Reject H0: Variances are unequal")
else:
  print("Failed to Reject H0: Variances are equal")

print("-" * 20)

# 2. One-tailed F-test (assuming H1: σ1^2 > σ2^2)
f_critical_one_tailed = f.ppf(1 - alpha, df1, df2)
print(f"One-tailed F-critical value: {f_critical_one_tailed:.3f}")

p_value_one_tailed = 1 - f.cdf(f_statistic, df1, df2)
print(f"One-tailed p-value: {p_value_one_tailed:.3f}")

if p_value_one_tailed < alpha:
  print("Reject H0: Variance of sample 1 is greater than variance of sample 2")
else:
  print("Failed to Reject H0: Variance of sample 1 is not greater than variance of sample 2")


F-statistic: 1.25
Two-tailed F-critical values: 0.415, 2.327
Two-tailed p-value: 0.596
Failed to Reject H0: Variances are equal
--------------------
One-tailed F-critical value: 2.027
One-tailed p-value: 0.298
Failed to Reject H0: Variance of sample 1 is not greater than variance of sample 2


In [None]:
from scipy.stats import f

alpha = 0.05

# Data
s1_squared = 10
n1 = 21
s2_squared = 8
n2 = 25

print(f"Variance 1 : {s1_squared}")
print(f"Variance 2 : {s2_squared}")

# Put the larger variance in the numerator and keep its degrees of freedom with it
if s1_squared >= s2_squared:
  f_statistic, dfn, dfd = s1_squared / s2_squared, n1 - 1, n2 - 1
else:
  f_statistic, dfn, dfd = s2_squared / s1_squared, n2 - 1, n1 - 1
print(f"F Statistic : {f_statistic}")

f_critical_value = f.ppf(1-alpha/2, dfn, dfd)
print(f"f critical : {f_critical_value}")

# Two-tailed p-value: double the upper-tail probability (capped at 1).
# f.cdf alone is the lower-tail probability, not a p-value.
p_value = min(1.0, 2 * (1 - f.cdf(f_statistic, dfn, dfd)))

if p_value < alpha:
  print("Reject H0: Variances are unequal")
else:
  print("Failed to Reject H0: Variances are equal")

Variance 1 : 10
Variance 2 : 8
F Statistic : 1.25
f critical : 2.327271444608616
Failed to Reject H0: Variances are equal


### **F-test in ANOVA (Multiple group means)**

In [None]:
from scipy.stats import f_oneway
import numpy as np

alpha = 0.05  # 5% significance level (95% confidence)

# Data made up for illustration; f_oneway accepts any number of groups
fertilizer_A = [12, 13, 15, 14, 16, 13, 15]
fertilizer_B = [14, 13, 13, 15, 17, 15, 16]
# fertilizer_C = [14, 15, 14, 13, 14, 13, 14]
fertilizer_C = [10, 18, 17, 14, 19, 17, 16]

f_statistics, p_value = f_oneway(fertilizer_A,fertilizer_B,fertilizer_C)
print(f"f_statistics : {f_statistics}")
print(f"p_value : {p_value}")

if p_value < alpha:
    print("Reject H0 : at least one fertilizer gives a different mean plant height")
else:
    print("Failed to reject H0 : no significant difference in mean plant height")

f_statistics : 1.3772241992882563
p_value : 0.27762066315388506
Failed to reject H0 : no significant difference in mean plant height
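
To show what `f_oneway` computes, here is a sketch that rebuilds the F statistic for the same three fertilizer groups from the between-group and within-group sums of squares. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import f

groups = [
    np.array([12, 13, 15, 14, 16, 13, 15], dtype=float),  # fertilizer_A
    np.array([14, 13, 13, 15, 17, 15, 16], dtype=float),  # fertilizer_B
    np.array([10, 18, 17, 14, 19, 17, 16], dtype=float),  # fertilizer_C
]

k = len(groups)                              # number of groups
n_total = sum(len(g) for g in groups)        # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group variation: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

f_statistic = (ss_between / df_between) / (ss_within / df_within)
p_value = f.sf(f_statistic, df_between, df_within)   # upper-tail probability

print(f"SS between : {ss_between:.4f} (df = {df_between})")
print(f"SS within  : {ss_within:.4f} (df = {df_within})")
print(f"F statistic : {f_statistic:.6f}")
print(f"p value : {p_value:.6f}")
```

The F statistic and p-value agree with the `f_oneway` output printed above.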


### **2-way ANOVA**
- Two-way ANOVA tests the effect of two independent categorical variables on a continuous dependent variable, and also whether there’s interaction between those two variables.



In [None]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
data = {
    'Fertilizer': ['F1', 'F1', 'F2', 'F2', 'F3', 'F3'] * 3,
    'Soil': ['Clay'] * 6 + ['Sandy'] * 6 + ['Loam'] * 6,
    'Growth': [12, 13, 14, 15, 16, 15, 10, 11, 13, 14, 15, 13, 18, 17, 19, 18, 20, 19]
}

df = pd.DataFrame(data)

# OLS model with interaction
model = ols('Growth ~ C(Fertilizer) + C(Soil) + C(Fertilizer):C(Soil)', data=df).fit()

# Two-way ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

# If p-value < 0.05, the factor or interaction has a significant effect.

                           sum_sq   df          F    PR(>F)
C(Fertilizer)           25.444444  2.0  19.083333  0.000579
C(Soil)                110.111111  2.0  82.583333  0.000002
C(Fertilizer):C(Soil)    2.222222  4.0   0.833333  0.536836
Residual                 6.000000  9.0        NaN       NaN


In [None]:
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

data = {
    "Gender":['Male',"Female",'Male',"Female",'Male',"Female",
             'Male',"Female",'Male',"Female"],
    "Education_level":["HS","BD","BD","MD","HS","BD","MD","HS","BD","MD"],
    "Income":[30000, 50000,45000,60000,32000,55000,48000,33000,52000,47000]
}

df = pd.DataFrame(data)


# Fit a linear regression by ordinary least squares (OLS), including the interaction term
model = ols("Income ~ Gender + Education_level + Gender:Education_level", data=df).fit()

# anova_lm defaults to typ=1 (sequential sums of squares)
anova_result = anova_lm(model)
anova_result

                         df       sum_sq      mean_sq          F    PR(>F)
Gender                  1.0  144400000.0  144400000.0   4.676923  0.096605
Education_level         2.0  677581000.0  338790500.0  10.972971  0.023767
Gender:Education_level  2.0    4119048.0    2059524.0   0.066705   0.93649
Residual                4.0  123500000.0   30875000.0        NaN       NaN


### **Chi-2 Test**

The Chi-Square (χ²) Test is a statistical method used to compare observed frequencies with expected frequencies to determine if there is a significant association between categorical variables.

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

# Create contingency table
data = [[30, 20],  # Men
        [25, 25]]  # Women
alpha = 0.05
table = pd.DataFrame(data, columns=["Tea", "Coffee"], index=["Men", "Women"])

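# Perform Chi-Square test of independence
# (for a 2x2 table, chi2_contingency applies Yates' continuity correction by default)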
chi2, p, dof, expected = chi2_contingency(table)

print("Chi-Square Statistic:", chi2)
print("Degrees of Freedom:", dof)
print("P-Value:", p)
print("Expected Frequencies:\n", expected)

# H0: beverage preference is independent of gender
if p < alpha:
    print("Reject H0 : beverage preference is associated with gender")
else:
    print("Failed to reject H0 : no significant association between gender and beverage preference")

Chi-Square Statistic: 0.6464646464646464
Degrees of Freedom: 1
P-Value: 0.4213795037428696
Expected Frequencies:
 [[27.5 22.5]
 [27.5 22.5]]
Failed to reject H0 : no significant association between gender and beverage preference
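
The expected frequencies printed above come from the row and column totals: each expected count is (row total × column total) / grand total. The sketch below reproduces them for the same table and computes the statistic with and without Yates' continuity correction, which shows where the value 0.646 comes from. The variable names are illustrative.

```python
import numpy as np

observed = np.array([[30, 20],    # Men:   Tea, Coffee
                     [25, 25]],   # Women: Tea, Coffee
                    dtype=float)

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic without any correction
chi2_uncorrected = ((observed - expected) ** 2 / expected).sum()
# Yates' continuity correction: shrink each |observed - expected| by 0.5
chi2_yates = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

print("Expected Frequencies:\n", expected)
print(f"Chi-square (uncorrected) : {chi2_uncorrected:.6f}")
print(f"Chi-square (Yates)       : {chi2_yates:.6f}")
```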
