# Hypothesis Testing

In [1]:
import pandas as pd

import pingouin
from scipy.stats import ttest_1samp, ttest_ind, bartlett, levene, f_oneway

In [2]:
# Significance level
ALPHA = 0.05

---

In [3]:
DataSet = pd.read_csv("WHR_2023_processed.csv")

In [4]:
Variable = "Life Ladder"
GroupVariable = "Continent"

DatGroup = DataSet[[Variable, GroupVariable]].groupby(GroupVariable).agg(["count", "mean", "var"]).reset_index()
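Passing a list of functions to `agg` produces two-level (MultiIndex) column labels such as `("Life Ladder", "mean")`, which is why the exported output has stacked headers. Below is a minimal sketch of pandas named aggregation, which gives flat column names instead. The small `toy` DataFrame is hypothetical and only stands in for the WHR data:

```python
import pandas as pd

# Hypothetical toy data standing in for WHR_2023_processed.csv
toy = pd.DataFrame({
    "Continent": ["Africa", "Africa", "Europe", "Europe", "Europe"],
    "Life Ladder": [4.0, 5.0, 6.0, 6.5, 7.0],
})

Variable = "Life Ladder"
GroupVariable = "Continent"

# Named aggregation: each keyword becomes a flat output column name
summary = (
    toy.groupby(GroupVariable)[Variable]
    .agg(count="count", mean="mean", var="var")  # var is the sample variance (ddof=1)
    .reset_index()
)
print(summary)
```

pandas' `var` is the sample variance (`ddof=1`). That is the quantity the t, Bartlett and ANOVA formulas expect.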

In [5]:
DatGroup.round(2)

| | Continent | Life Ladder (count) | Life Ladder (mean) | Life Ladder (var) |
|---|---|---|---|---|
| 0 | Africa | 36 | 4.52 | 0.43 |
| 1 | Americas | 20 | 6.34 | 0.19 |
| 2 | Asia | 38 | 5.59 | 0.85 |
| 3 | Europe | 39 | 6.45 | 0.42 |
| 4 | Oceania | 2 | 7.0 | 0.0 |


In [6]:
YSampleAfrica = DataSet.loc[DataSet["Continent"] == "Africa", Variable]
YSampleAmericas = DataSet.loc[DataSet["Continent"] == "Americas", Variable]
YSampleAsia = DataSet.loc[DataSet["Continent"] == "Asia", Variable]
YSampleEurope = DataSet.loc[DataSet["Continent"] == "Europe", Variable]
YSampleOceania = DataSet.loc[DataSet["Continent"] == "Oceania", Variable]
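The five near-identical assignments above can also be written as a single dictionary comprehension over `groupby`, which avoids hard-coding the continent names. The sketch below runs on a hypothetical toy DataFrame, and the name `YSamples` is illustrative rather than taken from the original notebook:

```python
import pandas as pd

# Hypothetical toy data standing in for WHR_2023_processed.csv
DataSet = pd.DataFrame({
    "Continent": ["Africa", "Asia", "Africa", "Europe", "Asia", "Europe"],
    "Life Ladder": [4.1, 5.6, 4.9, 6.3, 5.2, 6.8],
})
Variable = "Life Ladder"
GroupVariable = "Continent"

# One Series per group, keyed by group label (groupby sorts the keys)
YSamples = {name: grp[Variable] for name, grp in DataSet.groupby(GroupVariable)}

print(list(YSamples))               # ['Africa', 'Asia', 'Europe']
print(YSamples["Africa"].tolist())  # [4.1, 4.9]
```

With the real data, `f_oneway(*YSamples.values())` would then test all groups in one call.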

---

## One sample Student's t-test

$$H:\mu=\mu_0$$

In [7]:
Mu0 = 5.7

$$H:\mu_\text{Asia}=5.7$$

$$\bar{x}_\text{Asia}=5.59$$

In [8]:
pingouin.ttest(YSampleAsia, Mu0)

| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | -0.758925 | 37 | two-sided | 0.452705 | [5.28, 5.89] | 0.123114 | 0.228 | 0.114596 |


In [9]:
ttest_1samp(YSampleAsia, Mu0)

TtestResult(statistic=-0.7589249413631844, pvalue=0.4527053331611153, df=37)

With $p \approx 0.45 > \alpha = 0.05$, we cannot reject the hypothesis that the mean for Asia is equal to $\mu_0 = 5.7$.
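The one-sample statistic is $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom, and the 95% interval is $\bar{x} \pm t_{0.975,\,n-1}\, s/\sqrt{n}$. As a sanity check, here is a minimal sketch that recomputes the Asia result from the summary statistics in `DatGroup`. The values are copied by hand, so agreement is only expected to about three decimals:

```python
import math
from scipy import stats

# Asia summary statistics copied from the DatGroup table (var is the ddof=1 sample variance)
n, xbar, var = 38, 5.586368, 0.851891
mu0 = 5.7

se = math.sqrt(var / n)                     # standard error of the mean, s / sqrt(n)
t_stat = (xbar - mu0) / se                  # one-sample t statistic
df = n - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value

# 95% confidence interval: xbar +/- t_{0.975, df} * se
t_crit = stats.t.ppf(0.975, df)
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(round(t_stat, 4), round(p_value, 4), [round(c, 2) for c in ci])
```

The results should match the `T`, `p-val` and `CI95%` columns of the pingouin output up to rounding of the inputs.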

$$H:\mu_\text{Africa}=5.7$$

$$\bar{x}_\text{Africa}=4.52$$

In [10]:
pingouin.ttest(YSampleAfrica, Mu0)

| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | -10.838943 | 35 | two-sided | 9.881574e-13 | [4.3, 4.74] | 1.80649 | 8289000000.0 | 1.0 |


In [11]:
ttest_1samp(YSampleAfrica, Mu0)

TtestResult(statistic=-10.8389429939709, pvalue=9.881574425646482e-13, df=35)

With $p < 10^{-12} < \alpha$, we reject the hypothesis that the mean for Africa is equal to $\mu_0 = 5.7$.
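The conclusions above follow the usual decision rule with the significance level `ALPHA`: reject $H$ when $p < \alpha$, otherwise do not reject it. The sketch below makes the rule explicit. `decide` is a hypothetical helper, not part of pingouin or scipy:

```python
ALPHA = 0.05  # significance level, as defined at the top of the notebook


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Hypothetical helper: apply the p-value decision rule at level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return "reject H" if p_value < alpha else "cannot reject H"


# p-values from the two one-sample t-tests above
print(decide(0.4527053331611153))     # Asia   -> cannot reject H
print(decide(9.881574425646482e-13))  # Africa -> reject H
```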

---

## Testing equal variance between two groups

$$H:\sigma^2_\text{Group 1}=\sigma^2_\text{Group 2}$$

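Bartlett's and Levene's tests are commonly used to test for equal variances between groups. Bartlett's test assumes that each group is normally distributed, while Levene's test is less sensitive to departures from normality.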
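By default, scipy's `levene` centres each group on its median (`center='median'`, the Brown-Forsythe variant). The statistic is then a one-way ANOVA F computed on the absolute deviations $|y_{ij} - \tilde{y}_j|$ from each group median. The sketch below checks this equivalence on synthetic normal samples, which are made up and are not the WHR data:

```python
import numpy as np
from scipy.stats import levene, f_oneway

rng = np.random.default_rng(0)
# Hypothetical synthetic samples (not the WHR data)
a = rng.normal(4.5, 0.65, size=36)
b = rng.normal(6.5, 0.65, size=39)

# scipy's levene defaults to center="median"
stat_scipy, p_scipy = levene(a, b)

# Same statistic by hand: one-way ANOVA on absolute deviations from each group's median
za = np.abs(a - np.median(a))
zb = np.abs(b - np.median(b))
stat_manual, p_manual = f_oneway(za, zb)

print(np.isclose(stat_scipy, stat_manual), np.isclose(p_scipy, p_manual))
```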

$$H:\sigma^2_\text{Africa}=\sigma^2_\text{Europe}$$

$$s^2_\text{Africa}=0.43,\quad s^2_\text{Europe}=0.42$$

In [12]:
bartlett(YSampleAfrica, YSampleEurope)

BartlettResult(statistic=0.0030011428194933296, pvalue=0.9563115932814698)

In [13]:
levene(YSampleAfrica, YSampleEurope)

LeveneResult(statistic=0.058396261549111, pvalue=0.8097264373150028)

Both p-values are well above $\alpha = 0.05$, so we cannot reject the hypothesis of equal variances.

$$H:\sigma^2_\text{Americas}=\sigma^2_\text{Europe}$$

$$s^2_\text{Americas}=0.19,\quad s^2_\text{Europe}=0.42$$

In [14]:
bartlett(YSampleAmericas, YSampleEurope)

BartlettResult(statistic=3.5037433935006748, pvalue=0.06123028001971009)

In [15]:
levene(YSampleAmericas, YSampleEurope)

LeveneResult(statistic=2.756340820546062, pvalue=0.10236173462920765)

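Neither test rejects equal variances at $\alpha = 0.05$. Bartlett's p-value (0.061), however, is close to the threshold, so there is weak evidence against equal variances.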
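Bartlett's statistic depends only on the group sizes $n_j$ and sample variances $s_j^2$:

$$T = \frac{(N-k)\ln s_p^2 - \sum_j (n_j-1)\ln s_j^2}{1 + \frac{1}{3(k-1)}\left(\sum_j \frac{1}{n_j-1} - \frac{1}{N-k}\right)}$$

Here $s_p^2$ is the pooled variance, and $T$ is compared with a $\chi^2_{k-1}$ distribution. The sketch below reproduces the Americas vs Europe result from the `DatGroup` numbers. The helper `bartlett_from_stats` is hypothetical:

```python
import math
from scipy.stats import chi2


def bartlett_from_stats(ns, variances):
    """Bartlett's statistic from group sizes and ddof=1 sample variances (sketch)."""
    k = len(ns)
    N = sum(ns)
    # Pooled within-group variance
    sp2 = sum((n - 1) * v for n, v in zip(ns, variances)) / (N - k)
    numer = (N - k) * math.log(sp2) - sum((n - 1) * math.log(v) for n, v in zip(ns, variances))
    denom = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (N - k)) / (3 * (k - 1))
    T = numer / denom
    return T, chi2.sf(T, k - 1)  # T is approximately chi-square with k - 1 df


# Americas vs Europe, using the counts and variances from the DatGroup table
T, p = bartlett_from_stats([20, 39], [0.190663, 0.421098])
print(round(T, 3), round(p, 4))
```

Because the inputs are rounded summary values, the result agrees with `bartlett(YSampleAmericas, YSampleEurope)` only to about three decimals.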

---

## Testing equal mean between two groups

$$H:\mu_\text{Group 1}=\mu_\text{Group 2}$$

$$H:\mu_\text{Africa}=\mu_\text{Europe}$$

$$\bar{x}_\text{Africa}=4.52,\quad \bar{x}_\text{Europe}=6.45$$

In [16]:
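# The tests above did not reject equal variances, but pingouin's default correction="auto"
# applies Welch's correction whenever the group sizes differ (here 36 vs 39), hence the non-integer dof.
# Pass correction=False to run the pooled-variance Student's t-test instead.
pingouin.ttest(YSampleAfrica, YSampleEurope)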

| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | -12.851226 | 72.409691 | two-sided | 2.32202e-20 | [-2.24, -1.64] | 2.971341 | 1.441e+17 | 1.0 |
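The `cohen-d` column is consistent with Cohen's d computed with the pooled standard deviation, $d = (\bar{x}_1 - \bar{x}_2)/s_p$. pingouin appears to report the absolute value, since d is positive while T is negative. The sketch below uses the `DatGroup` summary numbers. `cohen_d_pooled` is a hypothetical helper:

```python
import math


def cohen_d_pooled(n1, mean1, var1, n2, mean2, var2):
    """Cohen's d for two independent groups, using the pooled (ddof=1) standard deviation."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Africa vs Europe, summary statistics from the DatGroup table
d = cohen_d_pooled(36, 4.516972, 0.428863, 39, 6.453641, 0.421098)
print(round(abs(d), 3))
```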


In [17]:
ttest_ind(YSampleAfrica, YSampleEurope)

TtestResult(statistic=-12.855987396802805, pvalue=1.965759425057481e-20, df=73.0)

Both tests give $p < 10^{-19}$, so we reject the hypothesis of equal means.
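The two cells differ slightly because they compute different tests. `ttest_ind` pools the variances by default ($n_1 + n_2 - 2 = 73$ df), while pingouin's `correction="auto"` switched to Welch's test because the group sizes differ. Both variants can be reproduced from summary statistics with `scipy.stats.ttest_ind_from_stats`. The sketch below uses the rounded `DatGroup` values:

```python
import math
from scipy.stats import ttest_ind_from_stats

# Africa and Europe summary statistics from the DatGroup table (var is ddof=1)
m1, s1, n1 = 4.516972, math.sqrt(0.428863), 36
m2, s2, n2 = 6.453641, math.sqrt(0.421098), 39

# Student's t-test with pooled variance (matches the ttest_ind cell)
pooled = ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=True)
# Welch's t-test with separate variances (matches the pingouin cell)
welch = ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)

print(round(pooled.statistic, 3), round(welch.statistic, 3))
```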

$$H:\mu_\text{Americas}=\mu_\text{Europe}$$

$$\bar{x}_\text{Americas}=6.34,\quad \bar{x}_\text{Europe}=6.45$$

In [18]:
pingouin.ttest(YSampleAmericas, YSampleEurope, correction=True)  # correction=True forces Welch's t-test (unequal variances)

| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | -0.828214 | 52.645556 | two-sided | 0.411288 | [-0.4, 0.17] | 0.20126 | 0.367 | 0.111091 |


In [19]:
ttest_ind(YSampleAmericas, YSampleEurope, equal_var=False)  # equal_var=False gives Welch's t-test (unequal variances)

TtestResult(statistic=-0.8282140709934379, pvalue=0.41128825362173227, df=52.64555613878273)

With $p \approx 0.41 > \alpha$, we cannot reject the hypothesis of equal means.
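The non-integer `dof` of 52.65 in both cells is the Welch-Satterthwaite approximation:

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

The sketch below computes it from the `DatGroup` counts and variances. `welch_df` is a hypothetical helper:

```python
def welch_df(n1, var1, n2, var2):
    """Welch-Satterthwaite approximation to the degrees of freedom (var is ddof=1)."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Americas vs Europe
print(round(welch_df(20, 0.190663, 39, 0.421098), 2))
# Africa vs Europe (the dof pingouin reported under correction="auto")
print(round(welch_df(36, 0.428863, 39, 0.421098), 2))
```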

---

## One-way ANOVA

$$H:\mu_\text{Group 1} = \mu_\text{Group 2} = \cdots = \mu_\text{Group J}$$

In [20]:
DatGroup

| | Continent | Life Ladder (count) | Life Ladder (mean) | Life Ladder (var) |
|---|---|---|---|---|
| 0 | Africa | 36 | 4.516972 | 0.428863 |
| 1 | Americas | 20 | 6.33555 | 0.190663 |
| 2 | Asia | 38 | 5.586368 | 0.851891 |
| 3 | Europe | 39 | 6.453641 | 0.421098 |
| 4 | Oceania | 2 | 7.0005 | 0.001201 |


### Using pingouin

In [21]:
pingouin.anova(dv=Variable, between=GroupVariable, data=DataSet, detailed=True)

| | Source | SS | DF | MS | F | p-unc | np2 |
|---|---|---|---|---|---|---|---|
| 0 | Continent | 84.450775 | 4 | 21.112694 | 41.48775 | 2.241085e-22 | 0.560738 |
| 1 | Within | 66.155676 | 130 | 0.50889 | | | |
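Every entry of this table follows from the group summary statistics alone:

- $SS_\text{between} = \sum_j n_j(\bar{x}_j - \bar{x})^2$
- $SS_\text{within} = \sum_j (n_j - 1)s_j^2$
- $F = \frac{SS_\text{between}/(J-1)}{SS_\text{within}/(N-J)}$
- `np2` $= SS_\text{between}/(SS_\text{between} + SS_\text{within})$ (partial eta squared, which equals eta squared in a one-way design)

The sketch below reproduces the table from the `DatGroup` values. `anova_from_stats` is a hypothetical helper:

```python
from scipy.stats import f as f_dist


def anova_from_stats(ns, means, variances):
    """One-way ANOVA from group sizes, means and ddof=1 variances (sketch)."""
    k, N = len(ns), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * v for n, v in zip(ns, variances))
    df_between, df_within = k - 1, N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    p = f_dist.sf(F, df_between, df_within)        # upper tail of the F distribution
    eta2 = ss_between / (ss_between + ss_within)   # equals partial eta squared in one-way ANOVA
    return ss_between, ss_within, F, p, eta2


# Africa, Americas, Asia, Europe, Oceania (from the DatGroup table)
ns = [36, 20, 38, 39, 2]
means = [4.516972, 6.33555, 5.586368, 6.453641, 7.0005]
variances = [0.428863, 0.190663, 0.851891, 0.421098, 0.001201]
ss_b, ss_w, F, p, eta2 = anova_from_stats(ns, means, variances)
print(round(ss_b, 2), round(ss_w, 2), round(F, 2), round(eta2, 3))
```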


### Using scipy

Testing for equal means across all groups:

In [22]:
f_oneway(YSampleAfrica, YSampleAmericas, YSampleAsia, YSampleEurope, YSampleOceania)

F_onewayResult(statistic=41.4877504949159, pvalue=2.241084854409624e-22)

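Both pingouin and scipy give $F \approx 41.49$ with $p \approx 2.2 \times 10^{-22}$, so we reject the hypothesis that all groups share the same mean.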