## Two Sample (Independent) t-test

The independent two-sample t-test compares the means of two independent samples drawn from two different populations. Two variants of the test apply to normally distributed data:
<br>
**Student’s t-test**, which assumes that the two populations have equal variances
<br>
**Welch's t-test**, which does not assume that the two populations have equal variances
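
The difference between the two variants is easiest to see by computing both statistics by hand. The following is a minimal sketch on synthetic data (the seed, sample sizes, and distribution parameters are arbitrary assumptions, not values from this notebook). It uses the standard pooled-variance formula for Student's test and the Welch-Satterthwaite approximation for the degrees of freedom of Welch's test, and compares the results with `ttest_ind`.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)             # arbitrary seed, for reproducibility only
a = rng.normal(loc=100, scale=5, size=20)  # synthetic sample 1 (assumed values)
b = rng.normal(loc=105, scale=10, size=30) # synthetic sample 2 (assumed values)

n1, n2 = a.size, b.size
v1, v2 = a.var(ddof=1), b.var(ddof=1)      # unbiased sample variances

# Student: one pooled variance estimate, degrees of freedom n1 + n2 - 2
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_student = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Welch: separate variance estimates, Welch-Satterthwaite degrees of freedom
se2 = v1 / n1 + v2 / n2
t_welch = (a.mean() - b.mean()) / np.sqrt(se2)
df_welch = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

print("Student:", t_student, ttest_ind(a, b, equal_var=True).statistic)
print("Welch:  ", t_welch, ttest_ind(a, b, equal_var=False).statistic)
print("Welch degrees of freedom:", df_welch)
```

The hand-computed statistics should agree with SciPy's. The Welch degrees of freedom are generally not an integer and never exceed n1 + n2 - 2.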


**Null hypothesis**: The two group means are equal
<br>
**Alternative hypothesis**: The two group means are different (two-tailed or two-sided)
<br>
**Alternative hypothesis**: The mean of one group is greater than (or less than) the mean of the other group (one-tailed or one-sided)
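
The choice between a two-sided and a one-sided alternative can be sketched with the `alternative` argument of `ttest_ind`. This assumes SciPy 1.6 or later, where that argument is available. The data below are synthetic and the seed is arbitrary.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)             # arbitrary seed
a = rng.normal(loc=100, scale=5, size=20)  # synthetic group A (assumed values)
b = rng.normal(loc=105, scale=5, size=20)  # synthetic group B (assumed values)

two_sided = ttest_ind(a, b, alternative="two-sided")  # H1: mean(a) != mean(b)
less = ttest_ind(a, b, alternative="less")            # H1: mean(a) <  mean(b)
greater = ttest_ind(a, b, alternative="greater")      # H1: mean(a) >  mean(b)

print("two-sided p-value:", two_sided.pvalue)
print("less p-value:     ", less.pvalue)
print("greater p-value:  ", greater.pvalue)
```

All three calls return the same t statistic; only the p-value changes. The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.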

In [None]:
# import libraries
import numpy as np
import pandas as pd

from scipy.stats import ttest_ind
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

### Two-Sample T-Test with SciPy

scipy.stats.ttest_ind(a, b, axis=0, equal_var=True)

a, b - sample data in an array-like format. The arrays must have the same shape except along the dimension given by axis, so the two samples may contain different numbers of observations.
<br>
axis - axis along which to compute the test. If None, the test is computed over the whole arrays a and b.
<br>
equal_var - if True (default), perform the standard independent two-sample test that assumes equal population variances (Student's t-test). If False, perform Welch’s t-test, which does not assume equal population variances.
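
The effect of the axis argument can be illustrated with two 2-D arrays that differ in length along the tested axis. This is a small sketch on synthetic data (shapes, seed, and parameters are assumptions chosen for illustration).

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)                   # arbitrary seed
a = rng.normal(loc=100, scale=5, size=(20, 3))   # 20 observations in each of 3 columns
b = rng.normal(loc=105, scale=5, size=(25, 3))   # 25 observations in each of 3 columns

# axis=0: one independent test per column, so three statistics and three p-values
per_column = ttest_ind(a, b, axis=0)
print(per_column.statistic.shape, per_column.pvalue.shape)

# axis=None: both arrays are flattened and a single test is computed
flattened = ttest_ind(a, b, axis=None)
print(flattened.statistic, flattened.pvalue)
```

With 1-D inputs the default axis=0 already yields a single statistic and p-value.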

In [None]:
# generate normally distributed random numbers
# different means same standard deviations
# loc = mean; scale = standard deviation; size = shape of the returned array
x = np.random.normal(loc=100, scale=5, size=(20, 1))
y = np.random.normal(loc=105, scale=5, size=(20, 1))

In [None]:
# equal variances - Student's t-test
results = ttest_ind(x, y, equal_var=True)
print(results)

In [None]:
# generate normally distributed random numbers
# different means different standard deviations
# loc = mean; scale = standard deviation; size = shape of the returned array
x = np.random.normal(loc=100, scale=5, size=(20, 1))
y = np.random.normal(loc=105, scale=10, size=(20, 1))

In [None]:
# Welch t-test - no assumption about variances
results = ttest_ind(x, y, equal_var=False)
print(results)

In [None]:
# read the data from file cdc.csv
data = pd.read_csv("cdc.csv")
data.head()

In [None]:
# drop the unnamed first column (typically a row index saved into the CSV)
data.drop(columns="Unnamed: 0", inplace=True)
data.head()

In [None]:
# separate the data for males and females
data_m = data.loc[data['gender'] == 'm']
data_f = data.loc[data['gender'] == 'f']

In [None]:
# separate the data for heights
heights_m = data_m['height']
heights_f = data_f['height']

In [None]:
# are the variances equal?
var_heights_m = heights_m.var(ddof=1)
var_heights_f = heights_f.var(ddof=1)
var_heights_m, var_heights_f

The sample variances of heights_m and heights_f differ, so the equal-variance assumption of Student's t-test is questionable and Welch's t-test is the appropriate choice.
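
Comparing two sample variances by eye is informal. One common way to make the check more formal is Levene's test, available as `scipy.stats.levene`. The sketch below uses synthetic stand-ins for the two height samples (the values, the seed, and the 0.05 threshold are assumptions, not taken from cdc.csv).

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(3)                      # arbitrary seed
group_1 = rng.normal(loc=170, scale=7, size=200)    # synthetic stand-in for one group
group_2 = rng.normal(loc=163, scale=12, size=200)   # synthetic stand-in for the other

# Null hypothesis of Levene's test: the groups have equal variances
stat, p_value = levene(group_1, group_2, center="median")
print("Levene statistic:", stat, "p-value:", p_value)

# A small p-value is evidence against equal variances, which favors Welch's t-test
assume_equal_var = p_value >= 0.05
print("assume equal variances:", assume_equal_var)
```

The resulting flag could be passed as the equal_var argument of ttest_ind, although using Welch's test by default is also a reasonable choice.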

In [None]:
# Welch t-test - no assumption about variances
results_mf = ttest_ind(heights_m, heights_f, equal_var=False)
print(results_mf)

## ANOVA

In [None]:
# read data from file anova.csv
data = pd.read_csv("anova.csv")
data.head()

In [None]:
data.describe()

In [None]:
# perform one-way ANOVA
f_oneway(data['No offer'], data['Offer 1'], data['Offer 2'])
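
To show what `f_oneway` computes, the F statistic can be reproduced from the between-group and within-group sums of squares. This is a sketch on three synthetic groups (the group means, spread, size, and seed are assumptions, not the anova.csv data).

```python
import numpy as np
from scipy.stats import f, f_oneway

rng = np.random.default_rng(4)   # arbitrary seed
groups = [rng.normal(loc=m, scale=3, size=50) for m in (20, 21, 23)]

k = len(groups)                          # number of groups
n_total = sum(g.size for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_value = f.sf(f_stat, k - 1, n_total - k)   # upper tail of the F distribution

result = f_oneway(*groups)
print(f_stat, result.statistic)
print(p_value, result.pvalue)
```

A small p-value indicates that at least one group mean differs from the others, but not which one. A post-hoc test such as Tukey's HSD addresses that question.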

In [None]:
# Tukey's Honest Significant Difference (HSD)
# check the number of rows, i.e. the number of observations per group
data.shape

In [None]:
# prepare the data
No_offer_list = data['No offer'].values.tolist()
Offer_1_list = data['Offer 1'].values.tolist()
Offer_2_list = data['Offer 2'].values.tolist()
D = No_offer_list + Offer_1_list + Offer_2_list

In [None]:
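# build a long-format table: one row per score, labeled with its group
# each label is repeated once per row of data, matching the order of the scores in D
df = pd.DataFrame({'score': D, 'group': np.repeat(['No offer', 'Offer 1', 'Offer 2'], repeats=len(data))})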

In [None]:
# perform Tukey's HSD test and display results
tukey = pairwise_tukeyhsd(endog=df['score'], groups=df['group'], alpha=0.05)
print(tukey)