# Two Sample T-Test

Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.

https://en.wikipedia.org/wiki/Test_statistic

### Dependent t-test for paired samples

This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired". This is an example of a paired difference test. The t statistic is calculated as

$$t = \frac{\bar{X}_D - \mu_0}{s_D / \sqrt{n}}$$

where $\bar{X}_D$ and $s_D$ are the average and standard deviation of the differences between all pairs. The pairs are, for example, one person's pre-test and post-test scores, or two persons matched into a meaningful group (for instance, drawn from the same family or age group). The constant μ0 is zero if we want to test whether the average difference is significantly different from zero. The number of degrees of freedom is n − 1, where n is the number of pairs.
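The formula can be checked by hand. The sketch below computes the statistic directly from the pairwise differences and compares it with `scipy.stats.ttest_rel`. The helper name `paired_t` and the pre/post scores are hypothetical and used only for illustration.

```python
import numpy as np
from scipy import stats


def paired_t(first, second, mu0=0.0):
    """Paired t-test from the formula: returns (t, degrees of freedom, two-sided p)."""
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    n = d.size                      # number of pairs
    mean_d = d.mean()               # average of the differences
    s_d = d.std(ddof=1)             # sample standard deviation of the differences
    t = (mean_d - mu0) / (s_d / np.sqrt(n))
    df = n - 1
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value from the t distribution
    return t, df, p


# Hypothetical pre-test and post-test scores for six people
pre_scores = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
post_scores = [14.0, 15.5, 13.0, 15.0, 15.5, 17.0]

t_manual, df_manual, p_manual = paired_t(pre_scores, post_scores)
t_scipy, p_scipy = stats.ttest_rel(pre_scores, post_scores)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

Both lines should print the same statistic and p-value, since `ttest_rel` applies the same formula with μ0 = 0.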


In [1]:
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind, mannwhitneyu, levene, shapiro
from statsmodels.stats.power import ttest_power
from statsmodels.stats import weightstats
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Example 1: 

Blood pressure (BP) was measured for the same five patients before and after taking a medicine. Is there a significant difference in mean BP at the 95% confidence level (α = 0.05)?


In [2]:
bp_before = [120, 122, 143, 100, 109]
bp_after = [122, 120, 141, 109, 109]

In [8]:
t_stat, p_value = stats.ttest_rel(a=bp_before, b=bp_after, alternative='two-sided')

In [9]:
# ttest_rel returns a t statistic (not a Z statistic)
print(f"""
The t-Statistic is: {t_stat}
The p-value is: {p_value} \n"""  )

if p_value<0.05:
    print(f"Ho is rejected in favour of Ha")
else:
    print("Failed to reject Ho")


The t-Statistic is: -0.6864064729836442
The p-value is: 0.5301776477578163 

Failed to reject Ho
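
A confidence interval for the mean difference gives the same conclusion in a different form. The following is a minimal sketch for the blood pressure data, assuming the differences are roughly normally distributed; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

bp_before = np.array([120, 122, 143, 100, 109])
bp_after = np.array([122, 120, 141, 109, 109])

diff = bp_before - bp_after                  # pairwise differences
n = diff.size
mean_diff = diff.mean()
se_diff = diff.std(ddof=1) / np.sqrt(n)      # standard error of the mean difference

# Two-sided 95% interval: critical value from the t distribution with n - 1 df
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low = mean_diff - t_crit * se_diff
ci_high = mean_diff + t_crit * se_diff

print(f"Mean difference: {mean_diff:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval contains zero, which agrees with the failure to reject Ho at α = 0.05.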


#### Example 2:

Pre- and post-surgery energy intake was recorded for 11 patients. Is there a difference at 95% confidence level?

In [10]:
# pre and post-surgery energy intake
intake = np.array([
[5260, 3910],
[5470, 4220],
[5640, 3885],
[6180, 5160],
[6390, 5645],
[6515, 4680],
[6805, 5265],
[7515, 5975],
[7515, 6790],
[8230, 6900],
[8770, 7335],
])

In [11]:
# Separating the data into 2 groups (column 0 = pre-surgery, column 1 = post-surgery)
pre = intake[:, 0]
post = intake[:, 1]
pre, post

(array([5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]),
 array([3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]))

In [12]:
# Paired t-test: two measurements on the same experimental unit,
# run here as a one-sample t-test on the differences (post - pre) against a mean of 0
t_stat_paired, p_value_paired = ttest_1samp(post - pre, popmean=0)
print(t_stat_paired, p_value_paired)

-11.941392877647603 3.059020942934875e-07


In [13]:
# p < 0.05 => reject Ho in favour of the alternative hypothesis:
# the mean difference is not equal to 0
print("paired t-test p-value=", p_value_paired)
print('This is much less than alpha, so Ho is rejected. There is a significant difference between patient \
intake before and after surgery')

paired t-test p-value= 3.059020942934875e-07
This is much less than alpha, so Ho is rejected. There is a significant difference between patient intake before and after surgery


In [None]:
#print(ttest_power(0.587, nobs=22, alpha=0.10, alternative='two-sided'))

In [14]:
t_stat, p_value = stats.ttest_rel(a=pre, b=post, alternative='two-sided')

In [15]:
print(f"""
The t-Statistic is: {t_stat}
The p-value is: {p_value} \n"""  )

if p_value<0.05:
    print(f"Ho is rejected in favour of Ha")
else:
    print("Failed to reject Ho")


The t-Statistic is: 11.941392877647603
The p-value is: 3.059020942934875e-07 

Ho is rejected in favour of Ha
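
The two approaches used in this example give the same p-value but t statistics of opposite sign. That is because `ttest_1samp` was applied to `post - pre`, while `ttest_rel(a=pre, b=post)` works with `pre - post`. A short check of this equivalence, as a sketch using the same data:

```python
import numpy as np
from scipy import stats

pre = np.array([5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770])
post = np.array([3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335])

# Paired test on (pre, post): based on the differences pre - post
t_rel, p_rel = stats.ttest_rel(pre, post)

# One-sample test on the differences post - pre against a mean of 0
t_one, p_one = stats.ttest_1samp(post - pre, popmean=0)

print("same magnitude, opposite sign:", np.isclose(t_rel, -t_one))
print("same two-sided p-value:", np.isclose(p_rel, p_one))
```

The sign of the statistic only reflects the order of subtraction; for a two-sided test the conclusion does not depend on it.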
