### T-testing Sample

We perform a two-sample, one-tailed t-test on two samples.
It differs from the standard two-sample t-test, which tests whether the two means are equal: here the difference between the two sample means is compared against a hypothesized, non-zero value (the population mean difference).
Normally, comparison against a fixed value is done in a one-sample t-test.

In [1]:
import pandas as pd
from scipy import stats
import math
# ttest_ind tests for a zero difference in means; since we test against a non-zero difference, the t-statistic is computed manually below
# from scipy.stats import ttest_ind

In [2]:
# load the 2004 and 2005 half-hourly S.I. tables
si_path = r"D:\Projects\ntu-vish\Seasonality Index Data\Half-Hourly S.I. per Month(Mon-Fri,Weekends).xlsx"
tempdf = pd.read_excel(si_path, sheet_name="2004", index_col=0, header=[0, 1])
tempdf2 = pd.read_excel(si_path, sheet_name="2005", index_col=0, header=[0, 1])

In [3]:
# select the Monday 00:30 values (one per month) for 2004 and 2005
sample1 = tempdf.loc['00:30'].xs('Mon',axis=0,level=1)
sample2 = tempdf2.loc['00:30'].xs('Mon',axis=0,level=1)
print(pd.DataFrame({'sample 1': sample1,'sample 2': sample2}))

     sample 1  sample 2
Jan  0.852868  0.839458
Feb  0.874962  0.871213
Mar  0.850380  0.856242
Apr  0.860552  0.873276
May  0.877160  0.895788
Jun  0.875012  0.877738
Jul  0.857285  0.865823
Aug  0.883730  0.863383
Sep  0.846878  0.850464
Oct  0.855944  0.856002
Nov  0.872227  0.862493
Dec  0.861700  0.888451


In [4]:
# mean of samples
x1 = sample1.mean()
x2 = sample2.mean()
print(f'Mean of sample 1- {x1}')
print(f'Mean of sample 2- {x2}')
print(f'Difference of means- {x1-x2}')

Mean of sample 1- 0.8640582224878502
Mean of sample 2- 0.8666943224023204
Difference of means- -0.0026360999144701136


In [5]:
# number of observations; each sample has n - 1 degrees of freedom
n1 = sample1.size
n2 = sample2.size
print(f"Degrees of freedom, n1 and n2- {n1-1} , {n2-1}")

Degrees of freedom, n1 and n2- 11 , 11


In [6]:
# sample standard deviation (pandas uses ddof=1 by default)
s1 = sample1.std()
s2 = sample2.std()
print(f'Standard deviation of sample 1- {s1}')
print(f'Standard deviation of sample 2- {s2}')

Standard deviation of sample 1- 0.012065381800377441
Standard deviation of sample 2- 0.01585079556492957


$S_p$ is the pooled standard deviation

## $S_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$

$s_1, s_2$ - standard deviations of sample 1 and 2 <br>
$n_1, n_2$ - number of observations in sample 1 and 2

In [7]:
sp = math.sqrt(((n1-1)*(s1**2) + (n2-1)*(s2**2))/(n1+n2-2))
print(f'Pooled standard deviation, Sp = {sp}')

Pooled standard deviation, Sp = 0.014085829014120364
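
As a quick check of the formula above, the pooled standard deviation can be recomputed from the summary statistics printed earlier. This is a minimal sketch; `pooled_std` is a helper name introduced here for illustration and is not part of the notebook.

```python
import math

def pooled_std(s1, s2, n1, n2):
    """Pooled standard deviation of two samples (assumes equal population variances)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

# standard deviations and sample sizes printed in the cells above
sp_check = pooled_std(0.012065381800377441, 0.01585079556492957, 12, 12)
print(f"Pooled standard deviation, Sp = {sp_check}")
```

Note the square root: without it the expression gives the pooled variance, not the pooled standard deviation.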


In [8]:
# one-tailed critical value for alpha = 0.05 and n1+n2-2 degrees of freedom (positive; negate it for a left-tailed test)
crit_value = stats.t.ppf(1-0.05,n1+n2-2)
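
The critical value depends only on alpha and the degrees of freedom. The sketch below assumes 12 observations per sample, as in the notebook, and shows that the left-tail critical value is the negative of the right-tail one because the t-distribution is symmetric. The names `right_crit` and `left_crit` are introduced here for illustration.

```python
from scipy import stats

alpha = 0.05
df = 12 + 12 - 2  # n1 + n2 - 2, assuming 12 observations per sample

right_crit = stats.t.ppf(1 - alpha, df)  # upper 5% point, used by a right-tailed test
left_crit = stats.t.ppf(alpha, df)       # lower 5% point, used by a left-tailed test

print(f"Right-tail critical value: {right_crit:.4f}")
print(f"Left-tail critical value:  {left_crit:.4f}")
```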

The t-statistic is calculated as:

## $t = \frac{(\overline{x}_1-\overline{x}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$
$\overline{x}_1, \overline{x}_2$ - means of sample 1 and 2 <br>
$\mu_1 - \mu_2$ - hypothesized difference in means (population mean difference) <br>
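
To make the formula concrete, the sketch below plugs in the summary values printed in the earlier cells. The helper name `t_statistic` and its `delta` parameter (the hypothesized difference $\mu_1 - \mu_2$) are introduced here for illustration only.

```python
import math

def t_statistic(x1, x2, sp, n1, n2, delta):
    """Pooled two-sample t-statistic for a hypothesized mean difference delta."""
    standard_error = sp * math.sqrt(1 / n1 + 1 / n2)
    return ((x1 - x2) - delta) / standard_error

# sample means, pooled standard deviation and sample sizes printed in the cells above
x1, x2 = 0.8640582224878502, 0.8666943224023204
sp, n1, n2 = 0.014085829014120364, 12, 12

t_left = t_statistic(x1, x2, sp, n1, n2, delta=0.015)    # statistic for the left-tailed test
t_right = t_statistic(x1, x2, sp, n1, n2, delta=-0.015)  # statistic for the right-tailed test
print(f"t (delta =  0.015): {t_left:.3f}")
print(f"t (delta = -0.015): {t_right:.3f}")
```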

#### Left tailed t-test <br>
Null Hypothesis: $\mu_1 - \mu_2 \geq$ population mean difference <br>
Alternate Hypothesis: $\mu_1 - \mu_2 <$ population mean difference

If the obtained t-statistic is less than the negative critical value, ***reject the null*** hypothesis.

(Note- `crit_value` as computed above is positive, so it is negated for the left-tailed test; a t-statistic in the rejection region is therefore negative)

In [9]:
# left tailed t-test
# popmean > 0 (0.015)
popmean = 0.015
t = ((x1-x2)-popmean)/(sp*math.sqrt((1/n1)+(1/n2)))
if t > -1*crit_value:
    print('Do not reject Null')
else:
    print('Reject Null')

Reject Null
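
As a cross-check, the same statistic can be obtained from `scipy.stats.ttest_ind` by first subtracting the hypothesized difference from sample 1, since shifting a sample changes its mean but not its variance. The sketch below re-enters the values from the table printed above (rounded to six decimals, so it agrees with the notebook only approximately); `shifted` and `t_scipy` are names introduced here for illustration.

```python
from scipy import stats

# Monday 00:30 values for 2004 and 2005, copied from the printed table
sample1 = [0.852868, 0.874962, 0.850380, 0.860552, 0.877160, 0.875012,
           0.857285, 0.883730, 0.846878, 0.855944, 0.872227, 0.861700]
sample2 = [0.839458, 0.871213, 0.856242, 0.873276, 0.895788, 0.877738,
           0.865823, 0.863383, 0.850464, 0.856002, 0.862493, 0.888451]
popmean = 0.015

# shift sample 1 so that the hypothesized difference becomes zero
shifted = [v - popmean for v in sample1]
t_scipy = stats.ttest_ind(shifted, sample2, equal_var=True).statistic

crit_value = stats.t.ppf(1 - 0.05, len(sample1) + len(sample2) - 2)
print(f"t from ttest_ind: {t_scipy:.3f}")
print("Reject Null" if t_scipy < -crit_value else "Do not reject Null")
```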


#### Right tailed t-test <br>
Null Hypothesis: $\mu_1 - \mu_2 \leq$ -(population mean difference) <br>
Alternate Hypothesis: $\mu_1 - \mu_2 >$ -(population mean difference)

If the obtained t-statistic is greater than the critical value, ***reject the null*** hypothesis.

(Note- the critical value is positive for the right-tailed test, so a t-statistic in the rejection region is also positive)

In [10]:
# right tailed t-test
# popmean < 0
popmean = -0.015
t = ((x1-x2)-popmean)/(sp*math.sqrt((1/n1)+(1/n2)))
if t < crit_value:
    print('Do not reject Null')
else:
    print('Reject Null')

###############################
# Result: both one-sided null hypotheses are rejected at alpha = 0.05, so
# the difference between the two means lies in the interval (-0.015, 0.015)
###############################

Reject Null


Our goal is to have both null hypotheses rejected for a given population mean difference, so that we can conclude the following at the 5% significance level (95% confidence):

-(population mean difference) < $\mu_1 - \mu_2$ < (population mean difference)
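
The two one-sided tests can be wrapped into a single helper that reports whether both nulls are rejected for a given margin. This is only a sketch: the function name `within_margin` and its parameters are hypothetical, it works from the summary statistics printed earlier, and it assumes equal population variances as in the pooled formula. This combination is commonly known as the two one-sided tests (TOST) procedure.

```python
import math
from scipy import stats

def within_margin(x1, x2, s1, s2, n1, n2, margin, alpha=0.05):
    """Return True if both one-sided t-tests reject, i.e. -margin < mu1 - mu2 < margin."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    crit = stats.t.ppf(1 - alpha, n1 + n2 - 2)
    t_upper = ((x1 - x2) - margin) / se  # left-tailed test against +margin
    t_lower = ((x1 - x2) + margin) / se  # right-tailed test against -margin
    return bool(t_upper < -crit and t_lower > crit)

# summary statistics printed in the cells above
summary = dict(x1=0.8640582224878502, x2=0.8666943224023204,
               s1=0.012065381800377441, s2=0.01585079556492957, n1=12, n2=12)

print(within_margin(**summary, margin=0.015))  # the margin used in the notebook
print(within_margin(**summary, margin=0.005))  # a tighter margin, for comparison
```

Trying a few margins this way shows how tight an interval the data can support at the chosen alpha.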