# t-test

https://towardsdatascience.com/inferential-statistics-series-t-test-using-numpy-2718f8f9bf2f

In [1]:
## Import the packages
import numpy as np
from scipy import stats


In [2]:
## Define 2 random distributions
#Sample Size
N = 10
#Gaussian distributed data with mean = 2 and var = 1
a = np.random.randn(N) + 2
#Gaussian distributed data with mean = 0 and var = 1
b = np.random.randn(N)

In [3]:
a

array([3.23270109, 1.58013374, 2.91318064, 2.87583854, 1.07274158,
       0.91264471, 1.86448672, 1.85551411, 2.22369585, 3.0201823 ])

In [4]:
b

array([ 1.89122641,  0.9979508 , -0.05035885,  0.28095164,  2.26805056,
        2.56671897, -0.24340951,  0.45284402, -0.10297679, -0.18943851])

In [5]:
## Calculate the Standard Deviation
#Compute the sample variances, then combine them to get the pooled standard deviation

#For the unbiased estimate of the variance we divide by N-1, hence the parameter ddof = 1
var_a = a.var(ddof=1)
var_b = b.var(ddof=1)

#pooled standard deviation (both samples have the same size N)
s = np.sqrt((var_a + var_b)/2)
s

0.9640571671363499
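
The expression for `s` above is the equal-sample-size special case of the pooled standard deviation. For reference, here is the general form, where the sample sizes $n_a$ and $n_b$ and the sample variances $s_a^2$ and $s_b^2$ are symbols introduced for this note rather than names from the notebook:

$$
s_p = \sqrt{\frac{(n_a - 1)\,s_a^2 + (n_b - 1)\,s_b^2}{n_a + n_b - 2}},
\qquad
t = \frac{\bar{a} - \bar{b}}{s_p \sqrt{\dfrac{1}{n_a} + \dfrac{1}{n_b}}}
$$

When $n_a = n_b = N$, these reduce to $s = \sqrt{(s_a^2 + s_b^2)/2}$ and $t = (\bar{a} - \bar{b}) / (s\sqrt{2/N})$, with $2N - 2$ degrees of freedom.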

In [7]:
## Calculate the t-statistics
t = (a.mean() - b.mean())/(s*np.sqrt(2/N))


In [8]:
## Compare with the critical t-value
#Degrees of freedom
df = 2*N - 2

#One-tailed p-value: upper-tail probability beyond |t| (doubled below for the two-tailed test)
p = 1 - stats.t.cdf(abs(t),df=df)


print("t = " + str(t))
print("p = " + str(2*p))
### Comparing the t statistic with the t distribution gives a two-tailed p-value of about 0.005. This is below the usual 0.05 threshold, so we reject the null hypothesis and conclude that the difference between the means of the two distributions is statistically significant.


t = 3.172885210013273
p = 0.005267082948737123


In [9]:
## Cross Checking with the internal scipy function
t2, p2 = stats.ttest_ind(a,b)
print("t = " + str(t2))
print("p = " + str(p2))

t = 3.172885210013273
p = 0.005267082948737136
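
The manual steps above can be collected into a single helper. The following is a minimal sketch, assuming equal population variances (the same assumption `stats.ttest_ind` makes by default). The function name `pooled_t_test` and the seeded sample data are illustrative and not part of the original notebook. It uses the general pooled-variance formula, so it also works when the two samples differ in size.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Two-sided independent-samples t test with a pooled variance (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.size, y.size
    df = n_x + n_y - 2
    # Pooled variance: weighted average of the two unbiased sample variances
    pooled_var = ((n_x - 1) * x.var(ddof=1) + (n_y - 1) * y.var(ddof=1)) / df
    t_stat = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    # Two-sided p-value; sf(|t|) is the upper-tail probability 1 - cdf(|t|)
    p_value = 2 * stats.t.sf(abs(t_stat), df=df)
    return t_stat, p_value, df


# Illustrative data: a fixed seed is assumed here so the run is repeatable
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10)
y = rng.normal(loc=0.0, scale=1.0, size=12)

t_manual, p_manual, df_manual = pooled_t_test(x, y)
t_scipy, p_scipy = stats.ttest_ind(x, y)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

Using `stats.t.sf` on the absolute value of t keeps the two-sided p-value correct when the first sample has the smaller mean and t is negative.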


# Calculating an Independent Samples T Test By hand   
https://www.statisticshowto.com/independent-samples-t-test/

Sample question: Calculate an independent samples t test for the following data sets:  
Data set A: 1,2,2,3,3,4,4,5,5,6   
Data set B: 1,2,4,5,5,5,6,6,7,9   

In [10]:
a = [1,2,2,3,3,4,4,5,5,6]
b = [1,2,4,5,5,5,6,6,7,9]

In [11]:
#Step 1: Sum the two groups:
sum_a =  1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 6 
sum_b =  1 + 2 + 4 + 5 + 5 + 5 + 6 + 6 + 7 + 9

In [12]:
print(sum_a)
print(sum_b)

35
50


In [13]:
#Step 2: Square the sums from Step 1:
sq_a = 35**2
sq_b = 50**2
#Set these numbers aside for a moment.

In [14]:
print(sq_a)
print(sq_b)

1225
2500


In [15]:
#Step 3: Calculate the means for the two groups:
mean_a = (1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 6)/10
mean_b = (1 + 2 + 4 + 5 + 5 + 5 + 6 + 6 + 7 + 9)/10

In [16]:
print(mean_a)
print(mean_b)

3.5
5.0


In [20]:
#Step 4: Square the individual scores and then add them up:
sq_a_ind = 1**2 + 2**2 + 2**2 + 3**2 + 3**2 + 4**2 + 4**2 + 5**2 + 5**2 + 6**2
sq_b_ind = 1**2 + 2**2 + 4**2 + 5**2 + 5**2 + 5**2 + 6**2 + 6**2 + 7**2 + 9**2
#Set these numbers aside for a moment.

In [21]:
print(sq_a_ind)
print(sq_b_ind)

145
298


In [23]:
#Step 5: Insert the numbers from Steps 2-4 into the pooled-variance t formula
#(10 scores per group, 10 + 10 - 2 = 18 in the denominator of the pooled variance)
t = (mean_a - mean_b)/np.sqrt((((sq_a_ind - sq_a/10)+(sq_b_ind - sq_b/10))/18)*(1/10+1/10))

In [24]:
print(round(t, 4))

-1.6948

In [25]:
#Step 6 Find the Degrees of freedom 
df = (10-1 + 10-1)

In [26]:
print(df)

18


Look up your degrees of freedom (Step 6) in the t-table. If you don’t know what your alpha level is, use 5% (0.05).
18 degrees of freedom at an alpha level of 0.05 = 2.10.
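
Instead of a printed t-table, the cutoff can be computed from the t distribution itself. The following is a minimal sketch, assuming a two-tailed test at alpha = 0.05. The variable names are illustrative, and the data are the two sets from the sample question.

```python
from scipy import stats

alpha = 0.05   # significance level (two-tailed)
df = 18        # degrees of freedom from Step 6

# Two-tailed cutoff: put alpha/2 in each tail and invert the CDF
t_crit = stats.t.ppf(1 - alpha / 2, df=df)
print(f"{t_crit:.2f}")  # 2.10

# Cross-check the decision with scipy on data sets A and B
a = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
b = [1, 2, 4, 5, 5, 5, 6, 6, 7, 9]
t_stat, p_value = stats.ttest_ind(a, b)

# |t| below the cutoff is the same decision as p > alpha
print(abs(t_stat) < t_crit, p_value > alpha)  # True True
```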

Compare the absolute value of your calculated t (Step 5) with the table value above. It is smaller than the cutoff of 2.10 from the table, therefore p > .05. As the p-value is greater than the alpha level, we cannot conclude that there is a difference between the means.

# What is a Paired T Test (Paired Samples T Test / Dependent Samples T Test)?  
https://www.statisticshowto.com/probability-and-statistics/t-test/#PairedTTest

A paired t test (also called a correlated pairs t-test, a paired samples t test or dependent samples t test) is where you run a t test on dependent samples. Dependent samples are essentially connected — they are tests on the same person or thing. For example:   

- Knee MRI costs at two different hospitals,
- Two tests on the same person before and after training,
- Two blood pressure measurements on the same person using different equipment.
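
A paired t test is equivalent to a one-sample t test on the per-pair differences against a mean of zero. The following is a minimal sketch with made-up before/after scores. The data and variable names are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same five people before and after training
before = np.array([72.0, 65.0, 80.0, 58.0, 90.0])
after = np.array([75.0, 70.0, 79.0, 66.0, 94.0])

# Paired test on the two dependent samples
t_paired, p_paired = stats.ttest_rel(before, after)

# Same test expressed as a one-sample test on the differences
t_diff, p_diff = stats.ttest_1samp(before - after, popmean=0.0)

print(np.isclose(t_paired, t_diff), np.isclose(p_paired, p_diff))
```

This is why the degrees of freedom for a paired test are the number of pairs minus one, not the total number of observations minus two.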

In [39]:
#Sample question: Calculate a paired t test by hand for the following data:
a = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]
b = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]

#Step 1: Subtract each Y score from each X score.
diff = np.array(a) - np.array(b)
diff = list(diff)
diff

[-17, -10, -10, -8, -14, -16, -6, -1, -2, 9, 2]

In [40]:
#Step 2: Add up all of the values from Step 1. Set this number aside for a moment.
sum_ab = sum(diff)
sum_ab

-73

In [41]:
#Step 3: Square the differences from Step 1.
sq_diff = [i**2 for i in diff]  
sq_diff

[289, 100, 100, 64, 196, 256, 36, 1, 4, 81, 4]

In [42]:
#Step 4: Add up all of the squared differences from Step 3
sum_sq = sum(sq_diff)
sum_sq

1131

In [43]:
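#Step 5: Insert the sums into the paired t formula, with n = 11 pairs:
#t = (sum_ab/n) / sqrt((sum_sq - sum_ab**2/n) / ((n-1)*n))
t = (sum_ab/11)/(np.sqrt((sum_sq - (sum_ab**2/11))/110))
print(round(t, 4))

-2.7373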

In [44]:
#Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items, so 11-1 = 10.
#Step 7: Find the critical t-value in the t-table, using the degrees of freedom from Step 6.
#If you don't have a specified alpha level, use 0.05 (5%). For this sample problem, with df=10,
#the critical t-value is 2.228.
#Step 8: Compare your t-table value from Step 7 (2.228) to the absolute value of your calculated t-value.
#The absolute value of the calculated t-value is greater than the table value at an alpha level of .05,
#so the p-value is less than the alpha level: p < .05.
#We can reject the null hypothesis that there is no difference between means.

#Note: You can ignore the minus sign when comparing the two t-values, as the sign only indicates the direction of the difference;
#the two-tailed p-value is the same for both directions.

In [45]:
## Cross Checking with the internal scipy function
t2, p2 = stats.ttest_rel(a,b)
print("t = " + str(t2))
print("p = " + str(p2))

t = -2.737328922288368
p = 0.02092847795148222
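
The by-hand paired calculation generalizes to any number of pairs. The following is a minimal sketch, assuming a two-tailed test. The helper name `paired_t_by_hand` is hypothetical, and the data are the same `a` and `b` used in the sample question.

```python
import numpy as np
from scipy import stats


def paired_t_by_hand(x, y):
    """Paired t statistic from the sum and sum of squares of the differences (hypothetical helper)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    sum_d = d.sum()
    sum_d_sq = (d ** 2).sum()
    # (sum_d_sq - sum_d**2 / n) / (n - 1) is the sample variance of the differences;
    # dividing by n as well gives the squared standard error of the mean difference
    std_err = np.sqrt((sum_d_sq - sum_d ** 2 / n) / ((n - 1) * n))
    t_stat = (sum_d / n) / std_err
    df = n - 1
    p_value = 2 * stats.t.sf(abs(t_stat), df=df)
    return t_stat, p_value, df


a = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]
b = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]

t_hand, p_hand, df_hand = paired_t_by_hand(a, b)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=df_hand)  # two-tailed cutoff at alpha = 0.05
print(f"t = {t_hand:.4f}, p = {p_hand:.4f}, critical t = {t_crit:.3f}")
# t = -2.7373, p = 0.0209, critical t = 2.228
```

Note that the sum of the differences is squared and divided by the number of pairs (11 here), not by the degrees of freedom.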
