<h1>Comparison of averages</h1>
<h2>Student's t-distribution</h2>
<p>When the sample is small (fewer than 30 observations) and the population standard deviation is unknown, <b>Student's t-distribution</b> is used to describe how sample means behave.</p>
<p>The <b>t-distribution</b> describes the standardized distances of sample means to the population mean when the population standard deviation is not known, and the observations come from a normally distributed population.</p>
<p>A t-distribution is defined by a single parameter, the <b>degrees of freedom</b> (df = n - 1). As df increases (that is, as the sample size grows), the distribution approaches the normal distribution.</p>
<img src='img/t-distr.webp' style='width: 400px'>
<img src='https://www.gstatic.com/education/formulas2/443397389/en/student_s_t_distribution.svg'>
<p style='text-align: center'>
t	=	Student's t-distribution<br>
x	=	sample mean<br>
mu	=	population mean<br>
s	=	sample standard deviation<br>
n	=	sample size</p>
<img src='img/t-table.png' style='width: 600px'>
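<p>The values in a t-table can also be reproduced numerically. The sketch below prints two-sided 5% critical values for a few degrees of freedom and compares them with the normal critical value. The chosen df values and the names <code>alpha</code>, <code>t_crit</code> and <code>z_crit</code> are illustrative assumptions, not part of the original notebook.</p>

```python
import scipy.stats as st

alpha = 0.05  # two-sided significance level (assumed for illustration)

# Critical value of the standard normal distribution
z_crit = st.norm.ppf(1 - alpha / 2)

# Critical values of the t-distribution for increasing degrees of freedom
t_crit = {df: st.t.ppf(1 - alpha / 2, df) for df in (2, 5, 10, 30, 100, 1000)}

for df, value in t_crit.items():
    print(f"df = {df:>4}: t critical = {value:.3f}")
print(f"normal   : z critical = {z_crit:.3f}")
```

<p>The t critical values shrink toward the normal one as df grows, which is the convergence described above.</p>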


In [1]:
# Let's imagine we have a sample with:
import scipy.stats as st
import numpy as np
length = 25   # sample size
std = 2       # sample standard deviation
mn = 10.8     # sample mean

# and we assume that the population mean is:
g_mean = 10

# First compute the standard error of the mean, then measure
# how far the sample mean is from the population mean
# in units of standard error

se = std/(length**0.5)     # 0.4

z = (mn-g_mean)/se           # 2 standard errors
# The sample mean lies 2 standard errors above
# the assumed population mean

# Now compute the probability of a deviation this large
# or larger (in either direction)

df = length - 1            # degrees of freedom

print(2*(1-st.t.cdf(z, df))) # using the t-distribution
print(2*(1-st.norm.cdf(z))) # using the normal distribution


# If a one-sample t-test on a sample of 15 observations
# tests the null hypothesis mu = 10
# and the computed t-value equals -2 (t = -2),
# then the two-sided p-value is:

n = 15 # observations in the sample
tt = -2 # t-value

pval = st.t.sf(np.abs(tt), n-1)*2
print(pval)

0.0569398499365914
0.045500263896358195
0.06528795288911197
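<p>When the raw observations are available, the same one-sample test can be run with <code>scipy.stats.ttest_1samp</code>. The sketch below compares it with the manual steps used in the cell above. The array <code>data</code> is made up purely for illustration and <code>mu0</code> is an assumed null-hypothesis mean.</p>

```python
import numpy as np
import scipy.stats as st

# Hypothetical measurements, invented for this illustration only
data = np.array([10.9, 12.1, 9.8, 11.4, 13.0, 10.2, 11.7, 9.5, 12.6, 10.8])
mu0 = 10  # population mean under H0 (assumed)

# Manual computation: standard error, t-value, two-sided p-value
n = data.size
se = data.std(ddof=1) / np.sqrt(n)   # ddof=1 gives the sample standard deviation
t_manual = (data.mean() - mu0) / se
p_manual = 2 * st.t.sf(abs(t_manual), n - 1)

# Library computation
res = st.ttest_1samp(data, popmean=mu0)

print("manual :", t_manual, p_manual)
print("scipy  :", res.statistic, res.pvalue)
```

<p>Both lines should print the same t-value and p-value up to floating-point rounding.</p>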


<h2>Student's t-test</h2>
<p><b>Student's t-test</b>, in statistics, a method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown.</p>
<p>When testing two samples, the t-value is computed with the formula below. Under H0 the difference between the two population means is 0, so that term drops out of the numerator.</p>
<img src='img/t-test.jpg' style='width: 400px'>

In [36]:
# Two-sample t-test from summary statistics.
# Question: does the mean temperature differ between the molecules?
from scipy import stats
# Raw measurements (not used below; the test uses the summary statistics)
sample1 = [84.7, 105, 98.9, 97.9, 108.7, 81.3, 99.4, 89.4 , 93, 119.3, 99.2, 99.4, 97.1, 112.4, 99.8, 94.7, 114, 95.1, 115.5, 111.5, 75.1]
sample2 = [57.2, 68.6, 104.4, 95.1, 89.9, 70.8, 83.5, 60.1, 75.7, 102, 69, 79.6, 68.9, 98.6, 76, 74.8, 56, 55.6, 55.6, 69.4, 59.5]

mean1 = 89.9
std1 = 11.3
n1 = 20

mean2 = 80.7
std2 = 11.7
n2 = 20

# The standard error of the difference uses each group's own variance and size
t = (mean1-mean2)/((std1**2/n1)+(std2**2/n2))**0.5
print('t-test value:', round(t, 2))
print(f'The difference between the sample means lies {round(t, 2)} standard errors '
      'above the value expected under H0 (zero)')

# Now compute the probability of a deviation this large or larger
# (in either direction) if H0 is true

df = n1+n2-2    # 38
p_value = 2*(1-stats.t.cdf(t, df))
print('P-value =', round(p_value, 3))
print('This result is statistically significant at the 0.05 level')

t-test value: 2.53
The difference between the sample means lies 2.53 standard errors above the value expected under H0 (zero)
P-value = 0.016
This result is statistically significant at the 0.05 level
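<p>SciPy can run the two-sample test directly from summary statistics through <code>scipy.stats.ttest_ind_from_stats</code>. The sketch below reuses the means, standard deviations and sample sizes from the cell above. The names <code>pooled</code> and <code>welch</code> are illustrative, and the Welch variant is shown only as an alternative for when equal variances cannot be assumed.</p>

```python
from scipy import stats

# Summary statistics taken from the cell above
mean1, std1, n1 = 89.9, 11.3, 20
mean2, std2, n2 = 80.7, 11.7, 20

# Student's test with pooled variance: df = n1 + n2 - 2
pooled = stats.ttest_ind_from_stats(mean1, std1, n1,
                                    mean2, std2, n2,
                                    equal_var=True)

# Welch's test: no equal-variance assumption, df is estimated from the data
welch = stats.ttest_ind_from_stats(mean1, std1, n1,
                                   mean2, std2, n2,
                                   equal_var=False)

print("pooled:", pooled.statistic, pooled.pvalue)
print("welch :", welch.statistic, welch.pvalue)
```

<p>With equal group sizes the two variants give the same t-value and differ only in the degrees of freedom used for the p-value.</p>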

