# Parametric Statistical Significance Tests

Parametric statistical tests assume that a data sample was drawn from a specific population
distribution. In practice, the term usually refers to tests that assume a Gaussian distribution,
because it is so common for data to fit this distribution.
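Because these tests rely on the Gaussian assumption, it can be worth checking that assumption first. The sketch below assumes SciPy's Shapiro-Wilk normality test (`scipy.stats.shapiro`) as the check. The sample names and the exponential comparison sample are illustrative choices and do not come from the source.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
samples = {
    'gaussian': rng.normal(loc=50, scale=5, size=100),
    # clearly non-Gaussian (right-skewed) comparison sample, for illustration only
    'exponential': rng.exponential(scale=5, size=100),
}

alpha = 0.05
p_values = {}
for name, sample in samples.items():
    # H0 of Shapiro-Wilk: the sample was drawn from a Gaussian distribution
    stat, p = shapiro(sample)
    p_values[name] = p
    if p > alpha:
        print('%s: W=%.3f, p=%.3f -> looks Gaussian (fail to reject H0)' % (name, stat, p))
    else:
        print('%s: W=%.3f, p=%.3f -> does not look Gaussian (reject H0)' % (name, stat, p))
```

Failing to reject H0 here does not prove the data are Gaussian. It only means the test found no evidence against normality.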

### Do two or more data samples have the same distribution, that is, the same mean and standard deviation?

```python
import numpy as np
np.random.seed(1)

# generate two sets of univariate observations
MEAN_1 = 50
MEAN_2 = 51
STD = 5
SAMPLE_SIZE = 100

data1 = np.random.normal(MEAN_1, STD, size=SAMPLE_SIZE)
data2 = np.random.normal(MEAN_2, STD, size=SAMPLE_SIZE)
# summarize
print('data1: mean=%.3f stdv=%.3f' % (np.mean(data1), np.std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (np.mean(data2), np.std(data2)))
```


```
data1: mean=50.303 stdv=4.426
data2: mean=51.764 stdv=4.660
```


# Student’s t-Test
The Student’s t-test is a statistical hypothesis test that checks whether two independent data samples, known to have a Gaussian distribution, have the same Gaussian distribution (specifically, the same mean).


* Fail to reject H0: no significant difference between the sample means.

* Reject H0: a significant difference between the sample means.

The null hypothesis of the test is that the means of the two populations are equal.
Rejecting it indicates that there is sufficient evidence that the population means
differ and, in turn, that the distributions are not equal.
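The statistic that `ttest_ind` computes with its default `equal_var=True` can also be worked out by hand. The sketch below assumes the standard pooled-variance form: t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2)) with n1 + n2 - 2 degrees of freedom. `pooled_t_test` is a hypothetical helper written for illustration. The comparison against `ttest_ind` on synthetic data shows that the two agree.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

def pooled_t_test(a, b):
    """Hypothetical helper: Student's t-test for two independent samples, equal variances assumed."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # sample variances with Bessel's correction (ddof=1)
    var1, var2 = np.var(a, ddof=1), np.var(b, ddof=1)
    # pooled variance, weighting each sample by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_err = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    stat = (np.mean(a) - np.mean(b)) / std_err
    df = n1 + n2 - 2
    # two-sided p-value: probability of |t| at least this large under H0
    p = 2 * t_dist.sf(abs(stat), df)
    return stat, p

rng = np.random.default_rng(42)
sample_a = rng.normal(50, 5, size=100)
sample_b = rng.normal(51, 5, size=100)

manual_stat, manual_p = pooled_t_test(sample_a, sample_b)
scipy_stat, scipy_p = ttest_ind(sample_a, sample_b)
print('manual: t=%.3f, p=%.3f' % (manual_stat, manual_p))
print('scipy:  t=%.3f, p=%.3f' % (scipy_stat, scipy_p))
```

The pooled variance is what makes this version assume that both populations share the same variance.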


```python
from scipy.stats import ttest_ind

# two-sample t-test; equal_var=True by default, i.e. a pooled-variance Student's t-test
stat, p = ttest_ind(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))

# significance level
alpha = 0.05

if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')
```

```
Statistics=-2.262, p=0.025
Different distributions (reject H0)
```


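## Samples with unequal variances

When the two samples have different variances, pass `equal_var=False` to `ttest_ind`. SciPy then performs Welch's t-test, which does not assume equal population variances.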

```python
# generate two sets of univariate observations
MEAN_1 = 50
MEAN_2 = 51
# different standard deviations, so the two variances differ
STD1 = 5
STD2 = 7

SAMPLE_SIZE = 100

data1 = np.random.normal(MEAN_1, STD1, size=SAMPLE_SIZE)
data2 = np.random.normal(MEAN_2, STD2, size=SAMPLE_SIZE)
# summarize
print('data1: mean=%.3f stdv=%.3f' % (np.mean(data1), np.std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (np.mean(data2), np.std(data2)))
```

```
data1: mean=50.938 stdv=5.670
data2: mean=50.132 stdv=7.261
```


```python
# equal_var=False selects Welch's t-test, which does not assume equal variances
stat, p = ttest_ind(data1, data2, equal_var=False)
print('Statistics=%.3f, p=%.3f' % (stat, p))

# significance level
alpha = 0.05

if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')
```

```
Statistics=0.871, p=0.385
Same distributions (fail to reject H0)
```
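With `equal_var=False`, `ttest_ind` performs Welch's t-test. Its statistic divides the mean difference by sqrt(s1²/n1 + s2²/n2) rather than by a pooled variance, and its degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below assumes these standard formulas. `welch_t_test` is a hypothetical helper written for illustration, and the data are synthetic.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

def welch_t_test(a, b):
    """Hypothetical helper: Welch's t-test, variances not assumed equal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # squared standard error of each sample mean
    se1 = np.var(a, ddof=1) / n1
    se2 = np.var(b, ddof=1) / n2
    stat = (np.mean(a) - np.mean(b)) / np.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    # two-sided p-value from the t distribution with (non-integer) df
    p = 2 * t_dist.sf(abs(stat), df)
    return stat, p, df

rng = np.random.default_rng(7)
sample_a = rng.normal(50, 5, size=100)
sample_b = rng.normal(51, 7, size=100)

stat, p, df = welch_t_test(sample_a, sample_b)
ref_stat, ref_p = ttest_ind(sample_a, sample_b, equal_var=False)
print('manual: t=%.3f, p=%.3f, df=%.1f' % (stat, p, df))
print('scipy:  t=%.3f, p=%.3f' % (ref_stat, ref_p))
```

The approximate degrees of freedom always lie between min(n1, n2) - 1 and n1 + n2 - 2. That range is why Welch's test is slightly more conservative than the pooled test when the variances differ.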


# Paired Student’s t-Test
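The paired variant of the t-test is used when the two samples are dependent, for example measurements taken on the same subjects before and after a treatment. The sketch below assumes SciPy's `scipy.stats.ttest_rel`. The before/after data, and the average shift of 1.5 built into them, are synthetic and hypothetical. The sketch also checks that the paired statistic equals a one-sample t statistic computed on the per-subject differences.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(3)
# hypothetical before/after measurements on the same 30 subjects
before = rng.normal(50, 5, size=30)
after = before + rng.normal(1.5, 2, size=30)  # assumed average shift of 1.5

stat, p = ttest_rel(before, after)
print('Statistics=%.3f, p=%.3f' % (stat, p))

alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')

# the paired test is a one-sample t-test on the per-subject differences
diffs = before - after
manual_stat = np.mean(diffs) / (np.std(diffs, ddof=1) / np.sqrt(len(diffs)))
print('manual statistic=%.3f' % manual_stat)
```

Pairing removes the between-subject variation, which is why it can detect a small shift that an independent-samples test on the same numbers might miss.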

## Resources used

Brownlee, J. (2019). *Statistical Methods for Machine Learning: Discover How to Transform Data into Knowledge with Python*, edition 1.4.