**1. Normality Tests**

- Shapiro-Wilk Test

**2. Correlation Tests**

- Pearson’s Correlation Coefficient

- Spearman’s Rank Correlation

- Kendall’s Rank Correlation

- Chi-Squared Test

**3. Parametric Statistical Hypothesis Tests**

- Student’s t-test

- Paired Student’s t-test

**1. Normality Tests**

This section lists statistical tests that you can use to check whether your data has a **Gaussian distribution**.

**Shapiro-Wilk Test** 

Tests whether a data sample has a Gaussian distribution.

**Assumptions**

Observations in each sample are independent and identically distributed (iid).

**Interpretation**

H0: the sample has a Gaussian distribution.

H1: the sample does not have a Gaussian distribution.


In [1]:
# Example of the Shapiro-Wilk Normality Test
from scipy.stats import shapiro
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = shapiro(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably Gaussian')
else:
	print('Probably not Gaussian')

stat=0.895, p=0.193
Probably Gaussian
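
To see the other branch of the decision rule, here is a minimal sketch that runs the same test on a deliberately skewed sample. The helper name `looks_gaussian`, the seed, the sample size, and the choice of an exponential distribution are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy.stats import shapiro

def looks_gaussian(sample, alpha=0.05):
    """Return True when Shapiro-Wilk fails to reject H0 at level alpha."""
    stat, p = shapiro(sample)
    return bool(p > alpha)

# Hypothetical data: 200 draws from an exponential distribution,
# which is strongly right-skewed and therefore far from Gaussian.
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=200)

if looks_gaussian(skewed):
    print('Probably Gaussian')
else:
    print('Probably not Gaussian')
```

A large p-value only means the test found no evidence against normality; with small samples such as the ten-value example above, the test has limited power to detect departures.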


**2. Correlation Tests**

This section lists statistical tests that you can use to check if two samples are related.

**Pearson’s Correlation Coefficient**

Tests whether two samples have a linear relationship.

**Assumptions**

Observations in each sample are independent and identically distributed (iid).

Observations in each sample are normally distributed.

Observations in each sample have the same variance.

**Interpretation**

H0: the two samples are independent.

H1: there is a dependency between the samples.

In [4]:
# Example of the Pearson's Correlation test

from scipy.stats import pearsonr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = pearsonr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

stat=0.688, p=0.028
Probably dependent


**Spearman’s Rank Correlation**

Tests whether two samples have a monotonic relationship.

**Assumptions**

Observations in each sample are independent and identically distributed (iid).

Observations in each sample can be ranked.

**Interpretation**

H0: the two samples are independent.

H1: there is a dependency between the samples.

In [5]:
# Example of the Spearman's Rank Correlation Test

from scipy.stats import spearmanr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = spearmanr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

stat=0.855, p=0.002
Probably dependent
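
The difference between a linear relationship (Pearson) and a monotonic one (Spearman) can be sketched with synthetic data. The arrays `x` and `y` below are made up for illustration: `y` always increases with `x`, but along a curve rather than a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y is a strictly increasing, nonlinear function of x.
x = np.linspace(0.0, 5.0, 20)
y = np.exp(x)

r, p_r = pearsonr(x, y)        # measures linear association
rho, p_rho = spearmanr(x, y)   # measures monotonic association via ranks

print('pearson r=%.3f, spearman rho=%.3f' % (r, rho))
```

Because the ranks of `x` and `y` agree exactly, Spearman's coefficient reaches 1, while Pearson's coefficient stays below 1 because the points do not lie on a line.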


**Kendall’s Rank Correlation**

Tests whether two samples have a monotonic relationship.

**Assumptions**

Observations in each sample are independent and identically distributed (iid).

Observations in each sample can be ranked.

**Interpretation**

H0: the two samples are independent.

H1: there is a dependency between the samples.

In [6]:
# Example of the Kendall's Rank Correlation Test

from scipy.stats import kendalltau
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = kendalltau(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

stat=0.733, p=0.002
Probably dependent
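
As a rough illustration of what the statistic measures, the sketch below recomputes Kendall's tau by counting concordant and discordant pairs. The helper `tau_from_pairs` is a hypothetical name, and the formula used assumes there are no tied values, which holds for the data above.

```python
from itertools import combinations
from scipy.stats import kendalltau

def tau_from_pairs(x, y):
    """Kendall's tau as (concordant - discordant) / number of pairs.

    Assumes no ties in x or y.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]

manual = tau_from_pairs(data1, data2)
library, p = kendalltau(data1, data2)
print('manual=%.3f, scipy=%.3f' % (manual, library))
```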


**Chi-Squared Test**

Tests whether two categorical variables are related or independent.

**Assumptions**

Observations used in the calculation of the contingency table are independent.

An expected frequency of 5 or more in each cell of the contingency table.

**Interpretation**

H0: the two variables are independent.

H1: there is a dependency between the variables.

In [7]:
# Example of the Chi-Squared Test

from scipy.stats import chi2_contingency
table = [[10, 20, 30],[6,  9,  17]]
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

stat=0.272, p=0.873
Probably independent
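
The cell-count assumption can be checked from the `expected` array that `chi2_contingency` already returns. The sketch below reuses the table from the example; the threshold of 5 is a widely quoted rule of thumb and is an assumption here, not a hard requirement of the test.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10, 20, 30], [6, 9, 17]])
stat, p, dof, expected = chi2_contingency(table)

# Expected counts under independence: row_total * column_total / grand_total
print('dof=%d' % dof)
print(np.round(expected, 2))

# Assumed rule of thumb: every expected count should be at least 5.
min_expected = expected.min()
if min_expected >= 5:
    print('Expected counts look large enough')
else:
    print('Some expected counts are small; interpret the p-value with care')
```

Note that the check applies to the expected counts, not to the observed counts in `table`.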


**3. Parametric Statistical Hypothesis Tests**



This section lists statistical tests that you can use to compare data samples.

**Student’s t-test**

Tests whether the means of two independent samples are significantly different.

**Assumptions**

Observations in each sample are independent and identically distributed (iid).

Observations in each sample are normally distributed.

Observations in each sample have the same variance.

**Interpretation**

H0: the means of the samples are equal.

H1: the means of the samples are unequal.

In [8]:
# Example of the Student's t-test

from scipy.stats import ttest_ind
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_ind(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same mean')
else:
	print('Probably different means')

stat=-0.326, p=0.748
Probably the same mean
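
When the equal-variance assumption is in doubt, one option is Welch's variant of the test, which `ttest_ind` provides through its `equal_var` parameter. This is a sketch of an alternative, not the test described above; it reuses the same data.

```python
from scipy.stats import ttest_ind

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

# Student's t-test: pools the two sample variances (assumes they are equal)
student = ttest_ind(data1, data2)

# Welch's t-test: does not assume equal variances
welch = ttest_ind(data1, data2, equal_var=False)

print('student: stat=%.3f, p=%.3f' % (student.statistic, student.pvalue))
print('welch:   stat=%.3f, p=%.3f' % (welch.statistic, welch.pvalue))
```

With equal sample sizes, as here, the two statistics coincide and only the degrees of freedom (and hence the p-value) differ.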


**Paired Student’s t-test**

Tests whether the means of two paired samples are significantly different.

**Assumptions**

Observations in each sample are independent and identically distributed (iid).

The differences between paired observations are normally distributed.

Observations across each sample are paired.

**Interpretation**

H0: the means of the samples are equal.

H1: the means of the samples are unequal.

In [9]:
# Example of the Paired Student's t-test

from scipy.stats import ttest_rel
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_rel(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same mean')
else:
	print('Probably different means')

stat=-0.334, p=0.746
Probably the same mean
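
One way to understand the paired test is as a one-sample t-test on the per-pair differences, checking whether their mean is zero. The sketch below illustrates that equivalence with the same data; the variable names `diffs`, `paired`, and `one_sample` are illustrative.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

# Difference within each pair of observations
diffs = np.array(data1) - np.array(data2)

paired = ttest_rel(data1, data2)
one_sample = ttest_1samp(diffs, 0.0)  # H0: mean difference is zero

print('paired:     stat=%.3f, p=%.3f' % (paired.statistic, paired.pvalue))
print('one-sample: stat=%.3f, p=%.3f' % (one_sample.statistic, one_sample.pvalue))
```

This is why the pairing matters: shuffling the order of one sample changes the differences and therefore the result, which is not the case for the independent-samples test.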
