# Non-Parametric Tests

#### In the previous chapter, we discussed parametric tests. Parametric tests have strong statistical power but require adherence to strong assumptions, and when those assumptions are not satisfied, the test results are not valid. Fortunately, alternative tests are available for these cases. They are called non-parametric tests, meaning that they do not assume a particular form for the underlying distribution of the data. Although non-parametric tests do not require distributional assumptions, they still require the samples to be independent.

In [1]:
import numpy as np

low_temp = np.array([0, 0, 0, 0, 0, 1, 1])
high_temp = np.array([1, 2, 3, 1])

## The Rank-Sum test

#### When the assumptions of the t-test are not met, the Rank-Sum test (also known as the Wilcoxon Rank-Sum or Mann-Whitney U test) is often a good non-parametric alternative. While the t-test tests for a difference between the means of two distributions, the Rank-Sum test tests for a difference between the locations of two distributions. This difference in utility follows from the lack of parametric assumptions in the Rank-Sum test. Its null hypothesis is that the distribution underlying the first sample is the same as the distribution underlying the second sample. If the two sample distributions appear similar in shape, a rejection can be read as a difference in location between them. Because the test makes no assumptions about the form of the sample distributions, it cannot be used specifically to test for a difference between means.
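
#### As a minimal sketch of the test described above, the following runs a Rank-Sum test with `scipy.stats.mannwhitneyu`. It assumes that the `low_temp` and `high_temp` arrays defined at the start of the chapter are the two samples to compare, which the text does not state explicitly, and the two-sided alternative is likewise an assumption.

```python
import numpy as np
from scipy import stats

# Same values as the low_temp / high_temp arrays defined earlier;
# treating them as the two samples for this test is an assumption.
low_temp = np.array([0, 0, 0, 0, 0, 1, 1])
high_temp = np.array([1, 2, 3, 1])

# Null hypothesis: both samples come from the same distribution.
# The two-sided alternative is chosen here only for illustration.
result = stats.mannwhitneyu(low_temp, high_temp, alternative='two-sided')

print('U statistic: %.1f' % result.statistic)
print('P-Value: %.4f' % result.pvalue)
```

#### Because the test works on ranks, replacing the largest value in `high_temp` with a far more extreme one would leave the statistic unchanged.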

## The Signed-Rank test

#### The Wilcoxon Signed-Rank test is a non-parametric alternative to the paired t-test that is used when the assumption of normality is violated. The test is robust to outliers because its null and alternative hypotheses are stated in terms of ranks and medians instead of means. As the name indicates, it uses both the magnitudes of the differences between paired measurements (for example, before and after a treatment) and the signs of those differences.


In [3]:
import scipy.stats as stats
import numpy as np
before_treatment = np.array([37, 14, 22, 12, 24, 35, 35, 51, 39])
after_treatment = np.array([38, 17, 19, 7, 15, 25, 24, 38, 19])
# Signed-Rank test (one-sided: before_treatment tends to be greater)
stats.wilcoxon(before_treatment, after_treatment, alternative='greater')

WilcoxonResult(statistic=41.5, pvalue=0.013671875)
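
#### To make the role of magnitudes and signs concrete, the statistic reported above can be rebuilt by hand. This is a sketch of the textbook procedure rather than a copy of SciPy's internals; the names `w_plus` and `w_minus` are illustrative.

```python
import numpy as np
from scipy.stats import rankdata

before_treatment = np.array([37, 14, 22, 12, 24, 35, 35, 51, 39])
after_treatment = np.array([38, 17, 19, 7, 15, 25, 24, 38, 19])

differences = before_treatment - after_treatment
# Zero differences carry no sign information; this sketch drops them
# (there are none in this data set).
differences = differences[differences != 0]

# Rank the magnitudes; tied magnitudes share their average rank.
ranks = rankdata(np.abs(differences))

w_plus = ranks[differences > 0].sum()   # rank sum of positive differences
w_minus = ranks[differences < 0].sum()  # rank sum of negative differences
print('W+ = %.1f, W- = %.1f' % (w_plus, w_minus))
```

#### The positive rank sum is the value reported as `statistic` for the one-sided `alternative='greater'` test above.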

## The Kruskal-Wallis test

#### Another non-parametric test is the Kruskal-Wallis test. It is an alternative to the one-way ANOVA test when the normality assumption is not satisfied. It works with the ranks of the pooled observations rather than the raw values, and is commonly interpreted as a test of medians instead of means, to determine whether there are statistically significant differences between two or more independent groups. Let us consider a generic example of three independent groups:

In [4]:
from scipy import stats
group1 = [8, 13, 13, 15, 12, 10, 6, 15, 13, 9]
group2 = [16, 17, 14, 14, 15, 12, 9, 12, 11, 9]
group3 = [7, 8, 9, 9, 4, 15, 13, 9, 11, 9]
# Kruskal-Wallis test
stats.kruskal(group1, group2, group3)

KruskalResult(statistic=5.7342701722574905, pvalue=0.056861597028239855)
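
#### The following sketch shows how the H statistic is built from ranks, using the standard textbook formula with a tie correction. The variable names are illustrative, and the result is compared against `stats.kruskal` instead of being asserted as the library's exact implementation.

```python
import numpy as np
from scipy import stats

group1 = [8, 13, 13, 15, 12, 10, 6, 15, 13, 9]
group2 = [16, 17, 14, 14, 15, 12, 9, 12, 11, 9]
group3 = [7, 8, 9, 9, 4, 15, 13, 9, 11, 9]
groups = [np.asarray(g) for g in (group1, group2, group3)]

# Rank all observations together; ties receive their average rank.
pooled = np.concatenate(groups)
n_total = pooled.size
ranks = stats.rankdata(pooled)

# Split the pooled ranks back into groups and sum the ranks per group.
boundaries = np.cumsum([g.size for g in groups])[:-1]
rank_sums = [r.sum() for r in np.split(ranks, boundaries)]

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
h_stat = 12.0 / (n_total * (n_total + 1)) * sum(
    rs ** 2 / g.size for rs, g in zip(rank_sums, groups)
) - 3 * (n_total + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
_, tie_counts = np.unique(pooled, return_counts=True)
correction = 1 - (tie_counts ** 3 - tie_counts).sum() / (n_total ** 3 - n_total)
h_corrected = h_stat / correction

# Under the null hypothesis, H is approximately chi-square with k - 1 df.
p_value = stats.chi2.sf(h_corrected, df=len(groups) - 1)
print('H: %.4f, P-Value: %.4f' % (h_corrected, p_value))
```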

## Chi-square distribution

#### Researchers are often faced with the need to test hypotheses on categorical data. The parametric tests covered in Chapter 4, Parametric Tests, are often not very helpful for this type of analysis. In the last chapter, we discussed using an F-test to compare sample variances. Extending that concept, we can consider the chi-square distribution, a non-symmetric probability distribution that is useful for comparing sample variances to a population variance, specifically when the mean of the sampling distribution of sample variances is expected to equal the population variance under the null hypothesis. Because variance cannot be negative, the distribution starts at an origin of 0. The same distribution underlies the tests on categorical counts that follow.
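
#### The link between sample variances and the chi-square distribution can be checked with a short simulation. This is a sketch under stated assumptions: the data are drawn from a normal population, and the sample size `n`, the standard deviation `sigma`, and the seed are arbitrary values chosen only for illustration.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n, sigma = 5, 2.0               # hypothetical sample size and population sd

# Draw many samples of size n and compute each sample variance.
samples = rng.normal(loc=0.0, scale=sigma, size=(100_000, n))
sample_variances = samples.var(axis=1, ddof=1)

# For normal data, (n - 1) * s^2 / sigma^2 follows a chi-square
# distribution with n - 1 degrees of freedom.
scaled = (n - 1) * sample_variances / sigma ** 2
df = n - 1

print('Smallest scaled value: %.4f' % scaled.min())
print('Mean of scaled values: %.3f (chi-square mean equals df = %d)'
      % (scaled.mean(), df))
print('Simulated 95th percentile: %.3f' % np.quantile(scaled, 0.95))
print('chi2.ppf(0.95, df): %.3f' % chi2.ppf(0.95, df))
```

#### The scaled values never fall below 0, and their mean sits close to the degrees of freedom, which is equivalent to the mean sample variance sitting close to the population variance.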

In [5]:
from statsmodels.stats.gof import chisquare
from scipy.stats import chi2
chi_square_stat, p_value = chisquare(f_obs=[45, 30, 15],
    f_exp=[30, 30, 30])
chi_square_critical_value = chi2.ppf(1-.05, df=2)
print('Chi-Square Test Statistic: %.4f'%chi_square_stat)
print('Chi-Square Critical Value: %.4f'%chi_square_critical_value)
print('P-Value: %.4f'%p_value)

Chi-Square Test Statistic: 15.0000
Chi-Square Critical Value: 5.9915
P-Value: 0.0006
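
#### The goodness-of-fit statistic in the cell above can also be computed directly from its definition, the sum over categories of (observed - expected)² / expected. The sketch below uses the same counts; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([45, 30, 15])
expected = np.array([30, 30, 30])

# Sum of squared deviations, each scaled by its expected count.
chi_square_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus one.
df = observed.size - 1

# Upper-tail probability of the chi-square distribution.
p_value = chi2.sf(chi_square_stat, df)

print('Chi-Square Test Statistic: %.4f' % chi_square_stat)
print('P-Value: %.4f' % p_value)
```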


## Chi-square test of independence

In [6]:
from scipy.stats import chi2_contingency
from scipy.stats import chi2
import numpy as np
observed_frequencies = np.array([[1429, 1235], [1216934, 22663]])
chi_square_test_statistic, p_value, degrees_of_freedom, expected_frequencies = chi2_contingency(observed_frequencies)
chi_square_critical_value = chi2.ppf(1-.05, df=degrees_of_freedom)
print('Chi-Square Test Statistic: %.4f'%chi_square_test_statistic)
print('Chi-Square Critical Value: %.4f'%chi_square_critical_value)
print('P-Value: %.4f'%p_value)

Chi-Square Test Statistic: 27915.1221
Chi-Square Critical Value: 3.8415
P-Value: 0.0000
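
#### For a test of independence, the expected count in each cell is the product of its row total and column total divided by the grand total. The sketch below rebuilds those expected counts for the table above and compares them with `chi2_contingency`. It passes `correction=False` so the hand-computed statistic is comparable, since SciPy applies the Yates continuity correction to 2x2 tables by default.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same 2x2 table as above, stored as floats to avoid integer overflow
# when multiplying the row and column totals.
observed_frequencies = np.array([[1429, 1235], [1216934, 22663]], dtype=float)

row_totals = observed_frequencies.sum(axis=1)
column_totals = observed_frequencies.sum(axis=0)
grand_total = observed_frequencies.sum()

# Expected count per cell under independence.
expected = np.outer(row_totals, column_totals) / grand_total

# Uncorrected statistic and degrees of freedom, (rows - 1) * (columns - 1).
uncorrected_stat = ((observed_frequencies - expected) ** 2 / expected).sum()
n_rows, n_columns = observed_frequencies.shape
manual_df = (n_rows - 1) * (n_columns - 1)

# Reference values from SciPy without the continuity correction.
stat, p_value, dof, expected_scipy = chi2_contingency(
    observed_frequencies, correction=False)

print('Manual statistic: %.4f, SciPy statistic: %.4f' % (uncorrected_stat, stat))
print('Degrees of freedom: %d' % manual_df)
```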


## Chi-square goodness-of-fit test power analysis

#### Let’s use an example where a phone vendor sells four popular models of phones: A, B, C, and D. We want to determine how many observations are required to reach a power of 0.8 when testing whether there is a statistically significant difference in popularity between the models, so the vendor can invest more appropriately in phone acquisitions. The null hypothesis asserts that each model accounts for 25% of phones sold. In reality, 20% of phones sold were model A, 30% were model B, 19% were model C, and 31% were model D. The following cell computes the power achieved with 224 observations:

In [7]:
from statsmodels.stats.power import GofChisquarePower
from statsmodels.stats.gof import chisquare_effectsize
# probs0 asserts that each model accounts for 25% of phones sold
# probs1: in reality, 20% were model A, 30% model B, 19% model C, and 31% model D
effect_size = chisquare_effectsize(probs0=[25, 25, 25, 25], probs1=[20, 30, 19, 31], cohen=True)
alpha = 0.05
n_bins = 4  # 4 models of phones
analysis = GofChisquarePower()
# power is left unspecified, so solve_power returns the power achieved with nobs=224
result = analysis.solve_power(effect_size, nobs=224, alpha=alpha, n_bins=n_bins)
print('Power with 224 observations: {:.3f}'.format(result))

Power with 224 observations: 0.801
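
#### The calculation can also be run in the other direction, fixing the power at 0.8 and solving for the number of observations. The sketch below assumes the same proportions, expressed as fractions, and also computes Cohen's w by hand as the square root of the sum of (p1 - p0)² / p0 to show what the effect size represents. The names `manual_w` and `required_nobs` are illustrative.

```python
import numpy as np
from statsmodels.stats.power import GofChisquarePower
from statsmodels.stats.gof import chisquare_effectsize

probs_null = np.array([0.25, 0.25, 0.25, 0.25])    # null: equal shares
probs_actual = np.array([0.20, 0.30, 0.19, 0.31])  # shares stated in the text

# Cohen's w computed from its definition.
manual_w = np.sqrt(((probs_actual - probs_null) ** 2 / probs_null).sum())

# The same effect size from statsmodels.
effect_size = chisquare_effectsize(probs_null, probs_actual, cohen=True)

# Leave nobs unspecified so that solve_power solves for it.
required_nobs = GofChisquarePower().solve_power(
    effect_size=effect_size, nobs=None, alpha=0.05, power=0.8, n_bins=4)
required_nobs = float(np.squeeze(required_nobs))

print('Effect size (Cohen w): %.4f' % effect_size)
print('Observations required for a power of 0.8: %.1f' % required_nobs)
```

#### In practice the result would be rounded up to the next whole observation.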


## Spearman’s rank correlation coefficient

#### In Chapter 4, Parametric Tests, we looked at the parametric correlation coefficient, Pearson’s correlation, where the coefficient is calculated from independently sampled, continuous data. However, when we have ranked, ordinal data, such as that from a satisfaction survey, we would not want to use Pearson’s correlation, because it is computed from the raw values and cannot be assumed to preserve their order. As with Pearson’s correlation coefficient, Spearman’s correlation coefficient, r, ranges from -1 to 1, with -1 being a perfect inverse correlation and 1 being a perfect direct correlation. Spearman’s coefficient is derived by dividing the covariance of the two variables’ ranks by the product of the standard deviations of those ranks.

#### Suppose we have students being judged in a competition by two judges, and there is a concern that one of the judges may be biased toward some of the participants based on confounding factors, such as family ties, rather than performance alone. We decide to run a correlation analysis on the scores to test the hypothesis that the two judges scored similarly for each contestant:

In [8]:
from scipy.stats import spearmanr
import pandas as pd
df_scores = pd.DataFrame({'Judge A':[1, 3, 5, 7, 8, 3, 9],
                          'Judge B':[2, 5, 3, 9, 6, 1, 7]})

In [9]:
correlation, p_value = spearmanr(df_scores['Judge A'],
    df_scores['Judge B'])
print('Spearman Correlation Coefficient: %.4f'%correlation)
print('P-Value: %.4f'%p_value)

Spearman Correlation Coefficient: 0.7748
P-Value: 0.0408


#### Since the p-value (0.0408) is below the 0.05 significance level and the correlation coefficient is 0.77, where a strong correlation starts at approximately 0.7, we may conclude that the judges’ scores are directly correlated strongly enough to assume that no bias is present in the scoring. This assumes a relatively objective method for ranking exists; something more subjective may not be as suitable for correlation analysis.
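
#### The definition given earlier, the covariance of the ranks divided by the product of the ranks' standard deviations, can be checked against `spearmanr` on the judges' scores. This is a minimal sketch; `manual_r` and the other variable names are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

judge_a = np.array([1, 3, 5, 7, 8, 3, 9])
judge_b = np.array([2, 5, 3, 9, 6, 1, 7])

# Convert scores to ranks; tied scores share their average rank.
ranks_a = rankdata(judge_a)
ranks_b = rankdata(judge_b)

# Covariance of the ranks over the product of their standard deviations
# (the same ddof is used in the numerator and the denominator).
covariance = np.cov(ranks_a, ranks_b, ddof=1)[0, 1]
manual_r = covariance / (ranks_a.std(ddof=1) * ranks_b.std(ddof=1))

scipy_r, p_value = spearmanr(judge_a, judge_b)
print('Manual coefficient: %.4f' % manual_r)
print('spearmanr coefficient: %.4f' % scipy_r)
```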