# Parametric Tests

#### The Kolmogorov-Smirnov test is most sensitive to the center of the data. Consequently, it has less power when there is wide variance around the center. The Anderson-Darling test gives more weight to the tails than to the center and is more likely to identify departures from normality when the data is heavy-tailed with extreme outliers. Both tests perform well on large samples but have less power when sample sizes are small. The third test we consider, Shapiro-Wilk, is more general than the Kolmogorov-Smirnov and Anderson-Darling tests and therefore more robust to small sample sizes. Based on these traits, Shapiro-Wilk may be the more useful choice in an automated pipeline. Alternatively, it may be better to lower the level of confidence for the test being applied.
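#### As a rough illustration of the three tests named above, the sketch below runs each one on the same small sample. The seed, sample size, and variable names are assumptions chosen for the example; no particular result is implied.

```python
import numpy as np
from scipy import stats

# Assumption: a fixed seed and a sample of 50 points, chosen only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=50)

# Kolmogorov-Smirnov against the standard normal CDF
ks = stats.kstest(sample, stats.norm.cdf)

# Anderson-Darling against a normal distribution (returns critical values)
ad = stats.anderson(sample, dist="norm")

# Shapiro-Wilk test for normality
sw = stats.shapiro(sample)

print(f"Kolmogorov-Smirnov: statistic={ks.statistic:.4f}, p-value={ks.pvalue:.4f}")
print(f"Anderson-Darling:   statistic={ad.statistic:.4f}")
print(f"Shapiro-Wilk:       statistic={sw.statistic:.4f}, p-value={sw.pvalue:.4f}")
```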

## Kolmogorov-Smirnov

#### The Kolmogorov-Smirnov test can be used to test the null hypothesis that a given sample is drawn from a normal distribution. This version of the Kolmogorov-Smirnov test is the one-sample goodness-of-fit test, which compares the sample against a benchmark cumulative distribution function. When running the kstest function in the scipy.stats module, passing stats.norm.cdf (scipy’s normal cumulative distribution function) performs this one-sample version of the test. The two-sample version compares one sample against a second sample to determine whether the two come from the same distribution. In the two-sample case, the comparison sample must be provided as a numpy array instead of the stats.norm.cdf function used in the code snippet shown below Figure 4.3. However, this is outside the scope of testing for normality, so we will not look at it.

#### The Kolmogorov-Smirnov test compares a calculated test statistic against a table-based critical value (kstest handles this internally and reports a p-value). As with other hypothesis tests, if the test statistic is larger than the critical value, the null hypothesis that the sample is normally distributed can be rejected. Equivalently, the null hypothesis can be rejected if the p-value is low enough to be significant. The test statistic is the maximum absolute distance between the empirical cumulative distribution function of the sample and the reference cumulative distribution function.
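#### To make the test statistic concrete, the sketch below computes the largest gap between the sample's empirical cumulative distribution function and the standard normal cumulative distribution function by hand, then compares it with the value kstest reports. The seed, sample size, and names such as d_manual are assumptions made for this example.

```python
import numpy as np
from scipy import stats

# Assumption: fixed seed and 200 points, chosen only for illustration
rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=200))
n = sample.size

# Reference CDF evaluated at each sorted data point
cdf_values = stats.norm.cdf(sample)

# The empirical CDF jumps from (i - 1) / n to i / n at the i-th sorted point,
# so the gap must be checked on both sides of each jump
d_plus = np.max(np.arange(1, n + 1) / n - cdf_values)
d_minus = np.max(cdf_values - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

result = stats.kstest(sample, stats.norm.cdf)
print(f"manual statistic: {d_manual:.6f}")
print(f"kstest statistic: {result.statistic:.6f}")
```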

## KOLMOGOROV-SMIRNOV SPECIAL REQUIREMENT

#### When the Kolmogorov-Smirnov test is run against the standard normal cumulative distribution function, as in the code below, the data must be centered around zero and scaled to a standard deviation of one. All data must be transformed for the test, but inference can be applied to the pre-transformed distribution; the centered and scaled distribution does not need to be the distribution used in further statistical testing or analysis.

In [2]:
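from scipy import stats
import numpy as np

# Draw 1,000 points from a standard normal distribution (mean 0, std 1)
mu, sigma = 0, 1
normally_distributed = np.random.normal(mu, sigma, 1000)

# One-sample test against the standard normal CDF
stats.kstest(normally_distributed, stats.norm.cdf)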

KstestResult(statistic=0.030329008345265418, pvalue=0.31020996163951164, statistic_location=0.4115653723352908, statistic_sign=1)

In [3]:
# Exponentiating normal data gives log-normal data, which is not normally distributed
stats.kstest(np.exp(normally_distributed), stats.norm.cdf)

KstestResult(statistic=0.5310428209444901, pvalue=9.943698899899565e-264, statistic_location=0.10810253397924445, statistic_sign=-1)

In [5]:
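# Normal data that is not centered at zero or scaled to unit variance
mu, sigma = 100, 2
normally_distributed = np.random.normal(mu, sigma, 1000)

# Standardize (subtract the mean, divide by the standard deviation) before testing
normally_distributed_scaled = (
    normally_distributed - normally_distributed.mean()
) / normally_distributed.std()
stats.kstest(normally_distributed_scaled, stats.norm.cdf)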

KstestResult(statistic=0.018687284805557258, pvalue=0.869393452090583, statistic_location=0.6016989191439979, statistic_sign=1)
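#### The sketch below illustrates why the centering and scaling step matters, and shows one possible alternative: passing the sample mean and standard deviation to kstest through its args parameter rather than transforming the data. The seed and the variable names (raw, unscaled, fitted, standardized) are assumptions made for this example.

```python
import numpy as np
from scipy import stats

# Assumption: fixed seed, chosen only for illustration
rng = np.random.default_rng(0)
raw = rng.normal(loc=100, scale=2, size=1000)

# Tested as-is against the standard normal CDF, the data is rejected
# even though it was drawn from a normal distribution
unscaled = stats.kstest(raw, stats.norm.cdf)

# Alternative to transforming the data: supply loc and scale to the
# reference distribution through args
fitted = stats.kstest(raw, "norm", args=(raw.mean(), raw.std()))

# Standardizing the data first gives the same statistic
standardized = stats.kstest((raw - raw.mean()) / raw.std(), stats.norm.cdf)

print(f"unscaled:     statistic={unscaled.statistic:.4f}, p-value={unscaled.pvalue:.4g}")
print(f"fitted:       statistic={fitted.statistic:.4f}, p-value={fitted.pvalue:.4f}")
print(f"standardized: statistic={standardized.statistic:.4f}, p-value={standardized.pvalue:.4f}")
```

#### Because the mean and standard deviation are estimated from the same sample being tested, the reported p-value in the last two calls should be read as approximate.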

## Anderson-Darling

#### Similar to the Kolmogorov-Smirnov test, the Anderson-Darling test measures a given sample against a normal distribution. scipy’s anderson test can test against other distributions, but the default argument, dist="norm", specifies a normal distribution, so the null hypothesis is that the sample was drawn from a normal distribution. Each distribution tested against has its own set of critical values.
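#### As a minimal sketch of how the result might be read, the code below runs anderson with dist="norm" and compares the statistic with each critical value. It assumes a SciPy version in which the result exposes the statistic, critical_values, and significance_level attributes; the seed and sample parameters are illustrative.

```python
import numpy as np
from scipy import stats

# Assumption: fixed seed and sample parameters, chosen only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=2, size=1000)

result = stats.anderson(sample, dist="norm")
print(f"statistic = {result.statistic:.4f}")

# Each critical value corresponds to one significance level (in percent);
# normality is rejected at a level when the statistic exceeds its critical value
for level, critical in zip(result.significance_level, result.critical_values):
    decision = "reject" if result.statistic > critical else "fail to reject"
    print(f"{level:>5.1f}% level: critical value {critical:.3f} -> {decision} normality")
```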