# Statistical Analysis II - Practicum 2

## Non-parametric statistics

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

### Kolmogorov-Smirnov test

Examples from [the SciPy website](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html)

The one-sample test compares the underlying distribution F(x) of a sample against a given distribution G(x) (e.g. normal, uniform). 

The two-sample test compares the underlying distributions of two independent samples. 

Both tests are **valid only** for continuous distributions. The test statistic is the maximum absolute distance between the two cumulative distribution functions being compared.
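
To make the "maximum distance" concrete, here is a minimal sketch that computes the one-sample statistic by hand and compares it with the value reported by `stats.kstest`. The seed, the sample size, and the names `d_plus`, `d_minus` and `d_manual` are illustrative assumptions, not part of the practicum.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen only so the sketch is reproducible
sample = stats.norm.rvs(size=200, random_state=rng)

# Evaluate the reference CDF G at the sorted sample points
x = np.sort(sample)
n = x.size
G = stats.norm.cdf(x)

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
# so the largest gap has to occur on one side of a jump
d_plus = np.max(np.arange(1, n + 1) / n - G)   # empirical CDF above G
d_minus = np.max(G - np.arange(0, n) / n)      # empirical CDF below G
d_manual = max(d_plus, d_minus)

result = stats.kstest(sample, stats.norm.cdf)
print(d_manual, result.statistic)
```

The two printed numbers should agree up to floating-point error.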

In [None]:
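# Create a random number generator; pass a seed (e.g. 42) for reproducible results
rng = np.random.default_rng()

# Draw a sample of 100 values from the uniform distribution on [0, 1)
F_x = stats.uniform.rvs(size=100, random_state=rng)
print(F_x)

# Reference distribution: the CDF of the standard normal
G_x = stats.norm.cdf

# One-sample KS test of the sample against the reference CDF
stats.kstest(F_x, G_x)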

#### Question: What would have happened if the sample had been larger?

In [None]:
# Draw a smaller sample of 10 values from the same uniform distribution
F1_x = stats.uniform.rvs(size=10, random_state=rng)
print(F1_x)

stats.kstest(F1_x, G_x)
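
One way to explore the effect of sample size is to repeat the test for several sizes and compare the results side by side. The sketch below assumes a fixed seed and the sizes 10, 100 and 1000, which are illustrative choices; the dictionary `results` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, assumed here for reproducibility

results = {}
for n in (10, 100, 1000):
    # Uniform sample on [0, 1) tested against the standard normal CDF
    sample = stats.uniform.rvs(size=n, random_state=rng)
    results[n] = stats.kstest(sample, stats.norm.cdf)
    print(f"n={n:5d}  D={results[n].statistic:.3f}  p={results[n].pvalue:.3g}")
```

The statistic stays at 0.5 or above for every size, because the uniform sample has no values below 0 while the standard normal CDF already equals 0.5 there. The p-value, in contrast, shrinks quickly as the sample grows: the same distance is much stronger evidence against the null hypothesis when it is backed by more data.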

Let's look at another example: a sample drawn from the standard normal distribution itself.

In [None]:
F2_x = stats.norm.rvs(size=1000, random_state=rng)
print(F2_x)
stats.kstest(F2_x, G_x)

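As expected, the p-value should be above our threshold of 0.05 in most runs (the generator is unseeded, so results vary between runs), and we cannot reject the null hypothesis that the sample comes from a standard normal distribution.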

Let us assume, however, that the random variates are distributed according to a normal distribution that is shifted toward greater values.
In this case, the cumulative distribution function (CDF) of the underlying distribution tends to be less than the CDF of the standard normal.

In [None]:
F3_x = stats.norm.rvs(size=100, loc=0.5, random_state=rng)
print(F3_x.mean())
stats.kstest(F3_x, G_x, alternative='less')

#### Question: What would have happened if we had used a different `alternative`?
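
One way to explore this is to run the same shifted sample through all three values that `stats.kstest` accepts for `alternative`. The sketch below uses a fixed seed and a larger sample (1000 values) than the cell above so that the outcome is stable; both are assumptions made for illustration, and `pvalues` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, assumed here for reproducibility
shifted = stats.norm.rvs(size=1000, loc=0.5, random_state=rng)

pvalues = {}
for alt in ("two-sided", "less", "greater"):
    res = stats.kstest(shifted, stats.norm.cdf, alternative=alt)
    pvalues[alt] = res.pvalue
    print(f"{alt:>9}: D={res.statistic:.3f}  p={res.pvalue:.3g}")
```

With the sample shifted toward greater values, its CDF lies below the standard normal CDF, so `'less'` and `'two-sided'` give small p-values. The `'greater'` alternative looks for the opposite deviation, which is not present here, so its p-value stays large.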

Two-sample tests can also be performed.

In [None]:
F4_x = stats.laplace.rvs(size=105, random_state=rng)
F5_x = stats.laplace.rvs(size=95, random_state=rng)

In [None]:
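# Sorted sample values against evenly spaced probabilities (empirical quantiles)
plt.plot(np.linspace(0, 1, 105), np.sort(F4_x), label="F4_x (n=105)")
plt.plot(np.linspace(0, 1, 95), np.sort(F5_x), label="F5_x (n=95)")
plt.legend()
plt.show()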

In [None]:
stats.kstest(F4_x, F5_x)
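
For contrast, here is a sketch of a two-sample test where the samples really do come from different distributions. The shift `loc=1.0`, the sample size of 300, the seed, and the names `a` and `b` are all illustrative assumptions. The sketch also calls `stats.ks_2samp`, the dedicated two-sample function in SciPy, to check that it reports the same statistic as `stats.kstest` when given two arrays.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, assumed here for reproducibility

# Two Laplace samples; the second one is shifted by 1.0
a = stats.laplace.rvs(size=300, random_state=rng)
b = stats.laplace.rvs(size=300, loc=1.0, random_state=rng)

res_kstest = stats.kstest(a, b)    # two arrays: performs the two-sample test
res_2samp = stats.ks_2samp(a, b)   # dedicated two-sample function

print(res_kstest.statistic, res_kstest.pvalue)
print(res_2samp.statistic, res_2samp.pvalue)
```

Here the p-value is far below 0.05, so we reject the null hypothesis that both samples come from the same distribution.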