## Kolmogorov–Smirnov test

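The Kolmogorov–Smirnov test (KS-test) checks whether a sample is consistent with a given continuous distribution. It compares the empirical cumulative distribution function (ECDF) $F_n$ of the $n$ observations with the cumulative distribution function (CDF) $F$ of the hypothesized distribution through the statistic $D_n = \sup_x |F_n(x) - F(x)|$. The null hypothesis H0 (the data follow $F$) is rejected at significance level $\alpha = 0.05$ when $D_n$ exceeds the critical value, approximately $1.36/\sqrt{n}$ for large $n$.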
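For a fully specified reference CDF $F$, the statistic can be computed exactly from the sorted sample $x_{(1)} \le \dots \le x_{(n)}$ as $D_n = \max_i \max\left(i/n - F(x_{(i)}),\; F(x_{(i)}) - (i-1)/n\right)$, because the ECDF is a step function that jumps at each observation. The sketch below assumes a standard normal reference distribution purely for illustration; the helper `ks_statistic` is a hypothetical name, and its result is cross-checked against `scipy.stats.kstest`.

```python
import numpy as np
from scipy import stats


def ks_statistic(data, cdf):
    """Exact one-sample KS statistic D_n for a fully specified CDF (hypothetical helper)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    f = cdf(x)                          # reference CDF at the order statistics
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - f)          # ECDF (top of each step) above F
    d_minus = np.max(f - (i - 1) / n)   # F above ECDF (bottom of each step)
    return max(d_plus, d_minus)


rng = np.random.default_rng(0)
sample = rng.normal(size=500)

d = ks_statistic(sample, stats.norm.cdf)
res = stats.kstest(sample, stats.norm.cdf)
print(f"D_n = {d:.4f} (scipy: {res.statistic:.4f}), p-value = {res.pvalue:.3f}")
```

Evaluating the ECDF only at the sorted observations, on both sides of each jump, gives the exact supremum without any histogram binning.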

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

In [2]:
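def ks_test(data, pdf, guess_args):
    """
    One-sample Kolmogorov-Smirnov test against a fitted PDF.

    H0: the data follow the given distribution.
    d < critical value (cv)  -->  H0 is not rejected

    Parameters
    ----------
    data : 1d array
        Empirical data.
    pdf : callable
        Probability density function, called as pdf(x, *params).
    guess_args : float or 1d array
        Initial guess for the pdf parameters (passed to curve_fit as p0).

    Returns
    -------
    d : float
        KS test statistic D_n.
    cv : float
        Critical value at significance level alpha = 0.05.
    m : bool
        True if H0 is not rejected, False if H0 is rejected.

    Notes
    -----
    Both CDFs are approximated on a histogram grid, and the parameters are
    fitted to the same data that are tested, so cv = 1.36/sqrt(n) (derived
    for a fully specified distribution) makes the test conservative.
    """

    # approximate ECDF of the data on len(data) equal-width bins
    counts, bin_edges = np.histogram(data, bins=len(data), density=True)
    ecdf = np.cumsum(counts) / np.cumsum(counts)[-1]

    # fit the model PDF to a density histogram (bin centres vs. heights)
    y, edges = np.histogram(data, bins='auto', density=True)
    x = (edges[:-1] + edges[1:]) / 2.0
    popt, pcov = curve_fit(pdf, x, y, p0=guess_args)

    # approximate the model CDF by a cumulative sum of the fitted PDF at the
    # right bin edges, normalized so that it reaches 1 at the largest edge
    cdf_scaled = np.cumsum(pdf(bin_edges[1:], *popt))
    cdf = cdf_scaled / cdf_scaled[-1]

    # KS statistic: largest absolute gap between the two CDFs
    d = np.max(np.abs(cdf - ecdf))
    cv = 1.36 / np.sqrt(len(data))  # asymptotic critical value, alpha = 0.05
    m = bool(d < cv)

    return d, cv, m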
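The constant 1.36 in `cv` is the 95% quantile of the Kolmogorov distribution, the large-sample limit of $\sqrt{n}\,D_n$ under H0 with a fully specified CDF. The sketch below recovers it with `scipy.special.kolmogi` (the inverse of the Kolmogorov survival function) and with the leading-term approximation $c(\alpha) \approx \sqrt{-\ln(\alpha/2)/2}$; the helper `ks_critical_value` is a hypothetical name.

```python
import numpy as np
from scipy.special import kolmogi


def ks_critical_value(n, alpha=0.05):
    """Asymptotic KS critical value c(alpha)/sqrt(n) (hypothetical helper)."""
    return kolmogi(alpha) / np.sqrt(n)


alpha = 0.05
c_exact = kolmogi(alpha)                     # P(K > c) = alpha for the limiting distribution
c_approx = np.sqrt(-np.log(alpha / 2) / 2)   # leading term of the Kolmogorov series
print(f"c(0.05): kolmogi = {c_exact:.4f}, approximation = {c_approx:.4f}")
print(f"critical value for n = 500: {ks_critical_value(500, alpha):.4f}")
```

For $n = 500$ this gives about 0.061, the value printed in the examples. Being asymptotic, it is less accurate for small samples.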

### Example 1
Check whether simulated data follow the complex double Gaussian (CNN) distribution, i.e. the distribution of the real part of the product of two independent complex Gaussian random variables.

In [3]:
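# generate CNN data z: a (2, 1000) float array viewed as complex128 gives
# 2 x 500 complex samples with independent N(0, 1) real and imaginary parts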
x = np.random.normal(size=(2, 1000)).view(np.complex128)
z = x[0, :] * x[1, :]
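Viewing a float64 array of shape (2, 1000) as complex128 pairs consecutive real numbers into one complex number, so each row holds 500 complex samples whose real and imaginary parts are independent N(0, 1). The sample size that enters the critical value is therefore $n = 500$, not 1000. A quick check, with a seed added only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.normal(size=(2, 1000))
x = raw.view(np.complex128)        # (raw[k, 2j], raw[k, 2j+1]) -> real + 1j*imag
print(raw.shape, "->", x.shape)    # (2, 1000) -> (2, 500)

# real/imaginary parts are exactly the interleaved columns of the float array
assert np.array_equal(x.real, raw[:, 0::2])
assert np.array_equal(x.imag, raw[:, 1::2])

z = x[0, :] * x[1, :]
print("sample size:", z.size)                      # 500
print("sample variance of Re(z):", z.real.var())   # expectation is 2 (= 1 + 1)
```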

In [4]:
# define CNN PDF
def real_pdf(z, s):
    """
    Compute the PDF of the CNN distribution.

    This is a Laplace density with location 0 and scale s/2.

    Parameters
    ----------
    z : float, array_like
        Input data.
    s : float
        Scale parameter; the standard deviation of z is s/sqrt(2).

    Returns
    -------
    p : float, array_like
        PDF evaluated at z.
    """
    a = 1 / s
    b = -np.abs(2 * z) / s
    p = a * np.exp(b)
    return p
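`real_pdf` is a Laplace density with location 0 and scale $s/2$, so its standard deviation is $s/\sqrt{2}$ rather than $s$. With unit-variance real and imaginary parts as simulated above, $\mathrm{Re}(z) = ac - bd$ (with $a, b, c, d$ independent N(0, 1)) is Laplace-distributed with scale 1, so the fit should land near $s = 2$. The sketch below checks the identity against `scipy.stats.laplace` and verifies the normalization numerically; it re-defines `real_pdf` so that it runs on its own.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def real_pdf(z, s):
    # same expression as in the notebook cell above
    return (1 / s) * np.exp(-np.abs(2 * z) / s)


s = 2.0
grid = np.linspace(-10, 10, 201)
# real_pdf(z, s) coincides with the Laplace(loc=0, scale=s/2) density
assert np.allclose(real_pdf(grid, s), stats.laplace.pdf(grid, loc=0, scale=s / 2))

total, _ = quad(real_pdf, -np.inf, np.inf, args=(s,))
print(f"integral = {total:.6f}")
print(f"std = {stats.laplace.std(scale=s / 2):.4f}, s/sqrt(2) = {s / np.sqrt(2):.4f}")
```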

In [5]:
# Perform KS-test
d, cv, result = ks_test(z.real, real_pdf, 1)
print("KS statistic D_n = %.3f" % d)
print("Critical value (significance level 0.05) = %.3f" % cv)
if result:
    print("H0 not rejected: the data are consistent with the given distribution.")
else:
    print("H0 rejected: the data do not follow the given distribution.")

KS statistic D_n = 0.055
Critical value (significance level 0.05) = 0.061
H0 not rejected: the data are consistent with the given distribution.
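Because theory fixes the parameter here ($\mathrm{Re}(z)$ is Laplace with scale 1, i.e. $s = 2$ in `real_pdf`), the same hypothesis can also be tested against a fully specified CDF. That is the setting the $1.36/\sqrt{n}$ critical value is derived for. The sketch below uses `scipy.stats.kstest` and regenerates the data in the same way, with a seed added for reproducibility.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=(2, 1000)).view(np.complex128)
z = x[0, :] * x[1, :]

# H0: Re(z) ~ Laplace(loc=0, scale=1), no parameters estimated from the data
res = stats.kstest(z.real, "laplace", args=(0, 1))
print(f"D_n = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")
print("reject H0 at 0.05" if res.pvalue < 0.05 else "H0 not rejected at 0.05")
```

Unlike `ks_test`, this computes the ECDF exactly and uses the analytic CDF, so no histogram binning or curve fitting is involved.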


### Example 2
Check whether simulated data follow the distribution of the difference of two independent CNN variables from Example 1.

In [6]:
# generate data n: difference of the real parts of two independent CNN products (500 samples)
x = np.random.normal(size=(4, 1000)).view(np.complex128)
n = (x[0, :]*x[1, :]).real - (x[2, :]*x[3, :]).real

In [7]:
# define delta CNN PDF
def null_pdf(x, s):
    """
    Compute the PDF of the delta CNN distribution, i.e. of the difference
    of two independent CNN variables.

    Parameters
    ----------
    x : float, array_like
        Input data.
    s : float
        Scale parameter (the same s as in real_pdf); it equals the
        standard deviation of x.

    Returns
    -------
    pdf : float, array_like
        PDF evaluated at x.
    """
    pdf = (1 / (2 * s**2)) * np.exp(-2 * np.abs(x) / s) * (s + 2 * np.abs(x))
    return pdf
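`null_pdf` is the density of the difference of two independent Laplace variables with scale $b = s/2$, namely $\frac{1}{4b}\left(1 + \frac{|x|}{b}\right)e^{-|x|/b}$, rewritten in terms of $s$. Its variance is $2 \cdot 2b^2 = s^2$, so here $s$ is indeed the standard deviation of $x$. The sketch below checks the normalization, the variance, and agreement with a numerical convolution of two `real_pdf` densities. Both functions are re-defined so that it is self-contained; `diff_density` is a hypothetical helper.

```python
import numpy as np
from scipy.integrate import quad


def real_pdf(z, s):
    return (1 / s) * np.exp(-np.abs(2 * z) / s)


def null_pdf(x, s):
    return (1 / (2 * s**2)) * np.exp(-2 * np.abs(x) / s) * (s + 2 * np.abs(x))


def diff_density(x, s):
    """Density of Z1 - Z2 at x for i.i.d. Z1, Z2 with density real_pdf (numerical)."""
    def integrand(t):
        return real_pdf(t, s) * real_pdf(t - x, s)
    a, b = min(0.0, x), max(0.0, x)  # the integrand has kinks at t = 0 and t = x
    pieces = [(-np.inf, a), (a, b), (b, np.inf)]
    return sum(quad(integrand, lo, hi)[0] for lo, hi in pieces)


s = 2.0
for x in [0.25, 1.0, -2.5, 4.0]:
    assert np.isclose(null_pdf(x, s), diff_density(x, s), rtol=1e-6)

total, _ = quad(null_pdf, -np.inf, np.inf, args=(s,))
var, _ = quad(lambda x: x**2 * null_pdf(x, s), -np.inf, np.inf)
print(f"integral = {total:.6f}, variance = {var:.4f}, s**2 = {s**2}")
```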

In [8]:
# Perform KS-test
d, cv, result = ks_test(n, null_pdf, 1)
print("KS statistic D_n = %.3f" % d)
print("Critical value (significance level 0.05) = %.3f" % cv)
if result:
    print("H0 not rejected: the data are consistent with the given distribution.")
else:
    print("H0 rejected: the data do not follow the given distribution.")

KS statistic D_n = 0.033
Critical value (significance level 0.05) = 0.061
H0 not rejected: the data are consistent with the given distribution.
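In both examples the parameter $s$ is fitted from the same data that are then tested. The $1.36/\sqrt{n}$ critical value assumes a fully specified distribution, so with a fitted parameter the test is conservative: it rejects H0 less often than the nominal 5%. One common remedy is a parametric bootstrap of the critical value, sketched below for the Laplace model of Example 1. The sketch makes several assumptions. It estimates $s$ by maximum likelihood (for this density $\hat{s} = 2\,\overline{|z|}$) rather than by the histogram fit used in `ks_test`. The helper names and the choice of B = 200 replicates are illustrative.

```python
import numpy as np
from scipy import stats


def fit_s(data):
    """Maximum-likelihood estimate of s for real_pdf (Laplace, loc 0, scale s/2)."""
    return 2.0 * np.mean(np.abs(data))


def ks_stat_fitted(data):
    """KS statistic against the Laplace model with s re-estimated from `data`."""
    s_hat = fit_s(data)
    return stats.kstest(data, "laplace", args=(0, s_hat / 2)).statistic


def bootstrap_critical_value(data, alpha=0.05, n_boot=200, seed=0):
    """Parametric-bootstrap (1 - alpha) quantile of D_n when s is estimated."""
    rng = np.random.default_rng(seed)
    n, s_hat = len(data), fit_s(data)
    d_boot = np.empty(n_boot)
    for k in range(n_boot):
        # simulate from the fitted model, then refit and recompute D_n each time
        sim = rng.laplace(loc=0.0, scale=s_hat / 2, size=n)
        d_boot[k] = ks_stat_fitted(sim)
    return np.quantile(d_boot, 1 - alpha)


rng = np.random.default_rng(3)
x = rng.normal(size=(2, 1000)).view(np.complex128)
z = (x[0, :] * x[1, :]).real

d = ks_stat_fitted(z)
cv_boot = bootstrap_critical_value(z)
print(f"D_n = {d:.3f}, bootstrap cv = {cv_boot:.3f}, "
      f"asymptotic cv = {1.36 / np.sqrt(len(z)):.3f}")
```

Refitting $s$ inside every bootstrap replicate is the essential step. It reproduces the effect of parameter estimation on the null distribution of $D_n$, which the fixed $1.36/\sqrt{n}$ rule ignores.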
