# Pearson $\chi^2$ test
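**Assumptions**:  
1. $\Delta y \gg \Delta x$: the uncertainty in $y$ dominates, so the uncertainty in $x$ can be neglected
2. each measurement $y_i$ is Gaussian-distributed about the model value, with standard deviation $\sigma_i$
3. all data points are independent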

Given a model $y = y(x, m)$, the assumptions above give the probability of each data point and, by independence, of the whole data set:
\begin{equation}
\begin{split}
P_i &= \frac{1}{\sqrt{2\pi\,\sigma_i^2}}\exp\Bigl(-\frac{(y(x_i, m)-y_i)^2}{2\sigma_i^2}\Bigr)\\
P &= \prod_i P_i\\
&=\Bigl[\prod_i \frac{1}{\sqrt{2\pi\,\sigma_i^2}}\Bigr] \exp\Bigl(-\frac{1}{2} \sum_i \frac{(y(x_i, m)-y_i)^2}{\sigma_i^2}\Bigr)
\end{split}
\end{equation}
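**Define $\chi^2$**
\begin{equation}
\chi^2 = \sum_i \frac{(y(x_i, m)-y_i)^2}{\sigma_i^2}
\end{equation}
The prefactor of $P$ does not depend on $m$, so maximizing $P$ is equivalent to minimizing $\chi^2$.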
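
As a minimal sketch of this definition, assuming the model values, data, and uncertainties are held in NumPy arrays (the function name `chi_square` and the sample numbers are illustrative and not part of the original notebook):

```python
import numpy as np

def chi_square(y_model, y_obs, sigma):
    """Sum of squared residuals, each divided by its variance sigma_i^2."""
    y_model = np.asarray(y_model, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.sum((y_model - y_obs) ** 2 / sigma ** 2)

# Illustrative numbers: three points compared against the model y = 1 + 2x.
x_demo = np.array([0.0, 1.0, 2.0])
y_obs_demo = np.array([1.1, 2.9, 5.2])
sigma_demo = np.array([0.1, 0.1, 0.2])
chisq_example = chi_square(1.0 + 2.0 * x_demo, y_obs_demo, sigma_demo)
```

Each residual here equals one standard deviation in magnitude, so every point contributes about 1 to the sum.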

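## If $y(x, m)=a + bx$ is a linear model
Define the weighted sums
\begin{equation}
S = \sum_i\frac{1}{\sigma_i^2},\quad
S_x = \sum_i\frac{x_i}{\sigma_i^2},\quad
S_y = \sum_i\frac{y_i}{\sigma_i^2},\quad
S_{xx} = \sum_i\frac{x_i^2}{\sigma_i^2},\quad
S_{xy} = \sum_i\frac{x_i\,y_i}{\sigma_i^2}
\end{equation}
Setting $\partial\chi^2/\partial a = \partial\chi^2/\partial b = 0$ then gives
\begin{align}
\Delta &= S\,S_{xx} - S_x^2\\
a &= \frac{S_{xx}S_y - S_xS_{xy}}{\Delta}\\
b &= \frac{S\,S_{xy} - S_xS_y}{\Delta}
\end{align}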
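
The closed-form solution can be sketched directly from the sums. This is an illustrative implementation, not the notebook's own `linearFit`: the name `linear_fit_sums`, the seed, and the synthetic data are assumptions. It is cross-checked against `np.polyfit`, which takes weights of $1/\sigma_i$ for Gaussian uncertainties.

```python
import numpy as np

def linear_fit_sums(x, y, sigma):
    """Closed-form weighted fit of y = a + b*x from the S sums."""
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2   # weights 1/sigma_i^2
    S, Sx, Sy = np.sum(w), np.sum(w * x), np.sum(w * y)
    Sxx, Sxy = np.sum(w * x * x), np.sum(w * x * y)
    Delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / Delta
    b = (S * Sxy - Sx * Sy) / Delta
    return a, b

# Synthetic data drawn around y = 1 + 2x (illustrative values).
rng = np.random.default_rng(0)
x_demo = np.linspace(1, 4, 50)
sigma_demo = np.full_like(x_demo, 0.05)
y_demo = 1.0 + 2.0 * x_demo + rng.normal(0.0, sigma_demo)

a_fit, b_fit = linear_fit_sums(x_demo, y_demo, sigma_demo)
# np.polyfit returns the highest power first: [slope, intercept].
b_ref, a_ref = np.polyfit(x_demo, y_demo, 1, w=1.0 / sigma_demo)
```

Both routes minimize the same $\chi^2$, so `a_fit`, `b_fit` should agree with `a_ref`, `b_ref` up to rounding.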

### Another form of expression
\begin{align}
t_i &=\frac{1}{\sigma_i}\Bigl[x_i - \frac{S_x}{S}\Bigr]\\
S_{tt} & = \sum_i t_i^2\\
b & = \frac{1}{S_{tt}} \sum_i \frac{t_i\, y_i}{\sigma_i}\\
a & = \frac{S_y -S_x\,b}{S}
\end{align}
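
A short check that this form matches the solution written with $\Delta = S\,S_{xx} - S_x^2$, using $S = \sum_i 1/\sigma_i^2$, $S_y = \sum_i y_i/\sigma_i^2$, $S_{xx} = \sum_i x_i^2/\sigma_i^2$ and $S_{xy} = \sum_i x_i y_i/\sigma_i^2$. Expanding the definition of $t_i$:
\begin{align}
S_{tt} &= \sum_i \frac{1}{\sigma_i^2}\Bigl(x_i - \frac{S_x}{S}\Bigr)^2 = S_{xx} - \frac{S_x^2}{S} = \frac{\Delta}{S}\\
\sum_i \frac{t_i\, y_i}{\sigma_i} &= S_{xy} - \frac{S_x S_y}{S} = \frac{S\,S_{xy} - S_x S_y}{S}\\
b &= \frac{1}{S_{tt}} \sum_i \frac{t_i\, y_i}{\sigma_i} = \frac{S\,S_{xy} - S_x S_y}{\Delta}\\
a &= \frac{S_y - S_x\,b}{S} = \frac{S_y\,\Delta - S_x\,(S\,S_{xy} - S_x S_y)}{S\,\Delta} = \frac{S_{xx}S_y - S_xS_{xy}}{\Delta}
\end{align}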

In [16]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib osx
#plt.style.use('ggplot')
import seaborn as sns

In [96]:
x = np.linspace(1, 4, 100)
sigma = 0.05


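def linearFit(x, y, sigma=1.0):
    """Weighted fit of y = a + b*x; returns a, b and their 1-sigma errors."""
    # Accept either a scalar or a per-point sequence/array of uncertainties.
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), np.shape(x))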
    Sx = np.sum(x/sigma**2)
    Sy = np.sum(y/sigma**2)
    S = np.sum(1/sigma**2)
    t = 1/sigma * (x - Sx/S)
    Stt = np.sum(t**2)
    b = 1.0/(Stt) * np.sum(t * y / sigma)
    a = (Sy - Sx * b) / S

    Sxx= np.sum(x**2/sigma**2)
    Delta = S*Sxx - Sx**2
    da = np.sqrt(Sxx/Delta)
    db = np.sqrt(S/Delta)
    return a, b, da, db

aList = []
bList = []
daList = []
dbList = []
chisqList = []
for i in range(10000):
    y = 2*x + 1 + np.random.normal(0.0, scale = sigma, size = len(x))
    a, b, da, db = linearFit(x, y, sigma)
    chisq = np.sum((y - a - b*x)**2/sigma**2)
    chisqList.append(chisq)
    aList.append(a)
    bList.append(b)
    daList.append(da)
    dbList.append(db)
plt.close('all')   

da = daList[0]
db = dbList[0]
fig = plt.figure(figsize=(12,6))
ax1 = fig.add_subplot(121)
na, bina = np.histogram(aList, bins=30)
na = na/float(na.max())
bina = (bina[:-1] + bina[1:])/2
binsizea = bina[1] - bina[0]
ax1.plot(bina + 0.5 * binsizea, na, drawstyle='steps')
ax1.plot(bina, np.exp(-(bina - 1.0)**2/(2*da**2)))
ax1.set_xlabel('$a$')
ax2 = fig.add_subplot(122)
nb, binb = np.histogram(bList, bins=30)
nb = nb/float(nb.max())
binb = (binb[:-1] + binb[1:])/2
binsizeb = binb[1] - binb[0]
ax2.plot(binb + 0.5 * binsizeb, nb, drawstyle='steps')
ax2.plot(binb, np.exp(-(binb - 2.0)**2/(2*db**2)))
ax2.set_xlabel('$b$')
for ax in fig.axes:
    # Histograms and Gaussian curves are both scaled to a peak value of 1.
    ax.set_ylabel('P density (peak-normalized)')


fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(aList, bList, xerr=da, yerr=db, fmt='.', ms=3, lw=0.1)



## Error Estimation
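Because the $y_i$ are independent, the variance of each fitted parameter follows from propagation of errors, e.g. $\sigma_a^2 = \sum_i \sigma_i^2 \bigl(\partial a/\partial y_i\bigr)^2$. The derivatives are
\begin{equation}
\frac{\partial{a}}{\partial{y_i}} = \frac{1}{\Delta} \Bigl[S_{xx}\frac{1}{\sigma_i^2}-S_x\frac{x_i}{\sigma_i^2}\Bigr], \qquad
\frac{\partial{b}}{\partial{y_i}} = \frac{1}{\Delta} \Bigl[S\,\frac{x_i}{\sigma_i^2}-S_x\frac{1}{\sigma_i^2}\Bigr]
\end{equation}
and carrying out the sums gives
\begin{align}
\sigma_a^2 &= \frac{S_{xx}}{\Delta}\\
\sigma_b^2 &= \frac{S}{\Delta}
\end{align}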
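
These results can be checked numerically by propagating the errors through the derivatives $\partial a/\partial y_i$ and $\partial b/\partial y_i$ of the closed-form solution. The sketch below assumes independent $y_i$ and uses illustrative, unequal uncertainties; all names ending in `_demo` are made up for the example.

```python
import numpy as np

x_demo = np.linspace(1, 4, 20)
sigma_demo = np.linspace(0.05, 0.2, 20)   # unequal, illustrative uncertainties
w = 1.0 / sigma_demo ** 2                 # weights 1/sigma_i^2

S, Sx, Sxx = np.sum(w), np.sum(w * x_demo), np.sum(w * x_demo ** 2)
Delta = S * Sxx - Sx ** 2

# Partial derivatives of a and b with respect to each y_i.
da_dy = (Sxx - Sx * x_demo) * w / Delta
db_dy = (S * x_demo - Sx) * w / Delta

# Propagate the independent errors sigma_i through the derivatives.
var_a = np.sum(sigma_demo ** 2 * da_dy ** 2)
var_b = np.sum(sigma_demo ** 2 * db_dy ** 2)
```

Up to rounding, `var_a` should equal `Sxx / Delta` and `var_b` should equal `S / Delta`. Neither depends on the measured $y_i$, only on $x_i$ and $\sigma_i$.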

## Pearson's theorem
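Suppose there are $K$ independent variables $Z_k$, each Gaussian-distributed with zero mean and unit variance. Their sum of squares
$$
\chi_K^2 = Q = \sum_{k=1}^{K} Z_k^2
$$
has the probability density
$$
P(\chi_K^2) = \frac{1}{2^{K/2}\Gamma(K/2)} \, (\chi_K^2)^{K/2 - 1} \, \exp(-\chi_K^2/2)
$$
$P$ is called the $\chi^2$ distribution with $K$ degrees of freedom. For a linear fit to $N$ data points, two parameters are fitted, so the minimized $\chi^2$ follows this distribution with $K = N - 2$.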
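
A minimal numerical illustration of the theorem, assuming standard normal $Z_k$ (zero mean, unit variance): it draws sums of squares and compares the density formula with `scipy.stats.chi2.pdf`. The value `K_demo = 5`, the seed, and the sample size are arbitrary choices for the sketch.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import chi2

def chi2_pdf(q, K):
    """Density of the sum of squares of K standard normal variables."""
    return q ** (K / 2.0 - 1) * np.exp(-q / 2.0) / (2 ** (K / 2.0) * gamma(K / 2.0))

K_demo = 5
q_grid = np.linspace(0.5, 20.0, 40)
pdf_formula = chi2_pdf(q_grid, K_demo)      # formula from the theorem
pdf_scipy = chi2.pdf(q_grid, df=K_demo)     # SciPy's chi-square density

# Monte Carlo: each row holds K_demo standard normals; sum their squares.
rng = np.random.default_rng(1)
Q_samples = np.sum(rng.standard_normal((100000, K_demo)) ** 2, axis=1)
```

The $\chi^2$ distribution has mean $K$ and variance $2K$, so `Q_samples.mean()` and `Q_samples.var()` should come out close to 5 and 10 here.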

In [97]:

from scipy.special import gamma
nchisq, binchisq = np.histogram(chisqList, bins=30)
binsizechisq = binchisq[1] - binchisq[0]
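# align='edge' places each bar on its bin's left edge rather than centering it there.
plt.bar(binchisq[:-1], nchisq/float(max(nchisq)), binsizechisq, align='edge')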

def pearson(chisq, K):
    return 1./(2**(K/2.) * gamma(K/2.)) * chisq**(K/2. - 1) * np.exp(-chisq/2)

probchisq = pearson(binchisq, len(x) - 2)
plt.plot(binchisq, probchisq/float(max(probchisq)))


