In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML('custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
## Custom imports
from scipy.stats import binom, poisson, chi2, norm, uniform
from scipy.optimize import curve_fit
from math import ceil, pi
from numpy import exp
from matplotlib.collections import PatchCollection
from matplotlib.patches import Circle, Rectangle
#from matplotlib.colors import makeMappingArray
from matplotlib.cm import jet
import pandas as pd
from tqdm import trange

In [3]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10792: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.ajmarkwick.net/">Dr. Andrew Markwick</a> - Twitter <a href="https://twitter.com/AndrewMarkwick">@AndrewMarkwick</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Prof. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

# Chapters 7-10 - Formula Sheet

### 7.2 Chi-squared test

#### 7.2.1 General formulae

The formulae below are a recap from Chapter 5.

Main formula to calculate $\chi^2$:

$$\chi^2=\sum_{i=1}^n\frac{(y_i-f(x_i))^2}{\sigma_i^2}.$$

Main formula on which the test is based:

$${\rm Prob}(\chi^2;N)=\int_{\chi^2}^{\infty}P(\chi'^2;N)d\chi'^2.$$

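The number of data points, $n$, needs to be distinguished from the number of degrees of freedom, $N$, which is $n$ reduced by the number of parameters fitted to the data.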
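
A minimal sketch of the two formulae above. The data points, uncertainties, and the model `f` are made up for illustration, and $N$ is assumed to be the number of data points minus the number of fitted parameters (zero here, since the hypothetical model has no free parameters).

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical measurements y_i at positions x_i with uncertainties sigma_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

def f(x):
    """Hypothetical model without free parameters: y = 2x."""
    return 2.0 * x

# chi^2 = sum over all n data points of the squared, normalised residuals
chi2_value = np.sum((y - f(x))**2 / sigma**2)

# Degrees of freedom: n data points minus the number of fitted parameters
n_points = len(x)
n_fitted = 0  # assumption: the model was fixed before looking at the data
N = n_points - n_fitted

# Prob(chi^2; N): integral of the chi^2 distribution from chi2_value to infinity
prob = chi2.sf(chi2_value, N)

print('chi2 = {:.3f}, N = {:d}, Prob = {:.3f}'.format(chi2_value, N, prob))
```

The survival function `chi2.sf` evaluates the upper-tail integral directly, which avoids the loss of precision of `1 - chi2.cdf(...)` for very small probabilities.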

#### 7.2.2 Application

Setting a threshold of $\chi^2$ or $\chi^2/N$ requires taking into account the corresponding probability.

As a consequence, a unique $\chi^2/N$ threshold for all $N$ does not make sense.
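
To illustrate this point, the sketch below evaluates the probability for one fixed value of $\chi^2/N$ at several values of $N$. The value 1.5 and the list of $N$ are arbitrary choices for this example.

```python
from scipy.stats import chi2

reduced_chi2 = 1.5          # arbitrary chi^2/N threshold for illustration
dof_values = [1, 10, 100]   # arbitrary numbers of degrees of freedom

probs = []
for N in dof_values:
    # Probability of obtaining a chi^2 at least as large as reduced_chi2 * N
    p = chi2.sf(reduced_chi2 * N, N)
    probs.append(p)
    print('N = {:3d}: Prob(chi2 >= {:5.1f}) = {:.4f}'.format(N, reduced_chi2 * N, p))
```

The same $\chi^2/N$ corresponds to a much smaller probability at large $N$, so a threshold should be set on the probability rather than on $\chi^2/N$ itself.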

#### 7.4.1 Comparing samples with known $\sigma$

$$x_1-x_2=0?$$ 

The variance of the difference is

$$V_{12} = \sigma_1^2 + \sigma_2^2.$$

Compare the difference, $x_1-x_2$, to the combined uncertainty $\sigma_{12} = \sqrt{V_{12}}$.
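
A short sketch of this comparison with made-up numbers. It assumes Gaussian uncertainties, so that the difference in units of $\sigma_{12}$ can be converted into a two-sided probability.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical measurements with known uncertainties
x1, sigma1 = 10.3, 0.4
x2, sigma2 = 9.5, 0.3

# Combined uncertainty of the difference: sqrt(sigma_1^2 + sigma_2^2)
sigma12 = np.sqrt(sigma1**2 + sigma2**2)

# Difference expressed in units of the combined uncertainty
n_sigma = abs(x1 - x2) / sigma12

# Two-sided probability of a difference at least this large (Gaussian assumption)
p_value = 2 * norm.sf(n_sigma)

print('difference = {:.2f} sigma, p-value = {:.3f}'.format(n_sigma, p_value))
```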

### 7.5 Kolmogorov-Smirnov test and its application to the two-sample problem

#### 7.5.1 The Kolmogorov-Smirnov test

The KS test is based on normalised cumulative distributions and evaluating their greatest difference.

$$D={\rm max}|{\rm cum}(x)-{\rm cum}(P)|.$$

This needs to be normalised for the sample size.

$$d = D \sqrt{N}.$$

The value of $d$ is then compared to a table of critical values, $c(\alpha)$, to determine the level, $\alpha$, beyond which the hypothesis that both distributions are compatible is rejected, i.e. compatibility requires $d<c(\alpha)$. (The tabulated values of $c(\alpha)$ do not need to be learned by heart.)
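
The following sketch computes $D$ and $d$ by hand for a simulated sample tested against a standard normal distribution, and cross-checks $D$ against `scipy.stats.kstest`. The sample, its size, and the random seed are assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm, kstest

# Hypothetical sample: 200 values drawn from a standard normal distribution
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

xs = np.sort(sample)
N = len(xs)
cum_P = norm.cdf(xs)  # cumulative distribution of the hypothesis at each point

# The empirical cumulative distribution jumps from (i-1)/N to i/N at the
# i-th sorted point, so both sides of each step have to be checked.
D_plus = np.max(np.arange(1, N + 1) / N - cum_P)
D_minus = np.max(cum_P - np.arange(0, N) / N)
D = max(D_plus, D_minus)

# Normalisation for the sample size
d = D * np.sqrt(N)

# Cross-check with the library implementation
result = kstest(sample, norm.cdf)
print('D = {:.4f} (scipy: {:.4f}), d = {:.3f}, p-value = {:.3f}'.format(
    D, result.statistic, d, result.pvalue))
```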

#### 7.5.2 The Kolmogorov-Smirnov test with two samples

For a two-sample test the formula becomes

$$D={\rm max}|{\rm cum}(x)-{\rm cum}(y)|,$$

with the normalisation

$$d=\sqrt{\frac{N_xN_y}{N_x+N_y}}D.$$
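
A sketch of the two-sample version, again with simulated samples whose sizes and seed are arbitrary choices. Both empirical cumulative distributions are evaluated at every observed value, and $D$ is cross-checked against `scipy.stats.ks_2samp`.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical samples of different sizes drawn from the same distribution
rng = np.random.default_rng(seed=2)
x = rng.normal(0.0, 1.0, size=150)
y = rng.normal(0.0, 1.0, size=100)
N_x, N_y = len(x), len(y)

# Evaluate both empirical cumulative distributions at all observed values
all_values = np.sort(np.concatenate([x, y]))
cum_x = np.searchsorted(np.sort(x), all_values, side='right') / N_x
cum_y = np.searchsorted(np.sort(y), all_values, side='right') / N_y

# Greatest difference and its normalisation for the two sample sizes
D = np.max(np.abs(cum_x - cum_y))
d = np.sqrt(N_x * N_y / (N_x + N_y)) * D

# Cross-check with the library implementation
result = ks_2samp(x, y)
print('D = {:.4f} (scipy: {:.4f}), d = {:.3f}'.format(D, result.statistic, d))
```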

#### 8.1.4 Type I/II errors

The two cases where there is a mismatch between the hypothesis being true or false and the decision taken based on the test are called Type I and Type II errors according to the following pattern:

| Hypothesis \ Decision | accept | reject |
|:-------------------|:----------:|:----------:|
| **true** | :) | Type I error |
| **false** | Type II error | :) |

#### 8.1.5 Significance and Power

**Significance**

Type I errors are inevitable and the rate at which they occur is called the significance.
The significance, $\alpha$, is the integral of the probability distribution of the hypothesis over the rejection region:

$$\alpha=\int_{Reject}P_H(x)dx.$$

**Power**

Considering the alternative hypothesis, we can define the integral of the probability distribution of the alternative hypothesis over the acceptance region, in other words the rate of Type II errors, as

$$\beta=\int_{Accept}P_A(x)dx,$$

or, by integrating over the rejection region as above, we get

$$1-\beta=\int_{Reject}P_A(x)dx,$$

where $1-\beta$ is called the power of the test.
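
As a sketch, the integrals for $\alpha$, $\beta$ and the power can be evaluated for two Gaussian distributions. The means, the width, and the one-sided rejection region $x>x_{\rm cut}$ are assumptions chosen for this example.

```python
from scipy.stats import norm

# Hypothetical setup: hypothesis H and alternative A are unit-width Gaussians
mu_H, mu_A, sigma = 0.0, 3.0, 1.0
x_cut = 1.645  # reject H if x > x_cut (assumed one-sided rejection region)

# Significance: integral of P_H over the rejection region (Type I error rate)
alpha = norm.sf(x_cut, loc=mu_H, scale=sigma)

# beta: integral of P_A over the acceptance region (Type II error rate)
beta = norm.cdf(x_cut, loc=mu_A, scale=sigma)

# Power: integral of P_A over the rejection region
power = norm.sf(x_cut, loc=mu_A, scale=sigma)

print('alpha = {:.3f}, beta = {:.3f}, power = {:.3f}'.format(alpha, beta, power))
```

Moving `x_cut` to larger values lowers $\alpha$ at the cost of a larger $\beta$, i.e. a lower power.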

#### 8.2.1 Hypothesis tests with a discrete distribution

A Poisson test of the hypothesis that a counting experiment results in a count compatible with a certain mean $\lambda$ or smaller requires
$$1-\alpha\lt\int_{Accept}Poisson(x;\lambda)dx=\sum_{x=0}^{n}Poisson(x;\lambda)$$
for significance $\alpha$, where $n$ is the upper limit of the acceptance region.
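
A sketch of how the limit $n$ can be found numerically. The values of $\lambda$ and $\alpha$ are made up, and taking the smallest $n$ that satisfies the inequality is an assumption of this example.

```python
from scipy.stats import poisson

lam = 3.0     # hypothetical mean of the hypothesis
alpha = 0.05  # chosen significance

# Increase n until the summed Poisson probability from 0 to n exceeds 1 - alpha
n_limit = 0
while poisson.cdf(n_limit, lam) <= 1 - alpha:
    n_limit += 1

print('Accept counts up to n = {:d}; sum = {:.4f}'.format(
    n_limit, poisson.cdf(n_limit, lam)))
```

Because the distribution is discrete, the sum over the acceptance region generally overshoots $1-\alpha$, so the actual Type I error rate is smaller than the nominal $\alpha$.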

#### 9.1.3 Subjective probability

$$P({\rm theory}\,|\,{\rm result})=\frac{P({\rm result}\,|\,{\rm theory})}{P({\rm result}\,|\,{\rm theory})P({\rm theory})+P({\rm result}\,|\,{\rm not~theory})[1-P({\rm theory})]}P({\rm theory}).$$

If a result is equally likely regardless of whether or not the theory is true, i.e. $P({\rm result}\,|\,{\rm theory})=P({\rm result}\,|\,{\rm not~theory})$, there is no information gain as this results in $P({\rm theory}\,|\,{\rm result})=P({\rm theory}).$

The other extreme is that the result is much more likely to occur if the theory is true, i.e. $P({\rm result}\,|\,{\rm theory})\gg P({\rm result}\,|\,{\rm not~theory})$, which leads to the observation of the result being highly predictive as $P({\rm theory}\,|\,{\rm result})\approx 1$.
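
The two limiting cases can be checked numerically with a small helper. The function name `posterior` and the input probabilities are hypothetical and chosen for illustration only.

```python
def posterior(prior, p_result_given_theory, p_result_given_not_theory):
    """Return P(theory | result) from the formula above (hypothetical helper)."""
    numerator = p_result_given_theory * prior
    denominator = numerator + p_result_given_not_theory * (1.0 - prior)
    return numerator / denominator

# Result equally likely with or without the theory: no information gain
no_gain = posterior(prior=0.3, p_result_given_theory=0.4,
                    p_result_given_not_theory=0.4)

# Result much more likely if the theory is true: posterior close to 1
predictive = posterior(prior=0.5, p_result_given_theory=0.9,
                       p_result_given_not_theory=0.001)

print('equal likelihoods: {:.3f}'.format(no_gain))
print('result favours theory: {:.4f}'.format(predictive))
```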

In [4]:
# get the CL corresponding to a given range +/- n sigma
def CLfromSigma(n):
    return norm.cdf(n)-norm.cdf(-n)

# get the +/- n sigma range corresponding to a given CL
def SigmaFromCL(cl):
    return norm.ppf(1-0.5*(1-cl))

print('sigma | C.L.       | to remember')
print('-------------------------------------')
# Each row: (n sigma, extra decimals beyond the leading nines, rule of thumb)
rows = [(1, 3, '68%'),
        (2, 2, '95%'),
        (3, 2, '99.7%'),
        (4, 2, '< 1 in 10,000'),
        (5, 2, '< 1 in 1,000,000')]
for n, extra, label in rows:
    cl = CLfromSigma(n)
    e = 1-cl
    # number of decimals needed to resolve the deviation of the C.L. from 1
    digits = abs(ceil(np.log10(e)))+extra
    print('{:5d} | {:<10.{width}f} | {}'.format(n,cl,label,width=digits))

    
print()
print('C.L. | sigma')
print('------------')
for cl in [0.90,0.95]:
    n = SigmaFromCL(cl)
    print('{:4.2f} | {:.2f}'.format(cl,n))

sigma | C.L.       | to remember
-------------------------------------
    1 | 0.683      | 68%
    2 | 0.954      | 95%
    3 | 0.9973     | 99.7%
    4 | 0.999937   | < 1 in 10,000
    5 | 0.99999943 | < 1 in 1,000,000

C.L. | sigma
------------
0.90 | 1.64
0.95 | 1.96


### 9.3 Examples of confidence intervals

#### 9.3.1 Binomial confidence intervals

The probability covered by the range $k_-$ to $k_+$ is _at least_ $C$. This is ensured by the following constructions

$$\sum_{k=0}^{k_+}P(k;p,n)\geq 1-(1-C)/2,$$

and

$$\sum_{k=k_-}^{n}P(k;p,n)\geq 1-(1-C)/2.$$

If we are to construct bands with $C=0.9$, these two equations mean that we have to construct one-sided intervals that each cover at least $0.95$.
Their intersection, i.e. the range $k_-$ to $k_+$ will then cover at least $0.9$.

Finally, if $m$ successes are observed, the limits on the true probability interval can be assigned with $p_-$ and $p_+$ given by

$$\sum_{k=m+1}^{n}P(k;p_+,n)= 1-(1-C)/2,$$

and

$$\sum_{k=0}^{m-1}P(k;p_-,n)= 1-(1-C)/2.$$

In practice, these are the outward-facing corners of the confidence belt at a position $k=m$. These are also known as the _Clopper-Pearson confidence limits_.
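
A minimal sketch that solves the two equations for $p_-$ and $p_+$ numerically with a root finder. The helper name `clopper_pearson`, the bracketing interval, and the example values $m=3$, $n=10$ are assumptions made for this illustration.

```python
from scipy.optimize import brentq
from scipy.stats import binom

def clopper_pearson(m, n, C=0.9):
    """Return (p_minus, p_plus) for m observed successes in n trials."""
    tail = (1.0 - C) / 2.0
    eps = 1e-12  # keep the root search strictly inside (0, 1)

    # Upper limit: sum_{k=m+1}^{n} P(k; p_+, n) = 1 - tail,
    # which is the same as P(k <= m; p_+, n) = tail.
    if m == n:
        p_plus = 1.0
    else:
        p_plus = brentq(lambda p: binom.cdf(m, n, p) - tail, eps, 1.0 - eps)

    # Lower limit: sum_{k=0}^{m-1} P(k; p_-, n) = 1 - tail,
    # which is the same as P(k >= m; p_-, n) = tail.
    if m == 0:
        p_minus = 0.0
    else:
        p_minus = brentq(lambda p: binom.sf(m - 1, n, p) - tail, eps, 1.0 - eps)

    return p_minus, p_plus

p_minus, p_plus = clopper_pearson(3, 10, C=0.9)
print('m = 3, n = 10: p- = {:.3f}, p+ = {:.3f}'.format(p_minus, p_plus))
```

The cases $m=0$ and $m=n$ are handled separately because one of the two equations then has no solution inside the interval $(0,1)$.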

#### 9.3.2 Poisson confidence intervals

To construct intervals of confidence level $C$, we need the greatest value of $k_-$ that satisfies for a given $\lambda$

$$\sum_{k=k_-}^\infty P(k;\lambda)\geq 1-(1-C)/2.$$

This is equivalent to

$$\sum_{k=0}^{k_--1} P(k;\lambda) \leq (1-C)/2,$$

which is easier to calculate.

Accordingly, we require the smallest $k_+$ that satisfies for a given $\lambda$

$$\sum_{k=0}^{k_+} P(k;\lambda) \geq 1-(1-C)/2.$$
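
The two conditions can be sketched as simple loops over the cumulative Poisson distribution. The helper name `poisson_band` and the example values are hypothetical.

```python
from scipy.stats import poisson

def poisson_band(lam, C=0.9):
    """Return (k_minus, k_plus) of the horizontal interval for a given lambda."""
    tail = (1.0 - C) / 2.0

    # Greatest k_- with sum_{k=0}^{k_- - 1} P(k; lam) <= tail:
    # keep increasing k_- while including the next term still satisfies this.
    k_minus = 0
    while poisson.cdf(k_minus, lam) <= tail:
        k_minus += 1

    # Smallest k_+ with sum_{k=0}^{k_+} P(k; lam) >= 1 - tail
    k_plus = 0
    while poisson.cdf(k_plus, lam) < 1.0 - tail:
        k_plus += 1

    return k_minus, k_plus

k_minus, k_plus = poisson_band(5.0, C=0.9)
print('lambda = 5: k- = {:d}, k+ = {:d}'.format(k_minus, k_plus))
```

Repeating this for a range of $\lambda$ values traces out the confidence belt.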

### 10.1 Coverage

#### 10.1.1 Definition of coverage

The construction of confidence belts, which we discussed previously, is based on defining horizontal intervals according to a certain confidence level $C$. These can be constructed as central confidence intervals according to

$$P(x<x_1|\mu)=P(x>x_2|\mu)=(1-C)/2,$$

or as upper confidence limit intervals

$$P(x<x_1|\mu)=1-C.$$

For a given measured value of $x_0$, these then lead to an interval for $\mu$ with

$$P(\mu\in[\mu_1,\mu_2])=C.$$

This statement means that the unknown true value of $\mu$, $\mu_t$, lies within the interval $[\mu_1,\mu_2]$ in a fraction $C$ of the experiments conducted.
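
The coverage statement can be checked with a small simulation. This sketch assumes Gaussian measurements with known $\sigma$ and central intervals; the true value, number of pseudo-experiments, and seed are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=3)

mu_true, sigma, C = 2.0, 1.0, 0.9   # hypothetical true value and resolution
n_experiments = 100_000

# Half-width of a central interval with confidence level C for a Gaussian
half_width = norm.ppf(1.0 - (1.0 - C) / 2.0) * sigma

# Each pseudo-experiment measures x_0 and quotes [mu_1, mu_2]
x0 = rng.normal(mu_true, sigma, size=n_experiments)
mu_1 = x0 - half_width
mu_2 = x0 + half_width

# Fraction of experiments whose interval contains the true value
coverage = np.mean((mu_1 <= mu_true) & (mu_true <= mu_2))
print('coverage = {:.4f} (expected {:.2f})'.format(coverage, C))
```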

#### 10.1.2 Measurement of a constrained quantity

A Bayesian construction uses a normalisation that takes the physical limit of $\mu>0$ into account:

$$P(\mu|x)=\frac{e^{-(x-\mu)^2/2\sigma^2}}{\int_0^\infty e^{-(x-\mu')^2/2\sigma^2}d\mu'}\quad(\mu>0).$$

This construction will then lead to one limit being zero, i.e. we set an upper limit. 
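
A sketch of the resulting upper limit, obtained by requiring that the posterior integrated from zero to $\mu_{\rm up}$ equals $C$. The helper name `bayesian_upper_limit` and the example inputs are assumptions; the closed form follows from writing the integrals in terms of the Gaussian cumulative distribution $\Phi$.

```python
from scipy.stats import norm

def bayesian_upper_limit(x, sigma, C=0.9):
    """Upper limit mu_up with integral_0^mu_up P(mu|x) dmu = C."""
    # Fraction of the unconstrained Gaussian in the unphysical region mu < 0
    below_zero = norm.cdf(-x / sigma)
    # Normalisation: integral of the Gaussian over mu > 0, equal to Phi(x/sigma)
    normalisation = norm.cdf(x / sigma)
    # Solve Phi((mu_up - x)/sigma) = below_zero + C * normalisation for mu_up
    return x + sigma * norm.ppf(below_zero + C * normalisation)

# A measurement in the unphysical region still gives a positive upper limit
limit_negative = bayesian_upper_limit(x=-1.0, sigma=1.0, C=0.9)

# Far from the boundary the constraint has no visible effect
limit_far = bayesian_upper_limit(x=10.0, sigma=1.0, C=0.9)

print('x = -1: mu < {:.3f}'.format(limit_negative))
print('x = 10: mu < {:.3f}'.format(limit_far))
```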

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>