In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML('custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

# Chapter 3 - Summary

## 3.1 Law of large numbers and central limit theorem

### 3.1.1 The law of large numbers

The law of large numbers essentially says that:

\begin{equation}
  \langle X_N \rangle = \frac{1}{N} \sum_{i=1}^{N} X_i \to \mu \quad {\rm for} \quad N \to \infty
\end{equation}

That is, the sample mean converges to the true (population) mean as the number of samples tends to infinity.
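
The convergence can be checked numerically. The following is a minimal sketch, assuming an exponential parent distribution with true mean $\mu = 2$; the distribution, the seed and the sample size are illustrative choices, not part of the course material.

```python
import numpy as np

# Assumption: an exponential distribution with true mean mu = 2.0 stands in
# for the parent population; any distribution with a finite mean would do.
rng = np.random.default_rng(seed=1)
mu = 2.0
samples = rng.exponential(scale=mu, size=100_000)

# Running sample mean <X_N> for N = 1, 2, ..., 100000
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

for n in (10, 1_000, 100_000):
    print(f"N = {n:>6d}: sample mean = {running_mean[n - 1]:.4f} (true mean = {mu})")
```

The printed sample means should scatter around $\mu$ for small $N$ and settle close to it for large $N$.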

### 3.1.2 The central limit theorem

The central limit theorem states that if independent random variables with finite means $\mu_i$ and variances $\sigma_i^2$ are summed, then the distribution of their sum increasingly resembles a Gaussian distribution with mean $\sum \mu_i$ and variance $\sum \sigma_i^2$ as $N$ increases. That is:
\begin{equation}
    \lim_{N \to \infty} \sum_{i=1}^N X_i \xrightarrow{d} \mathcal{N}(\sum \mu_i, \sum \sigma_i^2) \quad .
\end{equation}

(The symbol $\xrightarrow{d}$ means *convergence in distribution*; i.e. calculating the sum on the left-hand side for multiple subsets would yield different values, but these values would be distributed according to the Gaussian distribution on the right-hand side.)
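
As an illustration, the sketch below sums $N = 50$ uniform random variables many times and compares the resulting distribution with the Gaussian prediction. The uniform distribution on $[0, 1)$ (with $\mu_i = 1/2$ and $\sigma_i^2 = 1/12$), the value of $N$ and the number of trials are assumptions made for this example only.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=2)
N = 50             # number of random variables in each sum (illustrative)
n_trials = 20_000  # number of independent sums (illustrative)

# Each X_i is uniform on [0, 1): mu_i = 1/2 and sigma_i^2 = 1/12
sums = rng.uniform(0.0, 1.0, size=(n_trials, N)).sum(axis=1)

expected_mean = N * 0.5    # sum of the mu_i
expected_var = N / 12.0    # sum of the sigma_i^2

print(f"mean of sums:     {sums.mean():.3f} (expected {expected_mean:.3f})")
print(f"variance of sums: {sums.var(ddof=1):.3f} (expected {expected_var:.3f})")

# Fraction of sums within one standard deviation of the expected mean,
# compared with the same probability for an exact Gaussian
within_1sigma = np.mean(np.abs(sums - expected_mean) < math.sqrt(expected_var))
gaussian_1sigma = math.erf(1.0 / math.sqrt(2.0))
print(f"within 1 sigma:   {within_1sigma:.3f} (Gaussian {gaussian_1sigma:.3f})")
```

A histogram of `sums` (for instance with `plt.hist`) should look close to a Gaussian even though each individual $X_i$ is uniformly distributed.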

## 3.2 Parameter estimation

#### Expectation values

The expectation value generalises the arithmetic mean of a function $a(x)$ to the case where the $x$ values are not all equally probable but are distributed according to a probability distribution $\mathcal{L}(x)$.

- Continuous case: $E\left[ a(x) \right] = \int_\Omega a(x) \mathcal{L}(x) {\rm d}x \,.$

- Discrete case: $E\left[ a(x) \right] = \sum_{i=1}^{N} a(x_i) \mathcal{L}(x_i) \,.$
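
Both cases can be evaluated numerically. The sketch below is illustrative only: it assumes $a(x) = x^2$, a fair six-sided die for the discrete case (analytically $E[x^2] = 91/6$) and a standard Gaussian for the continuous case (analytically $E[x^2] = 1$).

```python
import numpy as np
import scipy.integrate
import scipy.stats

def a(x):
    """Function whose expectation value is computed (illustrative choice)."""
    return x**2

# Discrete case: fair six-sided die, L(x_i) = 1/6 for x_i = 1, ..., 6
x_values = np.arange(1, 7)
prob = np.full(x_values.size, 1.0 / 6.0)
e_discrete = np.sum(a(x_values) * prob)

# Continuous case: L(x) is the standard Gaussian probability density,
# integrated over the whole real line
e_continuous, abs_err = scipy.integrate.quad(
    lambda x: a(x) * scipy.stats.norm.pdf(x), -np.inf, np.inf
)

print(f"discrete   E[a(x)] = {e_discrete:.4f}")
print(f"continuous E[a(x)] = {e_continuous:.4f}")
```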

##### Recall
There are a number of basic properties of the expectation value and variance that are useful for assessing the quality of an estimator. You do not need to know them by heart; they would be provided in a formula sheet.

### 3.2.1 Definition

**Estimator**: a procedure applied to the data sample that gives a numerical value for a parameter and/or a property of the parent population or distribution function.

### 3.2.2 Properties

'Good' estimators should fulfil the following three criteria:

#### Consistent

\begin{equation}
  \lim_{N \to \infty} \widehat{a} = a
\end{equation}

##### Recall
For an unbiased estimator, consistency can also be tested with: $\lim_{N \to \infty} V(\widehat{a}) = 0 \,.$
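
The variance test can be tried out on the sample mean, whose variance is expected to fall as $\sigma^2 / N$. This is a minimal sketch, assuming Gaussian data with $\mu = 5$ and $\sigma = 3$ and 2000 repeated experiments per sample size; all of these numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
sigma = 3.0            # true standard deviation (illustrative)
n_experiments = 2_000  # repeated experiments per sample size (illustrative)

variances = {}
for n in (10, 100, 1_000):
    # Each row is one experiment with n measurements
    data = rng.normal(loc=5.0, scale=sigma, size=(n_experiments, n))
    estimates = data.mean(axis=1)          # one estimate per experiment
    variances[n] = estimates.var(ddof=1)   # spread of the estimator
    print(f"N = {n:>5d}: V(estimator) = {variances[n]:.5f} "
          f"(sigma^2 / N = {sigma**2 / n:.5f})")
```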

#### Unbiased

\begin{equation}
  E\left[ \widehat{a} \right] = a
\end{equation}
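
A standard example is the sample variance: dividing by $N$ gives a biased estimator, while dividing by $N - 1$ gives an unbiased one. The sketch below averages both estimators over many simulated experiments, assuming Gaussian data with a true variance of 4 and $N = 5$ measurements per experiment (illustrative values).

```python
import numpy as np

rng = np.random.default_rng(seed=5)
true_var = 4.0           # true variance of the parent distribution (illustrative)
N = 5                    # measurements per experiment (illustrative)
n_experiments = 100_000  # number of simulated experiments

data = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(n_experiments, N))

# ddof=0 divides by N (biased); ddof=1 divides by N - 1 (unbiased)
mean_biased = data.var(axis=1, ddof=0).mean()
mean_unbiased = data.var(axis=1, ddof=1).mean()

print(f"divide by N:     E[estimate] ~ {mean_biased:.3f} "
      f"(expected {(N - 1) / N * true_var:.3f})")
print(f"divide by N - 1: E[estimate] ~ {mean_unbiased:.3f} "
      f"(expected {true_var:.3f})")
```

Averaging over many experiments approximates the expectation value $E\left[ \widehat{a} \right]$, so only the $N - 1$ version should reproduce the true variance.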

#### Efficient

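The estimator has a small variance: of two estimators of the same parameter, the one with the smaller variance $V(\widehat{a})$ is the more efficient.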
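
For Gaussian data, both the sample mean and the sample median estimate the centre of the distribution, but the mean has the smaller variance. The sketch below compares them, assuming a standard Gaussian parent distribution and $N = 101$ measurements per experiment (illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(seed=6)
N = 101                 # measurements per experiment (illustrative)
n_experiments = 20_000  # number of simulated experiments

data = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, N))

# Variance of each estimator across the repeated experiments
var_mean = data.mean(axis=1).var(ddof=1)
var_median = np.median(data, axis=1).var(ddof=1)

print(f"V(sample mean)   = {var_mean:.5f}")
print(f"V(sample median) = {var_median:.5f}")
print(f"ratio median/mean = {var_median / var_mean:.2f}")
```

For large $N$ the ratio is expected to approach $\pi / 2 \approx 1.57$, so the sample mean is the more efficient estimator for Gaussian data.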

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>