## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML('custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

# Chapter 4 - Summary

## 4.1 Bayesian inference and likelihood function

### 4.1.1 Recap and likelihood function

Bayes' theorem:
\begin{eqnarray}
  P(a \mid x) &=& \frac{I(a) \mathcal{L}(x \mid a)}{E(x)} \\
         &=& \frac{I(a) \prod \mathcal{L}(x_i \mid a)}{E(x)}
\end{eqnarray}

with:

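- $x = \{x_1, x_2, \ldots, x_N\}$: the data
- $a$: the parameter (or model)
- $P(a \mid x)$: the posterior probability of $a$ conditional on $x$
- $I(a)$: the prior probability of $a$
- $\mathcal{L}(x \mid a)$: the likelihood of $x$ conditional on $a$; the second line above assumes the data points $x_i$ are independent, so that the likelihood factorises into the product $\prod_i \mathcal{L}(x_i \mid a)$
- $E(x)$: the evidence of $x$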

Recall that the evidence is simply the constant which ensures the posterior probability is normalised. Hence:
\begin{eqnarray}
    E(x) &=& \sum_i \mathcal{L}(x \mid a_i) I(a_i) \\
         &=& \int \mathcal{L}(x \mid a) I(a) \, {\rm d}a \,.
\end{eqnarray}

The sum applies when the parameter takes a discrete set of values $a_i$, while the integral applies when $a$ is continuous.
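As a minimal sketch of how the evidence normalises the posterior, the cell below evaluates Bayes' theorem on a grid of values for a continuous parameter, using hypothetical radioactive-decay data (independent, exponentially distributed decay times with mean lifetime $a = \tau$) and assuming a flat prior over the grid. The simulated data, grid range and variable names are illustrative assumptions, and the integral for $E(x)$ is approximated by a Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: 50 decay times from an exponential with true tau = 2
x = rng.exponential(scale=2.0, size=50)

# Grid of parameter values a (here the mean lifetime tau)
a_grid = np.linspace(0.5, 5.0, 2001)
da = a_grid[1] - a_grid[0]

# ln L(x|a) for independent data: sum_i ln[(1/a) exp(-x_i/a)]
log_like = -x.size * np.log(a_grid) - x.sum() / a_grid

# Flat prior I(a) over the grid (an assumption of this sketch)
prior = np.ones_like(a_grid)

# I(a) L(x|a), rescaled by exp(-max ln L) to avoid numerical underflow
unnorm = prior * np.exp(log_like - log_like.max())

# Evidence as a Riemann sum over a; it carries the same constant rescaling,
# which cancels in the ratio below
evidence = np.sum(unnorm) * da
posterior = unnorm / evidence

print(np.sum(posterior) * da)   # should be 1 up to floating-point rounding
```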

### 4.1.2 Maximum a posteriori and maximum likelihood

#### Maximum a posteriori (MAP)

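\begin{eqnarray}
  \widehat{a}_{\rm MAP} &=& \arg \max_{a} P(a \mid x) \\
                        &=& \arg \max_{a} \ln P(a \mid x) \,,
\end{eqnarray}

where the second line holds because the logarithm is monotonically increasing. In practice $\widehat{a}$ is found by solving

\begin{equation}
  \left[ \frac{\partial P(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0
  \quad \Longleftrightarrow \quad
  \left[ \frac{\partial \ln P(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \,,
\end{equation}

and checking that the second derivative is negative there, so that the stationary point is a maximum.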

##### Recall
The value of $E(x)$ is irrelevant for finding the MAP: it does not depend on $a$, so it only rescales $P(a \mid x)$ and does not move its maximum.
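This can be checked numerically. The sketch below, again with hypothetical exponential decay times and an assumed Gaussian prior on $a$ (centre 1.5, width 0.5, chosen only for illustration), locates the maximum of $I(a)\,\mathcal{L}(x \mid a)$ with and without dividing by the evidence; `scipy.special.logsumexp` is used to compute $\ln E(x)$ stably.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=20)       # hypothetical decay times

a_grid = np.linspace(0.5, 5.0, 4001)
da = a_grid[1] - a_grid[0]

log_like = -x.size * np.log(a_grid) - x.sum() / a_grid
# Hypothetical informative prior: Gaussian in a with mean 1.5 and width 0.5
log_prior = stats.norm.logpdf(a_grid, loc=1.5, scale=0.5)

# ln[I(a) L(x|a)]: the log-posterior without subtracting ln E(x)
log_unnorm = log_like + log_prior
# ln E(x) from a Riemann sum, computed stably with logsumexp
log_evidence = logsumexp(log_unnorm) + np.log(da)
log_post = log_unnorm - log_evidence

a_map_unnorm = a_grid[np.argmax(log_unnorm)]
a_map_norm = a_grid[np.argmax(log_post)]
# Both should coincide: E(x) only shifts ln P(a|x) by a constant
print(a_map_unnorm, a_map_norm)
```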

#### Maximum likelihood estimation (MLE)

\begin{eqnarray}
  \widehat{a}_{\rm MLE} &=& \arg \max_{a} \mathcal{L}(x \mid a) \\
                        &=& \arg \max_{a} \ln \mathcal{L}(x \mid a) \,,
\end{eqnarray}

which is found by solving $\left[ \partial \ln \mathcal{L}(x \mid a) / \partial a \right]_{a = \widehat{a}} = 0$ (or the same condition on $\mathcal{L}$ itself). The log form is usually easier to work with, since the product over independent data points becomes a sum.
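When the derivative condition cannot be solved analytically, the MLE can instead be found by numerically minimising $-\ln \mathcal{L}$. The sketch below does this with `scipy.optimize.minimize_scalar` for hypothetical exponential decay times, where the analytic answer $\widehat{\tau} = \bar{x}$ (the sample mean) is known and serves as a check. The data and the search bounds are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)   # hypothetical decay times

def neg_log_like(tau):
    # -ln L(x|tau) for independent exponential decay times
    return x.size * np.log(tau) + x.sum() / tau

# Bounded 1D minimisation over an assumed search interval for tau
res = minimize_scalar(neg_log_like, bounds=(0.1, 10.0), method="bounded")
tau_mle = res.x

print(tau_mle, x.mean())   # numerical vs analytic MLE
```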

##### Recall
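The MAP reduces to the MLE when the prior $I(a)$ is independent of the parameter $a$ (i.e. a constant, or flat, prior), since maximising $I(a) \mathcal{L}(x \mid a)$ is then the same as maximising $\mathcal{L}(x \mid a)$.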
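A quick numerical illustration with hypothetical decay-time data: on a grid of $a$ values, adding a constant log-prior leaves the maximum at the analytic MLE $\widehat{\tau} = \bar{x}$, whereas an assumed informative Gaussian prior (centre 1.0, width 0.3, purely illustrative) moves the MAP away from it.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=30)       # hypothetical decay times
tau_mle = x.mean()                            # analytic MLE of tau

a_grid = np.linspace(0.5, 6.0, 11001)
da = a_grid[1] - a_grid[0]
log_like = -x.size * np.log(a_grid) - x.sum() / a_grid

# Constant prior: ln I(a) is the same for every a
log_prior_flat = np.zeros_like(a_grid)
# Hypothetical informative prior (unnormalised Gaussian in a)
log_prior_gauss = -0.5 * ((a_grid - 1.0) / 0.3) ** 2

a_map_flat = a_grid[np.argmax(log_like + log_prior_flat)]
a_map_gauss = a_grid[np.argmax(log_like + log_prior_gauss)]

print(tau_mle, a_map_flat, a_map_gauss)
```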

## 4.2 Maximum likelihood as an estimator

The MLE can be thought of as a good estimator in most cases:
1. Generally (though not strictly always) consistent
2. Generally biased, but the bias vanishes in the limit of large $N$, as expected for a consistent estimator
3. Generally invariant
4. Generally efficient
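The bias in point 2 can be seen in a small simulation, sketched below under assumed values (true decay rate $\lambda = 0.5$, 20 000 simulated experiments per sample size). For exponential data the MLE of the decay rate is $\widehat{\lambda} = 1/\bar{x}$, whose expectation is exactly $N\lambda/(N-1)$, so its bias $\lambda/(N-1)$ shrinks as $N$ grows. By contrast $\widehat{\tau} = \bar{x}$ is unbiased, which shows that unbiasedness is not preserved when the parameter is transformed.

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true = 0.5        # hypothetical true decay rate (tau = 2)
n_trials = 20_000     # simulated experiments per sample size

biases = {}
for N in (5, 50, 200):
    # Each row is one simulated experiment with N decay times
    x = rng.exponential(scale=1.0 / lam_true, size=(n_trials, N))
    lam_hat = 1.0 / x.mean(axis=1)            # MLE of lambda in each experiment
    biases[N] = lam_hat.mean() - lam_true
    expected = lam_true / (N - 1)             # exact bias, since E[lam_hat] = N lam / (N - 1)
    print(f"N = {N:3d}: simulated bias = {biases[N]:+.4f}, expected = {expected:+.4f}")
```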

### 4.2.1 Maximum likelihood invariance

The invariance property implies that if $\widehat{a}$ is the MLE of a parameter $a$, then the MLE of a function $f(a)$ of this parameter is simply that function evaluated at $\widehat{a}$; that is, $\widehat{f(a)} = f(\widehat{a})$.<br><br>

##### Recall
Consider the radioactive decay example: if you have calculated the MLE of the mean lifetime $\tau$ and then want the MLE of the decay rate $\lambda = 1/\tau$ instead, you can simply calculate $\widehat{\lambda} = 1/\widehat{\tau}$ without having to go back to taking derivatives.
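To check the invariance property numerically, the sketch below maximises the likelihood directly in terms of the decay rate $\lambda$, using hypothetical simulated decay times, and compares the result with $1/\widehat{\tau}$ where $\widehat{\tau} = \bar{x}$. The data and the bounds passed to `scipy.optimize.minimize_scalar` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)   # hypothetical decay times

tau_hat = x.mean()   # analytic MLE of the mean lifetime tau

def neg_log_like_lambda(lam):
    # -ln L for the same data, parameterised by the decay rate lambda = 1/tau
    return -x.size * np.log(lam) + lam * x.sum()

lam_hat = minimize_scalar(neg_log_like_lambda, bounds=(0.01, 10.0),
                          method="bounded").x

print(lam_hat, 1.0 / tau_hat)   # direct MLE vs invariance shortcut
```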

### 4.2.2 Variance on maximum likelihood estimators

The Minimum Variance Bound (MVB) provides a useful way to determine the variance of the MLE since, being (generally) efficient, the MLE attains this bound:

\begin{equation}
  V(\widehat{a}) = \sigma^2_{\widehat{a}} = \left[ \left( -\frac{\partial^2 \ln \mathcal{L}}{\partial a^2} \right)^{-1} \right]_{\widehat{a}} \,.
\end{equation}
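For the radioactive decay example the MVB can be worked out by hand: $\ln \mathcal{L} = -N \ln \tau - \sum_i x_i / \tau$, so at $\widehat{\tau} = \bar{x}$ one finds $-\partial^2 \ln \mathcal{L}/\partial \tau^2 = N/\widehat{\tau}^2$ and hence $\sigma_{\widehat{\tau}} = \widehat{\tau}/\sqrt{N}$. The sketch below, with hypothetical simulated data and an arbitrarily chosen finite-difference step, estimates the second derivative numerically and compares the two.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)   # hypothetical decay times
N = x.size

def log_like(tau):
    # ln L(x|tau) for independent exponential decay times
    return -N * np.log(tau) - x.sum() / tau

tau_hat = x.mean()        # analytic MLE
h = 1e-4 * tau_hat        # finite-difference step (an arbitrary small choice)

# Central finite-difference estimate of d^2 ln L / d tau^2 at tau_hat
d2 = (log_like(tau_hat + h) - 2 * log_like(tau_hat) + log_like(tau_hat - h)) / h**2

# MVB: sigma^2 = (-d^2 ln L / d tau^2)^(-1)
sigma_tau = np.sqrt(-1.0 / d2)

print(sigma_tau, tau_hat / np.sqrt(N))    # numerical vs analytic
```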

In the case of multiple parameters $a_1, a_2, \ldots$, this becomes a covariance matrix:

\begin{equation}
  V_{ij} = \left[ -\frac{\partial^2 \ln \mathcal{L}}{\partial a_i \partial a_j} \right]^{-1}_{\widehat{a}} \,,
\end{equation}

where the inverse is the matrix inverse of $-\partial^2 \ln \mathcal{L} / \partial a_i \partial a_j$ (the negative Hessian), not the reciprocal of each element. The diagonal elements ($i = j$) are the regular variances of the individual parameters, while the off-diagonal elements ($i \neq j$) are the covariances, i.e. the _'joint'_ errors between each pair of parameters.<br><br>
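As a sketch of the multi-parameter case, the cell below fits the mean $\mu$ and width $\sigma$ of hypothetical Gaussian-distributed measurements. The MLEs are the sample mean and the standard deviation with a $1/N$ normalisation; the negative Hessian of $\ln \mathcal{L}$ is estimated by central finite differences (the helper `hessian` and its step size are illustrative choices, not a library function) and then inverted as a matrix. At the maximum the analytic result is a diagonal covariance matrix with $V_{\mu\mu} = \widehat{\sigma}^2/N$ and $V_{\sigma\sigma} = \widehat{\sigma}^2/(2N)$.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=10.0, scale=3.0, size=200)    # hypothetical measurements
N = x.size

def log_like(p):
    mu, sigma = p
    # ln L for independent Gaussian data, dropping the constant -N/2 ln(2 pi)
    return -N * np.log(sigma) - np.sum((x - mu) ** 2) / (2 * sigma**2)

# Analytic MLEs: sample mean and standard deviation with 1/N (ddof=0)
p_hat = np.array([x.mean(), x.std()])

def hessian(f, p, rel_step=1e-4):
    # Hypothetical helper: central finite-difference Hessian of f at p
    n = p.size
    H = np.empty((n, n))
    h = rel_step * np.abs(p)
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = h[i]
            ej = np.zeros(n)
            ej[j] = h[j]
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4 * h[i] * h[j])
    return H

# Covariance matrix: matrix inverse of the negative Hessian, not element-wise
cov = np.linalg.inv(-hessian(log_like, p_hat))
expected = np.diag([p_hat[1]**2 / N, p_hat[1]**2 / (2 * N)])

print(cov)
print(expected)
```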

##### Recall
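If the log-likelihood is Taylor-expanded to second order around its maximum (the first-order term vanishes there), it is a parabola,
\begin{equation}
  \ln \mathcal{L}(a) \simeq \ln \mathcal{L}(\widehat{a}) - \frac{(a - \widehat{a})^2}{2 \sigma^2_{\widehat{a}}} \,,
\end{equation}
and the likelihood itself is therefore a Gaussian function of $a$ with standard deviation $\sigma_{\widehat{a}}$.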
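A numerical check of the parabolic approximation, for hypothetical exponential decay times with a large assumed sample ($N = 10\,000$): for this model $-\partial^2 \ln \mathcal{L}/\partial \tau^2 = N/\widehat{\tau}^2$ at the maximum, so $\sigma_{\widehat{\tau}} = \widehat{\tau}/\sqrt{N}$, and moving $k$ standard deviations away from $\widehat{\tau}$ should lower $\ln \mathcal{L}$ by approximately $k^2/2$. In particular $\Delta \ln \mathcal{L} \simeq -1/2$ marks the $1\sigma$ interval.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=10_000)   # hypothetical large data set
N = x.size

def log_like(tau):
    # ln L(x|tau) for independent exponential decay times
    return -N * np.log(tau) - x.sum() / tau

tau_hat = x.mean()
sigma = tau_hat / np.sqrt(N)     # MVB error for this model

# Compare the exact drop in ln L with the parabolic prediction -k^2/2
deltas = {}
for k in (-2, -1, 1, 2):
    deltas[k] = log_like(tau_hat + k * sigma) - log_like(tau_hat)
    print(f"{k:+d} sigma: exact {deltas[k]:+.4f}, parabola {-0.5 * k**2:+.4f}")
```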

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>