In [3]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML(filename='custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [4]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

*Note: You are not expected to understand all the computer coding presented with the solutions. You should understand the mathematical concepts and be able to recover the results. We present the computer code so you can learn coding tricks (e.g. read data, compute useful values, fit and plot data) should you be interested.*

# Chapter 4 - Problem Sheet

## Problem 1

### Case study: Bayesian M&M candies

The year 1995 marked the introduction of the blue M&M's. This new addition changed the proportion of colours in a bag:

| Colour  | Pre-1995 | Post-1995 |
| :-----: | :------: | :-------: |
| Brown   | 30%      | 13%       |
| Yellow  | 20%      | 14%       |
| Red     | 20%      | 13%       |
| Green   | 10%      | 20%       |
| Orange  | 10%      | 16%       |
| Tan     | 10%      | 0%        |
| Blue    | 0%       | 24%       |

Your best friend has two bags of M&M’s, and tells you that one is vintage from 1994 and the other one is brand new. They will not tell you which is which, but they give you one M&M from each bag. One is yellow and one is green.

What is the probability that the yellow one came from the vintage bag?

## Solution 1

We need Bayesian inference to answer this problem.

Let us assume that the yellow one comes from bag 1 and the green one from bag 2.

There are two possible hypotheses, $H \in \{A, B\}$:

- $A$: Bag 1 (yellow) is vintage and bag 2 (green) is new
- $B$: Bag 1 (yellow) is new and bag 2 (green) is vintage

Bayes' theorem states:

\begin{equation}
  P(H \mid D) = \frac{P(H) P(D \mid H)}{P(D)}
\end{equation}

Let us construct a table in which we fill in the terms of Bayes' theorem, where $D$ is the observed data (a yellow M&M from bag 1 and a green one from bag 2):

| H    | Prior | Likelihood   | $P(H)\,P(D \mid H)$ | Posterior     |
| :--: | :---: | :----------: | :---------: | :-----------: |
| A    | 0.5   | (0.20)(0.20) | 0.020       | 0.020 / 0.027 |
| B    | 0.5   | (0.14)(0.10) | 0.007       | 0.007 / 0.027 |

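Therefore, $P(A \mid D) = 0.020 / 0.027 \approx 74.1\%$ and $P(B \mid D) = 0.007 / 0.027 \approx 25.9\%$. The probability that the yellow M&M came from the vintage bag is about 74%.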

I would definitely choose to eat from bag 2, the one from which the green M&M came, since it is most likely the new bag.

_This was an example of maximum a posteriori (MAP) estimation for discrete data._
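
The Bayesian table of this solution can be reproduced in a few lines of Python. This is only a minimal sketch: the dictionary names (`pre_1995`, `post_1995`, `priors`, and so on) are our own illustrative choices, and the colour fractions are copied from the table in Problem 1.

```python
# Colour fractions copied from the table in Problem 1 (only the two colours we need).
pre_1995 = {"yellow": 0.20, "green": 0.10}
post_1995 = {"yellow": 0.14, "green": 0.20}

# Equal prior probability for both hypotheses.
priors = {"A": 0.5, "B": 0.5}

# Likelihood of the data D (yellow from bag 1, green from bag 2) under each hypothesis.
likelihoods = {
    "A": pre_1995["yellow"] * post_1995["green"],  # bag 1 vintage, bag 2 new
    "B": post_1995["yellow"] * pre_1995["green"],  # bag 1 new, bag 2 vintage
}

# Numerator of Bayes' theorem, then normalisation by the evidence P(D).
unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(unnormalised.values())
posteriors = {h: unnormalised[h] / evidence for h in priors}

for h, p in posteriors.items():
    print(f"P({h} | D) = {p:.3f}")
```

Running this prints `P(A | D) = 0.741` and `P(B | D) = 0.259`, in agreement with the table.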

## Problem 2

### MLE of a Gaussian with unknown mean $\mu$ and standard deviation $\sigma$

Suppose that you have a set of $N$ measurements $\{x_i\}$ that are all drawn from a Gaussian distribution having an unknown $\mu$ and $\sigma$, where both quantities are the same for all measurements.<br><br>

#### Task 1
Write an expression for the likelihood of a single measurement, $\mathcal{L}(x_i \mid \mu, \sigma)$.<br><br>

#### Task 2
Show that the log-likelihood for the *full dataset* can be written as:
\begin{equation}
  \ln \mathcal{L} = -\frac{1}{2 \sigma^2} \sum_i \left( x_i - \mu \right)^2 - N \ln \sigma + C \,,
\end{equation}
where $C$ is a constant that does not depend on the conditional parameters $\mu$ and $\sigma$.<br><br>

#### Task 3
Demonstrate that the maximum likelihood estimators $\widehat{\mu}$ and $\widehat{\sigma}$ are nothing else than the arithmetic mean and the uncorrected standard deviation, respectively.

## Solution 2

### Task 1
We can express the likelihood of a single measurement as:
\begin{equation}
  \mathcal{L}(x_i \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp{\left( \frac{-(x_i-\mu)^2}{2\sigma^2} \right)} \,.
\end{equation}

### Task 2
For the full dataset, the likelihood becomes:
\begin{equation}
  \mathcal{L}(x \mid \mu, \sigma) = \prod_i \mathcal{L}(x_i \mid \mu, \sigma) \,.
\end{equation}

We therefore obtain the joint log-likelihood of the dataset as:
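\begin{eqnarray}
  \ln \mathcal{L} &=& \sum_i \ln \mathcal{L}(x_i \mid \mu, \sigma) \\
                  &=& \sum_i \left[ -\ln \sigma - \frac{1}{2} \ln (2\pi) - \frac{\left( x_i - \mu \right)^2}{2 \sigma^2} \right] \\
                  &=& -N \ln \sigma - \frac{N}{2} \ln (2\pi) -\frac{1}{2 \sigma^2} \sum_i \left( x_i - \mu \right)^2 \,.
\end{eqnarray}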

The second term, $-\frac{N}{2} \ln (2\pi)$, depends only on the number of measurements $N$ and not on $\mu$ or $\sigma$; it is the constant $C$ of the problem statement.<br><br>
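
As a sanity check, the term-by-term expression derived above can be compared numerically with the sum of log-densities returned by `scipy.stats.norm.logpdf`. The sketch below is illustrative only: the synthetic dataset, the random seed, the trial values of `mu` and `sigma`, and the helper name `log_likelihood` are all our own assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic measurements; the seed and the true parameters are arbitrary choices.
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=2.0, scale=0.5, size=50)

def log_likelihood(x, mu, sigma):
    """Joint Gaussian log-likelihood, written term by term as in the derivation."""
    n = len(x)
    return (-n * np.log(sigma)
            - 0.5 * n * np.log(2.0 * np.pi)
            - np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))

# Arbitrary trial parameters (not the best-fit values).
mu, sigma = 1.8, 0.7

analytic = log_likelihood(x, mu, sigma)
reference = np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

print(np.isclose(analytic, reference))
```

The comparison should hold for any trial values with `sigma > 0`, since both expressions evaluate the same function.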

### Task 3
Since neither $\mu$ nor $\sigma$ is known, we need to find the MLEs $\widehat{\mu}$ and $\widehat{\sigma}$ simultaneously:

\begin{eqnarray}
  \left[ \frac{\partial \ln \mathcal{L}}{\partial \mu} \right]_{\widehat{\mu},\widehat{\sigma}} &=& \frac{1}{\widehat{\sigma}^2} \sum \left( x_i - \widehat{\mu} \right) = 0 \,, \\
  \left[ \frac{\partial \ln \mathcal{L}}{\partial \sigma} \right]_{\widehat{\mu},\widehat{\sigma}} &=& -\frac{N}{\widehat{\sigma}} + \sum \frac{\left( x_i - \widehat{\mu} \right)^2}{\widehat{\sigma}^3} = 0 \,.
\end{eqnarray}

From the first equation we find:
\begin{equation}
  \widehat{\mu} = \frac{1}{N} \sum x_i \equiv \langle x \rangle \,,
\end{equation}

which we recognise as the sample mean.

From the second equation, after substituting $\langle x \rangle = \widehat{\mu}$ we obtain:
\begin{equation}
  \widehat{\sigma}^2 = \frac{1}{N} \sum \left( x_i - \langle x \rangle \right)^2 \, \equiv \sigma_{\rm uncorrected}^2 \,.
\end{equation}

This last result tells us that the MLE $\widehat{\sigma}$ is the well-known uncorrected standard deviation of a sample set, which we know is a biased estimator. However, in the limit $N \to \infty$, the difference between $N$ and $N-1$ becomes negligible.
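
The analytic result can also be checked numerically by minimising the negative log-likelihood and comparing with `np.mean` and `np.std`. This is a hedged sketch rather than a definitive recipe: the synthetic data, the seed, the starting point, the tolerances, and the choice of the Nelder-Mead method with $\ln \sigma$ as the fitted parameter are all our own assumptions.

```python
import numpy as np
from scipy import optimize

# Synthetic measurements; the seed and the true parameters are arbitrary choices.
rng = np.random.default_rng(seed=2)
x = rng.normal(loc=10.0, scale=3.0, size=200)

def neg_log_likelihood(params, x):
    """Negative Gaussian log-likelihood, dropping the constant (N/2) ln(2 pi)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # fitting ln(sigma) keeps sigma positive
    return len(x) * log_sigma + np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

# Rough starting point: the median for mu and the full data range for sigma.
start = [np.median(x), np.log(x.max() - x.min())]
result = optimize.minimize(neg_log_likelihood, start, args=(x,),
                           method="Nelder-Mead",
                           options={"xatol": 1e-6, "fatol": 1e-6})
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

print("numerical MLE  :", mu_hat, sigma_hat)
print("mean, std(N)   :", np.mean(x), np.std(x, ddof=0))
print("corrected std  :", np.std(x, ddof=1))
```

The numerical optimum should agree with `np.mean(x)` and `np.std(x, ddof=0)` to within the optimiser tolerance, while `np.std(x, ddof=1)` (the $N-1$ version) is slightly larger.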




<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>