In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML('custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

*Note: You are not expected to understand all the computer coding presented with the solutions. You should understand the mathematical concepts and be able to recover the results. We present the computer code so you can learn coding tricks (e.g. read data, compute useful values, fit and plot data) should you be interested.*

# Chapter 3 - Problem Sheet

## Problem 1

### Case study: Casino's profits

Using a key concept from Chapter 3, explain why a casino would still make a profit on the *roulette* game, given that one can gamble on the ball falling on red/black or odd/even and that a winning bet pays out the same amount that was placed on the bet. It may be useful to know that (in European-style roulette) there are 37 pockets, the 37th of which is labelled *'0'*. This special pocket is considered colourless and (for the purpose of the game) neither odd nor even. _[see [Roulette](https://en.wikipedia.org/wiki/Roulette) on Wikipedia]_

## Solution 1

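The law of large numbers states that upon performing the same experiment a large number of times, the sample average converges towards the expected value. With only an equal number of red/black or odd/even pockets, both the casino and the gambler would on average neither lose nor earn money. Due to the '0' pocket, however, there is a 1 in 37 chance that the ball will fall in this pocket and generate a loss for the gambler, whichever side was chosen. For this type of bet, called an _outside_ bet, the gambler wins with probability $18/37$ and loses with probability $19/37$, so the expected value on a constant bet of £1 is $£1 \times 18/37 - £1 \times 19/37 = -£1/37 \approx -£0.027$. Over a large number of bets, the casino's average gain per bet therefore converges towards this small but positive margin.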
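The convergence described above can be checked numerically. The following is a minimal sketch, not part of the original solution: the function name `simulate_even_money_bets` and the choice of treating pockets 1 to 18 as "red" are assumptions made for illustration only (the real wheel interleaves colours, which does not change the probabilities).

```python
import numpy as np

def simulate_even_money_bets(n_spins, stake=1.0, seed=0):
    """Return the gambler's average gain per spin for a constant bet on red."""
    rng = np.random.default_rng(seed)
    # Draw pockets 0..36 uniformly (the upper bound 37 is exclusive).
    pockets = rng.integers(0, 37, size=n_spins)
    # Assumption for illustration: pockets 1..18 count as red,
    # 19..36 as black, and 0 is the colourless pocket.
    wins = (pockets >= 1) & (pockets <= 18)
    # A win earns the stake, anything else (black or 0) loses it.
    gains = np.where(wins, stake, -stake)
    return gains.mean()

expected = -1.0 / 37.0
for n_spins in (100, 10_000, 1_000_000):
    average = simulate_even_money_bets(n_spins)
    print(f"{n_spins:>9d} spins: average gain = {average:+.4f} "
          f"(expected {expected:+.4f})")
```

For a small number of spins the average gain fluctuates widely and may even be positive, whereas for a large number of spins it should settle close to $-1/37$.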

## Problem 2

### Variance with unknown mean

Let us now consider the variance for the case where the true mean $\mu$ is unknown and instead replaced by the sample mean, i.e. the estimate $\widehat{\mu} = \langle X \rangle$. This is in fact the uncorrected sample variance:

\begin{equation}
  \widehat{V(X)} \equiv \widehat{s^2_{\rm uncorr}} = \frac{1}{N} \sum \left(X_i - \langle X \rangle\right)^2 \,.
\end{equation}

Show that this is a biased estimator.

## Solution 2

We can show that this estimator is biased as its expectation value is not equal to the true value. That is: $E\left[ \widehat{s^2_{\rm uncorr}} \right] \neq \sigma^2$.

First, we should recall the definition of the variance from the list of identities above: $V(X) = E\left[X^2\right] - E\left[X\right]^2$. The uncorrected sample variance has the same form, with the expectation values replaced by sample averages: $\langle X^2 \rangle = \frac{1}{N} \sum X_i^2$ and $\langle X \rangle = \widehat{\mu}$. This enables us to rewrite the estimator as:
\begin{equation}
  \widehat{s^2_{\rm uncorr}} = \langle X^2 \rangle - \widehat{\mu}^2 \,.
\end{equation}

We can verify that this is biased as follows:
\begin{eqnarray}
  E\left[ \widehat{s^2_{\rm uncorr}} \right] &=& E\left[ \langle X^2 \rangle - \widehat{\mu}^2 \right] \\
                                             &=& E\left[ \langle X^2 \rangle \right] - E\left[ \widehat{\mu}^2 \right] \\
                                             &=& E\left[X^2 \right] - E\left[ \widehat{\mu}^2 \right] \\
                                             &=& \left(E\left[X^2 \right] - \mu^2 \right) - \left(E\left[ \widehat{\mu}^2 \right] - \mu^2 \right) \\
                                             &=& \sigma^2 - \frac{\sigma^2}{N} \\
                                             &=& \left( \frac{N-1}{N} \right) \sigma^2 \,.
\end{eqnarray}

In the above, we have used the linearity of the expectation on the second line. On the third line, we have used the fact that every $X_i$ is drawn from the same distribution, hence $E\left[ \langle X^2 \rangle \right] = \frac{1}{N} \sum E\left[ X_i^2 \right] = E\left[ X^2 \right]$. Note that $\widehat{\mu}$ is itself a random variable, as it changes from one sample to the next, so $E\left[ \widehat{\mu}^2 \right]$ cannot simply be replaced by $\widehat{\mu}^2$.

On the fourth line we added and subtracted $\mu^2$, and rearranged the original equation. This enabled us to recover the definition of the population variance, $\sigma^2$, for the first parenthesis.

We can use the Central Limit Theorem to simplify the second parenthesis of the fourth line. The CLT says that if $N$ independent random variables are drawn from the same distribution with mean and variance $\mu$ and $\sigma^2$, respectively, then for large $N$ their sum is distributed like a Gaussian with a mean equal to $N$ times the mean of a single variable and a variance equal to $N$ times the variance of a single variable:
\begin{equation}
    \lim_{N \to \infty} \sum_{i=1}^N X_i \xrightarrow{d} \mathcal{N}(N \mu, N \sigma^2) \,.
\end{equation}
If the sum of random variables drawn from the same distribution is distributed as a Gaussian with variance $N \sigma^2$, then the distribution of the average of such random variables will follow:
\begin{equation}
    \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N X_i \xrightarrow{d} \mathcal{N}(\mu, \sigma^2 / N) \,.
\end{equation}
That is, a Gaussian with mean $\mu$ and variance $\sigma^2 / N$. For independent draws, this mean and variance hold for any $N$; the CLT adds that the shape of the distribution tends to a Gaussian. Note that the average above is what we defined as $\widehat{\mu}$, so that $E\left[\widehat{\mu}\right] = \mu$. Using the notation from earlier, we can write its variance as follows:
\begin{equation}
    V(\widehat{\mu}) = E\left[\widehat{\mu}^2\right] - E\left[\widehat{\mu}\right]^2 = E\left[\widehat{\mu}^2\right] - \mu^2 = \frac{\sigma^2}{N} \,.
\end{equation}
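
The result can also be checked with a quick simulation. The following is a sketch under stated assumptions, not part of the original solution: the function name `check_variance_bias`, the Gaussian parent distribution, and the values of `mu`, `sigma` and the sample sizes are arbitrary choices made for illustration. It relies on the `ddof` argument of NumPy's `var`, which divides by $N - {\rm ddof}$.

```python
import numpy as np

def check_variance_bias(n_per_sample=5, n_samples=200_000,
                        mu=2.0, sigma=3.0, seed=0):
    """Average the sample variance over many independent samples of size N."""
    rng = np.random.default_rng(seed)
    # Each row is one sample of N = n_per_sample draws.
    samples = rng.normal(mu, sigma, size=(n_samples, n_per_sample))
    # Uncorrected sample variance: divides by N (ddof=0).
    s2_uncorr = samples.var(axis=1, ddof=0)
    # Corrected sample variance: divides by N - 1 (ddof=1).
    s2_corr = samples.var(axis=1, ddof=1)
    # Variance of the sample means, to compare with sigma^2 / N.
    var_of_mean = samples.mean(axis=1).var()
    return s2_uncorr.mean(), s2_corr.mean(), var_of_mean

n, sigma = 5, 3.0
mean_uncorr, mean_corr, var_of_mean = check_variance_bias(n_per_sample=n, sigma=sigma)
print(f"<s2_uncorr> = {mean_uncorr:.3f}  vs (N-1)/N sigma^2 = {(n - 1) / n * sigma**2:.3f}")
print(f"<s2_corr>   = {mean_corr:.3f}  vs sigma^2         = {sigma**2:.3f}")
print(f"V(mu_hat)   = {var_of_mean:.3f}  vs sigma^2 / N     = {sigma**2 / n:.3f}")
```

With these settings the uncorrected estimator should average close to $\frac{N-1}{N} \sigma^2$ rather than $\sigma^2$, while dividing by $N-1$ instead of $N$ removes the bias.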

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>