In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML(filename='custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
## Custom imports
from matplotlib.cm import jet
from math import ceil, pi
from scipy.stats import poisson, norm, binom
from matplotlib.collections import PatchCollection
from matplotlib.patches import Circle, Rectangle

In [3]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

## Problem Sheet 9

### Problem 1: Probabilities

#### Problem 1.1

In a lab experiment, 300 students measure the weight of the same object. On average, they measure a weight of 1 kg. The variance of their measurements is $10^{-4}$ kg$^2$.
- What is the error on the mean derived from the results of all students?
- The measurement uncertainty of the scales used by the students is stated as 5 g. What do you conclude from this?
- The lowest value measured by any of the students is 968 g. Does this agree with your expectations from the cohort of measurements?

#### Problem 1.2

An experiment to count muons reaching the earth's surface from cosmic rays is conducted by 120 students. The average expected count rate is 1 per cm$^2$ per minute. The students start their experiment at 15:20 on a Friday and end the count at 10:00 on the following Monday. Their detectors have a surface area of 0.5 cm by 5.0 cm.
- What average count and sample standard deviation do you expect?
- How many of the students would you expect to have a count of 200 or more above the average?

#### Solution to 1.1
- The error on the mean is the standard deviation divided by the square root of the number of measurements. The standard deviation is $\sigma=\sqrt{V}=10^{-2}$ kg. Therefore, the error on the mean is $\sigma/\sqrt{N}=10^{-2}/\sqrt{300}=0.00058$ kg.
- The stated uncertainty of 5 g is half the sample standard deviation of the students' measurements (10 g). Either the manufacturer promised more than the scales deliver, or there is a significant additional source of uncertainty that has to be taken into account.
- We would expect $99.7\%$ of the students' results to lie within $\pm3\sigma$ of the mean, i.e. within the range $970-1030$ g. Among 300 students we would therefore expect about 1 measurement ($0.3\%\times300\approx0.9$) to lie outside this range, so a lowest value of 968 g, just outside the range, agrees with expectations.
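
The numbers in this solution can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative, and it assumes the measurements are Gaussian distributed, as the solution does.

```python
from math import sqrt
from statistics import NormalDist

n_students = 300
mean_kg = 1.0
variance_kg2 = 1e-4          # variance of the measurements in kg^2

# Sample standard deviation and error on the mean
sigma_kg = sqrt(variance_kg2)
error_on_mean_kg = sigma_kg / sqrt(n_students)

# Two-sided Gaussian probability of lying outside +-3 sigma,
# and the corresponding expected number of students
p_outside_3sigma = 2 * (1 - NormalDist().cdf(3))
expected_outside = n_students * p_outside_3sigma

# Distance of the lowest measurement (968 g) from the mean, in units of sigma
lowest_kg = 0.968
z_lowest = (lowest_kg - mean_kg) / sigma_kg

print(f"sigma              = {sigma_kg:.4f} kg")
print(f"error on the mean  = {error_on_mean_kg:.5f} kg")
print(f"expected outside 3 sigma = {expected_outside:.2f} students")
print(f"lowest value lies at {z_lowest:.1f} sigma")
```

The lowest value sits only slightly beyond $3\sigma$, which is consistent with roughly one outlier being expected in a cohort of this size.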

#### Solution to 1.2
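- From Friday 15:20 to Monday 10:00 is 66 h 40 min, i.e. 4000 minutes, and the detector area is $0.5\times5.0=2.5$ cm$^2$, so the expected count is 10,000. For a Poisson-distributed count the standard deviation is therefore $\sigma=\sqrt{N}=100$.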
- A count of 200 above the mean corresponds to 2 standard deviations. We would expect $5\%$ to lie outside two standard deviations either below or above the mean, so $2.5\%$ should have a count of 200 or more above the average. This corresponds to 3 students.
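
The arithmetic can be reproduced as follows. This is a minimal sketch: the calendar date is an arbitrary Friday chosen purely for illustration (an assumption, not part of the problem), and the Poisson distribution is approximated by a Gaussian, which is reasonable for a mean of 10,000.

```python
from datetime import datetime
from math import sqrt
from statistics import NormalDist

# Arbitrary Friday/Monday pair (assumption); only the elapsed time matters
start = datetime(2024, 1, 5, 15, 20)   # a Friday, 15:20
end = datetime(2024, 1, 8, 10, 0)      # the following Monday, 10:00

minutes = (end - start).total_seconds() / 60
area_cm2 = 0.5 * 5.0
rate_per_cm2_per_min = 1.0

# Expected count and Poisson standard deviation
expected_count = rate_per_cm2_per_min * area_cm2 * minutes
sigma = sqrt(expected_count)

# One-sided Gaussian tail probability for an excess of 200 counts or more
n_students = 120
z = 200 / sigma
p_above = 1 - NormalDist().cdf(z)
expected_students = n_students * p_above

print(f"duration       = {minutes:.0f} minutes")
print(f"expected count = {expected_count:.0f} +- {sigma:.0f}")
print(f"P(excess >= 200) = {p_above:.4f}")
print(f"expected number of students = {expected_students:.1f}")
```

The exact one-sided Gaussian tail at $2\sigma$ is slightly below the rounded $2.5\%$ used above, but the expected number of students still rounds to 3.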

### Problem 2: Confidence belts

#### Problem 2.1
You want to produce a $90\%$ upper limit confidence belt for a Poisson distribution. Calculate the lower limits of the confidence intervals, $k_-$, for the following true means:
- $\lambda=2.0$
- $\lambda=2.3$
- $\lambda=2.4$
- $\lambda=7.5$

#### Problem 2.2

The plot below shows the $80\%$ central interval belt for a Poisson distribution. Derive the $80\%$ intervals for the true mean for measurements of
- $k=0$
- $k=4$
- $k=10$

What is the largest observed count rate that can be interpreted based on this plot as shown?

<img src="images/Poisson_belt_80.png" width=80%>

### Solution to Problem 2

#### Solution to 2.1
We need to find the largest $k$ for which $\sum_{i=0}^{k} e^{-\lambda}\lambda^i/i!<1-C$; the confidence interval then starts at the subsequent count, i.e. $k_-=k+1$. If no such $k$ exists, the interval starts at $k_-=0$. The lists below give the individual probabilities $P(i;\lambda)$, starting with $i=0$:
- $\lambda=2.0$: 0.135: $k_-=0$
- $\lambda=2.3$: 0.1003: $k_-=0$
- $\lambda=2.4$: 0.091, 0.218: $k_-=1$
- $\lambda=7.5$: 0.0006, 0.0041, 0.0156, 0.0389, 0.0729: $k_-=4$
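
The search described above can be sketched in Python. The helper names `poisson_pmf` and `lower_limit` are illustrative and not part of the course material; the sketch uses only the standard library, although `scipy.stats.poisson` provides the same probabilities.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of observing k counts for a true mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def lower_limit(lam, confidence=0.90):
    """Lower end k_- of the upper-limit confidence interval.

    Accumulates P(0), P(1), ... and returns the first count at which
    the cumulative probability reaches 1 - confidence. This is one more
    than the largest k whose cumulative probability is below 1 - C.
    """
    tail = 1.0 - confidence
    cumulative = 0.0
    k = 0
    while True:
        cumulative += poisson_pmf(k, lam)
        if cumulative >= tail:
            return k
        k += 1

for lam in (2.0, 2.3, 2.4, 7.5):
    print(f"lambda = {lam}: k_- = {lower_limit(lam)}")
```

Note how close $\lambda=2.3$ and $\lambda=2.4$ are to the boundary: $P(0;\lambda)$ crosses $0.1$ between them, which is why $k_-$ steps from 0 to 1.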

#### Solution to 2.2
The intervals for the true values of $\lambda$ are read off at the outward-facing corners of the confidence belt where it is intersected by a vertical line through the measured count. They can only be read off approximately by eye, so don't worry if you obtained values that are one or two tenths off.
- $k=0$: $[0.0,2.3]$
- $k=4$: $[1.8,8.0]$
- $k=10$: $[6.3,15.4]$

The largest count rate that can be interpreted is $k=13$ as the upper end of the interval for $k=14$ is not visible on the plot.
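
As a cross-check of the values read off the plot, the ends of the intervals can also be computed numerically. The sketch below assumes the belt was built with equal tail probabilities of $10\%$ on each side: the upper end then solves $P(K\le k;\lambda)=0.1$ and the lower end solves $P(K\ge k;\lambda)=0.1$. The helper names are illustrative, and a simple bisection is used to stay within the standard library.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(K <= k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi] by bisection; f(lo), f(hi) must differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def central_interval(k, confidence=0.80):
    """Central interval for the true mean given an observed count k."""
    tail = (1.0 - confidence) / 2
    # Upper end: the largest mean for which k is not in the lower tail
    upper = bisect_root(lambda lam: poisson_cdf(k, lam) - tail,
                        0.0, 10.0 * (k + 1) + 10.0)
    if k == 0:
        return 0.0, upper
    # Lower end: the smallest mean for which k is not in the upper tail
    lower = bisect_root(lambda lam: (1.0 - poisson_cdf(k - 1, lam)) - tail,
                        0.0, upper)
    return lower, upper

for k in (0, 4, 10):
    lower, upper = central_interval(k)
    print(f"k = {k:2d}: [{lower:.2f}, {upper:.2f}]")
```

The computed ends should agree with the by-eye values to within about a tenth, which is the precision expected from reading the plot.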

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>