In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML(filename='custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
## Custom imports
from scipy.stats import binom, poisson, chi2, norm, uniform
from scipy.optimize import curve_fit
from math import ceil, pi
from numpy import exp
from matplotlib.collections import PatchCollection
from matplotlib.patches import Circle, Rectangle
from matplotlib.colors import makeMappingArray

In [3]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true;
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10792: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

# Chapter 8 - Problem Sheet

### Problem 1: Type I and II errors

Identify which statements are correct.

- Type I error is the rate of acceptance of the hypothesis in a hypothesis test.
- Type I error is the rate of rejection of the hypothesis in a hypothesis test.
- Type I error is the rate of acceptance of the alternative hypothesis in a hypothesis test.
- Type I error is the rate of rejection of the alternative hypothesis in a hypothesis test.
- Type II error is the rate of acceptance of the hypothesis in a hypothesis test.
- Type II error is the rate of rejection of the hypothesis in a hypothesis test.
- Type II error is the rate of acceptance of the alternative hypothesis in a hypothesis test.
- Type II error is the rate of rejection of the alternative hypothesis in a hypothesis test.


### Problem 2: The choice of significance and power

#### Problem 2.1

Describe in your own words what needs to be considered when choosing the acceptance point of a hypothesis test, which determines both its significance and its power.

#### Problem 2.2

In a medical diagnostic test that aims to identify a disease, the quantities most often discussed are:
- the sensitivity, i.e. the rate at which candidates with the disease are correctly identified as positive, and
- the specificity, i.e. the rate of candidates without a disease that are correctly identified as healthy.

Relate these to Type I and Type II errors and to significance and power.

#### Problem 2.3

A medical diagnostic test has a rate of Type I errors of $20\%$ and a rate of Type II errors of $0.01\%$. The test is carried out on 100,000 candidates. It is expected that 1 in 1,000 people carry the disease. Based on these numbers, calculate
- the expected number of infected candidates,
- the expected number of candidates returning a positive test,
- the number of infected candidates not identified as such, and
- the fraction of positive tests that were returned by healthy candidates.

Based on the last number, discuss the usefulness of this test and what could be done to improve it.
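The bookkeeping behind these four numbers can be sketched in a few lines of Python. This is only a minimal sketch under a stated assumption: the hypothesis being tested is taken to be "the candidate is healthy", so that a Type I error is a false positive and a Type II error is a missed infection. The function name `expected_test_outcomes` and its parameter names are hypothetical and are not part of the course material.

```python
def expected_test_outcomes(n_candidates, prevalence,
                           false_positive_rate, false_negative_rate):
    """Expected outcomes of a diagnostic test applied to a population.

    Assumption: the tested hypothesis is "candidate is healthy", so the
    Type I error rate is the false positive rate and the Type II error
    rate is the false negative rate.
    """
    infected = n_candidates * prevalence
    healthy = n_candidates - infected

    # Infected candidates that the test correctly flags
    true_positives = infected * (1.0 - false_negative_rate)
    # Healthy candidates that the test wrongly flags
    false_positives = healthy * false_positive_rate
    positives = true_positives + false_positives

    return {
        "infected": infected,
        "positives": positives,
        "missed": infected * false_negative_rate,
        "healthy_fraction_of_positives": false_positives / positives,
    }


# Numbers from the problem statement
result = expected_test_outcomes(100_000, 1 / 1_000, 0.20, 0.0001)
for name, value in result.items():
    print(f"{name}: {value:.6g}")
```

If the course's convention assigns the hypothesis the other way round, the two rates passed to the function simply swap roles.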

### Problem 3: Hypothesis tests with Poisson and Gauss

The last lecture video discussed an example in which a Poisson distribution was approximated by a Gaussian distribution. This problem aims to illustrate this further. In a counting experiment, assume the hypothesis that the expected count is 30. Make a table for counts 0 to 50 with the following columns (if you are calculating the numbers one by one without a computer, you may start at a count of 15; note also that one factor in the Poisson probability formula does not depend on the count):
- The count (a running number from 0 to 50)
- The Poisson probability for this count
- The cumulative Poisson probability for counts from 0 to this value
- The signed number of standard deviations corresponding to the deviation of this count from the mean when approximating the Poisson distribution by a Gaussian normal distribution.
- The fractional integral of the Gaussian normal distribution up to the number of standard deviations calculated in the previous column
- The ratio of the cumulative sum of Poisson probabilities to the fractional integral of the normal distribution, i.e. of the values in the third and fifth column.
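If a computer is used, the six columns above could be produced along the following lines. This is a minimal sketch, assuming the Gaussian approximation takes the mean $\mu = 30$ and standard deviation $\sigma = \sqrt{30}$ of the Poisson distribution and applies no continuity correction; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import norm, poisson

mean = 30                                   # expected count under the hypothesis
counts = np.arange(0, 51)                   # column 1: counts 0 to 50

pmf = poisson.pmf(counts, mean)             # column 2: Poisson probability
cdf = poisson.cdf(counts, mean)             # column 3: cumulative Poisson probability
n_sigma = (counts - mean) / np.sqrt(mean)   # column 4: signed number of standard deviations
gauss_cdf = norm.cdf(n_sigma)               # column 5: Gaussian integral up to n_sigma
ratio = cdf / gauss_cdf                     # column 6: ratio of column 3 to column 5

# Print the table with a simple fixed-width layout
print(f"{'n':>3} {'P(n)':>11} {'sum P':>11} {'n_sigma':>8} {'Gauss':>11} {'ratio':>9}")
for n, p, c, s, g, r in zip(counts, pmf, cdf, n_sigma, gauss_cdf, ratio):
    print(f"{n:>3d} {p:>11.4e} {c:>11.4e} {s:>8.3f} {g:>11.4e} {r:>9.4f}")
```

Comparing the last column in the tails and near the mean shows where the Gaussian approximation holds up and where it does not.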

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>