In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML('custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
HTML('''
<script>
    function toggleCodeCells() {
      var codeCells = document.querySelectorAll('.jp-CodeCell');

      codeCells.forEach(function(cell) {
        var inputArea = cell.querySelector('.jp-InputArea');
        if (inputArea) {
          var currentDisplay = inputArea.style.display || getComputedStyle(inputArea).display;
          inputArea.style.display = currentDisplay === 'none' ? '' : 'none';
        }
      });
    }
</script>

<!-- Add a button to toggle visibility of input code cells -->
<button onclick="toggleCodeCells()">Toggle Code Cells</button>
''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

*Note: You are not expected to understand all the computer coding presented with the solutions. You should understand the mathematical concepts and be able to recover the results. We present the computer code so you can learn coding tricks (e.g. read data, compute useful values, fit and plot data) should you be interested.*

# Chapter 2 - Problem Sheet

## Problem 1

### Rene's household universe

Rene's household contains several living beings: two humans, two cats, five chickens and two 'visiting' cats that come to our backyard. The Euler diagram below describes our household universe.

<img src="images/household.png" width="60%" />

#### Tasks
1. Calculate P(human)
2. Calculate P(cat)
3. Calculate P(grey colour)
4. Calculate P(NOT grey colour)
5. Calculate P(cat $\cap$ human)
6. Calculate P(cat $\cup$ human)
7. Calculate P(grey colour $\mid$ mammal)
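
The tasks that involve only species can be sketched by counting, as below. This is a minimal sketch under stated assumptions: the universe consists of exactly the 11 beings listed in the text, each equally likely to be picked, and the 'visiting' cats count as cats. The colour-based tasks need the Euler diagram and are not attempted here. The names `universe`, `prob`, `is_human` and `is_cat` are illustrative only.

```python
from fractions import Fraction

# Universe built from the counts quoted in the problem text.
# Assumption: these 11 beings are the whole universe and are equally likely.
universe = (
    [("human", i) for i in range(2)]
    + [("cat", i) for i in range(2)]
    + [("chicken", i) for i in range(5)]
    + [("visiting cat", i) for i in range(2)]
)

def prob(event):
    """Probability of an event, given as a predicate on a being's kind."""
    favourable = sum(1 for kind, _ in universe if event(kind))
    return Fraction(favourable, len(universe))

def is_human(kind):
    return kind == "human"

def is_cat(kind):
    # Assumption: visiting cats are counted as cats.
    return kind in ("cat", "visiting cat")

p_human = prob(is_human)
p_cat = prob(is_cat)
p_cat_and_human = prob(lambda k: is_cat(k) and is_human(k))  # intersection
p_cat_or_human = prob(lambda k: is_cat(k) or is_human(k))    # union

print(p_human, p_cat, p_cat_and_human, p_cat_or_human)
```

Because no being is both a cat and a human, the union is simply the sum of the two individual probabilities. If the diagram treats the visiting cats differently, the counts in `universe` need adjusting.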

## Problem 2

### The maths behind screening tests

Let us investigate the maths behind medical screening tests...

*An over-the-counter test is available to diagnose some particular disease. Someone sees the following package on a shelf, purchases it, and gets a positive result after taking the test at home. What is the probability that they suffer from this particular disease?*

<img src="images/supertest.png" width="30%">

After reading the small print in the instruction manual, they find the following:

```
Disease BlahBlah is a rare condition contracted from aliens, which affects 0.1% of the population. Infected individuals produce a special protein that our test has been designed to detect. Controlled laboratory studies demonstrated a test accuracy of 99% on patients suffering from the disease and a 3% false positive rate on non-infected patients.
```
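
One possible way to set up this calculation is with Bayes' theorem, sketched below. It assumes that the quoted "test accuracy of 99% on patients suffering from the disease" means $P(+ \mid \text{disease}) = 0.99$, and that the person is a random member of the population, so the prior is the 0.1% prevalence. The variable names are illustrative.

```python
# Numbers quoted in the instruction manual.
p_disease = 0.001            # prevalence: 0.1% of the population
p_pos_given_disease = 0.99   # assumed to mean P(+ | disease)
p_pos_given_healthy = 0.03   # false positive rate, P(+ | no disease)

# Law of total probability: overall chance of a positive result.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1.0 - p_disease))

# Bayes' theorem: P(disease | +) = P(+ | disease) P(disease) / P(+).
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(f"P(+) = {p_positive:.5f}")
print(f"P(disease | +) = {p_disease_given_pos:.4f}")
```

Under these assumptions the posterior probability comes out at roughly 3%, because false positives from the large healthy population far outnumber true positives from the small infected one.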

## Problem 3

### Tossing a fair coin

What is the probability of getting no more than one head in four tosses of a fair coin?
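
A minimal sketch of one approach, assuming the four tosses are independent so that the number of heads follows a binomial distribution with $n = 4$ and $p = 0.5$. The helper `binomial_pmf` is an illustrative name written out by hand; `scipy.stats.binom.cdf(1, 4, 0.5)` should give the same value.

```python
from math import comb

n_tosses = 4
p_head = 0.5  # fair coin

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# "No more than one head" means zero heads or exactly one head.
p_at_most_one = sum(binomial_pmf(k, n_tosses, p_head) for k in (0, 1))

print(p_at_most_one)
```

The two contributions are 1/16 (no heads) and 4/16 (exactly one head), giving 5/16 in total.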

## Problem 4

### Hurricane Harvey

Hurricane Harvey brought major floods to Texas at the end of August 2017. It was described as a once-in-a-century type of event. It was the first Category 4 hurricane to make landfall in Texas since Carla in 1961. Does this claim about the rarity of such an event make any sense?

Answering this problem requires framing the right question. In science, this is often the most difficult part. Here, we could say:

```
What are the odds of having two or more such hurricanes in the observed time span?
```
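
One hedged way to attack this question is to model such hurricanes as a Poisson process. The sketch below rests on several assumptions that are not in the problem statement: that "once-in-a-century" translates into a mean rate of one event per 100 years, that the observed time span is the 56 years between Carla (1961) and Harvey (2017), and that events are independent. Other choices of rate or time span are defensible and would change the number.

```python
from math import exp

# Assumption: "once in a century" means a mean rate of 1 event per 100 years.
rate_per_year = 1.0 / 100.0
# Assumption: the observed time span runs from Carla (1961) to Harvey (2017).
time_span = 2017 - 1961

# Expected number of events over the time span.
expected = rate_per_year * time_span

# Poisson probabilities of observing exactly 0 and exactly 1 event.
p_zero = exp(-expected)
p_one = expected * exp(-expected)

# Probability of two or more events is the complement.
p_two_or_more = 1.0 - p_zero - p_one

print(f"Expected number of events: {expected:.2f}")
print(f"P(N >= 2) = {p_two_or_more:.3f}")
```

With these assumptions the probability is about 11%, which can then be used to judge whether seeing two such hurricanes in this period is surprising for a once-in-a-century event.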

## Problem 5

### Oreo's egg production

Rene's hen Oreo is a Light Sussex, a breed which typically produces between 240 and 260 eggs a year. Assuming that egg production is normally distributed, and that this range represents the interval around the mean where 68.26\% of the egg production is found, calculate the threshold above which 99.85\% of the egg production should lie.

<img src="images/oreo.jpg" width="50%">
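
The calculation can be sketched as follows, assuming the 240 to 260 range is the symmetric $\pm 1\sigma$ interval around the mean, so that the mean is the midpoint and $\sigma$ is half the width. The sketch uses `statistics.NormalDist` from the Python standard library; `scipy.stats.norm.ppf` would be an equivalent choice.

```python
from statistics import NormalDist

# The 68.26% interval is taken to be mean +/- 1 sigma.
low, high = 240.0, 260.0
mean = 0.5 * (low + high)    # midpoint of the interval
sigma = 0.5 * (high - low)   # half-width of the interval

eggs = NormalDist(mu=mean, sigma=sigma)

# 99.85% of the production lies above the threshold, so 0.15% lies below it:
# the threshold is the 0.0015 quantile of the distribution.
threshold = eggs.inv_cdf(1.0 - 0.9985)

print(f"mean = {mean:.0f}, sigma = {sigma:.0f}")
print(f"threshold = {threshold:.1f} eggs per year")
```

The exact quantile lands very close to $\mu - 3\sigma = 220$ eggs, which is the value obtained from the usual rule of thumb that 99.7% of a normal distribution lies within $3\sigma$ of the mean.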

<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>