In [1]:
## Import required Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy, scipy.stats
import io
import base64
#from IPython.core.display import display
from IPython.display import display, HTML, Image
from urllib.request import urlopen

try:
    import astropy as apy
    import astropy.table
    _apy = True
    #print('Loaded astropy')
except ImportError:
    _apy = False
    #print('Could not load astropy')

## Customising the font size of figures
plt.rcParams.update({'font.size': 14})

## Customising the look of the notebook
display(HTML("<style>.container { width:95% !important; }</style>"))
## This custom file is adapted from https://github.com/lmarti/jupyter_custom/blob/master/custom.include
HTML('custom.css')
#HTML(urlopen('https://raw.githubusercontent.com/bretonr/intro_data_science/master/custom.css').read().decode('utf-8'))

In [2]:
## Adding a button to hide the Python source code
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the Python code."></form>''')

<div class="container-fluid">
    <div class="row">
        <div class="col-md-8" align="center">
            <h1>PHYS 10791: Introduction to Data Science</h1>
            <!--<h3>2019-2020 Academic Year</h3><br>-->
        </div>
        <div class="col-md-3">
            <img align='center' style="border-width:0" src="images/UoM_logo.png"/>
        </div>
    </div>
</div>

<div class="container-fluid">
    <div class="row">
        <div class="col-md-2" align="right">
            <b>Course instructors:&nbsp;&nbsp;</b>
        </div>
        <div class="col-md-9" align="left">
            <a href="http://www.renebreton.org">Prof. Rene Breton</a> - Twitter <a href="https://twitter.com/BretonRene">@BretonRene</a><br>
            <a href="http://www.hep.manchester.ac.uk/u/gersabec">Dr. Marco Gersabeck</a> - Twitter <a href="https://twitter.com/MarcoGersabeck">@MarcoGersabeck</a>
        </div>
    </div>
</div>

# Chapter 2 - Summary

## 2.1 Probabilities and random variables

### 2.1.1 Axioms of probability

(Kolmogorov's) Probability theory is based on three axioms:
1. For each subset $A$ such that $A \subset \Omega$, $P(A) \geq 0$
2. For all disjoint subsets $A$ and $B$, the probability of $A$ **or** $B$ is: $P(A \cup B) = P(A) + P(B)$
3. $P(\Omega) = 1$

### 2.1.2 Bayes' theorem

#### Conditional probability
The probability of $A$ given (conditional on) $B$, defined for $P(B) > 0$, is: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
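
As a minimal sketch of the definition above, the conditional probability can be evaluated by direct enumeration. The fair six-sided die and the events `A` ("even") and `B` ("greater than 3") are illustrative assumptions, not part of the course material.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (an illustrative assumption)
omega = {1, 2, 3, 4, 5, 6}
A = {x for x in omega if x % 2 == 0}  # event "the outcome is even"
B = {x for x in omega if x > 3}       # event "the outcome is greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

# P(A | B) = P(A and B) / P(B)
p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)  # 2/3
```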

#### Bayes' theorem

Provides a 'natural' way of expressing conditional probabilities, as the intersection of two sets is not always easy to evaluate.

Bayes' theorem: 
\begin{eqnarray}
  P(A \mid B) &=& \frac{I(A) \, \mathcal{L}(B \mid A)}{E(B)}
\end{eqnarray}

- $P(A \mid B)$: the posterior probability of $A$ conditional on $B$
- $I(A)$: the prior probability of $A$
- $\mathcal{L}(B \mid A)$: the likelihood of $B$ conditional on $A$
- $E(B)$: the evidence of $B$

Recall that the evidence, $E(B)$, is simply the constant which ensures the posterior probability is normalised. Hence:
\begin{eqnarray}
    E(B) &=& \sum_i I(A_i) \mathcal{L}(B \mid A_i) \quad {\rm (discrete)} \\
    E(B) &=& \int I(A) \mathcal{L}(B \mid A) \, {\rm d}A \quad {\rm (continuous)}
\end{eqnarray}

The first case is for a discrete probability function, while the second is for a continuous probability function.

**This is the equation that rules them all for statistical inference.**
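
The discrete form of Bayes' theorem can be sketched numerically as follows. The scenario is a made-up illustration: two hypotheses about a coin (fair with $p = 0.5$, or biased with $p = 0.8$), equal priors, and an observation of 8 heads in 10 tosses. The likelihood is taken to be binomial, evaluated with `scipy.stats.binom.pmf`.

```python
import numpy as np
import scipy.stats

# Hypotheses A_i: the coin's probability of heads (illustrative values)
p_heads = np.array([0.5, 0.8])

# Prior I(A_i): both hypotheses are assumed equally likely beforehand
prior = np.array([0.5, 0.5])

# Likelihood L(B | A_i) of the data B = "8 heads in 10 tosses"
likelihood = scipy.stats.binom.pmf(8, 10, p_heads)

# Evidence E(B): the sum over hypotheses that normalises the posterior
evidence = np.sum(prior * likelihood)

# Posterior P(A_i | B)
posterior = prior * likelihood / evidence

for p, post in zip(p_heads, posterior):
    print(f"P(p = {p} | data) = {post:.3f}")
```

Dividing by the evidence makes the posterior probabilities sum to one, as stated above.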

## 2.2 Probability distributions

### 2.2.1 General probability distributions

#### Continuous probability distribution

Probability density function (PDF). Must be normalised: $\int_\Omega f_X(x) \, {\rm d}x = 1$.

#### Discrete probability distribution

Probability mass function (PMF). Must be normalised: $\sum_{x \, \in \, \Omega} f_X(x) = 1$.
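
Both normalisation conditions can be checked numerically. The sketch below uses a standard Gaussian as the continuous example and a Poisson distribution with $\lambda = 3$ as the discrete one (both distributions are described in Section 2.2.2). The choice of distributions and the truncation of the sum at $k = 100$ are assumptions made for illustration.

```python
import numpy as np
import scipy.integrate
import scipy.stats

# Continuous case: integrate a standard Gaussian PDF over the whole real line
pdf_integral, _ = scipy.integrate.quad(scipy.stats.norm.pdf, -np.inf, np.inf)

# Discrete case: sum a Poisson PMF (lambda = 3, an arbitrary choice) over k.
# The sum is truncated at k = 100, beyond which the tail is negligible.
k = np.arange(0, 101)
pmf_sum = np.sum(scipy.stats.poisson.pmf(k, 3.0))

print(f"Integral of the PDF: {pdf_integral:.6f}")
print(f"Sum of the PMF:      {pmf_sum:.6f}")
```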

### 2.2.2 Some important probability distributions

#### Binomial distribution

Describes processes consisting of identical, independent trials, each with two possible outcomes:
\begin{equation}
  P(k; n, p) = p^k (1-p)^{(n-k)} \frac{n!}{k!(n-k)!} \,,
\end{equation}
- $k$ the number of successes
- $n$ the number of trials/events
- $p$ the probability of an individual success.

Properties:
- Mean: $np$
- Variance: $np(1-p)$
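
As a quick check, the Binomial formula can be evaluated directly and compared with `scipy.stats.binom`, and the mean and variance recovered from the PMF. The values $n = 20$ and $p = 0.3$ are arbitrary choices for illustration.

```python
import math
import numpy as np
import scipy.stats

n, p = 20, 0.3          # illustrative values
k = np.arange(n + 1)    # all possible numbers of successes

# PMF evaluated term by term from the formula above
pmf_formula = np.array(
    [math.comb(n, int(ki)) * p**int(ki) * (1 - p)**(n - int(ki)) for ki in k]
)

# The same PMF from scipy, for comparison
pmf_scipy = scipy.stats.binom.pmf(k, n, p)

# Mean and variance computed as weighted sums over the PMF
mean = np.sum(k * pmf_formula)
variance = np.sum((k - mean)**2 * pmf_formula)

print(f"Formula matches scipy: {np.allclose(pmf_formula, pmf_scipy)}")
print(f"Mean:     {mean:.4f} (expected np = {n * p:.4f})")
print(f"Variance: {variance:.4f} (expected np(1-p) = {n * p * (1 - p):.4f})")
```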

#### Poisson distribution

Describes processes with an average occurrence rate in a given interval:
\begin{equation}
  P(k; \lambda) = e^{-\lambda} \frac{\lambda^k}{k!} \,,
\end{equation}
- $k$ the number of times an event occurs in a given interval (this is an integer value)
- $\lambda$ the average event rate per interval (can be any real positive number)

Properties:
- Mean: $\lambda$
- Variance: $\lambda$
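
The equality of the mean and the variance can be illustrated by drawing random samples. In this sketch, the rate $\lambda = 3.5$, the sample size and the random seed are all arbitrary assumptions.

```python
import numpy as np

lam = 3.5                          # illustrative average rate per interval
rng = np.random.default_rng(42)    # fixed seed for reproducibility

# Draw a large number of Poisson-distributed counts
samples = rng.poisson(lam=lam, size=100_000)

sample_mean = samples.mean()
sample_variance = samples.var(ddof=1)   # unbiased sample variance

print(f"Sample mean:     {sample_mean:.3f}")
print(f"Sample variance: {sample_variance:.3f}")
```

Both estimates should lie close to $\lambda$, with small differences due to the finite sample size.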

#### Gaussian distribution

Describes processes whose outcome is typically the sum of a large number of independent random variables:
\begin{equation}
  x \sim \mathcal{N}(\mu, \sigma^2): \quad P(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \,,
\end{equation}
- $\mu$ the mean of the Gaussian (can be any real number)
- $\sigma$ the standard deviation of the Gaussian (can be any real positive number)

Properties:
- Mean, median, mode: $\mu$
- Variance: $\sigma^2$

##### Recall
Key $\sigma$ intervals for the Gaussian distribution: the areas within $\pm 1\sigma$, $\pm 2\sigma$ and $\pm 3\sigma$ of the mean are $68.27\%$, $95.45\%$ and $99.73\%$, respectively.
<img src="images/1920px-Standard_deviation_diagram.svg.png" width="50%">
(Source: [M. W. Toews](https://commons.wikimedia.org/wiki/User:Mwtoews), [Standard deviation diagram](https://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg), [CC BY 2.5](https://creativecommons.org/licenses/by/2.5/legalcode))
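
The areas within a given number of $\sigma$ can be computed from the cumulative distribution function (CDF) of the standard Gaussian, here with `scipy.stats.norm.cdf`. This is a minimal sketch; the dictionary name `areas` is only a convenience for this example.

```python
import scipy.stats

# Area within +/- n sigma of the mean for a standard Gaussian:
# CDF(+n) - CDF(-n)
areas = {}
for n_sigma in (1, 2, 3):
    areas[n_sigma] = scipy.stats.norm.cdf(n_sigma) - scipy.stats.norm.cdf(-n_sigma)
    print(f"Within +/- {n_sigma} sigma: {100 * areas[n_sigma]:.2f}%")
```

The result is independent of $\mu$ and $\sigma$, since any Gaussian can be rescaled to the standard one.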


### 2.2.3 Gaussian distribution as a limit of Binomial and Poisson distributions

#### From Poisson to Gaussian and Binomial to Gaussian

Both the Binomial and the Poisson distributions tend towards Gaussian distributions in the limits $n \to \infty$ and $\lambda \to \infty$, respectively.

<img src="images/stats triad.png" width="40%">
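
The convergence can be illustrated numerically by comparing each PMF with a Gaussian PDF of the same mean and variance. In the sketch below, the helper functions `rel_diff_poisson` and `rel_diff_binomial`, the choice of measuring the largest difference relative to the PMF peak, and the parameter values are all assumptions made for illustration.

```python
import numpy as np
import scipy.stats

def rel_diff_poisson(lam):
    """Largest |Poisson PMF - Gaussian PDF|, relative to the PMF peak."""
    k = np.arange(0, int(lam + 10 * np.sqrt(lam)) + 1)
    pmf = scipy.stats.poisson.pmf(k, lam)
    pdf = scipy.stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
    return np.max(np.abs(pmf - pdf)) / np.max(pmf)

def rel_diff_binomial(n, p):
    """Largest |Binomial PMF - Gaussian PDF|, relative to the PMF peak."""
    k = np.arange(0, n + 1)
    pmf = scipy.stats.binom.pmf(k, n, p)
    pdf = scipy.stats.norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    return np.max(np.abs(pmf - pdf)) / np.max(pmf)

for lam in (5, 50, 500):
    print(f"Poisson,  lambda = {lam:4d}: {rel_diff_poisson(lam):.4f}")

for n in (10, 100, 1000):
    print(f"Binomial, n = {n:4d}, p = 0.3: {rel_diff_binomial(n, 0.3):.4f}")
```

The relative difference shrinks as $\lambda$ and $n$ grow, consistent with the Gaussian limit.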

### 2.2.4 Other probability distributions

#### Multivariate Gaussian

Generalization of the one-dimensional (univariate) normal distribution to higher dimensions.

##### Two-dimensional case
In two dimensions, for the variables $(x,y)$ with correlation coefficient $\rho$, the probability density function can be written as:

\begin{equation}
  f(x,y) = \frac{1}{2 \pi  \sigma_x \sigma_y \sqrt{1-\rho^2}}
    \exp\left(
      -\frac{1}{2(1-\rho^2)}\left[
          \frac{(x-\mu_x)^2}{\sigma_x^2} +
          \frac{(y-\mu_y)^2}{\sigma_y^2} -
          \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y}
      \right]
    \right)
\end{equation}
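
The two-dimensional formula can be checked against `scipy.stats.multivariate_normal`, assuming the covariance matrix has $\sigma_x^2$ and $\sigma_y^2$ on the diagonal and $\rho \sigma_x \sigma_y$ off the diagonal. The function name `bivariate_gaussian` and all parameter values are illustrative choices.

```python
import numpy as np
import scipy.stats

def bivariate_gaussian(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Two-dimensional Gaussian PDF written out term by term."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    norm = 1.0 / (2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2))
    exponent = -(zx**2 + zy**2 - 2 * rho * zx * zy) / (2 * (1 - rho**2))
    return norm * np.exp(exponent)

# Illustrative parameter values and evaluation point
mu_x, mu_y = 1.0, -0.5
sigma_x, sigma_y, rho = 2.0, 0.7, 0.6
x, y = 0.3, 0.1

# Covariance matrix built from sigma_x, sigma_y and rho
cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])

pdf_formula = bivariate_gaussian(x, y, mu_x, mu_y, sigma_x, sigma_y, rho)
pdf_scipy = scipy.stats.multivariate_normal(mean=[mu_x, mu_y], cov=cov).pdf([x, y])

print(f"Formula: {pdf_formula:.6f}")
print(f"scipy:   {pdf_scipy:.6f}")
```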


<div class="well" align="center">
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-3" align="center">
                <img align="center" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" width="60%">
            </div>
            <div class="col-md-8">
            This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
            </div>
        </div>
    </div>
    <br>
    <br>
    <i>Note: The content of this Jupyter Notebook is provided for educational purposes only.</i>
</div>