
# Chapter 1: Introduction and probability basics (exercises)

### The assignment for next week is under the section **Exercises** and counts for 100 points


**Reminder from the guidelines**

**This notebook is not meant to impose a format** for handing in your homework. It only gives me a convenient way to keep text and code in the same place.



* **We do not mark your coding skills**: any language is allowed, so pick one that is convenient and efficient for you.
* This means **we do not read the code**. We do not look for comments in the code, but **we will not guess** what a plot means. Be explicit and describe, even in one sentence, what you did.

* Feel free to use notebooks (they may not be the most efficient option), but be careful when printing (check out nbconvert to produce a PDF or even a LaTeX document).

## Background Information

#### Python comments
**In python, most of the statistical functions are already coded and available within `scipy.stats`**

Different versions of scipy, numpy, and matplotlib may have different calling sequences.
Check out the documentation to understand how to call them properly. Google is also your friend.

In [None]:
# only if you use the notebook: render plots inline
%matplotlib inline
import numpy as np               # numerical package
from scipy import stats          # most of the common distributions
import matplotlib.pyplot as plt  # matplotlib plotting library

## Binomial

This probability distribution describes processes in which a trial can have only one of two possible outcomes, such as tossing a coin, detecting something at a security check, or winning the lottery. Here $p$ is the probability of one outcome, call it "success", and (therefore) $1-p$ is the probability of the other outcome, call it "failure". If the trial is repeated independently $n$ times, we are interested in the probability of getting a total of exactly $r$ successes, call it $P (r \mid p, n)$.
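For reference, the standard closed form is $P(r \mid p, n) = \binom{n}{r} p^r (1-p)^{n-r}$. The short sketch below evaluates this expression by hand and compares it with `scipy.stats.binom.pmf`. The values of `n` and `p` are arbitrary choices for illustration and do not come from the lecture notes.

```python
from math import comb

import numpy as np
from scipy import stats

n, p = 10, 0.3               # illustrative values (assumption)
r = np.arange(0, n + 1)      # all possible numbers of successes: 0..n

# closed-form binomial PMF: C(n, r) * p^r * (1 - p)^(n - r)
pmf_by_hand = np.array([comb(n, int(k)) * p**k * (1 - p)**(n - k) for k in r])
pmf_scipy = stats.binom.pmf(r, n=n, p=p)

print(np.allclose(pmf_by_hand, pmf_scipy))  # do both computations agree?
print(pmf_scipy.sum())                      # a PMF over r sums to 1
```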

Plot $P$ vs. $r$ for fixed $n$, first for a fixed $p$, then for a range of $p$.

In [None]:
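n = 10
r = np.arange(0, n + 1, 1)       # r runs from 0 to n inclusive
p = np.arange(0.1, 0.9, 0.1)     # one curve per value of p
for pk in p:
    p_r = stats.binom.pmf(r, n=n, p=pk)
    plt.plot(r, p_r, 'o-', color='k', mfc='w')
plt.xlabel('r')
plt.ylabel(r'P(r $\mid$ n, p)');  # raw string keeps the LaTeX backslash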

$P (r \mid p, n)$ vs. $r$ for fixed $n=10$ for various values of $p$. The distribution is discrete: points are joined with lines just to help identification of the points with common $p$.

In [None]:
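r = [1, 2, 3, 5, 9]              # one curve per number of successes
p = np.arange(0., 1.01, 0.05)    # grid of p values from 0 to 1
for rk in r:
    p_r = stats.binom.pmf(rk, n=10, p=p)
    plt.plot(p, p_r, 'o-', color='k', mfc='w')
plt.xlabel('p')
plt.ylabel(r'P(r $\mid$ n, p)');  # raw string keeps the LaTeX backslash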

$P (r \mid p, n)$ vs. $p$ for fixed $n$ for various values of $r$. 
**Note that this latter plot is not a probability distribution function over $p$.**
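One way to see this is to integrate each curve over $p$: a probability density over $p$ would integrate to 1, whereas here every curve integrates to $1/(n+1)$, a standard Beta-function result. The sketch below checks this numerically with `scipy.integrate.quad`; the dictionary name `areas` is only illustrative.

```python
from scipy import integrate, stats

n = 10
areas = {}
for r in [1, 2, 3, 5, 9]:
    # integrate P(r | p, n) over p from 0 to 1, at fixed r and n
    area, _ = integrate.quad(lambda p: stats.binom.pmf(r, n, p), 0.0, 1.0)
    areas[r] = area
    print(r, area)

print(1 / (n + 1))  # value every curve integrates to
```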

## Beta
A convenient prior for a quantity $p$ bound to lie between $0$ and $1$ is the beta distribution, which is
described by two shape parameters, usually written $\alpha$ and $\beta$.
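As a minimal sketch, the beta density can be plotted with `scipy.stats.beta`, which takes two positive parameters called `a` and `b`. The parameter pairs below are arbitrary choices for illustration, not the ones used in the lecture notes.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0, 1, 201)
param_pairs = [(1, 1), (2, 2), (2, 5), (5, 2)]  # illustrative choices (assumption)

curves = {}
for a, b in param_pairs:
    curves[(a, b)] = stats.beta.pdf(x, a, b)
    plt.plot(x, curves[(a, b)], label=f'a={a}, b={b}')
plt.xlabel('p')
plt.ylabel('P(p)')
plt.legend();
```

With `a = b = 1` the density is flat (a uniform prior), and swapping `a` and `b` mirrors the curve about $p = 1/2$.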

## Poisson
The binomial distribution describes situations in which a definite trial takes place that has a
definite two-way result: it is either a "success" or a "failure"; something happens or it doesn't. A lot of
natural processes are only "one-way", by which I mean it is clear when they happen but not clear when they
do not. Examples are lightning strikes or α particle emission from a radioactive source. In these cases we
cannot count non-events, because we cannot identify a sequence of trials in which something is supposed
to happen or not. Suppose that these events occur at an average rate of $\lambda$, so that $\lambda$ is the expected
number of events in some unit time. We would like to find the probability that we get $r$ events in this
interval.
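This probability is given by the Poisson distribution, $P(r \mid \lambda) = e^{-\lambda} \lambda^r / r!$. The sketch below plots it with `scipy.stats.poisson` for a few rates; the values in `rates` and the range of `r` are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

r = np.arange(0, 31)
rates = [1, 3, 10]   # illustrative values of lambda (assumption)

pmfs = {}
for lam in rates:
    pmfs[lam] = stats.poisson.pmf(r, mu=lam)
    plt.plot(r, pmfs[lam], 'o-', mfc='w', label=f'lambda = {lam}')
plt.xlabel('r')
plt.ylabel(r'P(r $\mid$ $\lambda$)')
plt.legend();
```

As with the binomial plots, the distribution is discrete and the lines only help to identify points with a common $\lambda$.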

### Example: Radioactive decay

Consider a radioactive source with half life $t_{1/2}$. If $N_0$ is the initial number of radioactive atoms, then
the number left after time $t$ is given by
$$N = N_0 \exp(-t/\tau ) = N_0 \exp(-\lambda t)$$

where $\tau = t_{1/2} / \ln 2$ (since $N = N_0 / 2$ at $t = t_{1/2}$). The mean (expected) number of decays per unit time is $\lambda = 1/\tau$. The distribution of the number of decays per time interval is a Poisson distribution. To see this from a set of data, we record
the time at which each decay occurs, and then divide the entire time span into constant intervals of unit time. We then count how many of these intervals have 0, 1, 2, 3, etc. decays. When normalized, these counts approximately follow a Poisson distribution with mean $1/\tau$.

We can demonstrate this using a simulation of radioactive decay. Let’s assume a source has a decay rate
of $\lambda = 10$ (per unit time interval). The number of decays in any unit time interval can be
simulated by drawing once from a Poisson distribution with this mean. We do this a large number of
times to simulate the data. Based on these data alone, we then count how many of the intervals
have 0, 1, 2, 3, etc. decays. Finally, we overplot a Poisson probability mass function with a mean
estimated from the data, scaled by the number of intervals to give the expected number of intervals
with each count.
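A minimal sketch of this simulation follows. The number of intervals (`n_intervals`) and the random seed are arbitrary assumptions; only $\lambda = 10$ comes from the text.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)   # fixed seed for reproducibility (arbitrary)
lam = 10                         # decay rate per unit time interval
n_intervals = 10000              # number of simulated intervals (assumption)

# one Poisson draw per unit time interval
decays = rng.poisson(lam, size=n_intervals)

# counts[k] = number of intervals in which exactly k decays occurred
counts = np.bincount(decays)
r = np.arange(counts.size)

# Poisson PMF with the mean estimated from the data, scaled to interval counts
mean_est = decays.mean()
expected = n_intervals * stats.poisson.pmf(r, mu=mean_est)

plt.bar(r, counts, color='0.8', label='simulated data')
plt.plot(r, expected, 'o-', color='k', mfc='w', label='scaled Poisson')
plt.xlabel('decays per interval')
plt.ylabel('number of intervals')
plt.legend();
```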

## Gaussian (Normal)

The Gaussian or Normal distribution is probably the best known and most commonly used distribution
in the physical sciences.
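Its density, for mean $\mu$ and standard deviation $\sigma$, is $P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The sketch below evaluates this expression directly, compares it with `scipy.stats.norm.pdf`, and plots it for a few widths. All parameter values are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-5, 5, 401)
mu = 0.0                      # illustrative mean (assumption)
sigmas = [0.5, 1.0, 2.0]      # illustrative standard deviations (assumption)

max_diff = 0.0
for sigma in sigmas:
    # Gaussian density written out explicitly
    pdf_by_hand = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    pdf_scipy = stats.norm.pdf(x, loc=mu, scale=sigma)
    max_diff = max(max_diff, np.abs(pdf_by_hand - pdf_scipy).max())
    plt.plot(x, pdf_scipy, label=f'sigma = {sigma}')
plt.xlabel('x')
plt.ylabel('P(x)')
plt.legend();

print(max_diff)  # largest difference between the two computations
```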

## Gamma

The gamma distribution is a semi-infinite distribution: it is only non-zero for $x > 0$.
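A minimal sketch of the gamma density using `scipy.stats.gamma`, where the shape parameter is called `a` and the scale is passed as `scale`. The shape values below, and the unit scale, are arbitrary choices for illustration. The grid deliberately includes negative $x$ to show that the density vanishes there.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-1, 10, 221)
shapes = [1, 2, 5]   # illustrative shape parameters (assumption)
scale = 1.0          # illustrative scale parameter (assumption)

curves = {}
for a in shapes:
    curves[a] = stats.gamma.pdf(x, a, scale=scale)
    plt.plot(x, curves[a], label=f'shape = {a}')
plt.xlabel('x')
plt.ylabel('P(x)')
plt.legend();
```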

## Cauchy

Also known as the Lorentz distribution.
The standard Cauchy distribution is the distribution of a random variable that is the ratio of two independent standard normal variables.
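The ratio property can be checked with a quick simulation. The sketch below draws pairs of independent standard normal variables, takes their ratio, and compares the result with the standard Cauchy density from `scipy.stats.cauchy`. The sample size, seed, and plotting range are arbitrary assumptions. The histogram is normalized by the total number of samples (not only those inside the plotted range), so that it is directly comparable to the density despite the heavy tails.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)   # fixed seed for reproducibility (arbitrary)
n_samples = 100000               # sample size (assumption)

# ratio of two independent standard normal variables
ratio = rng.standard_normal(n_samples) / rng.standard_normal(n_samples)

# Kolmogorov-Smirnov distance to the standard Cauchy CDF (small if they agree)
ks = stats.kstest(ratio, 'cauchy')
print(ks.statistic)

# histogram on a finite range, normalized by the full sample size
bins = np.linspace(-10, 10, 81)
bin_width = bins[1] - bins[0]
weights = np.full(n_samples, 1.0 / (n_samples * bin_width))
plt.hist(ratio, bins=bins, weights=weights, color='0.8', label='ratio of normals')

x = np.linspace(-10, 10, 401)
plt.plot(x, stats.cauchy.pdf(x), color='k', label='standard Cauchy')
plt.xlabel('x')
plt.ylabel('P(x)')
plt.legend();
```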

## Exercises
100 points in total

**Exercise 1** (10 points)

Make sure you can find your way with the coding language of your choice.
Reproduce the plots given in the first chapter of the lecture notes.

Binomial, Beta, Poisson, Gamma, Cauchy and example of radioactive decay.

**Exercise 2** (10 points)

Imagine you perform an experiment and measure a series of values.
From the dataset `rvs.dat`, identify by eye which distribution likely generated these values. 

* Make a plot of the distribution and overlay what you think the distribution is. (Explain/Justify)

_Tip_: this may be a distribution that is **not** in the lecture notes.

**Exercise 3** (10 points)

You have two boxes with red and blue balls in each. 
* Box $I$ has 3 red and 2 blue balls. 
* Box $II$ has 2 red and 8 blue balls. 

A fair coin is tossed. If it lands heads you take a ball at random from box $I$. 
If tails, you take a ball at random from box $II$. What is the probability that the ball is red?

**Exercise 4** (10 points)

Now someone else tosses the coin but doesn’t tell you whether it landed heads or tails. But she does tell you that a red ball was drawn. What is the probability that it was drawn from box $I$?

**Exercise 5** (10 points)

If the chance of finding life on one planet is 1 in $n$, and you search for life on $n$ planets, what is the
probability of finding life on at least one planet? What is this in the limit as $n \rightarrow \infty$?

**Exercise 6**  (10 points)

In a room full of people, how many people do you have to ask before there is a 50% chance (or more) that any two or more of them share a common birthday? 

* What are your assumptions?

* Write code that plots this probability as a function of $N$, the number of people asked.

**Exercise 7**  (10 points)

In a room full of people, how many people do you have to ask before there is a 50% chance (or more) that one of them shares your birthday?

**Exercise 8**  (10 points)

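Show that the full width at half maximum (FWHM) and the interquartile range (IQR) of the Cauchy distribution are both equal to $2b$, where $b$ is the scale parameter. (10 points)

Show that the mean, the variance, and all higher moments of the Cauchy distribution are undefined. (10 points)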

**Exercise 9**  (10 points)

Children inherit a fair mix of the genetic material of both of their parents. Blood type is one of the best-known examples. Blood type O is a recessive trait, so both parents must transmit the type O gene to the child. However, it is also the most common gene: almost everyone carries it.

* Suppose the probability that a parent has a blood type O gene is 1/2. What is the probability that a child has blood type O?

If these parents have 5 children, 

* What is the probability that exactly 2 of them have type O blood? Plot this probability for the other possible numbers of children as well.
* What is the expected number of children with type O blood?
* What is the probability of at least 2 children with type O blood?

**Exercise 10**  (10 points)

Let $X$ represent the fraction of the population in a certain city who obtain the flu vaccine,
and suppose $X$ follows the probability density $P(x) = 2 x$ for $0\leq x \leq 1$.

Note that this distribution is correctly normalized.

* Find $P(1/4 \leq X \leq 1/2)$
* Find $P(X > 1/2)$
* What are the expectation and variance of $X$?