# Probability Day 1 Exercises

In [1]:
%autosave 1
%matplotlib inline
import math
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import scipy.stats as stats




---
### 1) Marginal and conditional densities I (Discrete distributions)
You’re running a development experiment. On each attempt, you observe one, two, or three dividing cells (we’ll call the number of dividing cells $y$) and one, two, or three expressed genes of interest (we’ll call this number $x$). The possible outcomes occur with the following joint probabilities:

| _ | x = 1 | x = 2 | x = 3 |
|------|------|------|------|
| y = 1 | 4/18 | 1/18 | 1/18 | 
| y = 2 | 3/18 | 1/18 | 2/18 | 
| y = 3 | 2/18 | 1/18 | 3/18 |

**1a)** From this joint two-dimensional density, compute $P(x)$ : the marginal distribution over x

**1b)** Compute $P(y)$ : the marginal distribution over y

**1c)** Compute $P(y\,|\,x=3)$ : the conditional over $y$ given $x=3$
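One way to check hand calculations for this problem is to store the joint table as a NumPy array and sum along its axes. This is a minimal sketch, not the only way to do it; the variable names (`P_xy`, `P_x`, `P_y`, `P_y_given_x3`) are chosen here for illustration.

```python
import numpy as np

# Joint table from the problem: rows are y = 1, 2, 3; columns are x = 1, 2, 3.
P_xy = np.array([[4, 1, 1],
                 [3, 1, 2],
                 [2, 1, 3]]) / 18.0

# Marginal over x: sum out y (collapse the rows, axis 0).
P_x = P_xy.sum(axis=0)

# Marginal over y: sum out x (collapse the columns, axis 1).
P_y = P_xy.sum(axis=1)

# Conditional over y given x = 3: take the x = 3 column and renormalize it.
x_index = 2  # x = 3 is the third column (zero-based index 2)
P_y_given_x3 = P_xy[:, x_index] / P_x[x_index]

print("P(x)       =", P_x)
print("P(y)       =", P_y)
print("P(y | x=3) =", P_y_given_x3)
```

Each of the three results should sum to 1, which is a quick sanity check on any marginal or conditional.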

### 2) More Marginal and Conditional Densities (Middle School Survey)
As part of a STEM outreach program, you’re mentoring students while they survey the physical attributes (eye color, hair color) of their classmates in a middle school of 543 people. Let's refer to these random variables as $E$ and $H$, for eye color and hair color respectively. You collect the frequencies of each combination in the following table:


|  | E = *Brown* | E = *Blue* | E = *Hazel* | E = *Green* |
|------|------|------|------|------|
| H = *Black* | 52 | 19 | 12 | 2 |
| H = *Brown* | 129 | 82 | 22 | 31 |
| H = *Red* | 21 | 14 | 25 | 9 |
| H = *Blonde* | 8 | 102 | 12 | 8 |


**2a)** Compute the joint density $P(E, H)$

**2b)** From this joint density, compute: 

- $P(E)$: the marginal distribution over eye color
- $P(H\mid E=\text{hazel})$ : the conditional over hair color given the student has hazel eyes
- $P(E\mid H)$ : the full conditional density over eye color given hair color
- $P(H\mid E=\text{not brown})$: the conditional over hair color given the student doesn't have brown eyes.
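The same axis-summing idea carries over to the survey counts. The sketch below is one possible approach, with illustrative variable names. It normalizes by the total of the table as printed (the counts sum to 548), which is an assumption about how the joint density should be formed from the counts.

```python
import numpy as np

eye_colors = ["Brown", "Blue", "Hazel", "Green"]     # columns (E)
hair_colors = ["Black", "Brown", "Red", "Blonde"]    # rows (H)

counts = np.array([[52, 19, 12, 2],
                   [129, 82, 22, 31],
                   [21, 14, 25, 9],
                   [8, 102, 12, 8]], dtype=float)

# Joint density: divide each count by the table total (assumption: the
# table total, not a separately stated head count, is the normalizer).
P_EH = counts / counts.sum()

# Marginals: sum out the other variable.
P_E = P_EH.sum(axis=0)   # over hair color, one value per eye color
P_H = P_EH.sum(axis=1)   # over eye color, one value per hair color

# Conditional over hair color given hazel eyes: one column, renormalized.
hazel = eye_colors.index("Hazel")
P_H_given_hazel = P_EH[:, hazel] / P_E[hazel]

# Full conditional P(E | H): divide every row by its own total,
# so that each row (one hair color) sums to 1 over eye colors.
P_E_given_H = P_EH / P_H[:, np.newaxis]

# Conditional over hair color given "not brown" eyes: pool the
# non-brown columns, then renormalize by their total probability.
not_brown = [i for i, color in enumerate(eye_colors) if color != "Brown"]
P_H_given_not_brown = P_EH[:, not_brown].sum(axis=1) / P_E[not_brown].sum()
```

Conditioning on an event that spans several columns ("not brown") works the same way as conditioning on one column: keep the cells consistent with the event and divide by their total probability.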


### 3) PDFs and integrals


Consider a continuous random variable with probability density function (PDF) $p(x) = 6x(1 - x)$ over the interval $x \in [0, 1]$. 

**3a)** Given this PDF, can you plot it? 



**3b)**  One way to work with a continuous random variable is to discretize it. When you plotted the PDF, you had to approximate it by choosing values of $x$ with small steps and calculating $p(x)$ at each one. Using the same method, can you approximate the integral of the PDF over its entire range? 
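A minimal sketch of the discretization idea: evaluate the density on an evenly spaced grid, then approximate the integral by a Riemann sum (the sum of the heights times the step width). The grid size of 1001 points is an arbitrary choice made here, not something the exercise specifies.

```python
import numpy as np

def p(x):
    """Density from the problem statement: p(x) = 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

# Evenly spaced grid on [0, 1]; 1001 points gives a step of 0.001.
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

# Density evaluated at every grid point.
px = p(x)

# Riemann-sum approximation of the integral of p over [0, 1].
approx_integral = np.sum(px) * dx
print(approx_integral)
```

To plot the density from the same arrays, `plt.plot(x, px)` is enough. Shrinking `dx` (more grid points) should move `approx_integral` closer to the exact value.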


**3c)** Derive the exact integral using calculus. 
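As a worked sketch of the calculation, expand the product and integrate term by term with the power rule:

$$\int_0^1 6x(1 - x)\,dx = \int_0^1 \left(6x - 6x^2\right) dx = \left[\,3x^2 - 2x^3\,\right]_0^1 = 3 - 2 = 1.$$

The result is 1, as it must be for any valid probability density.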



**3d)** Does this exact value match the approximated integral? 




**3e)** From inspecting the graph, what is the maximal value of p(x) and for what x does it occur?

---
### 4) Marginal and conditional densities II (Continuous distributions and numerical integration)

Load the file `pdfData2D.mat`, which contains a discretely sampled 2D probability density. The variables defined include: 

$\quad$ `x` = a vector of $x$ points

$\quad$ `y` = a vector of $y$ points

$\quad$ `Pxy` = a 2D matrix, whose $(i, j)$th entry is the probability $p(x = {\tt x[j]}, y = {\tt y[i]})$ 

*Note:* This isn't really "continuous", is it? When we work with continuous densities, it is common to discretely sample them in a grid. While this problem looks harder than the previous problem, all of the same methods apply! The *only* difference is that instead of three values of $x$ and $y$, we now have 200 and 250 values, respectively.
 
*Another Note:* `np.sum(Pxy) * dx * dy = 1`, where `dx = x[2] - x[1] = 0.1 = dy`.  Here is a block of code to get you started and help you visualize $p(x,y)$ (use this as a reference when you need to make your own 2D plots below):
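One way to load and inspect the density, as a sketch. It assumes `pdfData2D.mat` sits in the working directory and is in a format that `scipy.io.loadmat` can read (MAT-files older than v7.3). The helper names `unpack_density`, `load_density`, and `total_mass` are made up for illustration and are not the course's original starter code.

```python
import numpy as np
import scipy.io as sio

def unpack_density(data):
    """Pull x, y, and Pxy out of a dict such as the one loadmat returns."""
    # loadmat returns vectors as 2D arrays (1 x N or N x 1), so flatten them.
    x = np.ravel(data["x"])
    y = np.ravel(data["y"])
    Pxy = np.asarray(data["Pxy"], dtype=float)
    return x, y, Pxy

def load_density(path="pdfData2D.mat"):
    """Load the sampled density from a .mat file (path is an assumption)."""
    return unpack_density(sio.loadmat(path))

def total_mass(x, y, Pxy):
    """Riemann-sum approximation of the integral of p(x, y) over the grid."""
    dx = x[1] - x[0]
    dy = y[1] - y[0]
    return np.sum(Pxy) * dx * dy
```

After `x, y, Pxy = load_density()`, `total_mass(x, y, Pxy)` should come out close to 1. For a picture, `plt.imshow(Pxy, origin="lower", extent=[x[0], x[-1], y[0], y[-1]], aspect="auto")` followed by `plt.colorbar()` draws the density with $x$ on the horizontal axis and $y$ on the vertical axis, matching the row and column convention described above.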





**4a)** From this joint two-dimensional density, compute and make plots (or images) of

- $P(x)$ : the marginal distribution over x

- $P(y)$ : the marginal distribution over y

- $P(y\,|\,x=5)$ : the conditional over $y$ given $x=5$

- $P(x\,|\,y)$ : the full image of the conditional density $P(x\,|\,y)$
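For a gridded density, the sums from the discrete problems become Riemann sums: multiply by the grid spacing when integrating, and divide by the integrated mass when conditioning. The sketch below uses a synthetic stand-in density so that it runs without the data file; the grid ranges and the Gaussian shape are invented for illustration and say nothing about what `pdfData2D.mat` contains. The function names are also illustrative.

```python
import numpy as np

def marginal_x(Pxy, dy):
    """P(x): integrate out y, which runs along the rows (axis 0)."""
    return Pxy.sum(axis=0) * dy

def marginal_y(Pxy, dx):
    """P(y): integrate out x, which runs along the columns (axis 1)."""
    return Pxy.sum(axis=1) * dx

def conditional_y_given_x(Pxy, x, x0, dy):
    """P(y | x = x0): slice the column nearest x0 and renormalize it."""
    j = np.argmin(np.abs(x - x0))
    column = Pxy[:, j]
    return column / (column.sum() * dy)

def conditional_x_given_y(Pxy, dx):
    """P(x | y) as an image: every row is renormalized to integrate to 1.

    Rows with zero total mass would divide by zero; none occur here.
    """
    row_mass = Pxy.sum(axis=1, keepdims=True) * dx
    return Pxy / row_mass

# Synthetic stand-in grid and density (hypothetical, NOT the real data).
dx = dy = 0.1
x = dx * np.arange(200)
y = dy * np.arange(250)
X, Y = np.meshgrid(x, y)                 # shape (250, 200): Pxy[i, j] <-> (x[j], y[i])
u, v = X - 10.0, Y - 12.0
Pxy = np.exp(-(u**2 + v**2 - u * v) / 8.0)
Pxy /= Pxy.sum() * dx * dy               # normalize so the total mass is 1

Px = marginal_x(Pxy, dy)
Py = marginal_y(Pxy, dx)
Py_given_x5 = conditional_y_given_x(Pxy, x, 5.0, dy)
Px_given_y = conditional_x_given_y(Pxy, dx)
```

With the real data, replace the stand-in block with the loaded `x`, `y`, and `Pxy`. The 1D results can be drawn with `plt.plot`, and `Px_given_y` with `plt.imshow`.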


**4b)**  Three common statistics one might wish to compute from a density are its mean, mode, and median.  

- The *mean* is the average value, given by $\mathbb{E}[x] = \int x P(x) dx$ when $P(x)$ is a pdf, and $\mathbb{E}[x] = \sum_{i} x_i P(x_i)$ when $P(x)$ is a pmf.

- The *mode* is the value $x$ where $P(x)$ takes its maximum. We
  can write this (fancily, if we like) as $\arg\max_x P(x)$. 

- The *median* is the value $t$ such that half the probability
  mass of $P(x)$ lies to the left (smaller than $t$) and half
  lies to the right (greater than $t$). In math notation,
  this corresponds to saying that the median $t$ satisfies
  $\int_{-\infty}^{t} P(x)\, dx = \frac{1}{2}$.

Compute the mean, mode, and median of the marginal $P(x)$ and of the conditional $P(x\,|\,y=3)$. 
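The three statistics can be sketched for any 1D density sampled on an evenly spaced grid. The function name `mean_mode_median` and the toy density $P(x) = 2x$ on $[0, 1]$ are illustrative choices made here, not part of the exercise. The median is approximated as the first grid point where the cumulative sum reaches one half, so it is only accurate to within one grid step.

```python
import numpy as np

def mean_mode_median(x, P, dx):
    """Approximate mean, mode, and median of a density sampled on a grid."""
    # Renormalize so the sampled density integrates to 1 on this grid.
    # This also lets an unnormalized slice of a joint density be passed in.
    P = P / (P.sum() * dx)

    # Mean: Riemann sum of x * P(x).
    mean = np.sum(x * P) * dx

    # Mode: the grid point where P is largest.
    mode = x[np.argmax(P)]

    # Median: first grid point where the cumulative mass reaches 1/2.
    cdf = np.cumsum(P) * dx
    median = x[np.searchsorted(cdf, 0.5)]

    return mean, mode, median

# Toy skewed density P(x) = 2x on [0, 1] (hypothetical example).
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
mean, mode, median = mean_mode_median(x, 2.0 * x, dx)
print(mean, mode, median)
```

For this toy density the exact values are mean $2/3$, mode $1$, and median $1/\sqrt{2}$, so the three statistics clearly differ. For the conditional, pass the row of `Pxy` whose $y$ value is closest to 3, for example `Pxy[np.argmin(np.abs(y - 3)), :]`; the function renormalizes it internally.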
