# Random Variables

Khan Academy:
https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library

Random variables are ways to map the outcomes of random processes to numbers. Random variables are usually denoted by capital letters, e.g. **X**.  

* **Discrete Random Variables** - Take distinct/separate values
* **Continuous Random Variables** - Take any value in an interval

**Example**. Let's say we want to estimate the number of people we will see in line at a store. We conduct an experiment by visiting the store 50 times. Out of the 50 visits, we observe 0 people 24 times, 1 person 18 times, and 2 people 8 times. We estimate the probabilities as shown below.

| People in the line | Times Observed | Probability Estimate |
| :------------------ | :-------------- | :-----------------: |
| 0 | 24 | $\frac{24}{24+18+8}=\frac{24}{50}$ = 0.48 = 48% |
| 1 | 18 | $\frac{18}{50}$ = 0.36 = 36% |
| 2 | 8 | $\frac{8}{50}$ = 0.16 = 16% |

Now, let's say we plan to visit the store 500 times in the coming two years. How many times do we expect to see a line of 2 people? A reasonable expectation would be

\begin{equation*}
500 \cdot \frac{8}{50} = 80
\end{equation*}
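The estimate above can be reproduced in a few lines of Python. This is a minimal sketch; the names `observed`, `probabilities`, and `planned_visits` are illustrative and do not come from the original notes.

```python
# Observed line lengths from the 50 store visits: {people in line: times observed}
observed = {0: 24, 1: 18, 2: 8}

total_visits = sum(observed.values())  # 50

# The relative frequency of each line length serves as its probability estimate
probabilities = {people: count / total_visits for people, count in observed.items()}

# Expected number of visits (out of 500) on which we see 2 people in line
planned_visits = 500
expected_two_people = planned_visits * probabilities[2]

print(probabilities)
print(expected_two_people)
```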


### Mean and Variance

Let's say we have a discrete random variable X which is equal to the number of workouts in a week.

 | X | P(X)
 | - | - 
 | 0 | 0.1
 | 1 | 0.15
 | 2 | 0.4
 | 3 | 0.25
 | 4 | 0.1
 
**Expected value/Mean**. The expected value of $X$ is  


\begin{equation*}
E(X) = \mu_x = 0\cdot0.1 + 1\cdot0.15 + 2\cdot0.4 + 3\cdot0.25 + 4\cdot0.1 = 2.1 
\end{equation*}

So the expected number of workouts in a week is 2.1.
 
**Variance and Standard Deviation**. Variance is a measure of spread.  

*Variance*  

\begin{equation*}
Var(X) = (0-2.1)^2\cdot0.1 + (1-2.1)^2\cdot0.15 + (2-2.1)^2\cdot0.4 + (3-2.1)^2\cdot0.25 + (4-2.1)^2\cdot0.1 = 1.19
\end{equation*}  

*Standard Deviation*  

\begin{equation*}
\sigma_x = \sqrt{Var(X)} = \sqrt{1.19} \approx 1.09
\end{equation*}
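The same three quantities can be computed directly from the table. The sketch below assumes the distribution is stored as a dictionary mapping each value to its probability; the helper names `mean` and `variance` are illustrative.

```python
import math

# Distribution of X (workouts per week) from the table above: {value: probability}
distribution = {0: 0.1, 1: 0.15, 2: 0.4, 3: 0.25, 4: 0.1}

def mean(dist):
    """Expected value: each value weighted by its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: probability-weighted squared deviation from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu_x = mean(distribution)
var_x = variance(distribution)
sigma_x = math.sqrt(var_x)

print(round(mu_x, 2), round(var_x, 2), round(sigma_x, 2))
```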

### The sum and difference of two random variables

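If $X$ and $Y$ are independent random variables, then the following hold (the two expectation identities hold for any $X$ and $Y$; the variance identity relies on independence):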

\begin{align*}
E(X + Y) &= E(X) + E(Y) \\
E(X - Y) &= E(X) - E(Y) \\
Var(X \pm Y) &= Var(X) + Var(Y)
\end{align*}

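**Deriving the variance of the difference of two independent random variables**  

\begin{align*}
Var(X-Y)&=E[(X-Y-E(X-Y))^2]\\
        &=E[((X-E(X))-(Y-E(Y)))^2]\\
        &=E[(X-E(X))^2-2\cdot(X-E(X))\cdot(Y-E(Y))+(Y-E(Y))^2]\\
        &=E[(X-E(X))^2]-E[2\cdot(X-E(X))\cdot(Y-E(Y))]+E[(Y-E(Y))^2]\\
        &=Var(X)-0+Var(Y)\\
        &=Var(X)+Var(Y)
\end{align*}

The middle term is zero because, for independent $X$ and $Y$, $E[(X-E(X))\cdot(Y-E(Y))] = E[X-E(X)]\cdot E[Y-E(Y)] = 0\cdot 0 = 0$.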
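The identity $Var(X - Y) = Var(X) + Var(Y)$ can be checked exactly for small discrete distributions by enumerating every pair of outcomes. This is a sketch under the assumption that $X$ and $Y$ are independent, so joint probabilities are products; the two distributions `dist_x` and `dist_y` are made up for illustration.

```python
from itertools import product

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def difference_distribution(dist_x, dist_y):
    """Distribution of X - Y, assuming X and Y are independent,
    so that P(X=x, Y=y) = P(X=x) * P(Y=y)."""
    result = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        result[x - y] = result.get(x - y, 0.0) + px * py
    return result

# Two hypothetical independent random variables (values chosen for illustration)
dist_x = {0: 0.5, 1: 0.5}
dist_y = {0: 0.2, 1: 0.3, 2: 0.5}

var_diff = variance(difference_distribution(dist_x, dist_y))
var_sum = variance(dist_x) + variance(dist_y)

print(var_diff, var_sum)  # the two values agree up to floating-point rounding
```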

## Binomial Variables

* Made up of independent trials
* Each trial has one of two discrete outcomes (either success or failure)
* Fixed number of trials
* Probability of success on each trial is constant

**Example - Binomial**. Number of heads after 10 flips of a coin.  
**Example - not a Binomial**. Number of kings after taking 2 cards from a standard deck **without** replacement (the outcome of the second draw depends on the first card).  
**Example - Binomial**. Number of kings after taking 2 cards from a standard deck **with** replacement.
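To make the difference between the two card examples concrete, the probability of drawing two kings can be computed both ways. This is a small illustrative sketch using exact fractions; the variable names are not from the original notes.

```python
from fractions import Fraction

kings, deck = 4, 52

# With replacement: the two draws are independent, each with P(king) = 4/52
p_two_kings_with = Fraction(kings, deck) ** 2

# Without replacement: after one king is drawn, 3 kings remain among 51 cards
p_two_kings_without = Fraction(kings, deck) * Fraction(kings - 1, deck - 1)

print(p_two_kings_with, float(p_two_kings_with))        # 1/169
print(p_two_kings_without, float(p_two_kings_without))  # 1/221
```

The two results differ because, without replacement, the probability of success changes from the first draw to the second.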

* **10% Rule of Assuming "Independence"**. If our sample is less than or equal to 10% of the population, it is OK to assume approximate independence.

### Binomial Probability Formula

If $X$ is a Binomial Random Variable with a probability of success equal to $p$, then the probability of $k$ successes out of $n$ trials is

\begin{equation*}
P(X=k) = \binom{n}{k} \cdot p^k(1-p)^{n-k}
\end{equation*}

where $\binom{n}{k}$ is read "n choose k" and denotes the number of k-combinations of n elements without repetition.

\begin{equation*}
\binom{n}{k} = \frac{n!}{k!\cdot(n-k)!}
\end{equation*}
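The formula translates directly into code. The sketch below uses `math.comb` from the standard library (available in Python 3.8 and later); the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 flips of a fair coin
p_five_heads = binomial_pmf(5, 10, 0.5)
print(p_five_heads)  # 252 / 1024 = 0.24609375
```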

### Expected Value and Variance of Bernoulli Distribution

* The Bernoulli distribution is the simplest case of the Binomial distribution, in which there is **only 1 trial**.  
* If the probability of success (1) is $p$ and the probability of failure (0) is $1-p$, then the expected value (mean) and the variance of a Bernoulli random variable are

\begin{align*}
\mu &= (1-p)\cdot0 + p\cdot1 \\
    &= p
\end{align*}

\begin{align*}
\sigma^2 &= (1-p)\cdot(0-\mu)^2+p\cdot(1-\mu)^2 \\
         &= (1-p)\cdot(0-p)^2+p\cdot(1-p)^2 \\
         &= (1-p)\cdot p^2+p\cdot(1-2p+p^2) \\
         &= p - p^2 \\
         &= p(1-p)
\end{align*}


### Expected Value and Variance of Binomial Distribution

If $X$ is a Binomial random variable with $n$ trials and a probability of success equal to $p$, then the **expected value and variance of $X$** are

\begin{align*}
E(X) &= np\\
Var(X) &= np(1-p)
\end{align*}

**Explanation**. $X$ is the sum of $n$ independent Bernoulli random variables, $X_1, X_2, ..., X_n$, each with a probability of success equal to $p$, so

\begin{align*}
E(X) &= E(X_1 + X_2 + ... +X_n) \\
     &= E(X_1) + E(X_2) + ... + E(X_n) \\
     &= p + p + ... + p\\
     &= np
\end{align*}

\begin{align*}
Var(X) &= Var(X_1 + X_2 + ... + X_n)\\
       &= Var(X_1) + Var(X_2) + ... + Var(X_n)\\
       &= np(1-p)
\end{align*}
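As a numerical sanity check, the mean and variance can also be computed from the binomial probability formula and compared with $np$ and $np(1-p)$. The values `n = 10` and `p = 0.3` are arbitrary illustrative choices.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative values

# Mean and variance computed from the definition, summing over k = 0..n
mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean_from_pmf, n * p)           # both approximately 3.0
print(var_from_pmf, n * p * (1 - p))  # both approximately 2.1
```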

## Geometric Random Variable

* Made up of independent trials
* Each trial has one of two discrete outcomes (either success or failure)
* Probability of success on each trial is constant
* **The number of trials is not fixed in advance. The variable counts how many trials it takes until the first success.**

**Example**. Number of trials until a head occurs.

### Cumulative Geometric Probability (more than a value)

* Let $V$ be a geometric random variable: the number of vehicles Emillia registers each day until she first registers an SUV.
* The probability of success on each trial is equal to $p$: SUVs make up $p\cdot 100$ percent of the vehicles Emillia registers each day.
* What is the probability that the first success takes more than $k$ trials, i.e. that the first $k$ vehicles Emillia registers are all non-SUVs?

\begin{align*}
P(V > k) &= P(V=k+1) + P(V=k+2) + ...\\
         &= P(\text{first } k \text{ trials fail})\\
         &= (1-p)^k
\end{align*}
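The closed form $(1-p)^k$ can be compared against the sum it replaces. In this sketch the infinite sum is truncated at a large index, and `p = 0.12` and `k = 4` are illustrative values rather than figures from the notes.

```python
def geometric_pmf(k, p):
    """P(V = k): the first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

def prob_more_than(k, p):
    """P(V > k): the first k trials all fail."""
    return (1 - p) ** k

p, k = 0.12, 4  # illustrative values

# Approximate P(V=k+1) + P(V=k+2) + ... by truncating the infinite sum
tail_sum = sum(geometric_pmf(j, p) for j in range(k + 1, 2000))

print(prob_more_than(k, p), tail_sum)
```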

### Cumulative Geometric Probability (less than a value)

\begin{align*}
P(V < k) &= P(V=1) + P(V=2) + ... + P(V=k-1)\\
         &= p + p\cdot (1-p) + ... + p \cdot (1-p)^{k-2}
\end{align*}

Or, more simply, using the complement:

\begin{align*}
P(V < k) &= P(V \leq k-1)\\
         &= 1 - P(V > k-1) \\
         &= 1 - (1-p)^{k-1}
\end{align*}
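Both routes to $P(V < k)$ give the same number, which a short check confirms. As before, `p = 0.12` and `k = 4` are illustrative values.

```python
def prob_less_than(k, p):
    """P(V < k) = 1 - P(V > k - 1) = 1 - (1 - p)^(k - 1)."""
    return 1 - (1 - p) ** (k - 1)

p, k = 0.12, 4  # illustrative values

# Direct sum: P(V=1) + P(V=2) + ... + P(V=k-1)
direct_sum = sum(p * (1 - p) ** (j - 1) for j in range(1, k))

print(prob_less_than(k, p), direct_sum)
```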

### Expected Value of Geometric Random Variable

Let's say we have a Geometric Random Variable, $X$: the number of independent trials needed to get the first "success", where P(success) on each trial is $p$. Then

\begin{align*}
E(X) = \frac{1}{p}
\end{align*}

\begin{align*}
E(X) &= P(X=1)\cdot 1 + P(X=2)\cdot 2 + P(X=3)\cdot 3 + ...\\
E(X) &= 1p + 2p(1-p) + 3p(1-p)^2 + ... \\
(1-p)E(X) &= \quad\quad 1p(1-p) + 2p(1-p)^2 + 3p(1-p)^3 + ... \\
E(X) - (1-p)E(X) &= 1p + 1p(1-p) + 1p(1-p)^2 + ...\\
E(X) + (p-1)E(X) &= 1p + 1p(1-p) + 1p(1-p)^2 + ...\\
pE(X) &= p + p(1-p) + p(1-p)^2 + ...\\
E(X) &= 1 + (1-p) + (1-p)^2 + ... \stackrel{\text{geometric series}}{=} \frac{1}{1-(1-p)}\\
E(X) &= \frac{1}{p}
\end{align*}
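The result $E(X) = \frac{1}{p}$ can be checked numerically by summing the first few hundred terms of the series $\sum_k k \cdot P(X=k)$. This is a sketch with an illustrative value of `p`; the function name is hypothetical.

```python
def geometric_mean_partial_sum(p, n_terms):
    """Sum of k * P(X = k) for k = 1..n_terms, approximating E(X)."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, n_terms + 1))

p = 0.25  # illustrative value
approx = geometric_mean_partial_sum(p, 500)

print(approx, 1 / p)  # the partial sum is very close to 1/p = 4.0
```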

### Probability Distribution and Probability Density Functions

* Probability Distribution Functions - Discrete random variables
* Probability Density Functions - Continuous random variables


**Note**. The code below estimates the probability of each possible number of heads by simulating `n_trials` rounds of `n_coins` fair coin flips.

import random

def flip_fair_coins(n_coins):
    """ int (number of coins to flip) -> list of 1s (heads) and 0s (tails)
    """
    return [random.randint(0, 1) for _ in range(n_coins)]

def trials(n_trials, n_coins):
    """ Estimate P(number of heads = k) for k = 0..n_coins by simulation
    """
    ## initialize a dictionary that will track how often each number of heads is seen
    trial_dict = {'{}_heads'.format(i): 0 for i in range(n_coins + 1)}

    for _ in range(n_trials):
        ## flip the coins and count the heads in the current trial
        heads = sum(flip_fair_coins(n_coins))

        ## update the count for the number of heads we've seen in the current trial
        trial_dict['{}_heads'.format(heads)] += 1

    ## divide the number of occurrences by the total number of trials to get the probabilities
    for key in trial_dict:
        trial_dict[key] /= n_trials

    return trial_dict