topics: 

---

* frequency probability (YOEL)
* subjective probability (YOEL)
* random variables (RACHEL)
* continuous vs discrete R.V.s (RACHEL) 
* conditional probability (YOEL) 
* marginal probability (YOEL)
* joint probability (RACHEL)

---

## Assigning numbers to groups of things

In very loose terms, the framework of probability theory tells you how to assign numbers, in a very specific way, to groups of things.

This may feel slow at first, but we'll build up the ideas step by step. Imagine you have 3 groups, and inside these groups you have numbers between 1 and 10. Each group holds some subset of the numbers between 1 and 10.

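```python
import numpy as np
from collections import Counter

# groupA: the full collection we will sample from (note the repeated values)
groupA = np.array([1, 1, 1, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10])
print("groupA:", groupA)

# groupB: the even numbers from 2 to 8 that actually appear in groupA
groupB = np.intersect1d(groupA, np.arange(2, 10, 2))
print("groupB:", groupB)

# groupC: the distinct values of groupA that are not in groupB
groupC = np.setdiff1d(groupA, groupB)
print("groupC:", groupC)
```

```
groupA: [ 1  1  1  3  3  4  5  6  7  7  7  8  9 10]
groupB: [4 6 8]
groupC: [ 1  3  5  7  9 10]
```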


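If you read books on probability theory, you will see the objects above given the following names:

* groupA: the **sample space**, the collection of everything that can come up
* groupB: an **event**, a subset of the sample space
* 3: an **outcome**, a single element of the sample space; here it is one of the outcomes in the event groupC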

It's not too important that you memorize these definitions; we just want you to be aware of them. Now that we have "groups of things", we want to use the rules of probability theory to consistently assign numbers to "things happening". When someone talks about the probability of an event, they usually mean it in one of two ways:

## Frequency of an event happening

Imagine you repeatedly sample a number at random from groupA, tally how many times each number comes up, and divide each tally by the total number of samples. Even if we didn't know the full contents of groupA, we could estimate the probability of each number this way:

```python
# Draw 500,000 values from groupA uniformly at random, with replacement
nSamples = 500000
samples = np.random.choice(groupA, nSamples)
# Relative frequency of each distinct value: count / total number of draws
sampleDist = [(key, val / nSamples) for key, val in sorted(Counter(samples).items())]
print("key, probability:")
sampleDist
```

```
key, probability:

[(1, 0.214932),
 (3, 0.14412),
 (4, 0.071102),
 (5, 0.071496),
 (6, 0.071382),
 (7, 0.213216),
 (8, 0.07143),
 (9, 0.071248),
 (10, 0.071074)]
```

Your estimates will differ slightly from run to run, because the draws are random. Because we do know the entire contents of groupA, we can also compute each probability exactly (analytically), which we do below _(with real data, we can almost never do this)_.

```python
# Exact probability of each value: its count in groupA / size of groupA
trueN = len(groupA)
print("key, probability:")
trueDist = [(key, val / trueN) for key, val in sorted(Counter(groupA).items())]
trueDist
```

```
key, probability:

[(1, 0.21428571428571427),
 (3, 0.14285714285714285),
 (4, 0.07142857142857142),
 (5, 0.07142857142857142),
 (6, 0.07142857142857142),
 (7, 0.21428571428571427),
 (8, 0.07142857142857142),
 (9, 0.07142857142857142),
 (10, 0.07142857142857142)]
```

This is usually called the frequency (or frequentist) view of probability, but it isn't the only way to assign probabilities. Some events happen only once, so there is nothing to tally, and running a simulation to estimate their underlying frequency is impossible.
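For events we can repeat, the frequency view relies on the relative frequency settling toward a fixed value as the number of samples grows. The sketch below (an illustration, not part of the original notebook) tracks the estimated probability of drawing a 1 from groupA against the exact value 3/14; the seed and the sample sizes are arbitrary choices.

```python
import numpy as np

groupA = np.array([1, 1, 1, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10])
true_p = np.mean(groupA == 1)  # exact P(draw a 1) = 3/14

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
errors = {}
for n in [10, 100, 1_000, 10_000, 100_000]:
    samples = rng.choice(groupA, size=n)
    estimate = np.mean(samples == 1)  # relative frequency of the value 1
    errors[n] = abs(estimate - true_p)
    print(f"n={n:>7}  estimate={estimate:.4f}  error={errors[n]:.4f}")
```

The error typically shrinks as `n` grows, although for any single run it need not decrease at every step.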

## Degree of belief

Instead, imagine an event that happens only once, such as a presidential election. We might say we are reasonably sure our favorite candidate will win and assign a number to that degree of belief. This is called a **subjective probability**.

Whichever interpretation we adopt, the same mathematical rules apply, and both make use of **random variables**.


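## Formal definition

A _probability space_ consists of a sample space $G$ and a probability function $P(\cdot)$, which takes an event $A$ (a subset of $G$) as input and returns a real number between 0 and 1, written $P(A)$. (Rigorous treatments also specify which subsets of $G$ count as events, making the probability space a triple; we gloss over that here.)

The probability function must satisfy three axioms:

1. non-negativity: $P(A) \geq 0$ for every event $A$
2. normalization: $P(G) = 1$
3. additivity: for pairwise disjoint events $A_1, A_2, \ldots$ (that is, $A_i \cap A_j = \emptyset$ whenever $i \neq j$), $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$

In words, if no two of the events can happen at the same time (they are mutually exclusive), the probability that one of them happens is the sum of their individual probabilities. Other familiar rules follow from these axioms; for example, $P(\emptyset) = 0$.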
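As a concrete check, we can treat "draw one entry of groupA uniformly at random" as our probability function and confirm non-negativity, $P(G) = 1$, and additivity over disjoint events. This is a sketch: the helper `prob` and the events `evens` and `odds` are hypothetical names introduced for illustration.

```python
import numpy as np

groupA = np.array([1, 1, 1, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10])

def prob(event):
    """P(event) when one entry of groupA is drawn uniformly at random.

    `event` is any collection of values; we return the fraction of
    groupA's entries whose value lies in it.
    """
    return np.isin(groupA, list(event)).mean()

sample_space = set(groupA.tolist())  # distinct possible values
evens = {2, 4, 6, 8, 10}             # hypothetical example event
odds = {1, 3, 5, 7, 9}               # disjoint from evens

assert all(prob({v}) >= 0 for v in sample_space)  # non-negativity
assert prob(sample_space) == 1.0                   # normalization
assert prob(set()) == 0.0                          # empty event
# evens and odds share no values, so their probabilities add
assert np.isclose(prob(evens | odds), prob(evens) + prob(odds))
print(prob(evens), prob(odds))
```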

# Random Variables

A random variable is like an ordinary variable from algebra, except that its value depends on the outcome of a random phenomenon. Formally, it is a function that assigns a number to each outcome in the sample space. Random variables are a central part of probability theory.

For example, consider a random variable $X$ that encodes the outcome of a coin flip: if the coin lands heads, $X = 0$; if it lands tails, $X = 1$. We can then ask about properties of $X$, such as how likely $X = 0$ or $X = 1$ is. We use a random variable because we don't know in advance which outcome will occur, and we want to describe the probability of each possible outcome (together, these probabilities form a **probability distribution**).
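Since a random variable is a rule that turns outcomes into numbers, the coin example can be sketched with a dictionary playing the role of that rule. This assumes a fair coin and uses NumPy's `default_rng` for the flips; the seed and the number of flips are arbitrary.

```python
import numpy as np

# X maps each outcome to a number, using the encoding from the text
X = {"heads": 0, "tails": 1}

rng = np.random.default_rng(42)
outcomes = rng.choice(["heads", "tails"], size=10_000).tolist()  # fair coin assumed
values = np.array([X[o] for o in outcomes])  # apply X to every outcome

# Fraction of flips for which X took each value
print("P(X = 0) ~", np.mean(values == 0))
print("P(X = 1) ~", np.mean(values == 1))
```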



## Discrete Random Variables

The coin flip described earlier is a *discrete* random variable: a random variable is discrete when its set of possible values is finite or countably infinite.

One function we can define for a discrete random variable is its **probability mass function** (pmf), which gives the probability that $X$ takes a particular value $x$. Formally, $p(x) = P(X = x)$, where $x$ ranges over the possible values of $X$.

In the coin flip case, the set of possible values of $X$ is $\{0, 1\}$, a finite set. If the coin is fair, then $p(0) = P(X = 0) = 0.5$ and $p(1) = P(X = 1) = 0.5$.

It is always true that $\sum_{x} p(x) = 1$, where the sum runs over all possible values of $X$: the probabilities of $X$ taking each of its possible values add up to 1.
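A pmf can be written down directly as a mapping from values to probabilities. This sketch uses Python's `fractions.Fraction` (our choice, to keep the arithmetic exact) for the fair coin and for a value drawn uniformly from groupA.

```python
from collections import Counter
from fractions import Fraction

# pmf of a fair coin, with heads -> 0 and tails -> 1 as in the text
coin_pmf = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# pmf of the value drawn uniformly at random from groupA
groupA = [1, 1, 1, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10]
counts = Counter(groupA)
groupA_pmf = {x: Fraction(c, len(groupA)) for x, c in sorted(counts.items())}

# Both pmfs sum to exactly 1 over their possible values
print(sum(coin_pmf.values()), sum(groupA_pmf.values()))  # prints "1 1"
print(groupA_pmf[7])                                     # prints "3/14"
```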


We can also consider the **cumulative distribution function (cdf)** of a random variable, which gives the probability that $X$ is less than or equal to some value $x$. It is defined as $F(x) = P(X \leq x)$, and for a discrete random variable it becomes $F(x) = \sum_{t \leq x} p(t)$, where the sum runs over the possible values $t$ of $X$ that are at most $x$.
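Because the cdf of a discrete random variable is a running sum of its pmf, it can be computed straight from the pmf. This sketch reuses groupA as an example distribution; the helper `cdf` is a hypothetical name written for illustration.

```python
from collections import Counter
from fractions import Fraction

groupA = [1, 1, 1, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10]
pmf = {x: Fraction(c, len(groupA)) for x, c in Counter(groupA).items()}

def cdf(x, pmf):
    """F(x) = P(X <= x): sum the pmf over all possible values t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

for x in [0, 1, 3, 6.5, 10]:
    print(x, cdf(x, pmf))
```

$F$ is a step function: it only jumps at values $X$ can take, so $F(6.5) = F(6) = 4/7$, and it reaches 1 at the largest value, 10.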

Examples of discrete random variables include Bernoulli (e.g. the coin flip: probability $p$ for one outcome and $1-p$ for the other), binomial (the number of successes in a fixed number of independent Bernoulli trials), Poisson, and discrete uniform random variables.
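NumPy's random `Generator` can draw from each of the distributions named above. A sketch with arbitrary parameter choices: Bernoulli is drawn as a binomial with a single trial, and the discrete uniform on $\{1, \ldots, 6\}$ comes from `integers`, whose upper bound is exclusive by default. The sample means should land close to the theoretical means ($p$, $np$, $\lambda$, and $(1 + 6)/2$).

```python
import numpy as np

rng = np.random.default_rng(0)
size = 100_000

# Bernoulli(p) is a binomial with a single trial (n = 1)
bernoulli = rng.binomial(n=1, p=0.3, size=size)
# Binomial(n, p): number of successes in n independent Bernoulli(p) trials
binomial = rng.binomial(n=10, p=0.3, size=size)
# Poisson(lam): count of events with average count lam
poisson = rng.poisson(lam=4.0, size=size)
# Discrete uniform on {1, ..., 6}; `high` is exclusive in Generator.integers
uniform = rng.integers(low=1, high=7, size=size)

for name, draws, mean in [
    ("Bernoulli(0.3)", bernoulli, 0.3),
    ("Binomial(10, 0.3)", binomial, 10 * 0.3),
    ("Poisson(4)", poisson, 4.0),
    ("Uniform{1..6}", uniform, 3.5),
]:
    print(f"{name:18} sample mean={draws.mean():.3f}  theoretical={mean}")
```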

**EXAMPL