# Random variables
When we conduct an experiment, we are usually interested in some function of the outcome, rather than the actual outcome itself.

For example, a company surveying its products wants to determine "how likely they are to fail", as opposed to "how many products failed during the experiment".

Such values are random quantities determined by the outcome of the experiment.

**Example**

Suppose that we have an experiment on flipping two coins, and we are interested in the number of heads flipped.

We know that $S = \{HH, HT, TH, TT\}$.
For our goal, we define a random variable $X$ (which is a function on $S$) that corresponds to the number of heads flipped.

Thus, $X : S \to \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers,
such that $X(HH) = 2$, $X(HT) = X(TH) = 1$, and $X(TT) = 0$.

Written as a piecewise function:
$$
X(s) =
\begin{cases}
2 & s = HH, \\
1 & s = HT \text{ or } s = TH, \\
0 & s = TT.
\end{cases}
$$

Hence, the range space $R_X$ is $\{0, 1, 2\}$.
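
As a minimal sketch of the example above, the random variable can be written as an ordinary Python function on the sample space. The names `S`, `X`, and `range_space` are illustrative choices, not part of any library.

```python
# Sample space for flipping two coins, each outcome written as a string.
S = ["HH", "HT", "TH", "TT"]

def X(s):
    """Random variable: number of heads in the outcome s."""
    return s.count("H")

# The range space R_X is the set of values that X takes over S.
range_space = {X(s) for s in S}

for s in S:
    print(s, "->", X(s))
print("R_X =", sorted(range_space))
```

Representing $X$ as a plain function keeps the point of the definition visible: nothing about $X$ itself is random, only the outcome it is applied to.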

---

Thus, we arrive at our definition.
A **random variable** is any function $X$ that assigns a real number to every element $s \in S$.

Note that it can be any function that maps elements of $S$ to $\mathbb{R}$, even a somewhat nonsensical one.
For example, we can set $X(HH) = -5, X(HT) = X(TT) = 2.34, X(TH) = \pi$, and $X$ still qualifies as a random variable.

## Properties
* $X$ is a real-valued function
* The range space of $X$ is a set of real numbers
* If $S$ has elements that are real numbers, we typically set $X(s) = s$ for ease of notation
    * If all elements of $S$ are real numbers, then $R_X = S$ under the convention above

## Equivalent events

Let $S$ be a sample space, $X$ a random variable on $S$, and $R_X$ its range space.
Suppose we have an event $A$ in $R_X$.
(Note that $A$ is defined in terms of values of the random variable rather than sample points in $S$, *i.e.* $A \subset R_X$.)
Suppose also that we have an event $B$ such that
$$
B = \{s \in S \mid X(s) \in A\}
$$
In other words, $B$ contains all the sample points that $X$ maps to a value in $A$.

Then we can say $A$ and $B$ are **equivalent events** and 
$$
Pr(A) = Pr(B)
$$

**Example**

Suppose we toss two coins. Let $X$ be the number of heads flipped.

Then $A = \{0\}$ is equivalent to $B = \{TT\}$

Also $A = \{0, 1\}$ is equivalent to $B = \{HT, TH, TT\}$

Notice that there is a shift in how we view $A$ and $B$.
Before, when querying $Pr(B)$, we were asking "what is the probability that $HT$, $TH$ or $TT$ occurs".
But when using the random variable, we are instead asking "what is the probability that at most 1 head appears".
Hence, it is no longer a question about the occurrence of an event in $S$.
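
The construction of $B$ from $A$ can be sketched in a few lines of Python. This assumes fair coins, so that all four outcomes are equally likely; the helper names `equivalent_event` and `pr` are made up for illustration.

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]

def X(s):
    """Random variable: number of heads in the outcome s."""
    return s.count("H")

def equivalent_event(A):
    """Return B = {s in S : X(s) in A}, the event in S equivalent to A."""
    return {s for s in S if X(s) in A}

def pr(B):
    """Probability of an event B in S, assuming equally likely outcomes."""
    return Fraction(len(B), len(S))

B = equivalent_event({0, 1})
print(sorted(B), pr(B))
```

Because $Pr(A)$ is defined through the equivalent event, computing `pr(equivalent_event(A))` is all that is needed to answer a question phrased in terms of $X$.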

---

Note that some events in $S$ may not have an equivalent event in $R_X$.
For instance, $\{HH, TH\}$ in the previous example has none: any $A$ containing $1$ would also bring in $HT$.

## Discrete probability distribution
Let $X$ be a random variable.

If the number of possible values of $X$ is **finite or countably infinite**, we call $X$ a **discrete random variable**.

In this case, each value $x$ of $X$ is associated with a probability $f(x) = Pr(X = x)$.
$f(x)$ is called the **probability function** of $X$.
Pairing each $x_i$ with its corresponding probability gives the **probability distribution of $X$**, the collection of pairs $(x_i, f(x_i))$.

### Properties
* $f(x_i) \geq 0$ for all $x_i$
* $\sum f(x_i) = 1$

**Example**

Suppose we toss two coins, and let $X$ be the number of heads obtained.

Then the probability distribution would be as follows:

| x  | 0 | 1 | 2 |
|:----:|:---:|:----:|:---:|
| $$f(x) = Pr(X=x)$$ | 1/4 | 1/2 | 1/4 |

We can also easily verify that both properties above hold.
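
The table can be reproduced by enumerating the sample space. The sketch below assumes fair coins (equally likely outcomes) and uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]  # assumed equally likely (fair coins)

# Count how many outcomes map to each value of X (number of heads).
counts = Counter(s.count("H") for s in S)

# f(x) = Pr(X = x), stored as exact fractions.
f = {x: Fraction(n, len(S)) for x, n in counts.items()}

# Check the two properties of a probability function.
all_nonnegative = all(p >= 0 for p in f.values())
total = sum(f.values())

print(f, all_nonnegative, total)
```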

---

## Continuous probability distribution

Suppose that the range space $R_X$ of $X$ is an **interval or a collection of intervals**.
In this case, $X$ is a **continuous random variable**.

Now, $f(x)$ is a **probability density function**. Unlike in the discrete case, $f(x)$ is not $Pr(X=x)$; probabilities are obtained by integrating $f(x)$ over an interval.

### Properties
* $f(x) \geq 0$ for all $x \in R_X$
* $f(x) = 0$ for all $x \notin R_X$
* $\int _{-\infty} ^\infty f(x) dx = 1$

Note that these are similar to the properties in the discrete case.

The probability that the random variable lies between two values $c$ and $d$ is

$$
Pr(c \leq X \leq d) = \int _c ^ d f(x) dx
$$

This corresponds to the area under the graph of $f(x)$ between $x = c$ and $x=d$.

For any value $x_0 \in R_X$,
$$
Pr(X =x_0) = \int _{x_0} ^ {x_0} f(x) dx = 0
$$

In the continuous case, the probability that $X$ equals a fixed value is 0.

**Corollary**: We can use $\leq$ and $<$ interchangeably in the continuous case.

**Corollary**: $Pr(A) = 0$ does not necessarily imply that $A=\emptyset$.
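
These properties can be checked numerically for a concrete density. The sketch below assumes the example density $f(x) = 2x$ on $[0, 1]$ (chosen here purely for illustration, it is not from the notes above) and uses a simple midpoint rule in place of exact integration.

```python
def f(x):
    """Assumed example density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    if a == b:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# f is zero outside [0, 1], so integrating over [0, 1] covers the whole real line.
total = integrate(f, 0, 1)             # should be close to 1
p_interval = integrate(f, 0.25, 0.5)   # exact value: 0.5**2 - 0.25**2 = 0.1875
p_point = integrate(f, 0.5, 0.5)       # a single point has probability 0

print(total, p_interval, p_point)
```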

## Cumulative distribution function

We define the **cumulative distribution function**, $F(x)$, of a random variable $X$ as
$$
F(x) = Pr(X \leq x)
$$

### Discrete c.d.f
In the discrete case, it simplifies to
$$
F(x) = \sum _{k \leq x} Pr(X = k)
$$

Graphically, $F(x)$ is a step function.

For any numbers $a, b$ with $a \leq b$,
$$
Pr(a \leq X \leq b) = Pr(X \leq b) - Pr(X < a) = F(b) - F(a^-)
$$

where $a^-$ is the largest value of $X$ that is strictly less than $a$.

If $R_X$ consists only of integers and $a, b$ are integers, then we get
$$
Pr(a \leq X \leq b) = F(b) - F(a-1)
$$

**Corollary**: Setting $a=b$, we get
$$
Pr(X=a) = F(a) - F(a-1)
$$
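
As a small sketch, the discrete c.d.f. of the two-coin example (with $f(0) = 1/4$, $f(1) = 1/2$, $f(2) = 1/4$) can be computed directly from the probability function. The helper names `F` and `pr_between` are illustrative.

```python
from fractions import Fraction

# Probability function for X = number of heads in two fair coin flips.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Cumulative distribution function: F(x) = Pr(X <= x)."""
    return sum(p for k, p in f.items() if k <= x)

def pr_between(a, b):
    """Pr(a <= X <= b) for integers a <= b, using F(b) - F(a - 1)."""
    return F(b) - F(a - 1)

print(F(-1), F(0), F(1), F(1.5), F(2))
print(pr_between(1, 2), pr_between(1, 1))
```

Note that `F(1.5)` equals `F(1)`: the c.d.f. only jumps at the values that $X$ can actually take, which is what makes it a step function.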

### Continuous c.d.f
In the continuous case, we get
$$
F(x) = \int _{-\infty} ^x f(k) dk
$$

**Corollary:**
$$
f(x) = \frac{d}{dx} F(x)
$$

when the derivative exists.

Also, 
$$
Pr(a \leq X \leq b) = Pr(a < X \leq b) = F(b) - F(a)
$$
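
Both relations can be illustrated with a concrete c.d.f. The sketch below assumes $F(x) = x^2$ on $[0, 1]$, which corresponds to the density $f(x) = 2x$; this choice is an assumption made for the example, and the derivative is only approximated by a central difference.

```python
def F(x):
    """Assumed example c.d.f.: F(x) = x**2 on [0, 1] (density f(x) = 2x)."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x

def derivative(g, x, h=1e-6):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

p = F(0.5) - F(0.25)                   # Pr(0.25 <= X <= 0.5) = 0.1875
density_at_half = derivative(F, 0.5)   # approximately f(0.5) = 2 * 0.5 = 1

print(p, density_at_half)
```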

### Properties of c.d.f
* $F(x)$ is non-decreasing
* $0 \leq F(x) \leq 1$

## Aggregate values <span id="aggregate"/>

### Expected value
Given a random variable $X$ which has $R_X = \{x_1, x_2, \dots\}$, and a probability function $f(x)$.

#### Discrete

The **mean/expected value** of $X$ is defined as follows:

$$
\mu_X = E(X) = \sum _i x_if(x_i) = \sum _x x f(x)
$$

where both $\mu_X$ and $E(X)$ denote the mean/expected value.

If $N = |R_X|$ and $f(x) = 1/N$ (that is, each value is equally likely), then it simplifies to:
$$
E(X) = \frac{1}{N}\sum_i x_i
$$
which is simply the average of $N$ items.

Note that $E(X)$ may not be in $R_X$.
For example, the expected value of a roll of a fair die is $3.5$, which is not a possible roll of a die.
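
The die example can be checked with a short sketch, assuming a fair six-sided die. The helper name `expected_value` is illustrative.

```python
from fractions import Fraction

def expected_value(f):
    """E(X) = sum of x * f(x) over the values x of a discrete X."""
    return sum(x * p for x, p in f.items())

# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expected_value(die)

# The mean is 7/2, which is not one of the faces 1..6.
print(mean, float(mean), mean in die)
```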

#### Continuous
Similarly, we define the following for the continuous case:


$$
\mu_X = E(X) = \int ^\infty_{-\infty} x f(x) dx
$$


### Expectation of a function of $X$
Let $g(X)$ be a function of a random variable $X$ whose probability (density) function is $f_X(x)$.

#### Discrete
$$
E(g(X)) = \sum_x g(x)f_X(x)
$$

#### Continuous
$$
E(g(X)) = \int ^\infty _{-\infty} g(x)f_X(x) dx
$$
$$


**Example**

It might seem a bit abstract why we would want to apply a function to a random variable.

As an illustration, suppose we are playing a game which involves flipping two coins.
Let $X$ be the random variable associated with the number of heads flipped.
Our probability function is as follows:
$$
f(x) = \begin{cases}
\frac{1}{4} & x = 0,\\
\frac{1}{2} & x = 1,\\
\frac{1}{4} & x = 2.
\end{cases}
$$

Now suppose that I will award you \$1 for playing the game, and an extra \$3 for each head you flip.

What is the expected amount you would gain from playing the game?

In this case, $g\left(x\right) = 3x + 1$.

In fact, we don't need to be concerned about how complicated $g\left(x\right)$ may be when dealing with the discrete case.
We simply need the values for each corresponding $x$ to compute the expected value.

Thus, we get
$$
\begin{align}
E\left(g\left(X\right)\right) &= \sum_x g\left(x\right) f_X\left(x\right) \\
&= \sum_x \left(3x + 1\right) f_X\left(x\right) \\
&= \left(3\left(0\right)+1\right) \left(\frac{1}{4}\right) + \left(3\left(1\right)+1\right) \left(\frac{1}{2}\right) + \left(3\left(2\right)+1\right) \left(\frac{1}{4}\right) \\
&= \left(\frac{1}{4}\right) + 4 \left(\frac{1}{2}\right) + 7\left(\frac{1}{4}\right) \\
&= 4
\end{align}
$$

Hence, we expect to win \$4 on average when playing the game.

Indeed, this is (approximately) what we get when we simulate the game as per below:

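```python
import random

N = 1_000_000  # number of simulated games

# Each draw from [0, 1, 1, 2] is the number of heads in one game
# (probabilities 1/4, 1/2, 1/4); the payout is 3 * heads + 1.
average = sum(random.choice([0, 1, 1, 2]) * 3 + 1 for _ in range(N)) / N
print(average)
```

```
4.001464
```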

---

#### Properties
* $E(aX + b) = a E(X) + b$
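
As a quick check of this property on the two-coin distribution ($f(0) = 1/4$, $f(1) = 1/2$, $f(2) = 1/4$), the sketch below compares both sides for $a = 3$, $b = 1$. The helper name `expectation` is illustrative.

```python
from fractions import Fraction

# Probability function for X = number of heads in two fair coin flips.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(g, f):
    """E(g(X)) = sum of g(x) * f(x) for a discrete X."""
    return sum(g(x) * p for x, p in f.items())

a, b = 3, 1
lhs = expectation(lambda x: a * x + b, f)   # E(aX + b)
rhs = a * expectation(lambda x: x, f) + b   # a E(X) + b

print(lhs, rhs)
```

Since $E(X) = 1$ here, both sides equal $3(1) + 1 = 4$.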

### Variance
The variance of $X$ is defined as:

$$
\sigma _X ^2 = V(X) = E((X-\mu_X)^2)
$$

The above can be expanded using the previous formula by setting $g(x) = (x-\mu_X)^2$.

#### Properties
* $V(X) \geq 0$
* $V(X) = E(X^2) - (E(X))^2$
* $V(aX + b) = a^2 V(X)$
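
These properties can be verified on the two-coin distribution ($f(0) = 1/4$, $f(1) = 1/2$, $f(2) = 1/4$). The following is a minimal sketch with illustrative helper names; it checks one example and is not a proof.

```python
from fractions import Fraction

# Probability function for X = number of heads in two fair coin flips.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(g, f):
    """E(g(X)) = sum of g(x) * f(x) for a discrete X."""
    return sum(g(x) * p for x, p in f.items())

def variance(f):
    """V(X) = E((X - mu)^2), computed from the definition."""
    mu = expectation(lambda x: x, f)
    return expectation(lambda x: (x - mu) ** 2, f)

mu = expectation(lambda x: x, f)
v_definition = variance(f)
v_shortcut = expectation(lambda x: x * x, f) - mu ** 2

# Distribution of Y = 3X + 1: same probabilities, transformed values.
f_y = {3 * x + 1: p for x, p in f.items()}
v_y = variance(f_y)

print(v_definition, v_shortcut, v_y)
```

Here $V(X) = 1/2$ by both formulas, and $V(3X + 1) = 9/2 = 3^2 \cdot V(X)$: the shift by $1$ has no effect on the variance.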

<span hidden> TODO: Add proof</span>

The **standard deviation** is defined as:
$$
\sigma _X = \sqrt {V(X)}
$$

## Chebyshev's inequality
If we know the probability distribution of $X$, then we can compute $E(X),V(X)$.
But given $E(X)$ and $V(X)$, we cannot reconstruct the probability distribution of $X$.
(A way to reason about it is that a decent chunk of information is lost once we look at only the mean and the variance).
It follows that we cannot compute 
$$
Pr(|X - E(X)| \leq c)
$$
without knowledge of the underlying distribution.

However, **Chebyshev's inequality** provides a bound on this probability.
It states that, for any **positive number** $k$,
$$
Pr(|X - E(X)| \geq k \sigma_X) \leq \frac{1}{k^2}
$$
where $\sigma_X = \sqrt{V(X)}$ is the standard deviation.

**Corollary:**
$$
Pr(|X - E(X)| < k \sigma_X) \geq 1-\frac{1}{k^2}
$$

**Example**

Suppose the waiting time $X$ for a bus has $\mu_X = 15$ and $\sigma_X = 2$ minutes.

We wish to find a lower bound on $Pr(11 < X < 19)$.

$$
\begin{align}
Pr(11 < X < 19) &= Pr(15 - 2(2) < X < 15 + 2(2))\\
&= Pr(-2(2) < X-15 < 2(2))\\
&= Pr(|X-15| < 2(2)) \\
&\geq 1- \frac{1}{2^2}\\
&= 3/4
\end{align}
$$
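
The bound in this example, which measures the distance from the mean in units of the standard deviation, can be sketched in Python. The function name `chebyshev_lower_bound` is made up for illustration; the second half compares the bound with an exact probability for the two-coin distribution used earlier, assuming fair coins.

```python
from fractions import Fraction
from math import sqrt

def chebyshev_lower_bound(k):
    """Lower bound on Pr(|X - E(X)| < k * sigma) for k > 0."""
    return 1 - 1 / k**2

# Bus example: mu = 15, sigma = 2, and (11, 19) is mu +/- 2 sigma.
mu, sigma = 15, 2
k = (19 - mu) / sigma
bus_bound = chebyshev_lower_bound(k)

# Compare with a known distribution: number of heads in two fair flips.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mean = sum(x * p for x, p in f.items())
sd = sqrt(sum((x - mean) ** 2 * p for x, p in f.items()))

# Exact probability of being within 2 standard deviations of the mean.
exact = sum(p for x, p in f.items() if abs(x - mean) < 2 * sd)

print(bus_bound, exact)
```

For the coin distribution every value lies within two standard deviations of the mean, so the exact probability is $1$, comfortably above the guaranteed $3/4$.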

---

<span hidden> TODO: Add example on looseness of bound </span>