# Chapter 2 Random Variables Part I

#### *Zhuo Jianchao* 

Feb 9, 2020 *Rev 1*

## Random Variables

For elements of a sample space, it is tedious to keep writing notation like $HH, HT, TH, TT$. In many cases we don't care about the specific outcomes; instead, we care about some characteristic associated with them.

We can introduce an "encoding mechanism", say $X(\cdot)$, which takes one possible outcome and produces one number, so that different outcomes are transformed into integers, like

$$
X(HH)=1\\
X(HT)=2\\
X(TH)=3\\
X(TT)=4$$

Sometimes this mechanism has analytical benefits. In an experiment where we toss a coin twice, the sample space is

$$S=\{HH,HT,TH,TT\}$$

Suppose that what we care about in this experiment is how many times heads shows up. We can follow this intuition and encode the outcomes as

$$
X(HH)=2\\
X(HT)=1\\
X(TH)=1\\
X(TT)=0\\
$$
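
As a minimal sketch of this encoding in Python (the names `sample_space` and `count_heads` are illustrative and not part of the notes), the mechanism is just a function that sends each outcome to a number:

```python
# Sample space for tossing a coin twice, with outcomes written as strings.
sample_space = ["HH", "HT", "TH", "TT"]

def count_heads(outcome: str) -> int:
    """Encode an outcome by the number of heads it contains."""
    return outcome.count("H")

# Apply the encoding to every outcome in the sample space.
encoding = {s: count_heads(s) for s in sample_space}
print(encoding)  # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```

Note that two different outcomes, $HT$ and $TH$, receive the same code, which is exactly what we want when only the number of heads matters.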

This "encoding mechanism" is called a **random variable**. We will give it a formal definition and discuss it in more detail.

Before we can really dive into random variables, we first need to introduce some prerequisite concepts.

### Definition 1 Measurable Function

Recall from Chapter 1 Part II, where we introduced the concept of a *measurable space*. For every random experiment we have a sample space $S$ and an associated $\sigma$-algebra $\mathbb{B}$; then $(S, \mathbb{B})$ is a measurable space.

Suppose there are two sample spaces $X$ and $Y$, with corresponding $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ respectively. Then $(X,\mathcal{A})$ and $(Y,\mathcal{B})$ are two measurable spaces.

>A function $f: X \rightarrow Y$ is $(\mathcal{A}, \mathcal{B})$-measurable if and only if
>$$f^{-1}(B) \in \mathcal{A} \text{ for every } B \in \mathcal{B}$$

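An axiomatic statement like this looks abstract and overwhelming. Here $f^{-1}(B)=\{x \in X: f(x) \in B\}$ denotes the preimage of $B$, that is, the set of all outcomes in $X$ that $f$ sends into $B$. We can attach more practical meaning to these symbols to understand them better.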

### Example 1.1

Let's roll a three-sided die once, with sides $1, 2, 3$. Let $X$ be the sample space of this random experiment; then
$$X=\{1,2,3\}$$

Let $\mathcal{A}$ be the $\sigma$-algebra associated with $X$; then
$$\mathcal{A}=\{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$$

So $(X,\mathcal{A})$ is a measurable space.
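
The $\sigma$-algebra listed above is the collection of all subsets of $X$ (its power set). As a small sketch, assuming a finite sample space and using a hypothetical helper `power_set`, it can be enumerated like this:

```python
from itertools import chain, combinations

def power_set(space):
    """Return every subset of a finite set, each as a frozenset."""
    items = list(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

X = {1, 2, 3}
A = power_set(X)
print(len(A))  # 8
```

A set with $3$ elements has $2^3=8$ subsets, matching the eight events written out above.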

If we care about how many times $1$ shows up, then the new sample space $Y$ is
$$Y=\{0,1\}$$

And the $\sigma$-algebra associated with $Y$, $\mathcal{B}$ is

$$\mathcal{B}=\{\emptyset, \{0\}, \{1\}, \{0,1\}\}$$

Now we define a function $f: X \rightarrow Y$ that counts how many times $1$ shows up: $f(1)=1$ and $f(2)=f(3)=0$. Applying $f$ to every outcome in an event, $f(A)=\{f(s): s \in A\}$, transforms each element of $\mathcal{A}$ into an element of $\mathcal{B}$ by the following rules

$$ \begin{align}
f(\emptyset)&=\emptyset \tag{1} \\
f(\{1\})&=\{1\} \tag{2}\\
f(\{2\})&=\{0\} \\
f(\{3\})&=\{0\} \\
f(\{1,2\})&=\{0,1\} \tag{3}\\
f(\{1,3\})&=\{0,1\} \\
f(\{2,3\})&=\{0\} \\
f(\{1,2,3\})&=\{0,1\}
\end{align}
$$

Take three of them to illustrate the transforming rules:
* $(1)$ means we can't get something out of nothing. The event that nothing happens is still the event that nothing happens in the other sample space;
* $(2)$ tells us that the event in $\mathcal{A}$ that the number rolled is $1$ is the event in $\mathcal{B}$ that the number of $1$s is $1$;
* $(3)$ indicates that the event in $\mathcal{A}$ that the number rolled is $1$ **or** $2$ is the event in $\mathcal{B}$ that the number of $1$s is $0$ or $1$.

All the other rules are interpreted in the same way.
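
The rules above can be reproduced mechanically. The following sketch assumes that the rule on single outcomes is "$1$ if the roll is $1$, otherwise $0$" and uses a hypothetical helper `image` that applies it to every outcome of an event:

```python
def f(outcome: int) -> int:
    """Number of times the face 1 shows up in a single roll."""
    return 1 if outcome == 1 else 0

def image(event):
    """Image of an event: the set of values f takes on its outcomes."""
    return frozenset(f(s) for s in event)

# The eight events of the sigma-algebra on {1, 2, 3}.
events = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
for event in events:
    print(sorted(event), "->", sorted(image(event)))
```

Each printed line corresponds to one of the eight rules, with the empty list standing for $\emptyset$.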

Why do we transform every element of the $\sigma$-algebra instead of only the basic outcomes in the sample space?
> Recall how a probability function is defined: it is defined on the measurable sets, the **events**, not on basic outcomes. That is why, when we transform one sample space into another, we have to keep track of how the elements of the two $\sigma$-algebras correspond to each other.

Notice that the function $f$ transforms all elements of $\mathcal{A}$ into elements of $\mathcal{B}$. Now let's check the definition of a measurable function, which requires that the preimage of every element of $\mathcal{B}$ under $f$ is an element of $\mathcal{A}$.

Still confusing? There are four elements in $\mathcal{B}$; we take the preimage of each one in turn, collecting the outcomes in $X$ that $f$ sends into it.
1. $\emptyset$: $f^{-1}(\emptyset)=\emptyset \in \mathcal{A}$
2. $\{0\}$: $f^{-1}(\{0\})=\{2,3\} \in \mathcal{A}$
3. $\{1\}$: $f^{-1}(\{1\})=\{1\} \in \mathcal{A}$
4. $\{0,1\}$: $f^{-1}(\{0,1\})=\{1,2,3\} \in \mathcal{A}$

Every preimage is an element of $\mathcal{A}$, which satisfies the requirement, so we say the transform $f(\cdot)$ is a **measurable function**.
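
For finite spaces this check can be automated. The sketch below is only an illustration: `power_set`, `preimage`, and `is_measurable` are hypothetical helper names, and the coarser $\sigma$-algebra `A_coarse` is an extra example added for contrast, not something taken from the notes.

```python
from itertools import chain, combinations

def power_set(space):
    """Return every subset of a finite set, each as a frozenset."""
    items = list(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

def f(outcome: int) -> int:
    """Number of times the face 1 shows up in a single roll."""
    return 1 if outcome == 1 else 0

def preimage(func, domain, target_event):
    """All outcomes in the domain that func sends into target_event."""
    return frozenset(s for s in domain if func(s) in target_event)

def is_measurable(func, domain, sigma_domain, sigma_target):
    """Check that every event in sigma_target pulls back into sigma_domain."""
    return all(
        preimage(func, domain, event) in sigma_domain
        for event in sigma_target
    )

X = {1, 2, 3}
Y = {0, 1}
A = power_set(X)
B = power_set(Y)
print(is_measurable(f, X, A, B))  # True

# Hypothetical contrast: with only the trivial sigma-algebra on X,
# the preimage {2, 3} of {0} is no longer an available event.
A_coarse = {frozenset(), frozenset(X)}
print(is_measurable(f, X, A_coarse, B))  # False
```

The second call suggests why the definition refers to both $\sigma$-algebras: the same function can be measurable with respect to one choice of $\mathcal{A}$ and not with respect to another.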

This function from $(X,\mathcal{A})$ to $(Y,\mathcal{B})$ is called **measurable** because every event in $\mathcal{B}$ corresponds, through the preimage, to a measurable set in $\mathcal{A}$. Any event described in terms of $Y$ can therefore be traced back to an event in $X$.

***Notice*** that whether a function mapping one sample space into another is measurable depends only on the two $\sigma$-algebras, regardless of any probability function defined on them.

The measurable function in this example is actually a **random variable**. With the knowledge we've gained so far, we can give it a formal definition.

### Definition 2 Random Variable

A random variable $X(\cdot)$ is a measurable function from the probability space $(S,\mathbb{B},P)$ to the measurable space $(\Omega, \Sigma)$, where $\Omega \subset \mathbb{R}$ and $\Sigma$ is a $\sigma$-algebra on $\Omega$, such that for each basic outcome $s \in S$ there exists a corresponding unique real number $X(s) \in \Omega$. The probability function $P'$ that turns $(\Omega, \Sigma)$ into a probability space $(\Omega, \Sigma, P')$ is the subject of Definition 3.

Specifically, $X: S \rightarrow \Omega$.

### Definition 3 Induced Probability Function

We've mentioned that the measurability of a function only concerns the $\sigma$-algebras, not the probability measures defined on them. Here we continue by discussing the probability function.

Since the probability space $(\Omega, \Sigma, P')$ is constructed from $(S,\mathbb{B},P)$, the probability function $P'$ defined on the transformed probability space can be induced from the original probability space.

Let's step back to **Example 1.1**, where we roll a three-sided die. The sample space is $S=\{1,2,3\}$.

The $\sigma$-algebra generated by $S$, $\mathbb{B}$ is
$$\mathbb{B}= \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$$

On the measurable space $(S,\mathbb{B})$, define a probability function $P:\mathbb{B} \rightarrow [0,1]$, whose values are determined by the nature of the experiment. Here each side is equally likely:

$$
P(\emptyset)=0\\
P(\{1\})=\frac{1}{3}\\
P(\{2\})=\frac{1}{3}\\
P(\{3\})=\frac{1}{3}\\
P(\{1,2\})=\frac{2}{3}\\
P(\{1,3\})=\frac{2}{3}\\
P(\{2,3\})=\frac{2}{3}\\
P(\{1,2,3\})=1
$$
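
The table above follows one pattern: each event gets probability (number of outcomes in the event) divided by $3$. As a rough sketch, assuming equally likely sides as in the table and using exact fractions, `P` below is an illustrative helper rather than a general definition:

```python
from fractions import Fraction

S = {1, 2, 3}

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(set(event) & S), len(S))

for event in [set(), {1}, {1, 2}, {1, 2, 3}]:
    print(sorted(event), P(event))
```

Because the single-outcome events are disjoint, the value of $P(\{1,2\})$ equals $P(\{1\})+P(\{2\})$, which the sketch reproduces.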

What we care about is the number of times $1$ shows up, so we define a random variable $X:S \rightarrow \Omega \subset \mathbb{R}$, where $\Omega=\{0,1\}$ and the basic outcomes of $\Omega$ are the possible numbers of $1$s showing up.

This random variable transforms one sample space into another by the same rules as before: $X(1)=1$ and $X(2)=X(3)=0$ on outcomes, so that each event in $\mathbb{B}$ is sent to

$$
X(\emptyset)=\emptyset \\
X(\{1\})=\{1\} \\
X(\{2\})=\{0\} \\
X(\{3\})=\{0\} \\
X(\{1,2\})=\{0,1\}\\
X(\{1,3\})=\{0,1\} \\
X(\{2,3\})=\{0\} \\
X(\{1,2,3\})=\{0,1\}
$$


And the $\sigma$-algebra on $\Omega$, denoted $\Sigma$, is
$$\Sigma=\{\emptyset, \{0\}, \{1\}, \{0,1\}\}$$

On the measurable space $(\Omega, \Sigma)$, the probability function $P_{X}:\Sigma \rightarrow [0,1]$ of the random variable $X$ is induced by the probability function $P$.

For any event $A \in \Sigma$, $P_X (A)=P(\{s \in S:X(s) \in A\})$

There are four elements in $\Sigma$; we consider them one by one,
1. $\emptyset$: no outcome is mapped into $\emptyset$, then $P_X (\emptyset)=P(\emptyset)=0$
2. $\{0\}$: $X(2)=X(3)=0$, then $P_X (\{0\})=P(\{2,3\})=P(\{2\})+P(\{3\})=\frac{2}{3}$
3. $\{1\}$: $X(1)=1$, then $P_X (\{1\})=P(\{1\})=\frac{1}{3}$
4. $\{0,1\}$: $X(2)=X(3)=0, X(1)=1$, then $P_X (\{0,1\})=P(\{1,2,3\})=P(\{1\})+P(\{2\})+P(\{3\})=1$

The induced probability function $P_X (\cdot)$ now assigns probability to all elements in $\Sigma$.
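
The whole construction can be sketched in a few lines of Python. This is a minimal illustration under the assumptions of the example (three equally likely sides, $X$ counting how many times $1$ shows up); the names `P_outcome` and `induced_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Probability of each basic outcome of the three-sided die.
P_outcome = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

def X(s: int) -> int:
    """Random variable: number of times the face 1 shows up."""
    return 1 if s == 1 else 0

def induced_probability(event):
    """P_X(A) = P({s in S : X(s) in A}), summed over basic outcomes."""
    return sum(
        (p for s, p in P_outcome.items() if X(s) in event),
        Fraction(0),
    )

# All four events of the sigma-algebra on Omega = {0, 1}.
Omega = [0, 1]
Sigma = [
    frozenset(c)
    for r in range(len(Omega) + 1)
    for c in combinations(Omega, r)
]
for A in Sigma:
    print(sorted(A), induced_probability(A))
```

The printed values should agree with the four cases worked out above: $0$, $\frac{2}{3}$, $\frac{1}{3}$, and $1$.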