# Random variables

### Intro
One of the most important concepts in statistics is that of a **random variable**. It is an abstraction of the variables that we see in our datasets: a random variable corresponds to a variable in our dataset, but its values are not necessarily observed. It therefore carries its own distribution, which is a function that describes how the values of the variable are distributed. In statistics, we are often interested in the underlying "true" distribution of the random variable, rather than the distribution visible just from the values that we happen to observe. The goal of (inferential) statistics is to *estimate* this true distribution from the observed values in our dataset.

The set of all possible values of a random variable $X$ is called its **support**, denoted $\textup{supp}(X)$. It is more proper to think of a random variable as the data of a pair $(\Omega, p_X)$, where $\Omega = \textup{supp}(X)$ is the support, and $p_X$ is a PMF or PDF:

1. **Discrete case**: In this case, $\Omega$ is finite, and $p_X$ is a PMF, i.e. $p_X: \Omega \to [0,1]$ is a function that satisfies  $$\sum_{x \in \Omega} p_X(x) = 1.$$ 
2. **Continuous case**: In this case, $\Omega = \mathbb{R}$ or some interval in $\mathbb{R}$. In fact, for convenience and/or simplicity, we can always assume that $\Omega = \mathbb{R}$ by defining the PDF $p_X$ to be $0$ outside of the interval of interest. So, we can always regard our PDF $p_X: \mathbb{R} \to [0,\infty)$ as a function whose values add up to $1$, where "add" here means *continuous summation*... that is, an integral: $$\int_{-\infty}^{\infty} p_X(x) \, dx = 1.$$
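
As a quick sanity check of the two normalization conditions, here is a minimal sketch. The particular PMF, the PDF (constant on $[0,2]$ and $0$ elsewhere), and the helper name `riemann_sum` are made up for illustration; the integral is only approximated numerically with a midpoint rule.

```python
import math

# Discrete case: a PMF on a finite support, stored as value -> probability.
# These numbers are illustrative, not taken from any dataset.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
pmf_total = sum(pmf.values())

# Continuous case: a PDF defined on all of R that is 0 outside [0, 2].
def pdf(x):
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def riemann_sum(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width

# Integrating over [-5, 5] is enough here, since the PDF vanishes outside [0, 2].
pdf_total = riemann_sum(pdf, -5.0, 5.0)

print(math.isclose(pmf_total, 1.0))            # the PMF values sum to 1
print(math.isclose(pdf_total, 1.0, abs_tol=1e-3))  # the PDF integrates to (about) 1
```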

### Three important distributions
Multiple random variables can share the same probability distribution. Some frequently used distributions are:

1. **Uniform distribution**: 

    - *Discrete case*: Suppose we have a discrete random variable with support $\Omega = \{1,\dotsc,n\}$. The **discrete uniform distribution** is the distribution in which all values are equally likely. Thus, the PMF is given by $$p_X(i) = \frac{1}{n}, \quad i=1,\dotsc,n.$$ The typical example of this is the roll of a die, where $n=6$, or the flip of a coin, where $n=2$.
    - *Continuous case*: Suppose we have a continuous random variable with support $\Omega = [a,b]$. The **continuous uniform distribution** is the distribution in which all values are equally likely. More precisely, the probability density at all points within the interval $\Omega$ is constant. In order for the area under the PDF to equal $1$, it follows that the PDF is given by $$p_X(x) = \frac{1}{b-a}, \quad x \in [a,b].$$ This highlights a key difference between PMFs and PDFs: a PMF will always have values between $0$ and $1$, while a PDF can have values greater than $1$: if $X$ is uniformly distributed over the interval $[a,b]$, with $b-a < 1$, then $p_X(x) = 1/(b-a) > 1$ for all $x \in [a,b]$!
2. **Normal distribution**: The most commonly occurring distribution for continuous random variables is (arguably) the normal distribution, which has support $\Omega = \mathbb{R}$ and PDF defined by
$$p_X(x) = \frac{1}{\sqrt{2\pi} \sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}},$$
where $\mu$ is the mean and $\sigma$ is the standard deviation; we explain these parameters below. The PDF graph looks like a Bell Curve (nay, it *is the Bell Curve*). The most special case is when $\mu=0$ and $\sigma=1$, in which case we say that $X$ is a *standard normal variable*, and the corresponding PDF is denoted by $\varphi(x)$:
\begin{equation*}
    \varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}.
\end{equation*}
3. **Bernoulli distribution**: The Bernoulli distribution is a discrete probability distribution for a random variable which takes the value $1$ with probability $p$ and the value $0$ with probability $1-p$. For example, this random variable describes the outcome of flipping a biased coin, which favors one side with probability $p$ and the other side with probability $1-p$. The PMF is given by
\begin{align*}
    p_X(x) & = p^x (1-p)^{1-x}, \quad x \in \{0,1\}\\
        & = \begin{cases}
            p & \textup{if } x=1\\
            1-p & \textup{if } x=0.
        \end{cases}
\end{align*}
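
The formulas above translate almost directly into code. The following is a minimal sketch using only the standard library; the function names and the example parameters (a die with $n=6$, the interval $[0, 0.25]$, a coin with $p=0.7$) are chosen here purely for illustration.

```python
import math

def discrete_uniform_pmf(i, n):
    """PMF of the discrete uniform distribution on {1, ..., n}."""
    return 1.0 / n if 1 <= i <= n else 0.0

def continuous_uniform_pdf(x, a, b):
    """PDF of the continuous uniform distribution on [a, b] (0 outside)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def bernoulli_pmf(x, p):
    """PMF of the Bernoulli distribution: p^x (1-p)^(1-x) for x in {0, 1}."""
    return p**x * (1.0 - p) ** (1 - x) if x in (0, 1) else 0.0

die = discrete_uniform_pmf(3, 6)                  # every face has probability 1/6
narrow = continuous_uniform_pdf(0.1, 0.0, 0.25)   # 4.0, a density larger than 1
phi0 = normal_pdf(0.0)                            # standard normal density at 0
heads = bernoulli_pmf(1, 0.7)                     # probability of the value 1
tails = bernoulli_pmf(0, 0.7)                     # probability of the value 0

print(die, narrow, phi0, heads, tails)
```

Note that `narrow` exceeds $1$ because the interval has length $b - a = 0.25 < 1$, which is exactly the PMF/PDF difference pointed out in the uniform case.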

### Sample spaces
The definition of random variable above is still incomplete. The reason is that we want to be able to do arithmetic and algebra with random variables. For example, if $X_1$ and $X_2$ are random variables which represent alcohol content and sugar content of wine, respectively, then we want to be able to compare the two variables, or even add them together to get a new random variable $X_3 = X_1 + X_2$ which represents the total content of alcohol and sugar in the wine. 

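For this to make sense, we need to define a **sample space** $\Omega$ for our random variables (also known as a **population** in statistics). From here on, $\Omega$ denotes this sample space rather than the support, which we continue to write as $\textup{supp}(X)$. The idea is as follows: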

- We start with a sample space $\Omega$ consisting of all possible instances of a particular type of object (e.g. wines, iris flowers, cars, etc.). 
- Then, every time we ask a question about a feature or attribute of the object, the answer can be viewed as a *function* from the sample space to the set of all possible values (in principle) that could be attained by the feature. This then is how we define the associated random variable: it is a function 
\begin{equation*}
    X: \Omega \to \{\textup{all possible values of the feature}\}
\end{equation*}
which assigns to each instance in the sample space the value of the feature for that instance. Note that the codomain of this function is what we previously called the support of the random variable, $\textup{supp}(X)$.
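
To make the "random variable as a function on the sample space" idea concrete, here is a small sketch. The three wines and their alcohol and sugar values are invented for illustration, and the helper `add` is a hypothetical name; the point is only that $X_1 + X_2$ makes sense because both functions share the same domain $\Omega$.

```python
# A toy sample space: each instance is one wine. All values are made up.
omega = [
    {"name": "wine_a", "alcohol": 12.5, "sugar": 2.0},
    {"name": "wine_b", "alcohol": 13.0, "sugar": 1.5},
    {"name": "wine_c", "alcohol": 11.0, "sugar": 6.0},
]

# Random variables are functions from the sample space to feature values.
def X1(wine):
    return wine["alcohol"]

def X2(wine):
    return wine["sugar"]

def add(X, Y):
    """Pointwise sum of two random variables defined on the same sample space."""
    return lambda instance: X(instance) + Y(instance)

X3 = add(X1, X2)

# Evaluate X3 on every instance of the sample space.
totals = {wine["name"]: X3(wine) for wine in omega}
print(totals)
```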

Note that this is an extremely abstract concept, and imho it is up for debate how useful/pertinent to reality it is. For example, let's say the random variable $X$ represents "height of a person". Then, our sample space $\Omega$ could be interpreted as "all people that currently exist", or perhaps, as "all people who have ever existed." Regardless, $\Omega$ will (until the end of time) be a finite set, and thus the random variable $X$ will be a function from a finite set to a finite set (i.e. the support will be finite because there will forever have been only finitely many instances to observe). However, we would want to view this as a *continuous* random variable supported on $[0,\infty)$, which seems a little strange because we *know* that infinitely many of these values will never be attained, even if they have non-zero probability densities!

### The true definition of random variable
To summarize, our *final* definition of a random variable is that of a triplet $(\Omega, X, p_X)$, where:

- $\Omega$ is the sample space (the space of all possible instances of the object under consideration).
- $X: \Omega \to \textup{supp}(X)$ is a function from the sample space to the support of the random variable (the set of all possible values attained by the random variable).
- $p_X: \textup{supp}(X) \to [0,\infty)$ is a PMF or PDF (the probability distribution of the random variable).
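
One way to see how the three pieces can fit together is sketched below for a small finite sample space. This rests on an assumption that is not part of the definition above: every instance in $\Omega$ is treated as equally likely, so that $p_X(x)$ is the fraction of instances $\omega$ with $X(\omega) = x$. The wines, their colors, and the name `induced_pmf` are all hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Sample space: five hypothetical wines, identified by label.
omega = ["w1", "w2", "w3", "w4", "w5"]

# The feature values are made up for illustration.
colors = {"w1": "red", "w2": "white", "w3": "red", "w4": "red", "w5": "white"}

def X(instance):
    """Random variable: maps each wine in the sample space to its color."""
    return colors[instance]

def induced_pmf(sample_space, rv):
    """PMF of rv, ASSUMING each instance of the sample space is equally likely."""
    counts = Counter(rv(instance) for instance in sample_space)
    total = len(sample_space)
    return {value: Fraction(count, total) for value, count in counts.items()}

p_X = induced_pmf(omega, X)   # the distribution
support = set(p_X)            # the values actually attained by X

print(support)
print(p_X)
```

Under a different choice of probabilities on $\Omega$, the same function $X$ would give a different $p_X$, which is one reason the sample space cannot always be ignored.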

There is a (imho disturbing) trend in the literature wherein authors simply suppress or ignore the sample space $\Omega$ as though it does not exist. Often this is workable, because much of what we want out of a random variable can be obtained from the support and the probability distribution $p_X$ alone. However, this is not always the case, and it is important to keep in mind that the sample space is an integral part of the definition of a random variable.