# **Random Variables**

**Population parameter**: A number that describes something about the population 

**Sample statistic**: An estimate of the number computed on a sample

### **17.1 Random Variables and Distribution**
* A random variable is a *function* from the outcome of a random event to a number 
* It is *random* since our sample was drawn at random; it is *variable* because its exact value depends on how this random sample came out 

Here are some examples: 

#### **17.1.1 Tossing a coin**
* Let's define a fair coin toss 
* A fair coin lands on heads $H$ or tails $T$, each with probability $0.5$. With these possible outcomes, we can define a random variable $X$ as: 

$$X = \begin{cases} 
1, & \text{if the coin lands heads} \\
0, & \text{if the coin lands tails}
\end{cases}$$
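To make the "function from outcome to number" idea concrete, here is a minimal sketch of the coin-toss variable in Python. The names `OUTCOMES`, `X`, and `rng` are illustrative choices, not part of the notes, and the seed is only there so the run is repeatable.

```python
import random

# Sample space of a single fair coin toss (labels are illustrative).
OUTCOMES = ["H", "T"]

def X(outcome):
    """Random variable: map an outcome of the coin toss to a number."""
    return 1 if outcome == "H" else 0

rng = random.Random(0)  # seeded so the run is repeatable
tosses = [rng.choice(OUTCOMES) for _ in range(10)]  # ten random outcomes
values = [X(s) for s in tosses]                     # the value of X for each toss

print(tosses)
print(values)
```

The randomness lives entirely in `tosses`; `X` itself is an ordinary deterministic function.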

### **Distributions**

To define any random variable $X$, we need to be able to specify 2 things: 
1) **Possible values**: the set of values the random variable can take on 
2) **Probabilities**: the set of probabilities describing how the total probability of $100\%$ is split over the possible values

If $X$ is discrete (has a finite or countably infinite number of possible values), the probability that $X$ takes on the value $x$ is given by $P(X = x)$, and these probabilities must sum to $1$:

$$ \sum_{\text{all x}} P(X = x) = 1$$

Consider the example:

$$P(X = x) = \begin{cases} 
\frac{1}{2}, & \text{if } x = 0 \\
\frac{1}{2}, & \text{if } x = 1
\end{cases}$$

The **distribution** of a random variable $X$ describes how the total probability of $100\%$ is split across all possible values of $X$; it fully defines the random variable.
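One way to picture a discrete distribution in code is as a mapping from each possible value to its probability. The sketch below assumes that representation (the names `coin_dist` and `is_valid_distribution` are made up for illustration) and checks the two requirements: no negative probabilities, and a total of exactly $1$.

```python
from fractions import Fraction

# Distribution of the coin-toss variable X: value -> probability.
coin_dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def is_valid_distribution(dist):
    """Check that probabilities are non-negative and sum to exactly 1."""
    return all(p >= 0 for p in dist.values()) and sum(dist.values()) == 1

print(is_valid_distribution(coin_dist))  # True
```

Using `Fraction` rather than floats keeps the sum exact, so the comparison with $1$ is safe.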

The distribution of a discrete random variable can also be represented using a histogram. If the variable is **continuous**, meaning it can take on any value in a continuous range, then the histogram is replaced by a smooth density curve: 

<img src="https://ds100.org/course-notes/probability_1/images/discrete_continuous.png" alt="Image Alt Text" width="600" height="220">

We often don't know the (true) distribution and instead compute an empirical distribution from a sample.
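As a rough sketch of what "empirical distribution" means, we can simulate many fair coin tosses and record the proportion of times each value shows up. The sample size `n` and the seed below are arbitrary assumptions.

```python
import random
from collections import Counter

rng = random.Random(42)  # arbitrary seed, for repeatability only
n = 10_000
samples = [rng.randint(0, 1) for _ in range(n)]  # simulated fair coin tosses

counts = Counter(samples)
# Empirical distribution: observed proportion of each value.
empirical_dist = {value: counts[value] / n for value in sorted(counts)}

print(empirical_dist)
```

With a large `n`, the proportions should land close to the true probabilities of $0.5$ each, but they will generally not match them exactly.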

Probabilities are also areas
* For discrete random variables, the area of the *red* bars represents the probability $X$ falls within those values 
* For continuous random variables, the *red area* under the curve represents the probability that $Y$ falls within those values 

<img src="https://ds100.org/course-notes/probability_1/images/probability_areas.png" alt="Image Alt Text" width="600" height="270">

In both cases, the total area (all of the bars, or the whole region under the curve) is $1$.

Here is what a **probability distribution table** looks like

<img src="https://ds100.org/course-notes/probability_1/images/distribution.png" alt="Image Alt Text" width="600" height="240">

The common distributions are listed below: 
1) Bernoulli($p$): If $X$ ~ Bernoulli($p$), then $X$ takes on the value $1$ with probability $p$, and $0$ with probability $1-p$. Bernoulli random variables are also called “indicator” random variables.

2) Binomial($n$, $p$): If $X$ ~ Binomial($n$, $p$), then $X$ counts the number of $1$s in $n$ independent Bernoulli($p$) trials.

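3) Categorical($p_1, \ldots, p_k$): $X$ takes on one of $k$ possible values, and the $i$-th value has probability $p_i$, where $p_1 + \cdots + p_k = 1$. In the special case where all values are equally likely (uniform on a finite set), the probability of each value is $1$ / (number of possible values).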

4) Uniform on the unit interval $(0, 1)$: The density is flat at $1$ on $(0, 1)$ and $0$ elsewhere. We won’t get into what density means as much here, but intuitively, this is saying that there’s an equally likely chance of getting any value on the interval $(0, 1)$.

5) Normal($\mu$, $\sigma^2$): The probability density is specified by $\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$. This bell-shaped distribution comes up fairly often in data, in part due to the Central Limit Theorem you saw back in Data 8.
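For intuition, here is a sketch of how one draw from each of these distributions could be simulated with Python's standard `random` module. The helper names (`bernoulli`, `binomial`, `categorical`) and all parameter values are illustrative assumptions, and this is only one of several reasonable ways to sample.

```python
import random

rng = random.Random(1)  # arbitrary seed, for repeatability only

def bernoulli(p):
    """Return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

def binomial(n, p):
    """Count the 1s in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p) for _ in range(n))

def categorical(values, probs):
    """Draw one of `values`, where values[i] has probability probs[i]."""
    return rng.choices(values, weights=probs, k=1)[0]

draws = {
    "bernoulli": bernoulli(0.3),
    "binomial": binomial(10, 0.3),
    "categorical": categorical(["a", "b", "c"], [0.2, 0.5, 0.3]),
    "uniform": rng.random(),        # flat density on [0, 1)
    "normal": rng.gauss(0.0, 1.0),  # arguments are mu and sigma (not sigma^2)
}
print(draws)
```

Note that `rng.gauss` is parameterized by the standard deviation $\sigma$, whereas the notation Normal($\mu$, $\sigma^2$) uses the variance.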

### **Expectation and Variance**

**Expectation** of a random variable $X$ is the **weighted average** of the values of $X$, where the weights are the probabilities of each value occurring. 

Applying the weights one possible *value* at a time gives: 

$$E[X] = \sum_{\text{all possible x}} x P(X = x)$$
* The expectation is a *number*, not a random variable 

* Expectation is a generalization of the average, and it has the same units as the random variable 

* It is the center of gravity of the probability distribution histogram: if we simulate the variable many times, it is the long-run average of the simulated values 
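The weighted-average formula translates almost directly into code. As a small worked example (a fair six-sided die, which is an assumption made here for illustration), the sketch below computes $E[X]$ exactly from a value-to-probability table.

```python
from fractions import Fraction

def expectation(dist):
    """E[X] = sum over possible values x of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die))  # 7/2
```

The result, $3.5$, is not a value the die can actually show, which is a reminder that the expectation is a number summarizing the distribution, not an outcome.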

**Variance** of a random variable is a measure of its chance error. 
* Variance asks: how far does $X$ typically vary from its average value, just by chance?

$$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$$

The units of variance are the squared units of $X$. To get back to the original scale, we define the standard deviation:

$$\text{SD}(X) = \sqrt{\text{Var}(X)}$$

To calculate $E[X^2]$, weight each squared value by its probability: 

$$E[X^2] = \sum_{x} x^2 P(X = x)$$

Here is an illustration of everything we know so far:

<img src="https://ds100.org/course-notes/probability_1/images/exp_var.png" alt="Image Alt Text" width="600" height="320">
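Putting the variance and standard deviation formulas together, here is a sketch that computes both for a fair six-sided die (again an illustrative choice, not an example from the notes). The helper `expectation` takes an optional function `f` so that the same code handles $E[X]$ and $E[X^2]$.

```python
from fractions import Fraction
import math

def expectation(dist, f=lambda x: x):
    """E[f(X)] = sum over x of f(x) * P(X = x)."""
    return sum(f(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E[X^2] - (E[X])^2."""
    return expectation(dist, lambda x: x**2) - expectation(dist) ** 2

# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

var = variance(die)
sd = math.sqrt(var)  # SD(X) = sqrt(Var(X)), back in the units of X

print(var)  # 35/12
print(sd)
```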

### **Properties of Expectation**

1) **Linearity of expectation**. The expectation of the linear transformation $aX + b$, where $a$ and $b$ are constants, is: 

$$\mathbb{E}[aX+b] = a\mathbb{E}[X] + b$$

2) Expectation is also linear in *sums* of random variables, whether or not $X$ and $Y$ are independent: 

$$\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$
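Both properties can be checked exactly on a small example. The sketch below assumes two fair dice $X$ and $Y$ (every ordered pair equally likely) and the arbitrary constants $a = 3$, $b = 2$.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_pair = Fraction(1, 36)  # each ordered pair of two fair dice is equally likely

# E[X] for one fair die (Y has the same expectation).
e_x = sum(Fraction(x, 6) for x in faces)

# E[X + Y], computed directly over all 36 pairs.
e_sum = sum((x + y) * p_pair for x, y in product(faces, faces))

# E[aX + b] with a = 3, b = 2, computed directly over the 6 faces.
e_lin = sum((3 * x + 2) * Fraction(1, 6) for x in faces)

print(e_sum == e_x + e_x)    # True
print(e_lin == 3 * e_x + 2)  # True
```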


### **Properties of Variance**

Unlike expectation, variance is *non-linear*. The variance of the linear transformation $aX + b$ is:

$$ \text{Var}(aX + b) = a^2 \text{Var} (X) $$

Consequently, taking square roots: 


$$ \text{SD}(aX + b) = |a| \text{SD} (X) $$

* Shifting the distribution by $b$ *does not* impact the *spread* of the distribution. Thus, $\text{Var}(aX+b) = \text{Var}(aX)$

* Scaling the distribution by $a$ *does* impact the spread of the distribution 

<img src="https://ds100.org/course-notes/probability_1/images/transformation.png" alt="Image Alt Text" width="600" height="220">
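A quick exact check of $\text{Var}(aX + b) = a^2 \text{Var}(X)$, as a sketch: take a fair die for $X$ and the arbitrary constants $a = -3$, $b = 5$ (both assumptions for illustration), build the distribution of $aX + b$, and compare variances.

```python
from fractions import Fraction

def variance(dist):
    """Var(X) = E[(X - E[X])^2], computed from a value -> probability table."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

a, b = -3, 5  # arbitrary constants with a != 0
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Distribution of aX + b: each value moves, its probability stays the same.
transformed = {a * x + b: p for x, p in die.items()}

print(variance(transformed) == a**2 * variance(die))  # True
```

The shift `b` drops out entirely, and the negative sign of `a` disappears because it is squared.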

The variance of a sum is affected by the (in)dependence of the random variables: 

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X,Y)$$

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) \qquad \text{if } X, Y \text{ independent}$$
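To see the covariance term at work, consider a hypothetical dependent pair: draw two tickets without replacement from $\{1, 2, 3\}$, calling the first $X$ and the second $Y$. The sketch below computes every term exactly from the joint distribution; the names `joint` and `E` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical example: two draws without replacement from {1, 2, 3}.
# Each of the 6 ordered pairs (x, y) with x != y is equally likely.
joint = {pair: Fraction(1, 6) for pair in permutations([1, 2, 3], 2)}

def E(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = E(lambda x, y: x)
e_y = E(lambda x, y: y)
var_x = E(lambda x, y: (x - e_x) ** 2)
var_y = E(lambda x, y: (y - e_y) ** 2)
cov_xy = E(lambda x, y: (x - e_x) * (y - e_y))
var_sum = E(lambda x, y: (x + y - e_x - e_y) ** 2)

print(var_x, var_y, cov_xy, var_sum)  # 2/3 2/3 -1/3 2/3
print(var_sum == var_x + var_y + 2 * cov_xy)  # True
```

Here the covariance is negative (a large first draw makes a large second draw less likely), so the variance of the sum is smaller than $\text{Var}(X) + \text{Var}(Y)$.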


### **Covariance and Correlation**

We define the **covariance** of two random variables as the expected product of their deviations from their expectations: 
* It is a generalization of variance: $\text{Cov}(X, X) = \text{Var}(X)$ 

$$\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$$

Using this, we can define the **correlation coefficient**:

$$r(X, Y) = \mathbb{E}\left[\left(\frac{X-\mathbb{E}[X]}{\text{SD}(X)}\right)\left(\frac{Y-\mathbb{E}[Y]}{\text{SD}(Y)}\right)\right] = \frac{\text{Cov}(X, Y)}{\text{SD}(X)\text{SD}(Y)}$$

If $X$ and $Y$ are independent, then $\text{Cov}(X,Y) = 0$ and $r(X,Y) = 0$
* The converse does not hold: two variables $X$ and $Y$ may have $\text{Cov}(X,Y) = 0$ and $r(X,Y) = 0$ but not be independent
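A standard counterexample, sketched below, is to let $X$ be uniform on $\{-1, 0, 1\}$ and set $Y = X^2$ (this specific example is an addition, not from the notes). $Y$ is completely determined by $X$, yet the covariance is exactly zero.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint distribution of (X, Y) where X is uniform on {-1, 0, 1} and Y = X^2.
joint = {(x, x**2): third for x in (-1, 0, 1)}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint.items())

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = joint.get((0, 0), Fraction(0))
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)

print(cov)                     # 0
print(p_joint == p_x0 * p_y0)  # False
```

Covariance only detects *linear* association, and the relationship $Y = X^2$ is symmetric around $0$, so the positive and negative contributions cancel.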

### **Equal vs. Identically Distributed**

Suppose that we have two random variables $X$ and $Y$
* $X$ and $Y$ are **equal** if $X(s) = Y(s)$ for every sample $s$. Regardless of the exact sample drawn, $X$ is always equal to $Y$
    * Equivalently, for each outcome in the probability space, $X$ and $Y$ produce the same value
    * Specifically, $X = Y$ with probability $1$ 

* $X$ and $Y$ are **identically distributed** if the distribution of $X$ is equal to the distribution of $Y$. 
    * $X$ and $Y$ take on the same set of possible values, and each one of these possible values is taken with the same probability 
    * Imagine rolling two dice: the probability distribution for each is identical

* $X$ and $Y$ are **independent and identically distributed (i.i.d.)** if 
    * The variables are identically distributed 
    * The variables are independent: knowing the outcome of one variable does not change our belief about the outcome of the other  
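The difference between "equal" and "identically distributed" is easy to miss, so here is a small sketch. On a single fair coin toss, define $X = 1$ for heads and $Y = 1 - X$ (an example chosen for illustration). The two variables have the same distribution but never take the same value.

```python
from fractions import Fraction
from collections import defaultdict

# Sample space of one fair coin toss: outcome -> probability.
sample_space = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def X(s):
    """1 if the toss is heads, 0 otherwise."""
    return 1 if s == "H" else 0

def Y(s):
    """1 if the toss is tails, 0 otherwise (Y = 1 - X)."""
    return 1 - X(s)

def distribution(rv):
    """Build the value -> probability table of a random variable."""
    dist = defaultdict(Fraction)
    for s, p in sample_space.items():
        dist[rv(s)] += p
    return dict(dist)

identically_distributed = distribution(X) == distribution(Y)
equal = all(X(s) == Y(s) for s in sample_space)

print(identically_distributed, equal)  # True False
```

These two variables are also not independent (knowing $X$ tells us $Y$ exactly), so they are identically distributed without being i.i.d.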
    