---
numbering:
  title:
    offset: 1
---

(ch4.3)=
# Variance

Expected values summarize the position of a distribution on the number line with a central value. Often it is important to summarize both the position and the spread in a distribution. In terms of a random variable, we often want to return some best prediction (e.g. an expected value) plus or minus some anticipated degree of variation.  

This chapter will focus on variance and standard deviation. Standard deviation and variance both measure the degree of variability in a random variable. Equivalently, they are summary measures of the "breadth", "width", or "spread" in a distribution.

## Definition

To summarize the spread in a distribution, we will start by centering it.

:::{note} Centered Variables

A random variable $X$ is **centered** if $\mathbb{E}[X] = 0$.

To center a random variable, $X$, define a new random variable, $X_0 = X - \mathbb{E}[X]$.

:::

Centering is often the first step in *standardizing* a random variable.

Next, we will try to measure the average *deviation* in $X$ by measuring the average *size* of $X_0$. If $X_0$ is typically small, then most samples are near the expected value, so the distribution can't spread much. If, on the other hand, $X_0$ is typically large, then most samples are far from their expected value, so the distribution must be very broad. 

To measure the average deviation, we could try $\mathbb{E}[X_0]$. To keep our notation concise, let $\bar{x} = \mathbb{E}[X]$. Then:

$$\mathbb{E}[X_0] = \mathbb{E}[X - \bar{x}] $$

Next, by the translation property (linearity) of expectation:

$$\mathbb{E}[X_0] = \mathbb{E}[X - \bar{x}] = \mathbb{E}[X] - \bar{x} = \bar{x} - \bar{x} = 0. $$

So, the expected value of $X_0$ is always zero. This is not a surprise, since $X_0$ was a centered variable. 
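
As a quick numerical check, the sketch below centers a discrete random variable and confirms that the centered variable has expected value zero. The fair six-sided die and the helper name `expectation` are assumptions chosen for illustration; they do not come from the text above.

```python
# Assumed example: a fair six-sided die (hypothetical, for illustration only).
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

def expectation(values, pmf):
    """Expected value of a discrete random variable given its values and PMF."""
    return sum(x * p for x, p in zip(values, pmf))

mean = expectation(values, pmf)                    # about 3.5
centered_values = [x - mean for x in values]       # values taken by X_0 = X - E[X]
mean_centered = expectation(centered_values, pmf)  # zero, up to rounding error

print(mean, mean_centered)
```

Centering shifts the values but leaves the probabilities untouched, so the same `pmf` is reused for $X_0$.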

To measure the average *size* of $X_0$, we should find the expected value of some function of $X_0$, $s(X_0)$, chosen so that $s$ is nonnegative and increases as $X_0$ moves farther from 0. We want a nonnegative function since size, like distance or length, is commonly understood to be nonnegative. Moreover, when measuring variability, we don't want positive and negative deviations to cancel out.

The most natural choice would be to measure the expected *absolute* deviation:

:::{note} Mean Absolute Deviation

The **mean absolute deviation (MAD)** in a random variable, $X$, is:

$$\text{MAD}[X] = \mathbb{E}[|X_0|] = \mathbb{E}[|X - \bar{x}|]. $$

:::

Most statisticians select a related measure. Instead of averaging the absolute deviation, it is common practice to average the *squared* deviation, then correct the square with a square root *outside* the expectation. These two steps define the *variance* and the *standard deviation*:

:::{note} Variance and Standard Deviation

The **variance** in a random variable, $X$, is:

$$\text{Var}[X] = \mathbb{E}[X_0^2] = \mathbb{E}[(X - \bar{x})^2]. $$

The **standard deviation (SD)** in a random variable, $X$, is:

$$\text{SD}[X] = \sqrt{\text{Var}[X]} = \mathbb{E}[(X - \bar{x})^2]^{1/2}. $$

In other words, the *variance* in $X$ *is the expected squared deviation* between $X$ and its expected value. The *standard deviation* is the *square root* of the *expected squared deviation*, or the *root mean square* deviation.

:::
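
The definitions translate directly into a few lines of code for a discrete random variable. This is a minimal sketch, assuming a fair six-sided die as the example; the function names `expectation`, `variance`, and `standard_deviation` are hypothetical helpers, not part of any library.

```python
import math

# Assumed example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

def expectation(values, pmf):
    """E[X] for a discrete random variable."""
    return sum(x * p for x, p in zip(values, pmf))

def variance(values, pmf):
    """Var[X] = E[(X - E[X])^2], computed straight from the definition."""
    mean = expectation(values, pmf)
    return sum((x - mean) ** 2 * p for x, p in zip(values, pmf))

def standard_deviation(values, pmf):
    """SD[X] = sqrt(Var[X]); note the root sits outside the expectation."""
    return math.sqrt(variance(values, pmf))

var_x = variance(values, pmf)           # about 2.9167 (exactly 35/12)
sd_x = standard_deviation(values, pmf)  # about 1.7078

print(var_x, sd_x)
```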

Notice, if $X$ has units $[x]$, then the variance has units $[x]^2$. For instance, if $X$ is the price of an investment, then $\text{Var}[X]$ has units of $\text{dollars}^2$, not $\text{dollars}.$ The standard deviation has units of $\text{dollars}$. For this reason, it is really the standard deviation, not the variance, that measures the spread, or variability, in $X$.

The variance is related to the spread, or variability, in $X$ through its relation to the standard deviation. Large variances indicate large standard deviations. Since the variance is an expected square, its value alone is often hard to interpret and is easy to misread.

The standard deviation and mean absolute deviation differ. In particular:

$$\text{MAD}[X] = \mathbb{E}[|X_0|] = \mathbb{E}[(|X_0|^2)^{1/2}] \leq \mathbb{E}[|X_0|^2]^{1/2} = \text{SD}[X]. $$

The middle inequality is *Jensen's inequality* applied to the square root. Square roots are concave functions, so expected roots are less than or equal to the square root of an expectation. 

The mean absolute deviation and standard deviation differ since the standard deviation averages squared deviations. As a result, it is much more sensitive to large deviations, and discounts small deviations.
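
To see the difference concretely, the sketch below compares two distributions that share the same mean absolute deviation but have different standard deviations. Both distributions are made-up examples, and the helper names `mad` and `sd` are hypothetical.

```python
import math

def expectation(values, pmf):
    return sum(x * p for x, p in zip(values, pmf))

def mad(values, pmf):
    """Mean absolute deviation: E[|X - E[X]|]."""
    mean = expectation(values, pmf)
    return sum(abs(x - mean) * p for x, p in zip(values, pmf))

def sd(values, pmf):
    """Standard deviation: sqrt(E[(X - E[X])^2])."""
    mean = expectation(values, pmf)
    return math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(values, pmf)))

# Assumed example 1: every deviation has the same size (+1 or -1).
even_values, even_pmf = [-1, 1], [0.5, 0.5]

# Assumed example 2: usually no deviation, occasionally a large one (+5 or -5).
spiky_values, spiky_pmf = [-5, 0, 5], [0.1, 0.8, 0.1]

mad_even, sd_even = mad(even_values, even_pmf), sd(even_values, even_pmf)
mad_spiky, sd_spiky = mad(spiky_values, spiky_pmf), sd(spiky_values, spiky_pmf)

print(mad_even, sd_even)    # both about 1
print(mad_spiky, sd_spiky)  # MAD about 1, SD about 2.236
```

In both cases the mean absolute deviation is no larger than the standard deviation, as Jensen's inequality requires. The two agree when every deviation has the same size, and the gap opens up when rare, large deviations dominate the squared average.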

Given an expectation and a standard deviation, it is common practice to *standardize* a random variable.

:::{note} Standard Variables

A **standard** variable, $Z$, is a random variable that is centered (has mean zero), and has standard deviation equal to 1:

$$\mathbb{E}[Z] = 0, \quad \text{SD}[Z] = 1. $$

To **standardize** a random variable, center it (subtract off its mean), then scale it by its standard deviation:

$$ Z = \frac{X_0}{\text{SD}[X_0]} = \frac{X - \bar{x}}{\text{SD}[X]} $$

Then:

$$X = \text{SD}[X] Z + \mathbb{E}[X].$$

:::
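
The standardization recipe can be checked numerically. The sketch below assumes a small made-up PMF; the names `z_values` and `recovered` are illustrative only.

```python
import math

# Assumed example distribution (hypothetical values and probabilities).
values = [0, 1, 5]
pmf = [0.5, 0.4, 0.1]

def expectation(values, pmf):
    return sum(x * p for x, p in zip(values, pmf))

def sd(values, pmf):
    mean = expectation(values, pmf)
    return math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(values, pmf)))

mean_x = expectation(values, pmf)
sd_x = sd(values, pmf)

# Standardize: center, then divide by the standard deviation.
# Z takes these values with the same probabilities as X.
z_values = [(x - mean_x) / sd_x for x in values]

mean_z = expectation(z_values, pmf)  # zero, up to rounding error
sd_z = sd(z_values, pmf)             # one, up to rounding error

# Undo the standardization: X = SD[X] * Z + E[X].
recovered = [sd_x * z + mean_x for z in z_values]

print(mean_z, sd_z, recovered)
```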

Notice, a random variable, and its standardization, are related by a linear transformation. Often, we will define distribution families by first posing some model for a standard variable, $Z$, then by allowing $X = a Z + b$ for any choice of $a$ and $b$. The choice of $b$ assigns the distribution a central location. The choice of $a$ selects its variability, or spread, about that central location. This is why we focused on linear transformations of the inputs and outputs to functions in [Section 3.2](#ch3.2). 

:::{tip} Example

If $Z$ is a random variable with PDF $f_Z(z) \propto g(z)$, and $X = a Z + b$ with $a \neq 0$, then:

$$\text{PDF}(x) \propto g((x - b)/a) $$

since $(x - b)/a$ recovers $z$. To maintain normalization:

$$\text{PDF}(x) = f_X(x) = \frac{1}{|a|} f_Z((x - b)/a) $$

:::
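
A numerical sanity check of the normalization factor is sketched below. It assumes a triangular density for $Z$ and a simple midpoint-rule integrator; the density, the values of `a` and `b`, and the helper `integrate` are all illustrative assumptions. A negative `a` is used on purpose, to show why the factor is $1/|a|$ rather than $1/a$.

```python
def f_Z(z):
    """Assumed density for Z: triangular on [-1, 1], f_Z(z) = 1 - |z|."""
    return max(0.0, 1.0 - abs(z))

a, b = -2.0, 3.0  # hypothetical scale and shift, so X = a * Z + b lives on [1, 5]

def f_X(x):
    """Density of X = a * Z + b, using the 1/|a| normalization."""
    return f_Z((x - b) / a) / abs(a)

def integrate(f, lo, hi, n=6000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f_X, 0.0, 6.0)                     # should be close to 1
mean_x = integrate(lambda x: x * f_X(x), 0.0, 6.0)   # should be close to b = 3

print(total, mean_x)
```

Since this particular $Z$ has mean zero, the mean of $X$ lands at $b$, matching the role of $b$ as the central location.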

While standard deviations provide direct measures of spread, we will focus our study on variances. It is easy to compute standard deviations from variances, and variances have stronger algebraic properties, so are more convenient to work with. 

## Rules of Variance

Like expectations, variances are popular summaries since they obey convenient rules. These rules make it possible to break problems down into simpler parts. We won't cover too many rules in this chapter. Instead, we'll just check the rules associated with linear transformations:

1. **Constants:** If $X=c$ then $\text{Var}[X] = 0$. 

    So, the *variance of a constant is zero.* 

1. **Translation:** Given any $b$, 

    $$\text{Var}[X + b] = \text{Var}[X].$$

    So, the *variance after a translation is the variance before the translation.* This is an entirely sensible rule. Variances are associated with the spread, or width, of a distribution. The spread, or width, is unchanged by translating the distribution.

    It follows that:

    $$\text{SD}[X + b] = \text{SD}[X].$$

1. **Scaling:** Given any $a$,

    $$\text{Var}[a X] = a^2 \text{Var}[X].$$

    *Proof:* Just apply the definition, then use rules of expectation from [Section 4.2](#ch4.2):

    $$\begin{aligned}
    \text{Var}[a X] & = \mathbb{E}[(a X - \mathbb{E}[a X])^2] = \mathbb{E}[(a X - a \mathbb{E}[X])^2] \\
    & = \mathbb{E}[(a(X - \bar{x}))^2] = \mathbb{E}[a^2 X_0^2] = a^2 \mathbb{E}[X_0^2] = a^2 \text{Var}[X].
    \end{aligned}$$

    So, the *variance after a scaling is the variance before the scaling, multiplied by the scaling squared.* $\square$

    It follows that:

    $$\text{SD}[a X] = |a| \text{SD}[X] $$

    :::{caution}

    Remember, in general $\text{Var}[a X] \neq a \text{Var}[X]$. To avoid mixing this up, check units. The units of the variance are the units of $X$, squared. So, replacing $X$ with $a X$ should change the variance by a factor of $a^2$, not $a$.

    :::
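
The three rules can be verified numerically on a small example. The sketch below assumes a fair six-sided die and arbitrary constants `a = -3` and `b = 10`; all names are hypothetical.

```python
import math

# Assumed example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

def variance(values, pmf):
    """Var[X] = E[(X - E[X])^2] for a discrete random variable."""
    mean = sum(x * p for x, p in zip(values, pmf))
    return sum((x - mean) ** 2 * p for x, p in zip(values, pmf))

a, b = -3, 10  # hypothetical scale and shift

var_x = variance(values, pmf)
var_const = variance([b] * len(values), pmf)          # constant: variance 0
var_shifted = variance([x + b for x in values], pmf)  # translation: unchanged
var_scaled = variance([a * x for x in values], pmf)   # scaling: a^2 * Var[X]

sd_x = math.sqrt(var_x)
sd_scaled = math.sqrt(var_scaled)                     # |a| * SD[X]

print(var_const, var_shifted, var_scaled, sd_scaled)
```

Because `a` is negative here, the last line also illustrates why the standard deviation scales by $|a|$ and not by $a$.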

## Computing Variances

Formally, the variance is the expected value of the nonnegative random variable $X_0^2 = (X - \bar{x})^2$. So, when $X$ is discrete:

$$\text{Var}[X] = \sum_{\text{all } x}  (x - \bar{x})^2 \text{PMF}(x).$$

When $X$ is continuous:

$$\text{Var}[X] = \int_{\text{all } x}  (x - \bar{x})^2 \text{PDF}(x) dx.$$
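
For a continuous random variable, the integral can be approximated numerically when no closed form is at hand. The sketch below assumes $X$ is uniform on $[0, 1]$, whose variance is known to be $1/12$, and uses a simple midpoint rule; the helper `integrate` is a hypothetical stand-in for a proper quadrature routine.

```python
def pdf(x):
    """Assumed example: uniform density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# First the mean, then the expected squared deviation from that mean.
mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)
var = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, 1.0)

print(mean, var)  # about 0.5 and about 0.08333 (1/12)
```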

We'll often work with a formula that breaks the variance into simpler parts.

:::{note} Variance Expanded

Given a random variable $X$:

$$\text{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2. $$

In other words, the variance is the *expected square minus the squared expectation.*

:::

*Proof:* As usual, start from the definition, expand, then apply properties of expectation:

$$\text{Var}[X] = \mathbb{E}[(X - \bar{x})^2] = \mathbb{E}[X^2 - 2 \bar{x} X + \bar{x}^2].$$

Then, by the additivity of expectation:

$$\text{Var}[X] = \mathbb{E}[X^2] +\mathbb{E}[-2 \bar{x} X] + \mathbb{E}[\bar{x}^2].$$

Then, since $\bar{x}$ is a constant, we can use linearity to pull all constants outside the expectations:

$$\text{Var}[X] = \mathbb{E}[X^2] -2 \bar{x} \mathbb{E}[ X] + \bar{x}^2.$$

Finally, since $\mathbb{E}[X] = \bar{x}$:

$$\text{Var}[X] = \mathbb{E}[X^2] - 2 \bar{x}^2 + \bar{x}^2 = \mathbb{E}[X^2] - \bar{x}^2.$$

So, the variance in $X$ is the expected square, $\mathbb{E}[X^2]$, minus the squared expectation, $\bar{x}^2$. $\square$

If you get stuck trying to compute a variance, this is often the first formula you should try. In many cases it is easier to evaluate $\mathbb{E}[X]$ and $\mathbb{E}[X^2]$ than it is to evaluate $\mathbb{E}[(X - \bar{x})^2]$ directly.
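
The two routes to the variance can be compared exactly using rational arithmetic. This sketch assumes a fair six-sided die and uses Python's standard `fractions` module to avoid rounding error; the variable names are illustrative.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, with exact rational probabilities.
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

def expectation(g, values, pmf):
    """E[g(X)] for a discrete random variable."""
    return sum(g(x) * p for x, p in zip(values, pmf))

mean = expectation(lambda x: x, values, pmf)              # 7/2
second_moment = expectation(lambda x: x ** 2, values, pmf)  # 91/6

# Route 1: the definition, E[(X - mean)^2].
var_definition = expectation(lambda x: (x - mean) ** 2, values, pmf)

# Route 2: the expanded formula, E[X^2] - E[X]^2.
var_expanded = second_moment - mean ** 2

print(var_definition, var_expanded)  # 35/12 35/12
```

With floating-point numbers the expanded formula can lose accuracy when $\mathbb{E}[X^2]$ and $\mathbb{E}[X]^2$ are both large and nearly equal, which is one reason exact fractions are used in this check.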



## Other Moments

So far we've seen two summaries based on expectations. These are each examples of the *moments* of a distribution. 

The *raw* moments are expectations of the kind $\mathbb{E}[X^n]$ for various integers $n$. The *first raw moment* is the expected value since $n = 1$ returns $\mathbb{E}[X^1] = \mathbb{E}[X]$.

The *central* moments are the raw moments of the centered variable. These are expectations of the kind $\mathbb{E}[(X - \bar{x})^n]$ for various integers $n$. The *second central moment* is the variance. Central moments can always be recovered from combinations of raw moments, as we saw above when we expanded the variance as $\mathbb{E}[X^2] - \mathbb{E}[X]^2$.

Higher order central moments have been used to define other shape summaries. For example, the third central moment is commonly used as a measure of the skew in a distribution, and the fourth central moment is used to check whether the distribution is "bell shaped" in the same fashion as the famous "normal" distribution. 
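
Raw and central moments are easy to compute for a discrete distribution. The sketch below assumes a small, right-skewed PMF and hypothetical helpers `raw_moment` and `central_moment`. It also rebuilds the second and third central moments from raw moments, using the binomial expansions of $(X - \bar{x})^2$ and $(X - \bar{x})^3$.

```python
# Assumed example: a right-skewed distribution (hypothetical values).
values = [0, 1, 5]
pmf = [0.5, 0.4, 0.1]

def raw_moment(n):
    """n-th raw moment: E[X^n]."""
    return sum(x ** n * p for x, p in zip(values, pmf))

def central_moment(n):
    """n-th central moment: E[(X - E[X])^n]."""
    mean = raw_moment(1)
    return sum((x - mean) ** n * p for x, p in zip(values, pmf))

m1, m2, m3 = raw_moment(1), raw_moment(2), raw_moment(3)

mu2 = central_moment(2)  # the variance
mu3 = central_moment(3)  # positive here, reflecting the long right tail

# The same central moments, recovered from raw moments.
mu2_from_raw = m2 - m1 ** 2
mu3_from_raw = m3 - 3 * m1 * m2 + 2 * m1 ** 3

print(mu2, mu2_from_raw)
print(mu3, mu3_from_raw)
```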