## Variance notebook

The variance of a r.v. $X$ is written $Var(X)$, where $Var(X) = E\big[ (X - E[X])^2 \big]$.

The variance is itself an expected value: the expected value of the squared deviation from the expected value. In other words, it measures how far $X$ tends to fall from $E[X]$, in squared units.

Don't let the formula $E\big[ (X - E[X])^2 \big]$ scare you. It's just an expected value. In general $E[X] = \sum_j p(x_j)(x_j)$, where $x_j$ ranges over the values the r.v. $X$ can take and $p(x_j)$ is the probability of $x_j$. Here the role of $x_j$ is played by $(x_j - E[X])^2$, so $Var(X) = \sum_j p(x_j)(x_j - E[X])^2$.
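As a small worked case, the sketch below evaluates the weighted sum $\sum_j p(x_j)(x_j - E[X])^2$ for a fair six-sided die. The die and the names `values` and `probs` are illustrative assumptions, not part of this notebook's data.

```python
import numpy as np

# Hypothetical example: a fair six-sided die (not from the notebook's data)
values = np.arange(1, 7)       # possible outcomes x_j
probs = np.full(6, 1 / 6)      # p(x_j), the same for every outcome

# E[X]: each outcome weighted by its probability
expected = np.sum(probs * values)

# Var(X): each squared deviation weighted by the same probability
variance = np.sum(probs * (values - expected) ** 2)

print(expected, variance)
```

The second sum has exactly the same shape as the first; only the quantity being averaged has changed.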

How do we know $p(x_j)$? Often, if you have some observations and want to know their variance, $p(x_j)$ can be read directly from the data: it is the fraction of times you observe $x_j$ in $N$ draws. The observed distribution is sometimes called the "empirical distribution."
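One way to see the empirical distribution at work is to count how often each distinct value appears and use those fractions as probabilities. The array `obs` below is made-up data for illustration only, and `np.unique` is just one convenient way to do the counting.

```python
import numpy as np

# Hypothetical observations (illustrative only)
obs = np.array([2, 3, 3, 5, 5, 5, 7, 7])

# Empirical distribution: fraction of the N draws equal to each distinct value
values, counts = np.unique(obs, return_counts=True)
p = counts / len(obs)

# Plug the empirical probabilities into the expected-value formulas
mean = np.sum(p * values)
variance = np.sum(p * (values - mean) ** 2)

# np.var with its default ddof=0 should agree with the weighted sum
print(variance, np.var(obs))
```

Averaging over distinct values with weights `p` is equivalent to giving each of the $N$ individual draws probability $1/N$.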

The standard deviation you may have seen in stats courses, aka $\sigma$, is just the square root of the variance, i.e. $\sqrt{Var(X)}=\sigma$.
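The relationship can be checked numerically on a small, made-up data set (the array `obs` is an assumption chosen so that the variance is 4 and the standard deviation is 2).

```python
import numpy as np

# Illustrative data with mean 5; squared deviations sum to 32 over 8 points
obs = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

variance = np.var(obs)       # mean squared deviation from the mean
sigma = np.sqrt(variance)    # standard deviation as the square root

# np.std computes the same square root directly
print(variance, sigma, np.std(obs))
```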

In [1]:
import numpy as np
import pandas as pd
import altair as alt

mu = 5 
sigma = 1

# draw 1000 points from a normal distribution with mean mu and standard deviation sigma
draws = np.random.normal(mu, sigma, 1000)

df = pd.DataFrame({"draw": draws})

alt.Chart(df).mark_bar().encode(
    alt.X("draw:Q", bin=alt.Bin(extent=[-20, 20], step=0.5), scale=alt.Scale(domain=(-20, 20))),
    y='count()',
)

### Questions 

1. What happens when you adjust mu? 
2. What happens when you adjust sigma?
3. What is $E[X]$, the expected value of the draws?
4. What is $(x_j - E[X])^2$ for each point $x_j$ in draws?
5. What probability should we assign to each $(x_j - E[X])^2$ based on the empirical distribution?
6. How can we compute the variance? 
7. How does the variance relate to the standard deviation sigma given above?
8. Repeat these steps while varying sigma.
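
One possible way to set up steps 3 to 8 is sketched below. It is not the only approach: the seeded generator `rng`, the list of sigma values, and the `results` dictionary are all assumptions made for illustration, and each draw is given probability $1/N$ under the empirical distribution.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable
mu = 5
results = {}                    # hypothetical container: sigma -> empirical variance

for sigma in [0.5, 1, 2, 4]:
    draws = rng.normal(mu, sigma, 1000)

    mean = draws.mean()                 # empirical E[X]
    sq_dev = (draws - mean) ** 2        # (x_j - E[X])^2 for every draw
    variance = np.mean(sq_dev)          # each draw weighted by 1/N

    results[sigma] = variance
    print(sigma, round(variance, 3), round(np.sqrt(variance), 3))
```

Comparing the last printed column with the first gives a feel for how closely the empirical standard deviation tracks the sigma used to generate the draws.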