## Variance

The population variance of $N$ values $x_1, \dots, x_N$ with mean $\mu$ is defined as:

$\text{Variance} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

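- Describes the spread of the samples around their mean.
- The units of variance are the square of the units of the samples. This is why the standard deviation, the square root of the variance, is often preferred: it has the same units as the input.
- To estimate the population variance from sample data, apply Bessel's correction: divide by $n - 1$ instead of $n$, where $n$ is the sample size. This gives an unbiased estimator. Dividing by $n$ gives the maximum likelihood estimate (MLE) for normally distributed data, which is biased. The difference is negligible for large $n$. The `get_variance` code below uses the $n - 1$ form.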
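
As a quick check of the points above, here is a minimal sketch using Python's standard `statistics` module. The variable names (`samples`, `population_variance`, and so on) are illustrative and not part of the original notes.

```python
import statistics

samples = [2, 4, 6, 8, 10]
n = len(samples)

# Population variance: sum of squared deviations divided by n.
population_variance = statistics.pvariance(samples)

# Sample variance: same sum divided by n - 1 (Bessel's correction).
sample_variance = statistics.variance(samples)

# Standard deviation: square root of the sample variance, in the units of the data.
sample_std = statistics.stdev(samples)

print(population_variance, sample_variance, sample_std)

# The two variances differ by a factor of n / (n - 1), which approaches 1 as n grows.
for size in (5, 50, 5000):
    print(size, size / (size - 1))
```

For this data the population variance is 8 and the sample variance is 10, which agrees with the value that `get_variance(a)` prints in the cell that follows.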

In [10]:
a = [2, 4, 6, 8, 10]
b = [2, 4, 6, 8, 10, 1200]

def get_mean(x: list[int]) -> float:
    return sum(x) / len(x)

def get_variance(x: list[int]) -> float:
    # Sample variance: sum of squared deviations from the mean,
    # divided by n - 1 (Bessel's correction).
    mean = get_mean(x)
    numerator = 0
    for x_i in x:
        numerator += (x_i - mean) ** 2
    variance = numerator / (len(x) - 1)
    return variance

print(get_variance(a))
print(get_variance(b))

10.0
237614.0


In [11]:
get_variance([1, 2, 3])

1.0

### Covariance
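- Describes how two variables change together. For example, height and weight tend to increase together, as do temperature and ice cream sales.
- Covariance(X, X) = variance(X), provided both use the same normalization. Note that `get_covariance` below divides by $N$ while `get_variance` above divides by $N - 1$, so `get_covariance(a, a)` returns 8.0 whereas `get_variance(a)` returns 10.0.
- Covariance depends on the units of the variables; correlation is the normalized, unit-free version.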

$\text{Covariance}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)$


In [6]:
def get_covariance(x: list[int], y: list[int]) -> float:
    # Population covariance: mean of the products of paired deviations (divides by N).
    mean_x = get_mean(x)
    mean_y = get_mean(y)

    numerator = 0
    for x_i, y_i in zip(x, y):
        numerator += (x_i - mean_x) * (y_i - mean_y)
    covariance = numerator / len(x)
    return covariance

a = [2, 4, 6, 8, 10]
b = [1, 2, 3, 4, 5]
c = [42, 12, 76, -32, 0]
print(get_covariance(a, b))
print(get_covariance(a, c))

4.0
-51.2


In [8]:
get_covariance(a, a)

8.0
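
To illustrate the remark that covariance depends on units while correlation is normalized, here is a small self-contained sketch. The helper names (`population_covariance`, `pearson_correlation`) and the height and weight figures are made up for illustration. It assumes the Pearson definition of correlation: covariance divided by the product of the two standard deviations.

```python
import math

def population_covariance(x, y):
    # Mean of the products of paired deviations (divides by N).
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y)) / len(x)

def pearson_correlation(x, y):
    # Covariance divided by the product of the standard deviations.
    # Cov(x, x) is the variance of x, so its square root is the standard deviation.
    std_x = math.sqrt(population_covariance(x, x))
    std_y = math.sqrt(population_covariance(y, y))
    return population_covariance(x, y) / (std_x * std_y)

# Hypothetical data: the same heights expressed in metres and in centimetres.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50.0, 58.0, 63.0, 72.0, 80.0]

print(population_covariance(heights_m, weights_kg))
print(population_covariance(heights_cm, weights_kg))
print(pearson_correlation(heights_m, weights_kg))
print(pearson_correlation(heights_cm, weights_kg))
```

Switching from metres to centimetres scales the covariance by 100 but leaves the correlation unchanged, up to floating-point rounding.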

## Covariance matrix
- The covariance matrix is defined for n variables. Entry (i, j) of the matrix is the covariance between variable i and variable j.

$$
\text{Covariance Matrix for 3 variables} = 
\begin{bmatrix}
\text{Cov}(a, a) & \text{Cov}(a, b) & \text{Cov}(a, c) \\
\text{Cov}(b, a) & \text{Cov}(b, b) & \text{Cov}(b, c) \\
\text{Cov}(c, a) & \text{Cov}(c, b) & \text{Cov}(c, c)
\end{bmatrix}
$$

Where:

- $\text{Cov}(a, a)$ is the variance of `a`
- $\text{Cov}(a, b)$ is the covariance between `a` and `b`
- $\text{Cov}(a, c)$ is the covariance between `a` and `c`
- $\text{Cov}(b, b)$ is the variance of `b`
- $\text{Cov}(b, c)$ is the covariance between `b` and `c`
- $\text{Cov}(c, c)$ is the variance of `c`
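
The list above names only six distinct entries because covariance is symmetric: $\text{Cov}(a, b) = \text{Cov}(b, a)$. One possible sketch that exploits this computes only the upper triangle and mirrors it. The function names here are illustrative, and the covariance uses the divide-by-$N$ form of the formula in the Covariance section.

```python
def population_covariance(x, y):
    # Mean of the products of paired deviations (divides by N).
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y)) / len(x)

def symmetric_covariance_matrix(variables):
    n = len(variables)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only visit j >= i, then copy each value to its mirrored position.
        for j in range(i, n):
            value = population_covariance(variables[i], variables[j])
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix

a = [2, 4, 6, 8, 10]
b = [1, 2, 3, 4, 5]
c = [42, 12, 76, -32, 0]
matrix = symmetric_covariance_matrix([a, b, c])
for row in matrix:
    print(row)
```

This roughly halves the number of covariance computations compared with filling every entry independently, while the diagonal still holds the variances.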

In [7]:
def get_covariance_matrix(x:list[list[int]]) -> list[list[float]]:
    n = len(x)
    covariance_matrix = []
    for i in range(n):
        covariance_vector = []
        for j in range(n):
            covariance = get_covariance(x[i], x[j])
            covariance_vector.append(covariance)
        covariance_matrix.append(covariance_vector)
    return covariance_matrix

a = [2, 4, 6, 8, 10]
b = [1, 2, 3, 4, 5]
c = [42, 12, 76, -32, 0]
X = [a, b, c]
print(get_covariance_matrix(X))

[[8.0, 4.0, -51.2], [4.0, 2.0, -25.6], [-51.2, -25.6, 1357.44]]
