## Covariance

From https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.cov.html#numpy.cov

numpy.cov  

numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)  

"Estimate a covariance matrix, given data and weights.

Covariance indicates the level to which **two variables vary together**. If we examine N-dimensional samples, $X = [x_1, x_2, ... x_N]^T$, then the covariance matrix element $C_{ij}$ is the covariance of $x_i$ and $x_j$. The element $C_{ii}$ is the variance of $x_i$."

In [104]:
import numpy as np
x = np.array([[0, 2], [1, 1], [2, 0]])
x

array([[0, 2],
       [1, 1],
       [2, 0]])

In [105]:
x.shape

(3, 2)

In [106]:
x = x.T
# After transposing, x has 2 rows and 3 columns: each row is now one variable,
# which is the layout np.cov expects by default (rowvar=True)
x.shape
# (2, 3)
print(x)
#[[0 1 2]
# [2 1 0]]
print(np.cov(x))
# [[ 1. -1.]
# [-1.  1.]]

[[0 1 2]
 [2 1 0]]
[[ 1. -1.]
 [-1.  1.]]
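
By default `np.cov` treats each row as a variable (`rowvar=True`), which is why the array was transposed above. The sketch below shows the alternative, assuming the same three observations: pass `rowvar=False` to leave the observations in rows. The names `data`, `cov_rows` and `cov_cols` are illustrative only.

```python
import numpy as np

# Same data as above: 3 observations (rows) of 2 variables (columns)
data = np.array([[0, 2], [1, 1], [2, 0]])

# Option 1: transpose so that each row is a variable (default rowvar=True)
cov_rows = np.cov(data.T)

# Option 2: keep the layout and tell np.cov that the columns are the variables
cov_cols = np.cov(data, rowvar=False)

# Both calls describe the same variables, so the matrices should match
print(np.allclose(cov_rows, cov_cols))

# The diagonal holds the sample variance (ddof=1) of each variable
print(np.allclose(np.diag(cov_cols), np.var(data, axis=0, ddof=1)))
```

Both checks should print `True`.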


## The variance-covariance matrix

From https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/anova/supporting-topics/anova-statistics/what-is-the-variance-covariance-matrix/

A variance-covariance matrix is a square matrix that contains the variances and covariances associated with several variables. The diagonal elements of the matrix contain the variances of the variables and the off-diagonal elements contain the covariances between all possible pairs of variables.

From https://math.stackexchange.com/questions/710214/how-to-construct-a-covariance-matrix-from-a-2x2-data-set

The variance-covariance matrix has the following structure:   

$$ \begin{bmatrix}
    var(x) & cov(x,y) \\
    cov(x,y) & var(y) 
\end{bmatrix} $$

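where $n$ is the number of observations, $\bar{x}$ and $\bar{y}$ are the sample means,

$$var(x)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i -\bar{x})^2$$

and

$$cov(x,y)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i -\bar{x})(y_i -\bar{y})$$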
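
The formulas above divide by $n-1$, which is also what `np.cov` does by default. As a small sketch of how that choice shows up in the API, the example below compares the default with `bias=True`, which divides by $n$ instead. The names `cov_unbiased` and `cov_biased` are illustrative only.

```python
import numpy as np

# Rows are variables, columns are observations
x = np.array([[0, 1, 2], [2, 1, 0]])
n = x.shape[1]

cov_unbiased = np.cov(x)           # default normalisation: 1/(n-1)
cov_biased = np.cov(x, bias=True)  # normalisation: 1/n

print(cov_unbiased)
print(cov_biased)

# The two estimates differ only by the factor (n-1)/n
print(np.allclose(cov_biased, cov_unbiased * (n - 1) / n))
```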


In [107]:
# Calculate variance var(x) "by hand"
# xcol holds the observations of the first variable (first row of the transposed x)
xcol = x[0]
print(np.sum((xcol - xcol.mean())**2)/(xcol.size - 1))
# 1.0

1.0


In [108]:
# Calculate covariance cov(x,y) "by hand"
# Separate the two variables (the rows of the transposed x)
xcol = x[0]
ycol = x[1]
# cov(x,y)
print(np.sum((xcol - xcol.mean())*(ycol - ycol.mean()))/(xcol.size - 1))
# -1.0

-1.0
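
The two hand calculations above can be combined to build the whole variance-covariance matrix in one step. This is a minimal sketch, not how `np.cov` is necessarily implemented: subtract each variable's mean, multiply the centred matrix by its transpose, and divide by $n-1$. The names `centered` and `cov_by_hand` are illustrative only.

```python
import numpy as np

# Rows are variables, columns are observations
x = np.array([[0, 1, 2], [2, 1, 0]])
n = x.shape[1]

# Subtract each variable's mean from its observations
centered = x - x.mean(axis=1, keepdims=True)

# Entry (i, j) is the sum of products of deviations of variables i and j,
# divided by n - 1 as in the formulas for var(x) and cov(x, y)
cov_by_hand = centered @ centered.T / (n - 1)

print(cov_by_hand)
print(np.allclose(cov_by_hand, np.cov(x)))
```

The diagonal entries reproduce the variances and the off-diagonal entries the covariance computed above.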
