## Updating Covariance over Time

**Variance:**
$$\text{Var}_t(X) = \frac{1}{t-1}\sum_{i=1}^t (x_i-\overline{X}_t)^2 $$

**Covariance:**
$$\text{Cov}_t(X, Y) = \frac{1}{t-1}\sum_{i=1}^t (x_i-\overline{X}_t)(y_i-\overline{Y}_t) $$

**Covariance Matrix:**
$$\begin{bmatrix}
\text{Cov}(X_1, X_1) & \dots & \text{Cov}(X_1, X_N)\\
\vdots & \ddots & \vdots \\
\text{Cov}(X_N, X_1) & \dots & \text{Cov}(X_N, X_N) 
\end{bmatrix} = 
\begin{bmatrix}
\text{Var}(X_1) & \dots & \text{Cov}(X_1, X_N)\\
\vdots & \ddots & \vdots \\
\text{Cov}(X_N, X_1) & \dots & \text{Var}(X_N) 
\end{bmatrix}$$
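
As a quick sanity check of the matrix identity above, the following sketch builds a sample covariance matrix with NumPy and confirms that its diagonal holds the per-variable sample variances. The random data and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical data: 6 observations (rows) of 3 variables (columns)
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))

# rowvar=False treats columns as variables; np.cov divides by (t - 1) by default
cov_matrix = np.cov(data, rowvar=False)

# Sample variance of each column, also with the 1 / (t - 1) normalisation
variances = np.var(data, axis=0, ddof=1)

diagonal_matches = np.allclose(np.diag(cov_matrix), variances)
is_symmetric = np.allclose(cov_matrix, cov_matrix.T)
print(diagonal_matches, is_symmetric)
```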

**Mean update:**
$$\overline{X}_t = \frac{(t-1)\overline{X}_{t-1}+x_t}{t} = \overline{X}_{t-1}+\frac{x_t-\overline{X}_{t-1}}{t}$$
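
The incremental form of the mean can be sketched in a few lines of Python. The helper name `running_mean` and the sample values are hypothetical; the loop applies the right-hand side of the update above once per observation.

```python
def running_mean(values):
    """Yield the mean of the first t values for t = 1, 2, ..."""
    mean = 0.0
    for t, x in enumerate(values, start=1):
        # mean_t = mean_{t-1} + (x_t - mean_{t-1}) / t
        mean += (x - mean) / t
        yield mean


samples = [2.0, 4.0, 9.0, 1.0]
means = list(running_mean(samples))
print(means)  # [2.0, 3.0, 5.0, 4.0]
```

Only the previous mean and the count are needed, so no past observation has to be stored.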

**Variance/covariance update ([Welford's online algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford%27s_online_algorithm)):** Define the running sum of squared deviations
$$M_t = \sum_{i=1}^t (x_i-\overline{X}_t)^2$$

then 

$$M_t = M_{t-1}+(x_t-\overline{X}_t)(x_t-\overline{X}_{t-1})$$

and 

$$\text{Var}_t(X) = \frac{M_t}{t-1}$$
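
A minimal scalar sketch of this recurrence, assuming a plain Python iterable of floats as input. The function name `welford_variance` is hypothetical, and the result is compared against NumPy's batch sample variance.

```python
import numpy as np


def welford_variance(values):
    """Return the sample variance (1 / (t - 1) normalisation) in one pass."""
    count = 0
    mean = 0.0
    m = 0.0  # M_t, the running sum of squared deviations
    for x in values:
        count += 1
        delta_prev = x - mean         # x_t - mean_{t-1}
        mean += delta_prev / count    # mean_t
        m += (x - mean) * delta_prev  # M_t = M_{t-1} + (x_t - mean_t)(x_t - mean_{t-1})
    if count < 2:
        return float("nan")           # sample variance is undefined for t < 2
    return m / (count - 1)


samples = [2.0, 4.0, 9.0, 1.0]
online_var = welford_variance(samples)
batch_var = np.var(samples, ddof=1)
print(np.isclose(online_var, batch_var))
```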

**Proof:**

$$ M_{t-1}+(x_t-\overline{X}_t)(x_t-\overline{X}_{t-1}) = \sum_{i=1}^{t-1} (x_i-\overline{X}_{t-1})^2+(x_t-\overline{X}_t)(x_t-\overline{X}_{t-1})\\
= \sum_{i=1}^t (x_i-\overline{X}_{t-1})^2 - (x_t-\overline{X}_{t-1})^2+(x_t-\overline{X}_t)(x_t-\overline{X}_{t-1}) \\
= \sum_{i=1}^t (x_i-\overline{X}_{t-1})^2 - (x_t-\overline{X}_{t-1})(\overline{X}_t-\overline{X}_{t-1})
$$

Using

$$
\overline{X}_t-\overline{X}_{t-1} = \frac{(t-1)\overline{X}_{t-1}+x_t}{t}-\frac{t\overline{X}_{t-1}}{t} = \frac{x_t-\overline{X}_{t-1}}{t}
$$

and therefore

$$
x_t-\overline{X}_{t-1}=  t(\overline{X}_t-\overline{X}_{t-1}) 
$$

it follows that

$$
\sum_{i=1}^t (x_i-\overline{X}_{t-1})^2 - (x_t-\overline{X}_{t-1})(\overline{X}_t-\overline{X}_{t-1}) = \sum_{i=1}^t (x_i-\overline{X}_{t-1})^2 - t(\overline{X}_t-\overline{X}_{t-1})^2 \\
= \sum_{i=1}^t ((x_i-\overline{X}_{t-1})^2 - (\overline{X}_t-\overline{X}_{t-1})^2) \\
=  \sum_{i=1}^t (x_i-\overline{X}_{t-1}+\overline{X}_{t}-\overline{X}_{t-1})(x_i-\overline{X}_{t}) \\
= \sum_{i=1}^t (x_i-\overline{X}_{t})^2 +  (2\overline{X}_t-2\overline{X}_{t-1})\sum_{i=1}^t (x_i-\overline{X}_{t}) \\
= \sum_{i=1}^t (x_i-\overline{X}_{t})^2 = M_t
$$

where the final step uses the fact that the deviations from the mean sum to zero, which follows from

$$
\sum_{i=1}^t (x_i-\overline{X}_t) = \sum_{i=1}^tx_i-\sum_{i=1}^t\overline{X}_t \\
= \sum_{i=1}^tx_i -t\overline{X}_t \\
= t(\frac{1}{t}\sum_{i=1}^tx_i -\overline{X}_t) \\
=  t(\overline{X}_t-\overline{X}_t) \\
= 0
$$

The covariance can be updated in the same way. Define
$$C_t = \sum_{i=1}^t (x_i-\overline{X}_t)(y_i-\overline{Y}_t)$$

then

$$C_t = C_{t-1}+(x_t-\overline{X}_t)(y_t-\overline{Y}_{t-1}) \\
= C_{t-1}+(x_t-\overline{X}_{t-1})(y_t-\overline{Y}_{t})
$$
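
The same idea for a single pair of series can be sketched as follows. Dividing $C_t$ by $t-1$ gives the sample covariance, mirroring the variance case. The function name `online_covariance` and the sample values are assumptions for illustration; the code uses the second form of the update, $(x_t-\overline{X}_{t-1})(y_t-\overline{Y}_{t})$.

```python
import numpy as np


def online_covariance(xs, ys):
    """Return the sample covariance of two equally long series in one pass."""
    count = 0
    mean_x = 0.0
    mean_y = 0.0
    c = 0.0  # C_t, the running sum of co-deviation products
    for x, y in zip(xs, ys):
        count += 1
        dx_prev = x - mean_x              # x_t - mean_x_{t-1}
        mean_x += dx_prev / count         # mean_x_t
        mean_y += (y - mean_y) / count    # mean_y_t
        c += dx_prev * (y - mean_y)       # C_t = C_{t-1} + (x_t - mean_x_{t-1})(y_t - mean_y_t)
    if count < 2:
        return float("nan")               # sample covariance is undefined for t < 2
    return c / (count - 1)


xs = [2.0, 4.0, 9.0, 1.0]
ys = [1.0, 3.0, 2.0, 5.0]
online_cov = online_covariance(xs, ys)
batch_cov = np.cov(xs, ys)[0, 1]
print(np.isclose(online_cov, batch_cov))
```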

In [None]:
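import numpy as np


class Covariance:
    def __init__(self, num_stocks: int) -> None:
        """
        Online (streaming) covariance matrix.

        Parameters
        ----------
        num_stocks : int
            Number of variables (columns) tracked

        """
        self.number_stocks = num_stocks
        # Running sum of co-deviation products (C_t in the derivation above)
        self.C_t = np.zeros(shape=(num_stocks, num_stocks))

        # Per-variable count of non-missing observations seen so far
        self.time = np.zeros((1, num_stocks))
        self._mean = np.zeros((1, num_stocks))
        self._cov = np.zeros(shape=(num_stocks, num_stocks))
        # Raw observations, stored only so the result can be checked against np.cov
        self.data = []

    @property
    def mean(self):
        return self._mean

    @property
    def cov(self):
        return self._cov

    @property
    def variance(self):
        return np.diagonal(self.cov)

    @property
    def volatility(self):
        return np.sqrt(self.variance)

    def update(self, return_data: np.ndarray) -> None:
        """
        Update the covariance matrix with one new observation streamed in

        Parameters
        ----------
        return_data : np.ndarray
            One value per variable, shape (num_stocks,); NaN marks a missing value
        """
        self.data.append(return_data)
        previous_mean = self._mean.copy()
        # NaN never compares equal to itself, so missing values need np.isnan
        self.time = self.time + np.where(np.isnan(return_data), 0, 1)

        # Divisions by zero (e.g. t - 1 = 0 on the first update) yield NaN or inf;
        # silence the warnings and let the result stay undefined
        with np.errstate(divide="ignore", invalid="ignore"):
            # Mean update; nansum leaves the mean unchanged where the new value is NaN
            self._mean = np.nansum(
                [self._mean, (return_data - self._mean) / self.time], axis=0
            )

            # Outer product (x_t - mean_t)^T (x_t - mean_{t-1}), shape (N, N)
            vector_calc = (return_data - self._mean).T @ (return_data - previous_mean)

            self.C_t = np.nansum([self.C_t, vector_calc], axis=0)

            # Normalise each pair by the smaller of the two observation counts minus one
            self._cov = self.C_t / np.minimum(
                (self.time - 1), (self.time - 1).T
            )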




In [None]:
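def test_simple_cov():
    m_size = 3
    test_data = np.random.rand(5, m_size)
    simple_cov = Covariance(m_size)
    results = []

    for t, this_data in enumerate(test_data, start=1):
        print("------------------------")
        # update() returns None, so read the estimate from the cov property
        simple_cov.update(this_data)
        cov = simple_cov.cov
        print(cov)
        if t > 1:
            # Batch reference; the sample covariance is undefined for one observation
            expected = np.cov(np.array(simple_cov.data), rowvar=False)
            print(expected)
            assert np.allclose(cov, expected)
        results.append(cov)
    return results


results = test_simple_cov()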