
**Best Linear Prediction**

In best linear prediction, the goal is to predict a value \\(y\\) as a linear combination of previously observed values \\(x_1\\), \\(x_2\\), etc.

When using best linear prediction in time series, we try to predict a future value from historical values.

To ensure our prediction is the "best", the mean squared error \\(E[(x_{n+m}-\widehat{x}_{n+m})^2]\\) is minimized.

To guarantee that the MSE is minimized, the orthogonality principle is used. It states that the prediction error must be uncorrelated with each of the predictor variables. For a zero-mean process, this is stated mathematically as

$$ cov(x_{n+m}-\widehat{x}_{n+m}, x_k) = E[(x_{n+m}-\widehat{x}_{n+m})x_k] = 0 $$

for \\(k = 1, 2, \dots, n\\).
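
The orthogonality principle can be checked numerically. The following is a minimal sketch, not part of the original notebook: it simulates an AR(1) process with an arbitrarily chosen coefficient of 0.8, fits a two-term linear predictor by ordinary least squares (the sample analogue of minimizing the MSE), and evaluates the sample version of \\(E[(x - \widehat{x})x_k]\\) for each predictor. All variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a zero-mean AR(1) process x[t] = 0.8 * x[t-1] + w[t] (illustrative choice)
n_samples = 5000
w = rng.standard_normal(n_samples)
x = np.zeros(n_samples)
for t in range(1, n_samples):
    x[t] = 0.8 * x[t - 1] + w[t]

# Least-squares fit of x[t] on its two previous values x[t-1] and x[t-2]
target = x[2:]
predictors = np.column_stack([x[1:-1], x[:-2]])
coefs, *_ = np.linalg.lstsq(predictors, target, rcond=None)

# Prediction error of the fitted linear predictor
error = target - predictors @ coefs

# Sample version of E[(x - x_hat) * x_k], one value per predictor
orthogonality = predictors.T @ error / len(error)
print(orthogonality)
```

Both printed values are zero up to floating-point rounding, because the least-squares normal equations are exactly the sample form of the orthogonality condition.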

Consider the case of one-step prediction, where \\(x_1, x_2, \dots, x_n\\) are given and we want to predict \\(x_{n+1}\\):

$$ \widehat{x}_{n+1} = \phi_{n,1} x_n + \phi_{n,2} x_{n-1} + \dots + \phi_{n,n} x_{1} = \sum_{j=1}^n \phi_{n,j} x_{n+1-j}$$ 

Here the coefficients are denoted \\(\phi_{i,j}\\) to specify that they are coefficient \\(j\\) calculated with the samples available at time \\(i\\).

This leads to the following optimality criterion.

$$ E[(x_{n+1}-\widehat{x}_{n+1})x_{n+1-k}] = 0$$

for \\(k = 1, 2, \dots, n\\).

Inserting the expression for \\(\widehat{x}_{n+1}\\) gives

$$ E[(x_{n+1}-\sum_{j=1}^n \phi_{n,j} x_{n+1-j})x_{n+1-k}] = 0 $$

$$ E[x_{n+1}x_{n+1-k}]-\sum_{j=1}^n \phi_{n,j} E[x_{n+1-j}x_{n+1-k}] = 0 $$

For a stationary, zero-mean process with autocovariance function \\(\gamma(h) = E[x_{t+h} x_t]\\), the expectations are \\(E[x_{n+1}x_{n+1-k}] = \gamma(k)\\) and \\(E[x_{n+1-j}x_{n+1-k}] = \gamma(k-j)\\), so that

$$ \sum_{j=1}^n \phi_{n,j} \gamma(k-j) = \gamma(k) $$

This can be written in matrix form

$$ \mathbf{\Gamma_n} \mathbf{\phi}_n = \mathbf{\gamma_n} $$

where \\(\mathbf{\Gamma_n}\\) is the \\(n \times n\\) autocovariance matrix with entry \\((k, j)\\) equal to \\(\gamma(k-j)\\), so that it contains the values \\(\gamma(0)\\) to \\(\gamma(n-1)\\), and \\(\mathbf{\gamma_n}\\) is the \\(n \times 1\\) vector with values \\(\gamma(1)\\) to \\(\gamma(n)\\). 
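
To make the structure of the matrix form concrete, here is a small sketch that builds \\(\mathbf{\Gamma_n}\\) and \\(\mathbf{\gamma_n}\\) from a list of autocovariances using `scipy.linalg.toeplitz`. The autocovariance values \\(\gamma(h) = 0.5^h\\) are a hypothetical choice made only for illustration.

```python
import numpy as np
from scipy.linalg import toeplitz

# Hypothetical autocovariances gamma(0), ..., gamma(n), here gamma(h) = 0.5**h
n = 3
gamma_values = 0.5 ** np.arange(n + 1)

# Gamma_n: symmetric Toeplitz matrix whose entry (k, j) is gamma(|k - j|)
Gamma_n = toeplitz(gamma_values[:n])

# gamma_n: vector with gamma(1), ..., gamma(n)
gamma_n = gamma_values[1:]

print(Gamma_n)
print(gamma_n)
```

Because \\(\gamma(-h) = \gamma(h)\\), a single row of autocovariances is enough to fill the whole matrix.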

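For a model with two parameters, this yields the following two equations

$$ \phi_{n,1} \gamma(0) + \phi_{n,2} \gamma(-1) = \gamma(1) $$

$$ \phi_{n,1} \gamma(1) + \phi_{n,2} \gamma(0) = \gamma(2) $$

Notice how each equation expresses the \\(k\\)'th autocovariance as a weighted sum of the two preceding ones, where \\(\gamma(-1) = \gamma(1)\\) by symmetry. This mirrors the predictor itself: the weights that predict \\(x_{n+1}\\) from the two preceding samples also relate \\(\gamma(k)\\) to the two preceding autocovariances.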

Solving for the coefficients yields

$$ \mathbf{\phi}_n = \mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n} $$

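The one-step prediction in matrix form is

$$ \widehat{x}_{n+1} = \mathbf{\phi}_n^T \mathbf{x} $$

where \\(\mathbf{x} = (x_n, x_{n-1}, \dots, x_1)^T\\) holds the observed samples, most recent first.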
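
As a worked example of solving for the coefficients and forming the prediction, the sketch below uses the autocovariances of an AR(1) process, \\(\gamma(h) = \phi^{|h|}\gamma(0)\\), with an assumed \\(\phi = 0.6\\) and \\(\gamma(0) = 1\\). The two observed samples are hypothetical values. For such a process the two-parameter solution should put all the weight on the most recent sample.

```python
import numpy as np

# Assumed AR(1) autocovariances gamma(0), gamma(1), gamma(2) with gamma(0) = 1
phi_true = 0.6
gamma = phi_true ** np.arange(3)

# Two-parameter system: Gamma_2 @ phi_2 = gamma_2
Gamma_2 = np.array([[gamma[0], gamma[1]],
                    [gamma[1], gamma[0]]])
gamma_2 = gamma[1:]

# Solve the linear system directly instead of forming the inverse
phi_2 = np.linalg.solve(Gamma_2, gamma_2)

# Hypothetical observations (x_n, x_{n-1}), most recent first
x_recent = np.array([2.0, -1.0])
x_hat_next = phi_2 @ x_recent

print(phi_2)
print(x_hat_next)
```

The solution is \\(\phi_{2,1} = 0.6\\) and \\(\phi_{2,2} = 0\\), so the prediction reduces to \\(0.6\,x_n\\).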




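In [0]:
import numpy as np

## Define autocovariance
def autocovariance(x, lag):
    # Biased sample autocovariance at a non-negative lag (normalized by n, not n - lag)
    n = len(x)
    mean_x = np.mean(x)
    return np.sum((x[:n - lag] - mean_x) * (x[lag:] - mean_x)) / n

## Define autocovariance matrix
def autocov_matrix(x, max_lag):
    # (max_lag + 1) x (max_lag + 1) Toeplitz matrix with entry (i, j) equal to gamma(|i - j|)
    return np.array([[autocovariance(x, abs(i - j)) for j in range(max_lag + 1)] for i in range(max_lag + 1)])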

In [0]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

N = 10000
mu, sigma = 0, 0.1
phi = [1.5, -0.75, 0.5, -0.3]
# We know the number of model coefficients
num_model_coefs = len(phi)

def gen_blp_data(num_samples, sigma, mu, phi):
    # White noise input with standard deviation sigma and mean mu
    w = sigma * np.random.randn(num_samples) + mu

    # Generate data for the model
    # x[n] = phi_1 * x[n-1] + phi_2 * x[n-2] + .. + phi_k * x[n-k] + w[n]
    x = signal.lfilter(b=[1], a=[1] + [-phi_ for phi_ in phi], x=w)
    return x, w

def calc_blp_coefs(x, num_model_coefs):
    cov = autocov_matrix(x, num_model_coefs)
    # Gamma holds gamma(0)..gamma(p-1) in Toeplitz form, gamma holds gamma(1)..gamma(p)
    Gamma = cov[-num_model_coefs:, -num_model_coefs:]
    gamma = cov[1:, 0]
    # Solve Gamma @ coefs = gamma for the coefficients without forming the inverse
    coefs = np.linalg.solve(Gamma, gamma)
    return coefs, cov, Gamma, gamma

def blp_predict(x, prediction_coefs):
    # One-step predictions: x_hat[i] = sum over n of prediction_coefs[n] * x[i-(n+1)]
    x_hat = np.zeros_like(x)
    num_prediction_coefs = len(prediction_coefs)
    for i in range(num_prediction_coefs, len(x)):
        for n in range(num_prediction_coefs):
            x_hat[i] += x[i-(n+1)] * prediction_coefs[n]

    # Only calculate the MSE on valid predictions. The first num_prediction_coefs predictions are 0,
    # since that many samples are needed to predict the next sample.
    mse = np.mean((x[num_prediction_coefs:] - x_hat[num_prediction_coefs:])**2)
    return x_hat, mse

    
x, w = gen_blp_data(N, sigma, mu, phi)

coefs, cov, Gamma, gamma = calc_blp_coefs(x, num_model_coefs)

x_hat, mse = blp_predict(x, coefs)

print(f"Gamma: \n{Gamma}")
print(f"gamma: \n{gamma}")
print(f"Coefs: {coefs}")

plt.figure()
plt.plot(x[:100], label="x")
plt.plot(x_hat[:100], label="x estimated")
plt.legend()
plt.xlabel("samples")
plt.ylabel("Value")

plt.figure()
plt.plot(w[:100], label="w")
plt.plot(x[:100] - x_hat[:100], label="Estimate error")
plt.legend()
plt.xlabel("samples")
plt.ylabel("Value")

**Prediction Error**

The mean square one-step prediction error is defined as 

$$ P_{n+1} = E[(x_{n+1}-\widehat{x}_{n+1})^2] = E[(x_{n+1}-\bold{\phi}_n^T \bold{x})^2]$$

By inserting \\(\mathbf{\phi}_n = \mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n}\\) we get

$$ P_{n+1} = E[(x_{n+1}-(\mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n})^T \bold{x})^2] = E[(x_{n+1}-\mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \bold{x})^2]$$

since \\((\bold{A}\bold{x})^T=\bold{x}^T\bold{A}^T\\) and \\(\mathbf{\Gamma}_n^{-1}\\) is symmetric. Expanding the square gives

$$ E[(x_{n+1}-\mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \bold{x})^2] = E[x_{n+1}^2] - 2 E[\mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \bold{x} x_{n+1}] + E[\mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \bold{x} \bold{x}^T \mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n}]$$

And since, for a zero-mean stationary process,

$$E[\bold{x} \bold{x}^T]=\mathbf{\Gamma_n}$$

$$E[\bold{x}x_{n+1}]=\mathbf{\gamma_n}$$

$$E[x_{n+1}^2]= \gamma(0)$$

we can simplify to

$$ E[(x_{n+1}-\mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \bold{x})^2] = \gamma(0) - 2 \mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n} + \mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \mathbf{\Gamma_n} \mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n} = \gamma(0) - \mathbf{\gamma_n}^T\mathbf{\Gamma_n}^{-1} \mathbf{\gamma_n}$$
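
The closed-form error can be sanity-checked on a process whose answer is known. The sketch below assumes an AR(1) process with \\(\phi = 0.6\\) and noise standard deviation \\(\sigma_w = 0.5\\), whose autocovariance is \\(\gamma(h) = \sigma_w^2 \phi^{|h|} / (1 - \phi^2)\\). These parameter values are illustrative assumptions. For this process the one-step prediction error should equal the noise variance \\(\sigma_w^2\\).

```python
import numpy as np
from scipy.linalg import toeplitz

# Assumed AR(1) parameters, chosen only for illustration
phi_true, sigma_w = 0.6, 0.5
n = 4

# Exact AR(1) autocovariances gamma(0), ..., gamma(n)
lags = np.arange(n + 1)
gamma_values = sigma_w**2 * phi_true**lags / (1 - phi_true**2)

Gamma_n = toeplitz(gamma_values[:n])   # gamma(0)..gamma(n-1) in Toeplitz form
gamma_n = gamma_values[1:]             # gamma(1)..gamma(n)

# P = gamma(0) - gamma_n^T Gamma_n^{-1} gamma_n
P = gamma_values[0] - gamma_n @ np.linalg.solve(Gamma_n, gamma_n)

print(P, sigma_w**2)
```

The two printed values agree, which matches the intuition that the best linear predictor of an AR process leaves only the driving white noise unexplained.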

In [0]:
## Example on mean squared error of best linear prediction
# The empirical MSE converges towards the theoretical MSE as the number of samples increases.
# Note that the "theoretical" MSE is evaluated with autocovariances estimated from the data.

num_samples_test = [2**i for i in range(8, 16)]

def blp_theoretical_mse(cov, Gamma, gamma):
    # P = gamma(0) - gamma^T Gamma^{-1} gamma
    return cov[0, 0] - gamma @ np.linalg.solve(Gamma, gamma)

theoretical_mses = []
empirical_mses = []

for num_samples in num_samples_test:
    x, w = gen_blp_data(num_samples, sigma, mu, phi)

    coefs, cov, Gamma, gamma = calc_blp_coefs(x, num_model_coefs)
    theoretical_mses.append(blp_theoretical_mse(cov, Gamma, gamma))
    x_hat, mse = blp_predict(x, coefs)
    empirical_mses.append(mse)

plt.figure()
plt.plot(num_samples_test, theoretical_mses, label="Theoretical mse")
plt.plot(num_samples_test, empirical_mses, label="Empirical mse")
plt.yscale("log")
plt.xscale("log")
plt.legend()
plt.xlabel("Samples")
plt.ylabel("MSE")
plt.title("MSE of best linear prediction")

