In [0]:
import os
from scipy.io import loadmat

# Get current notebook directory
notebook_dir = os.getcwd()
file_1_path = os.path.join(notebook_dir, "input1.mat")
file_2_path = os.path.join(notebook_dir, "input2.mat")
file_3_path = os.path.join(notebook_dir, "input3.mat")

input_1 = loadmat(file_1_path)["X"].flatten()
input_2 = loadmat(file_2_path)["X"].flatten()
input_3 = loadmat(file_3_path)["X"].flatten()


### a)
As a starting point, analyze the time series data sets and choose appropriate parameters, e.g., the filter order, which equals the number of present and past input samples used to predict the series. Based on your analysis of these time series, which filter order would you choose for the prediction of these series? Briefly explain the reasons behind your choices. (Hint: use the ACF and partial ACF to guide your analysis.)

In [0]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
import numpy as np

for n, data in enumerate([input_1, input_2, input_3]):

    fig, ax = plt.subplots(3, 1, figsize=(10, 8))

    ax[0].plot(data)
    ax[0].set_title(f"Input {n+1}: Raw data Mean = {np.mean(data):.2f}")

    # Plot ACF
    plot_acf(data, ax=ax[1], lags=20, title=f"Input {n+1}: Autocorrelation Function")

    # Plot PACF
    plot_pacf(data, ax=ax[2], lags=20, title=f"Input {n+1}: Partial Autocorrelation Function", method="ywm")

    plt.tight_layout()
    plt.show()


All inputs have a mean close to 0.

#### Input 1

The ACF decays gradually and the PACF has one significant peak (at lag 1), so a filter order of one is chosen for prediction. This suggests that the data can be modelled as an AR(1) process.

#### Input 2

The ACF decays gradually and the PACF has one significant peak (at lag 1), so a filter order of one is chosen for prediction. This suggests that the data can be modelled as an AR(1) process.

#### Input 3

The ACF decays gradually and the PACF has two significant peaks (at lags 1 and 2), so a filter order of two is chosen for prediction. This suggests that the data can be modelled as an AR(2) process.
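To make the ACF/PACF reasoning concrete, the sketch below simulates a synthetic AR(2) process (coefficients 0.6 and 0.3, chosen only for illustration and not estimated from the input files) and computes its sample PACF with the Levinson-Durbin recursion, using only NumPy. The helpers `simulate_ar`, `autocov` and `pacf_levinson` are hypothetical names; the recursion on biased autocovariances is intended to mirror what `plot_pacf(..., method="ywm")` plots. The PACF should be clearly nonzero at lags 1 and 2 and fall inside the approximate 95% band beyond that, which is the cut-off pattern used above to pick the filter order.

```python
import numpy as np

def simulate_ar(coeffs, n, noise_std=1.0, burn_in=500, seed=0):
    """Hypothetical helper: x[t] = sum_k coeffs[k] * x[t-1-k] + w[t], Gaussian white w."""
    coeffs = np.asarray(coeffs, dtype=float)
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    w = rng.normal(0.0, noise_std, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        # x[t-p:t][::-1] is [x[t-1], ..., x[t-p]]
        x[t] = coeffs @ x[t - p:t][::-1] + w[t]
    return x[burn_in:]  # drop the start-up transient

def autocov(x, max_lag):
    """Biased sample autocovariance r(0), ..., r(max_lag)."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    n = len(xc)
    return np.array([xc[:n - k] @ xc[k:] / n for k in range(max_lag + 1)])

def pacf_levinson(r, max_lag):
    """PACF at lags 1..max_lag via the Levinson-Durbin recursion."""
    a = np.zeros(0)   # AR coefficients of the current order
    err = r[0]        # prediction-error power of the current order
    pacf = np.zeros(max_lag)
    for m in range(1, max_lag + 1):
        # Reflection coefficient of order m = partial autocorrelation at lag m
        k = (r[m] - a @ r[m - 1:0:-1]) / err
        a = np.concatenate([a - k * a[::-1], [k]])
        err *= 1.0 - k * k
        pacf[m - 1] = k
    return pacf

x = simulate_ar([0.6, 0.3], n=20_000, seed=1)
phi = pacf_levinson(autocov(x, 10), 10)
bound = 1.96 / np.sqrt(len(x))  # approximate 95% band under white noise
print("PACF lags 1-10:", np.round(phi, 3))
print("95% band: +/-", round(bound, 3))
```

For an AR(2) process the lag-1 PACF equals $a_1/(1-a_2)$ (about 0.86 here) and the lag-2 PACF equals $a_2 = 0.3$; everything beyond lag 2 should only reflect estimation noise.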

### b)

By choosing to predict these time series from previous samples, we are implicitly assuming that the time series exhibit an autoregressive nature. From your analysis of the time series in the previous part, can you confirm whether this assumption is true?

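Yes. For all three inputs the ACF decays gradually while the PACF cuts off after one or two lags, which is the characteristic pattern of an autoregressive process (see the analysis in a)). The assumption of an autoregressive nature is therefore supported by the data.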

### c)
Design a 1-step-ahead Wiener filter for the given time series. Using the filter coefficients, predict the time series (1 step ahead) and plot the predicted signals along with the true signals.

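From Wiener filter theory, the optimal (minimum mean-square error) linear filter is given by

$$
\mathbf{h}=R_{yy}^{-1}\mathbf{r}_{xy}
$$

where $y$ is the observed signal, $x$ is the quantity to estimate, $R_{yy}$ is the autocorrelation matrix of the observations, and $\mathbf{r}_{xy}$ is the cross-correlation vector between $x$ and the observations.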

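For prediction, the quantity to estimate is a future sample of the observed signal itself, $x_n = y_{N-1+I}$, where $N$ is the filter length (called `P` in the code) and $I$ is the prediction horizon. The cross-correlation vector can therefore be expressed as

$$
\mathbf{r}_{xy}=\mathbb{E}[x_n \mathbf{y}_n]=\mathbb{E}[y_{N-1+I}\,\mathbf{y}_n]=\mathbb{E}\big[y_{N-1+I}\,[y_0,\ldots,y_{N-1}]^T\big]=[r_{yy}(N-1+I),\ldots,r_{yy}(I)]^T
$$

For one-step prediction we set $I=1$.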
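As a worked instance of these equations, the second-order case used for Input 3 ($N=2$, $I=1$) can be solved by hand with Cramer's rule. Here $h_0$ weights the older sample $y_0$ and $h_1$ the most recent sample $y_1$, following the ordering of $\mathbf{y}_n$ above (the code cell stores the taps in the reverse order):

$$
\begin{bmatrix} r_{yy}(0) & r_{yy}(1) \\ r_{yy}(1) & r_{yy}(0) \end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \end{bmatrix}
=
\begin{bmatrix} r_{yy}(2) \\ r_{yy}(1) \end{bmatrix}
\quad\Rightarrow\quad
h_0=\frac{r_{yy}(0)\,r_{yy}(2)-r_{yy}(1)^2}{r_{yy}(0)^2-r_{yy}(1)^2},\qquad
h_1=\frac{r_{yy}(1)\,\big(r_{yy}(0)-r_{yy}(2)\big)}{r_{yy}(0)^2-r_{yy}(1)^2}
$$

and the prediction is $\hat{y}_2 = h_0 y_0 + h_1 y_1$. Dividing numerator and denominator by $r_{yy}(0)^2$ gives $h_0 = (\rho_2-\rho_1^2)/(1-\rho_1^2)$ with $\rho_k = r_{yy}(k)/r_{yy}(0)$, which is exactly the lag-2 partial autocorrelation. More generally, the tap on the oldest sample of an order-$N$ one-step predictor equals the partial autocorrelation at lag $N$, which is why the PACF cut-off in a) is a sensible guide for the filter order.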

In [0]:
from statsmodels.tsa.stattools import acf

P_ = [1, 1, 2]  # filter orders chosen in a)
I = 1           # prediction horizon; the one-sample shift below assumes I = 1
for n, x in enumerate([input_1, input_2, input_3]):
    P = P_[n]
    var_x = np.var(x)
    print(f"Input {n+1} - Filter length={P}, I={I}, Variance: {var_x}")

    # Biased autocovariance estimates r(0), ..., r(P+I)
    acf_values = acf(x, nlags=P+I, fft=False) * var_x

    # Normal equations R h = r; h[i] weights the sample i steps before the most recent one
    rxx = np.zeros((P, P))
    rxy = np.zeros(P)
    for i in range(P):
        rxy[i] = acf_values[i+I]
        for j in range(P):
            rxx[i, j] = acf_values[np.abs(i-j)]

    h = np.linalg.solve(rxx, rxy)
    print(f"Filter coefficients: {h}")

    # Full convolution: entry n is sum_i h[i] * x[n-i], i.e. the prediction of x[n+1].
    # (Reversing h or using mode='same' would swap the taps when P = 2.)
    x_hat = np.convolve(x, h)[:len(x)]
    # Shift predictions by one to line up with the true values
    x_hat = np.concatenate([[0], x_hat])[:-1]

    # Ignore the first P samples, where the estimate relies on zero padding
    mse = np.mean((x[P:] - x_hat[P:])**2)
    rel_mse = mse / var_x  # relative MSE (not root MSE)

    plot_num = 100

    plt.figure()
    plt.plot(x[:plot_num], label=f"Input {n+1}")
    plt.plot(x_hat[:plot_num], label=f"Input {n+1} one-step-predicted")
    plt.legend()
    plt.title(f"Input {n+1} - MSE={mse:.3f}, rMSE={rel_mse:.3f}")
    plt.show()
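
The prediction step depends on how `np.convolve` aligns the taps. A minimal NumPy-only check, assuming (as in the solve above) that `h[0]` weights the most recent sample: the hypothetical helpers `predict_one_step` and `predict_loop` compare a full convolution followed by a one-sample shift against an explicit loop implementing $\hat{x}_n = \sum_i h_i\,x_{n-1-i}$. The taps `[0.7, 0.2]` are arbitrary. For comparison, convolving with reversed taps in `'same'` mode reproduces the loop only with the taps swapped, so for $P=2$ it weights the two past samples the wrong way round.

```python
import numpy as np

def predict_one_step(x, h):
    """Hypothetical helper: x_hat[n] = sum_i h[i] * x[n-1-i], zero before x[0]."""
    full = np.convolve(x, h)[:len(x)]          # full[n] = sum_i h[i] * x[n-i]
    return np.concatenate([[0.0], full])[:-1]  # shift so x_hat[n] predicts x[n]

def predict_loop(x, h):
    """Reference: the same one-step prediction written as an explicit loop."""
    x_hat = np.zeros(len(x))
    for n in range(len(x)):
        for i in range(len(h)):
            if n - 1 - i >= 0:
                x_hat[n] += h[i] * x[n - 1 - i]
    return x_hat

rng = np.random.default_rng(0)
x = rng.normal(size=50)
h = np.array([0.7, 0.2])  # arbitrary taps; h[0] weights the most recent sample

print(np.allclose(predict_one_step(x, h), predict_loop(x, h)))  # True

# Reversed taps in 'same' mode: matches the loop only with the taps swapped
wrong = np.concatenate([[0.0], np.convolve(x, h[::-1], mode="same")])[:-1]
print(np.allclose(wrong, predict_loop(x, h)))        # False
print(np.allclose(wrong, predict_loop(x, h[::-1])))  # True
```

For $P=1$ both forms coincide, so only the order-2 filter for Input 3 is sensitive to this alignment.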

### d)
Observe and comment on the accuracy of the predictors. To obtain a quantitative measure of accuracy, compute and compare the mean squared error of the resulting prediction filters.


To make the MSE values easier to compare across signals, I also compute the relative MSE, rMSE = MSE / Var(signal), i.e. the fraction of the signal variance that the predictor does not capture.

I also notice that the one-step prediction MSE is close to 1. This could indicate that the processes are driven by white noise with variance 1, since for an AR(p) process the expected error of the optimal one-step predictor (of order at least p) equals the variance of the driving noise.
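A quick simulation supports this interpretation. Assuming hypothetical AR(1) processes $x_n = a\,x_{n-1} + w_n$ with unit-variance Gaussian noise and a known coefficient $a$, the optimal one-step predictor is $\hat{x}_n = a\,x_{n-1}$. Its MSE should then be close to $\sigma_w^2 = 1$, while the relative MSE should be close to $1-a^2$ because $\operatorname{Var}(x) = \sigma_w^2/(1-a^2)$. This also illustrates why weak lag-1 correlation gives a high rMSE and strong correlation a low one. The helper `simulate_ar1` and the values of `a` are illustrative only.

```python
import numpy as np

def simulate_ar1(a, n, noise_std=1.0, burn_in=500, seed=0):
    """Hypothetical helper: x[t] = a * x[t-1] + w[t] with Gaussian white noise w."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, noise_std, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = a * x[t - 1] + w[t]
    return x[burn_in:]  # drop the start-up transient

results = {}
for a in (0.2, 0.9):  # illustrative weak and strong lag-1 dependence
    x = simulate_ar1(a, n=100_000, seed=42)
    x_hat = np.concatenate([[0.0], a * x[:-1]])  # optimal one-step predictor
    mse = np.mean((x[1:] - x_hat[1:]) ** 2)      # skip x_hat[0], which has no past
    rel_mse = mse / np.var(x)
    results[a] = (mse, rel_mse)
    print(f"a={a}: MSE={mse:.3f} (theory 1), rMSE={rel_mse:.3f} (theory {1 - a**2:.2f})")
```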

#### Input 1
The relative MSE is high, which is expected since the small PACF values and filter coefficient indicate weak correlation between consecutive samples.

#### Input 2
The relative MSE is low, which again is expected given the high PACF value at lag 1. Because the lag-1 sample is strongly correlated with the current sample, the filter coefficient is large.

#### Input 3

Here the relative MSE is intermediate (compared to the other two inputs), again consistent with the PACF values (approximately 0.5 to 0.75). The filter coefficients act as a low-pass filter, weighting the lag-1 sample more heavily, and predict the current value as a weighted combination of the two most recent samples.
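The low-pass description can be checked through the predictor's frequency response $H(e^{j\omega}) = \sum_{k=1}^{P} h_k e^{-j\omega k}$, with tap $h_k$ applied to the sample $k$ steps back. For two taps the squared magnitude is $h_1^2 + h_2^2 + 2h_1h_2\cos\omega$, which decreases from $(h_1+h_2)^2$ at $\omega=0$ to $(h_1-h_2)^2$ at $\omega=\pi$ whenever both taps are positive. The sketch below uses hypothetical coefficients `[0.5, 0.3]`, not the values estimated for Input 3; substituting the printed `h` would give the actual response. `predictor_response` is an illustrative helper name.

```python
import numpy as np

def predictor_response(h, n_freqs=512):
    """Magnitude response of taps h, where h[k-1] weights the sample k steps back."""
    w = np.linspace(0.0, np.pi, n_freqs)   # 0 (DC) to pi (Nyquist)
    k = np.arange(1, len(h) + 1)           # delays 1..P
    H = np.exp(-1j * np.outer(w, k)) @ h   # H(e^{jw}) = sum_k h_k e^{-jwk}
    return w, np.abs(H)

h = np.array([0.5, 0.3])  # hypothetical positive taps, not the Input 3 estimate
w, mag = predictor_response(h)
print(f"gain at DC: {mag[0]:.3f}, gain at Nyquist: {mag[-1]:.3f}")  # 0.800, 0.200
```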

### e)
Observe the values of filter coefficients that you obtained. Do these coefficients give any information about the nature of the underlying time series (Hint: Note that you implicitly assumed the autoregressive (AR) nature of the time series.) 

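See the answer to d) above. In addition, under the AR(p) assumption the normal equations solved in c) are the Yule-Walker equations, so the Wiener filter coefficients are estimates of the AR coefficients. The filters for Inputs 1 and 2 are therefore consistent with AR(1) processes with weak and strong lag-1 dependence respectively, and the filter for Input 3 with an AR(2) process in which the lag-1 term dominates.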
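A simulation sketch of the link between filter coefficients and AR parameters: for an AR(p) process, a one-step Wiener predictor of order $P \ge p$ should recover the AR coefficients in its first $p$ taps and give approximately zero for the remaining ones. The code below simulates a hypothetical AR(2) process with coefficients 0.6 and 0.3 and deliberately over-fits with $P = 4$. `wiener_predictor` is an illustrative NumPy-only re-implementation of the normal-equation solve from c), not the statsmodels-based cell itself.

```python
import numpy as np

rng = np.random.default_rng(7)
a_true = np.array([0.6, 0.3])  # hypothetical AR(2) coefficients
n, burn_in = 50_000, 500
w = rng.normal(size=n + burn_in)
x = np.zeros(n + burn_in)
for t in range(2, n + burn_in):
    x[t] = a_true[0] * x[t - 1] + a_true[1] * x[t - 2] + w[t]
x = x[burn_in:]  # drop the start-up transient

def wiener_predictor(x, P, I=1):
    """Solve R h = r with biased sample autocovariances; h[0] weights the newest sample."""
    xc = x - x.mean()
    r = np.array([xc[:len(xc) - k] @ xc[k:] / len(xc) for k in range(P + I)])
    idx = np.arange(P)
    R = r[np.abs(idx[:, None] - idx[None, :])]  # Toeplitz autocovariance matrix
    return np.linalg.solve(R, r[I:I + P])       # right-hand side r(I), ..., r(I+P-1)

h = wiener_predictor(x, P=4)
print("estimated taps:", np.round(h, 2), "true AR coefficients:", a_true)
```

The taps beyond lag 2 should come out close to zero, which is the coefficient-domain counterpart of the PACF cut-off used in a).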