In [2]:
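import sys, os
# Make the project's source modules (basis, forecast) importable from ../src.
sys.path.append(os.path.join('..', 'src'))

import numpy as np
import tqdm
from basis import Bspline
from forecast import forecast

# Fix NumPy's global random seed for reproducibility.
np.random.seed(2)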

# Forecasting the Simulated Dataset

In this notebook we forecast our simulated data set using a functional time series approach [1], which relies on a functional decomposition of the data. We compare the following four decompositions:

* Functional Principal Component Analysis [2]
* Maximum Autocorrelation Factor Rotations [3]
* Regularised Functional Principal Component Analysis 
* Regularised Maximum Autocorrelation Factor Rotations

[1]: H. L. Shang, ‘ftsa: An R Package for Analyzing Functional Time Series’, The R Journal, vol. 5, no. 1, p. 64, 2013, doi: 10.32614/RJ-2013-006.

[2]: J. O. Ramsay and B. W. Silverman, Functional data analysis. New York (N.Y.): Springer Science+Business Media, 2010.

[3]: G. Hooker and S. Roberts, ‘Maximal autocorrelation functions in functional data analysis’, Stat Comput, vol. 26, no. 5, pp. 945–950, Sep. 2016, doi: 10.1007/s11222-015-9582-5.


# Methodology

The general methodology for forecasting a time series of functional data via a functional decomposition is given in [1]. In outline, it consists of the following steps:

1. Obtain a functional decomposition (FPCA, for example) of the time series of functional observations.
2. This yields a set of estimated eigenfunctions, each with an associated time series of scores.
3. Treat the score series as independent and forecast each one using univariate time series methods.
4. Reconstruct the functional observations at future time points from the estimated eigenfunctions and the **forecast** scores for those time points.
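
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the `forecast` function used below: it assumes curves discretised on a common grid (so FPCA reduces to an SVD of the centred data matrix, ignoring quadrature weights) and uses a least-squares AR(1) model as a simple stand-in for the univariate forecasting method of step 3. The names `fit_ar1`, `forecast_ar1` and `fts_forecast` are hypothetical.

```python
import numpy as np


def fit_ar1(z):
    """Least-squares fit of z[t] = a + b * z[t-1] + e[t]."""
    X = np.column_stack([np.ones(len(z) - 1), z[:-1]])
    a, b = np.linalg.lstsq(X, z[1:], rcond=None)[0]
    return a, b


def forecast_ar1(z, h):
    """Iterate the fitted AR(1) recursion h steps past the last observation."""
    a, b = fit_ar1(z)
    z_hat = z[-1]
    for _ in range(h):
        z_hat = a + b * z_hat
    return z_hat


def fts_forecast(Y_obs, h, n_comp=3):
    """h-step-ahead curve forecast from Y_obs of shape (T, S), one curve per row."""
    # Step 1: FPCA of the centred, discretised curves via the SVD.
    mean = Y_obs.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y_obs - mean, full_matrices=False)
    # Step 2: eigenfunctions (rows of Vt) and the corresponding score series.
    phi = Vt[:n_comp]                      # (n_comp, S)
    scores = U[:, :n_comp] * s[:n_comp]    # (T, n_comp)
    # Step 3: forecast each score series independently.
    z_hat = np.array([forecast_ar1(scores[:, k], h) for k in range(n_comp)])
    # Step 4: reconstruct the curve from the eigenfunctions and forecast scores.
    return mean + z_hat @ phi


# Toy data: one AR(1)-driven spatial mode plus a little noise.
rng = np.random.default_rng(0)
T_toy, S_toy = 100, 50
grid = np.linspace(0, 1, S_toy)
z = np.zeros(T_toy)
for i in range(1, T_toy):
    z[i] = 0.9 * z[i - 1] + rng.normal()
Y_toy = np.outer(z, np.sin(np.pi * grid)) + 0.01 * rng.normal(size=(T_toy, S_toy))
y_next = fts_forecast(Y_toy, h=1)
print(y_next.shape)  # (50,)
```

Any univariate method (ARIMA, exponential smoothing) could replace `forecast_ar1` in step 3 without changing the rest of the pipeline.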

Below we carry out steps 1-4 for every simulation under each of the noise scenarios described [here]. We compute 1-, 3-, 10-, 15-, 20- and 25-step-ahead forecasts for each simulated data set, using the first 90 time points as the initial training set. We then proceed with an expanding window: for each simulation we add the next observation to the training set and repeat the forecasts, which lets us average the results over the data set for each forecast horizon.
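
The expanding-window evaluation can be sketched as a loop over forecast origins. This is a hypothetical illustration: `expanding_window_errors` and `last_value` are not part of the notebook's code, the mean squared error is an assumed metric, and the error is assumed to be measured against the noise-free signal (the cells below pass both `Y_e[i]` and `Y[i]` to `forecast`, which suggests this, but its internals are not shown).

```python
import numpy as np


def expanding_window_errors(Y_obs, Y_true, forecaster, n_init, steps):
    """Mean squared forecast error for each horizon h in `steps`.

    The forecaster is refitted on Y_obs[:n] for every origin n = n_init, ..., T - h
    and its h-step-ahead forecast is compared with the curve Y_true[n + h - 1].
    """
    T_obs = Y_obs.shape[0]
    errors = np.zeros(len(steps))
    for j, h in enumerate(steps):
        per_origin = [
            np.mean((Y_true[n + h - 1] - forecaster(Y_obs[:n], h)) ** 2)
            for n in range(n_init, T_obs - h + 1)
        ]
        errors[j] = np.mean(per_origin)
    return errors


def last_value(Y_train, h):
    """Naive benchmark: every horizon is forecast by the last observed curve."""
    return Y_train[-1]


# Toy check: each "curve" is constant and grows by 1 per time step,
# so the naive h-step-ahead squared error is exactly h**2.
Y_toy = np.outer(np.arange(10.0), np.ones(3))
print(expanding_window_errors(Y_toy, Y_toy, last_value, n_init=5, steps=[1, 2]))  # [1. 4.]
```

Under this indexing, with $T = 128$ and `N_INIT = 90` there are $T - h - 90 + 1$ forecast origins per horizon: 38 for $h = 1$ but only 14 for $h = 25$, so the longer-horizon averages rest on fewer forecasts.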

We repeat this process for every simulation generated [here] and save the results for each combination of noise type and decomposition in a separate NumPy `.npz` file.
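
Each saved file holds a single array under the key `results`, with one row per simulation and one column per horizon in `STEPS` (this follows from how `results` is allocated in the cells below). A minimal round trip with dummy values, using a temporary file rather than the notebook's `../results` directory, shows how such a file can be read back and averaged per horizon:

```python
import os
import tempfile
import numpy as np

steps = [1, 3, 10, 15, 20, 25]
# Dummy values standing in for real results: one row per simulation, one column per horizon.
results_dummy = np.arange(100 * len(steps), dtype=float).reshape(100, len(steps))

path = os.path.join(tempfile.mkdtemp(), "simulated_fpca_ln.npz")
np.savez(path, results=results_dummy)

# np.load returns a lazily loaded NpzFile; the context manager closes it afterwards.
with np.load(path) as archive:
    loaded = archive["results"]

# Average over simulations, one value per forecast horizon.
mean_by_horizon = dict(zip(steps, loaded.mean(axis=0)))
```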

[here]: ./data_generation.ipynb

# Setup

In the following code block we set up the domain parameters, a basis system for our functional decomposition, and the inner-product and penalty matrices needed by the decomposition methodology, as well as the forecast horizons, initial training set size and number of components. None of these depend on the noise process, so we compute them once for the whole notebook. In particular, we use $25$ basis functions in each spatial dimension of our functional data, giving a tensor-product basis of $25^2 = 625$ functions.

In [3]:
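## Domain parameters: a 128 x 128 spatial grid observed at 128 time points
S1, S2, T = 128, 128, 128
t = np.linspace(0, 1, T)

## Basis system: 25 B-splines on (-1, 1), combined into a tensor-product basis
bs = Bspline((-1, 1), 25, 4)
B = np.kron(bs(np.linspace(-1, 1, S1)), bs(np.linspace(-1, 1, S2)))
## Inner-product matrix used by the decompositions
J = np.kron(np.eye(bs.K), bs.penalty(0)) + np.kron(bs.penalty(0), np.eye(bs.K))

## Penalties for regularisation and MAFR, based on the NDERIV-th derivative
NDERIV = 2
P = np.kron(np.eye(bs.K), bs.penalty(NDERIV)) + np.kron(bs.penalty(NDERIV), np.eye(bs.K))
LOG_LAMBDA = -4.0  # log smoothing parameter for the regularised decompositions

## Forecast horizons, initial training set size and number of components to use
STEPS = [1, 3, 10, 15, 20, 25]
N_INIT = 90
N_COMP = 3

## Simulated data: the noise-free signal and the three noise scenarios
SIM_PATH = '../data/simulated.npz'
LN_PATH = '../data/simulated_ln.npz'  # low noise
HN_PATH = '../data/simulated_hn.npz'  # high noise
SN_PATH = '../data/simulated_sn.npz'  # structure noise
data = np.load(SIM_PATH)
C_arr = data['C']
# Noise-free signal for each simulation k: Y[k] = (PHI @ C[k]).T
Y = np.einsum("ij,kjl->kil", data['PHI'], C_arr).swapaxes(-1, 1)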
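
Because `B` is built with `np.kron`, each of its columns is the product of one basis function in the first spatial coordinate with one in the second; assuming `bs(x)` returns a `(len(x), 25)` matrix, `B` has shape $(128^2, 625)$. The hypothetical sketch below checks, with small random stand-ins for the 1-D basis matrices, the identity that makes this work, and that a penalty with the same Kronecker-sum structure as `P` (using a second-difference matrix in place of `bs.penalty(2)`, an assumption) leaves coefficient surfaces that are linear in both directions unpenalised.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n1, n2 = 6, 8, 7
# Hypothetical stand-ins for bs(np.linspace(-1, 1, S1)) and bs(np.linspace(-1, 1, S2)):
# any (grid points x basis functions) matrices will do for this check.
B1 = rng.normal(size=(n1, K))
B2 = rng.normal(size=(n2, K))
coef = rng.normal(size=(K, K))  # coefficient surface c[k, l]

# The tensor-product expansion evaluated with np.kron matches B1 @ C @ B2.T.
via_kron = np.kron(B1, B2) @ coef.ravel()
via_matrices = (B1 @ coef @ B2.T).ravel()

# Hypothetical second-difference penalty (P-spline style) standing in for bs.penalty(2).
D = np.diff(np.eye(K), n=2, axis=0)
R = D.T @ D
P_toy = np.kron(np.eye(K), R) + np.kron(R, np.eye(K))

# A coefficient surface that is linear in both indices lies in the penalty's null space.
k_idx, l_idx = np.meshgrid(np.arange(K), np.arange(K), indexing="ij")
plane = 1.0 + 2.0 * k_idx - 0.5 * l_idx
print(np.allclose(via_kron, via_matrices), np.allclose(P_toy @ plane.ravel(), 0.0))  # True True
```

The first Kronecker term penalises roughness along the second index of the coefficient surface and the second term along the first, so the sum penalises roughness in both spatial directions.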

# FPCA
The following code block runs the forecasts using an FPCA decomposition for each of the low noise, high noise and structure noise scenarios.
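
In the basis-expansion form of FPCA described by Ramsay and Silverman [2], each curve is $x_i(s) = \sum_k c_{ik}\phi_k(s)$; with $V$ the covariance of the coefficients and $W$ the Gram matrix $W_{jk} = \langle \phi_j, \phi_k \rangle$, the eigenfunction coefficients solve $VWb = \rho b$. The sketch below solves the symmetric equivalent via a Cholesky factor of $W$. It is an illustration with a random stand-in Gram matrix and a hypothetical function name; how `forecast` uses the inner-product matrix `J` internally is not shown in this notebook.

```python
import numpy as np


def basis_fpca(coefs, W, n_comp):
    """FPCA of curves x_i(s) = sum_k coefs[i, k] * phi_k(s).

    coefs : (N, K) basis coefficients, one row per curve.
    W     : (K, K) Gram matrix of the basis, W[j, k] = <phi_j, phi_k>.
    Solves V W b = rho b via the symmetric form L^T V L u = rho u, where W = L L^T.
    """
    centred = coefs - coefs.mean(axis=0)
    V = np.cov(centred, rowvar=False)
    L = np.linalg.cholesky(W)
    rho, U = np.linalg.eigh(L.T @ V @ L)
    order = np.argsort(rho)[::-1][:n_comp]
    rho, U = rho[order], U[:, order]
    b = np.linalg.solve(L.T, U)   # eigenfunction coefficients, normalised so b^T W b = I
    scores = centred @ W @ b      # inner products <x_i - mean, xi_m>
    return rho, b, scores


rng = np.random.default_rng(0)
N, K = 200, 8
G = rng.normal(size=(K, K))
W_toy = G @ G.T + K * np.eye(K)   # hypothetical symmetric positive definite Gram matrix
coefs = rng.normal(size=(N, K)) * np.linspace(3.0, 0.5, K)
rho, b, scores = basis_fpca(coefs, W_toy, n_comp=3)
```

The eigenfunctions come out orthonormal in the $L^2$ inner product ($b^\top W b = I$), and the sample variance of each score series equals its eigenvalue $\rho$.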

In [3]:
for path, label in zip([LN_PATH, HN_PATH, SN_PATH], ['ln', 'hn', 'sn']):
    noise = np.load(path)
    Y_e = Y + noise['sim']  # noisy observations for every simulation
    results = np.zeros((len(C_arr), len(STEPS)))
    for i in tqdm.tqdm(range(len(Y_e))):
        # A log smoothing parameter of -14 is effectively no regularisation;
        # the final argument False selects FPCA rather than MAFR.
        results[i, :] = forecast(Y_e[i], Y[i], t, B, P, -14, NDERIV, J, N_INIT, STEPS, N_COMP, False)
    np.savez(f'../results/simulated_fpca_{label}.npz', results=results)

100%|██████████| 100/100 [09:37<00:00,  5.78s/it]
100%|██████████| 100/100 [09:37<00:00,  5.78s/it]
100%|██████████| 100/100 [11:11<00:00,  6.71s/it]


# MAFR
The following code block runs the forecasts using an MAFR decomposition for each of the low noise, high noise and structure noise scenarios.
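
MAFR replaces the variance-ordered FPCA components with linear combinations ordered by temporal smoothness. Below is a minimal sketch of the classical maximum autocorrelation factor criterion applied to a matrix of score series: it solves the generalised eigenproblem $S_d w = \lambda S_0 w$, where $S_0$ is the covariance of the scores and $S_d$ that of their first differences, and orders factors by increasing $\lambda$. For a stationary series $\operatorname{var}(\Delta z) = 2\operatorname{var}(z)(1 - \rho_1)$, so a small $\lambda$ means a high lag-1 autocorrelation $\rho_1$. Hooker and Roberts [3] develop the functional version; this is an illustration, not the implementation inside `forecast`, and `maf_rotation` is a hypothetical name.

```python
import numpy as np


def maf_rotation(Z):
    """Rotate score series Z (T, K) into maximum autocorrelation factors.

    Solves S_d w = lam S_0 w, with S_0 the covariance of the series and S_d the
    covariance of their first differences; small lam means high lag-1 autocorrelation.
    """
    Zc = Z - Z.mean(axis=0)
    S0 = np.cov(Zc, rowvar=False)
    Sd = np.cov(np.diff(Zc, axis=0), rowvar=False)
    Linv = np.linalg.inv(np.linalg.cholesky(S0))  # S0 = L L^T
    lam, U = np.linalg.eigh(Linv @ Sd @ Linv.T)   # ascending: smoothest factor first
    weights = Linv.T @ U                          # columns satisfy w^T S0 w = 1
    return Zc @ weights, lam


def lag1_autocorr(x):
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x * x)


# Toy scores: a persistent AR(1) series and white noise, mixed by a fixed rotation.
rng = np.random.default_rng(0)
T_toy = 500
smooth = np.zeros(T_toy)
for i in range(1, T_toy):
    smooth[i] = 0.95 * smooth[i - 1] + rng.normal()
white = 3.0 * rng.normal(size=T_toy)
theta = 0.6
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Z_toy = np.column_stack([smooth, white]) @ rot
factors, lam = maf_rotation(Z_toy)
print([round(lag1_autocorr(factors[:, k]), 2) for k in range(2)])
```

The rotation recovers the persistent series as the first factor even though it was mixed with noise of comparable variance, which is the property that makes MAF factors attractive for forecasting.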

In [4]:
for path, label in zip([LN_PATH, HN_PATH, SN_PATH], ['ln', 'hn', 'sn']):
    noise = np.load(path)
    Y_e = Y + noise['sim']  # noisy observations for every simulation
    results = np.zeros((len(C_arr), len(STEPS)))
    for i in tqdm.tqdm(range(len(Y_e))):
        # A log smoothing parameter of -14 is effectively no regularisation;
        # the final argument True selects the MAFR rotation.
        results[i, :] = forecast(Y_e[i], Y[i], t, B, P, -14, NDERIV, J, N_INIT, STEPS, N_COMP, True)
    np.savez(f'../results/simulated_mafr_{label}.npz', results=results)

100%|██████████| 100/100 [09:52<00:00,  5.92s/it]
100%|██████████| 100/100 [09:59<00:00,  5.99s/it]
100%|██████████| 100/100 [10:44<00:00,  6.45s/it]


# Regularised FPCA
The following code block runs the forecasts using a regularised FPCA decomposition for each of the low noise, high noise and structure noise scenarios.
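
One standard way to regularise FPCA, described in Ramsay and Silverman [2], penalises the roughness of each eigenfunction: with coefficient covariance $V$, basis Gram matrix $W$ and roughness penalty matrix $R$, the eigenfunction coefficients solve $WVW\,b = \rho\,(W + \lambda R)\,b$. The sketch below implements that eigenproblem; whether `forecast` uses exactly this form is not shown in this notebook. The toy basis ($W = I$), the second-difference penalty and the reading of `LOG_LAMBDA` as $\log_{10}\lambda$ are assumptions for illustration, and `regularised_fpca` is a hypothetical name.

```python
import numpy as np


def regularised_fpca(coefs, W, R, lam, n_comp):
    """Smoothed FPCA in coefficient space.

    Solves (W V W) b = rho (W + lam R) b, where V is the coefficient covariance,
    W the Gram matrix of the basis and R its roughness penalty matrix.
    """
    centred = coefs - coefs.mean(axis=0)
    V = np.cov(centred, rowvar=False)
    Linv = np.linalg.inv(np.linalg.cholesky(W + lam * R))  # W + lam R = L L^T
    rho, U = np.linalg.eigh(Linv @ (W @ V @ W) @ Linv.T)
    order = np.argsort(rho)[::-1][:n_comp]
    return rho[order], Linv.T @ U[:, order]


rng = np.random.default_rng(1)
N, K = 200, 20
grid = np.linspace(0, 1, K)
D = np.diff(np.eye(K), n=2, axis=0)
R_toy = D.T @ D     # hypothetical second-difference roughness penalty
W_toy = np.eye(K)   # hypothetical orthonormal basis, so the Gram matrix is the identity
coefs = np.outer(rng.normal(size=N), np.sin(np.pi * grid)) + 0.3 * rng.normal(size=(N, K))


def roughness(b):
    """Roughness of the eigenfunction with coefficients b relative to its squared norm."""
    return (b @ R_toy @ b) / (b @ W_toy @ b)


for log_lam in (-14.0, -4.0, 0.0):
    rho, b = regularised_fpca(coefs, W_toy, R_toy, 10.0 ** log_lam, n_comp=1)
    print(log_lam, round(rho[0], 3), round(roughness(b[:, 0]), 5))
```

As $\lambda$ grows, neither the leading eigenvalue nor the roughness of the leading eigenfunction can increase. At $\log_{10}\lambda = -14$ the result is essentially unregularised FPCA, which appears to be why the FPCA and MAFR cells pass `-14`.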

In [5]:
for path, label in zip([LN_PATH, HN_PATH, SN_PATH], ['ln', 'hn', 'sn']):
    noise = np.load(path)
    Y_e = Y + noise['sim']  # noisy observations for every simulation
    results = np.zeros((len(C_arr), len(STEPS)))
    for i in tqdm.tqdm(range(len(Y_e))):
        # LOG_LAMBDA sets the amount of smoothing; the final argument False selects FPCA.
        results[i, :] = forecast(Y_e[i], Y[i], t, B, P, LOG_LAMBDA, NDERIV, J, N_INIT, STEPS, N_COMP, False)
    np.savez(f'../results/simulated_reg-fpca_{label}.npz', results=results)

100%|██████████| 100/100 [09:22<00:00,  5.63s/it]
100%|██████████| 100/100 [09:26<00:00,  5.66s/it]
100%|██████████| 100/100 [10:17<00:00,  6.18s/it]


# Regularised MAFR
The following code block runs the forecasts using a regularised MAFR decomposition for each of the low noise, high noise and structure noise scenarios.

In [6]:
for path, label in zip([LN_PATH, HN_PATH, SN_PATH], ['ln', 'hn', 'sn']):
    noise = np.load(path)
    Y_e = Y + noise['sim']  # noisy observations for every simulation
    results = np.zeros((len(C_arr), len(STEPS)))
    for i in tqdm.tqdm(range(len(Y_e))):
        # LOG_LAMBDA sets the amount of smoothing; the final argument True selects MAFR.
        results[i, :] = forecast(Y_e[i], Y[i], t, B, P, LOG_LAMBDA, NDERIV, J, N_INIT, STEPS, N_COMP, True)
    np.savez(f'../results/simulated_reg-mafr_{label}.npz', results=results)

100%|██████████| 100/100 [09:20<00:00,  5.61s/it]
100%|██████████| 100/100 [09:14<00:00,  5.54s/it]
100%|██████████| 100/100 [09:56<00:00,  5.97s/it]
