# Nonnegative Matrix Factorization (NMF) for Time Series Forecasting

# Import packages and code

We import the modules containing the algorithms we want to test. File include_amf_new_1.py contains all the routines for the masked AMF (solved via accelerated PALM, see Alg. 5 in De Castro and Mencarelli (2024)), while file include_nmf_new_1.py contains all the procedures for the masked NMF (solved via accelerated HALS, see Alg. 1 in De Castro and Mencarelli (2024)). The same holds for the overlap versions of the code. File include_benchmark.py contains all the routines for the benchmark algorithms we compare against mAMF and mNMF.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
from tabulate import tabulate
import os
import random

import include_amf_new_1 as amf
import include_nmf_new_1 as nmf
import include_amf_new_1_overlap as amf_overlap
import include_nmf_new_1_overlap as nmf_overlap
import include_benchmark as bmrk

# Data Preprocessing

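We import and pre-process the dataset. In this case, we consider the daily electricity consumption of 370 Portuguese customers during the period 2011-2014, see Trindade (2015). The raw file is sampled every 15 minutes (96 rows per day), so we trim the ends of the series and sum the readings by day. We then set the number of periods to forecast (namely the last 28 days) and save the values of the dataset in the X_original variable, with one row per customer and one column per day.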

In [None]:
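# Raw file: 15-minute readings, ';' as separator and ',' as decimal mark.
data = pd.read_csv('data/LD2011_2014.txt', sep=";", index_col=0, header=0, low_memory=False, decimal=',')
df = data
df = df.iloc[2*96:, :]    # drop the first 2*96 rows (two days of readings)
df = df.iloc[:-1, :]      # drop the final row
df = df.iloc[:-3*96, :]   # drop the last 3*96 rows (three days of readings)

# The index holds full timestamps, so we let pandas infer the format
# instead of forcing a date-only one.
df.index = pd.to_datetime(df.index)
df = df.groupby(pd.Grouper(freq="D")).sum()   # aggregate to daily totals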

print(df)

name = "electricity_day_"
periods_to_forecast = 4*7
X_original = df.transpose().values
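To make the forecasting setup concrete, the sketch below shows one way the last `periods_to_forecast` columns can be held out from a (series x time) array. It is only an illustration on toy data: the helper `split_last_periods` is hypothetical, and the routines in the include files presumably perform their own split internally.

```python
import numpy as np

def split_last_periods(X, periods_to_forecast):
    """Split a (series x time) array into history and the held-out tail.

    Hypothetical helper for illustration; not part of the include files.
    """
    if not 0 < periods_to_forecast < X.shape[1]:
        raise ValueError("periods_to_forecast must be between 1 and the number of time steps - 1")
    history = X[:, :-periods_to_forecast]   # values available to the model
    target = X[:, -periods_to_forecast:]    # values to be forecast
    return history, target

# Toy stand-in for X_original: 3 series with 40 daily values each.
X_toy = np.arange(3 * 40, dtype=float).reshape(3, 40)
history, target = split_last_periods(X_toy, 28)
print(history.shape, target.shape)  # (3, 12) (3, 28)
```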

# Choose parameters: W, P and K for NMF 

We choose the parameters for the mAMF and mNMF algorithms, namely the periodicity P, the number W of consecutive sub-blocks we want to pile in the same row, and the ranks K for the NMF-like algorithms. The definition of \Pi(M) is automatic and is contained in the include_amf_new_1*.py and include_nmf_new_1*.py files.

In [None]:
list_w1 = [4,13]
list_rank1 = [5, 10, 20, 30, 40, 50]
periodicity1 = 28

list_w2 = [4,13]
list_rank2 = [5, 10, 20, 30, 40, 50]
periodicity2 = 2*periodicity1
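As a rough intuition for the roles of P and W, the sketch below cuts a single series into rows made of W consecutive blocks of length P. This is only one plausible reading of "piling sub-blocks in the same row", without overlap; the function `stack_blocks` is hypothetical, and the actual \Pi(M) operator defined in the include files may differ.

```python
import numpy as np

def stack_blocks(series, P, W):
    """Reshape a 1-D series into rows of W consecutive blocks of length P.

    Assumption for illustration: the oldest values that do not fill a
    whole row are dropped, so the most recent values are always kept.
    """
    series = np.asarray(series)
    row_len = W * P
    n_rows = len(series) // row_len
    if n_rows == 0:
        raise ValueError("series is shorter than one row of W*P values")
    return series[len(series) - n_rows * row_len:].reshape(n_rows, row_len)

series = np.arange(20)
M = stack_blocks(series, P=2, W=3)
print(M.shape)   # (3, 6)
print(M[0, 0])   # 2, since the two oldest values are dropped
```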

# Run forecasting procedure based on NMF 

We define the array RESULT, which will contain the performance of every algorithm we test. Each row of RESULT refers to a given algorithm; its columns store the RRMSE and RMPE indices and the total CPU time.

In [None]:
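RESULT = [[] for _ in range(10)]
# Row 0 is the header row used by tabulate(..., headers='firstrow').
RESULT[0] = ["Algorithm", "RRMSE", "CPU time (RRMSE)", "RMPE", "CPU time (RMPE)"]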
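The two error indices are not defined in this notebook. The sketch below assumes that RRMSE is the ratio of Frobenius norms and RMPE the ratio of entrywise absolute sums between the forecast error and the true values; the functions `rrmse` and `rmpe` are hypothetical, and the exact definitions used by the include files should be checked against De Castro and Mencarelli (2024).

```python
import numpy as np

def rrmse(actual, forecast):
    """Assumed definition: ||actual - forecast||_F / ||actual||_F."""
    return np.linalg.norm(actual - forecast) / np.linalg.norm(actual)

def rmpe(actual, forecast):
    """Assumed definition: sum|actual - forecast| / sum|actual|."""
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

# Toy values chosen so the arithmetic is easy to check by hand.
actual = np.array([[3.0, 4.0]])
forecast = np.array([[3.0, 1.0]])
print(rrmse(actual, forecast))  # 3 / 5 = 0.6
print(rmpe(actual, forecast))   # 3 / 7
```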

## 1) mAMF without overlap

We test mAMF without overlap, solved by accelerated PALM. The inputs of amf.experiments_amf are the dataset values contained in X_original, the number of periods to forecast, the list of values for W, the list of ranks K, the periodicity P we want to consider in our experiments, and a string naming the log file (in this file we print the RRMSE and RMPE indices and the total CPU time for each tuple (P,W,K)). The function amf.experiments_amf returns the best RRMSE and RMPE indices over all the parameter combinations, together with the parameters that attain them and the corresponding CPU times.

In [None]:
(w_best_rrmse_AMF, p_best_rrmse_AMF, best_error_rrmse_AMF, elapsed_time_rrmse_AMF,
 w_best_mpe_AMF, p_best_mpe_AMF, best_error_mpe_AMF, elapsed_time_mpe_AMF) = amf.experiments_amf(
    X_original, periods_to_forecast, list_w1, list_rank1, periodicity1, name + "log_file_AMF")

RESULT[1].append("AMF")
RESULT[1].append(best_error_rrmse_AMF)
RESULT[1].append(elapsed_time_rrmse_AMF)
RESULT[1].append(best_error_mpe_AMF)
RESULT[1].append(elapsed_time_mpe_AMF)
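The selection of the best parameter combination can be pictured as a plain grid search. The sketch below is an assumption about the general pattern, not the code in include_amf_new_1.py: `best_over_grid` and `toy_error` are hypothetical names, and `toy_error` is a placeholder score rather than a real forecast error.

```python
from itertools import product

def best_over_grid(list_w, list_rank, evaluate):
    """Return the (w, k, error) triple with the smallest error over the grid."""
    best = None
    for w, k in product(list_w, list_rank):
        err = evaluate(w, k)
        if best is None or err < best[2]:
            best = (w, k, err)
    return best

def toy_error(w, k):
    # Hypothetical placeholder score; a real run would fit the model
    # for (w, k) and measure its forecast error instead.
    return abs(w - 4) + abs(k - 20) / 10

grid_size = len(list(product([4, 13], [5, 10, 20, 30, 40, 50])))
print(grid_size)  # 12 combinations of W and K
print(best_over_grid([4, 13], [5, 10, 20, 30, 40, 50], toy_error))  # (4, 20, 0.0)
```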

## 2) mAMF with overlap

We test mAMF with overlap solved by accelerated PALM. The inputs and the outputs of amf_overlap.experiments_amf are the same as in 1) mAMF without overlap.

In [None]:
(w_best_rrmse_AMF_OVERLAP, p_best_rrmse_AMF_OVERLAP, best_error_rrmse_AMF_OVERLAP, elapsed_time_rrmse_AMF_OVERLAP,
 w_best_mpe_AMF_OVERLAP, p_best_mpe_AMF_OVERLAP, best_error_mpe_AMF_OVERLAP, elapsed_time_mpe_AMF_OVERLAP) = amf_overlap.experiments_amf(
    X_original, periods_to_forecast, list_w2, list_rank2, periodicity2, name + "log_file_AMF_OVERLAP")

RESULT[2].append("AMF_OVERLAP")
RESULT[2].append(best_error_rrmse_AMF_OVERLAP)
RESULT[2].append(elapsed_time_rrmse_AMF_OVERLAP)
RESULT[2].append(best_error_mpe_AMF_OVERLAP)
RESULT[2].append(elapsed_time_mpe_AMF_OVERLAP)

## 3) mNMF without overlap

We test mNMF without overlap solved by accelerated HALS. The inputs and the outputs of nmf.experiments_amf are the same as above.

In [None]:
(w_best_rrmse_NMF, p_best_rrmse_NMF, best_error_rrmse_NMF, elapsed_time_rrmse_NMF,
 w_best_mpe_NMF, p_best_mpe_NMF, best_error_mpe_NMF, elapsed_time_mpe_NMF) = nmf.experiments_amf(
    X_original, periods_to_forecast, list_w1, list_rank1, periodicity1, name + "log_file_NMF")

RESULT[3].append("NMF")
RESULT[3].append(best_error_rrmse_NMF)
RESULT[3].append(elapsed_time_rrmse_NMF)
RESULT[3].append(best_error_mpe_NMF)
RESULT[3].append(elapsed_time_mpe_NMF)

## 4) mNMF with overlap

We test mNMF with overlap solved by accelerated HALS. The inputs and the outputs of nmf_overlap.experiments_amf are the same as above.

In [None]:
(w_best_rrmse_NMF_OVERLAP, p_best_rrmse_NMF_OVERLAP, best_error_rrmse_NMF_OVERLAP, elapsed_time_rrmse_NMF_OVERLAP,
 w_best_mpe_NMF_OVERLAP, p_best_mpe_NMF_OVERLAP, best_error_mpe_NMF_OVERLAP, elapsed_time_mpe_NMF_OVERLAP) = nmf_overlap.experiments_amf(
    X_original, periods_to_forecast, list_w2, list_rank2, periodicity2, name + "log_file_NMF_OVERLAP")

RESULT[4].append("NMF_OVERLAP")
RESULT[4].append(best_error_rrmse_NMF_OVERLAP)
RESULT[4].append(elapsed_time_rrmse_NMF_OVERLAP)
RESULT[4].append(best_error_mpe_NMF_OVERLAP)
RESULT[4].append(elapsed_time_mpe_NMF_OVERLAP)

# Run benchmarks

## 1) Random Forest Regression (RFR)

We test Random Forest Regression (RFR). The time series forecasting problem is converted into a supervised learning problem by splitting the sequence containing the past values into sub-sequences, see https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/, and then the RFR methodology is applied, see https://machinelearningmastery.com/random-forest-for-time-series-forecasting/

In [None]:
elapsed_timeRFR, error_rrmseRFR, error_mpeRFR = bmrk.experiments_rfr(X_original,periods_to_forecast,periodicity1)

RESULT[5].append("RFR")
RESULT[5].append(error_rrmseRFR)
RESULT[5].append(elapsed_timeRFR)
RESULT[5].append(error_mpeRFR)
RESULT[5].append(elapsed_timeRFR)
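The conversion to a supervised learning problem can be sketched as a sliding window: each input row holds `n_lags` consecutive past values and the target is the value that follows. The helper `series_to_supervised` below is a hypothetical illustration; the window length and any extra features used in include_benchmark.py are not shown here.

```python
import numpy as np

def series_to_supervised(series, n_lags):
    """Build (inputs, targets) pairs from a 1-D series with a sliding window."""
    series = np.asarray(series, dtype=float)
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    # Row i contains series[i], ..., series[i + n_lags - 1].
    X = np.stack([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    # The target of row i is the next value, series[i + n_lags].
    y = series[n_lags:]
    return X, y

X_sup, y_sup = series_to_supervised([0, 1, 2, 3, 4, 5], n_lags=3)
print(X_sup)  # rows [0 1 2], [1 2 3], [2 3 4]
print(y_sup)  # [3. 4. 5.]
```

The resulting pairs can then be passed to any regressor, a random forest being one choice.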

## 2) Long Short-Term Memory (LSTM) 

We test a deep learning approach, namely Long Short-Term Memory (LSTM). The values are rescaled to the range (-1,1), the time series forecasting problem is converted into a supervised learning problem as above, and the LSTM methodology is applied, see https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

In [None]:
elapsed_timeLSTM, error_rrmseLSTM, error_mpeLSTM = bmrk.experiments_neural_network_LSTM(X_original,periods_to_forecast,periodicity1)

RESULT[6].append("LSTM")
RESULT[6].append(error_rrmseLSTM)
RESULT[6].append(elapsed_timeLSTM)
RESULT[6].append(error_mpeLSTM)
RESULT[6].append(elapsed_timeLSTM)
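The rescaling step can be illustrated with a min-max transform onto [-1, 1] and its inverse, which is needed to map the network output back to the original units. This is a minimal sketch under the assumption that a min-max scaling is used; `scale_to_unit_range` and `invert_scale` are hypothetical names, and include_benchmark.py may rely on a library scaler instead.

```python
import numpy as np

def scale_to_unit_range(x):
    """Min-max scale x onto [-1, 1]; also return the bounds for inversion."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    scaled = 2 * (x - lo) / (hi - lo) - 1
    return scaled, lo, hi

def invert_scale(scaled, lo, hi):
    """Map values from [-1, 1] back to the original units."""
    return (scaled + 1) * (hi - lo) / 2 + lo

x = np.array([10.0, 20.0, 30.0])
scaled, lo, hi = scale_to_unit_range(x)
print(scaled)                        # [-1.  0.  1.]
print(invert_scale(scaled, lo, hi))  # [10. 20. 30.]
```

In a forecasting setting the bounds would normally be computed on the training window only, so that no information from the forecast period leaks into the inputs.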

## 3) Gated Recurrent Units (GRU)

We test another deep learning approach, namely Gated Recurrent Units (GRU), see the reference in 2).

In [None]:
elapsed_timeGRU, error_rrmseGRU, error_mpeGRU = bmrk.experiments_neural_network_GRU(X_original,periods_to_forecast,periodicity1)

RESULT[7].append("GRU")
RESULT[7].append(error_rrmseGRU)
RESULT[7].append(elapsed_timeGRU)
RESULT[7].append(error_mpeGRU)
RESULT[7].append(elapsed_timeGRU)

## 4) Exponential Smoothing (EXP)

We test Exponential Smoothing (EXP), see https://machinelearningmastery.com/exponential-smoothing-for-time-series-forecasting-in-python/

In [None]:
elapsed_timeEXP, error_rrmseEXP, error_mpeEXP = bmrk.experiments_exponential_smoothing(X_original,periods_to_forecast,periodicity1)

RESULT[8].append("EXP")
RESULT[8].append(error_rrmseEXP)
RESULT[8].append(elapsed_timeEXP)
RESULT[8].append(error_mpeEXP)
RESULT[8].append(elapsed_timeEXP)
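For intuition, the simplest form of exponential smoothing keeps a single level that is updated as a weighted average of the newest observation and the previous level, and uses the final level as a flat forecast. The sketch below implements only this basic variant with a hypothetical function name and a fixed `alpha`; the benchmark in include_benchmark.py may use a richer model with trend or seasonality.

```python
import numpy as np

def simple_exp_smoothing_forecast(series, alpha, horizon):
    """Flat forecast from simple exponential smoothing.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, with level_0 = x_0.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return np.full(horizon, level)

# Levels: 2 -> 0.5*4 + 0.5*2 = 3 -> 0.5*6 + 0.5*3 = 4.5
forecast_exp = simple_exp_smoothing_forecast([2.0, 4.0, 6.0], alpha=0.5, horizon=2)
print(forecast_exp)  # [4.5 4.5]
```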

## 5) Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors (SARIMAX)

Finally, we test Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors (SARIMAX) by means of statsmodels.tsa.statespace.sarimax functions, see https://www.statsmodels.org/devel/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

In [None]:
elapsed_timeSARIMAX, error_rrmseSARIMAX, error_mpeSARIMAX = bmrk.experiments_SARIMAX(X_original,periods_to_forecast,periodicity1)

RESULT[9].append("SARIMAX")
RESULT[9].append(error_rrmseSARIMAX)
RESULT[9].append(elapsed_timeSARIMAX)
RESULT[9].append(error_mpeSARIMAX)
RESULT[9].append(elapsed_timeSARIMAX)

# Output tables for NMF-like procedures and benchmarks

Finally, we print out the table containing all the results for the NMF-like procedures and for the benchmark algorithms we tested so far.

In [None]:
print(tabulate(RESULT, headers='firstrow'))