Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs?

The repo is the official implementation for the paper: Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs?.

Introduction

Transformer-based application on Long-term Time series Forecasting (LTSF) tasks still has two major issues that need to be further investigated: 1) Whether the sparse attention mechanism designed by these methods actually reduce the running time on real devices; 2) Whether these models need extra long input sequences to guarantee their performance? The answers given in this paper are negative. Therefore, to better copy with these two issues, we design a lightweight Period-Attention mechanism (Periodformer). Furthermore, to take full advantage of GPUs for fast hyperparameter optimization (e.g., finding the suitable input length), a Multi-GPU Asynchronous parallel algorithm based on Bayesian Optimization (MABO) is presented. Compared with the state-of-the-art methods, the prediction error of Periodformer reduced by 13% and 26% for multivariate and univariate forecasting, respectively. In addition, MABO reduces the average search time by 46% while finding better hyperparameters. As a conclusion, this paper indicates that LTSF may not need complex attention and extra long input sequences.

Contributions

It is found that although the computational complexity of those traditional Transformer-based LTSF methods is theoretically reduced, their running time on practical devices remains unchanged. Meanwhile, it is found that both the input length of the series and the kernel size of the MA have impacts on the final forecast.
A novel Period-Attention mechanism (Periodformer) is proposed, which renovates the aggregation of long-term subseries via explicit periodicity and short-term subseries via built-in proximity. In addition, a gate mechanism is built into Period-Attention to adjust the influence of the attention score to its output, which guarantees higher prediction performance and shorter running time on real devices.
A multi-GPU asynchronous parallel search algorithm based on Bayesian optimization (MABO) is presented. MABO allocates a process to each GPU via a queue mechanism, and then creates multiple trials at a time for asynchronous parallel search, which greatly accelerates the search speed.
Periodformer reduces the prediction error of state-of-the-art (SOTA) methods by around 14.8% and 22.6% for multivariate and univariate forecasting, respectively. Besides, MABO reduces the average search time by around 46% while finding out better hyperparameters.