This is the official implementation of FITS. Please run the scripts in scripts\FITS for results. Scripts without _best are for the ablation study and the parameter grid search. Scripts with _best are for multiple runs with the optimal parameters.
See updates here: Update
Wanna see something beyond FITS? Check:
"Beyond Trend and Periodicity: Guiding Time Series Forecasting with Textual Cues" Paper Code Dataset
- We add a notebook for interpretability. We analyze FITS on synthetic datasets to show its capability of modeling sinusoidal waves.
- We add a model, Real_FITS, which uses two linear layers to simulate complex multiplication. This model achieves the same results as FITS. Real_FITS can be used on devices that do not support complex-number calculation (e.g. RTX4090).
- We add an ONNX implementation of FITS with the Real_FITS architecture. ONNX is an open format built to represent machine learning models. It can be deployed directly on embedded devices such as STM32. As far as we know, there is a compatibility issue between Cube AI and ONNX opset 17.
- All the training scripts are updated!
- Files for anomaly detection are uploaded! Please check the instructions here
- ⚠ We found a long-standing bug in our code which may affect a wide range of research work. Please check the Important Notice section for more information. We have been actively fixing this bug and rerunning all our experiments as well as the baseline models we compared with.
- We have updated the final results of FITS in this repo. The arXiv version of the paper is also updated.
- The experiment scripts are updated and logs for FITS are updated.
- FITS is accepted by ICLR 2024 as a Spotlight presentation!!! We will update the new results in the camera-ready version.
- 2024-09-19 FITS reaches 100 GITHUB STARS!!! Thanks for your support and recognition! 🎉🎉🎉
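As a rough illustration of the Real_FITS idea mentioned in the list above, the following sketch shows how a complex-valued linear map can be computed with real arithmetic only, using one real weight matrix for the real part and one for the imaginary part. It is plain Python written for illustration, not the Real_FITS code from this repo; the function names, shapes, and values are all hypothetical.

```python
def matvec(W, x):
    """Multiply a real matrix (list of rows) by a real vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def real_complex_linear(W_re, W_im, x_re, x_im):
    """Apply the complex weight W_re + i*W_im to x_re + i*x_im with real math only."""
    # (W_re + i W_im)(x_re + i x_im)
    #   = (W_re x_re - W_im x_im) + i (W_re x_im + W_im x_re)
    out_re = [a - b for a, b in zip(matvec(W_re, x_re), matvec(W_im, x_im))]
    out_im = [a + b for a, b in zip(matvec(W_re, x_im), matvec(W_im, x_re))]
    return out_re, out_im


# Hypothetical 2x3 complex weight and a 3-element complex input.
W_re = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
W_im = [[0.5, 0.0, -2.0], [1.0, 1.0, 0.0]]
x_re = [1.0, -2.0, 0.5]
x_im = [0.0, 1.0, 2.0]

out_re, out_im = real_complex_linear(W_re, W_im, x_re, x_im)

# Reference result using Python's built-in complex type.
W = [[complex(r, i) for r, i in zip(row_re, row_im)] for row_re, row_im in zip(W_re, W_im)]
x = [complex(r, i) for r, i in zip(x_re, x_im)]
reference = [sum(w * v for w, v in zip(row, x)) for row in W]
```

Both paths give the same numbers, which is why a model built this way can match the complex-valued one while relying only on real-valued operations.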
We've identified a significant bug in our code, originally found in Informer (AAAI 2021 Best Paper), thanks to Luke Nicholas Darlow from the University of Edinburgh. This issue has implications for a broad spectrum of research on time series forecasting, including but not limited to:
- PatchTST (ICLR 2023) - Link to affected code
- TimesNet (ICLR 2023) - Link to affected code (Note: We later find TimesNet uses batch_size=1 during testing. Thus, it is not impacted by this issue.)
- DLinear (AAAI 2022 reported version) - Link to affected code
- Informer (AAAI 2021 Best Paper) - Link to affected code
- Autoformer (NIPS 2021 reported version) - Link to affected code
- Fedformer (ICML 2022) - Link to affected code
- FiLM (ICLR 2023) - Link to affected code
- iTransformer (ICLR 2024 score: 8886) - Link to affected code (Note: We later find iTransformer uses batch_size=1 during testing. Thus, it is not impacted by this issue.)
Efforts are underway to correct this bug, and we will update our arXiv submission and this repository with the revised results. A bug fix will also be released to help the community address this issue in their own work.
The bug stems from an incorrect implementation in the data loader. Specifically, the test dataloader uses drop_last=True, which may exclude a significant portion of the test data, particularly with large batch sizes, leading to unfair model comparisons.
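To get a feel for the scale of the problem, the sketch below counts how many test samples are skipped when the last partial batch is dropped. The test-set size of 2000 is a made-up number for illustration, and the helper name is hypothetical.

```python
def dropped_by_drop_last(num_samples, batch_size):
    """Return how many samples are skipped when the last partial batch is dropped."""
    return num_samples % batch_size


num_test_samples = 2000  # hypothetical test-set size, not taken from this repository
for batch_size in (1, 32, 128, 512):
    dropped = dropped_by_drop_last(num_test_samples, batch_size)
    share = dropped / num_test_samples
    print(f"batch_size={batch_size:>3}: {dropped} samples dropped ({share:.1%})")
```

With batch_size=1 nothing is dropped, which matches the notes on TimesNet and iTransformer; larger batch sizes can drop a noticeable share of a small test set.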
To fix this issue in codebases using LTSF-Linear's architecture:
- In data_factory.py within the data_provider folder (usually on line 19), change:

      if flag == 'test':
          shuffle_flag = False
          drop_last = True
          batch_size = args.batch_size
          freq = args.freq

  To:

      if flag == 'test':
          shuffle_flag = False
          drop_last = False  # was True
          batch_size = args.batch_size
          freq = args.freq

- In your experiment script (e.g., ./exp/exp_main.py), modify the following (around line 290):

  From:

      preds = np.array(preds)
      trues = np.array(trues)
      inputx = np.array(inputx)  # sometimes this line is absent; that does not matter

  To:

      preds = np.concatenate(preds, axis=0)
      trues = np.concatenate(trues, axis=0)
      inputx = np.concatenate(inputx, axis=0)  # if the line above is absent, ignore this one

  If you skip this step, testing raises an error because dimension 0 (batch_size) is not aligned across batches. This may be why everyone drops the last batch. Concatenating the batches along axis 0 (batch_size) solves the problem.

- Run the officially provided scripts!
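The reason the second step is needed can be sketched without any dependencies. Below, plain nested lists stand in for NumPy arrays: once the last batch is kept, the batches no longer share a size, so they cannot be stacked into one regular array, but they can still be joined along the sample axis. The batch sizes, the prediction length of 96, and the helper names are hypothetical.

```python
def make_batch(batch_size, pred_len):
    """Build a dummy batch of predictions with shape (batch_size, pred_len)."""
    return [[0.0] * pred_len for _ in range(batch_size)]


def concat_axis0(batches):
    """Join batches along the sample axis, in the spirit of np.concatenate(batches, axis=0)."""
    return [row for batch in batches for row in batch]


# With drop_last=False the final batch can be smaller than the others.
preds = [make_batch(32, 96), make_batch(32, 96), make_batch(7, 96)]

batch_sizes = [len(batch) for batch in preds]
stackable = len(set(batch_sizes)) == 1  # stacking needs equal-sized batches

all_preds = concat_axis0(preds)
print(batch_sizes, stackable, len(all_preds), len(all_preds[0]))
```

Joining along the sample axis keeps every test sample, including those in the short final batch.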
The best result is in bold and the second best is in italic. The results are reported in terms of MSE. The following are our final results, which replace the preliminary results posted earlier. We have reported these results in the ICLR final version.
Model | ETTh1-96 | ETTh1-192 | ETTh1-336 | ETTh1-720 | ETTh2-96 | ETTh2-192 | ETTh2-336 | ETTh2-720 | ETTm1-96 | ETTm1-192 | ETTm1-336 | ETTm1-720 | ETTm2-96 | ETTm2-192 | ETTm2-336 | ETTm2-720 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PatchTST | 0.385 | 0.413 | 0.44 | 0.456 | 0.274 | 0.338 | 0.367 | 0.391 | 0.292 | 0.33 | 0.365 | 0.419 | 0.163 | 0.219 | 0.276 | 0.368 |
DLinear | 0.384 | 0.443 | 0.446 | 0.504 | 0.282 | 0.350 | 0.414 | 0.588 | 0.301 | 0.335 | 0.371 | 0.426 | 0.171 | 0.237 | 0.294 | 0.426 |
FedFormer | 0.375 | 0.427 | 0.459 | 0.484 | 0.340 | 0.433 | 0.508 | 0.480 | 0.362 | 0.393 | 0.442 | 0.483 | 0.189 | 0.256 | 0.326 | 0.437 |
TimesNet | 0.384 | 0.436 | 0.491 | 0.521 | 0.340 | 0.402 | 0.452 | 0.462 | 0.338 | 0.374 | 0.410 | 0.478 | 0.187 | 0.249 | 0.321 | 0.408 |
FITS | 0.372 | 0.404 | 0.427 | 0.424 | 0.271 | 0.331 | 0.354 | 0.377 | 0.303 | 0.337 | 0.366 | 0.415 | 0.162 | 0.216 | 0.268 | 0.348 |
IMP | 0.003 | 0.009 | 0.013 | 0.032 | 0.003 | 0.007 | 0.013 | 0.014 | -0.011 | -0.007 | -0.001 | 0.004 | 0.001 | 0.003 | 0.008 | 0.020 |
Model | Weather-96 | Weather-192 | Weather-336 | Weather-720 | Electricity-96 | Electricity-192 | Electricity-336 | Electricity-720 | Traffic-96 | Traffic-192 | Traffic-336 | Traffic-720 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
PatchTST | 0.151 | 0.195 | 0.249 | 0.321 | 0.129 | 0.149 | 0.166 | 0.210 | 0.366 | 0.388 | 0.398 | 0.457 |
DLinear | 0.174 | 0.217 | 0.262 | 0.332 | 0.140 | 0.153 | 0.169 | 0.204 | 0.413 | 0.423 | 0.437 | 0.466 |
Fedformer | 0.246 | 0.292 | 0.378 | 0.447 | 0.188 | 0.197 | 0.212 | 0.244 | 0.573 | 0.611 | 0.621 | 0.630 |
TimesNet | 0.172 | 0.219 | 0.280 | 0.365 | 0.168 | 0.184 | 0.198 | 0.220 | 0.593 | 0.617 | 0.629 | 0.640 |
FITS | 0.143 | 0.186 | 0.236 | 0.307 | 0.134 | 0.149 | 0.165 | 0.203 | 0.385 | 0.397 | 0.410 | 0.448 |
IMP | 0.008 | 0.009 | 0.013 | 0.014 | -0.005 | 0.000 | 0.001 | 0.001 | -0.019 | -0.009 | -0.012 | 0.009 |
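The IMP row is not defined in the text, but its values are consistent with "best baseline MSE minus FITS MSE", so a positive number means FITS has the lower error. The sketch below checks that reading on two columns copied from the first table; the function name is hypothetical.

```python
# MSE values for two columns, copied from the first results table.
baselines = {
    "ETTh1-96": {"PatchTST": 0.385, "DLinear": 0.384, "FedFormer": 0.375, "TimesNet": 0.384},
    "ETTm1-96": {"PatchTST": 0.292, "DLinear": 0.301, "FedFormer": 0.362, "TimesNet": 0.338},
}
fits = {"ETTh1-96": 0.372, "ETTm1-96": 0.303}


def improvement(column):
    """Best baseline MSE minus FITS MSE; positive means FITS has the lower error."""
    best_baseline = min(baselines[column].values())
    return round(best_baseline - fits[column], 3)


for column in baselines:
    print(column, improvement(column))
```

Under this reading, a negative entry such as ETTm1-96 marks a column where a baseline (here PatchTST) is ahead of FITS.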
The discovered bug predominantly impacts results on smaller datasets like ETTh1 and ETTh2. Interestingly, on other datasets, certain models, such as PatchTST on ETTm1, perform better with the fix applied. FITS still maintains performance comparable to the state of the art.
- We have uploaded the training logs for community review. Additionally, we've provided logs for other baseline models. It's important to note that these logs were generated using their respective official codebases, not the versions in this repository.
- We have updated the training scripts of FITS.
- For fairness, we have conducted baseline runs using freshly cloned codebases with the original hyperparameters. (Note: Avoid using versions from this repository.) TimesNet, which is unaffected by this issue, was not re-run and is mentioned here only for reference.
- We encourage the community to apply the provided bug fix and rerun their experiments.
(A minor note: The only change we made in hyperparameters was reducing the learning rate for DLinear on ETTh2 from 0.05 to 0.005, resulting in improved outcomes.)
(A word of caution: Training PatchTST, particularly on datasets like traffic and electricity, can be extremely time-consuming.)
(We failed to reproduce the FiLM result since it takes over 40GB of GPU memory and over 2 hours per epoch on an A800. Further, the provided scripts seem to have flaws: the 'modes1' parameter is set to 1032 in ETTh1 instead of the '32' used in the others, and train_epoch is 1 in ETTh2, which may degrade performance. Thus, we exclude FiLM from the following analysis since we cannot ensure a fair comparison.)
In previous anomaly detection works, the anomaly threshold is calculated based on the test_set (see the affected code in Anomaly Transformer). This setting may violate the assumption that the test_set is unavailable before the model is deployed. It may cause information leakage and cherry-picked results on the test_set.
As claimed in the paper, FITS directly uses the validation set for threshold selection, as indicated in the code.
However, we still compare FITS with the results reported in the original papers, which may have potential information leakage. We encourage the community to reevaluate the affected methods for further reference. XD
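A minimal sketch of leakage-free thresholding, assuming a percentile rule: the threshold is picked from validation scores only and then applied unchanged to the test scores. The percentile rule, the 5% anomaly ratio, the scores, and the function names are all assumptions for illustration; this README only states that the validation set is used.

```python
def percentile_threshold(scores, anomaly_percent):
    """Return the score at the (100 - anomaly_percent)th percentile (nearest rank)."""
    ordered = sorted(scores)
    rank = len(ordered) * (100 - anomaly_percent) // 100
    rank = min(rank, len(ordered) - 1)  # guard against anomaly_percent == 0
    return ordered[rank]


def flag_anomalies(scores, threshold):
    """Mark every score strictly above the threshold as anomalous."""
    return [score > threshold for score in scores]


# Hypothetical anomaly scores; a real run would take them from the model.
val_scores = [0.01 * i for i in range(100)]   # validation split
test_scores = [0.10, 0.50, 0.97, 1.20, 0.30]  # test split, never used to pick the threshold

threshold = percentile_threshold(val_scores, anomaly_percent=5)
flags = flag_anomalies(test_scores, threshold)
print(threshold, flags)
```

Because the test scores never enter `percentile_threshold`, the threshold cannot be tuned to the test set.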
FITS benefits from a large batch size. Our latest version uses a batch size of 128. Some results are not updated due to limited time. Please run the scripts for the ETT datasets with _fin.
We thank Luke Darlow from the University of Edinburgh, who found the bug.