## Informer Forecasting Notebook

This notebook trains the Informer on the synthetic dataset and on the German wind generation dataset for three forecasting scenarios, with prediction windows of 24, 168 and 720 hours.

To make model training and forecasting work smoothly in Jupyter Notebook, minor changes were made to `Informer2020/exp/exp_informer.py`. To support the synthetic and wind datasets, further amendments were made to `Informer2020/exp/exp_informer.py` and `Informer2020/data/data_loader.py`.

For the initial prototype, all models are trained for up to 8 epochs, as in the original paper, with early stopping at a patience of 3 epochs (`args.patience`).

In [2]:
# General imports
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')
import numpy as np
import pandas as pd
import torch
# Add Informer to path
import sys
if 'Informer2020' not in sys.path:
    sys.path.append('Informer2020')
# Import informer and argument parser
from utils.tools import dotdict
from exp.exp_informer import Exp_Informer

# **1. Synthetic Data Forecasting**

### **1.1) 24-Hour Windows**

Below we define a long list of model arguments, most of which are reused in the later forecasting experiments.

**For ease of replication:** `args.data`, `args.root_path` and `args.data_path` can be changed to train and predict on a different dataset, and `args.pred_len` can be changed to switch to a different prediction window. The remaining specifications follow the experiment settings that yielded the best results in the original paper.
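
As a minimal sketch of that workflow, the snippet below copies an argument set and overrides only the dataset and prediction-window fields. `DotDict` is a stand-in written for this example (the notebook itself imports `dotdict` from `utils.tools`), and `with_overrides` is a hypothetical helper that is not part of Informer2020.

```python
import copy


class DotDict(dict):
    """Stand-in for Informer2020's `dotdict`: a dict with attribute access (assumption)."""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


def with_overrides(base, **overrides):
    """Return a deep copy of `base` with the given fields replaced (hypothetical helper)."""
    new_args = DotDict(copy.deepcopy(dict(base)))
    new_args.update(overrides)
    return new_args


# Only a handful of fields are shown; the real argument set is much longer.
base_args = DotDict(data='SYNTHh1', root_path='./SYNTHDataset/',
                    data_path='SYNTHh1.csv', pred_len=24, s_layers=[3, 2, 1])

wind_args = with_overrides(base_args, data='DEWINDh_small',
                           root_path='./WINDataset/',
                           data_path='DEWINDh_small.csv', pred_len=168)

print(wind_args.data, wind_args.pred_len)   # overridden copy
print(base_args.data, base_args.pred_len)   # original is left untouched
```

The notebook itself mutates a single `args` object in place between experiments; working on copies is an alternative that keeps each experiment's configuration available for later inspection.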

In [3]:
args = dotdict()

args.model = 'informerstack' # model of experiment, options: [informer, informerstack, informerlight(TBD)]
# Use synthetic data
args.data = 'SYNTHh1' # data
args.root_path = './SYNTHDataset/' # root path of data file
args.data_path = 'SYNTHh1.csv' # data file
# Set up univariate forecasting
args.features = 'S' # forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate
args.target = 'TARGET' # target feature in S or MS task
args.freq = 'h' # freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h
args.checkpoints = './informer_checkpoints' # location of model checkpoints

args.seq_len = 96 # input sequence length of Informer encoder
args.label_len = 48 # start token length of Informer decoder
args.pred_len = 24 # prediction sequence length
# Informer decoder input: concat[start token series(label_len), zero padding series(pred_len)]

# Architecture specifics
args.enc_in = 1 # encoder input size
args.dec_in = 1 # decoder input size
args.c_out = 1 # output size
args.factor = 5 # probsparse attn factor
args.d_model = 512 # dimension of model
args.n_heads = 8 # num of heads
args.s_layers = [3, 2, 1] # num of encoder layers
args.d_layers = 2 # num of decoder layers
args.d_ff = 2048 # dimension of fcn in model
args.dropout = 0.05 # dropout
args.attn = 'prob' # attention used in encoder, options:[prob, full]
args.embed = 'timeF' # time features encoding, options:[timeF, fixed, learned]
args.activation = 'gelu' # activation
args.distil = True # whether to use distilling in encoder
args.output_attention = False # whether to output attention in encoder
args.mix = True # whether to use mix attention in the generative decoder
args.padding = 0 # padding type for the decoder input (0: zeros)

args.batch_size = 32 
args.learning_rate = 1e-4
args.loss = 'mse'
args.lradj = 'type1'
args.use_amp = False # whether to use automatic mixed precision training

args.num_workers = 0
args.itr = 1
args.train_epochs = 8
args.patience = 3
args.des = 'exp'

args.use_gpu = torch.cuda.is_available()
args.gpu = 0

args.use_multi_gpu = False
args.devices = '0,1,2,3'

# GPU Handling
args.use_gpu = torch.cuda.is_available() and args.use_gpu

if args.use_gpu and args.use_multi_gpu:
    args.devices = args.devices.replace(' ','')
    device_ids = args.devices.split(',')
    args.device_ids = [int(id_) for id_ in device_ids]
    args.gpu = args.device_ids[0]

# Data parser
# Set arguments based on the dataset name
data_parser = {
    'ETTh1':{'data':'ETTh1.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'ETTh2':{'data':'ETTh2.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'ETTm1':{'data':'ETTm1.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'ETTm2':{'data':'ETTm2.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'SYNTHh1':{'data':'SYNTHh1.csv','T':'TARGET','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]}, ## our new dataset
    'SYNTHh2':{'data':'SYNTHh2.csv','T':'TARGET','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]}, ## our new dataset
    'DEWINDh_large':{'data':'DEWINDh_large.csv','T':'TARGET','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]}, ## our new dataset
    'DEWINDh_small':{'data':'DEWINDh_small.csv','T':'TARGET','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]}, ## our new dataset
}
if args.data in data_parser:
    data_info = data_parser[args.data]
    args.data_path = data_info['data']
    args.target = data_info['T']
    args.enc_in, args.dec_in, args.c_out = data_info[args.features]

args.detail_freq = args.freq
args.freq = args.freq[-1:]
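
The decoder-input comment in the cell above (`concat[start token series(label_len), zero padding series(pred_len)]`) can be illustrated with a small, framework-free sketch. It uses plain Python lists for a single univariate series, whereas the model itself operates on batched tensors. `build_decoder_input` is a hypothetical name used only here, and `pad_value=0.0` mirrors `args.padding = 0`.

```python
def build_decoder_input(history, label_len, pred_len, pad_value=0.0):
    """Concatenate the last `label_len` known values with `pred_len` placeholders."""
    if label_len > len(history):
        raise ValueError("label_len cannot exceed the available history")
    # Start token: the most recent `label_len` observations of the input window.
    start_token = list(history[-label_len:]) if label_len > 0 else []
    # Placeholder for the horizon that the decoder has to fill in.
    padding = [pad_value] * pred_len
    return start_token + padding


# Toy encoder window with seq_len = 96 values (0.0, 1.0, ..., 95.0).
encoder_window = [float(v) for v in range(96)]
decoder_input = build_decoder_input(encoder_window, label_len=48, pred_len=24)

print(len(decoder_input))   # label_len + pred_len = 72
print(decoder_input[:3])    # first start-token values
print(decoder_input[-3:])   # zero padding at the end
```

With the settings used in this notebook, the decoder therefore sees 48 known values followed by 24, 168 or 720 zeros, depending on `args.pred_len`.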

The experiment specifications can be inspected below.

In [10]:
print('Args in experiment:')
print(args)

Args in experiment:
{'model': 'informerstack', 'data': 'SYNTHh1', 'root_path': './SYNTHDataset/', 'data_path': 'SYNTHh1.csv', 'features': 'S', 'target': 'TARGET', 'freq': 'h', 'checkpoints': './informer_checkpoints', 'seq_len': 96, 'label_len': 48, 'pred_len': 24, 'enc_in': 1, 'dec_in': 1, 'c_out': 1, 'factor': 5, 'd_model': 512, 'n_heads': 8, 's_layers': [3, 2, 1], 'd_layers': 2, 'd_ff': 2048, 'dropout': 0.05, 'attn': 'prob', 'embed': 'timeF', 'activation': 'gelu', 'distil': True, 'output_attention': False, 'mix': True, 'padding': 0, 'batch_size': 32, 'learning_rate': 0.0001, 'loss': 'mse', 'lradj': 'type1', 'use_amp': False, 'num_workers': 0, 'itr': 1, 'train_epochs': 8, 'patience': 3, 'des': 'exp', 'use_gpu': True, 'gpu': 0, 'use_multi_gpu': False, 'devices': '0,1,2,3', 'detail_freq': 'h'}


In [7]:
# Alias the experiment class; it is instantiated inside the training loop
Exp_synth_24 = Exp_Informer

In [8]:
# Train and predict 24 hour windows
for ii in range(args.itr):
    # setting record of experiments
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_fc{}_eb{}_dt{}_mx{}_{}_{}'.format(args.model, args.data, args.features, 
                args.seq_len, args.label_len, args.pred_len,
                args.d_model, args.n_heads, args.s_layers, args.d_layers, args.d_ff, args.attn, args.factor, args.embed, args.distil, args.mix, args.des, ii)
    # set experiments
    exp_synth_24 = Exp_synth_24(args)
    
    # train
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    
    model = exp_synth_24.train(setting)
    
    # test
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    
    # We can now return prediction windows, true value windows and metrics, as well as the first batch from the test set for possible trouble-shooting
    synth_pred_24, synth_true_24, synth_mse_24, synth_mae_24, synth_24_first_batch_test = exp_synth_24.test(setting)

    torch.cuda.empty_cache()

Use GPU: cuda:0
>>>>>>>start training : informerstack_SYNTHh1_ftS_sl96_ll48_pl24_dm512_nh8_el[3, 2, 1]_dl2_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
	iters: 100, epoch: 1 | loss: 0.1211558
	speed: 0.1539s/iter; left time: 312.3286s
	iters: 200, epoch: 1 | loss: 0.0378935
	speed: 0.1457s/iter; left time: 281.0465s
Epoch: 1 cost time: 39.40832161903381
Epoch: 1, Steps: 266 | Train Loss: 0.1696828 Vali Loss: 0.0457727 Test Loss: 0.0442011
Validation loss decreased (inf --> 0.045773).  Saving model ...
Updating learning rate to 0.0001
	iters: 100, epoch: 2 | loss: 0.0371760
	speed: 0.3366s/iter; left time: 593.5103s
	iters: 200, epoch: 2 | loss: 0.0327412
	speed: 0.1444s/iter; left time: 240.1888s
Epoch: 2 cost time: 38.64145588874817
Epoch: 2, Steps: 266 | Train Loss: 0.0387668 Vali Loss: 0.0467871 Test Loss: 0.0459842
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
	iters: 100, epoch: 3 | loss: 0.0370206
	speed
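
The sample counts reported in the training logs of this notebook (for example `train 8521`, `val 2857`, `test 2857` for 24-hour windows) are consistent with sliding windows over an ETT-style hourly split. The sketch below reproduces them under two assumptions that are not stated in this notebook: the amended data loader keeps the upstream ETT-hour borders of 12, 4 and 4 thirty-day months, and the validation and test segments are extended backwards by `seq_len`. The steps per epoch additionally assume that the last incomplete training batch is dropped. `window_counts` is a hypothetical helper.

```python
def window_counts(seq_len, pred_len, batch_size,
                  train_hours=12 * 30 * 24,
                  val_hours=4 * 30 * 24,
                  test_hours=4 * 30 * 24):
    """Return (train, val, test, steps_per_epoch) for a sliding-window split."""
    def n_windows(segment_len):
        # Each sample needs seq_len input steps followed by pred_len target steps.
        return segment_len - seq_len - pred_len + 1

    train = n_windows(train_hours)
    # Assumed: val/test segments start seq_len steps early to provide history.
    val = n_windows(val_hours + seq_len)
    test = n_windows(test_hours + seq_len)
    steps_per_epoch = train // batch_size  # assumed: incomplete last batch dropped
    return train, val, test, steps_per_epoch


for pred_len in (24, 168, 720):
    print(pred_len, window_counts(seq_len=96, pred_len=pred_len, batch_size=32))
```

Under these assumptions the helper yields 8521/2857/2857 windows and 266 steps for `pred_len = 24`, which matches the log above; longer prediction windows leave fewer samples because each window consumes more of the series.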

### **1.2) 168-Hour Windows**

In [12]:
# Increase prediction length
args.pred_len = 168 

# Inspect new experiment arguments
print('Args in experiment:')
print(args)

# Alias the experiment class; it is instantiated inside the training loop
Exp_synth_168 = Exp_Informer

Args in experiment:
{'model': 'informerstack', 'data': 'SYNTHh1', 'root_path': './SYNTHDataset/', 'data_path': 'SYNTHh1.csv', 'features': 'S', 'target': 'TARGET', 'freq': 'h', 'checkpoints': './informer_checkpoints', 'seq_len': 96, 'label_len': 48, 'pred_len': 168, 'enc_in': 1, 'dec_in': 1, 'c_out': 1, 'factor': 5, 'd_model': 512, 'n_heads': 8, 's_layers': [3, 2, 1], 'd_layers': 2, 'd_ff': 2048, 'dropout': 0.05, 'attn': 'prob', 'embed': 'timeF', 'activation': 'gelu', 'distil': True, 'output_attention': False, 'mix': True, 'padding': 0, 'batch_size': 32, 'learning_rate': 0.0001, 'loss': 'mse', 'lradj': 'type1', 'use_amp': False, 'num_workers': 0, 'itr': 1, 'train_epochs': 8, 'patience': 3, 'des': 'exp', 'use_gpu': True, 'gpu': 0, 'use_multi_gpu': False, 'devices': '0,1,2,3', 'detail_freq': 'h'}


In [13]:
# Train and predict 168 hour windows
for ii in range(args.itr):
    # setting record of experiments
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_fc{}_eb{}_dt{}_mx{}_{}_{}'.format(args.model, args.data, args.features, 
                args.seq_len, args.label_len, args.pred_len,
                args.d_model, args.n_heads, args.s_layers, args.d_layers, args.d_ff, args.attn, args.factor, args.embed, args.distil, args.mix, args.des, ii)
    # set experiments
    exp_synth_168 = Exp_synth_168(args)
    
    # train
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    
    model = exp_synth_168.train(setting)
    
    # test
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    
    # We can now return prediction windows, true value windows and metrics, as well as the first batch from the test set for possible trouble-shooting
    synth_pred_168, synth_true_168, synth_mse_168, synth_mae_168, synth_168_first_batch_test = exp_synth_168.test(setting)

    torch.cuda.empty_cache()

Use GPU: cuda:0
>>>>>>>start training : informerstack_SYNTHh1_ftS_sl96_ll48_pl168_dm512_nh8_el[3, 2, 1]_dl2_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8377
val 2713
test 2713
	iters: 100, epoch: 1 | loss: 0.2414376
	speed: 0.2124s/iter; left time: 422.3871s
	iters: 200, epoch: 1 | loss: 0.0868091
	speed: 0.2148s/iter; left time: 405.6950s
Epoch: 1 cost time: 55.95195031166077
Epoch: 1, Steps: 261 | Train Loss: 0.2622942 Vali Loss: 0.1683587 Test Loss: 0.1533240
Validation loss decreased (inf --> 0.168359).  Saving model ...
Updating learning rate to 0.0001
	iters: 100, epoch: 2 | loss: 0.0605847
	speed: 0.4739s/iter; left time: 818.8904s
	iters: 200, epoch: 2 | loss: 0.0458365
	speed: 0.2149s/iter; left time: 349.8871s
Epoch: 2 cost time: 53.554131507873535
Epoch: 2, Steps: 261 | Train Loss: 0.0608510 Vali Loss: 0.1112561 Test Loss: 0.1085942
Validation loss decreased (0.168359 --> 0.111256).  Saving model ...
Updating learning rate to 5e-05
	iters: 1
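
The `Updating learning rate to ...` lines in the log are consistent with a schedule that halves the learning rate after every epoch, starting from `args.learning_rate = 1e-4`. The following is a minimal sketch, assuming that this is what `args.lradj = 'type1'` selects; `type1_learning_rate` is a hypothetical name, not a function from Informer2020.

```python
def type1_learning_rate(base_lr, epoch):
    """Learning rate announced at the end of the given 1-based epoch (assumed schedule)."""
    if epoch < 1:
        raise ValueError("epoch is 1-based")
    return base_lr * 0.5 ** (epoch - 1)


# Schedule over the 8 epochs configured via args.train_epochs.
for epoch in range(1, 9):
    print(epoch, type1_learning_rate(1e-4, epoch))
```

For epochs 1 and 2 this gives 0.0001 and 5e-05, the two values visible in the log above.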

### **1.3) 720-Hour Windows**

In [18]:
# Increase prediction length
args.pred_len = 720 

# Inspect new experiment arguments
print('Args in experiment:')
print(args)

# Alias the experiment class; it is instantiated inside the training loop
Exp_synth_720 = Exp_Informer

Args in experiment:
{'model': 'informerstack', 'data': 'SYNTHh1', 'root_path': './SYNTHDataset/', 'data_path': 'SYNTHh1.csv', 'features': 'S', 'target': 'TARGET', 'freq': 'h', 'checkpoints': './informer_checkpoints', 'seq_len': 96, 'label_len': 48, 'pred_len': 720, 'enc_in': 1, 'dec_in': 1, 'c_out': 1, 'factor': 5, 'd_model': 512, 'n_heads': 8, 's_layers': [3, 2, 1], 'd_layers': 2, 'd_ff': 2048, 'dropout': 0.05, 'attn': 'prob', 'embed': 'timeF', 'activation': 'gelu', 'distil': True, 'output_attention': False, 'mix': True, 'padding': 0, 'batch_size': 32, 'learning_rate': 0.0001, 'loss': 'mse', 'lradj': 'type1', 'use_amp': False, 'num_workers': 0, 'itr': 1, 'train_epochs': 8, 'patience': 3, 'des': 'exp', 'use_gpu': True, 'gpu': 0, 'use_multi_gpu': False, 'devices': '0,1,2,3', 'detail_freq': 'h'}


In [19]:
# Train and predict 720 hour windows
for ii in range(args.itr):
    # setting record of experiments
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_fc{}_eb{}_dt{}_mx{}_{}_{}'.format(args.model, args.data, args.features, 
                args.seq_len, args.label_len, args.pred_len,
                args.d_model, args.n_heads, args.s_layers, args.d_layers, args.d_ff, args.attn, args.factor, args.embed, args.distil, args.mix, args.des, ii)
    # set experiments
    exp_synth_720 = Exp_synth_720(args)
    
    # train
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    
    model = exp_synth_720.train(setting)
    
    # test
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    
    # We can now return prediction windows, true value windows and metrics, as well as the first batch from the test set for possible trouble-shooting
    synth_pred_720, synth_true_720, synth_mse_720, synth_mae_720, synth_720_first_batch_test = exp_synth_720.test(setting)

    torch.cuda.empty_cache()

Use GPU: cuda:0
>>>>>>>start training : informerstack_SYNTHh1_ftS_sl96_ll48_pl720_dm512_nh8_el[3, 2, 1]_dl2_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 7825
val 2161
test 2161
	iters: 100, epoch: 1 | loss: 0.3717384
	speed: 0.3830s/iter; left time: 709.7644s
	iters: 200, epoch: 1 | loss: 0.3124363
	speed: 0.4056s/iter; left time: 710.9963s
Epoch: 1 cost time: 96.83217573165894
Epoch: 1, Steps: 244 | Train Loss: 0.4210257 Vali Loss: 0.4713870 Test Loss: 0.4362603
Validation loss decreased (inf --> 0.471387).  Saving model ...
Updating learning rate to 0.0001
	iters: 100, epoch: 2 | loss: 0.3111594
	speed: 0.8266s/iter; left time: 1330.0097s
	iters: 200, epoch: 2 | loss: 0.0722514
	speed: 0.3895s/iter; left time: 587.7827s
Epoch: 2 cost time: 97.56407308578491
Epoch: 2, Steps: 244 | Train Loss: 0.2055912 Vali Loss: 0.1760287 Test Loss: 0.1403989
Validation loss decreased (0.471387 --> 0.176029).  Saving model ...
Updating learning rate to 5e-05
	iters: 1
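
The run name printed after `start training` is assembled from the experiment arguments by the `setting` format string in each training cell. As a small sketch, the same string can be produced by one helper instead of being repeated per experiment. `build_setting` and `SETTING_TEMPLATE` are hypothetical names, and a plain dict stands in for `args` so that the example is self-contained.

```python
SETTING_TEMPLATE = (
    '{model}_{data}_ft{features}_sl{seq_len}_ll{label_len}_pl{pred_len}'
    '_dm{d_model}_nh{n_heads}_el{s_layers}_dl{d_layers}_df{d_ff}'
    '_at{attn}_fc{factor}_eb{embed}_dt{distil}_mx{mix}_{des}_{run_index}'
)


def build_setting(args, run_index):
    """Build the experiment identifier from a dict-like argument set."""
    # Extra keys in `args` are ignored by str.format; run_index is passed separately.
    return SETTING_TEMPLATE.format(run_index=run_index, **args)


example_args = dict(
    model='informerstack', data='SYNTHh1', features='S',
    seq_len=96, label_len=48, pred_len=720,
    d_model=512, n_heads=8, s_layers=[3, 2, 1], d_layers=2, d_ff=2048,
    attn='prob', factor=5, embed='timeF', distil=True, mix=True, des='exp',
)

print(build_setting(example_args, 0))
```

Because `dotdict` is a `dict` subclass, the notebook's `args` object could be passed in the same way.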

# **2. Wind Data Forecasting**

### **2.1) 24-Hour Windows**

In [20]:
# Set data to wind
args.data = 'DEWINDh_small' # data
args.root_path = './WINDataset/' # root path of data file
args.data_path = 'DEWINDh_small.csv' # data file

# Re-set prediction length to 24
args.pred_len = 24 

# Inspect new experiment arguments
print('Args in experiment:')
print(args)

# Alias the experiment class; it is instantiated inside the training loop
Exp_wind_24 = Exp_Informer

Args in experiment:
{'model': 'informerstack', 'data': 'DEWINDh_small', 'root_path': './WINDataset/', 'data_path': 'DEWINDh_small.csv', 'features': 'S', 'target': 'TARGET', 'freq': 'h', 'checkpoints': './informer_checkpoints', 'seq_len': 96, 'label_len': 48, 'pred_len': 24, 'enc_in': 1, 'dec_in': 1, 'c_out': 1, 'factor': 5, 'd_model': 512, 'n_heads': 8, 's_layers': [3, 2, 1], 'd_layers': 2, 'd_ff': 2048, 'dropout': 0.05, 'attn': 'prob', 'embed': 'timeF', 'activation': 'gelu', 'distil': True, 'output_attention': False, 'mix': True, 'padding': 0, 'batch_size': 32, 'learning_rate': 0.0001, 'loss': 'mse', 'lradj': 'type1', 'use_amp': False, 'num_workers': 0, 'itr': 1, 'train_epochs': 8, 'patience': 3, 'des': 'exp', 'use_gpu': True, 'gpu': 0, 'use_multi_gpu': False, 'devices': '0,1,2,3', 'detail_freq': 'h'}


In [21]:
# Train and predict 24 hour windows
for ii in range(args.itr):
    # setting record of experiments
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_fc{}_eb{}_dt{}_mx{}_{}_{}'.format(args.model, args.data, args.features, 
                args.seq_len, args.label_len, args.pred_len,
                args.d_model, args.n_heads, args.s_layers, args.d_layers, args.d_ff, args.attn, args.factor, args.embed, args.distil, args.mix, args.des, ii)
    # set experiments
    exp_wind_24 = Exp_wind_24(args)
    
    # train
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    
    model = exp_wind_24.train(setting)
    
    # test
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    
    # We can now return prediction windows, true value windows and metrics, as well as the first batch from the test set for possible trouble-shooting
    wind_pred_24, wind_true_24, wind_mse_24, wind_mae_24, wind_24_first_batch_test = exp_wind_24.test(setting)

    torch.cuda.empty_cache()

Use GPU: cuda:0
>>>>>>>start training : informerstack_DEWINDh_small_ftS_sl96_ll48_pl24_dm512_nh8_el[3, 2, 1]_dl2_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
	iters: 100, epoch: 1 | loss: 0.1922024
	speed: 0.1273s/iter; left time: 258.3546s
	iters: 200, epoch: 1 | loss: 0.1572273
	speed: 0.1439s/iter; left time: 277.4938s
Epoch: 1 cost time: 36.89381670951843
Epoch: 1, Steps: 266 | Train Loss: 0.2932847 Vali Loss: 0.1607707 Test Loss: 0.1650750
Validation loss decreased (inf --> 0.160771).  Saving model ...
Updating learning rate to 0.0001
	iters: 100, epoch: 2 | loss: 0.1601011
	speed: 0.3434s/iter; left time: 605.3271s
	iters: 200, epoch: 2 | loss: 0.1065265
	speed: 0.1480s/iter; left time: 246.1596s
Epoch: 2 cost time: 39.424134969711304
Epoch: 2, Steps: 266 | Train Loss: 0.1376669 Vali Loss: 0.1673882 Test Loss: 0.1686659
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
	iters: 100, epoch: 3 | loss: 0.1297745
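
The `EarlyStopping counter: 1 out of 3` line reflects `args.patience = 3`: the counter increases whenever the validation loss fails to improve on the best value so far, and training stops once it reaches the patience. Below is a minimal sketch of that logic, fed with the two validation losses logged above. `PatienceTracker` is a hypothetical class for illustration, not the implementation used by Informer2020, and checkpoint saving is omitted.

```python
class PatienceTracker:
    """Track the best validation loss and signal when patience is exhausted."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float('inf')
        self.counter = 0
        self.should_stop = False

    def update(self, val_loss):
        """Record one epoch's validation loss; return True if it is a new best."""
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0   # improvement resets the counter
            return True
        self.counter += 1      # no improvement this epoch
        if self.counter >= self.patience:
            self.should_stop = True
        return False


tracker = PatienceTracker(patience=3)
# Validation losses of epochs 1 and 2 from the log above.
for epoch, val_loss in enumerate([0.1607707, 0.1673882], start=1):
    improved = tracker.update(val_loss)
    print(epoch, improved, tracker.counter, tracker.should_stop)
```

After epoch 2 the counter stands at 1, as in the log; two further epochs without improvement would end training before the configured 8 epochs.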

### **2.2) 168-Hour Windows**

In [22]:
# Increase prediction length
args.pred_len = 168 

# Inspect new experiment arguments
print('Args in experiment:')
print(args)

# Alias the experiment class; it is instantiated inside the training loop
Exp_wind_168 = Exp_Informer

Args in experiment:
{'model': 'informerstack', 'data': 'DEWINDh_small', 'root_path': './WINDataset/', 'data_path': 'DEWINDh_small.csv', 'features': 'S', 'target': 'TARGET', 'freq': 'h', 'checkpoints': './informer_checkpoints', 'seq_len': 96, 'label_len': 48, 'pred_len': 168, 'enc_in': 1, 'dec_in': 1, 'c_out': 1, 'factor': 5, 'd_model': 512, 'n_heads': 8, 's_layers': [3, 2, 1], 'd_layers': 2, 'd_ff': 2048, 'dropout': 0.05, 'attn': 'prob', 'embed': 'timeF', 'activation': 'gelu', 'distil': True, 'output_attention': False, 'mix': True, 'padding': 0, 'batch_size': 32, 'learning_rate': 0.0001, 'loss': 'mse', 'lradj': 'type1', 'use_amp': False, 'num_workers': 0, 'itr': 1, 'train_epochs': 8, 'patience': 3, 'des': 'exp', 'use_gpu': True, 'gpu': 0, 'use_multi_gpu': False, 'devices': '0,1,2,3', 'detail_freq': 'h'}


In [23]:
# Train and predict 168 hour windows
for ii in range(args.itr):
    # setting record of experiments
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_fc{}_eb{}_dt{}_mx{}_{}_{}'.format(args.model, args.data, args.features, 
                args.seq_len, args.label_len, args.pred_len,
                args.d_model, args.n_heads, args.s_layers, args.d_layers, args.d_ff, args.attn, args.factor, args.embed, args.distil, args.mix, args.des, ii)
    # set experiments
    exp_wind_168 = Exp_wind_168(args)
    
    # train
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    
    model = exp_wind_168.train(setting)
    
    # test
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    
    # We can now return prediction windows, true value windows and metrics, as well as the first batch from the test set for possible trouble-shooting
    wind_pred_168, wind_true_168, wind_mse_168, wind_mae_168, wind_168_first_batch_test = exp_wind_168.test(setting)

    torch.cuda.empty_cache()

Use GPU: cuda:0
>>>>>>>start training : informerstack_DEWINDh_small_ftS_sl96_ll48_pl168_dm512_nh8_el[3, 2, 1]_dl2_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8377
val 2713
test 2713
	iters: 100, epoch: 1 | loss: 0.2307981
	speed: 0.2120s/iter; left time: 421.7603s
	iters: 200, epoch: 1 | loss: 0.1850148
	speed: 0.1871s/iter; left time: 353.4714s
Epoch: 1 cost time: 53.085575342178345
Epoch: 1, Steps: 261 | Train Loss: 0.3990811 Vali Loss: 0.1926250 Test Loss: 0.1978261
Validation loss decreased (inf --> 0.192625).  Saving model ...
Updating learning rate to 0.0001
	iters: 100, epoch: 2 | loss: 0.1603144
	speed: 0.4977s/iter; left time: 860.1030s
	iters: 200, epoch: 2 | loss: 0.1607429
	speed: 0.2151s/iter; left time: 350.1922s
Epoch: 2 cost time: 56.17479872703552
Epoch: 2, Steps: 261 | Train Loss: 0.1606259 Vali Loss: 0.2184845 Test Loss: 0.2397944
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
	iters: 100, epoch: 3 | loss: 0.129506

### **2.3) 720-Hour Windows**

In [24]:
# Increase prediction length
args.pred_len = 720 

# Inspect new experiment arguments
print('Args in experiment:')
print(args)

# Alias the experiment class; it is instantiated inside the training loop
Exp_wind_720 = Exp_Informer

Args in experiment:
{'model': 'informerstack', 'data': 'DEWINDh_small', 'root_path': './WINDataset/', 'data_path': 'DEWINDh_small.csv', 'features': 'S', 'target': 'TARGET', 'freq': 'h', 'checkpoints': './informer_checkpoints', 'seq_len': 96, 'label_len': 48, 'pred_len': 720, 'enc_in': 1, 'dec_in': 1, 'c_out': 1, 'factor': 5, 'd_model': 512, 'n_heads': 8, 's_layers': [3, 2, 1], 'd_layers': 2, 'd_ff': 2048, 'dropout': 0.05, 'attn': 'prob', 'embed': 'timeF', 'activation': 'gelu', 'distil': True, 'output_attention': False, 'mix': True, 'padding': 0, 'batch_size': 32, 'learning_rate': 0.0001, 'loss': 'mse', 'lradj': 'type1', 'use_amp': False, 'num_workers': 0, 'itr': 1, 'train_epochs': 8, 'patience': 3, 'des': 'exp', 'use_gpu': True, 'gpu': 0, 'use_multi_gpu': False, 'devices': '0,1,2,3', 'detail_freq': 'h'}


In [25]:
# Train and predict 720 hour windows
for ii in range(args.itr):
    # setting record of experiments
    setting = '{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_at{}_fc{}_eb{}_dt{}_mx{}_{}_{}'.format(args.model, args.data, args.features, 
                args.seq_len, args.label_len, args.pred_len,
                args.d_model, args.n_heads, args.s_layers, args.d_layers, args.d_ff, args.attn, args.factor, args.embed, args.distil, args.mix, args.des, ii)
    # set experiments
    exp_wind_720 = Exp_wind_720(args)
    
    # train
    print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
    
    model = exp_wind_720.train(setting)
    
    # test
    print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
    
    # We can now return prediction windows, true value windows and metrics, as well as the first batch from the test set for possible trouble-shooting
    wind_pred_720, wind_true_720, wind_mse_720, wind_mae_720, wind_720_first_batch_test = exp_wind_720.test(setting)

    torch.cuda.empty_cache()

Use GPU: cuda:0
>>>>>>>start training : informerstack_DEWINDh_small_ftS_sl96_ll48_pl720_dm512_nh8_el[3, 2, 1]_dl2_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 7825
val 2161
test 2161
	iters: 100, epoch: 1 | loss: 0.9777046
	speed: 0.3842s/iter; left time: 711.8428s
	iters: 200, epoch: 1 | loss: 0.4322140
	speed: 0.4028s/iter; left time: 706.1086s
Epoch: 1 cost time: 96.67148399353027
Epoch: 1, Steps: 244 | Train Loss: 0.8364656 Vali Loss: 0.2677885 Test Loss: 0.2772596
Validation loss decreased (inf --> 0.267788).  Saving model ...
Updating learning rate to 0.0001
	iters: 100, epoch: 2 | loss: 0.1833613
	speed: 0.8226s/iter; left time: 1323.6332s
	iters: 200, epoch: 2 | loss: 0.1839741
	speed: 0.3829s/iter; left time: 577.7579s
Epoch: 2 cost time: 96.47203469276428
Epoch: 2, Steps: 244 | Train Loss: 0.1913905 Vali Loss: 0.2042943 Test Loss: 0.2050335
Validation loss decreased (0.267788 --> 0.204294).  Saving model ...
Updating learning rate to 5e-05
	it
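
Each `test(setting)` call in this notebook returns the prediction windows, the true-value windows, and the MSE and MAE computed from them. As a reminder of what the two metrics measure, here is a minimal sketch on flat Python lists with made-up numbers; the notebook's own metrics are computed over arrays of windows, and `mean_squared_error` and `mean_absolute_error` are hypothetical helper names.

```python
def mean_squared_error(pred, true):
    """Average of squared differences; penalizes large errors more strongly."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def mean_absolute_error(pred, true):
    """Average of absolute differences; in the same units as the target."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# Illustrative values only, not results from this notebook.
toy_pred = [1.0, 2.0, 4.0]
toy_true = [1.0, 3.0, 2.0]

print(mean_squared_error(toy_pred, toy_true))
print(mean_absolute_error(toy_pred, toy_true))
```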