# Static forecasting

Static forecasting uses a lagged value of the variable to be forecast as the forecast value, separately for each time series. In our application, we forecast only the next month, i.e., the horizon equals 1, and we use a lag of 1, i.e., one month. For example, to forecast the sales value for 06-2021 in Blekinge, we use the observed sales value for 05-2021 in Blekinge.
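
As a minimal sketch of this idea, the snippet below applies a lag of 1 to a small, hypothetical data frame (the values and the two-column layout are made up for illustration and are not the real sales data). It relies only on `DataFrame.shift` from pandas.

```python
import pandas as pd

# Hypothetical toy data: two series with four monthly observations each.
toy = pd.DataFrame(
    {
        "Blekinge": [100.0, 120.0, 142.0, 106.0],
        "Dalarna": [30.0, 35.0, 42.0, 7.0],
    },
    index=["03-2021", "04-2021", "05-2021", "06-2021"],
)

# Lag-1 static forecast: the forecast for month t is the observed value of month t-1.
forecast = toy.shift(1)

# The first month has no predecessor, so its forecast is NaN.
print(forecast)
```

In this toy example, the forecast for 06-2021 in Blekinge equals the observed value for 05-2021, and the first row contains only missing values because no earlier observation exists.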

Static forecasting is meant to serve as a baseline for later comparison with the GTS model.

For our 57 time series with 24 observations each, we split the data chronologically: the first 15 observations (09-2019 to 11-2020) form the training set, the following 3 observations (12-2020 to 02-2021) form the validation set, and the last 6 observations (03-2021 to 08-2021) form the test set.
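
The arithmetic behind this split can be sketched as follows. The variable names are illustrative assumptions; the only values taken from this notebook are the 24 observations, the hold-out ratio of 9/24 and the 3 validation rows.

```python
# Sketch of the chronological split on row positions 0..23.
n_obs = 24
test_ratio = 9 / 24  # share of rows held out for validation and test together
n_val = 3            # the first held-out rows are used for validation

n_holdout = int(round(n_obs * test_ratio))  # 9 held-out rows
positions = list(range(n_obs))

train_idx = positions[:-n_holdout]         # rows 0..14
val_idx = positions[-n_holdout:][:n_val]   # rows 15..17
test_idx = positions[-n_holdout:][n_val:]  # rows 18..23

print(len(train_idx), len(val_idx), len(test_idx))
```

Static forecasting has no parameters to fit, so the training rows only supply the lagged value for the first validation row.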

This implementation is based on the [implementation of static forecasting](https://github.com/chaoshangcs/GTS/blob/main/scripts/eval_baseline_methods.py), which <span style="font-variant:small-caps;">Shang et al. (2021)</span>, the authors of the paper *Discrete Graph Structure Learning for Forecasting Multiple Time Series*, provide on GitHub together with their implementation of the GTS model.

In [1]:
import argparse
import numpy as np
import pandas as pd
import logging
import os
import sys

## Define helper functions

In [2]:
def masked_rmse_np(preds, labels, null_val=np.nan):
    return np.sqrt(masked_mse_np(preds=preds, labels=labels, null_val=null_val))


def masked_mse_np(preds, labels, null_val=np.nan):
    with np.errstate(divide='ignore', invalid='ignore'):
        if np.isnan(null_val):
            mask = ~np.isnan(labels)
        else:
            mask = np.not_equal(labels, null_val)
        mask = mask.astype('float32')
        mask /= np.mean(mask)
        # Squared errors; the square root is taken in masked_rmse_np.
        mse = np.square(np.subtract(preds, labels)).astype('float32')
        mse = np.nan_to_num(mse * mask)
        return np.mean(mse)


def masked_mae_np(preds, labels, null_val=np.nan):
    with np.errstate(divide='ignore', invalid='ignore'):
        if np.isnan(null_val):
            mask = ~np.isnan(labels)
        else:
            mask = np.not_equal(labels, null_val)
        mask = mask.astype('float32')
        mask /= np.mean(mask)
        mae = np.abs(np.subtract(preds, labels)).astype('float32')
        mae = np.nan_to_num(mae * mask)
        return np.mean(mae)


def masked_mape_np(preds, labels, null_val=np.nan):
    with np.errstate(divide='ignore', invalid='ignore'):
        if np.isnan(null_val):
            mask = ~np.isnan(labels)
        else:
            mask = np.not_equal(labels, null_val)
        mask = mask.astype('float32')
        mask /= np.mean(mask)
        mape = np.abs(np.divide(np.subtract(preds, labels).astype('float32'), labels))
        mape = np.nan_to_num(mask * mape)
        return np.mean(mape)
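
To illustrate what the masking does, here is a compact, simplified re-implementation of the masked MAE on hypothetical numbers. The function name `masked_mae` and the toy values are assumptions for illustration only. It handles just the case of a numeric `null_val`, which is how the metrics are called later with `null_val=0`.

```python
import numpy as np


def masked_mae(preds, labels, null_val=0.0):
    """Mean absolute error over entries whose label differs from null_val."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mask = (labels != null_val).astype(float)
    # Rescale the mask so that the plain mean below averages only over kept entries.
    mask /= mask.mean()
    return float(np.mean(np.abs(preds - labels) * mask))


labels = [0.0, 10.0, 20.0]  # the first label equals null_val and is ignored
preds = [5.0, 12.0, 16.0]

masked = masked_mae(preds, labels)
plain = float(np.mean(np.abs(np.array(preds) - np.array(labels))))
print(masked, plain)
```

Here the masked MAE averages only the errors 2 and 4, whereas the plain MAE also includes the error of 5 on the entry whose label is 0.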

In [3]:
def get_logger(log_dir, name, log_filename='info.log', level=logging.INFO):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Remove existing handlers so that re-running the cell does not duplicate log lines.
    if logger.hasHandlers():
        logger.handlers.clear()
    # Add file handler and stdout handler
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    file_handler = logging.FileHandler(os.path.join(log_dir, log_filename))
    file_handler.setFormatter(formatter)
    # Add console handler.
    console_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(console_formatter)
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    # Record where the log file is written.
    logger.info('Log directory: %s', log_dir)
    return logger

In [4]:
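def static_predict(df, n_forward, test_ratio=9/24):
    """Return lag-`n_forward` forecasts and true values for the validation and test sets."""
    # Number of held-out rows: by default the last 9 of the 24 observations.
    test_num = int(round(df.shape[0] * test_ratio))
    # Shift the full frame first so that each held-out row receives the value from n_forward rows earlier.
    shifted = df.shift(n_forward)
    # The first 3 held-out rows form the validation set, the remaining rows the test set.
    y_val = df.iloc[-test_num:][0:3]
    y_predict_val = shifted.iloc[-test_num:][0:3]
    y_test = df.iloc[-test_num:][3:]
    y_predict_test = shifted.iloc[-test_num:][3:]
    return y_predict_val, y_val, y_predict_test, y_test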

In [5]:
def eval_static(sales):
    logger.info('Static')
    horizons = [1]
    logger.info('\t'.join(['Model', 'Horizon', 'RMSE', 'MAPE', 'MAE']))
    for horizon in horizons:
        y_predict_val, y_val, y_predict_test, y_test = static_predict(sales, n_forward=horizon, test_ratio=9/24)
        val_rmse = masked_rmse_np(preds=y_predict_val, labels=y_val, null_val=0)
        val_mape = masked_mape_np(preds=y_predict_val, labels=y_val, null_val=0)
        val_mae = masked_mae_np(preds=y_predict_val, labels=y_val, null_val=0)
        test_rmse = masked_rmse_np(preds=y_predict_test, labels=y_test, null_val=0)
        test_mape = masked_mape_np(preds=y_predict_test, labels=y_test, null_val=0)
        test_mae = masked_mae_np(preds=y_predict_test, labels=y_test, null_val=0)
        line_val = 'Static\t%d\t%.2f\t%.2f\t%.2f' % (horizon, val_rmse, val_mape * 100, val_mae)
        line_test = 'Static\t%d\t%.2f\t%.2f\t%.2f' % (horizon, test_rmse, test_mape * 100, test_mae)
        logger.info(line_val)
        logger.info(line_test)

    return y_predict_val, y_val, y_predict_test, y_test

## Forecasting

When the GTS model is run on the unchanged data, an error message reports that the column with index 20 (Norrbotten-Sunderbyn) is constant. To avoid this error and thus enable the GTS model to run, we slightly change the data in column 20 by setting some of the entries with value 0 to value 1, namely those in the rows with index 0, 3, 8, 14 and 17. For the sake of consistency, we use the slightly modified data for both the GTS model and the three baselines.

This modification of the data has minimal impact on the forecasts and the final metrics, but allows the GTS model to be run.
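
A constant column can be spotted before training with a check like the one below. This is only a sketch on hypothetical toy data. The exact check performed inside the GTS code is not reproduced here, and the column names are made up.

```python
import pandas as pd

# Hypothetical toy frame: column "b" never changes.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.0, 0.0, 0.0]})

# A column is constant if it contains exactly one distinct value.
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
print(constant_cols)

# A constant column has zero standard deviation.
print(toy["b"].std())
```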

In [6]:
logger = get_logger('data/model', 'Baseline')
sales = pd.read_hdf("data/sales.h5")
# Set the selected rows of column 20 to 1 in a single positional assignment
# (avoids chained indexing, which may not write back to `sales`).
sales.iloc[[0, 3, 8, 14, 17], 20] = 1
y_predict_val_st, y_val, y_predict_test_st, y_test = eval_static(sales)

2022-07-27 09:43:10,451 - INFO - Log directory: data/model
2022-07-27 09:43:11,135 - INFO - Static
2022-07-27 09:43:11,136 - INFO - Model	Horizon	RMSE	MAPE	MAE
2022-07-27 09:43:11,153 - INFO - Static	1	142.18	65.45	77.53
2022-07-27 09:43:11,155 - INFO - Static	1	172.27	54.35	92.17

The first `Static` line reports the metrics on the validation set, the second the metrics on the test set. MAPE is given in percent.


## Forecasts

We now take a closer look at the forecasts for both the validation set and the test set.

### Validation set

In [7]:
y_predict_val_st

territory,Blekinge,Blekinge ONCO,Dalarna,Dalarna ONCO,Gävleborg-Gävle,Gävleborg-Gävle ONCO,Halland-Halmstad,Halland-Halmstad ONCO,Halland-Varberg-Falkenberg,Jämtland,...,Västra Götaland-Göteborg ONCO,Västra Götaland-Lidköping,Västra Götaland-Skövde,Västra Götaland-SÄS ONCO,Västra Götaland-Uddevalla,Örebro-Örebro,Örebro-Örebro ONCO,Östergötland-Linköping,Östergötland-Linköping ONCO,Östergötland-Norrköping
15,120.0,229.0,35.0,185.0,21.0,850.0,85.0,146.0,57.0,14.0,...,2184.0,21.0,35.0,1031.0,85.0,85.0,309.0,80.0,571.0,0.0
16,142.0,149.0,42.0,90.0,21.0,381.0,120.0,161.0,21.0,21.0,...,2016.0,7.0,21.0,1294.0,170.0,43.0,185.0,66.0,631.0,14.0
17,142.0,144.0,42.0,38.0,64.0,382.0,120.0,0.0,85.0,14.0,...,1397.0,0.0,35.0,1205.0,170.0,128.0,426.0,94.0,510.0,0.0


In [8]:
y_predict_val_st.isnull().values.any()

False

In [9]:
y_val

territory,Blekinge,Blekinge ONCO,Dalarna,Dalarna ONCO,Gävleborg-Gävle,Gävleborg-Gävle ONCO,Halland-Halmstad,Halland-Halmstad ONCO,Halland-Varberg-Falkenberg,Jämtland,...,Västra Götaland-Göteborg ONCO,Västra Götaland-Lidköping,Västra Götaland-Skövde,Västra Götaland-SÄS ONCO,Västra Götaland-Uddevalla,Örebro-Örebro,Örebro-Örebro ONCO,Östergötland-Linköping,Östergötland-Linköping ONCO,Östergötland-Norrköping
15,142.0,149.0,42.0,90.0,21.0,381.0,120.0,161.0,21.0,21.0,...,2016.0,7.0,21.0,1294.0,170.0,43.0,185.0,66.0,631.0,14.0
16,142.0,144.0,42.0,38.0,64.0,382.0,120.0,0.0,85.0,14.0,...,1397.0,0.0,35.0,1205.0,170.0,128.0,426.0,94.0,510.0,0.0
17,106.0,90.0,7.0,96.0,43.0,221.0,21.0,108.0,64.0,35.0,...,1420.0,0.0,21.0,1038.0,85.0,64.0,112.0,84.0,567.0,21.0


Comparing the static forecasts for the validation set in `y_predict_val_st` with the true values of the validation set in `y_val` shows that the forecasting has been performed correctly: for each of the 57 regions, the true values in the rows with indices 15 and 16 of `y_val` appear in the rows with indices 16 and 17 of `y_predict_val_st`. The lag of one time period, i.e., one month, has thus been applied correctly.

There are no null values.
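
The visual comparison can also be expressed as a programmatic check. The helper below is a hypothetical sketch (the name `is_lagged_by_one` is not part of the original notebook). It is demonstrated on a two-column excerpt of the validation values for Blekinge and Dalarna shown above.

```python
import pandas as pd


def is_lagged_by_one(y_pred, y_true):
    """Return True if row i+1 of y_pred equals row i of y_true in every column."""
    return bool((y_pred.iloc[1:].to_numpy() == y_true.iloc[:-1].to_numpy()).all())


# Excerpt of the validation values for two territories.
y_true = pd.DataFrame(
    {"Blekinge": [142.0, 142.0, 106.0], "Dalarna": [42.0, 42.0, 7.0]},
    index=[15, 16, 17],
)
y_pred = pd.DataFrame(
    {"Blekinge": [120.0, 142.0, 142.0], "Dalarna": [35.0, 42.0, 42.0]},
    index=[15, 16, 17],
)

print(is_lagged_by_one(y_pred, y_true))
```

The same helper could be applied to the full forecast and true-value frames of either set. The first forecast row is not covered by this check, because its source value lies in the preceding set.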

### Test set

In [10]:
y_predict_test_st

territory,Blekinge,Blekinge ONCO,Dalarna,Dalarna ONCO,Gävleborg-Gävle,Gävleborg-Gävle ONCO,Halland-Halmstad,Halland-Halmstad ONCO,Halland-Varberg-Falkenberg,Jämtland,...,Västra Götaland-Göteborg ONCO,Västra Götaland-Lidköping,Västra Götaland-Skövde,Västra Götaland-SÄS ONCO,Västra Götaland-Uddevalla,Örebro-Örebro,Örebro-Örebro ONCO,Östergötland-Linköping,Östergötland-Linköping ONCO,Östergötland-Norrköping
18,106.0,90.0,7.0,96.0,43.0,221.0,21.0,108.0,64.0,35.0,...,1420.0,0.0,21.0,1038.0,85.0,64.0,112.0,84.0,567.0,21.0
19,149.0,185.0,50.0,339.0,64.0,400.0,99.0,161.0,43.0,43.0,...,1909.0,28.0,28.0,1630.0,149.0,128.0,274.0,43.0,277.0,0.0
20,170.0,185.0,28.0,248.0,64.0,387.0,57.0,257.0,43.0,35.0,...,1796.0,21.0,35.0,903.0,213.0,149.0,200.0,35.0,603.0,0.0
21,142.0,145.0,14.0,217.0,94.0,199.0,71.0,19.0,30.0,43.0,...,1826.0,50.0,78.0,1281.0,149.0,106.0,216.0,35.0,555.0,0.0
22,248.0,248.0,64.0,138.0,85.0,181.0,123.0,232.0,21.0,64.0,...,2424.0,28.0,21.0,1351.0,201.0,106.0,515.0,78.0,689.0,0.0
23,135.0,421.0,50.0,242.0,64.0,415.0,0.0,90.0,21.0,64.0,...,3352.0,28.0,0.0,1219.0,137.0,64.0,229.0,99.0,695.0,0.0


In [11]:
y_predict_test_st.isnull().values.any()

False

In [12]:
y_test

territory,Blekinge,Blekinge ONCO,Dalarna,Dalarna ONCO,Gävleborg-Gävle,Gävleborg-Gävle ONCO,Halland-Halmstad,Halland-Halmstad ONCO,Halland-Varberg-Falkenberg,Jämtland,...,Västra Götaland-Göteborg ONCO,Västra Götaland-Lidköping,Västra Götaland-Skövde,Västra Götaland-SÄS ONCO,Västra Götaland-Uddevalla,Örebro-Örebro,Örebro-Örebro ONCO,Östergötland-Linköping,Östergötland-Linköping ONCO,Östergötland-Norrköping
18,149.0,185.0,50.0,339.0,64.0,400.0,99.0,161.0,43.0,43.0,...,1909.0,28.0,28.0,1630.0,149.0,128.0,274.0,43.0,277.0,0.0
19,170.0,185.0,28.0,248.0,64.0,387.0,57.0,257.0,43.0,35.0,...,1796.0,21.0,35.0,903.0,213.0,149.0,200.0,35.0,603.0,0.0
20,142.0,145.0,14.0,217.0,94.0,199.0,71.0,19.0,30.0,43.0,...,1826.0,50.0,78.0,1281.0,149.0,106.0,216.0,35.0,555.0,0.0
21,248.0,248.0,64.0,138.0,85.0,181.0,123.0,232.0,21.0,64.0,...,2424.0,28.0,21.0,1351.0,201.0,106.0,515.0,78.0,689.0,0.0
22,135.0,421.0,50.0,242.0,64.0,415.0,0.0,90.0,21.0,64.0,...,3352.0,28.0,0.0,1219.0,137.0,64.0,229.0,99.0,695.0,0.0
23,152.0,282.0,71.0,235.0,43.0,193.0,57.0,216.0,103.0,71.0,...,2222.0,28.0,14.0,1686.0,191.0,57.0,121.0,101.0,769.0,0.0


Comparing the static forecasts for the test set in `y_predict_test_st` with the true values of the test set in `y_test` shows that the forecasting has been performed correctly: for each of the 57 regions, the true values in the rows with indices 18, 19, 20, 21 and 22 of `y_test` appear in the rows with indices 19, 20, 21, 22 and 23 of `y_predict_test_st`. The lag of one time period, i.e., one month, has thus been applied correctly.

There are no null values.

## Save forecasts

Before saving the forecasts, we extend the two data frames `y_predict_val_st` and `y_predict_test_st` by the columns `time` and `type`, indicating the respective month for which the forecasts were generated and the method with which they were generated (in this case: 'static').

In [13]:
# Create forecasts folder
route0 = "./forecasts"

if not os.path.exists(route0):
    os.mkdir(route0)

In [14]:
# Map each row index (0..23) to its month label.
date_dict = {0: '09-2019', 1: '10-2019', 2: '11-2019', 3: '12-2019', 4: '01-2020', 5: '02-2020', 6: '03-2020',
             7: '04-2020', 8: '05-2020', 9: '06-2020', 10: '07-2020', 11: '08-2020', 12: '09-2020', 13: '10-2020',
             14: '11-2020', 15: '12-2020', 16: '01-2021', 17: '02-2021', 18: '03-2021', 19: '04-2021', 20: '05-2021',
             21: '06-2021', 22: '07-2021', 23: '08-2021'}
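
As an alternative to typing the mapping by hand, the same month labels could be generated with `pd.date_range`. This is a sketch under the assumption that the series starts in 09-2019 and has 24 consecutive monthly observations. The name `date_dict_generated` is illustrative.

```python
import pandas as pd

# 24 consecutive month starts beginning with September 2019.
months = pd.date_range(start="2019-09-01", periods=24, freq="MS")

# Map row position -> 'MM-YYYY' label.
date_dict_generated = dict(enumerate(months.strftime("%m-%Y")))

print(date_dict_generated[0], date_dict_generated[23])
```

Generating the labels avoids typing errors and adapts easily if the number of observations changes.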

### Validation set

In [15]:
y_predict_val_st['time'] = y_predict_val_st.index.map(date_dict)
y_predict_val_st['type'] = 'static'

In [16]:
print("saving file corresponding to y_predict_val_st.pkl")
y_predict_val_st.to_pickle(f"{route0}/y_predict_val_st.pkl")

saving file corresponding to y_predict_val_st.pkl


In [17]:
y_predict_val_st = pd.read_pickle(f"{route0}/y_predict_val_st.pkl") 
y_predict_val_st

territory,Blekinge,Blekinge ONCO,Dalarna,Dalarna ONCO,Gävleborg-Gävle,Gävleborg-Gävle ONCO,Halland-Halmstad,Halland-Halmstad ONCO,Halland-Varberg-Falkenberg,Jämtland,...,Västra Götaland-Skövde,Västra Götaland-SÄS ONCO,Västra Götaland-Uddevalla,Örebro-Örebro,Örebro-Örebro ONCO,Östergötland-Linköping,Östergötland-Linköping ONCO,Östergötland-Norrköping,time,type
15,120.0,229.0,35.0,185.0,21.0,850.0,85.0,146.0,57.0,14.0,...,35.0,1031.0,85.0,85.0,309.0,80.0,571.0,0.0,12-2020,static
16,142.0,149.0,42.0,90.0,21.0,381.0,120.0,161.0,21.0,21.0,...,21.0,1294.0,170.0,43.0,185.0,66.0,631.0,14.0,01-2021,static
17,142.0,144.0,42.0,38.0,64.0,382.0,120.0,0.0,85.0,14.0,...,35.0,1205.0,170.0,128.0,426.0,94.0,510.0,0.0,02-2021,static


### Test set

In [18]:
y_predict_test_st['time'] = y_predict_test_st.index.map(date_dict)
y_predict_test_st['type'] = 'static'

In [19]:
print("saving file corresponding to y_predict_test_st.pkl")
y_predict_test_st.to_pickle(f"{route0}/y_predict_test_st.pkl")

saving file corresponding to y_predict_test_st.pkl


In [20]:
y_predict_test_st = pd.read_pickle(f"{route0}/y_predict_test_st.pkl") 
y_predict_test_st

territory,Blekinge,Blekinge ONCO,Dalarna,Dalarna ONCO,Gävleborg-Gävle,Gävleborg-Gävle ONCO,Halland-Halmstad,Halland-Halmstad ONCO,Halland-Varberg-Falkenberg,Jämtland,...,Västra Götaland-Skövde,Västra Götaland-SÄS ONCO,Västra Götaland-Uddevalla,Örebro-Örebro,Örebro-Örebro ONCO,Östergötland-Linköping,Östergötland-Linköping ONCO,Östergötland-Norrköping,time,type
18,106.0,90.0,7.0,96.0,43.0,221.0,21.0,108.0,64.0,35.0,...,21.0,1038.0,85.0,64.0,112.0,84.0,567.0,21.0,03-2021,static
19,149.0,185.0,50.0,339.0,64.0,400.0,99.0,161.0,43.0,43.0,...,28.0,1630.0,149.0,128.0,274.0,43.0,277.0,0.0,04-2021,static
20,170.0,185.0,28.0,248.0,64.0,387.0,57.0,257.0,43.0,35.0,...,35.0,903.0,213.0,149.0,200.0,35.0,603.0,0.0,05-2021,static
21,142.0,145.0,14.0,217.0,94.0,199.0,71.0,19.0,30.0,43.0,...,78.0,1281.0,149.0,106.0,216.0,35.0,555.0,0.0,06-2021,static
22,248.0,248.0,64.0,138.0,85.0,181.0,123.0,232.0,21.0,64.0,...,21.0,1351.0,201.0,106.0,515.0,78.0,689.0,0.0,07-2021,static
23,135.0,421.0,50.0,242.0,64.0,415.0,0.0,90.0,21.0,64.0,...,0.0,1219.0,137.0,64.0,229.0,99.0,695.0,0.0,08-2021,static
