# 5.- Future Forecasting

> Important source: https://www.kaggle.com/code/ahmedabdulhamid/recursive-multistep-time-series-forecasting

## Sequence Length 6 and Prediction Length 3

| Model | MAE | RMSE | sMAPE | rRMSE |
|-------|-----|------|-------|-------|
| Transformer | 0.6057 | 1.6401 | 57.1570 | 15.8035 |
| Autoformer | 0.7611 | 2.1966 | 58.9506 | 21.1662 |
| Reformer | 0.6895 | 2.1401 | 57.6583 | 20.6216 |


## Sequence Length 4 and Prediction Length 3

| Model | MAE | RMSE | sMAPE | rRMSE |
|-------|-----|------|-------|-------|
| Reformer | 0.5098 | 1.4642 | 56.0617 | 14.1087 |
| Transformer | 0.4832 | 1.2848 | 56.0317 | 12.3799 |
| Autoformer | 0.6769 | 1.9190 | 58.3377 | 18.4909 |
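
The notebook does not state how the four metrics in these tables are computed. The sketch below uses common textbook definitions and is only an illustration: the helper name `forecast_metrics`, the `eps` guard, and in particular the choice to normalise rRMSE by the mean absolute target are assumptions, not the original evaluation code. In both tables the rRMSE values are a constant multiple of the RMSE values (about 9.64), which fits normalisation by a fixed statistic of the targets, though it does not say which one.

```python
import numpy as np

def forecast_metrics(y_true, y_pred, eps=1e-8):
    """Return MAE, RMSE, sMAPE (%) and rRMSE (%) under assumed definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Symmetric MAPE in percent; eps avoids 0/0 when both values are zero.
    smape = 100.0 * np.mean(2.0 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred) + eps))
    # Relative RMSE in percent. Assumption: normalised by the mean absolute target.
    rrmse = 100.0 * rmse / (np.mean(np.abs(y_true)) + eps)
    return {"MAE": mae, "RMSE": rmse, "sMAPE": smape, "rRMSE": rrmse}

# Toy example: two perfect predictions and one that is off by 2.
m = forecast_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(m)
```

With many zero targets, sMAPE stays high even when MAE is small, because every non-zero prediction against a zero target contributes the maximum of 200% to the average.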

## Prediction Analysis

In [78]:
import os
from io import StringIO

import numpy as np
import pandas as pd
import torch

In [61]:
FOLDER = "../models/predictions/"
predictions = {}
for file in os.listdir(FOLDER):
    if file.endswith(".pt"):
        filepath = os.path.join(FOLDER, file)
        data = torch.load(filepath)
        data_np = data.numpy()
        predictions[file] = data_np
        print(f"File: {file}, Shape: {data_np.shape}")

File: future_predictions_seq6_pred3_Transformer.pt, Shape: (1, 6, 274)
File: future_predictions_seq4_pred3_Transformer.pt, Shape: (1, 6, 274)
File: future_predictions_seq6_pred3_Autoformer.pt, Shape: (1, 6, 274)
File: future_predictions_seq4_pred3_Autoformer.pt, Shape: (1, 6, 274)
File: future_predictions_seq6_pred3_Reformer.pt, Shape: (1, 6, 274)
File: future_predictions_seq4_pred3_Reformer.pt, Shape: (1, 6, 274)


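At this point, we have generated **6-month** predictions for the three best-performing models (**Transformer**, **Autoformer**, and **Reformer**), each trained with `seq_len` = 6 and `pred_len` = 3, and with `seq_len` = 4 and `pred_len` = 3. This gives six **tensors**, each with shape `(1, 6, 274)`, that is, `(batch, horizon, num_features)`. The 6-step horizon is twice `pred_len`, which is consistent with the recursive multi-step approach described in the source above. Each of the 274 features corresponds to one row of the dataset loaded in the next step.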
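
The forecasting loop that produced these tensors is not shown in this section. As a minimal sketch of recursive multi-step forecasting, assuming a model that maps a `(seq_len, num_features)` window to `(pred_len, num_features)` predictions, the idea can be illustrated as follows. `recursive_forecast`, `model_fn`, and `naive_model` are hypothetical names, and the persistence model stands in for the trained Transformer, Autoformer, or Reformer.

```python
import numpy as np

def recursive_forecast(model_fn, history, seq_len, pred_len, horizon):
    """Roll a pred_len-step model forward until `horizon` steps are covered.

    history: array of shape (T, num_features); returns (horizon, num_features).
    """
    window = np.asarray(history, dtype=float)[-seq_len:]
    chunks = []
    steps_done = 0
    while steps_done < horizon:
        step = model_fn(window)  # shape (pred_len, num_features)
        chunks.append(step)
        # Feed the predictions back in as the newest observations.
        window = np.concatenate([window, step], axis=0)[-seq_len:]
        steps_done += pred_len
    return np.concatenate(chunks, axis=0)[:horizon]

def naive_model(window, pred_len=3):
    # Hypothetical stand-in for a trained model: repeat the last observed row.
    return np.repeat(window[-1:], pred_len, axis=0)

history = np.arange(12 * 274, dtype=float).reshape(12, 274)  # 12 months, 274 series
forecast = recursive_forecast(naive_model, history, seq_len=6, pred_len=3, horizon=6)
batch = forecast[np.newaxis, ...]  # add the batch dimension
print(batch.shape)
```

Because each pass consumes earlier predictions as inputs, errors can accumulate over the horizon, so the later forecast months deserve more caution than the first three.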


### Join dataset with predictions

In [62]:
dataset = pd.read_csv("../data/green_skill_classification/data_for_timeseries.csv")
dataset.shape

(274, 14)

In [63]:
predictions = {"Transformer" : {
        "seq_len_6_pred_len_3": torch.load(os.path.join(FOLDER, "future_predictions_seq6_pred3_Transformer.pt")).numpy(),
        "seq_len_4_pred_len_3": torch.load(os.path.join(FOLDER, "future_predictions_seq4_pred3_Transformer.pt")).numpy()
    }, 
    "Autoformer" : {
        "seq_len_6_pred_len_3": torch.load(os.path.join(FOLDER, "future_predictions_seq6_pred3_Autoformer.pt")).numpy(),
        "seq_len_4_pred_len_3": torch.load(os.path.join(FOLDER, "future_predictions_seq4_pred3_Autoformer.pt")).numpy()
    },
    "Reformer" : {
        "seq_len_6_pred_len_3": torch.load(os.path.join(FOLDER, "future_predictions_seq6_pred3_Reformer.pt")).numpy(),
        "seq_len_4_pred_len_3": torch.load(os.path.join(FOLDER, "future_predictions_seq4_pred3_Reformer.pt")).numpy()
    }
}

predictions


{'Transformer': {'seq_len_6_pred_len_3': array([[[0.62443626, 0.77204406, 0.        , ..., 0.43774697,
           0.        , 0.        ],
          [0.16925685, 0.2686366 , 0.        , ..., 0.935793  ,
           2.0052707 , 1.5548707 ],
          [0.        , 0.4417882 , 0.00557862, ..., 0.39964044,
           1.147005  , 0.41140926],
          [0.68071425, 0.5803419 , 0.        , ..., 0.48165452,
           0.15721042, 0.        ],
          [0.15612577, 0.25465474, 0.        , ..., 0.9210227 ,
           2.1698883 , 1.3665144 ],
          [0.        , 0.5010111 , 0.02302299, ..., 0.59912235,
           1.4363145 , 0.7879764 ]]], shape=(1, 6, 274), dtype=float32),
  'seq_len_4_pred_len_3': array([[[0.58020693, 0.5535207 , 0.        , ..., 0.22261715,
           0.897269  , 0.        ],
          [0.49364397, 0.24448545, 0.        , ..., 1.1946288 ,
           2.8610911 , 1.9264534 ],
          [0.        , 0.61960006, 0.        , ..., 0.36615238,
           1.1885024 , 0.        ],


In [65]:
# region_id,skill_id,2024-07,2024-08,2024-09,2024-10,2024-11,2024-12,2025-01,2025-03,2025-04,2025-05,2025-06,2025-07
SAVE_ON = "../data/predictions/"

def create_future_dataframe() -> pd.DataFrame:
    new_dataframe = pd.DataFrame(columns=["region_id", "skill_id", "2024-07", "2024-08", "2024-09", "2024-10", "2024-11", "2024-12",
                                      "2025-01", "2025-03", "2025-04", "2025-05", "2025-06", "2025-07",
                                      "2025-08", "2025-09", "2025-10", "2025-11", "2025-12", "2026-01"])
    return new_dataframe

os.makedirs(SAVE_ON, exist_ok=True)  # make sure the output folder exists

for model_name, model_preds in predictions.items():
    for model_key, preds in model_preds.items():
        new_frame = create_future_dataframe()

        # Each feature index corresponds to one dataset row (region_id, skill_id, history).
        for feature_idx in range(preds.shape[2]):
            row = list(dataset.iloc[feature_idx])

            # Append the 6 forecast months, rounded to one decimal.
            for month_idx in range(preds.shape[1]):
                row.append(round(float(preds[0, month_idx, feature_idx]), 1))

            new_frame.loc[feature_idx] = row

        output_filepath = os.path.join(SAVE_ON, f"future_predictions_{model_name}_{model_key}_6months.csv")
        new_frame.to_csv(output_filepath, index=False)
        print(f"Saved predictions to {output_filepath}")

Saved predictions to ../data/predictions/future_predictions_Transformer_seq_len_6_pred_len_3_6months.csv
Saved predictions to ../data/predictions/future_predictions_Transformer_seq_len_4_pred_len_3_6months.csv
Saved predictions to ../data/predictions/future_predictions_Autoformer_seq_len_6_pred_len_3_6months.csv
Saved predictions to ../data/predictions/future_predictions_Autoformer_seq_len_4_pred_len_3_6months.csv
Saved predictions to ../data/predictions/future_predictions_Reformer_seq_len_6_pred_len_3_6months.csv
Saved predictions to ../data/predictions/future_predictions_Reformer_seq_len_4_pred_len_3_6months.csv
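
The row-by-row loop above can also be written as one transpose followed by a column-wise concatenation. The sketch below is an alternative under stated assumptions, not the code that produced the saved files: `append_forecast_columns` and the toy frames are hypothetical, and `np.round` may differ from Python's `round` on exact ties.

```python
import numpy as np
import pandas as pd

def append_forecast_columns(history, preds, future_months, decimals=1):
    """Append forecasts of shape (1, horizon, num_series) as new month columns."""
    # Drop the batch axis and transpose so that each row is one series.
    block = np.round(preds[0].T.astype(float), decimals)  # (num_series, horizon)
    future = pd.DataFrame(block, columns=future_months, index=history.index)
    return pd.concat([history, future], axis=1)

# Toy data: 2 series with a 3-month horizon.
toy_history = pd.DataFrame({"region_id": [1, 1], "skill_id": [10, 11], "2025-07": [2, 0]})
toy_preds = np.array([[[0.26, 1.04], [0.5, 0.0], [1.96, 3.14]]], dtype=np.float32)
toy_out = append_forecast_columns(toy_history, toy_preds, ["2025-08", "2025-09", "2025-10"])
print(toy_out)
```

This version relies on the same convention as the loop: feature index `i` in the tensor must line up with row `i` of the historical dataset.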


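## Growth rate calculation

Based on the generated predictions, we can analyze which skills are projected to change the most over the next **6 months**. Two simple measures are used, where $point_{start}$ is the last observed month (`2025-07`) and $point_{end}$ is the last predicted month (`2026-01`):

\begin{equation*}
    \text{growth rate (\%)} = \frac{point_{end} - point_{start}}{point_{start}} \times 100
\end{equation*}

\begin{equation*}
    \text{log growth} = \ln\left(\frac{point_{end} + \epsilon}{point_{start} + \epsilon}\right)
\end{equation*}

The growth rate is undefined when $point_{start} = 0$, whereas the small constant $\epsilon$ keeps the log growth finite in that case. The code that follows also multiplies the log growth by 100.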
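
As a small worked example of the two measures, the sketch below evaluates them on three toy series. The helper names `growth_rate_pct` and `log_growth` and the toy values are illustrative assumptions. The value of `EPS` matches the constant used in the notebook cell, and the log growth is left unscaled as in the equation.

```python
import numpy as np

EPS = 1e-6  # same smoothing constant as in the notebook cell

def growth_rate_pct(start, end):
    """Percentage change from start to end; NaN where start is 0."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    out = np.full(start.shape, np.nan)
    # Divide only where the start value is non-zero; other entries stay NaN.
    np.divide(end - start, start, out=out, where=start != 0)
    return out * 100.0

def log_growth(start, end, eps=EPS):
    """Natural log of the smoothed end/start ratio (unscaled)."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    return np.log((end + eps) / (start + eps))

start = np.array([2.0, 1.0, 0.0])
end = np.array([3.0, 0.5, 1.0])

pct = growth_rate_pct(start, end)  # [50., -50., nan]
lg = log_growth(start, end)
print(pct, lg)
```

The third series shows the trade-off. The percentage growth is undefined for a zero start, while the log growth is finite but very large, because it is dominated by `EPS`. Such rows are probably better read as "newly appearing" than ranked by magnitude.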

In [122]:
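FOLDER = "../data/predictions/"
EPS = 1e-6  # small constant that keeps the log ratio finite when a value is 0

for file in os.listdir(FOLDER):
    if file.endswith(".csv"):
        filepath = os.path.join(FOLDER, file)
        data = pd.read_csv(filepath)

        # Percentage change from the last observed month (2025-07) to the last
        # predicted month (2026-01); undefined (NaN) when the start value is 0.
        data["growth_rate"] = np.where(
            data["2025-07"] == 0,
            np.nan,
            round(((data["2026-01"] - data["2025-07"]) / data["2025-07"]) * 100, 2)
        )

        # Log growth, scaled by 100.
        data["log_growth_rate"] = np.where(
            data["2025-07"] + EPS <= 0,
            np.nan,
            round(np.log((data["2026-01"] + EPS) / (data["2025-07"] + EPS)) * 100, 2)
        )

        # Sort by growth rate (NaN rows go last) and overwrite the CSV in place.
        data.sort_values(by="growth_rate", ascending=False, inplace=True)
        data.to_csv(filepath, index=False)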


Analysis of highest growth rates per model and configuration.

In [109]:
FOLDER = "../data/predictions/"
MAPPING_PATH = "../data/green_skill_classification/mapping/map_skills.json"

def print_top_growth_rates(file_path: str, top_n: int = 5) -> None:
    # The CSV files are already sorted by growth_rate (descending),
    # so the first rows are the top growth rates.
    data = pd.read_csv(file_path).head(top_n)

    # Wrap the JSON text in StringIO: passing a literal JSON string to
    # pd.read_json is deprecated and emits a FutureWarning.
    with open(MAPPING_PATH, "r") as f:
        skill_map = pd.read_json(StringIO(f.read()), typ="series").to_dict()
    data["esco_skill_name"] = data["skill_id"].map(skill_map)

    print(f"Top {top_n} growth rates for {os.path.basename(file_path)}:")
    print(data[["esco_skill_name", "skill_id", "growth_rate"]])
    print("\n")

for file in os.listdir(FOLDER):
    if file.endswith(".csv"):
        print_top_growth_rates(os.path.join(FOLDER, file))

Top 5 growth rates for future_predictions_Autoformer_seq_len_6_pred_len_3_6months.csv:
                                     esco_skill_name  skill_id  growth_rate
0  ensure responsible sourcing in food supply chains     124.0       2160.0
1    ensure efficient utilisation of warehouse space     123.0        702.5
2               identify new recycling opportunities     140.0        670.0
3  ensure compliance with environmental legislati...     120.0        490.0
4           recognise the hazards of dangerous goods     244.0        470.0


Top 5 growth rates for future_predictions_Transformer_seq_len_6_pred_len_3_6months.csv:
                                     esco_skill_name  skill_id  growth_rate
0  ensure responsible sourcing in food supply chains     124.0      1220.00
1              evaluate vehicle ecological footprint     128.0       720.00
2               identify new recycling opportunities     140.0       430.00
3  perform cleaning activities in an environmenta...     217.0 



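## Split into high-frequency and low-frequency skills

This split follows the **Job-SDF** definition: "*we define low-frequency skills as those that appear fewer than twice in the time slices of the training set*". Here, a skill "appears" in a time slice when its monthly value is greater than 0.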
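
As a toy illustration of this rule, the sketch below counts the time slices in which each skill appears and labels it accordingly. The helper `label_frequency` and the three example skills are hypothetical and only show how the threshold of 2 behaves.

```python
import pandas as pd

def label_frequency(counts, threshold=2):
    """Label each row by the number of time slices with a positive count."""
    active_slices = (counts > 0).sum(axis=1)
    return active_slices.map(
        lambda n: "high-frequency" if n >= threshold else "low-frequency"
    )

toy_counts = pd.DataFrame(
    {"2024-07": [1, 0, 0], "2024-08": [0, 0, 0], "2024-09": [3, 0, 1]},
    index=["skill_a", "skill_b", "skill_c"],
)
labels = label_frequency(toy_counts)
print(labels)
```

Here `skill_a` appears in two slices and is high-frequency, while `skill_b` (never) and `skill_c` (once) are low-frequency. The rule depends on how many months a skill appears in, not on how large its counts are.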

In [None]:
def separate_high_low_frequency_skills(file_path: str, threshold: int = 2) -> pd.DataFrame:
    """Label each row as high- or low-frequency from its monthly values."""
    data = pd.read_csv(file_path)

    # Positional columns 2 to 12 (the end of an iloc slice is exclusive):
    # the first 11 monthly columns after region_id and skill_id.
    df_split = data.iloc[:, 2:13].copy()

    # Number of months in which the skill appears (value > 0).
    active_months = (df_split > 0).sum(axis=1)

    df_split["frequency_type"] = np.where(
        active_months >= threshold, "high-frequency", "low-frequency"
    )
    return df_split



    2024-07  2024-08  2024-09  2024-10  2024-11  2024-12  2025-01  2025-03  \
0         1        0        3        1        2        2        1        1   
1         2        1        1        1        1        1        0        0   
2         0        0        0        0        1        1        0        0   
3         0        2        1        0        1        5        2        7   
4         2        0        0        0        1        1        0        2   
5         0        2        1        1        1        3        4        0   
6         0        0        0        1        2        2        0        1   
7         0        0        0        0        0        0        0        1   
8         3        3        2        0        1        5        3        4   
9         1        5        1        0        2        0        4        7   
10        0        1        1        0        0        0        0        0   
11        0        0        0        0        0        0        