## 3.2 Benchmark results

In this notebook, we compare the performance of all models and their variants using the evaluation CSV produced by the benchmark.

## 1. Environment configuration
To **import the Python modules created locally in this project**, we add the project's source directory to `sys.path`:

In [1]:
import sys
from pathlib import Path

# directory the notebook runs from (current working directory)
notebook_path = Path().resolve()

# path to the project root (path to ppf)
project_path = notebook_path.parents[0]

# path to the directory that contains the project's source code (path to ppf/src/ppf/)
code_project_path = project_path / 'src' / 'ppf'

# add the source directory to the module search path so local modules can be imported
sys.path.insert(0, str(code_project_path))

# Output directory of benchmark
DIR = "../outputs"

## 2. Compare the performance of all models and their variants

### 2.1. Do the model errors follow a normal distribution?

Next, we check whether the models' errors, measured as the deviation from the start date (`start_dev`) and from the end date (`end_dev`), follow a normal distribution. To do this, we wrote a helper that randomly selects one model from the `model_name` column of the dataset and produces two plots for each error column: a histogram, which shows the overall shape of the error distribution, and a QQ-plot, which compares the sample quantiles with those of a theoretical normal distribution and so gives a visual check of how well the data fit normality.

In [2]:
from plot_random_model_distribution import plot_random_model_distribution

plot_random_model_distribution(
    csv_path=DIR + "/evaluations/mps_evaluation_percentage_2000-2022_-182_365_7_0.025_0.975.csv",
    columnas=["start_dev", "end_dev"],
    save_path=DIR + "/distribution/",
)

Randomly selected model: chronos.conservative.bolt.small
Plot saved to: ../outputs/distribution/chronos.conservative.bolt.small_start_dev.png
Plot saved to: ../outputs/distribution/chronos.conservative.bolt.small_end_dev.png


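**The errors do not follow a normal distribution**, so the comparisons that follow use non-parametric tests (Kruskal-Wallis and Mann-Whitney U).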
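The visual check can be complemented with a numerical one. The following is a minimal sketch, not the project's `plot_random_model_distribution` helper: it assumes the errors of one model are available as a 1-D array and uses SciPy's Shapiro-Wilk test together with the correlation coefficient of the QQ-plot fit. The function name `normality_summary` and the synthetic samples are hypothetical and only serve the illustration.

```python
import numpy as np
from scipy import stats


def normality_summary(errors, alpha=0.05):
    """Return Shapiro-Wilk and QQ-fit diagnostics for a 1-D sample of errors.

    Hypothetical helper for illustration; not part of the project code.
    """
    x = np.asarray(errors, dtype=float)
    x = x[~np.isnan(x)]  # drop missing deviations

    # Shapiro-Wilk: the null hypothesis is that the sample is normal
    w_stat, p_value = stats.shapiro(x)

    # probplot computes the QQ-plot points and a least-squares line;
    # r close to 1 means the points lie close to a straight line
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")

    return {
        "n": int(x.size),
        "shapiro_w": float(w_stat),
        "p_value": float(p_value),
        "qq_r": float(r),
        "looks_normal": bool(p_value >= alpha),
    }


# Synthetic data only: one roughly normal sample and one strongly skewed sample
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=5.0, size=300)
skewed_sample = rng.exponential(scale=5.0, size=300)

print(normality_summary(normal_sample))
print(normality_summary(skewed_sample))
```

With real data, the array would come from the `start_dev` or `end_dev` column filtered by `model_name`. A small Shapiro-Wilk p-value is evidence against normality, which agrees with a QQ-plot whose points bend away from the reference line.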

### 2.1 Run the Kruskal-Wallis test

In [3]:
from statistical_comparisons import kruskal_wallis_by_models

columnas_a_comparar = ["start_dev", "end_dev"]

res = kruskal_wallis_by_models(
    evaluation_path=DIR + "/evaluations/mps_evaluation_percentage_2000-2022_-182_365_7_0.025_0.975.csv",
    model_names_include=["moirai", "chronos", "nhits", "nbeats"],
    columnas_a_comparar=columnas_a_comparar,
    uses_covariates=True
)

for col in columnas_a_comparar:
    if res[col]["p_value"] is None:
        print(f"[{col}] Could not be computed: {res[col]['mensaje']}")
    elif res[col]["p_value"] < 0.05:
        print(f"[{col}] Significant differences (p = {res[col]['p_value']:.4f})")
    else:
        print(f"[{col}] No significant differences (p = {res[col]['p_value']:.4f})")


[start_dev] No significant differences (p = 0.3076)
[end_dev] Significant differences (p = 0.0000)
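
For reference, the core of a Kruskal-Wallis comparison can be sketched with `scipy.stats.kruskal`. This is an illustration under stated assumptions, not the implementation of `kruskal_wallis_by_models`: the helper `kruskal_by_group`, the model names, and the synthetic samples are all hypothetical, and the sketch assumes one array of errors per model.

```python
import numpy as np
from scipy import stats


def kruskal_by_group(groups):
    """Run a Kruskal-Wallis H-test across the samples in `groups`.

    `groups` maps a group name to a 1-D array of errors.
    Hypothetical helper for illustration; not part of the project code.
    """
    samples = [np.asarray(values, dtype=float) for values in groups.values()]
    if len(samples) < 2:
        # the test needs at least two groups to compare
        return {"h": None, "p_value": None}
    h_stat, p_value = stats.kruskal(*samples)
    return {"h": float(h_stat), "p_value": float(p_value)}


# Synthetic data only: two groups share a distribution, the third is shifted
rng = np.random.default_rng(1)
toy = {
    "model_a": rng.normal(0.0, 1.0, 200),
    "model_b": rng.normal(0.0, 1.0, 200),
    "model_c": rng.normal(3.0, 1.0, 200),
}

result = kruskal_by_group(toy)
print(f"H = {result['h']:.2f}, p = {result['p_value']:.3g}")
```

A p-value printed with the `:.4f` format as `0.0000` only means that p is below 0.00005, not that it is exactly zero; a format such as `:.3g` shows its order of magnitude. A significant Kruskal-Wallis result says that at least one group differs, but not which one, which is why pairwise tests are needed afterwards.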


### 2.2 Generate boxplots

In [4]:
from test_all_models import generate_boxplots

generate_boxplots(
    evaluation_path=DIR + "/evaluations/mps_evaluation_percentage_2000-2022_-182_365_7_0.025_0.975.csv",
    output_dir=DIR + "/boxplots/",
)

NameError: name 'matplotlib' is not defined
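
The `NameError` above means that the name `matplotlib` was used somewhere without being imported first; the traceback shown here does not say where, so the fix has to be made wherever that reference lives. As a minimal sketch of a boxplot routine with its imports in place (an assumption about what such a routine could look like, not the code of `generate_boxplots`), the hypothetical helper `save_boxplot` below draws one box per model from synthetic data and writes the figure to a temporary directory.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt
import numpy as np


def save_boxplot(groups, column, output_dir):
    """Draw one box per group and save the figure as a PNG.

    Hypothetical helper for illustration; not part of the project code.
    """
    os.makedirs(output_dir, exist_ok=True)
    names = list(groups)

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.boxplot([groups[name] for name in names])
    # boxes are drawn at positions 1..n, so label those ticks with the group names
    ax.set_xticks(list(range(1, len(names) + 1)))
    ax.set_xticklabels(names, rotation=45, ha="right")
    ax.set_ylabel(column)
    fig.tight_layout()

    out_path = os.path.join(output_dir, f"{column}_boxplot.png")
    fig.savefig(out_path)
    plt.close(fig)  # release the figure's memory
    return out_path


# Synthetic data only
rng = np.random.default_rng(2)
toy_groups = {
    "model_a": rng.normal(0.0, 5.0, 100),
    "model_b": rng.normal(2.0, 5.0, 100),
}

saved = save_boxplot(toy_groups, "end_dev", tempfile.mkdtemp())
print(saved)
```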

### 2.3 Generate Mann-Whitney U tables with Holm-Bonferroni correction

In [None]:
from statistical_comparisons import compare_independent_mw

EVAL_CSV = DIR + "/evaluations/mps_evaluation_percentage_2000-2022_-182_365_7_0.025_0.975.csv"
MW_DIR = DIR + "/mannwhitneyuwithholmbonferroni"

# === intra-model comparisons: variants within each model family ===
for family in ['chronos', 'moirai', 'nhits', 'nbeats']:
    compare_independent_mw(
        EVAL_CSV,
        [family],
        f"{MW_DIR}/{family}_comparation.png",
    )

# === inter-model comparison: selected variants across families ===
compare_independent_mw(
    EVAL_CSV,
    [
        'chronos.auto.t5',
        'moirai.deterministic.large',
        'moirai.deterministic.small',
        'moirai.stochastic.large',
        'nhits',
        'nbeats',
    ],
    f"{MW_DIR}/all_comparation.png",
)
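
The statistical procedure behind these tables can be sketched as follows. This is an illustration under stated assumptions, not the implementation of `compare_independent_mw`: the helpers `holm_adjust` and `pairwise_mannwhitney`, the model names, and the synthetic samples are hypothetical. Each pair of groups is compared with a two-sided Mann-Whitney U test, and the resulting p-values are adjusted with the Holm-Bonferroni step-down procedure.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # the k-th smallest p-value (k = 0..m-1) is multiplied by (m - k)
    scaled = p[order] * (m - np.arange(m))
    # enforce monotonicity and cap at 1
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


def pairwise_mannwhitney(groups, alpha=0.05):
    """Compare every pair of groups and apply the Holm correction.

    Hypothetical helper for illustration; not part of the project code.
    """
    pairs = list(combinations(groups, 2))
    raw = [
        stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided").pvalue
        for a, b in pairs
    ]
    adjusted = holm_adjust(raw)
    return [
        {"pair": pair, "p_raw": float(pr), "p_holm": float(pa), "significant": bool(pa < alpha)}
        for pair, pr, pa in zip(pairs, raw, adjusted)
    ]


# Synthetic data only: model_c is shifted with respect to the other two
rng = np.random.default_rng(3)
toy = {
    "model_a": rng.normal(0.0, 1.0, 100),
    "model_b": rng.normal(0.0, 1.0, 100),
    "model_c": rng.normal(3.0, 1.0, 100),
}

for row in pairwise_mannwhitney(toy):
    print(row)
```

The correction matters because the number of pairwise tests grows quadratically with the number of variants; without it, some pairs would appear significant by chance alone.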
