# Evaluation - Notebook

This notebook evaluates the implemented models and, where feasible, tunes the hyperparameters of each one. The following models are evaluated:

1. Baseline
2. Moving Average
3. Various variants of linear regression
4. Neural Network
5. XGBoost
6. LSTM

For a more detailed description of each model, please refer to its corresponding section.

In [1]:
# TODO: look into pipeline framework for hyperparameter tuning
# ###: 1. try different window sizes for the models (where possible) (Part of hyperparameter tuning)
# ###: 2. discount rate for moving average
# ###: 3. alpha for lasso and ridge regression
# ###: 4. polynomial features for linear regression 
# ###: 5. Different Features (difficult)
# TODO: Rewrite loading.py file
# TODO: try the transfer learning approach
# TODO: implement command line interface
# TODO: implement logging
# TODO: Documentation 
# TODO: Hand-in

In [2]:
# imports
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.model_selection import TimeSeriesSplit

import sys

sys.path.append("..")
sys.path.append("../src")

# from src.DataHandling.processing import supervised_transform
import src.DataHandling.visualization as vis

# models
from src.Models.ma import MovingAverage
from src.Models.lr import Regression

# Transformers
from src.DataHandling.preprocessing import (
    # DataCleaner,
    SupervisedTransformer,
    train_test_split,
)

# hyperparameter tuning
from src.Models.selection import GridSearch

In [3]:
# load cleaned data
turbine_brit = pd.read_csv(f"../data/cleaned/turbine_brit_{2}.csv")
turbine_braz = pd.read_csv(f"../data/cleaned/turbine_braz_{1}.csv")

# use date column as index and convert to datetime
turbine_brit["Date"] = pd.to_datetime(turbine_brit["Date"])
turbine_brit.set_index("Date", inplace=True)

turbine_braz["Date"] = pd.to_datetime(turbine_braz["Date"])
turbine_braz.set_index("Date", inplace=True)

DATA = {"British": turbine_brit, "Brazilian": turbine_braz}

For clarity: the models inherently transform the time series into a supervised learning problem.

Before we begin, note that some parameters, such as window_size, are shared across models.
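To make the supervised framing concrete, the following is a minimal sketch of a sliding-window transformation. The helper `make_supervised` is hypothetical and is not the project's `SupervisedTransformer`; it assumes that `window_size` is the number of past observations per sample and that `horizon` is the number of steps between the last observation in the window and the target.

```python
import numpy as np


def make_supervised(series, window_size, horizon):
    """Hypothetical helper: turn a 1-D series into (X, y) pairs.

    Each row of X holds `window_size` consecutive observations. The matching
    entry of y is the value `horizon` steps after the last observation in
    that row. This is an assumed interpretation of the two parameters.
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - window_size - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window_size and horizon")

    # Stack one window per sample, shifting the start index by one each time.
    X = np.stack([values[i : i + window_size] for i in range(n_samples)])

    # The first target sits `horizon` steps after the end of the first window.
    first_target = window_size + horizon - 1
    y = values[first_target : first_target + n_samples]
    return X, y


# Toy series 0..9: the window [0, 1, 2] is paired with the target 4 (two steps ahead).
X, y = make_supervised(np.arange(10), window_size=3, horizon=2)
```

Under these assumptions, a larger `horizon` or `window_size` reduces the number of usable samples, which is worth keeping in mind when comparing grid-search candidates.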

In [4]:
benchmarks = pd.read_csv("../results_wind.csv")
test_start = benchmarks["test_start"][0]
test_end = benchmarks["test_end"][0]


X_train, y_train, X_test, y_test = train_test_split(
    test_start=test_start, test_end=test_end, df=turbine_brit, target_var="Power (kW)"
)

In [5]:
param_grid = {
    "st__horizon": [1, 6, 144],
    "st__window_size": [1, 6, 10],
}
pipeline = Pipeline(
    [
        ("st", SupervisedTransformer()),
        ("Model", MovingAverage()),
    ]
)

## 2. Moving Average

**Explanation of the Model:** 

Unlike the Baseline model, the Moving Average model allows a window size to be chosen, which determines how many preceding time steps are used to generate a forecast. An optional discount factor (default = 1) controls how quickly the influence of more distant time steps decays, much as a discount factor weights future rewards in reinforcement learning. With a window size of one, the Moving Average model is identical to the Baseline model.
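The description above can be illustrated with a short sketch. The function `discounted_moving_average` is hypothetical and is not the project's `MovingAverage` class; it assumes that an observation `k` steps before the most recent one receives the weight `discount ** k` and that the weights are normalized to sum to one.

```python
import numpy as np


def discounted_moving_average(window, discount=1.0):
    """Hypothetical sketch of a discounted moving-average forecast.

    `window` holds the most recent observations in chronological order.
    Assumption: weight = discount ** age, where age 0 is the newest value,
    and the weights are normalized.
    """
    values = np.asarray(window, dtype=float)
    # Age of each observation: the last element is the most recent (age 0).
    ages = np.arange(len(values))[::-1]
    weights = discount ** ages
    return float(np.sum(weights * values) / np.sum(weights))


plain_mean = discounted_moving_average([1.0, 2.0, 3.0])               # discount = 1
discounted = discounted_moving_average([1.0, 2.0, 3.0], discount=0.5)
baseline_like = discounted_moving_average([5.0], discount=0.5)        # window size 1
```

With `discount=1` the forecast reduces to the plain mean of the window, and with a single-element window it returns the last observed value regardless of the discount, which matches the equivalence with the Baseline model noted above.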

In [6]:
param_grid["Model__discount"] = np.linspace(0.1, 1., 10)

grid_search = GridSearch(pipeline, param_grid)
grid_search.fit(X_train, y_train, X_test, y_test)

In [7]:
grid_search.best_params

{'Model__discount': 0.2, 'st__horizon': 1, 'st__window_size': 10}
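The keys in the result follow the scikit-learn `step__parameter` naming convention, so `st__horizon` refers to the `horizon` parameter of the pipeline step named `st`. The sketch below enumerates the candidate combinations in the grid and groups a parameter set by step. It assumes an exhaustive search over the Cartesian product; whether the project's `GridSearch` works this way is not confirmed here, and `split_by_step` is a hypothetical helper.

```python
from itertools import product

import numpy as np

# Same grid as defined in the cells above.
param_grid = {
    "st__horizon": [1, 6, 144],
    "st__window_size": [1, 6, 10],
    "Model__discount": np.linspace(0.1, 1.0, 10),
}

# Cartesian product of all value lists: one dict per candidate configuration.
names = list(param_grid)
candidates = [
    dict(zip(names, combo))
    for combo in product(*(param_grid[name] for name in names))
]
n_candidates = len(candidates)  # 3 * 3 * 10 combinations


def split_by_step(params):
    """Hypothetical helper: group 'step__param' keys by pipeline step name."""
    grouped = {}
    for key, value in params.items():
        step, _, param = key.partition("__")
        grouped.setdefault(step, {})[param] = value
    return grouped


best = {"Model__discount": 0.2, "st__horizon": 1, "st__window_size": 10}
best_by_step = split_by_step(best)
```

Because the number of candidates grows multiplicatively with each added parameter, it can help to check `n_candidates` before launching a search.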

## 3. Different kinds of Linear Regression