## ML

The concept of a backtest is crucial when evaluating a forecast or a trading strategy, because we have to reproduce how it would have behaved in the past. In this case we are working with the day-ahead electricity market.
Before OMIE's market clearing takes place, all market participants need to know the daily forecast. Traders who have the following information ahead of time can act accordingly:

- Price
- Demand
- Generation from producers

Once the day-ahead market has cleared, trading moves to the intraday market.

We try to anticipate the demand in the day-ahead market before it occurs.

### Walk-forward backtesting method

- We run the code at 11:00 on day d to produce the prediction for d+1, based on data up to d-1
- Pred days: YYYY MM DD 11:00 (the moment the prediction is launched: 11:00 every day)
- Begin forecast: 2015 12 31 11:00 (the day on which the forecast for the first day is launched)
- End forecast: 2021 12 30 11:00 (the day on which the forecast for the last day is launched)
- Step: 1 day
- Loop: iterate over the prediction times (the loop steps) from Begin forecast to End forecast
- Training frequency: once a month we run a model training task (retraining once the actual measurements are known)
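
The schedule described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's implementation: `walk_forward_schedule` and the dictionary keys are hypothetical names, and the end date is the shortened 2016-03-16 window used in the test run further down. The parameter cell below uses 09:00 rather than 11:00, which yields the same windows because only the calendar day of the prediction date matters.

```python
from datetime import datetime, timedelta

def walk_forward_schedule(begin_forecast, end_forecast, step_hours=24, training_frequency=30):
    """Yield one dict per prediction date with the windows used at that step.

    Hypothetical helper: names and structure are assumptions for illustration.
    """
    pred_date = begin_forecast
    i = 1  # 1-based step counter, as in the back-test loop
    while pred_date <= end_forecast:
        midnight_d = pred_date.replace(hour=0, minute=0, second=0, microsecond=0)
        yield {
            "pred_date": pred_date,
            # retrain on the first step and then every `training_frequency` steps
            "retrain": i == 1 or i % training_frequency == 0,
            # last hour of d-1: most recent measurement available at prediction time
            "train_end": midnight_d - timedelta(hours=1),
            # forecast window covers day d+1: [00:00 of d+1, 00:00 of d+2)
            "forecast_start": midnight_d + timedelta(days=1),
            "forecast_end": midnight_d + timedelta(days=2),
        }
        pred_date += timedelta(hours=step_hours)
        i += 1

schedule = list(walk_forward_schedule(datetime(2015, 12, 31, 11), datetime(2016, 3, 16, 11)))
first = schedule[0]
print(first["train_end"], "->", first["forecast_start"])
print(sum(s["retrain"] for s in schedule), "retrainings over", len(schedule), "prediction dates")
```

Keeping the schedule separate from the model code makes it easy to check that no training window ever reaches into the day being forecast.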

## Model without lags
- Make all the forecasts in one go (we assume the temperatures are known)

## Model with lags
- By 11:00, the actual demand for the previous day has been published
- Prediction technique: predict with feedback (recursive)
- At each pred date there is an 11-hour gap between the end of the previous day's data and the first hour we want to predict. The prediction starts at midnight of the prediction day (stepping forward from the last measurement of d-1 until it reaches day d+1). Each prediction is used as a measurement for the next one.
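
Recursive prediction with feedback, which the notebook leaves as a TODO, could look roughly like the sketch below. It is an assumption-laden illustration: `recursive_forecast` and `seasonal_naive` are hypothetical names, the 24-value lag window is an arbitrary choice, and the toy rule stands in for a trained model. The 48-step horizon reflects one reading of the notes, in which the recursion first bridges the unobserved hours of day d and then continues through day d+1.

```python
def recursive_forecast(history, predict_one, horizon, n_lags=24):
    """Predict `horizon` steps ahead, feeding each prediction back in as a lag.

    history     : past measurements, oldest first (here: hourly demand up to d-1)
    predict_one : callable mapping the current lag window to the next value
    """
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        # drop the oldest lag and treat the prediction as if it were a measurement
        window = window[1:] + [y_hat]
    return forecasts

def seasonal_naive(window):
    """Toy stand-in for a model: repeat the value seen 24 hours earlier."""
    return window[0]

# Two synthetic days of hourly demand ending at 23:00 of d-1 (made-up numbers)
history = [20000 + 500 * (h % 24) for h in range(48)]

# 24 steps to bridge day d, then 24 steps for the target day d+1
path = recursive_forecast(history, seasonal_naive, horizon=48)
day_ahead = path[24:]
print(day_ahead[:3])
```

Because every step consumes earlier predictions, errors can accumulate along the path, which is one reason to compare this variant against the model without lags in the same back test.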

In [44]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [45]:
import pandas as pd
import numpy as np
import random as rd
from datetime import datetime, date,timedelta
# train test split
from sklearn.model_selection import train_test_split
# model
from xgboost import XGBRegressor
# error
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error

In [46]:
df_electricity_demand=pd.read_csv("../../Data/Intermediate_Data/electricity_demand.csv")

## Defining back test

In [47]:
# Changing time format
df_electricity_demand['Time']=pd.to_datetime(df_electricity_demand['Time'], format="%Y-%m-%d %H:%M:%S")
df_electricity_demand

Unnamed: 0.1,Unnamed: 0,Time,Date,Year,Month,Day,Hour,Demand_MWh,Temp_K,Country_Bank_Holiday,Partial_Bank_Holiday,Partial_Bank_Holiday_Weight,Population
0,0,2015-01-01 00:00:00,2015-01-01,2015,1,1,0,24511.5000,272.368163,1.0,0.0,0.0,43249750.0
1,1,2015-01-01 01:00:00,2015-01-01,2015,1,1,1,22866.1667,272.047456,1.0,0.0,0.0,43249750.0
2,2,2015-01-01 02:00:00,2015-01-01,2015,1,1,2,21392.8333,271.796548,1.0,0.0,0.0,43249750.0
3,3,2015-01-01 03:00:00,2015-01-01,2015,1,1,3,20319.6667,271.602937,1.0,0.0,0.0,43249750.0
4,4,2015-01-01 04:00:00,2015-01-01,2015,1,1,4,19923.0000,271.459464,1.0,0.0,0.0,43249750.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
61363,61363,2021-12-31 19:00:00,2021-12-31,2021,12,31,19,27653.1667,281.005748,0.0,0.0,0.0,43869377.0
61364,61364,2021-12-31 20:00:00,2021-12-31,2021,12,31,20,26746.5000,280.474065,0.0,0.0,0.0,43869377.0
61365,61365,2021-12-31 21:00:00,2021-12-31,2021,12,31,21,23952.6667,279.770309,0.0,0.0,0.0,43869377.0
61366,61366,2021-12-31 22:00:00,2021-12-31,2021,12,31,22,22324.8333,279.171545,0.0,0.0,0.0,43869377.0


In [125]:
# Defining parameters
begin_training=datetime.strptime('2015-01-01 00:00:00', '%Y-%m-%d %H:%M:%S')
begin_forecast=datetime.strptime('2015-12-31 09:00:00', '%Y-%m-%d %H:%M:%S')
# Full back-test range (disabled while testing on a shorter window)
# end_forecast=datetime.strptime('2021-12-30 09:00:00', '%Y-%m-%d %H:%M:%S')
# Shorter window used for the run shown below
end_forecast=datetime.strptime('2016-03-16 09:00:00', '%Y-%m-%d %H:%M:%S')
step=24 #hours
training_frequency=30 #days
predict_with_feedback=False

In [126]:
# Defining prediction times: one prediction date every `step` hours
Pred_Dates = pd.DataFrame({"Pred_Date": pd.date_range(begin_forecast, end_forecast, freq=timedelta(hours=step))})

In [127]:
def get_xgb_model(df,section):
    """Train and save the model (section='train') or load it and predict (section='predict')."""
    features=['Month','Day','Hour','Temp_K']
    if section=='train':
        X=df[features]
        y=df['Demand_MWh']
        # 1. XGBRegressor, fitted on the whole training window (no validation split is held out)
        model_xgb=XGBRegressor(n_estimators=500,colsample_bylevel=1,colsample_bynode=1,
                         colsample_bytree=0.8,reg_alpha=1, reg_lambda=1,gamma=0,learning_rate=0.1, random_state=42)
        model_xgb.fit(X, y)
        model_xgb.save_model("../../Models/XGB_model.json")

    elif section=='predict':
        # Load the most recently saved model and predict from the same features
        model_xgb = XGBRegressor()
        model_xgb.load_model("../../Models/XGB_model.json")
        X_test=df[features]
        predictions=model_xgb.predict(X_test)
        return predictions.tolist()

    else:
        raise ValueError("section must be 'train' or 'predict'")

In [134]:
%%time
# Back-test structure
preds_per_day=[]  # one DataFrame of forecasts per prediction date
for index, row in Pred_Dates.iterrows():
    index=index+1
    
    # train section: retrain on the first step and then every training_frequency steps
    if index % training_frequency == 0 or index==1:
        section='train'
        # last hour of day d-1, the most recent data available at prediction time
        end_training=row['Pred_Date'].floor('d')-timedelta(hours = 1)
        df_training=df_electricity_demand[(df_electricity_demand['Time']>=begin_training)&\
                              (df_electricity_demand['Time']<=end_training)]
        df_training=df_training[['Time','Demand_MWh','Month','Day','Hour','Temp_K']]
        print('training ',begin_training,' - ',end_training)
        get_xgb_model(df_training,section)
        
    # predict section: forecast the 24 hours of day d+1
    section='predict'
    begin_pred=row['Pred_Date'].ceil('d')
    end_pred=begin_pred+timedelta(days = 1)
    if predict_with_feedback:
        # TODO: recursive prediction
        raise NotImplementedError('predict_with_feedback (recursive prediction) is not implemented yet')
    else:
        df_predict=df_electricity_demand[(df_electricity_demand['Time']>=begin_pred)&\
                              (df_electricity_demand['Time']<end_pred)]
        # keep only the timestamp and the features; Demand_MWh is not passed to the model
        df_predict=df_predict[['Time','Month','Day','Hour','Temp_K']]
        
        preds=get_xgb_model(df_predict,section)
        test_preds=pd.DataFrame({'Time': df_predict['Time'].tolist(), 'Forecast': preds})
        
    preds_per_day.append(test_preds)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
final_preds=pd.concat(preds_per_day,ignore_index=True)
    
# Assessing (evaluation)
final_results=pd.merge(final_preds,df_electricity_demand[['Time','Demand_MWh']],on="Time",how="left")
rmse_val = mean_squared_error(final_results['Demand_MWh'], final_results['Forecast'])**0.5
mae_val=mean_absolute_error(final_results['Demand_MWh'], final_results['Forecast'])
# MAE expressed as a percentage of the mean actual demand
mae_normalized=mae_val/final_results['Demand_MWh'].mean()*100

print('predictions: ',final_results)
print('rmse: ',rmse_val)
print('mae: ',mae_val)
print('mae normalized: ',mae_normalized, ' %')

training  2015-01-01 00:00:00  -  2015-12-30 23:00:00
training  2015-01-01 00:00:00  -  2016-01-28 23:00:00
training  2015-01-01 00:00:00  -  2016-02-27 23:00:00
predictions:                      Time      Forecast  Demand_MWh
0    2016-01-01 00:00:00  22441.255859  21745.1667
1    2016-01-01 01:00:00  20485.585938  20483.3333
2    2016-01-01 02:00:00  19857.810547  19246.3333
3    2016-01-01 03:00:00  19728.542969  18358.1667
4    2016-01-01 04:00:00  19393.937500  18057.3333
...                  ...           ...         ...
1843 2016-03-17 19:00:00  34421.816406  36240.8333
1844 2016-03-17 20:00:00  34127.117188  35709.0000
1845 2016-03-17 21:00:00  32282.003906  33281.6667
1846 2016-03-17 22:00:00  30026.855469  30355.1667
1847 2016-03-17 23:00:00  27598.593750  27994.0000

[1848 rows x 3 columns]
rmse:  2966.653406034985
mae:  2227.2812828175734
mae normalized:  7.506552315351951  %
CPU times: user 1min 24s, sys: 2.44 s, total: 1min 26s
Wall time: 11.5 s
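
To make the three reported metrics concrete, the sketch below recomputes them with plain Python on the first five rows of the output above. The helper names `rmse`, `mae` and `normalized_mae` are hypothetical; the definitions are intended to mirror the evaluation cell (square root of the mean squared error, mean absolute error, and MAE divided by the mean actual demand, as a percentage). Five rows are only a sample, so the values will not match the full-run figures.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def normalized_mae(actual, forecast):
    """MAE as a percentage of the mean actual value."""
    return mae(actual, forecast) / (sum(actual) / len(actual)) * 100

# First five hours of 2016-01-01 from the back-test output (MWh)
actual = [21745.1667, 20483.3333, 19246.3333, 18358.1667, 18057.3333]
forecast = [22441.255859, 20485.585938, 19857.810547, 19728.542969, 19393.937500]

print("rmse:", rmse(actual, forecast))
print("mae:", mae(actual, forecast))
print("mae normalized:", normalized_mae(actual, forecast), "%")
```

RMSE penalises large misses more heavily than MAE, so a wide gap between the two suggests that a few hours carry most of the error.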
