## Desafio ENACOM - Bootcamp Seletivo Data Analytics 2024
Desenvolvimento de modelos preditivos para geração de energia elétrica por fonte no Brasil

Desenvolvido por: Gustavo Basílio Lima

Import das bibliotecas utilizadas ao decorrer do notebook

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from sklearn.preprocessing import MinMaxScaler
from sklearn.multioutput import MultiOutputRegressor
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import median_absolute_error
from utils import functions as f

import warnings
warnings.filterwarnings('ignore')

## Leitura dos dados de geração por fontes de energia

In [2]:
df = pd.read_csv(r'..\dados\segmentados\df_val_geracao_by_macro_resource.csv', index_col=0)
df

Unnamed: 0_level_0,val_geracao_renovavel,val_geracao_nao_renovavel
din_instante,Unnamed: 1_level_1,Unnamed: 2_level_1
2000-01,27.72,1.79
2000-02,26.85,1.78
2000-03,28.76,1.66
2000-04,27.82,1.40
2000-05,28.80,1.32
...,...,...
2021-08,34.50,14.58
2021-09,35.42,13.93
2021-10,34.29,14.74
2021-11,37.17,12.96


## Verificação de dados faltantes no dataframe

In [3]:
df.describe()

Unnamed: 0,val_geracao_renovavel,val_geracao_nao_renovavel
count,264.0,264.0
mean,33.787462,5.372348
std,4.846072,3.317938
min,20.58,1.32
25%,30.435,2.65
50%,34.095,4.01
75%,36.835,7.9325
max,47.39,14.74


## Separação dos dados para treino e validação

Antes do treinamento dos modelos preditivos, faz-se necessário a separação dos dados para treino (do ano de 2000 até 2018) e validação (entre os anos de 2019 e 2020).

Além disso, são definidos as janelas de entrada e saída de dados, para este caso, o modelo recebe a quantidade de dados referentes à 12 meses, para prever o mês seguinte.

Com essa definição, os dados são reagrupados em listas de arrays, com as features (x) e targets (y) arranjados em janelas que se movem ao passo do tempo de previsão desejado.

Por fim, para se ter um melhor desempenho dos modelos, esses dados são normalizados, pois assim evitam que valores com escalas maiores dominem o processo de aprendizagem.

In [4]:
x_train, y_train, x_val, y_val = [], [], [], []
#Separação dos dados de treino e validação
train_dataset = df[:'2018-12'].values
val_dataset   = df['2019-01':'2020-12'].values
#Normalização dos dados
x_train_scaler, y_train_scaler = MinMaxScaler(), MinMaxScaler()
x_val_scaler, y_val_scaler = MinMaxScaler(), MinMaxScaler()

n_future = 1  #1 mês para frente de previsão
n_past   = 12 #12 meses para trás

#Separação dos features e targets pro treino
for i in range(0, len(train_dataset) - n_past - n_future + 1):
    x_train.append(train_dataset[i : i + n_past, : ])
    y_train.append(train_dataset[i + n_past : i + n_past + n_future, :])
x_train , y_train = np.array(x_train), np.array(y_train)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1]*x_train.shape[2]))
y_train = y_train.reshape((y_train.shape[0], y_train.shape[1]*y_train.shape[2]))
#Separação dos features e targets pra validação
for i in range(0, len(val_dataset) - n_past - n_future + 1):
    x_val.append(val_dataset[i : i + n_past, : ])
    y_val.append(val_dataset[i + n_past : i + n_past + n_future, : ])
x_val , y_val = np.array(x_val), np.array(y_val)
x_val = x_val.reshape((x_val.shape[0], x_val.shape[1]*x_val.shape[2]))
y_val = y_val.reshape((y_val.shape[0], y_val.shape[1]*y_val.shape[2]))

x_train_normalized = x_train_scaler.fit_transform(x_train)
y_train_normalized = y_train_scaler.fit_transform(y_train)
x_val_normalized = x_val_scaler.fit_transform(x_val)
y_val_normalized = y_val_scaler.fit_transform(y_val)

## Técnicas de previsão selecionadas

Com o objetivo de buscar um bom modelo preditivo, são selecionadas algumas técnicas já conhecidas pela literatura, onde são varridos os parâmetro com melhor desempenho. Sendo elas:
1. Linear Regression: técnica clássica que modela a relação entre uma variável dependente e uma ou mais variáveis independentes por meio de uma equação linear, ideal para entender a relação linear entre os dados;
2. Ridge Regression: extensão da regressão linear que adiciona um termo de regularização L2 à função de perda, útil para lidar com multicolinearidade e prevenir overfitting nos dados;
3. Lasso Regression: similar à Ridge Regression, mas utiliza regularização L1, promovendo a esparsidade nos coeficientes do modelo, o que pode ser útil para seleção de características em conjuntos de dados com muitas variáveis;
4. ElasticNet Regression: combina os termos de regularização L1 e L2, oferecendo uma flexibilidade entre a esparsidade promovida pelo Lasso e as propriedades de estabilidade da Ridge Regression;
5. Support Vector Regression: técnica que utiliza vetores de suporte para construir um hiperplano de regressão, adequado para problemas com complexidade não linear e robustez a outliers;
6. k-Nearest Neighbors (kNN): método simples de aprendizado baseado em instância que prevê o valor de uma observação baseado na média dos valores de k vizinhos mais próximos no espaço de características;
7. Random Forest: modelo de conjunto que combina múltiplas árvores de decisão para realizar regressão, reduzindo o overfitting e capturando relações não lineares e interações entre as variáveis;
8. Gradient Boosting Regressor: técnica de aprendizado de máquina que constrói um modelo aditivo de forma iterativa, onde cada novo modelo corrige os erros dos modelos anteriores, resultando em uma predição mais precisa;
9. Decision Tree Regressor: modelo que divide o espaço de características em regiões retangulares e atribui a cada região uma previsão constante, tornando-o interpretable e robusto a outliers;
10. Gaussian Process Regressor: método bayesiano que modela a distribuição de probabilidade sobre funções possíveis, permitindo quantificar a incerteza nas previsões e fornecendo estimativas não lineares;
11. Multi-Layer Perceptron (MLP): arquitetura de rede neural artificial composta por múltiplas camadas de neurônios, capaz de aprender relações complexas entre os dados através de treinamento supervisionado.

Para cada modelo a ser desenvolvido, os blocos de códigos representam:
1. Varredura dos hiperparâmetros de acordo com a função HalvinGridSearchCV();
2. Treinamento do modelo com os melhores hiperparâmetros encontrados e bbtençãos dos valores em escala normal da previsão, antes normalizados;
3. Avaliação das métricas de desempenho dos modelos baseados nos dados de validação;
4. Plot da comparação entre os valores reais e previstos.

### Linear Regression

In [5]:
from sklearn.linear_model import LinearRegression

lin_reg_search = HalvingGridSearchCV(LinearRegression(), param_grid={'fit_intercept': [True, False],
                                                                     'copy_X': [True, False],
                                                                     'n_jobs': [-1, 1, 2, 4, 8]})
lin_reg_search.fit(x_train_normalized, y_train_normalized)
lin_reg_search.best_params_

{'copy_X': True, 'fit_intercept': True, 'n_jobs': 2}

In [6]:
lin_reg = LinearRegression(copy_X=lin_reg_search.best_params_['copy_X'],
                           fit_intercept=lin_reg_search.best_params_['fit_intercept'],
                           n_jobs=lin_reg_search.best_params_['n_jobs'])
lin_reg.fit(x_train_normalized, y_train_normalized)
y_pred_lr_normalized= lin_reg.predict(x_val_normalized)
y_pred_lr = y_val_scaler.inverse_transform(y_pred_lr_normalized)
y_pred_lr

array([[40.67049119,  7.87293077],
       [39.44111608,  7.72025601],
       [43.13905575,  4.74047888],
       [40.81166692,  5.28574458],
       [38.82088186,  5.68461045],
       [35.52385777,  5.97858769],
       [35.89195666,  5.65156003],
       [38.23767853,  5.33481719],
       [39.02528082,  5.40408971],
       [41.6706038 ,  5.552191  ],
       [39.1916401 ,  9.0437363 ],
       [39.38730965,  8.42568442]])

In [7]:
metrics_lr = f.get_metrics(y_val, y_pred_lr)
metrics_lr

{'MSE': 4.552809421436291,
 'RMSE': 2.0903721884195074,
 'MAE': 1.7632212422574143,
 'MAPE': 0.129145281600198,
 'MedE': 1.6605117561526292}

In [8]:
f.plot_results(df, y_val, y_pred_lr, 'Linear Regression')

### Ridge Regression

In [9]:
from sklearn.linear_model import Ridge

ridge_search = HalvingGridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1, 10, 100],
                                                        'fit_intercept': [True, False],
                                                        'copy_X': [True, False],
                                                        'max_iter': [None, 100, 1000, 10000],
                                                        'tol': [0.001, 0.0001, 0.00001],
                                                        'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']})
ridge_search.fit(x_train_normalized, y_train_normalized)
ridge_search.best_params_

{'alpha': 0.1,
 'copy_X': True,
 'fit_intercept': False,
 'max_iter': None,
 'solver': 'sparse_cg',
 'tol': 1e-05}

In [10]:
ridge = Ridge(alpha=ridge_search.best_params_['alpha'],
              copy_X=ridge_search.best_params_['copy_X'],
              fit_intercept=ridge_search.best_params_['fit_intercept'],
              max_iter=ridge_search.best_params_['max_iter'],
              solver=ridge_search.best_params_['solver'],
              tol=ridge_search.best_params_['tol'])
ridge.fit(x_train_normalized, y_train_normalized)
y_pred_ridge_normalized = ridge.predict(x_val_normalized)
y_pred_ridge = y_val_scaler.inverse_transform(y_pred_ridge_normalized)
y_pred_ridge

array([[40.64725419,  7.95438076],
       [39.55109373,  7.70223413],
       [42.72446417,  5.29254972],
       [40.87260163,  5.14951277],
       [38.81381637,  6.07706001],
       [35.74129408,  6.03599492],
       [35.88717595,  5.92609138],
       [38.24874374,  5.33865778],
       [38.82408781,  5.49759242],
       [41.68492175,  5.44849989],
       [39.41870618,  8.63152801],
       [39.32149301,  8.68122545]])

In [11]:
metrics_ridge = f.get_metrics(y_val, y_pred_ridge)
metrics_ridge

{'MSE': 4.776473064995479,
 'RMSE': 2.1431228773740636,
 'MAE': 1.7977660350938467,
 'MAPE': 0.13229583852632393,
 'MedE': 1.6730041939000064}

In [12]:
f.plot_results(df, y_val, y_pred_ridge, 'Ridge Regression')

### Lasso Regression

In [13]:
from sklearn.linear_model import Lasso

lasso_search = HalvingGridSearchCV(Lasso(), param_grid={'alpha': [0.1, 1, 10, 100],
                                                        'fit_intercept': [True, False],
                                                        'precompute': [True, False],
                                                        'copy_X': [True, False],
                                                        'max_iter': [None, 100, 1000, 10000],
                                                        'tol': [0.001, 0.0001, 0.00001],
                                                        'warm_start': [True, False],
                                                        'selection': ['cyclic', 'random']})
lasso_search.fit(x_train_normalized, y_train_normalized)
lasso_search.best_params_

{'alpha': 0.1,
 'copy_X': True,
 'fit_intercept': False,
 'max_iter': 10000,
 'precompute': True,
 'selection': 'random',
 'tol': 0.001,
 'warm_start': False}

In [14]:
lasso = Lasso(alpha=lasso_search.best_params_['alpha'],
              fit_intercept=lasso_search.best_params_['fit_intercept'],
              precompute=lasso_search.best_params_['precompute'],
              copy_X=lasso_search.best_params_['copy_X'],
              max_iter=lasso_search.best_params_['max_iter'],
              tol=lasso_search.best_params_['tol'],
              warm_start=lasso_search.best_params_['warm_start'],
              selection=lasso_search.best_params_['selection'])
lasso.fit(x_train_normalized, y_train_normalized)
y_pred_lasso_normalized = lasso.predict(x_val_normalized)
y_pred_lasso = y_val_scaler.inverse_transform(y_pred_lasso_normalized)
y_pred_lasso

array([[39.01330056,  6.33254512],
       [38.90755635,  6.12570287],
       [39.60832471,  5.45797356],
       [39.98977023,  4.91585663],
       [37.63247945,  4.56783879],
       [36.35595256,  4.42072488],
       [36.10494479,  4.42220033],
       [37.43447815,  4.38147691],
       [37.98314207,  4.43006393],
       [39.09078045,  4.77685066],
       [38.5404102 ,  5.98197315],
       [36.91699175,  6.39638637]])

In [15]:

metrics_lasso = f.get_metrics(y_val, y_pred_lasso)
metrics_lasso

{'MSE': 7.136433310901253,
 'RMSE': 2.6635549029394583,
 'MAE': 2.0872378832270604,
 'MAPE': 0.12270048263053143,
 'MedE': 1.59782825188033}

In [16]:
f.plot_results(df, y_val, y_pred_lasso, 'Lasso Regression')

### ElasticNet Regression

In [17]:
from sklearn.linear_model import ElasticNet

ela_net_search = HalvingGridSearchCV(ElasticNet(), param_grid={'alpha': [0.1, 1, 10, 100],
                                                               'l1_ratio': [0, 0.25, 0.5, 0.75, 1],
                                                               'fit_intercept': [True, False],
                                                               'precompute': [True, False],
                                                               'copy_X': [True, False],
                                                               'max_iter': [None, 100, 1000, 10000],
                                                               'warm_start': [True, False],
                                                               'selection': ['cyclic', 'random']})
ela_net_search.fit(x_train_normalized, y_train_normalized)
ela_net_search.best_params_

{'alpha': 0.1,
 'copy_X': True,
 'fit_intercept': False,
 'l1_ratio': 0,
 'max_iter': 1000,
 'precompute': False,
 'selection': 'cyclic',
 'warm_start': True}

In [18]:
ela_net = ElasticNet(alpha=ela_net_search.best_params_['alpha'],
                     l1_ratio=ela_net_search.best_params_['l1_ratio'],
                     fit_intercept=ela_net_search.best_params_['fit_intercept'],
                     precompute=ela_net_search.best_params_['precompute'],
                     copy_X=ela_net_search.best_params_['copy_X'],
                     max_iter=ela_net_search.best_params_['max_iter'],
                     warm_start=ela_net_search.best_params_['warm_start'],
                     selection=ela_net_search.best_params_['selection'])
ela_net.fit(x_train_normalized, y_train_normalized)
y_pred_ela_net_normalized = ela_net.predict(x_val_normalized)
y_pred_ela_net = y_val_scaler.inverse_transform(y_pred_ela_net_normalized)
y_pred_ela_net

array([[39.28326837,  7.87180952],
       [39.21645113,  7.88696286],
       [39.65579294,  7.50043731],
       [39.81274749,  7.04557983],
       [39.29853607,  6.96496874],
       [38.61197768,  6.94745934],
       [38.36379152,  6.97989228],
       [38.76611469,  6.68334529],
       [39.09950104,  6.38617512],
       [39.76549435,  6.12196131],
       [39.67830456,  6.48905312],
       [39.21131429,  7.05404718]])

In [19]:
metrics_ela_net = f.get_metrics(y_val, y_pred_ela_net)
metrics_ela_net

{'MSE': 6.608155494120577,
 'RMSE': 2.5706202332443304,
 'MAE': 2.3733506533622144,
 'MAPE': 0.22330720955602418,
 'MedE': 2.158921635414325}

In [20]:
f.plot_results(df, y_val, y_pred_ela_net, 'ElasticNet Regression')

### Support Vector Machine for Regression (SVR)

In [21]:
from sklearn.svm import SVR

svr_search = HalvingGridSearchCV(MultiOutputRegressor(SVR()), param_grid={'estimator__kernel': ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed'],
                                                                          'estimator__gamma': ['scale', 'auto']})
svr_search.fit(x_train_normalized, y_train_normalized)
svr_search.best_params_

{'estimator__gamma': 'scale', 'estimator__kernel': 'linear'}

In [22]:
svr = SVR(kernel=svr_search.best_params_['estimator__kernel'],
          gamma=svr_search.best_params_['estimator__gamma'])
svr_reg = MultiOutputRegressor(svr)
svr_reg.fit(x_train_normalized, y_train_normalized)
y_pred_svr_normalized = ela_net.predict(x_val_normalized)
y_pred_svr = y_val_scaler.inverse_transform(y_pred_svr_normalized)
y_pred_svr

array([[39.28326837,  7.87180952],
       [39.21645113,  7.88696286],
       [39.65579294,  7.50043731],
       [39.81274749,  7.04557983],
       [39.29853607,  6.96496874],
       [38.61197768,  6.94745934],
       [38.36379152,  6.97989228],
       [38.76611469,  6.68334529],
       [39.09950104,  6.38617512],
       [39.76549435,  6.12196131],
       [39.67830456,  6.48905312],
       [39.21131429,  7.05404718]])

In [23]:
metrics_svr = f.get_metrics(y_val, y_pred_ela_net)
metrics_svr

{'MSE': 6.608155494120577,
 'RMSE': 2.5706202332443304,
 'MAE': 2.3733506533622144,
 'MAPE': 0.22330720955602418,
 'MedE': 2.158921635414325}

In [24]:
f.plot_results(df, y_val, y_pred_svr, 'Support Vector Regression')

### k-Nearest Neighbors Regressor (kNN)

In [25]:
from sklearn.neighbors import KNeighborsRegressor

knn_search = HalvingGridSearchCV(MultiOutputRegressor(KNeighborsRegressor()), param_grid={'estimator__n_neighbors': [1, 5, 10],
                                                                                          'estimator__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
                                                                                          'estimator__n_jobs': [-1, 1, 2, 4, 8]})
knn_search.fit(x_train_normalized, y_train_normalized)
knn_search.best_params_

{'estimator__algorithm': 'ball_tree',
 'estimator__n_jobs': 4,
 'estimator__n_neighbors': 5}

In [26]:
knn = KNeighborsRegressor(n_neighbors=knn_search.best_params_['estimator__n_neighbors'],
                          algorithm=knn_search.best_params_['estimator__algorithm'],
                          n_jobs=knn_search.best_params_['estimator__n_jobs'])
knn_reg = MultiOutputRegressor(knn)
knn_reg.fit(x_train_normalized,y_train_normalized)
y_pred_knn_normalized = knn_reg.predict(x_val_normalized)
y_pred_knn = y_val_scaler.inverse_transform(y_pred_knn_normalized)
y_pred_knn

array([[40.87888124,  8.16860486],
       [40.54359725,  8.45879388],
       [41.18524957,  7.44664266],
       [41.29609294,  7.42441044],
       [40.85685026,  7.44430243],
       [40.86166954,  7.55312331],
       [40.95392427,  7.58120612],
       [41.05994836,  7.4969577 ],
       [40.92982788,  7.81756976],
       [41.16459552,  7.48876688],
       [40.52913941,  6.2730153 ],
       [40.59729776,  6.06356436]])

In [27]:
metrics_knn = f.get_metrics(y_val, y_pred_knn)
metrics_knn

{'MSE': 8.315779308828024,
 'RMSE': 2.8832034196709406,
 'MAE': 2.5412923766559103,
 'MAPE': 0.2628588443605919,
 'MedE': 2.064387721904721}

In [28]:
f.plot_results(df, y_val, y_pred_knn, 'kNN')

### Random Forest

In [29]:
from sklearn.ensemble import RandomForestRegressor

rf_search = HalvingGridSearchCV(MultiOutputRegressor(RandomForestRegressor()), param_grid={'estimator__n_estimators': [100, 200, 300],
                                                                                           'estimator__criterion': ['squared_error', 'absolute_error', 'friedman_mse', 'poisson'],
                                                                                           'estimator__bootstrap': [True, False],
                                                                                           'estimator__n_jobs': [-1, 1, 2, 4, 8]})
rf_search.fit(x_train_normalized, y_train_normalized)
rf_search.best_params_

{'estimator__bootstrap': True,
 'estimator__criterion': 'absolute_error',
 'estimator__n_estimators': 300,
 'estimator__n_jobs': 8}

In [30]:
rf = RandomForestRegressor(n_estimators=rf_search.best_params_['estimator__n_estimators'],
                           criterion=rf_search.best_params_['estimator__criterion'],
                           bootstrap=rf_search.best_params_['estimator__bootstrap'],
                           n_jobs=rf_search.best_params_['estimator__n_jobs'])
rf_reg = MultiOutputRegressor(rf)
rf_reg.fit(x_train_normalized, y_train_normalized)
y_pred_rf_normalized = rf_reg.predict(x_val_normalized)
y_pred_rf = y_val_scaler.inverse_transform(y_pred_rf_normalized)
y_pred_rf

array([[40.63286862,  9.17837684],
       [38.95573723,  9.34014551],
       [41.47886976,  5.97024752],
       [39.87288009,  5.7149865 ],
       [39.91838784,  4.82259676],
       [37.5634825 ,  4.86119112],
       [37.37111302,  4.74689019],
       [38.54461847,  4.7339604 ],
       [38.29805508,  4.81294329],
       [39.19369478,  4.91267627],
       [39.07654045,  8.81569907],
       [38.3936833 ,  9.42127363]])

In [31]:
metrics_rf = f.get_metrics(y_val, y_pred_rf)
metrics_rf

{'MSE': 4.603278684686634,
 'RMSE': 2.1137145725180266,
 'MAE': 1.6768384895718487,
 'MAPE': 0.10440029417271356,
 'MedE': 1.4731326785862722}

In [32]:
f.plot_results(df, y_val, y_pred_rf, 'Random Forest')

### Gradient Boosting Regressor

In [33]:
from sklearn.ensemble import GradientBoostingRegressor

gbr_search = HalvingGridSearchCV(MultiOutputRegressor(GradientBoostingRegressor()), param_grid={'estimator__loss': ['ls', 'lad', 'huber', 'quantile'],
                                                                                                'estimator__learning_rate': [0.001, 0.01, 0.1],
                                                                                                'estimator__n_estimators': [100, 200, 300],
                                                                                                'estimator__criterion': ['friedman_mse', 'squared_error']})
gbr_search.fit(x_train_normalized, y_train_normalized)
gbr_search.best_params_

{'estimator__criterion': 'friedman_mse',
 'estimator__learning_rate': 0.1,
 'estimator__loss': 'huber',
 'estimator__n_estimators': 300}

In [34]:
gbr = GradientBoostingRegressor(loss=gbr_search.best_params_['estimator__loss'],
                                learning_rate=gbr_search.best_params_['estimator__learning_rate'],
                                n_estimators=gbr_search.best_params_['estimator__n_estimators'],
                                criterion=gbr_search.best_params_['estimator__criterion'])
gbr_reg = MultiOutputRegressor(gbr)
gbr_reg.fit(x_train_normalized, y_train_normalized)
y_pred_gbr_normalized = gbr_reg.predict(x_val_normalized)
y_pred_gbr = y_val_scaler.inverse_transform(y_pred_gbr_normalized)
y_pred_gbr

array([[40.20884549,  9.24620375],
       [38.51131657,  8.48732079],
       [41.06459312,  5.86570604],
       [38.68948788,  5.3126281 ],
       [39.10608762,  5.01269849],
       [36.89819689,  4.7964739 ],
       [36.99469131,  4.86095257],
       [38.01624706,  5.02301602],
       [37.9353334 ,  5.32483743],
       [40.37695668,  5.46844419],
       [39.20829382,  8.48784376],
       [39.30255538,  8.82166   ]])

In [35]:
metrics_gbr = f.get_metrics(y_val, y_pred_gbr)
metrics_gbr

{'MSE': 4.342927691962399,
 'RMSE': 2.03421331710246,
 'MAE': 1.6182118829596206,
 'MAPE': 0.1015017395970658,
 'MedE': 1.4493265249680616}

In [36]:
f.plot_results(df, y_val, y_pred_gbr, 'Gradient Boosting Regressor')

### Decision Tree Regressor

In [37]:
from sklearn.tree import DecisionTreeRegressor

dtr_search = HalvingGridSearchCV(MultiOutputRegressor(DecisionTreeRegressor()), param_grid={'estimator__criterion': ['squared_error', 'absolute_error', 'friedman_mse', 'poisson'],
                                                                                            'estimator__splitter': ['best', 'random'],
                                                                                            'estimator__max_depth': [None, 10, 100, 1000],
                                                                                            'estimator__max_leaf_nodes': [None, 10, 100, 1000]})
dtr_search.fit(x_train_normalized, y_train_normalized)
dtr_search.best_params_

{'estimator__criterion': 'squared_error',
 'estimator__max_depth': 10,
 'estimator__max_leaf_nodes': 1000,
 'estimator__splitter': 'best'}

In [38]:
dtr = DecisionTreeRegressor(criterion=dtr_search.best_params_['estimator__criterion'],
                            splitter=dtr_search.best_params_['estimator__splitter'],
                            max_depth=dtr_search.best_params_['estimator__max_depth'],
                            max_leaf_nodes=dtr_search.best_params_['estimator__max_leaf_nodes'])
dtr_reg = MultiOutputRegressor(dtr)
dtr_reg.fit(x_train_normalized, y_train_normalized)
y_pred_dtr_normalized = dtr_reg.predict(x_val_normalized)
y_pred_dtr = y_val_scaler.inverse_transform(y_pred_dtr_normalized)
y_pred_dtr

array([[41.50951807,  8.57229523],
       [38.72810671,  7.96383438],
       [41.69540448,  6.06239424],
       [37.75736661,  6.16770477],
       [40.39075731,  4.88057606],
       [37.51640275,  4.95663366],
       [38.51123924,  4.48273627],
       [39.24790017,  4.81036904],
       [37.88473322,  4.48273627],
       [38.27027539,  4.48273627],
       [37.88473322,  8.87652565],
       [36.32879518,  9.8360216 ]])

In [39]:
metrics_dtr = f.get_metrics(y_val, y_pred_dtr)
metrics_dtr

{'MSE': 4.211198309436517,
 'RMSE': 2.0274059463891083,
 'MAE': 1.5785460357567598,
 'MAPE': 0.09989490420099202,
 'MedE': 1.3301731937393393}

In [51]:
f.plot_results(df, y_val, y_pred_dtr, 'Decision Tree Regressor')

### Gaussian Process Regressor

In [41]:
from sklearn.gaussian_process import GaussianProcessRegressor

gpr_search = HalvingGridSearchCV(MultiOutputRegressor(GaussianProcessRegressor()), param_grid={'estimator__alpha': [1e-10, 1e-5, 1e-2, 1, 10],
                                                                                               'estimator__optimizer': ['fmin_l_bfgs_b', None],
                                                                                               'estimator__n_restarts_optimizer': [0, 1, 5, 10, 50]})
gpr_search.fit(x_train_normalized, y_train_normalized)
gpr_search.best_params_

{'estimator__alpha': 0.01,
 'estimator__n_restarts_optimizer': 0,
 'estimator__optimizer': 'fmin_l_bfgs_b'}

In [42]:
gpr = GaussianProcessRegressor(alpha=gpr_search.best_params_['estimator__alpha'],
                               optimizer=gpr_search.best_params_['estimator__optimizer'],
                               n_restarts_optimizer=gpr_search.best_params_['estimator__n_restarts_optimizer'])
gpr_reg = MultiOutputRegressor(gpr)
gpr_reg.fit(x_train_normalized, y_train_normalized)
y_pred_gpr_normalized = gpr_reg.predict(x_val_normalized)
y_pred_gpr = y_val_scaler.inverse_transform(y_pred_gpr_normalized)
y_pred_gpr

array([[37.69728106,  5.06660664],
       [35.7919036 ,  5.6830943 ],
       [37.52704   ,  5.75003049],
       [37.00809074,  4.7202998 ],
       [37.32785182,  4.42331375],
       [37.27792809,  4.828693  ],
       [38.32879591,  4.24639553],
       [37.24394461,  4.30584276],
       [36.30571187,  4.25519236],
       [36.22812526,  3.83147282],
       [35.74189693,  4.79222387],
       [36.12473586,  5.07731236]])

In [43]:
metrics_gpr = f.get_metrics(y_val, y_pred_gpr)
metrics_gpr

{'MSE': 11.51149245426818,
 'RMSE': 3.3835750753749823,
 'MAE': 2.480050141828296,
 'MAPE': 0.14673300233657438,
 'MedE': 1.751227871220035}

In [53]:
f.plot_results(df, y_val, y_pred_gpr, 'Gaussian Process Regressor')

### Multi-Layer Perceptron

In [45]:
from sklearn.neural_network import MLPRegressor

mlp_search = HalvingGridSearchCV(MLPRegressor(), param_grid={'activation': ['identity', 'logistic', 'tanh', 'relu'],
                                                             'solver': ['lbfgs', 'sgd', 'adam'],
                                                             'alpha': [0.0001, 0.001, 0.01, 0.1, 1],
                                                             'learning_rate': ['constant', 'invscaling', 'adaptive'],
                                                             'learning_rate_init': [0.001, 0.01, 0.1, 1]})
mlp_search.fit(x_train_normalized, y_train_normalized)
mlp_search.best_params_

{'activation': 'logistic',
 'alpha': 0.01,
 'learning_rate': 'invscaling',
 'learning_rate_init': 1,
 'solver': 'lbfgs'}

In [46]:
mlp = MLPRegressor(activation=mlp_search.best_params_['activation'],
                   solver=mlp_search.best_params_['solver'],
                   alpha=mlp_search.best_params_['alpha'],
                   learning_rate=mlp_search.best_params_['learning_rate'],
                   learning_rate_init=mlp_search.best_params_['learning_rate_init'])
mlp.fit(x_train_normalized, y_train_normalized)
y_pred_mlp_normalized = mlp.predict(x_val_normalized)
y_pred_mlp = y_val_scaler.inverse_transform(y_pred_mlp_normalized)
y_pred_mlp

array([[40.78432427,  8.0002641 ],
       [39.73716323,  7.58074601],
       [42.81556553,  5.13009174],
       [41.14169775,  4.75386169],
       [38.97146352,  6.11164301],
       [36.11354667,  5.5176401 ],
       [36.11577906,  5.62705481],
       [38.33400528,  5.3286533 ],
       [38.60307874,  5.89360697],
       [41.60255236,  5.81850511],
       [39.25773414,  9.03141121],
       [39.33329314,  8.64079225]])

In [47]:
metrics_mlp = f.get_metrics(y_val, y_pred_mlp)
metrics_mlp

{'MSE': 4.509701987321817,
 'RMSE': 2.0643017351488884,
 'MAE': 1.7056424839623325,
 'MAPE': 0.12138550428633015,
 'MedE': 1.610305128545323}

In [52]:
f.plot_results(df, y_val, y_pred_mlp, 'MLP Regressor')

## Avaliação de desempenho dos modelos

Para a seleção do melhor modelo são analisados 5 métricas de erro, sendo os seguintes:

1. MSE (Mean Squared Error): mede a média dos quadrados dos erros, destacando discrepâncias grandes, útil para penalizar erros maiores.
2. RMSE (Root Mean Squared Error): oferece uma interpretação mais intuitiva do MSE, representando o desvio padrão dos resíduos.
3. MAE (Mean Absolute Error): mede a média das diferenças absolutas entre as previsões e os valores reais, oferecendo uma visão equilibrada dos erros.
4. MAPE (Mean Aboslute Percentage Error): avalia o desempenho relativo do modelo em relação aos valores reais, expresso em termos percentuais.
5. MedE (Median Absolute Error): resiliente a outliers, captura a tendência central dos erros, fornecendo uma medida robusta de desempenho.

A escolha do melhor modelo será baseado no valor mínimo para cada erro, ou seja, caso um modelo obtenha o menor valor na maior quantidade de métricas mostradas, é tomado como o melhor.

In [49]:
models_metrics = pd.DataFrame([metrics_lr, metrics_ridge, metrics_lasso, metrics_ela_net, metrics_svr, metrics_knn, metrics_rf, metrics_gbr, metrics_dtr, metrics_gpr, metrics_mlp],
                               index=['Linear Regression', 'Ridge Regression', 'Lasso Regression', 'ElasticNet Regression',
                                      'SVR', 'kNN', 'Random Forest', 'Gradient Boosting Regressor',
                                      'Decision Tree Regressor', 'Gaussian Process Regressor', 'MLP Regressor'])
models_metrics.idxmin()

MSE     Decision Tree Regressor
RMSE    Decision Tree Regressor
MAE     Decision Tree Regressor
MAPE    Decision Tree Regressor
MedE    Decision Tree Regressor
dtype: object

In [50]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=5, cols=1, shared_xaxes=True,vertical_spacing=0.04)

fig.add_trace(
    go.Bar(x=models_metrics.index, y=models_metrics.MSE,
           name='MSE (TWh/mês)', text=round(models_metrics.MSE,4), textposition='auto',
           marker=dict(color='#001a5f')),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=models_metrics.index, y=models_metrics.RMSE,
           name='RMSE (TWh/mês)', text=round(models_metrics.RMSE,4), textposition='auto',
           marker=dict(color='#004987')),
    row=2, col=1
)

fig.add_trace(
    go.Bar(x=models_metrics.index, y=models_metrics.MAE,
           name='MAE (TWh/mês)', text=round(models_metrics.MAE,4), textposition='auto',
           marker=dict(color='#00569d')),
    row=3, col=1
)

fig.add_trace(
    go.Bar(x=models_metrics.index, y=models_metrics.MAPE*100,
           name='MAPE (%)', text=round(models_metrics.MAPE*100,2), textposition='auto',
           marker=dict(color='#4e97d1')),
    row=4, col=1
)

fig.add_trace(
    go.Bar(x=models_metrics.index, y=models_metrics.MedE,
           name='MedE (TWh/mês)', text=round(models_metrics.MedE,4), textposition='auto',
           marker=dict(color='#aed3e3')),
    row=5, col=1
)

fig.update_layout(height=720, width=1280,
                  title_text='Métricas de Desempenho dos Modelos Preditivos')