# Use of fuzzy time series to develop electricity demand forecasting models.

**Student: José Manuel Rubio Cienfuegos.**

**Prof. José Miguel Rubio León.**

**Co-advisor: Prof. Francisco Rivera.**

This notebook contains the pipeline that processes a database with information on various energy consumption figures, together with data on the variables that affect that consumption.

## Libraries

This cell imports all the libraries needed to process the data and present the results.

In [22]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb
import statistics as st

# sklearn models
from sklearn import linear_model
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Evaluation metrics
from sklearn.metrics import mean_squared_error, max_error, mean_absolute_error

# MAPE (Mean Absolute Percentage Error), returned as a percentage.
def MAPE(y_true, y_pred):
  y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
  assert len(y_true) == len(y_pred), (
      f'Array lengths differ: y_true has {len(y_true)} values and y_pred has {len(y_pred)} values.')
  # Undefined if y_true contains zeros (division by zero).
  return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

## Data loading

This block loads the data. Before running it, the CSV files must be uploaded to the Colab session, or a link to a server must be made available so they can be loaded remotely.
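The loading cell reads both files from the working directory. `pd.read_csv` also accepts a URL or a file-like object, so the same call covers the remote-link case. In Colab, `google.colab.files.upload()` opens an upload dialog. The timestamps carry UTC offsets (e.g. `+01:00`) that change with daylight saving time, so converting them to UTC gives a uniform time axis for later alignment. Below is a minimal sketch that assumes the column names shown in the `head()` outputs (`time` and `dt_iso`). `load_energy_and_weather` is a hypothetical helper, and the usage runs on toy in-memory CSV text rather than the real files.

```python
import io

import pandas as pd


def load_energy_and_weather(energy_src, weather_src):
    """Read both CSVs (path, URL or file-like object) and parse timestamps as UTC.

    Hypothetical helper; the column names 'time' and 'dt_iso' are assumed from
    the outputs of df_energy.head() and df_weather.head().
    """
    energy = pd.read_csv(energy_src)
    weather = pd.read_csv(weather_src)
    # Offsets switch between +01:00 and +02:00 with daylight saving time,
    # so normalise every timestamp to UTC.
    energy['time'] = pd.to_datetime(energy['time'], utc=True)
    weather['dt_iso'] = pd.to_datetime(weather['dt_iso'], utc=True)
    return energy, weather


# Toy usage: the first row mirrors the dataset, the summer row is made up.
energy_csv = io.StringIO(
    'time,total load actual\n'
    '2015-01-01 00:00:00+01:00,25385.0\n'
    '2015-07-01 00:00:00+02:00,28000.0\n'
)
weather_csv = io.StringIO(
    'dt_iso,city_name,temp\n'
    '2015-01-01 00:00:00+01:00,Valencia,270.475\n'
)
energy, weather = load_energy_and_weather(energy_csv, weather_csv)
```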

In [52]:
# Load the datasets
df_energy = pd.read_csv('energy_dataset.csv')
df_weather = pd.read_csv('weather_features.csv')



In [54]:
df_energy.head()

Unnamed: 0,time,generation biomass,generation fossil brown coal/lignite,generation fossil coal-derived gas,generation fossil gas,generation fossil hard coal,generation fossil oil,generation fossil oil shale,generation fossil peat,generation geothermal,generation hydro pumped storage aggregated,generation hydro pumped storage consumption,generation hydro run-of-river and poundage,generation hydro water reservoir,generation marine,generation nuclear,generation other,generation other renewable,generation solar,generation waste,generation wind offshore,generation wind onshore,forecast solar day ahead,forecast wind offshore eday ahead,forecast wind onshore day ahead,total load forecast,total load actual,price day ahead,price actual
0,2015-01-01 00:00:00+01:00,447.0,329.0,0.0,4844.0,4821.0,162.0,0.0,0.0,0.0,,863.0,1051.0,1899.0,0.0,7096.0,43.0,73.0,49.0,196.0,0.0,6378.0,17.0,,6436.0,26118.0,25385.0,50.1,65.41
1,2015-01-01 01:00:00+01:00,449.0,328.0,0.0,5196.0,4755.0,158.0,0.0,0.0,0.0,,920.0,1009.0,1658.0,0.0,7096.0,43.0,71.0,50.0,195.0,0.0,5890.0,16.0,,5856.0,24934.0,24382.0,48.1,64.92
2,2015-01-01 02:00:00+01:00,448.0,323.0,0.0,4857.0,4581.0,157.0,0.0,0.0,0.0,,1164.0,973.0,1371.0,0.0,7099.0,43.0,73.0,50.0,196.0,0.0,5461.0,8.0,,5454.0,23515.0,22734.0,47.33,64.48
3,2015-01-01 03:00:00+01:00,438.0,254.0,0.0,4314.0,4131.0,160.0,0.0,0.0,0.0,,1503.0,949.0,779.0,0.0,7098.0,43.0,75.0,50.0,191.0,0.0,5238.0,2.0,,5151.0,22642.0,21286.0,42.27,59.32
4,2015-01-01 04:00:00+01:00,428.0,187.0,0.0,4130.0,3840.0,156.0,0.0,0.0,0.0,,1826.0,953.0,720.0,0.0,7097.0,43.0,74.0,42.0,189.0,0.0,4935.0,9.0,,4861.0,21785.0,20264.0,38.41,56.04


In [55]:
df_weather.head()

Unnamed: 0,dt_iso,city_name,temp,temp_min,temp_max,pressure,humidity,wind_speed,wind_deg,rain_1h,rain_3h,snow_3h,clouds_all,weather_id,weather_main,weather_description,weather_icon
0,2015-01-01 00:00:00+01:00,Valencia,270.475,270.475,270.475,1001,77,1,62,0.0,0.0,0.0,0,800,clear,sky is clear,01n
1,2015-01-01 01:00:00+01:00,Valencia,270.475,270.475,270.475,1001,77,1,62,0.0,0.0,0.0,0,800,clear,sky is clear,01n
2,2015-01-01 02:00:00+01:00,Valencia,269.686,269.686,269.686,1002,78,0,23,0.0,0.0,0.0,0,800,clear,sky is clear,01n
3,2015-01-01 03:00:00+01:00,Valencia,269.686,269.686,269.686,1002,78,0,23,0.0,0.0,0.0,0,800,clear,sky is clear,01n
4,2015-01-01 04:00:00+01:00,Valencia,269.686,269.686,269.686,1002,78,0,23,0.0,0.0,0.0,0,800,clear,sky is clear,01n


Before developing any model, note that the dataset already contains forecast values for the demand (`total load forecast`). The next cell therefore computes the **MAPE** (*Mean Absolute Percentage Error*) of the forecasts stored in *energy_dataset.csv*. This value serves as the reference (`MAPE_goal`) for the models developed in this work.
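For reference, the `MAPE` function defined in the libraries cell computes the following. Here $y_t$ is the actual load (`total load actual`), $\hat{y}_t$ the forecast (`total load forecast`) and $n$ the number of hours with no missing values:

$$
\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$

Take the first hour of the dataset, where $y_1 = 25385$ and $\hat{y}_1 = 26118$. The term inside the sum is $|25385 - 26118| / 25385 = 733/25385 \approx 0.0289$, an error of about 2.89 % for that hour.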

In [53]:
df_demand = df_energy[['total load actual', 'total load forecast']].dropna() # Drop rows with NaNs in either demand column.
demand_total, demand_forecast = df_demand['total load actual'], df_demand['total load forecast']

# MAPE of the demand forecasts already stored in the original dataset.
MAPE_goal = MAPE(demand_total.values, demand_forecast.values)
print('MAPE of the forecast in the dataset:', round(MAPE_goal, 3), '%')

MAPE of the forecast in the dataset: 1.096 %


In [79]:
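energy_values = df_energy.values  # includes the 'time' text column, hence dtype=object
# Columns 1..21 are the 21 'generation ...' columns; the last 7 are forecasts, loads and prices.
# demand_total was built after dropna(), so it may have fewer rows than generation_values.
generation_values, demand_values = energy_values[:,1:-7], demand_total.values
generation_values.shape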

(35064, 21)
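Two details of this cell are worth checking. First, `energy_values` includes the `time` text column, which is why `generation_values` has `dtype=object`. Second, `demand_total` comes from `df_demand`, whose NaN rows were dropped, so `demand_values` can be shorter than the 35064 rows of `generation_values` and the two arrays can be misaligned. The minimal sketch below selects the features by name and takes features and target from the same rows. It assumes the column names shown in `df_energy.head()`. `aligned_generation_and_demand` is a hypothetical helper, and the toy frame's NaN is artificial.

```python
import numpy as np
import pandas as pd


def aligned_generation_and_demand(energy, target='total load actual'):
    """Return float arrays (X, y) taken from the same rows, plus the feature names.

    Hypothetical helper: every column whose name starts with 'generation'
    is a feature, mirroring the positional slice energy_values[:, 1:-7].
    """
    gen_cols = [c for c in energy.columns if c.startswith('generation')]
    rows = energy.dropna(subset=[target])                # drop rows with unknown demand
    features = rows[gen_cols].dropna(axis=1, how='all')  # drop columns with no data at all
    X = features.to_numpy(dtype=float)                   # numeric dtype instead of object
    y = rows[target].to_numpy(dtype=float)
    return X, y, list(features.columns)


toy = pd.DataFrame({
    'time': ['2015-01-01 00:00:00+01:00', '2015-01-01 01:00:00+01:00',
             '2015-01-01 02:00:00+01:00'],
    'generation biomass': [447.0, 449.0, 448.0],
    'generation hydro pumped storage aggregated': [np.nan, np.nan, np.nan],
    'total load actual': [25385.0, np.nan, 22734.0],  # NaN added for illustration
})
X, y, used = aligned_generation_and_demand(toy)
```

Any NaNs left in individual generation columns would still need handling (e.g. interpolation) before fitting sklearn models.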

In [76]:
df_energy

Unnamed: 0,time,generation biomass,generation fossil brown coal/lignite,generation fossil coal-derived gas,generation fossil gas,generation fossil hard coal,generation fossil oil,generation fossil oil shale,generation fossil peat,generation geothermal,generation hydro pumped storage aggregated,generation hydro pumped storage consumption,generation hydro run-of-river and poundage,generation hydro water reservoir,generation marine,generation nuclear,generation other,generation other renewable,generation solar,generation waste,generation wind offshore,generation wind onshore,forecast solar day ahead,forecast wind offshore eday ahead,forecast wind onshore day ahead,total load forecast,total load actual,price day ahead,price actual
0,2015-01-01 00:00:00+01:00,447.0,329.0,0.0,4844.0,4821.0,162.0,0.0,0.0,0.0,,863.0,1051.0,1899.0,0.0,7096.0,43.0,73.0,49.0,196.0,0.0,6378.0,17.0,,6436.0,26118.0,25385.0,50.10,65.41
1,2015-01-01 01:00:00+01:00,449.0,328.0,0.0,5196.0,4755.0,158.0,0.0,0.0,0.0,,920.0,1009.0,1658.0,0.0,7096.0,43.0,71.0,50.0,195.0,0.0,5890.0,16.0,,5856.0,24934.0,24382.0,48.10,64.92
2,2015-01-01 02:00:00+01:00,448.0,323.0,0.0,4857.0,4581.0,157.0,0.0,0.0,0.0,,1164.0,973.0,1371.0,0.0,7099.0,43.0,73.0,50.0,196.0,0.0,5461.0,8.0,,5454.0,23515.0,22734.0,47.33,64.48
3,2015-01-01 03:00:00+01:00,438.0,254.0,0.0,4314.0,4131.0,160.0,0.0,0.0,0.0,,1503.0,949.0,779.0,0.0,7098.0,43.0,75.0,50.0,191.0,0.0,5238.0,2.0,,5151.0,22642.0,21286.0,42.27,59.32
4,2015-01-01 04:00:00+01:00,428.0,187.0,0.0,4130.0,3840.0,156.0,0.0,0.0,0.0,,1826.0,953.0,720.0,0.0,7097.0,43.0,74.0,42.0,189.0,0.0,4935.0,9.0,,4861.0,21785.0,20264.0,38.41,56.04
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35059,2018-12-31 19:00:00+01:00,297.0,0.0,0.0,7634.0,2628.0,178.0,0.0,0.0,0.0,,1.0,1135.0,4836.0,0.0,6073.0,63.0,95.0,85.0,277.0,0.0,3113.0,96.0,,3253.0,30619.0,30653.0,68.85,77.02
35060,2018-12-31 20:00:00+01:00,296.0,0.0,0.0,7241.0,2566.0,174.0,0.0,0.0,0.0,,1.0,1172.0,3931.0,0.0,6074.0,62.0,95.0,33.0,280.0,0.0,3288.0,51.0,,3353.0,29932.0,29735.0,68.40,76.16
35061,2018-12-31 21:00:00+01:00,292.0,0.0,0.0,7025.0,2422.0,168.0,0.0,0.0,0.0,,50.0,1148.0,2831.0,0.0,6076.0,61.0,94.0,31.0,286.0,0.0,3503.0,36.0,,3404.0,27903.0,28071.0,66.88,74.30
35062,2018-12-31 22:00:00+01:00,293.0,0.0,0.0,6562.0,2293.0,163.0,0.0,0.0,0.0,,108.0,1128.0,2068.0,0.0,6075.0,61.0,93.0,31.0,287.0,0.0,3586.0,29.0,,3273.0,25450.0,25801.0,63.93,69.89


In [75]:
generation_values

array([[447.0, 329.0, 0.0, ..., 196.0, 0.0, 6378.0],
       [449.0, 328.0, 0.0, ..., 195.0, 0.0, 5890.0],
       [448.0, 323.0, 0.0, ..., 196.0, 0.0, 5461.0],
       ...,
       [292.0, 0.0, 0.0, ..., 286.0, 0.0, 3503.0],
       [293.0, 0.0, 0.0, ..., 287.0, 0.0, 3586.0],
       [290.0, 0.0, 0.0, ..., 287.0, 0.0, 3651.0]], dtype=object)

In [72]:
# Quick check: np.sum along axis=1 returns one sum per row.
A = np.array([[1,3],[4,5]])
np.sum(A, axis=1)

array([4, 9])

## Extraction of stochastic variables relevant to electricity demand.

An important part of this work is identifying the features that most affect electricity demand, which are expected to be present in the database. To that end, the variables that most influence the observed demand values are searched for.
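One simple way to start this search is to rank the numeric columns by the absolute value of their Pearson correlation with the demand. This is an assumption, not the method confirmed for this work. Pearson correlation only captures linear relationships, so it works as a screening step rather than proof of influence. The weather features would first need to be matched to the energy rows by timestamp (`time` and `dt_iso`). Because *weather_features.csv* has a `city_name` column, they may also need to be aggregated across cities. That step is omitted here. `rank_by_correlation` is a hypothetical helper, shown on a toy frame.

```python
import pandas as pd


def rank_by_correlation(df, target):
    """Rank the numeric columns of df by |Pearson correlation| with df[target]."""
    numeric = df.select_dtypes(include='number')  # text columns are ignored
    features = numeric.drop(columns=[target])
    corr = features.corrwith(numeric[target])     # Pearson by default
    return corr.abs().sort_values(ascending=False)


toy = pd.DataFrame({
    'demand': [1.0, 2.0, 3.0, 4.0, 5.0],
    'x': [2.0, 4.0, 6.0, 8.0, 10.0],    # perfectly correlated with demand
    'z': [5.0, 3.0, 4.0, 1.0, 2.0],     # negatively correlated (r = -0.8)
    'city': ['a', 'b', 'c', 'd', 'e'],  # non-numeric, excluded
})
ranking = rank_by_correlation(toy, 'demand')
print(ranking.index.tolist())  # ['x', 'z']
```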

## Time Series

In this section the data are processed to produce forecasts using classical (crisp) time series.
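Classical regressors need a fixed-size input vector, so a time series is commonly reframed as a supervised problem with a sliding window. Each sample holds the previous `n_lags` values, and the target is the value that follows them. This windowing is assumed here, since the notebook does not show its own. `make_lagged` is a hypothetical helper.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Sliding-window view of a 1-D series.

    Row r of X holds series[r], ..., series[r + n_lags - 1];
    y[r] is series[r + n_lags], the value that follows them.
    """
    s = np.asarray(series, dtype=float)
    n = len(s)
    if n <= n_lags:
        raise ValueError('series must be longer than n_lags')
    X = np.column_stack([s[i:n - n_lags + i] for i in range(n_lags)])
    y = s[n_lags:]
    return X, y


X, y = make_lagged([1, 2, 3, 4, 5], n_lags=2)
print(X.tolist(), y.tolist())  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]] [3.0, 4.0, 5.0]
```

With hourly data, `n_lags=24` covers the previous day. When evaluating, the split should be chronological (train on the past, test on the future) rather than shuffled.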

### Classical ML models.

These cells develop classical machine learning regressors, using the sklearn library for their basic configuration.
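The sketch below shows how the imported regressors could be compared on lagged demand. It assumes a chronological 80/20 split and 24 hourly lags, which are illustrative choices and not values taken from this work. The helper names and the synthetic daily-cycle series in the usage are hypothetical. In the notebook, the input would be the demand series.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def lagged(series, n_lags):
    """Rows of n_lags consecutive values and the value that follows each row."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - n_lags + i] for i in range(n_lags)])
    return X, s[n_lags:]


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))


def evaluate_models(series, n_lags=24, test_fraction=0.2):
    """Fit each regressor on the past and report its MAPE on the most recent part."""
    X, y = lagged(series, n_lags)
    split = int(len(y) * (1 - test_fraction))  # chronological split, no shuffling
    X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
    models = {
        'linear': LinearRegression(),
        'tree': DecisionTreeRegressor(max_depth=8, random_state=0),
        'forest': RandomForestRegressor(n_estimators=50, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mape(y_test, model.predict(X_test))
    return scores


t = np.arange(24 * 60)                                # 60 synthetic days, hourly
series = 25000 + 3000 * np.sin(2 * np.pi * t / 24)    # toy daily cycle, not real data
scores = evaluate_models(series)
for name, score in scores.items():
    print(f'{name}: MAPE = {score:.3f} %')
```

`MLPRegressor` and `SVR` are sensitive to feature scale, so they are usually wrapped with a scaler, e.g. `make_pipeline(StandardScaler(), SVR())` from `sklearn.pipeline` and `sklearn.preprocessing`.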

## Fuzzy Time Series.

Building on the representation from the previous section, this section adds fuzzy-set features to obtain new representations of the energy demand data.
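The sketch below illustrates one common fuzzification scheme, not necessarily the one used in this work. The universe of discourse $[\min, \max]$ of the demand is covered with evenly spaced triangular fuzzy sets $A_1, \dots, A_k$. Each value then becomes a vector of membership degrees, and its linguistic label is the set with the highest membership, as in classic fuzzy time series models. A weighted average of the set centers recovers the original value. All function names are hypothetical.

```python
import numpy as np


def triangular_centers(lo, hi, n_sets):
    """Centers of n_sets (>= 2) evenly spaced triangular fuzzy sets over [lo, hi]."""
    return np.linspace(lo, hi, n_sets)


def fuzzify(values, centers):
    """Membership matrix: entry [t, j] is the degree of values[t] in set A_j.

    Neighbouring triangles overlap so that each row sums to 1 inside [lo, hi];
    values outside the universe are clipped to its bounds.
    """
    x = np.clip(np.asarray(values, dtype=float), centers[0], centers[-1])
    width = centers[1] - centers[0]
    return np.clip(1 - np.abs(x[:, None] - centers[None, :]) / width, 0, 1)


def linguistic_labels(memberships):
    """Index of the fuzzy set with the highest membership for each value."""
    return memberships.argmax(axis=1)


def defuzzify(memberships, centers):
    """Weighted average of the set centers (inverts fuzzify inside [lo, hi])."""
    return memberships @ centers / memberships.sum(axis=1)


centers = triangular_centers(20000, 30000, 5)  # 20000, 22500, 25000, 27500, 30000
demand = np.array([20000.0, 26000.0, 30000.0, 27500.0])
mu = fuzzify(demand, centers)
print(mu[1].tolist())                        # [0.0, 0.0, 0.6, 0.4, 0.0]
print(linguistic_labels(mu).tolist())        # [0, 2, 4, 3]
print(defuzzify(mu, centers))                # recovers demand (up to rounding)
```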

### Machine learning models applied to fuzzy time series.

As in the previous section, once the data have been converted into fuzzy time series, the same models are applied to predict demand in this domain. The inverse operation (defuzzification) is then applied to obtain the prediction in the original domain.
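The minimal sketch below rests on several assumptions. The series is fuzzified with evenly spaced triangular sets whose universe comes from the training part only. A regressor maps the membership vectors of the previous `n_lags` hours to the membership vector of the next hour. The prediction is then brought back to the original domain by a weighted average of the set centers (defuzzification). `LinearRegression` stands in for any of the imported models. The function names, `n_sets=7` and the synthetic series are hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fuzzify(values, centers):
    """Triangular memberships (rows sum to 1); values are clipped to [centers[0], centers[-1]]."""
    x = np.clip(np.asarray(values, dtype=float), centers[0], centers[-1])
    width = centers[1] - centers[0]
    return np.clip(1 - np.abs(x[:, None] - centers[None, :]) / width, 0, 1)


def defuzzify(memberships, centers):
    """Weighted average of set centers; negative predicted memberships are clipped first.

    Assumes no row is entirely <= 0 after clipping (that would divide by zero).
    """
    mu = np.clip(memberships, 0, None)
    return mu @ centers / mu.sum(axis=1)


def fuzzy_forecast(series, model, n_sets=7, n_lags=24, test_fraction=0.2):
    """One-step-ahead forecasts made in the fuzzy domain, returned in original units."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    split = int(n * (1 - test_fraction))                      # chronological split
    centers = np.linspace(s[:split].min(), s[:split].max(), n_sets)
    mu = fuzzify(s, centers)                                  # shape (n, n_sets)
    # Row r concatenates the memberships of hours r .. r + n_lags - 1.
    X = np.hstack([mu[i:n - n_lags + i] for i in range(n_lags)])
    Y = mu[n_lags:]                                           # memberships of hour r + n_lags
    cut = split - n_lags                                      # first row whose target is a test hour
    model.fit(X[:cut], Y[:cut])
    y_pred = defuzzify(model.predict(X[cut:]), centers)
    return s[split:], y_pred


t = np.arange(24 * 60)
series = 25000 + 3000 * np.sin(2 * np.pi * t / 24)   # toy daily cycle, not real data
y_true, y_pred = fuzzy_forecast(series, LinearRegression())
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))
print(f'MAPE in the original domain: {mape:.3f} %')
```

Because the centers are fixed by the training range, defuzzified forecasts can never leave $[\min, \max]$ of the training data. Any new demand peak outside that range cannot be reproduced.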