![EDA](img/EDA.png)

In [None]:
import warnings
warnings.filterwarnings('ignore')

## 1. Exploratory Data Analysis

<summary>
    <font size="4" color="orange"><b>1.1 Importing libraries and functions</b></font>
</summary>

In [None]:
# Basic libraries
import pickle as pk

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Plotting libraries
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns

# Matplotlib style and default figure size
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15, 7)


<summary>
    <font size="4" color="orange"><b>1.2 Loading CENACE database: 49 input variables </b></font>
</summary>

<img src="img/calendarsymbol.png" width="40" img align="left" />  

<font size="3" color="palevioletred"><b>Exogenous Calendar Features </b></font>

* **FECHA** (yy-mm-dd): Date

Holiday indicators (0 = no, 1 = yes):

* **Lunes_Festivo**: Holiday Monday

* **Martes_PostFestivo**: Tuesday after a holiday Monday

* **Semana_Santa**: Holy Week

* **1_Mayo**: May 1

* **10_Mayo**: May 10

* **16_Sep**: September 16

* **2_Nov.**: November 2

* **Pre-Navidad_y_new_year**: Day before Christmas or New Year

* **Navidad_y_new_year**: Christmas or New Year

* **Post-Navidad_y_new_year**: Day after Christmas or New Year
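By definition, the post-holiday flag is the holiday-Monday flag moved one day forward. The sketch below illustrates this on a hypothetical five-day calendar. The dates and flag values are made up, and the sketch assumes one row per consecutive date.

```python
import pandas as pd

# Hypothetical daily calendar with one row per consecutive date (assumption).
cal = pd.DataFrame({"FECHA": pd.date_range("2022-03-19", periods=5, freq="D")})
cal["LUNES_FESTIVO"] = 0
# Mark 2022-03-21 (a Monday) as a holiday Monday, for illustration only.
cal.loc[cal["FECHA"] == "2022-03-21", "LUNES_FESTIVO"] = 1

# Day after a holiday Monday: shift the flag forward by one row (one day).
cal["MARTES_POSTFESTIVO"] = cal["LUNES_FESTIVO"].shift(1, fill_value=0).astype(int)

print(cal)
```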

<img src="img/lightsymbol.png" alt="drawing" width="25" img align="left" />  

<font size="3" color="palevioletred"><b>Endogenous Feature</b></font>

* **DEM_GCRNO_H$i$** (MW): Load energy demand in the GCRNO (Gerencia de Control Regional Noroeste) zone from hour $i$ to hour $i+1$ of the corresponding date, for $i=0,\dots,23$.

<img src="img/meteosymbol.png" alt="drawing" width="60" img align="left" />

<font size="3" color="palevioletred"><b>Exogenous Meteorological Features</b></font>

* **TMAX-CAB**, **TMIN-CAB**, **TMAX-HMO**, **TMIN-HMO**, **TMAX-OBR**, **TMIN-OBR**, **TMAX-LMO**, **TMIN-LMO**, **TMAX-CUL**, **TMIN-CUL** ($^\circ$C): Maximum and minimum temperature in Caborca, Hermosillo, Ciudad Obregón, Los Mochis and Culiacán, respectively.

* **PREC_HMO_mm**, **PREC_OBR_mm**, **PREC_LMO_mm**, **PREC_CUL_mm**  (mm/h): Precipitation in Hermosillo, Ciudad Obregón, Los Mochis and Culiacán, respectively.



In [None]:
# Importing load energy consumption CENACE database
path = "./inputs/Dataset GCRNO120522 DF.xlsx"  # local Excel file with the GCRNO dataset
gcrno = pd.read_excel(path)
gcrno.columns

<summary>
    <font size="4" color="orange"><b>1.3 Dataframe rearrangement</b></font>
</summary>

The dataframe above will be transformed into a new one with:

* *INSTANCES* (index):

    **FECHAHORA** (Date-Hour), in the format yyyy-mm-dd hh:00:00
    
    
* *FEATURES*: 

    **DEMANDA**: Load energy demand (MW)
    
    **DIA** (Day of week): 0 Monday, 1 Tuesday, 2 Wednesday, 3 Thursday, 4 Friday, 5 Saturday, 6 Sunday
    
    **HORA** (Hour): 0–23
      
    **MES** (Month): 1 January, 2 February, 3 March, 4 April, 5 May, 6 June, 7 July, 8 August, 9 September, 10 October, 11 November, 12 December
    
    Plus the following daily features, whose values are constant across all hours of the same day: **TMAX-CAB**, **TMIN-CAB**, **TMAX-HMO**, **TMIN-HMO**, **TMAX-OBR**, **TMIN-OBR**, **TMAX-LMO**, **TMIN-LMO**, **TMAX-CUL**, **TMIN-CUL**, **PREC_HMO_MM**, **PREC_OBR_MM**, **PREC_LMO_MM**, **PREC_CUL_MM**, **LUNES_FESTIVO**, **MARTES_POSTFESTIVO**, **SEMANA_SANTA**, **1_MAYO**, **10_MAYO**, **16_SEP**, **2_NOV.**, **PRE-NAVIDAD_Y_NEW_YEAR**, **NAVIDAD_Y_NEW_YEAR**, **POST-NAVIDAD_Y_NEW_YEAR**.
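
The toy example below sketches the wide-to-long rearrangement described above. It uses only three hourly columns and made-up demand values for brevity. The column names follow the `DEM_GCRNO_H{i}` pattern of the dataset, and `wide` and `long_df` are hypothetical names.

```python
import pandas as pd

# Toy version of the wide layout: one row per date, one demand column per hour (made-up values).
wide = pd.DataFrame({
    "FECHA": pd.to_datetime(["2022-05-01", "2022-05-02"]),
    "DEM_GCRNO_H0": [100.0, 110.0],
    "DEM_GCRNO_H1": [95.0, 105.0],
    "DEM_GCRNO_H2": [90.0, 100.0],
})

# Wide -> long: one row per (date, hour).
long_df = wide.melt(id_vars=["FECHA"], var_name="HORA", value_name="DEMANDA")
long_df["HORA"] = long_df["HORA"].str.replace("DEM_GCRNO_H", "", regex=False).astype(int)

# Date-hour index, then calendar features derived from it.
long_df.index = long_df["FECHA"] + pd.to_timedelta(long_df["HORA"], unit="h")
long_df = long_df.sort_index()
long_df["DIA"] = long_df.index.weekday  # 0 = Monday ... 6 = Sunday
long_df["MES"] = long_df.index.month

print(long_df)
```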

In [None]:
# Transposing the hourly columns of the original dataframe into rows
consumo_data = gcrno.melt(
    id_vars=['FECHA'],
    value_vars=[f'DEM_GCRNO_H{i}' for i in range(24)],
    var_name="HORA",
    value_name="DEMANDA"
)
# Hour number taken from the column name (e.g. 'DEM_GCRNO_H13' -> 13)
consumo_data['HORA'] = consumo_data['HORA'].str.replace('DEM_GCRNO_H', '', regex=False).astype(int)

# Building the date-hour index; missing hours are inserted and forward-filled
consumo_data.index = consumo_data['FECHA'] + pd.to_timedelta(consumo_data['HORA'], unit='h')
consumo_data.sort_index(inplace=True)
consumo_data.drop(columns=['HORA'], inplace=True)
consumo_data = consumo_data.asfreq('h', method='pad')

# Creating Date-Hour, Day, Hour and Month columns
consumo_data['FECHAHORA'] = consumo_data.index
consumo_data['DIA'] = consumo_data.index.weekday
consumo_data['HORA'] = consumo_data.index.hour
consumo_data['MES'] = consumo_data.index.month
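
The sketch below shows what `asfreq('h', method='pad')` does, using a hypothetical hourly series with one missing timestamp. The frame `s` and its values are made up. `method='pad'` also copies `FECHA` from the previous row. If a whole date were missing, its padded hours would therefore keep the previous date, and the later merge would assign them that date's exogenous values.

```python
import pandas as pd

# Hypothetical hourly series with the 02:00 timestamp missing.
idx = pd.to_datetime(["2022-05-01 00:00", "2022-05-01 01:00", "2022-05-01 03:00"])
s = pd.DataFrame({"DEMANDA": [100.0, 95.0, 92.0]}, index=idx)

# asfreq('h') inserts the missing hour; method='pad' copies the previous value into it.
filled = s.asfreq("h", method="pad")
# Without a method, the inserted hour stays NaN, which makes gaps easy to count.
gaps = int(s.asfreq("h")["DEMANDA"].isna().sum())

print(filled)
print("missing hours:", gaps)
```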

In [None]:
# Adding columns of exogenous variables
exogenas = gcrno[['FECHA','TMAX-CAB', 'TMAX-HMO', 'TMAX-OBR', 'TMAX-LMO', 'TMAX-CUL', 'TMIN-CAB',
       'TMIN-HMO', 'TMIN-OBR', 'TMIN-LMO', 'TMIN-CUL', 'PREC_HMO_MM',
       'PREC_OBR_MM', 'PREC_LMO_MM', 'PREC_CUL_MM', 'LUNES_FESTIVO',
       'MARTES_POSTFESTIVO', 'SEMANA_SANTA', '1_MAYO', '10_MAYO', '16_SEP',
       '2_NOV.', 'PRE-NAVIDAD_Y_NEW_YEAR', 'NAVIDAD_Y_NEW_YEAR',
       'POST-NAVIDAD_Y_NEW_YEAR']]
consumo = pd.merge(consumo_data, exogenas, on='FECHA', how='left')
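
The left merge attaches each day's exogenous values to all of that day's hours. The sketch below uses two standard `pd.merge` arguments to check that assumption. `validate="many_to_one"` raises an error if a date is repeated on the daily side, and `indicator=True` marks the rows that found no match. The frames `hourly` and `daily` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical hourly rows (2 days x 2 hours) and one daily exogenous row per date.
hourly = pd.DataFrame({
    "FECHA": pd.to_datetime(["2022-05-01"] * 2 + ["2022-05-02"] * 2),
    "HORA": [0, 1, 0, 1],
    "DEMANDA": [100.0, 95.0, 110.0, 105.0],
})
daily = pd.DataFrame({
    "FECHA": pd.to_datetime(["2022-05-01", "2022-05-02"]),
    "TMAX-HMO": [38.0, 39.5],
})

# validate='many_to_one' raises MergeError if a date appears twice in `daily`;
# indicator=True adds a '_merge' column telling which rows found a match.
merged = pd.merge(hourly, daily, on="FECHA", how="left",
                  validate="many_to_one", indicator=True)
unmatched = int((merged["_merge"] != "both").sum())

print(merged)
print("hourly rows without daily features:", unmatched)
```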

In [None]:
# Setting the date-hour (FECHAHORA) as index, since the merge reset it to a RangeIndex
del consumo['FECHA']
consumo.set_index("FECHAHORA", inplace=True)
consumo = consumo.asfreq('h')

In [None]:
# Verifying existence of missing data 
consumo.info()
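
`info()` reports only non-null counts. The sketch below counts missing values per column directly and lists the timestamps where the target is missing. The frame `df` is a hypothetical stand-in for `consumo`.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame with gaps; in the notebook this would be `consumo`.
idx = pd.date_range("2022-05-01", periods=6, freq="h")
df = pd.DataFrame({"DEMANDA": [100.0, np.nan, 92.0, 90.0, np.nan, 97.0],
                   "TMAX-HMO": [38.0] * 6}, index=idx)

# Number of missing values per column.
print(df.isna().sum())

# Timestamps where the target is missing.
missing_ts = df.index[df["DEMANDA"].isna()]
print(missing_ts)
```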

<summary>
    <font size="4" color="orange"><b>2. Exploring variables</b></font>
</summary>

<br/>

<summary>
    <img src="img/lightsymbol.png" alt="drawing" width="15" img align="left" /> 
    <font size="3" color="palevioletred"><b>Energy Demand</b></font>
</summary>

In [None]:
# Summary statistics of the endogenous feature DEMANDA
consumo['DEMANDA'].describe()

#### This chart shows a similar pattern year after year, together with an ascending trend

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=consumo.index, y=consumo['DEMANDA'],
                    mode='lines',
                    name='Energy Demand'))
                                         
fig.update_layout(title_text="Energy Demand period 01/01/07-xx/05/22", height=600) 


fig.show()
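
The visual impression of an ascending trend can be checked numerically with yearly mean demand. The sketch below applies this check to a synthetic hourly series that has a built-in trend and daily cycle, with made-up numbers. The same `groupby` call would apply to `consumo['DEMANDA']`.

```python
import numpy as np
import pandas as pd

# Synthetic hourly demand: upward trend plus a daily cycle (made-up numbers).
idx = pd.date_range("2019-01-01", "2021-12-31 23:00", freq="h")
t = np.arange(len(idx))
daily_cycle = 300 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
demanda = pd.Series(3000 + 0.01 * t + daily_cycle, index=idx)

# Yearly mean demand and its year-over-year relative change.
annual = demanda.groupby(demanda.index.year).mean()
growth = annual.pct_change().dropna()

print(annual)
print(growth)
```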

In [None]:
#looking for outliers with a Box-plot

fig = go.Figure()
fig.add_trace(go.Box(x=consumo['DEMANDA'], name='Energy Demand'))
fig.update_layout(title_text="Box Plot Energy Demand period 01/01/07-xx/05/22", height=600) 
fig.show()
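
The box plot marks points beyond the whiskers. The sketch below counts them with the usual 1.5 IQR box-plot rule. `iqr_outliers` is a hypothetical helper and the sample values are made up. Plotly's quartile computation may differ slightly on small samples.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of `s` outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Made-up demand values with one obvious spike; in the notebook: iqr_outliers(consumo['DEMANDA']).
demanda = pd.Series([3000, 3100, 3050, 2950, 3020, 3080, 9000], dtype=float)
out = iqr_outliers(demanda)
print(out)
```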

In [None]:
# Histogram of DEMANDA
fig = px.histogram(consumo, x="DEMANDA", nbins=12, title="Histogram Energy Demand period 01/01/07-xx/05/22")
fig.show()

<summary>
    <img src="img/meteosymbol.png" alt="drawing" width="60" img align="left" />
    <font size="3" color="palevioletred"><b>Exogenous Meteorological Features</b></font>
</summary>

In [None]:
# Summary statistics of the exogenous features TMAX and TMIN
consumo[['TMAX-CAB', 'TMAX-HMO', 'TMAX-OBR', 'TMAX-LMO', 'TMAX-CUL', 'TMIN-CAB',
       'TMIN-HMO', 'TMIN-OBR', 'TMIN-LMO', 'TMIN-CUL']].describe()
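
The summary statistics do not reveal rows where a city's minimum temperature exceeds its maximum. The sketch below adds a per-city sanity check for that case. The data frame is hypothetical, with one inconsistency planted on purpose. The city codes follow the column names of the dataset.

```python
import pandas as pd

cities = ["CAB", "HMO", "OBR", "LMO", "CUL"]

# Hypothetical daily rows; row 1 gets TMIN > TMAX in Hermosillo on purpose.
df = pd.DataFrame({
    **{f"TMAX-{c}": [35.0, 36.0] for c in cities},
    **{f"TMIN-{c}": [20.0, 21.0] for c in cities},
})
df.loc[1, "TMIN-HMO"] = 40.0

# Number of rows per city where the minimum exceeds the maximum temperature.
inconsistent = {c: int((df[f"TMIN-{c}"] > df[f"TMAX-{c}"]).sum()) for c in cities}
print(inconsistent)
```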

In [None]:
#looking for outliers in TMIN Features with a Box-plot
fig = go.Figure()
fig.add_trace(go.Box(x=consumo['TMIN-CAB'], name='TMIN-CAB'))
fig.add_trace(go.Box(x=consumo['TMIN-HMO'], name='TMIN-HMO'))
fig.add_trace(go.Box(x=consumo['TMIN-OBR'], name='TMIN-OBR'))
fig.add_trace(go.Box(x=consumo['TMIN-LMO'], name='TMIN-LMO'))
fig.add_trace(go.Box(x=consumo['TMIN-CUL'], name='TMIN-CUL'))
fig.update_layout(title_text="Box Plot Lowest Temperature period 01/01/07-xx/05/22", height=600) 
fig.show()

In [None]:
#looking for outliers in TMAX Features with a Box-plot
fig = go.Figure()
fig.add_trace(go.Box(x=consumo['TMAX-CAB'], name='TMAX-CAB'))
fig.add_trace(go.Box(x=consumo['TMAX-HMO'], name='TMAX-HMO'))
fig.add_trace(go.Box(x=consumo['TMAX-OBR'], name='TMAX-OBR'))
fig.add_trace(go.Box(x=consumo['TMAX-LMO'], name='TMAX-LMO'))
fig.add_trace(go.Box(x=consumo['TMAX-CUL'], name='TMAX-CUL'))
fig.update_layout(title_text="Box Plot Highest Temperature period 01/01/07-xx/05/22", height=600) 
fig.show()

In [None]:
# Summary statistics of the exogenous rainfall features (PREC_XX_MM)
consumo[['PREC_HMO_MM','PREC_OBR_MM', 'PREC_LMO_MM', 'PREC_CUL_MM']].describe()
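
For rainfall, the share of values equal to zero is a useful complement to `describe()`. When more than three quarters of the values are zero, Q1 = Q3 = 0, and the box plot marks every rainy day as an outlier. The values below are made up. In the notebook, the same expression would be applied to `consumo[stations]`.

```python
import pandas as pd

stations = ["PREC_HMO_MM", "PREC_OBR_MM", "PREC_LMO_MM", "PREC_CUL_MM"]

# Made-up daily rainfall values (mm).
df = pd.DataFrame({
    "PREC_HMO_MM": [0.0, 0.0, 0.0, 12.5],
    "PREC_OBR_MM": [0.0, 3.0, 0.0, 0.0],
    "PREC_LMO_MM": [0.0, 0.0, 0.0, 0.0],
    "PREC_CUL_MM": [1.0, 0.0, 5.0, 20.0],
})

# Fraction of values without rain at each station.
dry_share = (df[stations] == 0).mean()
print(dry_share)
```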

In [None]:
#looking for outliers in Rainfall (PREC_XX_MM) Features with a Box-plot
fig = go.Figure()
fig.add_trace(go.Box(x=consumo['PREC_HMO_MM'], name='PREC_HMO_MM'))
fig.add_trace(go.Box(x=consumo['PREC_OBR_MM'], name='PREC_OBR_MM'))
fig.add_trace(go.Box(x=consumo['PREC_LMO_MM'], name='PREC_LMO_MM'))
fig.add_trace(go.Box(x=consumo['PREC_CUL_MM'], name='PREC_CUL_MM'))
fig.update_layout(title_text="Box Plot Rainfall period 01/01/07-xx/05/22", height=600) 
fig.show()

In [None]:
# Correlation between DEMANDA and the exogenous meteorological features

corrMatrix = consumo[['DEMANDA','TMAX-CAB', 'TMAX-HMO', 'TMAX-OBR', 'TMAX-LMO', 'TMAX-CUL', 'TMIN-CAB',
       'TMIN-HMO', 'TMIN-OBR', 'TMIN-LMO', 'TMIN-CUL','PREC_HMO_MM','PREC_OBR_MM', 'PREC_LMO_MM', 'PREC_CUL_MM']].corr()
sns.heatmap(corrMatrix, annot=True)
plt.show()
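
Reading a single column of the heatmap can be easier if the correlations with DEMANDA are extracted and ranked by absolute value. The sketch below does this on synthetic data. By construction, demand in that data rises with TMAX and is unrelated to rainfall.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic data: demand depends linearly on TMAX plus noise; rainfall is independent.
tmax = rng.uniform(20, 45, n)
df = pd.DataFrame({
    "TMAX-HMO": tmax,
    "PREC_HMO_MM": rng.exponential(2.0, n),
    "DEMANDA": 1000 + 60 * tmax + rng.normal(0, 50, n),
})

# DEMANDA column of the correlation matrix, sorted by absolute value.
corr_with_target = df.corr()["DEMANDA"].drop("DEMANDA")
ranked = corr_with_target.reindex(corr_with_target.abs().sort_values(ascending=False).index)
print(ranked)
```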

In [None]:
fig = px.scatter_matrix(consumo, dimensions=['TMAX-CAB', 'TMAX-HMO', 'TMAX-OBR', 'TMAX-LMO', 'TMAX-CUL'], color="DEMANDA")
fig.update_layout(title_text="Scatter matrix of TMAX exogenous meteorological features, colored by DEMANDA", height=1200) 
fig.show()

In [None]:
fig = px.scatter_matrix(consumo, dimensions=['TMIN-CAB', 'TMIN-HMO', 'TMIN-OBR', 'TMIN-LMO', 'TMIN-CUL'], color="DEMANDA")
fig.update_layout(title_text="Scatter matrix of TMIN exogenous meteorological features, colored by DEMANDA", height=1200) 
fig.show()

In [None]:
fig = px.scatter_matrix(consumo, dimensions=['PREC_HMO_MM','PREC_OBR_MM', 'PREC_LMO_MM', 'PREC_CUL_MM'], color="DEMANDA")
fig.update_layout(title_text="Scatter matrix of rainfall exogenous meteorological features, colored by DEMANDA", height=1200) 
fig.show()

<summary>
    <img src="img/calendarsymbol.png" width="25" img align="left" />  
    <font size="3" color="palevioletred"><b>Exogenous Calendar Features </b></font>
</summary>    

In [None]:
# Correlation between DEMANDA and the exogenous calendar features

corrMatrix = consumo[['DEMANDA', 'LUNES_FESTIVO',
       'MARTES_POSTFESTIVO', 'SEMANA_SANTA', '1_MAYO', '10_MAYO', '16_SEP',
       '2_NOV.', 'PRE-NAVIDAD_Y_NEW_YEAR', 'NAVIDAD_Y_NEW_YEAR',
       'POST-NAVIDAD_Y_NEW_YEAR']].corr()
sns.heatmap(corrMatrix, annot=True)
plt.show()
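
A 0/1 flag that is rarely 1 can show a small correlation with demand even when its effect on demand is large. Comparing mean demand with the flag off and on is often easier to interpret. The sketch below uses made-up hourly values for the `1_MAYO` flag.

```python
import pandas as pd

# Made-up hourly demand where the flagged hours have lower demand.
df = pd.DataFrame({
    "DEMANDA": [3000.0, 3100.0, 2500.0, 2600.0, 3050.0, 3150.0],
    "1_MAYO": [0, 0, 1, 1, 0, 0],
})

# Mean demand with the flag off (0) and on (1), and the relative change.
means = df.groupby("1_MAYO")["DEMANDA"].mean()
rel_change = means.loc[1] / means.loc[0] - 1
print(means)
print(f"relative change on flagged days: {rel_change:.1%}")
```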

In [None]:
from IPython import display
display.Image("https://mcd.unison.mx/wp-content/themes/awaken/img/logo_mcd.png", embed = True)

<summary>
    <font size="4" color="gray"> Maestría en Ciencia de Datos | Universidad de Sonora </font>
</summary>
<font size="1" color="gray"> Blvd. Luis Encinas y Rosales s/n Col. Centro. Edificio 3K1 planta baja C.P. 83000, Hermosillo, Sonora, México </font>
<font size="1" color="gray"> mcd@unison.mx </font>
<font size="1" color="gray"> Tel: +52 (662) 259 2155  </font>