# Introduction 

In this notebook, I explore a cleaned version of the `energy_dataset.csv` file, focusing on hourly electricity load (`total_load_actual`) and price (`price_actual`) in Spain.

* Raw data can be found [here](https://github.com/KishenSharma6/Weather-Energy-Consumption-in-Spain/tree/master/Data/01_Raw_Data)
* Cleaned data can be found [here](https://github.com/KishenSharma6/Weather-Energy-Consumption-in-Spain/tree/master/Data/02_Cleaned_Data)

**Read in libraries for notebook**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

**Set notebook preferences**

In [None]:
#Set preferences for pandas 
pd.set_option("display.max_rows", 101)

#Set style for visualizations
plt.style.use('Solarize_Light2')
font_title = {'fontsize' : 24, 
              'fontweight' : 'semibold',
             'fontname':'Gill Sans MT'}

#Suppress warnings
warnings.filterwarnings('ignore')

**Read in data**

In [None]:
#Set path to the project folder (machine-specific; point this at your local copy of the repository)
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Spain Hourly Energy Demand and Weather'

#Read in the cleaned data, using the date_time column as the index
df = pd.read_csv(path + '/Data/02_Cleaned_Data/2020_0505_Cleaned_Energy_Dataset.csv', index_col='date_time')
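The overview below describes the index as localized to CET, but `index_col='date_time'` on its own leaves the index as plain strings. A minimal sketch of parsing it into a timezone-aware `DatetimeIndex`, assuming the `date_time` strings carry UTC offsets (`+01:00` in winter, `+02:00` in summer). The two sample rows are made up for illustration, and `Europe/Madrid` (which follows CET/CEST) is an assumed target zone.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the cleaned CSV (values are illustrative only)
csv_text = """date_time,total_load_actual
2015-01-01 00:00:00+01:00,100.0
2015-07-01 00:00:00+02:00,200.0
"""

sample = pd.read_csv(io.StringIO(csv_text), index_col='date_time')

# Mixed offsets (CET winter, CEST summer) parse cleanly via UTC, then convert to local time
sample.index = pd.to_datetime(sample.index, utc=True).tz_convert('Europe/Madrid')

print(sample.index.tz)
print(sample.index.hour.tolist())
```

Going through UTC avoids the ambiguity of mixed offsets; converting back afterwards restores local wall-clock hours for plotting.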

# Data Overview

* **date_time** (index): datetime of each hourly recording, localized to CET
* **total load forecast**: forecasted electrical demand (MW)
* **total load actual**: actual electrical demand (MW)
* **price day ahead**: forecasted day-ahead price (EUR/MWh)
* **price actual**: actual price (EUR/MWh)
* **date**: date of recording (mm/dd/yyyy)
* **time**: time of recording
* **weekday**: day of the week of recording
* **month**: month of recording
* **year**: year of recording
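The `date`, `time`, `weekday`, `month`, and `year` columns can all be derived from a datetime index. A minimal sketch on a hypothetical two-day hourly index; the exact string formats the cleaned file uses for `time` and `weekday` are assumptions.

```python
import pandas as pd

# Hypothetical hourly index covering two days (timezone-naive for simplicity)
idx = pd.date_range('2015-01-01', periods=48, freq='h', name='date_time')
frame = pd.DataFrame(index=idx)

frame['date'] = idx.strftime('%m/%d/%Y')   # mm/dd/yyyy
frame['time'] = idx.strftime('%H:%M')      # time of recording, e.g. '13:00'
frame['weekday'] = idx.day_name()          # e.g. 'Thursday'
frame['month'] = idx.month                 # 1-12
frame['year'] = idx.year

print(frame.head(3))
```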

**Data Preview**

In [None]:
#Print df shape
print('Shape of data:', df.shape)

#Preview the first five rows
display(df.head())

**About the Data**

In [None]:
print('Summary statistics:\n{}\n'.format(df.describe()))
#info() prints directly and returns None, so it should not be wrapped in print()
df.info()
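The rolling windows later in this notebook (e.g. `rolling(24*7, ...)`) count rows, so they only span a true week if the index is strictly hourly with no gaps or repeated timestamps. A minimal completeness check, sketched on a hypothetical index with one hour removed and one hour duplicated; on the real data it would need the parsed, timezone-aware index rather than the raw strings.

```python
import pandas as pd

# Hypothetical hourly UTC index: drop hour 10 and duplicate hour 5 to simulate defects
full = pd.date_range('2015-01-01', periods=48, freq='h', tz='UTC')
observed = full.delete(10).append(full[[5]])

# Build the expected hourly grid between the first and last timestamps
expected = pd.date_range(observed.min(), observed.max(), freq='h')

missing = expected.difference(observed)
duplicates = observed[observed.duplicated()]

print('Missing hours:', list(missing))
print('Duplicated hours:', list(duplicates))
```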

**Target distributions**

In [None]:
#Set plot
f, axes = plt.subplots(1,2, figsize = (20,10))

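#Plot histograms with KDE overlays (histplot replaces the deprecated sns.distplot; stat='density' keeps distplot's y-scale)
g = sns.histplot(df['total_load_actual'], ax = axes[0], color = 'r', bins = 50, kde = True, stat = 'density')
j = sns.histplot(df['price_actual'], ax = axes[1], color = 'g', bins = 50, kde = True, stat = 'density')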

###Set plot aesthetics###
##plot 1##
#Title
g.set_title('Energy Load Across Spain',fontdict = font_title)

#Axes
g.set_xlabel('Hourly Load (MW)')
g.get_xaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))

##plot 2##
#Title
j.set_title('Energy Costs Across Spain', fontdict = font_title)

#Axes
j.set_xlabel('Price (EUR/MWh)')
j.get_xaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "€{:,.0f}".format(x)))
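The same tick-formatting lambdas recur in every plot of this notebook. A sketch of defining them once with `matplotlib.ticker.FuncFormatter` (the class the cells reach through `plt.FuncFormatter`). The formatter calls the wrapped function with the tick value and its position, so the functions can be checked without drawing anything. The helper names `thousands` and `euros` are hypothetical.

```python
from matplotlib.ticker import FuncFormatter


def thousands(x, pos):
    """Format a tick value as an integer with thousands separators, e.g. 28,500."""
    return '{:,}'.format(int(x))


def euros(x, pos):
    """Format a tick value as a whole-euro amount, e.g. €65."""
    return '€{:,.0f}'.format(x)


load_formatter = FuncFormatter(thousands)
price_formatter = FuncFormatter(euros)

# Usage on an existing Axes: ax.yaxis.set_major_formatter(load_formatter)
print(load_formatter(28500.0, 0), price_formatter(64.7, 0))
```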

# Exploratory Data Analysis

## Total Energy Load

**Raw Time Plot**

In [None]:
#Set figure
fig, ax = plt.subplots(2,1, figsize = (20,15))

#Plot raw hourly load with trailing rolling means; rolling(window, min_periods) counts rows, i.e. hours
df['total_load_actual'].plot(ax=ax[0], label = 'Raw Data')
df['total_load_actual'].rolling(24*7, 24*4).mean().plot(ax=ax[0], color = 'r', label = 'Weekly Average')
df['total_load_actual'].rolling(24*7*4, 24*7*2).mean().plot(ax=ax[0], color = 'white', label = '4 Week Average')

#Plot rolling standard deviations over daily, weekly, and 4-week windows
df['total_load_actual'].rolling(24, 1).std().plot(ax=ax[1], color = 'grey', label = 'Daily Standard Deviation')
df['total_load_actual'].rolling(24*7, 24*4).std().plot(ax=ax[1], color = 'g', label = 'Weekly Standard Deviation')
df['total_load_actual'].rolling(24*7*4, 24*7*2).std().plot(ax=ax[1], color = 'black', label = '4 Week Standard Deviation')


###Set plot aesthetics###
##Plot 1##
#Title
ax[0].set_title('Total Energy Load across Spain',fontdict = font_title)

#Axes
ax[0].set_xlabel('')
ax[0].set_ylabel('Hourly Load (MW)')
ax[0].get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))

#Legend
ax[0].legend(fancybox = True, shadow = True, frameon = True)


##Plot 2##
#Title
ax[1].set_title('Rolling Standard Deviation of Energy Load', fontdict=font_title)

#Axes
ax[1].set_xlabel('')
ax[1].set_ylabel('Standard Deviation (MW)')
ax[1].get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))

#Legend
ax[1].legend(fancybox = True, shadow = True, frameon = True)

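In `rolling(window, min_periods)`, `window` is a number of rows (hours here) and `min_periods` is how many non-missing values must be present before a result is emitted; earlier positions are `NaN`, and windows are trailing (right-aligned) by default. That is why the weekly curve above only begins once four days (`24*4` rows) of data are available. A tiny example on a made-up series:

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3 rows; emit a value once at least 2 observations are available
trailing_mean = s.rolling(3, min_periods=2).mean()
print(trailing_mean.tolist())
```

The first value is `NaN` (only one observation), the second averages two values, and from the third on each value averages the current row and the two before it.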

**Autocorrelation Plot**

In [None]:
#Import plot acf
from statsmodels.graphics.tsaplots import plot_acf

#Set plot
f, ax = plt.subplots(figsize = (13,8))

#Plot autocorrelation for lags 0-50 (hours); the semicolon stops the notebook from displaying the figure twice
plot_acf(df['total_load_actual'], lags = 50, ax = ax);

###Set plot aesthetics###
#Title
ax.set_title('Total Energy Load Autocorrelation', fontdict = font_title)

**View the 24-hour lag**

In [None]:
#Set plot
f, ax = plt.subplots(figsize = (20,10))

#Plot time plot of first 2 weeks to view seasonality
df['total_load_actual'][:24*7*2].plot(ax=ax, label = 'Hourly Data')
df['total_load_actual'][:24*7*2].rolling(12, 6).mean().plot(ax=ax, color = 'r', label ='12 Hour Average') #12 hours
df['total_load_actual'][:24*7*2].rolling(24, 12).mean().plot(ax=ax, color = 'g', label = 'Daily Average') #Daily

###Set plot aesthetics###
#Title
ax.set_title('Total Energy Load across Spain',fontdict = font_title)

#Axes
ax.set_xlabel('')
ax.set_ylabel('Hourly Load (MW)')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))

#Legend
ax.legend(fancybox = True, shadow = True, frameon = True);
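Averaging by hour of day condenses the daily cycle seen in the two-week plot into a single 24-point profile. A sketch on a hypothetical deterministic series that peaks at noon; on the real data the same `groupby` would require a parsed `DatetimeIndex`, since a plain string index has no `hour` attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical two weeks of hourly load that peaks at 12:00 each day
idx = pd.date_range('2015-01-01', periods=24 * 14, freq='h')
load = pd.Series(30000 + 5000 * np.sin(2 * np.pi * (idx.hour - 6) / 24), index=idx)

# Mean load for each hour of the day (0-23)
daily_profile = load.groupby(load.index.hour).mean()

print(daily_profile.idxmax())  # hour with the highest average load
```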

**Check for yearly patterns**

In [None]:
#Set plot
f, ax = plt.subplots(figsize = (20,10))

#Set colors for hue
colors = ['r','b','g','y']


#Plot Data
g = sns.lineplot(x = 'month', y = 'total_load_actual', hue = 'year',data = df, 
             sort = False, palette=colors, ax = ax);

###Set plot aesthetics###
#Title
g.set_title('Monthly Energy Load per year', fontdict = font_title)

#Axes
g.set_ylabel('Mean Hourly Load (MW)')
g.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x))))

#Legend
g.legend(fancybox = True, shadow = True, frameon = True);
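The line plot draws the monthly mean (with a confidence band) for each year; the same numbers can be inspected directly as a month × year table. A sketch on a hypothetical frame whose `month` and `year` columns are integers, which is an assumption about the cleaned file:

```python
import pandas as pd

# Hypothetical hourly data for two years; the value encodes month and year so results are easy to check
idx = pd.date_range('2015-01-01', '2016-12-31 23:00', freq='h')
frame = pd.DataFrame({
    'month': idx.month,
    'year': idx.year,
})
frame['total_load_actual'] = frame['month'] * 10 + (frame['year'] - 2015) * 1000

# Rows: month, columns: year, cells: mean hourly load
monthly = frame.pivot_table(index='month', columns='year',
                            values='total_load_actual', aggfunc='mean')
print(monthly.head())
```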

## Price

**Raw data plot**

In [None]:
#Set figure
fig, ax = plt.subplots(2,1, figsize = (20,15))

#Plot Data
df['price_actual'].plot(ax =ax[0], label = 'Actual')
df['price_actual'].rolling(24*7,24*7).mean().plot(ax =ax[0],color = 'r', label = 'Weekly Rolling Average')
df['price_actual'].rolling(24*7*4,24*7).mean().plot(ax =ax[0], color = 'w', label = '4-week Rolling Average')

df['price_actual'].rolling(24*7,24*7).std().plot(ax =ax[1], color = 'g', label = 'Weekly Rolling Standard Deviation')
df['price_actual'].rolling(24*7*4,24*7).std().plot(ax =ax[1], color = 'black', label = '4-week Rolling Standard Deviation')

###Set plot aesthetics###
##Plot 1##
#Title
ax[0].set_title('Hourly Energy Price across Spain', fontdict = font_title)

#Axes
ax[0].set_xlabel('')
ax[0].set_ylabel('Price (EUR/MWh)', fontweight = 'bold')
ax[0].get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "€{:,.0f}".format(x)))

#Legend
ax[0].legend(fancybox = True, shadow = True, frameon = True)

##Plot 2##
#Title
ax[1].set_title('Rolling Standard Deviation of Price', fontdict = font_title)

#Axes
ax[1].set_xlabel('')
ax[1].set_ylabel('Price (EUR/MWh)', fontweight = 'bold')
ax[1].get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "€{:,.0f}".format(x)))

#Legend
ax[1].legend(fancybox = True, shadow = True, frameon = True);
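The rolling averages above are trailing windows: each point summarizes the previous 168 hours, and the curve is as long as the raw series. When one value per calendar week is wanted instead, `resample` bins by calendar week. A sketch on a hypothetical two-week price series starting on a Monday; `'W'` bins are weeks ending on Sunday by default.

```python
import pandas as pd

# Hypothetical hourly prices: 10 EUR/MWh in the first week, 20 in the second (2015-01-05 is a Monday)
idx = pd.date_range('2015-01-05', periods=24 * 14, freq='h')
price = pd.Series([10.0] * (24 * 7) + [20.0] * (24 * 7), index=idx)

weekly_mean = price.resample('W').mean()          # one value per Monday-Sunday week
trailing_mean = price.rolling(24 * 7).mean()      # one value per hour once 168 hours are available

print(weekly_mean)
print(trailing_mean.dropna().iloc[[0, -1]])
```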

**Autocorrelation Plot**

In [None]:
#Import plot acf
from statsmodels.graphics.tsaplots import plot_acf

#Set plot
f, ax = plt.subplots(figsize = (13,8))

#Plot autocorrelation for lags 0-50 (hours); the semicolon stops the notebook from displaying the figure twice
plot_acf(df['price_actual'], lags = 50, ax = ax);

###Set plot aesthetics###
#Title
ax.set_title('Price Autocorrelation', fontdict = font_title)
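To read off a single lag rather than the whole correlogram, pandas' `Series.autocorr(lag=k)` returns the Pearson correlation between the series and a copy shifted by `k` rows. It is computed slightly differently from the estimator behind `plot_acf`, so values can differ a little. A sketch on a synthetic price-like series with a daily cycle, not the real data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range('2015-01-01', periods=24 * 30, freq='h')

# Synthetic hourly price: a baseline plus a 24-hour cycle and noise
price = pd.Series(
    50 + 10 * np.sin(2 * np.pi * np.arange(idx.size) / 24) + rng.normal(scale=1.0, size=idx.size),
    index=idx,
)

for lag in (1, 12, 24):
    print(lag, round(price.autocorr(lag=lag), 2))
```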

**View the 24-hour lag**

In [None]:
#Set plot
f, ax = plt.subplots(figsize = (20,10))

#Plot time plot of first 2 weeks to view seasonality
df['price_actual'][:24*7*2].plot(ax=ax, label = 'Hourly Data')
df['price_actual'][:24*7*2].rolling(12, 6).mean().plot(ax=ax, color = 'r', label ='12 Hour Average') #12 hours
df['price_actual'][:24*7*2].rolling(24, 12).mean().plot(ax=ax, color = 'g', label = 'Daily Average') #Daily

###Set plot aesthetics###
#Title
ax.set_title('Prices Across Spain',fontdict = font_title)

#Axes
ax.set_xlabel('')
ax.set_ylabel('Price (EUR/MWh)')
ax.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "€{:,.0f}".format(x)))

#Legend
ax.legend(fancybox = True, shadow = True, frameon = True);
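When the 24-hour cycle is strong, subtracting the value from the same hour on the previous day (a lag-24 seasonal difference) removes it. A sketch with `Series.diff(24)` on a hypothetical series that repeats exactly every 24 hours, so every differenced value after the first day is zero:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-01', periods=24 * 3, freq='h')

# Hypothetical price that repeats exactly every 24 hours
price = pd.Series(40 + 5 * np.cos(2 * np.pi * idx.hour / 24), index=idx)

# Difference against the same hour one day earlier; the first 24 values have no prior day and are NaN
seasonal_diff = price.diff(24)
print(seasonal_diff.isna().sum(), seasonal_diff.dropna().abs().max())
```

On real prices the residual would not be zero; what remains is the day-to-day change once the daily shape is removed.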

**Check for yearly patterns**

In [None]:
#Set plot
f, ax = plt.subplots(figsize = (20,10))

#Set colors for hue
colors = ['r','b','g','y']


#Plot Data
g = sns.lineplot(x = 'month', y = 'price_actual', hue = 'year',data = df, 
             sort = False, palette=colors, ax = ax);

###Set plot aesthetics###
#Title
g.set_title('Monthly Price per year', fontdict = font_title)

#Axes
g.set_ylabel('Price (EUR/MWh)')
g.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "€{:,.0f}".format(x)))

#Legend
g.legend(fancybox = True, shadow = True, frameon = True);