# Energy Price Prediction Project

## Previous Notebooks

- [Energy data import and cleaning](1.0-GME-Data.ipynb)

In [2]:
import numpy as np
import pandas as pd
import os

## Weather Data

Degree days are a simplified representation of outside air temperature, as [degreedays.net](http://www.degreedays.net/) explains:

>Degree days are essentially a simplified representation of outside air-temperature data. They are widely used in the energy industry for calculations relating to the effect of outside air temperature on building energy consumption.

>"Heating degree days", or "HDD", are a measure of how much (in degrees), and for how long (in days), outside air temperature was lower than a specific "base temperature" (or "balance point"). They are used for calculations relating to the energy consumption required to heat buildings.

>"Cooling degree days", or "CDD", are a measure of how much (in degrees), and for how long (in days), outside air temperature was higher than a specific base temperature. They are used for calculations relating to the energy consumption required to cool buildings.
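
To make the definitions above concrete, here is a minimal sketch that derives HDD and CDD from daily mean temperatures. It uses the simple mean-temperature approximation; the temperatures, the base temperature of 15.5 °C and the names `temps`, `BASE_TEMP`, `example_hdd` and `example_cdd` are assumptions made for illustration. Data providers may compute degree days from finer-grained readings, so real values need not match this approximation.

```python
import pandas as pd

# Hypothetical daily mean temperatures in °C (illustrative values only)
temps = pd.Series(
    [2.0, 10.5, 15.5, 21.0, 28.5],
    index=pd.date_range('2017-01-01', periods=5, freq='D'),
    name='mean_temp',
)

BASE_TEMP = 15.5  # assumed base temperature, not taken from the downloaded data

# Degrees below the base count as heating degree days, floored at zero
example_hdd = (BASE_TEMP - temps).clip(lower=0)
# Degrees above the base count as cooling degree days, floored at zero
example_cdd = (temps - BASE_TEMP).clip(lower=0)

print(pd.DataFrame({'temp': temps, 'HDD': example_hdd, 'CDD': example_cdd}))
```

On any given day at most one of the two values is non-zero, which is why HDD and CDD are kept as separate columns later on.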

I downloaded daily HDD and CDD data for the last 36 months from three weather stations, in Milan, Rome and Naples (ICAO station codes LIML, LIRA and LIRN, respectively):

In [3]:
# Collect one DataFrame per station file and concatenate once at the end
# (DataFrame.append was removed in pandas 2.0).
frames = []

for filename in os.listdir('../data/raw'):
    if 'HDD' in filename and filename.endswith('.csv'):
        data = pd.read_csv('../data/raw/{}'.format(filename), skiprows=6, sep=';')
        # Values use a decimal comma, so convert them to floats
        data['HDD'] = pd.to_numeric(data['HDD'].str.replace(',', '.', regex=False))
        data['Date'] = pd.to_datetime(data['Date'])
        # The first four characters of the file name are the station code
        data['station'] = filename[0:4]
        frames.append(data.drop('% Estimated', axis=1))

hdd = pd.concat(frames, ignore_index=True)

In [4]:
# Same procedure as for the HDD files: collect per-station frames, then concatenate
# (DataFrame.append was removed in pandas 2.0).
frames = []

for filename in os.listdir('../data/raw'):
    if 'CDD' in filename and filename.endswith('.csv'):
        data = pd.read_csv('../data/raw/{}'.format(filename), skiprows=6, sep=';')
        # Values use a decimal comma, so convert them to floats
        data['CDD'] = pd.to_numeric(data['CDD'].str.replace(',', '.', regex=False))
        data['Date'] = pd.to_datetime(data['Date'])
        # The first four characters of the file name are the station code
        data['station'] = filename[0:4]
        frames.append(data.drop('% Estimated', axis=1))

cdd = pd.concat(frames, ignore_index=True)
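
The two loading cells above differ only in the column they read, so they could be folded into one helper. The sketch below is one possible way to do that, not the code this notebook ran: `load_degree_days`, the sample file name and the sample contents are hypothetical, and the file layout (six header lines, then `Date;HDD;% Estimated` with decimal commas) is inferred from the `read_csv` arguments used above. It also lets `read_csv` handle the decimal comma and the date parsing directly.

```python
import os
import tempfile

import pandas as pd


def load_degree_days(raw_dir, kind):
    """Load every CSV in raw_dir whose name contains `kind` into one long DataFrame."""
    frames = []
    for filename in sorted(os.listdir(raw_dir)):
        if kind in filename and filename.endswith('.csv'):
            data = pd.read_csv(
                os.path.join(raw_dir, filename),
                skiprows=6,            # skip the six header lines
                sep=';',
                decimal=',',           # parse "12,5" as 12.5 while reading
                parse_dates=['Date'],
            )
            data['station'] = filename[0:4]  # station code prefix
            frames.append(data.drop(columns='% Estimated'))
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample file mimicking the assumed layout
sample = '\n'.join(
    ['header line'] * 6
    + ['Date;HDD;% Estimated', '2017-01-01;12,5;0', '2017-01-02;9,0;0']
)

with tempfile.TemporaryDirectory() as raw_dir:
    with open(os.path.join(raw_dir, 'LIML_HDD_sample.csv'), 'w') as f:
        f.write(sample)
    sample_hdd = load_degree_days(raw_dir, 'HDD')

print(sample_hdd)
```

With such a helper, the two cells would reduce to `load_degree_days('../data/raw', 'HDD')` and `load_degree_days('../data/raw', 'CDD')`.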

In [5]:
print(len(hdd))
print(min(hdd['Date']))
print(max(hdd['Date']))

print(len(cdd))
print(min(cdd['Date']))
print(max(cdd['Date']))

3302
2014-11-01 00:00:00
2017-11-05 00:00:00
3302
2014-11-01 00:00:00
2017-11-05 00:00:00


Next, I pivot each table to get one row per day and one column per station, then merge the HDDs and CDDs on the date:

In [13]:
hdd = hdd.pivot_table(values='HDD', index='Date', columns='station').rename(
    columns={'LIML': 'hdd_liml', 'LIRA': 'hdd_lira', 'LIRN': 'hdd_lirn'})
cdd = cdd.pivot_table(values='CDD', index='Date', columns='station').rename(
    columns={'LIML': 'cdd_liml', 'LIRA': 'cdd_lira', 'LIRN': 'cdd_lirn'})

In [16]:
weather = hdd.merge(cdd, how='inner', left_index=True, right_index=True)
weather.to_pickle('../data/interim/weather.pkl')
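
The pivot and inner merge above can be illustrated on a toy example. The values and the names `long_hdd`, `long_cdd`, `wide_hdd`, `wide_cdd` and `toy_weather` below are hypothetical; the point is only to show how long data becomes one column per station, and that an inner merge keeps just the dates present in both tables.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, station)
long_hdd = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-01', '2017-01-01', '2017-01-02', '2017-01-02']),
    'station': ['LIML', 'LIRA', 'LIML', 'LIRA'],
    'HDD': [12.5, 8.0, 11.0, 7.5],
})
long_cdd = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-02', '2017-01-02', '2017-01-03', '2017-01-03']),
    'station': ['LIML', 'LIRA', 'LIML', 'LIRA'],
    'CDD': [0.0, 0.5, 0.0, 1.0],
})

# One row per day, one column per station, with lower-case prefixed names
wide_hdd = (long_hdd.pivot_table(values='HDD', index='Date', columns='station')
            .rename(columns=lambda c: 'hdd_' + c.lower()))
wide_cdd = (long_cdd.pivot_table(values='CDD', index='Date', columns='station')
            .rename(columns=lambda c: 'cdd_' + c.lower()))

# Inner merge on the Date index: only 2017-01-02 appears in both tables
toy_weather = wide_hdd.merge(wide_cdd, how='inner', left_index=True, right_index=True)
print(toy_weather)
```

Note that `pivot_table` aggregates with the mean by default, so duplicate rows for the same day and station would be averaged rather than raising an error.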

## Following Notebooks

- [Energy price futures import and cleaning](1.2-Futures-Data.ipynb)
- [Gas price import and cleaning](1.3-Gas-Data.ipynb)
- [Merging data](1.5-Merge-Data.ipynb)
- [Exploratory data analysis](2.0-EDA.ipynb)
- [Feature engineering](3.0-Feature-Engineering.ipynb)
- [More exploratory data analysis](4.0-EDA-Bis.ipynb)
- [Predictive model](5.0-Model.ipynb)