# import libraries

In [4]:
import numpy as np
import pandas as pd

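# tqdm.tqdm_notebook is deprecated; the alias keeps the name used in the cells below
from tqdm.notebook import tqdm as tqdm_notebook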
from entsoe import EntsoePandasClient

pd.options.display.max_rows = 10
pd.options.display.max_columns = None

# data collection

Now we collect all the data we need from the __[ENTSO-E](https://transparency.entsoe.eu/)__ Transparency Platform.

**ENTSO-E** is the European Network of Transmission System Operators for Electricity. Its Transparency Platform provides, through a REST API, a range of data about the energy market in Europe.

_You need to request an API key to access the data._

You can also install __[entsoe-py](https://github.com/EnergieID/entsoe-py)__, a library that does all the magic for you:

> pip install entsoe-py
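
Since the API key is personal, it may be preferable not to paste it into the notebook. The snippet below is a minimal sketch of one way to do that, reading the key from an environment variable. The variable name `ENTSOE_API_KEY` and the helper `load_api_key` are assumptions made for illustration; neither is part of entsoe-py.

```python
import os


def load_api_key(env_var="ENTSOE_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is missing.

    `env_var` is a hypothetical name chosen for this sketch.
    `environ` defaults to the process environment and can be replaced
    with a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your ENTSO-E API key"
        )
    return key


# Usage in the next cell, once the variable is set in your shell:
# API_KEY = load_api_key()
```

With this in place, the `API_KEY = ...` assignment in the next cell could call `load_api_key()` instead of holding the literal key.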

In [5]:
API_KEY = '*** your api key here ***'

START_DATE = pd.Timestamp('2015-01-01 00:00', tz='Europe/Paris')
END_DATE = pd.Timestamp('2018-12-31 23:00', tz='Europe/Paris')

print(START_DATE, END_DATE)

entsoe_client = EntsoePandasClient(api_key=API_KEY, retry_count=3, retry_delay=5)

2015-01-01 00:00:00+01:00 2018-12-31 23:00:00+01:00


France Energy Price, the dependent variable to predict

In [4]:
prices = entsoe_client.query_day_ahead_prices(country_code='FR', start=START_DATE, end=END_DATE)
prices.name = 'Y'

Connection Error, retrying in 5 seconds


Total Load Forecast, the day ahead reference value for the energy demand

In [5]:
total_load_forecast = entsoe_client.query_load_forecast(country_code='FR', start=START_DATE, end=END_DATE)
total_load_forecast.name = 'LOAD_FORECAST'

Generation Forecast, the forecast of the energy production for the current day

In [6]:
generation_forecast = entsoe_client.query_generation_forecast(country_code='FR', start=START_DATE, end=END_DATE)
generation_forecast.name = 'GENERATION_FORECAST'

Wind and Solar Forecast, the forecast values for the renewable energy production for the current day

In [7]:
wind_solar_forecast = entsoe_client.query_wind_and_solar_forecast(country_code='FR', 
                                                                  start=START_DATE, end=END_DATE)
wind_solar_forecast.columns = [c.replace(' ','_').upper() for c in wind_solar_forecast.columns]

Cross-Border Physical Flow, the energy transmitted between countries on the previous days

In [8]:
END_DATE_FIX = END_DATE + pd.to_timedelta(1,unit='h')

from_to = [('FR','BE'),('FR','CH'),('FR','ES'),('FR','DE'),('FR','IT')]

crossborder_flows = {}

for frm, to in tqdm_notebook(from_to):
    crossborder_flows[f'crossborder_flow_{frm}-{to}'] = entsoe_client.query_crossborder_flows(
        country_code_from=frm, country_code_to=to, start=START_DATE, end=END_DATE_FIX)
    crossborder_flows[f'crossborder_flow_{frm}-{to}'].name = f'crossborder_flow_{frm}-{to}'
    crossborder_flows[f'crossborder_flow_{to}-{frm}'] = entsoe_client.query_crossborder_flows(
        country_code_from=to, country_code_to=frm, start=START_DATE, end=END_DATE_FIX)
    crossborder_flows[f'crossborder_flow_{to}-{frm}'].name = f'crossborder_flow_{to}-{frm}'

We need to consider a few things now:

> we need to understand when each piece of data becomes available and how to model it so that it is usable.

We have to _map each data flow to the corresponding day_, so we need to distinguish between days in the timeline.

What we have is:
- data that refers to the day ahead (t+1):
    - _prices_
    - _load forecast_
- data that refers to the current day (t):
    - _generation forecast_
    - _wind&solar forecast_
- data that refers to previous days (t-1):
    - _crossborder flows_
    
__example__: today is 2018-12-30. I have:
 - prices until today
 - load forecast until tomorrow
 - generation + wind&solar forecast of today
 - crossborder flows until yesterday
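
The mapping above can be illustrated with a toy example. This is only a sketch with made-up daily values (the real series are hourly) and shortened series names. Each row is indexed by the day the price refers to, so the generation forecast, known for the day before, is moved forward one day, and the cross-border flow, known for two days before, is moved forward two days.

```python
import pandas as pd

# Four consecutive days; all values are invented for illustration.
days = pd.date_range("2018-12-28", periods=4, freq="D", tz="Europe/Paris")

price = pd.Series([50.0, 52.0, 48.0, 51.0], index=days, name="Y")
generation = pd.Series([100.0, 110.0, 105.0, 98.0], index=days,
                       name="GENERATION_FORECAST")
flow = pd.Series([10.0, 12.0, 9.0, 11.0], index=days, name="FLOW")

# shift(n, freq="D") moves the index forward by n days and leaves the values
# untouched, so older observations line up with the later price they help predict.
aligned = pd.concat(
    [price, generation.shift(1, freq="D"), flow.shift(2, freq="D")],
    axis=1,
)

# The row for 2018-12-30 holds the price of the 30th, the generation
# forecast of the 29th and the flow of the 28th.
row = aligned.loc[pd.Timestamp("2018-12-30", tz="Europe/Paris")]
print(row)
```

Rows at the edges of the range contain `NaN` where a shifted series has no value for that day. Shifting the index rather than the values is a choice made here so that each row contains only information available before the price it is paired with.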

# merge all and create dataset

In [9]:
# Align every series on the day the price refers to: current-day forecasts
# move forward one day, previous-day cross-border flows move forward two days.
dataset = pd.concat([prices, 
                     total_load_forecast,
                     generation_forecast.shift(1, freq='D'),
                     wind_solar_forecast.shift(1, freq='D'),
                     crossborder_flows['crossborder_flow_FR-BE'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_BE-FR'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_FR-CH'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_CH-FR'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_FR-ES'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_ES-FR'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_FR-DE'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_DE-FR'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_FR-IT'].shift(2, freq='D'),
                     crossborder_flows['crossborder_flow_IT-FR'].shift(2, freq='D'),
                    ], axis=1)  # axis must be passed by keyword in pandas 2.x
dataset = dataset.tz_convert('Europe/Paris')
dataset.to_pickle('data/dataset.pkl')