# Introduction

In this Jupyter Notebook, we import the ENTSO-E Actual Generation per Type data (processed with the OPSD time series script) and correct the hourly data with the yearly values reported in the ENTSO-E Statistical Factsheet.

The OPSD time series script converts all data to a single resolution (1 hour) and interpolates gaps in the data set with a maximum length of 2 hours.
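The gap filling can be sketched in pandas as follows. This is a minimal illustration, not the code of the OPSD script; the helper `interpolate_short_gaps` is hypothetical and assumes linear interpolation of hourly values.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(s: pd.Series, max_gap: int = 2) -> pd.Series:
    """Linearly interpolate runs of at most `max_gap` consecutive NaN values.

    Longer gaps are left untouched (unlike Series.interpolate(limit=...),
    which would partially fill them).
    """
    is_nan = s.isna()
    # give every run of consecutive NaN / non-NaN values its own id
    run_id = is_nan.ne(is_nan.shift(fill_value=False)).cumsum()
    # number of NaN values in the run each element belongs to
    run_len = is_nan.groupby(run_id).transform('sum')
    fillable = is_nan & (run_len <= max_gap)
    # interpolate only between valid values, never extrapolate at the edges
    interpolated = s.interpolate(method='linear', limit_area='inside')
    return s.where(~fillable, interpolated)


# invented hourly values (MW) with a 1-hour and a 3-hour gap
index = pd.date_range('2018-01-01', periods=8, freq='60min', tz='UTC')
series = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0, 8.0], index=index)
print(interpolate_short_gaps(series))
```

The 1-hour gap is filled, while the 3-hour gap stays empty because it exceeds the 2-hour limit.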

# Script setup

In [1]:
import numpy as np
import pandas as pd

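# Helpers
import os
import glob
from datetime import datetime, date, timedelta, time


# Plotting
import matplotlib.pyplot as plt

%matplotlib inline
# the 'seaborn' style was renamed to 'seaborn-v0_8' in matplotlib 3.6
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')
plt.rcParams['figure.figsize'] = [15, 6]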

# Data directory preparation

Create the input, processed and output folders if they do not exist. If the paths are relative, the folders are created inside the current working directory.
- input -> all required input data
- processed -> intermediate save points shared with other scripts
- output -> final data

In [2]:
input_directory_path = 'input'
processed_directory_path = 'processed'
output_directory_path = 'output'

os.makedirs(input_directory_path, exist_ok=True)
os.makedirs(processed_directory_path, exist_ok=True)
os.makedirs(output_directory_path, exist_ok=True)

# Data file preparation

1. ENTSO-E Transparency Platform, Actual Generation per Type. Available online: https://transparency.entsoe.eu/generation/r2/actualGenerationPerProductionType/show (accessed on Oct 02, 2020).
 - Processed with the OPSD time series scripts (own INATECH version)
 - File -> time_series_60min_stacked.csv

2. ENTSO-E Statistical Factsheet 2018 in Comma Separated Value (CSV) format. Available online: https://zenodo.org/record/3461691
 - Original data: ENTSO-E statistics. Available online: https://www.entsoe.eu/publications/statistics-and-data/#statistical-factsheet
 - File -> entsoe-statistical-factsheet-2018-stacked



Previously, we used the Eurostat Energy Balances (2020 edition, MS Excel format), available at https://ec.europa.eu/eurostat/de/web/energy/data/energy-balances (accessed on Oct 02, 2020). However, the Energy Balances only provide gross electricity data.

# Load data functions

In [3]:
def load_timeseries_opsd(fn):
    """
    Read generation data from the OPSD time series package (own modification).

    Parameters
    ----------
    fn : str
        File name or URL of the stacked CSV file
        (e.g. time_series_60min_stacked.csv).

    Returns
    -------
    generation : pd.DataFrame
        Stacked (long-format) generation time series indexed by UTC
        timestamp, with the columns region, variable and data.
    """
    generation = pd.read_csv(fn, index_col='utc_timestamp', parse_dates=True)

    # the attribute column is not needed for the further processing
    generation = generation.drop(columns='attribute')

    return generation

def load_stats_factsheet(path, fn):
    """
    Load the ENTSO-E Statistical Factsheet 2018 in Comma Separated Value (CSV) format.

    Parameters
    ----------
    path : str
        path to the data directory
    fn : str
        file name

    Returns
    -------
    generation : pd.DataFrame
        Annual generation with one row per year and two column levels
        (country, source).
    """
    generation = pd.read_csv(os.path.join(path, fn), index_col=[0], header=[0, 1], parse_dates=True)

    return generation
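`load_stats_factsheet` reads a CSV with a two-row column header (country, source) and a date index (year). The round trip below, using invented values, shows the layout this call expects; it is a sketch of the assumed file structure, not the actual factsheet file.

```python
import io

import pandas as pd

# Invented annual values in the assumed factsheet layout:
# two column levels (country, source) and one row per year.
columns = pd.MultiIndex.from_product([['AT', 'DE'], ['hydro', 'solar']],
                                     names=['country', 'source'])
index = pd.DatetimeIndex(['2018-01-01'], name='year')
stats = pd.DataFrame([[100.0, 20.0, 250.0, 410.0]], index=index, columns=columns)

# Write to an in-memory CSV and read it back with the same arguments
# that load_stats_factsheet passes to pd.read_csv.
buffer = io.StringIO(stats.to_csv())
loaded = pd.read_csv(buffer, index_col=[0], header=[0, 1], parse_dates=True)

print(buffer.getvalue())
print(loaded['DE'])
```

Selecting a country (`loaded['DE']`) drops the first column level and leaves one column per source, which is how the scaling step below accesses the factsheet.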

In [4]:
def convert_ENTSOE_to_INATECH_type(ProductionTypeName):
    """
    Converts ENTSO-E Generation per Type source names into INATECH technology type names.

    Parameters
    ----------
    ProductionTypeName : string
        ENTSO-E name of production type.

    Returns
    -------
    string
        INATECH names of production type.

    """

    return ProductionTypeName.replace({
            'Biomass': 'biomass',
            'Fossil Brown coal/Lignite': 'lignite',
            'Fossil Gas': 'gas',
            'Fossil Coal-derived gas': 'other_fossil',
            'Fossil Hard coal': 'hard_coal',
            'Fossil Oil': 'other_fossil',
            'Fossil Peat': 'other_fossil',
            'Geothermal': 'other_renewable',
            'Hydro Pumped Storage': 'hydro',
            'Hydro Run-of-river and poundage': 'hydro',
            'Hydro Water Reservoir': 'hydro',
            'Other': 'other_fossil',
            'Solar': 'solar',
            'Waste': 'other_fossil',
            'Wind Onshore': 'wind_onshore',
            'Wind Offshore': 'wind_offshore',
            'Nuclear': 'nuclear',
            'Other renewable': 'other_renewable'
                }, inplace=False)

# Load and filter data

In [5]:
# period filter
start = '2018-01-01 00:00:00+00:00'
end = '2018-12-31 23:00:00+00:00'

## Load and standardize the OPSD time series data

In [6]:
entsoe_gen_type = load_timeseries_opsd(fn=os.path.join(input_directory_path, 'time_series_60min_stacked.csv'))

In [7]:
entsoe_gen_type = entsoe_gen_type.loc[start:end].copy()

In [8]:
entsoe_gen_type.region.unique()

array(['AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DE_50hertz', 'DE_LU',
       'DE_amprion', 'DE_tennet', 'DE_transnetbw', 'DK', 'DK_1', 'DK_2',
       'DK_energinet', 'EE', 'ES', 'FI', 'FR', 'GB_GBN', 'GB_NIR',
       'GB_UKM', 'GR', 'HU', 'IE', 'IE_sem', 'IT', 'IT_BRNN', 'IT_CNOR',
       'IT_CSUD', 'IT_FOGN', 'IT_NORD', 'IT_PRGP', 'IT_ROSN', 'IT_SARD',
       'IT_SICI', 'IT_SUD', 'LT', 'LV', 'ME', 'NL', 'NO', 'NO_1', 'NO_2',
       'NO_3', 'NO_4', 'NO_5', 'PL', 'PT', 'RO', 'RS', 'SE', 'SE_1',
       'SE_2', 'SE_3', 'SE_4', 'SI', 'SK'], dtype=object)

In [9]:
# replace region GB_UKM with GB
entsoe_gen_type.region = entsoe_gen_type.region.replace({'GB_UKM' : 'GB'})

In [10]:
entsoe_gen_type.variable.unique()

array(['Biomass', 'Fossil Gas', 'Fossil Hard coal', 'Geothermal',
       'Hydro Pumped Storage', 'Hydro Run-of-river and poundage',
       'Hydro Water Reservoir', 'Other', 'Solar', 'Waste', 'Wind Onshore',
       'Fossil Oil', 'Nuclear', 'Wind Offshore',
       'Fossil Brown coal/Lignite', 'Fossil Coal-derived gas',
       'Other renewable', 'Fossil Peat'], dtype=object)

In [11]:
# change ProductionTypeNames into INATECH technology type names
entsoe_gen_type.variable = convert_ENTSOE_to_INATECH_type(entsoe_gen_type.variable)

In [12]:
entsoe_gen_type.variable.unique()

array(['biomass', 'gas', 'hard_coal', 'other_renewable', 'hydro',
       'other_fossil', 'solar', 'wind_onshore', 'nuclear',
       'wind_offshore', 'lignite'], dtype=object)

In [13]:
# after changing the production type we need to group the dataset
# reset index for groupby function
entsoe_gen_type.reset_index(inplace=True)
# group same production types
entsoe_gen_type = entsoe_gen_type.groupby(['variable', 'utc_timestamp','region']).sum()
# restore utc_timestamp as the index
entsoe_gen_type = entsoe_gen_type.reset_index().set_index('utc_timestamp')
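Because several ENTSO-E production types map to the same INATECH type, renaming alone leaves duplicate (variable, timestamp, region) rows, which the cell above sums. A toy example with invented values illustrates this for the hydro types; `mapping` is a subset of the dictionary in `convert_ENTSOE_to_INATECH_type`.

```python
import pandas as pd

# Invented hourly values (MW) in the stacked layout used above
stacked = pd.DataFrame({
    'utc_timestamp': pd.to_datetime(['2018-01-01 00:00'] * 3, utc=True),
    'region': ['AT', 'AT', 'AT'],
    'variable': ['Hydro Pumped Storage', 'Hydro Run-of-river and poundage', 'Solar'],
    'data': [100.0, 250.0, 0.0],
})

# Subset of the mapping in convert_ENTSOE_to_INATECH_type
mapping = {'Hydro Pumped Storage': 'hydro',
           'Hydro Run-of-river and poundage': 'hydro',
           'Solar': 'solar'}
stacked['variable'] = stacked['variable'].replace(mapping)

# Two rows now share (variable, utc_timestamp, region) and are summed
grouped = (stacked.groupby(['variable', 'utc_timestamp', 'region'])['data']
                  .sum()
                  .reset_index()
                  .set_index('utc_timestamp'))
print(grouped)
```

The two hydro rows collapse into one row with 350 MW, so pumped storage generation is counted as part of `hydro`.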

In [14]:
# show the head of the data set 
entsoe_gen_type.head(10)

utc_timestamp,variable,region,data
2018-01-01 00:00:00+00:00,biomass,AT,316.0
2018-01-01 00:00:00+00:00,biomass,BE,232.46
2018-01-01 00:00:00+00:00,biomass,BG,28.0
2018-01-01 00:00:00+00:00,biomass,CZ,261.78
2018-01-01 00:00:00+00:00,biomass,DE,4764.0
2018-01-01 00:00:00+00:00,biomass,DE_50hertz,1199.0
2018-01-01 00:00:00+00:00,biomass,DE_amprion,937.0
2018-01-01 00:00:00+00:00,biomass,DE_tennet,2118.0
2018-01-01 00:00:00+00:00,biomass,DE_transnetbw,510.0
2018-01-01 00:00:00+00:00,biomass,DK,587.0


In [15]:
# pivot the stacked data into a wide table with (region, variable) columns
entsoe_gen_type_table = pd.pivot_table(entsoe_gen_type, values='data', index=entsoe_gen_type.index, columns=['region', 'variable'])
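`pd.pivot_table` aggregates duplicate (timestamp, region, variable) combinations with its default `aggfunc='mean'`. Since the grouping above makes them unique, the mean is simply the value. A stricter alternative, sketched here on invented data, is `DataFrame.pivot`, which raises a `ValueError` on duplicates instead of averaging them.

```python
import pandas as pd

idx = pd.to_datetime(['2018-01-01 00:00', '2018-01-01 00:00',
                      '2018-01-01 01:00', '2018-01-01 01:00'], utc=True)
stacked = pd.DataFrame({'region': ['AT', 'AT', 'AT', 'AT'],
                        'variable': ['hydro', 'solar', 'hydro', 'solar'],
                        'data': [350.0, 0.0, 340.0, 0.0]},
                       index=pd.Index(idx, name='utc_timestamp'))

# pivot (unlike pivot_table) raises if a (timestamp, region, variable)
# combination occurs twice, so it doubles as a check that the grouping worked
wide = stacked.pivot(columns=['region', 'variable'], values='data')
print(wide)
```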

In [16]:
entsoe_gen_type_table.head()

region,AT,AT,AT,AT,AT,AT,AT,AT,BE,BE,...,SI,SK,SK,SK,SK,SK,SK,SK,SK,SK
variable,biomass,gas,hard_coal,hydro,other_fossil,other_renewable,solar,wind_onshore,biomass,gas,...,wind_onshore,biomass,gas,hard_coal,hydro,lignite,nuclear,other_fossil,other_renewable,solar
utc_timestamp
2018-01-01 00:00:00+00:00,316.0,71.0,150.0,3345.0,122.0,0.0,0.0,832.0,232.46,1499.9,...,0.7,28.2,175.6,48.1,455.93,179.5,1819.3,478.8,41.4,0.0
2018-01-01 01:00:00+00:00,316.0,70.0,149.0,3275.0,122.0,0.0,0.0,926.0,153.58,1565.88,...,0.61,28.1,172.9,48.0,461.95,177.9,1816.7,475.8,41.0,0.0
2018-01-01 02:00:00+00:00,316.0,69.0,149.0,3167.0,122.0,0.0,0.0,692.0,133.7,1552.23,...,0.57,28.1,162.1,46.7,513.39,174.7,1808.7,469.5,40.8,0.0
2018-01-01 03:00:00+00:00,316.0,72.0,149.0,3160.0,122.0,0.0,0.0,453.0,131.38,1515.33,...,0.62,29.7,140.3,45.3,587.83,168.5,1803.6,455.6,39.4,0.0
2018-01-01 04:00:00+00:00,316.0,75.0,149.0,3330.0,122.0,0.0,0.0,321.0,131.26,1529.68,...,0.56,29.5,143.5,46.4,562.89,170.6,1809.1,461.5,40.4,0.0


## Load and standardize the ENTSO-E Statistical Factsheet data

In [17]:
# load data
entsoe_stats = load_stats_factsheet(input_directory_path, 'Stats_FACT_table.csv')

In [18]:
# show the data set
entsoe_stats

country,AL,AL,AL,AL,AL,AL,AL,AL,AL,AL,...,TR,TR,TR,TR,TR,TR,TR,TR,TR,TR
source,biomass,gas,hard_coal,hydro,lignite,nuclear,other_fossil,other_renewable,solar,wind_offshore,...,gas,hard_coal,hydro,lignite,nuclear,other_fossil,other_renewable,solar,wind_offshore,wind_onshore
year
2018-01-01,0.0,0.0,0.0,8100000.0,0.0,0.0,0.0,0.0,0.0,0.0,...,89400000.0,67800000.0,59800000.0,44800000.0,0.0,1400000.0,6900000.0,7200000.0,0.0,19900000.0


## Scaling ENTSO-E generation data
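Each hourly profile $P_{r,k}(t)$ of region $r$ and technology $k$ is normalized to sum to one and multiplied by the annual factsheet value $E_{r,k}$: $\hat{P}_{r,k}(t) = E_{r,k} \, P_{r,k}(t) / \sum_{\tau} P_{r,k}(\tau)$. This keeps the hourly shape while the annual sum matches the factsheet. The sketch below isolates this step for a single profile; `scale_to_annual` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def scale_to_annual(hourly: pd.Series, annual_total: float) -> pd.Series:
    """Rescale an hourly profile so that it sums to annual_total."""
    total = hourly.sum()
    if total == 0:
        # a profile without any generation cannot carry an annual total
        raise ValueError('hourly profile sums to zero')
    return hourly / total * annual_total


hourly = pd.Series([1.0, 2.0, 3.0, 4.0])  # invented MW values
scaled = scale_to_annual(hourly, annual_total=20.0)
print(scaled.tolist())
```

Unlike `normed` in the cell below, which yields NaN for a profile that sums to zero, this sketch raises an error so that such cases are not propagated silently.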

In [19]:
# available countries in entsoe stats data set
countries = entsoe_stats.columns.get_level_values('country').unique().to_list()

In [20]:
# keep only regions that also appear in the factsheet
entsoe_gen_type.query('region in @countries', inplace=True)

In [21]:
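def normed(x):
    # normalize an hourly profile so that it sums to 1
    return x.divide(x.sum())

# all regions in the hourly table (the region filter above only affected entsoe_gen_type)
regions = entsoe_gen_type_table.columns.levels[0].to_list()

entso_scaled = entsoe_gen_type_table.copy()

for region in regions:
    for variable in entso_scaled[region]:
        try:
            # rescale the hourly profile to the annual factsheet value for 2018
            entso_scaled[(region, variable)] = normed(entso_scaled[(region, variable)]) * entsoe_stats[region].at['2018', variable]
        except KeyError:
            # regions without a factsheet entry (e.g. TSO control areas, bidding zones) stay unscaled
            print(region + ' ' + variable)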

DE_50hertz biomass
DE_50hertz gas
DE_50hertz hard_coal
DE_50hertz hydro
DE_50hertz lignite
DE_50hertz other_fossil
DE_50hertz other_renewable
DE_50hertz solar
DE_50hertz wind_offshore
DE_50hertz wind_onshore
DE_LU biomass
DE_LU gas
DE_LU hard_coal
DE_LU hydro
DE_LU lignite
DE_LU nuclear
DE_LU other_fossil
DE_LU other_renewable
DE_LU solar
DE_LU wind_offshore
DE_LU wind_onshore
DE_amprion biomass
DE_amprion gas
DE_amprion hard_coal
DE_amprion hydro
DE_amprion lignite
DE_amprion nuclear
DE_amprion other_fossil
DE_amprion other_renewable
DE_amprion solar
DE_amprion wind_onshore
DE_tennet biomass
DE_tennet gas
DE_tennet hard_coal
DE_tennet hydro
DE_tennet lignite
DE_tennet nuclear
DE_tennet other_fossil
DE_tennet other_renewable
DE_tennet solar
DE_tennet wind_offshore
DE_tennet wind_onshore
DE_transnetbw biomass
DE_transnetbw gas
DE_transnetbw hard_coal
DE_transnetbw hydro
DE_transnetbw nuclear
DE_transnetbw other_fossil
DE_transnetbw other_renewable
DE_transnetbw solar
DE_transnetbw wind_

In [22]:
entsoe_gen_type_table['DE'].sum()

variable
biomass             40184808.0
gas                 42959069.0
hard_coal           71546375.0
hydro               25316891.0
lignite            128361330.0
nuclear             71844721.0
other_fossil        10037865.0
other_renewable      1394941.0
solar               41231973.0
wind_offshore       19075448.0
wind_onshore        89488871.0
dtype: float64

In [23]:
entso_scaled['DE'].sum()

variable
biomass            4.010000e+07
gas                8.730000e+07
hard_coal          7.290000e+07
hydro              2.510000e+07
lignite            1.348000e+08
nuclear            7.190000e+07
other_fossil       1.100000e+07
other_renewable    6.100000e+06
solar              4.120000e+07
wind_offshore      1.900000e+07
wind_onshore       8.820000e+07
dtype: float64

In [24]:
entsoe_stats['DE'].sum()

source
biomass             40100000.0
gas                 87300000.0
hard_coal           72900000.0
hydro               25100000.0
lignite            134800000.0
nuclear             71900000.0
other_fossil        11000000.0
other_renewable      6100000.0
solar               41200000.0
wind_offshore       19000000.0
wind_onshore        88200000.0
dtype: float64
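The outputs above show that the scaled sums match the factsheet, and that for some technologies the factsheet differs considerably from the hourly sums (for DE gas, 87.3 TWh versus about 43.0 TWh, assuming both are in MWh). A small hypothetical helper `scaling_factors` makes the implied correction factor per technology explicit; the DE gas and lignite values are copied from the outputs above.

```python
import pandas as pd


def scaling_factors(hourly_totals: pd.Series, annual_stats: pd.Series) -> pd.DataFrame:
    """Compare summed hourly data with annual statistics per technology."""
    # both Series are aligned on their technology labels
    comparison = pd.DataFrame({'hourly_sum': hourly_totals, 'factsheet': annual_stats})
    comparison['factor'] = comparison['factsheet'] / comparison['hourly_sum']
    return comparison


# DE values taken from the outputs of the hourly table and the factsheet
hourly_totals = pd.Series({'gas': 42959069.0, 'lignite': 128361330.0})
annual_stats = pd.Series({'gas': 87300000.0, 'lignite': 134800000.0})
result = scaling_factors(hourly_totals, annual_stats)
print(result.round(3))
```

In the notebook, the same comparison could be run for all technologies with `scaling_factors(entsoe_gen_type_table['DE'].sum(), entsoe_stats['DE'].sum())`; factors far from 1 flag technologies where the scaling changes the data substantially.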

# Export datasets

In [25]:
entsoe_gen_type.to_csv(os.path.join(output_directory_path, 'entsoe_gen_type_hourly.csv'))

In [26]:
entso_scaled.to_csv(os.path.join(output_directory_path, 'entso_gen_type_hourly_scaled.csv'))