# Matching EU ETS and ENTSO-E data

This script documents the matching mechanism that was used to match the EU ETS emission data with the ENTSO-E production data.

# Script setup

In [1]:
import os
import logging

import pandas as pd

# helpers
from helpers import change_ENTSOE_ProductionTypeName

# Data directory preparation

In [2]:
# Create input, processed and output folders if they don't exist.
# If the paths are relative, the corresponding folders will be created inside the current working directory.

input_directory_path = os.path.join('input')
Matching_methode_input_directory_path = os.path.join('input', 'Matching')
processed_directory_path = 'processed'
output_directory_path = os.path.join('output')

os.makedirs(input_directory_path, exist_ok=True)
os.makedirs(Matching_methode_input_directory_path, exist_ok=True)
os.makedirs(processed_directory_path, exist_ok=True)
os.makedirs(output_directory_path, exist_ok=True)

# Matching Method

In this section we describe the method that was used to match the ENTSO-E power plant names (PowerSystemResourceName) with the ETS plant names (EUTL-ID).

Since ENTSO-E and ETS names generally differ, a more elaborate and time-consuming approach was used. The general process was:

    1.) Filter the EU ETS data by country and then sort the entries by emissions (the large emitters at the top of the list are, most of the time, power plants).

    2.) Use a Google search to find out more about the power plant. For example, the exact name, address and owner can often be found on its Wikipedia page.

    3.) The ENTSO-E dataset always provides an EIC number, the "Energy Identification Code" of the power plant. Googling this code sometimes leads to a website called "gem.wiki". It contains information similar to Wikipedia but, because it is dedicated to fossil fuel projects, it sometimes offers more detail (e.g. https://www.gem.wiki/Amfard_power_station).

    4.) Entering the power plant name in the filter option of the ENTSO-E transparency website often yields the required EIC number.

    5.) Problem of differing power plant names: Usually, either the name or the location of a power plant defines its name in the datasets. This is usually reflected to some degree in both the ENTSO-E and the ETS data, but in very different forms, so no standard algorithm can be applied and such power plants were matched by manual investigation.

    6.) If the names in the two datasets do not coincide at all, another factor was exploited: the size of the power plant. This works because most of the time we were looking for the largest emitters. Since the emission factor of each technology is roughly known and the yearly generation is known, one can specify a range for the emissions of the plant. As only very few installations have such high emissions, usually fewer than 10 entries remain in the ETS database. Googling the names of these installations then usually reveals which plant an entry refers to, where the plant is located and whom it belongs to. This technique usually provided the match.

    7.) In some cases it was helpful to translate the INSTALLATION_NAME of the EU ETS dataset. An example is the Polish power plant "Kozienice", which appears with two entries in the EU ETS register: "Elektrownia Kozienice - blok energetyczny 11" and "Elektrownia Kozienice - kotłownia rozruchowa". Both entries refer to the coal-fired power plant; the second one refers to its start-up boiler.
    
This method was performed for lignite, coal and gas power plants.
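
The screening by plant size described in step 6 can be sketched as follows. This is only an illustration under stated assumptions: the emission-factor ranges, the column names (`installation_name`, `country`, `emissions_t`) and the example installations are hypothetical and are not taken from the EU ETS data or from this script.

```python
import pandas as pd

# Illustrative emission-factor ranges in t CO2 per MWh of electricity.
# These bounds are assumptions for this sketch, not values from the script.
EMISSION_FACTOR_RANGE = {
    "lignite": (0.9, 1.4),
    "hard_coal": (0.7, 1.1),
    "gas": (0.3, 0.6),
}


def candidate_installations(ets, country, technology, generation_mwh):
    """Return ETS entries whose emissions fit the expected range for a plant."""
    low, high = EMISSION_FACTOR_RANGE[technology]
    lower_bound = low * generation_mwh
    upper_bound = high * generation_mwh
    mask = (ets["country"] == country) & ets["emissions_t"].between(
        lower_bound, upper_bound
    )
    # Largest emitters first, as in the manual procedure
    return ets[mask].sort_values("emissions_t", ascending=False)


# Hypothetical ETS extract; names and emissions are made up.
ets_example = pd.DataFrame(
    {
        "installation_name": ["Plant A", "Plant B", "Plant C", "Plant D"],
        "country": ["PL", "PL", "PL", "DE"],
        "emissions_t": [9_500_000, 400_000, 5_200_000, 9_800_000],
    }
)

# A hard coal unit in PL with 10 TWh of yearly generation
candidates = candidate_installations(ets_example, "PL", "hard_coal", 10_000_000)
print(candidates)
```

The remaining candidates still have to be checked by hand; the range only shortens the list that needs to be searched.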

For Germany, an already existing matching list for power plants was used additionally and included in our dataset.
It matches German power plants by their ENTSO-E identifier and their EUTL identifier.

Data download from: https://zenodo.org/record/3588418#.XxlZOufgq5h

File: Matching_Entso_EUTL_LCPD.csv

Corresponding paper: "Comparing empirical and model-based approaches for calculating dynamic grid emission factors: an application to CO2-minimizing storage dispatch in Germany"
https://linkinghub.elsevier.com/retrieve/pii/S0959652620316358

# Load data functions

In [3]:
def load_matching_data_EU(path, fn):
    """
    Load the matching list for EU power plants with ENTSO-E identifier and the EUTL identifier.
        
    Parameters
    ----------
    path: str
        path to data
    fn : str
        filename
        
    """
    
    df = pd.read_csv(os.path.join(path, fn), sep = ',', header = 0, index_col=0)

    return df

def load_unit_info(path, fn):
    """
    Load the ENTSO-E generation unit information.
        
    Parameters
    ----------
    path: str
        path to data
    fn : str
        filename
        
    """
    
    df = pd.read_csv(os.path.join(path, fn),sep = ',',index_col=0)
    
    # Rename production type name according to own convention
    df.ProductionTypeName = change_ENTSOE_ProductionTypeName(df.ProductionTypeName)
    
    # set name for the index
    df.index.set_names('GenerationUnitEIC', inplace=True)

    return df

def load_generation_per_unit(path, fn):
    """
    Load the ENTSO-E generation per unit data.
        
    Parameters
    ----------
    path: str
        path to data
    fn : str
        filename
        
    """
    
    generation = pd.read_csv(os.path.join(path, fn),sep = ',',index_col=0,parse_dates=True)
    
    return generation

# Data loading and file preparation 

#### Load matching information for power plant data

In [4]:
unit_matching_EU = load_matching_data_EU(Matching_methode_input_directory_path, 'matching_ENTSOE_EU_ETS.csv')

#### Load power plant unit information (capacity, name, etc.)

In [5]:
generation_unit_info = load_unit_info(input_directory_path, 'unit_data_2018.csv')

#### Load power plant generation data

In [6]:
generation_per_unit = load_generation_per_unit(input_directory_path, 'gen_data_2018.csv')

# Data preparation

#### Yearly power generation per unit

In [7]:
generation_unit_info['generation_2018'] = generation_per_unit.sum()

#### Matching ENTSO-E ID and EUTL ID

In [8]:
generation_unit_info_matched = pd.merge(generation_unit_info, unit_matching_EU, left_on='GenerationUnitEIC', right_on='eic_g', how='inner')
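
An inner merge silently drops units without a match and duplicates rows if a key occurs more than once in the matching list. The following sketch shows one way to inspect both cases with a left merge and `indicator=True`. The two small frames are toy stand-ins for `generation_unit_info` and `unit_matching_EU`; their EIC codes and values are made up.

```python
import pandas as pd

# Toy stand-ins for generation_unit_info and unit_matching_EU (values are made up).
unit_info = pd.DataFrame(
    {"InstalledGenCapacity": [400.0, 150.0, 260.0]},
    index=pd.Index(["EIC-A", "EIC-B", "EIC-C"], name="GenerationUnitEIC"),
)
matching = pd.DataFrame({"eic_g": ["EIC-A", "EIC-C"], "EUTL_ID": [149, 187]})

# A left merge keeps every unit; the _merge column tells where each row came from.
checked = pd.merge(
    unit_info.reset_index(),
    matching,
    left_on="GenerationUnitEIC",
    right_on="eic_g",
    how="left",
    indicator=True,
)

# Units that would be dropped by an inner merge
unmatched = checked.loc[checked["_merge"] == "left_only", "GenerationUnitEIC"].tolist()

# Duplicate keys in the matching list would duplicate rows in the merge result
has_duplicate_keys = bool(matching["eic_g"].duplicated().any())

print(unmatched)
print(has_duplicate_keys)
```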

# Matching Results

In [9]:
generation_unit_info_matched

| | AreaCode | AreaName | AreaTypeCode | InstalledGenCapacity | MapCode | PowerSystemResourceName | ProductionTypeName | ProductionUnitEIC | duplicate_count | generation_2018 | eic_g | eic_p | EUTL_countrycode | EUTL_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10YAT-APG------L | APG CA | CTA | 140.0 | AT | Lau GuD | gas | 14W-PROD-LAU---8 | 2.0 | 0.000 | 14W-GEN-LAU----Z | 14W-PROD-LAU---8 | AT | 86 |
| 1 | 10YAT-APG------L | APG CA | CTA | 400.0 | AT | Kraftwerk Timelkam GUD | gas | 14WENERGIE--WT02 | 2.0 | 685235.670 | 14WENERGIEAGWT4S | 14WENERGIE--WT02 | AT | 149 |
| 2 | 10YAT-APG------L | APG CA | CTA | 332.0 | AT | KW Dürnrohr Block 2 | hard_coal | 14W-KW-DU2-EVN-K | 2.0 | 745290.410 | 14W-KW-DUE-EVN-A | 14W-KW-DU2-EVN-K | AT | 94 |
| 3 | 10YAT-APG------L | APG CA | CTA | 150.0 | AT | KW Riedersbach 2 G2 | hard_coal | 14WENERGIEAGWR05 | 2.0 | 0.000 | 14WENERGIEAGWR21 | 14WENERGIEAGWR05 | AT | 79 |
| 4 | 10Y1001A1001A796 | Energinet CA | CTA | 250.0 | DK | Avedoerevaerket 1 | biomass | 45V0000000000091 | 1.0 | 646873.950 | 45W000000000029I | 45W0000000000102 | DK | 42 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 838 | 10YGB----------A | National Grid CA | CTA | 525.0 | GB | EGGPS-3 | hard_coal | 48WSTN0000EGGPS5 | 0.0 | 236678.180 | 48W00000EGGPS-3T | 48WSTN0000EGGPS5 | GB | 169 |
| 839 | 10YGB----------A | National Grid CA | CTA | 260.0 | GB | DEEP-1 | gas | 48WSTN00000DEEP3 | 0.0 | 335757.515 | 48W000000DEEP-1N | 48WSTN00000DEEP3 | GB | 187 |
| 840 | 10YGB----------A | National Grid CA | CTA | 525.0 | GB | EGGPS-2 | hard_coal | 48WSTN0000EGGPS5 | 0.0 | 15438.130 | 48W00000EGGPS-2V | 48WSTN0000EGGPS5 | GB | 169 |
| 841 | 10YGB----------A | National Grid CA | CTA | 20.0 | GB | WBUGT-4 | gas | 48WSTN0000WBUGT2 | 0.0 | 81.230 | 48W00000WBUGT-4N | 48WSTN0000WBUGT2 | GB | 145 |


In [10]:
# total capacity
generation_unit_info_matched.InstalledGenCapacity.sum()

295018.8

In [11]:
for production_type in generation_unit_info_matched.ProductionTypeName.unique():
    # Select all matched units of the current production type
    units_of_type = generation_unit_info_matched[
        generation_unit_info_matched.ProductionTypeName == production_type
    ]
    print(f'Number of {production_type} units')
    print(len(units_of_type))
    print(f'Installed capacity of {production_type} in MW')
    print(units_of_type.InstalledGenCapacity.sum())
    

Number of gas units
452
Installed capacity of gas in MW
152774.8
Number of hard_coal units
232
Installed capacity of hard_coal in MW
87003.0
Number of biomass units
3
Installed capacity of biomass in MW
1083.0
Number of lignite units
131
Installed capacity of lignite in MW
45173.0
Number of other_fossil units
25
Installed capacity of other_fossil in MW
8985.0
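
The same per-type summary can also be obtained as a single table with `groupby`. The sketch below uses a small made-up frame in place of `generation_unit_info_matched`; only the two column names are taken from the data above.

```python
import pandas as pd

# Made-up stand-in for generation_unit_info_matched
matched_example = pd.DataFrame(
    {
        "ProductionTypeName": ["gas", "hard_coal", "gas", "lignite"],
        "InstalledGenCapacity": [400.0, 332.0, 260.0, 500.0],
    }
)

# Number of units and installed capacity (MW) per production type
summary = matched_example.groupby("ProductionTypeName")["InstalledGenCapacity"].agg(
    units="count", capacity_mw="sum"
)
print(summary)
```

Applied to the real frame, this would give one row per production type instead of four printed lines each.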


In [12]:
# Production types considered in the matching
fossil_types = ['gas', 'hard_coal', 'lignite']

In [13]:
units = generation_unit_info.query('ProductionTypeName in @fossil_types')

In [14]:
units_matches = generation_unit_info_matched.query('ProductionTypeName in @fossil_types')

In [15]:
len(units) - len(units_matches)

60

In [16]:
len(units)

875

In [17]:
units_matches.InstalledGenCapacity.sum()

284950.79999999993

In [18]:
units.query('GenerationUnitEIC not in @units_matches.eic_g').InstalledGenCapacity.sum()

18537.6
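
From the figures reported in the cells above, the coverage of the matching can be estimated. This is a rough sketch: it assumes that the difference in row counts equals the number of unmatched units (that is, the merge did not duplicate any rows) and that matched plus unmatched capacity add up to the total capacity of the gas, hard coal and lignite units.

```python
# Figures reported in the cells above
total_units = 875
unmatched_units = 60
matched_capacity_mw = 284950.8
unmatched_capacity_mw = 18537.6

# Share of units and of installed capacity covered by the matching
unit_share = (total_units - unmatched_units) / total_units
capacity_share = matched_capacity_mw / (matched_capacity_mw + unmatched_capacity_mw)

print(f"Matched units: {unit_share:.1%}")
print(f"Matched capacity: {capacity_share:.1%}")
```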

In [19]:
units_matches.eic_g

0      14W-GEN-LAU----Z
1      14WENERGIEAGWT4S
2      14W-KW-DUE-EVN-A
3      14WENERGIEAGWR21
5      45W000000000032T
             ...       
838    48W00000EGGPS-3T
839    48W000000DEEP-1N
840    48W00000EGGPS-2V
841    48W00000WBUGT-4N
842    48W000000FIDL-16
Name: eic_g, Length: 815, dtype: object