# Matching EU ETS and ENTSO-E data

This script describes the mechanism that was used to match the EU ETS emission data with the ENTSO-E generation data.

# Script setup

In [1]:
import os
import logging

import pandas as pd

#helpers


# Data directory preparation

In [2]:
# Create input, processed and output folders if they don't exist
# If the paths are relative, the corresponding folders are created inside the current working directory.

input_directory_path = os.path.join('input')
Matching_methode_input_directory_path = os.path.join('input', 'Matching')
processed_directory_path = 'processed'
output_directory_path = os.path.join('output')

os.makedirs(input_directory_path, exist_ok=True)
os.makedirs(Matching_methode_input_directory_path, exist_ok=True)
os.makedirs(processed_directory_path, exist_ok=True)
os.makedirs(output_directory_path, exist_ok=True)

# Matching Method

This section describes the method that was used to match the ENTSO-E power plant names (`PowerSystemResourceName`) with the ETS installation identifiers (`EUTL_ID`).

Since plant names generally differ between ENTSO-E and the ETS, a more sophisticated and time-consuming approach was used. The general process was:

    1.) A Google search was used to find more information about the power plant. For example, the address and owner can often be found on its Wikipedia page.
    
    2.) The ENTSO-E dataset always provides an EIC (Energy Identification Code) for each power plant. Searching for this code sometimes leads to a page on "gem.wiki". It contains information similar to Wikipedia but, because it is dedicated to fossil fuel projects, occasionally more detail (e.g. https://www.gem.wiki/Amfard_power_station).
    
    3.) Different names for the same power plant: a plant's name in the datasets is usually derived from either its own name or its location. This is usually reflected to some extent in both the ENTSO-E and the ETS data, but in very different forms, so no standard algorithm could be applied and such plants were matched by manual investigation.
    
    4.) If the names in the two datasets do not coincide at all, another factor was exploited: the size of the power plant, since in most cases the aim was to find the large emitters. Because the plant's yearly generation is known and the emission factor of each technology is roughly known, one can specify a range for how much the plant has emitted. As only very few installations have such high emissions, usually fewer than 10 entries remain in the ETS database. Searching online for the names of these installations then usually reveals what they refer to, where the plant is located and who owns it, which in most cases yielded the match.
    
This method was performed for lignite, coal and gas power plants.
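The emission-range filter in step 4 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the procedure actually used: the emission-factor ranges in `EMISSION_FACTOR_RANGE`, the helper `candidate_installations`, and the ETS columns `countrycode` and `verified_emissions_t` are hypothetical, and the data are made-up toy numbers.

```python
import pandas as pd

# ASSUMED, illustrative emission-factor ranges in t CO2 per MWh of electricity.
# These are rough placeholders, not the factors used in the original matching.
EMISSION_FACTOR_RANGE = {
    'lignite': (0.9, 1.3),
    'hardcoal': (0.7, 1.0),
    'gas': (0.3, 0.6),
}


def candidate_installations(ets, generation_mwh, technology, countrycode):
    """Return ETS installations whose yearly emissions fit the plant's generation.

    `ets` is assumed to have the hypothetical columns
    'EUTL_ID', 'countrycode' and 'verified_emissions_t'.
    """
    low_ef, high_ef = EMISSION_FACTOR_RANGE[technology]
    # Expected emission range = yearly generation x plausible emission factors
    low, high = generation_mwh * low_ef, generation_mwh * high_ef
    mask = (ets['countrycode'] == countrycode) & ets['verified_emissions_t'].between(low, high)
    # Largest emitters first, to make the manual follow-up search easier
    return ets.loc[mask].sort_values('verified_emissions_t', ascending=False)


# Toy ETS extract (made-up numbers)
ets = pd.DataFrame({
    'EUTL_ID': [1, 2, 3, 4],
    'countrycode': ['DE', 'DE', 'DE', 'PL'],
    'verified_emissions_t': [9_500_000, 3_000_000, 11_000_000, 10_000_000],
})
candidates = candidate_installations(ets, generation_mwh=10_000_000,
                                     technology='lignite', countrycode='DE')
print(candidates['EUTL_ID'].tolist())  # [3, 1]
```

Since one ETS installation can cover several ENTSO-E units, the yearly generation of all units belonging to a plant would presumably need to be summed before applying such a filter.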

For Germany, an existing matching list of power plants was additionally used and included in our dataset. It matches German power plants' ENTSO-E identifiers with their EUTL identifiers.
Data download from: https://zenodo.org/record/3588418#.XxlZOufgq5h

File: Matching_Entso_EUTL_LCPD.csv

Corresponding paper: "Comparing empirical and model-based approaches for calculating dynamic grid emission factors: an application to CO2-minimizing storage dispatch in Germany"
https://linkinghub.elsevier.com/retrieve/pii/S0959652620316358
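The step that merges the German list into our own matches is not shown in the code cells. One possible way is sketched below: append the German list and keep our own match when a unit appears in both. The helper `combine_matching_lists`, the toy column names `entsoe_name` and `eutl_id`, and the "own list wins" rule are assumptions; the real columns of Matching_Entso_EUTL_LCPD.csv should be checked first, and if that file identifies units by EIC code rather than by name, the join key would have to change.

```python
import pandas as pd

KEY = ['PowerSystemResourceName', 'countrycode']


def combine_matching_lists(own, external, column_map, countrycode='DE'):
    """Append an external matching list to our own (hypothetical helper).

    On duplicate units, the row from `own` is kept.
    """
    # Bring the external list into our column layout
    external = external.rename(columns=column_map).assign(countrycode=countrycode)
    external = external[KEY + ['EUTL_ID']]
    combined = pd.concat([own, external], ignore_index=True)
    # keep='first' keeps the row from `own`, because it precedes `external`
    return combined.drop_duplicates(subset=KEY, keep='first').reset_index(drop=True)


own = pd.DataFrame({
    'PowerSystemResourceName': ['Weisweiler E', 'ABTH7'],
    'countrycode': ['DE', 'GB'],
    'EUTL_ID': [1607, 188],
})
# Toy stand-in for the German list; column names and the second row are hypothetical
german = pd.DataFrame({
    'entsoe_name': ['Weisweiler E', 'Hypothetical unit'],
    'eutl_id': [1607, 4242],
})
combined = combine_matching_lists(
    own, german, {'entsoe_name': 'PowerSystemResourceName', 'eutl_id': 'EUTL_ID'})
print(combined)
```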

# Data loading and file preparation 

Load the csv files with the matching information.

In [3]:
lignite_pp = pd.read_csv(os.path.join(Matching_methode_input_directory_path, 'Lignite_pps_matched.csv'), sep = ';', header = 0)
coal_pp = pd.read_csv(os.path.join(Matching_methode_input_directory_path, 'Hardcoal_pps_matched.csv'), sep = ';', header = 0)
gas_pp = pd.read_csv(os.path.join(Matching_methode_input_directory_path, 'Gas_pps_matched.csv'), sep = ';', header = 0)

Combine the matching information into one dataframe.

In [4]:
Matches_pp = pd.concat([coal_pp[['PowerSystemResourceName','countrycode','EUTL_ID']],
                        gas_pp[['PowerSystemResourceName','countrycode','EUTL_ID']],
                        lignite_pp[['PowerSystemResourceName','countrycode','EUTL_ID']]], ignore_index=True)
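The `EUTL_ID` values in the `Matches_pp` output are displayed as floats (e.g. 201.0), which in pandas usually means the column contains missing values, i.e. possibly units without a match; whether the real files contain such rows is an assumption to verify. The sketch below uses a few rows from that output plus one hypothetical unmatched unit. It converts the IDs to pandas' nullable `Int64` type, counts unmatched units, and counts how many ENTSO-E units map to each ETS installation.

```python
import numpy as np
import pandas as pd

# Rows taken from the Matches_pp output, plus one hypothetical unmatched unit
matches = pd.DataFrame({
    'PowerSystemResourceName': ['ABOÑO 1', 'ABOÑO 2', 'ABTH7', 'ABTH8', 'ABTH9', 'Hypothetical unit'],
    'countrycode': ['ES', 'ES', 'GB', 'GB', 'GB', 'DE'],
    'EUTL_ID': [201.0, 201.0, 188.0, 188.0, 188.0, np.nan],
})

# Nullable integer dtype keeps missing IDs as <NA> instead of forcing floats
matches['EUTL_ID'] = matches['EUTL_ID'].astype('Int64')

# Units for which no ETS installation was found
unmatched = matches[matches['EUTL_ID'].isna()]

# Several ENTSO-E units (blocks) can belong to one ETS installation
units_per_installation = (
    matches.dropna(subset=['EUTL_ID'])
    .groupby('EUTL_ID')['PowerSystemResourceName']
    .count()
)
print(len(unmatched))
print(units_per_installation)
```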


Export the matching list. 

In [5]:
Matches_pp.to_csv(os.path.join(processed_directory_path, 'Matching_Entso_EUTL_EU.csv'))
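By default, `DataFrame.to_csv` also writes the index as an unnamed first column, which `pd.read_csv` reads back as `Unnamed: 0`. The short round trip below, on two toy rows, illustrates this. Passing `index=False` when exporting, or `index_col=0` when reading the file back, avoids the extra column, provided downstream code does not rely on it.

```python
import io

import pandas as pd

matches = pd.DataFrame({
    'PowerSystemResourceName': ['Weisweiler E', 'Weisweiler F'],
    'countrycode': ['DE', 'DE'],
    'EUTL_ID': [1607, 1607],
})

# Default export: the RangeIndex becomes an unnamed first column
with_index = pd.read_csv(io.StringIO(matches.to_csv()))
# index=False: only the three data columns are written
without_index = pd.read_csv(io.StringIO(matches.to_csv(index=False)))

print(list(with_index.columns))
# ['Unnamed: 0', 'PowerSystemResourceName', 'countrycode', 'EUTL_ID']
print(list(without_index.columns))
# ['PowerSystemResourceName', 'countrycode', 'EUTL_ID']
```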

In [6]:
Matches_pp

| | PowerSystemResourceName | countrycode | EUTL_ID |
|---|---|---|---|
| 0 | ABOÑO 1 | ES | 201.0 |
| 1 | ABOÑO 2 | ES | 201.0 |
| 2 | ABTH7 | GB | 188.0 |
| 3 | ABTH8 | GB | 188.0 |
| 4 | ABTH9 | GB | 188.0 |
| ... | ... | ... | ... |
| 848 | Turów B06 | PL | 3.0 |
| 849 | Weisweiler E | DE | 1607.0 |
| 850 | Weisweiler F | DE | 1607.0 |
| 851 | Weisweiler G | DE | 1607.0 |


# Check whether the other files contain more matches

In [6]:
#lig_2 = pd.read_csv(os.path.join(Matching_methode_input_directory_path, 'Lignite_pps.csv'), sep = ';', header = 0,encoding= 'unicode_escape')
#gas_2 = pd.read_csv(os.path.join(Matching_methode_input_directory_path, 'Gas_pps.csv'), sep = ';', header = 0,encoding= 'unicode_escape')
#coal_2 = pd.read_csv(os.path.join(Matching_methode_input_directory_path, 'Hardcoal_pps.csv'), sep = ';', header = 0,encoding= 'unicode_escape')

In [7]:
# Use the *_2 frames loaded above and rename 'identifier_guess' before selecting 'EUTL_ID'
#Matches_2 = pd.concat([df.rename(columns = {'identifier_guess':'EUTL_ID'})[['PowerSystemResourceName','countrycode','EUTL_ID']] for df in (coal_2, gas_2, lig_2)], ignore_index=True)
#Matches_2.to_csv(os.path.join(processed_directory_path, 'Matching_Entso_EUTL_EU_2.csv'))
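Beyond counting matches, such a check can show where two matching lists actually differ. A minimal sketch, assuming both lists use the columns `PowerSystemResourceName`, `countrycode` and `EUTL_ID`; the helper `compare_matching_lists` and the toy data (the conflicting ID 999 and the "Hypothetical unit" row) are illustrative only. It uses an outer `merge` with `indicator=True`, which adds a `_merge` column marking each unit as `left_only`, `right_only` or `both`.

```python
import pandas as pd

KEY = ['PowerSystemResourceName', 'countrycode']


def compare_matching_lists(first, second):
    """Return units found only in `second`, and units matched differently in both."""
    merged = first.merge(second, on=KEY, how='outer',
                         suffixes=('_first', '_second'), indicator=True)
    only_second = merged[merged['_merge'] == 'right_only']
    a, b = merged['EUTL_ID_first'], merged['EUTL_ID_second']
    # Treat two missing IDs as equal, since NaN != NaN
    same = a.eq(b) | (a.isna() & b.isna())
    conflicting = merged[(merged['_merge'] == 'both') & ~same]
    return only_second, conflicting


first = pd.DataFrame({
    'PowerSystemResourceName': ['Weisweiler E', 'ABTH7'],
    'countrycode': ['DE', 'GB'],
    'EUTL_ID': [1607, 188],
})
second = pd.DataFrame({
    'PowerSystemResourceName': ['Weisweiler E', 'ABTH7', 'Hypothetical unit'],
    'countrycode': ['DE', 'GB', 'DE'],
    'EUTL_ID': [1607, 999, 1234],
})
only_second, conflicting = compare_matching_lists(first, second)
print(only_second[KEY])
print(conflicting[KEY + ['EUTL_ID_first', 'EUTL_ID_second']])
```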

