Set of tools to combine multiple power plant databases

powerplantmatching

A toolset for cleaning, standardizing and combining multiple power plant databases.

This package provides ready-to-use power plant data for the European power system. Starting from openly available power plant datasets, the package cleans, standardizes and merges the input data to create a new combined dataset, which includes all the important information. The major advantage of this procedure is that the resulting dataset can be easily updated as soon as new input datasets are released.

Map of power plants in Europe

powerplantmatching was initially developed by the Renewable Energy Group at FIAS to build power plant data inputs to PyPSA-based models for carrying out simulations for the CoNDyNet project, financed by the German Federal Ministry for Education and Research (BMBF) as part of the Stromnetze Research Initiative.

What it can do

  • clean and standardize power plant data sets
  • aggregate power plant units which belong to the same plant
  • compare and combine different data sets
  • create lookups and give statistical insight into the quality of power plant data
  • provide cleaned data from different sources
  • choose between gross/net capacity
  • provide an already merged data set of six different data-sources

Processed Data

If you are only interested in the power plant data, we provide our current merged dataset for European power plants as a CSV file. This set combines the data of all the data sources listed in Data-Sources and provides the following information:

  • Power plant name - claim of each database
  • Fueltype - {Bioenergy, Geothermal, Hard Coal, Hydro, Lignite, Nuclear, Natural Gas, Oil, Solar, Wind, Other}
  • Technology - {CCGT, OCGT, Steam Turbine, Combustion Engine, Run-Of-River, Pumped Storage, Reservoir}
  • Set - {Power Plant (PP), Combined Heat and Power (CHP), Storages (Stores)}
  • Capacity - [MW]
  • Geo-position - Latitude, Longitude
  • Country - EU-27 + CH + NO (+ UK) minus Cyprus and Malta
  • YearCommissioned - Commissioning year of the power plant
  • RetroFit - Year of last retrofit
  • File - Source file of the data entry
  • projectID - Immutable identifier of the power plant

The current release together with the open version (without ESE dataset) of the processed data is stored on Zenodo:

DOI

In order to include all data sources, please install the package and recompile the full matched data.

The following picture compares the total capacities per fuel type between the different data sources and the resulting dataset:

Total capacities per fuel type for the different data sources and the merged dataset.

Comparing the aggregated capacities per country and fuel type with the capacity statistics provided by the ENTSOE:

Capacity statistics comparison

Installation

  1. Make sure that git lfs is installed. If in doubt, just run git lfs install
  2. Copy or clone the repository to a directory of your choosing /path/to/powerplantmatching
    cd /path/to
    git clone https://github.com/FRESNA/powerplantmatching.git
  3. If you use conda (if not, skip this step), install the requirements from requirements.yaml into a new environment powerplantmatching and activate it.
    conda env create -f powerplantmatching/requirements.yaml
    source activate powerplantmatching
  4. Install the package using pip
    pip install -e ./powerplantmatching
  5. Copy config_example.yaml to config.yaml.

Optional but recommended:

  1. Download the ESE dataset and store it under /path/to/powerplantmatching/data/in/projects.xls.
  2. Add your ENTSO-E security token to the config.yaml file. The token can be obtained by following section 2 of the RESTful API documentation of the ENTSO-E Transparency Platform.

Optional:

  1. Add your Google API key to the config.yaml file to enable geoparsing. The key can be obtained by following the instructions.

Once the package is set up, the full database is available through the Python command

import powerplantmatching as pm
pm.collection.matched_data()

Note that compiling the data takes some time (about 30 min for the standard data sources).

Make your own configuration

You have the option to easily manipulate the resulting data. Through the config.yaml file you can

  • determine the global set of countries and fueltypes

  • determine which data sources to combine and which data sources should be contained completely in the final dataset

  • individually filter data sources via a pandas.DataFrame.query statement set as an argument of data source name in your config.yaml (see config_example.yaml).

The config_example.yaml provides an adjusted configuration for a European dataset.
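To illustrate the kind of filter that a pandas.DataFrame.query statement expresses, here is a small sketch. The filter string and the column names are hypothetical; the exact keys and layout expected in config.yaml should be taken from config_example.yaml. The rows come from Dataset 1 of the worked example further down.

```python
import pandas as pd

# A few rows from the worked example in this README
plants = pd.DataFrame(
    {
        "Name": ["Aarberg", "Aberthaw", "Ablass", "Abono"],
        "Fueltype": ["Hydro", "Coal", "Wind", "Coal"],
        "Country": ["Switzerland", "United Kingdom", "Germany", "Spain"],
        "Capacity": [14.609, 1552.5, 18.0, 921.7],
    }
)

# Hypothetical filter string, of the kind that could be stored as the
# argument of a data source name in config.yaml
filter_expression = "Fueltype != 'Wind' and Capacity > 15"

# Keep only the rows for which the expression evaluates to True
filtered = plants.query(filter_expression)
print(filtered)
```

Here the wind farm is dropped by the first condition and the small hydro plant by the second, so only the two coal plants remain.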
Further you can

  • scale the power plant capacities in order to match country specific statistics about total power plant capacities

  • trace back which original data flowed into the resulting data

  • visualize the data

  • export your powerplant data to a PyPSA or TIMES model
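The capacity scaling mentioned in the list above can be sketched with plain pandas. This is an illustration of the idea only, not the package's own scaling routine: the target totals are made up, and the `Target` and `ScaledCapacity` column names are hypothetical.

```python
import pandas as pd

keys = ["Country", "Fueltype"]

plants = pd.DataFrame(
    {
        "Name": ["Aberthaw", "Abono", "Aceca", "Aceca fenosa"],
        "Country": ["United Kingdom", "Spain", "Spain", "Spain"],
        "Fueltype": ["Coal", "Coal", "Oil", "Natural Gas"],
        "Capacity": [1500.0, 921.7, 629.0, 400.0],
    }
)

# Made-up statistical totals in MW per country and fuel type
targets = pd.DataFrame(
    {
        "Country": ["United Kingdom", "Spain"],
        "Fueltype": ["Coal", "Oil"],
        "Target": [3000.0, 314.5],
    }
)

scaled = plants.merge(targets, on=keys, how="left")

# Current total capacity of the group each plant belongs to
group_total = scaled.groupby(keys)["Capacity"].transform("sum")

# Scale factor per plant; groups without a statistic are left unchanged
factor = (scaled["Target"] / group_total).fillna(1.0)
scaled["ScaledCapacity"] = scaled["Capacity"] * factor
print(scaled[["Name", "Capacity", "ScaledCapacity"]])
```

Every plant in a group is scaled by the same factor, so the relative sizes of the plants within a country and fuel type are preserved.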

Documentation is available. It is somewhat out of date, but gives more extensive insight at the code level.

Data-Sources:

The merged dataset is available in two versions: the bigger dataset links the entries of the matched power plants and lists all the related properties given by the different data sources. The smaller, reduced dataset keeps, for each power plant entry, only the value from the most reliable of the matched data sources. The considered reliability scores are:

Dataset   Reliability score
BNETZA    5
CARMA     1
ENTSOE    4
ESE       4
GEO       3
IWPDCY    3
OPSD      5
UBA       5
GPD       3
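The selection rule behind the reduced dataset can be sketched as follows. The scores are copied from the table above; the function name `most_reliable_value`, the tie-breaking behaviour and the example values are assumptions made for illustration, not the package's implementation.

```python
# Reliability scores as listed in the table above
RELIABILITY = {
    "BNETZA": 5, "CARMA": 1, "ENTSOE": 4, "ESE": 4, "GEO": 3,
    "IWPDCY": 3, "OPSD": 5, "UBA": 5, "GPD": 3,
}


def most_reliable_value(values_by_source):
    """Return the value reported by the source with the highest score.

    Sources that report nothing (None) are skipped. Among sources with equal
    scores the first one in the mapping wins, which is an arbitrary choice
    made for this sketch.
    """
    available = {s: v for s, v in values_by_source.items() if v is not None}
    if not available:
        return None
    best_source = max(available, key=lambda source: RELIABILITY[source])
    return available[best_source]


# Hypothetical capacities (MW) reported for one matched power plant
capacity = most_reliable_value({"CARMA": 1552.5, "ENTSOE": 1500.0, "GEO": None})
print(capacity)
```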

Module Structure

The package consists of ten modules. To create a new dataset, the most useful modules are data, clean and match, which provide functions for data supply, vertical cleaning and horizontal matching, respectively.

Modular package structure

How it works

Whereas single databases such as CARMA, GEO or OPSD provide non-standardized and incomplete information, the datasets can complement each other and improve their reliability. In a first step, powerplantmatching converts all power plant datasets into a standardized format with a defined set of columns and values. The second step consists of aggregating power plant blocks into units. Since some of the data sources provide their power plant records on unit level, without detailed information about lower-level blocks, comparing with other sources is only possible on unit level. In the third and name-giving step, the tool combines (or matches) different standardized and aggregated input sources, keeping only power plant units which appear in more than one source. The matched data is afterwards complemented by entries from reliable sources which have not been matched.
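The aggregation step can be illustrated with a deliberately simplified sketch. It groups blocks by a normalized plant name instead of using a deduplication engine, and all plant names, values and the `PlantName` helper column are made up for the example.

```python
import pandas as pd

# Made-up block-level records; two of them belong to the same plant
blocks = pd.DataFrame(
    {
        "Name": ["Examplewerk Block 1", "Examplewerk Block 2", "Sampleplant"],
        "Fueltype": ["Lignite", "Lignite", "Hydro"],
        "Country": ["Germany", "Germany", "Austria"],
        "Capacity": [300.0, 350.0, 168.0],
        "lat": [51.000, 51.002, 48.2480],
        "lon": [12.000, 12.002, 14.4305],
    }
)

# Strip a trailing "Block <n>" or "Unit <n>" to get a common plant name
blocks["PlantName"] = blocks["Name"].str.replace(
    r"\s+(?:Block|Unit)\s*\d+$", "", regex=True
)

# Sum the capacities and average the coordinates of all blocks of a plant
units = blocks.groupby(["PlantName", "Country", "Fueltype"], as_index=False).agg(
    Capacity=("Capacity", "sum"),
    lat=("lat", "mean"),
    lon=("lon", "mean"),
)
print(units)
```

Real data rarely follows such a clean naming pattern, which is why fuzzy comparison of several columns is needed in practice.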

The aggregation and matching process relies heavily on DUKE, a Java application specialized in deduplicating and linking data. It provides many built-in comparators such as numerical, string or geoposition comparators. The engine does a detailed comparison for each single argument (power plant name, fuel type etc.) using adjusted comparators and weights. From the individual scores for each column it computes a compound score for the likelihood that the two power plant records refer to the same power plant. If the score exceeds a given threshold, the two records of the power plant are linked and merged into one record.
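The idea of per-column comparators feeding a compound score can be sketched in pure Python. This is not DUKE's code: the comparators, the probability ranges, the naive-Bayes-style combination and the threshold of 0.8 are all assumptions chosen for illustration. The records are taken from the worked example below.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def capacity_similarity(x, y):
    """Ratio of the smaller to the larger capacity, in [0, 1]."""
    largest = max(x, y)
    return min(x, y) / largest if largest > 0 else 1.0


def geo_similarity(lat1, lon1, lat2, lon2, max_km=10.0):
    """1 at zero distance, falling linearly to 0 at max_km (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance_km = 2 * 6371.0 * math.asin(math.sqrt(h))
    return max(0.0, 1.0 - distance_km / max_km)


def to_probability(similarity, low, high):
    """Map a similarity onto [low, high]; a wider range means more weight."""
    return low + similarity * (high - low)


def compound_score(probabilities):
    """Combine per-column probabilities, starting from a neutral 0.5."""
    score = 0.5
    for p in probabilities:
        score = score * p / (score * p + (1 - score) * (1 - p))
    return score


def match_score(a, b):
    return compound_score([
        to_probability(name_similarity(a["Name"], b["Name"]), 0.1, 0.9),
        to_probability(capacity_similarity(a["Capacity"], b["Capacity"]), 0.3, 0.7),
        to_probability(geo_similarity(a["lat"], a["lon"], b["lat"], b["lon"]), 0.1, 0.9),
    ])


THRESHOLD = 0.8  # hypothetical value
aarberg_1 = {"Name": "Aarberg", "Capacity": 14.609, "lat": 47.0444, "lon": 7.27578}
aarberg_2 = {"Name": "Aarberg", "Capacity": 15.5, "lat": 47.0378, "lon": 7.272}
abono_2 = {"Name": "Abono", "Capacity": 921.7, "lat": 43.5528, "lon": -5.7231}

print(match_score(aarberg_1, aarberg_2) > THRESHOLD)  # same plant in both sets
print(match_score(aarberg_1, abono_2) > THRESHOLD)    # different plants
```

Giving the capacity a narrower probability range than the name and position means that a capacity mismatch alone cannot veto a match, which reflects the "weights" mentioned above in a simple way.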

Let's make that a bit more concrete with a quick example. Consider the following two data sets:

Dataset 1:

   Name                 Fueltype  Classification  Country         Capacity  lat      lon         File
0  Aarberg              Hydro     nan             Switzerland     14.609    47.0444  7.27578     nan
1  Abbey mills pumping  Oil       nan             United Kingdom  6.4       51.687   -0.0042057  nan
2  Abertay              Other     nan             United Kingdom  8         57.1785  -2.18679    nan
3  Aberthaw             Coal      nan             United Kingdom  1552.5    51.3875  -3.40675    nan
4  Ablass               Wind      nan             Germany         18        51.2333  12.95       nan
5  Abono                Coal      nan             Spain           921.7     43.5588  -5.72287    nan

and

Dataset 2:

   Name            Fueltype     Classification  Country         Capacity  lat      lon      File
0  Aarberg         Hydro        nan             Switzerland     15.5      47.0378  7.272    nan
1  Aberthaw        Coal         Thermal         United Kingdom  1500      51.3873  -3.4049  nan
2  Abono           Coal         Thermal         Spain           921.7     43.5528  -5.7231  nan
3  Abwinden asten  Hydro        nan             Austria         168       48.248   14.4305  nan
4  Aceca           Oil          CHP             Spain           629       39.941   -3.8569  nan
5  Aceca fenosa    Natural Gas  CCGT            Spain           400       39.9427  -3.8548  nan

where Dataset 2 has the higher reliability score. Apparently, entries 0, 3 and 5 of Dataset 1 relate to the same power plants as entries 0, 1 and 2 of Dataset 2. The toolset detects those similarities and combines them into the following set, prioritising the values of Dataset 2:

   Name      Country         Fueltype  Classification  Capacity  lat      lon      File
0  Aarberg   Switzerland     Hydro     nan             15.5      47.0378  7.272    nan
1  Aberthaw  United Kingdom  Coal      Thermal         1500      51.3873  -3.4049  nan
2  Abono     Spain           Coal      Thermal         921.7     43.5528  -5.7231  nan
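The result above can be reproduced with a few lines of pandas, under one strong simplification: records are linked by exact name equality rather than by the fuzzy, multi-column comparison the toolset actually performs. The `_1` and `_2` suffixes are just labels chosen for this sketch.

```python
import pandas as pd

columns = ["Name", "Fueltype", "Classification", "Country", "Capacity", "lat", "lon"]

dataset1 = pd.DataFrame(
    [
        ["Aarberg", "Hydro", None, "Switzerland", 14.609, 47.0444, 7.27578],
        ["Abbey mills pumping", "Oil", None, "United Kingdom", 6.4, 51.687, -0.0042057],
        ["Abertay", "Other", None, "United Kingdom", 8.0, 57.1785, -2.18679],
        ["Aberthaw", "Coal", None, "United Kingdom", 1552.5, 51.3875, -3.40675],
        ["Ablass", "Wind", None, "Germany", 18.0, 51.2333, 12.95],
        ["Abono", "Coal", None, "Spain", 921.7, 43.5588, -5.72287],
    ],
    columns=columns,
)

dataset2 = pd.DataFrame(
    [
        ["Aarberg", "Hydro", None, "Switzerland", 15.5, 47.0378, 7.272],
        ["Aberthaw", "Coal", "Thermal", "United Kingdom", 1500.0, 51.3873, -3.4049],
        ["Abono", "Coal", "Thermal", "Spain", 921.7, 43.5528, -5.7231],
        ["Abwinden asten", "Hydro", None, "Austria", 168.0, 48.248, 14.4305],
        ["Aceca", "Oil", "CHP", "Spain", 629.0, 39.941, -3.8569],
        ["Aceca fenosa", "Natural Gas", "CCGT", "Spain", 400.0, 39.9427, -3.8548],
    ],
    columns=columns,
)

# Keep only plants present in both sets (simplification: exact name match)
matched = dataset1.merge(dataset2, on="Name", suffixes=("_1", "_2"))

# Prefer the value of the more reliable Dataset 2, fall back to Dataset 1
combined = pd.DataFrame({"Name": matched["Name"]})
for column in columns[1:]:
    combined[column] = matched[f"{column}_2"].combine_first(matched[f"{column}_1"])

print(combined)
```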

Citing powerplantmatching

If you want to cite powerplantmatching, the current release is stored on Zenodo with a release-specific DOI:

DOI

Acknowledgements

The development of powerplantmatching was helped considerably by in-depth discussions and exchanges of ideas and code with

  • Tom Brown from Karlsruhe Institute of Technology
  • Chris Davis from University of Groningen and
  • Johannes Friedrich, Roman Hennig and Colin McCormick of the World Resources Institute

Licence

Copyright 2018-2020 Fabian Gotzens (FZ Jülich), Jonas Hörsch (KIT), Fabian Hofmann (FIAS)

powerplantmatching is released as free software under the GPLv3, see LICENSE for further information.