# Table of Contents
* [1. Aggregated generation capacity by technology and country](#1.-Aggregated-generation-capacity-by-technology-and-country)
	* [1.1 Prepare Environment](#1.1-Prepare-Environment)
	* [1.2 Import raw data from Excel-file](#1.2-Import-raw-data-from-Excel-file)
	* [1.3 Convert raw data to list](#1.3-Convert-raw-data-to-list)
	* [1.4 Define technology levels](#1.4-Define-technology-levels)
* [2. Documenting the data package (meta data)](#2.-Documenting-the-data-package-%28meta-data%29)
* [3. Write results to file](#3.-Write-results-to-file)


# 1. Aggregated generation capacity by technology and country

The script processes the compiled, nationally aggregated generation capacities for European countries. Because the references for national generation capacities vary in format and data specification, the script focuses on rearranging the compiled data. The script itself does not collect, select, download, or manage data from the original sources.

## 1.1 Prepare Environment

In [None]:
# Jan: numpy is not used; also import logging here!
import pandas as pd
import numpy as np
import os.path
import yaml  # http://pyyaml.org/, pip install pyyaml, conda install pyyaml
import json
import subprocess
import sqlite3 

%matplotlib inline
import logging
logger = logging.getLogger('notebook')
logger.setLevel('INFO')
nb_root_logger = logging.getLogger()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                              datefmt='%d %b %Y %H:%M:%S')
nb_root_logger.handlers[0].setFormatter(formatter)

# Create the output folder if it does not exist
os.makedirs('data_final', exist_ok=True)

## 1.2 Import raw data from Excel-file

The manually compiled dataset is imported into IPython as a pandas DataFrame.

In [None]:
data_file = 'National_Generation_Capacities.xlsx'
# Jan: build the path with os.path.join so that it is OS-independent
# (a hard-coded '/' does not match the Windows separator).
filepath = os.path.join('data_downloaded', data_file)
data_raw = pd.read_excel(filepath,
                         sheetname='Summary',
                         header=None,
                         na_values=['-'],
                         skiprows=0)

# Deal with merged cells in Excel: forward-fill the first two rows along the columns
data_raw.iloc[0:2] = data_raw.iloc[0:2].fillna(method='ffill', axis=1)

# Set index for rows
data_raw = data_raw.set_index([0])
data_raw.index.name = 'technology'

# set multiindex column names
data_raw.columns = pd.MultiIndex.from_arrays(
    data_raw[:6].values, 
    names=['country', 'type', 'year', 'source',
           'source_type', 'capacity_definition'])

# Remove the header rows already used as column names (their index label is null)
data_raw = data_raw[pd.notnull(data_raw.index)]

data_raw
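
The header handling above can be tried on a toy crosstab. The following is a minimal sketch, not the notebook's exact procedure: the frame `raw`, its two header rows, and the values are invented for illustration, and the header is forward-filled in a separate frame instead of being assigned back into `data_raw`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the 'Summary' sheet: two header rows
# (country, year) followed by data rows. The NaN in the first row
# mimics a merged Excel cell spanning two columns.
raw = pd.DataFrame([
    [np.nan, 'AT', np.nan, 'DE'],
    [np.nan, 2013, 2014, 2014],
    ['Hydro', 10.0, 11.0, 5.0],
    ['Wind', 2.0, 3.0, 40.0],
], dtype=object)

# Forward-fill the header along the columns so that every column
# carries its country label.
header = raw.iloc[:2, 1:].ffill(axis=1)

# Use the first column as row index and the header rows as column levels.
body = raw.iloc[2:].set_index(0)
body.index.name = 'technology'
body.columns = pd.MultiIndex.from_arrays(
    header.values, names=['country', 'year'])

print(body)
```

Each column of `body` is then addressed by a `(country, year)` pair, which is the same idea as the six-level column index built in the cell above.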


## 1.3 Convert raw data to list

Convert the initial crosstab format of the input data to a list in order to improve the handling of the data within the code.

In [None]:
# Jan: there should be a space after every comma.
# Reshape DataFrame to list
pd.DataFrame(data_raw.stack(level=['source', 'source_type', 'year', 'type',
                                   'country', 'capacity_definition']))

In [None]:
# Reshape DataFrame to list
data = pd.DataFrame(data_raw.stack(level=['source', 'source_type', 'year', 'type',
                                          'country', 'capacity_definition']))
# Jan: put spaces around = consistently.
# Reset index for DataFrame
data = data.reset_index()
data['technology'] = data['technology'].str.replace('- ', '')
data = data.set_index('technology')

# Delete entries with missing source
data = data[data['source'].notnull()]
data = data[data['source'] != 0]

data = data.rename(columns={0: 'capacity'})

data['capacity'] = pd.to_numeric(data['capacity'], errors='coerce')

data.head()
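
To make the crosstab-to-list conversion concrete, here is a small sketch on invented data. The frame `wide` and its two column levels are assumptions for illustration; the real data has six column levels, but the reshaping works the same way.

```python
import pandas as pd

# Hypothetical crosstab: technologies as rows, (country, year) as columns.
columns = pd.MultiIndex.from_tuples(
    [('AT', 2013), ('AT', 2014), ('DE', 2014)], names=['country', 'year'])
wide = pd.DataFrame(
    [[10.0, 11.0, 5.0], [2.0, 3.0, 40.0]],
    index=pd.Index(['Hydro', 'Wind'], name='technology'),
    columns=columns)

# Stack all column levels into the row index, then flatten the index
# into ordinary columns: one row per (technology, country, year).
tidy = pd.DataFrame(wide.stack(level=['country', 'year']))
tidy = tidy.reset_index().rename(columns={0: 'capacity'})

print(tidy)
```

The result has one capacity value per row, which makes filtering by country, year, or technology a plain column comparison.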

## 1.4 Define technology levels

Due to varying categorizations in the country-specific raw data, a revised categorization is implemented to reflect the detail level of the different national references. We specify the following four different technology levels:

- **Technology level 0** - Total generation capacity
- **Technology level 1** - Generation capacity by fuel type (fossil, nuclear, renewable, other)
- **Technology level 2** - Generation capacity by fuel (e.g. coal, lignite, hard coal, natural gas, wind)
- **Technology level 3** - Generation capacity by fuel and technology (e.g. combined cycle natural gas, gas turbine, onshore wind, offshore wind)

The following categories are used in the dataset, together with their assigned technology levels. To ensure consistency of the dataset, some categories are labeled with multiple technology levels.

| Category | Technology Level | Note |
| :--- | :---: | :---|
| **Fossil fuels**  | 1  | |
| Lignite  | 2, 3  | |
| Hard coal  | 2, 3  | |
| Coal derivatives  | 2, 3  | |
| Oil  | 2, 3  | |
| Natural gas  | 2  | |
| *Combined cycle*  | 3  | |
| *Gas turbine*  | 3  | |
| *Other and unknown natural gas*  | 3 | Used if no precise information on technologies available |
| *Differently categorized natural gas*  | 3  | Used if no precise or other categorization on technologies available |
| Peat  | 2, 3  | |
| Waste (non-renewable)  | 2, 3  | |
| Mixed fossil fuels  | 2, 3  | |
| Differently categorized fossil fuels  | 2, 3  | Used if no precise or other categorization on technologies available |
| **Nuclear**  | 1, 2, 3 | |
| **Renewable energy sources**  | 1 | |
| Hydro  | 2 | |
| *Run-of-river*  | 3 | |
| *Reservoirs*  | 3 | |
| *Reservoirs incl. pumped storage*  | 3 | Reservoir and pumped storage capacity if not differentiated in reference.|
| *Pumped storage*  | 3 | |
| *Pumped storage with natural inflow*  | 3 | |
| *Differently categorized hydro*  | 3 | Used if no precise or other categorization on technologies available |
| Wind  | 2 | |
| *Onshore wind*  | 3 | |
| *Offshore wind* | 3 | |
| *Differently categorized wind*  | 3 | Used if no precise or other categorization on technologies available|
| Solar  | 2 | |
| *Photovoltaic*  | 3 | |
| *Concentrated solar power*  | 3 | |
| *Differently categorized solar*  | 3 | Used if no precise or other categorization on technologies available|
| Geothermal  | 2, 3 | |
| Tide, wave, and ocean  | 2, 3 | |
| Bioenergy and other renewable fuels  | 2 | |
| *Biomass*  | 3 | |
| *Biogas*  | 3 | |
| *Sewage and landfill gas*  | 3 | |
| *Other bioenergy including bio waste*  | 3 | |
| **Other or unspecified energy sources**  | 1, 2, 3 | |


The following table summarizes the categories considered within each technology level:

| Technology level 0 | Technology level 1 | Technology level 2 | Technology level 3|
| :--- | :--- | :--- | :--- |
| Total| |||
| |Fossil fuels |||
| | |Lignite|Lignite|
| | |Hard coal|Hard coal|
| | |Coal derivatives|Coal derivatives|
| | |Oil|Oil|
| | |Natural gas||
| | ||Combined cycle|
| | ||Gas turbine|
| | ||Other and unknown natural gas|
| | ||Differently categorized natural gas|
| | |Peat|Peat|
| | |Waste (non-renewable)|Waste (non-renewable)|
| | |Mixed fossil fuels|Mixed fossil fuels|
| | | Differently categorized fossil fuels| Differently categorized fossil fuels|
| |Nuclear |Nuclear|Nuclear|
| |Renewable energy sources |||
| | |Hydro||
| | ||Run-of-river|
| | ||Reservoirs|
| | ||Reservoirs incl. pumped storage|
| | ||Pumped storage|
| | ||Pumped storage with natural inflow|
| | ||Differently categorized hydro|
| | |Wind||
| | ||Onshore wind|
| | ||Offshore wind|
| | ||Differently categorized wind|
| | |Solar||
| | ||Photovoltaic|
| | ||Concentrated solar power|
| | ||Differently categorized solar|
| | |Geothermal |Geothermal |
| | |Tide, wave, and ocean|Tide, wave, and ocean|
| | |Bioenergy and other renewable fuels||
| | ||Biomass|
| | ||Biogas|
| | ||Sewage and landfill gas|
| | ||Other bioenergy including bio waste|
| |Other or unspecified energy sources |Other or unspecified energy sources|Other or unspecified energy sources|



In [None]:
# FRAUKE: Keep the long table above as a CSV file and read the information from it.
## It seems to me this could be solved more programmatically: with the table in a
## CSV file, one could read it in and display it instead of writing the table in
## Markdown, and the information would already be available without spending so
## many lines on True/False assignments.
data['technology_level_0'] = False
data['technology_level_1'] = False
data['technology_level_2'] = False
data['technology_level_3'] = False

data.loc['Total','technology_level_0'] = True
# Jan: space after the comma, per PEP8
data.loc['Fossil fuels','technology_level_1'] = True
data.loc['Nuclear','technology_level_1'] = True
data.loc['Nuclear','technology_level_2'] = True
data.loc['Nuclear','technology_level_3'] = True
data.loc['Renewable energy sources','technology_level_1'] = True
data.loc['Other or unspecified energy sources','technology_level_1'] = True
data.loc['Other or unspecified energy sources','technology_level_2'] = True
data.loc['Other or unspecified energy sources','technology_level_3'] = True

data.loc['Lignite','technology_level_2'] = True
data.loc['Lignite','technology_level_3'] = True
data.loc['Hard coal','technology_level_2'] = True
data.loc['Hard coal','technology_level_3'] = True
data.loc['Coal derivatives','technology_level_2'] = True
data.loc['Coal derivatives','technology_level_3'] = True
data.loc['Oil','technology_level_2'] = True
data.loc['Oil','technology_level_3'] = True
data.loc['Natural gas','technology_level_2'] = True
data.loc['Combined cycle','technology_level_3'] = True
data.loc['Gas turbine','technology_level_3'] = True
data.loc['Other and unknown natural gas','technology_level_3'] = True
data.loc['Differently categorized natural gas','technology_level_3'] = True
data.loc['Peat','technology_level_2'] = True
data.loc['Peat','technology_level_3'] = True
data.loc['Waste (non-renewable)','technology_level_2'] = True
data.loc['Waste (non-renewable)','technology_level_3'] = True
data.loc['Mixed fossil fuels','technology_level_2'] = True
data.loc['Mixed fossil fuels','technology_level_3'] = True
data.loc['Differently categorized fossil fuels','technology_level_2'] = True
data.loc['Differently categorized fossil fuels','technology_level_3'] = True


data.loc['Hydro','technology_level_2'] = True
data.loc['Run-of-river','technology_level_3'] = True
data.loc['Reservoirs','technology_level_3'] = True
data.loc['Reservoirs incl. pumped storage','technology_level_3'] = True
data.loc['Pumped storage','technology_level_3'] = True
data.loc['Pumped storage with natural inflow','technology_level_3'] = True
data.loc['Differently categorized hydro','technology_level_3'] = True

data.loc['Wind','technology_level_2'] = True
data.loc['Onshore wind','technology_level_3'] = True
data.loc['Offshore wind','technology_level_3'] = True
data.loc['Differently categorized wind','technology_level_3'] = True

data.loc['Solar','technology_level_2'] = True
data.loc['Photovoltaic','technology_level_3'] = True
data.loc['Concentrated solar power','technology_level_3'] = True
data.loc['Differently categorized solar','technology_level_3'] = True

data.loc['Geothermal','technology_level_2'] = True
data.loc['Geothermal','technology_level_3'] = True
data.loc['Tide, wave, and ocean','technology_level_2'] = True
data.loc['Tide, wave, and ocean','technology_level_3'] = True
data.loc['Bioenergy and other renewable fuels','technology_level_2'] = True
data.loc['Biomass','technology_level_3'] = True
data.loc['Biogas','technology_level_3'] = True
data.loc['Sewage and landfill gas','technology_level_3'] = True
data.loc['Other bioenergy including bio waste','technology_level_3'] = True
# Jan: spaces
data = data.reset_index()
data.head()
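
The reviewer comment at the top of the cell above suggests deriving the level flags from a table rather than from one assignment per line. A minimal sketch of that idea follows, assuming a hypothetical dictionary `levels` that holds only a handful of the categories and an invented sample frame; in practice the mapping could equally be read from a CSV file.

```python
import pandas as pd

# Hypothetical excerpt of the category table: category -> technology levels.
levels = {
    'Total': [0],
    'Fossil fuels': [1],
    'Lignite': [2, 3],
    'Natural gas': [2],
    'Combined cycle': [3],
    'Nuclear': [1, 2, 3],
}

# Invented sample data with 'technology' as an ordinary column.
data = pd.DataFrame({
    'technology': ['Total', 'Lignite', 'Combined cycle', 'Nuclear'],
    'capacity': [100.0, 20.0, 30.0, 50.0],
})

# One boolean column per level: True where the row's technology
# is listed for that level in the mapping.
for level in range(4):
    members = [tech for tech, lv in levels.items() if level in lv]
    data['technology_level_{}'.format(level)] = data['technology'].isin(members)

print(data)
```

Because `isin` only tests membership, a category that is missing from the data simply yields no `True` rows, and the mapping stays in one place next to the documentation table.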


# 2. Documenting the data package (meta data)

We document the data package's metadata in JSON format, as proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.

In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file.

In [None]:
# Here we define meta data of the resulting data package.
# The meta data follows the specification at:
# http://dataprotocols.org/data-packages/

metadata = """

name: opsd-national-generation-capacities
title: Aggregated generation capacity by technology and country
description: This dataset comprises technology-specific aggregated generation capacities for European countries. The generation capacities are consistently categorized based on fuel and technology. For each European country, various references are used ranging from international (e.g. ENTSOE or EUROSTAT) to national sources from e.g. regulatory authorities. The input data is processed in the script linked below. 
version: "2016-04-12"
keywords: [national generation capacities,europe]
opsd-jupyter-notebook-url: "https://github.com/Open-Power-System-Data/datapackage_national_generation_capacities/blob/master/main.ipynb"
geographical-scope: Europe
opsd-changes-to-last-version: Corrected wrong entry in AT

resources:
    - path: aggregated_capacity.csv
      format: csv
      mediatype: text/csv
      schema:    
        fields:
            - name: id
              description: ID for data entries 
              type: integer
            - name: technology
              description: Generation technologies defined by fuel and conversion technology
              type: string
            - name: source
              description: Source of data entry
              type: string
            - name: source_type
              description: Type of datasource
              type: string
            - name: year
              description: Year 
              type: integer
              format: YYYY
            - name: type
              description: Type of capacity
              type: string
            - name: country
              description: Country 
              type: string
            - name: capacity_definition
              description: Capacity definition used in the relevant source
              type: string
            - name: capacity
              description: Capacity in MW
              type: float
            - name: technology_level_0
              description: Technology level 0 (total aggregated capacity)
              type: boolean
            - name: technology_level_1
              description: Technology level 1 (aggregation by type of fuel)
              type: boolean
            - name: technology_level_2
              description: Technology level 2 (aggregation by fuel)
              type: boolean
            - name: technology_level_3
              description: Technology level 3 (aggregation by fuel and technology)
              type: boolean
    - path: aggregated_capacity.xlsx
      format: xlsx
      mediatype: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
              
licenses:
    - url: http://example.com/license/url/here
      name: License Name Here
      version: 1.0
      id: license-id-from-open

sources:
    - name: ENTSOE
      web: https://www.entsoe.eu/publications/system-development-reports/adequacy-forecasts/Pages/default.aspx
    - name: EUROSTAT
      web: http://ec.europa.eu/energy/en/statistics/country
    - name: e-control
      web: http://www.e-control.at/statistik/strom/bestandsstatistik
    - name: ELIA
      web: http://www.elia.be/en/grid-data/power-generation/generating-facilities
    - name: UN Statistical Office
      web: http://data.un.org/Data.aspx?d=EDATA&f=cmID%3AEC
    - name: BFE
      web: http://www.bfe.admin.ch/themen/00526/00541/00542/00630/index.html?dossier_id=00765
    - name: ERU
      web: http://www.eru.cz/en/elektrina/statistika-a-sledovani-kvality/rocni-zpravy-o-provozu
    - name: BMWi
      web: http://www.bmwi.de/BMWi/Redaktion/Binaer/Energiedaten/energietraeger10-stromerzeugungskapazitaeten-bruttostromerzeugung,property=blob,bereich=bmwi2012,sprache=de,rwb=true.xls
    - name: DEA
      web:  http://www.ens.dk/en/info/facts-figures/energy-statistics-indicators-energy-efficiency/annual-energy-statistics
    - name: REE
      web: http://www.ree.es/en/publications/statistical-data-of-spanish-electrical-system/national-indicators/
    - name: RTE 2014
      web: http://www.rte-france.com/en/document/overview-electrical-energy-france-march-2014
    - name: RTE 2015
      web:  http://clients.rte-france.com/lang/an/visiteurs/vie/prod/parc_reference.jsp
    - name: Terna 2013
      web: http://download.terna.it/terna/0000/0216/17.XLSX
    - name: Terna 2014
      web: http://download.terna.it/terna/0000/0216/16.XLSX
    - name: ILR
      web: http://www.ilr.public.lu/electricite/statistiques/index.html
    - name: Tennet NL
      web: http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx?id=InstalledCapacity
    - name: CIRE
      web: http://www.rynek-energii-elektrycznej.cire.pl/st,33,207,tr,75,0,0,0,0,0,podstawowe-dane.html
    - name: TSO Bulgaria 
      web: http://www.tso.bg/uploads/file/Profile/en/ESO_Annual_Report_2012_en.pdf
    - name: Statistics Estonia
      web: http://pub.stat.ee/px-web.2001/Dialog/varval.asp?ma=FE032&ti=CAPACITY+AND+PRODUCTION+OF+POWER+PLANTS&path=../I_Databas/Economy/07Energy/02Energy_consumption_and_production/01Annual_statistics/&lang=1
    - name: Statistics Finland
      web: http://pxnet2.stat.fi/PXWeb/pxweb/en/StatFin/StatFin__ene__ehk/240_ehk_tau_112_en.px/table/tableViewLayout1/?rxid=31077c25-37e4-480e-81e6-49a66cbe4dc2
    - name: Department of Energy & Climate Change UK
      web: https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes 
    - name: Regulatory Authority for Energy Greece 
      web:  http://www.rae.gr/site/file/system/docs/ActionReports/national_2012
    - name: Croatian Transmission System operator (HOPS)
      web: https://www.hops.hr/wps/wcm/connect/fbb3e297-dbfc-437a-bd36-458e02b9e7e4/Temeljni+podaci+2013.pdf?MOD=AJPERES
    - name: Mavir 2014
      web: http://www.mavir.hu/documents/10262/188569160/BT_terv_2014/9946a7a2-38ec-4794-9d7f-96a7a927d1b9 
    - name: Mavir 2013
      web: http://www.mavir.hu/documents/10262/188569160/BT_terv_2013_11_12_EN/ea873e22-bf88-4ee4-8a00-db09030bbb34
    - name: Eirgrid
      web: http://www.soni.ltd.uk/media/documents/Operations/CapacityStatements/All%20Island%20Generation%20Capacity%20Statement%202015.%20-%202024..pdf
    - name: Litgrid
      web: http://www.litgrid.eu/index.php/power-system/power-system-information/generation-capacity/546 
    - name: Central Statistical Bureau of Latvia
      web: http://data.csb.gov.lv/pxweb/en/vide/vide__ikgad__energetika/EN0130.px/table/tableViewLayout1/?rxid=a79839fe-11ba-4ecd-8cc3-4035692c5fc8
    - name: Energy Ministry NO 2013
      web:  https://www.regjeringen.no/globalassets/upload/oed/faktaheftet/facts_energy_water.pdf
    - name: Energy Ministry NO 2015 
      web: https://www.regjeringen.no/contentassets/fd89d9e2c39a4ac2b9c9a95bf156089a/facts_2015_energy_and_water_web.pdf 
    - name: REN
      web: http://www.ren.pt/files/2015-05/2015-05-04145306_f7664ca7-3a1a-4b25-9f46-2056eef44c33$$72f445d4-8e31-416a-bd01-d7b980134d0f$$ee3c56e5-6d14-4aa0-ac1f-ca5006917e03$$storage_image$$pt$$1.pdf
    - name: Anre 
      web: http://www.anre.ro/download.php?f=ga%2BCig%3D%3D&t=vdeyut7dlcecrLbbvbY%3D
    - name: Svensk Energi
      web: http://www.svenskenergi.se/Global/Statistik/El%C3%A5ret/El%C3%A5ret%202014_slututg%C3%A5va.pdf
    - name: Agencija za energijo 2014
      web: http://www.agen-rs.si/documents/10926/38704/Poro%C4%8Dilo/54b1b378-1e76-4d40-8e0d-c30339baa248
    - name: Agencija za energijo 2013
      web: http://www.agen-rs.si/documents/10926/0/Agencija-za-energijo---Energetika-SLO-za-2013-3.pdf/b63d191d-ecbc-4efe-8b91-1e0f80d3272b
    - name: Statistical Office of Slovakia 2013
      web: https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=bcc9ac82-9eb4-4320-b460-1f5c726db355
    - name: Statistical Office of Slovakia 2014
      web: https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=6d8bdb1f-528c-41b3-9564-0ff365c98bb8
      

maintainers:
    - name: Friedrich Kunz
      email: fkunz@diw.de
      web: http://open-power-system-data.org/

openpowersystemdata-enable-listing: True  


"""

# safe_load parses plain YAML without constructing arbitrary Python objects
metadata = yaml.safe_load(metadata)

datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
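
Before writing the JSON to disk it can help to check the parsed dictionary. The sketch below is an illustration only: the dictionary stands in for the parsed YAML and repeats just a few keys, and the helper `missing_keys` with its list of required keys is a hypothetical check, not part of the Data Package tooling.

```python
import json

# Stand-in for the dictionary produced by parsing the YAML string;
# only a few keys from the metadata above are repeated here.
metadata = {
    'name': 'opsd-national-generation-capacities',
    'version': '2016-04-12',
    'resources': [
        {'path': 'aggregated_capacity.csv', 'format': 'csv'},
        {'path': 'aggregated_capacity.xlsx', 'format': 'xlsx'},
    ],
}


def missing_keys(meta, required=('name', 'title', 'resources')):
    """Return the required top-level keys that are absent from meta.

    The default tuple of required keys is an assumption for this sketch.
    """
    return [key for key in required if key not in meta]


# Serialize exactly as in the cell above and confirm the round trip.
datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
round_trip_ok = json.loads(datapackage_json) == metadata

print(missing_keys(metadata), round_trip_ok)
```

Here the shortened stand-in lacks `title`, so the helper reports it; the full metadata string above defines all three keys.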

# 3. Write results to file

In [None]:
output_path = 'data_final/'

# Note: the structure for a block comment is "# comment" (hash, space, text)
# Write the result to file
data.to_csv(output_path+'aggregated_capacity.csv', encoding='utf-8')

# Write the results to Excel file
data.to_excel(output_path+'aggregated_capacity.xlsx', sheet_name='output')

# Write the results to SQLite database (the table name should not include the folder path)
data.to_sql('aggregated_capacity', sqlite3.connect(output_path+'aggregated_capacity.sqlite'), if_exists="replace")

#Write the information of the metadata
with open(os.path.join(output_path, 'datapackage.json'), 'w') as f:
    f.write(datapackage_json)

#Set this string to this notebook's filename!    
nb_filename = 'main.ipynb'

# Save a copy of the notebook to markdown, to serve as the package README file
subprocess.call(['jupyter', 'nbconvert', '--to', 'markdown', nb_filename])
path_readme = os.path.join(output_path, 'README.md')
try:
    os.remove(path_readme)
except Exception:
    pass
os.rename(nb_filename.replace('.ipynb', '.md'), path_readme)    
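
The output step can be sketched with OS-independent paths and an explicitly closed database connection. This is a minimal sketch under stated assumptions: the sample frame is invented, a temporary directory stands in for `data_final`, and `index_label='id'` is an assumption made here so that the index column matches the `id` field declared in the metadata schema.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd

# Invented sample standing in for the processed data.
data = pd.DataFrame({'technology': ['Hydro', 'Wind'],
                     'capacity': [10.0, 40.0]})

# A temporary directory stands in for the 'data_final' folder.
with tempfile.TemporaryDirectory() as output_path:
    # CSV output; the index is written as the 'id' column (assumption).
    csv_file = os.path.join(output_path, 'aggregated_capacity.csv')
    data.to_csv(csv_file, encoding='utf-8', index_label='id')
    csv_columns = pd.read_csv(csv_file).columns.tolist()

    # SQLite output; closing() ensures the connection is released.
    db_file = os.path.join(output_path, 'aggregated_capacity.sqlite')
    with closing(sqlite3.connect(db_file)) as conn:
        data.to_sql('aggregated_capacity', conn,
                    if_exists='replace', index_label='id')
        rows = conn.execute(
            'SELECT COUNT(*) FROM aggregated_capacity').fetchone()[0]

print(csv_columns, rows)
```

Closing the connection before the directory is removed matters mostly on Windows, where an open database file cannot be deleted.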