# Table of Contents
* [National Generation Capacities](#National-Generation-Capacities)
	* [Prepare Environment](#Prepare-Environment)
	* [Import raw data from Excel-file](#Import-raw-data-from-Excel-file)
	* [Convert raw data to list](#Convert-raw-data-to-list)
	* [Define technology levels](#Define-technology-levels)
	* [Print results of technology levels](#Print-results-of-technology-levels)
		* [Technology level 1](#Technology-level-1)
		* [Technology level 2](#Technology-level-2)
		* [Technology level 3](#Technology-level-3)
	* [Comparison of different technology levels for all countries](#Comparison-of-different-technology-levels-for-all-countries)
	* [Comparison of different technology levels for a selection](#Comparison-of-different-technology-levels-for-a-selection)
* [Documenting the data package (meta data)](#Documenting-the-data-package-%28meta-data%29)
* [Write results to file](#Write-results-to-file)


# National Generation Capacities

## Prepare Environment 

In [None]:
import pandas as pd
import numpy as np
import os.path
import yaml  # http://pyyaml.org/, pip install pyyaml, conda install pyyaml
import json
import subprocess

%matplotlib inline
import logging
logger = logging.getLogger('notebook')
logger.setLevel('INFO')
nb_root_logger = logging.getLogger()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                              datefmt='%d %b %Y %H:%M:%S')
# Apply the format to every existing root handler (there may be none)
for handler in nb_root_logger.handlers:
    handler.setFormatter(formatter)

# Create the output folders if they do not exist
os.makedirs('output/datapackage_generation_capacities', exist_ok=True)

## Import raw data from Excel-file

In [None]:
data_file = 'National_Generation_Capacities.xlsx'
filepath = 'inputs/'+data_file
data_raw = pd.read_excel(filepath,
                     sheet_name='Summary',
                     header=None,
                     na_values = ['-'],
                     skiprows=0)

# Deal with merged cells in Excel: forward-fill the first two header rows
data_raw.iloc[0:2] = data_raw.iloc[0:2].ffill(axis=1)

#Set index for rows
data_raw=data_raw.set_index([0])
data_raw.index.name='technology' 

#set multiindex column names
data_raw.columns=pd.MultiIndex.from_arrays(data_raw[:5].values, names=['country','type','year','source','capacity definition']) 

# Remove the header rows (null index) that are now used as column names
data_raw = data_raw[pd.notnull(data_raw.index)] 

data_raw
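
The header handling above can be illustrated on a small, made-up table. This is only a sketch: the frame `raw`, its values, and the two-level header (`country`, `year`) are assumptions chosen for brevity, whereas the real sheet uses five header rows.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: two header rows, then data rows.
# A merged cell arrives as one value followed by empty cells (None here).
raw = pd.DataFrame([
    [None, 'AT', None, 'BE', None],
    [None, 2013, 2014, 2013, 2014],
    ['Hydro', 10, 11, 1, 2],
    ['Wind', 3, 4, 5, 6],
], dtype=object)

# Forward-fill along the columns so every column carries its country label.
header = raw.iloc[0:2].ffill(axis=1)

# The first column becomes the row index; the header rows become a MultiIndex.
body = raw.iloc[2:].set_index(0)
body.index.name = 'technology'
body.columns = pd.MultiIndex.from_arrays(header.iloc[:, 1:].values,
                                         names=['country', 'year'])
print(body)
```

After this step each value can be addressed by technology, country and year, for example `body.loc['Hydro', ('AT', 2014)]`.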

## Convert raw data to list

In [None]:
# Reshape DataFrame to list
data = pd.DataFrame(data_raw.stack(level=['source','year','type','country','capacity definition']))

# reset index for Dataframe
data=data.reset_index()
data['technology'] = data['technology'].str.replace('- ','')
data=data.set_index('technology')

# delete entries with missing source
data = data[data['source'].notnull()]
data = data[data['source'] != 0]

data=data.rename(columns={0: 'value'})

data['value'] = pd.to_numeric(data['value'], errors='coerce')

data.head()
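
As a rough illustration of the reshaping step, the sketch below stacks a small cross table into a list with one row per value. The frame `wide` and its numbers are made up, and only two column levels are used instead of the five in the real data.

```python
import pandas as pd

# Hypothetical cross table: technologies in rows, (country, year) in columns.
columns = pd.MultiIndex.from_tuples(
    [('AT', 2013), ('AT', 2014), ('BE', 2013), ('BE', 2014)],
    names=['country', 'year'])
wide = pd.DataFrame([[10.0, 11.0, 1.0, 2.0],
                     [3.0, 4.0, 5.0, 6.0]],
                    index=pd.Index(['Hydro', 'Wind'], name='technology'),
                    columns=columns)

# Move all column levels into the row index, then flatten the index
# into ordinary columns; the unnamed value column is called 0.
long = wide.stack(level=['country', 'year']).reset_index()
long = long.rename(columns={0: 'value'})
print(long)
```

Each of the eight cells of the cross table ends up as one row with the columns `technology`, `country`, `year` and `value`.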

## Define technology levels

Due to varying categorizations in the country-specific raw data, a revised categorization is implemented to reflect the detail level of the different national references. We specify the following four different technology levels:

- **0** - Total generation capacity
- **1** - Generation capacity by fuel type (fossil, nuclear, renewable, other)
- **2** - Generation capacity by fuel (e.g. coal, lignite, hard coal, natural gas, wind)
- **3** - Generation capacity by fuel and technology (e.g. combined cycle natural gas, gas turbine, onshore wind, offshore wind)

The dataset uses the following categories, each listed with its assigned technology level. To keep the dataset consistent, some categories are labeled with multiple technology levels: a category that is not broken down any further (e.g. Nuclear) counts toward every level it is labeled with.

| Category | Technology Level | Note |
| :--- | :---: | :---|
| **Fossil fuels**  | 1  | |
| Lignite  | 2, 3  | |
| Hard coal  | 2, 3  | |
| Coal derivatives  | 2, 3  | |
| Oil  | 2, 3  | |
| Natural gas  | 2  | |
| *Combined cycle*  | 3  | |
| *Gas turbine*  | 3  | |
| *Other and unknown natural gas*  | 3 | Used if no precise information on technologies available |
| *Differently categorized natural gas*  | 3  | Used if no precise or other categorization on technologies available |
| Mixed fossil fuels  | 2, 3  | |
| Differently categorized fossil fuels  | 2, 3  | Used if no precise or other categorization on technologies available |
| **Nuclear**  | 1, 2, 3 | |
| **Renewable energy sources**  | 1 | |
| Hydro  | 2 | |
| *Run-of-river*  | 3 | |
| *Reservoirs*  | 3 | |
| *Reservoirs incl. pumped storage*  | 3 | Reservoir and pumped storage capacity if not differentiated in reference.|
| *Pumped storage*  | 3 | |
| *Differently categorized hydro*  | 3 | Used if no precise or other categorization on technologies available |
| Wind  | 2 | |
| *Onshore wind*  | 3 | |
| *Offshore wind* | 3 | |
| *Differently categorized wind*  | 3 | Used if no precise or other categorization on technologies available|
| Solar  | 2 | |
| *Photovoltaic*  | 3 | |
| *Concentrated solar power*  | 3 | |
| *Differently categorized solar*  | 3 | Used if no precise or other categorization on technologies available|
| Geothermal  | 2, 3 | |
| Tide, wave, and ocean  | 2, 3 | |
| Bioenergy and other renewable fuels  | 2 | |
| *Biomass*  | 3 | |
| *Biogas*  | 3 | |
| *Sewage and landfill gas*  | 3 | |
| *Other bioenergy including waste*  | 3 | |
| **Other or unspecified energy sources**  | 1, 2, 3 | |


The following table summarizes the defined categories considered within the different technology levels:

| Technology level 0 | Technology level 1 | Technology level 2 | Technology level 3|
| :--- | :--- | :--- | :--- |
| Total| |||
| |Fossil fuels |||
| | |Lignite|Lignite|
| | |Hard coal|Hard coal|
| | |Coal derivatives|Coal derivatives|
| | |Oil|Oil|
| | |Natural gas||
| | ||Combined cycle|
| | ||Gas turbine|
| | ||Other and unknown natural gas|
| | ||Differently categorized natural gas|
| | |Mixed fossil fuels|Mixed fossil fuels|
| | | Differently categorized fossil fuels| Differently categorized fossil fuels|
| |Nuclear |Nuclear|Nuclear|
| |Renewable energy sources |||
| | |Hydro||
| | ||Run-of-river|
| | ||Reservoirs|
| | ||Reservoirs incl. pumped storage|
| | ||Pumped storage|
| | ||Differently categorized hydro|
| | |Wind||
| | ||Onshore wind|
| | ||Offshore wind|
| | ||Differently categorized wind|
| | |Solar||
| | ||Photovoltaic|
| | ||Concentrated solar power|
| | ||Differently categorized solar|
| | |Geothermal |Geothermal |
| | |Tide, wave, and ocean|Tide, wave, and ocean|
| | |Bioenergy and other renewable fuels||
| | ||Biomass|
| | ||Biogas|
| | ||Sewage and landfill gas|
| | ||Other bioenergy including waste|
| |Other or unspecified energy sources |Other or unspecified energy sources|Other or unspecified energy sources|
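
To make the role of the multi-level labels concrete, here is a minimal sketch with made-up capacities for one country and year. It assumes that the levels are stored as comma-separated strings and matched by substring, as done in the cells of this notebook; the frame `toy` and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical capacities in MW for a single country and year.
toy = pd.DataFrame({
    'technology': ['Total', 'Fossil fuels', 'Lignite', 'Natural gas',
                   'Combined cycle', 'Gas turbine', 'Nuclear'],
    'technology_level': ['0', '1', '2, 3', '2', '3', '3', '1, 2, 3'],
    'value': [100.0, 60.0, 20.0, 40.0, 25.0, 15.0, 40.0],
})

# A row counts toward every level that appears in its label string.
level_totals = {
    level: toy.loc[toy['technology_level'].str.contains(level), 'value'].sum()
    for level in ['0', '1', '2', '3']
}
print(level_totals)
```

Because Lignite and Nuclear carry several levels, each level sums to the same total here. Labeling them with a single level would leave gaps in the other levels.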



In [None]:
data['technology_level'] = '3'

data.loc['Total','technology_level'] = '0'

data.loc['Fossil fuels','technology_level'] = '1'
data.loc['Nuclear','technology_level'] = '1, 2, 3'
data.loc['Renewable energy sources','technology_level'] = '1'
data.loc['Other or unspecified energy sources','technology_level'] = '1, 2, 3'

data.loc['Lignite','technology_level'] = '2, 3'
data.loc['Hard coal','technology_level'] = '2, 3'
data.loc['Coal derivatives','technology_level'] = '2, 3'
data.loc['Oil','technology_level'] = '2, 3'
data.loc['Natural gas','technology_level'] = '2'
data.loc['Mixed fossil fuels','technology_level'] = '2, 3'
data.loc['Differently categorized fossil fuels','technology_level'] = '2, 3'

data.loc['Hydro','technology_level'] = '2'
data.loc['Wind','technology_level'] = '2'
data.loc['Solar','technology_level'] = '2'
data.loc['Geothermal','technology_level'] = '2, 3'
data.loc['Tide, wave, and ocean','technology_level'] = '2, 3'
data.loc['Bioenergy and other renewable fuels','technology_level'] = '2'
data['technology_level']

data=data.reset_index()
data.head()


## Print results of technology levels

### Technology level 1

In [None]:
pivot_capacity_level1 = pd.pivot_table(data[data.technology_level.str.contains('1')],
                               index=('country','year','source'),
                               columns = ('technology'),
                               values='value',
                               aggfunc=sum,
                               margins=False)

pivot_capacity_plot=pivot_capacity_level1.plot(kind='bar',stacked=True, legend=True, figsize=(12, 6))
pivot_capacity_plot.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
pivot_capacity_plot.set_ylim(0,250000)

pivot_capacity_plot
pivot_capacity_level1

### Technology level 2

In [None]:
pivot_capacity_level2 = pd.pivot_table(data[data.technology_level.str.contains('2')],
                               index=('country','year','source'),
                               columns = ('technology'),
                               values='value',
                               aggfunc=sum,
                               margins=False)

pivot_capacity_plot=pivot_capacity_level2.plot(kind='bar',stacked=True, legend=True, figsize=(12, 6))
pivot_capacity_plot.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
pivot_capacity_plot.set_ylim(0,250000)

pivot_capacity_plot
pivot_capacity_level2

### Technology level 3

In [None]:
pivot_capacity_level3 = pd.pivot_table(data[data.technology_level.str.contains('3')],
                               index=('country','year','source'),
                               columns = ('technology'),
                               values='value',
                               aggfunc=sum,
                               margins=False)

pivot_capacity_plot=pivot_capacity_level3.plot(kind='bar',stacked=True, legend=True, figsize=(12, 6))
pivot_capacity_plot.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
pivot_capacity_plot.set_ylim(0,250000)

pivot_capacity_plot
pivot_capacity_level3

## Comparison of different technology levels for all countries

In [None]:
capacity_total_0 = pd.DataFrame(data[data['technology_level'].str.contains('0')].groupby(['capacity definition','source','year','type','country'])['value'].sum())
capacity_total_1 = pd.DataFrame(data[data['technology_level'].str.contains('1')].groupby(['capacity definition','source','year','type','country'])['value'].sum())
capacity_total_2 = pd.DataFrame(data[data['technology_level'].str.contains('2')].groupby(['capacity definition','source','year','type','country'])['value'].sum())
capacity_total_3 = pd.DataFrame(data[data['technology_level'].str.contains('3')].groupby(['capacity definition','source','year','type','country'])['value'].sum())

capacity_total_comparison = pd.DataFrame(capacity_total_0)
capacity_total_comparison = pd.merge(capacity_total_0, capacity_total_1,left_index=True,right_index=True,how='left')
capacity_total_comparison = capacity_total_comparison.rename(columns={'value_x': 'technology level 0','value_y': 'technology level 1'})
capacity_total_comparison = pd.merge(capacity_total_comparison, capacity_total_2,left_index=True,right_index=True,how='left')
capacity_total_comparison = pd.merge(capacity_total_comparison, capacity_total_3,left_index=True,right_index=True,how='left')
capacity_total_comparison = capacity_total_comparison.rename(columns={'value_x': 'technology level 2','value_y': 'technology level 3'})


capacity_total_comparison = capacity_total_comparison.sort_index(level=['country','year'])

capacity_total_pivot_plot = capacity_total_comparison.plot(kind='bar',stacked=False, legend=True, figsize=(12, 6))
capacity_total_pivot_plot.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
#capacity_total_pivot_plot.set_ylim(0,250000)

capacity_total_pivot_plot
capacity_total_comparison
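
Beyond visual inspection, a comparison table of this shape could also be checked numerically. The following sketch is not part of the original workflow: the frame `comparison`, its values and the 2 % `tolerance` are assumptions used only to illustrate the idea of flagging entries whose level totals deviate from level 0.

```python
import pandas as pd

# Hypothetical totals per technology level (MW), indexed by country and year.
comparison = pd.DataFrame(
    {'technology level 0': [100.0, 50.0],
     'technology level 1': [100.0, 50.0],
     'technology level 2': [100.0, 48.0],
     'technology level 3': [99.0, 48.0]},
    index=pd.MultiIndex.from_tuples([('AT', 2014), ('BE', 2014)],
                                    names=['country', 'year']))

# Relative deviation of every level from the reported total (level 0).
reference = comparison['technology level 0']
deviation = comparison.sub(reference, axis=0).div(reference, axis=0)

# Keep only the entries where some level deviates by more than the tolerance.
tolerance = 0.02  # assumed threshold, not taken from the data package
flagged = deviation[(deviation.abs() > tolerance).any(axis=1)]
print(flagged)
```

In this made-up example only the second entry is flagged, because its levels 2 and 3 fall 4 % short of the level 0 total.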

## Comparison of different technology levels for a selection

In [None]:
capacity_total_comparison = pd.DataFrame(capacity_total_comparison.stack()).reset_index().rename(columns={'level_5': 'technology_level',0: 'value'})

capacity_total_pivot = pd.pivot_table(
                               # select specific country for comparison
#                               capacity_total_comparison[capacity_total_comparison['country']=='BE'],
                               # select specific source for comparison 
                               capacity_total_comparison[capacity_total_comparison['source']=='entsoe'],
                               index=('country','year','source'),
                               columns='technology_level', 
                               values='value',
                               aggfunc=sum,
                               margins=False)

capacity_total_pivot_plot = capacity_total_pivot.plot(kind='bar',stacked=False, legend=True, figsize=(12, 6))
capacity_total_pivot_plot.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
#capacity_total_pivot_plot.set_ylim(0,250000)

capacity_total_pivot_plot
capacity_total_pivot

# Documenting the data package (meta data)

We document the data package's metadata in JSON, following the Data Package format proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.

In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file.

In [None]:
# Here we define meta data of the resulting data package.
# The meta data follows the specification at:
# http://dataprotocols.org/data-packages/

metadata = """

name: opsd-national-generation-capacities
title: National electricity generation capacities of selected European countries.
description: This dataset comprises technology-specific aggregated generation capacities for selected European countries.
version: "2016-02-08"
keywords: [national generation capacities,europe]
opsd-jupyter-notebook-url: "https://github.com/Open-Power-System-Data/datapackage_national_generation_capacities/blob/master/National_Generation_Capacities.ipynb"

resources:
    - path: national_generation_capacities.csv
      format: csv
      mediatype: text/csv
      schema:  # Schema according to: http://dataprotocols.org/json-table-schema/        
        fields:
            - name: id
              description: ID for data entries 
              type: integer
            - name: technology
              description: Generation technologies defined by fuel and conversion technology
              type: string
            - name: source
              description: Source of data entry
              type: string
            - name: year
              description: Year 
              type: integer
              format: YYYY
            - name: type
              description: Type of capacity
              type: string
            - name: country
              description: Country 
              type: string
            - name: capacity definition
              description: Capacity definition used in the relevant source
              type: string
            - name: value
              description: Capacity in MW
              type: number
            - name: technology_level
              description: Level of technology definitions (0-total aggregated capacity, 1-aggregated by type of fuel, 2-aggregated by fuel, 3-aggregated by fuel and technology)
              type: string
              
licenses:
    - url: http://example.com/license/url/here
      name: License Name Here
      version: 1.0
      id: license-id-from-open

sources:
    - name: ENTSOE
      web: https://www.entsoe.eu/publications/system-development-reports/adequacy-forecasts/Pages/default.aspx
    - name: EUROSTAT
      web: http://ec.europa.eu/energy/en/statistics/country
    - name: e-control
      web: http://www.e-control.at/statistik/strom/bestandsstatistik
    - name: ELIA
      web: http://www.elia.be/en/grid-data/power-generation/generating-facilities
    - name: UN
      web: http://data.un.org/Data.aspx?d=EDATA&f=cmID%3AEC
    - name: BFE
      web: http://www.bfe.admin.ch/themen/00526/00541/00542/00630/index.html?dossier_id=00765
    - name: ERU
      web: http://www.eru.cz/en/elektrina/statistika-a-sledovani-kvality/rocni-zpravy-o-provozu
    - name: BMWi
      web: http://www.bmwi.de/BMWi/Redaktion/Binaer/Energiedaten/energietraeger10-stromerzeugungskapazitaeten-bruttostromerzeugung,property=blob,bereich=bmwi2012,sprache=de,rwb=true.xls
    - name: DEA
      web: http://www.ens.dk/en/info/facts-figures/energy-statistics-indicators-energy-efficiency/annual-energy-statistics
    - name: REE
      web: http://www.ree.es/en/publications/statistical-data-of-spanish-electrical-system/national-indicators/
    - name: RTE 2014
      web: http://www.rte-france.com/en/document/overview-electrical-energy-france-march-2014
    - name: RTE 2015
      web: http://clients.rte-france.com/lang/an/visiteurs/vie/prod/parc_reference.jsp
    - name: TERNA 2013
      web: http://download.terna.it/terna/0000/0216/17.XLSX
    - name: TERNA 2014
      web: http://download.terna.it/terna/0000/0216/16.XLSX
    - name: ILR
      web: http://www.ilr.public.lu/electricite/statistiques/index.html
    - name: Tennet NL
      web: http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx?id=InstalledCapacity
    - name: CIRE
      web: http://www.rynek-energii-elektrycznej.cire.pl/st,33,207,tr,75,0,0,0,0,0,podstawowe-dane.html
      

maintainers:
    - name: OPSD-Team
      email: OPSD-Team-email
      web: http://open-power-system-data.org/

views:
    # You can put hints here which kind of graphs or maps make sense to display your data. This makes the 
    # Data Package Viewer at http://data.okfn.org/tools/view automatically display visualizations of your data.
    # See http://data.okfn.org/doc/data-package#views for more details.    

# extend your datapackage.json with attributes that are not
# part of the data package spec
# you can add your own attributes to a datapackage.json, too

openpowersystemdata-enable-listing: True  # This is just an example we don't actually make use of yet.


"""

metadata = yaml.safe_load(metadata)

datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
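
Once the metadata is available as a dictionary, it can be cross-checked against the data before writing. The sketch below is an optional sanity check, not part of the original workflow: `metadata_example` is a shortened, hypothetical stand-in for the parsed metadata, and the helper functions and `csv_columns` are illustrative names.

```python
import json

# Hypothetical, shortened stand-in for the parsed metadata dictionary.
metadata_example = {
    'name': 'opsd-national-generation-capacities',
    'resources': [{
        'path': 'national_generation_capacities.csv',
        'schema': {'fields': [
            {'name': 'technology'},
            {'name': 'value'},
            {'name': 'technology_level'},
        ]},
    }],
}


def schema_field_names(meta, resource_index=0):
    """Return the field names documented for one resource."""
    fields = meta['resources'][resource_index]['schema']['fields']
    return [field['name'] for field in fields]


def undocumented_columns(meta, columns):
    """Return the columns that have no entry in the schema."""
    documented = set(schema_field_names(meta))
    return [column for column in columns if column not in documented]


csv_columns = ['technology', 'source', 'value', 'technology_level']
missing = undocumented_columns(metadata_example, csv_columns)
print(missing)

# The JSON written to disk can be parsed back into the same dictionary.
as_json = json.dumps(metadata_example, indent=4, separators=(',', ': '))
round_trip = json.loads(as_json)
```

In the notebook itself, the same helpers could be called with the parsed `metadata` and `data.columns` to spot columns that are missing from the schema.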

# Write results to file

In [None]:
output_path = 'output/datapackage_generation_capacities/'
output_path2 = 'output/datapackage_generation_capacities'

#Write the result to file
data_raw.to_csv(output_path+'national_generation_capacities_crosstab.csv', encoding='utf-8')

#Write the results to excel file
#data_raw.to_excel(output_path+'national_generation_capacities_crosstab.xlsx', sheet_name='output')

#Write the result to file; the row index is written as the 'id' column of the schema
data.to_csv(output_path+'national_generation_capacities.csv', encoding='utf-8', index_label='id')

#Write the results to excel file
#data.to_excel(output_path+'national_generation_capacities.xlsx', sheet_name='output')

#Write the information of the metadata
with open(os.path.join(output_path, 'datapackage.json'), 'w') as f:
    f.write(datapackage_json)

#Set this string to this notebook's filename!    
nb_filename = 'National_Generation_Capacities.ipynb'

# Save a copy of the notebook to markdown, to serve as the package README file
subprocess.call(['jupyter', 'nbconvert', '--to', 'markdown', nb_filename])
path_readme = os.path.join(output_path2, 'README.md')
try:
    os.remove(path_readme)
except Exception:
    pass
os.rename(nb_filename.replace('.ipynb', '.md'), path_readme)    
 