# Register of renewable energy power plants in Germany

This Jupyter Notebook is part of the "Open Power System Data" project. Its aim is to extract, merge, clean and verify data on renewable energy (RE) power plants in Germany. The data is downloaded from three different sources:
* [BNetzA](http://www.bundesnetzagentur.de/) - The German Federal Network Agency for Electricity, Gas, Telecommunications, Post and Railway
* [EnergyMap.info](http://www.energymap.info/) - Operated by the German Solar Energy Society (Deutsche Gesellschaft für Sonnenenergie e.V., DGS)
* [Netztransparenz.de](https://www.netztransparenz.de/de/Anlagenstammdaten.htm) - Information platform of the German TSOs

In Germany, reporting of all data by the four TSOs (50Hertz, Amprion, TenneT, TransnetBW) is mandatory. The BNetzA merges these reports, and EnergyMap.info additionally validates the data and converts it into a standard format. The EnergyMap.info register includes all RE power plants that are eligible for the German feed-in tariff (according to the EEG law). EnergyMap.info is the main source for all historical data (plants installed until July 2014).

Note: Since 2014 the BNetzA itself is responsible for publishing the renewable power plants register. The legal framework for the register is specified in the [EEG 2014 (German)](http://www.gesetze-im-internet.de/eeg_2014/) [(English)](http://www.res-legal.eu/search-by-country/germany/single/s/res-e/t/promotion/aid/feed-in-tariff-eeg-feed-in-tariff/lastp/135/). From June 2014 on, all power plants are published in two separate MS Excel files (roof-mounted PV systems and all other renewable power plants). Additionally, the data from [Netztransparenz.de](https://www.netztransparenz.de/de/Anlagenstammdaten.htm) is included in the script to fill in missing values and time periods.


# License

* This notebook is published under the GNU GPL v3 license: http://www.gnu.org/licenses/gpl-3.0.en.html
* This notebook is developed by the project Open Power System Data (OPSD): http://open-power-system-data.org/

# Table of contents 


(work in progress)
* [Register of renewable energy power plants in Germany](#Register-of-renewable-energy-power-plants-in-Germany)
* [License](#License)
* [Import Python modules](#Import-Python-modules)
* [Data overview](#Data-overview)
* [Download and extract data](#Download-an-extracting-data-from-data-sources)
    * [Unzip and load data as Data Frame](#Unzip-and-load-data-as-data-frame)
    * [Definition of column names](#Definition-of-column-names)
    * [First look at the Data Frame structure and format](#First-look-at-the-Data-Frame-structure-and-format)
    * [Rename values by own english standards](#Rename-values-by-own-english-standards)

* [Find duplicates and other suspect data](#Find-duplicantes-and-other-suspect-data)

* [Georeferencing get coordinates](#Georeferencing-get-coordinates)
* [Geoploting test](#Geoplot-testing)
* [Documenting the data package in a JSON object](#Documenting-the-data-package-in-a-JSON-object)
* [Write Data Package to disk](#Write-Data-Package-to-disk)


# Import Python modules

In [None]:
%matplotlib inline

from collections import OrderedDict
import io
import json
import os
import subprocess
import zipfile
import posixpath
import urllib.parse
import urllib.request
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
import pandas as pd
import requests
import yaml  # http://pyyaml.org/, pip install pyyaml, conda install pyyaml
import datetime  

import logging
logger = logging.getLogger('notebook')
logger.setLevel('INFO')
nb_root_logger = logging.getLogger()
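formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s'\
                              ' - %(message)s',datefmt='%d %b %Y %H:%M:%S')
# the root logger may have no handler yet, depending on the Jupyter version
if not nb_root_logger.handlers:
    nb_root_logger.addHandler(logging.StreamHandler())
nb_root_logger.handlers[0].setFormatter(formatter)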



# create download and output folders if they don't exist
os.makedirs('input', exist_ok=True)
os.makedirs('output/datapackage_renewables', exist_ok=True)


# Data overview

The data processed in this notebook is downloaded from three different sources. Several sources are needed because each one covers a different time period and each has missing, inaccurate or incomplete data.

####  Bundesnetzagentur (BNetzA)

The German Federal Network Agency for Electricity, Gas, Telecommunications, Post and Railway (BNetzA) publishes the renewable power plant register ("Anlagenregister"):

* Data description from source: BNetzA for [renewable power plants](http://www.bundesnetzagentur.de/cln_1422/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Photovoltaik/DatenMeldgn_EEG-VergSaetze/DatenMeldgn_EEG-VergSaetze_node.html) and [PV](http://www.bundesnetzagentur.de/cln_1422/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Photovoltaik/DatenMeldgn_EEG-VergSaetze/DatenMeldgn_EEG-VergSaetze_node.html)
* Download from: 
    * Renewable power plants without PV [xls-File](http://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/Sachgebiete/Energie/Unternehmen_Institutionen/ErneuerbareEnergien/Anlagenregister/VOeFF_Anlagenregister/2015_11_Veroeff_AnlReg.xls?__blob=publicationFile&v=1) 
    * PV [csv-File](http://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/Sachgebiete/Energie/Unternehmen_Institutionen/ErneuerbareEnergien/Photovoltaik/Datenmeldungen/Meldungen_CSV.csv?__blob=publicationFile&v=14)
        

#### EnergyMap.info

The webpage is owned by the Deutsche Gesellschaft für Sonnenenergie e.V. (DGS)
* Data description from source: [EnergyMap](http://www.energymap.info/download/)
* Download from: [zip-File](http://www.energymap.info/download/eeg_anlagenregister_2015.08.utf8.csv.zip)
    * Last Update: 08.2015
    * File size: 51,9 MB
   
EnergyMap.info is the main source for all historical data (installed until July 2014).

#### Netztransparenz.de 

Official grid transparency platform from the German TSOs (50Hertz, Amprion, TenneT and TransnetBW).
* Data description from source: [Netztransparenz](https://www.netztransparenz.de/de/Anlagenstammdaten.htm)
* Download from: [zip-File](https://www.netztransparenz.de/de/file/UeNB-EEG-Anlagenstammdaten_2013.zip)
    * Last Update: 06.11.2015
    * File size: 363,9 MB


# Download an extracting data from data sources



In [None]:
# EnergyMap.info
url_energymap = 'http://www.energymap.info/download/eeg_anlagenregister_2015.'\
                 '08.utf8.csv.zip'
# netztransparenz.de
url_netztransp = 'https://www.netztransparenz.de/de/file/'\
                 'Anlagenstammdaten_2014_4UeNB.zip'
# BNetzA PV register
url_bnetza_pv = 'https://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/'\
                'Sachgebiete/Energie/Unternehmen_Institutionen/'\
                'ErneuerbareEnergien/Photovoltaik/Datenmeldungen/'\
                'Meldungen_Aug-Okt2015.xls?__blob=publicationFile&v=1'
# BNetzA (all other power plants) register
url_bnetza ='http://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/Sachgebiete/'\
            'Energie/Unternehmen_Institutionen/ErneuerbareEnergien/Anlagenregister/'\
            'VOeFF_Anlagenregister/2015_11_Veroeff_AnlReg.xls?__blob=publicationFile&v=1'



In [None]:
# Download function from TU Berlin
def downloadandcache(url):
    """Download a file into the folder 'input' and return the local file path.

    The file name is prefixed with the current year and month, so a file
    that is already cached for the current month is not downloaded again.
    """
    path = urllib.parse.urlsplit(url).path
    filename = posixpath.basename(path)
    now = datetime.datetime.now()
    datestring = str(now.year) + "-" + str(now.month)
    filepath = "input/" + datestring + "-" + filename

    # check if file exists, otherwise download it
    if not os.path.exists(filepath):
        print("Downloading file", filename)
        urllib.request.urlretrieve(url, filepath)
    else:
        print("Using local file from", filepath)
    return filepath

In [None]:
# Use download function and get file path
energymap_filepath = downloadandcache(url_energymap)
bnetza_filepath = downloadandcache(url_bnetza)
bnetza_pv_filepath = downloadandcache(url_bnetza_pv)
netztransp_filepath = downloadandcache(url_netztransp)

## Unzip and load data as data frame

The provided data is loaded into [pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe).

Loading the data takes approx. 5-10 min.
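
The source files use German number formatting (`;` as field separator, `.` as thousands separator, `,` as decimal separator). The sketch below shows the effect of these `read_csv` options on a small in-memory sample; the column names and values are made up for illustration. As a precaution, the date column is read as text and converted explicitly, so that the dots in the dates cannot be mistaken for thousands separators.

```python
import io

import pandas as pd

# Made-up sample in the same style as the source files
sample = io.StringIO(
    "eeg_id;rated_power;start_up_date\n"
    "E1;1.234,5;01.02.2014\n"
    "E2;12,75;31.12.2013\n"
)

df = pd.read_csv(sample,
                 sep=';',        # CSV field separator
                 thousands='.',  # thousands separator
                 decimal=',',    # decimal separator
                 dtype={'start_up_date': str})  # keep dates as text for now

# convert the day-first dates explicitly
df['start_up_date'] = pd.to_datetime(df['start_up_date'], format='%d.%m.%Y')
print(df.dtypes)
```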

In [None]:
# unzip netztransparenz.de
z = zipfile.ZipFile(netztransp_filepath)
# read csv of Amprion
amprion_df = pd.read_csv(z.open('Amprion_Anlagenstammdaten_2014.csv'),
                       sep=';',        # CSV field separator, default is ','
                       thousands='.',  # Thousands separator, default is ','
                       decimal=',',    # Decimal separator, default is '.'
                       header=0,
                       parse_dates=[11,12,13,14], 
                       encoding = 'cp850',
                       dayfirst=True, 
                       low_memory=False)

In [None]:
# read csv of 50Hertz
hertz_df = pd.read_csv(z.open('50Hertz_Anlagenstammdaten_2014.csv'),
                       sep=';',        # CSV field separator, default is ','
                       thousands='.',  # Thousands separator, default is ','
                       decimal=',',    # Decimal separator, default is '.'
                       header=0,
                       parse_dates=[11,12,13,14],  
                       encoding = 'cp1252',
                       dayfirst=True, 
                       low_memory=False)


In [None]:
# read csv of TenneT
tennet_df = pd.read_csv(z.open('TenneT_Anlagenstammdaten_2014.csv'),
                       sep=';',        # CSV field separator, default is ','
                       thousands='.',  # Thousands separator, default is ','
                       decimal=',',    # Decimal separator, default is '.'
                       header=0,
                       parse_dates=[11,12,13,14], 
                       encoding = 'cp1252',
                       dayfirst=True, 
                       low_memory=False)

In [None]:
# read csv of TransnetBW
transnetbw_df = pd.read_csv(z.open('TransnetBW_Anlagenstammdaten_2014.csv'),
                       sep=';',        # CSV field separator, default is ','
                       thousands='.',  # Thousands separator, default is ','
                       decimal=',',    # Decimal separator, default is '.'
                       header=0,
                       parse_dates=[11,12,13,14], 
                       encoding = 'cp1252',
                       dayfirst=True, 
                       low_memory=False)

In [None]:
# unzip energymap zip file
zz = zipfile.ZipFile(energymap_filepath)
# read csv of energymap
enmap_df = pd.read_csv(zz.open('eeg_anlagenregister_2015.08.utf8.csv'),
                       sep=';',        # CSV field separator, default is ','
                       thousands='.',  # Thousands separator, default is ','
                       decimal=',',    # Decimal separator, default is '.'
                       header=None,
                       names=None,
                       parse_dates=[0], 
                       dayfirst=True, 
                       comment='#',
                       low_memory=False)

In [None]:
# read BNetzA register
bnetza = pd.ExcelFile(bnetza_filepath)
#bnetza.sheet_names
bnetza_df = bnetza.parse("Gesamtübersicht")
#list(bnetza_df.columns)

In [None]:
# read BNetzA PV roof mounted
bnetza_pv = pd.ExcelFile(bnetza_pv_filepath)

# Combine all BNetzA sheets into one data frame
bnetza_pv_df = pd.concat(bnetza_pv.parse(sheet, skiprows=10
                         ) for sheet in bnetza_pv.sheet_names)
# drop NULL "Unnamed:" column
bnetza_pv_df = bnetza_pv_df.drop(bnetza_pv_df.columns[[7]], axis=1)
# show sheet names
bnetza_pv.sheet_names

## Definition of column names

The original column names from EnergyMap.info, the German TSOs and the BNetzA need to be translated. The new English column names are assigned to the data frames.
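
The translation step can be sketched on a small example. The German column names and the table contents below are hypothetical stand-ins for the entries in `input/column_translation_list.csv`; only the column layout (`original_name`, `column_naming`, `source`) follows the notebook. The sketch also lists columns that were not translated, which may help to spot gaps in the translation list.

```python
import pandas as pd

# Hypothetical excerpt of the translation table
translation = pd.DataFrame({
    'original_name': ['Inbetriebnahmedatum', 'Installierte Leistung [kW]'],
    'column_naming': ['start_up_date', 'rated_power'],
    'source': ['BNetzA', 'BNetzA'],
})
columndict = translation.set_index('original_name')['column_naming'].to_dict()

# Hypothetical raw data with one column missing from the translation table
raw = pd.DataFrame({'Inbetriebnahmedatum': ['2015-01-05'],
                    'Installierte Leistung [kW]': [30.0],
                    'Unbekannt': ['x']})

renamed = raw.rename(columns=columndict)
# columns that still carry their original name
untranslated = [c for c in renamed.columns if c not in columndict.values()]
print(untranslated)
```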

In [None]:
# translation of column names by existing csv list
columnnames =pd.read_csv('input/column_translation_list.csv',sep = ";",
                           header=0)
columndict = columnnames.set_index('original_name')['column_naming'].to_dict()
energymap_dict = columnnames.loc[columnnames['source'] == 'EnergyMap',
                                 'column_naming'].to_dict()

In [None]:
# rename
bnetza_pv_df.rename(columns=columndict, inplace=True)
# the line breaks ('\n') in the BNetzA PV column names would be read from the
# csv list as a literal backslash, so an additional helper dict is used
backslash = {'Anlage \nBundesland': 'federal_state',
             'Anlage \nOrt oder Gemarkung': 'city',
             'Anlage \nPLZ': 'postcode',
             'Anlage \nStraße oder Flurstück *)': 'address',
             'Installierte \nNennleistung [kWp]': 'rated_power'}
bnetza_pv_df.rename(columns=backslash, inplace=True)
bnetza_pv_df.columns

In [None]:
# get an overview of the list by source
columnnames.loc[columnnames['source'] == 'BNetzA_PV']
#columnnames.groupby('source').count()

In [None]:
# rename bnetza_df columns by the translation dict
bnetza_df.rename(columns = columndict , inplace=True)
#bnetza_df.columns

In [None]:
# rename columns
transnetbw_df.rename(columns = columndict , inplace=True)
tennet_df.rename(columns = columndict , inplace=True)
amprion_df.rename(columns = columndict , inplace=True)
hertz_df.rename(columns = columndict , inplace=True)
enmap_df.rename(columns = energymap_dict , inplace=True)

## Set original source to dataframe
* Set source names
* Make changes with the PV solar roof mounted data
    * Add new eeg_id  

In [None]:
# set source names of each dataframe
enmap_df['source'] = 'EnergyMap'
transnetbw_df['source'] = 'TransnetBW'
tennet_df['source'] = 'TenneT'
amprion_df['source'] = 'Amprion'
hertz_df['source'] = '50Hertz'
bnetza_df['source'] = 'BNetzA'
bnetza_pv_df['source'] = 'BNetzA_PV'

In [None]:
# add type of generation 
bnetza_pv_df['generation_type'] = 'solar_roof_mounted'
# add new id for eeg_id by the index
bnetza_pv_df['eeg_id'] =  'pv_id' + bnetza_pv_df.index.astype(str)

In [None]:
# join dataframes together
f_enmap = [enmap_df, bnetza_df, bnetza_pv_df]
f_tso = [transnetbw_df,tennet_df, amprion_df, 
         hertz_df, bnetza_df, bnetza_pv_df]
# 
renewables = pd.concat(f_tso)                      
enmap = pd.concat(f_enmap) 

## Select columns of our interest

In [None]:
# locate columns
column_interest = ['start_up_date', 'rated_power','generation_type',
                   'generation_subtype','thermal_capacity','city', 'postcode',
                   'address','tso','utm_zone','utm_east', 'utm_north',
                   'gps_lon','gps_lat', 'notification_reason', 'eeg_id',
                   'dso_id', 'dso','voltage_level','decommission_date',
                   'power_plant_id','source']

In [None]:
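# keep only the columns of interest (missing columns are filled with NaN)
renewables = renewables.reindex(columns=column_interest)
energymap = enmap.reindex(columns=column_interest)

# reset the index after the concatenation
renewables = renewables.reset_index(drop=True)
energymap = energymap.reset_index(drop=True)
print('done')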

## Take a look at entries and source

In [None]:
renewables.groupby(['source'])['eeg_id'].count()

In [None]:
energymap.groupby(['source'])['eeg_id'].count()

## First look at the dataframe structure and format

A first validation of the created dataframe and the data formats of the different columns.

`DataFrame.info()` shows us the number of non-null (non-NA) values in each column, which can serve as a first indicator.

In [None]:
energymap.info()

In [None]:
renewables.info()

## Rename values by own english standards

* use translation csv file
* create subtypes of generation
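
The two-step translation (original value to subtype, subtype to type) can be sketched as follows. The table contents are hypothetical examples and not the actual entries of `input/value_translation_list.csv`; only the three column names follow the notebook.

```python
import pandas as pd

# Hypothetical excerpt of the value translation table
values = pd.DataFrame({
    'original_name': ['Wind Land', 'Wind See', 'Freifläche PV'],
    'generation_subtype': ['wind_onshore', 'wind_offshore',
                           'solar_ground_mounted'],
    'generation_type': ['wind', 'wind', 'solar'],
})
valuedict = values.set_index('original_name')['generation_subtype'].to_dict()
gentypedict = values.set_index('generation_subtype')['generation_type'].to_dict()

plants = pd.DataFrame({'generation_type': ['Wind Land', 'Freifläche PV',
                                           'Wind See']})
# step 1: original value -> subtype
plants['generation_subtype'] = plants['generation_type'].replace(valuedict)
# step 2: subtype -> type
plants['generation_type'] = plants['generation_subtype'].replace(gentypedict)
print(plants)
```

Values that are missing from the dictionaries are left unchanged by `replace`, so untranslated entries remain visible in a later `groupby`.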


In [None]:
# translation of entry values
values = pd.read_csv('input/value_translation_list.csv', sep=';',
                     header=0)
# dict to translate original values into generation subtypes
valuedict = values.set_index('original_name')['generation_subtype'].to_dict()
# dict to map generation subtypes to generation types
gentypedict = values.set_index('generation_subtype')['generation_type'].to_dict()

In [None]:
print(values)

In [None]:
# replace values by standard values
# running time ~ 10 min
renewables.replace(valuedict, inplace=True)
energymap.replace(valuedict, inplace=True)

In [None]:
# subtypes = type_of_generation
renewables.generation_subtype = renewables.generation_type

In [None]:
# replace subtypes by types
renewables['generation_type'] = renewables['generation_type'].replace(gentypedict)
energymap['generation_type'] = energymap['generation_type'].replace(gentypedict)
# get an overview
renewables.groupby(['generation_type'])['rated_power'].sum() / 1000

In [None]:
renewables.groupby(['generation_subtype'])['rated_power'].sum() / 1000

In [None]:
renewables.groupby(['source'])['eeg_id'].count()

In [None]:
energymap.groupby(['source'])['eeg_id'].count()

In [None]:
energymap.groupby(['generation_type'])['rated_power'].sum()/1000

# Data Validation 
Correction of duplicates and other suspect data:

(incomplete - work in progress)
* for postcode len() < 5: add zero
* delete #NV value for type of generation
* find duplicates in power_plant_id
* take the last entry of a duplicate 
* find and drop entries with decommissioning time (power plants no longer running)
* delete entries in pv_roof_mounted <= 2014-12-31 from BNetzA_PV
* check notification_reason


## Post Code Corrections
Post codes have to be a string value. As some start with a zero, we remove any float type and check for invalid post codes.
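
As a minimal sketch of this clean-up, the hypothetical helper below (`normalize_postcode` is not part of this notebook) turns integers, floats and strings into post code strings. Note that it pads every short code with zeros, whereas the cells below only pad codes of length four.

```python
def normalize_postcode(value):
    """Return a five-character post code string, or None if unusable.

    Hypothetical helper: accepts ints, floats such as 1234.0 and strings,
    and left-pads short codes with zeros.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text.endswith('.0'):      # float artefact, e.g. '1234.0'
        text = text[:-2]
    if not text or text.lower() == 'nan' or len(text) > 5:
        return None
    return text.zfill(5)


for raw in [1234, 1234.0, '10115', '024xx', float('nan')]:
    print(repr(raw), '->', normalize_postcode(raw))
```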

In [None]:
# function for adding zero
addzero = lambda a,b: b if (a !=4) else '0'+b

In [None]:
# change postcode in BNetzA list
# convert int to string for postcode
renewables['postcode']= renewables['postcode'].apply(str)
renewables['postcode_new']= renewables['postcode'].apply(str)
# count length of string
renewables['postcode_len'] = renewables.postcode.str.len()

# reset index
renewables = renewables.reset_index()
# locate len = 4 and add 0 to postcode
renewables['postcode_new'] = renewables[['postcode_len','postcode']].apply(
    lambda x: addzero(x['postcode_len'], x['postcode']), axis=1)

In [None]:
# change postcode in energymap.info list
# convert int to string for postcode
energymap['postcode']= energymap['postcode'].apply(str)
energymap['postcode_new']= energymap['postcode'].apply(str)
# count length of string
energymap['postcode_len'] = energymap.postcode.str.len()

# reset index
energymap = energymap.reset_index()
# locate len = 4 and add 0 to postcode
energymap['postcode_new'] = energymap[['postcode_len','postcode']].apply(
    lambda x: addzero(x['postcode_len'], x['postcode']), axis=1)

In [None]:
# drop and rename post code columns
renewables.drop(['postcode_len','postcode','index'], axis=1, inplace=True)
renewables.rename(columns={'postcode_new': 'postcode'}, inplace=True)

energymap.drop(['postcode_len','postcode','index'], axis=1, inplace=True)
energymap.rename(columns={'postcode_new': 'postcode'}, inplace=True)


## Drop suspect data 

* Sort the DataFrame by date so that the last entry is kept when duplicates occur
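
The idea of sorting first and then keeping the last duplicate can be sketched on made-up data. The sketch assumes a stable sort (`kind='mergesort'`), so that rows with the same start-up date keep their original order; this is an addition for illustration and not necessarily what the cells below do.

```python
import pandas as pd

# Made-up example: plant 'A' is reported twice with different rated power
plants = pd.DataFrame({
    'eeg_id': ['A', 'B', 'A'],
    'generation_type': ['wind', 'solar', 'wind'],
    'start_up_date': pd.to_datetime(['2014-03-01', '2013-06-15',
                                     '2014-03-01']),
    'rated_power': [2000.0, 10.0, 2300.0],
})

# stable sort by date, then keep the last entry of each duplicate group
ordered = plants.sort_values('start_up_date', kind='mergesort')
deduped = ordered.drop_duplicates(
    subset=['generation_type', 'start_up_date', 'eeg_id'], keep='last')
print(deduped)
```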

In [None]:
# sort rows by start_up_date
renewables = renewables.loc[
    pd.to_datetime(renewables.start_up_date).sort_values().index]
renewables.head()

In [None]:
# drop BNetzA entries with a start-up date up to 2014-12-31
idx_date = renewables[(renewables['start_up_date'] <= '2014-12-31') 
                      & (renewables['source'] == 'BNetzA')].index
renewables.drop(idx_date,inplace=True)  

# drop entries which are not installed yet
# (notification reason 'Genehmigung' = permit only)
idx_no = renewables[(renewables['notification_reason']== 'Genehmigung')].index
renewables.drop(idx_no,inplace=True)

# drop duplicated eeg_ids and keep the last entry
renewables.drop_duplicates(subset=['generation_type','start_up_date',
                                   'eeg_id'], keep='last',inplace=True)

# drop all BNetzA_PV entries <= 2014-12-31 because of duplicated entries
idx_pv = renewables[(renewables['start_up_date'] <= '2014-12-31') 
                    & (renewables['source'] == 'BNetzA_PV')].index
renewables.drop(idx_pv,inplace=True)

# drop all entries where decommissioning is not null
idx_decom = renewables[renewables.decommission_date.notnull()].index
renewables.drop(idx_decom,inplace=True)

# delete #NV value for type of generation
idx_nv = renewables[(renewables['generation_type'] == '#NV')].index
renewables.drop(idx_nv,inplace=True)

print('done')
                                         

In [None]:
renewables.groupby(['generation_type'])['rated_power'].sum() / 1000

In [None]:
renewables.groupby(['generation_subtype'])['rated_power'].sum() / 1000

## Using the Energymap dataframe

* Delete all EnergyMap entries with a start-up date on or after 2014-07-31
* Sort the rows by date

In [None]:
idx_date = energymap[(energymap['start_up_date'] >= '2014-07-31') 
                     & (energymap['source'] == 'EnergyMap')].index
energymap.drop(idx_date,inplace=True)     


In [None]:
# sort rows by start_up_date
energymap = energymap.loc[
    pd.to_datetime(energymap.start_up_date).sort_values().index]
energymap.head()

In [None]:
# drop BNetzA entries with a start-up date on or after 2014-07-31
idx_date = energymap[(energymap['start_up_date'] >= '2014-07-31') 
                     & (energymap['source'] == 'BNetzA')].index
energymap.drop(idx_date,inplace=True)  

# drop entries which are not installed yet
# (notification reason 'Genehmigung' = permit only)
idx_no = energymap[(energymap['notification_reason']== 'Genehmigung')].index
energymap.drop(idx_no,inplace=True)

# drop duplicated eeg_ids and keep the last entry
energymap.drop_duplicates(subset=['generation_type','start_up_date',
                                  'eeg_id'], keep='last',inplace=True)

# drop all BNetzA_PV entries <= 2014-07-31 because of duplicated entries
idx_pv = energymap[(energymap['start_up_date'] <= '2014-07-31') 
                   & (energymap['source'] == 'BNetzA_PV')].index
energymap.drop(idx_pv,inplace=True)

# drop all entries where decommissioning is not null
idx_decom = energymap[energymap.decommission_date.notnull()].index
energymap.drop(idx_decom,inplace=True)

# delete #NV value for type of generation
idx_nv = energymap[(energymap['generation_type'] == '#NV')].index
energymap.drop(idx_nv,inplace=True)

print('done')

In [None]:
energymap.groupby(['generation_type'])['rated_power'].sum() / 1000

In [None]:
renewables.info()

In [None]:
energymap.info()

# Georeferencing
## Get coordinates

The post code provides a first approximation of the power plants' geo-coordinates. The TSOs only report the first three digits of the post code (e.g. 024xx) and provide no city or street names. Therefore, the centroid of the post code region polygon is used as the plant's coordinates.

Post code regions with three digits (for the TSO data) and five digits (for the BNetzA data) are provided by http://www.suche-postleitzahl.org.

*  http://www.suche-postleitzahl.org/downloads?download=plz-gebiete.shp.zip
*  http://www.suche-postleitzahl.org/downloads?download_file=plz-3stellig.shp.zip
*  http://www.suche-postleitzahl.org/downloads#

License:

* http://www.suche-postleitzahl.org/downloads
* Open Database Licence for free use. 
* Source of data: © OpenStreetMap contributors

With this data we compiled a csv file for all existing German post codes. We generated the latitude and longitude coordinates by running a PostgreSQL + PostGIS database. We added all possible TSO names to each post code including overlaps.
* SQL Script (link will follow here)

In [None]:
# Get csv file
postcode = pd.read_csv('input/de_tso_postcode_gps.csv',
                       sep=';',        # CSV field separator, default is ','
                       header=0)

postcode.drop_duplicates(subset='postcode', keep='last', inplace=True)
postcode.head()

In [None]:
# take postcode and longitude/latitude information (selected by position)
postcode = postcode.iloc[:, [0, 3, 4]]

In [None]:
# Clean dataframe
renewables.drop(['gps_lon','gps_lat'], axis=1, inplace=True)
renewables.info()

In [None]:
postcode = postcode.reset_index()
renewables = renewables.reset_index()

### Merge geometry information by using the postcode

In [None]:
ren_geo =pd.merge(renewables, postcode, on=['postcode'],  how='left')
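
A left merge keeps every power plant, including those whose post code has no match in the coordinate table. The sketch below shows this on made-up data (the coordinates are rough illustrative values) and uses the `indicator` and `validate` options of `pd.merge`, which are additions for illustration, to list the unmatched post codes and to make sure each post code appears only once in the lookup table.

```python
import pandas as pd

# Made-up plants; '99999' has no entry in the lookup table
plants = pd.DataFrame({'eeg_id': ['A', 'B', 'C'],
                       'postcode': ['01067', '10115', '99999']})
# Made-up lookup table with illustrative coordinates
postcodes = pd.DataFrame({'postcode': ['01067', '10115'],
                          'gps_lon': [13.72, 13.39],
                          'gps_lat': [51.06, 52.53]})

merged = pd.merge(plants, postcodes, on='postcode', how='left',
                  indicator=True, validate='many_to_one')
# post codes without coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'postcode'].tolist()
print(unmatched)
```

Both frames need the post code as a string of the same form; a post code stored as an integer in one frame and as a string in the other will not match.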

In [None]:
# rename columns
ren_geo.rename(columns={'gps_lon': 'longitude','gps_lat': 'latitude'}, inplace=True)

In [None]:
# how many couldn't be merged?
print('Missing coordinates ',ren_geo.latitude.isnull().sum())
# get an overview of the entries without coordinates
ren_geo[ren_geo.latitude.isnull()].groupby(['generation_type',
                                            'generation_subtype']
                                          )['source'].count()

### Energymap Data Frame


In [None]:
energymap.info()

In [None]:
# GEO REF energymap
postcode = postcode.reset_index()
energymap = energymap.reset_index()

energ_geo =pd.merge(energymap, postcode, on=['postcode'],  how='left')

In [None]:
# function for selecting longitude and latitude data:
# if the EnergyMap value is NULL, take the post code reference
get_lonlat = lambda a, b: b if pd.isnull(a) else a

energ_geo['longitude'] = energ_geo[['gps_lon_x','gps_lon_y']].apply(
    lambda x: get_lonlat(x['gps_lon_x'], x['gps_lon_y']), axis=1)
energ_geo['latitude'] = energ_geo[['gps_lat_x','gps_lat_y']].apply(
    lambda x: get_lonlat(x['gps_lat_x'], x['gps_lat_y']), axis=1)

In [None]:
# How many couldn't be merged?
print('Missing coordinates ',energ_geo.longitude.isnull().sum())
# get an overview of the entries without coordinates
energ_geo[energ_geo.longitude.isnull()].groupby(['source','generation_type'
                                                ])['source'].count()

#### ToDo:
* update postcode csv

In [None]:
energ_geo.info()

In [None]:
# Clean Data Frame
energ_geo.drop(['index_x','index_y','level_0','gps_lon_y',
                'gps_lon_x','gps_lat_y','gps_lat_x'], axis=1, inplace=True)
energ_geo.info()

## Statistics for data validation

In [None]:
energ_geo['rated_power'].describe()
# energ_geo.groupby(['type_of_generation'])['rated_power'].describe()

In [None]:
ren_geo['rated_power'].describe()
#ren_geo.groupby(['type_of_generation'])['rated_power'].describe()

In [None]:
#activ['rated_power'].sum() / 1000
ren_geo[(ren_geo['start_up_date'] >= '2014-12-31')].groupby([
        'generation_type'])['rated_power'].sum() / 1000

In [None]:
ren_geo.query("start_up_date <= '2014-12-31'").groupby(['generation_type']
                                                      )['rated_power'].count()

In [None]:
# Generation type by source and min date
ren_geo.groupby(['generation_type','source'])['start_up_date'].min()

In [None]:
# Generation type by source and max date
ren_geo.groupby(['generation_type','source'])['start_up_date'].max()

In [None]:
# growth rate 2014
rate14 = (ren_geo['start_up_date'] > '2014-01-01') & (ren_geo['start_up_date']
                                                      <= '2014-12-31')
ren_geo.loc[rate14].groupby(['generation_subtype'])['rated_power'].sum() / 1000

In [None]:
# Overview of early dates
ren_geo.query("start_up_date < '1980-01-01' &  generation_type!= 'hydro'")

In [None]:
energ_geo.query("start_up_date < '1980-01-01'"\
                "& generation_type != 'hydro' ").head()

##  Plot overview of the first result

In [None]:
energ_geo['rated_power'].plot()

## Take a look into the statistics of BMWi and BNetzA

In [None]:
# BNetzA
url_eeg2013 ='http://www.bundesnetzagentur.de/SharedDocs/Downloads/DE/'\
             'Sachgebiete/Energie/Unternehmen_Institutionen/ErneuerbareEnergien/' \
              'ZahlenDatenInformationen/EEGinZahlen_2013.xls;?__blob=publicationFile&v=3'
eeg2013 = pd.ExcelFile(url_eeg2013)
eeg2013.sheet_names

In [None]:
sta2013 = eeg2013.parse('2.1 Überblick Deutschland', skiprows=4, skipfooter=9)
sta2013

In [None]:
url_reg2014  ='http://www.erneuerbare-energien.de/EE/Redaktion/DE/Downloads/zeitreihen-zur-entwicklung-der' \
             '-erneuerbaren-energien-in-deutschland-1990-2014-excel.xlsx;?__blob=publicationFile&v=5'

reg2014 = pd.ExcelFile(url_reg2014)   
sta2014 = reg2014.parse('4', skiprows=7, skipfooter=7)
    

In [None]:
sta2014.T

# Create final dataframe

In [None]:
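df_columns = ['start_up_date','rated_power','generation_type',
              'generation_subtype','thermal_capacity','city', 'postcode',
              'tso','longitude','latitude','eeg_id',
              'voltage_level','source']

In [None]:
# select the final columns; 'generation_subtype' is published as 'subtype'
transp_data = ren_geo.reindex(columns=df_columns).rename(
    columns={'generation_subtype': 'subtype'})
energymap_data = energ_geo.reindex(columns=df_columns).rename(
    columns={'generation_subtype': 'subtype'})

# reset the index
transp_data = transp_data.reset_index(drop=True)
energymap_data = energymap_data.reset_index(drop=True)
print('done')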

In [None]:
transp_data.head()

# Geoplot testing

* use energy map to get gps by eeg_id
* transform UTM to GPS by https://pypi.python.org/pypi/utm
* use plz.sql to find the rest


In [None]:
def draw_germany():
    """Returns a basemap centered on Germany"""
    plt.figure(figsize=(14, 10))
    return Basemap(projection='lcc', resolution='l',
                   llcrnrlon= 5.87, urcrnrlon=16.04,
                   llcrnrlat= 47, urcrnrlat= 55.1,
                   lat_1=47.26, lat_2=56, lon_0=6,
                   area_thresh=10000)


In [None]:
m = draw_germany()
# Draw map background
m.fillcontinents(color='white', lake_color='#eeeeee')
m.drawstates(color='lightgray')
m.drawcoastlines(color='lightgray')
m.drawcountries(color='lightgray')
m.drawmapboundary(fill_color='#eeeeee')

# get coordinates
lon = np.array(energymap_data.longitude)
lat = np.array(energymap_data.latitude)

# Make plot settings
style = dict(s=5, marker='o', alpha=0.5, zorder=2)
m.scatter(lon,lat, latlon=True,
          label='Renewable Power plants based on BNetzA and EnergyMap',
          color='#00592D', **style)

plt.legend(loc='lower left', frameon=False)

In [None]:
m = draw_germany()
# Draw map background
m.fillcontinents(color='white', lake_color='#eeeeee')
m.drawstates(color='lightgray')
m.drawcoastlines(color='lightgray')
m.drawcountries(color='lightgray')
m.drawmapboundary(fill_color='#eeeeee')

# get coordinates
lon = np.array(transp_data.longitude)
lat = np.array(transp_data.latitude)

# Make plot settings
style = dict(s=5, marker='o', alpha=0.5, zorder=2)
m.scatter(lon,lat, latlon=True,
          label='Renewable Power plants based on BNetzA and TSO',
          color='#00592D', **style)

plt.legend(loc='lower left', frameon=False)

# Documenting the meta data

The data package's metadata is provided in JSON format, as proposed by the Open Knowledge Foundation. Please see the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.

In order to keep the notebook more readable, we formulate the metadata in the human-readable YAML format using a multi-line string and then parse the string into a Python dictionary and save that as a JSON file.
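
The round trip from a YAML string to a JSON string can be sketched as follows. The metadata below is a shortened, made-up example and not the actual data package description; `yaml.safe_load` (from PyYAML) and `json.dumps` are the only calls involved.

```python
import json

import yaml  # PyYAML

# Shortened, made-up metadata for illustration
metadata_example = """
name: example-package
resources:
    - path: data.csv
      format: csv
      schema:
          fields:
            - name: rated_power
              type: number
"""

# parse the YAML string into a Python dictionary
package = yaml.safe_load(metadata_example)
# serialise the dictionary as JSON (ensure_ascii=False keeps umlauts readable)
datapackage_json = json.dumps(package, indent=2, ensure_ascii=False)
print(datapackage_json)
```

All keys that belong to one resource need to sit under the same list item (one leading `-`); a separate `-` in front of each key would create several single-key resources instead.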

In [None]:
# Here we define the meta data of the resulting data package.

metadata = """
name: opsd-renewable-master-register-germany
title: List of renewable power plants DE
description: This dataset is for OPSD testing purposes and contains the master data of renewable power plants in Germany.
version: "2016-02-10"
keywords: [master data register,power plants,renewables,germany]
resources:
    - path: renewable_power_plants_germany.csv
      format: csv
      mediatype: text/csv
      schema:  # Schema according to: http://dataprotocols.org/json-table-schema/        
          fields:
            - name: start_up_date
              description: Date of start up
              type: datetime
              format: YYYY-MM-DDThh:mm:ssZ  # This MUST be in ISO 8601 format of YYYY-MM-DDThh:mm:ssZ in UTC time according to http://dataprotocols.org/json-table-schema/
            - name: rated_power
              description: Rated power in kW
              type: number
              format: float
            - name: generation_type
              description: Type of generation / energy source
              type: text
            - name: subtype
              description: Subtype of generation
              type: text
            - name: thermal_capacity
              description: Installed thermal capacity in kW
              type: number
            - name: city
              description: City name of location
              type: text
            - name: tso
              description: Name of TSO  
              type: text    
            - name: longitude
              description: GPS Longitude coordinates
              type: number             
            - name: latitude
              description: GPS Latitude coordinates 
              type: number
            - name: eeg_id
              description: EEG remuneration number
              type: text
            - name: voltage_level
              description: Voltage level of grid connection
              type: text             
            - name: source
              description: Source name of entry
              type: text
licenses:
    - url: http://example.com/license/url/here
      name: License Name Here
      version: 1.0
      id: license-id-from-open
sources:
    - name: EnergyMap.info
      web: http://www.energymap.info/download.html
    - name: Bundesnetzagentur - Veröffentlichung von im Anlagenregister registrierten Daten
      web: http://www.bundesnetzagentur.de/cln_1422/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Anlagenregister/Anlagenregister_Veroeffentlichung/Anlagenregister_Veroeffentlichungen_node.html
    - name: Bundesnetzagentur - Datenmeldungen und EEG-Vergütungssätze für Photovoltaikanlagen
      web: http://www.bundesnetzagentur.de/cln_1431/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Photovoltaik/DatenMeldgn_EEG-VergSaetze/DatenMeldgn_EEG-VergSaetze_node.html    
    - name: Netztransparenz.de Informationsplattform der deutschen Übertragungsnetzbetreiber
      web: https://www.netztransparenz.de/de/Anlagenstammdaten.htm
maintainers:
    - name: OPSD Project
      email: wolf-dieter.bunke@uni-flensburg.de
      web: http://open-power-system-data.org/
views: null
    # You can put hints here about which kinds of graphs or maps make sense for displaying your data.
    # The Data Package Viewer at http://data.okfn.org/tools/view automatically displays visualizations of your data.
    # See http://data.okfn.org/doc/data-package#views for more details.
# extend your datapackage.json with attributes that are not
# part of the data package spec
# you can add your own attributes to a datapackage.json, too
openpowersystemdata-enable-listing: True  # This is just an example we don't actually make use of yet.
opsd-jupyter-notebook-url: https://github.com/Open-Power-System-Data/datapackage_renewable_power_plants/blob/master/renewable-energy-power-plants_DE.ipynb
"""

# safe_load parses plain YAML without constructing arbitrary Python objects
metadata = yaml.safe_load(metadata)

datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
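
Because YAML is sensitive to indentation and punctuation, a quick sanity check on the parsed dictionary can catch slips before the JSON is written. The following is a minimal sketch and not part of the original workflow: the helper name `check_metadata` and the choice of checks (a top-level `name` and `resources`, a `path` for every resource, no string value ending in a comma) are assumptions made for illustration, not a full validation against the Data Package specification.

```python
def check_metadata(meta):
    """Return a list of problems found in a parsed data package descriptor.

    Hypothetical helper: the checks below are illustrative assumptions,
    not a complete implementation of the Data Package specification.
    """
    problems = []

    # Top-level keys this notebook's descriptor is expected to carry
    for key in ('name', 'resources'):
        if key not in meta:
            problems.append('missing top-level key: ' + key)

    # Every resource should be a mapping that names its data file
    for i, resource in enumerate(meta.get('resources') or []):
        if not isinstance(resource, dict) or 'path' not in resource:
            problems.append('resource %d has no path' % i)

    # YAML keeps a trailing comma as part of a plain string value
    def walk(value, where):
        if isinstance(value, dict):
            for k, v in value.items():
                walk(v, where + '.' + str(k))
        elif isinstance(value, list):
            for i, v in enumerate(value):
                walk(v, '%s[%d]' % (where, i))
        elif isinstance(value, str) and value.rstrip().endswith(','):
            problems.append('trailing comma in ' + where)

    walk(meta, 'metadata')
    return problems


# Small hand-written example with two deliberate mistakes
example = {
    'name': 'demo',
    'resources': [{'path': 'data.csv'}, {'format': 'csv'}],
    'maintainers': [{'name': 'OPSD Project,'}],
}
for problem in check_metadata(example):
    print(problem)
```

In the notebook, the same helper could be called as `check_metadata(metadata)` right after the YAML string has been parsed; an empty list means none of these particular checks failed.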

# Write Data Package to disk

Following the suggestions of the Frictionless Data project (see http://data.okfn.org/), the data package is written to a local directory on disk.

The data package consists of:
 * **datapackage.json** A file containing the metadata of the data package
 * ***.csv** One or multiple CSV data files
 * **README.md** A readme (which is a copy of this Jupyter Notebook converted to markdown format)

Following this file structure makes the package usable with a number of tools (http://data.okfn.org/tools) designed specifically for data packages.

In [None]:
path_package = 'output/datapackage_renewables'

os.makedirs(path_package, exist_ok=True)
    
energymap_data.to_csv(os.path.join(path_package,
                            'renewable_power_plants_germany.csv'))

with open(os.path.join(path_package, 'datapackage.json'), 'w') as f:
    f.write(datapackage_json)
    
##
# Make sure the filename matches the filename of the notebook itself
##
nb_filename = 'renewable-energy-power-plants_DE.ipynb'

# Save a copy of the notebook to markdown, to serve as the package README file
# check_call raises an error if the conversion fails
subprocess.check_call(['jupyter', 'nbconvert', '--to', 'markdown', nb_filename])
path_readme = os.path.join(path_package, 'README.md')
# os.replace overwrites an existing README.md, so no separate removal is needed
os.replace(nb_filename.replace('.ipynb', '.md'), path_readme)
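
Once the package has been written, it can be useful to confirm that every file named in `datapackage.json` actually exists next to it. The sketch below is one way to do that with the standard library only. The function name `missing_resources` is hypothetical, and it assumes that each resource is described by a relative `path` entry, as in the metadata defined above.

```python
import json
import os


def missing_resources(path_package):
    """Return the resource paths listed in datapackage.json that are not on disk.

    Hypothetical helper; assumes resources are given as relative 'path' entries.
    A resource without a 'path' entry is reported as None.
    """
    with open(os.path.join(path_package, 'datapackage.json')) as f:
        descriptor = json.load(f)

    missing = []
    for resource in descriptor.get('resources', []):
        path = resource.get('path')
        if path is None or not os.path.isfile(os.path.join(path_package, path)):
            missing.append(path)
    return missing
```

For this notebook, the call would be `missing_resources('output/datapackage_renewables')`; an empty list indicates that all listed data files were found.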