# How to obtain weather data from MERRA-2 (Part 3): Processing raw data and compiling the data package

## About this Notebook
This Jupyter Notebook is part of the [Open Power System Data Project](http://www.open-power-system-data.org) and is written in Python 3. 

This is **Part 3** of the notebook. It processes the downloaded raw data and compiles the data package.

**_Before running this script, make sure you have run [Part 2]()!_**

---

### Other notebooks
**[Part 1]()**: Introduction

**[Part 2]()**: Download raw data

### License

This notebook is published under [The MIT License](https://opensource.org/licenses/mit-license.php):

Copyright (c) 2016 [copyright holders]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

### Table of contents
- [About this Notebook](#About-this-Notebook)
    - [Other notebooks](#Other-notebooks)
    - [License](#License)
    - [Table of contents](#Table-of-contents)
    - [Script Setup](#Script-Setup)
    - [Import downloaded files from Part 2](#Import-downloaded-files-from-Part-2)
- [Cleaning raw data](#Cleaning-raw-data)
    - [calculating height profiles](#calculating-height-profiles)
    - [calculating wind data](#calculating-wind-data)
    - [calculating solar data](#calculating-solar-data)
    - [temperature](#temperature)
    - [roughness length](#roughness-length)
- [Compiling data package](#Compiling-data-package)
    - [netCDF](#netCDF)
    - [CSV](#CSV)
    - [Metadata (JSON)](#Metadata-(JSON))

***

### Script Setup

In [None]:
# Import all Python libraries needed in this script
import json
import logging
import os

import yaml  # PyYAML, used to parse the metadata further below

# Set up a log
logger = logging.getLogger('notebook')
logger.setLevel('INFO')
logging.basicConfig()  # adds a handler to the root logger only if it has none yet
nb_root_logger = logging.getLogger()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                              datefmt='%d %b %Y %H:%M:%S')
nb_root_logger.handlers[0].setFormatter(formatter)

# Create input and output folders if they don't exist
# os.makedirs('input/original_data', exist_ok=True)
# os.makedirs('output', exist_ok=True)
# os.makedirs('output/datapackage_renewables', exist_ok=True)

### Import downloaded files from [Part 2]()

## Cleaning raw data

### calculating height profiles
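A common way to build a wind height profile is the logarithmic wind profile, v(h) = v(h_ref) · ln(h / z0) / ln(h_ref / z0), where z0 is the roughness length. The sketch below is only an illustration of that technique, not necessarily the method this notebook will use: it assumes neutral atmospheric stability, ignores any displacement height, and the helper name `extrapolate_wind_speed` and the example numbers are hypothetical.

```python
import math


def extrapolate_wind_speed(v_ref, h_ref, h, z0):
    """Scale wind speed v_ref (m/s) measured at height h_ref (m) to height h (m).

    Hypothetical helper using the logarithmic wind profile
    v(h) = v_ref * ln(h / z0) / ln(h_ref / z0), with roughness length z0 (m).
    Assumes neutral stability and heights above the roughness length.
    """
    if not (h > z0 > 0 and h_ref > z0):
        raise ValueError("heights must exceed the roughness length z0 > 0")
    return v_ref * math.log(h / z0) / math.log(h_ref / z0)


# Example: 6 m/s at 50 m over open terrain (z0 = 0.03 m, an assumed value), scaled to 100 m
v_hub = extrapolate_wind_speed(6.0, 50.0, 100.0, 0.03)  # ≈ 6.56 m/s
```

Because the profile is a ratio of logarithms, the result depends only on the heights and z0, not on the units of the wind speed.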

### calculating wind data
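MERRA-2 reports wind as eastward (U) and northward (V) components, for example the `U50M`/`V50M` variables at 50 m in its single-level diagnostics. A minimal sketch of turning such components into a horizontal wind speed and a meteorological wind direction (the direction the wind blows *from*, clockwise from north) could look like this; the function names and sample values are hypothetical and the inputs are assumed to be plain sequences in m/s.

```python
import math


def wind_speed(u, v):
    """Horizontal wind speed (m/s) from eastward u and northward v components."""
    return [math.hypot(ui, vi) for ui, vi in zip(u, v)]


def wind_direction(u, v):
    """Meteorological wind direction in degrees (0 = from north, 90 = from east).

    Undefined for calm conditions (u = v = 0); this sketch then returns 270.0.
    """
    return [(270.0 - math.degrees(math.atan2(vi, ui))) % 360.0 for ui, vi in zip(u, v)]


u50m = [3.0, -4.0, 0.0]  # hypothetical eastward components in m/s
v50m = [4.0, 0.0, -2.0]  # hypothetical northward components in m/s
print(wind_speed(u50m, v50m))  # [5.0, 4.0, 2.0]
print(wind_direction(u50m, v50m))
```

With NumPy arrays the same formulas vectorize directly via `numpy.hypot` and `numpy.arctan2`.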

### calculating solar data
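One quantity often derived from reanalysis radiation data is the clearness index: the ratio of global horizontal irradiance at the surface to the irradiance at the top of the atmosphere. In MERRA-2 these roughly correspond to variables such as `SWGDN` (surface incoming shortwave flux) and `SWTDN` (top-of-atmosphere incoming shortwave flux). Treat that mapping, the hypothetical helper below, and its choice to clip values to [0, 1] as assumptions rather than this notebook's method.

```python
def clearness_index(ghi, toa):
    """Ratio of surface (GHI) to top-of-atmosphere irradiance, both in W/m².

    Returns None where the TOA irradiance is zero (night), since the ratio is
    undefined there. Values are clipped to [0, 1] to absorb small inconsistencies.
    """
    result = []
    for g, t in zip(ghi, toa):
        if t <= 0:
            result.append(None)
        else:
            result.append(min(max(g / t, 0.0), 1.0))
    return result


swgdn = [0.0, 250.0, 600.0]   # hypothetical hourly surface irradiance, W/m²
swtdn = [0.0, 500.0, 1000.0]  # hypothetical hourly TOA irradiance, W/m²
print(clearness_index(swgdn, swtdn))  # [None, 0.5, 0.6]
```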

### temperature
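MERRA-2 stores air temperature, for example the 2 m temperature `T2M`, in kelvin, while energy system models frequently expect degrees Celsius. A minimal conversion sketch, with a hypothetical helper name and sample values:

```python
def kelvin_to_celsius(temps_k):
    """Convert temperatures from kelvin (e.g. MERRA-2's T2M) to degrees Celsius."""
    return [t - 273.15 for t in temps_k]


t2m = [273.15, 293.15, 263.15]  # hypothetical 2 m air temperatures in K
print(kelvin_to_celsius(t2m))
```

Floating-point subtraction can leave tiny residues (e.g. 19.999999999999... instead of 20.0), so round only when writing output, not during processing.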

### roughness length
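MERRA-2 provides a surface roughness variable (`Z0M` in the surface flux diagnostics) that can be used directly. Alternatively, when wind speeds at two heights are available, the roughness length can be backed out of the logarithmic wind profile: from v1 / v2 = ln(h1 / z0) / ln(h2 / z0) it follows that ln z0 = (v2 · ln h1 − v1 · ln h2) / (v2 − v1). The sketch below implements that rearrangement under the assumption of neutral stability; the function name and example values are hypothetical.

```python
import math


def roughness_length(v1, h1, v2, h2):
    """Estimate roughness length z0 (m) from wind speeds v1, v2 (m/s) at heights h1 < h2 (m).

    Rearranges the logarithmic profile v1 / v2 = ln(h1 / z0) / ln(h2 / z0) to
    ln z0 = (v2 * ln h1 - v1 * ln h2) / (v2 - v1). Requires v2 > v1 > 0 (wind
    speed increasing with height); otherwise the profile yields no finite z0.
    """
    if not (0 < h1 < h2 and 0 < v1 < v2):
        raise ValueError("expected 0 < h1 < h2 and 0 < v1 < v2")
    log_z0 = (v2 * math.log(h1) - v1 * math.log(h2)) / (v2 - v1)
    return math.exp(log_z0)


# Example: speeds generated from a log profile with z0 = 0.1 m should recover z0 ≈ 0.1 m
z0 = roughness_length(math.log(10 / 0.1), 10.0, math.log(50 / 0.1), 50.0)
```

Since only the ratio v1 / v2 enters, any common scaling of the two wind speeds (such as the friction velocity term of the log law) cancels out.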

## Compiling data package

### netCDF

### CSV
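The metadata below declares timestamps in the format `YYYY-MM-DDThh:mm:ssZ`, i.e. ISO 8601 in UTC. A minimal sketch of writing an hourly time series in that format with Python's standard `csv` module follows; the helper name and the column name `v_50m` are hypothetical, and `start` is assumed to be a timezone-aware `datetime`.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def write_timeseries_csv(fileobj, start, values, column):
    """Write hourly values to CSV with UTC timestamps as YYYY-MM-DDThh:mm:ssZ."""
    writer = csv.writer(fileobj, lineterminator='\n')
    writer.writerow(['timestamp', column])
    for i, value in enumerate(values):
        # Convert to UTC so that the trailing "Z" is correct
        ts = (start + timedelta(hours=i)).astimezone(timezone.utc)
        writer.writerow([ts.strftime('%Y-%m-%dT%H:%M:%SZ'), value])


buffer = io.StringIO()
write_timeseries_csv(buffer, datetime(2016, 1, 1, tzinfo=timezone.utc), [5.0, 4.2], 'v_50m')
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=''`, as the `csv` module documentation recommends.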

### Metadata (JSON)
The metadata of the data package is created in the JSON format proposed by the Open Knowledge Foundation (OKFN). Please see the [Frictionless Data project by OKFN](http://data.okfn.org/) and the [Data Package specifications](http://dataprotocols.org/data-packages/) for more details.

To keep the notebook readable, the metadata is written in the human-readable YAML format as a multi-line string, which is then parsed into a Python dictionary and saved as a JSON file.

In [1]:
# TODO: Adjust individual fields of the JSON file (for now they are simply
# copied over from the renewable power plants (RE) script)!

# The meta data follows the specification at:
# http://dataprotocols.org/data-packages/
    
metadata = """
name: opsd-weather-data
title: Script to obtain weather data
description: >-
    This data package aims to provide an introduction to the MERRA-2 weather
    dataset and a documented method to download, extract, organize and export
    weather data for the use in energy system models.  
version: "2016-07-12"
keywords: [weather data register]
geographical-scope: World
resources:
    - path: renewable_power_plants_germany.csv
      format: csv
      mediatype: text/csv
      schema:         
          fields:
            - name: start_up_date
              description: Date of start up/installation date
              type: datetime
              format: YYYY-MM-DDThh:mm:ssZ
            - name: electrical_capacity
              description: Installed electrical capacity in kW
              type: number
              format: float
              unit: kW
            - name: generation_type
              description: Type of generation / energy source
              type: text
            - name: generation_subtype
              description: Subtype of generation / energy source
              type: text
            - name: thermal_capacity
              description: Installed thermal capacity in kW
              type: number
              format: float
              unit: kW
            - name: city
              description: Name of location
              type: text
            - name: tso
              description: Name of TSO  
              type: text    
            - name: lon
              description: Longitude coordinates
              type: geopoint
              format: lon
            - name: lat
              description: Latitude coordinates 
              type: geopoint
              format: lat
            - name: eeg_id
              description: EEG (German feed-in tariff law) remuneration number
              type: text
            - name: power_plant_id
              description: Power plant identification number by BNetzA
              type: text
            - name: voltage_level
              description: Voltage level of grid connection
              type: text 
            - name: decommission_date
              description: Date of decommission
              type: datetime
              format: YYYY-MM-DDThh:mm:ssZ  
            - name: comment
              description: Validation comments
              type: text 
            - name: source
              description: Source of database entry
              type: text
              source: TransnetBW, TenneT, Amprion, 50Hertz, BNetzA_PV, BNetzA
    - path: renewable_capacity_germany_timeseries.csv
      format: csv
      mediatype: text/csv
      schema:         
          fields:
            - name: timestamp
              description: Start time of the day
              type: datetime
              format: YYYY-MM-DDThh:mm:ssZ
            - name: capacity_biomass_de
              description: Cumulated biomass electrical capacity
              type: number
            - name: capacity_wind_de
              description: Cumulated wind capacity
              type: number                 
            - name: capacity_solar_de
              description: Cumulated solar capacity
              type: number                
            - name: capacity_gas_de
              description: Cumulated gas electrical capacity
              type: number  
            - name: capacity_geothermal_de
              description: Cumulated geothermal electrical capacity
              type: number 
            - name: capacity_hydro_de
              description: Cumulated hydro capacity
              type: number  
    - path: renewable_power_plants_germany.xlsx
      format: xlsx
      mediatype: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
      schema:         
          fields:
            - name: start_up_date
              description: Date of start up/installation date
              type: datetime
              format: YYYY-MM-DDThh:mm:ssZ
            - name: electrical_capacity
              description: Installed electrical capacity in kW
              type: number
              format: float
              unit: kW
            - name: generation_type
              description: Type of generation / energy source
              type: text
            - name: generation_subtype
              description: Subtype of generation / energy source
              type: text
            - name: thermal_capacity
              description: Installed thermal capacity in kW
              type: number
              format: float
              unit: kW
            - name: city
              description: Name of location
              type: text
            - name: tso
              description: Name of TSO  
              type: text    
            - name: lon
              description: Longitude coordinates
              type: geopoint
              format: lon
            - name: lat
              description: Latitude coordinates 
              type: geopoint
              format: lat
            - name: eeg_id
              description: EEG (German feed-in tariff law) remuneration number
              type: text
            - name: power_plant_id
              description: Power plant identification number by BNetzA
              type: text
            - name: voltage_level
              description: Voltage level of grid connection
              type: text 
            - name: decommission_date
              description: Date of decommission
              type: datetime
              format: YYYY-MM-DDThh:mm:ssZ  
            - name: comment
              description: Validation comments
              type: text 
            - name: source
              description: Source of database entry
              type: text
              source: TransnetBW, TenneT, Amprion, 50Hertz, BNetzA_PV, BNetzA
licenses:
    - url: http://example.com/license/url/here
      name: License Name Here
      version: 1.0
      id: license-id-from-open
sources:
    - name: Bundesnetzagentur - register of renewable power plants (excl. PV)
      web: http://www.bundesnetzagentur.de/cln_1422/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Anlagenregister/Anlagenregister_Veroeffentlichung/Anlagenregister_Veroeffentlichungen_node.html
      source: BNetzA
    - name: Bundesnetzagentur - register of PV power plants
      web: http://www.bundesnetzagentur.de/cln_1431/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/ErneuerbareEnergien/Photovoltaik/DatenMeldgn_EEG-VergSaetze/DatenMeldgn_EEG-VergSaetze_node.html    
      source: BNetzA_PV
    - name: Netztransparenz.de - information platform of German TSOs (register of renewable power plants in their control area)
      web: https://www.netztransparenz.de/de/Anlagenstammdaten.htm
      source: TransnetBW, TenneT, Amprion, 50Hertz
maintainers:
    - name: Wolf-Dieter Bunke
      email: wolf-dieter.bunke@uni-flensburg.de
      web: http://open-power-system-data.org/
views: True
openpowersystemdata-enable-listing: True
opsd-jupyter-notebook-url: https://github.com/Open-Power-System-Data/datapackage_renewable_power_plants/blob/master/main.ipynb
opsd-changes-to-last-version: Update of output data 
"""

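# safe_load: yaml.load() without an explicit Loader fails in PyYAML >= 6
metadata = yaml.safe_load(metadata)

datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))

# Output folder of the data package ('output' is an assumption; adjust as needed)
path_package = 'output'
os.makedirs(path_package, exist_ok=True)

# Write the metadata to datapackage.json
with open(os.path.join(path_package, 'datapackage.json'), 'w') as f:
    f.write(datapackage_json)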
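
Because parts of the metadata were copied from another script, a quick consistency check of the parsed dictionary before writing `datapackage.json` can catch leftovers. The sketch below checks only a few basic requirements of the Data Package specification as assumed here (a non-empty `resources` list whose entries each have a `path` or `data` entry, and unique field names within each schema). It is not a full validator, and the function name and example descriptor are hypothetical.

```python
def check_datapackage(descriptor):
    """Return a list of problems found in a parsed datapackage descriptor (a dict)."""
    resources = descriptor.get('resources')
    if not isinstance(resources, list) or not resources:
        return ['"resources" must be a non-empty list']
    problems = []
    for i, resource in enumerate(resources):
        if 'path' not in resource and 'data' not in resource:
            problems.append(f'resource {i} has neither "path" nor "data"')
        # Field names should be unique within one resource schema
        fields = resource.get('schema', {}).get('fields', [])
        names = [field.get('name') for field in fields]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            problems.append(f'resource {i} repeats field names: {duplicates}')
    return problems


example = {
    'name': 'opsd-weather-data',
    'resources': [
        {'path': 'weather_data.csv',  # hypothetical file name
         'schema': {'fields': [{'name': 'timestamp'}, {'name': 'timestamp'}]}},
        {'format': 'csv'},  # missing "path"
    ],
}
print(check_datapackage(example))
```

In the notebook, it could be called on `metadata` right after the YAML string has been parsed into a dictionary; an empty list means none of these basic checks failed.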