# How to obtain weather data from MERRA-2 (Part 2): Download raw data

## About this Notebook
This Jupyter Notebook is part of the [Open Power System Data Project](http://www.open-power-system-data.org) and is written in Python 3. 

This is **Part 2** of the notebook. It downloads data from the MERRA-2 weather dataset.

---

### Other notebooks
**Part 1**: Introduction

**Part 3**: Processing raw data and compiling the data package

### License

This notebook is published under [The MIT License](https://opensource.org/licenses/mit-license.php):

Copyright (c) 2016 [copyright holders]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

### Table of contents
 <p><div class="lev2"><a href="#About-this-Notebook"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>About this Notebook</a></div><div class="lev3"><a href="#Other-notebooks"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Other notebooks</a></div><div class="lev3"><a href="#License"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>License</a></div><div class="lev3"><a href="#Table-of-contents"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Table of contents</a></div><div class="lev1"><a href="#Script-Setup"><span class="toc-item-num">2&nbsp;&nbsp;</span>Script Setup</a></div><div class="lev1"><a href="#Download-raw-data"><span class="toc-item-num">3&nbsp;&nbsp;</span>Download raw data</a></div><div class="lev2"><a href="#Input"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Input</a></div><div class="lev3"><a href="#Parameters-choices"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Parameters choices</a></div><div class="lev3"><a href="#Timeframe"><span class="toc-item-num">3.1.2&nbsp;&nbsp;</span>Timeframe</a></div><div class="lev3"><a href="#Geography-coordinates"><span class="toc-item-num">3.1.3&nbsp;&nbsp;</span>Geography coordinates</a></div><div class="lev2"><a href="#Subsetting-data"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Subsetting data</a></div><div class="lev2"><a href="#Downloading-data"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Downloading data</a></div><div class="lev1"><a href="#Setting-up-dataframe(s)"><span class="toc-item-num">4&nbsp;&nbsp;</span>Setting up dataframe(s)</a></div><div class="lev2"><a href="#Concatenating/combining-individual-files"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Concatenating/combining individual files</a></div><div class="lev2"><a href="#First-look-at-the-final-data-frame-structure-and-format"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>First look at the final data frame structure and format</a></div><div class="lev2"><a href="#Saving-dataframe"><span 
class="toc-item-num">4.3&nbsp;&nbsp;</span>Saving dataframe</a></div>
 
***

# Script Setup

In [1]:
# importing all necessary Python libraries for this Script
import pandas as pd
import xarray as xr
import requests
import logging
import os
import datetime
from calendar import monthrange

# Set up a log
log = logging.getLogger('notebook')
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler())
nb_root_logger = logging.getLogger()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s'\
                              '- %(message)s',datefmt='%d %b %Y %H:%M:%S')
nb_root_logger.handlers[0].setFormatter(formatter)

# create download folder
os.makedirs('download', exist_ok=True)
# os.makedirs('output', exist_ok=True)
# os.makedirs('output/datapackage_renewables', exist_ok=True)

---
# Download raw data

## Input
### Parameters choices
Definition of the input parameters used to build the OPeNDAP URL: which parameters should be included in the weather data package?

**general parameters**
- tavg1_2d_slv_Nx = M2T1NXSLV:
  - H850: Height at 850 hPa
  - H500: Height at 500 hPa
  - H250: Height at 250 hPa
  - DISPH: Displacement height
  - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXSLV.5.12.4/
  - via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/contents.html
  
**Wind Speed** 
- tavg1_2d_slv_Nx = M2T1NXSLV
  - U2M: Eastward wind at 2 m above displacement height
  - U10M: Eastward wind at 10 m above displacement height
  - U50M: Eastward wind at 50 m above surface
  - U850: Eastward wind at 850 hPa
  - U500: Eastward wind at 500 hPa
  - U250: Eastward wind at 250 hPa
  - V2M: Northward wind at 2 m above displacement height
  - V10M: Northward wind at 10 m above displacement height
  - V50M: Northward wind at 50 m above surface
  - V850: Northward wind at 850 hPa
  - V500: Northward wind at 500 hPa
  - V250: Northward wind at 250 hPa
  - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXSLV.5.12.4/
  - via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/contents.html
  
- tavg1_2d_flx_Nx = M2T1NXFLX:
  - Z0M: Roughness length, momentum
  - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXFLX.5.12.4/
  - via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXFLX.5.12.4/contents.html

Later: **Temperature** (tavg1_2d_slv_Nx = M2T1NXSLV)
- TS: Surface skin temperature
- T2M: Temperature at 2 m above the displacement height
- T10M: Temperature at 10 m above the displacement height
- T850: Temperature at 850 hPa
- T500: Temperature at 500 hPa
- T250: Temperature at 250 hPa
- ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXSLV.5.12.4/
- via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/contents.html

Later: **Solar Radiation** (tavg1_2d_rad_Nx = M2T1NXRAD)
- SWGDN: Surface incident shortwave flux (incident = incoming radiation)
- SWGDNCLR: Surface incident shortwave flux assuming clear sky
- SWGNT: Surface net downward shortwave flux
- SWGNTCLR: Surface net downward shortwave flux assuming clear sky
- SWGNTCLN: Surface net downward shortwave flux assuming clean sky
- SWGNTCLRCLN: Surface net downward shortwave flux assuming clear clean sky
- SWTDN: Top-of-atmosphere incoming shortwave flux
- _possibly more_
- ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXRAD.5.12.4/
- via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXRAD.5.12.4/contents.html

### Parameter selection 
These are the possible parameters:
* wind
* solar radiation
* temperature

If you want to select more than one parameter, separate them with commas. For example: wind, solar radiation, temperature

In [2]:
# Getting user input
possible_params = ['wind', 'solar radiation', 'temperature']

def test_user_input(para):
    '''Return True if between one and three parameters are given and all of them are valid.'''
    if len(para) < 1 or len(para) > 3:
        return False
    return all(p in possible_params for p in para)


# params = input('Please provide at least one parameter: ')
# params = params.split(',')
# remove whitespace around the provided parameters
# params = [p.strip() for p in params]
    
# if not test_user_input(params):
#    raise Exception('Something is wrong with the provided parameters: '+str(params))
# params

# for testing
params = ['wind']

In [3]:
# "translation" of input to list of needed dataset parameters (see above)
# download general parameters always
# download other parameters/datasets only when needed (see list above)

BASE_URL = 'http://goldsmr4.sci.gsfc.nasa.gov:80/opendap/MERRA2/M2T1NXSLV.5.12.4/'
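
The "translation" step described in the comments above could look roughly like the following sketch. The collection and variable names are copied from the parameter overview in this notebook, but only a subset of the variables is listed. The names `GENERAL`, `DATASETS` and `variables_for` are hypothetical and introduced only for illustration.

```python
# Hypothetical lookup tables (illustration only).
# General parameters are always downloaded, as stated above.
GENERAL = {'M2T1NXSLV': ['H850', 'H500', 'H250', 'DISPH']}

# User choice -> {collection: [variables]}; a subset of the lists above.
DATASETS = {
    'wind': {
        'M2T1NXSLV': ['U2M', 'U10M', 'U50M', 'V2M', 'V10M', 'V50M'],
        'M2T1NXFLX': ['Z0M'],
    },
    'temperature': {'M2T1NXSLV': ['TS', 'T2M', 'T10M']},
    'solar radiation': {'M2T1NXRAD': ['SWGDN', 'SWTDN']},
}


def variables_for(params):
    """Return {collection: [variables]} for the chosen parameters."""
    # Start with a copy of the general parameters.
    selection = {name: list(names) for name, names in GENERAL.items()}
    for param in params:
        for collection, variables in DATASETS[param].items():
            target = selection.setdefault(collection, [])
            for var in variables:
                if var not in target:  # avoid duplicates
                    target.append(var)
    return selection


print(variables_for(['wind']))
```

Keeping the mapping in plain dictionaries means that adding temperature or solar radiation later only requires extending the table, not the function.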


### Timeframe

Define the time span for which data is needed. (Optional: daily, monthly, or yearly aggregation.)

In [4]:
# TODO: User input 
# User input of timespan
download_year = 2014 # Test case: Schleswig-Holstein, 2014, hourly data
download_month = '01'
download_day = '01'
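
Once the user input marked as TODO above exists, an arbitrary time span could be expanded into the individual days for which files are needed. The following is a minimal sketch under that assumption; `days_in_range` is a hypothetical helper and not part of the notebook.

```python
import datetime


def days_in_range(start, end):
    """Return every date from start to end (both inclusive)."""
    if end < start:
        raise ValueError('end date lies before start date')
    number_of_days = (end - start).days + 1
    return [start + datetime.timedelta(days=i) for i in range(number_of_days)]


# Example: the complete test year 2014
days = days_in_range(datetime.date(2014, 1, 1), datetime.date(2014, 12, 31))
print(len(days), days[0].strftime('%Y%m%d'), days[-1].strftime('%Y%m%d'))
```

The `%Y%m%d` format matches the date part of the daily file names, so each entry can be used directly when building a file name.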



### Geography coordinates
Define the rectangular area for which data is needed by giving its corner coordinates.

In [5]:
# User input of coordinates
# ------
# Example: Schleswig-Holstein (lat/lon in WGS84 decimal degrees)
# Northeast corner: 55.036823°N, 11.349297°E
# Southwest corner: 53.366266°N, 7.887088°E

# one point example - not in use
lat = 0
lon = 0
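
Because OPeNDAP subsetting works on grid indices rather than degrees, the corner coordinates have to be converted first. The sketch below does this for the Schleswig-Holstein example. It assumes that the latitude axis starts at -90° with 0.5° spacing and the longitude axis at -180° with 0.625° spacing; the function name `bounding_box_to_indices` is hypothetical.

```python
import math

# Assumed grid layout (not taken from this notebook):
# latitude starts at -90 degrees, longitude at -180 degrees.
LAT_START, LAT_STEP = -90.0, 0.5
LON_START, LON_STEP = -180.0, 0.625


def bounding_box_to_indices(south, west, north, east):
    """Return ((lat_min, lat_max), (lon_min, lon_max)) grid indices
    that enclose the given rectangle."""
    # Round outwards so that the whole rectangle is covered.
    lat_min = math.floor((south - LAT_START) / LAT_STEP)
    lat_max = math.ceil((north - LAT_START) / LAT_STEP)
    lon_min = math.floor((west - LON_START) / LON_STEP)
    lon_max = math.ceil((east - LON_START) / LON_STEP)
    return (lat_min, lat_max), (lon_min, lon_max)


# Southwest and northeast corner of the Schleswig-Holstein example
lat_range, lon_range = bounding_box_to_indices(53.366266, 7.887088,
                                               55.036823, 11.349297)
print(lat_range, lon_range)
```

Rounding outwards (floor for the southwest corner, ceil for the northeast corner) is a design choice that prefers a slightly larger area over cutting off the edges.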


## Subsetting data

Combine the parameter choices above into the URL suffix, following the OPeNDAP conventions.
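
As a minimal sketch of what such a URL suffix looks like, the hypothetical helper `build_constraint` below appends one `[first:stride:last]` index range per dimension (time, lat, lon) to every variable name. The index values in the example are arbitrary illustrations.

```python
def build_constraint(variables, time_range, lat_range, lon_range):
    """Build the part of the query that follows the '?'.

    Each range is a (first, last) pair of inclusive grid indices;
    the stride is fixed to 1 in this sketch.
    """
    def index(first, last):
        return '[{}:1:{}]'.format(first, last)

    # Dimension order: time, lat, lon
    suffix = index(*time_range) + index(*lat_range) + index(*lon_range)
    return ','.join(var + suffix for var in variables)


print(build_constraint(['U2M', 'V2M'], (0, 23), (286, 291), (300, 307)))
```

Building the suffix once and reusing it for every variable keeps the index ranges consistent across all requested variables.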

In [6]:
'''
"Translation" of the input into the desired URL parameters:
create links to the relevant datasets (see above) with the chosen parameters
and append the URL parameters to the dataset link
(for the dataset links see above).
Example scheme for the wind dataset tavg1_2d_slv_Nx (split into daily files):
Link to the file for 1 January 1980: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/1980/01/MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4
-> downloads the complete file (all parameters, whole globe, 24 hours)
-> appending .html = manual subsetting
e.g. time [0:1:23] = every time step (here: hour) between 0:00 and 23:00, [0:2:23] = only every second hour, etc.
e.g. lat/lon are given as grid steps
- latitude (north/south): 360 steps of 0.5°
- longitude (east/west): 575 steps of 0.625°
'''

def translate_year_to_file_number(year):
    """The file names consist of a number and a metadata string. The number depends on the year: it is 100 for
       1980 until 1991, 200 for 1992 until 2000, 300 for 2001 until 2010 and 400 from 2011 until now.
    """
    file_number = ''
    
    if year >= 1980 and year < 1992:
        file_number = '100'
    elif year >= 1992 and year < 2001:
        file_number = '200'
    elif year >= 2001 and year < 2011:
        file_number = '300'
    elif year >= 2011:
        file_number = '400'
    else:
        raise Exception('The specified year is out of range.')
    
    return file_number
    
#file_name = 'MERRA2_' + translate_year_to_file_number(download_year) + '.tavg1_2d_slv_Nx.' \
#            + str(download_year) + download_month + download_day + '.nc4'

# Build the parameter
# no need to query for disph
url_params = ['U2M']# ,'U10M','U50M','V2M','V10M','V50M']
# time
url_params = map(lambda x: x + '[0:1:23]' , url_params)
# lat
url_params = map(lambda x: x + '[360]', url_params)
# lon
url_params = map(lambda x: x + '[575]', url_params) 

url_params = ','.join(url_params)
    
    
    
generated_URL = []
# generate the download links
for y in [download_year]: # only for testing
    # build the file_number for the year of this iteration
    y_str = str(y)
    file_num = translate_year_to_file_number(y)
    for m in range(1,13):
        # build the month string: months 1 - 9 need a leading 0, which zfill adds
        m_str = str(m).zfill(2)
        # monthrange returns the weekday of the first day and the number of days in a month. Also works for leap years.
        _, nr_of_days = monthrange(y, m)
        for d in range(1,nr_of_days+1):
            d_str = str(d).zfill(2)
            file_name = 'MERRA2_' + file_num + '.tavg1_2d_slv_Nx.' + y_str + m_str + d_str + '.nc4'
            # file_name already ends in '.nc4'; the second '.nc4' is the OPeNDAP response-format suffix
            query = BASE_URL  + y_str + '/'+ m_str + '/' + file_name + '.nc4?' + url_params    
            file_path = os.path.join(os.curdir, 'download', file_name)
            generated_URL.append([query,file_name,file_path])
            
            

print('Queries generated: ' + str(len(generated_URL)))



Queries generated: 365


## Downloading data

In [7]:

# create file path for the download



# download data (one file per day and dataset) with links to local directory
def download_and_save_file(url, file_path):
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(file_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return r.status_code
    
def estimate_time(number_of_url):
    # rough estimate: about 7 seconds per file
    seconds = number_of_url * 7
    time = str(datetime.timedelta(seconds=seconds))
    return time
    
    
print('Downloading files')
counter = 0
for item in generated_URL:
    query, _, file_path = item
    counter += 1
    if (counter % 30) == 0:
        print('Downloading files - Files left: ' +  str((len(generated_URL) - counter)))
    %time download_and_save_file(query, file_path)
    




Wall time: 7.37 s
Wall time: 10.3 s
Wall time: 7.44 s
Wall time: 6.75 s
Wall time: 5.45 s
Wall time: 5.13 s
Wall time: 6.41 s
Wall time: 5.8 s
Wall time: 5.97 s
Wall time: 6.81 s
Wall time: 6.12 s
Wall time: 6.51 s
Wall time: 6.62 s
Wall time: 5.95 s
Wall time: 4.82 s
Wall time: 6.79 s
Wall time: 7.26 s
Wall time: 9.17 s
Wall time: 6.89 s
Wall time: 8.88 s
Wall time: 13.2 s
Wall time: 11 s
Wall time: 7.13 s
Wall time: 6.68 s
Wall time: 9.68 s
Wall time: 7.07 s
Wall time: 6.92 s
Wall time: 9.45 s
Wall time: 8.25 s
Downloading files - Files left: 335
Wall time: 7.15 s
Wall time: 9.03 s


KeyboardInterrupt: 
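
A run over a full year takes a while at several seconds per file and may be interrupted, as happened above. When restarting, it can help to skip files that are already on disk. The following is a minimal sketch; `pending_downloads` is a hypothetical helper, and treating every non-empty file as complete is an assumption (a partially written file would be skipped too).

```python
import os


def pending_downloads(items):
    """Return the [query, file_name, file_path] entries whose target
    file does not exist yet or is empty."""
    pending = []
    for query, file_name, file_path in items:
        if os.path.isfile(file_path) and os.path.getsize(file_path) > 0:
            continue  # already downloaded in an earlier run
        pending.append([query, file_name, file_path])
    return pending
```

In the download loop, iterating over `pending_downloads(generated_URL)` instead of `generated_URL` would then resume where the previous run stopped.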

---
# Setting up dataframe(s)

In [18]:
# create dataframe(s)
file_path = os.path.join('download_testing', '*.nc4')
print(file_path)
ds = xr.open_dataset(os.path.join('download_testing', 'MERRA2_400.tavg1_2d_slv_Nx.20140101.nc4'))
ds

download_testing\*.nc4


<xarray.Dataset>
Dimensions:  (lat: 1, lon: 1, time: 24)
Coordinates:
  * lat      (lat) int64 0
  * lon      (lon) int64 0
  * time     (time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
    U2M      (time, lat, lon) float64 2.023 2.007 1.93 1.82 1.778 1.787 ...
Attributes:
    HDF5_GLOBAL.History: Original file generated: Mon Dec  8 12:24:17 2014 GMT
    HDF5_GLOBAL.Comment: GMAO filename: d5124_m2_jan10.tavg1_2d_slv_Nx.20140101.nc4
    HDF5_GLOBAL.Filename: MERRA2_400.tavg1_2d_slv_Nx.20140101.nc4
    HDF5_GLOBAL.Conventions: CF-1
    HDF5_GLOBAL.Institution: NASA Global Modeling and Assimilation Office
    HDF5_GLOBAL.References: http://gmao.gsfc.nasa.gov
    HDF5_GLOBAL.Format: NetCDF-4/HDF-5
    HDF5_GLOBAL.SpatialCoverage: global
    HDF5_GLOBAL.VersionID: 5.12.4
    HDF5_GLOBAL.TemporalRange: 1980-01-01 -> 2016-12-31
    HDF5_GLOBAL.identifier_product_doi_authority: http://dx.doi.org/
    HDF5_GLOBAL.ShortName: M2T1NXSLV
    HDF5_GLOBAL.GranuleID:

## Concatenating/combining individual files
Combine the individual (daily) dataset files into one single dataframe

In [None]:
# load/concatenate all files -> one dataframe per daily dataset

In [None]:
# combine all individual dataframes to one dataframe in total
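
The two placeholder cells above could be filled along the following lines. This is only a sketch under several assumptions: the time axis of each daily file holds plain step numbers (as in the dataset printed above), each step is labelled with the start of its hour, and the names `date_from_file_name` and `combine_daily_files` are hypothetical.

```python
import datetime
import glob
import os


def date_from_file_name(file_name):
    """Extract the date from a name like
    'MERRA2_400.tavg1_2d_slv_Nx.20140101.nc4'."""
    date_part = os.path.basename(file_name).split('.')[-2]
    return datetime.datetime.strptime(date_part, '%Y%m%d')


def combine_daily_files(pattern):
    """Open all files matching pattern and return one DataFrame."""
    # Imported here so that the helper above works without these packages.
    import pandas as pd
    import xarray as xr

    datasets = []
    for path in sorted(glob.glob(pattern), key=date_from_file_name):
        ds = xr.open_dataset(path)
        # Assumption: one value per hour, starting at midnight of that day.
        hours = pd.date_range(date_from_file_name(path),
                              periods=ds.sizes['time'], freq='h')
        datasets.append(ds.assign_coords(time=hours))
    # Stack the daily datasets along the time axis.
    combined = xr.concat(datasets, dim='time')
    return combined.to_dataframe()
```

A call such as `combine_daily_files(os.path.join('download', '*.nc4'))` would then yield one dataframe indexed by time, lat and lon. Replacing the per-file step numbers with real timestamps is what keeps the days distinguishable after concatenation.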

## First look at the final data frame structure and format

In [None]:
# DataFrame.info()

## Saving dataframe
Save the final dataframe locally

---
The further processing of the data frame and the compilation of the final data package can be found in [Part 3](weather_data_3_processing.ipynb) of this script.