
# How to obtain weather data from MERRA-2 (Part 2): Download raw data

## About this Notebook
This Jupyter Notebook is part of the [Open Power System Data Project](http://www.open-power-system-data.org) and is written in Python 3. 

This is **Part 2** of the notebook. It downloads data from the MERRA-2 weather dataset.

---

### Other notebooks
**Part 1**: Introduction

**Part 3**: Processing raw data and compiling the data package

### License

This notebook is published under [The MIT License](https://opensource.org/licenses/mit-license.php):

Copyright (c) 2016 [copyright holders]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

### Table of contents
 <p><div class="lev2"><a href="#About-this-Notebook"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>About this Notebook</a></div><div class="lev3"><a href="#Other-notebooks"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Other notebooks</a></div><div class="lev3"><a href="#License"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>License</a></div><div class="lev3"><a href="#Table-of-contents"><span class="toc-item-num">1.1.3&nbsp;&nbsp;</span>Table of contents</a></div><div class="lev1"><a href="#Script-Setup"><span class="toc-item-num">2&nbsp;&nbsp;</span>Script Setup</a></div><div class="lev1"><a href="#Download-raw-data"><span class="toc-item-num">3&nbsp;&nbsp;</span>Download raw data</a></div><div class="lev2"><a href="#Input"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Input</a></div><div class="lev3"><a href="#Parameters-choices"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Parameters choices</a></div><div class="lev3"><a href="#Timeframe"><span class="toc-item-num">3.1.2&nbsp;&nbsp;</span>Timeframe</a></div><div class="lev3"><a href="#Geography-coordinates"><span class="toc-item-num">3.1.3&nbsp;&nbsp;</span>Geography coordinates</a></div><div class="lev2"><a href="#Subsetting-data"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Subsetting data</a></div><div class="lev2"><a href="#Downloading-data"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Downloading data</a></div><div class="lev1"><a href="#Setting-up-dataframe(s)"><span class="toc-item-num">4&nbsp;&nbsp;</span>Setting up dataframe(s)</a></div><div class="lev2"><a href="#Concatenating/combining-individual-files"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Concatenating/combining individual files</a></div><div class="lev2"><a href="#First-look-at-the-final-data-frame-structure-and-format"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>First look at the final data frame structure and format</a></div><div class="lev2"><a href="#Saving-dataframe"><span 
class="toc-item-num">4.3&nbsp;&nbsp;</span>Saving dataframe</a></div>
 
***

# Script Setup

In [1]:
# importing all necessary Python libraries for this Script
import pandas as pd
import xarray as xr
import numpy as np
import requests
import logging
import os
from datetime import datetime
from calendar import monthrange
from opendap_download.multi_processing_download import DownloadManager
import math
from functools import partial
import re

# Set up a log
logging.basicConfig(level=logging.INFO)
log = logging.getLogger('notebook')


---
# Download raw data

## Input
### Parameters choices
Definition of the input parameters used to build the OPeNDAP URL. Which parameters should be included in the weather data package?

**general parameters**
- tavg1_2d_slv_Nx = M2T1NXSLV:
  - H850: Height at 850 hPa
  - H500: Height at 500 hPa
  - H250: Height at 250 hPa
  - DISPH: Displacement height
  - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXSLV.5.12.4/
  - via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/contents.html
  
**Wind Speed** 
- tavg1_2d_slv_Nx = M2T1NXSLV
  - U2M: Eastward wind at 2 m above displacement height
  - U10M: Eastward wind at 10 m above displacement height
  - U50M: Eastward wind at 50 m above surface
  - U850: Eastward wind at 850 hPa
  - U500: Eastward wind at 500 hPa
  - U250: Eastward wind at 250 hPa
  - V2M: Northward wind at 2 m above displacement height
  - V10M: Northward wind at 10 m above displacement height
  - V50M: Northward wind at 50 m above surface
  - V850: Northward wind at 850 hPa
  - V500: Northward wind at 500 hPa
  - V250: Northward wind at 250 hPa
  - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXSLV.5.12.4/
  - via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/contents.html
  
- tavg1_2d_flx_Nx = M2T1NXFLX:
  - Z0M: Roughness length, momentum
  - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXFLX.5.12.4/
  - via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXFLX.5.12.4/contents.html

later: **Temperature** (tavg1_2d_slv_Nx = M2T1NXSLV)
- TS: Surface skin temperature
- T2M: Temperature at 2 m above the displacement height
- T10M: Temperature at 10 m above the displacement height
- T850: Temperature at 850 hPa
- T500: Temperature at 500 hPa
- T250: Temperature at 250 hPa
- ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXSLV.5.12.4/
- via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/contents.html

later: **Solar Radiation** (tavg1_2d_rad_Nx = M2T1NXRAD)
- SWGDN: Surface incident shortwave flux (incident = incoming)
- SWGDNCLR: Surface incident shortwave flux assuming clear sky
- SWGNT: Surface net downward shortwave flux
- SWGNTCLR: Surface net downward shortwave flux assuming clear sky
- SWGNTCLN: Surface net downward shortwave flux assuming clean sky
- SWGNTCLRCLN: Surface net downward shortwave flux assuming clear clean sky
- SWTDN: top-of-atmosphere incoming shortwave flux
- _possibly more_
- ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2/M2T1NXRAD.5.12.4/
- via OPeNDAP: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXRAD.5.12.4/contents.html

### Parameter selection 
These are the possible parameters:
* wind
* solar radiation
* temperature

If you want to select more than one parameter, separate them with commas. For example: wind, solar radiation, temperature

In [2]:
# Getting user input
possible_params = ['wind', 'solar radiation', 'temperature']

def test_user_input(para):
    '''Tests if too few or too many parameters are provided 
    and if the parameters are valid.'''
    if len(para) < 1 or len(para) > 3:
        return False
    for p in para:
        if p not in possible_params:
            return False
    return True
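
The comma-separated input described above could be turned into the list that `test_user_input` expects roughly as follows. This is only a minimal sketch: `parse_params` and `is_valid_selection` are hypothetical helpers that are not part of the notebook, and lower-casing and whitespace stripping are assumptions about how forgiving the input handling should be.

```python
possible_params = ['wind', 'solar radiation', 'temperature']


def parse_params(raw):
    """Split a comma-separated string into a list of parameter names.

    Hypothetical helper: strips whitespace, lower-cases each entry and
    drops empty entries (for example from a trailing comma).
    """
    return [p.strip().lower() for p in raw.split(',') if p.strip()]


def is_valid_selection(params):
    """Return True if between one and three known parameters were chosen."""
    if not 1 <= len(params) <= len(possible_params):
        return False
    return all(p in possible_params for p in params)


selection = parse_params('wind, solar radiation, temperature')
print(selection, is_valid_selection(selection))
```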



### Timeframe

Definition of the timespan for which data is needed. (Optional: daily, monthly, or yearly aggregation)

In [3]:
# TODO: User input 
# User input of timespan
download_year = 2014 # Test case: Schleswig-Holstein, 2014, hourly data
# download_month = '01'
# download_day = '01'
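
Since the hourly data is organised in one file per day, a timespan ultimately has to be expanded into a list of days. The sketch below shows one way to do that with the standard library; `days_in_timespan` is a hypothetical helper, and treating both end points as inclusive is an assumption.

```python
from datetime import date, timedelta


def days_in_timespan(start, end):
    """Return all dates from start to end (both inclusive)."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


download_year = 2014
days = days_in_timespan(date(download_year, 1, 1),
                        date(download_year, 12, 31))
# One entry per day, formatted like the date part of the file names
print(len(days), days[0].strftime('%Y%m%d'), days[-1].strftime('%Y%m%d'))
```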



### Geography coordinates
Definition of the coordinates of the rectangular area for which data is needed, given as the coordinates of two opposite corners.

In [4]:
# User input of coordinates
# ------
# Example: Schleswig-Holstein (lat/lon in WGS84 decimal degrees)
# Northeast point: 55.036823°N, 11.349297°E
# Southwest point: 53.366266°N, 7.887088°E

lat_1, lon_1 = 53.366266, 7.887088 
lat_2, lon_2 = 55.036823, 11.349297


def translate_lat_to_geos5_native(latitude):
    """
    The source for this formula is in the MERRA2 
    Variable Details - File specifications for GEOS pdf file.
    
    latitude: float Needs +/- instead of N/S
    """
    return 1 + ((latitude + 90) / 0.5)

def translate_lon_to_geos5_native(longitude):
    """
    See function above. Assumes a longitude grid spacing of 2/3 degrees.
    
    longitude: float Needs +/- instead of E/W
    """
    return 1 + ((longitude + 180) / (2/3))

def find_closest_coordinate(calc_coord, coord_array):
    index = (np.abs(coord_array-calc_coord)).argmin()
    return coord_array[index]

# The arrays contain the coordinates of the grid used by the API
lat_coords = np.arange(0, 360, dtype=int)
lon_coords = np.arange(0, 575, dtype=int)

lat_coord_1 = translate_lat_to_geos5_native(lat_1)
lon_coord_1 = translate_lon_to_geos5_native(lon_1)
lat_coord_2 = translate_lat_to_geos5_native(lat_2)
lon_coord_2 = translate_lon_to_geos5_native(lon_2)


# find the closest coordinate in the grid
lat_co_1_closest = find_closest_coordinate(lat_coord_1, lat_coords)
lon_co_1_closest = find_closest_coordinate(lon_coord_1, lon_coords)
lat_co_2_closest = find_closest_coordinate(lat_coord_2, lat_coords)
lon_co_2_closest = find_closest_coordinate(lon_coord_2, lon_coords)

# TODO: Make sure the user knows the difference between given and calculated
# degree (point)

log.info('Calculated coordinates for point 1: ' + str((lat_coord_1, lon_coord_1)))
log.info('Closest coordinates for point 1: ' + str((lat_co_1_closest, lon_co_1_closest)))
log.info('Calculated coordinates for point 2: ' + str((lat_coord_2, lon_coord_2)))
log.info('Closest coordinates for point 2: ' + str((lat_co_2_closest, lon_co_2_closest)))




INFO:notebook:Calculated coordinates for point 1: (287.732532, 282.83063200000004)
INFO:notebook:Closest coordinates for point 1: (288, 283)
INFO:notebook:Calculated coordinates for point 2: (291.073646, 288.0239455)
INFO:notebook:Closest coordinates for point 2: (291, 288)
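
For comparison, the index lookup can also be written directly against the MERRA-2 grid. The sketch below assumes the grid layout given in the MERRA-2 file specification (361 latitudes starting at -90° in 0.5° steps, 576 longitudes starting at -180° in 0.625° steps) and zero-based array indices; `nearest_index`, `LAT_GRID` and `LON_GRID` are hypothetical names. Under these assumptions the longitude spacing differs from the 2/3° used above, so the resulting indices differ from the logged ones.

```python
import numpy as np

# Assumed MERRA-2 grid: 361 latitudes x 576 longitudes
LAT_GRID = -90.0 + 0.5 * np.arange(361)
LON_GRID = -180.0 + 0.625 * np.arange(576)


def nearest_index(value, grid):
    """Return the zero-based index of the grid point closest to value."""
    return int(np.abs(grid - value).argmin())


lat_1, lon_1 = 53.366266, 7.887088
lat_2, lon_2 = 55.036823, 11.349297

for name, value, grid in [('lat_1', lat_1, LAT_GRID),
                          ('lon_1', lon_1, LON_GRID),
                          ('lat_2', lat_2, LAT_GRID),
                          ('lon_2', lon_2, LON_GRID)]:
    idx = nearest_index(value, grid)
    # Print the requested value, the index and the matching grid coordinate
    print(name, value, idx, grid[idx])
```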


## Subsetting data

The parameter choices above are translated into a URL suffix according to the OPeNDAP guidelines.

In [5]:
'''
"Translation" of the input into the desired URL parameters.
Creation of links to the relevant datasets (see above) with the chosen parameters:
add the URL parameters to the dataset link (dataset links: see above).
Example scheme for the wind dataset tavg1_2d_slv_Nx (split into daily files):
Link to the file for 1980-01-01: http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/1980/01/MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4
-> Downloads the complete file (all parameters, whole world, 24 hours)
-> Append .html = manual subsetting
e.g. time [0:1:23] = every time step (here: hour) between 0 and 23 h, [0:2:23] = only every 2nd hour, etc.
e.g. lat/lon are given in steps
- latitude (north/south): 360 steps of 0.5°
- longitude (east/west): 575 steps of 0.625°
'''

def translate_year_to_file_number(year):
    """
    The file names consist of a number and a metadata string. 
    The number depends on the year: from 1980 until 1991 it is 100, 
    from 1992 until 2000 it is 200, from 2001 until 2010 it is 300 
    and from 2011 onwards it is 400.
    """
    file_number = ''
    
    if 1980 <= year < 1992:
        file_number = '100'
    elif 1992 <= year < 2001:
        file_number = '200'
    elif 2001 <= year < 2011:
        file_number = '300'
    elif year >= 2011:
        file_number = '400'
    else:
        raise Exception('The specified year is out of range.')
    
    return file_number
    


def generate_url_params(parameter, time_para, lat_para, lon_para):
    '''Creates a string containing all the parameters in query form'''
    parameter = map(lambda x: x + time_para, parameter)
    parameter = map(lambda x: x + lat_para, parameter)
    parameter = map(lambda x: x + lon_para, parameter)
    
    return ','.join(parameter)
    
    

def generate_download_links(download_years, base_url, dataset_name, url_params):
    '''
    Generates the links for the download. 
    download_years: The years you want to download as array. 
    base_url: The URL of the dataset directory, ending with a slash.
    dataset_name: The name of the data set. For example tavg1_2d_slv_Nx
    url_params: The query string created by generate_url_params.
    '''
    urls = []
    for y in download_years:
        y_str = str(y)
        # build the file number for the current year
        file_num = translate_year_to_file_number(y)
        for m in range(1,13):
            # build the month string: for the month 1 - 9 it starts with a leading 0. zfill solves that problem
            m_str = str(m).zfill(2)
            # monthrange returns the first weekday and the number of days in a month. Also works for leap years.
            _, nr_of_days = monthrange(y, m)
            for d in range(1,nr_of_days+1):
                d_str = str(d).zfill(2)
                file_name = 'MERRA2_' + file_num + '.'+dataset_name+'.' + y_str + m_str + d_str + '.nc4'
                # the second '.nc4' requests the subset as a NetCDF-4 file
                query = base_url  + y_str + '/'+ m_str + '/' + file_name + '.nc4?' + url_params
                urls.append(query)
    return urls

requested_params = ['U2M', 'U10M', 'U50M', 'V2M', 'V10M', 'V50M', 'DISPH']
requested_time = '[0:1:23]'
requested_lat = '[' + str(lat_co_1_closest) + ':1:' + str(lat_co_2_closest) + ']'
requested_lon = '[' + str(lon_co_1_closest) + ':1:' + str(lon_co_2_closest) + ']'



parameter = generate_url_params(requested_params, requested_time,
                                requested_lat, requested_lon)

BASE_URL = 'http://goldsmr4.sci.gsfc.nasa.gov:80/opendap/MERRA2/M2T1NXSLV.5.12.4/'
generated_URL = generate_download_links([download_year], BASE_URL, 
                                        'tavg1_2d_slv_Nx', 
                                        parameter)
            
        
log.debug('Queries generated: ' + str(len(generated_URL)))
log.debug(generated_URL[0])
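
To make the resulting query concrete, the following sketch assembles the constraint expression for two variables and a single day. It restates the string handling in compact form for illustration only: `build_constraint` is a hypothetical name, and the index ranges are the closest grid indices logged earlier.

```python
def build_constraint(variables, time_idx, lat_idx, lon_idx):
    """Append the [time][lat][lon] index ranges to every variable name."""
    return ','.join(v + time_idx + lat_idx + lon_idx for v in variables)


constraint = build_constraint(['U2M', 'V2M'], '[0:1:23]',
                              '[288:1:291]', '[283:1:288]')

base_url = 'http://goldsmr4.sci.gsfc.nasa.gov:80/opendap/MERRA2/M2T1NXSLV.5.12.4/'
file_name = 'MERRA2_400.tavg1_2d_slv_Nx.20140101.nc4'
# File name of the daily file, the response format suffix, then the query
url = base_url + '2014/01/' + file_name + '.nc4?' + constraint

print(constraint)
print(url)
```

Each variable carries its own copy of the three index ranges, so the query grows linearly with the number of requested variables.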


## Download testing

In [6]:
dlm = DownloadManager()
url = generated_URL[0]
log.debug(url)
filename = dlm.get_filename(url)
log.debug(filename)

## Downloading data

In [7]:
# download data (one file per day and dataset) with links to local directory
from getpass import getpass

username = input('Username: ')
# getpass does not echo the password while it is typed
password = getpass('Password: ')
download_manager = DownloadManager()
# download_manager.read_credentials_from_yaml('credentials.yaml')
download_manager.set_username_and_password(username, password)
download_manager.download_path = 'download'
download_manager.download_urls = generated_URL
%time download_manager.start_download(4)

Wall time: 12min 27s


# Get roughness from different file

In [8]:
roughness_para = generate_url_params(['Z0M'], requested_time, 
                                     requested_lat, requested_lon)
ROUGHNESS_BASE_URL = 'http://goldsmr4.sci.gsfc.nasa.gov/opendap/MERRA2/M2T1NXFLX.5.12.4/'
roughness_links = generate_download_links([download_year], ROUGHNESS_BASE_URL,
                                          'tavg1_2d_flx_Nx', roughness_para)
            
download_manager.download_path = 'roughness_download'
download_manager.download_urls = roughness_links


%time download_manager.start_download(4)

Wall time: 8min 59s


---
# Get lat and lon dimensions

In [9]:

lat_lon_dimension_para = 'lat' + requested_lat + ',lon' + requested_lon
# Creating the download url.
dimension_url = 'http://goldsmr4.sci.gsfc.nasa.gov:80/opendap/MERRA2/M2T1NXSLV.5.12.4/2014/01/MERRA2_400.tavg1_2d_slv_Nx.20140101.nc4.nc4?'
dimension_url = dimension_url + lat_lon_dimension_para
download_manager.download_path = 'dimension_scale'
download_manager.download_urls = [dimension_url]
%time download_manager.start_download()


Wall time: 4.81 s


In [10]:
ds_dim = xr.open_dataset(os.path.join('dimension_scale', DownloadManager.get_filename(dimension_url)))
df_dim = ds_dim.to_dataframe()
lat_array = ds_dim['lat'].data.tolist()
lon_array = ds_dim['lon'].data.tolist()

log.info(lat_array)
log.info(lon_array)


INFO:notebook:[54.0, 54.5, 55.0, 55.5]
INFO:notebook:[-3.125, -2.5, -1.875, -1.25, -0.625, -5.920304394294029e-13]
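
The returned arrays can be cross-checked against the requested index ranges. The sketch assumes zero-based indices on a grid that starts at -90° and -180° with 0.5° and 0.625° spacing (an assumption based on the MERRA-2 grid layout, not something the notebook states); `index_to_lat` and `index_to_lon` are hypothetical helpers. The indices 288 to 291 and 283 to 288 then map back to the values logged above, apart from floating point noise at 0°.

```python
def index_to_lat(idx, step=0.5):
    """Latitude in degrees for a zero-based grid index."""
    return -90.0 + idx * step


def index_to_lon(idx, step=0.625):
    """Longitude in degrees for a zero-based grid index."""
    return -180.0 + idx * step


lats = [index_to_lat(i) for i in range(288, 292)]
lons = [index_to_lon(i) for i in range(283, 289)]
print(lats)
print(lons)
```

The longitudes lie west of the requested area (roughly 7.9°E to 11.3°E), which suggests re-checking the longitude spacing used in the index calculation.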


---
# Setting up wind dataframe

In [11]:
def extract_date(data_set):
    """
    Extracts the date from the file before merging the datasets
    """
    # Match the text between a '.' and '.nc4' that itself contains no '.'
    exp = r'(?<=\.)[^\.]*(?=\.nc4)'
    try:
        f_name = data_set.attrs['HDF5_GLOBAL.Filename']
        res = re.search(exp, f_name).group(0)
        y, m, d = res[0:4], res[4:6], res[6:8]
        date_str = ('%s-%s-%s' % (y, m, d))
        data_set = data_set.assign(date=date_str)
        return data_set

    except KeyError:
        # The last dataset is the one all the other sets will be merged into. 
        # Therefore, no date can be extracted.
        return data_set

file_path = os.path.join('download', '*.nc4')


ds_wind = xr.open_mfdataset(file_path, 
                       concat_dim='date', 
                       preprocess=extract_date)

df_wind = ds_wind.to_dataframe()
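
The regular expression in `extract_date` can be tried on its own. The file name below follows the naming scheme described earlier; that the `HDF5_GLOBAL.Filename` attribute has exactly this form is an assumption.

```python
import re

exp = r'(?<=\.)[^\.]*(?=\.nc4)'
f_name = 'MERRA2_400.tavg1_2d_slv_Nx.20140101.nc4'  # assumed attribute value

# The lookbehind and lookahead keep the surrounding dots out of the match
res = re.search(exp, f_name).group(0)
date_str = '%s-%s-%s' % (res[0:4], res[4:6], res[6:8])
print(res, date_str)
```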


In [12]:
df_wind.reset_index(inplace=True)

In [13]:
# TODO: Do not hardcode the value for start_date
start_date = datetime.strptime('2014-01-01', '%Y-%m-%d')

def calculate_datetime(d_frame):
    cur_date = datetime.strptime(d_frame['date'], '%Y-%m-%d')
    hour = int(d_frame['time'])
    delta = abs(cur_date - start_date).days
    date_time_value = (delta * 24) + (hour)
    return date_time_value


df_wind['date_time'] = df_wind.apply(calculate_datetime, axis=1)
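
The same hour counter can be computed for whole columns at once, which is usually much faster than a row-wise `apply` on a frame with several hundred thousand rows. This is a sketch on a small stand-in frame, assuming that no date lies before the start date (the original uses `abs`, which would behave differently in that case).

```python
import pandas as pd

# Small stand-in for df_wind with the two columns that are needed
df = pd.DataFrame({'date': ['2014-01-01', '2014-01-01', '2014-01-02'],
                   'time': [0, 23, 5]})

start_date = pd.Timestamp('2014-01-01')
# Whole days since the start date, computed for the entire column
days = (pd.to_datetime(df['date']) - start_date).dt.days
df['date_time'] = days * 24 + df['time'].astype(int)
print(df)
```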


In [14]:
# Siren windspeed
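def calculate_windspeed(d_frame, idx_u, idx_v):
    """
    Calculates the windspeed from the eastward (u) and northward (v) 
    components. Both components are truncated to integers before the 
    speed is computed. The returned unit is m/s
    """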
    um = int(d_frame[idx_u])
    vm = int(d_frame[idx_v])
    speed = math.sqrt((um ** 2) + (vm ** 2))
    return round(speed, 2)

calc_windspeed_2m = partial(calculate_windspeed, idx_u='U2M', idx_v='V2M')
calc_windspeed_10m = partial(calculate_windspeed, idx_u='U10M', idx_v='V10M')
calc_windspeed_50m = partial(calculate_windspeed, idx_u='U50M', idx_v='V50M')

df_wind['v_2m'] = df_wind.apply(calc_windspeed_2m, axis=1)
df_wind['v_10m']= df_wind.apply(calc_windspeed_10m, axis=1)
df_wind['v_50m'] = df_wind.apply(calc_windspeed_50m, axis=1)
df_wind


|   | date | lat | lon | time | V50M | DISPH | U50M | V10M | U10M | U2M | V2M | date_time | v_2m | v_10m | v_50m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-01 | 0 | 0 | 0 | 9.018753 | 0.124908 | 1.214383 | 6.828988 | 0.730423 | 0.592086 | 5.250453 | 0 | 5.00 | 6.00 | 9.06 |
| 1 | 2014-01-01 | 0 | 0 | 1 | 8.809630 | 0.124908 | 0.599721 | 6.710205 | 0.306871 | 0.255468 | 5.188453 | 1 | 5.00 | 6.00 | 8.00 |
| 2 | 2014-01-01 | 0 | 0 | 2 | 8.581703 | 0.124878 | 0.266955 | 6.623117 | 0.090796 | 0.083196 | 5.146648 | 2 | 5.00 | 6.00 | 8.00 |
| 3 | 2014-01-01 | 0 | 0 | 3 | 8.712752 | 0.124847 | 0.175792 | 6.792528 | -0.026495 | -0.017614 | 5.292618 | 3 | 5.00 | 6.00 | 8.00 |
| 4 | 2014-01-01 | 0 | 0 | 4 | 8.413611 | 0.124817 | -0.960201 | 6.793629 | -0.851569 | -0.669570 | 5.319051 | 4 | 5.00 | 6.00 | 8.00 |
| 5 | 2014-01-01 | 0 | 0 | 5 | 8.219543 | 0.124817 | -2.512240 | 6.829075 | -2.202860 | -1.735779 | 5.372691 | 5 | 5.10 | 6.32 | 8.25 |
| 6 | 2014-01-01 | 0 | 0 | 6 | 8.276698 | 0.124786 | -3.744040 | 7.019167 | -3.378544 | -2.687513 | 5.530348 | 6 | 5.39 | 7.62 | 8.54 |
| 7 | 2014-01-01 | 0 | 0 | 7 | 8.288552 | 0.124756 | -4.773696 | 7.022337 | -4.339244 | -3.458275 | 5.534028 | 7 | 5.83 | 8.06 | 8.94 |
| 8 | 2014-01-01 | 0 | 0 | 8 | 8.637740 | 0.124756 | -6.047885 | 7.149990 | -5.429934 | -4.299541 | 5.615814 | 8 | 6.40 | 8.60 | 10.00 |
| 9 | 2014-01-01 | 0 | 0 | 9 | 8.885589 | 0.124725 | -7.452211 | 7.281688 | -6.610246 | -5.209391 | 5.687150 | 9 | 7.07 | 9.22 | 10.63 |
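
Because `calculate_windspeed` converts both components with `int()`, the fractional part of each component is lost before the speed is computed. As a sketch of an alternative, the speed can be computed from the unrounded components for a whole column at once with `np.hypot`. The two sample rows reuse the `U2M` and `V2M` values of rows 0 and 9 from the table above; whether the package should contain unrounded speeds is an assumption.

```python
import numpy as np
import pandas as pd

# U2M / V2M values of rows 0 and 9 from the table above
df = pd.DataFrame({'U2M': [0.592086, -5.209391],
                   'V2M': [5.250453, 5.687150]})

# sqrt(u**2 + v**2) on the float components, rounded to two decimals
df['v_2m'] = np.hypot(df['U2M'], df['V2M']).round(2)
print(df)
```

For these two rows the result is about 5.28 m/s and 7.71 m/s, compared with 5.00 m/s and 7.07 m/s from the truncated components.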


# Setting up Roughness dataframe

In [15]:
file_path = os.path.join('roughness_download', '*.nc4')
ds_rough = xr.open_mfdataset(file_path, concat_dim='date', 
                             preprocess=extract_date)

df_rough = ds_rough.to_dataframe()
df_rough.reset_index(inplace=True)

df_rough

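|   | date | lat | lon | time | Z0M |
|---|---|---|---|---|---|
| 0 | 2014-01-01 | 0 | 0 | 0 | 0.024747 |
| 1 | 2014-01-01 | 0 | 0 | 1 | 0.024747 |
| 2 | 2014-01-01 | 0 | 0 | 2 | 0.024747 |
| 3 | 2014-01-01 | 0 | 0 | 3 | 0.024755 |
| 4 | 2014-01-01 | 0 | 0 | 4 | 0.024755 |
| 5 | 2014-01-01 | 0 | 0 | 5 | 0.024770 |
| 6 | 2014-01-01 | 0 | 0 | 6 | 0.024793 |
| 7 | 2014-01-01 | 0 | 0 | 7 | 0.024816 |
| 8 | 2014-01-01 | 0 | 0 | 8 | 0.024862 |
| 9 | 2014-01-01 | 0 | 0 | 9 | 0.024923 |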


## Concatenating/combining individual files
Combine the wind and roughness dataframes, each built from the individual (daily) dataset files, into one single dataframe.

In [16]:
df = pd.merge(df_wind, df_rough, on=['date', 'lat', 'lon', 'time'])
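
`pd.merge` performs an inner join by default, so rows without a partner in the other frame are dropped silently. The sketch below uses two tiny stand-in frames and adds two safeguards that are not in the original: `validate='one_to_one'` to catch duplicated keys and a row count check.

```python
import pandas as pd

keys = ['date', 'lat', 'lon', 'time']

# Tiny stand-ins for df_wind and df_rough
df_wind = pd.DataFrame({'date': ['2014-01-01'] * 2, 'lat': [0, 0],
                        'lon': [0, 0], 'time': [0, 1],
                        'v_50m': [9.06, 8.00]})
df_rough = pd.DataFrame({'date': ['2014-01-01'] * 2, 'lat': [0, 0],
                         'lon': [0, 0], 'time': [0, 1],
                         'Z0M': [0.024747, 0.024747]})

# validate raises a MergeError if a key combination occurs more than once
df = pd.merge(df_wind, df_rough, on=keys, how='inner',
              validate='one_to_one')

# An inner join drops unmatched rows, so compare the row counts
assert len(df) == len(df_wind) == len(df_rough)
print(df)
```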

---
# Structure the dataframe, add and remove columns

In [17]:
# Calculate height for v_2m and v_10m (2 + DISPH or 10 + DISPH)
df['h_v1'] = df.apply((lambda x:int(x['DISPH']) + 2), axis=1)
df['h_v2'] = df.apply((lambda x:int(x['DISPH']) + 10), axis=1)

df['v_100m'] = np.nan
df.drop('DISPH', axis=1, inplace=True)
df.drop(['time', 'date'], axis=1, inplace=True)
df.drop(['U2M', 'U10M', 'U50M', 'V2M', 'V10M', 'V50M'], axis=1, inplace=True)

---
## Renaming the columns


In [18]:
# TODO: RENAME
# Changing lat lon from 0/1 etc to actual values using the values 
# extracted earlier.

df['lat'] = df['lat'].apply(lambda x: lat_array[int(x)])
df['lon'] = df['lon'].apply(lambda x: lon_array[int(x)])

rename_map = {'date_time': 'date/time', 
              'v_2m': 'v1', 
              'v_10m': 'v2', 
              'Z0M': 'z0'
             }

df.rename(columns=rename_map, inplace=True)

# Change order of the columns
columns = ['date/time', 'lat', 'lon',
        'v1', 'v2', 'v_50m', 'v_100m',
        'h_v1', 'h_v2', 'z0']
df = df[columns]
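
The index-to-coordinate replacement can also be expressed as a lookup table. This is a sketch on a small stand-in frame; `lat_array` repeats the values logged earlier, and using `Series.map` instead of `apply` with a lambda is a stylistic choice, not the notebook's method.

```python
import pandas as pd

lat_array = [54.0, 54.5, 55.0, 55.5]

# Stand-in frame: 'lat' holds positions in lat_array, not degrees
df = pd.DataFrame({'lat': [0.0, 1.0, 3.0], 'v_2m': [5.0, 5.1, 5.39]})

# Build an {index: degree} lookup and apply it to the whole column
lat_lookup = dict(enumerate(lat_array))
df['lat'] = df['lat'].astype(int).map(lat_lookup)

df = df.rename(columns={'v_2m': 'v1'})
print(df)
```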

---
## First look at the final data frame structure and format

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 210240 entries, 0 to 210239
Data columns (total 10 columns):
date/time    210240 non-null int64
lat          210240 non-null float64
lon          210240 non-null float64
v1           210240 non-null float64
v2           210240 non-null float64
v_50m        210240 non-null float64
v_100m       0 non-null float64
h_v1         210240 non-null int64
h_v2         210240 non-null int64
z0           210240 non-null float64
dtypes: float64(7), int64(3)
memory usage: 17.6 MB


## Saving dataframe
Save the final dataframe locally

In [20]:
df.to_csv('weather_script_result.csv', index=False)

---
The further processing of the data frame and the compilation of the final data package can be found in [Part 3](weather_data_3_processing.ipynb) of this script.

In [21]:
df

|   | date/time | lat | lon | v1 | v2 | v_50m | v_100m | h_v1 | h_v2 | z0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 54.0 | -3.125000e+00 | 5.00 | 6.00 | 9.06 |  | 2 | 10 | 0.024747 |
| 1 | 1 | 54.0 | -3.125000e+00 | 5.00 | 6.00 | 8.00 |  | 2 | 10 | 0.024747 |
| 2 | 2 | 54.0 | -3.125000e+00 | 5.00 | 6.00 | 8.00 |  | 2 | 10 | 0.024747 |
| 3 | 3 | 54.0 | -3.125000e+00 | 5.00 | 6.00 | 8.00 |  | 2 | 10 | 0.024755 |
| 4 | 4 | 54.0 | -3.125000e+00 | 5.00 | 6.00 | 8.00 |  | 2 | 10 | 0.024755 |
| 5 | 5 | 54.0 | -3.125000e+00 | 5.10 | 6.32 | 8.25 |  | 2 | 10 | 0.024770 |
| 6 | 6 | 54.0 | -3.125000e+00 | 5.39 | 7.62 | 8.54 |  | 2 | 10 | 0.024793 |
| 7 | 7 | 54.0 | -3.125000e+00 | 5.83 | 8.06 | 8.94 |  | 2 | 10 | 0.024816 |
| 8 | 8 | 54.0 | -3.125000e+00 | 6.40 | 8.60 | 10.00 |  | 2 | 10 | 0.024862 |
| 9 | 9 | 54.0 | -3.125000e+00 | 7.07 | 9.22 | 10.63 |  | 2 | 10 | 0.024923 |
