# Format NREL time stamps

The typical meteorological year (TMY) data sets from NREL number the hours of each day 01-24.  MySQL and Python both expect 00-23.  This notebook loads an NREL data set as downloaded and adds a new column with a properly formatted timestamp.

http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/tmy3/by_state_and_city.html



In [2]:
import pandas as pd
import re

In [3]:
!ls ./TMY_data/

alturas_CA.CSV		    houston_TX_fmt.CSV	  new_york_NY_fmt.CSV
alturas_CA_fmt.CSV	    knoxville_TN.CSV	  portland_OR.CSV
denver_CO.CSV		    knoxville_TN_fmt.CSV  portland_OR_fmt.CSV
denver_CO_fmt.CSV	    lafayette_IN.CSV	  san_diego_CA.CSV
ely_MN.CSV		    lafayette_IN_fmt.CSV  san_diego_CA_fmt.CSV
fort_lauderdale_FL.CSV	    laramie_WY.CSV	  seattle_WA.CSV
fort_lauderdale_FL_fmt.CSV  laramie_WY_fmt.CSV	  seattle_WA_fmt.CSV
houston_TX.CSV		    new_york_NY.CSV


File path for the data set.  My files are named with a 'city_state' keyword, e.g. 'ely_MN'.

In [4]:
city = 'ely_MN'

in_path = './TMY_data/' + city + '.CSV'
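The same path can also be built with `pathlib`, which avoids concatenating separators by hand. This is only a sketch: it assumes the `./TMY_data/` layout shown in the listing above, and `tmy_paths` is a hypothetical helper name, not part of the original notebook.

```python
from pathlib import Path

def tmy_paths(city, data_dir='./TMY_data'):
    """Return (input path, output path) for a 'city_state' key (hypothetical helper)."""
    base = Path(data_dir)
    # raw NREL download, e.g. TMY_data/ely_MN.CSV
    in_path = base / (city + '.CSV')
    # formatted copy carrying the '_fmt' suffix, e.g. TMY_data/ely_MN_fmt.CSV
    out_path = base / (city + '_fmt.CSV')
    return in_path, out_path

in_path, out_path = tmy_paths('ely_MN')
print(in_path, out_path)
```

`pd.read_csv` and `DataFrame.to_csv` both accept `Path` objects, so the results can be passed to them directly.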

Read the CSV file into a pandas dataframe, skipping the first row, which holds location info.

In [5]:
df = pd.read_csv(in_path, skiprows=1)
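`skiprows=1` discards the first line entirely. If the location info is wanted later, it can be read separately with the standard `csv` module. The following is a sketch only: `read_location_row` is a hypothetical helper, and the sample line is made up for illustration rather than taken from an NREL file.

```python
import csv
import io

def read_location_row(handle):
    """Return the first CSV row (the line that read_csv skips) as a list of strings."""
    return next(csv.reader(handle))

# made-up stand-in for a file: one location line, then the column header line
sample = io.StringIO("site-id,SITE NAME,ST\nDate (MM/DD/YYYY),Time (HH:MM)\n")
location = read_location_row(sample)
print(location)

# with a real file (in_path as defined earlier in the notebook):
# with open(in_path, newline='') as f:
#     location = read_location_row(f)
```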

Optionally, look at some of the data to confirm the import.

In [6]:
#print(df[:4])

This loop unpacks each date and time with a regular expression.  The hour is shifted down by one (01-24 becomes 00-23), the year is set to 2000, and the resulting timestamp string is written to the new column, 'Time_fmt'.

In [7]:
for x in range(len(df)):
    
    # month and day stay as strings (keeps their leading zeros); the source year is discarded
    MM, DD, _ = re.findall(r'\d+', df['Date (MM/DD/YYYY)'][x])
    hh, _ = map(int, re.findall(r'\d+', df['Time (HH:MM)'][x]))
    
    # hours 01-24 are shifted to 00-23; when hh-1 is a single digit (hh < 11), add a leading zero
    if hh < 11:
        datetime_str = '2000-' + MM + '-' + DD + '-0' + str(hh-1) + ':00:00'
    else:
        datetime_str = '2000-' + MM + '-' + DD + '-' + str(hh-1) + ':00:00'
    
    # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
    df.at[x, 'Time_fmt'] = datetime_str
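The per-row logic can also be pulled into a small function, which makes the 01-24 to 00-23 shift easy to test on its own. This is a sketch rather than the notebook's method: `tmy_to_timestamp` is a hypothetical name, and the zero-padding uses a format spec in place of the `if`/`else`.

```python
import re

def tmy_to_timestamp(date_str, time_str, year='2000'):
    """Build 'YYYY-MM-DD-HH:00:00' from a TMY date ('MM/DD/YYYY') and time ('HH:MM', 01-24)."""
    # month and day stay as strings so their leading zeros survive; the source year is dropped
    MM, DD, _ = re.findall(r'\d+', date_str)
    hh, _ = map(int, re.findall(r'\d+', time_str))
    # shift 01-24 down to 00-23 and pad to two digits
    return f'{year}-{MM}-{DD}-{hh - 1:02d}:00:00'

print(tmy_to_timestamp('01/01/1996', '01:00'))
print(tmy_to_timestamp('01/01/1996', '24:00'))

# applied to the dataframe columns (assumes df as loaded above):
# df['Time_fmt'] = [tmy_to_timestamp(d, t)
#                   for d, t in zip(df['Date (MM/DD/YYYY)'], df['Time (HH:MM)'])]
```

Building the whole column in one assignment also avoids setting values one row at a time.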

Check the new column.

In [8]:
print(df['Time_fmt'][:2])

0    2000-01-01-00:00:00
1    2000-01-01-01:00:00
Name: Time_fmt, dtype: object
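A stricter check is to parse the strings back. The sketch below uses the standard library's `datetime.strptime` with a format matching the hyphen-separated layout printed above; `STAMP_FORMAT` and `is_valid_stamp` are hypothetical names introduced for illustration.

```python
from datetime import datetime

STAMP_FORMAT = '%Y-%m-%d-%H:%M:%S'

def is_valid_stamp(stamp):
    """Return True if stamp parses with STAMP_FORMAT (hours must fall in 00-23)."""
    try:
        datetime.strptime(stamp, STAMP_FORMAT)
        return True
    except ValueError:
        return False

print(is_valid_stamp('2000-01-01-00:00:00'))  # a value from the new column
print(is_valid_stamp('2000-01-01-24:00:00'))  # an unshifted hour 24 is rejected

# over the whole column (assumes df as built above):
# assert all(is_valid_stamp(s) for s in df['Time_fmt'])
```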


Move the new column right next to the original Date and Time columns.  First, get the list of column names.

In [9]:
df.columns.tolist()

['Date (MM/DD/YYYY)',
 'Time (HH:MM)',
 'ETR (W/m^2)',
 'ETRN (W/m^2)',
 'GHI (W/m^2)',
 'GHI source',
 'GHI uncert (%)',
 'DNI (W/m^2)',
 'DNI source',
 'DNI uncert (%)',
 'DHI (W/m^2)',
 'DHI source',
 'DHI uncert (%)',
 'GH illum (lx)',
 'GH illum source',
 'Global illum uncert (%)',
 'DN illum (lx)',
 'DN illum source',
 'DN illum uncert (%)',
 'DH illum (lx)',
 'DH illum source',
 'DH illum uncert (%)',
 'Zenith lum (cd/m^2)',
 'Zenith lum source',
 'Zenith lum uncert (%)',
 'TotCld (tenths)',
 'TotCld source',
 'TotCld uncert (code)',
 'OpqCld (tenths)',
 'OpqCld source',
 'OpqCld uncert (code)',
 'Dry-bulb (C)',
 'Dry-bulb source',
 'Dry-bulb uncert (code)',
 'Dew-point (C)',
 'Dew-point source',
 'Dew-point uncert (code)',
 'RHum (%)',
 'RHum source',
 'RHum uncert (code)',
 'Pressure (mbar)',
 'Pressure source',
 'Pressure uncert (code)',
 'Wdir (degrees)',
 'Wdir source',
 'Wdir uncert (code)',
 'Wspd (m/s)',
 'Wspd source',
 'Wspd uncert (code)',
 'Hvis (m)',
 'Hvis source',

Copy and paste the list into a new variable and insert 'Time_fmt' after the time column.  The list should not need to change between NREL data sets.

In [10]:
cols = ['Date (MM/DD/YYYY)',
 'Time (HH:MM)',
 'Time_fmt',
 'ETR (W/m^2)',
 'ETRN (W/m^2)',
 'GHI (W/m^2)',
 'GHI source',
 'GHI uncert (%)',
 'DNI (W/m^2)',
 'DNI source',
 'DNI uncert (%)',
 'DHI (W/m^2)',
 'DHI source',
 'DHI uncert (%)',
 'GH illum (lx)',
 'GH illum source',
 'Global illum uncert (%)',
 'DN illum (lx)',
 'DN illum source',
 'DN illum uncert (%)',
 'DH illum (lx)',
 'DH illum source',
 'DH illum uncert (%)',
 'Zenith lum (cd/m^2)',
 'Zenith lum source',
 'Zenith lum uncert (%)',
 'TotCld (tenths)',
 'TotCld source',
 'TotCld uncert (code)',
 'OpqCld (tenths)',
 'OpqCld source',
 'OpqCld uncert (code)',
 'Dry-bulb (C)',
 'Dry-bulb source',
 'Dry-bulb uncert (code)',
 'Dew-point (C)',
 'Dew-point source',
 'Dew-point uncert (code)',
 'RHum (%)',
 'RHum source',
 'RHum uncert (code)',
 'Pressure (mbar)',
 'Pressure source',
 'Pressure uncert (code)',
 'Wdir (degrees)',
 'Wdir source',
 'Wdir uncert (code)',
 'Wspd (m/s)',
 'Wspd source',
 'Wspd uncert (code)',
 'Hvis (m)',
 'Hvis source',
 'Hvis uncert (code)',
 'CeilHgt (m)',
 'CeilHgt source',
 'CeilHgt uncert (code)',
 'Pwat (cm)',
 'Pwat source',
 'Pwat uncert (code)',
 'AOD (unitless)',
 'AOD source',
 'AOD uncert (code)',
 'Alb (unitless)',
 'Alb source',
 'Alb uncert (code)',
 'Lprecip depth (mm)',
 'Lprecip quantity (hr)',
 'Lprecip source',
 'Lprecip uncert (code)',
 'PresWth (METAR code)',
 'PresWth source',
 'PresWth uncert (code)']

Reorder the columns using pandas label-based indexing.

In [11]:
df = df[cols]
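The reordered list can also be derived from the existing column names instead of being pasted by hand. A minimal sketch, assuming only that both column names are present; `move_after` is a hypothetical helper, not a pandas function.

```python
def move_after(columns, name, anchor):
    """Return a copy of columns with `name` placed immediately after `anchor`."""
    # drop the column from its current position, then re-insert it after the anchor
    cols = [c for c in columns if c != name]
    cols.insert(cols.index(anchor) + 1, name)
    return cols

example = ['Date (MM/DD/YYYY)', 'Time (HH:MM)', 'ETR (W/m^2)', 'Time_fmt']
print(move_after(example, 'Time_fmt', 'Time (HH:MM)'))

# with the dataframe (assumes df already has the 'Time_fmt' column):
# df = df[move_after(df.columns.tolist(), 'Time_fmt', 'Time (HH:MM)')]
```

This keeps working even if a data set carries extra or missing columns, at the cost of no longer documenting the full column list in the notebook.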

Check the dataframe.

In [12]:
print(df[:3])

  Date (MM/DD/YYYY) Time (HH:MM)             Time_fmt  ETR (W/m^2)  \
0        01/01/1996        01:00  2000-01-01-00:00:00            0   
1        01/01/1996        02:00  2000-01-01-01:00:00            0   
2        01/01/1996        03:00  2000-01-01-02:00:00            0   

   ETRN (W/m^2)  GHI (W/m^2)  GHI source  GHI uncert (%)  DNI (W/m^2)  \
0             0            0           1               0            0   
1             0            0           1               0            0   
2             0            0           1               0            0   

   DNI source          ...            Alb (unitless)  Alb source  \
0           1          ...                       0.3           F   
1           1          ...                       0.3           F   
2           1          ...                       0.3           F   

   Alb uncert (code)  Lprecip depth (mm)  Lprecip quantity (hr)  \
0                  8                   0                      1   
1                  

Save to a new CSV, using the '_fmt' suffix to flag it as formatted.

In [13]:
out_path = './TMY_data/' + city + '_fmt.CSV'

df.to_csv(out_path,index=False)

And now it's ready for SQL (or whatever).
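To repeat the process for other cities, it helps to know which raw files have no formatted copy yet. The sketch below assumes the naming convention seen in the directory listing (`city_state.CSV` and `city_state_fmt.CSV`); `unformatted_cities` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real data.

```python
import tempfile
from pathlib import Path

def unformatted_cities(data_dir='./TMY_data'):
    """List city keys that have a raw .CSV but no matching _fmt.CSV yet."""
    names = {p.name for p in Path(data_dir).glob('*.CSV')}
    # raw files are the ones without the '_fmt' suffix; strip the extension to get the key
    raw = [n[:-len('.CSV')] for n in names if not n.endswith('_fmt.CSV')]
    return sorted(c for c in raw if c + '_fmt.CSV' not in names)

# demo on a temporary directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    for name in ['denver_CO.CSV', 'denver_CO_fmt.CSV', 'ely_MN.CSV']:
        (Path(tmp) / name).touch()
    pending = unformatted_cities(tmp)
print(pending)
```

Each returned key can then be assigned to `city` and the notebook re-run from the top.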