# Weather Data Parser
- ASOS data downloaded from https://mesonet.agron.iastate.edu/request/download.phtml?network=FR__ASOS

- Loads the weather data into a DataFrame ready for analysis
- Does not perform any analysis or irreversible modifications

As-downloaded data fields include:

  - station: three or four character site identifier
  - valid: timestamp of the observation
  - tmpf: Air Temperature in Fahrenheit, typically @ 2 meters
  - dwpf: Dew Point Temperature in Fahrenheit, typically @ 2 meters
  - relh: Relative Humidity in %
  - drct: Wind Direction in degrees from north
  - sknt: Wind Speed in knots 
  - p01i: One hour precipitation for the period from the observation time to the time of the previous hourly precipitation reset. This varies slightly by site. Values are in inches. This value may or may not contain frozen precipitation melted by some device on the sensor or estimated by some other means. Unfortunately, we do not know of an authoritative database denoting which station has which sensor.
  - alti: Pressure altimeter in inches
  - mslp: Sea Level Pressure in millibar
  - vsby: Visibility in miles
  - gust: Wind Gust in knots
  - skyc1: Sky Level 1 Coverage
  - skyc2: Sky Level 2 Coverage
  - skyc3: Sky Level 3 Coverage
  - skyc4: Sky Level 4 Coverage
  - skyl1: Sky Level 1 Altitude in feet
  - skyl2: Sky Level 2 Altitude in feet
  - skyl3: Sky Level 3 Altitude in feet
  - skyl4: Sky Level 4 Altitude in feet
  - wxcodes: Present Weather Codes (space separated)
  - metar: unprocessed reported observation in METAR format 
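
The fields above arrive in US customary and aviation units. As a minimal sketch of how they could be converted for analysis, the helpers below cover temperature, wind speed, altimeter pressure, and visibility. The function and constant names are hypothetical (not part of this notebook), and the visibility conversion assumes statute miles.

```python
# Hypothetical conversion helpers; names are illustrative, not from the notebook.
KNOT_TO_MS = 1852 / 3600      # 1 knot = 1 nautical mile (1852 m) per hour
INHG_TO_HPA = 33.8639         # 1 inch of mercury in hectopascals (millibar)
MILE_TO_KM = 1.609344         # assumes vsby is reported in statute miles


def f_to_c(temp_f):
    """Fahrenheit to Celsius (applies to tmpf and dwpf)."""
    return (temp_f - 32) * 5 / 9


def knots_to_ms(speed_kt):
    """Knots to metres per second (applies to sknt and gust)."""
    return speed_kt * KNOT_TO_MS


def inhg_to_hpa(pressure_inhg):
    """Inches of mercury to hectopascals (applies to alti)."""
    return pressure_inhg * INHG_TO_HPA


def miles_to_km(distance_mi):
    """Statute miles to kilometres (applies to vsby)."""
    return distance_mi * MILE_TO_KM
```

Because the helpers use only arithmetic operators, they should work unchanged on scalars and on pandas Series such as `df.tmpf`.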

In [91]:
# Preliminary setup
import pandas as pd
import zipfile
import os
import numpy as np

zipname = 'wx'
dataname = 'LFPO'
#nskiprows = int(1.1e5)
nskiprows = 0
#readrows = int(1.e5)
readrows = None # Comment out for smaller test load.

In [2]:
# Extract zip if necessary
if not os.path.isfile(dataname + '.txt'):
    # The context manager closes the archive even if extraction fails
    with zipfile.ZipFile('src/' + zipname + '.zip', 'r') as zip_ref:
        zip_ref.extractall('.')

In [133]:
# Define column types to save processor-intensive guesswork
coltypes = { 'alti': 'float64', 'drct': 'float64', 'dwpf': 'float64', 'gust': 'float64', 'metar': 'str', 'mslp': 'float64', 'p01i': 'float64', 'relh': 'float64', 'sknt': 'float64', 'skyc1': 'object', 'skyc2': 'object', 'skyc3': 'object', 'skyc4': 'object', 'skyl1': 'float64', 'skyl2': 'float64', 'skyl3': 'float64', 'skyl4': 'float64', 'station': 'object', 'tmpf': 'float64', 'valid': 'object', 'vsby': 'float64', 'wxcodes': 'object'}

# Following dict for read_csv na_values=null_m_cols
# Should allow M as null in the specified columns... but it doesn't.  Bug in Pandas?
#null_m_cols = {'alti': 'M', 'drct': 'M', 'dwpf': 'M', 'gust': 'M', 'mslp': 'M', 'p01i': 'M', 'relh': 'M', 'sknt': 'M', 'skyl1': ['M'], 'skyl2': 'M', 'skyl3': 'M', 'skyl4': 'M', 'tmpf': 'M', 'vsby': 'M'}
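
For reference, here is a small self-contained sketch of per-column `na_values` on an in-memory sample. The sample rows are made up for illustration and are not taken from the LFPO file. One thing worth checking, stated here only as a hypothesis, is that the dictionary keys must match the header text exactly, including any surrounding whitespace in the file's header row.

```python
import io

import pandas as pd

# Made-up sample rows in the same general shape as the download (illustrative only)
sample = io.StringIO(
    "station,valid,tmpf,skyc1\n"
    "LFPO,2006-12-16 00:00,41.0,M\n"
    "LFPO,2006-12-16 00:30,M,FEW\n"
)

# Treat 'M' as missing in tmpf only; skyc1 keeps its literal 'M'
na_by_column = {"tmpf": ["M"]}

sample_df = pd.read_csv(
    sample,
    na_values=na_by_column,
    dtype={"tmpf": "float64", "skyc1": "object"},
)

print(sample_df)
```

In this sketch the `M` in `tmpf` is read as NaN while the `M` in `skyc1` stays a string, which is the per-column behaviour the commented-out dictionary was aiming for.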

In [139]:
df = pd.read_csv(dataname + '.txt'
                 #,nrows=3000
                 ,low_memory=False
                 ,dtype=coltypes
                 ,parse_dates=['valid']
                 ,na_values='M'
                )
df.rename(columns=lambda x: x.strip(),inplace=True) # Remove spaces from header names

In [140]:
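# df.valid is in UTC. Use it as the index, then convert to Paris time
# (the 'CET' zone: UTC+1 in winter, UTC+2 during daylight saving time).
df.index = df.valid
# tz_localize and tz_convert return new objects, so assign the result back
df = df.tz_localize('UTC').tz_convert('CET')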

# Unit conversion
df['tmpc'] = (df.tmpf - 32) * 5/9
df['dwpc'] = (df.dwpf - 32) * 5/9

# Drop unwanted fields
df.drop(['station', 'valid', 'tmpf', 'dwpf', 'sknt', 'p01i', 'mslp'
    ,'vsby', 'gust', 'skyc1', 'skyc2', 'skyc3', 'skyc4'
    ,'skyl1', 'skyl2', 'skyl3', 'skyl4', 'wxcodes', 'metar'
        ],axis=1,inplace=True)
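
The time zone step can be checked in isolation on a tiny frame. This is a sketch with made-up values; it uses the `Europe/Paris` zone name as an assumption (an alternative spelling of the Paris zone, not the name used in the cell above) and shows the offset for both a winter and a summer timestamp.

```python
import pandas as pd

# Tiny tz-naive frame standing in for the UTC observations (values are made up)
naive_index = pd.date_range("2006-12-16 00:00", periods=3, freq="30min")
demo = pd.DataFrame({"tmpf": [41.0, 41.0, 39.2]}, index=naive_index)

# Mark the index as UTC, then convert it to Paris local time.
# Both calls return a new DataFrame, so the result is assigned back.
demo = demo.tz_localize("UTC").tz_convert("Europe/Paris")

# The same conversion for a single summer timestamp, when DST is in effect
summer = pd.Timestamp("2007-07-01 00:00", tz="UTC").tz_convert("Europe/Paris")

print(demo.index[0])
print(summer)
```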

In [141]:
# Store it for later cleanup
df.to_pickle("allwxdata.pickle")
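
A later notebook can load the stored frame with `pd.read_pickle`. The sketch below round-trips a small made-up frame through a temporary file (the path and values are illustrative, not the real `allwxdata.pickle`) to show that the timezone-aware index is preserved.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame with a timezone-aware index
frame = pd.DataFrame(
    {"tmpc": [5.0, 4.0]},
    index=pd.date_range("2006-12-16", periods=2, freq="30min", tz="UTC"),
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.pickle")  # illustrative file name
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.index.tz)
```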

In [142]:
df

valid,relh,drct,alti,tmpc,dwpc
2006-12-16 01:00:00+01:00,86.89,210.0,30.21,5.0,3.0
2006-12-16 01:30:00+01:00,86.89,210.0,30.21,5.0,3.0
2006-12-16 02:00:00+01:00,93.19,190.0,30.21,4.0,3.0
2006-12-16 02:30:00+01:00,93.19,180.0,30.21,4.0,3.0
2006-12-16 03:00:00+01:00,86.79,170.0,30.21,4.0,2.0
2006-12-16 03:30:00+01:00,86.79,190.0,30.21,4.0,2.0
2006-12-16 04:00:00+01:00,93.14,180.0,30.18,3.0,2.0
2006-12-16 04:30:00+01:00,93.19,190.0,30.18,4.0,3.0
2006-12-16 05:00:00+01:00,93.19,200.0,30.18,4.0,3.0
2006-12-16 05:30:00+01:00,93.19,190.0,30.18,4.0,3.0
