# Historical Weather data fetch
This notebook collects and processes hourly historical weather data for Pasadena, CA (Caltech and JPL location) and San Jose, CA (Office001 location), then saves it as CSV files.

Historical weather data is collected with the Meteostat Python library:
https://github.com/meteostat/meteostat-python

The notebook outputs two CSV files:
1. caltech_historical_weather.csv
2. office_historical_weather.csv

Resources (open these during development):
https://dev.meteostat.net/docs/formats.html#data-structure
https://dev.meteostat.net/formats.html#meteorological-parameters


# Create locations 
To create points (locations), we pass each location's latitude, longitude, and altitude (in meters) to `Point`. The altitudes for Pasadena and San Jose were taken from their respective weather stations.

In [5]:
# Import Meteostat library and dependencies
from datetime import datetime

import pandas as pd
from meteostat import Point, Hourly
import numpy as np

# Set time period
start = datetime(2018, 1, 1)
end = datetime(2021, 9, 14, 23, 59)

# Create location points using lat, long, altitude
cal_location = Point(34.1347, -118.1169, 227)
sjc_location = Point(37.33680466796926, -121.90743423142634, 25)

# Get hourly data
cal_data = Hourly(cal_location, start, end)
cal_data = cal_data.fetch()

sjc_data = Hourly(sjc_location, start, end)
sjc_data = sjc_data.fetch()

# Print DataFrame
print(sjc_data.head())


                     temp  dwpt  rhum  prcp  snow   wdir  wspd  wpgt    pres  \
time                                                                           
2018-01-01 00:00:00  12.9   8.0  72.0   NaN   NaN    NaN   NaN   NaN     NaN   
2018-01-01 01:00:00  13.3   5.7  60.0   0.0   NaN  290.0   7.6   NaN  1018.8   
2018-01-01 02:00:00  13.3   6.6  64.0   0.0   NaN  230.0   5.4   NaN  1019.6   
2018-01-01 03:00:00  13.3   6.6  64.0   0.0   NaN    NaN   0.0   NaN  1020.2   
2018-01-01 04:00:00  12.8   7.3  69.0   0.0   NaN    NaN   0.0   NaN  1020.4   

                     tsun  coco  
time                             
2018-01-01 00:00:00   NaN   NaN  
2018-01-01 01:00:00   NaN   NaN  
2018-01-01 02:00:00   NaN   5.0  
2018-01-01 03:00:00   NaN   5.0  
2018-01-01 04:00:00   NaN   5.0  


# Payload metadata
The DataFrame returned by the hourly request contains many fields. Below is the data dictionary for its columns.

| Column  | Description | Type |
|---------|-------------|------|
| station | The Meteostat ID of the weather station (only if query refers to multiple stations) | String |
| time | The datetime of the observation | Datetime64 |
| temp | The air temperature in °C | Float64 |
| dwpt | The dew point in °C | Float64 |
| rhum | The relative humidity in percent (%) | Float64 |
| prcp | The one-hour precipitation total in mm | Float64 |
| snow | The snow depth in mm | Float64 |
| wdir | The average wind direction in degrees (°) | Float64 |
| wspd | The average wind speed in km/h | Float64 |
| wpgt | The peak wind gust in km/h | Float64 |
| pres | The average sea-level air pressure in hPa | Float64 |
| tsun | The one-hour sunshine total in minutes (min) | Float64 |
| coco | The weather condition code | Float64 |
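
The sample output above shows that some of these columns are sparsely populated. As a rough sketch, the share of missing values per column can be checked with pandas. The small frame below is hand-copied from the first five San Jose rows printed earlier and stands in for `sjc_data`; the names `sample` and `missing_share` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hand-copied subset of the first five San Jose rows shown above (illustrative only)
sample = pd.DataFrame(
    {
        "temp": [12.9, 13.3, 13.3, 13.3, 12.8],
        "prcp": [np.nan, 0.0, 0.0, 0.0, 0.0],
        "snow": [np.nan] * 5,
        "wdir": [np.nan, 290.0, 230.0, np.nan, np.nan],
        "coco": [np.nan, np.nan, 5.0, 5.0, 5.0],
    }
)

# Fraction of missing values per column, from 0.0 (complete) to 1.0 (empty)
missing_share = sample.isna().mean()
print(missing_share)
```

Running the same `isna().mean()` call on the full `cal_data` or `sjc_data` frames would give a first indication of which columns are usable before deciding how to handle NaNs.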

# Weather condition codes
The `coco` column holds numeric codes that correspond to different weather conditions. Below is a table of the codes and their conditions.

 
|Code | 	Weather Condition   |
|---|----------------------|
|1 | Clear               |
|2 | 	Fair                |
|3| 	Cloudy              |
|4| 	Overcast            |
|5| 	Fog                 |
|6| 	Freezing Fog        |
|7| 	Light Rain          |
|8| 	Rain                |
|9| 	Heavy Rain          |
|10| 	Freezing Rain       |
|11| 	Heavy Freezing Rain |
|12| 	Sleet               |
|13| 	Heavy Sleet         |
|14| 	Light Snowfall      |
|15| 	Snowfall            |
|16| 	Heavy Snowfall      |
|17| Rain Shower         |
|18| 	Heavy Rain Shower   |
|19| 	Sleet Shower        |
|20| 	Heavy Sleet Shower  |
|21| 	Snow Shower         |
|22| 	Heavy Snow Shower   |
|23| 	Lightning           |
|24| 	Hail                |
|25| 	Thunderstorm        |
|26| 	Heavy Thunderstorm  |
|27| 	Storm               |

In [2]:
weather_codes = {1 : 'Clear',
                 2 : 'Fair',
                 3 : 'Cloudy',
                 4 : 'Overcast',
                 5 : 'Fog',
                 6 : 'Freezing Fog',
                 7 : 'Light Rain',
                 8 : 'Rain',
                 9 : 'Heavy Rain',
                 10 : 'Freezing Rain',
                 11 : 'Heavy Freezing Rain',
                 12 : 'Sleet',
                 13 : 'Heavy Sleet',
                 14 : 'Light Snowfall',
                 15 : 'Snowfall',
                 16 : 'Heavy Snowfall',
                 17 : 'Rain Shower',
                 18 : 'Heavy Rain Shower',
                 19 : 'Sleet Shower',
                 20 : 'Heavy Sleet Shower',
                 21 : 'Snow Shower',
                 22 : 'Heavy Snow Shower',
                 23 : 'Lightning',
                 24 : 'Hail',
                 25 : 'Thunderstorm',
                 26 : 'Heavy Thunderstorm',
                 27 : 'Storm',
                 }
# sjc_data['coco'].map(weather_codes)
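
Because `coco` arrives as Float64 and contains NaN, it helps to see how the lookup behaves. The sketch below repeats only a few entries of the dictionary under the hypothetical name `codes_subset` and maps a short, made-up series of codes. Codes with no matching key, including NaN, come back as NaN.

```python
import numpy as np
import pandas as pd

# Subset of the weather_codes dictionary above, repeated so the example is self-contained
codes_subset = {1: 'Clear', 5: 'Fog', 8: 'Rain'}

# Illustrative codes: floats as returned in the coco column, plus one missing value
coco = pd.Series([np.nan, 5.0, 1.0, 8.0])

# Integer keys match float codes with the same value; NaN stays NaN
conditions = coco.map(codes_subset)
print(conditions.tolist())
```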

In [15]:
def process_weather_historicals(df):
    """Map weather condition codes to labels, rename columns, and add wind speed in mph.

    Works on a copy so the input DataFrame is left unchanged and the cell can be re-run.
    """
    df = df.copy()
    df['coco'] = df['coco'].map(weather_codes)
    df = df.rename(columns={'temp':'temperature_degC','dwpt':'dewpoint_degC',
                     'rhum':'relative_humidity_%', 'prcp':'precipitation_mm',
                     'snow':'snow_depth_mm', 'wdir':'wind_direction_degrees',
                     'wspd':'wind_speed_avg_kmh', 'wpgt':'wind_gust_kmh',
                     'pres':'pressure_hpa','tsun':'sunshine_amount_min',
                     'coco':'weather_condition'})
    # TODO: process nans?
    # Convert km/h to mph (1 km/h = 0.621371 mph), rounded to 2 decimals
    df['wind_speed_mph'] = (df['wind_speed_avg_kmh'] * 0.621371).round(2)
    return df

process_weather_historicals(cal_data).to_csv('caltech_historical_weather.csv')
process_weather_historicals(sjc_data).to_csv('office_historical_weather.csv')
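
As a quick sanity check of the unit conversion used in `process_weather_historicals`, the same factor can be applied to a few plain numbers. The helper name `kmh_to_mph` is illustrative and not part of the notebook; the sample speeds are `wspd` values from the San Jose output printed earlier.

```python
KMH_TO_MPH = 0.621371  # same factor as in process_weather_historicals

def kmh_to_mph(speed_kmh):
    """Convert a wind speed from km/h to mph, rounded to two decimals."""
    return round(speed_kmh * KMH_TO_MPH, 2)

# wspd values (km/h) taken from the San Jose sample output
for speed_kmh in (0.0, 5.4, 7.6):
    print(speed_kmh, "km/h ->", kmh_to_mph(speed_kmh), "mph")
```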

In [16]:
weather_features = ['temperature_degC', 'dewpoint_degC','relative_humidity_%', 'wind_speed_mph']
pd.read_csv('caltech_historical_weather.csv').set_index('time')[weather_features]

time,temperature_degC,dewpoint_degC,relative_humidity_%,wind_speed_mph
2018-01-01 00:00:00,17.1,9.0,59.0,5.59
2018-01-01 01:00:00,17.3,9.5,60.0,5.84
2018-01-01 02:00:00,15.7,10.0,69.0,3.36
2018-01-01 03:00:00,14.5,11.3,81.0,4.72
2018-01-01 04:00:00,13.4,11.8,90.0,3.36
...,...,...,...,...
2021-09-14 19:00:00,26.8,10.1,35.0,5.84
2021-09-14 20:00:00,29.0,9.2,29.0,5.84
2021-09-14 21:00:00,31.2,5.6,20.0,8.08
2021-09-14 22:00:00,30.7,7.2,23.0,11.43
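
When the CSV is read back, the `time` column comes in as plain strings unless it is parsed. A minimal sketch, assuming the saved file has a `time` column followed by the weather columns: passing `index_col` and `parse_dates` to `pd.read_csv` restores a datetime index. The in-memory text below copies three rows from the output above and stands in for `caltech_historical_weather.csv`; the names `csv_text`, `weather`, and `first_hours` are illustrative.

```python
from io import StringIO

import pandas as pd

# Three rows copied from the output above, standing in for the saved CSV file
csv_text = """time,temperature_degC,dewpoint_degC,relative_humidity_%,wind_speed_mph
2018-01-01 00:00:00,17.1,9.0,59.0,5.59
2018-01-01 01:00:00,17.3,9.5,60.0,5.84
2018-01-01 02:00:00,15.7,10.0,69.0,3.36
"""

# Parse the time column and use it as the index in one step
weather = pd.read_csv(StringIO(csv_text), index_col='time', parse_dates=['time'])

# With a DatetimeIndex, label-based time slicing works
first_hours = weather.loc['2018-01-01 00:00':'2018-01-01 01:00']
print(first_hours)
```

With the real file, replacing `StringIO(csv_text)` with the file name should behave the same way, provided the saved index column is named `time`.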
