# Generate Data File For All Weather Stations

### This notebook combines the historical weather data for all the weather stations into a single CSV file.

This notebook reads the data from each weather station into a single data frame that serves as the labels for our classifier. I have included data from Jacksonville, Miami and Tampa Bay International Airports in 'jacksonville-temp.csv', 'miami-temp.csv' and 'tampa-temp.csv'. The code below strips the `-temp.csv` suffix to build the column names, so any file you add should follow the same naming pattern. Feel free to add more Florida weather stations and see whether this makes a difference.

The data for the individual weather stations was downloaded from https://climatecenter.fsu.edu/climate-data-access-tools/downloadable-data. The date range, 01.01.2000 to 01.01.2012, is an arbitrary choice; using dates outside this range requires modifying the script.
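Before combining, it can be worth confirming that a station file actually spans the expected range. The following is only a sketch: `date_range_of` is a hypothetical helper, and the small in-memory CSV is a made-up stand-in for a real station file. It reuses the space-prefixed column names (` YEAR`, ` MONTH`, ` DAY`) that the code in this notebook reads.

```python
import io
import pandas as pd

def date_range_of(station_df):
    """Return (first, last) date of a station frame as YYYYMMDD integers.

    Hypothetical helper; assumes the space-prefixed " YEAR", " MONTH" and
    " DAY" columns used elsewhere in this notebook.
    """
    keys = (station_df[" YEAR"].astype(int) * 10000
            + station_df[" MONTH"].astype(int) * 100
            + station_df[" DAY"].astype(int))
    return int(keys.min()), int(keys.max())

# Toy stand-in for a station file (values made up for illustration).
sample_csv = io.StringIO(
    " YEAR, MONTH, DAY, PRECIPITATION\n"
    "2000,1,1,0.00\n"
    "2000,1,2,0.10\n"
    "2012,1,1,0.00\n"
)
first, last = date_range_of(pd.read_csv(sample_csv))
print(first, last)
```

With a real file, pass `pd.read_csv("miami-temp.csv")` (or any other station file) instead of the toy frame and compare the result against the range you expect.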

In [44]:
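import os
import pandas as pd

# Collect the station csvs in the working directory, skipping the combined
# output file so it is not read back in as a station on a re-run.
# sorted() keeps the column order reproducible; os.listdir() order is arbitrary.
weather_stations = []
for file in sorted(os.listdir()):
    if file.endswith(".csv") and file != "all_stations_temp.csv":
        weather_stations.append(file)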


In [39]:
df = pd.DataFrame() # empty data frame to store all the data in

# loop through the weather stations, adding each station's weather columns to the data frame
for i, station in enumerate(weather_stations):
    station_df = pd.read_csv(station) # open station as df
    name = station[:station.rfind("-temp.csv")] # e.g. "miami" from "miami-temp.csv"
    if i == 0: # take the dates from the first csv file (assumes all stations list the same days in the same order)
        df["date"] = station_df[" YEAR"].astype(int)*10000 + station_df[" MONTH"].astype(int)*100 + station_df[" DAY"].astype(int) # YYYYMMDD
    df[name + " precipitation"] = station_df[" PRECIPITATION"]
    df[name + " mean temp"] = station_df[" MEAN TEMP"]
    df[name + " max temp"] = station_df[" MAX TEMP"]
    df[name + " min temp"] = station_df[" MIN TEMP"]

df = df.set_index('date') # set_index returns a new frame, so assign it back
df.head(10) # preview the first ten rows

date,jacksonville precipitation,jacksonville mean temp,jacksonville max temp,jacksonville min temp,miami precipitation,miami mean temp,miami max temp,miami min temp,tampa precipitation,tampa mean temp,tampa max temp,tampa min temp
20000101,0.00,64.5,76.0,53.0,0.00,72.0,81.0,63.0,0.00,69.0,77.0,61.0
20000102,0.00,63.5,77.0,50.0,0.00,72.5,79.0,66.0,0.00,69.5,80.0,59.0
20000103,0.00,64.0,79.0,49.0,0.00,72.0,80.0,64.0,0.00,70.0,80.0,60.0
20000104,0.18,69.0,82.0,56.0,0.00,75.5,80.0,71.0,0.00,72.0,79.0,65.0
20000105,0.00,50.5,60.0,41.0,0.24,73.5,80.0,67.0,0.00,60.0,67.0,53.0
20000106,0.00,55.0,70.0,40.0,0.00,76.0,81.0,71.0,0.52,69.5,80.0,59.0
20000107,0.30,56.0,64.0,48.0,0.00,74.0,80.0,68.0,0.19,70.0,76.0,64.0
20000108,0.00,58.0,71.0,45.0,0.00,73.0,80.0,66.0,0.00,66.5,71.0,62.0
20000109,0.00,65.0,80.0,50.0,0.00,73.0,81.0,65.0,0.00,68.0,76.0,60.0
20000110,0.03,63.5,73.0,54.0,0.00,75.5,81.0,70.0,0.09,72.0,79.0,65.0
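
The integer `date` key (YYYYMMDD) sorts correctly but does not support date arithmetic or selection by month. As a sketch, assuming a real `DatetimeIndex` is wanted, `pd.to_datetime` with an explicit format can convert it. The frame `df_demo` is a hypothetical stand-in holding two Miami values from the rows shown above.

```python
import pandas as pd

# Stand-in for the combined frame: two rows taken from the preview above.
df_demo = pd.DataFrame(
    {"miami mean temp": [72.0, 72.5]},
    index=pd.Index([20000101, 20000102], name="date"),
)

# Convert the integer keys to strings, then parse them as year-month-day.
df_demo.index = pd.to_datetime(df_demo.index.astype(str), format="%Y%m%d")

# With a DatetimeIndex, a whole month can be selected by a partial date string.
january = df_demo.loc["2000-01"]
print(january)
```

Whether to convert is a design choice: the integer key keeps the saved CSV simple, while a `DatetimeIndex` is more convenient for time-based slicing.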


In [None]:
df.to_csv('all_stations_temp.csv')
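
To use the combined file later, it can be read back with `date` as the index. This is a minimal sketch under two assumptions: the frame was saved with `date` as its index (which requires assigning the result of `set_index` back to `df`), and a two-row stand-in frame written to a temporary directory is used here instead of the real `all_stations_temp.csv`. The file name `all_stations_demo.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Stand-in for the combined frame, indexed by the YYYYMMDD integer key.
demo = pd.DataFrame(
    {"tampa max temp": [77.0, 80.0]},
    index=pd.Index([20000101, 20000102], name="date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "all_stations_demo.csv")  # hypothetical file name
    demo.to_csv(path)  # the index is written as the first column, named "date"
    restored = pd.read_csv(path, index_col="date")  # read it back as the index

print(restored)
```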