# Weather Data Preprocessing Notebook

This Python 3 notebook preprocesses weather data (temperature and precipitation) for Florida from January 1 to July 3, 2020. It extracts the records from the weather CSV files and summarizes them for that period.

## Libraries

Before running the cells of this notebook, the following libraries must be installed in your Python environment:
- `pandas`
- `tqdm`

These libraries can be installed via `pip`: `pip install <library-name>`. Run the cell below to load the libraries.

In [145]:
import pandas as pd
import os
from tqdm.notebook import tqdm

# PART 1: Concatenating all datasets

Each file contains daily precipitation and temperature data (maximum, minimum, and mean) from a single weather station, identified by its `COOPID`.

In [146]:
df = pd.read_csv('80478_2020_1_1_2020.csv')
df

Unnamed: 0,COOPID,YEAR,MONTH,DAY,PRECIPITATION,MAX TEMP,MIN TEMP,MEAN TEMP
0,80478,2020,1,1,0.00,70.0,50.0,60.0
1,80478,2020,1,2,0.00,75.0,52.0,63.5
2,80478,2020,1,3,0.00,83.0,52.0,67.5
3,80478,2020,1,4,0.00,87.0,70.0,78.5
4,80478,2020,1,5,0.67,80.0,45.0,62.5
...,...,...,...,...,...,...,...,...
361,80478,2020,12,27,0.00,56.0,35.0,45.5
362,80478,2020,12,28,0.00,70.0,41.0,55.5
363,80478,2020,12,29,0.00,75.0,48.0,61.5
364,80478,2020,12,30,0.00,75.0,52.0,63.5


In [147]:
df.columns

Index(['COOPID', ' YEAR', ' MONTH', ' DAY', ' PRECIPITATION', ' MAX TEMP',
       ' MIN TEMP', ' MEAN TEMP'],
      dtype='object')
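The column names above carry a leading space (`' YEAR'`, `' MONTH'`, and so on), which suggests that the files separate fields with a comma followed by a space. The sketch below shows two common ways to normalize such headers at load time. The in-memory sample is an assumption that mimics the station files; it is not read from them.

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics the ", "-separated layout of a station file.
raw = (
    "COOPID, YEAR, MONTH, DAY, PRECIPITATION, MAX TEMP, MIN TEMP, MEAN TEMP\n"
    "80478, 2020, 1, 1, 0.00, 70.0, 50.0, 60.0\n"
    "80478, 2020, 1, 2, 0.00, 75.0, 52.0, 63.5\n"
)

# Option 1: let the parser skip the space that follows each delimiter.
sample_a = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Option 2: read the file as-is, then strip whitespace from every column name.
sample_b = pd.read_csv(io.StringIO(raw))
sample_b.columns = sample_b.columns.str.strip()

print(list(sample_a.columns))
print(list(sample_b.columns))
```

Either option would make a manual `rename()` of each padded column unnecessary.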

We use `pd.concat()` to combine the data from all files into one dataframe.

In [148]:
path = os.getcwd()
frames = []

# Read every CSV file in the working directory.
# The notebook itself and '.ipynb_checkpoints' are skipped because they are not '.csv' files.
for file_name in tqdm(sorted(os.listdir(path))):
    name, extension = os.path.splitext(file_name)
    if extension.lower() != '.csv':
        continue

    frames.append(pd.read_csv(os.path.join(path, file_name)))

# Concatenate once at the end instead of growing the dataframe inside the loop.
weather_data = pd.concat(frames, axis=0)

# The raw column names have a leading space (see df.columns above), so rename them.
weather_data.rename(columns = {' MONTH':'MONTH', ' DAY':'DAY', ' YEAR':'YEAR', ' PRECIPITATION':'PRECIPITATION', ' MAX TEMP':'MAX TEMP', ' MIN TEMP': 'MIN TEMP', ' MEAN TEMP':'MEAN TEMP'}, inplace = True)

  0%|          | 0/40 [00:00<?, ?it/s]
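One detail of `pd.concat()` is worth keeping in mind: by default each input keeps its own row labels, so the combined dataframe's index restarts at 0 for every station. The sketch below illustrates this with two small, hypothetical frames standing in for the station files.

```python
import pandas as pd

# Two hypothetical station frames standing in for the per-station CSV files.
station_a = pd.DataFrame({'COOPID': [80478, 80478], 'MEAN TEMP': [60.0, 63.5]})
station_b = pd.DataFrame({'COOPID': [89525, 89525], 'MEAN TEMP': [60.5, 66.5]})

# Default behaviour: each frame keeps its own index, so the labels repeat.
stacked = pd.concat([station_a, station_b], axis=0)
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True assigns a fresh 0..n-1 index instead.
renumbered = pd.concat([station_a, station_b], axis=0, ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

Repeated labels are harmless for the grouping done later in this notebook, but `ignore_index=True` may be preferable if rows are ever selected by label.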

We can drop the following fields since they are not needed for the analysis:

- `MAX TEMP`
- `MIN TEMP`

We can also drop the `YEAR`, `MONTH`, and `DAY` fields and replace them with a single `DATE` field in `MM/DD` format, since the year is the same for all records.

In [149]:
weather_data['DATE'] = weather_data['MONTH'].apply(lambda x: '{0:0>2}'.format(x)) + '/' + weather_data['DAY'].apply(lambda x: '{0:0>2}'.format(x))

weather_data = weather_data.drop(['MAX TEMP', 'MIN TEMP', 'YEAR', 'MONTH', 'DAY'],axis=1)
weather_data

Unnamed: 0,COOPID,PRECIPITATION,MEAN TEMP,DATE
0,80478,0.00,60.0,01/01
1,80478,0.00,63.5,01/02
2,80478,0.00,67.5,01/03
3,80478,0.00,78.5,01/04
4,80478,0.67,62.5,01/05
...,...,...,...,...
361,89525,0.21,60.5,12/27
362,89525,0.01,66.5,12/28
363,89525,0.01,71.5,12/29
364,89525,0.01,74.5,12/30
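The `MM/DD` strings can also be built without a per-row `lambda`. The following is a minimal sketch of a vectorized alternative, assuming `MONTH` and `DAY` hold integers; the three sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical month/day pairs; the real values come from the station files.
dates = pd.DataFrame({'MONTH': [1, 7, 12], 'DAY': [5, 3, 30]})

# Vectorized zero-padding: cast to string, then left-pad each value to width 2.
dates['DATE'] = (
    dates['MONTH'].astype(str).str.zfill(2)
    + '/'
    + dates['DAY'].astype(str).str.zfill(2)
)

print(dates['DATE'].tolist())  # ['01/05', '07/03', '12/30']
```

Because the month comes first and both parts are zero-padded, sorting these strings alphabetically also sorts them chronologically within a single year.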


# PART 2: Categorizing COOPID into counties

We now map each station (`COOPID`) to its county. The following code block lists the county of each Florida station, then averages the readings of all stations in a county for each date.

In [150]:
station_county = {
    80478: 'Polk', 82150: 'Volusia', 82158: 'Volusia', 82229: 'Volusia',
    82944: 'Nassau', 83163: 'Broward', 83168: 'Broward', 83186: 'Lee',
    83326: 'Alachua', 83470: 'Baker', 83909: 'Miami-Dade', 83986: 'Hillsborough',
    84289: 'Citrus', 84358: 'Duval', 84366: 'Duval', 84394: 'Hamilton',
    84731: 'Columbia', 85612: 'Brevard', 85658: 'Miami-Dade', 85663: 'Miami-Dade',
    85973: 'Polk', 86065: 'Sarasota', 86078: 'Collier', 86414: 'Marion',
    86628: 'Orange', 86842: 'Bay', 86997: 'Escambia', 87020: 'Miami-Dade',
    87205: 'Hillsborough', 87760: 'Palm Beach', 87886: 'Pinellas',
    87982: 'Seminole', 88620: 'Martin', 88758: 'Leon', 88788: 'Hillsborough',
    88824: 'Pinellas', 88942: 'Brevard', 89176: 'Sarasota', 89525: 'Palm Beach'
}

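weather_data['COUNTY'] = weather_data['COOPID'].map(station_county)

# Average precipitation and mean temperature over all stations in each county, per date.
# Selecting the two value columns up front avoids averaging (and then dropping) COOPID.
final = weather_data.groupby(['COUNTY', 'DATE'])[['PRECIPITATION', 'MEAN TEMP']].mean().reset_index()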

final

Unnamed: 0,COUNTY,DATE,PRECIPITATION,MEAN TEMP
0,Alachua,01/01,0.000000,52.500000
1,Alachua,01/02,0.000000,61.500000
2,Alachua,01/03,0.000010,73.000000
3,Alachua,01/04,0.680000,63.000000
4,Alachua,01/05,0.000000,50.000000
...,...,...,...,...
9145,Volusia,12/27,0.000000,45.500000
9146,Volusia,12/28,0.000000,54.333333
9147,Volusia,12/29,0.006667,60.833333
9148,Volusia,12/30,0.000003,63.666667
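`Series.map()` returns `NaN` for any `COOPID` that is missing from the dictionary, and `groupby()` drops `NaN` keys by default, so an unmapped station would silently vanish from the summary. The sketch below shows one way to detect that case before aggregating. The readings, the shortened `county_lookup` dictionary, and station `99999` are hypothetical.

```python
import pandas as pd

# Hypothetical readings; station 99999 is deliberately absent from the mapping.
readings = pd.DataFrame({
    'COOPID': [82150, 82158, 80478, 99999],
    'DATE': ['01/01', '01/01', '01/01', '01/01'],
    'PRECIPITATION': [0.0, 0.5, 0.0, 1.0],
    'MEAN TEMP': [50.0, 60.0, 60.0, 70.0],
})
county_lookup = {80478: 'Polk', 82150: 'Volusia', 82158: 'Volusia'}

readings['COUNTY'] = readings['COOPID'].map(county_lookup)

# Stations missing from the mapping get NaN in the COUNTY column.
unmapped = readings.loc[readings['COUNTY'].isna(), 'COOPID'].unique().tolist()
print(unmapped)  # [99999]

# Rows with a NaN county are excluded from the grouped result.
summary = (
    readings.groupby(['COUNTY', 'DATE'])[['PRECIPITATION', 'MEAN TEMP']]
    .mean()
    .reset_index()
)
print(summary)
```

In this sample the two Volusia stations are averaged into a single row, while the unmapped station contributes nothing to the summary.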


# PART 3: Exporting Data

We can now export the data to a readable CSV file.

In [151]:
final.to_csv('../florida_weather_data.csv', encoding='utf-8', index=False)
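A quick way to confirm that an export is readable is to load the file back and compare it with the dataframe that was written. The sketch below does this in a temporary directory with a small, hypothetical stand-in for `final`, so it does not touch the real output file.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for the `final` dataframe.
summary = pd.DataFrame({
    'COUNTY': ['Alachua', 'Volusia'],
    'DATE': ['01/01', '12/30'],
    'PRECIPITATION': [0.0, 0.25],
    'MEAN TEMP': [52.5, 63.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'florida_weather_data.csv')
    summary.to_csv(out_path, encoding='utf-8', index=False)
    # Explicitly read DATE as text so the zero-padded 'MM/DD' values stay strings.
    reloaded = pd.read_csv(out_path, dtype={'DATE': str})

# Compare column by column as plain Python lists.
round_trip_ok = reloaded.to_dict('list') == summary.to_dict('list')
print(round_trip_ok)
```

Passing `index=False`, as the export cell does, keeps the row index out of the file, so no extra unnamed column appears when the CSV is read back.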