<div style="background-color:#FCE205; padding:10px; border-radius:5px; color:black; font-weight:bold;">
    <h3>Processing Weather Data to Quarterly Intervals</h3>
</div>

The weather data collected in notebook 1a is daily and has to be condensed to quarterly intervals to align with the bee dataset.

In [12]:
# Import libraries
import pandas as pd
import os

In [13]:
# Directory containing the imported data files (relative to the current working directory)
ITM_DIR = os.path.join(os.getcwd(), '../data/import')

In [14]:
# Read the six weather CSV parts and concatenate them into a single DataFrame
file_names = [f'Hourly_Weather_Data_part{i}.csv' for i in range(1, 7)]
weather_data = pd.concat([pd.read_csv(os.path.join(ITM_DIR, file)) for file in file_names], ignore_index=True)
weather_data.shape

(146150, 12)

In [15]:
# confirm date is indeed daily
weather_data['date']

0         2015-01-01 00:00:00+00:00
1         2015-01-02 00:00:00+00:00
2         2015-01-03 00:00:00+00:00
3         2015-01-04 00:00:00+00:00
4         2015-01-05 00:00:00+00:00
                    ...            
146145    2022-12-28 00:00:00+00:00
146146    2022-12-29 00:00:00+00:00
146147    2022-12-30 00:00:00+00:00
146148    2022-12-31 00:00:00+00:00
146149    2023-01-01 00:00:00+00:00
Name: date, Length: 146150, dtype: object
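
The printed head and tail suggest daily spacing, but they only show ten rows. A programmatic check could look like the minimal sketch below. It runs on toy data with the same column names; `is_daily` is a hypothetical helper that is not part of the notebook, and it assumes each location should have exactly one row per day.

```python
import pandas as pd

# Toy stand-in for weather_data: two locations, three consecutive days each
toy = pd.DataFrame({
    "date": ["2015-01-01 00:00:00+00:00",
             "2015-01-02 00:00:00+00:00",
             "2015-01-03 00:00:00+00:00"] * 2,
    "latitude": [20.9] * 3 + [61.4] * 3,
    "longitude": [-156.2] * 3 + [-152.4] * 3,
})

def is_daily(df):
    """Hypothetical helper: True if every location has one-day gaps between consecutive rows."""
    parsed = df.assign(date=pd.to_datetime(df["date"], utc=True))
    parsed = parsed.sort_values(["latitude", "longitude", "date"])
    # Difference between consecutive dates within each latitude/longitude pair
    gaps = parsed.groupby(["latitude", "longitude"])["date"].diff().dropna()
    return bool((gaps == pd.Timedelta(days=1)).all())

daily_ok = is_daily(toy)
```

On the real data the same call would be `is_daily(weather_data)`; a `False` result would point to missing or duplicated days for at least one location.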

In [16]:
weather_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146150 entries, 0 to 146149
Data columns (total 12 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   date                       146150 non-null  object 
 1   latitude                   146150 non-null  float64
 2   longitude                  146150 non-null  float64
 3   wind_speed_10m_max         146150 non-null  float64
 4   weather_code               146150 non-null  float64
 5   temperature_2m_mean        146150 non-null  float64
 6   temperature_2m_max         146150 non-null  float64
 7   temperature_2m_min         146150 non-null  float64
 8   precipitation_hours        146150 non-null  float64
 9   relative_humidity_2m_mean  146150 non-null  float64
 10  relative_humidity_2m_max   146150 non-null  float64
 11  relative_humidity_2m_min   146150 non-null  float64
dtypes: float64(11), object(1)
memory usage: 13.4+ MB


<div style="background-color:#FCE205; padding:10px; border-radius:5px; color:black; font-weight:bold;">
    <h3>Condense weather data to represent Quarterly information</h3>
    </div>

- For the weather code, the number of days on which each code occurs during the quarter is counted.

    (e.g. weather_code_63 = 5 means that code 63 was the most severe weather code on 5 days of the quarter)

- For the daily mean temperature, both the mean and the sum are taken across all days of the quarter and stored with the suffixes meanmean and meansum respectively.

- For the daily mean relative humidity, both the mean and the sum are taken across all days of the quarter and stored with the suffixes meanmean and meansum respectively.

- For the daily maxima and minima, the quarterly maximum or minimum is taken (suffixes maxmax and minmin), and the precipitation hours are summed across the quarter.
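
To make these rules concrete, here is a small worked sketch on made-up data (four days at one location, all values invented for illustration). It follows the same pattern as the cells below: one indicator column per weather code, then a group-by on location and quarter.

```python
import pandas as pd

# Invented toy data: three days in Q1 and one day in Q2 at a single location
toy = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-10", "2015-02-10", "2015-03-10", "2015-04-10"]),
    "latitude": [20.9] * 4,
    "longitude": [-156.2] * 4,
    "weather_code": [63, 63, 3, 3],
    "temperature_2m_mean": [10.0, 20.0, 30.0, 40.0],
})

# One 0/1 indicator column per weather code
for code in toy["weather_code"].unique():
    toy[f"weather_code_{code}"] = (toy["weather_code"] == code).astype(int)

toy["quarter"] = toy["date"].dt.to_period("Q")

toy_summary = toy.groupby(["latitude", "longitude", "quarter"]).agg(
    {
        "temperature_2m_mean": ["mean", "sum"],  # becomes meanmean / meansum
        "weather_code_63": "sum",                # days with code 63
        "weather_code_3": "sum",                 # days with code 3
    }
).reset_index()
```

For 2015Q1 the three daily means 10, 20 and 30 give a quarterly mean of 20 and a sum of 60, with code 63 counted on two days and code 3 on one.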

In [17]:
weather_data['weather_code'] = weather_data['weather_code'].astype(int)

In [18]:
# Turn weather_code into separate columns per code with 1 or 0
unique_weather_codes = weather_data['weather_code'].unique()
for code in unique_weather_codes:
    weather_data[f'weather_code_{code}'] = (weather_data['weather_code'] == code).astype(int)

In [19]:
# Condense data into a quarterly summary per latitude/longitude combination
weather_data['quarter'] = pd.to_datetime(weather_data['date']).dt.to_period('Q')

# Group by latitude, longitude, and quarter
quarterly_summary = weather_data.groupby(['latitude', 'longitude', 'quarter']).agg(
    {   # Daily means: take both the quarterly mean and the quarterly sum
        'temperature_2m_mean': ['mean', 'sum'],
        'relative_humidity_2m_mean': ['mean', 'sum'],
        # Daily maxima: take the quarterly maximum
        'wind_speed_10m_max': 'max',
        'temperature_2m_max': 'max',
        'relative_humidity_2m_max': 'max',
        # Daily minima: take the quarterly minimum
        'temperature_2m_min': 'min',
        'relative_humidity_2m_min': 'min',
        # Total hours of precipitation across the quarter
        'precipitation_hours': 'sum',
        # Number of days each weather code occurred in the quarter
        'weather_code_53': 'sum',
        'weather_code_63': 'sum',
        'weather_code_61': 'sum',
        'weather_code_65': 'sum',
        'weather_code_3': 'sum',
        'weather_code_2': 'sum',
        'weather_code_0': 'sum',
        'weather_code_51': 'sum',
        'weather_code_1': 'sum',
        'weather_code_55': 'sum',
        'weather_code_71': 'sum',
        'weather_code_75': 'sum',
        'weather_code_73': 'sum',
    }
).reset_index()


quarterly_summary


Unnamed: 0_level_0,latitude,longitude,quarter,temperature_2m_mean,temperature_2m_mean,relative_humidity_2m_mean,relative_humidity_2m_mean,wind_speed_10m_max,temperature_2m_max,relative_humidity_2m_max,...,weather_code_65,weather_code_3,weather_code_2,weather_code_0,weather_code_51,weather_code_1,weather_code_55,weather_code_71,weather_code_75,weather_code_73
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,mean,sum,mean,sum,max,max,max,...,sum,sum,sum,sum,sum,sum,sum,sum,sum,sum
0,20.902977,-156.207483,2015Q1,23.107811,2079.703006,73.154187,6583.876815,41.500053,29.044500,98.460330,...,0,18,10,1,28,9,7,0,0,0
1,20.902977,-156.207483,2015Q2,24.647683,2242.939170,73.468107,6685.597706,44.670470,30.744501,94.885765,...,0,7,5,0,48,7,3,0,0,0
2,20.902977,-156.207483,2015Q3,27.289700,2510.652412,76.519537,7039.797368,40.104060,31.844501,96.768500,...,0,5,2,0,53,5,4,0,0,0
3,20.902977,-156.207483,2015Q4,25.447535,2341.173230,76.703521,7056.723919,46.249844,30.494501,95.030210,...,0,3,10,0,39,3,6,0,0,0
4,20.902977,-156.207483,2016Q1,23.835366,2169.018340,70.748610,6438.123503,38.706280,29.694500,94.330864,...,0,12,15,1,29,19,2,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1645,61.370716,-152.404419,2022Q1,-16.669162,-1500.224616,73.232969,6590.967244,15.175612,-10.366501,98.426600,...,0,13,0,6,0,2,0,10,33,26
1646,61.370716,-152.404419,2022Q2,-8.186693,-744.989041,69.424942,6317.669709,9.021574,3.783500,99.622440,...,0,43,5,3,1,4,0,13,8,12
1647,61.370716,-152.404419,2022Q3,-4.537017,-417.405525,89.136554,8200.562925,10.137692,3.733500,100.000000,...,0,16,0,0,9,0,1,9,22,19
1648,61.370716,-152.404419,2022Q4,-15.110500,-1390.165955,64.777468,5959.527062,12.224107,-1.316500,100.000000,...,0,22,5,5,0,1,0,12,22,25
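
Because the `date` strings carry a `+00:00` offset, `pd.to_datetime` returns timezone-aware timestamps, and pandas typically emits a `UserWarning` when these are converted to periods, since periods cannot store a timezone. One way to make the conversion explicit is sketched below on two sample values; the variable names are illustrative only.

```python
import pandas as pd

# Two sample date strings in the same format as the weather data
dates = pd.Series(["2015-01-01 00:00:00+00:00", "2023-01-01 00:00:00+00:00"])

# Parse as UTC, drop the timezone explicitly, then convert to quarterly periods
parsed = pd.to_datetime(dates, utc=True)
quarters = parsed.dt.tz_localize(None).dt.to_period("Q")
```

Since the timestamps are all at midnight UTC, dropping the timezone does not move any date into a different quarter.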


In [20]:
# Flatten the MultiIndex columns into single-level names,
# e.g. ('temperature_2m_mean', 'sum') -> 'temperature_2m_mean_sum'
quarterly_summary.columns = ['_'.join(col).strip() for col in quarterly_summary.columns.values]
quarterly_summary.columns = quarterly_summary.columns.str.replace(' ', '')
quarterly_summary.columns = quarterly_summary.columns.str.replace(')', '', regex=False)
# Drop the underscore after the daily statistic,
# e.g. 'temperature_2m_mean_sum' -> 'temperature_2m_meansum'
quarterly_summary.columns = quarterly_summary.columns.str.replace('mean_', 'mean')
quarterly_summary.columns = quarterly_summary.columns.str.replace('sum_', 'sum')
quarterly_summary.columns = quarterly_summary.columns.str.replace('max_', 'max')
quarterly_summary.columns = quarterly_summary.columns.str.replace('min_', 'min')

In [21]:
quarterly_summary.columns

Index(['latitude_', 'longitude_', 'quarter_', 'temperature_2m_meanmean',
       'temperature_2m_meansum', 'relative_humidity_2m_meanmean',
       'relative_humidity_2m_meansum', 'wind_speed_10m_maxmax',
       'temperature_2m_maxmax', 'relative_humidity_2m_maxmax',
       'temperature_2m_minmin', 'relative_humidity_2m_minmin',
       'precipitation_hours_sum', 'weather_code_53_sum', 'weather_code_63_sum',
       'weather_code_61_sum', 'weather_code_65_sum', 'weather_code_3_sum',
       'weather_code_2_sum', 'weather_code_0_sum', 'weather_code_51_sum',
       'weather_code_1_sum', 'weather_code_55_sum', 'weather_code_71_sum',
       'weather_code_75_sum', 'weather_code_73_sum'],
      dtype='object')
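
The flattening step is needed because passing a dict with lists of functions to `agg` produces MultiIndex columns. As an alternative, pandas named aggregation assigns the final column names directly. The sketch below uses invented toy data and only a few of the columns; it is one possible approach, not the notebook's own method.

```python
import pandas as pd

# Invented toy data: two days in the same quarter at one location
toy = pd.DataFrame({
    "latitude": [20.9, 20.9],
    "longitude": [-156.2, -156.2],
    "quarter": pd.PeriodIndex(["2015Q1", "2015Q1"], freq="Q"),
    "temperature_2m_mean": [22.0, 24.0],
    "wind_speed_10m_max": [30.0, 41.5],
    "weather_code_63": [1, 0],
})

# Named aggregation: output_name=(input_column, function)
flat_summary = toy.groupby(["latitude", "longitude", "quarter"]).agg(
    temperature_2m_meanmean=("temperature_2m_mean", "mean"),
    temperature_2m_meansum=("temperature_2m_mean", "sum"),
    wind_speed_10m_maxmax=("wind_speed_10m_max", "max"),
    moderate_rain_sum=("weather_code_63", "sum"),
).reset_index()
```

The result already has single-level columns, so no string replacements or renaming of `latitude_`, `longitude_` and `quarter_` would be required.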

In [22]:
# Rename columns: drop the trailing underscores and replace weather codes with descriptive names
quarterly_summary = quarterly_summary.rename(columns=
               {"latitude_": 'latitude', 
                "longitude_": 'longitude',
                "quarter_": 'quarter',

                "weather_code_53_sum": 'moderate_drizzle_sum',
                "weather_code_63_sum": 'moderate_rain_sum',
                "weather_code_61_sum": 'light_rain_sum',
                "weather_code_65_sum": 'heavy_rain_sum',
                "weather_code_3_sum": 'overcast_sum',
                "weather_code_2_sum": 'partly_cloudy_sum',
                "weather_code_0_sum": 'clear_sky_sum',
                "weather_code_51_sum": 'light_drizzle_sum',
                "weather_code_1_sum": 'mainly_clear_sum',
                "weather_code_55_sum": 'heavy_drizzle_sum',
                "weather_code_71_sum": 'light_snow_sum',
                "weather_code_75_sum": 'heavy_snow_sum',
                "weather_code_73_sum": 'moderate_snow_sum',        
                })

<div style="background-color:#FCE205; padding:10px; border-radius:5px; color:black; font-weight:bold;">
    <h3>Add state names to their respective longitudes and latitudes</h3>
    </div>

In [40]:
# read in us_states_lat_long.csv
ITM_DIR = os.path.join(os.getcwd(), '../data/import')

us_states_lat_long = pd.read_csv(os.path.join(ITM_DIR, 'us_states_lat_long.csv'))

# left join weather data to us_states_lat_long on latitude and longitude
weather_data_quarter_states = pd.merge(us_states_lat_long, quarterly_summary, on=['latitude', 'longitude'], how='left')

weather_data_quarter_states.isnull().sum()

state                            0
latitude                         0
longitude                        0
quarter                          0
temperature_2m_meanmean          0
temperature_2m_meansum           0
relative_humidity_2m_meanmean    0
relative_humidity_2m_meansum     0
wind_speed_10m_maxmax            0
temperature_2m_maxmax            0
relative_humidity_2m_maxmax      0
temperature_2m_minmin            0
relative_humidity_2m_minmin      0
precipitation_hours_sum          0
moderate_drizzle_sum             0
moderate_rain_sum                0
light_rain_sum                   0
heavy_rain_sum                   0
overcast_sum                     0
partly_cloudy_sum                0
clear_sky_sum                    0
light_drizzle_sum                0
mainly_clear_sum                 0
heavy_drizzle_sum                0
light_snow_sum                   0
heavy_snow_sum                   0
moderate_snow_sum                0
dtype: int64
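
The join keys are floating-point coordinates, so the merge only matches rows whose latitude and longitude are exactly equal in both files. The null counts above indicate that every state found a match here. A more explicit check could use the `indicator` and `validate` parameters of `pd.merge`, as in this sketch on invented toy data (the state labels and the `(0.0, 0.0)` row are made up).

```python
import pandas as pd

# Invented toy lookup table: the third location has no weather rows
states = pd.DataFrame({
    "state": ["state_a", "state_b", "state_without_weather"],
    "latitude": [20.902977, 61.370716, 0.0],
    "longitude": [-156.207483, -152.404419, 0.0],
})
summary = pd.DataFrame({
    "latitude": [20.902977, 20.902977, 61.370716],
    "longitude": [-156.207483, -156.207483, -152.404419],
    "quarter": ["2015Q1", "2015Q2", "2015Q1"],
})

checked = pd.merge(
    states, summary,
    on=["latitude", "longitude"],
    how="left",
    validate="one_to_many",  # raises if a coordinate pair repeats in the lookup table
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# States whose coordinates found no weather rows
unmatched_states = checked.loc[checked["_merge"] == "left_only", "state"].tolist()
```

An empty `unmatched_states` list would confirm the same thing as the all-zero null counts, while also naming the affected states if a mismatch ever appears.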

In [None]:
# split quarter column into year and quarter for easier merging with bees in notebook 2a
weather_data_quarter_states[['year', 'quarter']] = weather_data_quarter_states['quarter'].astype(str).str.split('Q', expand=True)
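
Note that splitting the string form yields `year` and `quarter` as strings. If the bee dataset stores them as integers (an assumption, since that dataset is not shown here), the period accessors give integer columns directly. A minimal sketch on two sample periods:

```python
import pandas as pd

# Two sample quarterly periods standing in for the 'quarter' column
toy = pd.DataFrame({"quarter": pd.PeriodIndex(["2015Q1", "2022Q4"], freq="Q")})

# Extract the year first, because the next line overwrites 'quarter'
toy["year"] = toy["quarter"].dt.year
toy["quarter"] = toy["quarter"].dt.quarter
```

This only works while `quarter` still has a period dtype, so it would replace the string split rather than follow it.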

In [11]:
weather_data_quarter_states

Unnamed: 0,date,latitude,longitude,wind_speed_10m_max,weather_code,temperature_2m_mean,temperature_2m_max,temperature_2m_min,precipitation_hours,relative_humidity_2m_mean,relative_humidity_2m_max,relative_humidity_2m_min
0,2015-01-01,32.806671,-86.79113,8.905908,53,5.943249,12.099500,2.049500,3.0,74.610850,84.892395,58.686630
1,2015-01-02,32.806671,-86.79113,12.287555,63,8.716166,11.299499,6.599500,11.0,95.030800,99.656850,88.092660
2,2015-01-03,32.806671,-86.79113,19.803272,61,14.155751,19.499500,11.149500,6.0,97.524770,99.041664,92.491360
3,2015-01-04,32.806671,-86.79113,21.605999,65,15.499500,18.999500,10.499500,13.0,87.860110,100.000000,56.977260
4,2015-01-05,32.806671,-86.79113,19.201874,3,5.987000,9.649500,1.299500,0.0,56.137234,65.977200,41.198547
...,...,...,...,...,...,...,...,...,...,...,...,...
146145,2022-12-28,42.755966,-107.30249,40.995766,73,-4.244334,-1.598500,-8.448500,4.0,83.742096,95.266304,73.835980
146146,2022-12-29,42.755966,-107.30249,29.784426,71,-12.581832,-10.398500,-15.948501,0.0,76.470560,84.502800,66.460710
146147,2022-12-30,42.755966,-107.30249,39.740480,3,-12.821415,-7.498500,-15.448501,0.0,81.555490,87.242065,75.683365
146148,2022-12-31,42.755966,-107.30249,39.399857,73,-5.558917,-1.548500,-8.848500,3.0,86.567320,94.409520,74.006710


In [44]:
# Save the quarterly summary to a CSV file
OUT_DIR = os.path.join(os.getcwd(), '../data/intermediate')

weather_data_quarter_states.to_csv(os.path.join(OUT_DIR, 'quarterly_weather_summary.csv'), index=False)