# Testing Weather Data

## Trevor Rowland - 3/25/25

This notebook tests the `RegionWeather` class from `regionweather.py`, investigates how the data should be cleaned, and starts work on forecasting methods for this data.

## 1. Importing Packages and Data

Here we will import required packages, the RegionWeather class, and the dictionary of regions to use.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from backend.regionweather import RegionWeather

In [2]:
region_data = {
    'US-FLA-FMPP': {'lat': 28.525581, 'lon': -81.536775, 'alt': 0},
    'US-FLA-FPC': {'lat': 28.996695, 'lon': -82.886613, 'alt': 0},
    'US-FLA-FPL': {'lat': 27.917488, 'lon': -81.450970, 'alt': 0},
    'US-FLA-GVL': {'lat': 29.619310, 'lon': -82.328732, 'alt': 0},
    'US-FLA-HST': {'lat': 25.456904, 'lon': -80.588092, 'alt': 0},
    'US-FLA-JEA': {'lat': 30.390902, 'lon': -83.679837, 'alt': 0},
    'US-FLA-SEC': {'lat': 28.805983, 'lon': -82.306291, 'alt': 0},
    'US-FLA-TAL': {'lat': 30.437174, 'lon': -84.248042, 'alt': 0},
    'US-FLA-TEC': {'lat': 27.959413, 'lon': -82.144821, 'alt': 0}
}

## 2. Testing RegionWeather Functions

Now let's test to make sure the initial implementation of RegionWeather works.

In [3]:
region_name = 'US-FLA-FMPP'
region = region_data[region_name]
lat = region['lat']
lon = region['lon']
alt = region['alt']
end = dt.datetime.today()
start = end - dt.timedelta(days=365) # 365 days (one year) of data

rw = RegionWeather(region_name, lat, lon, alt, start, end)

fetching Hourly Object...
Hourly Object Fetched!
Fetching Hourly Data from Object...
Data:
                     temp  dwpt  rhum  prcp  snow   wdir  wspd  wpgt    pres  \
time                                                                           
2024-03-27 21:00:00  24.6  16.3  60.0   0.0   NaN  300.0  18.4   NaN  1015.2   
2024-03-27 22:00:00  23.5  16.3  64.0   0.0   NaN  280.0  11.2   NaN  1015.0   
2024-03-27 23:00:00  23.5  16.3  64.0   0.0   NaN  280.0  11.2   NaN  1015.0   
2024-03-28 00:00:00  23.0  17.0  69.0   0.0   NaN  120.0  18.4   NaN  1012.4   
2024-03-28 01:00:00  22.4  17.3  73.0   0.0   NaN  190.0   5.4   NaN  1013.5   

                     tsun  coco  
time                             
2024-03-27 21:00:00   NaN   3.0  
2024-03-27 22:00:00   NaN   3.0  
2024-03-27 23:00:00   NaN   3.0  
2024-03-28 00:00:00   NaN   3.0  
2024-03-28 01:00:00   NaN   3.0  
Columns: ['temp', 'dwpt', 'rhum', 'prcp', 'snow', 'wdir', 'wspd', 'wpgt', 'pres', 'tsun', 'coco']
Hourly Objec

## 3. RegionWeather EDA

Now all of the data has been fetched, aggregated and interpolated. We can check the `to_dict()` function from `RegionWeather` to view the data and perform a quick EDA.

In [4]:
d = rw.to_dict()
hourly = d['df_hourly']
daily = d['df_daily']
weekly = d['df_weekly']
monthly = d['df_monthly']
fifteen_m = d['df_15m']

In [5]:
hourly.head(10)

time,temp,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco
2024-03-27 21:00:00,24.6,16.3,60.0,0.0,,300.0,18.4,,1015.2,,3.0
2024-03-27 22:00:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-27 23:00:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-28 00:00:00,23.0,17.0,69.0,0.0,,120.0,18.4,,1012.4,,3.0
2024-03-28 01:00:00,22.4,17.3,73.0,0.0,,190.0,5.4,,1013.5,,3.0
2024-03-28 02:00:00,21.9,18.5,81.0,0.0,,0.0,0.0,,1013.8,,3.0
2024-03-28 03:00:00,23.0,19.2,79.0,0.0,,0.0,0.0,,1011.9,,3.0
2024-03-28 04:00:00,23.0,19.2,79.0,0.0,,140.0,31.7,,1011.9,,3.0
2024-03-28 05:00:00,23.0,20.2,84.0,0.0,,180.0,22.3,,1011.4,,3.0
2024-03-28 06:00:00,22.4,20.9,91.0,0.0,,180.0,22.3,,1010.1,,7.0


In [6]:
daily.head(10) # failure to get pres data, impute with aggregated hourly data?

time,tmin,tmax,tavg,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco
2024-03-27,23.5,24.6,23.866667,16.3,62.666667,0.0,,286.636273,13.6,,1015.066667,0.0,3.0
2024-03-28,20.8,26.3,23.304167,17.904167,75.083333,9.1,,248.763662,17.054167,,1011.708333,0.0,17.0
2024-03-29,15.2,25.2,19.595833,9.895833,58.708333,0.0,,352.82774,9.783333,,1019.545833,0.0,1.0
2024-03-30,13.5,26.9,19.858333,12.441667,66.5,0.0,,51.005637,8.333333,,1021.079167,0.0,3.0
2024-03-31,15.2,26.3,21.220833,13.645833,64.208333,0.0,,280.939044,6.816667,,1020.354167,0.0,5.0
2024-04-01,16.9,29.1,22.879167,15.283333,64.666667,0.0,,198.104397,8.7625,,1019.408333,0.0,5.0
2024-04-02,19.6,30.2,24.425,17.9375,69.291667,0.0,,219.190438,11.720833,,1015.55,0.0,3.0
2024-04-03,18.5,28.0,24.216667,19.854167,77.291667,4.3,,219.731286,21.270833,,1007.983333,0.0,9.0
2024-04-04,16.3,24.6,19.4375,11.35,65.125,7.3,,275.346627,17.770833,,1010.154167,0.0,9.0
2024-04-05,13.5,26.3,19.270833,9.008333,54.708333,0.0,,276.499449,17.5625,,1015.4875,0.0,1.0


In [7]:
weekly.head(10)

time,temp,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco
2024-03-31,21.081818,13.557576,66.020202,9.1,,319.427315,10.590909,,1018.077778,0.0,17.0
2024-04-07,21.223214,13.434524,64.375,11.6,,257.020321,13.939286,,1014.819048,0.0,9.0
2024-04-14,21.640476,13.22381,62.089286,7.9,,112.062909,16.141071,,1019.017262,0.0,17.0
2024-04-21,24.380357,16.479762,64.309524,0.0,,137.004088,10.832143,,1018.828571,0.0,3.0
2024-04-28,21.855952,14.148214,63.755952,0.5,,65.387521,13.763095,,1020.65119,0.0,17.0
2024-05-05,24.596429,18.696429,72.136905,3.8,,98.334806,10.952381,,1016.688095,0.0,17.0
2024-05-12,27.5,20.039286,66.357143,37.1,,183.983808,13.235714,,1013.422619,0.0,17.0
2024-05-19,27.071429,21.691071,74.416667,28.3,,205.764188,13.633929,,1012.867262,0.0,17.0
2024-05-26,27.342857,20.054167,66.553571,1.8,,39.515237,10.589286,,1014.333929,0.0,18.0
2024-06-02,28.525595,18.7125,57.869048,3.2,,42.148369,11.502381,,1017.585119,0.0,18.0


In [8]:
monthly.head(10)

time,tavg,tmin,tmax,prcp,wspd,pres,tsun
2024-03-01,21.569167,13.5,26.9,9.1,11.1175,1017.550833,0.0
2024-04-01,22.349444,12.4,32.4,20.0,13.608333,1018.337639,0.0
2024-05-01,27.285618,19.6,36.3,74.2,11.784409,1014.453226,0.0
2024-06-01,28.093194,21.3,36.9,148.9,11.387639,1015.27125,0.0
2024-07-01,29.147984,24.1,36.3,117.1,8.593683,1018.064382,0.0
2024-08-01,28.558871,23.5,35.8,265.7,11.538441,1016.021505,0.0
2024-09-01,27.773472,23.5,34.1,202.7,9.951667,1013.040278,0.0
2024-10-01,24.556183,14.1,33.5,165.5,13.258199,1017.553898,0.0
2024-11-01,21.972778,8.5,34.6,23.9,11.416389,1017.391944,0.0
2024-12-01,17.176075,5.8,27.4,30.1,11.867876,1022.452285,0.0


In [9]:
fifteen_m.head(10)

time,temp,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco
2024-03-27 21:00:00,24.6,16.3,60.0,0.0,,300.0,18.4,,1015.2,,3.0
2024-03-27 21:15:00,24.325,16.3,61.0,0.0,,295.0,16.6,,1015.15,,3.0
2024-03-27 21:30:00,24.05,16.3,62.0,0.0,,290.0,14.8,,1015.1,,3.0
2024-03-27 21:45:00,23.775,16.3,63.0,0.0,,285.0,13.0,,1015.05,,3.0
2024-03-27 22:00:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-27 22:15:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-27 22:30:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-27 22:45:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-27 23:00:00,23.5,16.3,64.0,0.0,,280.0,11.2,,1015.0,,3.0
2024-03-27 23:15:00,23.375,16.475,65.25,0.0,,240.0,13.0,,1014.35,,3.0
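
The 15-minute values above are consistent with linear interpolation between hourly readings (for `temp`, 24.6, 24.325, 24.05, 23.775, 23.5 across the first hour). Below is a minimal sketch of that kind of upsampling in pandas. It assumes, without confirmation from `regionweather.py`, that a plain `resample` followed by `interpolate` is used; `hourly_sample` and `upsample_to_15m` are hypothetical names for illustration only.

```python
import pandas as pd

# Three hourly temp readings copied from the hourly table above.
hourly_sample = pd.DataFrame(
    {"temp": [24.6, 23.5, 23.5]},
    index=pd.date_range("2024-03-27 21:00", periods=3, freq=pd.Timedelta(hours=1)),
)

def upsample_to_15m(df: pd.DataFrame) -> pd.DataFrame:
    """Upsample an hourly frame to 15-minute steps with linear interpolation."""
    # resample() adds the new timestamps as NaN rows; interpolate() fills them.
    return df.resample("15min").interpolate(method="linear")

fifteen_sample = upsample_to_15m(hourly_sample)
print(fifteen_sample.head(5))
```

Interpolating `wdir` this way treats degrees as a linear quantity, so a shift from 350 to 10 would pass through 180. That column may need separate handling.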


For this region, US-FLA-FMPP, we are seeing a lot of `NaN` values in the `snow`, `wpgt`, and `tsun` columns. Let's take a look at the finest resolution available from the API (hourly) to decide what to do with the missing data.

In [10]:
snow = hourly['snow']

# isna().sum() counts only the missing entries; isna().count() would count every row
print(f'There are {snow.isna().sum()} missing values.')
print(f'There are {len(snow)} total values in the column.')
print(f'The percent missing is: {snow.isna().sum() / len(snow) * 100}%')

There are 8760 missing values.
There are 8760 total values in the column.
The percent missing is: 100.0%


From this we see that 100% of the snow data is unavailable. This could mean that the station does not measure snow, a measurement tool was damaged during this time, or that there was no snow. **How can we approach this programmatically in the pipeline?**
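
One possible approach, sketched under assumptions: compute the missing fraction for every column in a single pass and flag the columns that are entirely empty, so the pipeline can decide per region what to keep. The helper name `missing_report` and the toy frame are hypothetical; in this notebook the real input would be `hourly`.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages."""
    n_missing = df.isna().sum()            # number of NaN entries per column
    pct_missing = df.isna().mean() * 100   # share of NaN entries per column
    return pd.DataFrame({"n_missing": n_missing, "pct_missing": pct_missing})

# Toy frame standing in for `hourly` (values are made up for illustration).
toy = pd.DataFrame({
    "temp": [24.6, 23.5, np.nan, 23.0],
    "snow": [np.nan, np.nan, np.nan, np.nan],
})

report = missing_report(toy)
all_missing = report.index[report["pct_missing"] == 100].tolist()
print(report)
print("Columns with no data:", all_missing)
```

A report like this cannot tell whether a column is empty because the station lacks the sensor or because nothing was recorded, so it only identifies candidates for a fallback source.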

Now let's examine the other two columns with missing values, `tsun` and `wpgt`.

In [11]:
tsun = hourly['tsun']

print(f'There are {tsun.isna().sum()} missing values.')
print(f'There are {len(tsun)} total values in the column.')
print(f'The percent missing is: {tsun.isna().sum() / len(tsun) * 100}%')

There are 8760 missing values.
There are 8760 total values in the column.
The percent missing is: 100.0%


In [12]:
wpgt = hourly['wpgt']

print(f'There are {wpgt.isna().sum()} missing values.')
print(f'There are {len(wpgt)} total values in the column.')
print(f'The percent missing is: {wpgt.isna().sum() / len(wpgt) * 100}%')

There are 8760 missing values.
There are 8760 total values in the column.
The percent missing is: 100.0%


Now that we have examined a year's worth of data, we can see that **there are no measurements for the `snow`, `wpgt`, or `tsun` columns**. This means that we will need to use a fallback data source, or omit the columns from the dataset.
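
If the columns are omitted, one simple option is to drop any column that is entirely `NaN` and keep a record of what was removed, so a fallback source can fill those fields later. This is a minimal sketch using `DataFrame.dropna`; the frame `toy_hourly` is a hypothetical stand-in for `hourly`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `hourly`: one observed column, three empty ones.
toy_hourly = pd.DataFrame({
    "temp": [24.6, 23.5, 23.5],
    "snow": [np.nan] * 3,
    "wpgt": [np.nan] * 3,
    "tsun": [np.nan] * 3,
})

# how="all" drops a column only when every value in it is missing,
# so partially observed columns are kept.
cleaned = toy_hourly.dropna(axis=1, how="all")
dropped = [c for c in toy_hourly.columns if c not in cleaned.columns]
print("Kept:", list(cleaned.columns))
print("Dropped:", dropped)
```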

For now we will proceed with building models to forecast future temperature values and deal with fallback data later, as I believe we can get forecast data from NOAA.

The first forecasting work will be in the notebook `notebooks/forecasting/intro-to-forecasting.ipynb`.