# Meeting Notebook for August 29, 2025

## Preliminary Notes:
From the August 21 meeting, we established that a transformer model on the aggregated data with a dashboard alone was not sufficient because the amount of data was very small. Instead, we have pivoted to an LSTM/Transformer/DLinear model comparison with the following changes:
- Consider a collection of neighboring counties. Grab the environmental variables from each of the other counties and use that data as extra columns in the feature vector.
- What is the definition of neighboring counties? Well, we want to stretch this definition to three possible interpretations: 
  1. Contiguous connection
  2. All of the central valley counties
  3. All counties surrounding the central valley
- From a recent paper, it seems rats and other rodents are carriers for Coccidioidomycosis. So look into getting rat spray data
- Eventually convert this into a better dashboard (stretch goal)


In [13]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import sys 
import os 

sys.path.append(os.path.abspath('..'))

In [14]:
# Set up the data

all_county_df = pd.read_csv("../../data/county_adjacency_list.csv")

In [15]:
all_county_df.head()

Unnamed: 0,County Name,County GEOID,Neighbor Name,Neighbor GEOID
0,"Autauga County, AL",1001,"Autauga County, AL",1001
1,"Autauga County, AL",1001,"Chilton County, AL",1021
2,"Autauga County, AL",1001,"Dallas County, AL",1047
3,"Autauga County, AL",1001,"Elmore County, AL",1051
4,"Autauga County, AL",1001,"Lowndes County, AL",1085


In [16]:
adj_map = (
  # first remove the repeat of county_name and neighbor_name, as every county is its own
  # neighbor
  all_county_df[all_county_df['County Name'] != all_county_df['Neighbor Name']]

  # you want to groupby the county_name but then select the neighbors
  .groupby('County Name')['Neighbor Name']

  # sort, remove repeats, and remove any NAs
  .apply(lambda s_: sorted(s_.dropna().unique()))

  # Convert to a dict for easy searching
  .to_dict()
)

In [17]:
adj_map["Fresno County, CA"]

['Inyo County, CA',
 'Kings County, CA',
 'Madera County, CA',
 'Merced County, CA',
 'Mono County, CA',
 'Monterey County, CA',
 'San Benito County, CA',
 'Tulare County, CA']

In [18]:
aqi_2008_df = pd.read_csv("../../data/daily_aqi_by_county_2008.csv")
aqi_2008_df.head()

Unnamed: 0,State Name,county Name,State Code,County Code,Date,AQI,Category,Defining Parameter,Defining Site,Number of Sites Reporting
0,Alabama,Baldwin,1,3,2008-01-04,51,Moderate,PM2.5,01-003-0010,1
1,Alabama,Baldwin,1,3,2008-01-07,21,Good,PM2.5,01-003-0010,1
2,Alabama,Baldwin,1,3,2008-01-10,20,Good,PM2.5,01-003-0010,1
3,Alabama,Baldwin,1,3,2008-01-13,50,Good,PM2.5,01-003-0010,1
4,Alabama,Baldwin,1,3,2008-01-16,41,Good,PM2.5,01-003-0010,1


In [19]:
aqi_2008_df['Date'] = pd.to_datetime(aqi_2008_df['Date'])

In [24]:
aqi_2008_df['YearMonth'] = aqi_2008_df['Date'].dt.to_period('M')
aqi_2008_df.head()

Unnamed: 0,State Name,county Name,State Code,County Code,Date,AQI,Category,Defining Parameter,Defining Site,Number of Sites Reporting,YearMonth
0,Alabama,Baldwin,1,3,2008-01-04,51,Moderate,PM2.5,01-003-0010,1,2008-01
1,Alabama,Baldwin,1,3,2008-01-07,21,Good,PM2.5,01-003-0010,1,2008-01
2,Alabama,Baldwin,1,3,2008-01-10,20,Good,PM2.5,01-003-0010,1,2008-01
3,Alabama,Baldwin,1,3,2008-01-13,50,Good,PM2.5,01-003-0010,1,2008-01
4,Alabama,Baldwin,1,3,2008-01-16,41,Good,PM2.5,01-003-0010,1,2008-01


In [32]:
aqi_monthly_25 = (
  aqi_2008_df[aqi_2008_df['Defining Parameter'] == 'PM2.5']
  .groupby(['county Name', 'YearMonth'])['AQI'].mean()
  .reset_index()
  .groupby('county Name')
  .apply(lambda s: dict(zip(s['YearMonth'].astype(str), s['AQI'])))
  .to_dict()
)

  aqi_2008_df[aqi_2008_df['Defining Parameter'] == 'PM2.5']


In [33]:
aqi_monthly_25['Fresno']

{'2008-01': 80.48148148148148,
 '2008-02': 84.24137931034483,
 '2008-03': 63.5,
 '2008-04': 65.2,
 '2008-05': 64.7,
 '2008-07': 81.0,
 '2008-08': 59.666666666666664,
 '2008-09': 64.16666666666667,
 '2008-10': 68.81818181818181,
 '2008-11': 103.5,
 '2008-12': 93.0}

In [39]:
aqi_monthly_10 = (
  aqi_2008_df[aqi_2008_df['Defining Parameter'] == 'PM10']
  .groupby(['county Name', 'YearMonth'])['AQI'].mean()
  .unstack(fill_value=0)                               # fill missing months with 0
  .stack()
  .reset_index()
  .groupby('county Name')
  .apply(lambda s: dict(zip(s['YearMonth'].astype(str), s['AQI'])))
  .to_dict()
)

KeyError: 'AQI'

In [38]:
aqi_monthly_10["Kern"]

{'2008-01': 101.25,
 '2008-02': 100.0,
 '2008-04': 79.0,
 '2008-05': 94.75,
 '2008-06': 80.0,
 '2008-10': 181.0}