# Air Quality API Exploration

**Date:** January 3, 2026
**Goal:** Test OpenWeather Air Pollution API and understand data structure

## Cities
- **Los Angeles, CA:** (34.0522, -118.2437) - High pollution benchmark
- **Phoenix, AZ:** (33.4484, -112.0740) - Desert climate effects
- **Madison, WI:** (43.0731, -89.4012) - Local relevance, cleaner baseline

In [1]:
# Standard libraries
import os
from datetime import datetime, timedelta
import time

# Data manipulation
import pandas as pd
import numpy as np

# API calls
import requests
import json

# Environment variables
from dotenv import load_dotenv

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

In [2]:
# Load API key
load_dotenv()
API_KEY = os.getenv('OPENWEATHER_API_KEY')

# Verify key loaded
if API_KEY:
    print(f"API key loaded successfully!")
else:
    print("API key not found!")

# City coordinates
CITIES = {
    "Los Angeles": {"lat": 34.0522, "lon": -118.2437},
    "Phoenix": {"lat": 33.4484, "lon": -112.0740},
    "Madison": {"lat": 43.0731, "lon": -89.4012}
}

print(f"\nTracking {len(CITIES)} cities")

API key loaded successfully!

Tracking 3 cities


In [3]:
# Test API call for Madison
# HTTPS keeps the API key (sent as a query parameter) encrypted in transit
url = "https://api.openweathermap.org/data/2.5/air_pollution"
params = {
    "lat": 43.0731,
    "lon": -89.4012,
    "appid": API_KEY
}

response = requests.get(url, params=params)

print(f"Status Code: {response.status_code}")
print(f"\nRaw JSON:")
print(json.dumps(response.json(), indent=2))

Status Code: 200

Raw JSON:
{
  "coord": {
    "lon": -89.4012,
    "lat": 43.0731
  },
  "list": [
    {
      "main": {
        "aqi": 2
      },
      "components": {
        "co": 168.31,
        "no": 0.08,
        "no2": 7.05,
        "o3": 70.98,
        "so2": 2.03,
        "pm2_5": 4.67,
        "pm10": 5.07,
        "nh3": 1.06
      },
      "dt": 1767477951
    }
  ]
}
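The response nests the useful values two levels deep (`list[0].main.aqi` and `list[0].components`). Below is a minimal sketch of a hypothetical helper, `flatten_air_pollution`, that turns one response into a single flat record. It assumes the shape shown above, with one entry in `list`. That holds for this current-conditions endpoint, but the historical endpoint returns many hourly entries. The helper also reads `dt` as UTC rather than as the machine's local time.

```python
from datetime import datetime, timezone


def flatten_air_pollution(payload: dict) -> dict:
    """Flatten one OpenWeather air_pollution response into a flat record.

    Assumes the shape shown above: payload["list"][0] holds "main" -> {"aqi"},
    "components" -> {pollutant: value in μg/m³}, and "dt" (Unix seconds).
    """
    entry = payload["list"][0]
    record = {
        "lat": payload["coord"]["lat"],
        "lon": payload["coord"]["lon"],
        # tz=timezone.utc keeps timestamps comparable across cities and machines
        "timestamp_utc": datetime.fromtimestamp(entry["dt"], tz=timezone.utc),
        "owm_aqi": entry["main"]["aqi"],
    }
    # Copies co, no, no2, o3, so2, pm2_5, pm10, nh3 as top-level keys
    record.update(entry["components"])
    return record


# Usage with the Madison response printed above
sample = {
    "coord": {"lon": -89.4012, "lat": 43.0731},
    "list": [{
        "main": {"aqi": 2},
        "components": {"co": 168.31, "no": 0.08, "no2": 7.05, "o3": 70.98,
                       "so2": 2.03, "pm2_5": 4.67, "pm10": 5.07, "nh3": 1.06},
        "dt": 1767477951,
    }],
}
row = flatten_air_pollution(sample)
print(row["timestamp_utc"], row["pm2_5"], row["owm_aqi"])  # 2026-01-03 22:05:51+00:00 4.67 2
```

One flat dict per reading maps directly onto one DataFrame row later.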


In [4]:
# Parse the response and extract the reading
data = response.json()
reading = data['list'][0]

# Timestamp conversion
timestamp = datetime.fromtimestamp(reading['dt'])
print(f"\nReading from: {timestamp}")

# Pollutant values
components = reading['components']
print(f"\nPollutant Levels (μg/m³):")
for pollutant, value in components.items():
    print(f"  {pollutant.upper():6s}: {value:7.2f}")

# Key metric
pm25 = components['pm2_5']
print(f"\nPM2.5: {pm25:.2f} μg/m³")
print(f"   OpenWeather AQI: {reading['main']['aqi']} (1-5 scale)")
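# Rough estimate: 4.17 ≈ 50/12, the slope of the pre-2024 EPA "Good" segment (0-12 μg/m³); not valid above 12 μg/m³
print(f"   Estimated EPA AQI: ~{int(pm25 * 4.17)} (0-500 scale)")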


Reading from: 2026-01-03 16:05:51

Pollutant Levels (μg/m³):
  CO    :  168.31
  NO    :    0.08
  NO2   :    7.05
  O3    :   70.98
  SO2   :    2.03
  PM2_5 :    4.67
  PM10  :    5.07
  NH3   :    1.06

PM2.5: 4.67 μg/m³
   OpenWeather AQI: 2 (1-5 scale)
   Estimated EPA AQI: ~19 (0-500 scale)
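When called without a `tz` argument, `datetime.fromtimestamp()` converts `dt` to the notebook machine's local time zone. Here `dt = 1767477951` is 22:05:51 UTC, so the printed 16:05:51 reflects a UTC-6 (US Central) clock. That is Madison's local time, but not LA's or Phoenix's. The sketch below uses a hypothetical helper that returns both UTC and city-local time, assuming the caller supplies each city's time zone:

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Phoenix stays on MST (UTC-7) all year because Arizona does not observe DST,
# so a fixed offset is safe there; DST-observing cities need zoneinfo instead.
PHOENIX_TZ = timezone(timedelta(hours=-7))


def reading_times(dt_unix: int, city_tz: tzinfo) -> dict:
    """Return a reading's timestamp in UTC and in the city's local time."""
    utc = datetime.fromtimestamp(dt_unix, tz=timezone.utc)
    return {"utc": utc, "local": utc.astimezone(city_tz)}


times = reading_times(1767477951, PHOENIX_TZ)
print(times["utc"])    # 2026-01-03 22:05:51+00:00
print(times["local"])  # 2026-01-03 15:05:51-07:00
```

For LA and Madison, `zoneinfo.ZoneInfo("America/Los_Angeles")` and `ZoneInfo("America/Chicago")` handle daylight saving time correctly. The `zoneinfo` module is in the standard library from Python 3.9, though some systems also need the `tzdata` package.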


In [6]:
print("Testing all cities...\n")

for city_name, coords in CITIES.items():
    params = {
        "lat": coords['lat'],
        "lon": coords['lon'],
        "appid": API_KEY
    }
    
    response = requests.get(
        "https://api.openweathermap.org/data/2.5/air_pollution",
        params=params
    )
    
    if response.status_code == 200:
        data = response.json()
        pm25 = data['list'][0]['components']['pm2_5']
        owm_aqi = data['list'][0]['main']['aqi']
        
        print(f"{city_name:15s} PM2.5: {pm25:6.2f} μg/m³  |  OWM AQI: {owm_aqi}/5")
    else:
        print(f"{city_name}: Error {response.status_code}")
    
    # Pause to respect API rate limits
    time.sleep(1)


Testing all cities...

Los Angeles     PM2.5:   1.71 μg/m³  |  OWM AQI: 2/5
Phoenix         PM2.5:   5.22 μg/m³  |  OWM AQI: 2/5
Madison         PM2.5:   4.67 μg/m³  |  OWM AQI: 2/5


**Observations: LA**
- LA has surprisingly clean air: its PM2.5 is currently lower than Madison's
- It is a Saturday, which usually means less traffic
- Winter brings fewer wildfires, which likely contributes to cleaner air at this time of year
- Hypothesis: LA's air quality will be more variable than Madison's

**Observations: Madison/Phoenix**
- Both are in the "Good" range
- Only 0.55 μg/m³ difference between the two cities (5.22 vs 4.67)
- Phoenix is slightly higher
- All cities show an OWM AQI of 2/5, so the index does not separate them; the EPA AQI needs to be calculated manually for finer-grained analysis

**Questions Raised**
- Are these readings typical for each city?
- What is the normal PM2.5 range?
- Does air quality vary by time of day and day of the week?
- Which factors drive air quality in each city?

**Notes**
- All pollutants are reported in micrograms per cubic meter (μg/m³)
- The API response structure is consistent across all cities
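These values are mass concentrations (μg/m³). EPA's AQI breakpoints for the gaseous pollutants (O3, NO2, SO2, CO) are expressed as mixing ratios in ppm or ppb instead. Below is a minimal sketch of the standard conversion. It assumes 25 °C and 1 atm, which gives a molar volume of 24.45 L/mol. The helper name and molecular-weight table are illustrative. The conversion does not apply to particulate matter (PM2.5, PM10).

```python
# Molar volume of an ideal gas at 25 °C and 1 atm (L/mol); an assumption,
# since actual temperature and pressure in each city differ
MOLAR_VOLUME_L = 24.45

# Molecular weights (g/mol) of the gaseous pollutants OpenWeather reports
MOLECULAR_WEIGHT = {
    "co": 28.01, "no": 30.01, "no2": 46.01,
    "o3": 48.00, "so2": 64.07, "nh3": 17.03,
}


def ugm3_to_ppb(pollutant: str, value_ugm3: float) -> float:
    """Convert a gas concentration from μg/m³ to parts per billion."""
    if pollutant not in MOLECULAR_WEIGHT:
        raise ValueError(f"No gas conversion for {pollutant!r} (particulates stay in μg/m³)")
    return value_ugm3 * MOLAR_VOLUME_L / MOLECULAR_WEIGHT[pollutant]


# Madison readings from the first API call
print(f"O3: {ugm3_to_ppb('o3', 70.98):.1f} ppb")        # O3: 36.2 ppb
print(f"CO: {ugm3_to_ppb('co', 168.31) / 1000:.3f} ppm")  # CO: 0.147 ppm
```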




In [8]:
# Save current snapshot for reference
import pandas as pd
from datetime import datetime

snapshot_data = []
for city_name, coords in CITIES.items():
    params = {
        "lat": coords['lat'],
        "lon": coords['lon'],
        "appid": API_KEY
    }
    
    response = requests.get(
        "https://api.openweathermap.org/data/2.5/air_pollution",
        params=params
    )
    
    if response.status_code == 200:
        data = response.json()
        reading = data['list'][0]
        
        snapshot_data.append({
            'city': city_name,
            'timestamp': datetime.fromtimestamp(reading['dt']),
            'pm2_5': reading['components']['pm2_5'],
            'pm10': reading['components']['pm10'],
            'no2': reading['components']['no2'],
            'o3': reading['components']['o3'],
            'co': reading['components']['co'],
            'so2': reading['components']['so2'],
            'owm_aqi': reading['main']['aqi']
        })
    
    time.sleep(1)

# Create DataFrame
snapshot_df = pd.DataFrame(snapshot_data)

# Display
print("\nCurrent Air Quality Snapshot Saved:")
print(snapshot_df.to_string(index=False))

# Save to CSV for reference (create the folder first if it does not exist)
os.makedirs('../data/raw', exist_ok=True)
snapshot_df.to_csv('../data/raw/snapshot_2026-01-03.csv', index=False)
print("\nSaved to: data/raw/snapshot_2026-01-03.csv")


Current Air Quality Snapshot Saved:
       city           timestamp  pm2_5  pm10  no2    o3     co  so2  owm_aqi
Los Angeles 2026-01-03 16:29:27   1.71  2.80 0.25 66.61 114.66 0.09        2
    Phoenix 2026-01-03 16:31:12   6.45 32.39 0.34 67.43 108.36 0.06        2
    Madison 2026-01-03 16:32:02   4.35  4.72 6.44 72.56 166.44 1.99        2

Saved to: data/raw/snapshot_2026-01-03.csv
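The request-and-pause loop over `CITIES` now appears in two cells and will be repeated for the weather endpoint. The sketch below factors it into a hypothetical helper. It assumes the caller passes a `fetch(lat, lon)` function that returns the parsed JSON or `None` on failure. This keeps the loop independent of which endpoint is called and lets it be tested without network access.

```python
import time
from typing import Callable, Dict, List, Optional


def collect_for_cities(
    cities: Dict[str, Dict[str, float]],
    fetch: Callable[[float, float], Optional[dict]],
    pause_s: float = 1.0,
) -> List[dict]:
    """Call fetch(lat, lon) once per city, pausing between requests.

    `fetch` is a caller-supplied function (hypothetical in this sketch) that
    returns the parsed JSON payload, or None if the request failed.
    Failed cities are reported and skipped, not retried.
    """
    results = []
    for i, (city, coords) in enumerate(cities.items()):
        if i > 0:
            time.sleep(pause_s)  # courtesy pause to stay under rate limits
        payload = fetch(coords["lat"], coords["lon"])
        if payload is None:
            print(f"{city}: request failed, skipping")
            continue
        results.append({"city": city, "payload": payload})
    return results


# Example fetch function for the air pollution endpoint (requires requests and API_KEY):
# def fetch_air_pollution(lat, lon):
#     r = requests.get(
#         "https://api.openweathermap.org/data/2.5/air_pollution",
#         params={"lat": lat, "lon": lon, "appid": API_KEY},
#         timeout=10,
#     )
#     return r.json() if r.ok else None
#
# readings = collect_for_cities(CITIES, fetch_air_pollution)
```

The same helper would serve the weather endpoint by passing a different `fetch` function.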


## Testing Weather API

**Goal**: Fetch current weather for Madison and understand the data structure

**Key fields we need**:
- Temperature (convert from Kelvin to Fahrenheit)
- Humidity 
- Pressure 
- Wind speed
- Wind direction

In [10]:
# API endpoint for current weather
weather_url = "https://api.openweathermap.org/data/2.5/weather"

# Parameters for Madison
params = {
    "lat": 43.0731,
    "lon": -89.4012,
    "appid": API_KEY
}

# Make the API call
response = requests.get(weather_url, params=params)

# Check if successful
print(f"Status Code: {response.status_code}")

# Print the raw JSON
print(f"\nRaw JSON Response:")
print(json.dumps(response.json(), indent=2))

Status Code: 200

Raw JSON Response:
{
  "coord": {
    "lon": -89.4012,
    "lat": 43.0731
  },
  "weather": [
    {
      "id": 804,
      "main": "Clouds",
      "description": "overcast clouds",
      "icon": "04n"
    }
  ],
  "base": "stations",
  "main": {
    "temp": 267.35,
    "feels_like": 264.71,
    "temp_min": 266.64,
    "temp_max": 268.31,
    "pressure": 1020,
    "humidity": 64,
    "sea_level": 1020,
    "grnd_level": 982
  },
  "visibility": 10000,
  "wind": {
    "speed": 1.54,
    "deg": 280
  },
  "clouds": {
    "all": 100
  },
  "dt": 1767480353,
  "sys": {
    "type": 2,
    "id": 2032790,
    "country": "US",
    "sunrise": 1767446952,
    "sunset": 1767479682
  },
  "timezone": -21600,
  "id": 5261457,
  "name": "Madison",
  "cod": 200
}


### Extracting Key Weather Fields

We need to:
1. Extract temperature and convert Kelvin to Fahrenheit
2. Extract humidity
3. Extract pressure 
4. Extract wind speed and convert m/s to mph
5. Extract wind direction 
6. Convert the timestamp to readable format
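Steps 1 and 4 are fixed unit conversions. Below is a minimal sketch with hypothetical helper names. It uses the exact definition 1 mph = 0.44704 m/s; the notebook's factor of 2.237 is a rounded form of 1/0.44704 ≈ 2.2369. As an alternative, OpenWeather's `units=imperial` request parameter returns temperature in °F and wind speed in mph directly.

```python
# 1 mph is defined as exactly 0.44704 m/s
METERS_PER_SECOND_PER_MPH = 0.44704


def kelvin_to_fahrenheit(kelvin: float) -> float:
    """Convert Kelvin to degrees Fahrenheit."""
    return (kelvin - 273.15) * 9 / 5 + 32


def mps_to_mph(speed_ms: float) -> float:
    """Convert wind speed from metres per second to miles per hour."""
    return speed_ms / METERS_PER_SECOND_PER_MPH


# Madison values from the raw JSON above
print(f"{kelvin_to_fahrenheit(267.35):.1f}°F")  # 21.6°F
print(f"{mps_to_mph(1.54):.1f} mph")            # 3.4 mph
```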

In [11]:
# Parse the JSON response and extract the timestamp
weather_data = response.json()
timestamp = datetime.fromtimestamp(weather_data['dt'])

# Extract main weather metrics
main = weather_data['main']
wind = weather_data['wind']

# Temperature conversion: Kelvin to Fahrenheit
temp_kelvin = main['temp']
temp_fahrenheit = (temp_kelvin - 273.15) * 9/5 + 32

# Wind speed conversion: m/s to mph
wind_speed_ms = wind['speed']
wind_speed_mph = wind_speed_ms * 2.237

# Wind direction (meteorological degrees: the direction the wind blows FROM)
wind_direction = wind['deg']

# 16-point compass label: each sector spans 22.5°, centred on N = 0°
compass_points = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                  "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
wind_compass = compass_points[round(wind_direction / 22.5) % 16]

# Other metrics (already in correct units)
humidity = main['humidity']
pressure = main['pressure']

# Display everything nicely
print(f"Timestamp: {timestamp}")
print(f"\nTemperature:")
print(f"   Kelvin:      {temp_kelvin:.2f} K")
print(f"   Fahrenheit:  {temp_fahrenheit:.1f}°F")
print(f"\nWind:")
print(f"   Speed:       {wind_speed_ms:.2f} m/s  ({wind_speed_mph:.1f} mph)")
print(f"   Direction:   {wind_direction}° ({wind_compass})")
print(f"\nAtmospheric:")
print(f"   Humidity:    {humidity}%")
print(f"   Pressure:    {pressure} hPa")
print(f"\nConditions:  {weather_data['weather'][0]['description']}")

Timestamp: 2026-01-03 16:45:53

Temperature:
   Kelvin:      267.35 K
   Fahrenheit:  21.6°F

Wind:
   Speed:       1.54 m/s  (3.4 mph)
   Direction:   280° (W)

Atmospheric:
   Humidity:    64%
   Pressure:    1020 hPa

Conditions:  overcast clouds


## Comparing Weather Across All 3 Cities

**Goal**: Determine if weather differences explain air quality differences

**Hypothesis**: LA's unexpectedly clean air might be due to favorable weather conditions today

**Variables to compare**:
- Temperature
- Wind speed (stronger wind disperses pollutants)
- Pressure (high pressure favors stagnant air that traps pollutants near the ground)
- Humidity

In [12]:
# Fetch weather for all 3 cities
print("Fetching weather data for all cities...\n")

weather_results = []

for city_name, coords in CITIES.items():
    # Set up API parameters
    params = {
        "lat": coords['lat'],
        "lon": coords['lon'],
        "appid": API_KEY
    }
    
    # Make API call
    response = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params=params
    )
    
    # If successful, extract data
    if response.status_code == 200:
        data = response.json()
        
        # Extract and convert key metrics
        temp_f = (data['main']['temp'] - 273.15) * 9/5 + 32
        wind_mph = data['wind']['speed'] * 2.237
        
        # Store results
        weather_results.append({
            'city': city_name,
            'temp_f': temp_f,
            'humidity': data['main']['humidity'],
            'pressure': data['main']['pressure'],
            'wind_mph': wind_mph,
            'wind_deg': data['wind']['deg'],
            'conditions': data['weather'][0]['description']
        })
        
        print(f"{city_name}: {temp_f:.1f}°F, {wind_mph:.1f} mph wind")
    else:
        print(f"{city_name}: Error {response.status_code}")
    
    # Be nice to the API (avoid rate limiting)
    time.sleep(1)

print("\nDone!")

Fetching weather data for all cities...

Los Angeles: 58.7°F, 9.2 mph wind
Phoenix: 69.8°F, 0.0 mph wind
Madison: 21.8°F, 3.4 mph wind

Done!


### Weather Comparison Table

Organizing all weather metrics side-by-side to identify patterns

In [13]:
# Create a DataFrame for easy comparison
weather_df = pd.DataFrame(weather_results)

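# Add PM2.5 values from the "Testing all cities" cell (pollution API).
# These are hard-coded from that run and differ slightly from the saved snapshot
# taken minutes later (e.g., Phoenix 5.22 vs 6.45 μg/m³).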
pm25_values = {
    'Los Angeles': 1.71,
    'Phoenix': 5.22,
    'Madison': 4.67
}

weather_df['pm2_5'] = weather_df['city'].map(pm25_values)

# Reorder columns for readability
weather_df = weather_df[['city', 'pm2_5', 'temp_f', 'wind_mph', 'wind_deg', 
                         'humidity', 'pressure', 'conditions']]

# Round numbers for clean display
weather_df['temp_f'] = weather_df['temp_f'].round(1)
weather_df['wind_mph'] = weather_df['wind_mph'].round(1)

# Display the table
print("=" * 80)
print("WEATHER & AIR QUALITY COMPARISON - January 3, 2026")
print("=" * 80)
print(weather_df.to_string(index=False))
print("=" * 80)

WEATHER & AIR QUALITY COMPARISON - January 3, 2026
       city  pm2_5  temp_f  wind_mph  wind_deg  humidity  pressure      conditions
Los Angeles   1.71    58.7       9.2       130        95      1015            mist
    Phoenix   5.22    69.8       0.0         0        51      1013   broken clouds
    Madison   4.67    21.8       3.4       300        64      1020 overcast clouds


### Key Findings from Weather-Pollution Analysis

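#### Finding #1: Wind Speed Stands Out
- LA has the strongest wind of the three cities (9.2 mph vs 3.4 mph in Madison and 0.0 mph in Phoenix)
- This is consistent with LA's PM2.5 (1.71 μg/m³) being lowest despite typically high emissions, though three readings cannot establish cause
- Wind speed is likely to be a critical feature in the predictive model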

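#### Finding #2: Possible Marine Influence (LA)
- Wind direction: 130° (southeast); by meteorological convention this is the direction the wind blows from
- Humidity: 95% + mist conditions
- The humid, misty air suggests marine influence displacing polluted city air. However, the Pacific lies mainly west and southwest of downtown LA, so a 130° wind reaches the city largely over land rather than as a direct onshore sea breeze. The ocean-air explanation is plausible but unconfirmed
- LA's air quality may depend strongly on wind direction (to test with historical data)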
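Finding #2 depends on wind direction, which is circular: 350° and 10° are only 20° apart, yet their arithmetic mean is 180°. A common way to make direction usable as a model feature is to split the wind into eastward (u) and northward (v) components, using the meteorological convention that direction gives where the wind blows from. The sketch below does this with hypothetical helper names. Calm wind, such as Phoenix's 0.0 mph, then maps naturally to (0, 0).

```python
import math
from typing import Iterable, Tuple


def wind_components(speed: float, direction_deg: float) -> Tuple[float, float]:
    """Return (u, v): eastward and northward components of the wind.

    direction_deg follows the meteorological convention (the direction the
    wind blows FROM), so a 180° (southerly) wind gives positive v.
    """
    theta = math.radians(direction_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


def mean_direction(directions_deg: Iterable[float]) -> float:
    """Circular mean of wind directions, in degrees clockwise from north."""
    sin_sum = cos_sum = 0.0
    for d in directions_deg:
        sin_sum += math.sin(math.radians(d))
        cos_sum += math.cos(math.radians(d))
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360


u, v = wind_components(9.2, 130)  # LA reading
print(f"LA: u = {u:.2f} mph, v = {v:.2f} mph")  # LA: u = -7.05 mph, v = 5.91 mph (air moving toward the northwest)
```

Averaging u and v, or using `mean_direction`, avoids the wrap-around error that averaging raw degrees would introduce in daily or hourly aggregates.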

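#### Finding #3: Complete Calm in Phoenix
- Wind: 0.0 mph
- Yet PM2.5 was only 5.22 μg/m³ (still "Good")
- A possible reason: the reading was taken on a Saturday in mid-afternoon local time (around 3 PM in Phoenix), when traffic and industrial emissions are likely lower than on a weekday
- Day of week may be an important predictor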

#### Finding #4: Pressure Variations
- Madison: 1020 hPa (highest)
- Phoenix: 1013 hPa (lowest)
- LA: 1015 hPa (middle)
- Small variations across cities
- Pressure might be less important than wind speed

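#### Finding #5: Temperature Extremes
- Range: 21.8°F (Madison) to 69.8°F (Phoenix) = 48°F difference
- Very different climates at the same moment, even though all three cities are in winter
- Temperature may affect pollution chemistry differently in each city (for example, heat and sunlight favor ozone formation)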
### Hypotheses to Test with Historical Data

1. **LA variability**: Does LA have much higher PM2.5 on calm days?
2. **Phoenix heat**: Does Phoenix have worse O3 in summer heat?
3. **Madison inversions**: Does high pressure + winter = pollution spikes?
4. **Weekend effect**: Are these cities consistently cleaner on weekends?
5. **Rush hour**: How much do rush-hour traffic peaks affect air quality in each city?
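Hypotheses 4 and 5 need calendar features in each city's local time, not the notebook machine's time. Below is a sketch with a hypothetical `time_features` helper. The rush-hour windows are placeholder assumptions to be refined once hourly data is available.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumed weekday rush-hour windows in local time (hours 7-9 and 16-18);
# placeholders to refine after exploring hourly data
RUSH_HOURS = set(range(7, 10)) | set(range(16, 19))


def time_features(dt_unix: int, city_tz: tzinfo) -> dict:
    """Derive local-time calendar features from an OpenWeather `dt` value."""
    local = datetime.fromtimestamp(dt_unix, tz=timezone.utc).astimezone(city_tz)
    weekday = local.weekday()  # Monday = 0 ... Sunday = 6
    return {
        "local_hour": local.hour,
        "day_of_week": weekday,
        "is_weekend": weekday >= 5,
        "is_rush_hour": weekday < 5 and local.hour in RUSH_HOURS,
    }


# Madison's first reading, using a fixed UTC-6 offset (valid for CST in winter only)
print(time_features(1767477951, timezone(timedelta(hours=-6))))
# {'local_hour': 16, 'day_of_week': 5, 'is_weekend': True, 'is_rush_hour': False}
```

For production use, a `zoneinfo.ZoneInfo` object per city would replace the fixed offset so that daylight saving time is handled.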

In [14]:
# Save the combined snapshot
weather_df.to_csv('../data/raw/weather_pollution_snapshot_2026-01-03.csv', index=False)

print("Saved combined weather + pollution snapshot")
print(f"   Location: data/raw/weather_pollution_snapshot_2026-01-03.csv")
print(f"   Rows: {len(weather_df)}")
print(f"   Columns: {len(weather_df.columns)}")

Saved combined weather + pollution snapshot
   Location: data/raw/weather_pollution_snapshot_2026-01-03.csv
   Rows: 3
   Columns: 8


## EPA AQI Calculation Function

**Purpose**: Convert PM2.5 (μg/m³) to EPA AQI (0-500 scale)

**Why?**: 
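- OpenWeather's 1-5 scale is too coarse: all three cities scored 2/5 despite different PM2.5 levels
- The EPA scale is the standard for US air quality
- Needed for model target variable and user-facing predictions

**Method**: Piecewise linear interpolation using the EPA PM2.5 breakpoints. The official AQI is defined on 24-hour average concentrations, so applying it to a single hourly reading gives an approximation.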
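The interpolation is written out below. A concentration $C$ in the breakpoint range $[C_{lo}, C_{hi}]$ maps to the index range $[I_{lo}, I_{hi}]$. Madison's reading (4.67 μg/m³, inside the pre-2024 Good range of 0.0-12.0 μg/m³, which maps to AQI 0-50) serves as a worked example:

$$
\mathrm{AQI} = \frac{I_{hi} - I_{lo}}{C_{hi} - C_{lo}}\,(C - C_{lo}) + I_{lo}
$$

$$
\mathrm{AQI}_{\text{Madison}} = \frac{50 - 0}{12.0 - 0.0}\,(4.67 - 0.0) + 0 \approx 19.46 \;\rightarrow\; 19
$$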

In [15]:
import math

def calculate_epa_aqi(pm25):
    """
    Calculate EPA AQI from PM2.5 concentration.
    
    Parameters:
    -----------
    pm25 : float
        PM2.5 concentration in μg/m³
    
    Returns:
    --------
    aqi : int
        EPA AQI value (0-500 scale)
    category : str
        AQI category (e.g., "Good", "Moderate")
    color : str
        Associated color for visualization
    
    Formula:
    --------
    AQI = [(I_high - I_low) / (C_high - C_low)] × (C - C_low) + I_low
    
    Where C is PM2.5 concentration, I is AQI value
    
    Notes:
    ------
    Uses the pre-2024 EPA PM2.5 breakpoints. The official pre-2024 table
    splits Hazardous into 250.5-350.4 (AQI 301-400) and 350.5-500.4
    (AQI 401-500); the single merged segment below gives lower values in
    that range (e.g., 340 instead of 350 at 300 μg/m³).
    """
    
    # EPA PM2.5 breakpoints (pre-2024): (C_low, C_high, I_low, I_high, category, color)
    breakpoints = [
        (0.0,   12.0,  0,   50,  "Good",                           "Green"),
        (12.1,  35.4,  51,  100, "Moderate",                       "Yellow"),
        (35.5,  55.4,  101, 150, "Unhealthy for Sensitive Groups", "Orange"),
        (55.5,  150.4, 151, 200, "Unhealthy",                      "Red"),
        (150.5, 250.4, 201, 300, "Very Unhealthy",                 "Purple"),
        (250.5, 500.4, 301, 500, "Hazardous",                      "Maroon"),
    ]
    
    # Handle negative readings (sensor error)
    if pm25 < 0:
        return 0, "Invalid", "Gray"
    
    # EPA truncates PM2.5 to one decimal place before the lookup. Without this,
    # readings such as 12.05 fall between 12.0 and 12.1 and match no range.
    # round(..., 6) absorbs float noise (e.g., 120.99999999999999 -> 121.0).
    pm25 = math.floor(round(pm25 * 10, 6)) / 10
    
    if pm25 > 500.4:
        return 500, "Beyond Index", "Maroon"
    
    # Find the appropriate breakpoint range
    for c_low, c_high, i_low, i_high, category, color in breakpoints:
        if c_low <= pm25 <= c_high:
            # Apply the linear formula
            aqi = ((i_high - i_low) / (c_high - c_low)) * (pm25 - c_low) + i_low
            return int(round(aqi)), category, color
    
    # Fallback (unreachable once values are truncated to one decimal)
    return 0, "Error", "Gray"

In [16]:
# Test with our current data
print("Testing EPA AQI Calculator")
print("=" * 60)

test_cities = {
    'Los Angeles': 1.71,
    'Phoenix': 5.22,
    'Madison': 4.67
}

for city, pm25 in test_cities.items():
    aqi, category, color = calculate_epa_aqi(pm25)
    print(f"{city:15s} PM2.5: {pm25:5.2f} to AQI: {aqi:3d} ({category}, {color})")

print("=" * 60)

Testing EPA AQI Calculator
Los Angeles     PM2.5:  1.71 to AQI:   7 (Good, Green)
Phoenix         PM2.5:  5.22 to AQI:  22 (Good, Green)
Madison         PM2.5:  4.67 to AQI:  19 (Good, Green)


In [17]:
# Test edge cases and different AQI ranges
print("\nTesting Edge Cases & Different AQI Ranges")
print("=" * 70)

edge_cases = [
    ("Negative (sensor error)", -5.0),
    ("Perfect air", 0.0),
    ("Good/Moderate boundary", 12.0),
    ("Just into Moderate", 12.1),
    ("Upper Moderate", 35.0),
    ("Unhealthy for Sensitive", 50.0),
    ("Unhealthy", 100.0),
    ("Very Unhealthy", 200.0),
    ("Hazardous", 300.0),
    ("Off the scale", 600.0),
]

for description, pm25 in edge_cases:
    aqi, category, color = calculate_epa_aqi(pm25)
    print(f"{description:30s} PM2.5: {pm25:6.1f} → AQI: {aqi:3d} ({category:30s} {color})")

print("=" * 70)


Testing Edge Cases & Different AQI Ranges
Negative (sensor error)        PM2.5:   -5.0 → AQI:   0 (Invalid                        Gray)
Perfect air                    PM2.5:    0.0 → AQI:   0 (Good                           Green)
Good/Moderate boundary         PM2.5:   12.0 → AQI:  50 (Good                           Green)
Just into Moderate             PM2.5:   12.1 → AQI:  51 (Moderate                       Yellow)
Upper Moderate                 PM2.5:   35.0 → AQI:  99 (Moderate                       Yellow)
Unhealthy for Sensitive        PM2.5:   50.0 → AQI: 137 (Unhealthy for Sensitive Groups Orange)
Unhealthy                      PM2.5:  100.0 → AQI: 174 (Unhealthy                      Red)
Very Unhealthy                 PM2.5:  200.0 → AQI: 250 (Very Unhealthy                 Purple)
Hazardous                      PM2.5:  300.0 → AQI: 340 (Hazardous                      Maroon)
Off the scale                  PM2.5:  600.0 → AQI: 500 (Beyond Index                   Maroon)
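The breakpoints in `calculate_epa_aqi` are the pre-2024 table. EPA revised the PM2.5 AQI in 2024, lowering the top of the Good range from 12.0 to 9.0 μg/m³ and shifting the upper categories, so the same readings score higher under the current table. Below is a sketch of a 2024 variant, `epa_aqi_pm25_2024` (a hypothetical name). Its breakpoints are stated as an assumption to verify against EPA's current Technical Assistance Document, in particular the upper Hazardous bound of 325.4 μg/m³. It truncates to 0.1 μg/m³ and rounds halves up, because Python's `round()` rounds halves to even.

```python
import math

# 2024 EPA PM2.5 breakpoints (assumption to verify against the current TAD):
# (C_low, C_high, I_low, I_high, category)
BREAKPOINTS_2024 = [
    (0.0,   9.0,   0,   50,  "Good"),
    (9.1,   35.4,  51,  100, "Moderate"),
    (35.5,  55.4,  101, 150, "Unhealthy for Sensitive Groups"),
    (55.5,  125.4, 151, 200, "Unhealthy"),
    (125.5, 225.4, 201, 300, "Very Unhealthy"),
    (225.5, 325.4, 301, 500, "Hazardous"),
]


def epa_aqi_pm25_2024(pm25: float):
    """Return (aqi, category) for a PM2.5 concentration in μg/m³."""
    if pm25 < 0:
        return 0, "Invalid"
    # Truncate to one decimal; round(..., 6) absorbs float noise such as 90.99999999999999
    c = math.floor(round(pm25 * 10, 6)) / 10
    if c > BREAKPOINTS_2024[-1][1]:
        return 500, "Beyond Index"
    for c_lo, c_hi, i_lo, i_hi, category in BREAKPOINTS_2024:
        if c_lo <= c <= c_hi:
            aqi = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(aqi + 0.5), category  # round half up
    return 0, "Error"  # unreachable after truncation


for city, pm25 in {"Los Angeles": 1.71, "Phoenix": 5.22, "Madison": 4.67}.items():
    print(city, epa_aqi_pm25_2024(pm25))
# Los Angeles (9, 'Good')
# Phoenix (29, 'Good')
# Madison (26, 'Good')
```

All three cities stay in "Good", but each AQI rises by roughly a third compared with the pre-2024 values (7, 22, 19). This matters if the EPA AQI becomes the model's target variable.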


### Adding EPA AQI to Our Dataset

With this working AQI calculator, I will add EPA AQI to our weather comparison table and save the updated version.

In [18]:
# Add EPA AQI calculations to our weather DataFrame
weather_df['epa_aqi'] = weather_df['pm2_5'].apply(
    lambda x: calculate_epa_aqi(x)[0]  # [0] gets just the AQI number
)

weather_df['aqi_category'] = weather_df['pm2_5'].apply(
    lambda x: calculate_epa_aqi(x)[1]  # [1] gets the category
)

weather_df['aqi_color'] = weather_df['pm2_5'].apply(
    lambda x: calculate_epa_aqi(x)[2]  # [2] gets the color
)

# Reorder columns for better readability
weather_df = weather_df[['city', 'pm2_5', 'epa_aqi', 'aqi_category', 'aqi_color',
                         'temp_f', 'wind_mph', 'wind_deg', 'humidity', 'pressure', 'conditions']]

# Display the updated table
print("=" * 90)
print("UPDATED WEATHER & AIR QUALITY - WITH EPA AQI")
print("=" * 90)
print(weather_df.to_string(index=False))
print("=" * 90)

UPDATED WEATHER & AIR QUALITY - WITH EPA AQI
       city  pm2_5  epa_aqi aqi_category aqi_color  temp_f  wind_mph  wind_deg  humidity  pressure      conditions
Los Angeles   1.71        7         Good     Green    58.7       9.2       130        95      1015            mist
    Phoenix   5.22       22         Good     Green    69.8       0.0         0        51      1013   broken clouds
    Madison   4.67       19         Good     Green    21.8       3.4       300        64      1020 overcast clouds


In [19]:
# Save the updated dataset with EPA AQI
weather_df.to_csv('../data/raw/weather_pollution_epa_aqi_2026-01-03.csv', index=False)

print("Saved updated dataset with EPA AQI")
print(f"   Location: data/raw/weather_pollution_epa_aqi_2026-01-03.csv")
print(f"   Columns: {list(weather_df.columns)}")
print(f"\nColumn Summary:")
print(f"   • Identifiers: city")
print(f"   • Air Quality: pm2_5, epa_aqi, aqi_category, aqi_color")
print(f"   • Weather: temp_f, wind_mph, wind_deg, humidity, pressure, conditions")

Saved updated dataset with EPA AQI
   Location: data/raw/weather_pollution_epa_aqi_2026-01-03.csv
   Columns: ['city', 'pm2_5', 'epa_aqi', 'aqi_category', 'aqi_color', 'temp_f', 'wind_mph', 'wind_deg', 'humidity', 'pressure', 'conditions']

Column Summary:
   • Identifiers: city
   • Air Quality: pm2_5, epa_aqi, aqi_category, aqi_color
   • Weather: temp_f, wind_mph, wind_deg, humidity, pressure, conditions


## Session Summary - January 3, 2026

### Today's Accomplishments

#### **1. API Setup & Testing**
- Successfully connected to OpenWeather Air Pollution API
- Successfully connected to OpenWeather Weather API
- Tested all endpoints with 3 cities (Los Angeles, Phoenix, Madison)
- Confirmed data quality for these calls: no missing fields, consistent structure

#### **2. Data Collection**
- Fetched current air quality for all 3 cities
- Fetched current weather conditions for all 3 cities
- Created combined dataset with 11 columns (city, air quality, and weather fields)
- Saved snapshots for future reference

#### **3. Built EPA AQI Calculator**
- Implemented the EPA piecewise linear formula (pre-2024 PM2.5 breakpoints)
- Handles all 6 AQI categories
- Includes edge case handling (negative values, extreme values)
- Checked 10 edge cases by inspection; outputs fell in the expected categories

### Questions for Next Session

1. How does air quality vary hour-by-hour in each city?
2. What's the typical PM2.5 range for each city over a year?
3. Is LA's 9.2 mph wind typical, or was today unusual?
4. How predictable is air quality 24 hours ahead?
5. Which features are most important for prediction?

### Next Steps

**Immediate (Next Session):**
- Test historical pollution API (get 1 year of data)
- Test historical weather API or find alternative source
- Design database schema for time-series storage
- Begin collecting daily data (start building validation dataset)

**Short-term:**
- Collect 12 months historical data for all cities
- Comprehensive EDA with visualizations
- Identify hourly, daily, and seasonal patterns
- Calculate feature correlations

**Medium-term:**
- Build baseline models (persistence, moving average)
- Develop ML models (Random Forest, XGBoost, LSTM)
- Model evaluation and comparison
- Feature importance analysis

**Long-term:**
- Power BI dashboard development
- Streamlit web app deployment
- Real-time prediction validation
- Documentation and portfolio presentation

### Reflection

**What went well:**
- API setup was smooth, no major issues
- EPA AQI function worked as expected after trial and error
- Discovered meaningful insight about wind speed impact
- Clean, documented code 

**What was surprising:**
- LA had cleanest air (expected it to be worst)
- LA's humid, misty air (95% humidity) hinted at marine influence
- Phoenix had zero wind 
- All three cities in "Good" category despite very different conditions

**What I learned:**
- Weather looks like a strong candidate predictor of air quality
- Wind speed/direction may matter more than temperature for dispersion
- Day of week may matter (possible weekend effect, still to be tested)
- Need historical data to understand typical patterns vs anomalies