## visualize_transit_weather_trends.ipynb

This notebook creates visualizations to explore the relationship between Seattle transit activity and local weather using the Silver and Gold Delta tables.

### Purpose
Generate SQL-based summaries and charts that can be added to Databricks Dashboards for storytelling and trend analysis.

### Summary of Visualizations
- Vehicle update trends by date  
- Weather and temperature impact on transit  
- Route type usage  
- Time-of-day activity patterns  
- Weather conditions during forecast periods


#### Step 1: Vehicle Updates Per Day (SQL + Chart)

In [0]:
%sql
-- Daily count of vehicle position updates
-- The 2023-01-01 cutoff drops invalid placeholder rows dated 1970-01-01 (the Unix epoch)
SELECT
  event_date,
  COUNT(*) AS vehicle_updates
FROM delta.`dbfs:/silver/gtfs_rt`
WHERE event_date > '2023-01-01'
GROUP BY event_date
ORDER BY event_date;


Databricks visualization. Run in Databricks to view.
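
The date cutoff in Step 1 does two jobs at once: it removes the 1970-01-01 placeholder rows and it counts what remains per day. A minimal Python sketch of that logic is shown below. The sample dates are invented for illustration, and `daily_update_counts` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter
from datetime import date

# Invented sample of event_date values; the real values live in the Silver table.
sample_event_dates = [
    date(1970, 1, 1),   # placeholder row (zero or missing timestamp)
    date(2025, 5, 26),
    date(2025, 5, 26),
    date(2025, 5, 27),
]

CUTOFF = date(2023, 1, 1)  # same cutoff as the WHERE clause in the query


def daily_update_counts(event_dates, cutoff=CUTOFF):
    """Count updates per day, keeping only dates strictly after the cutoff."""
    counts = Counter(d for d in event_dates if d > cutoff)
    # Sort ascending by date, like ORDER BY event_date.
    return sorted(counts.items())


daily_counts = daily_update_counts(sample_event_dates)
```

Because the comparison is strict (`>`), a row dated exactly 2023-01-01 would also be excluded, which matches the SQL filter.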

#### Step 2: Average Temperature + Vehicle Update Counts Per Day

In [0]:
%sql
-- Daily average temperature vs. transit activity
SELECT
  event_date,
  COUNT(*) AS vehicle_updates,
  ROUND(AVG(temperature), 1) AS avg_temperature
FROM delta.`dbfs:/gold/gtfs_rt_weather_joined`
WHERE event_date > '2023-01-01'
GROUP BY event_date
ORDER BY event_date;


Databricks visualization. Run in Databricks to view.
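
To put a number on the relationship this chart shows, one option is the Pearson correlation between the two daily series. The sketch below is plain Python over invented values, and `pearson` is a hypothetical helper rather than something defined in the notebook. In Spark SQL, the built-in `corr` aggregate computes the same statistic directly on the table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented values for illustration only; they are not results from the Gold table.
avg_temperature = [10.0, 12.0, 14.0, 16.0]
vehicle_updates = [100, 110, 120, 130]

r = pearson(avg_temperature, vehicle_updates)
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates little or none. With only a handful of days, the coefficient should be read as a rough indicator.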

#### Step 3: Vehicle Updates by Route Type

In [0]:
%sql
-- Vehicle updates by raw route_type value
SELECT
  route_type,
  COUNT(*) AS vehicle_updates
FROM delta.`dbfs:/gold/gtfs_rt_enriched`
WHERE event_date > '2023-01-01'
GROUP BY route_type
ORDER BY vehicle_updates DESC;


Databricks visualization. Run in Databricks to view.

In [0]:
%sql
-- Same counts, with GTFS route_type codes mapped to readable labels
SELECT
  CASE route_type
    WHEN 0 THEN 'Light Rail'
    WHEN 1 THEN 'Subway'
    WHEN 2 THEN 'Rail'
    WHEN 3 THEN 'Bus'
    WHEN 4 THEN 'Ferry'
    WHEN 5 THEN 'Cable Car'
    WHEN 6 THEN 'Gondola'
    WHEN 7 THEN 'Funicular'
    ELSE 'Other'
  END AS route_type_label,
  COUNT(*) AS vehicle_updates
FROM delta.`dbfs:/gold/gtfs_rt_enriched`
WHERE event_date > '2023-01-01'
GROUP BY route_type_label
ORDER BY vehicle_updates DESC;
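
The `CASE` expression mirrors the basic `route_type` codes from the GTFS static specification. The same lookup can be sketched as a Python dictionary. `ROUTE_TYPE_LABELS` and `label_route_type` are illustrative names, and the labels follow the query above rather than the specification's full wording. GTFS also defines codes 11 (trolleybus) and 12 (monorail); the query groups them under 'Other', and this sketch does the same.

```python
# Labels copied from the CASE expression in the query.
ROUTE_TYPE_LABELS = {
    0: "Light Rail",
    1: "Subway",
    2: "Rail",
    3: "Bus",
    4: "Ferry",
    5: "Cable Car",
    6: "Gondola",
    7: "Funicular",
}


def label_route_type(route_type):
    """Map a GTFS route_type code to a label; unknown or missing codes become 'Other'."""
    return ROUTE_TYPE_LABELS.get(route_type, "Other")


labels = [label_route_type(code) for code in (0, 3, 12, None)]
```

If trolleybus or monorail service matters for the analysis, adding entries for 11 and 12 (and matching `WHEN` branches in the SQL) would keep them out of the 'Other' bucket.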


#### Step 4: Updates Per Hour of Day

In [0]:
%sql
-- Vehicle updates by hour of day
SELECT
  HOUR(event_ts) AS hour_of_day,
  COUNT(*) AS vehicle_updates
FROM delta.`dbfs:/silver/gtfs_rt`
WHERE event_date > '2023-01-01'
GROUP BY hour_of_day
ORDER BY hour_of_day;


Databricks visualization. Run in Databricks to view.
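
A `GROUP BY` only returns hours that have at least one row, so quiet hours (for example overnight) are absent from the result instead of appearing as zero. If the chart should always show 24 bars, the gaps can be filled after the query runs. A sketch in Python follows; `fill_missing_hours` is a hypothetical helper and the input rows are invented.

```python
def fill_missing_hours(rows):
    """Return 24 (hour_of_day, vehicle_updates) pairs, using 0 for absent hours.

    rows: iterable of (hour_of_day, vehicle_updates) pairs as returned by the query.
    """
    counts = dict(rows)
    return [(hour, counts.get(hour, 0)) for hour in range(24)]


# Invented query output with only three hours present.
query_rows = [(6, 120), (7, 340), (17, 310)]

hourly = fill_missing_hours(query_rows)
```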

#### Step 5: Weather Conditions by Frequency

In [0]:
%sql
-- Most common weather conditions recorded
SELECT
  condition,
  COUNT(*) AS observations  -- avoids reusing the function name COUNT as an alias
FROM delta.`dbfs:/silver/weather`
WHERE ingestion_date > '2025-05-25'
GROUP BY condition
ORDER BY observations DESC;


Databricks visualization. Run in Databricks to view.

#### Step 6: Temperature Impact on Transit Activity

In [0]:
%sql
-- Vehicle update counts per 1-degree temperature bucket
SELECT
  ROUND(temperature, 0) AS temp_bucket,
  COUNT(*) AS vehicle_updates
FROM delta.`dbfs:/gold/gtfs_rt_weather_joined`
WHERE event_date >= '2025-05-25'
GROUP BY ROUND(temperature, 0)
ORDER BY temp_bucket;


Databricks visualization. Run in Databricks to view.
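
`ROUND(temperature, 0)` produces one bucket per degree, which can look noisy when some buckets hold only a few rows. Wider, fixed-width buckets are one alternative. The sketch below floors each reading to the lower edge of its bucket. The width of 5 and the helper name `temp_bucket` are assumptions made for illustration, and the readings are invented. An equivalent SQL expression would be `FLOOR(temperature / 5) * 5`.

```python
import math
from collections import Counter


def temp_bucket(temperature, width=5):
    """Return the lower edge of the fixed-width bucket containing the reading."""
    return math.floor(temperature / width) * width


# Invented temperature readings, one per vehicle update.
readings = [12.4, 14.9, 15.0, 17.2, -0.5]

# Number of updates that fall into each bucket.
bucket_counts = Counter(temp_bucket(t) for t in readings)
```

Flooring (rather than rounding) keeps every bucket the same width and handles negative temperatures consistently: -0.5 lands in the bucket that starts at -5.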

#### Step 7: Weather Condition vs. Vehicle Updates

In [0]:
%sql
-- Weather condition vs. vehicle update count
SELECT
  condition,
  COUNT(*) AS vehicle_updates
FROM delta.`dbfs:/gold/gtfs_rt_weather_joined`
WHERE event_date >= '2025-05-25'
GROUP BY condition
ORDER BY vehicle_updates DESC;


Databricks visualization. Run in Databricks to view.
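
Raw counts per condition largely reflect how often each condition occurred: a condition that is present most of the time will top the list regardless of any effect on service. Dividing by the number of weather observations for each condition gives a per-observation rate that is easier to compare. The sketch below assumes the two inputs come from the Step 7 and Step 5 queries respectively; the helper name and all numbers are invented for illustration.

```python
def updates_per_observation(update_counts, observation_counts):
    """Divide vehicle updates by weather observations for each condition.

    Conditions with no recorded observations are skipped to avoid dividing by zero.
    """
    rates = {}
    for condition, updates in update_counts.items():
        observations = observation_counts.get(condition, 0)
        if observations > 0:
            rates[condition] = updates / observations
    return rates


# Invented numbers, not results from the Delta tables.
update_counts = {"Cloudy": 9000, "Rain": 3000, "Clear": 2000}
observation_counts = {"Cloudy": 90, "Rain": 30, "Clear": 25}

rates = updates_per_observation(update_counts, observation_counts)
```

In this made-up example 'Cloudy' has three times the raw count of 'Rain', yet both work out to the same rate per observation.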