# Data Pipeline Methodology

> Ahmed Hasan (7932883) — COMP 4710 Group 11, Winnipeg PTN Analysis

This notebook documents the data pipeline, demonstrates GTFS analysis libraries, and provides frequency analysis outputs.

## 1. Data Sources

| Source | Dataset | Purpose |
|--------|---------|--------|
| Winnipeg Transit | GTFS | Schedule network (stops, routes, trips, stop_times) |
| Winnipeg Open Data | Neighbourhoods (`8k6x-xxsy`) | Coverage boundaries |
| Winnipeg Open Data | Community Areas (`gfvw-fk34`) | Aggregation boundaries |
| Winnipeg Open Data | Pass-ups (`mer2-irmb`) | Service quality events |
| Winnipeg Open Data | On-time (`gp3k-am4u`) | Schedule deviation |

**Primary Transit Network launch date**: June 29, 2025

## 2. Technology Stack

| Component | Role |
|-----------|------|
| DuckDB + Spatial | Embedded analytical database |
| gtfs-kit | GTFS parsing, route/stop statistics |
| gtfs-segments | Segment-level corridor analysis |
| NetworkX | Graph analytics |
| GeoPandas | Spatial analysis |
| Folium | Interactive maps (CartoDB tiles) |
| Matplotlib | Static visualization |

## 3. Pipeline Architecture

| Stage | Description | Output |
|-------|-------------|--------|
| **Ingest** | Download GTFS + Open Data | `data/raw/` |
| **Load** | Materialize tables in DuckDB | `data/processed/wpg_transit.duckdb` |
| **Transform** | Build edges, views via SQL | Network graph, metrics tables |
| **Consume** | Analysis modules query DuckDB | Frequency, network, coverage |

Build command: `make data`
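
One way to picture the Transform stage's edge-building step is to pair each stop with the next stop on the same trip, then count how often each pair occurs. The sketch below is a pure-Python stand-in for the SQL under that assumption; `build_edges`, `weight_edges`, and the sample rows are hypothetical, and the actual views may be defined differently.

```python
from collections import Counter, defaultdict


def build_edges(stop_times):
    """Return (from_stop, to_stop) pairs for consecutive stops within each trip.

    stop_times: iterable of (trip_id, stop_sequence, stop_id) tuples.
    """
    by_trip = defaultdict(list)
    for trip_id, stop_sequence, stop_id in stop_times:
        by_trip[trip_id].append((stop_sequence, stop_id))

    edges = []
    for trip_id in sorted(by_trip):
        # Order stops by sequence, then pair each stop with its successor
        ordered = [stop for _, stop in sorted(by_trip[trip_id])]
        edges.extend(zip(ordered, ordered[1:]))
    return edges


def weight_edges(edges):
    """Collapse repeated pairs into a {(from_stop, to_stop): count} mapping."""
    return Counter(edges)


# Hypothetical sample: two trips sharing the A -> B segment
sample_stop_times = [
    ("t1", 1, "A"), ("t1", 2, "B"), ("t1", 3, "C"),
    ("t2", 1, "A"), ("t2", 2, "B"),
]
edges = build_edges(sample_stop_times)
weighted = weight_edges(edges)
print(len(edges), dict(weighted))
```

A trip with n stops contributes n - 1 pairs. That is consistent with the row counts in Section 5 if `stop_connections` is built this way: 464,265 stop_times rows minus 10,834 trips gives 453,431.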

## 4. Setup & Helpers

In [7]:
from pathlib import Path
import sys
import warnings
warnings.filterwarnings('ignore')

repo_root = Path.cwd()
while repo_root != repo_root.parent and not (repo_root / 'ptn_analysis').exists():
    repo_root = repo_root.parent
sys.path.insert(0, str(repo_root))

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import folium
from shapely import wkt

from ptn_analysis.data import get_duckdb
from ptn_analysis.data.loaders import load_gtfs_feed
from ptn_analysis.config import GTFS_ZIP_PATH, WPG_BOUNDS

duckdb_connection = get_duckdb()
gtfs_feed = load_gtfs_feed()
gtfs_zip_path = str(GTFS_ZIP_PATH)

# Winnipeg map center
WPG_CENTER = [WPG_BOUNDS['center_lat'], WPG_BOUNDS['center_lon']]

print(f"✓ DuckDB connected")
print(f"✓ GTFS Feed: {len(gtfs_feed.stops)} stops, {len(gtfs_feed.routes)} routes")

✓ DuckDB connected
✓ GTFS Feed: 3873 stops, 71 routes


In [8]:
def load_neighbourhoods_gdf():
    neighbourhood_df = duckdb_connection.execute("""
        SELECT name, area_km2, ST_AsText(geometry) as geometry_wkt FROM neighbourhoods
    """).fetchdf()
    neighbourhood_df['geometry'] = neighbourhood_df['geometry_wkt'].apply(wkt.loads)
    return gpd.GeoDataFrame(neighbourhood_df, geometry='geometry', crs='EPSG:4326')

def load_stops_gdf():
    stops_df = duckdb_connection.execute("""
        SELECT s.stop_id, s.stop_name, s.stop_lat, s.stop_lon,
               COUNT(DISTINCT t.route_id) as routes_served
        FROM stops s
        LEFT JOIN stop_times st ON s.stop_id = st.stop_id
        LEFT JOIN trips t ON st.trip_id = t.trip_id
        GROUP BY s.stop_id, s.stop_name, s.stop_lat, s.stop_lon
    """).fetchdf()
    return gpd.GeoDataFrame(
        stops_df,
        geometry=gpd.points_from_xy(stops_df['stop_lon'], stops_df['stop_lat']),
        crs='EPSG:4326'
    )

neighbourhood_gdf = load_neighbourhoods_gdf()
stops_gdf = load_stops_gdf()
print(f"✓ {len(neighbourhood_gdf)} neighbourhoods, {len(stops_gdf)} stops")

✓ 237 neighbourhoods, 3873 stops


## 5. Data Status

In [9]:
tables = [
    ('stops', 'Stops'), ('routes', 'Routes'), ('trips', 'Trips'),
    ('stop_times', 'Stop Times'), ('calendar', 'Calendar'),
    ('stop_connections', 'Connections'), ('stop_connections_weighted', 'Weighted'),
    ('neighbourhoods', 'Neighbourhoods'), ('community_areas', 'Communities'),
    ('passups', 'Pass-ups'), ('ontime_performance', 'On-time'),
    ('passenger_counts', 'Passenger counts'), ('cycling_paths', 'Cycling paths'),
    ('walkways', 'Walkways'),
]
print('DATA STATUS')
print('=' * 40)
total_rows = 0
for table, name in tables:
    try:
        count = duckdb_connection.execute(f'SELECT COUNT(*) FROM {table}').fetchone()[0]
        total_rows += count
        print(f'{name:<25} {count:>15,} rows')
    except Exception:
        print(f'{name:<25} {"NOT LOADED":>15}')
print('=' * 40)
print(f'{"TOTAL ROWS":<25} {total_rows:>15,} rows')

DATA STATUS
Stops                               3,873 rows
Routes                                 71 rows
Trips                              10,834 rows
Stop Times                        464,265 rows
Calendar                                3 rows
Connections                       453,431 rows
Weighted                            4,427 rows
Neighbourhoods                        237 rows
Communities                            12 rows
Pass-ups                           11,087 rows
On-time                         5,496,268 rows
Passenger counts                2,275,201 rows
Cycling paths                       9,082 rows
Walkways                           50,000 rows
TOTAL ROWS                      8,778,791 rows


## 6. gtfs-kit

In [10]:
# Fix UTM CRS issue by pre-computing trip stats without distance calculation
trip_stats = gtfs_feed.compute_trip_stats(compute_dist_from_shapes=False)

service_dates = gtfs_feed.get_dates()
sample_date = service_dates[0]

route_statistics = gtfs_feed.compute_route_stats(
    dates=[sample_date], 
    trip_stats=trip_stats,
    split_directions=True
)
stop_statistics = gtfs_feed.compute_stop_stats(
    dates=[sample_date],
    split_directions=True
)

print(f"Service dates: {len(service_dates)} | Route stats: {len(route_statistics)} | Stop stats: {len(stop_statistics)}")

most_frequent = route_statistics.nsmallest(10, 'mean_headway')[
    ['route_short_name', 'direction_id', 'num_trips', 'mean_headway']
]
display(most_frequent)

Service dates: 119 | Route stats: 72 | Stop stats: 2665


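| | route_short_name | direction_id | num_trips | mean_headway |
|----|------------------|--------------|-----------|--------------|
| 36 | BLUE | 0 | 87 | 12.103448 |
| 37 | BLUE | 1 | 86 | 12.571429 |
| 63 | F8 | 1 | 67 | 15.234043 |
| 71 | FX4 | 1 | 59 | 15.422222 |
| 70 | FX4 | 0 | 58 | 15.454545 |
| 62 | F8 | 0 | 65 | 15.688889 |
| 69 | FX3 | 1 | 47 | 21.59375 |
| 68 | FX3 | 0 | 45 | 22.09375 |
| 56 | F5 | 0 | 43 | 22.806452 |
| 57 | F5 | 1 | 42 | 23.1 |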
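
For reading the `mean_headway` column: a headway is the gap between consecutive departures, so a mean headway is the average of those gaps. The sketch below shows that arithmetic, assuming departures are given as minutes after midnight. The function name and sample values are hypothetical, and gtfs-kit's own calculation may restrict the time window or differ in other details.

```python
def mean_headway_minutes(departures):
    """Average gap between consecutive departures, or None with fewer than two."""
    ordered = sorted(departures)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return None
    return sum(gaps) / len(gaps)


# Hypothetical departures at 07:00, 07:10, 07:25, 07:35, 07:50
sample_departures = [420, 430, 445, 455, 470]
result = mean_headway_minutes(sample_departures)
print(result)
```

Because the mean is taken over gaps rather than trips, a route's trip count alone does not determine its mean headway.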


## 7. gtfs-segments

In [11]:
SEGMENTS_AVAILABLE = False

try:
    import numpy as np
    if not hasattr(np, 'in1d'):
        np.in1d = np.isin
    
    from gtfs_segments import get_gtfs_segments
    transit_segments = get_gtfs_segments(gtfs_zip_path)
    SEGMENTS_AVAILABLE = True
    
    print(f"Segments: {len(transit_segments):,}")
    print(f"Traversals: {transit_segments['traversals'].min()} - {transit_segments['traversals'].max()}")
except ImportError:
    print("gtfs-segments not installed")
except Exception as error:
    print(f"Error: {error}")

Using the busiest day: 2025-12-15
Total trips processed:  9276
Segments: 5,590
Traversals: 1 - 247


In [12]:
if SEGMENTS_AVAILABLE:
    high_frequency_corridors = transit_segments.nlargest(300, 'traversals').copy()
    
    m = folium.Map(location=WPG_CENTER, zoom_start=12, tiles='CartoDB positron')
    
    max_trav = high_frequency_corridors['traversals'].max()
    min_trav = high_frequency_corridors['traversals'].min()
    
    # Add corridor segments with color gradient
    for _, row in high_frequency_corridors.iterrows():
        coords = [(c[1], c[0]) for c in row.geometry.coords]
        # Guard against a zero range when every segment has the same traversal count
        intensity = (row['traversals'] - min_trav) / max(max_trav - min_trav, 1)
        weight = 2 + intensity * 6
        # Orange to red gradient
        r = int(255)
        g = int(100 * (1 - intensity))
        b = int(0)
        color = f"#{r:02x}{g:02x}{b:02x}"
        folium.PolyLine(
            coords, weight=weight, color=color, opacity=0.8,
            popup=f"Traversals: {row['traversals']}"
        ).add_to(m)
    
    # Add legend
    legend_html = f'''
    <div style="position: fixed; bottom: 50px; left: 50px; z-index: 1000;
                background: white; padding: 10px; border-radius: 5px;
                border: 2px solid grey; font-size: 12px;">
        <b>Transit Corridor Frequency</b><br>
        <i style="background: #ff6400; width: 18px; height: 3px; display: inline-block;"></i> Low ({min_trav} trips)<br>
        <i style="background: #ff3200; width: 18px; height: 5px; display: inline-block;"></i> Medium<br>
        <i style="background: #ff0000; width: 18px; height: 8px; display: inline-block;"></i> High ({max_trav} trips)
    </div>
    '''
    m.get_root().html.add_child(folium.Element(legend_html))
    
    display(m)

## 8. Stop Frequency Analysis (SQL)

In [None]:
# Stop frequencies for the AM peak (06:00-08:59, i.e. hour field 6 through 8), with route info
am_peak_stops = duckdb_connection.execute("""
    SELECT s.stop_id, s.stop_name, s.stop_lat, s.stop_lon,
           COUNT(*) as trip_count,
           COUNT(DISTINCT t.route_id) as routes_served,
           STRING_AGG(DISTINCT r.route_short_name, ', ' ORDER BY r.route_short_name) as route_names
    FROM stops s
    JOIN stop_times st ON s.stop_id = st.stop_id
    JOIN trips t ON st.trip_id = t.trip_id
    JOIN routes r ON t.route_id = r.route_id
    WHERE CAST(SPLIT_PART(st.departure_time, ':', 1) AS INT) BETWEEN 6 AND 8
    GROUP BY s.stop_id, s.stop_name, s.stop_lat, s.stop_lon
    ORDER BY trip_count DESC
""").fetchdf()

print(f"AM Peak (6-9 AM): {len(am_peak_stops)} stops")
print(f"Trip counts: {am_peak_stops['trip_count'].min()} - {am_peak_stops['trip_count'].max()}")
print(f"Top 5 busiest stops:")
display(am_peak_stops[['stop_name', 'trip_count', 'routes_served', 'route_names']].head())
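
The `WHERE` clause above buckets departures by the hour field of `departure_time`; `BETWEEN 6 AND 8` is inclusive, so it covers 06:00 up to but not including 09:00. GTFS also allows times past `24:00:00` for trips that run after midnight on the same service day, so the hour field is not always a clock hour. A small sketch of the same bucketing in Python, with hypothetical helper names:

```python
def gtfs_hour(time_str):
    """Return the hour field of a GTFS HH:MM:SS string (can be 24 or more)."""
    return int(time_str.split(":")[0])


def in_am_peak(time_str, start_hour=6, end_hour=9):
    """True when the departure falls in the half-open range [start_hour, end_hour)."""
    return start_hour <= gtfs_hour(time_str) < end_hour


samples = ["05:59:00", "06:00:00", "08:59:59", "09:00:00", "25:10:00"]
flags = [in_am_peak(t) for t in samples]
print(flags)
```

If clock hours are needed for a chart, `gtfs_hour(t) % 24` folds after-midnight service back onto a 0-23 axis.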

In [None]:
from folium.plugins import MarkerCluster
import branca.colormap as cm

m = folium.Map(location=WPG_CENTER, zoom_start=12, tiles='CartoDB positron')

max_trips = am_peak_stops['trip_count'].max()
min_trips = am_peak_stops['trip_count'].min()

# Color scale: yellow (low) -> orange -> red (high)
colormap = cm.LinearColormap(
    colors=['#ffffb2', '#fecc5c', '#fd8d3c', '#f03b20', '#bd0026'],
    vmin=min_trips, vmax=max_trips,
    caption='AM Peak Departures (6-9 AM)'
)

# Add stops with color based on frequency
for _, row in am_peak_stops.iterrows():
    # Guard against a zero range when every stop has the same trip count
    intensity = (row['trip_count'] - min_trips) / max(max_trips - min_trips, 1)
    radius = 4 + intensity * 12
    color = colormap(row['trip_count'])
    
    folium.CircleMarker(
        location=[row['stop_lat'], row['stop_lon']],
        radius=radius,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=folium.Popup(
            f"<b>{row['stop_name']}</b><br>"
            f"Departures: {row['trip_count']}<br>"
            f"Routes: {row['routes_served']}<br>"
            f"<small>{row['route_names']}</small>",
            max_width=250
        )
    ).add_to(m)

# Add colormap to map
colormap.add_to(m)

display(m)

## 9. Frequency Analysis (DuckDB)

Metrics pre-computed with gtfs-kit are stored in the `gtfs_route_stats` and `gtfs_stop_stats` tables in DuckDB.

In [15]:
from ptn_analysis.analysis.frequency import get_frequency_summary, compute_route_frequency, get_hourly_profile

summary = get_frequency_summary()
print(f"Routes: {summary['total_routes']} | Trips: {summary['total_trips']:,} | Avg headway: {summary['mean_headway_minutes']:.1f} min")
print(f"Routes <15min: {summary['routes_under_15min']} | Routes <30min: {summary['routes_under_30min']}")

2026-02-04 21:27:13.216 | INFO     | ptn_analysis.analysis.frequency:compute_route_frequency:134 - Using default service date: 2025-12-14
2026-02-04 21:27:13.257 | INFO     | ptn_analysis.analysis.frequency:compute_route_frequency:183 - Retrieved frequency for 36 route groups


Routes: 36 | Trips: 2,086 | Avg headway: 38.6 min
Routes <15min: 1 | Routes <30min: 13


In [None]:
route_frequency = compute_route_frequency()
hourly_departures = get_hourly_profile()
valid_headways = route_frequency[route_frequency['mean_headway'].notna()]

from ptn_analysis.analysis.frequency import get_departures_by_hour_by_route
hourly_by_route = get_departures_by_hour_by_route()

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Hourly departures with top 10 routes breakdown
top_routes_list = route_frequency.nlargest(10, 'num_trips')['route_short_name'].tolist()
hourly_pivot = hourly_by_route[hourly_by_route['route_short_name'].isin(top_routes_list)].pivot(
    index='hour', columns='route_short_name', values='departures'
).fillna(0)
hourly_pivot.plot(kind='bar', stacked=True, ax=axes[0, 0], width=0.8, colormap='tab10')
axes[0, 0].set_title('Hourly Departures (Top 10 Routes)', fontsize=12)
axes[0, 0].set_xlabel('Hour')
axes[0, 0].set_ylabel('Departures')
axes[0, 0].legend(title='Route', loc='upper right', fontsize=8, ncol=2)
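# Shade AM (6-8) and PM (16-18) peak hours; bar positions are ordinal, so map hours to positions first
hour_positions = {int(h): pos for pos, h in enumerate(hourly_pivot.index)}
for first_hour, last_hour in [(6, 8), (16, 18)]:
    if first_hour in hour_positions and last_hour in hour_positions:
        axes[0, 0].axvspan(hour_positions[first_hour] - 0.5, hour_positions[last_hour] + 0.5,
                           alpha=0.15, color='orange')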

# Headway scatter plot with ALL routes labeled
ax1 = axes[0, 1]
colors = ['#27ae60' if h < 15 else '#f39c12' if h < 30 else '#e74c3c' for h in valid_headways['mean_headway']]
ax1.scatter(valid_headways['mean_headway'], valid_headways['num_trips'], c=colors, s=80, alpha=0.7)
ax1.axvline(15, color='green', linestyle='--', linewidth=2, label='15 min threshold')
ax1.axvline(30, color='orange', linestyle='--', linewidth=2, label='30 min threshold')
ax1.set_title('Route Headway vs Trips', fontsize=12)
ax1.set_xlabel('Mean Headway (minutes)')
ax1.set_ylabel('Number of Trips')
ax1.legend()

# Label all routes on the scatter plot
for _, row in valid_headways.iterrows():
    ax1.annotate(row['route_short_name'], 
                 xy=(row['mean_headway'], row['num_trips']),
                 xytext=(3, 3), textcoords='offset points',
                 fontsize=7, alpha=0.8)

# Top 20 routes by trips
top20 = route_frequency.nlargest(20, 'num_trips')
colors = ['#27ae60' if h < 15 else '#f39c12' if h < 30 else '#e74c3c' for h in top20['mean_headway']]
axes[1, 0].barh(top20['route_short_name'].astype(str), top20['num_trips'], color=colors)
axes[1, 0].set_title('Top 20 Routes by Trips (colored by headway)', fontsize=12)
axes[1, 0].set_xlabel('Number of Trips')
axes[1, 0].invert_yaxis()

# Add headway values to bars
for i, (_, row) in enumerate(top20.iterrows()):
    axes[1, 0].text(row['num_trips'] + 1, i, f"{row['mean_headway']:.0f}m", va='center', fontsize=8)

# Routes by frequency tier with route names
tier_data = {
    '<15 min\n(Frequent)': valid_headways[valid_headways['mean_headway'] < 15]['route_short_name'].tolist(),
    '15-30 min\n(Regular)': valid_headways[(valid_headways['mean_headway'] >= 15) & (valid_headways['mean_headway'] < 30)]['route_short_name'].tolist(),
    '30-60 min\n(Hourly)': valid_headways[(valid_headways['mean_headway'] >= 30) & (valid_headways['mean_headway'] < 60)]['route_short_name'].tolist(),
    '>60 min\n(Infrequent)': valid_headways[valid_headways['mean_headway'] >= 60]['route_short_name'].tolist()
}
tier_colors = ['#27ae60', '#f39c12', '#e74c3c', '#c0392b']
tier_counts = [len(v) for v in tier_data.values()]
bars = axes[1, 1].bar(tier_data.keys(), tier_counts, color=tier_colors)
axes[1, 1].set_title('Routes by Frequency Tier', fontsize=12)
axes[1, 1].set_ylabel('Number of Routes')

# Add route names below each bar
for i, (tier, routes) in enumerate(tier_data.items()):
    count = len(routes)
    axes[1, 1].text(i, count + 0.3, str(count), ha='center', fontsize=11, fontweight='bold')
    if routes:
        route_text = ', '.join(routes[:6])  # Show first 6 routes
        if len(routes) > 6:
            route_text += f'... (+{len(routes)-6})'
        axes[1, 1].text(i, -1.5, route_text, ha='center', fontsize=7, rotation=0, wrap=True)

axes[1, 1].set_ylim(-3, max(tier_counts) + 2)

plt.tight_layout()
plt.show()

# Print route summary table
print("\n📊 Route Frequency Summary:")
print(valid_headways[['route_short_name', 'num_trips', 'mean_headway']].sort_values('mean_headway').to_string(index=False))

In [17]:
# Route Headway Map - Shows routes colored by frequency tier
from shapely.geometry import LineString

# Get route shapes with headway data
route_shapes = duckdb_connection.execute("""
    SELECT r.route_id, r.route_short_name, r.route_long_name,
           sh.shape_id, sh.shape_pt_lat, sh.shape_pt_lon, sh.shape_pt_sequence
    FROM routes r
    JOIN trips t ON r.route_id = t.route_id
    JOIN shapes sh ON t.shape_id = sh.shape_id
    GROUP BY r.route_id, r.route_short_name, r.route_long_name,
             sh.shape_id, sh.shape_pt_lat, sh.shape_pt_lon, sh.shape_pt_sequence
    ORDER BY r.route_id, sh.shape_id, sh.shape_pt_sequence
""").fetchdf()

# Merge with headway data
route_headways = valid_headways[['route_short_name', 'mean_headway']].drop_duplicates()

m = folium.Map(location=WPG_CENTER, zoom_start=12, tiles='CartoDB positron')

# Group by route and create lines
for route_name in route_headways['route_short_name'].unique():
    headway = route_headways[route_headways['route_short_name'] == route_name]['mean_headway'].values[0]
    route_pts = route_shapes[route_shapes['route_short_name'] == route_name]
    
    if len(route_pts) < 2:
        continue
    
    # Color by frequency tier
    if headway < 10:
        color = '#27ae60'  # Green - high frequency
        tier = '<10 min'
    elif headway < 15:
        color = '#2ecc71'  # Light green
        tier = '10-15 min'
    elif headway < 30:
        color = '#f39c12'  # Orange
        tier = '15-30 min'
    elif headway < 60:
        color = '#e74c3c'  # Red
        tier = '30-60 min'
    else:
        color = '#c0392b'  # Dark red - low frequency
        tier = '>60 min'
    
    # Build line from shape points
    for shape_id in route_pts['shape_id'].unique():
        shape_pts = route_pts[route_pts['shape_id'] == shape_id].sort_values('shape_pt_sequence')
        coords = list(zip(shape_pts['shape_pt_lat'], shape_pts['shape_pt_lon']))
        if len(coords) >= 2:
            folium.PolyLine(
                coords, weight=3, color=color, opacity=0.7,
                popup=f"Route {route_name}: {headway:.0f} min headway ({tier})"
            ).add_to(m)
            break  # Only draw one shape per route

# Add legend
legend_html = '''
<div style="position: fixed; bottom: 50px; left: 50px; z-index: 1000;
            background: white; padding: 10px; border-radius: 5px;
            border: 2px solid grey; font-size: 12px;">
    <b>Route Headway (Frequency)</b><br>
    <i style="background: #27ae60; width: 20px; height: 3px; display: inline-block;"></i> &lt;10 min (High)<br>
    <i style="background: #2ecc71; width: 20px; height: 3px; display: inline-block;"></i> 10-15 min<br>
    <i style="background: #f39c12; width: 20px; height: 3px; display: inline-block;"></i> 15-30 min<br>
    <i style="background: #e74c3c; width: 20px; height: 3px; display: inline-block;"></i> 30-60 min<br>
    <i style="background: #c0392b; width: 20px; height: 3px; display: inline-block;"></i> &gt;60 min (Low)
</div>
'''
m.get_root().html.add_child(folium.Element(legend_html))

display(m)
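
The two cells in this section bin headways with slightly different cut points: 15/30/60 minutes in the charts and 10/15/30/60 minutes on the map. One way to keep such bins in sync is a single lookup keyed on the cut points. The sketch below uses the map's tiers and colors; `TIER_BOUNDS`, `TIER_LABELS`, `TIER_COLORS`, and `headway_tier` are hypothetical names, not part of `ptn_analysis`.

```python
from bisect import bisect_right

# Upper bounds (exclusive) of each tier except the last, in minutes
TIER_BOUNDS = [10, 15, 30, 60]
TIER_LABELS = ['<10 min', '10-15 min', '15-30 min', '30-60 min', '>60 min']
TIER_COLORS = ['#27ae60', '#2ecc71', '#f39c12', '#e74c3c', '#c0392b']


def headway_tier(headway):
    """Return (label, color) for a headway, matching the if/elif chain on the map."""
    # bisect_right counts bounds <= headway, so a headway equal to a bound
    # falls into the next tier, the same as the strict "<" comparisons
    index = bisect_right(TIER_BOUNDS, headway)
    return TIER_LABELS[index], TIER_COLORS[index]


for value in (9.9, 12.1, 38.6, 60):
    print(value, headway_tier(value))
```

A headway of exactly 60 minutes lands in the last tier here, as it does in the map cell, even though that tier is labelled `>60 min`.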

## 10. Transfer Points

Stops served by two or more routes are the key connectivity nodes in the network.
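
As a rough illustration of that definition, the sketch below counts distinct routes per stop from (stop, route) pairs and keeps the stops with at least two. The function name, stop IDs, and pairs are hypothetical; in the notebook the count comes from the `routes_served` column computed in SQL.

```python
from collections import defaultdict


def transfer_points(stop_route_pairs, min_routes=2):
    """Return {stop_id: distinct_route_count} for stops with min_routes or more."""
    routes_by_stop = defaultdict(set)
    for stop_id, route_id in stop_route_pairs:
        routes_by_stop[stop_id].add(route_id)  # a set ignores repeated visits
    return {
        stop_id: len(routes)
        for stop_id, routes in routes_by_stop.items()
        if len(routes) >= min_routes
    }


# Hypothetical pairs: one row per (stop, route) visit, repeats allowed
pairs = [
    ("10001", "BLUE"), ("10001", "F8"), ("10001", "BLUE"),
    ("10002", "F8"),
    ("10003", "FX4"), ("10003", "FX3"), ("10003", "F5"),
]
result = transfer_points(pairs)
print(result)
```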

In [None]:
m = folium.Map(location=WPG_CENTER, zoom_start=12, tiles='CartoDB positron')

# Filter to stops with 2+ routes (transfer points)
transfer_stops = stops_gdf[stops_gdf['routes_served'] > 1].copy()
max_routes = transfer_stops['routes_served'].max()

for _, row in transfer_stops.iterrows():
    intensity = row['routes_served'] / max_routes
    # Green to blue gradient based on connectivity
    r = int(50 + 50 * (1 - intensity))
    g = int(100 + 100 * intensity)
    b = int(150 + 100 * intensity)
    color = f"#{r:02x}{g:02x}{b:02x}"
    folium.CircleMarker(
        location=[row['stop_lat'], row['stop_lon']],
        radius=3 + intensity * 5,
        color=color,
        fill=True,
        fill_opacity=0.7,
        popup=f"{row['stop_name']}: {row['routes_served']} routes"
    ).add_to(m)

# Add legend
legend_html = f'''
<div style="position: fixed; bottom: 50px; left: 50px; z-index: 1000;
            background: white; padding: 10px; border-radius: 5px;
            border: 2px solid grey; font-size: 12px;">
    <b>Transfer Points (2+ routes)</b><br>
    <svg width="12" height="12"><circle cx="6" cy="6" r="3" fill="#646496" opacity="0.7"/></svg> 2 routes<br>
    <svg width="12" height="12"><circle cx="6" cy="6" r="5" fill="#50c8c8" opacity="0.7"/></svg> 3-5 routes<br>
    <svg width="16" height="16"><circle cx="8" cy="8" r="7" fill="#32c8fa" opacity="0.7"/></svg> up to {max_routes} routes
</div>
'''
m.get_root().html.add_child(folium.Element(legend_html))

print(f"Showing {len(transfer_stops)} transfer stops (out of {len(stops_gdf)} total)")
display(m)

## 11. Data Coverage

In [None]:
from datetime import date
from ptn_analysis.config import PTN_LAUNCH_DATE

print(f"PTN Launch: {PTN_LAUNCH_DATE} | Today: {date.today()}")
print('=' * 50)

# Each block reports "Not loaded" rather than failing silently when a table is missing
try:
    feed_info = duckdb_connection.execute('SELECT feed_start_date, feed_end_date FROM feed_info').fetchone()
    print(f"GTFS Feed: {feed_info[0]} to {feed_info[1]}")
except Exception:
    print("GTFS Feed: Not loaded")

try:
    r = duckdb_connection.execute("SELECT MIN(date), MAX(date), COUNT(DISTINCT date) FROM gtfs_route_stats").fetchone()
    print(f"Route stats: {r[0]} to {r[1]} ({r[2]} dates)")
except Exception:
    print("Route stats: Not loaded")

for tbl, col, nm in [('passups', 'time', 'Pass-ups'), ('ontime_performance', 'scheduled_time', 'On-time')]:
    try:
        r = duckdb_connection.execute(f"SELECT MIN({col}), MAX({col}), COUNT(*) FROM {tbl}").fetchone()
        print(f"{nm}: {str(r[0])[:10]} to {str(r[1])[:10]} ({r[2]:,} rows)")
    except Exception:
        print(f"{nm}: Not loaded")
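
To read these ranges against the PTN launch (June 29, 2025), it can help to know how many days of a dataset fall on each side of the launch date. The sketch below is one simple way to compute that split; `split_at_launch` and the sample ranges are hypothetical and are not taken from the loaded tables.

```python
from datetime import date

PTN_LAUNCH = date(2025, 6, 29)  # launch date stated in Section 1


def split_at_launch(start, end, launch=PTN_LAUNCH):
    """Return (days_before_launch, days_on_or_after_launch) for an inclusive range."""
    if end < start:
        raise ValueError("end precedes start")
    total_days = (end - start).days + 1
    # Clamp so ranges entirely before or after the launch are handled
    days_before = min(max((launch - start).days, 0), total_days)
    return days_before, total_days - days_before


# Hypothetical range that straddles the launch date
before, after = split_at_launch(date(2025, 6, 1), date(2025, 7, 31))
print(before, after)
```

A dataset with zero days on either side cannot support a before/after comparison on its own.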

## References

### Data Sources
- [Winnipeg Transit GTFS](https://winnipegtransit.com)
- [City of Winnipeg Open Data](https://data.winnipeg.ca)

### Libraries
- [gtfs-kit](https://github.com/mrcagney/gtfs_kit) — GTFS parsing by MRCagney
- [gtfs-segments](https://github.com/UTEL-UIUC/gtfs_segments) — Segment analysis by UTEL-UIUC
- [Folium](https://python-visualization.github.io/folium/) — Interactive mapping
- [DuckDB](https://duckdb.org) — Embedded analytics database

---
**Ahmed Hasan (7932883)** — COMP 4710, University of Manitoba