## CTA Bus Stops, Train Stations & Metra Stations Mapping

This notebook extracts and visualizes public transit infrastructure in Cook County using multiple sources, including KMZ/KML files and the Overpass API. The goal is to create an interactive map that displays the spatial layout of **CTA bus stops**, **CTA train stations**, and **Metra commuter rail stations**, which is critical for evaluating **transit access** in urban planning and zoning projects.

### Data Sources:
- **CTA Bus Stops** and **CTA Rail (Train) Stations**:
  - Downloaded from the **official CTA website** in KMZ format.
  - Extracted and parsed using XML/KML tools.
- **Metra Stations**:
  - Retrieved from **OpenStreetMap** via the Overpass API, filtered for Cook County.

### Workflow Overview:
1. Extract the KML files from the KMZ archives for CTA bus stops and train stations.
2. Parse and convert station data into usable formats (GeoDataFrames).
3. Query and collect Metra station locations using Overpass API.
4. Visualize all stations together using **Folium**, with distinct markers for each transit type.
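
Step 2 of the workflow (parsing Placemark points out of a KML document) can be sketched with the standard library alone. This is a minimal illustration rather than the notebook's exact code: the helper name `parse_kml_points` and the inline `SAMPLE_KML` document are hypothetical, and the sketch assumes point geometries in the KML 2.2 namespace.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, as used by the cells in this notebook
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def parse_kml_points(kml_text):
    """Return a list of {'name', 'lat', 'lon'} dicts for every Point placemark.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        coord = placemark.find(".//kml:Point/kml:coordinates", KML_NS)
        if coord is None or not coord.text:
            continue  # skip placemarks without point geometry
        # KML order is longitude,latitude[,altitude]
        lon, lat = (float(v) for v in coord.text.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        points.append({"name": name, "lat": lat, "lon": lon})
    return points


# Made-up sample document, not real CTA data
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Stop A</name><Point><coordinates>-87.85,41.97,0</coordinates></Point></Placemark>
<Placemark><name>No geometry</name></Placemark>
</Document></kml>"""

print(parse_kml_points(SAMPLE_KML))
```

The resulting list of dictionaries can be passed to `pandas.DataFrame` and then to `geopandas.points_from_xy` to build a GeoDataFrame.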

### Purpose:
This map will be used to:
- Assess **transit-oriented development (TOD)** potential.
- Support spatial analysis related to **Connected Communities** zoning overlays.
- Overlay with parcel and zoning data to identify priority zones for investment or equitable development.

## Extracting the CTA Bus Stops File

In [1]:
# importing required libraries
import zipfile
import os
import geopandas as gpd
from pykml import parser
from lxml import etree
import xml.etree.ElementTree as ET
import folium
from folium.plugins import MarkerCluster
import requests
import pandas as pd
from shapely.ops import unary_union
import random
from shapely.geometry import Point, box
from folium.plugins import FastMarkerCluster

In [5]:
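# Extract the KMZ archive (a ZIP file) to obtain the KML file.
# The target folder matches the path the KML is read from in the next cell.
kmz_file = 'C:/Users/kaur6/Downloads/Urban Analytics/CTA_BusStops.kmz'
extract_dir = 'C:/Users/kaur6/Downloads/Urban Analytics/extracted_kmz_b'

with zipfile.ZipFile(kmz_file, 'r') as kmz:
    kmz.extractall(extract_dir)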
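
Because a KMZ file is an ordinary ZIP archive, the KML can also be read straight into memory without writing an extracted folder to disk. The sketch below is one possible approach, not the method used in this notebook; the helper name `read_kml_from_kmz` is hypothetical, and it assumes the first `.kml` member of the archive is the one of interest.

```python
import io
import zipfile


def read_kml_from_kmz(kmz_source):
    """Return the raw bytes of the first .kml member in a KMZ archive.

    `kmz_source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(kmz_source) as kmz:
        kml_names = [n for n in kmz.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("no .kml member found in archive")
        return kmz.read(kml_names[0])


# Demonstration with an in-memory archive (made-up content, not CTA data)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("doc.kml", "<kml/>")

print(read_kml_from_kmz(demo))
```

The returned bytes can be handed to `xml.etree.ElementTree.fromstring` in place of `ET.parse` on an extracted file.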

## Visualizing CTA Bus Stops on a Map

In [8]:
# Load the KML file
kml_file = "C:/Users/kaur6/Downloads/Urban Analytics/extracted_kmz_b/CTA_BusStops.kml"
tree = ET.parse(kml_file)
root = tree.getroot()

# Define KML namespace
namespace = {'kml': 'http://www.opengis.net/kml/2.2'}

# Extract all Placemark elements
placemarks = root.findall('.//kml:Placemark', namespaces=namespace)

# Extract coordinates and store them as (latitude, longitude), the order Folium expects
bus_stops = []
for placemark in placemarks:
    coordinates = placemark.find('.//kml:coordinates', namespaces=namespace)
    if coordinates is not None and coordinates.text:
        # KML format: longitude,latitude[,altitude]
        coords = coordinates.text.strip().split(',')
        longitude = float(coords[0])
        latitude = float(coords[1])
        bus_stops.append((latitude, longitude))

# Create a map centered at the first bus stop
if bus_stops:
    first_stop = bus_stops[0]
    m = folium.Map(location=first_stop, zoom_start=12)

    # Use MarkerCluster for efficiency
    marker_cluster = MarkerCluster().add_to(m)

    # Add all bus stops to the cluster
    for lat, lon in bus_stops:
        folium.Marker(
            [lat, lon], 
            popup=f"Bus Stop: {lat}, {lon}"
        ).add_to(marker_cluster)

    # Save and display the map
    output_file = "C:/Users/kaur6/Downloads/Urban Analytics/bus_stops_map.html"
    m.save(output_file)
    print(f"✅ Map saved as '{output_file}'. Open this file in a browser to view.")

else:
    print("❌ No bus stops found in the KML file.")

✅ Map saved as 'C:/Users/kaur6/Downloads/Urban Analytics/bus_stops_map.html'. Open this file in a browser to view.


## Extracting CTA Train Stations

In [7]:
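# Extract the KMZ archive (a ZIP file) to obtain the KML file.
# The target folder matches the path the KML is read from in the next cell.
kmz_r_file = 'C:/Users/kaur6/Downloads/Urban Analytics/CTA_RailStations.kmz'
extract_r_dir = 'C:/Users/kaur6/Downloads/Urban Analytics/extracted_kmz_r'
with zipfile.ZipFile(kmz_r_file, 'r') as kmz:
    kmz.extractall(extract_r_dir)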

## Visualizing CTA Train Stations on a Map

In [None]:
# Load the KML file
kml_r_file = "C:/Users/kaur6/Downloads/Urban Analytics/extracted_kmz_r/CTA_RailStations.kml"
tree = ET.parse(kml_r_file)
root = tree.getroot()

# Define KML namespace
namespace = {'kml': 'http://www.opengis.net/kml/2.2'}

# Extract all Placemark elements
placemarks = root.findall('.//kml:Placemark', namespaces=namespace)

# Extract coordinates and store them as (latitude, longitude), the order Folium expects
rail_stations = []
for placemark in placemarks:
    coordinates = placemark.find('.//kml:coordinates', namespaces=namespace)
    if coordinates is not None:
        # KML format: longitude,latitude
        coords = coordinates.text.strip().split(',')
        longitude = float(coords[0])
        latitude = float(coords[1])
        rail_stations.append((latitude, longitude))

# Create a map centered at the first rail station
if rail_stations:
    first_station = rail_stations[0]
    m = folium.Map(location=first_station, zoom_start=12)

    # Add markers for all rail stations
    for lat, lon in rail_stations:
        folium.Marker([lat, lon], popup=f"Rail Station: {lat}, {lon}").add_to(m)

    # Save and display the map
    m.save("rail_stations_map.html")
    print("Map saved as 'rail_stations_map.html'. Open this file in a browser to view.")

else:
    print("No rail stations found in the KML file.")

## Getting Metra Stations in Cook County from the Overpass API

In [5]:
# Overpass API endpoint (Overpass Turbo is the separate web front end)
overpass_url = "http://overpass-api.de/api/interpreter"
query = """
[out:json];
area[name="Cook County"]->.searchArea;
node["railway"="station"]["operator"~"Metra"](area.searchArea);
out body;
"""

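# Fetch data from the Overpass API; fail early on HTTP errors instead of parsing an error page
response = requests.get(overpass_url, params={"data": query}, timeout=60)
response.raise_for_status()
data = response.json()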

# Extract relevant information (ID, name, latitude, longitude)
stations = []
for element in data["elements"]:
    if "lat" in element and "lon" in element:
        name = element["tags"].get("name", "Unknown Station")
        stations.append({
            "ID": element["id"],
            "Name": name,
            "Latitude": element["lat"],
            "Longitude": element["lon"]
        })

# Convert to GeoDataFrame
gdf = gpd.GeoDataFrame(stations, geometry=gpd.points_from_xy(
    [s["Longitude"] for s in stations], [s["Latitude"] for s in stations]
))

# Set CRS for Metra stations (WGS 84 - EPSG:4326)
gdf.set_crs("EPSG:4326", allow_override=True, inplace=True)

# Save as GeoJSON and CSV
geojson_path = "metra_stations_cook_county.geojson"
csv_path = "metra_stations_cook_county.csv"
gdf.to_file(geojson_path, driver="GeoJSON")
gdf.drop(columns=["geometry"]).to_csv(csv_path, index=False)

print(f"Saved Metra stations to {geojson_path} and {csv_path}")

Saved Metra stations to metra_stations_cook_county.geojson and metra_stations_cook_county.csv
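
The extraction loop above indexes `element["tags"]` directly, which works here because the query filters on tags. A slightly more defensive variant is sketched below; the function name `overpass_nodes_to_rows` and the sample payload are hypothetical, and the sketch assumes the usual Overpass JSON layout (an `elements` list whose items carry `type`, `id`, `lat`, `lon`, and optional `tags`).

```python
def overpass_nodes_to_rows(payload):
    """Convert an Overpass JSON payload into a list of station rows.

    Only node elements with coordinates are kept; missing tags fall back
    to a placeholder name. Hypothetical helper for illustration only.
    """
    rows = []
    for element in payload.get("elements", []):
        if element.get("type") != "node":
            continue  # ways and relations carry no lat/lon of their own
        if "lat" not in element or "lon" not in element:
            continue
        tags = element.get("tags", {})
        rows.append({
            "ID": element["id"],
            "Name": tags.get("name", "Unknown Station"),
            "Latitude": element["lat"],
            "Longitude": element["lon"],
        })
    return rows


# Made-up payload for demonstration; IDs, names and coordinates are not real data
sample_payload = {
    "elements": [
        {"type": "node", "id": 1, "lat": 41.88, "lon": -87.64,
         "tags": {"name": "Example Station"}},
        {"type": "node", "id": 2, "lat": 41.90, "lon": -87.70},
        {"type": "way", "id": 3, "tags": {"name": "Not a node"}},
    ]
}

for row in overpass_nodes_to_rows(sample_payload):
    print(row)
```

The returned rows have the same keys as the `stations` list above, so they can be passed to `gpd.GeoDataFrame` in the same way.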


## Visualizing CTA and Metra Stations on the Same Map

In [6]:
# Define KML namespace
namespace = {'kml': 'http://www.opengis.net/kml/2.2'}

# Load CTA Rail Stations KML file
kml_r_file = "C:/Users/kaur6/Downloads/Urban Analytics/extracted_kmz_r/CTA_RailStations.kml"
tree = ET.parse(kml_r_file)
root = tree.getroot()

# Extract all Placemark elements
placemarks = root.findall('.//kml:Placemark', namespaces=namespace)

# Extract CTA rail station coordinates
cta_stations = []
for placemark in placemarks:
    coordinates = placemark.find('.//kml:coordinates', namespaces=namespace)
    if coordinates is not None:
        coords = coordinates.text.strip().split(',')
        longitude = float(coords[0])
        latitude = float(coords[1])
        cta_stations.append((latitude, longitude))

# Print number of CTA stations
print(f"Number of CTA Rail Stations: {len(cta_stations)}")

# Convert CTA data to a GeoDataFrame (KML coordinates are WGS 84, EPSG:4326)
cta_gdf = gpd.GeoDataFrame(cta_stations, columns=["Latitude", "Longitude"], 
                           geometry=[Point(lon, lat) for lat, lon in cta_stations],
                           crs="EPSG:4326")

# Load Metra Rail Stations from GeoJSON
metra_gdf = gpd.read_file("C:/Users/kaur6/Downloads/Urban Analytics/metra_stations_cook_county.geojson")

# Print number of Metra stations
print(f"Number of Metra Rail Stations: {len(metra_gdf)}")

# Create a Folium map centered at an average location
map_center = [cta_gdf["Latitude"].mean(), cta_gdf["Longitude"].mean()]
m = folium.Map(location=map_center, zoom_start=11)

# Add CTA stations (Blue markers)
for idx, row in cta_gdf.iterrows():
    folium.Marker(
        location=[row["Latitude"], row["Longitude"]],
        popup=f"CTA Rail Station: {row['Latitude']}, {row['Longitude']}",
        icon=folium.Icon(color="blue", icon="train")
    ).add_to(m)

# Add Metra stations (Red markers)
for idx, row in metra_gdf.iterrows():
    folium.Marker(
        location=[row["Latitude"], row["Longitude"]],
        popup=f"Metra Station: {row['Name']}",
        icon=folium.Icon(color="red", icon="train")
    ).add_to(m)

# Save and display the map
m.save("cta_metra_stations_map.html")
print("Map saved as 'cta_metra_stations_map.html'. Open this file in a browser to view.")

Number of CTA Rail Stations: 144
Number of Metra Rail Stations: 132
Map saved as 'cta_metra_stations_map.html'. Open this file in a browser to view.


In [3]:
# Load the KML file with GeoPandas using the KML driver of its I/O backend
kml_path = "C:/Users/kaur6/Downloads/Urban Analytics/extracted_kmz_b/CTA_BusStops.kml"

# No layer is specified, so the first layer in the file is read
gdf = gpd.read_file(kml_path, driver='KML')

# Optional: preview
print(gdf.head())

# Save to GeoJSON
output_path = "C:/Users/kaur6/Downloads/Urban Analytics/CTA_BusStops.geojson"
gdf.to_file(output_path, driver="GeoJSON")
print(f"✅ Converted and saved as GeoJSON: {output_path}")

         id                          Name  \
0  ID_00000        East River Rd & Berwyn   
1  ID_00001  Cumberland Blue Line Station   
2  ID_00002         Lawrence & Cumberland   
3  ID_00003           Cumberland & Foster   
4  ID_00004           Cumberland & Berwyn   

                                         description timestamp begin end  \
0  <html xmlns:fo="http://www.w3.org/1999/XSL/For...       NaT   NaT NaT   
1  <html xmlns:fo="http://www.w3.org/1999/XSL/For...       NaT   NaT NaT   
2  <html xmlns:fo="http://www.w3.org/1999/XSL/For...       NaT   NaT NaT   
3  <html xmlns:fo="http://www.w3.org/1999/XSL/For...       NaT   NaT NaT   
4  <html xmlns:fo="http://www.w3.org/1999/XSL/For...       NaT   NaT NaT   

    altitudeMode  tessellate  extrude  visibility  drawOrder  icon snippet  \
0  clampToGround          -1        0          -1        NaN  None           
1  clampToGround          -1        0          -1        NaN  None           
2  clampToGround          -1        0 

In [4]:
# Load GeoJSON
gdf = gpd.read_file("C:/Users/kaur6/Downloads/Urban Analytics/CTA_BusStops.geojson")

# Preview first few rows
print(gdf.head())

# See all column names
print(gdf.columns)

# Optional: check CRS and row count
print("CRS:", gdf.crs)
print("Total bus stops:", len(gdf))

         id                          Name  \
0  ID_00000        East River Rd & Berwyn   
1  ID_00001  Cumberland Blue Line Station   
2  ID_00002         Lawrence & Cumberland   
3  ID_00003           Cumberland & Foster   
4  ID_00004           Cumberland & Berwyn   

                                         description timestamp begin   end  \
0  <html xmlns:fo="http://www.w3.org/1999/XSL/For...      None  None  None   
1  <html xmlns:fo="http://www.w3.org/1999/XSL/For...      None  None  None   
2  <html xmlns:fo="http://www.w3.org/1999/XSL/For...      None  None  None   
3  <html xmlns:fo="http://www.w3.org/1999/XSL/For...      None  None  None   
4  <html xmlns:fo="http://www.w3.org/1999/XSL/For...      None  None  None   

    altitudeMode  tessellate  extrude  visibility drawOrder  icon snippet  \
0  clampToGround          -1        0          -1      None  None           
1  clampToGround          -1        0          -1      None  None           
2  clampToGround          -1 