# Using Transit API for Public Transit Data Extraction

This notebook demonstrates how to extract public transit data for Montreal using the Transit API.
The goal is to collect nearby transit routes, stops, and available network data to support further analysis.

The data collected from the API was later merged with GTFS schedule data and used in downstream Python analysis, SQL exploration, and Tableau dashboards.

### **Note:**
This notebook demonstrates the structure of API requests only. No data retrieved from the API is stored, displayed, or included in the repository.

<span style="font-size:20px">**API Reference and Approach**</span>

Transit provides an OpenAPI specification in JSON format that outlines all available endpoints, parameters, and data schemas.

The endpoints used in this notebook include:

- `/public/nearby_routes`: Returns transit routes near a specific location

- `/public/nearby_stops`: Returns nearby physical stops

- `/public/available_networks`: Lists all supported transit networks

<span style="font-size:20px">**Key implementation details:**</span>

- The authorization header field is `apiKey`

- Requests take parameters such as `lat`, `lon`, a search distance (`max_distance` or `radius`), and flags (e.g., `should_update_realtime`)

- Responses arrive as nested JSON, which we flattened with `pandas.json_normalize()` for analysis
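
The flattening step can be illustrated with a small sketch. The records below are made up and only imitate the nesting of an API response; the field names (`route_id`, `name.short`, `name.long`) are assumptions for illustration, not part of the Transit schema.

```python
import pandas as pd

# Made-up records that only imitate a nested API response;
# the field names are illustrative, not taken from the Transit schema.
sample_routes = [
    {"route_id": "R1", "name": {"short": "1", "long": "Example Street"}},
    {"route_id": "R2", "name": {"short": "2", "long": "Sample Avenue"}},
]

# Nested dictionaries become dot-separated column names such as "name.short"
flat_df = pd.json_normalize(sample_routes)
print(flat_df.columns.tolist())
```

Each nested key ends up in its own column, so the result can be filtered and merged like any other DataFrame.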


<span style="font-size:20px">**API Access**</span>

An API key was requested directly from the Transit team via their website.
The key is sent in the headers of every HTTP request:

```
HEADERS = {
    "apiKey": API_KEY,
    "Accept-Language": "en"
}
```

<span style="font-size:20px">**Querying Locations Across Montreal**</span>

To get meaningful coverage, we selected five key locations across the city:

```
key_locations = [
    (45.5019, -73.5674),  # Downtown
    (45.5371, -73.5804),  # Plateau
    (45.4945, -73.6104),  # NDG
    (45.4689, -73.5702),  # Verdun
    (45.5480, -73.6078),  # Rosemont
]
```

<span style="font-size:20px">**Final Output**</span>

- `montreal_routes_all.csv`: Unique routes near the selected Montreal coordinates
- `montreal_nearby_stops.csv`: Routable physical stops within a 1 km radius
- `available_networks.csv`: Transit networks supported by the API

In [None]:
# Import required libraries

import os
import time

import pandas as pd
import requests

from config import API_KEY  # local config.py that defines API_KEY

**NOTE:**

This notebook uses the Transit API. The import cell reads `API_KEY` from a local `config.py`; supply your own key there or load it from an environment variable.
**API keys are not included for security reasons.**
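
One way to follow the environment-variable route is sketched below. The variable name `TRANSIT_API_KEY` and the helper `load_api_key` are assumptions made for illustration; neither is defined by the Transit API or by this notebook.

```python
import os


def load_api_key(env_var="TRANSIT_API_KEY"):
    """Return the key stored in the given environment variable, stripped of whitespace."""
    # env_var is a hypothetical name; use whatever variable you actually export
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key
```

With the variable exported in the shell that launches Jupyter, `API_KEY = load_api_key()` could stand in for the `config` import. Failing early on a missing key avoids sending requests that are bound to be rejected.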

In [None]:
# Strip any whitespace
API_KEY = API_KEY.strip()

HEADERS = {
    "apiKey": API_KEY,
    "Accept-Language": "en"
}

key_locations = [
    (45.5019, -73.5674),  # Downtown
    (45.5371, -73.5804),  # Plateau
    (45.4945, -73.6104),  # NDG
    (45.4689, -73.5702),  # Verdun
    (45.5480, -73.6078),  # Rosemont
]

## Nearby Routes

In [None]:
def get_nearby_routes(api_key, lat, lon, radius=1500):
    """
    Fetch all nearby transit routes for a given location and radius.
    Returns a DataFrame with full route details.
    """
    # Endpoint for retrieving nearby transit routes
    url = "https://external.transitapp.com/v3/public/nearby_routes"

    # Build the headers from the api_key argument rather than the global HEADERS
    headers = {
        "apiKey": api_key,
        "Accept-Language": "en"
    }

    params = {
        "lat": lat,
        "lon": lon,
        "max_distance": radius,
        "should_update_realtime": False
    }

    # Send GET request to the API
    response = requests.get(url, headers=headers, params=params)

    # A 200 status code means the request succeeded
    if response.status_code == 200:
        
        routes = response.json().get("routes", [])
        if routes:
            df = pd.json_normalize(routes)
            df["query_lat"] = lat
            df["query_lon"] = lon

            return df
        else:
            print(f"No routes found near ({lat}, {lon})")
            return pd.DataFrame()
    else:
        print("Error:", response.status_code, response.text)
        return pd.DataFrame()
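
Because this notebook is about the structure of the requests, it can help to inspect a request without sending it. The sketch below uses `requests.Request(...).prepare()` to build the final URL and headers offline; the key value `YOUR_KEY` is a placeholder, and nothing here contacts the Transit servers.

```python
import requests

url = "https://external.transitapp.com/v3/public/nearby_routes"
headers = {"apiKey": "YOUR_KEY", "Accept-Language": "en"}  # placeholder key
params = {
    "lat": 45.5019,
    "lon": -73.5674,
    "max_distance": 1500,
    "should_update_realtime": False,
}

# prepare() encodes the parameters into the URL but does not send anything
prepared = requests.Request("GET", url, headers=headers, params=params).prepare()
print(prepared.url)
print(dict(prepared.headers))
```

Printing the prepared URL is a quick way to confirm parameter names and values before spending an API call on them.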

In [None]:
# Initialize an empty list to store results from each location
route_dfs = []

for lat, lon in key_locations:
    
    df = get_nearby_routes(API_KEY, lat, lon, radius=1500)
    if not df.empty:
        route_dfs.append(df)

if route_dfs:
    # Combine all collected route dfs and keep one row per route
    all_routes_df = pd.concat(route_dfs, ignore_index=True)
    all_routes_df = all_routes_df.drop_duplicates(subset=["global_route_id"])
else:
    # pd.concat raises ValueError on an empty list, so fall back to an empty frame
    all_routes_df = pd.DataFrame()

#all_routes_df.to_csv("montreal_routes_all.csv", index=False)
print("Total unique routes collected:", all_routes_df.shape[0])

In [None]:
all_routes_df.columns
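
The query circles around neighbouring locations can overlap, so the same route may come back more than once. The sketch below uses made-up IDs (`A:1`, `A:2`, `A:3`) to show why deduplication is keyed on `global_route_id` instead of comparing whole rows.

```python
import pandas as pd

# Illustrative rows: route "A:2" is returned for two different query points
frames = [
    pd.DataFrame({"global_route_id": ["A:1", "A:2"], "query_lat": [45.50, 45.50]}),
    pd.DataFrame({"global_route_id": ["A:2", "A:3"], "query_lat": [45.53, 45.53]}),
]
combined = pd.concat(frames, ignore_index=True)

# Full-row comparison treats the two "A:2" rows as different because query_lat differs
print(len(combined.drop_duplicates()))  # 4

# Restricting the comparison to the ID keeps the first occurrence of each route
unique_routes = combined.drop_duplicates(subset=["global_route_id"])
print(len(unique_routes))  # 3
```

Note that the surviving row carries the `query_lat` of whichever location returned the route first.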

## Nearby Stops

In [None]:
def get_nearby_stops(api_key, lat, lon, radius=1000):
    """
    Fetch nearby transit stops for a given location and radius using the Transit API.
    Filters out non-routable or invalid stops. Returns a cleaned DataFrame.
    """

    # Set API endpoint and headers
    url = "https://external.transitapp.com/v3/public/nearby_stops"
    headers = {
        "apiKey": api_key,
        "Accept-Language": "en"
    }

    # Define query parameters
    params = {
        "lat": lat,
        "lon": lon,
        "radius": radius,
        "stop_filter": "Routable"
    }

    # Send GET request
    response = requests.get(url, headers=headers, params=params)

    if response.status_code == 200:
        # Extract stops from response
        stops = response.json().get("stops", [])
        if stops:
            df = pd.json_normalize(stops)
            df["query_lat"] = lat
            df["query_lon"] = lon
            # Filter for valid stops only
            df = df[(df["location_type"] == 0) & df["global_stop_id"].notna()]
            return df
        else:
            print("No stops found.")
            return pd.DataFrame()
    else:
        # Handle errors
        print("Error:", response.status_code, response.text)
        return pd.DataFrame()

In [None]:
all_stops = []

for lat, lon in key_locations:
    df = get_nearby_stops(API_KEY, lat=lat, lon=lon, radius=1000)
    if not df.empty:
        all_stops.append(df)
    time.sleep(3)  #avoid rapid-fire calls

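# Combine all stops into one DataFrame, keeping one row per stop
# (query_lat/query_lon differ between query points, so full-row deduplication would keep repeats)
all_stops_df = pd.concat(all_stops, ignore_index=True).drop_duplicates(subset=["global_stop_id"])
print("All stops shape:", all_stops_df.shape)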

# Save for analysis
#all_stops_df.to_csv("montreal_nearby_stops.csv", index=False)
all_stops_df.head()

In [None]:
all_stops_df.columns
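
The pause between calls in the loop above can be factored into a small helper. This is a minimal sketch, not part of the original notebook: `fetch_with_pause` is a hypothetical name, and the `fetch` and `sleep` arguments are injected so the pacing logic can be tried without network access.

```python
import time


def fetch_with_pause(locations, fetch, pause_seconds=3.0, sleep=time.sleep):
    """Call fetch(lat, lon) for each location, pausing between consecutive calls."""
    results = []
    for index, (lat, lon) in enumerate(locations):
        if index > 0:
            sleep(pause_seconds)  # pause only between calls, not after the last one
        results.append(fetch(lat, lon))
    return results
```

In the notebook it could be called as `fetch_with_pause(key_locations, lambda lat, lon: get_nearby_stops(API_KEY, lat, lon))`, after which the non-empty frames are concatenated as before.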

## Available Networks

In [None]:
def get_available_networks(api_key):
    """
    Fetch the list of all available transit networks from the Transit API.
    
    Returns a DataFrame with network metadata such as network name, location, and ID.
    """
    url = "https://external.transitapp.com/v3/public/available_networks"
    response = requests.get(url, headers={"apiKey": api_key, "Accept-Language": "en"})

    if response.status_code == 200:
        networks = response.json().get("networks", [])
        if networks:
            df = pd.json_normalize(networks)
            return df
        else:
            print("No networks found.")
            return pd.DataFrame()
    else:
        print("Error:", response.status_code, response.text)
        return pd.DataFrame()

In [None]:
networks_df = get_available_networks(API_KEY)
#networks_df.to_csv("available_networks.csv", index=False)
print("Networks fetched:", networks_df.shape[0])
networks_df.head()

In [None]:
networks_df.columns

<span style="font-size:20px">**Summary**</span>

- Used Transit's OpenAPI spec to identify the available endpoints and their parameters

- Queried REST APIs using secure headers and custom parameters

- Flattened nested JSON responses using pandas

- Built and exported structured datasets for downstream analysis

- These datasets were later combined with GTFS schedule data for further analysis, forecasting, and visualization in Python, SQL, and Tableau.