# 🚌 Accessibility to Public Transport
---

## 📖 Overview
This notebook provides a data-driven framework to evaluate **how easily residents can access public transit services** within a city.

By analyzing GTFS (General Transit Feed Specification) schedules and the local street network, we generate an "Accessibility Score." Unlike simple proximity maps, this analysis evaluates the **Level of Service (LoS)** at each stop, accounting for frequency (headway), transit mode (rail vs. bus), and travel speed. 

---

## 📊 Evaluation Criteria: Quality & Proximity

The analysis uses a multi-dimensional scoring system to determine the final **accessibility** score. We evaluate transit stops based on two main pillars:

### 1. Stop Quality
A stop's intrinsic quality is determined by:
*   **Headway:** How long users have to wait for the next vehicle.
*   **Speed:** How fast the transit service moves once on board.
*   **Mode:** The perceived "comfort" or "reliability" factor (e.g., Rail > Tram > Bus).

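### 2. Walk Proximity
The walking distance along the street network from any location to the stop. The longer the walk, the lower the score.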

### Quality functions 

To reflect real-world human behavior, we apply **elasticity** values to our scoring. An elasticity defines the exponential decay of perceived quality as conditions worsen; typical values are reported in transport economics studies:
*   **Headway Elasticity:** As wait times increase, the utility of the stop drops exponentially.
*   **Walk Elasticity:** As the walking distance to a stop increases, its accessibility score decreases.
*   **Speed Elasticity:** As transit speed increases, the quality of the service improves.
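
The exact formula lives in `quality_utils.elasticity_based_quality` and is not reproduced in this notebook. As a rough illustration only, the sketch below uses a power-law (constant-elasticity) form, which is one plausible reading of the bullets above. The function name `elasticity_quality` and its formula are assumptions, not the library's API.

```python
def elasticity_quality(value, reference, elasticity):
    """Hypothetical constant-elasticity score.

    Returns 1.0 when value == reference and changes as a power of the ratio
    value / reference. The result is clipped to the [0, 1] range.
    """
    if value <= 0 or reference <= 0:
        raise ValueError("value and reference must be positive")
    score = (value / reference) ** elasticity
    return min(1.0, max(0.0, score))


# Headway: the reference is the best (shortest) headway, elasticity is negative
print(elasticity_quality(5, 5, -0.35))    # score at the reference headway
print(elasticity_quality(20, 5, -0.35))   # a 20 min headway scores lower

# Speed: the reference is the best (highest) speed, elasticity is positive
print(elasticity_quality(30, 150, 0.2))

# An elasticity of 0 switches the parameter off: every value scores 1.0
print(elasticity_quality(60, 5, 0))
```

With this form, a larger absolute elasticity makes the score fall faster as conditions move away from the reference value.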

**Discretization:** Results are automatically binned into accessibility levels in steps of 0.1, from 0 (poor/no access) to 1 (excellent access). With the default settings there are 10 levels above 0.

<div style="display: flex; width: 100%; max-width: 600px; gap: 2px; margin-bottom: 1rem; font-family: sans-serif; font-size: 0.75rem; font-weight: bold;">
  <div style="flex:1; background-color:#ff6666; color:white; text-align:center; padding:0.3rem 0;">0</div>
  <div style="flex:1; background-color:#ff9999; color:black; text-align:center; padding:0.3rem 0;">0.1</div>
  <div style="flex:1; background-color:#ffcc66; color:black; text-align:center; padding:0.3rem 0;">0.2</div>
  <div style="flex:1; background-color:#ffff66; color:black; text-align:center; padding:0.3rem 0;">0.3</div>
  <div style="flex:1; background-color:#ccff66; color:black; text-align:center; padding:0.3rem 0;">0.4</div>
  <div style="flex:1; background-color:#99ff66; color:black; text-align:center; padding:0.3rem 0;">0.5</div>
  <div style="flex:1; background-color:#66ff66; color:black; text-align:center; padding:0.3rem 0;">0.6</div>
  <div style="flex:1; background-color:#33ff66; color:black; text-align:center; padding:0.3rem 0;">0.7</div>
  <div style="flex:1; background-color:#00cc66; color:white; text-align:center; padding:0.3rem 0;">0.8</div>
  <div style="flex:1; background-color:#00aa55; color:white; text-align:center; padding:0.3rem 0;">0.9</div>
  <div style="flex:1; background-color:#009933; color:white; text-align:center; padding:0.3rem 0;">1</div>
</div>
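
A minimal sketch of how a continuous score can be snapped onto the legend above. It mirrors the `np.searchsorted(..., side="left")` pattern used later in this notebook, which rounds a score up to the next grid value. The helper name `discretize` is hypothetical and only the standard library is used.

```python
from bisect import bisect_left


def discretize(score, n_levels=10):
    """Snap a score to the grid 0, 1/n, 2/n, ..., 1 (rounding up)."""
    grid = [i / n_levels for i in range(n_levels + 1)]
    idx = bisect_left(grid, score)   # index of the first grid value >= score
    idx = min(idx, n_levels)         # scores above 1 fall in the last bin
    return grid[idx]


for raw in (0.0, 0.05, 0.3, 0.34, 1.7):
    print(raw, "->", discretize(raw))
```

Rounding up rather than to the nearest value means that any score strictly above 0 lands in at least the lowest non-zero level.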

---

## 🛠️ Technical Requirements

### 🔑 API Key Setup
This notebook uses the **Mobility Database** to automatically fetch GTFS feeds. 
1.  Go to [MobilityDatabase.org](https://mobilitydatabase.org/).
2.  Create a free account.
3.  Navigate to **Account Details** to find your `Refresh Token`.
4.  Paste the token in the **Public Transport Data** section below.

### 💻 Environment
To run this analysis, ensure your environment is configured with:
> **Core Stack:** `pyGTFSHandler`, `UrbanAccessAnalyzer`, `osmnx`, `geopandas`, `h3`, `polars`  
> **System Dependencies:** `osmium-tool` (required for processing large-scale OSM street data)  
> **Data Sources:** OpenStreetMap (via Overpass/Geofabrik) and WorldPop (Global 100m population rasters)

---

*Prepared for use in Google Colab or local Jupyter environments.*

***

In [None]:
# If using colab
# Takes around 2-3 min

# !pip install matplotlib mapclassify folium
# !apt-get install -y osmium-tool
# !pip install "UrbanAccessAnalyzer[osm,plot,h3] @ git+https://github.com/CityScope/UrbanAccessAnalyzer.git@v1.0.0"
# !pip install "pyGTFSHandler[osm,plot] @ git+https://github.com/CityScope/pyGTFSHandler.git@v1.0.0"

# Restart notebook after installing this if needed

In [None]:
import os
from datetime import datetime, date, timedelta, time
import pandas as pd
import polars as pl
import geopandas as gpd
from pathlib import Path
import zipfile

import osmnx as ox

import UrbanAccessAnalyzer.isochrones as isochrones
import UrbanAccessAnalyzer.graph_processing as graph_processing
import UrbanAccessAnalyzer.osm as osm
import UrbanAccessAnalyzer.utils as utils
import UrbanAccessAnalyzer.h3_utils as h3_utils
import UrbanAccessAnalyzer.population as population
import UrbanAccessAnalyzer.quality as quality_utils
import UrbanAccessAnalyzer.plot_helpers as plot_helpers

from pyGTFSHandler.feed import Feed
from pyGTFSHandler.downloaders.mobility_database import MobilityDatabaseClient
import pyGTFSHandler.plot_helpers as gtfs_plot_helpers
import pyGTFSHandler.gtfs_checker as gtfs_checker
import pyGTFSHandler.processing_helpers as processing_helpers

import numpy as np

## 1 Inputs

#### General

In [None]:
city_name = "Cambridge, MA, USA"

# The download area extends beyond the aoi by 'download_buffer' meters
# Ideally this should be at least max_walk_distance (defined below)
download_buffer = 1000

# Simplify street graph to avoid edges of less than 'min_edge_length'
min_edge_length = 30 # in m

# If you want results in h3 this is the output h3 resolution
h3_resolution = 10 

# Whether to show maps. For large datasets this might break or slow down execution
show_maps = True

#### Public transport GTFS timetables

**Stop grouping**

In GTFS data, bus stops typically have separate `stop_id`s for each direction. Additionally, when multiple routes serve the same physical area, they often use different `stop_id`s as well.

To handle this effectively, it makes sense to group nearby stops under a single logical identifier. This is achieved by creating or using the `parent_station` column, which allows related stops to be associated with one parent stop.
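
To make the idea of grouping concrete, here is a small sketch that merges stops lying within a given distance of each other using a union-find structure. It is an illustration under simplifying assumptions (projected coordinates in meters, brute-force pair comparison) and not the algorithm used by `Feed`; the function name `group_stops` is hypothetical.

```python
from math import hypot


def group_stops(stops, max_distance):
    """Group stops closer than max_distance.

    stops: dict mapping stop_id -> (x, y) in meters (projected coordinates).
    Returns a dict mapping stop_id -> group id (the smallest stop_id in the group).
    """
    parent = {stop_id: stop_id for stop_id in stops}

    def find(stop_id):
        # Follow parent links to the group root, compressing the path on the way
        while parent[stop_id] != stop_id:
            parent[stop_id] = parent[parent[stop_id]]
            stop_id = parent[stop_id]
        return stop_id

    ids = sorted(stops)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (xa, ya), (xb, yb) = stops[a], stops[b]
            if hypot(xa - xb, ya - yb) <= max_distance:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    parent[max(root_a, root_b)] = min(root_a, root_b)
    return {stop_id: find(stop_id) for stop_id in ids}


stops = {"A_north": (0, 0), "A_south": (0, 30), "B": (500, 0)}
print(group_stops(stops, max_distance=100))
```

Note that this kind of grouping is transitive: a chain of stops, each within the threshold of the next, ends up in a single group even if its ends are far apart.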

**Mode** 

The `route_type` column is mapped to more general transit modes (`'bus'`, `'tram'`, `'rail'`). The mapping is cumulative, so that better modes count towards the headway of lower modes at the same stop: the `'bus'` headway includes all transit services, and the `'tram'` headway includes rail as well.
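
The cumulative mapping can be read as "which simplified modes does a given `route_type` count towards". The sketch below uses the same dictionary as the configuration cell that follows; the helper `modes_serving` is a hypothetical name introduced only for this illustration.

```python
simplified_route_type_mapping = {
    "bus": "all",
    "tram": [0, 1, 2, 4, 5, 6, 7],
    "rail": [1, 2],
}


def modes_serving(route_type, mapping):
    """Return the simplified modes whose headway a route_type contributes to."""
    return [
        mode
        for mode, route_types in mapping.items()
        if route_types == "all" or route_type in route_types
    ]


print(modes_serving(3, simplified_route_type_mapping))  # a bus route
print(modes_serving(0, simplified_route_type_mapping))  # a tram route
print(modes_serving(2, simplified_route_type_mapping))  # a rail route
```

A rail route therefore improves the headway computed for all three modes, while a bus route only counts towards `'bus'`.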


In [None]:
# Use the stop groups created by the stop_group_distance argument of Feed to merge nearby stops into one
# You could choose 'stop_id' instead
stop_id = "parent_station"
# Group near stops (less than x meters apart). This creates or updates the parent_station column
stop_group_distance = 100 # in m


start_date = datetime.today() # Could be None for min date in feed
end_date = start_date + timedelta(days=30) # Could be None for max date in feed
date_type='businessday' # Could be something like 'holiday', 'businessday', 'non_businessday', or 'monday' to only consider some dates from the range.
start_time = time(hour=8)
end_time = time(hour=20) 

# GTFS feeds contain a 'route_type' column with integer codes (listed below).
# This dict groups those codes into a few more general categories.
simplified_route_type_mapping = {
    'bus':'all',
    'tram':[0,1,2,4,5,6,7],
    'rail':[1,2]
}
# -1 - other/nodata
# 0 - tram 
# 1 - subway 
# 2 - rail 
# 3 - bus
# 4 - ferry 
# 5 - cable car 
# 6 - gondola 
# 7 - funicular

#### Quality scoring parameters

To turn off (disregard) any parameter that contributes to stop quality, set its elasticity to 0.

Elasticities define the curves describing how perceived quality (or demand) drops as headway increases, walking distance increases, or speed decreases.

In [None]:
headway_elasticity = 0.35 # Theoretical value 0.5. In practice from 0.2 (less change) to 0.8 (more change) 
walk_elasticity = 0.25 # From 0.1 (less change) to 0.5 (more change). Should be lower than 'headway_elasticity'.
speed_elasticity = 0.2 # From 0.1 (less change) to 0.5 (more change). Should be lower than 'headway_elasticity'.
# Use the same keys as simplified_route_type_mapping
# The stop quality score will be multiplied by this factor
mode_factor = { 
    'bus':0.85, # 15% less for bus vs rail. Reasonable values are 0.75-1
    'tram':0.92, # 8% less for tram vs rail. Reasonable values are 0.75-1
    'rail':1 # Best mode
}

# Number of discrete quality scores (always in the 0-1 range)
n_accessibility_scores = 10

# min and max values you want to consider (affects performance and discretization)
max_headway = 1440 # in min
max_walk_distance = 2000 # in m
max_speed = 150 # in km/h 

min_headway = 5 # in min 
min_walk_distance = 100 # in m 
min_speed = 5 # in km/h

# Define a combination of params that should achieve a score of 1
# and calibrate the quality functions accordingly
best_quality = {  # For quality 1
    'headway':5,
    'mode':'rail',
    'speed':30,
    'distance':100,
}

# Define a combination of params that should achieve a score just above 0
# and calibrate the quality functions accordingly
worst_quality = { # For lowest quality (just above 0)
    'headway':720,
    'mode':'bus',
    'speed':10,
    'distance':2000,
}

### Results folder

Where do you want to save the results?

In [None]:
results_path = os.path.normpath("transit")

gtfs_path = os.path.join(results_path,"gtfs_files") 

city_filename = utils.sanitize_filename(city_name)
city_results_path = os.path.join(results_path,city_filename)

osm_xml_file = os.path.join(city_results_path, "streets.osm")
streets_graph_path = os.path.join(city_results_path, "streets.graphml")
streets_path = os.path.join(city_results_path, "streets.gpkg")
accessibility_streets_path = os.path.join(city_results_path, "accessibility_streets.gpkg")
population_results_path = os.path.join(city_results_path, "population.gpkg")

In [None]:
os.makedirs(results_path,exist_ok=True)
os.makedirs(gtfs_path,exist_ok=True)
os.makedirs(city_results_path,exist_ok=True)

### Area of interest
**Area of interest (aoi)**: Polygon. Geographic area where you want to run your analysis.

**Option 1:** From the internet with the city name

In [None]:
aoi = utils.get_city_geometry(city_name)
geo_suggestions = utils.get_geographic_suggestions_from_string(city_name,user_agent="app")
geo_suggestions

**Option 2:** Load your own file

In [None]:
# Geographic file (.gpkg, .geojson or .shp)

# aoi = gpd.read_file("")

In [None]:
# csv file with lat/lon columns in geographic coordinates


# df = pd.read_csv("")


# # Create geometry from lon/lat columns
# geometry = gpd.points_from_xy(df["lon"], df["lat"]) # Change column names if needed
# # Convert to GeoDataFrame
# aoi = gpd.GeoDataFrame(
#     df,
#     geometry=geometry,
#     crs="EPSG:4326"  # geographic crs Change if needed
# )

# # OR Parse WKT geometry column (requires: from shapely import wkt)
# df["geometry"] = df["geometry"].apply(wkt.loads) # change to match your geometry column name
# # Convert to GeoDataFrame
# aoi = gpd.GeoDataFrame(
#     df,
#     geometry="geometry",
#     crs="EPSG:4326"  # set to whatever CRS the WKT represents
# )

Project the aoi to UTM coordinates and create `aoi_download` by buffering it by `download_buffer` meters. To avoid boundary effects, streets and POIs should be downloaded for a larger area than the aoi.
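
For readers wondering which projection ends up being used, the sketch below shows the standard rule for picking a WGS 84 / UTM zone from a longitude and latitude. It is a simplified stand-in for what `estimate_utm_crs()` does: it ignores the special zones around Norway and Svalbard, and the helper name `utm_epsg` is hypothetical.

```python
def utm_epsg(lon, lat):
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat)."""
    zone = int((lon + 180) // 6) + 1      # zones are 6 degrees wide, starting at -180
    zone = min(max(zone, 1), 60)          # lon == 180 would otherwise give zone 61
    base = 32600 if lat >= 0 else 32700   # northern vs southern hemisphere codes
    return base + zone


print(utm_epsg(-71.11, 42.37))   # Cambridge, MA
print(utm_epsg(151.21, -33.87))  # Sydney
```

Working in a UTM zone means that the buffer distance and all later walking distances are expressed in meters.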

In [None]:
aoi = gpd.GeoDataFrame(geometry=[aoi.union_all()],crs=aoi.crs) # Ensure there is only one polygon
aoi = aoi.to_crs(aoi.estimate_utm_crs()) # Convert to utm

aoi_download = aoi.buffer(download_buffer) # Area to do streets and poi requests 

### Public transport data

**Option 1** Download GTFS feeds worldwide

With the Mobility Database API

The Mobility Database is run by MobilityData, the organization responsible for the GTFS standard, and lists feeds from almost all over the world.

In [None]:
# Request your refresh token here: https://mobilitydatabase.org/ 
# you can find the token under Account Details
refresh_token = ''
api = MobilityDatabaseClient(refresh_token)

Find Feeds on the API

⚠️ Stop here and check that it found your feeds!

In [None]:
feeds = api.search_gtfs_feeds(
    country_code=geo_suggestions['country_codes'],
    subdivision_name=geo_suggestions['subdivision_names'], # This info is not always in the feeds metadata. Comment this if you did not find all feeds.
    municipality=geo_suggestions['municipalities'], # This info is not always in the feeds metadata. Comment this if you did not find all feeds.
    is_official=True, # Set to True if you only want official feeds
    #aoi=aoi, # You could comment the rest of search args and use only aoi but for now the API seems to not do this very well as the metadata is often wrong.
)

for f in feeds:
    print(f['provider'])

Download current active files

In [None]:
file_paths = api.download_feeds(
    feeds=feeds,
    download_folder=gtfs_path,
    overwrite=False
)

**Option 2** Load your own gtfs files

In [None]:
# file_paths = [
#     os.path.normpath("/home/miguel/Downloads/latest.zip"),
# ]

In [None]:
# # Unzip if needed
# unzipped_paths = []
# for p in file_paths:
#     path = Path(p)

#     # Check if it's a zip file
#     if path.is_file() and path.suffix.lower() == ".zip":
#         extract_dir = path.with_suffix("")  # same name, no .zip

#         # Extract only if folder doesn't exist
#         if not extract_dir.exists():
#             with zipfile.ZipFile(path, "r") as zip_ref:
#                 zip_ref.extractall(extract_dir)

#         unzipped_paths.append(str(extract_dir))
#     else:
#         unzipped_paths.append(str(path))

# file_paths = unzipped_paths

**Extra:** Properly check all gtfs files for validity

This takes a few minutes. If you skip this step, only minor validation (without logs) is carried out when each GTFS feed is loaded.

In [None]:
# # Check and fix the gtfs files (This takes a few minutes). Set check_files = False in Feed to load faster

# new_gtfs_path = os.path.join(results_path,"revised_gtfs_files") 
# os.makedirs(new_gtfs_path, exist_ok=True)  # exist_ok avoids an error when the folder already exists

# new_file_paths = []
# for f in file_paths:
#     filename = os.path.splitext(os.path.basename(f))[0]
#     if os.path.isdir(os.path.join(new_gtfs_path,filename)):
#         new_file_paths.append(os.path.join(new_gtfs_path,filename))
#     else:
#         new_file_paths.append(gtfs_checker.preprocess_gtfs(f,new_gtfs_path))

# file_paths = new_file_paths

## 2 Quality Functions

Each parameter is mapped to a **quality value in the range [0, 1]** using exponential functions derived from elasticity values.
The **mode quality** is returned directly from a predefined mapping.

All quality functions (including composite ones) may be modified as needed, provided:

* Inputs remain unchanged
* Outputs stay within **[0, 1]**

---

### Individual Quality Functions

* **`headway_quality(headway)`** → `[0,1]`
* **`speed_quality(speed)`** → `[0,1]`
* **`walk_quality(distance)`** → `[0,1]`
* **`mode_quality(mode)`** → `[0,1]`

---

### Composite Quality Functions

#### Stop Quality

* **Function:** `stop_quality(headway, mode, speed)`
* **Output:** `[0,1]`

Combines headway, mode, and speed qualities:

$Q_{\text{stop}} = f(Q_{\text{headway}}, Q_{\text{mode}}, Q_{\text{speed}})$

#### Access Quality (Final Quality)

* **Function:** `access_quality(headway, mode, speed, distance)`
* **Output:** `[0,1]`

The access quality **must internally call**:

* `stop_quality(headway, mode, speed)`
* `walk_quality(distance)`

$Q_{\text{final}} = f(Q_{\text{stop}}, Q_{\text{distance}})$

---

This design ensures modularity while allowing flexibility in how individual and composite quality scores are computed.
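
The cell below calibrates the composite functions so that a chosen "worst" point maps to the lowest score and a chosen "best" point maps to 1. How `quality_utils.calibrate_quality_func` does this internally is not shown in this notebook. The sketch below illustrates one simple possibility, a linear rescaling followed by clipping, using a hypothetical `calibrate` helper and a hypothetical raw stop-quality formula.

```python
def calibrate(func, min_point, max_point, min_quality, max_quality=1.0):
    """Rescale func linearly so that min_point -> min_quality and max_point -> max_quality."""
    low, high = func(*min_point), func(*max_point)
    if high <= low:
        raise ValueError("max_point must score higher than min_point")
    scale = (max_quality - min_quality) / (high - low)

    def calibrated(*args):
        value = min_quality + (func(*args) - low) * scale
        return min(1.0, max(0.0, value))  # keep the output inside [0, 1]

    return calibrated


def raw_stop_quality(headway, mode_factor, speed):
    # Hypothetical uncalibrated product of headway, mode and speed qualities
    return (headway / 5) ** -0.35 * mode_factor * (speed / 150) ** 0.2


stop_q = calibrate(
    raw_stop_quality,
    min_point=(720, 0.85, 10),   # worst case: 12 h headway, bus, 10 km/h
    max_point=(5, 1.0, 30),      # best case: 5 min headway, rail, 30 km/h
    min_quality=0.1,
)
print(stop_q(720, 0.85, 10), stop_q(15, 0.92, 20), stop_q(5, 1.0, 30))
```

Because the calibrated function is clipped, any stop better than the chosen best point still scores 1 and any stop worse than the worst point can reach 0.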


In [None]:
def headway_quality(headway):
    return quality_utils.elasticity_based_quality(headway,min_headway,-headway_elasticity)

def walk_quality(distance):
    return quality_utils.elasticity_based_quality(distance,min_walk_distance,-walk_elasticity)

def speed_quality(speed):
    return quality_utils.elasticity_based_quality(speed,max_speed,speed_elasticity)

def mode_quality(mode):
    if isinstance(mode, str):
        # single string
        return mode_factor[mode]
    else:
        # convert to np.array if list
        mode_arr = np.array(mode)
        # vectorized lookup
        vectorized_lookup = np.vectorize(lambda m: mode_factor[m])
        return vectorized_lookup(mode_arr)

def stop_quality(headway,mode,speed):
    return (
        headway_quality(headway) * 
        mode_quality(mode) *
        speed_quality(speed)
    )

# Calibrate stop quality with the parameters defined at the beginning
stop_quality = quality_utils.calibrate_quality_func(
    stop_quality,
    min_quality=1/n_accessibility_scores,
    max_quality=1,
    min_point=(worst_quality['headway'],worst_quality['mode'],worst_quality['speed']),
    max_point=(best_quality['headway'],best_quality['mode'],best_quality['speed']),
)

def access_quality(headway,mode,speed,distance):
    return (
        stop_quality(headway,mode,speed) * 
        walk_quality(distance)
    )

# Calibrate access quality with the parameters defined at the beginning
access_quality = quality_utils.calibrate_quality_func(
    access_quality,
    min_quality=1/n_accessibility_scores,
    max_quality=1,
    min_point=(worst_quality['headway'],worst_quality['mode'],worst_quality['speed'],worst_quality['distance']),
    max_point=(best_quality['headway'],best_quality['mode'],best_quality['speed'],best_quality['distance']),
)

#### Discretization

Build grids for fast numerical computations, ensuring that the maximum quality change between adjacent grid values is less than `1/n_accessibility_scores`

In [None]:
headway_grid, mode_grid, speed_grid, distance_grid = quality_utils.build_adaptive_grids(
    access_quality,
    variables=[
        [min_headway, max_headway],
        list(simplified_route_type_mapping.keys()),
        [min_speed, max_speed],
        [min_walk_distance, max_walk_distance],
    ],
    delta=1/n_accessibility_scores
)
# Add 0 as the first speed in the speed grid
# This avoids creating a grid at very low speeds but we still include all of them with the speed 0
speed_grid = [0,*speed_grid] 

## 3 Public transport timetables

### 3.1 Create the gtfs object

This step will:

- Load all .txt files of all gtfs folders given.

- Select only the stops from `stops.txt` inside the area of interest.
- Crop all trips in `stop_times.txt` with the stops inside the aoi + 1 more stop.
- Check the `stop_sequence` in `stop_times.txt`.
- Handle trips that start on one day and end on the following day: hours are always kept in the 0-24 range and those trips are marked with `next_day` set to True. New `service_id`s are created for them.
- If the feed has `frequencies.txt`, it is processed too, including the next-day handling.
- If departure or arrival times are empty, they are filled by interpolation.
- A shape direction column is computed at each stop. (See documentation for details)
- GTFS shapes are for now computed from the stop coordinates.
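
GTFS allows times beyond `24:00:00` for trips that run past midnight of the service day. The sketch below shows the kind of normalization described in the list above: wrap the hour into the 0-24 range and keep a flag for the following day. The function name `normalize_gtfs_time` is hypothetical and the library's own implementation may differ.

```python
def normalize_gtfs_time(gtfs_time):
    """Split a GTFS 'HH:MM:SS' string into a wrapped time and a next_day flag."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    next_day = hours >= 24
    return f"{hours % 24:02d}:{minutes:02d}:{seconds:02d}", next_day


for raw in ("08:05:10", "24:00:00", "25:30:00"):
    print(raw, "->", normalize_gtfs_time(raw))
```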

In [None]:
# If you load too many feeds you might overflow your RAM 
gtfs = Feed(
    file_paths,
    aoi=aoi,
    stop_group_distance=stop_group_distance,
    start_date=start_date,
    end_date=end_date,
    check_files=False # False skips file validation: much faster, but loading may fail on malformed feeds
)

If there are **no services** in your date range, you can set `start_date` and `end_date` to `None` to use the feed's minimum and maximum dates.

### 3.2 Service Intensity

The number of vehicles that arrive at each stop every day multiplied by the number of stops:

$\text{Service Intensity} = (\text{Number of vehicles per stop}) \times (\text{Number of stops})$

This is a fast, approximate way of seeing how much service is offered every day.
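
Read literally, the product above is the total number of vehicle arrivals (stop visits) per day. A minimal sketch, assuming the timetable has been flattened into `(date, stop_id, trip_id)` records; the record layout and the function name are assumptions made for this illustration.

```python
from collections import Counter


def service_intensity(stop_events):
    """Count vehicle arrivals per date.

    stop_events: iterable of (date, stop_id, trip_id) tuples, one per stop visit.
    """
    return dict(Counter(date for date, _stop_id, _trip_id in stop_events))


events = [
    ("2025-01-06", "s1", "t1"),
    ("2025-01-06", "s2", "t1"),
    ("2025-01-06", "s1", "t2"),
    ("2025-01-07", "s1", "t1"),
]
print(service_intensity(events))
```

Plotting this count over the date range makes weekends, holidays and timetable changes easy to spot.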

In [None]:
service_intensity = gtfs.get_service_intensity_in_date_range(
    start_date=None, # If None take the feed min date
    end_date=None, # If None take the feed max date
    date_type=None, # Could be something like 'holiday', 'weekday', or 'monday' to only consider some dates from the range.
    by_feed=True
)
service_intensity = service_intensity.to_pandas()
gtfs_plot_helpers.service_intensity(service_intensity)

Select the most representative business day in a date range

In [None]:
idx = processing_helpers.most_frequent_row_index(service_intensity)
selected_day = service_intensity.iloc[idx]['date'].to_pydatetime()
selected_day

### 3.3 Average speed

Compute the average speed at stops.
To compute speed, distance is measured over a fixed time difference of `time_step` minutes from each stop.

<img src="https://github.com/CityScope/UrbanAccessAnalyzer/tree/main/examples/images/speed.jpg" 
     alt="Speed computation" 
     title="Speed computation" 
     width="500"/>
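
The idea can be sketched for a single trip as follows: starting at a stop, look `time_step` minutes ahead, interpolate how far the vehicle has travelled by then, and divide by the elapsed time. This is only an illustration. It assumes the distance along the trip is already known at each stop, and whether the library measures along the route or in a straight line is not specified here. The function name `speed_at_stop` is hypothetical.

```python
def speed_at_stop(trip, stop_index, time_step=15):
    """Forward speed in km/h at one stop of a trip.

    trip: list of (time_min, cumulative_km) tuples sorted by time.
    Returns None when the trip ends before time_step minutes have elapsed.
    """
    t0, d0 = trip[stop_index]
    target = t0 + time_step
    for (ta, da), (tb, db) in zip(trip[stop_index:], trip[stop_index + 1:]):
        if ta <= target <= tb:
            # Linear interpolation of the distance travelled at the target time
            d_target = da if tb == ta else da + (db - da) * (target - ta) / (tb - ta)
            return (d_target - d0) / (time_step / 60)
    return None


trip = [(0, 0.0), (10, 3.0), (20, 7.0), (30, 9.0)]
print(speed_at_stop(trip, 0))   # 5 km covered in 15 min
print(speed_at_stop(trip, 2))   # the trip ends before 15 min have passed
```

Measuring over a fixed time window rather than between consecutive stops keeps closely spaced stops from producing noisy speed values.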

In [None]:
# Filter by the selected day and time bounds
speed_by = "trip_id" 
stop_speed_lf = gtfs.get_speed_at_stops(
    date=selected_day,
    start_time=start_time,
    end_time=end_time,
    route_types = 'all',
    by = speed_by, # Speed is computed for every 'trip_id' and grouped by this column with the how method
    at = stop_id, # Compute speed for every 'parent_station' 'stop_id' or 'route_id'
    how="mean", # How to group individual trip speeds 'mean' 'max' or 'min'
    direction='both', # Compute speed in 'forward' 'backward' or 'both' directions (walking n_stops in direction)
    time_step=15, # Minutes over which the speed is measured
)
if isinstance(stop_speed_lf,pl.DataFrame):
    stop_speed_lf = stop_speed_lf.lazy()

# gtfs_lf will be used for later processing so we do not need to reprocess it
gtfs_lf = gtfs.filter(
        date=selected_day,
        start_time=start_time,
        end_time=end_time,
)
gtfs_lf = gtfs_lf.join(stop_speed_lf.select([stop_id,speed_by,'speed']),on=[stop_id,speed_by],how='left')

Plot an interactive map with stops and speeds

In [None]:
m=None
if show_maps:
    # Group to speeds per stop and mode (or route_type) 
    best_stop_speed_lf = gtfs_lf.group_by(list(np.unique(["route_type", stop_id]))).agg(
        pl.col("route_id").unique().alias("route_ids"),
        (
            (pl.col("speed").abs() * pl.col("n_trips")).sum()
            / pl.col("n_trips").sum()
        ).alias("speed"),
        pl.col("n_trips").sum().alias("n_trips"),
        pl.col("isin_aoi").any().alias("isin_aoi")
    ).sort("speed")
    # Plot everything on a map
    best_stop_speed_df = best_stop_speed_lf.collect().to_pandas()
    best_stop_speed_df = gtfs.add_stop_coords(best_stop_speed_df)
    best_stop_speed_df = gtfs.add_route_names(best_stop_speed_df)
    best_stop_speed_df = gpd.GeoDataFrame(
        best_stop_speed_df,
        geometry=gpd.points_from_xy(best_stop_speed_df['stop_lon'],y=best_stop_speed_df['stop_lat']),
        crs=4326
    )
    m = plot_helpers.general_map(
        aoi=aoi,
        pois=best_stop_speed_df,
        poi_cmap="RdYlGn",
        poi_column="speed",
    )
    m.save(city_results_path+"/stop_speed_map.html")
m

### 3.4 Average waiting time (headway) at stops

Headway is in *minutes*.

In this example we use the *shape_direction* mode:

- This groups all `trip_id`s at every `stop` by local shape direction.

- It creates `n_divisions` * 2 groups (* 2 to treat outbound and inbound directions independently) by clustering the trip shape directions.
- With `how = 'best'`, the headway is computed only for the best of all divisions at every stop.

Local shape direction computation algorithm (1 division)

<img src="https://github.com/CityScope/UrbanAccessAnalyzer/tree/main/examples/images/shape_grouping.jpg" 
     alt="Local shape direction computation algorithm (1 division)" 
     title="Local shape direction computation algorithm (1 division)" 
     width="500"/>

Local shape direction computation algorithm (3 divisions)

<img src="https://github.com/CityScope/UrbanAccessAnalyzer/tree/main/examples/images/shape_grouping_3_dirs.jpg" 
     alt="Local shape direction computation algorithm (3 divisions)" 
     title="Local shape direction computation algorithm (3 divisions)" 
     width="500"/>
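
Once trips are grouped by direction, the headway of each group can be estimated from its departure times, and `how = 'best'` keeps the smallest one. The sketch below uses the mean gap between consecutive departures, which is an assumption: the library may define headway differently (for example, window length divided by the number of departures). Both function names are hypothetical.

```python
def mean_headway(departures_min):
    """Mean gap in minutes between consecutive departures, or None if fewer than two."""
    times = sorted(departures_min)
    if len(times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def best_headway(departures_by_group):
    """Smallest mean headway across direction groups (the 'best' strategy)."""
    headways = [
        headway
        for headway in map(mean_headway, departures_by_group.values())
        if headway is not None
    ]
    return min(headways) if headways else None


groups = {
    "northbound": [480, 490, 500],   # minutes after midnight: 08:00, 08:10, 08:20
    "southbound": [480, 510, 540],
}
print(best_headway(groups))
```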

In [None]:
stop_headway_df = []
for mode in mode_grid:
    gtfs_selection = gtfs._filter_by_route_type(
        gtfs_lf,
        route_types=simplified_route_type_mapping[mode]
    )
    gtfs_length = (
        gtfs_selection
        .select(pl.len())
        .collect()
        .item()
    )
    if gtfs_length == 0:
        continue 

    for i in range(len(speed_grid)-1):
        gtfs_selection = gtfs_selection.filter(
            (pl.col("speed") >= speed_grid[i])
        )
        gtfs_length_i = (
            gtfs_selection
            .select(pl.len())
            .collect()
            .item()
        )
        if gtfs_length_i == gtfs_length:
            continue 
        if gtfs_length_i == 0:
            continue 

        gtfs_length = gtfs_length_i
        df = gtfs._get_headway_at_stops(
                gtfs_selection,
                date=selected_day,
                start_time=start_time,
                end_time=end_time,
                by = "shape_direction", # headway is computed for all 'trip_id' grouped by this column and sorted by 'departure_time'
                at = stop_id, # Where to compute the headway 'stop_id' 'parent_station'
                how = "best", 
                # 'best' pick the route with best headway, 
                # 'mean' Combine all headways of all routes, 
                # 'all' return results per stop and route
                n_divisions=1, # Number of divisions for by = 'shape_direction'
        ).with_columns(
            pl.lit(mode).alias("mode"),
            pl.lit(speed_grid[i+1]).alias("speed_grid")
        )
        stop_headway_df.append(df)

stop_headway_df = (
    pl.concat(stop_headway_df)
).to_pandas()
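headway_idx = np.searchsorted(np.array(headway_grid), stop_headway_df["headway"], side="left")
# Clip so headways above the last grid value map to the last bin instead of raising IndexError
headway_idx = np.clip(headway_idx, 0, len(headway_grid) - 1)
stop_headway_df["headway_grid"] = np.array(headway_grid)[headway_idx]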
stop_headway_df = gtfs.add_stop_coords(stop_headway_df)
stop_headway_df = gtfs.add_route_names(stop_headway_df)
stop_headway_df = gpd.GeoDataFrame(
    stop_headway_df,
    geometry=gpd.points_from_xy(stop_headway_df['stop_lon'],y=stop_headway_df['stop_lat']),
    crs=4326
)
stop_headway_df = stop_headway_df[stop_headway_df.geometry.is_valid]
stop_headway_df["stop_quality"] = stop_headway_df.apply(
    lambda row: stop_quality(
        row["headway"],
        row["mode"],
        row["speed_grid"],
    ),
    axis=1
)
stop_headway_df["stop_quality_grid"] = stop_headway_df.apply(
    lambda row: stop_quality(
        row["headway_grid"],
        row["mode"],
        row["speed_grid"],
    ),
    axis=1
).round(3)
stop_headway_df = stop_headway_df.sort_values("stop_quality").drop_duplicates(stop_id,keep="last")
stop_headway_df = stop_headway_df.sort_values(stop_id).reset_index(drop=True)
stop_headway_df.to_file(os.path.join(city_results_path,"stops.gpkg"))
stop_headway_df

In [None]:
m=None
if show_maps:
    m = plot_helpers.general_map(
        aoi=aoi,
        pois=stop_headway_df,
        poi_cmap="Blues",
        poi_column="stop_quality",
    )
    m.save(city_results_path+"/stop_quality_map.html")
m

## 4 Street graph

### 4.1 Regionwise file and cropping

- Download the best-matching regional .pbf file (it covers a large area).

- Crop it to cover our area of interest and save it in .osm format.

In [None]:
osm_xml_file = os.path.join(city_results_path, "streets.osm")
streets_graph_path = os.path.join(city_results_path, "streets.graphml")
streets_path = os.path.join(city_results_path, "streets.gpkg")
accessibility_streets_path = os.path.join(city_results_path, "accessibility_streets.gpkg")
population_results_path = os.path.join(city_results_path, "population.gpkg")
population_csv_results_path = os.path.join(city_results_path, "population.csv")

#### OSMIUM

To download the street network needed for the study, the **osmium** tool is used.  
It is easiest to install on **Linux** and **macOS** (it works in Google Colab too); for **Windows**, see the conda-forge option below.  

To install, you can either:  

- Visit [osmium-tool website](https://osmcode.org/osmium-tool/)  
- Or run the command:  
```bash
sudo apt-get install -y osmium-tool
```

Make sure it is added to your `PATH`.

On **Windows**, you can use **conda-forge** to install it.

```bash
conda install -c conda-forge osmium-tool
```

---

To avoid using **osmium**, you can manually download the data:

1. Go to [OpenStreetMap Export](https://www.openstreetmap.org/export#map=14/40.23633/-3.76084)
2. Select the bounding box containing your area of interest.
3. Click **Export**.
4. Copy the `.osm` file that is downloaded to your project folder.
5. Set the variable `osm_xml_file` to the path where the `.osm` file is located.

In [None]:
# Run only if osmium is installed 

# Select what type of street network you want to load
network_filter = osm.osmium_network_filter("walk+bike+primary")
# Download the region pbf file crop it by aoi and convert to osm format
osm.geofabrik_to_osm(
    osm_xml_file,
    input_file=results_path,
    aoi=aoi_download,
    osmium_filter_args=network_filter,
    overwrite=False
)

In [None]:
# Manual download 
# osm_xml_file = "path/to/file.osm"

### 4.2 Load to osmnx

This way the street network is a networkx graph

In [None]:
# Load
G = ox.graph_from_xml(osm_xml_file)
# Project geometry coordinates to a UTM system to allow Euclidean measurements in meters (sorry, Americans)
G = ox.project_graph(G,to_crs=aoi.estimate_utm_crs())
# Save the graph in graphml format to avoid the slow loading process
ox.save_graphml(G,streets_graph_path)

### 4.3 Simplify graph

Edges shorter than `min_edge_length` meters are deleted and their nodes merged

In [None]:
G = graph_processing.simplify_graph(G,min_edge_length=min_edge_length,min_edge_separation=min_edge_length*2,undirected=True)
# Save the result in graphml format
ox.save_graphml(G,streets_graph_path)

street_edges = ox.graph_to_gdfs(G,nodes=False)
street_edges = street_edges.to_crs(aoi.crs)
street_edges.to_file(streets_path)

### 4.4 Add Points of interest to graph

In [None]:
G, osmids = graph_processing.add_points_to_graph(
    stop_headway_df,
    G,
    max_dist=100+min_edge_length, # Maximum distance from point to graph edge to project the point
    min_edge_length=min_edge_length # Minimum edge length after adding the new nodes
)
stop_headway_df['osmid'] = osmids # Add the ids of the nodes in the graph to points

## 5 Compute isochrones

This is the discretized matrix relating stop quality, distance and accessibility
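
To illustrate what one row of such a matrix contains, the sketch below takes a single stop quality, combines it with a walk quality at several distances, and snaps each result to the accessibility grid. It uses a plain product and a hypothetical power-law walk quality, whereas the notebook itself uses the calibrated `access_quality` function, so the numbers are for illustration only.

```python
from bisect import bisect_left


def matrix_row(stop_quality, distances, walk_quality, n_levels=10):
    """Map each walking distance to a discretized accessibility level."""
    grid = [i / n_levels for i in range(n_levels + 1)]
    row = {}
    for distance in distances:
        quality = stop_quality * walk_quality(distance)
        idx = min(bisect_left(grid, quality), n_levels)  # round up to the grid
        row[distance] = grid[idx]
    return row


def example_walk_quality(distance):
    # Hypothetical walk quality: 1.0 up to 100 m, then a power-law decay
    return min(1.0, (distance / 100) ** -0.25)


print(matrix_row(0.8, [100, 400, 1000], example_walk_quality))
```

Each row tells the isochrone step how far one can walk from a stop of that quality before the accessibility level drops to the next value.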

In [None]:
distance_matrix = (
    stop_headway_df
    .apply(
        lambda row: [
            {
                "headway_grid": row["headway_grid"],
                "mode": row["mode"],
                "speed_grid": row["speed_grid"],
                "distance_grid": d,
                "stop_quality_grid": row["stop_quality_grid"],
                "quality_grid": access_quality(
                    row["headway_grid"],
                    row["mode"],
                    row["speed_grid"],
                    d
                )
            }
            for d in distance_grid
        ],
        axis=1
    )
)
distance_matrix = (
    distance_matrix
    .explode()
    .apply(pd.Series)
    .drop_duplicates(['stop_quality_grid','distance_grid','quality_grid'])
    .reset_index(drop=True)
)
quality_grid = np.linspace(0,1,n_accessibility_scores+1)
idx = np.searchsorted(
    quality_grid,
    distance_matrix['quality_grid'],
    side="left"
)
idx = np.clip(idx, 0, len(quality_grid) - 1)
distance_matrix['quality_grid'] = quality_grid[idx]
distance_matrix['quality_grid'] = distance_matrix['quality_grid'].round(3)
distance_matrix = distance_matrix.drop_duplicates(['stop_quality_grid','distance_grid','quality_grid'])
distance_matrix = distance_matrix.reset_index(drop=True)
distance_matrix = (
    distance_matrix
    .pivot(
        index="stop_quality_grid",
        columns="distance_grid",
        values="quality_grid"
    )
    .reset_index()
)
distance_matrix

In [None]:
accessibility_graph = isochrones.graph(
    G,
    stop_headway_df,
    distance_matrix,
    poi_quality_col = 'stop_quality_grid', # If all points have the same quality this could be None
    min_edge_length = min_edge_length # Do not add new nodes if there will be an edge with less than this length
)
# Save edges as gpkg
accessibility_nodes, accessibility_edges = ox.graph_to_gdfs(accessibility_graph)
accessibility_edges.to_file(accessibility_streets_path)

#### Let's visualize the results on a map

### 5.2 Convert to H3

In [None]:
access_h3_df = h3_utils.from_gdf(
    accessibility_edges,
    resolution=h3_resolution,
    columns=['accessibility'],
    contain="overlap",
    method="max",
    buffer=10
)

access_h3_df.to_csv(city_results_path+"/accessibility_h3.csv")
# The geodataframe takes much more space than csv as it converts h3 to polygons
# If you do not need it comment it out
access_h3_df = h3_utils.to_gdf(access_h3_df).to_crs(aoi.crs)
access_h3_df.to_file(city_results_path+"/accessibility_h3.gpkg")
access_h3_df

In [None]:
m=None
if show_maps:
    m = plot_helpers.general_map(
        aoi=aoi,
        pois=stop_headway_df,
        gdfs=[access_h3_df,accessibility_edges],
        cmap="RdYlGn",
        column="accessibility",
        poi_cmap="Blues",
        poi_column="stop_quality"
    )
    m.save(city_results_path+"/access_map.html")
m

## 6 Population

### 6.1 Download WorldPop tif file

- One file for every country
- 100m pixel size
- tif format
- available from 2000 to 2030
- gender and age

In [None]:
population_file = population.download_worldpop_population(
    aoi_download,
    2025,
    folder=results_path,
    resolution="100m",
)

In [None]:
pop_h3_df = h3_utils.from_raster(population_file,aoi=aoi_download,resolution=h3_resolution)
pop_h3_df = pop_h3_df.rename(columns={'value':'population'})

### 6.2 Assign level of service to each population cell

In [None]:
results_h3_df = access_h3_df.merge(pop_h3_df,left_index=True,right_index=True,how='outer')
results_h3_df.to_csv(population_csv_results_path)
# The geodataframe takes much more space than csv as it converts h3 to polygons
# If you do not need it comment it out
results_h3_df = h3_utils.to_gdf(results_h3_df).to_crs(aoi.crs)
results_h3_df = results_h3_df[results_h3_df.intersects(aoi.union_all())]
results_h3_df.to_file(population_results_path)
results_h3_df

In [None]:
m=None
if show_maps:
    pop_gdf_points = results_h3_df.copy()
    pop_gdf_points.geometry = pop_gdf_points.geometry.centroid
    pop_gdf_points = pop_gdf_points.dropna(subset=['population'])
    pop_gdf_points = pop_gdf_points[pop_gdf_points['population'] > 1]
    m = plot_helpers.general_map(
        aoi=aoi,
        pois=stop_headway_df,
        gdfs=pop_gdf_points,
        cmap="RdYlGn",
        column="accessibility",
        size_column="population",
        poi_column="stop_quality",
        poi_cmap="Blues",
    )
    m.save(city_results_path+"/population_map.html")
m

### Basic statistics
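
The statistic computed here is the share of population at each accessibility level. The same aggregation is sketched below in plain Python, with `(accessibility, population)` pairs standing in for the H3 cells. Treating cells without an accessibility value as level 0 is an assumption of this sketch, and the function name `population_share` is hypothetical.

```python
from collections import defaultdict


def population_share(cells):
    """Aggregate population by accessibility level.

    cells: iterable of (accessibility, population) pairs; None means missing.
    Returns {level: (population, percent)} sorted from best to worst level.
    """
    totals = defaultdict(float)
    for level, population in cells:
        if population is None:
            continue                               # cell without population data
        level = 0.0 if level is None else level    # assumption: no value means no access
        totals[level] += population
    total = sum(totals.values())
    if total == 0:
        return {}
    return {
        level: (population, round(100 * population / total, 2))
        for level, population in sorted(totals.items(), reverse=True)
    }


cells = [(1.0, 100), (0.5, 300), (None, 100), (0.5, 500), (0.2, None)]
print(population_share(cells))
```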

In [None]:
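# Note: groupby drops rows whose 'accessibility' is NaN (e.g. populated cells left without a value
# by the outer merge), so they are excluded from the totals; pass dropna=False to include them
stats_df = results_h3_df.groupby('accessibility', as_index=False)['population'].sum()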
stats_df = stats_df.sort_values("accessibility",ascending=False)
total_population = stats_df['population'].sum()
stats_df = pd.concat([stats_df, pd.DataFrame([{'accessibility': 'total population', 'population': total_population}])], ignore_index=True)
stats_df['population %'] = (stats_df['population'] * 100 / total_population).round(2)
stats_df['population'] = stats_df['population'].round(0).astype(int)
stats_df.to_csv(city_results_path + "/stats.csv")
stats_df

In [None]:
# !zip -r /content/output.zip "{results_path}" # For colab. Export the output folder as zip.

Important files:

- `streets.gpkg`: street geometry as lines (all streets)
- `accessibility_streets.gpkg`: street geometry as lines with the level of service (only streets that have a level of service)
- `population.gpkg`: grid with population and level of service
- `stats.csv`: population statistics