# An `r5py` Example

## Introduction

This Jupyter Notebook demonstrates the use of the `r5py` library for transport network analysis and travel time estimation.

It explores:
- Creating transport networks,
- Computing travel time matrices, and
- Visualising results using `folium` and `geopandas`.

> **Notes**, this notebook:
> - assumes an understanding of GTFS and OpenStreetMap data formats, and that you have set up the repository as described in the README.
> - only covers an initial demonstration of the `r5py` library. For more detailed information, please refer to the official documentation: https://r5py.readthedocs.io/en/stable/.

## Overview

This Notebook uses the example of travel times to supermarkets across the
Newport, Wales area.

The required data is stored within the `data` directory:
- `destinations/supermarkets.parquet` contains the locations of some supermarkets in the Newport area.
- `gtfs/rail.zip` and `gtfs/other_modalities.zip` contain GTFS data for the Newport area.
- `origins/pwcs.parquet` contains the locations of the origin points, which are the 2021 Population Weighted Centroids within the Newport Local Authority.
- `osm/newport.osm.pbf` contains OpenStreetMap data for the Newport area.

For more information on the data sources, please refer to the `data/README`.

The notebook demonstrates the following steps:
1. Configure the analysis.
2. Load and prepare the origins and destinations data.
3. Create a transport network using the GTFS and OSM data.
4. Compute travel time matrices.
5. Visualise the results.



## 1. Configure Analysis

Import required modules for this example notebook:

In [None]:
import datetime
import folium
import geopandas as gpd
import pandas as pd
import warnings

from pathlib import Path
from r5py import TransportMode, TransportNetwork, TravelTimeMatrixComputer

Set the analysis variables:

> **Note:** Using `str(Path(...))` below allows the code to run on both Windows and Unix systems.

In [None]:
# input data files
origins = str(Path("data/origins/pwcs.parquet"))
destinations = str(Path("data/destinations/supermarkets.parquet"))
osm = str(Path("data/osm/newport.osm.pbf"))
gtfs = [
    str(Path("data/gtfs/rail.zip")),
    str(Path("data/gtfs/other_modalities.zip")),
]

# travel time configuration
departure = datetime.datetime(
    2024, 3, 5, 8, 0, 0
)  # DO NOT CHANGE `departure` - GTFS has been prefiltered for this date
transport_modes = [TransportMode.TRANSIT]
travel_time_window = 60  # in minutes
max_travel_time = 45  # in minutes
speed_walking = 4.5  # in km/h
speed_cycling = 15.0  # in km/h

Where:

| Variable | Description |
| --- | --- |
| `origins` | Path to the origin locations data, in this case the population weighted centroids for the Newport area. It is read into a `GeoDataFrame` in step 2. |
| `destinations` | Path to the destination locations data, in this case the supermarket locations for the Newport area. It is read into a `GeoDataFrame` in step 2. |
| `osm` | The OpenStreetMap data (in `.pbf` format) for the Newport area. |
| `gtfs` | The public transit data (in `.zip` format) for the Newport area. Multiple GTFS files can be given, and r5py combines them at runtime. **Note:** GTFS data isn't needed when working only with modes that aren't public transit. |
| `departure` | The departure date and time. This is a `datetime` object, and is currently set to 0800 on 5th March 2024 (do not change this, the GTFS has been pre-filtered to this date). |
| `transport_modes` | A list of transport modes to include in the analysis. Examples include `TransportMode.WALK`, `TransportMode.CAR`, `TransportMode.BICYCLE`, and `TransportMode.TRANSIT`. You can read more about these in the r5py documentation: https://r5py.readthedocs.io/en/stable/reference/reference.html#r5py.TransportMode. **Note:** the list can contain more than one mode, to explore multi-modal analyses. |
| `travel_time_window` | The length of the departure time window, in minutes, starting at `departure`. The median travel time is calculated over departures within this window. |
| `max_travel_time` | The maximum travel time to include in the analysis, in minutes. This is used to filter out trips that are too long. |
| `speed_walking` | The average walking speed, in km/h. |
| `speed_cycling` | The average cycling speed, in km/h. |


> **Note:** many more r5py configuration options are available, for example: percentiles other than the median travel time, the maximum number of public transit 'legs', and bicycle stress. See the r5py reference documentation for more information: https://r5py.readthedocs.io/en/stable/reference/reference.html#reference
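
As a minimal illustration of why the input paths are wrapped in `str(Path(...))`, the sketch below uses `pathlib`'s pure path classes to show how one forward-slash path string renders under each operating system's conventions. The pure classes are used here only so both results can be shown on any machine. The notebook itself uses `Path`, which picks the right flavour for the system it runs on.

```python
from pathlib import PurePosixPath, PureWindowsPath

# the same relative path, written once with forward slashes
relative = "data/origins/pwcs.parquet"

# how that path renders under each platform's conventions
posix_style = str(PurePosixPath(relative))
windows_style = str(PureWindowsPath(relative))

print(posix_style)  # data/origins/pwcs.parquet
print(windows_style)  # data\origins\pwcs.parquet
```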

## 2. Load the Origin and Destination Data

`r5py` expects two things of the origin and destination data:
 1. It has a unique `id` column to identify each location.
 2. The coordinate reference system (CRS) is EPSG:4326 (lat/lon).

The population weighted centroids (PWCs) are in EPSG:27700 and are identified by the `OA21CD` column, so they need preprocessing: the column is renamed to `id` and the geometries are reprojected to EPSG:4326. The supermarkets are already in EPSG:4326 and are identified by the `id` column, so no preprocessing is needed there.

In [None]:
# read and preprocess origins
origins_gdf = gpd.read_parquet(origins)
origins_gdf = origins_gdf.rename(columns={"OA21CD": "id"})
origins_gdf = origins_gdf.to_crs("EPSG:4326")

# read destinations (no preprocessing required)
destinations_gdf = gpd.read_parquet(destinations)

# display number of origins and destinations
print(f"Number of origins: {len(origins_gdf)}")
print(f"Number of destinations: {len(destinations_gdf)}")
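
Before moving on, it can be worth confirming that the `id` requirement actually holds. The sketch below is one possible check, not part of `r5py`: `check_ids` is a hypothetical helper, and the toy table with made-up IDs stands in for `origins_gdf` so the example runs without the data files. It does not check the CRS.

```python
import pandas as pd


def check_ids(df: pd.DataFrame, id_column: str = "id") -> None:
    """Raise a ValueError if `id_column` is missing or contains duplicates."""
    if id_column not in df.columns:
        raise ValueError(f"Missing required column: {id_column!r}")
    duplicated = df[id_column][df[id_column].duplicated()].unique().tolist()
    if duplicated:
        raise ValueError(f"Duplicate {id_column!r} values: {duplicated}")


# toy stand-in for the origins table (IDs are made up for illustration)
toy_origins = pd.DataFrame({"OA21CD": ["A1", "A2", "A3"]})
toy_origins = toy_origins.rename(columns={"OA21CD": "id"})

check_ids(toy_origins)  # passes silently because every id is unique
```

The same call should work on `origins_gdf` and `destinations_gdf`, since a `GeoDataFrame` is also a `pandas.DataFrame`.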

Let's take a look at the data on a map, using `GeoDataFrame.explore()`:

In [None]:
origins_gdf.explore(tiles="cartodbpositron")

In [None]:
destinations_gdf.explore(tiles="cartodbpositron", marker_type="marker")

## 3. Make the Transport Network

The transport network can be made from the OpenStreetMap data and any GTFS data. It uses the `r5py.TransportNetwork` class to construct everything needed for travel time estimation. We only need to pass it the OSM file path and the list of GTFS file paths, as shown below:

> **Note:** You may see the following output when running this cell:
> ```
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.mapdb.Volume$ByteBufferVol (file:/.cache/r5py/r5-v6.9-all.jar) to method java.nio.DirectByteBuffer.cleaner()
> WARNING: Please consider reporting this to the maintainers of org.mapdb.Volume$ByteBufferVol
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> ```
> This is a known issue with the underlying `r5` library, and can be safely ignored (see the documentation for more details).


In [None]:
transport_network = TransportNetwork(osm, gtfs)
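
If building the network fails, a missing input file is one possible cause that is quick to rule out. The sketch below is an optional pre-flight check, assuming the same file layout as the configuration cell. `missing_inputs` is a hypothetical helper, not an `r5py` function.

```python
from pathlib import Path


def missing_inputs(paths: list[str]) -> list[str]:
    """Return the subset of `paths` that do not exist as files."""
    return [path for path in paths if not Path(path).is_file()]


# same layout as the configuration cell; adjust if your data lives elsewhere
osm = str(Path("data/osm/newport.osm.pbf"))
gtfs = [
    str(Path("data/gtfs/rail.zip")),
    str(Path("data/gtfs/other_modalities.zip")),
]

missing = missing_inputs([osm, *gtfs])
if missing:
    print(f"Missing input files: {missing}")
```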

## 4. Calculate the Travel Times

Now the network is ready, we can calculate the travel times!

We can do this by first building a `r5py.TravelTimeMatrixComputer` object. This is the object that takes our analysis configuration, and can be built as shown below:

In [None]:
# build the travel time matrix computer
tt_computer = TravelTimeMatrixComputer(
    transport_network,
    origins=origins_gdf,
    destinations=destinations_gdf,
    departure=departure,
    transport_modes=transport_modes,
    max_time=datetime.timedelta(minutes=max_travel_time),
    departure_time_window=datetime.timedelta(minutes=travel_time_window),
    speed_walking=speed_walking,
    speed_cycling=speed_cycling,
)

> **Note:** the `max_time` and `departure_time_window` arguments need to be `datetime.timedelta` objects.
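
The conversion from plain minutes to `datetime.timedelta` can be seen in isolation in the short sketch below. It reuses the values from the configuration cell. `window_end` is an illustrative name, showing where the departure time window finishes.

```python
import datetime

departure = datetime.datetime(2024, 3, 5, 8, 0, 0)
max_travel_time = 45  # in minutes, as in the configuration cell
travel_time_window = 60  # in minutes, as in the configuration cell

max_time = datetime.timedelta(minutes=max_travel_time)
departure_time_window = datetime.timedelta(minutes=travel_time_window)

print(max_time)  # 0:45:00
print(departure_time_window)  # 1:00:00

# the window covers departures from `departure` up to this time
window_end = departure + departure_time_window
print(window_end)  # 2024-03-05 09:00:00

# converting back to minutes, e.g. for reporting
print(max_time.total_seconds() / 60)  # 45.0
```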

As described above, many more configuration options are available, for example: percentiles other than the median travel time, the maximum number of public transit 'legs', and bicycle stress. See the r5py reference documentation for more information: https://r5py.readthedocs.io/en/stable/reference/reference.html#reference.

Once the computer is available, we can call `.compute_travel_times()` to estimate the travel time between all origin and destination pairs. This returns a `pandas.DataFrame` of median travel times, which by default are in the `travel_time` column, in minutes.

In [None]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    # compute the travel times
    tt_df = tt_computer.compute_travel_times()

> **Note:** The cell above uses a warning filter to suppress a `FutureWarning` raised by `pandas` when using `r5py` v0.1.0 (which this example uses). The warning can be safely ignored and is resolved in newer versions of `r5py`, so you can remove the filter if you are using one of those. The filter is only there to keep the output uncluttered.
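
To see how the filter behaves without running `r5py`, the sketch below applies the same pattern to a hypothetical stand-in function, `noisy_step`, that emits a `FutureWarning`. It records the warnings that get through, to show that only the `FutureWarning` category is hidden.

```python
import warnings


def noisy_step() -> str:
    """Hypothetical stand-in for a call that emits a FutureWarning."""
    warnings.warn("this behaviour will change", FutureWarning)
    return "done"


# record=True collects warnings in a list so we can see what gets through
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # baseline: let every warning through
    warnings.simplefilter("ignore", category=FutureWarning)  # then hide these
    result = noisy_step()
    warnings.warn("still visible", UserWarning)

print(result)  # done
print([w.category.__name__ for w in caught])  # ['UserWarning']
```

Filters set inside the `with` block are discarded when it exits, so warnings elsewhere in the notebook are unaffected.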

Now we can take a look at the travel time output! Let's look at the first few rows of the travel time matrix:

In [None]:
tt_df.head()

> **Note:** any `NaN` travel time values mean that the median travel time between that origin and destination pair is greater than the `max_travel_time` value (or that no route was found).

We can also summarise the whole travel time matrix:

In [None]:
tt_df.describe(include="all")
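
Beyond `describe()`, two quick summaries are often useful: the share of origin and destination pairs with no travel time, and the shortest travel time from each origin to any destination. The sketch below shows both on a toy matrix. The IDs and times are made up for illustration and are not real output, but the column names match those of `tt_df`.

```python
import pandas as pd

# toy travel time matrix; the IDs and times are made up for illustration
toy_tt = pd.DataFrame(
    {
        "from_id": ["A", "A", "B", "B", "C", "C"],
        "to_id": [0, 1, 0, 1, 0, 1],
        "travel_time": [12.0, 30.0, float("nan"), 25.0, float("nan"), float("nan")],
    }
)

# share of origin-destination pairs with no travel time within the limit
share_unreachable = toy_tt["travel_time"].isna().mean()

# shortest travel time from each origin to any destination (NaN is skipped)
nearest = toy_tt.groupby("from_id")["travel_time"].min()

print(share_unreachable)  # 0.5
print(nearest)
```

In this toy data, origin `C` has no travel time to either destination, so its entry in `nearest` stays `NaN`.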

## 5. Visualise the Results

Given this is a geospatial problem, it's often much easier to see things on a map. Below is a helper function that will do that for us. Run the cell below to define the function:

In [None]:
def plot_tts_to_destination(
    dest_id: int,
    travel_times: pd.DataFrame,
    origins: gpd.GeoDataFrame,
    destinations: gpd.GeoDataFrame,
) -> folium.Map:
    """Helper function to visualise the travel times.

    Use the hover tooltip for more information. Origins are color coded by the
    median travel time to the destination requested. The destination is marked
    with a red flag. Grey origins indicate that there are no travel times to
    the destination (destination is too remote for the modality). These will
    display the travel time as `null` in the tooltip.

    Parameters
    ----------
    dest_id : int
        Destination ID.
    travel_times : pd.DataFrame
        Travel times output.
    origins : gpd.GeoDataFrame
        Origins geodataframe.
    destinations : gpd.GeoDataFrame
        Destinations geodataframe.

    Returns
    -------
    folium.Map
        Map of travel times to destination requested.

    Raises
    ------
    ValueError
        When:
        - the destination does not exist.
        - there are no travel times to the destination (destination is too
        remote for the modality).
    """
    # raise an error if the destination ID is not in the travel times
    if dest_id not in travel_times["to_id"].unique():
        raise ValueError(f"Destination ID {dest_id} not found")

    # filter the travel times to the destination
    plot_df = travel_times[travel_times["to_id"] == dest_id].copy()

    # raise an error if there are no travel times to the destination
    # handles case where the destination is unreachable
    if plot_df["travel_time"].isna().all():
        raise ValueError(f"No travel times to destination ID {dest_id}")

    # join on the geometry data
    plot_gdf = origins.merge(
        plot_df,
        left_on="id",
        right_on="from_id",
    ).drop(columns=["id"])

    # visualise the travel times
    m = plot_gdf.explore(
        "travel_time",
        tiles="CartoDB positron",
        legend_kwds={"caption": "Median travel time (minutes)"},
    )

    # add the destination marker
    destination = destinations[destinations["id"] == dest_id][["geometry", "id"]]
    m = destination.explore(
        marker_type="marker",
        marker_kwds={
            "icon": folium.Icon(
                color="red",
                prefix="fa",
                icon="flag-checkered",
            )
        },
        m=m,
    )

    return m

Now we can use that function to visualise the travel times from all the PWCs to a supermarket of interest. Change `supermarket_id` to the `id` of the supermarket you are interested in, and run the cell below:

> **Note:** there are only 22 supermarkets in this data, so only IDs 0-21 are valid.

In [None]:
# change the supermarket id and see the travel times to that destination
supermarket_id = 2
plot_tts_to_destination(supermarket_id, tt_df, origins_gdf, destinations_gdf)

## 6. Conclusion

This notebook demonstrated the use of the `r5py` library to analyse transport networks and estimate travel times to supermarkets across the Newport, Wales area. It covered configuring the analysis, loading and preparing data, creating a transport network, computing travel time matrices, and visualising the results. For more information on the `r5py` library, please refer to the [documentation](https://r5py.readthedocs.io/en/stable/).

## Further Experimentation

This example looked at estimating travel times to some supermarkets in the Newport area using public transit. What about:
- Other modes of transport, such as walking, cycling, or car? Maybe even multiple modes?
- Different times of day? How do the travel times vary?
- Other percentiles of travel time (rather than the median)? Maybe the 10th or 90th percentiles are more useful when answering specific research questions?
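
To build some intuition for the percentile question, the sketch below summarises a set of made-up travel times for a single origin and destination pair, one value per sampled departure within the departure time window. It uses Python's `statistics` module purely as an illustration of what the median and other percentiles describe. It is not how `r5py` computes them internally.

```python
import statistics

# made-up travel times (minutes) for one origin-destination pair, one value
# per sampled departure within the departure time window
travel_times = [22, 25, 31, 24, 40, 28, 23, 35, 27, 30, 26]

median = statistics.median(travel_times)

# quantiles(n=10) returns the 9 cut points between deciles;
# the first is the 10th percentile and the last is the 90th
deciles = statistics.quantiles(travel_times, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]

print(median)  # 27
print(p10, p90)  # 23.0 35.0
```

Here the 10th percentile reflects a trip timed well against the timetable, while the 90th percentile is closer to what a traveller leaving at an unlucky moment would experience.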