In [None]:
import requests
import numpy as np
import pandas as pd
from geopy.distance import geodesic

# Calculating the shortest walking distance between one starting point and a list of possible destinations

Designed to scale to larger data sets with many starting points and one shared list of destinations, using a local OSRM (Open Source Routing Machine) server built from OpenStreetMap data.

## Read Me

**Distances are calculated with a local OSRM instance. To set it up, install Docker and start it, then download a map extract in the pbf format, e.g., from https://extract.bbbike.org/. Then run the following commands in the terminal (macOS). The server started in the last step runs on localhost.**

**1. Navigate to the directory containing your OSM data if you're not already there:**

cd /path/to/your/osm/data

**2. Extract the OSM data for routing (replace 'denmark-latest.osm.pbf' with your file if different):**

docker run -t -v ${PWD}:/data osrm/osrm-backend osrm-extract -p /opt/foot.lua /data/denmark-latest.osm.pbf

**3. Prepare the data for faster routing:**

docker run -t -v ${PWD}:/data osrm/osrm-backend osrm-partition /data/denmark-latest.osrm

docker run -t -v ${PWD}:/data osrm/osrm-backend osrm-customize /data/denmark-latest.osrm

**4. Start the OSRM server:**

docker run -t -i -p 5001:5000 -v ${PWD}:/data osrm/osrm-backend osrm-routed --algorithm mld /data/denmark-latest.osrm
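
Once the container is running, it can help to confirm that the server answers before starting a long job. The following is a minimal sketch of such a check, not part of the original workflow: the helper names `build_route_url` and `server_responds` are hypothetical, the port 5001 is taken from the command above, and the coordinates in the usage note are arbitrary placeholders. It uses only the standard library; `requests.get` would work just as well.

```python
import json
import urllib.error
import urllib.request


def build_route_url(start, end, host="http://localhost:5001", profile="foot"):
    """Build an OSRM route URL. start and end are (lat, lon) tuples.

    OSRM expects coordinates in lon,lat order, so they are swapped here.
    """
    (start_lat, start_lon), (end_lat, end_lon) = start, end
    return (f"{host}/route/v1/{profile}/"
            f"{start_lon},{start_lat};{end_lon},{end_lat}?overview=false")


def server_responds(url, timeout=5):
    """Return True if the server answers the request with code "Ok"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response).get("code") == "Ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON answer
        return False
```

For example, `server_responds(build_route_url((55.68, 12.57), (55.69, 12.55)))` should return `True` if both placeholder points lie inside the downloaded map extract and the server is up.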


## Metro Station Example
Say you have a data frame with the coordinates of the metro stations (`metro_df`) and another data frame with many houses (`df`), both with the latitude and longitude columns `wgs84_lat` and `wgs84_lon`. You want to find the metro station closest to each house by walking distance. Start by defining the following functions.

In [None]:
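def get_walking_distance_osrm(start_lat, start_lon, end_lat, end_lon):
    """Return the walking distance in meters from the local OSRM server, or np.inf on failure."""
    profile = 'foot'
    # OSRM expects coordinates in lon,lat order
    url = f"http://localhost:5001/route/v1/{profile}/{start_lon},{start_lat};{end_lon},{end_lat}?overview=false"

    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.RequestException as exc:
        print(f"Failed to get distance: {exc}")
        return np.inf  # Return infinity to indicate failure

    if response.status_code == 200:
        data = response.json()
        if data.get('routes'):
            return data['routes'][0]['distance']  # meters

    print(f"Failed to get distance (HTTP {response.status_code})")
    return np.inf  # Return infinity to indicate failure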

def find_closest_station_walk_osrm(row, metro_df):
    """Add the name of, and walking distance (meters) to, the closest station to a house row."""
    closest_distance = np.inf
    closest_station_name = None

    start_point = (row['wgs84_lat'], row['wgs84_lon'])

    # One OSRM request per station; keep the shortest route found so far
    for _, station_row in metro_df.iterrows():
        station_point = (station_row['wgs84_lat'], station_row['wgs84_lon'])
        distance = get_walking_distance_osrm(start_point[0], start_point[1], station_point[0], station_point[1])

        if distance < closest_distance:
            closest_distance = distance
            closest_station_name = station_row['Metro Station Name']

    row = row.copy()  # avoid modifying the row object passed in by apply
    row['closest_station_walk'] = closest_station_name
    row['distance_to_closest_station_walk'] = closest_distance

    return row

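You can apply `find_closest_station_walk_osrm` to every house with `df = df.apply(find_closest_station_walk_osrm, axis=1, metro_df=metro_df)` and simply wait. Each house triggers one request per station, so run time grows with the number of houses times the number of stations. If you can exclude some metro stations by zip code or another variable, insert that condition before looping through all stations. This can decrease run time considerably.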
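
One possible exclusion condition, when no zip code is available, is the straight-line distance: only the few stations nearest as the crow flies are sent to OSRM. The sketch below illustrates that idea under stated assumptions. The helpers `haversine_m` and `nearest_candidates` are hypothetical names, the haversine formula stands in for `geopy`'s `geodesic(...).meters` to keep the example dependency-free, and the coordinates are made up.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nearest_candidates(start, stations, k=3):
    """Keep the k stations closest to start in a straight line.

    start: (lat, lon); stations: iterable of (name, lat, lon) tuples.
    """
    ranked = sorted(
        stations,
        key=lambda s: haversine_m(start[0], start[1], s[1], s[2]),
    )
    return ranked[:k]


# Made-up coordinates, for illustration only
stations = [("A", 55.68, 12.57), ("B", 55.63, 12.65), ("C", 55.69, 12.55)]
house = (55.681, 12.571)
candidates = nearest_candidates(house, stations, k=2)
print([name for name, _, _ in candidates])  # → ['A', 'C']
```

The station tuples could be built with `metro_df[['Metro Station Name', 'wgs84_lat', 'wgs84_lon']].itertuples(index=False, name=None)`, and the loop in `find_closest_station_walk_osrm` would then run over the candidates only. The station nearest in a straight line is not always the nearest on foot (a river or railway can force a detour), so `k` should stay comfortably above 1.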