# What Is This?
You and a friend want to meet at a location that is "optimal", i.e. one that minimises the travel effort for both of you (see below for precise definitions). This program uses the Google Maps API to calculate candidate locations and the routes to get there.

**Note:** To run the script yourself, you will need to create your own [account](https://developers.google.com/maps/documentation/distance-matrix) and generate an [API key](https://developers.google.com/maps/documentation/distance-matrix/get-api-key). Also keep in mind that you cannot send arbitrarily many free requests; see [pricing info](https://developers.google.com/maps/billing-and-pricing/overview) for details.

## Naive Approach
Take a list of cities, keep those that lie "between" my friend and me (see "bounding box" below), sort them by population and keep the most relevant ones. Then calculate train travel times from both of us to each remaining city and select the city where **whichever of us arrives later arrives as early as possible** (metric 1).
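
Metric 1 is a minimax criterion: for each candidate, take the worse of the two values, then pick the candidate where that worse value is smallest. A minimal sketch with made-up travel times (the city names and numbers are purely illustrative, not API results), assuming both of us leave at the same moment so that travel time alone decides who arrives later:

```python
# Hypothetical travel times in minutes: city -> (my time, friend's time).
travel_minutes = {
    "A": (60, 150),
    "B": (100, 110),
    "C": (140, 50),
}

def worst_case(times):
    """Travel time of whoever needs longer to reach the city."""
    return max(times)

# Metric 1: pick the city whose worst-case travel time is smallest.
best_city = min(travel_minutes, key=lambda city: worst_case(travel_minutes[city]))
print(best_city, worst_case(travel_minutes[best_city]))  # B 110
```

City "B" wins although neither of us has the shortest trip there, because nobody has to travel longer than 110 minutes.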

## Obvious Limitations & Ideas For Improvement
- the distance matrix API function seems to have an upper limit on the number of input locations => we need to pre-filter locations to likely candidates
- stations or cities? 
  - if cities are used instead of stations, this will cause inefficiency because the "middle" from the city dataset might be far away from the most easily accessible station in that city
  - stations have the disadvantage that it's harder to filter by population / connectedness => we should merge the two datasets below (e.g. by assigning each station to the city it's in)
- what do we mean by "minimising travelling effort"? 
  - distance matrix API seems to provide durations only! This means we're **not actually** minimising the latest arrival as in metric 1, but the longer of the two travel times => current fix is to generate the top 5 solutions and compare their arrival times
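
One possible way to merge the two datasets, sketched under a simplifying assumption: each station is assigned to the city whose coordinates are closest by great-circle distance, which is not necessarily the city the station is actually in. The city coordinates are taken from the city table in this notebook; the station entries and the helper names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

cities = {
    "Köln": (50.93333, 6.95000),
    "Essen": (51.45657, 7.01228),
    "Dortmund": (51.51494, 7.46600),
}
# Hypothetical station coordinates, for illustration only.
stations = {
    "station near Köln": (50.94, 6.96),
    "station near Dortmund": (51.52, 7.46),
}

def nearest_city(station_loc):
    """Name of the city with the smallest distance to the station."""
    return min(cities, key=lambda name: haversine_km(*station_loc, *cities[name]))

assignment = {name: nearest_city(loc) for name, loc in stations.items()}
print(assignment)
```

With such an assignment, stations could inherit the population of their city and be filtered the same way the cities are.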

In [1]:
import os 
import gmaps, googlemaps
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
client = googlemaps.Client(key=os.environ["GOOGLE_DISTMAT_APIKEY"])
gmaps.configure(api_key=os.environ["GOOGLE_DISTMAT_APIKEY"])

In [3]:
def coords(place_name):  # don't run this too often — free queries are limited
    geocoded = client.geocode(place_name)
    if len(geocoded) > 1:
        raise RuntimeError(f"Multiple places found for query '{place_name}': " + str(geocoded))
    if len(geocoded) == 0: 
        raise RuntimeError(f"No places found for query '{place_name}'!")
    return list(geocoded[0]["geometry"]["location"].values())

In [4]:
def bounding_box(locations, padding=0.01):  # expects list of coordinate tuples
    x = [l[0] for l in locations]
    y = [l[1] for l in locations]
    # corners in the order: bot left, bot right, top right, top left
    return [(min(x)-padding, min(y)-padding), (max(x)+padding, min(y)-padding),
            (max(x)+padding, max(y)+padding), (min(x)-padding, max(y)+padding)]

def in_box(loc, box):  # expects loc = [x,y], box = [bot left, bot right, top right, top left]
    return box[0][0] <= loc[0] <= box[2][0] and box[0][1] <= loc[1] <= box[2][1]

In [6]:
me = "Bonn Hbf"; friend = "Göttingen Hbf"
loc1 = coords(me); loc2 = coords(friend)

In [393]:
time_start = datetime(2023,11,25,9,0,0)
time_start.isoformat()

'2023-11-25T09:00:00'

In [262]:
box = bounding_box([loc1,loc2], 0.25)

European stations dataset from [Kaggle](https://www.kaggle.com/datasets/headsortails/train-stations-in-europe/)

In [263]:
all_stations = pd.read_csv("train_stations_europe.csv")
relevant_stations = all_stations[all_stations.apply(lambda x: in_box([x["latitude"],x["longitude"]], box),axis=1)]
station_locs = relevant_stations[["latitude","longitude"]].to_numpy()
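
The row-wise `apply` above calls `in_box` once per station. A vectorised alternative, sketched here on a made-up DataFrame with illustrative box bounds (not the values computed in this notebook), could use `Series.between`, which is inclusive on both ends by default:

```python
import pandas as pd

# Hypothetical stations; only the column names match the dataset used above.
stations = pd.DataFrame({
    "name": ["inside", "too far north", "too far west"],
    "latitude": [51.0, 53.0, 51.0],
    "longitude": [8.0, 8.0, 5.0],
})

# Illustrative box bounds (assumed values).
lat_min, lat_max = 50.5, 51.8
lon_min, lon_max = 6.8, 10.2

mask = (stations["latitude"].between(lat_min, lat_max)
        & stations["longitude"].between(lon_min, lon_max))
inside = stations[mask]
print(inside["name"].tolist())  # ['inside']
```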

World cities with more than 1000 inhabitants from [opendatasoft.com](https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/export/?disjunctive.cou_name_en&sort=name)

In [349]:
all_cities = pd.read_csv("all_cities.csv",sep=";")
# .copy() avoids pandas' SettingWithCopyWarning when the lat/lon columns are added below
relevant_cities = all_cities[all_cities["Country Code"].isin(["DE", "NL"])].copy()
relevant_cities[["lat","lon"]] = relevant_cities["Coordinates"].str.split(", ", expand=True).astype(float)
relevant_cities = relevant_cities[relevant_cities.apply(lambda x: in_box([x["lat"],x["lon"]],box),axis=1)]
relevant_cities = relevant_cities.sort_values(by="Population", ascending=False)  # TODO: don't hardcode

In [350]:
relevant_cities = relevant_cities[:250]  # select top k by population 
city_locs = relevant_cities[["lat","lon"]].to_numpy()
relevant_cities[["Name","Population","lat","lon"]]

| | Name | Population | lat | lon |
|---|---|---|---|---|
| 2796 | Köln | 963395 | 50.93333 | 6.95000 |
| 132330 | Essen | 593085 | 51.45657 | 7.01228 |
| 36473 | Dortmund | 588462 | 51.51494 | 7.46600 |
| 4314 | Bochum | 385729 | 51.48165 | 7.21648 |
| 118853 | Bochum-Hordel | 380000 | 51.50168 | 7.17560 |
| ... | ... | ... | ... | ... |
| 120007 | Rath | 11000 | 50.92379 | 7.09270 |
| 80609 | Anröchte | 10847 | 51.56667 | 8.33333 |
| 9267 | Ruppichteroth | 10749 | 50.84367 | 7.48409 |
| 130981 | Reiskirchen | 10734 | 50.60000 | 8.83333 |


Visualisation of the considered cities:

In [351]:
fig = gmaps.figure()
fig.add_layer(gmaps.marker_layer([loc1,loc2]))
#fig.add_layer(gmaps.symbol_layer(station_locs[:],fill_color='blue'))
fig.add_layer(gmaps.symbol_layer(city_locs[:],fill_color='blue'))
fig.add_layer(gmaps.drawing_layer(features=[
    gmaps.Polygon(box, stroke_color="blue", fill_color='blue')]))
fig

Figure(layout=FigureLayout(height='420px'))

Query the distance matrix from my friend's and my location to every relevant city (**warning:** this can cause many API calls; consider limiting `city_locs` beforehand).

In [394]:
chunk = 25  # seems to be maximum allowed (API error if larger)
num_chunks = int(np.ceil(len(city_locs) / chunk))  # array_split expects an integer number of sections
dist_mat_list = [client.distance_matrix([loc1,loc2], locations, departure_time=time_start, mode="transit") 
                 for locations in np.array_split(city_locs, num_chunks)]
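
To see how the chunking behaves, here is a small offline sketch with dummy coordinates (no API calls). Note that `np.array_split` balances the chunk sizes rather than filling each chunk up to the limit, so no chunk ever exceeds `chunk` entries:

```python
import math
import numpy as np

chunk = 25                     # assumed per-request limit, as in the cell above
locations = np.zeros((60, 2))  # 60 dummy (lat, lon) rows

n_chunks = math.ceil(len(locations) / chunk)  # 3 requests needed
parts = np.array_split(locations, n_chunks)
sizes = [len(part) for part in parts]
print(sizes)  # [20, 20, 20]
```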

In [395]:
columns = np.concatenate([dist_mat["destination_addresses"] for dist_mat in dist_mat_list])
time_mat = np.concatenate([np.array([[route["duration"]["value"] if route["status"] == "OK" else np.inf
      for route in row["elements"]] for row in dist_mat["rows"]]) for dist_mat in dist_mat_list], axis=1)
time_texts = np.concatenate([np.array([[route["duration"]["text"] if route["status"] == "OK" else "impossible"
      for route in row["elements"]] for row in dist_mat["rows"]]) for dist_mat in dist_mat_list], axis=1)
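
The comprehensions above flatten the nested response into one row per origin and one column per destination. The sketch below shows the same idea on a hand-written mock that contains only the fields the notebook reads; the addresses and durations are made up, and `ZERO_RESULTS` stands for an element without a usable route.

```python
import math

# Hand-written mock of a distance matrix response (2 origins x 2 destinations).
mock_response = {
    "destination_addresses": ["City A", "City B"],
    "rows": [
        {"elements": [
            {"status": "OK", "duration": {"text": "1 hour 0 mins", "value": 3600}},
            {"status": "ZERO_RESULTS"},
        ]},
        {"elements": [
            {"status": "OK", "duration": {"text": "30 mins", "value": 1800}},
            {"status": "OK", "duration": {"text": "45 mins", "value": 2700}},
        ]},
    ],
}

def durations_seconds(response):
    """One list per origin; unreachable destinations get infinity."""
    return [[el["duration"]["value"] if el["status"] == "OK" else math.inf
             for el in row["elements"]]
            for row in response["rows"]]

matrix = durations_seconds(mock_response)
# Worst case per destination: the longer of the two travel times.
worst = [max(column) for column in zip(*matrix)]
print(matrix, worst)
```

Using infinity for missing routes means such a destination can never win under metric 1.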

Calculate best location using metric (1) from above (find 'num_res' best options):

In [396]:
num_res = 5
idx_k_best = np.argpartition(np.max(time_mat,axis=0), num_res)[:num_res]  
addrs_best = columns[idx_k_best]
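
Note that `np.argpartition` only guarantees that the `num_res` smallest entries come first; it does not sort them among themselves. If a ranked list is wanted, the selected indices can be sorted afterwards. A minimal sketch on made-up worst-case times:

```python
import numpy as np

worst_case = np.array([170, 120, 200, 110, 150, 130])  # made-up minutes per city
num_res = 3

# Indices of the num_res smallest values, in no particular order.
idx_unordered = np.argpartition(worst_case, num_res)[:num_res]
# Sort just those few indices by their values to get a ranking.
idx_sorted = idx_unordered[np.argsort(worst_case[idx_unordered])]
print(idx_sorted)  # [3 1 5]
```

`np.argpartition` requires `num_res` to be smaller than the number of candidates, so very short candidate lists would need a separate check.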

In [397]:
time_texts[:,idx_k_best].T

array([['2 hours 52 mins', '2 hours 48 mins'],
       ['2 hours 55 mins', '2 hours 37 mins'],
       ['2 hours 38 mins', '2 hours 52 mins'],
       ['2 hours 57 mins', '2 hours 45 mins'],
       ['2 hours 44 mins', '2 hours 37 mins']], dtype='<U16')

In [398]:
locs_best = [coords(addr) for addr in addrs_best]
my_routes = [client.directions(me, addr, departure_time=time_start, mode="transit") for addr in addrs_best]
friends_routes = [client.directions(friend, addr, departure_time=time_start, mode="transit") for addr in addrs_best]

**Sanity check:** the optimal duration from the distance matrix should equal the duration from the directions query for the same time and location. As the outputs below show, this is not the case. Unfortunately, it's hard to tell where the difference comes from without knowing what exactly the API does internally.

In [401]:
my_routes[0][0]

{'bounds': {'northeast': {'lat': 51.5006281, 'lng': 8.2833122},
  'southwest': {'lat': 50.732008, 'lng': 6.9088053}},
 'copyrights': 'Map data ©2023 GeoBasis-DE/BKG (©2009)',
 'legs': [{'arrival_time': {'text': '12:23\u202fPM',
    'time_zone': 'Europe/Berlin',
    'value': 1700911415},
   'departure_time': {'text': '9:31\u202fAM',
    'time_zone': 'Europe/Berlin',
    'value': 1700901080},
   'distance': {'text': '183 km', 'value': 182752},
   'duration': {'text': '2 hours 52 mins', 'value': 10335},
   'end_address': 'Karolingerstraße 9A, 59872 Meschede, Germany',
   'end_location': {'lat': 51.3501468, 'lng': 8.2833122},
   'start_address': 'Bonn Hauptbahnhof, Am Hauptbahnhof 1, 53111 Bonn, Germany',
   'start_location': {'lat': 50.7326926, 'lng': 7.0961231},
   'steps': [{'distance': {'text': '0.1 km', 'value': 112},
     'duration': {'text': '2 mins', 'value': 100},
     'end_location': {'lat': 50.732008, 'lng': 7.097128},
     'html_instructions': 'Walk to Bonn Hbf',
     'polyline

In [410]:
durations = np.array([[my_routes[i][0]["legs"][0]["duration"]["text"], 
           friends_routes[i][0]["legs"][0]["duration"]["text"], 
           my_routes[i][0]["legs"][-1]["arrival_time"]["text"].encode("ascii","ignore").decode(),
           friends_routes[i][0]["legs"][-1]["arrival_time"]["text"].encode("ascii","ignore").decode()] 
        for i in range(num_res)])
durations

array([['2 hours 52 mins', '2 hours 33 mins', '12:23PM', '11:48AM'],
       ['2 hours 55 mins', '2 hours 22 mins', '12:26PM', '11:37AM'],
       ['2 hours 39 mins', '2 hours 37 mins', '11:53AM', '12:28PM'],
       ['2 hours 58 mins', '2 hours 35 mins', '12:00PM', '11:44AM'],
       ['2 hours 44 mins', '2 hours 22 mins', '11:59AM', '12:59PM']],
      dtype='<U15')
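
The sanity check can be made explicit by parsing the duration strings and subtracting them. The values below are copied from the two outputs in this notebook (distance matrix first, directions second); `to_minutes` is an illustrative helper, not part of any library.

```python
import re

def to_minutes(text):
    """Parse strings like '2 hours 52 mins' into a number of minutes."""
    hours = re.search(r"(\d+)\s*hour", text)
    mins = re.search(r"(\d+)\s*min", text)
    return (int(hours.group(1)) * 60 if hours else 0) + (int(mins.group(1)) if mins else 0)

# (my duration, friend's duration) per candidate, copied from the outputs above.
distmat = [("2 hours 52 mins", "2 hours 48 mins"), ("2 hours 55 mins", "2 hours 37 mins"),
           ("2 hours 38 mins", "2 hours 52 mins"), ("2 hours 57 mins", "2 hours 45 mins"),
           ("2 hours 44 mins", "2 hours 37 mins")]
directions = [("2 hours 52 mins", "2 hours 33 mins"), ("2 hours 55 mins", "2 hours 22 mins"),
              ("2 hours 39 mins", "2 hours 37 mins"), ("2 hours 58 mins", "2 hours 35 mins"),
              ("2 hours 44 mins", "2 hours 22 mins")]

# Difference in minutes: directions minus distance matrix.
diffs = [(to_minutes(d_me) - to_minutes(m_me), to_minutes(d_fr) - to_minutes(m_fr))
         for (m_me, m_fr), (d_me, d_fr) in zip(distmat, directions)]
print(diffs)  # [(0, -15), (0, -15), (1, -15), (1, -10), (0, -15)]
```

For my routes the two sources agree to within a minute, while for my friend's routes the directions durations are 10 to 15 minutes shorter.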

**Note:** even when durations are similar, arrival times can differ considerably (see 'limitations' above). We now choose the best option by arrival time: the one where the later of the two arrivals is earliest.

In [404]:
arrival_times = np.array([[my_routes[i][0]["legs"][-1]["arrival_time"]["value"], 
                           friends_routes[i][0]["legs"][-1]["arrival_time"]["value"]] for i in range(num_res)])
idx_best = np.argmin(np.max(arrival_times,axis=1))
loc_best = locs_best[idx_best]; addr_best = addrs_best[idx_best]
my_route = my_routes[idx_best]; friends_route = friends_routes[idx_best]
idx_best, addr_best

(3, 'Stiftstraße 8, 59494 Soest, Germany')
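
The `value` fields compared above are Unix timestamps in seconds, so the comparison is independent of how the times are displayed. As a check, the sketch below converts the two timestamps from the `my_routes[0][0]` output shown earlier; it assumes a fixed UTC+1 offset, which matches Europe/Berlin on this late-November date.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))  # assumption: fixed UTC+1 (Berlin winter time)

departure = datetime.fromtimestamp(1700901080, tz=CET)
arrival = datetime.fromtimestamp(1700911415, tz=CET)
print(departure.strftime("%H:%M"), arrival.strftime("%H:%M"), arrival - departure)
# 09:31 12:23 2:52:15
```

The trip itself takes 2 hours 52 minutes, but it only starts at 09:31 although the requested departure was 09:00. This waiting time is not part of the duration, which is why durations alone do not determine who arrives first.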

In [405]:
def route_steps(route):
    res = []
    for step in route["steps"]:
        if "transit_details" in step:
            d = step["transit_details"]
            line = d["line"]["name"] if "name" in d["line"] else d["line"]["short_name"]
            t1 = d["departure_time"]["text"].encode("ascii","ignore").decode()
            t2 = d["arrival_time"]["text"].encode("ascii","ignore").decode()
            res.append(f'Take the {line} ({d["headsign"]}) at {t1} to {d["arrival_stop"]["name"]}, arrive {t2}')
        else: 
            res.append(step["html_instructions"])
            
    return res

In [413]:
(time_start.isoformat(), route_steps(my_route[0]["legs"][0]), route_steps(friends_route[0]["legs"][0]),list(durations[idx_best]))

('2023-11-25T09:00:00',
 ['Walk to Bonn Hbf',
  'Take the RE5 (Oberhausen Hbf) at 9:04AM to Düsseldorf, arrive 10:01AM',
  'Walk to Düsseldorf',
  'Take the RE13 (Hamm (Westf), Hauptbahnhof) at 10:12AM to Holzwickede, arrive 11:13AM',
  'Walk to Holzwickede',
  'Take the RB59 (Soest, Bahnhof) at 11:24AM to Bahnhof Soest, arrive 11:52AM',
  'Walk to Stiftstraße 8, 59494 Soest, Germany'],
 ['Take the RB85 (Paderborn Hbf) at 9:10AM to Paderborn Hbf, arrive 11:05AM',
  'Walk to Paderborn Hbf',
  'Take the RE11 (Düsseldorf Hbf) at 11:11AM to Bahnhof Soest, arrive 11:36AM',
  'Walk to Stiftstraße 8, 59494 Soest, Germany'],
 ['2 hours 58 mins', '2 hours 35 mins', '12:00PM', '11:44AM'])

In [407]:
fig = gmaps.figure()
fig.add_layer(gmaps.marker_layer([loc1,loc2,loc_best]))
# TODO: plot my_route and friends_route - they might differ from gmaps.directions.Directions
fig.add_layer(gmaps.directions.Directions(loc1, loc_best, travel_mode='TRANSIT'))
fig.add_layer(gmaps.directions.Directions(loc2, loc_best, travel_mode='TRANSIT'))
fig

Figure(layout=FigureLayout(height='420px'))