---------
# Find Property Distance to Train Station
1. Find the closest station to every property in the geographic coordinate system
2. Use the ORS API to find the route distance between each property and its closest station
3. Store the data

Assumptions:
1. Assume that the closest station in the geographic coordinate system is also the closest station by route distance.
    
    This is because the number of API requests is limited, so finding the closest station by actual route distance is not feasible for more than 15 thousand properties.
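
A rough count shows why the exhaustive search is impractical. The sketch below compares how many property-to-station route distances each approach needs. The property count comes from the text above; the station count is a hypothetical placeholder, not the real size of the station dataset.

```python
# Number of properties mentioned in the text above (a lower bound).
n_properties = 15_000
# Hypothetical placeholder; substitute the real station count once the data is loaded.
n_stations = 300

# Exhaustive search: a route distance for every (property, station) pair.
exhaustive_pairs = n_properties * n_stations
# Approach used here: one route distance per property, to its geographically closest station.
assumed_pairs = n_properties

print(exhaustive_pairs, assumed_pairs)
```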

Limitations:
1. Because of this assumption, some distances are inevitably incorrect: a property may be geographically closest to station A while being closer to station B by route distance.
    
    However, this sometimes happens only for properties located midway between two stations, so the error is considered acceptable.

#### Please use your own ORS key

In [None]:
import pandas as pd
import geopandas as gdp
import re
import openrouteservice as ors
import folium
import time
client = ors.Client(key='your key')

--------
### Find the Geographically Closest Station
Load the data

In [None]:
train_station_gdp = gdp.read_file("../data/raw/train_station/ll_gda2020/esrishape/whole_of_dataset/victoria/TRANSPORT/VIC_RAILWAY_STATIONS.shp")
train_station_gdp

Get the [latitude, longitude] coordinates of each train station from its Point geometry

In [None]:
train_station_gdp['coord'] = [[point.y, point.x] for point in train_station_gdp['geometry']]

A function to find the distance (in metres) between two points based on their geographical coordinates

In [None]:
from geopy.distance import geodesic
def distance(p1, p2):
    return geodesic(tuple(p1), tuple(p2)).m
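
`geodesic` takes each point as `(latitude, longitude)`, which is why `coord` is built with `y` first. As a quick sanity check on that ordering and on the unit, here is a minimal standard-library sketch using the haversine formula on a spherical Earth. It is a stand-in for illustration only, not the ellipsoidal calculation that `geodesic` performs; `haversine_m` and `EARTH_RADIUS_M` are names introduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation

def haversine_m(p1, p2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km on this sphere.
one_degree = haversine_m((-37.0, 145.0), (-38.0, 145.0))
print(round(one_degree / 1000, 1))
```

Swapping the two values in a point would treat a longitude as a latitude and give a very different distance, so the `[lat, lon]` order matters for every call to `distance`.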

Load property data

In [None]:
properties = pd.read_json("../data/raw/property_raw.json")
properties = properties.transpose()
properties = properties.reset_index(drop=True)
properties

Find the geographically closest station for each property

In [None]:
from collections import defaultdict

# station index -> indices of the properties whose closest station it is
station_properties = defaultdict(list)
# closest station index for each property, in property order
cls_stations = []
for j, coor in enumerate(properties["coordinates"]):
    cls_station_id = -1
    min_dist = float("inf")
    for i, station in enumerate(train_station_gdp['coord']):
        dist = distance(coor, station)  # compute once per pair
        if dist <= min_dist:
            min_dist = dist
            cls_station_id = i
    cls_stations.append(cls_station_id)
    station_properties[cls_station_id].append(j)
    
    

In [None]:
properties["closest station"] = pd.Series(data = cls_stations)
properties
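
The search above produces two structures: a list giving the closest station index for each property, and a dictionary giving the property indices assigned to each station. A minimal sketch on made-up planar points, with `math.dist` standing in for the geodesic distance, shows their shape. The toy coordinates, the `toy_` names, and the `closest_index` helper are hypothetical.

```python
import math
from collections import defaultdict

# Made-up planar points, purely illustrative (not real coordinates).
toy_stations = [(0.0, 0.0), (10.0, 0.0)]
toy_properties = [(1.0, 1.0), (9.0, 2.0), (2.0, -1.0)]

def closest_index(point, candidates, dist=math.dist):
    """Return the index of the candidate nearest to point under dist."""
    return min(range(len(candidates)), key=lambda i: dist(point, candidates[i]))

toy_closest = []                # property index -> station index
toy_groups = defaultdict(list)  # station index -> property indices
for j, prop in enumerate(toy_properties):
    i = closest_index(prop, toy_stations)
    toy_closest.append(i)
    toy_groups[i].append(j)

print(toy_closest)       # [0, 1, 0]
print(dict(toy_groups))  # {0: [0, 2], 1: [1]}
```

Grouping properties by station is what later allows one matrix request per station instead of one request per property.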

---------
### Use ORS to Find Route Distance
Matrix Call

Input an array of coordinates (a station followed by the properties closest to it).

Return a matrix that contains the route distance from each property to the station.
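
With `destinations=[0]`, the response holds one row per input location and a single column: the distance from that location to the station at index 0. ORS takes coordinates as `[longitude, latitude]`, so the `[lat, lon]` pairs used earlier need reversing. The sketch below walks through both steps offline, with a hand-written dictionary in place of a real API response; the coordinates and distances are invented.

```python
# Invented [lat, lon] pairs: the station first, then its two assigned properties.
station_lat_lon = [-37.8183, 144.9671]
property_lat_lon = [[-37.8150, 144.9700], [-37.8200, 144.9600]]

# ORS expects [lon, lat], so reverse every pair before sending.
locations = [list(reversed(station_lat_lon))] + [list(reversed(p)) for p in property_lat_lon]

# Hand-written stand-in for the dictionary returned by the matrix call
# with destinations=[0]; the distances are made-up values.
fake_matrix = {"distances": [[0.0], [520.4], [810.9]]}

# Row 0 is the station to itself, so skip it; each remaining row has one column.
property_distances = [row[0] for row in fake_matrix["distances"][1:]]

print(locations[0])
print(property_distances)
```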

In [None]:
# Find the route distance from each property to its closest station
error_station = []
result = {}
for station in station_properties.keys():
    # ORS expects [lon, lat], so reverse the [lat, lon] pairs
    station_coord = list(reversed(train_station_gdp.iloc[station]['coord']))
    prop_coords = [list(reversed(properties.iloc[prop]['coordinates'])) for prop in station_properties[station]]
    coordinates = [station_coord] + prop_coords
    print(f"Start station {station} request, request size {len(coordinates)}")
    try:
        matrix = client.distance_matrix(
            locations=coordinates,
            destinations = [0],
            profile='foot-walking',
            metrics=['distance'],
            validate=False,
        )
    except Exception:
        # Record the failure and skip, so a stale matrix is never reused
        error_station.append(station)
        time.sleep(1.5)
        continue
    time.sleep(1.5)
    print(f"end {station}th request")
    # Row 0 is the station itself; the remaining rows are the properties
    curr = [dist[0] for dist in matrix['distances'][1:]]
    result[station] = curr

Some API calls may fail because of network errors, so retry the failed calls

In [None]:
# Retry each failed station once; any that fail again stay in error_station
still_failed = []
for station in error_station:
    station_coord = list(reversed(train_station_gdp.iloc[station]['coord']))
    prop_coords = [list(reversed(properties.iloc[prop]['coordinates'])) for prop in station_properties[station]]
    coordinates = [station_coord] + prop_coords
    print(f"Start station {station} request, request size {len(coordinates)}")
    try:
        matrix = client.distance_matrix(
            locations=coordinates,
            destinations = [0],
            profile='foot-walking',
            metrics=['distance'],
            validate=False,
        )
    except Exception:
        # Collect in a separate list so the loop does not grow while iterating
        still_failed.append(station)
        time.sleep(1.5)
        continue
    time.sleep(1.5)
    print(f"end {station}th request")
    curr = [dist[0] for dist in matrix['distances'][1:]]
    result[station] = curr
error_station = still_failed
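
The retry step can also be written as a small bounded-retry helper, so that a request is attempted a fixed number of times before giving up. This is a minimal sketch, not part of the original workflow: `call_with_retries` and `flaky_request` are hypothetical names, and the flaky function only simulates a failing API request.

```python
import time

def call_with_retries(call, max_attempts=3, wait_seconds=1.5, sleep=time.sleep):
    """Run call() until it succeeds or max_attempts is reached.

    Returns the result of call(), or re-raises the last exception.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)  # pause before the next attempt

# Hypothetical flaky function standing in for the API request:
# it fails twice, then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"distances": [[0.0], [123.0]]}

# Skip the real pause here so the demonstration runs instantly.
response = call_with_retries(flaky_request, sleep=lambda seconds: None)
print(attempts["count"], response["distances"][1][0])
```

In the notebook, `call` could be a `lambda` that wraps the `client.distance_matrix(...)` request for one station.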

---------
### Store the data

Merge the distances into the property dataset

In [None]:
prop_dist = {}
for station in result.keys():
    props = station_properties[station]
    dists = result[station]
    for i in range(0, len(props)):
        prop_dist[props[i]] = dists[i]
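
This merge relies on the distances for a station being in the same order as the property indices assigned to it. A minimal sketch with invented values makes that pairing explicit and adds a length check; the `toy_` names are hypothetical.

```python
# Toy stand-ins for the two dictionaries built earlier (values invented).
toy_station_properties = {0: [0, 2], 1: [1]}   # station -> property indices
toy_result = {0: [350.0, 910.5], 1: [120.25]}  # station -> distances, same order

toy_prop_dist = {}
for station, dists in toy_result.items():
    props = toy_station_properties[station]
    # A length mismatch would mean distances are paired with the wrong properties.
    assert len(props) == len(dists)
    toy_prop_dist.update(zip(props, dists))

print(sorted(toy_prop_dist.items()))
```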

In [None]:
# Sanity check: total number of distances retrieved across all stations
sum([len(x) for x in result.values()])

In [None]:
# One entry per property, in row order; None marks a property whose request failed
prop_dist_data = [prop_dist.get(i) for i in range(len(properties))]

In [None]:
properties["proximity to train station"] = pd.Series(data = prop_dist_data)

In [None]:
properties.to_csv("../data/raw/properties_train_proximity.csv")