## GTFS
In order to compute accessibility based on GTFS data (General Transit Feed Specification, can by downloaded from ftp://199.203.58.18/), we will first perform basic processing of the data.
We are using the pandas library.

The code is based on the following structure of GTFS tables:
![GTFS Tables](./GTFS_tables.PNG)

In [1]:
import pandas as pd
import datetime
import numpy as np
import geopy.distance
from tqdm.auto import tqdm
tqdm.pandas()
import dask.dataframe as dd
from dask.multiprocessing import get
from geopy.point import Point

from dask.diagnostics import ProgressBar
ProgressBar().register()

# DATA_PATH = '../data/'
DATA_PATH = '../../input_data/GTFS-28-Oct-19/'
OUTPUT_PATH = '../../output_data/'

## Load Nodes

In [2]:
# Load nodes
NODES_PATH = 'morning_trips_nodes.pkl'
nodes_df = pd.read_pickle(OUTPUT_PATH + NODES_PATH)
nodes_df.head(3)

Unnamed: 0,trip_id,arrival_time,departure_time_stop,stop_id,stop_sequence,route_id,departure_time_trip_departure,stop_code,stop_lat,stop_lon,arrival,departure
18,14824097_271019,08:02:40,08:02:40,2356,2,1606,08:00:00,41476,32.793214,35.038925,2019-11-03 08:02:40,2019-11-03 08:02:40
36,19590744_271019,08:14:17,08:14:17,2356,2,16379,08:10:00,41476,32.793214,35.038925,2019-11-03 08:14:17,2019-11-03 08:14:17
61,24004495_271019,08:09:09,08:09:09,2356,2,4418,08:05:00,41476,32.793214,35.038925,2019-11-03 08:09:09,2019-11-03 08:09:09


## Compute transferable stops

In [3]:
stops_df = pd.read_csv(DATA_PATH + 'stops.txt')
stops_df.head(3)

Unnamed: 0,stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,zone_id
0,1,38831,בי''ס בר לב/בן יהודה,רחוב:בן יהודה 76 עיר: כפר סבא רציף: קומה:,32.183939,34.917812,0,,6900
1,2,38832,הרצל/צומת בילו,רחוב:הרצל עיר: קרית עקרון רציף: קומה:,31.870034,34.819541,0,,469
2,3,38833,הנחשול/הדייגים,רחוב:הנחשול 30 עיר: ראשון לציון רציף: קומה:,31.984553,34.782828,0,,8300


In [4]:
stops_df.nunique()

stop_id           27989
stop_code         27553
stop_name         20556
stop_desc         17768
stop_lat          27231
stop_lon          27069
location_type         2
parent_station       87
zone_id            3895
dtype: int64

In [None]:
stops = stops_df[['stop_id', 'stop_lon', 'stop_lat']]

transferable_stops = {'source': [], 'dest': [], 
#                       'source_point': [], 'dest_point': [], 
                      'distance': []}

def get_stop_transfers(stop):
    current_point = geopy.Point(stop['stop_lon'], stop['stop_lat'])
    current_stop_id = stop['stop_id']
    dest_stops_df = pd.DataFrame(columns=['source', 'dest', 'distance'])
    def get_points_dist(s):
        # Get distance in meters
        p = geopy.Point(stop['stop_lon'], stop['stop_lat'])
        dist = geopy.distance.distance(p, current_point).m
        if dist <= 200:
            transferable_stops['source'].append(current_stop_id)
            transferable_stops['dest'].append(s['stop_id'])
    #         transferable_stops['source_point'].append(current_point)
    #         transferable_stops['dest_point'].append(p)
            transferable_stops['distance'].append(dist)

    stops.apply(lambda s: get_points_dist(s), axis=1)
    
    
d_stops = dd.from_pandas(stops, npartitions=30)
d_stops.map_partitions(lambda df: df.apply(lambda x: get_stop_transfers(x), axis=1)).compute(scheduler='threads') 

transferable_stops_df = pd.DataFrame.from_dict(transferable_stops)
transferable_stops_df.head()

In [56]:
transferable_stops_df.shape

(56010, 3)

In [59]:
transferable_stops_df.max()

source          1.0
dest        42005.0
distance        0.0
dtype: float64

## Let's compute for a test zone (6900, Kfar Saba)

In [5]:
test_stops_df = stops_df[stops_df['zone_id'] == 6900]
test_stops_df.head(3)

Unnamed: 0,stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,zone_id
0,1,38831,בי''ס בר לב/בן יהודה,רחוב:בן יהודה 76 עיר: כפר סבא רציף: קומה:,32.183939,34.917812,0,,6900
613,745,39690,וייצמן/סוקולוב,רחוב:וייצמן 66 עיר: כפר סבא רציף: קומה:,32.176261,34.903576,0,,6900
614,747,39699,וייצמן/נורדאו,רחוב:וייצמן 186 עיר: כפר סבא רציף: קומה:,32.172781,34.918988,0,,6900


In [None]:
stops = test_stops_df[['stop_id', 'stop_lon', 'stop_lat']]

transferable_stops = {'source': [], 'dest': [], 
#                       'source_point': [], 'dest_point': [], 
                      'distance': []}

def get_stop_transfers(stop):
    current_point = Point(stop['stop_lon'], stop['stop_lat'])
    current_stop_id = stop['stop_id']
    dest_stops_df = pd.DataFrame(columns=['source', 'dest', 'distance'])
    def get_points_dist(s):
        # Get distance in meters
        p = geopy.Point(s['stop_lon'], s['stop_lat'])
#         print(type(p))
#         print(type(current_point))
        dist = geopy.distance.distance(p, current_point).m
        if dist <= 200:
            transferable_stops['source'].append(current_stop_id)
            transferable_stops['dest'].append(s['stop_id'])
    #         transferable_stops['source_point'].append(current_point)
    #         transferable_stops['dest_point'].append(p)
            transferable_stops['distance'].append(dist)

    stops.apply(lambda s: get_points_dist(s), axis=1)
    
    
d_stops = dd.from_pandas(stops, npartitions=30)
d_stops.map_partitions(lambda df: df.apply(lambda x: get_stop_transfers(x), axis=1)).compute(scheduler='threads') 

transferable_stops_df = pd.DataFrame.from_dict(transferable_stops)
transferable_stops_df.head()

In [None]:
transferable_stops_df = pd.DataFrame.from_dict(transferable_stops)
transferable_stops_df.head()

In [16]:
# Milson street Haifa
a = Point(32.831492, 34.983018)
b = Point(32.832032, 34.983195)

# TLV example
# 32.078797, 34.781148 to 32.078797, 34.781148 is 0.0
# a = Point(32.076684, 34.781313)
# b = Point(32.075101, 34.78157)

latitude, longitude, altitude = a
print(a)
print(b)
geopy.distance.distance(a, b).m

32 49m 53.3712s N, 34 58m 58.8648s E
32 49m 55.3152s N, 34 58m 59.502s E


62.137565373103996

## Compute Transfer Edges