## GTFS
In order to compute accessibility based on GTFS data (General Transit Feed Specification; the feed can be downloaded from ftp://199.203.58.18/), we first perform some basic processing of the data.
We use the pandas library.

The code is based on the following structure of GTFS tables:
![GTFS Tables](./GTFS_tables.PNG)
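As a rough illustration of how these tables fit together, the sketch below joins `stop_times.txt` with `trips.txt` (for `route_id`) and `stops.txt` (for `stop_code`, `stop_lat`, `stop_lon`) to get one row per trip/stop event, then converts the GTFS `HH:MM:SS` strings into datetimes on a service date. GTFS allows times past `24:00:00` for trips that run after midnight, so times are parsed as offsets from the service date rather than as clock times. The helper names, the selected columns and the service date are assumptions for illustration; this is not necessarily how `morning_trips_nodes.pkl` was produced.

```python
import pandas as pd


def gtfs_time_to_datetime(times: pd.Series, service_date: str) -> pd.Series:
    """Convert GTFS 'HH:MM:SS' strings to timestamps on the given service date.

    Hours may be >= 24 (trips running past midnight), so each time is treated
    as an offset in seconds from midnight of the service date.
    """
    parts = times.str.split(':', expand=True).astype(int)
    seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
    return pd.Timestamp(service_date) + pd.to_timedelta(seconds, unit='s')


def build_nodes(stop_times: pd.DataFrame, trips: pd.DataFrame,
                stops: pd.DataFrame, service_date: str) -> pd.DataFrame:
    """Hypothetical helper: one row (node) per trip/stop event."""
    nodes = (stop_times[['trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence']]
             .merge(trips[['trip_id', 'route_id']], on='trip_id', how='left')
             .merge(stops[['stop_id', 'stop_code', 'stop_lat', 'stop_lon']], on='stop_id', how='left'))
    nodes['arrival'] = gtfs_time_to_datetime(nodes['arrival_time'], service_date)
    nodes['departure'] = gtfs_time_to_datetime(nodes['departure_time'], service_date)
    return nodes
```

The three inputs would typically be read with `pd.read_csv` from the feed directory (for example `DATA_PATH`); any time-of-day filtering (such as keeping only morning trips) would be applied afterwards on the `arrival`/`departure` columns.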

In [9]:
import pandas as pd
import datetime
import numpy as np

# DATA_PATH = '../data/'
DATA_PATH = '../data/GTFS-24-October-17/'

In [5]:
# OUTPUT_PATH = 'G:/danielle/accessibility_to_trains'
OUTPUT_PATH = 'D:/'

## Load Nodes

In [11]:
# Load nodes
NODES_PATH = 'morning_trips_nodes.pkl'
nodes_df = pd.read_pickle(NODES_PATH)
nodes_df.head(3)

| | trip_id | arrival_time | departure_time_stop | stop_id | stop_sequence | route_id | departure_time_trip_departure | stop_code | stop_lat | stop_lon | arrival | departure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29454675_291017 | 08:57:00 | 08:57:00 | 37312 | 1 | 20950 | 08:57:00 | 17084 | 31.242886 | 34.798546 | 2017-10-29 08:57:00 | 2017-10-29 08:57:00 |
| 9 | 29454016_291017 | 08:08:00 | 08:08:00 | 37312 | 23 | 20985 | 05:38:00 | 17084 | 31.242886 | 34.798546 | 2017-10-29 08:08:00 | 2017-10-29 08:08:00 |
| 22 | 29453854_291017 | 08:19:00 | 08:19:00 | 37312 | 1 | 16488 | 08:19:00 | 17084 | 31.242886 | 34.798546 | 2017-10-29 08:19:00 | 2017-10-29 08:19:00 |


## Compute Direct Edges

In [12]:
# First we verify that the stop_sequence values of each trip are consecutive

tmp_df = nodes_df

def verify_consecutive(stop_sequence):
    # Once sorted, a consecutive sequence equals range(min, max + 1); gaps or duplicates break this
    seq = sorted(stop_sequence.tolist())
    if seq != list(range(seq[0], seq[-1] + 1)):
        raise ValueError('Found non-consecutive stop sequence: {}'.format(seq))

In [13]:
tmp_df.groupby('trip_id')['stop_sequence'].apply(verify_consecutive)

No exception was raised, so the stop sequences of all trips are consecutive and we can continue.
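The per-trip check stops at the first offending trip. A sketch of a vectorized alternative, assuming the same `trip_id` and `stop_sequence` columns, applies the same criterion (no duplicates and no gaps between a trip's minimum and maximum sequence number; the start need not be 1, as row 9 above starts at 23) but returns every offending `trip_id` at once. The function name is hypothetical.

```python
import pandas as pd


def non_consecutive_trips(df: pd.DataFrame) -> list:
    """Return the trip_ids whose stop_sequence values are not consecutive integers.

    A trip passes when its values are unique and cover exactly max - min + 1 integers.
    """
    g = df.groupby('trip_id')['stop_sequence']
    stats = pd.DataFrame({
        'count': g.size(),
        'nunique': g.nunique(),
        'span': g.max() - g.min() + 1,
    })
    bad = stats[(stats['count'] != stats['nunique']) | (stats['count'] != stats['span'])]
    return bad.index.tolist()
```

On the notebook's data, `assert not non_consecutive_trips(nodes_df)` would express the same check as the cell above.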

In [14]:
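def create_direct_edges_for_trip(x):
    """Add one edge per pair of consecutive stops of a single trip.

    x holds the rows (nodes) of one trip. Edges are added to the global set
    `direct_edges` as (from_node_index, to_node_index, weight_in_seconds).
    """
    for index, node in x.iterrows():
        stop_seq = node['stop_sequence']
        # Within the same trip, find the node of the next stop
        next_node = x[x['stop_sequence'] == stop_seq + 1]
        if next_node.shape[0] == 0:
            # This is the last node of the current trip: no outgoing edge
            continue
        assert next_node.shape[0] == 1
        next_node_index = next_node.index[0]

        # Edge weight in seconds: from this node's arrival to the next node's departure
        w = ((next_node['departure'] - node['arrival']) / np.timedelta64(1, 's')).values[0]
        direct_edges.add((index, next_node_index, w))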

<span style="background-color: #F1D9F9">TODO: parallelize the direct-edge construction with multiprocessing: group by trip_id, split the groups (trips) into batches, and apply 'create_direct_edges_for_trip' to each batch using a pool. We need to make sure that direct_edges is thread/process-safe.</span>
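One way to address this TODO without any shared state, sketched under assumptions: split the trip_ids into batches, let each worker build and return its own set of edges, and union the results in the parent process, so no set is ever shared between processes. Within a trip, the worker pairs consecutive rows after sorting by `stop_sequence`, which is equivalent to looking up `stop_sequence + 1` once the sequences are known to be consecutive. The helper names, the default batch count and the `executor_cls` parameter (which lets a `ThreadPoolExecutor` be swapped in for debugging) are illustrative choices, not part of the original code.

```python
import concurrent.futures

import numpy as np
import pandas as pd


def edges_for_batch(batch: pd.DataFrame) -> set:
    """Build the direct edges for every trip in the batch and return them (no globals)."""
    edges = set()
    for _, trip in batch.groupby('trip_id'):
        trip = trip.sort_values('stop_sequence')
        idx = trip.index.to_list()
        arrivals = trip['arrival'].to_list()
        departures = trip['departure'].to_list()
        # Pair each stop with the next one; weight = next departure - this arrival, in seconds
        for i in range(len(idx) - 1):
            w = (departures[i + 1] - arrivals[i]).total_seconds()
            edges.add((idx[i], idx[i + 1], w))
    return edges


def build_direct_edges_parallel(nodes: pd.DataFrame, n_batches: int = 8,
                                executor_cls=concurrent.futures.ProcessPoolExecutor) -> set:
    """Split the trips into n_batches, process the batches in a pool, and merge the edge sets."""
    cols = nodes[['trip_id', 'stop_sequence', 'arrival', 'departure']]
    trip_ids = cols['trip_id'].unique()
    batches = [cols[cols['trip_id'].isin(chunk)]
               for chunk in np.array_split(trip_ids, n_batches) if len(chunk) > 0]
    edges = set()
    with executor_cls() as pool:
        # Each worker returns its own set; the parent is the only writer of `edges`
        for batch_edges in pool.map(edges_for_batch, batches):
            edges |= batch_edges
    return edges
```

With a process pool, call `build_direct_edges_parallel(tmp_df)` from under an `if __name__ == '__main__':` guard when running as a script. On platforms that spawn worker processes (Windows, macOS), `edges_for_batch` likely has to live in an importable `.py` module rather than in a notebook cell, because spawned workers cannot load functions defined in the notebook's `__main__`.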

In [15]:
# For each trip (grouping by trip), create edges for this trip's consecutive nodes.
from tqdm.auto import tqdm
tqdm.pandas()

direct_edges = set()
tmp_df[['trip_id', 'stop_sequence', 'arrival', 'departure']].groupby('trip_id').progress_apply(create_direct_edges_for_trip)

Progress: 10332 trip groups (one per trip_id)

In [16]:
len(direct_edges)

194359
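`create_direct_edges_for_trip` filters the whole trip once per node, so its cost grows quadratically with trip length, on top of the overhead of `iterrows`. A vectorized sketch of the same construction, assuming the stop sequences are consecutive within each trip (as verified earlier) and that the node table has a unique index: sort by trip and sequence, then pair each row with the next row of the same trip via `groupby(...).shift(-1)`. It keeps the original weight (next node's departure minus this node's arrival, in seconds). The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def direct_edges_vectorized(nodes: pd.DataFrame) -> set:
    """Return {(from_index, to_index, weight_seconds)} for consecutive stops of every trip."""
    df = nodes[['trip_id', 'stop_sequence', 'arrival', 'departure']].sort_values(['trip_id', 'stop_sequence'])
    df = df.assign(node_index=df.index)
    by_trip = df.groupby('trip_id')
    # Index and departure of the next stop in the same trip (NaN/NaT on a trip's last stop)
    next_index = by_trip['node_index'].shift(-1)
    next_departure = by_trip['departure'].shift(-1)
    has_next = next_index.notna().to_numpy()
    # Same weight as create_direct_edges_for_trip: next departure minus this arrival, in seconds
    weights = (next_departure[has_next] - df['arrival'][has_next]) / np.timedelta64(1, 's')
    return set(zip(df.index[has_next], next_index[has_next].astype(int), weights))
```

Before replacing the loop, `direct_edges_vectorized(tmp_df) == direct_edges` can serve as a cross-check, since both compute the same tuples from the same columns.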

In [17]:
import pickle

with open('direct_morning_edges_all_israel.pkl', 'wb') as f:
    pickle.dump(direct_edges, f)
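To reuse the saved edges in a later step, the pickle can be loaded back and turned into an adjacency list keyed by node index. A minimal sketch, assuming the `(from_node, to_node, weight_seconds)` tuple layout produced above; both helper names are hypothetical.

```python
import pickle
from collections import defaultdict


def load_edges(path: str) -> set:
    """Load a set of (from_node, to_node, weight_seconds) tuples saved with pickle."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def edges_to_adjacency(edges) -> dict:
    """Map each node index to a list of (neighbor, weight_seconds) pairs."""
    adjacency = defaultdict(list)
    for u, v, w in edges:
        adjacency[u].append((v, w))
    # Nodes without outgoing edges (the last stop of each trip) do not appear as keys
    return dict(adjacency)
```

For example: `adjacency = edges_to_adjacency(load_edges('direct_morning_edges_all_israel.pkl'))`. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from untrusted input.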

In [18]:
nodes_df.shape

(204691, 12)
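These counts give a cheap consistency check: a trip with k nodes yields exactly k - 1 direct edges, so the total number of direct edges should equal the number of nodes minus the number of trips. With the 10332 trip groups shown by the progress bar, 204691 - 10332 = 194359, which matches `len(direct_edges)`. A sketch of the check as code (the function name is hypothetical):

```python
import pandas as pd


def expected_direct_edge_count(nodes: pd.DataFrame) -> int:
    """Each trip with k nodes contributes k - 1 edges, so the total is nodes - trips."""
    return len(nodes) - nodes['trip_id'].nunique()


# Usage on the notebook's data:
# assert len(direct_edges) == expected_direct_edge_count(nodes_df)
```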