# SBB Network Generation

In this notebook we construct the graph <b>G</b> that models the SBB network. It is built from two inputs, both described in the data_preprocessing section: the <b>current_next</b> dataframe and the <b>filtered_stops</b> dataframe.

To represent each stop/station we create the <b>StationNode</b> data structure, together with a few helper methods to operate on it. It has the following attributes:

- station_id: id of the station that the StationNode corresponds to
- departures: dictionary of { departure_time: [(next_station_id, next_station_arrival, trip_id), ...], ... }
- arrivals: dictionary of { arrival_time: [(previous_station_id, prev_station_departure_time, trip_id), ...], ... }
- walkable_stations (stations within 500m distance): dictionary of {next_stop_id: walk_duration_seconds, ...}
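As a rough illustration of the shapes listed above, the three dictionaries could look as follows. The ids, times and the walking duration are copied from the validation outputs further down in this notebook; the standalone variable names are only for illustration and are not part of the notebook's code.

```python
# departures of station 8503000, keyed by departure time
departures = {
    '12:07:00': [('8503020', '12:09:00', '20.TA.26-9-A-j19-1.2.H')],
}

# arrivals of station 8503020, keyed by arrival time; this is the same
# connection as above, seen from the receiving station
arrivals = {
    '12:09:00': [('8503000', '12:07:00', '20.TA.26-9-A-j19-1.2.H')],
}

# walkable stations of station 8503310: stop id -> walking duration in seconds
walkable_stations = {'8590620': 69.52556591008263}

# list every connection that leaves 8503000 at 12:07:00
for next_station_id, next_station_arrival, trip_id in departures['12:07:00']:
    print(next_station_id, next_station_arrival, trip_id)
```

Each time key maps to a list because several trips can leave (or reach) a station at the same time.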


To generate the network we start with the Zurich main station: we add it to <b>G</b> and, using <b>current_next</b>, add every station directly accessible from Zurich, with its times and trip_ids, to the departures of the Zurich main station node. We also create a node for each newly discovered station (adding it to <b>G</b>) and record the arrival from Zurich in it. Next we compute the stations within 500m of Zurich, add them to the walkable_stations dictionary, and create new nodes for them as well (the data structure of each attribute is described above). Once Zurich is fully processed we continue with the newly discovered nodes, processing every node exactly once with the same procedure as for the Zurich main station.
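The procedure described above amounts to a breadth-first traversal in which both trip connections and walking links count as edges. The sketch below shows only the discovery order on a made-up toy network; `discover_stations`, `toy_connections` and `toy_walk_links` are hypothetical names, and the times and trip ids that the real construction stores are left out.

```python
from collections import deque

def discover_stations(start, connections, walk_links):
    """Return the stations reachable from `start`, in order of discovery."""
    queue = deque([start])
    visited = {start}          # every station is queued, and so processed, once
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        # neighbours reached by a trip, then neighbours reached on foot
        neighbours = connections.get(current, []) + walk_links.get(current, [])
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

# made-up network: E has a connection into A but cannot be reached from A
toy_connections = {'A': ['B', 'C'], 'B': ['C'], 'C': [], 'E': ['A']}
toy_walk_links = {'A': ['D'], 'D': ['A']}

print(discover_stations('A', toy_connections, toy_walk_links))
```

Stations that cannot be reached from the start station, such as `E` here, never enter the result, which is why the traversal starts from Zurich HB.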

Finally, <b>G</b> looks like this:

<b>G</b> = { 

      stop_id1: StationNode1,
      stop_id2: StationNode2,
      ...
      
    }

<b>G</b> covers all stations accessible from Zurich HB within a 15km radius. The final <b>G</b> is saved as '/user/lortkipa/graph_untested_classTuples.pkl', ready for the route planning algorithm.


At the end of the notebook is a validation section. To test that the graph is generated correctly, we take a ground-truth route and make sure our SBB connections graph contains all the connections needed to reproduce it. The graph passed all the tests; the details are documented in the sbb_network_generation.ipynb notebook.

Table of contents for this notebook:

0. Imports / Helper Functions
1. Global Parameters
2. Public Transport Model
3. Validation

In [1]:
%%configure
{"conf": {
    "spark.app.name": "my-awesome-group_final",
    "spark.driver.memory": "5g"
}}


In [2]:
# Initialization
spark

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
8437,application_1589299642358_2969,pyspark,idle,Link,Link,✔


SparkSession available as 'spark'.


<pyspark.sql.session.SparkSession object at 0x7f95dcc19690>

# 0. Imports / Helper Functions

In [3]:
import pandas as pd 
from math import sin, cos, sqrt, atan2, radians
from geopy import distance as dist
from pyspark.sql.functions import col
from heapq import *

In [4]:
# cell to communicate with hdfs
import subprocess, pickle

def run_cmd(args_list):
    """Run linux commands."""
    print('Running system command: {0}'.format(' '.join(args_list)))    
    proc = subprocess.Popen(args_list,                            
                            stdout=subprocess.PIPE,                            
                            stderr=subprocess.PIPE)    
    s_output, s_err = proc.communicate()    
    s_return =  proc.returncode
    return s_return, s_output, s_err


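def save_hdfs(localPath, hdfsPath):
    """Copy a local file to HDFS, overwriting an existing file."""
    (ret, out, err) = run_cmd(['hdfs', 'dfs', '-put', '-f', localPath, hdfsPath])
    # judge success by the exit code: stderr can contain warnings even when the command succeeds
    if ret != 0:
        print(err)
    else:
        print('Success')


def read_hdfs(hdfsPath):
    """Read a pickled object from HDFS and unpickle it."""
    (ret, out, err) = run_cmd(['hdfs', 'dfs', '-cat', hdfsPath])
    if ret != 0:
        print(err)
        # do not try to unpickle the output of a failed command
        raise IOError('could not read {0} from HDFS'.format(hdfsPath))
    print('Success')
    return pickle.loads(out)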

# 1. Global Parameters

In [6]:
## Read Needed Files
current_next = spark.read.parquet('hdfs:/user/lortkipa/current_next_6_22_Pcor.parquet')
print(current_next.count())
print(current_next.take(2))
current_next.printSchema()

331639
[Row(trip_id=u'1356.TA.26-31-j19-1.12.R', arrival_time=u'16:21:00', departure_time=u'16:21:00', stop_id=u'8591186', stop_sequence=1, pickup_type=u'0', drop_off_type=u'0', trip_id_2=u'1356.TA.26-31-j19-1.12.R', arrival_time_2=u'16:22:00', departure_time_2=u'16:22:00', stop_id_2=u'8591334', stop_sequence_2=2, stop_sequence_adjusted=1), Row(trip_id=u'1356.TA.26-31-j19-1.12.R', arrival_time=u'16:22:00', departure_time=u'16:22:00', stop_id=u'8591334', stop_sequence=2, pickup_type=u'0', drop_off_type=u'0', trip_id_2=u'1356.TA.26-31-j19-1.12.R', arrival_time_2=u'16:23:00', departure_time_2=u'16:23:00', stop_id_2=u'8591253', stop_sequence_2=3, stop_sequence_adjusted=2)]
root
 |-- trip_id: string (nullable = true)
 |-- arrival_time: string (nullable = true)
 |-- departure_time: string (nullable = true)
 |-- stop_id: string (nullable = true)
 |-- stop_sequence: integer (nullable = true)
 |-- pickup_type: string (nullable = true)
 |-- drop_off_type: string (nullable = true)
 |-- trip_id_2:

In [7]:
# get all the stop_ids
all_stops = set(current_next.select('stop_id_2').distinct().rdd.flatMap(lambda x: x).collect())
all_stops.update(current_next.select('stop_id').distinct().rdd.flatMap(lambda x: x).collect())
len(all_stops)

1405

In [8]:
# get stops file
filtered_stops = read_hdfs('/user/lortkipa/filtered_stops_Premoved.pkl')
print(len(filtered_stops))
filtered_stops.head(6)

Running system command: hdfs dfs -cat /user/lortkipa/filtered_stops_Premoved.pkl
Success
1583
    stop_id               stop_name  ... location_type parent_station
10  8502508  Spreitenbach, Raiacker  ...          None           None
14  8503078                Waldburg  ...          None           None
15  8503088           Zürich HB SZU  ...          None       8503088P
17  8503376    Ottikon b. Kemptthal  ...          None           None
29  8506895          Lufingen, Dorf  ...          None           None
51  8573729    Bonstetten, Isenbach  ...          None           None

[6 rows x 6 columns]

# 2. Public Transport Model

In [9]:
class StationNode:
    """
    Class for station node, that will keep a list of arrival/departures given by trips as well
    as walking distances
    """
    
    def __init__(self, station_id):
    
        self.station_id = station_id
        self.departures = dict()
        self.arrivals = dict()  
        self.walkable_stations = dict()
 
    def add_walkable_station(self, stop_id, duration):
        """
        Add walkable station and duration to get there
        """
        if stop_id not in self.walkable_stations:
            self.walkable_stations[stop_id] = duration

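    def add_arrival(self, time, arrival):
        """
        Register an arrival at this station at the given arrival time.
        Arrivals are in the form of (previous_station_id, prev_station_departure_time, trip_id)
        """
        if time not in self.arrivals:
            self.arrivals[time] = []
            
        self.arrivals[time].append(arrival)
        
    def add_departure(self, time, departure):
        """
        Register a departure from this station at the given departure time.
        Departures are in the form of (next_station_id, next_station_arrival, trip_id)
        """
        if time not in self.departures:
            self.departures[time] = []
            
        self.departures[time].append(departure)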
        
    def to_tuple(self):
        """
        Convert class object to tuple
        """
        return (self.station_id, self.departures, self.arrivals, self.walkable_stations)
    
    def from_tuple(self, class_tuple):
        """
        Read class fields from tuple
        """
        self.station_id = class_tuple[0]
        self.departures = class_tuple[1]
        self.arrivals = class_tuple[2]
        self.walkable_stations = class_tuple[3]
        
        return self
        
def dictnodes_tolist(station_nodes):
    """
    Convert dictionary of StationNodes to list of node tuples
    """
    tolist = []
    for stop_id in station_nodes:
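        try:
            tolist.append(station_nodes[stop_id].to_tuple())
        except AttributeError:
            # report which entry is not a StationNode (e.g. a graph built with asClass=False)
            raise ValueError('entry {0} is not a StationNode'.format(stop_id))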
        
    return tolist
        
def list_todictnodes(fromlist):
    """
    Convert list of node tuples into dictionary of StationNodes
    """
    todict = dict()
    for node_tuple in fromlist:
        todict[node_tuple[0]] = StationNode(node_tuple[0]).from_tuple(node_tuple)
        
    return todict
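
A plausible reason for converting the nodes to plain tuples before saving (this is an assumption, the notebook does not state it) is that pickle stores an instance by reference to its class's module and name, so whoever loads the file would need the same class definition available. Tuples and dictionaries of built-in types avoid that dependency. The sketch below shows the round trip with a hypothetical, reduced `Node` class standing in for StationNode:

```python
import pickle

class Node:
    """Hypothetical stand-in for StationNode, reduced to two fields."""
    def __init__(self, station_id):
        self.station_id = station_id
        self.departures = {}

    def to_tuple(self):
        return (self.station_id, self.departures)

nodes = {'a': Node('a'), 'b': Node('b')}
nodes['a'].departures['12:07:00'] = [('b', '12:09:00', 'trip-1')]

# serialise built-in types only: a list of (station_id, departures) tuples
payload = pickle.dumps([node.to_tuple() for node in nodes.values()],
                       protocol=pickle.HIGHEST_PROTOCOL)

# rebuild the dictionary of nodes from the tuples
restored = {}
for station_id, departures in pickle.loads(payload):
    node = Node(station_id)
    node.departures = departures
    restored[station_id] = node

print(restored['a'].departures)
```

The payload holds no reference to `Node`, so it could be loaded by code that defines its own node class.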

In [12]:
test = [dictnodes_tolist({'a':StationNode('a')}), dictnodes_tolist({'b':StationNode('b')})]

In [13]:
current_next.printSchema()

root
 |-- trip_id: string (nullable = true)
 |-- arrival_time: string (nullable = true)
 |-- departure_time: string (nullable = true)
 |-- stop_id: string (nullable = true)
 |-- stop_sequence: integer (nullable = true)
 |-- pickup_type: string (nullable = true)
 |-- drop_off_type: string (nullable = true)
 |-- trip_id_2: string (nullable = true)
 |-- arrival_time_2: string (nullable = true)
 |-- departure_time_2: string (nullable = true)
 |-- stop_id_2: string (nullable = true)
 |-- stop_sequence_2: integer (nullable = true)
 |-- stop_sequence_adjusted: integer (nullable = true)

In [14]:
# test the class
s = StationNode('123')
s.add_arrival(1, 2)
s.add_departure(2, 3)
s.arrivals

{1: [2]}

In [None]:
def compute_walkable_stations(curr_stop_id, stops=filtered_stops, max_walk_dist=500):
    """
    Given a stop id, return the rows of `stops` that lie at most `max_walk_dist` metres
    away from it (the stop itself included, at distance 0), with an added 'distance'
    column in kilometres
    """
    
    # coordinates (lat, lon) of the current stop
    current_coord = stops[stops['stop_id'] == curr_stop_id][[
        'stop_lat', 'stop_lon']].values[0]
    current_point = (float(current_coord[0]), float(current_coord[1]))
    
    # work on a copy so that the new column is not written into a slice of `stops`
    df_stops = stops[['stop_id', 'stop_lat', 'stop_lon']].copy()
    df_stops['distance'] = df_stops.apply(lambda x: dist.distance(
                   current_point,
                   (float(x['stop_lat']), 
                   float(x['stop_lon']))).km, axis=1)

    # max_walk_dist is given in metres, 'distance' is in kilometres
    return df_stops[df_stops['distance'] <= max_walk_dist / 1000.0]

compute_walkable_stations(u'8503000')
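
For reference, the distance between two stops can also be approximated without geopy, using the haversine formula and the math functions imported at the top of this notebook. This is only a sketch under a spherical-Earth assumption (radius 6371 km), so its results can differ slightly from the ellipsoidal distance returned by geopy. `haversine_km` and `walk_duration_seconds` are hypothetical helper names; the conversion factor mirrors the `* 60 * 20` used in construct_network.

```python
from math import radians, sin, cos, sqrt, atan2

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * atan2(sqrt(a), sqrt(1 - a))

def walk_duration_seconds(distance_km):
    """20 minutes per kilometre (3 km/h), expressed in seconds."""
    return distance_km * 20 * 60

# the 500 m limit therefore corresponds to a walk of at most 10 minutes
print(walk_duration_seconds(0.5))  # 600.0
```

With this conversion every value in a walkable_stations dictionary stays at or below 600 seconds.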

In [16]:
def construct_network(start_station, current_next, max_walk_dist=500, asClass=True):
    """
    Construct the transport network reachable from a given start station, using the current_next dataframe
    and the given walking distance restriction (max_walk_dist, in metres)
    """
    
    # list of stations to be processed
    active_stations = [start_station]
    # set of already processed/visited stations
    visited_stations = set()
    visited_stations.add(start_station)
    
    # final network presented as {stop_id: corresponding_node, ...}
    station_nodes = dict()
    ind=0
    
    # breadth-first traversal: active_stations keeps growing while we iterate over it
    while ind < len(active_stations):

        if (ind % 1000 == 0):
            print('processing %d-th station'%(ind+1))
        
        # read current stations and add it to the processed ones
        current_station_id = active_stations[ind]
        ind +=1
        

        # create a node corresponding to this station_id if it does not already exist
        if current_station_id not in station_nodes:
            if asClass:
                station_nodes[current_station_id] = StationNode(current_station_id)
            else:
                station_nodes[current_station_id] = (current_station_id, dict(), dict(), dict())

        # select next stations of a current station
        next_stations = current_next.where(col('stop_id') == current_station_id)   
        
        # stations without outgoing connections have nothing to add;
        # note that their walkable stations are skipped as well
        next_stations = next_stations.toPandas()
        if next_stations.empty:
            continue
       
        #print(next_stations.count())
        #with pd.option_context('display.max_columns', 10):
            #print(next_stations.head())            

        current_departure_times = next_stations["departure_time"].values
        next_station_ids = next_stations["stop_id_2"].values
        next_station_arrivals = next_stations["arrival_time_2"].values
        next_station_departures = next_stations["departure_time_2"].values
        trip_id = next_stations["trip_id"].values
        
        
        # go over the next stations defined by routes
        for i in range(len(current_departure_times)):

            # add new departure
            if asClass:
                station_nodes[current_station_id].add_departure(
                    current_departure_times[i], (next_station_ids[i], next_station_arrivals[i], trip_id[i]))
            else:
                # append to the list for this time, as add_departure does, instead of overwriting it
                station_nodes[current_station_id][1].setdefault(current_departure_times[i], []).append(
                    (next_station_ids[i], next_station_arrivals[i], trip_id[i]))

            # add new arrival for the next station
            if next_station_ids[i] not in station_nodes:
                if asClass:
                    station_nodes[next_station_ids[i]] = StationNode(next_station_ids[i])
                else:
                    station_nodes[next_station_ids[i]] = (next_station_ids[i], dict(), dict(), dict())
              
            if asClass:
                station_nodes[next_station_ids[i]].add_arrival(
                    next_station_arrivals[i], (current_station_id, current_departure_times[i], trip_id[i]))  
            else:
                # append to the list for this time, as add_arrival does, instead of overwriting it
                station_nodes[next_station_ids[i]][2].setdefault(next_station_arrivals[i], []).append(
                    (current_station_id, current_departure_times[i], trip_id[i]))
            
            # add next station to the list of future nodes if not already done
            if next_station_ids[i] not in visited_stations:
                active_stations.append(next_station_ids[i])
                visited_stations.add(next_station_ids[i])
    
        
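        # pass the walking restriction on instead of relying on the helper's default
        walkable_neighbours = compute_walkable_stations(current_station_id, max_walk_dist=max_walk_dist)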
        
        for item, row in walkable_neighbours.iterrows():
            
            if row['stop_id'] not in visited_stations:
                active_stations.append(row['stop_id'])
                visited_stations.add(row['stop_id'])
                
            if row['stop_id'] not in station_nodes:
                if asClass:
                    station_nodes[row['stop_id']] = StationNode(row['stop_id'])
                else:
                    station_nodes[row['stop_id']] = (row['stop_id'], dict(), dict(), dict())
            
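            # add the walking link: 'distance' is in km and 20 * 60 seconds are counted
            # per km (a walking speed of 3 km/h), which gives the duration in seconds
            if asClass:
                station_nodes[current_station_id].add_walkable_station(row['stop_id'], row['distance'] * 60 * 20)
            else:
                station_nodes[current_station_id][3][row['stop_id']] = row['distance'] * 60 * 20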
            
            # add new arrival
            #station_nodes[row['stop_id']].add_walkable_station(current_station_id, row['distance'] * 60 * 20)
            
     
    #print(ind)
    print(len(active_stations))
    print(len(visited_stations))
    return station_nodes, visited_stations

# start constructing the network from Zurich HB so that only stations accessible from there are contained
graph, visited_stations = construct_network(u'8503000', current_next)
print('After function')
print(len(set(visited_stations)))
print(len(graph))

processing 1-th station
processing 1001-th station
1450
1450
After function
1450
1450
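
One consistency property follows from the construction above: every departure stored at a station should be mirrored by an arrival at the next station, under the same trip. The sketch below checks this on the tuple form produced by `to_tuple`, that is (station_id, departures, arrivals, walkable_stations). `unmatched_departures` and the toy data are hypothetical; the ids and times are copied from the validation outputs in this notebook.

```python
def unmatched_departures(node_tuples):
    """Return the departures that have no mirrored arrival at the next station.

    node_tuples maps stop_id -> (station_id, departures, arrivals, walkable_stations).
    """
    missing = []
    for stop_id, (_, departures, _, _) in node_tuples.items():
        for dep_time, connections in departures.items():
            for next_id, arr_time, trip_id in connections:
                mirrored = (stop_id, dep_time, trip_id)
                # arrivals dictionary of the next station (empty if the node is absent)
                next_arrivals = node_tuples.get(next_id, (None, {}, {}, {}))[2]
                if mirrored not in next_arrivals.get(arr_time, []):
                    missing.append((stop_id, dep_time, next_id, arr_time, trip_id))
    return missing

TRIP = '20.TA.26-9-A-j19-1.2.H'
toy = {
    '8503000': ('8503000', {'12:07:00': [('8503020', '12:09:00', TRIP)]}, {}, {}),
    '8503020': ('8503020', {}, {'12:09:00': [('8503000', '12:07:00', TRIP)]}, {}),
}
print(unmatched_departures(toy))  # [] because the arrival mirrors the departure

# dropping the arrival makes the departure show up as unmatched
broken = dict(toy)
broken['8503020'] = ('8503020', {}, {}, {})
print(unmatched_departures(broken))
```

On the real graph the same check could be run on `dict((t[0], t) for t in dictnodes_tolist(graph))`.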

In [17]:
#created_network_save = graph
with open('graph_untested_classTuples.pkl', 'wb') as handle:
    pickle.dump(dictnodes_tolist(graph), handle, protocol=pickle.HIGHEST_PROTOCOL)
# send to hdfs
save_hdfs('graph_untested_classTuples.pkl','/user/{}/'.format('lortkipa'))

Running system command: hdfs dfs -put -f graph_untested_classTuples.pkl /user/lortkipa/
Success

In [None]:
print(len(graph))
current_next.where(col('stop_id') == '8503000').select('stop_id_2').distinct().count()

In [20]:
graph[u'8580301'].arrivals

{u'20:27:00': [(u'8588553', u'20:26:00', u'92.TA.26-765-j19-1.1.R')], u'12:29:00': [(u'8573205', u'12:28:00', u'112.TA.26-734-j19-1.4.R')], u'08:47:00': [(u'8573205', u'08:46:00', u'94.TA.26-737-j19-1.2.R')], u'15:57:00': [(u'8588553', u'15:56:00', u'67.TA.79-736-j19-1.5.H')], u'14:28:00': [(u'8588553', u'14:27:00', u'4.TA.26-765-j19-1.1.R')], u'18:11:00': [(u'8588553', u'18:10:00', u'171.TA.26-733-j19-1.3.H')], u'11:42:00': [(u'8588553', u'11:41:00', u'83.TA.79-736-j19-1.5.H')], u'14:32:00': [(u'8573205', u'14:31:00', u'61.TA.26-737-j19-1.2.R')], u'15:46:00': [(u'8573205', u'15:46:00', u'229.TA.26-765-j19-1.2.H'), (u'8573205', u'15:45:00', u'32.TA.79-736-j19-1.2.R')], u'06:26:00': [(u'8588553', u'06:25:00', u'106.TA.26-731-j19-1.1.H'), (u'8588553', u'06:25:00', u'132.TA.26-733-j19-1.3.H')], u'14:46:00': [(u'8573205', u'14:46:00', u'246.TA.26-765-j19-1.2.H'), (u'8573205', u'14:45:00', u'34.TA.79-736-j19-1.2.R')], u'18:48:00': [(u'8573205', u'18:48:00', u'111.TA.26-733-j19-1.2.R')], u'1

In [10]:
G = read_hdfs('/user/lortkipa/graph_untested_classTuples.pkl')
G = list_todictnodes(G)
#for stop_id in G:
    #G[stop_id] = StationNode(stop_id).from_tuple(G[stop_id])

Running system command: hdfs dfs -cat /user/lortkipa/graph_untested_classTuples.pkl
Success

# 3. Validation

To test that the graph is generated correctly, we take a ground-truth route and make sure our SBB connections graph contains all the connections needed to reproduce it:

start =  Zürich HB (8503000) to Zürich

end = Auzelg (8591049)

arrival = '12:30:00'



<b>Route:</b>

1) 20.TA.26-9-A-j19-1.2.H: 8503000:0:41/42 at 12:07:00 ~ 8503310:0:3 at 12:17:00

2) Walking: 8503310:0:3 ~ 8590620

3) 168.TA.26-12-A-j19-1.2.H: 8590620 at 12:23:00 ~ 8591049 at 12:29:00
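
The check cells in this section all follow the same pattern: look up the departures of a stop at a given time, pick the connection that belongs to the trip, and compare its next stop. A compact way to express that pattern is sketched below. `find_connection` and `toy_departures` are hypothetical; the toy data holds only departures, with values copied from the outputs of the first leg.

```python
def find_connection(departures_by_stop, stop_id, departure_time, trip_id):
    """Return the (next_stop_id, arrival_time, trip_id) tuple of `trip_id`
    leaving `stop_id` at `departure_time`, or None if it is not in the graph."""
    for connection in departures_by_stop.get(stop_id, {}).get(departure_time, []):
        if connection[2] == trip_id:
            return connection
    return None

TRIP = '20.TA.26-9-A-j19-1.2.H'
toy_departures = {
    '8503000': {'12:07:00': [('8503020', '12:09:00', TRIP)]},
    '8503020': {'12:09:00': [('8503006', '12:14:00', TRIP)]},
    '8503006': {'12:15:00': [('8503310', '12:17:00', TRIP)]},
}

# (stop, departure time, expected next stop) for each hop of the first leg
legs = [
    ('8503000', '12:07:00', '8503020'),
    ('8503020', '12:09:00', '8503006'),
    ('8503006', '12:15:00', '8503310'),
]
for stop_id, departure_time, expected_next in legs:
    connection = find_connection(toy_departures, stop_id, departure_time, TRIP)
    assert connection is not None and connection[0] == expected_next
    print(stop_id, departure_time, connection)
```

Returning None for a missing connection keeps a failed lookup distinct from a connection that points at the wrong stop.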

In [52]:
######################################################################################################################
# 1) let's make sure that graph has all the info for 20.TA.26-9-A-j19-1.2.H: 
#                        8503000:0:41/42 at 12:07:00 ~ 8503310:0:3 at 12:17:00
#######################################################################################################################


# first let's print this trip from the modified dataframe that was used for Graph creation
print(current_next[current_next['trip_id'] == '20.TA.26-9-A-j19-1.2.H'][ \
             current_next['departure_time'] >= '12:07:00'][current_next['departure_time'] <= '12:18:00'].select(
    'trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence').collect())


print('\n')
print("As we can see 8503000 is stop 7 and 8503310 is stop 10 on this route")


print('We will now follow the graph connection by connection and make sure all is there with correct times')

print('Stop7 -> Stop8')
# stop 7 going to stop 8, '8503000' -> '8503020'
check1 = G['8503000'].departures['12:07:00']
for connection in check1:
    if connection[2] == '20.TA.26-9-A-j19-1.2.H':
        check1 = connection
print('8503000','12:07:00', check1)
assert check1[0] == '8503020'
assert check1[2] == '20.TA.26-9-A-j19-1.2.H'


print('Stop8 -> Stop9')
# stop 8 going to stop 9, '8503020' -> '8503006'
check1 = G['8503020'].departures['12:09:00']
for connection in check1:
    if connection[2] == '20.TA.26-9-A-j19-1.2.H':
        check1 = connection
print('8503020','12:09:00', check1)
assert check1[0] == '8503006'
assert check1[2] == '20.TA.26-9-A-j19-1.2.H'


print('Stop9 -> Stop10')
# stop 9 going to stop 10, '8503006' -> '8503310'
check1 = G['8503006'].departures['12:15:00']
for connection in check1:
    if connection[2] == '20.TA.26-9-A-j19-1.2.H':
        check1 = connection
print('8503006','12:15:00', check1)
assert check1[0] == '8503310'
assert check1[2] == '20.TA.26-9-A-j19-1.2.H'

[Row(trip_id=u'20.TA.26-9-A-j19-1.2.H', arrival_time=u'12:02:00', departure_time=u'12:07:00', stop_id=u'8503000', stop_sequence=7), Row(trip_id=u'20.TA.26-9-A-j19-1.2.H', arrival_time=u'12:09:00', departure_time=u'12:09:00', stop_id=u'8503020', stop_sequence=8), Row(trip_id=u'20.TA.26-9-A-j19-1.2.H', arrival_time=u'12:14:00', departure_time=u'12:15:00', stop_id=u'8503006', stop_sequence=9), Row(trip_id=u'20.TA.26-9-A-j19-1.2.H', arrival_time=u'12:17:00', departure_time=u'12:18:00', stop_id=u'8503310', stop_sequence=10)]


As we can see 8503000 is stop 7 and 8503310 is stop 10 on this route
We will now follow the graph connection by connection and make sure all is there with correct times
Stop7 -> Stop8
('8503000', '12:07:00', (u'8503020', u'12:09:00', u'20.TA.26-9-A-j19-1.2.H'))
Stop8 -> Stop9
('8503020', '12:09:00', (u'8503006', u'12:14:00', u'20.TA.26-9-A-j19-1.2.H'))
Stop9 -> Stop10
('8503006', '12:15:00', (u'8503310', u'12:17:00', u'20.TA.26-9-A-j19-1.2.H'))

In [44]:
######################################################################################################################
# 2) let's make sure that graph has Walking: 8503310:0:3 ~ 8590620 connection
#######################################################################################################################

print(G['8503310'].walkable_stations)
assert '8590620' in G['8503310'].walkable_stations

{u'8503310': 0.0, u'8590762': 562.7966106551638, u'8590742': 312.4211443238438, u'8590629': 213.79726829509306, u'8503340': 293.78730858511796, u'8590621': 403.730882131385, u'8590620': 69.52556591008263, u'8590622': 531.1270316987234}

In [41]:
######################################################################################################################
# 3) let's make sure that graph has all the info for 168.TA.26-12-A-j19-1.2.H: 8590620 at 12:23:00 ~ 8591049 at 12:29:00
#######################################################################################################################

# first let's print this trip from the modified dataframe that was used for Graph creation
print(current_next[current_next['trip_id'] == '168.TA.26-12-A-j19-1.2.H'][ \
             current_next['departure_time'] >= '12:23:00'][current_next['departure_time'] <= '12:29:00'].select(
    'trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence').collect())

print('\n')
print("As we can see 8590620 is stop 6 and 8591049 is stop 10 on this route")


print('We will now follow the graph connection by connection and make sure all is there with correct times')


print('Stop6 -> Stop7')
# stop 6 going to stop 7, '8590620' -> '8590626'
check1 = G['8590620'].departures['12:23:00']
for connection in check1:
    if connection[2] == '168.TA.26-12-A-j19-1.2.H':
        check1 = connection
print('8590620', '12:23:00', check1)
assert check1[0] == '8590626'
assert check1[2] == '168.TA.26-12-A-j19-1.2.H'


print('Stop7 -> Stop8')
# stop 7 going to stop 8, '8590626' -> '8591830'
check1 = G['8590626'].departures['12:24:00']
for connection in check1:
    if connection[2] == '168.TA.26-12-A-j19-1.2.H':
        check1 = connection
print('8590626','12:24:00', check1)
assert check1[0] == '8591830'
assert check1[2] == '168.TA.26-12-A-j19-1.2.H'

print('Stop8 -> Stop9')
# stop 8 going to stop 9, '8591830' -> '8591128'
check1 = G['8591830'].departures['12:26:00']
for connection in check1:
    if connection[2] == '168.TA.26-12-A-j19-1.2.H':
        check1 = connection
print('8591830', '12:26:00', check1)
assert check1[0] == '8591128'
assert check1[2] == '168.TA.26-12-A-j19-1.2.H'

print('Stop9 -> Stop10')
# stop 9 going to stop 10, '8591128' -> '8591049'
check1 = G['8591128'].departures['12:27:00']
for connection in check1:
    if connection[2] == '168.TA.26-12-A-j19-1.2.H':
        check1 = connection
print('8591128', '12:27:00', check1)
assert check1[0] == '8591049'
assert check1[2] == '168.TA.26-12-A-j19-1.2.H'

[Row(trip_id=u'168.TA.26-12-A-j19-1.2.H', arrival_time=u'12:23:00', departure_time=u'12:23:00', stop_id=u'8590620', stop_sequence=6), Row(trip_id=u'168.TA.26-12-A-j19-1.2.H', arrival_time=u'12:24:00', departure_time=u'12:24:00', stop_id=u'8590626', stop_sequence=7), Row(trip_id=u'168.TA.26-12-A-j19-1.2.H', arrival_time=u'12:26:00', departure_time=u'12:26:00', stop_id=u'8591830', stop_sequence=8), Row(trip_id=u'168.TA.26-12-A-j19-1.2.H', arrival_time=u'12:27:00', departure_time=u'12:27:00', stop_id=u'8591128', stop_sequence=9), Row(trip_id=u'168.TA.26-12-A-j19-1.2.H', arrival_time=u'12:29:00', departure_time=u'12:29:00', stop_id=u'8591049', stop_sequence=10)]


As we can see 8590620 is stop 6 and 8591049 is stop 10 on this route
We will now follow the graph connection by connection and make sure all is there with correct times
Stop6 -> Stop7
('8590620', '12:23:00', (u'8590626', u'12:24:00', u'168.TA.26-12-A-j19-1.2.H'))
Stop7 -> Stop8
('8590626', '12:24:00', (u'8591830', u'12:26:00', u'

## Testing nearly identical routes

We print the two trips below, which both start at stop 8580449:

u'1755.TA.26-781-j19-1.3.R'

u'1891.TA.26-781-j19-1.3.R'

In [59]:
current_next[current_next['trip_id'] == '1755.TA.26-781-j19-1.3.R'].select(
    'trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence').collect()

[Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:17:00', departure_time=u'12:17:00', stop_id=u'8580449', stop_sequence=1), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:18:00', departure_time=u'12:18:00', stop_id=u'8591063', stop_sequence=2), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:19:00', departure_time=u'12:19:00', stop_id=u'8591256', stop_sequence=3), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:20:00', departure_time=u'12:20:00', stop_id=u'8591172', stop_sequence=4), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:21:00', departure_time=u'12:21:00', stop_id=u'8591318', stop_sequence=5), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:22:00', departure_time=u'12:22:00', stop_id=u'8591225', stop_sequence=6), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:23:00', departure_time=u'12:23:00', stop_id=u'8591128', stop_sequence=7), Row(trip_id=u'1755.TA.26-781-j19-1.3.R', arrival_time=u'12:24:00', d

In [62]:
current_next[current_next['trip_id'] == '1891.TA.26-781-j19-1.3.R'].select(
    'trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence').collect()

[Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:17:00', departure_time=u'12:17:00', stop_id=u'8580449', stop_sequence=1), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:18:00', departure_time=u'12:18:00', stop_id=u'8591063', stop_sequence=2), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:19:00', departure_time=u'12:19:00', stop_id=u'8591256', stop_sequence=3), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:20:00', departure_time=u'12:20:00', stop_id=u'8591172', stop_sequence=4), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:21:00', departure_time=u'12:21:00', stop_id=u'8591318', stop_sequence=5), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:22:00', departure_time=u'12:22:00', stop_id=u'8591225', stop_sequence=6), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:23:00', departure_time=u'12:23:00', stop_id=u'8591128', stop_sequence=7), Row(trip_id=u'1891.TA.26-781-j19-1.3.R', arrival_time=u'12:24:00', d

Some nearly identical (sometimes fully identical) routes exist in the data. We inspected them to make sure that the routing algorithm was not making mistakes and that such identical routes really exist.
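
Fully identical trips could also be found automatically rather than by inspection. The sketch below groups trips by the exact sequence of stops and times they serve. `group_identical_trips` is a hypothetical helper working on plain tuples; the first two stops of each real trip are copied from the outputs above, and the third trip is made up to show a non-match.

```python
from collections import defaultdict

def group_identical_trips(rows):
    """Group trip ids that serve exactly the same stops at the same times.

    rows: iterable of (trip_id, stop_sequence, stop_id, arrival_time, departure_time).
    Returns a list of sorted trip-id lists, one per group with more than one trip.
    """
    per_trip = defaultdict(list)
    for trip_id, seq, stop_id, arrival, departure in rows:
        per_trip[trip_id].append((seq, stop_id, arrival, departure))

    groups = defaultdict(list)
    for trip_id, stops in per_trip.items():
        # the sorted stop list is the trip's signature
        groups[tuple(sorted(stops))].append(trip_id)

    return [sorted(ids) for ids in groups.values() if len(ids) > 1]

rows = [
    ('1755.TA.26-781-j19-1.3.R', 1, '8580449', '12:17:00', '12:17:00'),
    ('1755.TA.26-781-j19-1.3.R', 2, '8591063', '12:18:00', '12:18:00'),
    ('1891.TA.26-781-j19-1.3.R', 1, '8580449', '12:17:00', '12:17:00'),
    ('1891.TA.26-781-j19-1.3.R', 2, '8591063', '12:18:00', '12:18:00'),
    ('made-up-trip', 1, '8580449', '12:47:00', '12:47:00'),
]
print(group_identical_trips(rows))
```

Nearly identical trips would need a looser signature, for example the stop sequence without the times.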