# 3. Predictive models

In this notebook, we build the prediction dataframes for transfer success probability from SBB historical data. We then use them to build a transfer success prediction function and test it.

The SBB data is loaded, filtered, and adapted to remove missing data, trips outside the considered schedule, and stops outside the Zurich area.

The goal is to extract the mean and standard deviation of delays for a specific trip, station, and time.

Table of contents of this notebook:

1. [Helper functions](#1.-Helper-functions)

Import the libraries and define the helper functions used throughout the notebook.

2. [Data preprocessing](#2.-Data-preprocessing)

The SBB data is loaded, filtered, and adapted to remove missing data, trips outside the considered schedule, and stops outside the Zurich area.

3. [Build prediction dataframe](#3.-Build-prediction-dataframe)

Extract the mean and standard deviation of delays for different sets of parameters (options: StationId, TripId, DayPeriod).

4. [Transfer probability function](#4.-Transfer-probability-function)

Create a function that, given transfer information such as arrival and departure details, computes the probability that the transfer succeeds.

5. [Validation](#5.-Validation)

Test the function on a few values to check that it works properly.


In [1]:
%%configure
{"conf": {
    "spark.app.name": "272648_final"
}}

ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
9244,application_1589299642358_3813,pyspark,idle,Link,Link,
9250,application_1589299642358_3819,pyspark,idle,Link,Link,
9286,application_1589299642358_3857,pyspark,idle,Link,Link,
9292,application_1589299642358_3866,pyspark,busy,Link,Link,
9294,application_1589299642358_3868,pyspark,idle,Link,Link,
9295,application_1589299642358_3869,pyspark,busy,Link,Link,
9300,application_1589299642358_3877,pyspark,idle,Link,Link,
9302,application_1589299642358_3879,pyspark,idle,Link,Link,
9303,application_1589299642358_3880,pyspark,dead,Link,Link,
9308,application_1589299642358_3885,pyspark,idle,Link,Link,


In [2]:
spark

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
9159,application_1589299642358_3724,pyspark,idle,Link,Link,✔



SparkSession available as 'spark'.



<pyspark.sql.session.SparkSession object at 0x7fa05d508450>

## 1. Helper functions

In [9]:
import datetime
import pyspark.sql.functions as F
from pyspark.sql.functions import udf
import pandas as pd
from geopy import distance as dist


In [10]:
import subprocess, pickle

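# HDFS helper methods
def run_cmd(args_list):
    """Run a system command and return (return code, stdout, stderr)."""
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

def save_hdfs(localPath, hdfsPath):
    """Copy a local file to HDFS, overwriting any existing file."""
    (ret, out, err) = run_cmd(['hdfs', 'dfs', '-put', '-f', localPath, hdfsPath])
    # Judge success by the return code: stderr may hold warnings even when the command succeeds
    if ret != 0:
        print(err)
    else:
        print('Success')

def read_hdfs(hdfsPath):
    """Read a pickled object from HDFS and return it."""
    (ret, out, err) = run_cmd(['hdfs', 'dfs', '-cat', hdfsPath])
    if ret != 0:
        print(err)
        raise IOError('Could not read {0} from HDFS'.format(hdfsPath))
    print('Success')
    return pickle.loads(out)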


## 2. Data preprocessing

In [5]:
# Load the Zurich area stops
stops = read_hdfs('/user/lortkipa/filtered_stops_Premoved.pkl')


Running system command: hdfs dfs -cat /user/lortkipa/filtered_stops_Premoved.pkl
Success

In [6]:
# Load sbb dataset
sbb = spark.read.orc('/data/sbb/orc/istdaten')
sbb.printSchema()


root
 |-- betriebstag: string (nullable = true)
 |-- fahrt_bezeichner: string (nullable = true)
 |-- betreiber_id: string (nullable = true)
 |-- betreiber_abk: string (nullable = true)
 |-- betreiber_name: string (nullable = true)
 |-- produkt_id: string (nullable = true)
 |-- linien_id: string (nullable = true)
 |-- linien_text: string (nullable = true)
 |-- umlauf_id: string (nullable = true)
 |-- verkehrsmittel_text: string (nullable = true)
 |-- zusatzfahrt_tf: string (nullable = true)
 |-- faellt_aus_tf: string (nullable = true)
 |-- bpuic: string (nullable = true)
 |-- haltestellen_name: string (nullable = true)
 |-- ankunftszeit: string (nullable = true)
 |-- an_prognose: string (nullable = true)
 |-- an_prognose_status: string (nullable = true)
 |-- abfahrtszeit: string (nullable = true)
 |-- ab_prognose: string (nullable = true)
 |-- ab_prognose_status: string (nullable = true)
 |-- durchfahrt_tf: string (nullable = true)

In [7]:
# Keep only the useful columns, rename them to English, and parse the timestamps
df = sbb.select(
    F.to_timestamp(sbb['betriebstag'], "dd.MM.yyyy").alias('trip_date'),
    sbb['fahrt_bezeichner'].alias('trip_id'),
    sbb['produkt_id'].alias('transport_type'),
   # sbb['linien_id'].alias('train_number'),
    sbb['haltestellen_name'].alias('stop_name'),
    F.to_timestamp(sbb['ankunftszeit'], "dd.MM.yyyy HH:mm").alias('schedule_arrival_time'),
    F.to_timestamp(sbb['an_prognose'], "dd.MM.yyyy HH:mm:ss").alias('true_arrival_time'),
    sbb['an_prognose_status'].alias('status_arrival_time'),
    F.to_timestamp(sbb['abfahrtszeit'], "dd.MM.yyyy HH:mm").alias('schedule_departure_time'),
    F.to_timestamp(sbb['ab_prognose'], "dd.MM.yyyy HH:mm:ss").alias('true_departure_time'),
    sbb['ab_prognose_status'].alias('status_departure_time'),
    sbb['durchfahrt_tf'].alias('is_not_stopping')
)


We divide each day into 4 periods, based on our analysis in homework 1:
1. Morning rush hour: 7h -> 10h
2. Mid day period: 10h -> 16h
3. Evening rush hour: 16h -> 19h
4. Late day: 19h -> 23h

In [8]:
# Return a day category using trip hour
def get_day_period(hour):
    if (hour < 10):
        return 1
    elif (hour < 16):
        return 2
    elif (hour < 19):
        return 3
    else:
        return 4
    
from pyspark.sql.types import IntegerType
get_day_period_udf = F.udf(lambda h: get_day_period(h), IntegerType())
    
    
# Return stop id given stop name
@udf('string')
def get_stop_id(stop_name):
    return stops_name_to_id[stop_name]['stop_id']


In [9]:
# Keep only rows whose arrival and departure times are both flagged GESCHAETZT (estimated)
df = df.filter((df['status_arrival_time'] == 'GESCHAETZT') & (df['status_departure_time'] == 'GESCHAETZT'))

# Consider only data with reasonable hours of the day: 7h -> 23h
df = df.filter((F.hour(df['schedule_arrival_time']) < 23) & (F.hour(df['schedule_departure_time']) > 6))

# Remove trips that take place on Saturday or Sunday (we only consider the May 13-May 17, 2019 schedule, so weekends are ignored).
df = df.withColumn('weekday', F.date_format(df['schedule_arrival_time'], 'u'))
df = df.filter(df['weekday'] < 6) # Keep only Monday to Friday

# Remove rows where the train does not stop
df = df.filter(df['is_not_stopping'] == 'false')

from pyspark.sql.types import LongType
# Add the delay columns
df = df.withColumn('arrival_delay', df['true_arrival_time'].cast(LongType()) - df['schedule_arrival_time'].cast(LongType()))
df = df.withColumn('departure_delay', df['true_departure_time'].cast(LongType()) - df['schedule_departure_time'].cast(LongType()))

# Remove delay outliers (drop every delay of 1h or more, early or late)
df = df.filter((df['arrival_delay'] < 3600) & (df['arrival_delay'] > -3600) & (df['departure_delay'] < 3600) & (df['departure_delay'] > -3600))

# Remove data about stations that are not in the Zurich area stops
df = df.filter(df['stop_name'].isin(stops['stop_name'].values.tolist()))

# Add the day period column to the data
df = df.withColumn('day_period', get_day_period_udf(F.hour(df['schedule_departure_time'])))

# Add the stopId corresponding to the stop name.
stops_name_to_id = stops.reset_index().set_index('stop_name')[['stop_id']].T.to_dict()
df = df.withColumn('stop_id', get_stop_id(df['stop_name']))


In [10]:
# Remove now useless columns
df_cleaned = df.select('trip_date', 'weekday', 'trip_id', 'transport_type', 'stop_id', 'day_period', 'schedule_arrival_time', 'arrival_delay', 'schedule_departure_time', 'departure_delay')


We now have cleaned the historical data to keep only what we need. 
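
To make the delay and filtering rules concrete, here is a minimal sketch in plain Python, independent of the Spark session. The names `delay_seconds` and `keep_row` are ours, not part of the notebook, and the actual times (08:28:12 and 08:38:58) are assumed so that the example yields delays of 132 and 118 seconds for a trip scheduled at 08:26 / 08:37 on Monday 16.04.2018.

```python
from datetime import datetime

SCHEDULE_FMT = "%d.%m.%Y %H:%M"     # format of ankunftszeit / abfahrtszeit
ACTUAL_FMT = "%d.%m.%Y %H:%M:%S"    # format of an_prognose / ab_prognose

def delay_seconds(scheduled, actual):
    """Delay in seconds; negative when the vehicle is early."""
    delta = datetime.strptime(actual, ACTUAL_FMT) - datetime.strptime(scheduled, SCHEDULE_FMT)
    return int(delta.total_seconds())

def keep_row(scheduled_arrival, scheduled_departure, arrival_delay, departure_delay):
    """Apply the same hour, weekday, and outlier rules as the Spark filters."""
    arr = datetime.strptime(scheduled_arrival, SCHEDULE_FMT)
    dep = datetime.strptime(scheduled_departure, SCHEDULE_FMT)
    in_hours = arr.hour < 23 and dep.hour > 6           # 7h -> 23h
    is_weekday = arr.isoweekday() < 6                   # Monday (1) to Friday (5)
    no_outlier = abs(arrival_delay) < 3600 and abs(departure_delay) < 3600
    return in_hours and is_weekday and no_outlier

# Assumed actual times, chosen for illustration only
arrival_delay = delay_seconds("16.04.2018 08:26", "16.04.2018 08:28:12")
departure_delay = delay_seconds("16.04.2018 08:37", "16.04.2018 08:38:58")
kept = keep_row("16.04.2018 08:26", "16.04.2018 08:37", arrival_delay, departure_delay)
```

The Spark version casts timestamps to `LongType` (seconds since the epoch) and subtracts them, which gives the same quantity.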

In [11]:
df_cleaned.show(5)


+-------------------+-------+--------------+--------------+-------+----------+---------------------+-------------+-----------------------+---------------+
|          trip_date|weekday|       trip_id|transport_type|stop_id|day_period|schedule_arrival_time|arrival_delay|schedule_departure_time|departure_delay|
+-------------------+-------+--------------+--------------+-------+----------+---------------------+-------------+-----------------------+---------------+
|2018-04-16 00:00:00|      1|85:11:1252:001|           Zug|8503000|         4|  2018-04-16 21:23:00|          -75|    2018-04-16 21:36:00|             67|
|2018-04-16 00:00:00|      1|85:11:1255:001|           Zug|8503000|         1|  2018-04-16 08:26:00|          132|    2018-04-16 08:37:00|            118|
|2018-04-16 00:00:00|      1|85:11:1509:003|           Zug|8503000|         1|  2018-04-16 07:30:00|           45|    2018-04-16 07:39:00|             29|
|2018-04-16 00:00:00|      1|85:11:1509:003|           Zug|8503016|   

## 3. Build prediction dataframe

In [12]:
# Build day period prediction dataframe (in case full / stop predictions are missing)
period_prediction_delays = df_cleaned.groupby('day_period').agg(F.mean('arrival_delay').alias('mean_arrival_delay'), F.stddev('arrival_delay').alias('std_arrival_delay'),
                                                              F.mean('departure_delay').alias('mean_departure_delay'), F.stddev('departure_delay').alias('std_departure_delay'))
period_prediction_delays = period_prediction_delays.collect()
period_prediction_df = pd.DataFrame(period_prediction_delays, columns=['day_period', 'mean_arrival_delay', 'std_arrival_delay', 'mean_departure_delay', 'std_departure_delay'])


In [None]:
# Build a stop prediction (in case full prediction missing)
stop_prediction_delays = df_cleaned.groupby('day_period', 'stop_id').agg(F.mean('arrival_delay').alias('mean_arrival_delay'), F.stddev('arrival_delay').alias('std_arrival_delay'),
                                                              F.mean('departure_delay').alias('mean_departure_delay'), F.stddev('departure_delay').alias('std_departure_delay'))
stop_prediction_delays = stop_prediction_delays.collect()
stop_prediction_df = pd.DataFrame(stop_prediction_delays, columns=['day_period', 'stop_id', 'mean_arrival_delay', 'std_arrival_delay', 'mean_departure_delay', 'std_departure_delay'])


In [None]:
# Build a full prediction (using trip_id, stop_id, and day_period)
full_prediction_delays = df_cleaned.groupby('trip_id' ,'stop_id', 'day_period').agg(F.mean('arrival_delay').alias('mean_arrival_delay'), F.stddev('arrival_delay').alias('std_arrival_delay'),
                                                              F.mean('departure_delay').alias('mean_departure_delay'), F.stddev('departure_delay').alias('std_departure_delay'))
full_prediction_delays = full_prediction_delays.collect()
full_prediction_df = pd.DataFrame(full_prediction_delays, columns = ['trip_id', 'stop_id','day_period', 'mean_arrival_delay', 'std_arrival_delay', 'mean_departure_delay', 'std_departure_delay'])

In [None]:
# Save predictions to pickle file
period_prediction_df.to_pickle('period_prediction_df.pkl')
stop_prediction_df.to_pickle('stop_prediction_df.pkl')
full_prediction_df.to_pickle('full_prediction_df.pkl')

In [None]:
# Save predictions to hdfs
save_hdfs('period_prediction_df.pkl','/user/lortkipa/')
save_hdfs('stop_prediction_df.pkl','/user/lortkipa/')
save_hdfs('full_prediction_df.pkl','/user/lortkipa/')
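
The three dataframes form a fallback chain: the full prediction is used when available, then the stop prediction, then the day period prediction. The sketch below mimics this on a toy list of records in plain Python 3, without Spark or pandas. The records, the `aggregate` and `lookup` helpers, and all ids are made up for illustration. Since the sample standard deviation is undefined for a single observation, the sketch stores NaN in that case.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy records: (trip_id, stop_id, day_period, arrival_delay in seconds); made-up values
records = [
    ("t1", "s1", 1, 30), ("t1", "s1", 1, 90),
    ("t2", "s1", 1, 60),
    ("t3", "s2", 2, 0), ("t3", "s2", 2, 120),
]

def aggregate(rows, key):
    """Group delays by key(row) and return {group: (mean, sample std)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])
    return {k: (mean(v), stdev(v) if len(v) > 1 else float("nan"))
            for k, v in groups.items()}

full = aggregate(records, lambda r: (r[0], r[1], r[2]))   # trip, stop, period
by_stop = aggregate(records, lambda r: (r[1], r[2]))      # stop, period
by_period = aggregate(records, lambda r: r[2])            # period only

def lookup(trip_id, stop_id, day_period):
    """Return (mean, std) from the most specific table that has the key."""
    candidates = (
        (full, (trip_id, stop_id, day_period)),
        (by_stop, (stop_id, day_period)),
        (by_period, day_period),
    )
    for table, key in candidates:
        if key in table:
            return table[key]
    raise KeyError(day_period)

known_trip = lookup("t1", "s1", 1)          # found in the full table
unknown_trip = lookup("tX", "s1", 1)        # falls back to the stop table
unknown_stop = lookup("tX", "sX", 2)        # falls back to the period table
```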

## 4. Transfer probability function

In [7]:
from geopy import distance as dist
import pandas as pd
from os import path
import math

# Compute the transfer success probability given transfer parameters
def transfer_success_probability(stop1_id, trip1_id, trip1_arrival_time, stop2_id, trip2_id, trip2_departure_time,  verbose=False, norm=True):
    # Extract time in seconds and day period from time string
    trip1_arr_seconds, trip1_day_period = get_time_info(trip1_arrival_time)
    trip2_dep_seconds, trip2_day_period = get_time_info(trip2_departure_time)

    schedule_difference = trip2_dep_seconds - trip1_arr_seconds
    
    # If the transfer window is more than an hour, return 1 (optimization)
    if schedule_difference > 3600: return 1
    
    minimum_transfer_time = 120
    walking_time = compute_walk_time(stop1_id, stop2_id)

    # Compute the transfer time surplus
    transfer_time_surplus = schedule_difference - walking_time - minimum_transfer_time

    # Extract prediction parameters
    avg_arrival_delay, std_arrival_delay = get_prediction_data(stop1_id, trip1_id, trip1_day_period, is_arrival=True)
    
    if trip2_id == 'Terminus':
        avg_departure_delay = 0
        std_departure_delay = 0
        transfer_time_surplus += minimum_transfer_time # Do not require the minimum two minutes for a terminus transfer
    else:
        avg_departure_delay, std_departure_delay = get_prediction_data(stop2_id, trip2_id, trip2_day_period, is_arrival=False)

    if verbose:
        print("Schedule difference: {}".format(schedule_difference))
        print("Walking time: {}".format(walking_time))
        print("Transfer_time_surplus: {}".format(transfer_time_surplus))
        print("Mean arrival delay: {}".format(avg_arrival_delay))
        print("Mean departure delay: {}".format(avg_departure_delay))
       
    # Compute the success probability
    if norm:
        return compute_uncertainty_norm(avg_arrival_delay, std_arrival_delay, avg_departure_delay, std_departure_delay, transfer_time_surplus)
    else:
        return compute_uncertainty(avg_arrival_delay, avg_departure_delay, transfer_time_surplus)

# Compute transfer probability for terminus arrival
def terminus_success_probability(last_stop_id,
                                 last_trip_id,
                                 last_trip_arrival_time,
                                 terminus_stop_id,
                                 terminus_time,
                                 verbose=False):
    return transfer_success_probability(last_stop_id,
                                        last_trip_id,
                                        last_trip_arrival_time,
                                        terminus_stop_id,
                                        'Terminus',
                                        terminus_time,
                                        verbose)

# Get period of day given trip hour
def get_day_period(hour):
    if hour < 10:
        return 1
    elif hour < 16:
        return 2
    elif hour < 19:
        return 3
    else:
        return 4


# Compute the duration to walk between two stops
def compute_walk_time(stop_id_1, stop_id_2):
    
    stop1 = stops[stops['stop_id'] == stop_id_1]
    
    if len(stop1) == 0:
        print('Stop1 is missing, can\'t compute walk time for {}'.format(stop_id_1))
        return 0
    
    lat1 = float(stop1['stop_lat'].values[0])
    lon1 = float(stop1['stop_lon'].values[0])
    

    stop2 = stops[stops['stop_id'] == stop_id_2]
    
    if len(stop2) == 0:
        print('Stop2 is missing, can\'t compute walk time for {}'.format(stop_id_2))
        return 0
    
    lat2 = float(stop2['stop_lat'].values[0])
    lon2 = float(stop2['stop_lon'].values[0])

    return dist.distance((lat1, lon1), (lat2, lon2)).km * 1200  # 1200 seconds to walk 1 km (corresponds to 50 m/min)

# Return day period and time in seconds from string time
def get_time_info(time_string):
    """
    Given time_string in the form of hh:mm:ss, computes total number of seconds
    """
    split = time_string.split(':')
    seconds = int(split[0]) * 3600 + int(split[1]) * 60 + int(split[2])
    day_period = get_day_period(int(split[0]))

    return seconds, day_period

# Load a prediction file from the local pickle cache if present, otherwise from HDFS
def get_pred_file(hdfs_path):
    file_name = hdfs_path.split("/")[-1]
    if path.exists(file_name):
        return pd.read_pickle(file_name)
    else:
        loaded_df = pd.DataFrame(read_hdfs(hdfs_path))
        # Cache locally so the next call does not hit HDFS
        loaded_df.to_pickle(file_name)
        return loaded_df

# Extract prediction parameters from adapted prediction dataset
def get_prediction_data(stop_id, trip_id, day_period, is_arrival):

    delay_column = ['mean_arrival_delay', 'std_arrival_delay'] if is_arrival else ['mean_departure_delay','std_departure_delay']

    prediction_param = full_pred[(full_pred['stop_id'] == stop_id)
                                 & (full_pred['trip_id'] == trip_id)
                                 & (full_pred['day_period'] == day_period)][delay_column]

    if len(prediction_param) == 0:
        prediction_param = stop_pred[(stop_pred['stop_id'] == stop_id)
                                     & (stop_pred['day_period'] == day_period)][delay_column]

    if len(prediction_param) == 0:
        prediction_param = period_pred[(period_pred['day_period'] == day_period)][delay_column]

    return prediction_param.values[0]


# Compute the transfer success probability, modelling delays as exponential random variables
def compute_uncertainty(avg_arr_delay, avg_dep_delay, transfer_time_surplus):
        
    if transfer_time_surplus <= 0:
        if avg_dep_delay == 0:
            return 0
        return avg_dep_delay / (avg_arr_delay + avg_dep_delay) * math.exp(transfer_time_surplus / avg_dep_delay)
    else:
        if avg_arr_delay == 0:
            return 1
        return 1 - avg_arr_delay / (avg_arr_delay + avg_dep_delay) * math.exp(-transfer_time_surplus / avg_arr_delay)

from scipy.stats import norm
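# Compute the transfer success probability, modelling delays as normal random variables
def compute_uncertainty_norm(avg_arr_delay, std_arr_delay, avg_dep_delay, std_dep_delay, transfer_time_surplus):
    
    # Time lost at the transfer = arrival delay - departure delay
    avg_time_loss = avg_arr_delay - avg_dep_delay
    # NOTE: the spread is built from the mean delays; std_arr_delay and std_dep_delay are not used.
    # A mean equals the standard deviation only for an exponentially distributed delay.
    std_time_loss = math.sqrt(math.pow(avg_arr_delay,2)  + math.pow(avg_dep_delay,2))
    
    # P(time loss <= surplus) under the normal approximation
    return norm.cdf(transfer_time_surplus, avg_time_loss, std_time_loss)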

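The closed form used in `compute_uncertainty` can be recovered as follows. This is our reading of the code, not something stated in the notebook: we assume the arrival delay $A$ and the departure delay $D$ are independent exponential variables with means $a$ (`avg_arr_delay`) and $d$ (`avg_dep_delay`), both positive, and that the transfer succeeds when $A - D \le s$, where $s$ is `transfer_time_surplus`.

```latex
\begin{align*}
\text{For } s \ge 0:\quad
P(A - D > s) &= \int_0^\infty P(A > s + y)\,\frac{1}{d}\,e^{-y/d}\,dy
              = \frac{a}{a+d}\,e^{-s/a},\\
P(A - D \le s) &= 1 - \frac{a}{a+d}\,e^{-s/a}.\\[6pt]
\text{For } s < 0:\quad
P(A - D \le s) &= \int_0^\infty \frac{1}{a}\,e^{-x/a}\,P(D \ge x - s)\,dx
              = \int_0^\infty \frac{1}{a}\,e^{-x/a}\,e^{-(x-s)/d}\,dx
              = \frac{d}{a+d}\,e^{s/d}.
\end{align*}
```

Both branches agree at $s = 0$, where each gives $d/(a+d)$. The derivation does not cover negative mean delays, which can occur in the historical data.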

#### Parameters:

##### IN GENERAL:

- stop1_id: String - Id of the last arrival stop of current trip
- trip1_id: String - Id of the current trip
- trip1_arrival_time: String - Time of arrival at stop1 of trip1 (format: 'hh:mm:ss')

- stop2_id: String - Id of the first departure stop of the next trip
- trip2_id: String - Id of the next trip
- trip2_departure_time: String - Time of departure of trip2 at stop2 (format: 'hh:mm:ss').

##### FOR TERMINUS:

- last_stop_id: String - Id of the last arrival stop of current trip
- last_trip_id: String - Id of the current trip
- last_trip_arrival_time: String - Time of arrival of last_trip at last_stop (format: 'hh:mm:ss')

- terminus_stop_id: String - Id defined by the user as journey end station
- terminus_time: String - Time defined by the user as journey arrival time (format: 'hh:mm:ss')

#### Examples:

__Case A: Normal transfer__

__A.1. Normal transfer without walk__

[(u'8503006', '08:18:00', u'229.TA.26-14-j19-1.43.H', '00:03:00'), <br> 
(u'8503129', '08:29:00', u'51.TA.26-19-j19-1.16.H', '00:02:00'),]

*transfer_success_probability(<br> 
stop1_id = u'8503129',<br> 
trip1_id = u'229.TA.26-14-j19-1.43.H',<br> 
trip1_arrival_time = '08:21:00',<br> 
stop2_id = u'8503129',<br> 
trip2_id = u'51.TA.26-19-j19-1.16.H',<br> 
trip2_departure_time = '08:29:00')*

Station u'8503006' is not the arrival station of trip1 but its departure station. In this case, 
the arrival station of trip1 is the same as the departure station of trip2.

__A.2. Normal transfer with walk__

[(u'8503305', '08:47:00', u'39.TA.26-7-A-j19-1.12.H', '00:03:00'), <br>
(u'8503304', '09:11:42', 'walk', '00:00:17'), <br>
(u'8575944', '09:14:00', u'156.TA.26-650-j19-1.7.H', '00:02:00')]
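
The walk leg above lasts 17 seconds, which at 1200 seconds per km corresponds to roughly 14 m between the two stops. `compute_walk_time` obtains the distance from `geopy`; the sketch below swaps in the haversine great-circle formula so that it runs with the standard library only, and its results can differ slightly from `geopy`. The function names and the coordinates are hypothetical and are not taken from the stops table.

```python
import math

WALK_SECONDS_PER_KM = 1200   # 50 m/min, the walking speed used in compute_walk_time
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def walk_seconds(lat1, lon1, lat2, lon2):
    """Walking time in seconds at 50 m/min along the straight line between two points."""
    return haversine_km(lat1, lon1, lat2, lon2) * WALK_SECONDS_PER_KM

# Hypothetical coordinates of two nearby stops
walk = walk_seconds(47.3779, 8.5403, 47.3780, 8.5404)
```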

__Case B: Arrival at terminus__


__B.1. Terminus transfer without walk__

[(u'8503381', '09:34:00', u'162.TA.26-655-j19-1.12.R', '00:03:00'), <br>
(u'8503376', '10:00:00', 'Terminus', '00:00:00')]


terminus_success_probability(<br> 
last_stop_id = u'8503376',<br> 
last_trip_id = u'162.TA.26-655-j19-1.12.R',<br> 
last_trip_arrival_time = '09:37:00',<br> 
terminus_stop_id = u'8503376',<br> 
terminus_time = '10:00:00')

__B.2. Terminus transfer with walk__

[(u'8575944', '09:14:00', u'156.TA.26-650-j19-1.7.H', '00:02:00'), <br>
(u'8503374', '09:24:53', 'walk', '00:07:06'), <br>
(u'8503381',  '10:00:00', 'Terminus', '00:00:00')]

terminus_success_probability(<br> 
last_stop_id = u'8503374',<br> 
last_trip_id = u'156.TA.26-650-j19-1.7.H',<br> 
last_trip_arrival_time = '09:16:00',<br> 
terminus_stop_id = u'8503376',<br> 
terminus_time = '10:00:00')



## 5. Validation

In [11]:
# Prediction dataframes
period_pred = get_pred_file('/user/lortkipa/period_prediction_df.pkl')
stop_pred = get_pred_file('/user/lortkipa/stop_prediction_df.pkl')
full_pred = get_pred_file('/user/lortkipa/full_prediction_df.pkl')


Running system command: hdfs dfs -cat /user/lortkipa/period_prediction_df.pkl
Success
Running system command: hdfs dfs -cat /user/lortkipa/stop_prediction_df.pkl
Success
Running system command: hdfs dfs -cat /user/lortkipa/full_prediction_df.pkl
Success

In [17]:
# stops and their names
stops = read_hdfs('/user/lortkipa/filtered_stops_Premoved.pkl')


Running system command: hdfs dfs -cat /user/lortkipa/filtered_stops_Premoved.pkl
Success

#### We first try several schedule differences to check that allowing more time for the transfer increases the success probability

In [18]:
# One minute transfer
transfer_success_probability(stop1_id='8503304', trip1_id='39.TA.26-7-A-j19-1.12.H', trip1_arrival_time='08:50:00', 
                             stop2_id='8575944', trip2_id='156.TA.26-650-j19-1.7.H', trip2_departure_time='08:51:00', verbose=True)


Schedule difference: 60
Walking time: 17.1925633418
Transfer_time_surplus: -77.1925633418
Mean arrival delay: 92.391102873
Mean departure delay: 123.865744606
0.3836701579288434

In [19]:
# Five minutes transfer
transfer_success_probability(stop1_id='8503304', trip1_id='39.TA.26-7-A-j19-1.12.H', trip1_arrival_time='08:50:00', 
                             stop2_id='8575944', trip2_id='156.TA.26-650-j19-1.7.H', trip2_departure_time='08:55:00', verbose=True)


Schedule difference: 300
Walking time: 17.1925633418
Transfer_time_surplus: 162.807436658
Mean arrival delay: 92.391102873
Mean departure delay: 123.865744606
0.8956707945869421

In [20]:
# Ten minutes transfer
transfer_success_probability(stop1_id='8503304', trip1_id='39.TA.26-7-A-j19-1.12.H', trip1_arrival_time='08:50:00', 
                             stop2_id='8575944', trip2_id='156.TA.26-650-j19-1.7.H', trip2_departure_time='09:00:00', verbose=True)


Schedule difference: 600
Walking time: 17.1925633418
Transfer_time_surplus: 462.807436658
Mean arrival delay: 92.391102873
Mean departure delay: 123.865744606
0.9993096637863208
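
The three probabilities above can be reproduced by hand from the printed values. The sketch below follows `compute_uncertainty_norm` as written: the mean time loss is the mean arrival delay minus the mean departure delay, and the spread is the root of the sum of the squared mean delays. It uses `math.erf` in place of `scipy.stats.norm.cdf` so that it needs only the standard library; `normal_cdf` is our own helper name.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Values printed by the verbose output of the three cells above
mean_arr = 92.391102873
mean_dep = 123.865744606

mean_loss = mean_arr - mean_dep                    # about -31.5 s
spread = math.sqrt(mean_arr ** 2 + mean_dep ** 2)  # about 154.5 s

# Transfer time surplus for the 1, 5, and 10 minute transfers
surpluses = [-77.1925633418, 162.807436658, 462.807436658]
probabilities = [normal_cdf(s, mean_loss, spread) for s in surpluses]
```

The results agree with the cell outputs (about 0.384, 0.896, and 0.999), and they increase with the scheduled transfer time as expected.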

## Other tests

In [25]:
# Test for stations that are very far apart
transfer_success_probability(stop1_id='8503304', trip1_id='39.TA.26-7-A-j19-1.12.H', trip1_arrival_time='08:50:00', 
                             stop2_id='8575946', trip2_id='156.TA.26-650-j19-1.7.H', trip2_departure_time='09:00:00', verbose=True)


Schedule difference: 600
Walking time: 2782.60558465
Transfer_time_surplus: -2302.60558465
Mean arrival delay: 92.391102873
Mean departure delay: 123.865744606
3.357404360338517e-49

In [28]:
# Test for arrival time after departure time
transfer_success_probability(stop1_id='8503304', trip1_id='39.TA.26-7-A-j19-1.12.H', trip1_arrival_time='09:50:00', 
                             stop2_id='8503304', trip2_id='156.TA.26-650-j19-1.7.H', trip2_departure_time='09:00:00', verbose=True)


Schedule difference: -3000
Walking time: 0.0
Transfer_time_surplus: -3120.0
Mean arrival delay: 92.391102873
Mean departure delay: 128.27618165
4.5845366858185324e-85

In [26]:
# Test for terminus station
terminus_success_probability(last_stop_id = u'8503374',
                             last_trip_id = u'156.TA.26-650-j19-1.7.H',
                             last_trip_arrival_time = '09:16:00',
                             terminus_stop_id = u'8503376',
                             terminus_time = '10:00:00')


0.9999999999998929
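
The cells above only exercise the normal approximation (`norm=True`). As a sanity check of the exponential variant, the sketch below compares the closed form of `compute_uncertainty` with a simple Monte Carlo simulation. It assumes independent exponential delays with positive means; `closed_form` and `simulate` are our own names, and the mean delays and surpluses are rounded values in the range seen above, chosen for illustration.

```python
import math
import random

def closed_form(avg_arr, avg_dep, surplus):
    """Same expression as compute_uncertainty."""
    if surplus <= 0:
        if avg_dep == 0:
            return 0.0
        return avg_dep / (avg_arr + avg_dep) * math.exp(surplus / avg_dep)
    if avg_arr == 0:
        return 1.0
    return 1 - avg_arr / (avg_arr + avg_dep) * math.exp(-surplus / avg_arr)

def simulate(avg_arr, avg_dep, surplus, n=100000, seed=0):
    """Fraction of simulated transfers where arrival delay - departure delay <= surplus."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        arrival_delay = rng.expovariate(1.0 / avg_arr)
        departure_delay = rng.expovariate(1.0 / avg_dep)
        if arrival_delay - departure_delay <= surplus:
            hits += 1
    return hits / float(n)

# (surplus, closed form, simulated estimate) for a tight and a comfortable transfer
checks = [(s, closed_form(92.4, 123.9, s), simulate(92.4, 123.9, s))
          for s in (-77.0, 163.0)]
```

If the two columns of `checks` agree to about two decimal places, the closed form matches the exponential model it is meant to implement.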