<a id="top"></a>

This notebook walks through making a sample journey-time prediction using the route model approach.

***

# Import Packages

In [1]:
import requests
import traceback
import json
import pickle
import datetime
import sys

***

<a id="contents"></a>
# Contents

- [1. Stop Travel Time %'s](#1)
- [2. Model Prediction](#2)
- [3. Total Dwell Time](#3)
- [4. Calculate Journey Time](#4)
- [5. Working with Google's Response](#5)
- [6. Streamline](#6)

***

# Chosen Route

UCD (N11 entrance) to Trinity College (Dawson Street).  
Route 39A, stops 768 to 793.

In [2]:
# set line and stops
line = '39A'
origin = '768'
dest = '793'

For this calculation we need the following variables:

- dest_pct
- origin_pct
- model_prediction
- total_dwelltime

***

<a id="1"></a>
# 1. Stop Travel Time %'s
[Back to contents](#contents)

## 1.1. Load Stop Time % Dictionary

In [3]:
# load stop time percent dictionary (the with-block closes the file afterwards)
with open('/home/faye/Data-Analytics-CityRoute/data/stop_time_pct.json') as f:
    stop_times = json.load(f)
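
The lookups used throughout this notebook imply a nested layout for `stop_time_pct.json`: line, then direction, then stop number, mapping to what appears to be a cumulative percentage of the route. A minimal sketch of that assumed shape, using invented values rather than the real file:

```python
import json

# Hypothetical miniature of stop_time_pct.json; every value here is invented.
sample_json = """
{
  "39A": {
    "outbound": {"768": 2.7, "769": 4.1, "793": 27.0},
    "inbound":  {"7000": 10.0, "7001": 55.0}
  }
}
"""

sample_stop_times = json.loads(sample_json)

# Same access pattern as the notebook: line -> direction -> stop number.
sample_pct = sample_stop_times["39A"]["outbound"]["768"]

# Key order is preserved on load, so the key list can stand in for the stop sequence.
sample_sequence = list(sample_stop_times["39A"]["outbound"])
```

Because the stop order is taken from the key order, the approach relies on the JSON file having been written in route order.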

## 1.2. Get Direction

In [4]:
# get direction of route
stops = list(stop_times[line]['outbound'].keys())

origin_index = stops.index(origin)
dest_index = stops.index(dest)

if origin_index < dest_index:
    direction = 'outbound'
else:
    direction = 'inbound'
    
print(f"Direction: {direction}")

Direction: outbound
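
The cell above looks up both stops in the outbound list, so `stops.index` raises `ValueError` when a stop exists only in the inbound list. A hedged sketch of a variant that checks each direction in turn; `infer_direction` and the toy stop sequences are hypothetical, not part of the project code:

```python
def infer_direction(route_stops, origin, dest):
    """Return the direction whose stop sequence has origin before dest, else None."""
    for direction in ("outbound", "inbound"):
        stops = list(route_stops.get(direction, {}))
        if origin in stops and dest in stops and stops.index(origin) < stops.index(dest):
            return direction
    return None


# Hypothetical stop sequences, not taken from the real data.
toy_route = {
    "outbound": {"10": 0.0, "11": 50.0, "12": 100.0},
    "inbound": {"22": 0.0, "21": 50.0, "20": 100.0},
}

outbound_case = infer_direction(toy_route, "10", "12")
inbound_case = infer_direction(toy_route, "22", "20")
no_match_case = infer_direction(toy_route, "12", "10")
```

Returning `None` instead of defaulting to `'inbound'` makes the "no valid direction" case explicit to the caller.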


In [5]:
all_stops = list(stop_times[line][direction].keys())

In [6]:
stops[origin_index:dest_index]

['768',
 '769',
 '770',
 '771',
 '772',
 '773',
 '774',
 '775',
 '776',
 '777',
 '779',
 '780',
 '781',
 '782',
 '783',
 '784',
 '785',
 '786']

## 1.3. Set % Variables

In [7]:
# set pct variables
# divide by 100 to convert the stored percentage to a fraction of the route
dest_pct = stop_times[line][direction][dest] / 100
origin_pct = stop_times[line][direction][origin] / 100

print(f"% of the total route for origin stop:      {origin_pct:.2}")
print(f"% of the total route for destination stop: {dest_pct:.2}")

% of the total route for origin stop:      0.027
% of the total route for destination stop: 0.27


<a id="2"></a>
# 2. Model Prediction
[Back to contents](#contents)

For the model we need the following parameters:

parameters = [

    'HOUR',
    
    'DAYOFWEEK_Monday', <- dropped dummy
    'DAYOFWEEK_Tuesday',
    'DAYOFWEEK_Wednesday',
    'DAYOFWEEK_Thursday',
    'DAYOFWEEK_Friday',
    'DAYOFWEEK_Saturday',
    'DAYOFWEEK_Sunday',
       
    'MONTHOFSERVICE_January', <- dropped dummy
    'MONTHOFSERVICE_February',
    'MONTHOFSERVICE_March',
    'MONTHOFSERVICE_April',
    'MONTHOFSERVICE_May',
    'MONTHOFSERVICE_June',
    'MONTHOFSERVICE_July',
    'MONTHOFSERVICE_August',
    'MONTHOFSERVICE_September',
    'MONTHOFSERVICE_October',
    'MONTHOFSERVICE_November',
    'MONTHOFSERVICE_December',
    
    'IS_HOLIDAY_0', <- dropped dummy
    'IS_HOLIDAY_1',
    
    'humidity',
    'rain_1h',
    'temp',
    'wind_speed',
    
    'weather_main_Clear', <- dropped dummy
    'weather_main_Clouds',
    'weather_main_Drizzle',
    'weather_main_Fog',
    'weather_main_Mist',
    'weather_main_Rain',
    'weather_main_Smoke',
    'weather_main_Snow',
    
]
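
Each `<- dropped dummy` above marks the baseline category that is left out of the one-hot encoding, so the baseline is represented by all zeros. A small sketch of that encoding; the helper name `one_hot_drop_first` is hypothetical, and the category order is assumed to match the listing above:

```python
def one_hot_drop_first(categories, value):
    """Encode value against categories, omitting the first (baseline) category."""
    return [1 if value == category else 0 for category in categories[1:]]


# Category order assumed to follow the parameter listing; 'Clear' is the baseline.
WEATHER_MAIN = ["Clear", "Clouds", "Drizzle", "Fog", "Mist", "Rain", "Smoke", "Snow"]

clear_flags = one_hot_drop_first(WEATHER_MAIN, "Clear")
clouds_flags = one_hot_drop_first(WEATHER_MAIN, "Clouds")
snow_flags = one_hot_drop_first(WEATHER_MAIN, "Snow")
```

A value that is not in the list also encodes as all zeros, which silently treats it as the baseline.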

## 2.1. Load in Model

In [8]:
# set model name using line and direction
model_name = f"route_{line}_{direction}"

In [9]:
# set file path
file_path = f"/home/faye/Data-Analytics-CityRoute/route_models/{model_name}.pkl"

In [10]:
# load the model (the with-block closes the file afterwards)
with open(file_path, 'rb') as model_file:
    linear_reg = pickle.load(model_file)

## 2.2. Set Calendar Variables

Take the user input as 09/08/2021 at 09:00

In [11]:
# set hour (from user input)
hour = 9

In [12]:
# set day - 9 August 2021 is a Monday (the dropped dummy), so all flags stay 0
dayofweek = [0,0,0,0,0,0]

In [13]:
# set month
monthofservice = [0,0,0,0,0,0,0,0,0,0,0]

# month is August (at index 6)
monthofservice[6] = 1

In [14]:
# set is_holiday - selected day isn't a holiday
is_holiday = 0
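
The calendar values in this subsection are set by hand. As a sketch, they could instead be derived from the user's input string; the `%d/%m/%Y %H:%M` format and the `_sketch` variable names are assumptions for illustration:

```python
import datetime

user_input = "09/08/2021 09:00"  # day/month/year, as in the example above
when = datetime.datetime.strptime(user_input, "%d/%m/%Y %H:%M")

hour_sketch = when.hour

# Monday (weekday() == 0) is the dropped dummy, so it maps to all zeros.
dayofweek_sketch = [0] * 6
if when.weekday() > 0:
    dayofweek_sketch[when.weekday() - 1] = 1

# January (month == 1) is the dropped dummy, so February sits at index 0.
monthofservice_sketch = [0] * 11
if when.month > 1:
    monthofservice_sketch[when.month - 2] = 1
```

For this date the result matches the hand-set values: hour 9, no day flag set, and August at index 6.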

## 2.3. Set Weather Variables

In [15]:
# Query OWM for the current weather (this endpoint returns current conditions, not a forecast)
WEATHER = "http://api.openweathermap.org/data/2.5/weather?"
CITY = "Dublin,IE"
OW_APIkey = 'a13b4ad387112eb8e23757d9dcdbc27d'

curr_weather = requests.get(WEATHER, params={"q": CITY, "appid": OW_APIkey, "units": 'metric'})

weather = curr_weather.json()

In [16]:
# set continuous weather variables
humidity = weather['main']['humidity']
try:
    rain_1h = weather['rain']['1h']
except KeyError:
    # no 'rain' entry in the response: treat as no rain in the last hour
    rain_1h = 0
temp = weather['main']['temp']
wind_speed = weather['wind']['speed']

weather_cont = [humidity, rain_1h, temp, wind_speed]
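
An alternative to the try/except around the rain lookup is chained `dict.get` calls with defaults. A minimal sketch against a hand-written payload that contains only the fields this notebook reads; `sample_weather` and `continuous_weather` are hypothetical names, and the values are illustrative:

```python
# Hypothetical payload trimmed to the fields the notebook reads.
sample_weather = {
    "main": {"humidity": 68, "temp": 18.91},
    "wind": {"speed": 6.26},
    "weather": [{"main": "Clouds"}],
    # no "rain" key here, mirroring the case the notebook defaults to 0
}


def continuous_weather(payload):
    """Return [humidity, rain_1h, temp, wind_speed], defaulting rain to 0."""
    rain_1h = payload.get("rain", {}).get("1h", 0)
    return [
        payload["main"]["humidity"],
        rain_1h,
        payload["main"]["temp"],
        payload["wind"]["speed"],
    ]


dry_values = continuous_weather(sample_weather)
wet_values = continuous_weather({**sample_weather, "rain": {"1h": 0.5}})
```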

In [17]:
# set weather_main
weather_main = [0, 0, 0, 0, 0, 0, 0]

main = weather['weather'][0]['main']

if main == 'Clouds':
    weather_main[0] = 1
elif main == 'Drizzle':
    weather_main[1] = 1
elif main == 'Fog':
    weather_main[2] = 1
elif main == 'Mist':
    weather_main[3] = 1
elif main == 'Rain':
    weather_main[4] = 1
elif main == 'Smoke':
    weather_main[5] = 1
elif main == 'Snow':
    weather_main[6] = 1
# 'Clear' is the dropped dummy: it (and any unlisted value) leaves all flags at 0

## 2.4. Make Model Prediction

In [18]:
# set parameters array 
parameters = []

parameters.append(hour)
parameters.extend(dayofweek)
parameters.extend(monthofservice)
parameters.append(is_holiday)
parameters.extend(weather_cont)
parameters.extend(weather_main)

In [19]:
# print parameters array
print("Model Input Parameters:")
print(parameters)

Model Input Parameters:
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 68, 0, 18.91, 6.26, 1, 0, 0, 0, 0, 0, 0]
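
A quick way to sanity-check a positional vector like this is to label it with feature names. The sketch below assumes the model's feature order matches the parameter listing in section 2 with the dropped dummies removed; that ordering is an assumption, not something confirmed by the model file:

```python
DAYS = ["Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEATHER = ["Clouds", "Drizzle", "Fog", "Mist", "Rain", "Smoke", "Snow"]

# Assumed feature order: the listing from section 2 minus the dropped dummies.
FEATURE_NAMES = (
    ["HOUR"]
    + [f"DAYOFWEEK_{day}" for day in DAYS]
    + [f"MONTHOFSERVICE_{month}" for month in MONTHS]
    + ["IS_HOLIDAY_1", "humidity", "rain_1h", "temp", "wind_speed"]
    + [f"weather_main_{kind}" for kind in WEATHER]
)

# The vector printed above, copied verbatim.
printed_parameters = [9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
                      68, 0, 18.91, 6.26, 1, 0, 0, 0, 0, 0, 0]

labelled = dict(zip(FEATURE_NAMES, printed_parameters))
active_flags = [name for name, value in labelled.items() if value == 1]
```

Under that assumption the two active flags are August and Clouds, which agrees with the chosen date and the weather returned at run time.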


In [20]:
# make prediction (returns predicted time for entire route)
model_prediction = linear_reg.predict([parameters])[0]

print(f"Model prediction: {model_prediction} seconds")

Model prediction: 5182.206021712331 seconds


***

<a id="3"></a>
# 3. Total Dwell Time
[Back to contents](#contents)

In [21]:
# Load stop_dwelltimes dictionary (the with-block closes the file afterwards)
with open('/home/faye/Data-Analytics-CityRoute/data/stop_dwelltimes.json') as f:
    stop_dwelltimes = json.load(f)

In [22]:
# get list of route stops using line and direction
all_stops = list(stop_times[line][direction].keys())

In [23]:
# get all stops on selected journey using origin and dest indices (from above section 1.2.)
journey_stops = all_stops[origin_index:dest_index]

In [24]:
# calculate total dwelltime
total_dwelltime = 0
for stop in journey_stops:
    total_dwelltime += stop_dwelltimes[stop]

print(f"Total dwell time for selected journey: {total_dwelltime} seconds")

Total dwell time for selected journey: 254.26562730816823 seconds


***

<a id="4"></a>
# 4. Calculate Journey Time
[Back to contents](#contents)

For this calculation we need the following variables:

- dest_pct
- origin_pct
- model_prediction
- total_dwelltime

In [25]:
# calculate predicted journey time
predicted_journey_time = (dest_pct * model_prediction) - (origin_pct * model_prediction) + total_dwelltime

In [26]:
# print predicted journey time
print(predicted_journey_time)
print(datetime.timedelta(seconds=predicted_journey_time))

1514.8022271841405
0:25:14.802227


> If a user selected the route UCD to Trinity (stop 768 to 793 on line 39A) for Monday 9 August at 09:00, our model predicts it should take 25 minutes and 15 seconds.  
A Google Maps prediction for the same route and conditions returns an estimated journey time of 15 minutes.

#### Updated Calculation

As Steph pointed out, dwell time is already accounted for by the model: it is trained on whole-trip times, which include the dwell time at each stop. The dwell time term is therefore dropped from the calculation.

In [27]:
# calculate predicted journey time
predicted_journey_time = (dest_pct * model_prediction) - (origin_pct * model_prediction)

In [28]:
# print predicted journey time
print(predicted_journey_time)
print(datetime.timedelta(seconds=predicted_journey_time))

1260.5365998759723
0:21:00.536600
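
The updated calculation reduces to scaling the whole-route prediction by the share of the route between the two stops. A minimal sketch as a reusable function; the name `journey_time_seconds` is hypothetical and the inputs are round illustrative numbers, not the notebook's values:

```python
import datetime


def journey_time_seconds(origin_pct, dest_pct, route_prediction):
    """Scale a whole-route prediction to the share between two stops."""
    return (dest_pct - origin_pct) * route_prediction


# Illustrative inputs: a one-hour route, travelling from 25% to 75% of the way along it.
example_seconds = journey_time_seconds(origin_pct=0.25, dest_pct=0.75, route_prediction=3600)
example_display = str(datetime.timedelta(seconds=example_seconds))
```

Factoring out `(dest_pct - origin_pct)` is algebraically the same as the two-term expression used in the cell above.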


***

<a id="5"></a>
# 5. Working with Google's Response
[Back to contents](#contents)

For the same journey, Google provides:
- Start stop: UCD N11 Entrance
- End stop: Dawson Street, stop **793**
- Number of stops: **15**
- Route: **145**

>Can we get the start stop number?

Are stops unique to inbound/outbound?

In [29]:
# get list of stops
out_stops = list(stop_times['145']['outbound'].keys())
in_stops = list(stop_times['145']['inbound'].keys())

In [30]:
# check if stop 793 is in outbound
'793' in out_stops

False

In [31]:
# check if stop 793 is in inbound
'793' in in_stops

False

Combining Google's response with stop_time_pct.json fails for the UCD to Trinity journey: Google returns end stop 793 on route 145, but route 145 in that dictionary does not contain stop 793 in either direction.
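
When a stop is missing from the expected route, it can help to list which routes and directions do contain it. A hedged sketch over the same nested layout; `routes_serving` and the toy dictionary are hypothetical:

```python
def routes_serving(stop_times, stop_num):
    """Return sorted (route, direction) pairs whose stop list contains stop_num."""
    return sorted(
        (route, direction)
        for route, directions in stop_times.items()
        for direction, stops in directions.items()
        if stop_num in stops
    )


# Invented data in the same line -> direction -> stop layout.
toy_stop_times = {
    "A": {"outbound": {"1": 0.0, "2": 100.0}, "inbound": {"3": 0.0, "4": 100.0}},
    "B": {"outbound": {"2": 0.0, "5": 100.0}, "inbound": {"6": 0.0}},
}

found = routes_serving(toy_stop_times, "2")
missing = routes_serving(toy_stop_times, "9")
```

Run against the real dictionary with `'793'`, this would show whether the stop is absent from the data entirely or only from route 145.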

#### New Route

Phoenix Park to Ha'penny Bridge.  
- **Start stop:** park view, stop 1692  
- **End Stop:** bachelors walk  
- **Route**: 37  
- **#Stops**: 28  

In [32]:
# set variables
start_stop = 'park view, stop 1692'
end_stop = 'bachelors walk'
route = '37'
num_stops = 28

In [33]:
# get outbound and inbound stops
outbound_stops = list(stop_times[route]['outbound'].keys())
inbound_stops = list(stop_times[route]['inbound'].keys())

In [34]:
# get stop num
def get_stop_num(stop_str):
    # if string doesn't contain 'stop'
    if stop_str.find('stop') == -1:
        return False
    # else extract stop number
    else:
        stop_split = stop_str.split(' ')
        stop_num = stop_split[-1]
        return stop_num
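
`get_stop_num` returns the last whitespace-separated token whenever the text contains `stop` anywhere, so a string such as `'bus stop near park'` would yield `'park'`. A stricter sketch using a regular expression; `parse_stop_num` is a hypothetical name, and it returns `None` rather than `False`:

```python
import re


def parse_stop_num(stop_str):
    """Return the digits that follow the word 'stop', or None when absent."""
    match = re.search(r"\bstop\s+(\d+)\b", stop_str, flags=re.IGNORECASE)
    return match.group(1) if match else None


with_number = parse_stop_num("park view, stop 1692")
without_number = parse_stop_num("bachelors walk")
misleading = parse_stop_num("bus stop near park")
```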

In [35]:
# get direction
def get_direction(route, stop_num):
    # check outbound first
    outbound_stops = list(stop_times[route]['outbound'].keys())
    
    if stop_num in outbound_stops:
        return 'outbound'
    else:
        return 'inbound'
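
`get_direction` only gives a meaningful answer if a stop number appears in exactly one direction of a route. A small sketch for checking that assumption on a route; `shared_stops` and the toy routes are hypothetical:

```python
def shared_stops(route_stops):
    """Return stop numbers that appear in both directions of one route."""
    outbound = set(route_stops.get("outbound", {}))
    inbound = set(route_stops.get("inbound", {}))
    return sorted(outbound & inbound)


# Invented routes: one with an overlapping stop, one without.
overlapping_route = {"outbound": {"1": 0.0, "2": 100.0}, "inbound": {"2": 0.0, "3": 100.0}}
disjoint_route = {"outbound": {"1": 0.0, "2": 100.0}, "inbound": {"3": 0.0, "4": 100.0}}

overlap = shared_stops(overlapping_route)
no_overlap = shared_stops(disjoint_route)
```

An empty result for every route in the real dictionary would support the uniqueness assumption.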

In [36]:
def get_start_stop_num(route, direction, num_stops, end_stop_num):
    stops = list(stop_times[route][direction].keys())
    end_index = stops.index(end_stop_num)
    start_index = end_index - num_stops
    
    return stops[start_index]

In [37]:
def get_end_stop_num(route, direction, num_stops, start_stop_num):
    stops = list(stop_times[route][direction].keys())
    start_index = stops.index(start_stop_num)
    end_index = start_index + num_stops
    
    return stops[end_index]
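
Both helpers index straight into the stop list. A negative index in Python wraps around to the end of the list, so an over-large `num_stops` in `get_start_stop_num` can silently return the wrong stop, while `get_end_stop_num` raises `IndexError` past the end. A bounds-checked sketch covering both cases; `stop_at_offset` is a hypothetical helper:

```python
def stop_at_offset(stops, known_stop, offset):
    """Return the stop `offset` positions from known_stop, or None if out of range."""
    if known_stop not in stops:
        return None
    target = stops.index(known_stop) + offset
    if 0 <= target < len(stops):
        return stops[target]
    return None


toy_stops = ["a", "b", "c", "d"]

forward = stop_at_offset(toy_stops, "a", 2)        # end stop from a start stop
backward = stop_at_offset(toy_stops, "d", -3)      # start stop from an end stop
before_start = stop_at_offset(toy_stops, "b", -2)  # plain indexing would wrap to "d"
```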

In [38]:
# get start stop num
start_stop_num = get_stop_num(start_stop)
start_stop_num

'1692'

In [39]:
# get end stop num
end_stop_num = get_stop_num(end_stop)
end_stop_num

False

In [40]:
# get direction of journey
direction = get_direction(route, start_stop_num)
direction

'inbound'

In [41]:
if start_stop_num is False:
    # work backwards from the known end stop
    start_stop_num = get_start_stop_num(route, direction, num_stops, end_stop_num)
elif end_stop_num is False:
    # work forwards from the known start stop
    end_stop_num = get_end_stop_num(route, direction, num_stops, start_stop_num)

#### Information available now

In [42]:
print(f"Route    : {route}")
print(f"Direction: {direction}")
print(f"Start #  : {start_stop_num}")
print(f"End #    : {end_stop_num}")

Route    : 37
Direction: inbound
Start #  : 1692
End #    : 315


#### For Route Model

We have all we need; we can now get the origin % and destination %.

#### For Stop-to-Stop Model

We need a list of the stops on the journey

In [43]:
def get_journey_stops(route, direction, start_stop_num, end_stop_num):
    all_stops = list(stop_times[route][direction].keys())
    start_index = all_stops.index(start_stop_num)
    end_index = all_stops.index(end_stop_num)
    journey_stops = all_stops[start_index:end_index + 1]
    
    return journey_stops

In [44]:
# get list of stops on selected journey
stops_list = get_journey_stops(route, direction, start_stop_num, end_stop_num)
print(f"# stops on journey: {len(stops_list)}") # check length, should be #stop + 1 (the start stop)
print(stops_list)

# stops on journey: 29
['1692', '1693', '1694', '1695', '1696', '1697', '1698', '1699', '1700', '1701', '1702', '1703', '1704', '1705', '1706', '1707', '1708', '1709', '1528', '1710', '1711', '1712', '1713', '1714', '1715', '7453', '1478', '1479', '315']


> This checks out with the start and end stop numbers.
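
If the stop-to-stop model predicts one segment at a time (an assumption, since that model is not described here), the journey list can be turned into consecutive stop pairs. A minimal sketch; `stop_pairs` is a hypothetical helper:

```python
def stop_pairs(journey_stops):
    """Return consecutive (from_stop, to_stop) pairs along a journey."""
    return list(zip(journey_stops, journey_stops[1:]))


pairs = stop_pairs(["1692", "1693", "1694"])
```

A journey of 29 stops yields 28 pairs, which lines up with the 28 stops Google reported for this route.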

In [45]:
# if start_stop_num == False:
    # start_stop_num = get_start_stop_num
# elif end_stop_num == False:
    # end_stop_num = get_end_stop_num

In [46]:
# try get stop number for both
    # get_stop_num(start)
    # get_stop_num(end)
# get direction

***

<a id="6"></a>
# 6. Streamline
[Back to contents](#contents)

In [55]:
# Load stop time percent dictionary (the with-block closes the file afterwards)
with open('/home/faye/Data-Analytics-CityRoute/data/stop_time_pct.json') as file:
    stop_times = json.load(file)

# Function to get stop number
def get_stop_num(stop_str):
    """
    Function to extract stop number from string
    Returns False if no stop number in string
    """
    # if string doesn't contain 'stop'
    if stop_str.find('stop') == -1:
        return False
    # else extract stop number
    else:
        stop_split = stop_str.split(' ')
        stop_num = stop_split[-1]
        return stop_num


# Function to get start stop number
def get_start_stop_num(route, direction, num_stops, end_stop_num):
    """
    Function to get start stop num using stops list
    """
    stops = list(stop_times[route][direction].keys())
    end_index = stops.index(end_stop_num)
    start_index = end_index - num_stops
    
    return stops[start_index]


# Function to get end stop number
def get_end_stop_num(route, direction, num_stops, start_stop_num):
    """
    Function to get end stop num using stops list
    """
    stops = list(stop_times[route][direction].keys())
    start_index = stops.index(start_stop_num)
    end_index = start_index + num_stops
    
    return stops[end_index]


# Function to get direction of journey
def get_direction(route, stop_num):
    """
    Function to return the direction of the journey
    Assumes a stop number is unique to inbound/outbound
    """
    outbound_stops = list(stop_times[route]['outbound'].keys())
    
    if stop_num in outbound_stops:
        return 'outbound'
    else:
        return 'inbound'


# Function to get stops on journey
def get_journey_stops(route, direction, start_stop_num, end_stop_num):
    """
    Function to return a list of stops for selected journey
    """
    all_stops = list(stop_times[route][direction].keys())
    start_index = all_stops.index(start_stop_num)
    end_index = all_stops.index(end_stop_num)
    journey_stops = all_stops[start_index:end_index + 1]
    
    return journey_stops

In [52]:
# set variables - works
start_stop = 'park view, stop 1692'
end_stop = 'bachelors walk'
route = '37'
num_stops = 28

In [49]:
# set variables - works
start_stop = "trinity college, south frederick's street"
end_stop = "frascati sc, stop 6334"
route = "4"
num_stops = 22

In [50]:
# set variables - route not found
start_stop = "ikea, stop 7698"
end_stop = "donnybrook (stadium)"
route = "155"
num_stops = 36

In [57]:
# set variables - works roughly
start_stop = "lucan comm college, stop 4620"
end_stop = "arran quay"
route = "25A"
num_stops = 29

In [60]:
# set variables - 
start_stop = "custom house, stop 407"
end_stop = "east wall road, stop 2508"
route = "151"
num_stops = 6

In [61]:
# check we have route
if route not in stop_times:
    sys.exit("ERROR: Route not found - Use Google")
    

# get start stop num
start_stop_num = get_stop_num(start_stop)

# get end stop num
end_stop_num = get_stop_num(end_stop)

# in case either returned False (no stop number in string)
if start_stop_num is False and end_stop_num is False:
    # return google's prediction - we were given no stop numbers
    sys.exit("ERROR: No stop numbers - Use Google")
    
elif start_stop_num is False:
    # get direction of journey
    direction = get_direction(route, end_stop_num) 
    # get start stop num
    start_stop_num = get_start_stop_num(route, direction, num_stops, end_stop_num)
elif end_stop_num is False:
    # get direction of journey
    direction = get_direction(route, start_stop_num)
    # get end stop num
    end_stop_num = get_end_stop_num(route, direction, num_stops, start_stop_num)
else:
    # both stop numbers given - direction still has to be set for this journey
    direction = get_direction(route, start_stop_num)

# if model_type == route:
# we have what we need 

# elif model_type == stop_to_stop
# get list of stops on selected journey
stops_list = get_journey_stops(route, direction, start_stop_num, end_stop_num)

In [62]:
print(f"Route    : {route}")
print(f"Direction: {direction}")
print(f"Start #  : {start_stop_num}")
print(f"End #    : {end_stop_num}")
print(f"Length   : {len(stops_list) == num_stops + 1}")

Route    : 151
Direction: inbound
Start #  : 407
End #    : 2508
Length   : True
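
The same flow could be wrapped in a single function that returns `None` wherever the cell falls back to Google, so it can be called from other code without exiting. This is a sketch under assumptions: `resolve_journey` is a hypothetical name, missing stop numbers are passed as `None`, and the toy dictionary is invented:

```python
def resolve_journey(stop_times, route, num_stops, start_num=None, end_num=None):
    """Return (direction, start_num, end_num), or None to signal 'use Google'."""
    if route not in stop_times or (start_num is None and end_num is None):
        return None

    for direction, stops_map in stop_times[route].items():
        stops = list(stops_map)
        # Skip a direction that lacks any stop number we were given.
        if start_num is not None and start_num not in stops:
            continue
        if end_num is not None and end_num not in stops:
            continue
        if start_num is None:
            index = stops.index(end_num) - num_stops
            return (direction, stops[index], end_num) if index >= 0 else None
        if end_num is None:
            index = stops.index(start_num) + num_stops
            return (direction, start_num, stops[index]) if index < len(stops) else None
        return direction, start_num, end_num
    return None


# Invented data in the same line -> direction -> stop layout.
toy_times = {
    "37": {
        "outbound": {"5": 0, "6": 50, "7": 100},
        "inbound": {"1": 0, "2": 30, "3": 60, "4": 100},
    }
}

from_start = resolve_journey(toy_times, "37", 2, start_num="1")
from_end = resolve_journey(toy_times, "37", 2, end_num="7")
unknown_route = resolve_journey(toy_times, "155", 2, start_num="1")
```

Checking membership per direction also sets the direction when both stop numbers are supplied.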


#### Information available now

#### For Route Model

We have all we need; we can now get the origin % and destination %.

#### For Stop-to-Stop Model

We need a list of the stops on the journey

In [None]:
# get list of stops on selected journey
stops_list = get_journey_stops(route, direction, start_stop_num, end_stop_num)
print(f"# stops on journey: {len(stops_list)}") # check length, should be #stop + 1 (the start stop)
print(stops_list)

> This checks out with the start and end stop numbers.

***

[Back to top](#top)