<h1> <a href="https://gtfs.org/">GTFS: General Public Transit Feed Specification</a></h1>

Around the world, public transit agencies make data available about their services, routes, and stops via a standardized data format called <a href="https://gtfs.org/">GTFS</a> (originally developed by Google). 

It has two parts: the static component contains information that changes rarely including locations of stops, routes and schedules. A new version of this static information is typically released every few months. Some agencies also provide a real-time component based on live GPS data from their buses, trains etc to provide up to the minute data about vehicle positions and arrival predictions - typically updated every 30 seconds.

This practical exercise will be based on only the static GTFS data.

Start by downloading the current GTFS schedule data for South East Queensland from:
https://gtfsrt.api.translink.com.au/ (https://gtfsrt.api.translink.com.au/GTFS/SEQ_GTFS.zip)

You will need to upload the following files to your Jupyter account in the cloud:
- <code>calendar.txt</code>
- <code>routes.txt</code>
- <code>stops.txt</code>
- <code>stop_times.txt</code>
- <code>trips.txt</code>

# Finding our way to the CBD via public transport
Our goal is to travel from where we live to the Bribane CBD via public transport.
We don't know where the closest stop is, we don't know which route the trains or buses follow and we don't know when those buses or trains will arrive. 

Once you have <code>stops.txt</code> uploaded to your Jupyter account, open it to view its contents.

In [2]:
# Start by reading stops.txt into a pandas data frame using read_csv method and set the stop_id column as the index

import pandas
stops = pandas.read_csv('stops.txt', index_col = 0)

# display its contents
stops

Unnamed: 0_level_0,stop_code,stop_name,stop_desc,stop_lat,stop_lon,zone_id,stop_url,location_type,parent_station,platform_code
stop_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,1.0,Herschel Street Stop 1 near North Quay,,-27.467834,153.019079,1,https://translink.com.au/stop/000001/gtfs/,0,,
10,10.0,Ann Street Stop 10 at King George Square,,-27.468003,153.023970,1,https://translink.com.au/stop/000010/gtfs/,0,,
100,100.0,Parliament Stop 94A Margaret St,,-27.473751,153.026745,1,https://translink.com.au/stop/000100/gtfs/,0,,
1000,1000.0,Handford Rd at Songbird Way,,-27.339069,153.043907,2,https://translink.com.au/stop/001000/gtfs/,0,,
10000,10000.0,Balcara Ave near Allira Cr,,-27.344106,153.024982,2,https://translink.com.au/stop/010000/gtfs/,0,,
...,...,...,...,...,...,...,...,...,...,...
place_pinesc,,The Pines Shopping Centre,,-28.134660,153.469767,,,1,,
place_inttbl,,Toombul Shopping Centre interchange,,-27.408269,153.059963,,,1,,
place_intuq,,UQ Chancellors Place,,-27.497970,153.011136,,,1,,
place_scuniv,,USC station,,-26.718756,153.062004,,,1,,


In [3]:
# There are thousands of stops across south east Queensland. Our first goal is to find some stops near to where we live.

# We start by determining the longitude and latitude of the property where we live.
# Open google maps https://www.google.com/maps and locate the property where you currently live.
# Put a pin on that location and make note of the longitude and latitude. 
# The longitude should be about 153 and the latitude about -27

my_longitude = -27.38380639217319
my_latitude = 152.9595649359856

In [4]:
# Next we need to be able to measure the distance from our property to each of the stops. 
# To measure the distance between two sets of  longitude and latitude, we need to use a formula, 
# such as the haversine formula (https://en.wikipedia.org/wiki/Haversine_formula) to determine the
# distance between two points on a sphere (since the earth is not flat).
# The earth is not a perfect sphere, it's radius varies at different points, but we approximate its radius as 6371 kilometres.

import math

# https://en.wikipedia.org/wiki/Haversine_formula
def haversine_distance(lon1, lat1, lon2, lat2):
      # convert decimal degrees to radians 
      lon1 = math.radians(lon1)
      lat1 = math.radians(lat1)
      lon2 = math.radians(lon2)
      lat2 = math.radians(lat2)
        
      # haversine formula 
      dlon = lon2 - lon1 
      dlat = lat2 - lat1 
      a =  math.sin(dlat/2)**2 +  math.cos(lat1) * math.cos(lat2) *  math.sin(dlon/2)**2
      c = 2 * math.asin( math.sqrt(a)) 
      r = 6371 # Radius of earth in kilometers.
      return c * r
    
# test case
haversine_distance(-27.467834, 153.019079, -27.371936, 153.099357) # should be about 13 kilometres

13.040284606719029

In [5]:
# We can then use the function to compute the distance from our specified longitude and latitude, to each stop

def near(stop_row, lon, lat) :
    return haversine_distance(lon, lat, stop_row.stop_lat, stop_row.stop_lon)

stops['dist_from_home'] = stops.apply(near, lon=my_longitude, lat=my_latitude, axis=1)
stops # see the new column ...

Unnamed: 0_level_0,stop_code,stop_name,stop_desc,stop_lat,stop_lon,zone_id,stop_url,location_type,parent_station,platform_code,dist_from_home
stop_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
1,1.0,Herschel Street Stop 1 near North Quay,,-27.467834,153.019079,1,https://translink.com.au/stop/000001/gtfs/,0,,,10.634240
10,10.0,Ann Street Stop 10 at King George Square,,-27.468003,153.023970,1,https://translink.com.au/stop/000010/gtfs/,0,,,10.993769
100,100.0,Parliament Stop 94A Margaret St,,-27.473751,153.026745,1,https://translink.com.au/stop/000100/gtfs/,0,,,11.627712
1000,1000.0,Handford Rd at Songbird Way,,-27.339069,153.043907,2,https://translink.com.au/stop/001000/gtfs/,0,,,10.373096
10000,10000.0,Balcara Ave near Allira Cr,,-27.344106,153.024982,2,https://translink.com.au/stop/010000/gtfs/,0,,,8.269263
...,...,...,...,...,...,...,...,...,...,...,...
place_pinesc,,The Pines Shopping Centre,,-28.134660,153.469767,,,1,,,93.667157
place_inttbl,,Toombul Shopping Centre interchange,,-27.408269,153.059963,,,1,,,11.423856
place_intuq,,UQ Chancellors Place,,-27.497970,153.011136,,,1,,,12.680094
place_scuniv,,USC station,,-26.718756,153.062004,,,1,,,66.873603


In [6]:
# We can then sort the stops by this new column using the sort_values method

nearby_stops = stops.sort_values('dist_from_home')
nearby_stops

Unnamed: 0_level_0,stop_code,stop_name,stop_desc,stop_lat,stop_lon,zone_id,stop_url,location_type,parent_station,platform_code,dist_from_home
stop_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
13131,13131.0,Collins Rd near Hawkes Ave,,-27.382441,152.960972,2,https://translink.com.au/stop/013131/gtfs/,0,,,0.206800
13122,13122.0,"Collins Rd at Hillenvale, stop 1",,-27.382313,152.956593,2,https://translink.com.au/stop/013122/gtfs/,0,,,0.362052
14060,14060.0,Bunya Rd near Arlington Dr,,-27.386607,152.956511,2,https://translink.com.au/stop/014060/gtfs/,0,,,0.438462
14049,14049.0,Bunya Rd near South Pine Rd,,-27.388012,152.960901,2,https://translink.com.au/stop/014049/gtfs/,0,,,0.442226
14061,14061.0,Bunya Rd near Arlington Dr,,-27.386742,152.956495,2,https://translink.com.au/stop/014061/gtfs/,0,,,0.448393
...,...,...,...,...,...,...,...,...,...,...,...
600498,600498.0,"Cooran station, platform 1",,-26.334098,152.822809,8,https://translink.com.au/stop/600498/gtfs/,0,place_crnsta,1,105.006026
place_trvsta,,Traveston station,,-26.321113,152.784072,,,1,,,106.961227
600499,600499.0,"Traveston station, platform 1",,-26.320895,152.783804,8,https://translink.com.au/stop/600499/gtfs/,0,place_trvsta,1,106.987754
place_gymsta,,Gympie North station,,-26.159390,152.682995,,,1,,,124.958849


In [7]:
# Let's choose the first of these stops and see which buses or trains are coming soon and where they are going to ...
our_stop_id = nearby_stops.index[0]
our_stop_id

'13131'

In [8]:
# Read stop_times.txt into a data frame using the read_csv method.
# set the data type of the stop_id column to type string by adding parameter: dtype={'stop_id':'str'}

stop_times = pandas.read_csv('stop_times.txt', dtype={'stop_id':'str'})

In [9]:
# View just those stop_time rows that match our stop_id

stop_times[stop_times.stop_id==our_stop_id]

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
58678,23809006-BT 22_23-32984,05:51:00,05:51:00,13131,1,0.0,0.0
63627,23809170-BT 22_23-32984,06:21:00,06:21:00,13131,1,0.0,0.0
73367,23809504-BT 22_23-32984,06:51:00,06:51:00,13131,1,0.0,0.0
86722,23810000-BT 22_23-32984,07:21:00,07:21:00,13131,1,0.0,0.0
98747,23810484-BT 22_23-32984,07:51:00,07:51:00,13131,1,0.0,0.0
259111,23816008-BT 22_23-32984,17:02:00,17:02:00,13131,11,0.0,0.0
271566,23816463-BT 22_23-32984,17:33:00,17:33:00,13131,11,0.0,0.0
277570,23816680-BT 22_23-32984,17:47:00,17:47:00,13131,11,0.0,0.0
287653,23817049-BT 22_23-32984,18:17:00,18:17:00,13131,11,0.0,0.0
295418,23817325-BT 22_23-32984,18:49:00,18:49:00,13131,11,0.0,0.0


In [10]:
# Not all of those trips we necessarily be coming today. 
# Transit agencies run different schedules on different days of the week, especially for weekends and public holidays.
# To learn about these service schedules we need to load the calendar.txt file into a data frame.
# Set the service_id column as the index and parse the two date columns as dates

services = pandas.read_csv('calendar.txt', index_col = 0, parse_dates=['start_date','end_date'])
services

Unnamed: 0_level_0,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
service_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
ATS 22_23-31002,1,1,1,1,1,0,0,2023-05-09,2023-05-09
ATS 22_23-33233,0,0,0,0,0,1,0,2023-05-13,2023-05-13
ATS 22_23-33230,0,0,0,0,0,0,1,2023-05-14,2023-05-14
ATS 22_23-30826,1,1,1,1,1,0,0,2023-05-15,2023-05-16
ATS 22_23-31170,1,1,1,1,1,0,0,2023-05-17,2023-05-17
...,...,...,...,...,...,...,...,...,...
TDEV 23_24-33292,0,0,0,0,0,0,1,2023-07-02,2023-07-02
WBS 22_23-33175,1,1,1,1,1,0,0,2023-05-09,2023-06-23
WBS 22_23-33173,0,0,0,0,0,1,0,2023-05-13,2023-06-24
WBS 22_23-33171,0,0,0,0,0,0,1,2023-05-14,2023-06-25


In [11]:
# Start by viewing only those services that run on this day of the week.
# So, for example, if today is a Thurdsday, then we require services.thursday == 1

services[services.thursday == 1]

Unnamed: 0_level_0,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
service_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
ATS 22_23-31002,1,1,1,1,1,0,0,2023-05-09,2023-05-09
ATS 22_23-30826,1,1,1,1,1,0,0,2023-05-15,2023-05-16
ATS 22_23-31170,1,1,1,1,1,0,0,2023-05-17,2023-05-17
ATS 22_23-31000,1,1,1,1,1,0,0,2023-05-22,2023-05-23
BBL 22_23-30064,1,1,1,1,1,0,0,2023-05-09,2023-06-23
BBL 23_24-33204,1,1,1,1,1,0,0,2023-06-26,2023-07-07
BCC 22_23-32979,1,1,1,1,0,0,0,2023-05-09,2023-05-11
BCC 22_23-33299,1,1,1,1,0,0,0,2023-05-15,2023-06-22
BITS 22_23-30132,1,1,1,1,1,0,0,2023-05-09,2023-06-23
BITS 23_24-33239,1,1,1,1,1,0,0,2023-06-26,2023-07-07


In [12]:
# We also need to ensure that today falls within the start_date and end_date period of that service.
# For that we need to know today's date ...
import pytz
timezone = pytz.timezone('Australia/Brisbane')
today = pandas.Timestamp.now(tz=timezone).tz_localize(None)

In [13]:
# Find the list of service_ids for services that run today and are within the service start and end dates

todays_services = services[(services.thursday == 1) & (services.start_date <= today) & (today <= services.end_date)].index
todays_services

Index(['BBL 22_23-30064', 'BCC 22_23-32979', 'BITS 22_23-30132',
       'CDC 22_23-33264', 'BT 22_23-32984', 'BT 22_23-32984-1111000',
       'BT 22_23-32984-0111100', 'CBL 22_23-32696', 'CBL 22_23-32696-0001000',
       'GCLR 22_23-31205', 'GUNM 23-33029', 'HBL 22_23-33245',
       'KBL 22_23-32836', 'LCBS 22_23-33016', 'LCBS 22_23-33016-0001000',
       'LOGC 22_23-30141', 'LBS 22_23-33186', 'MGB 22_23-33199',
       'PRT 22_23-32673', 'SBL 22_23-32953', 'SBL 22_23-32953-1111000',
       'SUN 22_23-33168', 'SUN 22_23-33168-1111000', 'SUN 22_23-33168-0001000',
       'TBS 22_23-32138', 'TDEV 22_23-33085', 'WBS 22_23-33175'],
      dtype='object', name='service_id')

In [14]:
# Next we need to learn which trips occur on those services, so we need to load trips.txt into a pandas data frame.
# Set the trip_id column as the index.

trips = pandas.read_csv('trips.txt', index_col = 2)
trips

Unnamed: 0_level_0,route_id,service_id,trip_headsign,direction_id,block_id,shape_id
trip_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
20987679-ATS 22_23-30826,R581-2458,ATS 22_23-30826,"City, Roma St station",1,,R5810051
20987680-ATS 22_23-30826,R581-2458,ATS 22_23-30826,"City, Roma St station",1,,R5810051
20987681-ATS 22_23-30826,R581-2458,ATS 22_23-30826,"City, Roma St station",1,,R5810051
20987682-ATS 22_23-30826,R581-2458,ATS 22_23-30826,"City, Roma St station",1,,R5810051
20987683-ATS 22_23-30826,R581-2458,ATS 22_23-30826,"City, Roma St station",1,,R5810051
...,...,...,...,...,...,...
24082264-WBS 22_23-33171,528-3072,WBS 22_23-33171,Springfield station,0,,5280045
24082265-WBS 22_23-33171,533-3072,WBS 22_23-33171,Orion Springfield Central Anti-Clockwise,1,,5330018
24082266-WBS 22_23-33171,527-3072,WBS 22_23-33171,Orion Springfield Central,1,,5270039
24082267-WBS 22_23-33171,526-3072,WBS 22_23-33171,Springfield Central,1,,5260024


In [15]:
# To test if a trip is part of a service, we can use the isin method
# trips.service_id.isin(todays_services)

# Find the list of trip_ids for those trips
todays_trips = trips[trips.service_id.isin(todays_services)].index

todays_trips

Index(['20268222-BBL 22_23-30064', '20268223-BBL 22_23-30064',
       '20268224-BBL 22_23-30064', '20268225-BBL 22_23-30064',
       '23797266-BCC 22_23-32979', '23797267-BCC 22_23-32979',
       '23797268-BCC 22_23-32979', '23797269-BCC 22_23-32979',
       '23797270-BCC 22_23-32979', '23797271-BCC 22_23-32979',
       ...
       '24085502-WBS 22_23-33175', '24085503-WBS 22_23-33175',
       '24085504-WBS 22_23-33175', '24085505-WBS 22_23-33175',
       '24085506-WBS 22_23-33175', '24085507-WBS 22_23-33175',
       '24085508-WBS 22_23-33175', '24085509-WBS 22_23-33175',
       '24085510-WBS 22_23-33175', '24085511-WBS 22_23-33175'],
      dtype='object', name='trip_id', length=18090)

In [16]:
# We can then use this list of trip ids to find stop times matching these trip ids.
# stop_times.trip_id.isin(todays_trips)

# Find all stop times that stop at our stop today.
stop_times[(stop_times.stop_id==our_stop_id) & (stop_times.trip_id.isin(todays_trips)) ]

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
58678,23809006-BT 22_23-32984,05:51:00,05:51:00,13131,1,0.0,0.0
63627,23809170-BT 22_23-32984,06:21:00,06:21:00,13131,1,0.0,0.0
73367,23809504-BT 22_23-32984,06:51:00,06:51:00,13131,1,0.0,0.0
86722,23810000-BT 22_23-32984,07:21:00,07:21:00,13131,1,0.0,0.0
98747,23810484-BT 22_23-32984,07:51:00,07:51:00,13131,1,0.0,0.0
259111,23816008-BT 22_23-32984,17:02:00,17:02:00,13131,11,0.0,0.0
271566,23816463-BT 22_23-32984,17:33:00,17:33:00,13131,11,0.0,0.0
277570,23816680-BT 22_23-32984,17:47:00,17:47:00,13131,11,0.0,0.0
287653,23817049-BT 22_23-32984,18:17:00,18:17:00,13131,11,0.0,0.0
295418,23817325-BT 22_23-32984,18:49:00,18:49:00,13131,11,0.0,0.0


In [17]:
# We aren't interested in trying to catch any trains or buses that have already departed, 
# so view only those stop times that have an arrival_time after the time now.

time_now = today.strftime('%H:%M:%S')

arriving_soon = stop_times[(stop_times.stop_id==our_stop_id) & (stop_times.trip_id.isin(todays_trips)) & (time_now <= stop_times.arrival_time)  ]
arriving_soon

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
259111,23816008-BT 22_23-32984,17:02:00,17:02:00,13131,11,0.0,0.0
271566,23816463-BT 22_23-32984,17:33:00,17:33:00,13131,11,0.0,0.0
277570,23816680-BT 22_23-32984,17:47:00,17:47:00,13131,11,0.0,0.0
287653,23817049-BT 22_23-32984,18:17:00,18:17:00,13131,11,0.0,0.0
295418,23817325-BT 22_23-32984,18:49:00,18:49:00,13131,11,0.0,0.0


In [18]:
# That's great, but we don't know where any of these trains or buses are going to ...
# So, we start by joining this stop_time data with the trips data frame
stops_with_trips = arriving_soon.join(trips, on='trip_id')
stops_with_trips

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,route_id,service_id,trip_headsign,direction_id,block_id,shape_id
259111,23816008-BT 22_23-32984,17:02:00,17:02:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002
271566,23816463-BT 22_23-32984,17:33:00,17:33:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002
277570,23816680-BT 22_23-32984,17:47:00,17:47:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002
287653,23817049-BT 22_23-32984,18:17:00,18:17:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002
295418,23817325-BT 22_23-32984,18:49:00,18:49:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002


In [None]:
# We now have a trip_headsign column, which may help us determine where the bus or train is going
# We also now have a route_id, but it's not particularly meaningful.
# To get information about the route we need to join our stop_time and trip data with the route.txt data.

In [19]:
# Read routes.txt into a pandas data frame.
# Set the route_id column as the index
routes = pandas.read_csv('routes.txt', index_col = 0)

In [20]:
# Join our stop_time and route data frame with the routes data frame based on the 'route_id' column

full = arriving_soon.join(trips, on='trip_id').join(routes, on='route_id')
full

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,route_id,service_id,trip_headsign,direction_id,block_id,shape_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
259111,23816008-BT 22_23-32984,17:02:00,17:02:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002,396,Mitchelton - Arana Hills,,3,https://jp.translink.com.au/plan-your-journey/...,8DC63F,0
271566,23816463-BT 22_23-32984,17:33:00,17:33:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002,396,Mitchelton - Arana Hills,,3,https://jp.translink.com.au/plan-your-journey/...,8DC63F,0
277570,23816680-BT 22_23-32984,17:47:00,17:47:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002,396,Mitchelton - Arana Hills,,3,https://jp.translink.com.au/plan-your-journey/...,8DC63F,0
287653,23817049-BT 22_23-32984,18:17:00,18:17:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002,396,Mitchelton - Arana Hills,,3,https://jp.translink.com.au/plan-your-journey/...,8DC63F,0
295418,23817325-BT 22_23-32984,18:49:00,18:49:00,13131,11,0.0,0.0,396-3057,BT 22_23-32984,"Arana Hills, Bunya Rd",1,,3960002,396,Mitchelton - Arana Hills,,3,https://jp.translink.com.au/plan-your-journey/...,8DC63F,0


In [21]:
# Filter the output so that we only see the trip_id, arrival_time, route_short_name, route_long_name, trip_headsign
show = full[['trip_id','arrival_time', 'route_short_name', 'route_long_name', 'trip_headsign']]
show

Unnamed: 0,trip_id,arrival_time,route_short_name,route_long_name,trip_headsign
259111,23816008-BT 22_23-32984,17:02:00,396,Mitchelton - Arana Hills,"Arana Hills, Bunya Rd"
271566,23816463-BT 22_23-32984,17:33:00,396,Mitchelton - Arana Hills,"Arana Hills, Bunya Rd"
277570,23816680-BT 22_23-32984,17:47:00,396,Mitchelton - Arana Hills,"Arana Hills, Bunya Rd"
287653,23817049-BT 22_23-32984,18:17:00,396,Mitchelton - Arana Hills,"Arana Hills, Bunya Rd"
295418,23817325-BT 22_23-32984,18:49:00,396,Mitchelton - Arana Hills,"Arana Hills, Bunya Rd"


In [23]:
# Lets select one of those trips to explore precisely where it goes ...
our_trip_id = show.iloc[0,0]

In [24]:
# Find all stop_times for our trip_id (do not restrict to our stop_id)

my_stops = stop_times[stop_times.trip_id == our_trip_id]
my_stops

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
259101,23816008-BT 22_23-32984,16:51:00,16:51:00,5011,1,0.0,0.0
259102,23816008-BT 22_23-32984,16:54:00,16:54:00,2193,2,0.0,0.0
259103,23816008-BT 22_23-32984,16:55:00,16:55:00,2096,3,0.0,0.0
259104,23816008-BT 22_23-32984,16:56:00,16:56:00,10355,4,0.0,0.0
259105,23816008-BT 22_23-32984,16:57:00,16:57:00,2120,5,0.0,0.0
259106,23816008-BT 22_23-32984,16:57:00,16:57:00,761,6,0.0,0.0
259107,23816008-BT 22_23-32984,16:58:00,16:58:00,2741,7,0.0,0.0
259108,23816008-BT 22_23-32984,16:59:00,16:59:00,2208,8,0.0,0.0
259109,23816008-BT 22_23-32984,16:59:00,16:59:00,1207,9,0.0,0.0
259110,23816008-BT 22_23-32984,17:01:00,17:01:00,2661,10,0.0,0.0


In [25]:
# Unfortunately, these stop_ids don't mean anything to us,
# so we need to join this data with the stops data frame
# display only the arrival_time and stop_name
my_stops.join(stops, on='stop_id')[['arrival_time', 'stop_name']]

Unnamed: 0,arrival_time,stop_name
259101,16:51:00,Mitchelton Rail station
259102,16:54:00,"Brookside Shopping Centre station, platform B"
259103,16:55:00,"Osborne Rd near Northmore St, stop 52"
259104,16:56:00,"Osborne Rd at Osborne North, stop 51"
259105,16:57:00,"Camelia Ave at Camelia - Galeola, stop 50"
259106,16:57:00,"Camelia Ave at Violet St, stop 49"
259107,16:58:00,"Camelia Ave at Camelia/Nymphaea, stop 48"
259108,16:59:00,"Camelia Ave at Mirbelia St, stop 47"
259109,16:59:00,South Pine Rd near Basand St
259110,17:01:00,South Pine Rd near Plucks Rd


In [None]:
# Will this get us towards the Brisbane CBD? If not, explore some other options.