The goal of this example is to extract the route details, stops, and schedule for one trip of a particular route.

In [1]:
import pandas as pd

I live in Seattle and want to find information about bus line 7, which runs near where I live. Seattle bus service is provided by King County Metro (KCM). I've downloaded the GTFS files for KCM from https://transitfeeds.com/. Side note: https://transitfeeds.com/ hosts feeds for many agencies, so you may find the one you're looking for there. I've saved the KCM data under the ./data/kcm/ directory. Change the directory to wherever you stored your GTFS feed.

In [2]:
bus_num = "7"

In [3]:
dir_path="./data/kcm"

Let's load routes.txt. (Remember that even though the file extension is .txt, it is actually a CSV file.)

In [4]:
routes_file=dir_path+"/routes.txt"

In [5]:
df = pd.read_csv(routes_file)
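Every GTFS table is a CSV file with a .txt extension, so a small helper can load any of them by name. This is a minimal sketch, not part of the original notebook: the helper name `load_gtfs_table` is hypothetical, and reading every column as a string (`dtype=str`) is an assumption made so that ID columns keep their exact text and are not silently converted to numbers.

```python
import os

import pandas as pd


def load_gtfs_table(feed_dir, table):
    """Load one GTFS table, e.g. table="routes" reads <feed_dir>/routes.txt.

    Hypothetical helper: every column is read as a string so that IDs
    (route_id, trip_id, stop_id, ...) keep their exact text instead of
    being converted to numbers.
    """
    path = os.path.join(feed_dir, table + ".txt")
    # GTFS .txt files are plain comma-separated values with a header row
    return pd.read_csv(path, dtype=str)


# Usage, assuming the feed lives in ./data/kcm as above:
# routes = load_gtfs_table("./data/kcm", "routes")
```

Note that with string IDs, comparisons elsewhere need string operands too (for example `"100263"` rather than `100263`).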

Let's examine the first few rows of the file.

In [6]:
df.head()

,route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
0,100001,KCM,1,,Kinnear - Downtown Seattle,3,http://metro.kingcounty.gov/schedules/001/n0.html,,
1,100002,KCM,10,,Capitol Hill - Downtown Seattle,3,http://metro.kingcounty.gov/schedules/010/n0.html,,
2,100003,KCM,101,,Renton Transit Center - Downtown Seattle,3,http://metro.kingcounty.gov/schedules/101/n0.html,,
3,100004,KCM,105,,Renton Highlands - Renton Transit Center,3,http://metro.kingcounty.gov/schedules/105/n0.html,,
4,100005,KCM,106,,Renton Transit Center - Skyway - Downtown Seattle,3,http://metro.kingcounty.gov/schedules/106/n0.html,,


As expected, routes.txt lists all the routes the agencies operate, along with each route's short name and the description of its path that is displayed on the bus. Let's explore this file a bit more:

How many routes are there?

In [7]:
df['route_id'].nunique()

215

Are all these routes operated by the same agency?

In [8]:
df['agency_id'].nunique()

4

routes.txt lists 4 agencies. What are they, and how many routes does each agency operate?

In [9]:
df['agency_id'].value_counts()

KCM    203
ST       8
KMD      2
EOS      2
Name: agency_id, dtype: int64

When I first started, I didn't realize that a single GTFS feed can contain data from several agencies.
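Because one feed can mix several agencies, it seems safer to look a route up by its short name *and* its agency_id; I'm assuming here that short names are only meant to be unique within an agency. A minimal sketch follows. The function name `find_route` is hypothetical, and the second toy row (agency "XYZ", route_id 999999) is invented for illustration, not taken from the KCM feed.

```python
import pandas as pd


def find_route(routes, short_name, agency_id):
    """Return the routes.txt rows matching both short name and agency (hypothetical helper)."""
    # Compare as strings so "7" matches whether the column was parsed as text or numbers
    mask = (routes["route_short_name"].astype(str) == str(short_name)) & (
        routes["agency_id"] == agency_id
    )
    return routes[mask]


# Toy data: the second row is made up to show two agencies sharing a short name
routes = pd.DataFrame(
    {
        "route_id": [100263, 999999],
        "agency_id": ["KCM", "XYZ"],
        "route_short_name": ["7", "7"],
    }
)
print(find_route(routes, "7", "KCM")["route_id"].tolist())  # [100263]
```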

Does the route for bus line 7 even exist in this routes.txt file?

In [10]:
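# Check only route_short_name; searching the whole table could also match an unrelated column
if (df['route_short_name'].astype(str) == bus_num).any():
    print("Yes, {} route is in the routes list".format(bus_num))
else:
    print("No. Route {} is not part of this routes list".format(bus_num))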

Yes, 7 route is in the routes list


Great! Let's see what information we have about bus line 7 in this file.

In [11]:
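# Select the row whose short name is bus_num, then transpose it for easier reading
print(df[df['route_short_name'].astype(str) == bus_num].T)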

                                                                132
route_id                                                     100263
agency_id                                                       KCM
route_short_name                                                  7
route_long_name                                                 NaN
route_desc           Prentice St - Rainier Beach - Downtown Seattle
route_type                                                        3
route_url         http://metro.kingcounty.gov/schedules/007/n0.html
route_color                                                     NaN
route_text_color                                                NaN


Let's examine this information (see https://developers.google.com/transit/gtfs/reference/#routestxt for the full field definitions):
route_id is the internal ID King County Metro uses for this route.
route_short_name is the bus line number.
route_desc is what you would see on the bus's display board, along with the short name '7'.
route_type is 3, which indicates a bus (as opposed to, e.g., a train or a subway).
route_url links to the schedule on KCM's website.
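route_type is just a number, so a small lookup table makes it readable. The labels below are paraphrased from the basic route types in the GTFS reference linked above; feeds may also use extended codes that this sketch does not cover. `ROUTE_TYPES` and `describe_route_type` are hypothetical names.

```python
# Basic route_type codes from the GTFS reference (routes.txt), paraphrased;
# extended codes used by some feeds are not covered here.
ROUTE_TYPES = {
    0: "Tram, streetcar or light rail",
    1: "Subway or metro",
    2: "Rail",
    3: "Bus",
    4: "Ferry",
    5: "Cable tram",
    6: "Aerial lift",
    7: "Funicular",
    11: "Trolleybus",
    12: "Monorail",
}


def describe_route_type(code):
    """Translate a numeric route_type (int or numeric string) into a label."""
    return ROUTE_TYPES.get(int(code), "Unknown or extended type {}".format(code))


print(describe_route_type(3))  # Bus
```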

Recall that a particular instance of a route is called a trip. Trip information is contained in trips.txt, and the link between these two files is the route_id. See the relations picture.

In [12]:
bus_route_id = df.loc[df['route_short_name'].astype(str) == bus_num, 'route_id']

In [13]:
print(bus_route_id)

132    100263
Name: route_id, dtype: int64


Let's load the trips.txt file. Again, remember that this is a CSV file with a .txt extension.

In [14]:
trips_file = dir_path+"/trips.txt"

In [15]:
trips_df = pd.read_csv(trips_file)

In [16]:
# bus_route_id is already a Series of IDs, so pass it to isin() directly
bus_trips = trips_df[trips_df['route_id'].isin(bus_route_id)]

In [17]:
bus_trips.head()

,route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,peak_flag,fare_id
17964,100263,75804,41980848,Rainier Beach Via Rainier Ave S,LOCAL,0,5580995,21007003,0,101
17965,100263,75804,41980851,Rainier Beach Via Rainier Ave S,LOCAL,0,5580999,21007003,0,101
17966,100263,75804,41980853,Rainier Beach Via Rainier Ave S,LOCAL,0,5580998,21007008,0,101
17967,100263,75804,41980857,Rainier Beach Via Rainier Ave S,LOCAL,0,5581000,21007008,0,101
17968,100263,75804,41980862,Rainier Beach Via Rainier Ave S,LOCAL,0,5580995,21007003,0,101


As expected, every row in the output has the same route_id, since we're examining the bus 7 route. trip_id is unique to each trip. service_id identifies the service calendar (the days on which the trip runs) that this trip follows. We'll leave the rest aside for now.
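Although we're setting the other columns aside, direction_id offers a quick way to see that a route's trips run in two directions (GTFS uses 0 and 1). A sketch that counts trips per direction and headsign follows, run on toy rows: the first two come from the output above, while the direction-1 rows (trip IDs 90000001/90000002, headsign "Downtown Seattle") are invented for illustration. On the real data you would pass `bus_trips`. `trips_per_direction` is a hypothetical name.

```python
import pandas as pd


def trips_per_direction(trips):
    """Count trips for each (direction_id, trip_headsign) pair."""
    return (
        trips.groupby(["direction_id", "trip_headsign"])
        .size()
        .rename("n_trips")
        .reset_index()
    )


# Toy rows; the direction_id == 1 rows are made up
trips = pd.DataFrame(
    {
        "trip_id": [41980848, 41980851, 90000001, 90000002],
        "direction_id": [0, 0, 1, 1],
        "trip_headsign": [
            "Rainier Beach Via Rainier Ave S",
            "Rainier Beach Via Rainier Ave S",
            "Downtown Seattle",
            "Downtown Seattle",
        ],
    }
)
print(trips_per_direction(trips))
```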

Schedule information is in stop_times.txt. Its columns are: trip_id, arrival_time, departure_time, stop_id, stop_sequence, stop_headsign, pickup_type, drop_off_type, shape_dist_traveled. The ones we're currently interested in are trip_id (to look up the trip we want), stop_id (to find out which stops are on this trip), and arrival_time and departure_time at each stop.
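stop_times.txt tends to be the largest file in a feed, so it can help to read only the columns listed above and to fix their types up front. The following sketch uses pandas' `usecols` and `dtype` options of `read_csv`; `read_stop_times` is a hypothetical name. Reading IDs as strings is an assumption that differs from the notebook's plain `pd.read_csv` call, so a later comparison such as `trip_id == a_bus_trip.trip_id` would need both sides to be strings.

```python
import io

import pandas as pd

# Only the columns this walkthrough needs
STOP_TIMES_COLUMNS = ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"]


def read_stop_times(path_or_buffer):
    """Read stop_times.txt keeping only the columns we use.

    IDs and times are read as strings; times stay strings because GTFS
    allows values past 24:00:00 for trips that run after midnight.
    """
    return pd.read_csv(
        path_or_buffer,
        usecols=STOP_TIMES_COLUMNS,
        dtype={
            "trip_id": str,
            "stop_id": str,
            "arrival_time": str,
            "departure_time": str,
            "stop_sequence": int,
        },
    )


# Two rows of this trip's stop times, wrapped in an in-memory file
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,"
    "pickup_type,drop_off_type,shape_dist_traveled\n"
    "41980848,15:21:00,15:21:00,880,1,,0,0,0.0\n"
    "41980848,15:26:00,15:26:00,430,20,,0,0,2405.8\n"
)
print(read_stop_times(sample))
```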

In [18]:
stop_times_file = dir_path+"/stop_times.txt"

In [19]:
stop_times_df = pd.read_csv(stop_times_file)



As always, let's examine the first few rows of this file.

In [20]:
stop_times_df.head()

,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
0,34745814,07:04:00,07:04:00,74232,1,,0,0,0.0
1,34745814,07:04:18,07:04:18,74731,10,,0,0,435.7
2,34745814,07:04:52,07:04:52,74734,13,,0,0,1284.7
3,34745814,07:05:44,07:05:44,85450,16,,0,0,2554.8
4,34745814,07:09:00,07:09:00,70516,47,,0,0,7366.2


Let's pick one trip of route 7 and look at its times. For simplicity, we'll take the first trip listed for route 7.

In [21]:
a_bus_trip = bus_trips.iloc[0]

In [22]:
print(a_bus_trip)

route_id                                    100263
service_id                                   75804
trip_id                                   41980848
trip_headsign      Rainier Beach Via Rainier Ave S
trip_short_name                              LOCAL
direction_id                                     0
block_id                                   5580995
shape_id                                  21007003
peak_flag                                        0
fare_id                                        101
Name: 17964, dtype: object


Let's extract the rows from stop_times dataframe that correspond to this particular trip.

In [23]:
a_bus_trip_stop_times = stop_times_df.loc[stop_times_df.trip_id == a_bus_trip.trip_id]

In [24]:
a_bus_trip_stop_times.head()

,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
652708,41980848,15:21:00,15:21:00,880,1,,0,0,0.0
652709,41980848,15:26:00,15:26:00,430,20,,0,0,2405.8
652710,41980848,15:28:00,15:28:00,450,25,,0,0,3493.3
652711,41980848,15:31:19,15:31:19,480,30,,0,0,4890.6
652712,41980848,15:33:32,15:33:32,500,35,,0,0,5826.1
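Here the rows happen to come out in route order, but as far as I know GTFS does not require stop_times.txt to be sorted, and the stop_sequence values above (1, 20, 25, ...) show they need not be consecutive. A defensive sketch therefore sorts by stop_sequence after filtering. `trip_schedule` is a hypothetical name; the toy rows are taken from the output above and deliberately shuffled.

```python
import pandas as pd


def trip_schedule(stop_times, trip_id):
    """Rows of stop_times for one trip, ordered along the route."""
    rows = stop_times[stop_times["trip_id"] == trip_id]
    # stop_sequence increases along the trip but may skip values,
    # so sort by it rather than relying on file order
    return rows.sort_values("stop_sequence").reset_index(drop=True)


# Toy rows from the output above, shuffled on purpose
stop_times = pd.DataFrame(
    {
        "trip_id": [41980848, 41980848, 41980848],
        "arrival_time": ["15:28:00", "15:21:00", "15:26:00"],
        "stop_id": [450, 880, 430],
        "stop_sequence": [25, 1, 20],
    }
)
print(trip_schedule(stop_times, 41980848)["stop_id"].tolist())  # [880, 430, 450]
```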


Now that we have the arrival and departure times for this trip, we need the names of the stops along the way. The stop_id column identifies the stops on this trip, and the stop names are stored in stops.txt. Let's load that file.

In [25]:
stops_txt = dir_path+"/stops.txt"

In [26]:
stops_df = pd.read_csv(stops_txt)

Let's examine the first few rows of this file.

In [27]:
stops_df.head()

,stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,zone_id,stop_url,location_type,parent_station,stop_timezone
0,1000,,Pine St & 9th Ave,,47.613415,-122.332138,21,,0,,America/Los_Angeles
1,10000,,NE 55th St & 43rd Ave NE,,47.668575,-122.283653,1,,0,,America/Los_Angeles
2,10005,,40th Ave NE & NE 51st St,,47.665886,-122.284897,1,,0,,America/Los_Angeles
3,10010,,NE 55th St & 39th Ave NE,,47.668579,-122.285667,1,,0,,America/Los_Angeles
4,10020,,NE 55th St & 37th Ave NE,,47.668579,-122.2883,1,,0,,America/Los_Angeles


Recall that this file contains every stop in the transit area. stop_id is the unique identifier for a stop. stop_name is what you would see on the bus stop sign. stop_lat and stop_lon give the stop's GPS coordinates. stop_timezone is the time zone the stop is in. For now, we will leave the other columns aside.

We're only interested in the stops this trip of bus line 7 makes, so let's extract just those.

In [28]:
stops = stops_df[stops_df.stop_id.isin(a_bus_trip_stop_times.stop_id)]

In [29]:
# Keep only the columns needed for the merge, using the filtered stops
stop_names = stops[['stop_id','stop_name']]

Now we have the stop_ids and arrival/departure times in one DataFrame, and the stop_ids and stop_names in another. Let's merge them on stop_id to get the complete schedule for this trip.
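The merge below uses pandas' default inner join. A slightly more defensive sketch uses `how="left"`, so a stop_id missing from stops.txt shows up as NaN instead of silently dropping that row, and `validate="many_to_one"`, so pandas raises a `MergeError` if stops.txt unexpectedly lists the same stop_id twice. The toy frames reuse stop IDs and names from this notebook's output; stop 1000 is included only as an unrelated stop.

```python
import pandas as pd

# Toy frames built from rows shown in this notebook
trip_times = pd.DataFrame(
    {
        "stop_id": [880, 430, 450],
        "stop_sequence": [1, 20, 25],
        "arrival_time": ["15:21:00", "15:26:00", "15:28:00"],
    }
)
stop_names = pd.DataFrame(
    {
        "stop_id": [430, 450, 880, 1000],
        "stop_name": [
            "3rd Ave & Pine St",
            "3rd Ave & Union St",
            "Virginia St & 6th Ave",
            "Pine St & 9th Ave",
        ],
    }
)

schedule = trip_times.merge(
    stop_names,
    on="stop_id",
    how="left",  # keep every scheduled stop, even one missing from stops.txt
    validate="many_to_one",  # raise if a stop_id appears twice in stops.txt
).sort_values("stop_sequence")

print(schedule[["stop_sequence", "arrival_time", "stop_name"]])
```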

In [30]:
a_bus_trip_stop_times.merge(stop_names, on='stop_id')

,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled,stop_name
0,41980848,15:21:00,15:21:00,880,1,,0,0,0.0,Virginia St & 6th Ave
1,41980848,15:26:00,15:26:00,430,20,,0,0,2405.8,3rd Ave & Pine St
2,41980848,15:28:00,15:28:00,450,25,,0,0,3493.3,3rd Ave & Union St
3,41980848,15:31:19,15:31:19,480,30,,0,0,4890.6,3rd Ave & Marion St
4,41980848,15:33:32,15:33:32,500,35,,0,0,5826.1,3rd Ave & James St
5,41980848,15:36:02,15:36:02,515,44,,0,0,6881.3,3rd Ave S & S Main St
6,41980848,15:38:00,15:38:00,1471,50,,0,0,7709.5,S Jackson St & 5th Ave S
7,41980848,15:39:32,15:39:32,1480,55,,0,0,8564.5,S Jackson St & Maynard Ave S
8,41980848,15:40:40,15:40:40,1490,59,,0,0,9188.2,S Jackson St & 8th Ave S
9,41980848,15:43:00,15:43:00,8540,63,,0,0,10489.8,S Jackson St & 12th Ave S


Voila! There we have it! The schedule for trip 41980848 of bus line 7! 
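The schedule stores times as text. To answer questions such as how long the trip takes, the following sketch converts a time column into minutes since the first stop; you could apply it to the arrival_time column of the merged table. Parsing with `pd.to_timedelta` rather than as clock times is a choice made here because GTFS times count from the start of the service day and can run past 24:00:00. `minutes_since_first_stop` is a hypothetical name.

```python
import pandas as pd


def minutes_since_first_stop(times):
    """Minutes elapsed since the first entry of a GTFS time column.

    GTFS times are "HH:MM:SS" from the start of the service day and may
    exceed 24:00:00 for trips running past midnight, so they are parsed
    as durations (timedeltas) rather than clock times.
    """
    deltas = pd.to_timedelta(pd.Series(times))
    return (deltas - deltas.iloc[0]).dt.total_seconds() / 60


# Arrival times of the first three stops in the schedule above
print(minutes_since_first_stop(["15:21:00", "15:26:00", "15:31:19"]).tolist())
# A made-up pair of times crossing midnight
print(minutes_since_first_stop(["23:55:00", "24:05:00"]).tolist())
```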

Since we have the stop_ids, we also have each stop's GPS location, so it would be fun to show the information above on a map, including the times. That is for another time!