# SEPTA Data Project
#### William McKee
#### December 2017

SEPTA (Southeastern Pennsylvania Transportation Authority) is the public agency responsible for the public transportation system in Philadelphia and its Pennsylvania suburbs.

This notebook analyzes the SEPTA Bus and Rail data sets downloaded from https://transitfeeds.com.  I downloaded the SEPTA Bus zip file and renamed it from gfts.zip to septa_bus_gfts.zip.  I then downloaded the SEPTA Rail zip file and renamed it from gfts.zip to septa_rail_gfts.zip.

## Data Set Conversion

The code below lists the files in each zip archive, prints the first few lines of each file, extracts the archive into its own directory, and converts each extracted .txt file to a .csv file.

In [1]:
import zipfile
import csv
import os

def read_and_print_first_lines_from_zipped_file(zipfilename, limit):
    """
    Reads a zip file and prints the header line plus the first limit data lines
    from each file contained in the zip file
    zipfilename = zip file name (such as 'example.zip')
    limit = number of data lines to print after the header line of each file
    """
    print()
    print("CONTENTS OF ZIP FILE " + zipfilename + ":")
    print()
    with zipfile.ZipFile(zipfilename, 'r') as z:
        file_name_list = sorted(z.namelist())
        for file in file_name_list:
            print(file)
            with z.open(file, 'r') as input_file:
                for line_number, line in enumerate(input_file):
                    if line_number > limit:
                        break
                    print(line)
            print()
    print()

# Loop through zip files
NUM_LINES = 5
ZIP_FILE_NAMES = ['septa_bus_gfts.zip', 'septa_rail_gfts.zip']
DIRECTORY_NAMES = []
for file in ZIP_FILE_NAMES:
    # Read the zip files and display some file contents
    read_and_print_first_lines_from_zipped_file(file, NUM_LINES)

    # Extract zip file contents
    directory_name = os.path.splitext(file)[0]
    DIRECTORY_NAMES.append(directory_name)
    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall(directory_name)

    # Convert txt files to csv files
    os.chdir(directory_name)
    for input_file in os.listdir('.'):
        with open(input_file, 'r') as in_file:
            # Naive comma split: quote characters stay part of the field, and a
            # quoted field that itself contains a comma would be split in two
            stripped = (line.strip() for line in in_file)
            lines = (line.split(",") for line in stripped if line)
            output_file = os.path.splitext(input_file)[0] + ".csv"
            print("Convert " + input_file + " contents to " + output_file)
            with open(output_file, 'w') as out_file:
                writer = csv.writer(out_file, lineterminator='\n')
                writer.writerows(lines)
            
    # Remove original text files
    for item in os.listdir('.'):
        if item.endswith(".txt"):
            os.remove(item)

    os.chdir('..')


CONTENTS OF ZIP FILE septa_bus_gfts.zip:

agency.txt
b'agency_name,agency_url,agency_timezone,agency_lang,agency_fare_url\r\n'
b'SEPTA,http://www.septa.org,America/New_York,EN,http://www.septa.org/fares/transit/index.html'

calendar.txt
b'service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date\r\n'
b'10,1,1,1,1,1,0,0,20170903,20180224\r\n'
b'11,0,0,0,0,0,0,0,20170903,20180224\r\n'
b'12,0,0,0,0,0,1,0,20170903,20180224\r\n'
b'13,0,0,0,0,0,0,1,20170903,20180224\r\n'
b'16,1,1,1,1,1,0,0,20170903,20180224\r\n'

calendar_dates.txt
b'service_id,date,exception_type\r\n'
b'10,20170904,2\r\n'
b'13,20170904,1\r\n'
b'16,20170904,2\r\n'
b'19,20170904,1\r\n'
b'22,20170904,2\r\n'

fare_attributes.txt
b'fare_id,price,currency_type,payment_method,transfers,transfer_duration\r\n'
b'1,2.50,USD,0,0,0\r\n'
b'2,3.50,USD,0,1,3600\r\n'
b'3,4.50,USD,0,2,3600\r\n'
b'13,7.00,USD,0,0,0\r\n'
b'14,8.00,USD,0,1,3600\r\n'

fare_rules.txt
b'fare_id,origin_id,destination_id\r\n'
b'1,1,1\
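The preview above prints raw bytes (note the `b'...'` prefix and the trailing `\r\n`). As a minimal sketch, and not the method used in this notebook, a zip member can instead be decoded as text and parsed with `csv.reader`, which also copes with quoted fields that contain commas. The helper name `preview_zip_member`, the in-memory archive, and the stop names are all made up for illustration.

```python
import csv
import io
import itertools
import zipfile


def preview_zip_member(zip_file, member, limit):
    """Return the header plus up to `limit` parsed rows of one zip member."""
    with zip_file.open(member, 'r') as raw:
        # utf-8-sig decodes the bytes and drops a leading byte-order mark if present
        text = io.TextIOWrapper(raw, encoding='utf-8-sig', newline='')
        return list(itertools.islice(csv.reader(text), limit + 1))


# Build a tiny in-memory archive standing in for a real GTFS zip file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as z:
    z.writestr('stops.txt',
               'stop_id,stop_name\r\n'
               '1,"Broad St, North"\r\n'
               '2,Market St\r\n')

with zipfile.ZipFile(buffer, 'r') as z:
    rows = preview_zip_member(z, 'stops.txt', 5)

for row in rows:
    print(row)
```

Each row comes back as a list of strings, so the quoted name stays in one field with its quote characters removed.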

## Data Set Basics

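This section explores the CSV files for both bus and rail.  It first prints the size of every file, then groups a few selected files (shapes, trips, and stop times) by a key column, because only those groupings reveal useful information.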

In [2]:
import pandas as pd

# Print sizes of CSV files
for directory in DIRECTORY_NAMES:
    os.chdir(directory)
    print("Looking at " + directory + " contents")
    print()
    for input_file in os.listdir('.'):
        print("Description of " + input_file + ":")
        data_set = pd.read_csv(input_file)
        print(data_set.shape)
        print()
    os.chdir('..')

Looking at septa_bus_gfts contents

Description of agency.csv:
(1, 5)

Description of calendar.csv:
(28, 10)

Description of calendar_dates.csv:
(168, 3)

Description of fare_attributes.csv:
(6, 6)

Description of fare_rules.csv:
(9, 3)

Description of routes.csv:
(139, 7)

Description of shapes.csv:
(570717, 4)

Description of stops.csv:
(13701, 8)

Description of stop_times.csv:
(3121225, 5)

Description of transfers.csv:
(1, 4)

Description of trips.csv:
(52000, 7)

Looking at septa_rail_gfts contents

Description of agency.csv:
(1, 6)

Description of calendar.csv:
(5, 10)

Description of calendar_dates.csv:
(2, 3)

Description of fare_attributes.csv:
(0, 6)

Description of fare_rules.csv:
(0, 5)

Description of routes.csv:
(13, 9)

Description of shapes.csv:
(180235, 4)

Description of stops.csv:
(155, 7)

Description of stop_times.csv:
(25082, 7)

Description of transfers.csv:
(3, 3)

Description of trips.csv:
(1711, 8)
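The cell above changes the working directory with `os.chdir`, which leaves the notebook in a different directory if a cell fails halfway. The following is only a sketch of an alternative that builds paths with `pathlib` and counts rows with the standard `csv` module; `csv_shapes` is a hypothetical helper and the sample file is created just for the demonstration.

```python
import csv
import tempfile
from pathlib import Path


def csv_shapes(directory):
    """Map each CSV file name in `directory` to (data_rows, columns)."""
    shapes = {}
    for path in sorted(Path(directory).glob('*.csv')):
        with path.open(newline='') as handle:
            reader = csv.reader(handle)
            header = next(reader, [])
            # Count the remaining rows one at a time instead of loading the file
            row_count = sum(1 for _ in reader)
        shapes[path.name] = (row_count, len(header))
    return shapes


# Demonstration on a throwaway directory with one small file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'agency.csv'
    sample.write_text('agency_name,agency_url\nSEPTA,http://www.septa.org\n')
    result = csv_shapes(tmp)

print(result)
```

For a well-formed file, the pair should match what `data_set.shape` reports, since the header row is excluded from the row count.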



In [3]:
# Groupby could provide useful information for some files
GROUPBY_FILES_FIELDS = {'shapes.csv': 'shape_id', 
                        'trips.csv': 'route_id', 
                        'stop_times.csv': 'trip_id'}
for directory in DIRECTORY_NAMES:
    os.chdir(directory)
    print("Looking at " + directory + " groupby contents")
    print()
    for input_file in os.listdir('.'):
        if input_file in GROUPBY_FILES_FIELDS:
            this_field = GROUPBY_FILES_FIELDS[input_file]
            print("Description of " + input_file + " Groupings:")
            data_set = pd.read_csv(input_file)
            data_set_distinct = data_set.groupby(this_field)[this_field].count()
            print(data_set_distinct)
            print()
    os.chdir('..')

Looking at septa_bus_gfts groupby contents

Description of shapes.csv Groupings:
shape_id
203286     296
203287     296
203288     280
203290     195
203291     273
203292     188
203293     289
203305     211
203307     189
203308     204
203310     182
203311      95
203312     288
203313      84
203314      84
203315     338
203316     790
203317     896
203318     790
203319     900
203320     872
203322     305
203323     372
203324     387
203325     790
203326     306
203327     321
203329     381
203330     361
203332     814
          ... 
206251     328
206287     253
206288     276
206289     170
206290     249
206291     249
206293     235
206294     257
206305     251
206308     403
206309     578
206310     484
206311    1180
206312     496
206313    1086
206314     597
206315     409
206317     304
206318     381
206319    1001
206320     990
206321     408
206322     370
206323    1079
206324     457
206325     366
206326     884
206327    1337
206329    1419
208290    
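The grouping output has one line per key, so pandas elides the middle of long listings. One possible way to get a shorter overview is to summarize the group sizes instead of printing all of them. This sketch uses made-up shape points rather than the SEPTA files.

```python
import pandas as pd

# Made-up shape points: three shapes with 3, 1 and 2 points
shapes = pd.DataFrame({'shape_id': [10, 10, 10, 20, 30, 30],
                       'shape_pt_sequence': [1, 2, 3, 1, 1, 2]})

# size() counts the rows in each group
points_per_shape = shapes.groupby('shape_id').size()

# describe() condenses the counts into count, mean, min, quartiles and max
summary = points_per_shape.describe()
print(summary)

# nlargest() lists only the groups with the most rows
largest = points_per_shape.nlargest(2)
print(largest)
```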

## Train Routes

This section retrieves the data for several train routes and for individual trains on them (such as train #734 on the Trenton Line).

In [4]:
# Train directory
os.chdir(DIRECTORY_NAMES[1])

# List route information
routes_data_set = pd.read_csv('routes.csv')

print("Train Routes Data Set")
print(routes_data_set.to_string(columns=['route_id', 'route_short_name', 'route_color'], index=False))

Train Routes Data Set
route_id          route_short_name route_color
    AIR              Airport Line      91456C
    CHE   Chestnut Hill East Line      94763C
    CHW   Chestnut Hill West Line      00B4B2
    LAN  Lansdale/Doylestown Line      775B49
    MED          Media/Elwyn Line      007CC8
    FOX            Fox Chase Line      FF823D
    NOR  Manayunk/Norristown Line      EE4C69
    PAO      Paoli/Thorndale Line      20825C
    CYN               Cynwyd Line      6F549E
    TRE              Trenton Line      F683C9
    WAR           Warminster Line      F7AF42
    WIL    Wilmington/Newark Line      8AD16B
    WTR         West Trenton Line      5D5EBC


In [5]:
# Find trips associated with different train lines
train_lines = ['TRE', 'WTR', 'FOX']

def get_line_data(route_id):
    '''
    Returns data set associated with a particular train or bus route
    route_id = train or bus line route Id code (such as TRE or 28)
    '''
    trips_data_set = pd.read_csv('trips.csv')
    trips_data_set = trips_data_set.loc[trips_data_set['route_id'] == route_id]
    return trips_data_set

def print_line_data(route_id):
    '''
    Prints data associated with a particular train or bus route
    route_id = train or bus line route Id code (such as TRE or 28)
    '''
    trips_data_set = get_line_data(route_id)

    print("Trips Data Set for line " + route_id + ":")
    print(trips_data_set.to_string(columns=['trip_id', 'service_id', 'trip_headsign', 'block_id', 'shape_id'], 
                                   index=False, justify='left'))

for line in train_lines:
    print_line_data(line)
    print()
    print()

Trips Data Set for line TRE:
trip_id         service_id trip_headsign              block_id  shape_id
 TRE_717_V77_M  M4                          Trenton   717        7701  
 TRE_723_V77_M  M4                          Trenton   723        7701  
  TRE_773_V5_M  M1                          Trenton   773        7701  
  TRE_705_V5_M  M1                          Trenton   705        7701  
 TRE_9741_V5_M  M1                          Trenton  9741        7701  
 TRE_711_V66_M  M3                          Trenton   711        7701  
 TRE_7218_V5_M  M1         Center City Philadelphia  7218      701007  
 TRE_708_V77_M  M4         Center City Philadelphia   708      701004  
 TRE_1766_V5_M  M1         Center City Philadelphia  1766      701005  
  TRE_774_V5_M  M1         Center City Philadelphia   774      701004  
 TRE_722_V77_M  M4         Center City Philadelphia   722      701004  
 TRE_7406_V5_M  M1         Center City Philadelphia  7406      701004  
 TRE_9737_V5_M  M1                
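`get_line_data` reads `trips.csv` from disk on every call. For larger files, one option is to cache the loaded table. This is a sketch under assumptions: `load_table` and `get_line_data_cached` are hypothetical names, and the small trips file is written only for the demonstration.

```python
import functools
import os
import tempfile

import pandas as pd


@functools.lru_cache(maxsize=None)
def load_table(path):
    """Read a CSV file once; later calls with the same path reuse the DataFrame."""
    return pd.read_csv(path)


def get_line_data_cached(path, route_id):
    """Hypothetical variant of get_line_data that takes the trips file path."""
    trips = load_table(path)
    # copy() keeps callers from modifying the cached table by accident
    return trips.loc[trips['route_id'] == route_id].copy()


# Demonstration with a small made-up trips file
with tempfile.TemporaryDirectory() as tmp:
    trips_path = os.path.join(tmp, 'trips.csv')
    with open(trips_path, 'w') as handle:
        handle.write('route_id,trip_id\nTRE,TRE_734_V5_M\nWTR,WTR_361_V5_M\n')
    tre = get_line_data_cached(trips_path, 'TRE')
    wtr = get_line_data_cached(trips_path, 'WTR')

print(tre)
print(load_table.cache_info())
```

The cache is keyed on the path string, so it does not notice when the file changes on disk; `load_table.cache_clear()` resets it.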

In [6]:
# Find trips associated with individual trains
train_trips = [[['TRE'], [ 734,  735], [ 'TRE_734_V5_M', 'TRE_735_V5_M']], 
               [['WTR'], [ 361, 6370], [ 'WTR_361_V5_M', 'WTR_6370_V55_M']], 
               [['FOX'], [8749, 7848], ['FOX_8749_V5_M', 'FOX_7848_V1_M']]]

def get_vehicle_data(route_id, vehicle_id):
    '''
    Returns data set associated with a particular train car or bus
    route_id   = train or bus line route Id code (such as TRE or 28)
    vehicle_id = vehicle Id code (such as 1308)
    '''
    # Get route data
    trips_data_set = get_line_data(route_id)
    
    # Get vehicle data
    vehicle_data_set = trips_data_set.loc[trips_data_set['block_id'] == vehicle_id]
    return vehicle_data_set
    
def print_vehicle_data(route_id, vehicle_id):
    '''
    Prints data associated with a particular train car or bus
    route_id   = train or bus line route Id code (such as TRE or 28)
    vehicle_id = vehicle Id code (such as 1308)
    '''
    # Get vehicle data
    vehicle_data_set = get_vehicle_data(route_id, vehicle_id)

    print("Trips Data Set for line " + route_id + " vehicle #" + str(vehicle_id) + ":")
    print(vehicle_data_set.to_string(columns=['route_id', 'service_id', 'trip_id', 'trip_headsign', 'block_id', 'shape_id'], 
                                     index=False, justify='left'))

for route_id_list, train_id_list, trip_id_list in train_trips:
    route_id = route_id_list[0]
    for train_id in train_id_list:
        print_vehicle_data(route_id, train_id)
        print()
        print()

Trips Data Set for line TRE vehicle #734:
route_id service_id trip_id        trip_headsign              block_id  shape_id
TRE      M3         TRE_734_V66_M  Center City Philadelphia  734       701004  
TRE      M1          TRE_734_V5_M  Center City Philadelphia  734       701004  
TRE      M4         TRE_734_V77_M  Center City Philadelphia  734       701004


Trips Data Set for line TRE vehicle #735:
route_id service_id trip_id        trip_headsign  block_id  shape_id
TRE      M1          TRE_735_V5_M  Trenton       735       7701    
TRE      M3         TRE_735_V66_M  Trenton       735       7701    
TRE      M4         TRE_735_V77_M  Trenton       735       7701


Trips Data Set for line WTR vehicle #361:
route_id service_id trip_id       trip_headsign              block_id  shape_id
WTR      M1         WTR_361_V5_M  Center City Philadelphia  361       327007


Trips Data Set for line WTR vehicle #6370:
route_id service_id trip_id         trip_headsign  block_id  shape_id
WTR      M
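The train numbers in `train_trips` are integers, while the bus section of this notebook passes block ids as strings. Which one matches depends on the type pandas infers for the `block_id` column. A sketch of one way to make lookups uniform, assuming it is acceptable to treat identifiers as text, is to read the column as `str`. The rows below are made up in the layout of `trips.csv`.

```python
import io

import pandas as pd

# Made-up rows in the layout of trips.csv
csv_text = ('route_id,trip_id,block_id\n'
            'TRE,TRE_734_V5_M,734\n'
            'TRE,TRE_735_V5_M,735\n')

# With default inference, an all-numeric block_id column is read as integers
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing str keeps every identifier as text, whatever the file contains
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'block_id': str})

print(inferred['block_id'].dtype)
# Comparing an integer column with a string matches nothing
print((inferred['block_id'] == '734').sum())
# Comparing a text column with a string matches the expected row
print((as_text['block_id'] == '734').sum())
```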

In [7]:
# Obtain the schedule for individual trains
def get_schedule(trip_id):
    '''
    Returns schedule data set associated with a particular trip
    trip_id    = trip Id code (such as TRE_734_V5_M)
    '''
    # Get stop times data set
    stop_times_data_set = pd.read_csv('stop_times.csv')
    stop_times_data_set = stop_times_data_set[stop_times_data_set['trip_id'] == trip_id]
    return stop_times_data_set
    
def print_schedule(trip_id):
    '''
    Prints schedule associated with a particular trip
    trip_id  = trip Id code (such as TRE_734_V5_M)
    '''
    # Get stop times data set
    stop_times_data_set = get_schedule(trip_id)

    print("Stop Times Data Set for trip " + str(trip_id) + ":")
    print(stop_times_data_set.to_string(columns=['trip_id', 'arrival_time', 'stop_id', 'stop_sequence'], 
                                       index=False, justify='left'))
        
for route_id_list, train_id_list, trip_id_list in train_trips:
    route_id = route_id_list[0]
    for train_id, trip_id in zip(train_id_list, trip_id_list):
        print_schedule(trip_id)
        print()
        print()

Stop Times Data Set for trip TRE_734_V5_M:
trip_id       arrival_time  stop_id  stop_sequence
TRE_734_V5_M  10:43:00     90701     1           
TRE_734_V5_M  10:50:00     90702     4           
TRE_734_V5_M  10:54:00     90703     6           
TRE_734_V5_M  10:58:00     90704     7           
TRE_734_V5_M  11:00:00     90705     8           
TRE_734_V5_M  11:02:00     90706     9           
TRE_734_V5_M  11:05:00     90707    11           
TRE_734_V5_M  11:09:00     90708    12           
TRE_734_V5_M  11:10:00     90709    13           
TRE_734_V5_M  11:13:00     90710    15           
TRE_734_V5_M  11:20:00     90711    17           
TRE_734_V5_M  11:33:00     90004    27


Stop Times Data Set for trip TRE_735_V5_M:
trip_id       arrival_time  stop_id  stop_sequence
TRE_735_V5_M  12:55:00     90007    18           
TRE_735_V5_M  13:01:00     90006    21           
TRE_735_V5_M  13:06:00     90005    23           
TRE_735_V5_M  13:10:00     90004    27           
TRE_735_V5_M  13:20:0
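The `arrival_time` values are plain `HH:MM:SS` strings. GTFS allows hours of 24 or more for trips that run past midnight on the same service day, so parsing them as clock times can fail. A minimal sketch that converts them to seconds and computes a trip duration follows; both function names are hypothetical.

```python
def gtfs_time_to_seconds(value):
    """Convert an 'HH:MM:SS' GTFS time to seconds after the start of the service day."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(':'))
    return hours * 3600 + minutes * 60 + seconds


def trip_duration_minutes(arrival_times):
    """Minutes between the first and last arrival, given times in stop order."""
    first = gtfs_time_to_seconds(arrival_times[0])
    last = gtfs_time_to_seconds(arrival_times[-1])
    return (last - first) / 60


# First and last arrivals of trip TRE_734_V5_M from the output above
print(trip_duration_minutes(['10:43:00', '11:33:00']))  # 50.0

# Hours past 24 are valid in GTFS and mean "after midnight, same service day"
print(gtfs_time_to_seconds('25:10:00'))  # 90600
```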

In [8]:
pd.options.mode.chained_assignment = None  # default='warn'

# Obtain the stops for individual trains
def get_stops(trip_id):
    '''
    Returns stops data set associated with a particular trip
    trip_id  = trip Id code (such as TRE_734_V5_M)
    '''
    # Get schedule data set
    stop_times_data_set = get_schedule(trip_id)
    
    # Obtain the list of stops for this train
    # (a stop visited more than once keeps only its last sequence number)
    stop_sequence_ids = dict(zip(stop_times_data_set.stop_id, stop_times_data_set.stop_sequence))
    
    # Get stops data set (copy so that adding a column does not modify a slice of the full table)
    stops_data_set = pd.read_csv('stops.csv')
    stops_data_set = stops_data_set[stops_data_set['stop_id'].isin(stop_sequence_ids.keys())].copy()
    
    # Add sequence id to stops data set
    stops_data_set['sequence_id'] = stops_data_set['stop_id'].map(stop_sequence_ids)
    stops_data_set.sort_values('sequence_id', inplace=True)
    
    return stops_data_set


def print_stops(trip_id):
    '''
    Prints stops associated with a particular trip
    trip_id  = trip Id code (such as TRE_734_V5_M)
    '''
    # Get stops data set
    stops_data_set = get_stops(trip_id)
    
    print("Stops Data Set for trip " + str(trip_id) + ":")
    print(stops_data_set.to_string(columns=['sequence_id', 'stop_id', 'stop_name', 'stop_lat', 'stop_lon', 'zone_id'], 
                                   index=False, justify='left'))
    
for route_id_list, train_id_list, trip_id_list in train_trips:
    route_id = route_id_list[0]
    for train_id, trip_id in zip(train_id_list, trip_id_list):
        print_stops(trip_id)
        print()
        print()

Stops Data Set for trip TRE_734_V5_M:
sequence_id  stop_id stop_name                   stop_lat   stop_lon  zone_id
 1           90701                      Trenton  40.217778 -74.755000  NJ    
 4           90702          Levittown-Tullytown  40.140278 -74.816944   4    
 6           90703                      Bristol  40.104722 -74.854722   4    
 7           90704                      Croydon  40.093611 -74.906667   3    
 8           90705                    Eddington  40.083056 -74.933611   3    
 9           90706            Cornwells Heights  40.071667 -74.952222   3    
11           90707                   Torresdale  40.054444 -74.984444   3    
12           90708               Holmesburg Jct  40.032778 -75.023611   2    
13           90709                       Tacony  40.023333 -75.038889   2    
15           90710                   Bridesburg  40.010556 -75.069722   2    
17           90711    North Philadelphia Amtrak  39.997222 -75.155000   1    
27           90004        
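`get_stops` pairs each stop with a sequence number through a dictionary, which keeps a single sequence number per stop. An alternative, sketched here on small in-memory tables rather than the real files, is to join the schedule to the stops table with `merge`, so that every scheduled call becomes one row. `stops_for_trip` is a hypothetical name. The stop ids and sequence numbers follow trip 70443 from the bus schedule later in this notebook, and the stop names are placeholders.

```python
import pandas as pd


def stops_for_trip(stop_times, stops, trip_id):
    """Hypothetical merge-based variant of get_stops working on loaded DataFrames."""
    schedule = stop_times.loc[stop_times['trip_id'] == trip_id,
                              ['stop_id', 'stop_sequence']]
    # One output row per scheduled call, so a stop visited twice appears twice
    merged = schedule.merge(stops, on='stop_id', how='left')
    return merged.sort_values('stop_sequence').reset_index(drop=True)


# Sample rows: stop 1120 is called at twice in a row (sequences 11 and 12)
stop_times = pd.DataFrame({'trip_id': [70443, 70443, 70443, 99999],
                           'stop_id': [22414, 1120, 1120, 841],
                           'stop_sequence': [10, 11, 12, 1]})
stops = pd.DataFrame({'stop_id': [841, 1120, 22414],
                      'stop_name': ['Stop A', 'Stop B', 'Stop C']})

result = stops_for_trip(stop_times, stops, 70443)
print(result)
```

Because the merge builds a new DataFrame, no column is assigned to a filtered slice, which is the situation the chained-assignment warning is about.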

In [9]:
# Obtain the shape of the routes for the individual trains
def get_shape_data(route_id, vehicle_id, trip_id):
    '''
    Returns data set associated with the route shape for a particular train car or bus
    route_id   = train or bus line route Id code (such as TRE or 28)
    vehicle_id = vehicle Id code (such as 1308)
    trip_id    = trip Id code (such as TRE_734_V5_M)
    '''
    # Get route data
    trips_data_set = get_line_data(route_id)
    
    # Get vehicle/trip data
    vehicle_trip_data_set = trips_data_set.loc[trips_data_set['block_id'] == vehicle_id]
    vehicle_trip_data_set = vehicle_trip_data_set.loc[vehicle_trip_data_set['trip_id'] == trip_id]
    
    # Obtain the route shape Id (assume one row in vehicle_trip_data_set)
    route_shape_id = vehicle_trip_data_set.iloc[0]['shape_id']
    
    # Get shapes data set
    shapes_data_set = pd.read_csv('shapes.csv')
    shapes_data_set = shapes_data_set[shapes_data_set['shape_id'] == route_shape_id]
    return shapes_data_set
    
def print_shape_data(route_id, vehicle_id, trip_id):
    '''
    Prints data associated with the route shape of a particular train car or bus
    route_id   = train or bus line route Id code (such as TRE or 28)
    vehicle_id = vehicle Id code (such as 1308)
    trip_id    = trip Id code (such as TRE_734_V5_M)
    '''
    # Get shapes data
    shapes_data_set = get_shape_data(route_id, vehicle_id, trip_id)

    print("Route Shapes Data Set for line " + route_id + " vehicle #" + str(vehicle_id) + " trip " + str(trip_id) + ":")
    print(shapes_data_set.to_string(index=False, justify='left'))

for route_id_list, train_id_list, trip_id_list in train_trips:
    route_id = route_id_list[0]
    for train_id, trip_id in zip(train_id_list, trip_id_list):
        print_shape_data(route_id, train_id, trip_id)
        print()
        print()

Route Shapes Data Set for line TRE vehicle #734 trip TRE_734_V5_M:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
701004    40.217730    -74.754934        1             
701004    40.217708    -74.754959        2             
701004    40.217597    -74.755093        3             
701004    40.217559    -74.755134        4             
701004    40.217497    -74.755203        5             
701004    40.217469    -74.755234        6             
701004    40.217432    -74.755279        7             
701004    40.217396    -74.755324        8             
701004    40.217324    -74.755409        9             
701004    40.217242    -74.755505       10             
701004    40.217159    -74.755601       11             
701004    40.217152    -74.755609       12             
701004    40.217077    -74.755698       13             
701004    40.216994    -74.755795       14             
701004    40.216913    -74.755892       15             
701004    40.216830    -74.755988    

Route Shapes Data Set for line TRE vehicle #735 trip TRE_735_V5_M:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
7701      39.981274    -75.149438        1             
7701      39.981195    -75.149447        2             
7701      39.980945    -75.149470        3             
7701      39.980886    -75.149476        4             
7701      39.980801    -75.149485        5             
7701      39.980594    -75.149507        6             
7701      39.980545    -75.149512        7             
7701      39.980382    -75.149533        8             
7701      39.980297    -75.149543        9             
7701      39.980237    -75.149551       10             
7701      39.980223    -75.149553       11             
7701      39.980149    -75.149562       12             
7701      39.979814    -75.149604       13             
7701      39.979751    -75.149612       14             
7701      39.979719    -75.149616       15             
7701      39.979632    -75.149627    

Route Shapes Data Set for line WTR vehicle #361 trip WTR_361_V5_M:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
327007    40.257785    -74.815186        1             
327007    40.257702    -74.815233        2             
327007    40.257575    -74.815304        3             
327007    40.257404    -74.815399        4             
327007    40.257234    -74.815495        5             
327007    40.257125    -74.815556        6             
327007    40.256977    -74.815640        7             
327007    40.256881    -74.815694        8             
327007    40.256752    -74.815767        9             
327007    40.256716    -74.815787       10             
327007    40.256195    -74.816074       11             
327007    40.256138    -74.816105       12             
327007    40.255541    -74.816432       13             
327007    40.255491    -74.816459       14             
327007    40.255407    -74.816505       15             
327007    40.255108    -74.816667    

Route Shapes Data Set for line WTR vehicle #6370 trip WTR_6370_V55_M:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
4327      39.956596    -75.181676        1             
4327      39.956581    -75.181553        2             
4327      39.956574    -75.181501        3             
4327      39.956557    -75.181374        4             
4327      39.956549    -75.181311        5             
4327      39.956535    -75.181198        6             
4327      39.956526    -75.181129        7             
4327      39.956524    -75.181113        8             
4327      39.956523    -75.181109        9             
4327      39.956500    -75.180960       10             
4327      39.956480    -75.180842       11             
4327      39.956478    -75.180826       12             
4327      39.956450    -75.180669       13             
4327      39.956428    -75.180547       14             
4327      39.956407    -75.180433       15             
4327      39.956390    -75.180349 

Route Shapes Data Set for line FOX vehicle #8749 trip FOX_8749_V5_M:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
815007    40.076405    -75.083368        1             
815007    40.076259    -75.083395        2             
815007    40.076219    -75.083401        3             
815007    40.076178    -75.083408        4             
815007    40.076137    -75.083414        5             
815007    40.076096    -75.083420        6             
815007    40.076054    -75.083426        7             
815007    40.076011    -75.083433        8             
815007    40.075967    -75.083439        9             
815007    40.075923    -75.083445       10             
815007    40.075879    -75.083452       11             
815007    40.075834    -75.083460       12             
815007    40.075788    -75.083467       13             
815007    40.075741    -75.083473       14             
815007    40.075696    -75.083480       15             
815007    40.075694    -75.083481  

Route Shapes Data Set for line FOX vehicle #7848 trip FOX_7848_V1_M:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
4815      39.956596    -75.181676        1             
4815      39.956581    -75.181553        2             
4815      39.956574    -75.181501        3             
4815      39.956557    -75.181374        4             
4815      39.956549    -75.181311        5             
4815      39.956535    -75.181198        6             
4815      39.956526    -75.181129        7             
4815      39.956524    -75.181113        8             
4815      39.956523    -75.181109        9             
4815      39.956500    -75.180960       10             
4815      39.956480    -75.180842       11             
4815      39.956478    -75.180826       12             
4815      39.956450    -75.180669       13             
4815      39.956428    -75.180547       14             
4815      39.956407    -75.180433       15             
4815      39.956390    -75.180349       16             
4815      39.956375    -75.180276       17      
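Each shape is an ordered list of latitude and longitude points, so one thing the shape data can give is an approximate route length. The sketch below sums great-circle (haversine) distances between consecutive points. It is an illustration, not part of the original analysis: the function names are hypothetical and the Earth radius is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shape_length_km(points):
    """Sum segment lengths over (sequence, lat, lon) tuples, ordered by sequence."""
    ordered = sorted(points)
    return sum(haversine_km(a[1], a[2], b[1], b[2])
               for a, b in zip(ordered, ordered[1:]))


# First three points of shape 701004 from the output above
sample = [(1, 40.217730, -74.754934),
          (2, 40.217708, -74.754959),
          (3, 40.217597, -74.755093)]
print(shape_length_km(sample))
```

Sorting by `shape_pt_sequence` first matters because the rows of `shapes.csv` are not guaranteed to be stored in travel order.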

## Bus Routes

Next, we explore bus routes (such as the #28 bus, which runs in Northeast Philadelphia).

In [10]:
# Bus directory
os.chdir('..')
os.chdir(DIRECTORY_NAMES[0])

# List route information
routes_data_set = pd.read_csv('routes.csv')

print("Routes Data Set")
print(routes_data_set.to_string(columns=['route_id', 'route_long_name', 'route_type'], index=False))

Routes Data Set
route_id                      route_long_name  route_type
      1             Parx Casino to 54th-City           3
     10          13th-Market to 63rd-Malvern           0
    101                  Media to 69th St TC           0
    102            Sharon Hill to 69th St TC           0
    103                Ardmore to 69th St TC           3
    104         West Chester U to 69th St TC           3
    105               Rosemont to 69th St TC           3
    106                  Paoli to 69th St TC           3
    107      Lawrence Park to 69th St TC-107           3
    108         UPS or Airport to 69th St TC           3
    109             Chester TC to 69th St TC           3
     11      13th-Market to Darby Trans Cntr           0
    110           Penn State U to 69th St TC           3
    111            Chadds Ford to 69th St TC           3
    112                   DCCC to 69th St TC           3
    113         Tri State Mall to 69th St TC           3
    114       

In [11]:
# Find trips associated with different bus lines
bus_lines = ['28', '70', '88']

for line in bus_lines:
    print_line_data(line)
    print()
    print()

Trips Data Set for line 28:
trip_id  service_id trip_headsign                      block_id  shape_id
45049    10          "Fern Rock Transportation Center"  1308     205690  
45050    10          "Fern Rock Transportation Center"  1312     205690  
45051    10          "Fern Rock Transportation Center"  1313     205690  
45052    10          "Fern Rock Transportation Center"  1215     205690  
45054    10          "Fern Rock Transportation Center"  1316     205690  
45055    10          "Fern Rock Transportation Center"  1315     205690  
45056    10          "Fern Rock Transportation Center"  1314     205690  
45057    10          "Fern Rock Transportation Center"  1312     205690  
45058    10          "Fern Rock Transportation Center"  1230     205690  
45059    10          "Fern Rock Transportation Center"  1315     205690  
45060    10          "Fern Rock Transportation Center"  1314     205690  
45061    10          "Fern Rock Transportation Center"  1312     205690  
45062    1

Trips Data Set for line 88:
trip_id  service_id trip_headsign                   block_id  shape_id
70378    10                       "Willits-Crispin"  2509     206099  
70379    10                       "Willits-Crispin"  2512     206099  
70380    10                       "Willits-Crispin"  2510     206099  
70381    10                "Holy Redeemer Hospital"  1552     206098  
70382    10                       "Willits-Crispin"  2503     206099  
70383    10                "Holy Redeemer Hospital"  2511     206098  
70384    10                "Holy Redeemer Hospital"  2509     206098  
70385    10                       "Willits-Crispin"  2507     206099  
70386    10                       "Holme-Pennypack"  2502     206097  
70387    10                       "Holme-Pennypack"  2507     206097  
70388    10                "Holy Redeemer Hospital"  1551     206098  
70389    10                       "Holme-Pennypack"  2501     206097  
70390    10                       "Holme-Pennypac
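The `service_id` column in these listings points to rows of `calendar.csv`, where seven 0/1 flags say on which weekdays a service runs. The sketch below decodes those flags with the standard `csv` module. The rows are copied from the `calendar.txt` preview near the top of this notebook, and `service_days` is a hypothetical helper.

```python
import csv
import io

WEEKDAYS = ['monday', 'tuesday', 'wednesday', 'thursday',
            'friday', 'saturday', 'sunday']

# Header and rows copied from the calendar.txt preview earlier in this notebook
CALENDAR_TEXT = (
    'service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,'
    'start_date,end_date\n'
    '10,1,1,1,1,1,0,0,20170903,20180224\n'
    '11,0,0,0,0,0,0,0,20170903,20180224\n'
    '12,0,0,0,0,0,1,0,20170903,20180224\n'
    '13,0,0,0,0,0,0,1,20170903,20180224\n'
)


def service_days(calendar_text):
    """Map each service_id to the list of weekdays on which it runs."""
    reader = csv.DictReader(io.StringIO(calendar_text))
    return {row['service_id']: [day for day in WEEKDAYS if row[day] == '1']
            for row in reader}


days = service_days(CALENDAR_TEXT)
for service_id, names in days.items():
    print(service_id, names)
```

Service 11 has no weekday flag set. In GTFS, such a service can still run on individual dates listed in `calendar_dates.csv` with `exception_type` 1; whether that is the case here has not been checked.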

In [12]:
# Find trips associated with individual buses
bus_trips = [[['28'], ['1308', '1309'], [45116, 45115]], 
             [['70'], ['1225', '1226'], [67107, 67110]], 
             [['88'], ['2511', '2512'], [70443, 70445]]]

for route_id_list, bus_id_list, trip_id_list in bus_trips:
    route_id = route_id_list[0]
    for bus_id in bus_id_list:
        print_vehicle_data(route_id, bus_id)
        print()
        print()

Trips Data Set for line 28 vehicle #1308:
route_id  service_id  trip_id trip_headsign                      block_id  shape_id
28       10          45049    "Fern Rock Transportation Center"  1308     205690  
28       10          45068    "Fern Rock Transportation Center"  1308     205690  
28       10          45072    "Fern Rock Transportation Center"  1308     205690  
28       10          45076    "Fern Rock Transportation Center"  1308     205690  
28       10          45086                 "Torresdale-Cottman"  1308     205693  
28       10          45108                 "Torresdale-Cottman"  1308     205693  
28       10          45112                 "Torresdale-Cottman"  1308     205693  
28       10          45116                 "Torresdale-Cottman"  1308     205693  
28       11          45120    "Fern Rock Transportation Center"  1308     205690  
28       11          45123    "Fern Rock Transportation Center"  1308     205690  
28       11          45128    "Fern Rock Tra
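`bus_trips` stores each entry as three parallel lists (route, vehicles, trips), and the loops have to unpack it with `route_id_list[0]` and `zip`. One possible restructuring, shown only as a sketch, flattens the same values into named records. `VehicleTrip` and `flatten_trips` are hypothetical names.

```python
from typing import NamedTuple


class VehicleTrip(NamedTuple):
    """One (route, vehicle, trip) combination to look up."""
    route_id: str
    vehicle_id: str
    trip_id: int


def flatten_trips(nested):
    """Expand [[route], [vehicles], [trips]] entries into VehicleTrip records."""
    records = []
    for route_id_list, vehicle_id_list, trip_id_list in nested:
        for vehicle_id, trip_id in zip(vehicle_id_list, trip_id_list):
            records.append(VehicleTrip(route_id_list[0], vehicle_id, trip_id))
    return records


# Same values as bus_trips above
bus_trips = [[['28'], ['1308', '1309'], [45116, 45115]],
             [['70'], ['1225', '1226'], [67107, 67110]],
             [['88'], ['2511', '2512'], [70443, 70445]]]

records = flatten_trips(bus_trips)
for record in records:
    print(record.route_id, record.vehicle_id, record.trip_id)
```

With records like these, each printing loop becomes a single `for` over the list, and the field names document what each value means.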

In [13]:
# Get and print the bus schedules
for route_id_list, bus_id_list, trip_id_list in bus_trips:
    route_id = route_id_list[0]
    for bus_id, trip_id in zip(bus_id_list, trip_id_list):
        print_schedule(trip_id)
        print()
        print()

Stop Times Data Set for trip 45116:
trip_id arrival_time  stop_id  stop_sequence
45116    11:51:00       841      1          
45116    11:51:00     17921      2          
45116    11:51:00     23418      3          
45116    11:52:00     23419      4          
45116    11:52:00     23420      5          
45116    11:52:00     23421      6          
45116    11:53:00     23422      7          
45116    11:53:00     23423      8          
45116    11:54:00     23424      9          
45116    11:54:00     23425     10          
45116    11:55:00     23426     11          
45116    11:55:00     23427     12          
45116    11:55:00     23428     13          
45116    11:56:00     23429     14          
45116    11:56:00     23430     15          
45116    11:57:00     23431     16          
45116    11:57:00     23432     17          
45116    11:58:00     23433     18          
45116    11:58:00     23434     19          
45116    11:59:00     23435     20          
45116    11:59:00  

Stop Times Data Set for trip 67107:
trip_id arrival_time  stop_id  stop_sequence
67107    16:56:00       841      1          
67107    16:56:00     17921      2          
67107    16:57:00     17923      3          
67107    16:57:00     17925      4          
67107    16:57:00     24123      5          
67107    16:58:00     17926      6          
67107    16:58:00     17928      7          
67107    16:59:00     31433      8          
67107    17:00:00     22723      9          
67107    17:00:00     22724     10          
67107    17:01:00     22725     11          
67107    17:01:00     22726     12          
67107    17:02:00     32102     13          
67107    17:02:00     22727     14          
67107    17:03:00      1003     15          
67107    17:03:00     22728     16          
67107    17:03:00     26381     17          
67107    17:04:00     22729     18          
67107    17:05:00     22730     19          
67107    17:05:00     22731     20          
67107    17:06:00  

Stop Times Data Set for trip 70443:
trip_id arrival_time  stop_id  stop_sequence
70443    14:58:00      1117     1           
70443    14:58:00     22410     2           
70443    14:59:00     31660     3           
70443    14:59:00     25497     4           
70443    15:00:00      1116     5           
70443    15:01:00     22411     7           
70443    15:03:00     22412     8           
70443    15:03:00     22413     9           
70443    15:04:00     22414    10           
70443    15:05:00      1120    11           
70443    15:05:00      1120    12           
70443    15:05:00     22416    13           
70443    15:06:00     22417    14           
70443    15:06:00     22418    15           
70443    15:07:00     22419    16           
70443    15:07:00     22420    17           
70443    15:08:00     22421    18           
70443    15:09:00     22422    19           
70443    15:09:00     22423    20           
70443    15:10:00     22424    21           
70443    15:10:00  
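
The trip 70443 schedule above jumps from stop_sequence 5 to 7 and lists stop 1120 at both sequence 11 and sequence 12. GTFS only requires stop_sequence values to increase along a trip, not to be consecutive, so the gap is not necessarily an error, and the repeated stop may simply be how the feed encodes that location. A minimal sketch for flagging both patterns follows; the function name `find_sequence_anomalies` and the tuple layout are assumptions made for illustration and are not part of the notebook.

```python
def find_sequence_anomalies(rows):
    """
    Flags gaps in stop_sequence and stops repeated on consecutive rows
    rows = list of (stop_sequence, stop_id) tuples sorted by stop_sequence
    Returns (gaps, repeats)
    """
    gaps = []
    repeats = []
    # Compare each row with the row that follows it
    for (prev_seq, prev_stop), (seq, stop) in zip(rows, rows[1:]):
        if seq - prev_seq > 1:
            gaps.append((prev_seq, seq))
        if stop == prev_stop:
            repeats.append((prev_seq, seq, stop))
    return gaps, repeats

# Rows copied by hand from the trip 70443 output above
trip_70443 = [(1, 1117), (2, 22410), (3, 31660), (4, 25497), (5, 1116),
              (7, 22411), (8, 22412), (9, 22413), (10, 22414),
              (11, 1120), (12, 1120), (13, 22416)]
gaps, repeats = find_sequence_anomalies(trip_70443)
print("Gaps:", gaps)
print("Repeated stops:", repeats)
```

For these rows the sketch reports one gap, (5, 7), and one repeated stop, 1120 at sequences 11 and 12.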

In [14]:
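# Silence the pandas SettingWithCopyWarning for the rest of the notebook.
# The default setting is 'warn'; None disables the check globally.
pd.options.mode.chained_assignment = None

# Get and print the stops along each tracked trip
for _route_id_list, bus_id_list, trip_id_list in bus_trips:
    # Only trip_id is needed here; zip keeps each trip paired with its bus
    for _bus_id, trip_id in zip(bus_id_list, trip_id_list):
        print_stops(trip_id)
        print()
        print()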

Stops Data Set for trip 45116:
sequence_id  stop_id stop_name                            stop_lat   stop_lon   zone_id
  1            841       Fern Rock Transportation Center  40.041940 -75.136970  1.0    
  2          17921                10th St & Champlost St  40.043383 -75.136596  1.0    
  3          23418                Champlost St & 11th St  40.043781 -75.138014  1.0    
  4          23419                  11th St & Spencer St  40.045146 -75.137808  1.0    
  5          23420                  11th St & Godfrey Av  40.046537 -75.137507  1.0    
  6          23421                   11th St & Medary Av  40.048287 -75.137570  1.0    
  7          23422                  11th St & Chelten Av  40.049473 -75.137400  1.0    
  8          23423                     11th St & 65th Av  40.051329 -75.136979  1.0    
  9          23424                     11th St & 66th Av  40.052890 -75.136629  1.0    
 10          23425                      11th St & Oak Ln  40.054844 -75.136207  1.0    
 

Stops Data Set for trip 45115:
sequence_id  stop_id stop_name                            stop_lat   stop_lon   zone_id
  1            841       Fern Rock Transportation Center  40.041940 -75.136970  1.0    
  2          17921                10th St & Champlost St  40.043383 -75.136596  1.0    
  3          23418                Champlost St & 11th St  40.043781 -75.138014  1.0    
  4          23419                  11th St & Spencer St  40.045146 -75.137808  1.0    
  5          23420                  11th St & Godfrey Av  40.046537 -75.137507  1.0    
  6          23421                   11th St & Medary Av  40.048287 -75.137570  1.0    
  7          23422                  11th St & Chelten Av  40.049473 -75.137400  1.0    
  8          23423                     11th St & 65th Av  40.051329 -75.136979  1.0    
  9          23424                     11th St & 66th Av  40.052890 -75.136629  1.0    
 10          23425                      11th St & Oak Ln  40.054844 -75.136207  1.0    
 

Stops Data Set for trip 67107:
sequence_id  stop_id stop_name                               stop_lat   stop_lon   zone_id
  1            841          Fern Rock Transportation Center  40.041940 -75.136970  1.0    
  2          17921                   10th St & Champlost St  40.043383 -75.136596  1.0    
  3          17923                     10th St & Spencer St  40.044946 -75.136224  1.0    
  4          17925                     10th St & Godfrey Av  40.046132 -75.135995  1.0    
  5          24123          Godfrey Av & Franklin St - MBNS  40.046093 -75.134588  1.0    
  6          17926                      Godfrey Av & 7th St  40.045963 -75.132223  1.0    
  7          17928                      Godfrey Av & 6th St  40.045843 -75.130568  1.0    
  8          31433                       5th St & Medary Av  40.046248 -75.128261  1.0    
  9          22723                 5th St & Chelten Av - FS  40.047372 -75.128032  1.0    
 10          22724                         5th St & 64th Av

Stops Data Set for trip 67110:
sequence_id  stop_id stop_name                         stop_lat   stop_lon   zone_id
 1             841    Fern Rock Transportation Center  40.041940 -75.136970  1.0    
 2           17921             10th St & Champlost St  40.043383 -75.136596  1.0    
 3           17923               10th St & Spencer St  40.044946 -75.136224  1.0    
 4           17925               10th St & Godfrey Av  40.046132 -75.135995  1.0    
 5           24123    Godfrey Av & Franklin St - MBNS  40.046093 -75.134588  1.0    
 6           17926                Godfrey Av & 7th St  40.045963 -75.132223  1.0    
 7           17928                Godfrey Av & 6th St  40.045843 -75.130568  1.0    
 8           31433                 5th St & Medary Av  40.046248 -75.128261  1.0    
 9           22723           5th St & Chelten Av - FS  40.047372 -75.128032  1.0    
10           22724                   5th St & 64th Av  40.048915 -75.127694  1.0    
11           22725                

Stops Data Set for trip 70445:
sequence_id  stop_id stop_name                                        stop_lat   stop_lon   zone_id
53           31771                      Willits Rd & Crispin Dr - FS  40.054835 -75.010202  1.0    
54           22440                      Holme Av & Willits Rd & - FS  40.056676 -75.013703  1.0    
55            1123                             Holme Av & Convent Av  40.056667 -75.019001  1.0    
56           27762                           Holme Av & Longford St   40.056632 -75.022124  1.0    
57           30757                         Holme Av & Arthur St - FS  40.056644 -75.025790  1.0    
58           22441                              Holme Av & Holme Cir  40.056642 -75.028096  1.0    
59           22442                             Welsh Rd & Colfax St   40.054608 -75.028143  1.0    
60           22443                              Welsh Rd & Wilson St  40.052222 -75.027340  1.0    
61           22444                         Welsh Rd & Winchester Av  
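
Cells 13 through 15 repeat the same nested loop over `bus_trips`. One way to avoid the repetition is a small generator that yields one (route_id, bus_id, trip_id) tuple per tracked bus. This is only a sketch: `iter_trips` is a hypothetical helper, and the sample data is reconstructed from the line, vehicle, and trip numbers printed in this notebook, so the real contents and types of `bus_trips` may differ.

```python
def iter_trips(bus_trips):
    """
    Yields (route_id, bus_id, trip_id) for every tracked bus
    bus_trips = list of (route_id_list, bus_id_list, trip_id_list) tuples
    """
    for route_id_list, bus_id_list, trip_id_list in bus_trips:
        # Every entry in route_id_list refers to the same route
        route_id = route_id_list[0]
        for bus_id, trip_id in zip(bus_id_list, trip_id_list):
            yield route_id, bus_id, trip_id

# Assumed sample data, shaped like the bus_trips variable used above
sample_bus_trips = [
    ([28, 28], [1308, 1309], [45116, 45115]),
    ([70, 70], [1225, 1226], [67107, 67110]),
    ([88, 88], [2511, 2512], [70443, 70445]),
]
trips = list(iter_trips(sample_bus_trips))
for route_id, bus_id, trip_id in trips:
    print(route_id, bus_id, trip_id)
```

With such a helper, each printing cell reduces to a single loop such as `for route_id, bus_id, trip_id in iter_trips(bus_trips):` followed by the relevant print function.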

In [15]:
# Get and print the bus route shapes
for route_id_list, bus_id_list, trip_id_list in bus_trips:
    route_id = route_id_list[0]
    for bus_id, trip_id in zip(bus_id_list, trip_id_list):
        print_shape_data(route_id, bus_id, trip_id)
        print()
        print()

Route Shapes Data Set for line 28 vehicle #1308 trip 45116:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
205693    40.041995    -75.136952     224              
205693    40.043530    -75.136653     225              
205693    40.043628    -75.137396     226              
205693    40.043735    -75.138210     227              
205693    40.045245    -75.137862     228              
205693    40.046633    -75.137564     229              
205693    40.046727    -75.137546     230              
205693    40.046806    -75.137536     231              
205693    40.046865    -75.137528     232              
205693    40.046937    -75.137525     233              
205693    40.047014    -75.137531     234              
205693    40.047697    -75.137594     235              
205693    40.048286    -75.137652     236              
205693    40.048339    -75.137653     237              
205693    40.048403    -75.137644     238              
205693    40.048708    -75.137610     239   

Route Shapes Data Set for line 28 vehicle #1309 trip 45115:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
205693    40.041995    -75.136952     224              
205693    40.043530    -75.136653     225              
205693    40.043628    -75.137396     226              
205693    40.043735    -75.138210     227              
205693    40.045245    -75.137862     228              
205693    40.046633    -75.137564     229              
205693    40.046727    -75.137546     230              
205693    40.046806    -75.137536     231              
205693    40.046865    -75.137528     232              
205693    40.046937    -75.137525     233              
205693    40.047014    -75.137531     234              
205693    40.047697    -75.137594     235              
205693    40.048286    -75.137652     236              
205693    40.048339    -75.137653     237              
205693    40.048403    -75.137644     238              
205693    40.048708    -75.137610     239   

Route Shapes Data Set for line 70 vehicle #1225 trip 67107:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
206047    40.041995    -75.136952       1              
206047    40.043530    -75.136653       2              
206047    40.043862    -75.136588       3              
206047    40.044292    -75.136485       4              
206047    40.045032    -75.136279       5              
206047    40.046068    -75.136063       6              
206047    40.046157    -75.136043       7              
206047    40.046220    -75.136023       8              
206047    40.046264    -75.135908       9              
206047    40.046234    -75.135823      10              
206047    40.046214    -75.135741      11              
206047    40.046198    -75.135649      12              
206047    40.046197    -75.135626      13              
206047    40.046192    -75.135528      14              
206047    40.046158    -75.134756      15              
206047    40.046158    -75.134733      16   

Route Shapes Data Set for line 70 vehicle #1226 trip 67110:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
206046    40.041995    -75.136952       1              
206046    40.043530    -75.136653       2              
206046    40.043862    -75.136588       3              
206046    40.044292    -75.136485       4              
206046    40.045032    -75.136279       5              
206046    40.046068    -75.136063       6              
206046    40.046157    -75.136043       7              
206046    40.046220    -75.136023       8              
206046    40.046264    -75.135908       9              
206046    40.046234    -75.135823      10              
206046    40.046214    -75.135741      11              
206046    40.046198    -75.135649      12              
206046    40.046197    -75.135626      13              
206046    40.046192    -75.135528      14              
206046    40.046158    -75.134756      15              
206046    40.046158    -75.134733      16   

Route Shapes Data Set for line 88 vehicle #2511 trip 70443:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
206110    40.109084    -75.080432      76              
206110    40.109469    -75.079321      77              
206110    40.109617    -75.078920      78              
206110    40.109809    -75.078500      79              
206110    40.109908    -75.078294      80              
206110    40.109981    -75.078159      81              
206110    40.110104    -75.077948      82              
206110    40.110336    -75.077601      83              
206110    40.110884    -75.076773      84              
206110    40.111006    -75.076589      85              
206110    40.111483    -75.075873      86              
206110    40.112041    -75.075043      87              
206110    40.112731    -75.074011      88              
206110    40.112926    -75.073689      89              
206110    40.113673    -75.072737      90              
206110    40.113828    -75.072485      91   

Route Shapes Data Set for line 88 vehicle #2512 trip 70445:
shape_id  shape_pt_lat  shape_pt_lon  shape_pt_sequence
206104    40.055186    -75.010847      80              
206104    40.055727    -75.011678      81              
206104    40.055821    -75.011820      82              
206104    40.056071    -75.012198      83              
206104    40.056273    -75.012503      84              
206104    40.056297    -75.012626      85              
206104    40.056412    -75.012815      86              
206104    40.056526    -75.013019      87              
206104    40.056618    -75.013173      88              
206104    40.056613    -75.013779      89              
206104    40.056609    -75.014206      90              
206104    40.056607    -75.016139      91              
206104    40.056605    -75.016588      92              
206104    40.056599    -75.018653      93              
206104    40.056598    -75.019081      94              
206104    40.056598    -75.019406      95   
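
As a sanity check on the shape data, the first shape point printed for trip 45116 (40.041995, -75.136952) can be compared with the first stop printed for the same trip, Fern Rock Transportation Center (40.041940, -75.136970). The sketch below uses the haversine formula on a spherical Earth; the helper name `haversine_m` and the mean radius of 6,371 km are assumptions made for illustration and do not appear elsewhere in the notebook.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """
    Returns the great-circle distance in meters between two points
    lat1, lon1, lat2, lon2 = coordinates in decimal degrees
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Coordinates copied from the stops and shapes output for trip 45116
stop = (40.041940, -75.136970)
shape_pt = (40.041995, -75.136952)
distance = haversine_m(*stop, *shape_pt)
print("Distance in meters: %.1f" % distance)
```

The two points should come out only a few meters apart, which is consistent with the printed shape starting at the first stop of the trip.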