# Overview

This Jupyter Notebook reads GTFS data, then combines and adjusts it for use by the MyBus tool.  Run all cells to output these files:

* `lines.json` - This file is a list of line numbers to be used in the MyBus tool (to select lines).
* `[line-number].json` - A file for each line number that lists unique stops across all trips for that line.

As of 5/28/21 data should be output for 141 lines.

__TODO: Output files with all matching `stop_id`s for each unique `line` + `stop_name` combination__

## Contents

[Load GTFS Data](#load-gtfs-data)

[Output Lines](#output-lines)



In [2]:
import pandas as pd
import numpy as np

DATA_INPUT_PATH = 'data-input/'
DATA_OUTPUT_PATH = 'data-output/'

<a id="load-gtfs-data"></a>
# Load GTFS Data

## __`routes.txt`__

GTFS files are pulled from: https://gitlab.com/LACMTA/gtfs_bus

The data used here is from version hash `9a71d665`.

* `route_id`
* `route_short_name`

## __`stops.txt`__

* `stop_id`
* `stop_name`
* `stop_lat`
* `stop_lon`

## __`stop_times.txt`__

* `trip_id`
* `stop_id`
* `stop_sequence`
* `stop_headsign`
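Before loading, it can help to confirm that each file actually contains the columns listed above. The following is a minimal sketch, not part of the original notebook: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the sample text is a made-up stand-in for `stops.txt`.

```python
import io
import pandas as pd

# Columns this notebook reads from each GTFS file (taken from the loading cells).
REQUIRED_COLUMNS = {
    'routes.txt': ['route_id', 'route_short_name'],
    'stops.txt': ['stop_id', 'stop_name', 'stop_lat', 'stop_lon'],
    'stop_times.txt': ['trip_id', 'stop_id', 'stop_sequence', 'stop_headsign'],
}

def missing_columns(source, required):
    """Return the required column names that are absent from a CSV header.

    `source` can be a file path or any file-like object accepted by pd.read_csv.
    """
    header = pd.read_csv(source, nrows=0)  # nrows=0 reads only the header row
    return [col for col in required if col not in header.columns]

# Hypothetical in-memory sample standing in for stops.txt
sample_stops = io.StringIO(
    "stop_id,stop_code,stop_name,stop_lat,stop_lon\n"
    "1,1,Paramount / Slauson,33.973248,-118.113113\n"
)
print(missing_columns(sample_stops, REQUIRED_COLUMNS['stops.txt']))
```

An empty list means every required column is present. Passing a path such as `DATA_INPUT_PATH + 'stops.txt'` instead of the in-memory sample should work the same way.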

In [3]:
lines = pd.read_csv(DATA_INPUT_PATH + 'routes.txt', 
    usecols={'route_id', 'route_short_name'},
    dtype={'route_id':'string', 'route_short_name':'string'})

lines.head(10)

Unnamed: 0,route_id,route_short_name
0,2-13139,2
1,4-13139,4
2,10-13139,10/48
3,14-13139,14/37
4,16-13139,16
5,18-13139,18
6,20-13139,20
7,28-13139,28
8,30-13139,30
9,33-13139,33


In [4]:
stops = pd.read_csv(DATA_INPUT_PATH + 'stops.txt',
    usecols=['stop_id','stop_name','stop_lat','stop_lon'],
    dtype={'stop_id':'string','stop_name':'string','stop_lat':'float64','stop_lon':'float64'})
stops.head(5)

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon
0,1,Paramount / Slauson,33.973248,-118.113113
1,3,Jefferson / 10th,34.025471,-118.328402
2,6,120th / Augustus F Hawkins,33.924696,-118.242222
3,7,120th / Martin Luther King Hospital,33.924505,-118.240369
4,12,15054 Sherman Way,34.201075,-118.461953


In [5]:
stop_times = pd.read_csv(DATA_INPUT_PATH + 'stop_times.txt',
    usecols=['trip_id','stop_id','stop_sequence','stop_headsign'],
    dtype={'trip_id':'string','stop_id':'string','stop_sequence':'int64','stop_headsign':'string'})
stop_times.head(5)

Unnamed: 0,trip_id,stop_id,stop_sequence,stop_headsign
0,52088401-DEC20-D02CAR-1_Weekday,10246,1,611 - Vernon Station
1,52088401-DEC20-D02CAR-1_Weekday,10248,2,611 - Vernon Station
2,52088401-DEC20-D02CAR-1_Weekday,9371,3,611 - Vernon Station
3,52088401-DEC20-D02CAR-1_Weekday,9350,4,611 - Vernon Station
4,52088401-DEC20-D02CAR-1_Weekday,9351,5,611 - Vernon Station


<a id="output-lines"></a>
# Output Lines

Output file as: `lines.json`.

Notes on the pandas `.to_json()` method:

* It is available on a DataFrame or Series, not on a plain array.
* For a DataFrame it outputs data by column by default - use `orient='records'` to output by record.
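To illustrate the difference between the two orientations, here is a small sketch on a two-row sample (the variable names are illustrative only). With no file path, `.to_json()` returns the JSON as a string, which makes the two shapes easy to compare.

```python
import json
import pandas as pd

sample = pd.DataFrame({
    'route_id': ['2-13139', '4-13139'],
    'route_short_name': ['2', '4'],
})

# Default orient for a DataFrame is 'columns': {column -> {index -> value}}
by_column = sample.to_json()

# orient='records' produces a list with one object per row
by_record = sample.to_json(orient='records')

print(json.loads(by_column))
print(json.loads(by_record))
```

The record-oriented form is a plain list of `{route_id, route_short_name}` objects, which is generally the easier shape for a dropdown to iterate over.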

Fields in output:

* `route_id` - route number plus HASTUS version. For lines with sister routes, only the first line is listed.
  * Ex: `10-13139`
* `route_short_name` - route number, includes sister routes.
  * Ex: `10/48`

Modifications:

* The Silver Line and Orange Line do not have `route_short_name` values so those have to be manually added.
* The L Line Shuttle and the two Dodger Stadium Express Shuttles will be removed since they're temporary services.
* Lines with sister routes may need to be separated so that each is treated as its own line.  The exception is when a rider can stay on a single vehicle and end up on the other line.  In that case we would want to combine the stops for both lines so they are selectable from the landing page.
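One possible way to separate sister routes is to split `route_short_name` on the slash and expand each piece into its own row. This is only a sketch on made-up sample rows using `str.split` and `DataFrame.explode`; it is an alternative technique, not the loop this notebook uses.

```python
import pandas as pd

sample = pd.DataFrame({
    'route_id': ['2-13139', '10-13139', '14-13139'],
    'route_short_name': ['2', '10/48', '14/37'],
})

# Split "10/48" into ["10", "48"], then give each list element its own row.
# The route_id of the combined entry is repeated for both sister lines.
separated = (
    sample.assign(route_short_name=sample['route_short_name'].str.split('/'))
          .explode('route_short_name')
          .reset_index(drop=True)
)

# Cast to int so the lines sort numerically rather than as text.
separated['route_short_name'] = separated['route_short_name'].astype(int)
separated = separated.sort_values('route_short_name').reset_index(drop=True)
print(separated)
```

Both sister lines keep the shared `route_id`, which matches the note above that only the first line appears in that field.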

## MyBus Usage

Landing Page - Line Select Dropdown

* `route_id` - use as button value, pass this to the results page as a URL parameter
* `route_short_name` - user-friendly text for the dropdown (just a number)

Results Page - Header

* `route_short_name` - use as H1



In [6]:
# Add route_short_name values for the 901 (Orange Line) and 910/950 (Silver Line)
lines.loc[lines["route_id"] == '910-13139', 'route_short_name'] = '910/950'
lines.loc[lines["route_id"] == '901-13139', 'route_short_name'] = '901'

# Line 16/17 is listed as only '16' in the GTFS data even though headsigns show 17.
# Add line 17 back in.
lines.loc[lines["route_id"] == '16-13139', 'route_short_name'] = '16/17'

# Add back in lines: 177, 244, 489, 788.
lines.loc[len(lines.index)] = ['177', '177'] # no stops
lines.loc[len(lines.index)] = ['244', '244'] # has stops
lines.loc[len(lines.index)] = ['489', '489'] # has stops
lines.loc[len(lines.index)] = ['788', '788'] # no stops

# Remove the entries for the Dodger Express and L Line (Gold) Shuttle
lines = lines.loc[~lines["route_id"].isin(['DSE-HG'])]
lines = lines.loc[~lines["route_id"].isin(['DSE-US'])]
lines = lines.loc[~lines["route_id"].isin(['854-13139'])]

# Separate out the sister lines.
lines_separated = lines.loc[lines['route_short_name'].str.contains('/'), 'route_short_name'].values

for short_name in lines_separated:
    # Use a descriptive name rather than shadowing the built-in id().
    route_id = lines.loc[lines['route_short_name'] == short_name, 'route_id'].values[0]
    line1, line2 = short_name.split('/', 1)

    # Replace the combined row with one row per sister line.
    lines = lines.loc[lines['route_id'] != route_id]
    newlines = pd.DataFrame([[route_id, line1], [route_id, line2]], columns=['route_id', 'route_short_name'])
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    lines = pd.concat([lines, newlines], ignore_index=True)

# cast route_short_name to int32 so that we can sort by their integer value
lines = lines.astype({'route_short_name': 'int32'}).sort_values('route_short_name')
lines.head(10)


Unnamed: 0,route_id,route_short_name
0,2-13139,2
1,4-13139,4
109,10-13139,10
111,14-13139,14
113,16-13139,16
114,16-13139,17
2,18-13139,18
3,20-13139,20
4,28-13139,28
5,30-13139,30


In [7]:
# Create an array of all valid lines 
# Use this to iterate through all lines listed on MyBus
# As of 5/30/21, this is 141 lines.
lines_array = lines.loc[:, 'route_short_name'].values
print(lines_array)
print(len(lines_array))

[  2   4  10  14  16  17  18  20  28  30  33  35  37  38  40  45  48  51
  52  53  55  60  62  66  68  70  71  76  78  79  81  83  90  91  92  94
  96 102 105 106 108 110 111 115 117 120 125 127 128 130 150 152 154 155
 158 161 162 163 164 165 166 167 169 175 176 177 180 181 183 200 201 202
 204 205 206 207 209 210 211 212 215 217 218 222 224 230 232 233 234 236
 237 239 240 242 243 244 245 246 251 252 256 258 260 264 265 266 267 268
 344 460 487 489 501 534 550 577 601 602 603 605 611 656 665 685 686 687
 704 720 733 734 744 750 754 757 770 780 788 794 901 910 950]
141


In [8]:
# create a dataframe of this array:
df_lines_array = pd.DataFrame(lines_array, columns=['lines'])
df_lines_array

Unnamed: 0,lines
0,2
1,4
2,10
3,14
4,16
...,...
136,788
137,794
138,901
139,910


In [6]:
######################################################
# OUTPUT lines.json
######################################################
lines.to_json(DATA_OUTPUT_PATH + "lines.json",orient='records')

# if we remove "-13139" from route_id it will be identical to route_short_name
#lines.route_id = lines.route_id.str.replace('-13139','')

0        2
1        4
109     10
111     14
113     16
114     17
2       18
3       20
4       28
5       30
6       33
115     35
112     37
116     38
7       40
8       45
110     48
117     51
118     52
9       53
10      55
11      60
12      62
13      66
14      68
15      70
16      71
17      76
119     78
120     79
18      81
19      83
121     90
122     91
20      92
21      94
22      96
23     102
24     105
25     106
26     108
27     110
28     111
29     115
30     117
31     120
32     125
33     127
34     128
35     130
123    150
36     152
37     154
38     155
39     158
40     161
125    162
126    163
41     164
42     165
43     166
44     167
45     169
46     175
47     176
105    177
127    180
128    181
48     183
49     200
50     201
51     202
52     204
53     205
54     206
55     207
56     209
57     210
129    211
58     212
130    215
59     217
60     218
61     222
62     224
63     230
64     232
65     233
66     234
67     236
131    237

# Combine Lines & Stops Data

Merge `stop_times` and `stops` using a LEFT JOIN on `stop_id`.  For each stop on a line, this will show that stop's name, latitude, and longitude.

Use the `lines_and_stops` dataframe to generate a file for each line that lists all unique stops for that line.
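The effect of the left join can be seen on a tiny sample. This sketch uses made-up frames (`sample_stop_times`, `sample_stops`) and one invented `stop_id` with no match, to show that every stop time is kept and unmatched rows get missing stop details.

```python
import pandas as pd

sample_stop_times = pd.DataFrame({
    'trip_id': ['t1', 't1', 't1'],
    'stop_id': ['10246', '10248', '99999'],  # '99999' is a made-up id with no match
    'stop_sequence': [1, 2, 3],
})
sample_stops = pd.DataFrame({
    'stop_id': ['10246', '10248'],
    'stop_name': ['Vernon / Long Beach', 'Vernon / Morgan'],
})

# how='left' keeps every row of the left frame (the stop times).
merged = pd.merge(sample_stop_times, sample_stops, how='left', on='stop_id')

# Rows whose stop_id was not found in the stops frame have a missing stop_name.
unmatched = merged[merged['stop_name'].isna()]

print(merged)
print(len(unmatched), 'stop time(s) without a matching stop')
```

Running the same `isna()` check on the real merged frame would be one way to spot stop times that reference a stop missing from `stops.txt`.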

In [9]:
lines_and_stops = pd.merge(stop_times, stops, how="left", on="stop_id")
lines_and_stops.head(5)

#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^48\s', regex=True)]
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^794\s', regex=True)]

Unnamed: 0,trip_id,stop_id,stop_sequence,stop_headsign,stop_name,stop_lat,stop_lon
0,52088401-DEC20-D02CAR-1_Weekday,10246,1,611 - Vernon Station,Vernon / Long Beach,34.00405,-118.242837
1,52088401-DEC20-D02CAR-1_Weekday,10248,2,611 - Vernon Station,Vernon / Morgan,34.00404,-118.244926
2,52088401-DEC20-D02CAR-1_Weekday,9371,3,611 - Vernon Station,Compton / Vernon,34.00363,-118.247907
3,52088401-DEC20-D02CAR-1_Weekday,9350,4,611 - Vernon Station,Compton / 46th,34.001767,-118.247915
4,52088401-DEC20-D02CAR-1_Weekday,9351,5,611 - Vernon Station,Compton / 48th,33.999487,-118.247918


In [10]:
# Create a simplified version of the lines_and_stops dataframe
# Replace the stop_headsign column with the line number extracted into a 'line' column

simple_lines_stops = lines_and_stops[['stop_id','stop_headsign','stop_name']].copy()
simple_lines_stops['line'] = np.nan
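for counter, line in enumerate(lines_array, start=1):
    # Raw strings so that \s reaches the regex engine as the whitespace class.
    line_regex = r'^' + str(line) + r'\s'
    simple_lines_stops.loc[simple_lines_stops['stop_headsign'].str.contains(line_regex), 'line'] = line

    # Report progress after every 10th line has been processed.
    if counter % 10 == 0:
        print(str(counter) + ' lines processed')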

simple_lines_stops.head()

10 lines processed
20 lines processed
30 lines processed
40 lines processed
50 lines processed
60 lines processed
70 lines processed
80 lines processed
90 lines processed
100 lines processed
110 lines processed
120 lines processed
130 lines processed
140 lines processed


Unnamed: 0,stop_id,stop_headsign,stop_name,line
0,10246,611 - Vernon Station,Vernon / Long Beach,611.0
1,10248,611 - Vernon Station,Vernon / Morgan,611.0
2,9371,611 - Vernon Station,Compton / Vernon,611.0
3,9350,611 - Vernon Station,Compton / 46th,611.0
4,9351,611 - Vernon Station,Compton / 48th,611.0


In [11]:
simple_lines_stops.replace('', float("NaN"), inplace=True)
simple_lines_stops.dropna(subset = ['line'], inplace=True)
simple_lines_stops.line = simple_lines_stops.line.astype('int')

# stop data for 139 lines (lines 177 and 788 are not in the GTFS)
simple_lines_stops = simple_lines_stops[['stop_id','stop_name','line']].copy()

simple_lines_stops

# simple_lines_stops
# dtypes____________
# stop_id   string
# line      int

Unnamed: 0,stop_id,stop_name,line
0,10246,Vernon / Long Beach,611
1,10248,Vernon / Morgan,611
2,9371,Compton / Vernon,611
3,9350,Compton / 46th,611
4,9351,Compton / 48th,611
...,...,...,...
1790800,13422,Figueroa / Exposition,200
1790801,13424,Figueroa / State,200
1790802,13437,Figueroa / 39th,200
1790803,2738,Martin Luther King Jr / Figueroa,200


# Looking at Stops Data

Questions:

* How many records are there in `lines_and_stops` for a particular line?
  * Use REGEX matching on `stop_headsign`. Will need to use an OR operator for sister lines because each has distinct headsign values.
* Of those records, how many unique `trip_id`s are there?
* Of those records, how many unique `stop_name`s are there?
* What is the highest `stop_sequence` value for that line?
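The questions above could be answered with one helper. This is a sketch under assumptions: `summarize_line` is a hypothetical function, the sample rows and headsign destinations are invented, and the frame is expected to have the same columns as `lines_and_stops`.

```python
import pandas as pd

def summarize_line(df, line_numbers):
    """Summarize the rows whose stop_headsign starts with any of the given line numbers."""
    # Builds e.g. r'^(?:10|48)\s'. The non-capturing group provides the OR for
    # sister lines without triggering the pandas warning about match groups.
    pattern = r'^(?:' + '|'.join(str(n) for n in line_numbers) + r')\s'
    rows = df[df['stop_headsign'].str.contains(pattern, regex=True, na=False)]
    return {
        'records': len(rows),
        'unique_trips': rows['trip_id'].nunique(),
        'unique_stop_names': rows['stop_name'].nunique(),
        'max_stop_sequence': int(rows['stop_sequence'].max()) if len(rows) else None,
    }

# Invented sample rows; the '102' row must not be counted for line 10.
sample = pd.DataFrame({
    'trip_id': ['a', 'a', 'b', 'c'],
    'stop_headsign': ['10 - Downtown', '10 - Downtown', '48 - Downtown', '102 - Airport'],
    'stop_name': ['Main / 1st', 'Main / 2nd', 'Main / 1st', 'Elm / 3rd'],
    'stop_sequence': [1, 2, 1, 1],
})

print(summarize_line(sample, [10, 48]))
```

Because the pattern requires whitespace right after the number, a search for line 10 does not pick up headsigns for line 102.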

## Stops Data Findings

Line 2

* 31,094 stop times along Line 2
* 377 unique trips
* 92 stops MAX within a single trip
* 377 x 92 = 34,684, which is more than the 31,094 stop times observed - this means there are some trips with fewer than 92 stops
* 123 unique stop names - this means trips do not all contain the same set of stops
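The reasoning for Line 2 can be checked with a few lines of arithmetic. The numbers are the ones listed above; the variable names are illustrative only.

```python
stop_times_line_2 = 31_094   # rows whose headsign starts with "2 "
unique_trips = 377
max_stops_per_trip = 92

# If every trip served the maximum number of stops, there would be this many rows.
upper_bound = unique_trips * max_stops_per_trip
shortfall = upper_bound - stop_times_line_2
average_stops = stop_times_line_2 / unique_trips

print(upper_bound)              # 34684
print(shortfall)                # 3590
print(round(average_stops, 1))  # 82.5
```

The shortfall of 3,590 rows, or an average of about 82.5 stops per trip, is what shows that some trips serve fewer than 92 stops.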

Line 10/48

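* 16,524 stop times along Lines 10 and 48 combined (12,534 for Line 10 alone)
* 316 unique trips (231 for Line 10 alone)
* 102 stops MAX within a single trip
* 132 unique stop names (117 for Line 10 alone)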

Highest `stop_sequence`

* Line 90
* 136 is the highest value
* Line 90 has 158 total unique stops.

In [8]:
####################
#  Line 2          #
####################

# All values for Line 2 = 31,094
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^2\s', regex=True)]
 
# Unique stop names for Line 2 = 123
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^2\s', regex=True)].stop_name.unique()

# Unique trip_ids for Line 2 = 377
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^2\s', regex=True)].trip_id.unique()

# Line 2 stops sorted by highest stop_sequence value = 92
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^2\s', regex=True)].sort_values('stop_sequence', ascending=False).head(10)

In [9]:
####################
#  Lines 10/48     #
####################

# REGEX: ^(10\s|48\s)

# All values for Line 10 = 12,534 rows
# All values for Line 10/48 = 16,524 rows
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^10\s', regex=True)]
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^(10\s|48\s)', regex=True)]

# Unique trip_ids for Line 10 = 231
# Unique trip_ids for Line 10/48 = 316
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^10\s', regex=True)].trip_id.unique()
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^(10\s|48\s)', regex=True)].trip_id.unique()

# Unique stop names for Line 10 = 117
# Unique stop names for Line 10/48 = 132
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^10\s', regex=True)].stop_name.unique()
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^(10\s|48\s)', regex=True)].stop_name.unique()

# Line 10/48 stops sorted by highest stop_sequence value = 102
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^(10\s|48\s)', regex=True)].sort_values('stop_sequence', ascending=False).head(10)

In [10]:
# Line 90 has highest stop_sequence value of 136
#lines_and_stops.sort_values('stop_sequence', ascending=False).head(10)

# Line 90 has 158 unique stop names
#lines_and_stops[lines_and_stops['stop_headsign'].str.contains('^90\s', regex=True)].stop_name.unique()

# Output All Stops For Each Line

Use `.drop_duplicates()` to generate a new dataframe with only unique values.  Using `.[column name].unique()` outputs a StringArray and we need a DataFrame in order to call `.to_json()`.
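A small comparison makes the difference concrete. This sketch uses three sample rows; `.unique()` returns only the names as an array, while `.drop_duplicates()` returns a DataFrame that still carries `stop_id` and can be written out with `.to_json()`.

```python
import pandas as pd

sample = pd.DataFrame({
    'stop_id': ['5410', '5411', '11917'],
    'stop_name': ['Pacific / 7th', 'Pacific / 7th', 'Spring / 1st - City Hall'],
})

# An array of names only: no stop_id, and no .to_json() method.
names_only = sample['stop_name'].unique()

# A DataFrame that keeps the first row seen for each stop_name.
deduped = sample.drop_duplicates(subset='stop_name')

print(type(names_only).__name__)
print(deduped.to_json(orient='records'))
```

Note that `drop_duplicates` keeps only the first `stop_id` for a repeated name, so `5411` is dropped in this sample.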

Sort results by `stop_name` because we're combining multiple trips that may each have their own set of stops along their routes.

Fields in output:

* `stop_id`
* `stop_name`

## Usage

Landing Page - Stop Select Dropdowns

* `stop_id` - use as button value, pass this to the results page as a URL parameter
* `stop_name` - user-friendly text for the dropdown

## Method

Loop through data to generate a separate file for each line.  Each file will contain all unique `stop_name`s for that line.  Lines are matched using the `stop_headsign` field.  Only one line number shows up at a time in the `stop_headsign`.

* Create an array of all the `route_short_name` values which should match the line numbers within `stop_headsign`.
* For each of those line numbers, find the rows in `lines_and_stops` that contain that line number within `stop_headsign`.
* From those values, drop duplicate `stop_name`s to create a list of all unique stops for that line.
* Sort the values so the `stop_name`s are in alphabetical order and output the results to JSON files.
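The matching step depends on the pattern being anchored at the start of the headsign and followed by whitespace. Here is a minimal sketch with the standard `re` module; `matches_line` is a hypothetical helper and the headsign destinations are invented for illustration.

```python
import re

# Invented headsigns in the "<line> - <destination>" shape seen in the data.
headsigns = ['2 - Downtown LA', '20 - Santa Monica', '200 - Echo Park', '720 - Commerce']

def matches_line(headsign, line):
    """True when the headsign starts with the line number followed by whitespace."""
    return re.search(r'^' + str(line) + r'\s', headsign) is not None

print([h for h in headsigns if matches_line(h, 2)])
print([h for h in headsigns if matches_line(h, 20)])
```

Without the `^` anchor, a search for line 20 would also match `720`, and without the trailing `\s`, a search for line 2 would also match `20` and `200`.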

## Note

The initial version of the generated data only took the first instance of `stop_name` (using `drop_duplicates`).  However, the following factors make it so that all `stop_id`s for a `stop_name` should be included in the data:

* There can be multiple `stop_id`s that match a single `stop_name`.
* The selection dropdown on MyBus uses `stop_name` and we don't offer a way to differentiate between `stop_id`s.
* The stop_changes data does not include `stop_name`s that match the GTFS data, mostly just `stop_id`.
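One way to keep every `stop_id` for a `stop_name` is to group by line and name and join the ids with a delimiter. This is a sketch on sample rows; the pipe delimiter follows the aggregated output later in this notebook, and the final split is an assumption about how a consumer might read the value back.

```python
import pandas as pd

sample = pd.DataFrame({
    'line': [950, 950, 950],
    'stop_name': ['Pacific / 7th', 'Pacific / 7th', 'Spring / Temple'],
    'stop_id': ['5410', '5411', '12416'],
})

# One row per line + stop_name, with all unique stop_ids joined by '|'.
aggregated = (
    sample.groupby(['line', 'stop_name'])['stop_id']
          .agg(lambda ids: '|'.join(ids.unique()))
          .reset_index()
)
print(aggregated)

# A consumer can recover the individual ids by splitting on the delimiter.
print(aggregated.loc[0, 'stop_id'].split('|'))
```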

## TODO

* Check `stop_name`s for any abbreviations that should be corrected.

In [11]:
######################################################
# OUTPUT all stops as *line*.json
# Drops duplicates based on stop_name
######################################################

for line in lines_array:
    # Raw strings so that \s reaches the regex engine as the whitespace class.
    line_regex = r'^' + str(line) + r'\s'
    line_filename = DATA_OUTPUT_PATH + 'stops/' + str(line) + '.json'

    # Keep the first row seen for each stop_name on this line.
    deduped_stops = lines_and_stops[lines_and_stops['stop_headsign'].str.contains(line_regex, regex=True)].drop_duplicates(subset='stop_name')
    deduped_stops[['stop_id','stop_name']].sort_values('stop_name').to_json(line_filename, orient='records')

    print('Line ' + line_filename + ' created')

141 lines
[  2   4  10  14  16  17  18  20  28  30  33  35  37  38  40  45  48  51
  52  53  55  60  62  66  68  70  71  76  78  79  81  83  90  91  92  94
  96 102 105 106 108 110 111 115 117 120 125 127 128 130 150 152 154 155
 158 161 162 163 164 165 166 167 169 175 176 177 180 181 183 200 201 202
 204 205 206 207 209 210 211 212 215 217 218 222 224 230 232 233 234 236
 237 239 240 242 243 244 245 246 251 252 256 258 260 264 265 266 267 268
 344 460 487 489 501 534 550 577 601 602 603 605 611 656 665 685 686 687
 704 720 733 734 744 750 754 757 770 780 788 794 901 910 950]
Line data-output/stops/2.json created
Line data-output/stops/4.json created
Line data-output/stops/10.json created
Line data-output/stops/14.json created
Line data-output/stops/16.json created
Line data-output/stops/17.json created
Line data-output/stops/18.json created
Line data-output/stops/20.json created
Line data-output/stops/28.json created
Line data-output/stops/30.json created
Line data-output/stops/33.jso

In [111]:
headsign_lines = simple_lines_stops.line.unique()
print(len(headsign_lines))
print(sorted(headsign_lines))

df_headsign_lines = pd.DataFrame(headsign_lines, columns=['lines']).sort_values(by=['lines']).reset_index()

# LEFT = routes.txt but modified
# RIGHT = stop_headsigns
lines_merged = pd.merge(df_lines_array, df_headsign_lines, how='left', indicator=True)
lines_merged[lines_merged._merge != 'both']

# 177 and 788 both are not in the GTFS and don't have stops
# that's why they aren't in the stop_headsigns column.


139
[2, 4, 10, 14, 16, 17, 18, 20, 28, 30, 33, 35, 37, 38, 40, 45, 48, 51, 52, 53, 55, 60, 62, 66, 68, 70, 71, 76, 78, 79, 81, 83, 90, 91, 92, 94, 96, 102, 105, 106, 108, 110, 111, 115, 117, 120, 125, 127, 128, 130, 150, 152, 154, 155, 158, 161, 162, 163, 164, 165, 166, 167, 169, 175, 176, 180, 181, 183, 200, 201, 202, 204, 205, 206, 207, 209, 210, 211, 212, 215, 217, 218, 222, 224, 230, 232, 233, 234, 236, 237, 239, 240, 242, 243, 244, 245, 246, 251, 252, 256, 258, 260, 264, 265, 266, 267, 268, 344, 460, 487, 489, 501, 534, 550, 577, 601, 602, 603, 605, 611, 656, 665, 685, 686, 687, 704, 720, 733, 734, 744, 750, 754, 757, 770, 780, 794, 901, 910, 950]


Unnamed: 0,lines,index,_merge
65,177,,left_only
136,788,,left_only


In [12]:
######################################################
# OUTPUT all stops as *line*.json
# Output files with all matching `stop_id`s for each unique `line` + `stop_name` combination
######################################################

# Group line+stop data by unique line + stop combinations
# Create a new column that aggregates the unique associated stop_ids
grouped_stops = simple_lines_stops.groupby(['line', 'stop_name'])
unique_grouped_stops = grouped_stops['stop_id'].unique()

unique_grouped_stops = unique_grouped_stops.reset_index()

def aggregate_stop_id(row):
    # Join the unique stop_ids for this line + stop_name with a pipe, e.g. '5410|5411'
    return '|'.join(row.stop_id)

unique_grouped_stops['stop_id_agg'] = unique_grouped_stops.apply(aggregate_stop_id, axis=1)

aggregated_grouped_stops = unique_grouped_stops[['line','stop_name','stop_id_agg']].copy()

aggregated_grouped_stops

# unique_grouped_stops.to_json(DATA_OUTPUT_PATH + 'unique-grouped-stops.json', orient='records')


Unnamed: 0,line,stop_name,stop_id_agg
0,2,Alvarado / Montana,3360
1,2,Alvarado / Sunset,3362
2,2,Broadway / 12th,15598
3,2,Broadway / 1st,4767
4,2,Broadway / 3rd,13227
...,...,...,...
10894,950,Pacific / 7th,5410|5411
10895,950,Spring / 1st - City Hall,11917
10896,950,Spring / Temple,12416
10897,950,USC Medical Ctr Busway Station,15029|5048


In [372]:
# unique_agg_stops = aggregated_grouped_stops.drop_duplicates(subset = ['line','stop_id_agg'])
# unique_agg_stops
# no difference


In [18]:
# Output the aggregated stop_ids to files by line 

count = 0

for line in lines_array:
    line_filename = DATA_OUTPUT_PATH + 'stops-agg/' + str(line) + '.json'

    # no de-dupping necessary because the data was already grouped by line + stop_name
    stop_by_line = aggregated_grouped_stops[aggregated_grouped_stops.line == line]
    stop_by_line = stop_by_line.rename(columns={'stop_id_agg':'stop_id'})
    stop_by_line[['stop_id','stop_name']].sort_values('stop_name').to_json(line_filename, orient='records')

    print('Line ' + line_filename + ' created' + ' (' + str(len(stop_by_line)) + ')')
    count += 1

print(str(count) + ' files created.')

Line data-output/stops-agg/2.json created (123)
Line data-output/stops-agg/4.json created (129)
Line data-output/stops-agg/10.json created (117)
Line data-output/stops-agg/14.json created (47)
Line data-output/stops-agg/16.json created (80)
Line data-output/stops-agg/17.json created (84)
Line data-output/stops-agg/18.json created (94)
Line data-output/stops-agg/20.json created (103)
Line data-output/stops-agg/28.json created (124)
Line data-output/stops-agg/30.json created (90)
Line data-output/stops-agg/33.json created (115)
Line data-output/stops-agg/35.json created (55)
Line data-output/stops-agg/37.json created (31)
Line data-output/stops-agg/38.json created (54)
Line data-output/stops-agg/40.json created (125)
Line data-output/stops-agg/45.json created (88)
Line data-output/stops-agg/48.json created (49)
Line data-output/stops-agg/51.json created (108)
Line data-output/stops-agg/52.json created (105)
Line data-output/stops-agg/53.json created (106)
Line data-output/stops-agg/55.js

# Random Scratch Code Below

In [12]:
# route_sn_column = lines.loc[:, 'route_short_name']
# route_sn_array = route_sn_column.values
# lines_adjusted = []

# for i, line in enumerate(route_sn_array):
#     slash = line.find('/')

#     if slash > 0:
#         lines_adjusted.append(line[:slash])
#         lines_adjusted.append(line[slash+1:])
#         continue
#     else:
#         lines_adjusted.append(line)

# # 139 lines
# print(lines_adjusted)
# print('\nTotal number of lines after split: ', len(lines_adjusted))

In [13]:
# route_id_array = route_id_column.values
# route_num_array = []

# for i, line in enumerate(route_id_array):
#     route_num_array.append('^' + route_id_array[i].replace('-13139','') + '\s')

# print(route_id_array)
# print(route_num_array)

In [14]:
#lines_and_stops[lines_and_stops.stop_id == '3360']