
Usage: 4.1. Using Network: Accessing Data

Kasia Kozlowska edited this page Nov 24, 2022 · 12 revisions

Using the Network object

Available as a Jupyter notebook or wiki page.

Let's read in a sample MATSim network into GeNet's Network object.

from genet import read_matsim
import os
import pyproj

path_to_matsim_network = '../example_data/pt2matsim_network'

network = os.path.join(path_to_matsim_network, 'network.xml')
schedule = os.path.join(path_to_matsim_network, 'schedule.xml')
vehicles = os.path.join(path_to_matsim_network, 'vehicles.xml')
n = read_matsim(
    path_to_network=network, 
    epsg='epsg:27700', 
    path_to_schedule=schedule, 
    path_to_vehicles=vehicles
)
# You don't need to read the vehicles file, but doing so ensures that all vehicles
# in the schedule are of the expected type and that the vehicle definitions
# are preserved.

n.print()
Graph info: Name: 
Type: MultiDiGraph
Number of nodes: 1662
Number of edges: 3166
Average in degree:   1.9049
Average out degree:   1.9049 
Schedule info: Schedule:
Number of services: 9
Number of routes: 68
Number of stops: 118

Summary

The network summary report can be accessed using the summary_report method:

n.summary_report()
2022-08-25 14:51:20,883 - Creating a summary report





{'network': {'network_graph_info': {'Number of network links': 1662,
   'Number of network nodes': 3166},
  'modes': {'Modes on network links': {'artificial', 'bus', 'car', 'pt'},
   'Number of links by mode': {'artificial': 3,
    'car': 3161,
    'pt': 153,
    'bus': 182}},
  'osm_highway_tags': {'Number of links by tag': {'living_street': 7,
    'tertiary_link': 2,
    'trunk': 213,
    'unclassified': 1027,
    'tertiary': 326,
    'residential': 758,
    'secondary_link': 2,
    'primary_link': 5,
    'service': 2,
    'primary': 619,
    'secondary': 185,
    'trunk_link': 17}}},
 'schedule': {'schedule_info': {'Number of services': 9,
   'Number of routes': 68,
   'Number of stops': 118},
  'modes': {'Modes in schedule': {'bus'},
   'Services by mode': {'bus': 9},
   'PT stops by mode': {'bus': 45}},
  'accessibility_tags': {'Stops with tag bikeAccessible': 0,
   'Unique values for bikeAccessible tag': set(),
   'Stops with tag carAccessible': 0,
   'Unique values for carAccessible tag': set()}}}

The data saved on the edges or nodes of the graph can be nested. There are a couple of convenient methods that summarise the schema of the data found on the nodes and links. If data=True, the output also shows up to 5 unique values stored in that location.

n.node_attribute_summary(data=True)
attribute
├── id: ['3085005043', '200047', '852019112', '107824', '14790693']
├── x: [528387.4250512555, 528391.4406755936, 528393.2742107178, 528396.6287644263, 528396.3513181042]
├── y: [181547.5850354673, 181552.72935927223, 181558.10532352765, 181559.970402835, 181562.0370527053]
├── lon: [-0.15178558709839862, -0.135349787087776, -0.122919287085967, -0.13766218709633904, -0.14629008709559344]
├── lat: [51.52643403323907, 51.51609983324067, 51.51595583324104, 51.5182034332405, 51.52410423323943]
└── s2_id: [5221390710015643649, 5221390314367946753, 5221366508477440003, 5221390682291777543, 5221390739236081673]
n.link_attribute_summary(data=False)
attribute
├── id
├── from
├── to
├── freespeed
├── capacity
├── permlanes
├── oneway
├── modes
├── s2_from
├── s2_to
├── attributes
│   ├── osm:way:access
│   ├── osm:way:highway
│   ├── osm:way:id
│   ├── osm:way:name
│   ├── osm:relation:route
│   ├── osm:way:lanes
│   ├── osm:way:oneway
│   ├── osm:way:tunnel
│   ├── osm:way:psv
│   ├── osm:way:vehicle
│   ├── osm:way:traffic_calming
│   ├── osm:way:junction
│   └── osm:way:service
└── length

Once you see the general schema for the data stored on nodes and links, you may decide to look at, or perform analysis on, all of the data stored in the network under a particular key. A GeNet network has two methods which generate a pandas.Series object holding the node or link data present at the specified key, indexed by the same index as the nodes or links.

s2_id = n.node_attribute_data_under_key('s2_id')
s2_id
101982       5221390329378179879
101986       5221390328605860387
101990       5221390304444511271
101991       5221390303978897267
101992       5221390304897644929
                    ...         
983839058    5221390693831817171
99936        5221390297975475113
99937        5221390299484831045
99940        5221390294354743413
99943        5221390298004852605
Length: 1662, dtype: int64
n.link_attribute_data_under_key('freespeed').head()
1       4.166667
10      4.166667
100     4.166667
1000    4.166667
1001    4.166667
dtype: float64

You can also access nested data by passing a dictionary that describes the path to the value:

n.link_attribute_data_under_key({'attributes': 'osm:way:lanes'}).head()
1007    2
1008    2
1037    2
1038    2
1039    2
dtype: object
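
To make the nested key format concrete, here is a minimal sketch of how a key such as {'attributes': 'osm:way:lanes'} can be resolved against a single link's data dictionary. The helper value_under_key is hypothetical and written for illustration only; it is not GeNet's implementation. The link dictionary is a shortened version of the link data shown on this page.

```python
def value_under_key(data, key):
    """Return the value found at `key` in a data dict, or None if absent.

    `key` is either a plain string (top-level attribute) or a single-entry
    dict such as {'attributes': 'osm:way:lanes'} describing a nested path.
    Hypothetical helper, not part of GeNet.
    """
    if isinstance(key, dict):
        # Descend one level per dict layer: {outer_key: inner_key}
        outer, inner = next(iter(key.items()))
        nested = data.get(outer)
        if not isinstance(nested, dict):
            return None
        return value_under_key(nested, inner)
    return data.get(key)


# Illustrative link data, shaped like the examples on this page
link = {
    'id': '1007',
    'length': 13.941905154249884,
    'attributes': {'osm:way:lanes': '2', 'osm:way:highway': 'primary'},
}

lanes = value_under_key(link, {'attributes': 'osm:way:lanes'})
length = value_under_key(link, 'length')
missing = value_under_key(link, {'attributes': 'osm:way:tunnel'})
print(lanes, length, missing)
```

Links that do not hold a value at the requested key return None in this sketch, which mirrors why the nested Series above only lists some of the link IDs.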

You can also build a pandas.DataFrame out of several keys.

n.link_attribute_data_under_keys(['freespeed', {'attributes': 'osm:way:highway'}]).head()
      freespeed  attributes::osm:way:highway
1      4.166667  unclassified
10     4.166667  unclassified
100    4.166667  unclassified
1000   4.166667  residential
1001   4.166667  residential

Extracting links of interest

The method below gathers the IDs of links which satisfy conditions at an arbitrary level of nesting. It also allows quite flexible conditions. Below we require that the link value at data['attributes']['osm:way:highway'] == 'primary', where data is the data dictionary stored on that link.

from genet import graph_operations
links = n.extract_links_on_edge_attributes(
    conditions= {'attributes': {'osm:way:highway': 'primary'}},
)
links[:5]
['1007', '1008', '1023', '1024', '103']
len(links)
619

Note, it is possible to set data in long format, specifying the Java class of the data stored, e.g.

{'id': '1007',
 'from': '4356572310',
 'to': '5811263955',
 'attributes': {'osm:way:highway': {'name': 'osm:way:highway',
   'class': 'java.lang.String',
   'text': 'primary'},
  'osm:way:id': {'name': 'osm:way:id',
   'class': 'java.lang.Long',
   'text': '589660342'},
  'osm:way:lanes': {'name': 'osm:way:lanes',
   'class': 'java.lang.String',
   'text': '2'},
  'osm:way:name': {'name': 'osm:way:name',
   'class': 'java.lang.String',
   'text': 'Shaftesbury Avenue'},
  'osm:way:oneway': {'name': 'osm:way:oneway',
   'class': 'java.lang.String',
   'text': 'yes'}},
 'length': 13.941905154249884}

This is useful if you want to force the data to be saved to the MATSim XML file with that specific data type.

In that case, to find primary highway links, you would instead set the following condition:

links = n.extract_links_on_edge_attributes(
    conditions= {'attributes': {'osm:way:highway': {'text': 'primary'}}},
)
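
If a network mixes both formats, it can help to normalise a value before comparing it. Below is a minimal sketch, assuming the long format always keeps the value under the 'text' key as in the example above. The helper plain_value is hypothetical and not part of GeNet.

```python
def plain_value(attribute):
    """Return the underlying value of an attribute in short or long format.

    Hypothetical helper: assumes long-format attributes are dicts that keep
    their value under the 'text' key, as in the example on this page.
    """
    if isinstance(attribute, dict) and 'text' in attribute:
        return attribute['text']
    return attribute


short_form = 'primary'
long_form = {
    'name': 'osm:way:highway',
    'class': 'java.lang.String',
    'text': 'primary',
}

print(plain_value(short_form) == plain_value(long_form))
```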

Below we require that the link value at data['attributes']['osm:way:highway'] is in ['primary', 'something else']. No link in the data has the 'something else' tag, so the output is the same as before.

links = n.extract_links_on_edge_attributes(
    conditions= {'attributes': {'osm:way:highway': ['primary', 'something else']}},
)
links[:5]
['1007', '1008', '1023', '1024', '103']
len(links)
619

We can also pass a list of conditions. In this case we need to specify how multiple conditions should be combined. We do this via the how parameter, which takes one of Python's built-in functions (passed as the function itself, not as a string):

  • how=all - all conditions need to be met
  • how=any - at least one condition needs to be met

The default is any.

links = n.extract_links_on_edge_attributes(
    conditions= [{'attributes': {'osm:way:highway': 'primary'}},
                 {'attributes': {'osm:way:highway': 'something else'}}],
    how=any
)
links[:5]
['1007', '1008', '1023', '1024', '103']
len(links)
619
links = n.extract_links_on_edge_attributes(
    conditions= [{'attributes': {'osm:way:highway': 'primary'}},
                 {'attributes': {'osm:way:highway': 'something else'}}],
    how=all
)
links[:5]
[]

As expected, no links satisfy both data['attributes']['osm:way:highway'] == 'primary' and data['attributes']['osm:way:highway'] == 'something else'.
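
The role of how can be sketched in a few lines. This is an illustration under simplifying assumptions (flat keys, exact-match conditions only), and the function matches is hypothetical rather than GeNet's code. Each condition is evaluated separately and the list of results is handed to the built-in any or all.

```python
def matches(data, conditions, how=any):
    """Check a data dict against a list of {key: expected_value} conditions.

    Hypothetical helper: `how` is the built-in `any` or `all`, applied to the
    list of per-condition results.
    """
    results = [
        data.get(key) == expected
        for condition in conditions
        for key, expected in condition.items()
    ]
    return how(results)


link = {'highway': 'primary'}
conditions = [{'highway': 'primary'}, {'highway': 'something else'}]

print(matches(link, conditions, how=any))  # True: the first condition is met
print(matches(link, conditions, how=all))  # False: the second condition is not met
```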

Below, we give an example of subsetting on a numeric range. We find links where 0 <= 'freespeed' <= 20.

links = n.extract_links_on_edge_attributes(
    conditions = {'freespeed': (0,20)},
)
links[:5]
['1', '10', '100', '1000', '1001']
len(links)
2334

Finally, we can define a function that will handle the condition for us. The function should take the value expected at the key in the data dictionary and return either True or False.

For example, below we give a condition equivalent to our first example, data['attributes']['osm:way:highway'] == 'primary', but using a function we define ourselves to handle the condition.

def highway_primary(value):
    return value == 'primary'

links = n.extract_links_on_edge_attributes(
    conditions= {'attributes': {'osm:way:highway': highway_primary}},
)
links[:5]
['1007', '1008', '1023', '1024', '103']
len(links)
619

This allows for very flexible subsetting of the network based on data stored on the edges. Another example is similar to the numeric range above, but this time we only care about the upper bound and make it a strict inequality.

def below_20(value):
    return value < 20

links = n.extract_links_on_edge_attributes(
    conditions= {'freespeed': below_20},
)
links[:5]
['1', '10', '100', '1000', '1001']
len(links)
2334
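
To summarise the condition forms used in this section (a single value, a list of values, a tuple of bounds and a function), here is a minimal sketch of how one value could be tested against each of them. The function satisfies is hypothetical and not GeNet's implementation. The inclusive bounds follow the 0 <= 'freespeed' <= 20 reading given above.

```python
def satisfies(value, condition):
    """Test a single value against one of the four condition forms.

    Hypothetical helper illustrating the forms used on this page:
    - callable: called with the value, should return True or False
    - tuple (lower, upper): inclusive numeric bounds
    - list: the value must be one of the listed items
    - anything else: exact equality
    """
    if callable(condition):
        return bool(condition(value))
    if isinstance(condition, tuple) and len(condition) == 2:
        lower, upper = condition
        return lower <= value <= upper
    if isinstance(condition, list):
        return value in condition
    return value == condition


print(satisfies('primary', 'primary'))                      # exact value
print(satisfies('primary', ['primary', 'something else']))  # list of values
print(satisfies(4.166667, (0, 20)))                         # inclusive bounds
print(satisfies(4.166667, lambda value: value < 20))        # function
```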

Modal convenience methods

n.links_on_modal_condition('bus')[:5]
['1021', '1023', '1024', '1079', '1105']

nodes_on_modal_condition will return nodes connected to the links satisfying the modal condition.

n.nodes_on_modal_condition(['car', 'bus'])[:5]
['852019112', '107824', '14790693', '21651810', '1166234800']
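
The relationship between the two modal methods can be sketched with plain dictionaries. This is an illustration, not GeNet's code. The toy links and the helper names are hypothetical, and we assume a link matches when it carries at least one of the requested modes.

```python
# Toy link data: each link has a start node, an end node and a set of modes
toy_links = {
    '1': {'from': 'A', 'to': 'B', 'modes': {'car'}},
    '2': {'from': 'B', 'to': 'C', 'modes': {'car', 'bus'}},
    '3': {'from': 'C', 'to': 'D', 'modes': {'pt'}},
}


def links_with_modes(links, modes):
    """IDs of links carrying at least one of the requested modes (assumption)."""
    wanted = {modes} if isinstance(modes, str) else set(modes)
    return [link_id for link_id, data in links.items() if data['modes'] & wanted]


def nodes_with_modes(links, modes):
    """Nodes at either end of the links that satisfy the modal condition."""
    nodes = set()
    for link_id in links_with_modes(links, modes):
        nodes.update((links[link_id]['from'], links[link_id]['to']))
    return sorted(nodes)


print(links_with_modes(toy_links, 'bus'))           # ['2']
print(nodes_with_modes(toy_links, ['car', 'bus']))  # ['A', 'B', 'C']
```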

Spatial convenience methods

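For spatial extraction conditions you have a choice of:

  • a string of comma-separated S2 cell IDs in hex form (the region string below)
  • a path to a GeoJSON file
  • a shapely geometry object, e.g. a Polygon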

_ = n.to_geodataframe()
gdf_nodes, gdf_links = _['nodes'], _['links']
region = '48761ad71,48761ad723,48761ad724c,48761ad73c,48761ad744,48761ad75d3,48761ad75d5,48761ad765,48761ad767,48761ad76c,48761ad774,48761ad779,48761ad77b,48761ad783,48761ad784c,48761ad7854,48761ad794,48761ad79c,48761ad7a4,48761ad7ac,48761ad7b1,48761ad7bc'
_nodes = n.nodes_on_spatial_condition(region)[:5]
len(_nodes)
5
gdf_nodes.plot(), gdf_nodes[gdf_nodes['id'].isin(_nodes)].plot()
(<matplotlib.axes._subplots.AxesSubplot at 0x7fd8786c1050>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7fd87879d550>)

[Figures: all network nodes; the nodes selected by the spatial condition]

geojson = '../example_data/Fitzrovia_polygon.geojson'

# links which intersect the area defined in the GeoJSON file
_links = n.links_on_spatial_condition(geojson, how='intersect')
len(_links)
270
gdf_links.plot(), gdf_links[gdf_links['id'].isin(_links)].plot()
(<matplotlib.axes._subplots.AxesSubplot at 0x7fd8781b8490>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7fd8790d0cd0>)

[Figures: all network links; the links intersecting the GeoJSON polygon]

from shapely.geometry import Polygon

region = Polygon([
    (-0.1487016677856445, 51.52556684350165), (-0.14063358306884766, 51.5255134425896),
    (-0.13865947723388672, 51.5228700191647), (-0.14093399047851562, 51.52006622056997),
    (-0.1492595672607422, 51.51974577545329), (-0.1508045196533203, 51.52276321095246),
    (-0.1487016677856445, 51.52556684350165)])

_links = n.links_on_spatial_condition(region, how='within')
len(_links)
227
gdf_links.plot(), gdf_links[gdf_links['id'].isin(_links)].plot()
(<matplotlib.axes._subplots.AxesSubplot at 0x7fd878005510>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7fd8780d5650>)

[Figures: all network links; the links within the Polygon region]

Using the Schedule object

A Schedule is a representation of public transit and is part of every genet.Network. It is initialised as empty, and a Network can exist and still be valid with an empty Schedule. Earlier we read in a MATSim transit schedule.

A Schedule is composed of a number of nested objects. Each Schedule has a number of Services. Each Service is made up of a number of Routes. A Route is defined by an ordered list of Stop objects. Every Service should, logically, have at least two Routes, one going in one direction and another going back. Each Route also holds information about its trips: their timing, and the arrival and departure offsets at the Stops.

We can look at quick stats:

n.schedule.print()
Schedule:
Number of services: 9
Number of routes: 68
Number of stops: 118

Or we can plot the Schedule object. A Schedule on its own does not have information about the Network, even if it has references to it via network routes in the Route objects. Thus, calling the plot method on a Schedule results in a plot of connections between stops for all Routes within all Services. To plot the network routes of the Schedule, use the plot method of the Network object which holds that Schedule.

# n.schedule.plot()

Summary

Schedules can get large and complicated. GeNet includes methods similar to the ones presented for Network objects. This time, instead of inspecting data stored on nodes and links of a graph, we summarise data held for Stops, Routes and Services in the Schedule.

n.schedule.stop_attribute_summary(data=False)
attribute
├── services
├── routes
├── id
├── x
├── y
├── epsg
├── name
├── lon
├── lat
├── s2_id
├── linkRefId
├── isBlocking
└── stopAreaId
n.schedule.route_attribute_summary(data=True)
attribute
├── route_short_name: ['N55', 'N5', '113', 'N20', '134']
├── mode: ['bus']
├── arrival_offsets: ['00:01:52', '00:02:18', '00:01:34', '00:03:48', '00:01:10']
├── departure_offsets: ['00:01:52', '00:02:18', '00:01:34', '00:03:48', '00:01:10']
├── route_long_name: ['']
├── id: ['VJea6046f64f85febf1854290fb8f76e921e3ac96b', 'VJf6055fdf9ef0dd6d0500b6c11adcfdd4d10655dc', 'VJ5b511605b1e07428c2e0a7d676d301c6c40dcca6', 'VJ85c23573d670bab5485618b0c5fddff3314efc89', 'VJ28a8a6a4ab02807a4fdfd199e5c2ca0622d34d0c']
├── trips
│   ├── trip_id: ['VJcc2e00b98a2837e18c555477c6e44ca2efe332e7_10:49:00', 'VJ0d5c884e960469ac2ced50a704e57d965da26018_17:20:56', 'VJc239057734e457e3ba45979b2d87a019b62742da_20:51:13', 'VJ5c2b1116530ef2e405c69e0bb12dfeaca4c08b24_16:54:00', 'VJe165350c77c2d832b595c5c02cf61a9291d87f88_19:13:00']
│   ├── trip_departure_time: ['01:24:00', '22:53:08', '13:59:00', '18:35:00', '16:56:56']
│   └── vehicle_id: ['veh_1757_bus', 'veh_885_bus', 'veh_1919_bus', 'veh_935_bus', 'veh_1935_bus']
├── route: ['87', '485', '1180', '2867', '3155']
├── await_departure: [True]
└── ordered_stops: ['490000252KA.link:1437', '490000235P.link:15', '490002124ZZ.link:1172', '490000091G.link:1242', '490000173RG.link:2614']
n.schedule.service_attribute_summary(data=True)
attribute
├── id: ['20274', '15234', '18915', '12430', '18853']
└── name: ['N55', 'N5', '113', 'N20', '134']

Again, similarly to Network objects, we can generate pandas.DataFrames for chosen attributes of Stops, Routes and Services. These dataframes are indexed by the index of the object you query, i.e. Stop ID, Route ID or Service ID. During instantiation of a Schedule object, Route and Service indices are checked and forced to be unique, reindexing them as necessary.

n.schedule.stop_attribute_data(keys=['lat', 'lon', 'name']).head()
                       lat        lon        name
490000235X.link:834    51.516685  -0.128096  Tottenham Court Road Station (Stop X)
490000235YB.link:574   51.516098  -0.134044  Oxford Street Soho Street (Stop YB)
490014214HE.link:3154  51.515923  -0.135392  Wardour Street (Stop OM)
490010689KB.link:981   51.515472  -0.139893  Great Titchfield Street Oxford Circus Station...
490000235V.link:3140   51.516380  -0.131929  Tottenham Court Road Station (Stop V)
n.schedule.route_attribute_data(keys=['route_short_name', 'mode']).head()
                                            route_short_name  mode
VJ375a660d47a2aa570aa20a8568012da8497ffecf  N55               bus
VJ812fad65e7fa418645b57b446f00cba573f2cdaf  N55               bus
VJ6c64ab7b477e201cae950efde5bd0cb4e2e8888e  N55               bus
VJea6046f64f85febf1854290fb8f76e921e3ac96b  94                bus
VJf6055fdf9ef0dd6d0500b6c11adcfdd4d10655dc  94                bus
n.schedule.service_attribute_data(keys='name', index_name='service_id').head()
service_id  name
20274       N55
12430       205
15234       134
18915       N5
18853       N8

Each trip in the schedule has a vehicle assigned to it. By default, each trip has a unique vehicle, but this can be changed by the user (have a look at the modification notebook). Each vehicle is linked to a type. Each schedule begins with types based on the config genet/configs/vehicles/vehicle_definitions.yml. You can point to your own config file or set those values through the Schedule object.

n.schedule.vehicles['veh_2331_bus']
{'type': 'Bus'}
n.schedule.vehicle_types['Bus']['capacity']['standingRoom']['persons'] = 5
n.schedule.vehicle_types['Bus']
{'capacity': {'seats': {'persons': '70'}, 'standingRoom': {'persons': 5}},
 'length': {'meter': '18.0'},
 'width': {'meter': '2.5'},
 'accessTime': {'secondsPerPerson': '0.5'},
 'egressTime': {'secondsPerPerson': '0.5'},
 'doorOperation': {'mode': 'serial'},
 'passengerCarEquivalents': {'pce': '2.8'}}

There exists a method to check that all vehicles are linked to a vehicle type which exists in the schedule.

n.schedule.validate_vehicle_definitions()
True
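
The idea behind this check can be sketched with plain dictionaries shaped like n.schedule.vehicles and n.schedule.vehicle_types above. This is only an illustration of the logic described, not GeNet's implementation. The function name, the 'veh_1_tram' vehicle and the 'Tram' type are hypothetical.

```python
def find_undefined_vehicle_types(vehicles, vehicle_types):
    """Map vehicle ID -> type for every vehicle whose type is not defined.

    Hypothetical helper: an empty result means all vehicles are linked to a
    vehicle type that exists.
    """
    return {
        vehicle_id: vehicle['type']
        for vehicle_id, vehicle in vehicles.items()
        if vehicle['type'] not in vehicle_types
    }


vehicle_types = {'Bus': {'capacity': {'seats': {'persons': '70'}}}}
vehicles = {
    'veh_2331_bus': {'type': 'Bus'},
    'veh_1_tram': {'type': 'Tram'},  # hypothetical vehicle with an undefined type
}

undefined = find_undefined_vehicle_types(vehicles, vehicle_types)
print(undefined)        # {'veh_1_tram': 'Tram'}
print(not undefined)    # False: the definitions are not valid
```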

trips_to_dataframe is a useful method to extract all of the trips, their departure times and the vehicle IDs associated with them. Trip IDs need not be unique; route IDs provide a secondary index. Associated service IDs are also given for convenience.

Another method, set_trips_dataframe, takes this dataframe and applies changes to all route trips based on the data in it. This means you can generate the DataFrame as shown below, manipulate trips (delete them or add new ones), change their departure times, or change their vehicle IDs so that vehicles are shared between different trips, perhaps based on some temporal logic. As long as the dataframe has the same schema, you can use it to set new trips in the schedule. This will appear in the changelog as a route-level modify event. More information can be found in the Modifying Network notebook or wiki page.

n.schedule.trips_to_dataframe(gtfs_day='20210101').head()
route_id mode service_id trip_id trip_departure_time vehicle_id
0 VJ375a660d47a2aa570aa20a8568012da8497ffecf bus 20274 VJ2cdccea96e0e3e6a53a968bcb132941415d6d7c9_04:... 2021-01-01 04:53:00 veh_2331_bus
1 VJ375a660d47a2aa570aa20a8568012da8497ffecf bus 20274 VJ375a660d47a2aa570aa20a8568012da8497ffecf_03:... 2021-01-01 03:53:00 veh_2332_bus
2 VJ375a660d47a2aa570aa20a8568012da8497ffecf bus 20274 VJ3b9d77d2ef200b21c8048fea5eedc2d2788a1b94_01:... 2021-01-01 01:54:00 veh_2333_bus
3 VJ375a660d47a2aa570aa20a8568012da8497ffecf bus 20274 VJ79974c386a39426e06783650a759828438432aa4_05:... 2021-01-01 05:23:00 veh_2334_bus
4 VJ375a660d47a2aa570aa20a8568012da8497ffecf bus 20274 VJa09c394b71031216571d813a6266c83f2d30bf0a_04:... 2021-01-01 04:23:00 veh_2335_bus

Headways

You can generate a dataframe with headway information for all trips and services:

n.schedule.trips_headways().head()
route_id mode service_id trip_id trip_departure_time vehicle_id headway headway_mins
0 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 bus 12430 VJ70cdcef7ccba9c599c70f89bdf8b10852e33bb04_11:... 1970-01-01 11:15:42 veh_409_bus 0 days 00:00:00 0.0
1 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 bus 12430 VJ126aa65811277b9774ae127ff819495441bc4e75_11:... 1970-01-01 11:24:42 veh_392_bus 0 days 00:09:00 9.0
2 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 bus 12430 VJ0d3b026c4060cd0325803e488a965a5ab91fd4c0_11:... 1970-01-01 11:32:42 veh_390_bus 0 days 00:08:00 8.0
3 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 bus 12430 VJ4155b3d5d916db07a50061ae1c15b24ecfc2f96f_11:... 1970-01-01 11:41:42 veh_401_bus 0 days 00:09:00 9.0
4 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 bus 12430 VJc9a308474ed72f769664413e686f3447613c5b3a_11:... 1970-01-01 11:49:42 veh_425_bus 0 days 00:08:00 8.0
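
The headway columns above can be reproduced by hand for a single route. The sketch below uses only the standard library and the five departure times shown in the table. It assumes, as the output suggests, that the first trip of a route is given a headway of zero. The function headways_in_minutes is hypothetical, not GeNet's implementation.

```python
from datetime import datetime


def headways_in_minutes(departure_times):
    """Headway of each trip relative to the previous departure, in minutes.

    Times are 'HH:MM:SS' strings for a single route. The first trip gets a
    headway of 0.0 (assumption based on the output shown on this page).
    """
    parsed = sorted(datetime.strptime(t, '%H:%M:%S') for t in departure_times)
    if not parsed:
        return []
    headways = [0.0]
    for previous, current in zip(parsed, parsed[1:]):
        headways.append((current - previous).total_seconds() / 60)
    return headways


# Departure times from the first five rows above, deliberately unsorted
departures = ['11:24:42', '11:15:42', '11:41:42', '11:32:42', '11:49:42']
headway_mins = headways_in_minutes(departures)
print(headway_mins)  # [0.0, 9.0, 8.0, 9.0, 8.0]
```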

You can also generate a dataframe with summary information about headways for each route in the schedule:

n.schedule.headway_stats().head()
service_id route_id mode mean_headway_mins std_headway_mins max_headway_mins min_headway_mins trip_count
0 12430 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 bus 8.688889 1.378771 10.0 0.0 45.0
1 12430 VJ06cd41dcd58d947097df4a8f33234ef423210154 bus 115.333333 266.361909 659.0 0.0 6.0
2 12430 VJ0f3c08222de16c2e278be0a1bf0f9ea47370774e bus 9.851064 8.032485 63.0 0.0 47.0
3 12430 VJ15419796737689e742962a625abcf3fd5b3d58b1 bus 22.928571 75.682049 409.0 0.0 28.0
4 12430 VJ235c8fca539cf931b3c673f9b056606384aff950 bus 24.433333 86.248512 481.0 0.0 30.0
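
As a rough illustration of what these columns summarise, the sketch below computes the same kind of statistics for the five headways shown earlier for route VJ06420f... (0, 9, 8, 9 and 8 minutes). Because only five of the route's trips are used, the numbers will not match the table. Using the sample standard deviation is an assumption, not something confirmed by the output.

```python
from statistics import mean, stdev

# Headways in minutes for the first five trips of one route (from the
# trips_headways output on this page); the first trip has a headway of 0
headway_mins = [0.0, 9.0, 8.0, 9.0, 8.0]

stats = {
    'mean_headway_mins': mean(headway_mins),
    'std_headway_mins': stdev(headway_mins),  # sample standard deviation (assumption)
    'max_headway_mins': max(headway_mins),
    'min_headway_mins': min(headway_mins),
    'trip_count': len(headway_mins),
}
print(stats)
```

Note that a minimum headway of 0.0 on every route in the table above is consistent with the first trip of each route being given a zero headway.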

In another notebook on modification, you can find information about generating new trips to replace the old using headway information. This is useful when creating scenario networks.

Extracting Stops, Routes, Services of interest

There are times when we need to extract Service, Route or Stop IDs depending on some logic. Building conditions works exactly the same way as for the links and nodes of genet.Network, which was presented in detail above. Here we present some examples. There are separate methods for Service, Route and Stop objects that return the IDs of those objects which satisfy the conditions given by the user. Note that the attribute_summary methods presented above help in building these conditions.

In general

n.schedule.extract_service_ids_on_attributes(
    conditions={'name': 'N55'})
['20274']
n.schedule.extract_route_ids_on_attributes(
    conditions=[{'mode': 'bus'}, {'route_short_name': 'N55'}], how=all)[:5]
['VJ375a660d47a2aa570aa20a8568012da8497ffecf',
 'VJ812fad65e7fa418645b57b446f00cba573f2cdaf',
 'VJ6c64ab7b477e201cae950efde5bd0cb4e2e8888e']
def oxford_street_in_name(attribs):
    # the function receives the value stored under 'name' for each stop
    return 'Oxford Street' in attribs

n.schedule.extract_stop_ids_on_attributes(
    conditions={'name': oxford_street_in_name})[:5]
['490000235YB.link:574',
 '490000235P.link:15',
 '490000173W.link:1868',
 '490000235Z.link:15',
 '490000235Z']

Some extraction patterns come up often. They relate to modes and to spatial and temporal logic. Below we go through some convenience methods for those.

Modal

Below are convenience methods for extracting object IDs based on the modes they are related to. Note that only Route objects actually hold information about their mode of transport. When we extract Services of mode x, we pick services with at least one route of that mode. Similarly, for Stops we extract those used by routes of that mode.

n.schedule.services_on_modal_condition(modes='bus')[:5]
['20274', '15234', '12430', '18853', '18915']
n.schedule.routes_on_modal_condition(modes=['bus', 'rail'])[:5]
['VJ375a660d47a2aa570aa20a8568012da8497ffecf',
 'VJ812fad65e7fa418645b57b446f00cba573f2cdaf',
 'VJ6c64ab7b477e201cae950efde5bd0cb4e2e8888e',
 'VJea6046f64f85febf1854290fb8f76e921e3ac96b',
 'VJf6055fdf9ef0dd6d0500b6c11adcfdd4d10655dc']
n.schedule.stops_on_modal_condition(modes='bus')[:5]
['490000235X.link:834',
 '490000235YB.link:574',
 '490014214HE.link:3154',
 '490010689KB.link:981',
 '490000235V.link:3140']
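
Since only Routes hold a mode, the Service and Stop results are derived from the matching Routes. The sketch below shows that derivation on toy data. The route table and the helper names are hypothetical, and this is not GeNet's implementation.

```python
# Toy routes: each belongs to a service, has one mode and uses some stops
toy_routes = {
    'r1': {'service': 's1', 'mode': 'bus', 'stops': ['a', 'b']},
    'r2': {'service': 's1', 'mode': 'rail', 'stops': ['b', 'c']},
    'r3': {'service': 's2', 'mode': 'rail', 'stops': ['c', 'd']},
}


def routes_with_modes(routes, modes):
    """IDs of routes whose mode is one of the requested modes."""
    wanted = {modes} if isinstance(modes, str) else set(modes)
    return [route_id for route_id, route in routes.items() if route['mode'] in wanted]


def services_with_modes(routes, modes):
    """Services with at least one route of a requested mode."""
    return sorted({routes[r]['service'] for r in routes_with_modes(routes, modes)})


def stops_with_modes(routes, modes):
    """Stops used by at least one route of a requested mode."""
    stops = set()
    for route_id in routes_with_modes(routes, modes):
        stops.update(routes[route_id]['stops'])
    return sorted(stops)


print(services_with_modes(toy_routes, 'bus'))         # ['s1']
print(stops_with_modes(toy_routes, 'bus'))            # ['a', 'b']
print(routes_with_modes(toy_routes, ['bus', 'rail'])) # ['r1', 'r2', 'r3']
```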

Spatial

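For spatial extraction conditions, similarly to the Network object, you have a choice of:

  • a string of comma-separated S2 cell IDs in hex form
  • a path to a GeoJSON file
  • a shapely geometry object, e.g. a Polygon

Again, methods exist for Service, Route and Stop objects separately.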

from shapely.geometry import Polygon

region = Polygon([
    (-0.1487016677856445, 51.52556684350165), (-0.14063358306884766, 51.5255134425896),
    (-0.13865947723388672, 51.5228700191647), (-0.14093399047851562, 51.52006622056997),
    (-0.1492595672607422, 51.51974577545329), (-0.1508045196533203, 51.52276321095246),
    (-0.1487016677856445, 51.52556684350165)])

n.schedule.services_on_spatial_condition(region)
['12430']

There are two options for Service and Route objects. They can either intersect the area, meaning at least one of their Stops lies in the specified area, or be within the area.

geojson = '../example_data/Fitzrovia_polygon.geojson'

# here the area is too small for any routes to be within it
n.schedule.routes_on_spatial_condition(geojson, how='within')
[]
# a lot of routes intersect it however
n.schedule.routes_on_spatial_condition(geojson, how='intersect')[:5]
['VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3',
 'VJeae6e634f8479e0b6712780d5728f0afca964e64',
 'VJ15419796737689e742962a625abcf3fd5b3d58b1',
 'VJf8e38a73359b6cf743d8e35ee64ef1f7b7914daa',
 'VJ06cd41dcd58d947097df4a8f33234ef423210154']
hex_region = '48761ad71,48761ad723,48761ad724c,48761ad73c,48761ad744,48761ad75d3,48761ad75d5,48761ad765,48761ad767,48761ad76c,48761ad774,48761ad779,48761ad77b,48761ad783,48761ad784c,48761ad7854,48761ad794,48761ad79c,48761ad7a4,48761ad7ac,48761ad7b1,48761ad7bc'
n.schedule.stops_on_spatial_condition(hex_region)
['490000091G.link:1242',
 '490000091H.link:1912',
 '490000091F',
 '490000091E',
 '490000091G',
 '490000091H',
 '9400ZZLUGPS2',
 '490013600C']
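
The difference between the intersect and within options for Routes and Services can be sketched with sets of stops. We assume here that within means all of a route's stops lie in the area, while intersect means at least one does, as described above. The function routes_on_area and the toy data are hypothetical, and this is not GeNet's implementation.

```python
def routes_on_area(route_stops, stops_in_area, how):
    """IDs of routes that intersect, or lie within, an area.

    `route_stops` maps route ID -> list of stop IDs; `stops_in_area` is the
    set of stop IDs found inside the area. Hypothetical helper.
    """
    if how == 'intersect':
        check = any   # at least one stop in the area
    elif how == 'within':
        check = all   # every stop in the area (assumption)
    else:
        raise ValueError("how must be 'intersect' or 'within'")
    return [
        route_id
        for route_id, stops in route_stops.items()
        if check(stop in stops_in_area for stop in stops)
    ]


toy_route_stops = {
    'r1': ['a', 'b', 'c'],  # partly inside the area
    'r2': ['a', 'b'],       # fully inside the area
    'r3': ['x', 'y'],       # outside the area
}
area_stops = {'a', 'b'}

print(routes_on_area(toy_route_stops, area_stops, how='intersect'))  # ['r1', 'r2']
print(routes_on_area(toy_route_stops, area_stops, how='within'))     # ['r2']
```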

Temporal

These methods are under construction. A useful one in the meantime is presented below. It generates a pandas.DataFrame of departure and arrival times between all stops for all trips.

n.schedule.trips_with_stops_to_dataframe(gtfs_day='20200101').head()
service_name service_id arrival_time from_stop_name to_stop route_id from_stop departure_time mode to_stop_name route_name trip_id vehicle_id
0 205 12430 2020-01-01 16:35:25 Euston Square (Stop P) 4900020147W.link:2634 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 490000078P.link:1383 2020-01-01 16:33:42 bus University College Hosp Warren Street Stn (Sto... 205 VJ03f4f8905d6dc7868242f3fd29828ee9b366a906_16:... veh_388_bus
1 205 12430 2020-01-01 16:37:08 University College Hosp Warren Street Stn (Sto... 490000252V.link:1182 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 4900020147W.link:2634 2020-01-01 16:35:25 bus Warren Street Station (Stop V) 205 VJ03f4f8905d6dc7868242f3fd29828ee9b366a906_16:... veh_388_bus
2 205 12430 2020-01-01 16:38:51 Warren Street Station (Stop V) 490000091G.link:1242 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 490000252V.link:1182 2020-01-01 16:37:08 bus Great Portland Street (Stop G) 205 VJ03f4f8905d6dc7868242f3fd29828ee9b366a906_16:... veh_388_bus
3 205 12430 2020-01-01 16:40:34 Great Portland Street (Stop G) 490000191B.link:305 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 490000091G.link:1242 2020-01-01 16:38:51 bus Regent's Park (Stop B) 205 VJ03f4f8905d6dc7868242f3fd29828ee9b366a906_16:... veh_388_bus
4 205 12430 2020-01-01 16:42:17 Regent's Park (Stop B) 490007807W.link:2922 VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3 490000191B.link:305 2020-01-01 16:40:34 bus Harley Street (Stop L) 205 VJ03f4f8905d6dc7868242f3fd29828ee9b366a906_16:... veh_388_bus

Accessing Stop, Route, Service objects

Once you extract IDs of interest, you can access these objects. You can also modify them; check out the Modify Network notebook for usage examples.

Each Service is indexed and can be accessed by its ID. It also has a plot method.

n.schedule.service_ids()[:5]
['20274', '12430', '15234', '18915', '18853']
service = n.schedule['12430']
service.print()
Service ID: 12430
Name: 205
Number of routes: 12
Number of stops: 11
# service.plot()

Similarly, each Route is indexed and can be accessed by its ID. It also has a plot method.

n.schedule.route_ids()[:5]
['VJ375a660d47a2aa570aa20a8568012da8497ffecf',
 'VJ812fad65e7fa418645b57b446f00cba573f2cdaf',
 'VJ6c64ab7b477e201cae950efde5bd0cb4e2e8888e',
 'VJea6046f64f85febf1854290fb8f76e921e3ac96b',
 'VJf6055fdf9ef0dd6d0500b6c11adcfdd4d10655dc']
route = n.schedule.route('VJ948e8caa0f08b9c6bf6330927893942c474b5100')
route.print()
Route ID: VJ948e8caa0f08b9c6bf6330927893942c474b5100
Name: 205
Number of stops: 5
Number of trips: 10
# route.plot()

Finally, each Stop is indexed too, and can be accessed by its ID.

stop = n.schedule.stop('490007807E.link:1154')
stop.print()
Stop ID: 490007807E.link:1154
Projection: epsg:27700
Lat, Lon: 51.52336503, -0.14951799
linkRefId: 1154