# MATSim-specific validation

This page walks through the validation checks available in GeNet. It is available as a Jupyter notebook or as a wiki page.

You can generate a validation report for a GeNet `Network` covering the validity of the network graph, the schedule, and the routing (of the transit services in the schedule onto the network). It aims to collect checks for problems known to have affected MATSim simulations in the past. The report is a plain dictionary with the keys `graph`, `schedule` and `routing`.

In [1]:
# read sample network
from genet import read_matsim
import os

path_to_matsim_network = '../example_data/pt2matsim_network'

network = os.path.join(path_to_matsim_network, 'network.xml')
schedule = os.path.join(path_to_matsim_network, 'schedule.xml')
vehicles = os.path.join(path_to_matsim_network, 'vehicles.xml')
n = read_matsim(
    path_to_network=network, 
    epsg='epsg:27700', 
    path_to_schedule=schedule, 
    path_to_vehicles=vehicles
)
# you don't need to read the vehicles file, but doing so ensures all vehicles
# in the schedule are of the expected type and the definition of the vehicle
# is preserved
n.print()

Graph info: Name: 
Type: MultiDiGraph
Number of nodes: 1662
Number of edges: 3166
Average in degree:   1.9049
Average out degree:   1.9049 
Schedule info: Schedule:
Number of services: 9
Number of routes: 68
Number of stops: 118


In [2]:
report = n.generate_validation_report()

2022-10-19 14:00:13,074 - Checking validity of the Network
2022-10-19 14:00:13,074 - Checking validity of the Network graph
2022-10-19 14:00:13,075 - Defaulting to checking graph connectivity for modes: ['car', 'walk', 'bike']. You can change this by passing a `modes_for_strong_connectivity` param
2022-10-19 14:00:13,075 - Checking network connectivity for mode: car
2022-10-19 14:00:13,130 - The graph for mode: car has: 1 connected components, 0 sinks/dead_ends and 0 sources/unreachable nodes.
2022-10-19 14:00:13,131 - Checking network connectivity for mode: walk
2022-10-19 14:00:13,140 - The graph for mode: walk has: 0 connected components, 0 sinks/dead_ends and 0 sources/unreachable nodes.
2022-10-19 14:00:13,140 - Checking network connectivity for mode: bike
2022-10-19 14:00:13,149 - The graph for mode: bike has: 0 connected components, 0 sinks/dead_ends and 0 sources/unreachable nodes.
2022-10-19 14:00:13,194 - Checking link values for `freespeed`
2022-10-19 14:00:13,245 - Checking

## Graph

The `graph` section:
- describes the strongly connected components of the modal subgraphs for the modes that agents in MATSim need to find routes on: `car`, plus `walk` and `bike` if you allow agents to route on the network for those modes
- checks for isolated nodes (nodes that are not connected to anything, which can arise when deleting links for a network scenario)
- checks for links whose attributes have problematic values, such as:
   - fractions
   - infinity
   - zero
   - negative
   - none
- flags links of length 1 km or longer
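
To make the attribute-value categories above concrete, here is a minimal pure-Python sketch of how such a classification could work. It is not GeNet's implementation: the helper `classify_value`, the sample `links` dictionary and the reading of "fraction" as a value strictly between 0 and 1 are all assumptions made for illustration.

```python
import math


def classify_value(value):
    """Return a problem category for an attribute value, or None if it looks fine.

    Hypothetical helper, not part of GeNet.
    """
    if value is None:
        return "none"
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None  # non-numeric values are ignored in this sketch
    if math.isinf(value):
        return "infinite"
    if value == 0:
        return "zero"
    if value < 0:
        return "negative"
    if 0 < value < 1:
        # Assumption: "fraction" means strictly between 0 and 1.
        return "fractional"
    return None


# Made-up link data: link id -> attribute name -> value.
links = {
    "1": {"length": 52.3, "freespeed": 13.9},
    "2": {"length": 0.4, "freespeed": float("inf")},
    "3": {"length": 0, "freespeed": -1.0},
    "4": {"length": None, "freespeed": 4.2},
}

# Group offending link ids by problem category and attribute name.
problems = {}
for link_id, attributes in links.items():
    for name, value in attributes.items():
        category = classify_value(value)
        if category is not None:
            problems.setdefault(category, {}).setdefault(name, []).append(link_id)
```

Grouping by category first and attribute second mirrors the shape of the `link_attributes` part of the report, where each category lists the affected attributes and link ids.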

In [3]:
from pprint import pprint
pprint(report['graph'])

{'graph_connectivity': {'bike': {'number_of_connected_subgraphs': 0,
                                 'problem_nodes': {'dead_ends': [],
                                                   'unreachable_node': []}},
                        'car': {'number_of_connected_subgraphs': 1,
                                'problem_nodes': {'dead_ends': [],
                                                  'unreachable_node': []}},
                        'walk': {'number_of_connected_subgraphs': 0,
                                 'problem_nodes': {'dead_ends': [],
                                                   'unreachable_node': []}}},
 'isolated_nodes': {'nodes': [], 'number_of_nodes': 0},
 'link_attributes': {'fractional_attributes': {'length': {'link_ids': ['3151'],
                                                          'number_of': 1,
                                                          'percentage': 0.0003158559696778269}},
                     'infinite_attributes': {},
     

## Schedule

The `schedule` section describes correctness of the schedule on four levels:
    
- `schedule_level`: An overall look at the schedule validity. A `Schedule` is valid if:
    - all of its services are valid
    - its services are uniquely indexed
    
    Schedule `has_valid_services` if all services within the schedule are deemed valid. Invalid services are 
    flagged in `invalid_services`, and the validity checks that failed for the schedule are flagged in `invalid_stages`.
    
    At this level we also report checks on:
    - `headways` (zero values)
    - `speeds` (zero and infinite values)
    
- `service_level`: Provides a look at the validity of services within the schedule. It is indexed by service ids. Each
`Service` is valid if:
    - each of its routes is valid
    - its routes are uniquely indexed
    
    A service `has_valid_routes` if all routes within the service are deemed valid. Invalid routes are 
    flagged in `invalid_routes`, and the validity checks that failed for the service are flagged in `invalid_stages`.

- `route_level`: Provides a look at the validity of each route within each service, indexed by service id and route id
(or by service id and the index in the `Service.routes` list if routes are not uniquely indexed). Each `Route` is valid if it:
    - has more than one `Stop`
    - has a correctly ordered route (the stops (their link reference ids) and the links the route refers to are in the same 
    order)
    - has correct arrival and departure offsets (each stop has one and they are correctly ordered in time)
    - does not have self-loops (there are no trips such as: Stop A -> Stop A)
    
    If a route satisfies all of the above, `is_valid_route` is `True`. If not, `invalid_stages` flags which validity
    conditions the route did not satisfy.
    
    At this level, for each `Route` we also report `headway_stats`:
    - `mean_headway_mins`
    - `std_headway_mins`
    - `max_headway_mins`
    - `min_headway_mins`
    
- `vehicle_level`: Looks at the validity of vehicle definitions and their use. The vehicle definitions are checked on the following components:
    - whether any definitions are missing
    - whether any vehicles are no longer being used
    - whether any vehicles are being used for multiple trips
    
N.B. The same dictionary output can be generated using the `Schedule` object's own `generate_validation_report` method.
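
The route-level conditions listed above can be sketched in plain Python. This is an illustration only, not GeNet's code: the function `invalid_stages`, its argument layout and the stage labels it returns are hypothetical, and offsets are assumed to be `HH:MM:SS` strings.

```python
def to_seconds(offset):
    """Convert an 'HH:MM:SS' offset string to seconds."""
    hours, minutes, seconds = (int(part) for part in offset.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def invalid_stages(stops, stop_link_refs, network_route,
                   arrival_offsets, departure_offsets):
    """Return labels (hypothetical names) for each condition the route fails."""
    stages = []

    # 1. A route needs more than one stop.
    if len(stops) < 2:
        stages.append("too_few_stops")

    # 2. Stop link references must appear along the network route in stop order.
    position = 0
    for link in stop_link_refs:
        try:
            position = network_route.index(link, position)
        except ValueError:
            stages.append("unordered_route")
            break

    # 3. Every stop needs both offsets, and times must never go backwards.
    if len(arrival_offsets) != len(stops) or len(departure_offsets) != len(stops):
        stages.append("bad_offsets")
    else:
        times = []
        for arrival, departure in zip(arrival_offsets, departure_offsets):
            times += [to_seconds(arrival), to_seconds(departure)]
        if times != sorted(times):
            stages.append("bad_offsets")

    # 4. No trip from a stop to itself.
    if any(a == b for a, b in zip(stops, stops[1:])):
        stages.append("self_loops")

    return stages


good = invalid_stages(
    stops=["A", "B", "C"],
    stop_link_refs=["l1", "l3", "l4"],
    network_route=["l1", "l2", "l3", "l4"],
    arrival_offsets=["00:00:00", "00:03:00", "00:07:00"],
    departure_offsets=["00:00:00", "00:04:00", "00:07:00"],
)
bad = invalid_stages(
    stops=["A", "A"],
    stop_link_refs=["l3", "l1"],
    network_route=["l1", "l2", "l3"],
    arrival_offsets=["00:00:00", "00:02:00"],
    departure_offsets=["00:05:00", "00:02:00"],
)
```

Here `good` is an empty list, which corresponds to a route with `is_valid_route` equal to `True`, while `bad` collects one label per failed condition.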

In [4]:
pprint(report['schedule']['schedule_level'])

{'has_valid_services': True,
 'headways': {'has_zero_min_headways': False},
 'invalid_services': [],
 'invalid_stages': [],
 'is_valid_schedule': True,
 'speeds': {}}


In [5]:
pprint(report['schedule']['service_level']['12430'])

{'has_valid_routes': True,
 'invalid_routes': [],
 'invalid_stages': [],
 'is_valid_service': True}


In [13]:
# ['schedule']['route_level'][SERVICE_ID][ROUTE_ID]
pprint(report['schedule']['route_level']['12430']['VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3'])

{'headway_stats': {'max_headway_mins': 10.0,
                   'mean_headway_mins': 8.886363636363637,
                   'min_headway_mins': 8.000000000000002,
                   'std_headway_mins': 0.3867520743564629},
 'invalid_stages': [],
 'is_valid_route': True}
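
For intuition about `headway_stats`, a headway is the time between consecutive departures of a route. The sketch below computes the four statistics from a list of departure times; it is a hand-rolled illustration with made-up times, and the use of the population standard deviation is an assumption rather than something confirmed by the report.

```python
from statistics import mean, pstdev


def to_seconds(time_string):
    """Convert an 'HH:MM:SS' time string to seconds."""
    hours, minutes, seconds = (int(part) for part in time_string.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headway_stats(departure_times):
    """Summarise gaps between consecutive departures, in minutes.

    Hypothetical helper; returns an empty dict when fewer than two
    departures are given, because no headway can be computed.
    """
    seconds = sorted(to_seconds(t) for t in departure_times)
    headways = [(later - earlier) / 60 for earlier, later in zip(seconds, seconds[1:])]
    if not headways:
        return {}
    return {
        "mean_headway_mins": mean(headways),
        # Assumption: population standard deviation.
        "std_headway_mins": pstdev(headways),
        "max_headway_mins": max(headways),
        "min_headway_mins": min(headways),
    }


# Made-up departures with gaps of 10, 8 and 10 minutes.
stats = headway_stats(["07:00:00", "07:10:00", "07:18:00", "07:28:00"])
```

A minimum headway of zero would mean two trips of the same route depart at the same time, which is what the schedule-level `has_zero_min_headways` flag draws attention to.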


In [7]:
pprint(report['schedule']['vehicle_level'])

{'vehicle_definitions_valid': True,
 'vehicle_definitions_validity_components': {'missing_vehicles': {'missing_vehicles_types': set(),
                                                                  'vehicles_affected': {}},
                                             'multiple_use_vehicles': {},
                                             'unused_vehicles': {'veh_unused_bus'}}}
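
Because the report is a plain dictionary, the schedule section can be scanned with ordinary Python rather than read by eye. The sketch below collects invalid services and routes. The `schedule_report` dictionary is a hand-written stand-in that mirrors the keys shown above, and the `service_x` and `route_y` entries and the `self_loops` label are invented for illustration.

```python
def invalid_schedule_items(schedule_report):
    """Return (invalid service ids, invalid (service id, route id) pairs)."""
    invalid_services = [
        service_id
        for service_id, result in schedule_report["service_level"].items()
        if not result["is_valid_service"]
    ]
    invalid_routes = [
        (service_id, route_id)
        for service_id, routes in schedule_report["route_level"].items()
        for route_id, result in routes.items()
        if not result["is_valid_route"]
    ]
    return invalid_services, invalid_routes


# Stand-in for report['schedule']; only the keys used above are included.
schedule_report = {
    "service_level": {
        "12430": {"is_valid_service": True, "invalid_routes": []},
        "service_x": {"is_valid_service": False, "invalid_routes": ["route_y"]},
    },
    "route_level": {
        "12430": {
            "VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3": {
                "is_valid_route": True,
                "invalid_stages": [],
            },
        },
        "service_x": {
            "route_y": {"is_valid_route": False, "invalid_stages": ["self_loops"]},
        },
    },
}

services, routes = invalid_schedule_items(schedule_report)
```

With a real report, the same function could be called as `invalid_schedule_items(report['schedule'])`, assuming the keys match those printed above.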


## Routing

Finally, the `routing` section describes the routing of the transit schedule services onto the network graph.
- `services_have_routes_in_the_graph`: all routes have network routes, and the links they refer to exist in the graph,
are connected (the `to` node of the preceding link is the `from` node of the next link in the chain), and the `modes` saved on the
link data accept the mode of the route.
- `service_routes_with_invalid_network_route`: flags routes not satisfying the above.
- `route_to_crow_fly_ratio`: gives the ratio of the length of the route to the crow-fly distance between each of the stops along 
the route. If the route is invalid, the ratio is 0. If the route has only one stop, the result is 
`'Division by zero'`.
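
As a rough illustration of `route_to_crow_fly_ratio`, the sketch below divides a route length by the summed straight-line distance between consecutive stops. It assumes stop coordinates in a projected, metre-based system (such as the `epsg:27700` used when reading the network) so that Euclidean distance is meaningful. The function name is borrowed from the report key, but the signature and the way distances are measured are assumptions.

```python
import math


def route_to_crow_fly_ratio(route_length, stop_coordinates, route_is_valid=True):
    """Ratio of network route length to straight-line distance between stops.

    Hypothetical sketch: returns 0 for an invalid route and the string
    'Division by zero' when the crow-fly distance is zero (e.g. one stop).
    """
    if not route_is_valid:
        return 0
    crow_fly = sum(
        math.dist(a, b) for a, b in zip(stop_coordinates, stop_coordinates[1:])
    )
    try:
        return route_length / crow_fly
    except ZeroDivisionError:
        return "Division by zero"


# Made-up stops: straight-line legs of 500 m and 600 m, route length 1320 m.
stops = [(0, 0), (300, 400), (300, 1000)]
ratio = route_to_crow_fly_ratio(1320, stops)
single_stop = route_to_crow_fly_ratio(1320, [(0, 0)])
invalid = route_to_crow_fly_ratio(1320, stops, route_is_valid=False)
```

A ratio close to 1 means the network route closely follows the straight lines between stops, while a large ratio can point to a detour introduced when the route was snapped to the network.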

In [8]:
pprint(report['routing'])

{'route_to_crow_fly_ratio': {'12430': {'VJ06420fdab0dfe5c8e7f2f9504df05cf6289cd7d3': 1.3239649602342694,
                                       'VJ06cd41dcd58d947097df4a8f33234ef423210154': 1.3239649602342694,
                                       'VJ0f3c08222de16c2e278be0a1bf0f9ea47370774e': 1.0701730717991658,
                                       'VJ15419796737689e742962a625abcf3fd5b3d58b1': 1.3239649602342694,
                                       'VJ235c8fca539cf931b3c673f9b056606384aff950': 1.0701730717991658,
                                       'VJ8f9aea7491080b0137d3092706f53dc11f7dba45': 1.0701730717991658,
                                       'VJ948e8caa0f08b9c6bf6330927893942c474b5100': 1.0701730717991658,
                                       'VJ95b4c534d7c903d76ec0340025aa88b81dba3ce4': 1.0701730717991658,
                                       'VJeae6e634f8479e0b6712780d5728f0afca964e64': 1.3239649602342694,
                                       'VJeb72539d69ddf

The above report relies on many convenience methods that can also be used on their own. For example, you can list all invalid network routes using:

In [9]:
n.invalid_network_routes()

[]
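
The conditions behind a valid network route (links exist, are chained node to node, and accept the route's mode) can be sketched as follows. The link data layout with `from`, `to` and `modes` keys and the function `network_route_is_valid` are hypothetical and chosen only for this illustration.

```python
def network_route_is_valid(network_route, links, mode):
    """Check a list of link ids against made-up link data.

    `links` maps link id -> {"from": node, "to": node, "modes": set of modes}.
    """
    # A route without a network route cannot be valid.
    if not network_route:
        return False
    # Every referenced link must exist in the graph.
    if any(link_id not in links for link_id in network_route):
        return False
    # Every link must accept the mode of the route.
    if any(mode not in links[link_id]["modes"] for link_id in network_route):
        return False
    # The to-node of each link must be the from-node of the next one.
    return all(
        links[previous]["to"] == links[following]["from"]
        for previous, following in zip(network_route, network_route[1:])
    )


links = {
    "l1": {"from": "n1", "to": "n2", "modes": {"car", "bus"}},
    "l2": {"from": "n2", "to": "n3", "modes": {"car", "bus"}},
    "l3": {"from": "n4", "to": "n5", "modes": {"car"}},
}

connected = network_route_is_valid(["l1", "l2"], links, "bus")
broken_chain = network_route_is_valid(["l1", "l3"], links, "car")
wrong_mode = network_route_is_valid(["l3"], links, "bus")
```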

In [10]:
n.schedule.is_valid_schedule()

True

Something that is not included in the validity report is strong connectivity of PT (MATSim doesn't insist on it being satisfied). You can call `is_strongly_connected` on `Schedule` or on the schedule components `Service` and `Route`. The process uses an underlying directed graph of stop connections, which you can access by calling the `graph` method on a schedule-type element (e.g. if `s` is a `genet.Service` object, `s.graph()` will give you this directed graph).

In [11]:
n.schedule.is_strongly_connected()

False

In [12]:
n.schedule.graph().is_directed()

True
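
To illustrate what strong connectivity means for a directed graph of stop connections, here is a small pure-Python sketch that is independent of GeNet. A directed graph is strongly connected when every node can reach every other node, which is checked here by searching forwards and backwards from one node. The stop names and edge lists are made up.

```python
def reachable(start, edges):
    """Return the set of nodes reachable from `start` along directed edges."""
    adjacency = {}
    for origin, destination in edges:
        adjacency.setdefault(origin, []).append(destination)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def is_strongly_connected(edges):
    """True if every node in the edge list can reach every other node."""
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return False
    start = next(iter(nodes))
    reversed_edges = [(destination, origin) for origin, destination in edges]
    # Reaching all nodes forwards and backwards from one node is sufficient.
    return reachable(start, edges) == nodes and reachable(start, reversed_edges) == nodes


one_way_line = [("A", "B"), ("B", "C")]
with_return_trip = one_way_line + [("C", "B"), ("B", "A")]

one_way_result = is_strongly_connected(one_way_line)
return_result = is_strongly_connected(with_return_trip)
```

A one-way line is not strongly connected because nothing leads back to its first stop. This suggests one way a schedule can yield `False`, for example when services only run in one direction or serve disjoint sets of stops.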