
TAP: Traffic Accident Prediction

Introduction

This repo contains the Traffic Accident Prediction (TAP) datasets proposed in the following paper:

TAP: A Comprehensive Data Repository for Traffic Accident Prediction in Road Networks. Baixiang Huang, Bryan Hooi, Kai Shu. link

@article{huang2023tap,
  title={TAP: A Comprehensive Data Repository for Traffic Accident Prediction in Road Networks},
  author={Huang, Baixiang and Hooi, Bryan and Shu, Kai},
  journal={arXiv preprint arXiv:2304.08640},
  year={2023}
}

Datasets

The Traffic Accident Prediction (TAP) data repository offers extensive coverage for 1,000 US cities (TAP-city) and 49 states (TAP-state), providing real-world road structure data that can be easily used for graph-based machine learning methods such as Graph Neural Networks. Additionally, it features multi-dimensional geospatial attributes, including angular and directional features, that are useful for analyzing transportation networks. The TAP repository has the potential to benefit the research community in various applications, including traffic crash prediction, road safety analysis, and traffic crash mitigation. The datasets can be accessed in the TAP-city and TAP-state directories.

For example, this repository can aid in traffic accident occurrence prediction and accident severity prediction. Binary labels are used to indicate whether a node contains at least one accident for the occurrence prediction task, while severity is represented by a number between 0 and 7 for the severity prediction task. A severity level of 0 denotes no accident, and 1 to 7 represents increasingly significant impacts on traffic.
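The two label sets are consistent with each other: a node's binary occurrence label can be recovered by thresholding its severity label. A minimal sketch with made-up labels (not actual TAP data):

```python
import torch

# Hypothetical severity labels for six nodes (0 = no accident, 1-7 = increasing impact)
severity_labels = torch.tensor([0, 3, 0, 1, 7, 0])

# Binary occurrence labels: 1 if the node contains at least one accident
occur_labels = (severity_labels > 0).long()
print(occur_labels.tolist())  # [0, 1, 0, 1, 1, 0]
```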

The table below shows the features included in our datasets.

Graph features   Description
highway          The type of a road (tertiary, motorway, etc.).
length           The length of a road.
bridge           Indicates whether a road represents a bridge.
lanes            The number of lanes of a road.
oneway           Indicates whether a road is a one-way street.
maxspeed         The maximum legal speed limit of a road.
access           Describes restrictions on the use of a road.
tunnel           Indicates whether a road runs in a tunnel.
junction         Describes the junction type of a road.
street_count     The number of roads connected to a node.
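Categorical features such as highway are typically converted to numeric vectors before being fed to a graph model. A minimal one-hot encoding sketch (the road-type vocabulary below is illustrative, not the exact set used in TAP):

```python
import torch

# Hypothetical vocabulary of road types (illustrative only)
highway_types = ['motorway', 'primary', 'secondary', 'tertiary', 'residential']
type_to_idx = {t: i for i, t in enumerate(highway_types)}

# Encode three example roads as one-hot feature vectors
roads = ['tertiary', 'motorway', 'residential']
indices = torch.tensor([type_to_idx[t] for t in roads])
one_hot = torch.nn.functional.one_hot(indices, num_classes=len(highway_types)).float()
print(one_hot.shape)  # torch.Size([3, 5])
```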

Each dataset contains a single, directed road graph. Each road network is described by an instance of 'torch_geometric.data.Data', which includes the following attributes:

  • x: Node feature matrix with shape [num_nodes, num_node_features]
  • y: Node-level labels for the traffic accident occurrence prediction task
  • severity_labels: Node-level labels for the traffic accident severity prediction task
  • edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
  • edge_attr_dir: Directional edge feature matrix with shape [num_edges, 2]
  • edge_attr_ang: Angular edge feature matrix with shape [num_edges, 3]
  • coords: Node coordinates
  • crash_time: Start and end timestamps for traffic accidents with shape [2, num_nodes]
  • edge_index: Graph indices in COO format (coordinate format) with shape [2, num_edges] and type torch.long

The list of 1,000 cities, sorted by their total counts of traffic accident occurrences, can be loaded as follows:

import pickle

with open('util/cities_sorted_by_accident.pkl', 'rb') as fp:
    cities_sorted_by_accident = pickle.load(fp)

Environment

Run the following commands to create an environment and install all the required packages:

conda env create -f environment.yml

In addition, you can access and run the code from Google Colab: occurrence_task and severity_task.

You can use the code below to load the datasets.

import torch
import numpy as np
import os.path as osp
from typing import Callable, Optional
from torch_geometric.data import Data, InMemoryDataset, download_url
          
class TRAVELDataset(InMemoryDataset):
    r"""
    Args:
        root (string): Root directory where the dataset should be saved.
        name (string): The name of the dataset.
        transform (callable, optional): A function/transform that takes in an
            :obj:`torch_geometric.data.Data` object and returns a transformed
            version. The data object will be transformed before every access.
            (default: :obj:`None`)
        pre_transform (callable, optional): A function/transform that takes in
            an :obj:`torch_geometric.data.Data` object and returns a
            transformed version. The data object will be transformed before
            being saved to disk. (default: :obj:`None`)
    """
    
    url = 'https://github.com/baixianghuang/travel/raw/main/TAP-city/{}.npz'
    # url = 'https://github.com/baixianghuang/travel/raw/main/TAP-state/{}.npz'
    
    def __init__(self, root: str, name: str,
                 transform: Optional[Callable] = None,
                 pre_transform: Optional[Callable] = None):
        self.name = name.lower()
        super().__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])
        
    @property
    def raw_dir(self) -> str:
        return osp.join(self.root, self.name, 'raw')

    @property
    def processed_dir(self) -> str:
        return osp.join(self.root, self.name, 'processed')

    @property
    def raw_file_names(self) -> str:
        return f'{self.name}.npz'

    @property
    def processed_file_names(self) -> str:
        return 'data.pt'

    def download(self):
        download_url(self.url.format(self.name), self.raw_dir)

    def process(self):
        data = read_npz(self.raw_paths[0])
        data = data if self.pre_transform is None else self.pre_transform(data)
        data, slices = self.collate([data])
        torch.save((data, slices), self.processed_paths[0])

    def __repr__(self) -> str:
        return f'{self.name.capitalize()}Full()'

def read_npz(path):
    with np.load(path, allow_pickle=True) as f:
        return parse_npz(f)


def parse_npz(f):
    crash_time = f['crash_time']
    x = torch.from_numpy(f['x']).to(torch.float)
    coords = torch.from_numpy(f['coordinates']).to(torch.float)
    edge_attr = torch.from_numpy(f['edge_attr']).to(torch.float)
    cnt_labels = torch.from_numpy(f['cnt_labels']).to(torch.long)
    occur_labels = torch.from_numpy(f['occur_labels']).to(torch.long)
    edge_attr_dir = torch.from_numpy(f['edge_attr_dir']).to(torch.float)
    edge_attr_ang = torch.from_numpy(f['edge_attr_ang']).to(torch.float)
    severity_labels = torch.from_numpy(f['severity_8labels']).to(torch.long)
    edge_index = torch.from_numpy(f['edge_index']).to(torch.long).t().contiguous()
    return Data(x=x, y=occur_labels, severity_labels=severity_labels, edge_index=edge_index, 
                edge_attr=edge_attr, edge_attr_dir=edge_attr_dir, edge_attr_ang=edge_attr_ang, 
                coords=coords, cnt_labels=cnt_labels, crash_time=crash_time)


dataset = TRAVELDataset('file_path', 'Dallas_TX')
data = dataset[0]
print(f'Number of graphs: {len(dataset)}')
print(f'Number of node features: {dataset.num_features}')
print(f'Number of edge features: {dataset.num_edge_features}')
print(f'Number of classes: {dataset.num_classes}')
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Contains isolated nodes: {data.has_isolated_nodes()}')
print(f'Contains self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')
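Since each dataset is a single graph, node-level tasks are usually trained with boolean masks over the nodes. A minimal sketch of a random 60/20/20 train/val/test split (the fractions and the helper name are our choices, not part of the repository):

```python
import torch

def split_masks(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly split nodes into boolean train/val/test masks."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=g)
    train_end = int(train_frac * num_nodes)
    val_end = int((train_frac + val_frac) * num_nodes)
    masks = [torch.zeros(num_nodes, dtype=torch.bool) for _ in range(3)]
    masks[0][perm[:train_end]] = True           # train
    masks[1][perm[train_end:val_end]] = True    # validation
    masks[2][perm[val_end:]] = True             # test
    return masks

train_mask, val_mask, test_mask = split_masks(100)
```

The masks can then index `data.y` (or `data.severity_labels`) when computing the loss and evaluation metrics.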

State-level datasets larger than 100 MB are stored in Google Drive.

Please note that we do not have ownership of the data and therefore cannot provide a license or control its use. However, we kindly request that the data only be used for research purposes.
