The SkyEye dataset is the first aerial dataset for monitoring intersections with mixed traffic and lane-less behavior. It comprises around 1 hour of video each from 4 intersections in the city of Ahmedabad, India: Paldi (P), Nehru bridge - Ashram road (N), Swami Vivekananda bridge - Ashram road (V), and APMC market (A).
These intersections were considered because of the diverse traffic conditions they present.
The videos were captured using a DJI Phantom 4 Pro drone at 50 frames per second in 4K resolution (4096x2160).
There are 50,000 frames in total, with 4,021 distinct road user tracks annotated. A detailed breakdown of the number of unique road users is below:
Intersection | car | bus | motorbike | auto-rickshaw | truck | van | pedestrian |
---|---|---|---|---|---|---|---|
P | 175 | 54 | 881 | 494 | 45 | 16 | 226 |
V | 132 | 9 | 627 | 195 | 7 | 0 | 9 |
N | 41 | 8 | 275 | 99 | 12 | 6 | 33 |
A | 73 | 6 | 402 | 135 | 43 | 0 | 81 |
Total | 421 | 77 | 2185 | 971 | 107 | 22 | 349 |
The SkyEye dataset is available as images with bounding box annotations for road user localization and type detection, or as videos with tracks extracted for every road user for road-user tracking. Additionally, we provide labeled collision-prone tracks.
- The dataset consists of 49,652 images at 4096x2160 resolution here, or 198,485 sliced images at 1920x1080 here
- Annotations for 4096x2160 images (in Pascal VOC XML format) here
- Annotations for 1920x1080 images (in CSV format) here
- Dataset consists of 5 videos that can be downloaded here
- Annotations (in MOT format) that can be downloaded here
Each row of the annotation files has the format: `frame_number, object_id, top_left_x, top_left_y, width, height, road_user_type`
Road user type | Name |
---|---|
1 | car |
2 | bus |
3 | motorbike (includes all two-wheelers) |
4 | autorickshaw |
5 | truck |
6 | van |
7 | pedestrian |
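The annotation rows above can be read with a few lines of Python. This is a minimal sketch based on the stated field order and the road-user-type table; the example row is hypothetical, not taken from the dataset.

```python
# Road-user type IDs as listed in the table above.
ROAD_USER_TYPES = {1: "car", 2: "bus", 3: "motorbike", 4: "autorickshaw",
                   5: "truck", 6: "van", 7: "pedestrian"}

def parse_mot_line(line):
    """Parse one annotation row of the form:
    frame_number, object_id, top_left_x, top_left_y, width, height, road_user_type
    """
    frame, obj_id, x, y, w, h, cls = [int(float(v)) for v in line.split(",")]
    return {"frame": frame, "id": obj_id, "bbox": (x, y, w, h),
            "type": ROAD_USER_TYPES[cls]}

# Hypothetical example row (for illustration only):
print(parse_mot_line("1, 17, 540, 220, 36, 58, 3"))
```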
The RetinaNet architecture is trained for road user localization and type detection.
- The images are sliced into 1920x1080 tiles to train the RetinaNet architecture. Download the sliced images here (68.1 GB)
- The train, test, and validation CSV files for benchmarking are as follows - train_annotations.csv, test_annotations.csv, and val_annotations.csv.
- The trained weights for our RetinaNet model are uploaded here. We started from the resnet50_coco_best_v2.1.0.h5 model.
- For replicating the results shown here, use the trained model given above with the evaluate.py script on the test_annotations.csv file.
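The slicing of 4096x2160 frames into 1920x1080 tiles can be sketched as follows. This is a minimal illustration only: the exact tile offsets and overlap used to produce the released slices are not specified here, so the grid below (tiles placed from the top-left and snapped flush to the far edges, with overlap) is an assumption.

```python
# Sketch: compute top-left origins of 1920x1080 tiles covering a
# 4096x2160 frame. The actual slicing scheme of the dataset may differ.

def tile_origins(frame_w=4096, frame_h=2160, tile_w=1920, tile_h=1080):
    """Return (x, y) top-left origins of tiles that fully cover the frame.

    Tiles are laid out at multiples of the tile size, and a final tile
    is snapped flush to the far edge, so edge tiles overlap.
    """
    def starts(frame, tile):
        offsets = list(range(0, frame - tile, tile))
        offsets.append(frame - tile)  # last tile flush with the far edge
        return offsets
    return [(x, y) for y in starts(frame_h, tile_h)
                   for x in starts(frame_w, tile_w)]

print(len(tile_origins()))  # 6 tiles fully cover a 4K frame in this scheme
```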
The meanAP for the trained model is 0.8175.
Road user type | Average Precision (AP) |
---|---|
car | 0.9747 |
bus | 0.9863 |
motorbike | 0.6136 |
autorickshaw | 0.9802 |
truck | 0.9568 |
van | 0.9695 |
pedestrian | 0.2413 |
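The reported meanAP is the unweighted mean of the per-class APs in the table above, which can be verified directly:

```python
# Per-class APs as reported in the table above.
aps = {"car": 0.9747, "bus": 0.9863, "motorbike": 0.6136,
       "autorickshaw": 0.9802, "truck": 0.9568, "van": 0.9695,
       "pedestrian": 0.2413}

mean_ap = sum(aps.values()) / len(aps)
print(round(mean_ap, 4))  # -> 0.8175
```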
For tracking, the SORT algorithm is evaluated as a preliminary benchmark. The user-provided detections were used as input for tracking.
Video name | Precision | Recall | False Acceptance Rate (FAR) |
---|---|---|---|
Paldi 1 | 7.2 | 7.2 | 19.53 |
Vivek 1 | 15.9 | 15.9 | 17.12 |
Nehru 1 | 8.8 | 8.8 | 11.53 |
APMC 1 | 8.6 | 8.6 | 15.65 |
Paldi 2 | 9.7 | 9.8 | 15.72 |
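The metrics in the table can be computed from per-video detection counts. The sketch below assumes the usual MOT-style definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), FAR = false positives per frame); whether the dataset applies any additional scaling to FAR is an assumption, and the counts shown are hypothetical.

```python
# Hedged sketch of the tracking metrics, assuming standard
# MOT-style definitions; not the dataset's official evaluation code.

def tracking_metrics(tp, fp, fn, num_frames):
    precision = 100.0 * tp / (tp + fp)   # percent of output boxes that match
    recall = 100.0 * tp / (tp + fn)      # percent of ground truth recovered
    far = fp / num_frames                # false positives per frame
    return precision, recall, far

# Hypothetical counts for illustration:
p, r, far = tracking_metrics(tp=720, fp=280, fn=9280, num_frames=1000)
print(round(p, 1), round(r, 1), round(far, 2))  # -> 72.0 7.2 0.28
```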
Some more tracking output videos from the DeepSORT tracker are available here.
This dataset is provided for academic and research purposes only.
- Debaditya Roy, Post-doctoral Researcher, Dept. of TSE, Nihon University
- Tetsuhiro Ishizaka, Associate Professor, Dept. of TSE, Nihon University
- Atsushi Fukuda, Professor, Dept. of TSE, Nihon University
- Yusuke Doi (土井 悠輔), Bachelor student, Dept. of TSE, Nihon University
- Sho Matsunoshita (松野下 翔), Bachelor student, Dept. of TSE, Nihon University
- Daichi Tashiro (田代 大智), Bachelor student, Dept. of TSE, Nihon University
- Kaoru Kuga (空閑 香), Bachelor student, Dept. of TSE, Nihon University
If you use this dataset, please consider citing one of our papers.
@INPROCEEDINGS{roy2020defining,
author={D. {Roy} and Naveen Kumar {K.} and C. K. {Mohan}},
booktitle={2020 IEEE Intelligent Transportation Systems Conference (ITSC)},
title={Defining Traffic States based on Spatio-Temporal Traffic Graphs},
year={2020},
}
@article{roy2020detection,
title={Detection of Collision-Prone Vehicle Behavior at Intersections using Siamese Interaction LSTM},
author={Roy, Debaditya and Ishizaka, Tetsuhiro and Mohan, C Krishna and Fukuda, Atsushi},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2020},
publisher={IEEE}
}
This work has been conducted as part of the SATREPS project M2Smart "Smart Cities development for Emerging Countries by Multimodal Transport System based on Sensing, Network and Big Data Analysis of Regional Transportation" (JPMJSA1606), funded by JST and JICA.