The currently supported datasets are Pascal VOC, MS-COCO, ImageNet-DET and ImageNet-VID.
The datasets should be stored in the following directory structure:

    VidDet/
    └── datasets/
        ├── ImageNetDET (170.8 GB)
        ├── ImageNetVID (409.9 GB)
        ├── MSCoco (84.9 GB)
        ├── PascalVOC (9.8 GB)
        └── # version controlled files
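Before training, it can be handy to confirm that the expected dataset directories are in place. The following is a minimal sketch, not part of the repo: the helper name `missing_datasets` is hypothetical, and it only assumes the directory names listed in the layout above.

```python
from pathlib import Path

# Directory names taken from the layout described above.
EXPECTED_DATASETS = ["ImageNetDET", "ImageNetVID", "MSCoco", "PascalVOC"]


def missing_datasets(repo_root):
    """Return the expected dataset directories absent from <repo_root>/datasets.

    Hypothetical helper for illustration; it checks for directories only
    and does not validate their contents.
    """
    datasets_dir = Path(repo_root) / "datasets"
    return [name for name in EXPECTED_DATASETS
            if not (datasets_dir / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the VidDet/ root.
    missing = missing_datasets(".")
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All expected dataset directories found.")
```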
The datasets can be downloaded from my Google Drive:
All four datasets can be combined into one larger dataset using the CombinedDetection() dataset specified in combined.py.
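To illustrate the general idea of presenting several datasets as one, here is a rough sketch of index-based concatenation. It is not the code in combined.py and makes no claim about the CombinedDetection() constructor or its arguments; the class name `ConcatSketch` is hypothetical, and the "datasets" are plain lists standing in for real dataset objects.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Hypothetical stand-in showing index concatenation.

    This is NOT the repo's CombinedDetection; it only shows how a global
    index can be mapped to (dataset, local index).
    """

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Cumulative lengths, e.g. sizes [3, 2] give [3, 5].
        self.cumulative = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose cumulative length exceeds the index.
        ds = bisect_right(self.cumulative, index)
        start = self.cumulative[ds - 1] if ds > 0 else 0
        return self.datasets[ds][index - start]


# Toy usage with lists in place of real detection datasets.
combined = ConcatSketch([["voc_0", "voc_1", "voc_2"], ["coco_0", "coco_1"]])
print(len(combined), combined[3])
```

A real combined detection dataset would additionally need to remap each source dataset's class indices into a shared label space, which this sketch leaves out.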
Following ideas from YOLO-9k and utilising the WordNet structure, classes have been manually matched across datasets, and a hierarchical tree structure has been generated for the classes. This is visualised below and is specified in trees/, with the main tree (inclusive of ImageNet-DET) specified in trees/filtered_det.tree.
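As a rough illustration of what a class hierarchy makes possible, the sketch below stores a tree as a child-to-parent map and finds the nearest shared ancestor of two classes. The class names, the `PARENT` map and both helper functions are hypothetical; the actual file format used in trees/ is not described here and is not assumed.

```python
# Hypothetical child -> parent map. The real class names and the on-disk
# format of the files in trees/ are not assumed here.
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "terrier": "dog",
}


def path_to_root(name, parent=PARENT):
    """Return the list of nodes from `name` up to the root, inclusive."""
    path = []
    while name is not None:
        path.append(name)
        name = parent[name]
    return path


def common_ancestor(a, b, parent=PARENT):
    """Return the deepest node that is an ancestor of (or equal to) both."""
    seen = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in seen:
            return node
    return None


print(path_to_root("terrier"))
print(common_ancestor("terrier", "cat"))
```

With a structure like this, a fine-grained label from one dataset can be related to a coarser label in another by walking up the tree.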
These are the training split statistics. Note that samples in ImageNet-VID are counted on a clip basis, not a frame basis.
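The difference between clip-based and frame-based counting can be sketched as follows. The records and the `count_samples` helper are made up for illustration and do not reflect the repo's annotation format.

```python
# Hypothetical (clip_id, frame_index) records, for illustration only.
frames = [("clip_a", 0), ("clip_a", 1), ("clip_a", 2), ("clip_b", 0)]


def count_samples(records):
    """Count the same records per frame and per unique clip."""
    return {
        "frames": len(records),
        "clips": len({clip_id for clip_id, _ in records}),
    }


print(count_samples(frames))
```

Counting per clip gives a much smaller number than counting per frame, so the two kinds of figure are not directly comparable.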