Prepare Datasets for ZegFormer

A dataset can be used by accessing DatasetCatalog for its data, or MetadataCatalog for its metadata (class names, etc.). This document explains how to set up the built-in datasets so they can be used by these APIs. The detectron2 tutorial "Use Custom Datasets" gives a deeper dive into how to use DatasetCatalog and MetadataCatalog, and how to add new datasets to them.

ZegFormer has built-in support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. Under this directory, detectron2 looks for datasets in the structure described below.

$DETECTRON2_DATASETS/
  coco/
  ADE20K_2021_17_01/

You can set the location of the built-in datasets with export DETECTRON2_DATASETS=/path/to/datasets. If the variable is unset, the default is ./datasets relative to your current working directory.
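As a minimal sketch of the lookup rule just described (the environment variable first, then ./datasets), the snippet below resolves the root directory and reports which of the dataset folders listed above exist. The helper name resolve_datasets_root is hypothetical; it is not part of ZegFormer or detectron2.

```python
import os
from pathlib import Path


def resolve_datasets_root() -> Path:
    """Return the directory where built-in datasets are looked up.

    Hypothetical helper mirroring the documented rule: use
    $DETECTRON2_DATASETS if it is set, otherwise fall back to ./datasets.
    """
    root = os.getenv("DETECTRON2_DATASETS", "datasets")
    # Expand "~" so values such as ~/data behave as expected.
    return Path(os.path.expanduser(root))


if __name__ == "__main__":
    root = resolve_datasets_root()
    # Folder names taken from the directory tree above.
    for name in ("coco", "ADE20K_2021_17_01"):
        path = root / name
        status = "found" if path.is_dir() else "missing"
        print(f"{path}: {status}")
```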

Prepare data for COCO-Stuff:

Expected data structure

coco/
  coco_stuff/
    annotations/
      train2017/
        000000144874.png
        ...
      val2017/
        000000213035.png
        ...
    images/
      train2017/
        000000189148.jpg
        ...   
      val2017/
        000000213547.jpg
        ...
    word_vectors/
      fasttext.pkl
      glove.pkl
      word2vec.pkl    
    # below are generated by the scripts in datasets/coco-stuff/ (steps below)
    split/
      seen_cls.npy
      val_cls.npy
      novel_cls.npy
      seen_classnames.json
      unseen_classnames.json
      all_classnames.json
      ...
    annotations_detectron2/
      train2017/
      val2017_unseen/ 
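Before running the preparation scripts, it can help to confirm that the raw COCO-Stuff files are in place. The sketch below checks only the pre-existing entries from the tree (the generated split/ and annotations_detectron2/ folders are deliberately left out). The COCO_STUFF_REQUIRED list and the helper missing_paths are illustrative, not part of ZegFormer.

```python
from pathlib import Path

# Paths relative to $DETECTRON2_DATASETS, copied from the tree above.
# Only entries that must exist *before* running the preparation scripts.
COCO_STUFF_REQUIRED = [
    "coco/coco_stuff/annotations/train2017",
    "coco/coco_stuff/annotations/val2017",
    "coco/coco_stuff/images/train2017",
    "coco/coco_stuff/images/val2017",
    "coco/coco_stuff/word_vectors/fasttext.pkl",
    "coco/coco_stuff/word_vectors/glove.pkl",
    "coco/coco_stuff/word_vectors/word2vec.pkl",
]


def missing_paths(root, required):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]
```

An empty result from missing_paths(os.getenv("DETECTRON2_DATASETS", "datasets"), COCO_STUFF_REQUIRED) means the raw data is where the scripts expect it.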

Get the COCO (2017) images from https://cocodataset.org/

wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip

Get the COCO-Stuff annotations from https://github.com/nightrome/cocostuff.

wget http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip

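Unzip train2017.zip, val2017.zip, and stuffthingmaps_trainval2017.zip, then move the contents to the locations listed above: the images go under coco/coco_stuff/images/, and the label maps from stuffthingmaps_trainval2017.zip go under coco/coco_stuff/annotations/.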
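The extraction step can also be scripted with Python's zipfile module. This is a sketch under two assumptions: the three archives sit in one download directory, and each archive has train2017/ and/or val2017/ at its top level (check with unzip -l before relying on it). The helper extract_coco_stuff and the ARCHIVE_DESTINATIONS mapping are hypothetical.

```python
import zipfile
from pathlib import Path

# Assumed mapping: archive name -> destination folder under coco/coco_stuff/.
# Each archive is assumed to contain train2017/ and/or val2017/ at its top level.
ARCHIVE_DESTINATIONS = {
    "train2017.zip": "images",
    "val2017.zip": "images",
    "stuffthingmaps_trainval2017.zip": "annotations",
}


def extract_coco_stuff(download_dir, coco_stuff_dir):
    """Extract the downloaded archives into the COCO-Stuff layout."""
    download_dir = Path(download_dir)
    coco_stuff_dir = Path(coco_stuff_dir)
    for archive, subdir in ARCHIVE_DESTINATIONS.items():
        dest = coco_stuff_dir / subdir
        dest.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(download_dir / archive) as zf:
            # Members keep their internal paths, e.g. train2017/xxx.jpg.
            zf.extractall(dest)
```

Extracting straight into the final folders avoids a separate move step, at the cost of depending on the archives' internal layout.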

Split the classes into seen and unseen for training and testing.

python datasets/coco-stuff/create_cocostuff_class_names_json.py
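After the split script has run, a quick consistency check on its JSON output can catch mistakes early. This sketch assumes that each *_classnames.json file under coco/coco_stuff/split/ is a JSON list of class-name strings and that the seen and unseen lists together make up all_classnames.json. Both are assumptions about the file format, and check_split is a hypothetical helper.

```python
import json
from pathlib import Path


def load_names(path):
    """Load a class-name file, assumed to be a JSON list of strings."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_split(split_dir):
    """Check that seen/unseen classes are disjoint and together cover all classes.

    Returns (number of seen classes, number of unseen classes).
    """
    split_dir = Path(split_dir)
    seen = set(load_names(split_dir / "seen_classnames.json"))
    unseen = set(load_names(split_dir / "unseen_classnames.json"))
    all_names = set(load_names(split_dir / "all_classnames.json"))
    overlap = seen & unseen
    if overlap:
        raise ValueError(f"classes in both splits: {sorted(overlap)}")
    if seen | unseen != all_names:
        raise ValueError("seen + unseen does not match all_classnames.json")
    return len(seen), len(unseen)
```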

Generate the labels for training and testing.

python datasets/coco-stuff/prepare_coco_stuff_sem_seg_seen.py
python datasets/coco-stuff/prepare_coco_stuff_sem_seg_unseen.py
python datasets/coco-stuff/prepare_coco_stuff_sem_seg_val_all.py
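In zero-shot segmentation setups, training labels commonly keep only the seen classes and mark all other pixels as ignored. As an illustration of that idea (not the actual logic of the prepare_coco_stuff_sem_seg_*.py scripts), the sketch below remaps raw class ids through a lookup table. The ignore value 255, the uint8 label format, and the helper remap_labels are assumptions.

```python
import numpy as np

IGNORE = 255  # assumed ignore value for pixels excluded from the loss


def remap_labels(label, id_to_train_id, keep_ids):
    """Remap a raw label map; pixels whose class is not kept become IGNORE.

    label: 2-D uint8 array of raw class ids (0..255).
    id_to_train_id: dict mapping raw class id -> contiguous training id.
    keep_ids: raw class ids to keep (e.g. the seen classes for training).
    """
    # Lookup table: every raw id defaults to IGNORE.
    lut = np.full(256, IGNORE, dtype=np.uint8)
    for raw_id, train_id in id_to_train_id.items():
        if raw_id in keep_ids:
            lut[raw_id] = train_id
    # Fancy indexing applies the table to every pixel at once.
    return lut[label]
```

A lookup table keeps the remapping a single vectorized indexing operation per image, which matters when converting the full training set.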

Prepare data for ADE20k-Full:

Download the ADE20k-Full data from https://groups.csail.mit.edu/vision/datasets/ADE20K/request_data/

Expected data structure

ADE20K_2021_17_01/
  images/
  images_detectron2_freq/
  annotations_detectron2_freq/
  index_ade20k.pkl
  index_ade20k.mat
  objects.txt
  ADE20K_275_pure_class.json
  ADE20K_572_pure_class.json
  ADE20K_847_pure_class.json

ADE20K_275_pure_class.json, ADE20K_572_pure_class.json, ADE20K_847_pure_class.json, images_detectron2_freq/ and annotations_detectron2_freq/ are generated by the following scripts:

python datasets/ade20k-full-frequency-split/create_ade-frequency_json.py
python datasets/ade20k-full-frequency-split/prepare_ade20k_full_frequency_all_val.py
python datasets/ade20k-full-frequency-split/prepare_ade20k_full_frequency_seen.py
python datasets/ade20k-full-frequency-split/prepare_ade20k_full_frequency_unseen_val.py
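The file names ADE20K_847/572/275_pure_class.json and the ade20k-full-frequency-split directory suggest that the 847 ADE20K-Full classes are divided into groups of 572 and 275 (572 + 275 = 847) by frequency. Below is a minimal sketch of such a split, assuming the most frequent classes become the seen classes. split_by_frequency is hypothetical, and the counts are toy values, not ADE20K statistics.

```python
from collections import Counter


def split_by_frequency(class_counts, num_seen):
    """Split classes into (seen, unseen) lists by occurrence count.

    Assumption: the `num_seen` most frequent classes are "seen" and the
    rest are "unseen". Ties are broken by class name for determinism.
    """
    ranked = sorted(class_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    seen = [name for name, _ in ranked[:num_seen]]
    unseen = [name for name, _ in ranked[num_seen:]]
    return seen, unseen


if __name__ == "__main__":
    # Toy counts; real counts would come from the ADE20K index files.
    counts = Counter({"wall": 900, "floor": 700, "lamp": 40, "vase": 5})
    seen, unseen = split_by_frequency(counts, num_seen=2)
    print(seen, unseen)  # ['wall', 'floor'] ['lamp', 'vase']
```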

Prepare data for PASCAL VOC:

We follow CaGNet to set up the PASCAL VOC training and testing data. For convenience, we also provide a copy on Google Drive.

Expected data structure

VOCZERO/
  images/
    train/
      2011_003261.jpg
      ...
    val/
      2011_003145.jpg
      ...
  annotations/
    train/
      2011_003255.png
      ...
    val/
      2011_003103.png
      ...
  all_classnames.json
  seen_classnames.json
  unseen_classnames.json
  annotations_detectron2/
    train_seen/

Generate the class-name files and the labels for training and testing:

python datasets/pascal/create_voc_class_names_json.py
python datasets/pascal/prepare_pascal_voc_seen.py
python datasets/pascal/prepare_pascal_voc_unseen_val.py
python datasets/pascal/prepare_pascal_voc_val_all.py
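A sketch for checking that every PASCAL VOC image has a matching label map, useful before running the scripts above. It assumes images are <stem>.jpg under VOCZERO/images/<split>/ and labels are <stem>.png under VOCZERO/annotations/<split>/, following the tree. unpaired_samples is a hypothetical helper.

```python
from pathlib import Path


def unpaired_samples(voc_root, split):
    """Return (images without a label, labels without an image) for `split`.

    Assumes <stem>.jpg files under images/<split>/ and <stem>.png files
    under annotations/<split>/, matched by file stem.
    """
    voc_root = Path(voc_root)
    images = {p.stem for p in (voc_root / "images" / split).glob("*.jpg")}
    labels = {p.stem for p in (voc_root / "annotations" / split).glob("*.png")}
    return sorted(images - labels), sorted(labels - images)
```

Two empty lists for both "train" and "val" mean the images and annotations are paired one-to-one.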