NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning

The official implementation of "NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning".

News

  • [6/23] 🔥 We released the checkpoints trained on PNG and Flickr30K Entities. Please see the documentation below.

What is NICE?

NICE is a multi-task collaborative cascaded framework for Panoptic Narrative Segmentation and Panoptic Narrative Detection (visual grounding). It is built on a novel insight: "mask first and box next".

Instructions

Environments

You need PyTorch >= 1.7.1. Set up the environment with:

conda create -n nice python=3.7.15
conda activate nice
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt

After that, follow the detectron2 installation instructions to install detectron2 into the environment:

python -m pip install -e detectron2
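
If the detectron2 sources are not already bundled with this repository, a typical way to obtain them is to clone the official repository before running the editable install above (an assumption; refer to the official detectron2 installation guide for the version matching your CUDA/PyTorch setup):

git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2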

Dataset

  1. Download the 2017 MSCOCO Dataset from its official webpage. You will need the images and the panoptic segmentation annotations for the train and validation splits (a download sketch is given after this list).
  2. Download the Panoptic Narrative Grounding Benchmark from the PNG's project webpage. Organize the files as follows:
NICE
|_ panoptic_narrative_grounding
   |_ images
   |  |_ train2017
   |  |_ val2017
   |_ annotations
   |  |_ png_coco_train2017.json
   |  |_ png_coco_val2017.json
   |  |_ panoptic_segmentation
   |  |  |_ train2017
   |  |  |_ val2017
   |  |_ panoptic_train2017.json
   |  |_ panoptic_val2017.json
|_ data
  3. Pre-process the Panoptic Narrative Grounding ground-truth annotations for the dataloader using utils/pre_process.py (see the sketch after this list).
  4. At the end of this step you should have two new files in your annotations folder:
panoptic_narrative_grounding
|_ annotations
   |_ png_coco_train2017.json
   |_ png_coco_val2017.json
   |_ png_coco_train2017_dataloader.json
   |_ png_coco_val2017_dataloader.json
   |_ panoptic_segmentation
   |  |_ train2017
   |  |_ val2017
   |_ panoptic_train2017.json
   |_ panoptic_val2017.json
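
For convenience, a minimal sketch of steps 1 and 3 above. The MSCOCO URLs are the standard public ones; the argument passed to utils/pre_process.py is a hypothetical placeholder, so check the script for its actual interface:

# Step 1: 2017 MSCOCO images and panoptic annotations (official URLs)
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
# Step 3: build the *_dataloader.json annotations (path argument is an assumption)
python utils/pre_process.py panoptic_narrative_grounding/annotations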

Pretrained BERT Model and PFPN

The pre-trained checkpoints can be downloaded from here, and the folder should be organized as follows:

pretrained_models
|_fpn
|  |_model_final_cafdb1.pkl
|_bert
|  |_bert-base-uncased
|  |  |_pytorch_model.bin
|  |  |_bert_config.json
|  |_bert-base-uncased.txt
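
A quick sanity check that the checkpoints landed where the layout above expects them:

ls pretrained_models/fpn/model_final_cafdb1.pkl
ls pretrained_models/bert/bert-base-uncased/pytorch_model.bin
ls pretrained_models/bert/bert-base-uncased/bert_config.json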

Train and Inference

Modify the paths in train_net.sh according to your local setup. If you only want to test the pretrained model, add --ckpt_path ${PRETRAINED_MODEL_PATH} and --test_only.

The checkpoints trained on PNG and Flickr30K Entities have been released; you can download and test them!
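
As a concrete example (assuming train_net.sh forwards extra arguments to the underlying training script; if not, add the flags to the python command inside the script):

# Train from scratch
bash train_net.sh
# Evaluate a released checkpoint only
bash train_net.sh --ckpt_path ${PRETRAINED_MODEL_PATH} --test_only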

Customization

You can also train only the PND task (visual grounding) by ignoring the mask outputs. We provide Flickr30K Entities support to test the PND task alone: check train_net_flickr.py and swap it in for main.py to adapt to the PND task. For the dataset, you can download the Flickr30K annotations from here and the images from the official website.

Acknowledgements

Part of the code is built upon K-Net and PNG. Thanks for their great work!
