# Evaluation Guide for TAP-B


## Overview

We provide three simple scripts to evaluate **TAP** on the ***Instance Segmentation***, ***Instance Classification***, and ***Region Caption*** tasks (`scripts.eval_seg`, `scripts.eval_cls`, and `scripts.eval_cap`, respectively).

## Setup

Evaluation expects the dataset layout below, followed by the imports and model arguments shared by all tasks.

```
datasets
|_ coco
|  |_ train2017
|  |  |_ 000000000009.jpg
|  |  |_ ...
|  |_ val2017
|  |  |_ 000000000139.jpg
|  |  |_ ...
|  |_ annotations
|  |  |_ coco_instances_val2017.json
|  |_ results
|  |  |_ coco_instances_val2017_vitdet_h_cascade.json  # Run detectron2 to generate this file.
|_ lvis
|  |_ annotations
|  |  |_ lvis_val_v1.json
|  |_ results
|  |  |_ lvis_val_v1_vitdet_h_cascade.json  # Run detectron2 to generate this file.
|_ vg
|  |_ images
|  |  |_ 1.jpg
|  |  |_ ...
|  |_ annotations
|  |  |_ test.json  # https://datarelease.blob.core.windows.net/grit/VG_preprocessed_annotations/test.json
```
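Before launching a long evaluation, it can help to confirm that the files from the layout above are in place. The following is a minimal sketch, not part of the TAP scripts: `EXPECTED` and `missing_paths` are hypothetical names, and the list only covers the paths that the evaluation cells in this guide actually reference.

```python
from pathlib import Path

# Paths copied from the layout above (relative to the `datasets` directory).
# `EXPECTED` and `missing_paths` are hypothetical helpers for illustration.
EXPECTED = [
    "coco/val2017",
    "coco/annotations/coco_instances_val2017.json",
    "coco/results/coco_instances_val2017_vitdet_h_cascade.json",
    "lvis/annotations/lvis_val_v1.json",
    "lvis/results/lvis_val_v1_vitdet_h_cascade.json",
    "vg/images",
    "vg/annotations/test.json",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the notebook runs one level below the repository root.
    for rel in missing_paths("../datasets"):
        print("missing:", rel)
```

An empty result means every listed file or directory was found; it does not validate the contents of the JSON files.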

```python
import sys
sys.path.append("..")  # make the parent directory importable so `scripts` resolves

class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes."""
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

# global arguments.
args = AttrDict(read_every=100, prompt_size=256)
args.model_type = "tap_vit_b"
args.checkpoint = "../models/tap_vit_b_v1_1.pkl"
args.device = [0, 1, 2, 3, 4, 5, 6, 7]
```
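The `AttrDict` helper is what lets each later cell set options with plain attribute assignment before calling `main(args)`. The short sketch below shows only that behavior: because the instance's `__dict__` is the dict itself, attribute access and key access read and write the same storage. It reuses the values from the setup cell and introduces nothing beyond them.

```python
class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Point the attribute namespace at the dict itself.
        self.__dict__ = self


args = AttrDict(read_every=100, prompt_size=256)
args.model_type = "tap_vit_b"                        # attribute write creates a key
args["checkpoint"] = "../models/tap_vit_b_v1_1.pkl"  # key write creates an attribute

print(args["model_type"])  # tap_vit_b
print(args.checkpoint)     # ../models/tap_vit_b_v1_1.pkl
print(sorted(args))        # ['checkpoint', 'model_type', 'prompt_size', 'read_every']
```

One consequence is that values set in one evaluation cell (for example `args.images_dir`) persist into the next unless they are overwritten.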


## Evaluation: Instance Segmentation on COCO

```python
from scripts.eval_seg import main

args.images_dir = "../datasets/coco/val2017"
args.gt_json_file = "../datasets/coco/annotations/coco_instances_val2017.json"
args.det_json_file = "../datasets/coco/results/coco_instances_val2017_vitdet_h_cascade.json"
main(args)
```


```
92857 instances in 5000 images.
im_process: 5000/5000 [0.071s + 0.014s] (eta: 0:00:00)
Writing segmentations to /data/workspace/models/tokenize-anything/scripts/../outputs/coco_segmentations.json

Evaluating COCO segmentations...
loading annotations into memory...
Done (t=0.47s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.55s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=24.56s).
Accumulating evaluation results...
DONE (t=3.62s).
Summary:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.451
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.712
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.481
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.287
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.501
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets
```
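The `IoU=0.50:0.95` entries in the summary refer to mask intersection-over-union thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below is illustrative only and is not how the evaluation script computes its numbers: `mask_iou` is a hypothetical pure-Python helper operating on tiny binary masks, meant to show what a single prediction has to achieve to count as a match at each threshold.

```python
def mask_iou(a, b):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


# The ten thresholds behind "IoU=0.50:0.95" (step 0.05).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[1, 1, 1],
      [1, 1, 1],
      [0, 0, 0]]

iou = mask_iou(pred, gt)  # 4 overlapping pixels / 6 pixels in the union
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
print(f"IoU = {iou:.3f}, counts as a match at thresholds {matched_at}")
```

A prediction like this one contributes to AP at the looser thresholds but not at `IoU=0.75`, which is why the `IoU=0.50` row is the highest of the three overall rows.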

## Evaluation: Instance Segmentation on LVIS

```python
from scripts.eval_seg import main

args.images_dir = "../datasets/coco/val2017"
args.gt_json_file = "../datasets/lvis/annotations/lvis_val_v1.json"
args.det_json_file = "../datasets/lvis/results/lvis_val_v1_vitdet_h_cascade.json"
main(args)
```


```
3293288 instances in 19809 images.
im_process: 19809/19809 [0.131s + 0.127s] (eta: 0:00:00)
Writing segmentations to /data/workspace/models/tokenize-anything/scripts/../outputs/lvis_segmentations.json

Evaluating LVIS segmentations...
Summary:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.422
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=300 catIds=all] = 0.609
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=300 catIds=all] = 0.450
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.295
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.550
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.640
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.336
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.431
 Average Precision  (AP) @[ IoU=0.50
```

## Evaluation: Instance Classification on LVIS

```python
from scripts.eval_cls import main

args.images_dir = "../datasets/coco/val2017"
args.gt_json_file = "../datasets/lvis/annotations/lvis_val_v1.json"
args.concept = "../concepts/lvis_1203.pkl"
args.max_dets = 300
main(args)
```



```
244707 instances in 19809 images.
im_process: 19809/19809 [0.057s + 0.020s] (eta: 0:00:00)
Writing detections to /data/workspace/models/tokenize-anything/scripts/../outputs/lvis_detections.json

Evaluating LVIS detections...
Summary:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.574
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=300 catIds=all] = 0.579
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=300 catIds=all] = 0.574
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.431
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.701
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.805
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.586
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.568
 Average Precision  (AP) @[ IoU=0.50:0.95 | ar
```
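To compare runs, the summary lines can be turned into structured records. In the LVIS rows, `area` uses `s`/`m`/`l` for small, medium, and large objects, and `catIds` values `r` and `c` denote rare and common categories. The sketch below is a hypothetical log parser, not part of the TAP scripts; its regular expression simply mirrors the line format printed in this guide and skips anything that does not match, including truncated lines.

```python
import re

# Hypothetical helper: the pattern mirrors the summary lines shown in this guide.
SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((?P<metric>AP|AR)\)\s+@\[\s*"
    r"IoU=(?P<iou>[\d.:]+)\s*\|\s*area=\s*(?P<area>\w+)\s*\|\s*"
    r"maxDets=\s*(?P<max_dets>\d+)(?:\s+catIds=\s*(?P<cat_ids>\w+))?\s*\]"
    r"\s*=\s*(?P<value>-?\d+\.\d+)"
)


def parse_summary(text):
    """Return one dict per complete summary line found in `text`."""
    rows = []
    for line in text.splitlines():
        m = SUMMARY_RE.search(line)
        if m is None:
            continue  # not a summary line, or a truncated one
        row = m.groupdict()
        row["max_dets"] = int(row["max_dets"])
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


log = """\
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.574
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.586
 Average Precision  (AP) @[ IoU=0.50:0.95 | ar
"""
rows = parse_summary(log)
for row in rows:
    print(row["iou"], row["area"], row["cat_ids"], row["value"])
```

COCO-style lines, which carry no `catIds` field, also match; their `cat_ids` entry is `None`.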

## Evaluation: Region Caption on Visual Genome

```python
from scripts.eval_cap import main

args.images_dir = "../datasets/vg/images"
args.gt_json_file = "../datasets/vg/annotations/test.json"
main(args)
```


```
232935 instances in 5000 images.
im_process: 5000/5000 [0.185s] (eta: 0:00:00)
Evaluating captions...
Bleu [0.3737476039273303, 0.23526039725915165, 0.16513572981883956, 0.12226500630696996]
METEOR 0.17705993786192956
Rouge 0.359520834691781
CIDEr 1.5234709370648996
```
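Caption metrics are often quoted on a 0 to 100 scale rather than as raw fractions. The sketch below parses the four score lines above and rescales them. It is a hypothetical helper, not part of `scripts.eval_cap`, and it assumes that the four `Bleu` values correspond to BLEU-1 through BLEU-4 in order, which the log itself does not state.

```python
import ast


def parse_caption_scores(text):
    """Parse `NAME value` lines into a flat dict of floats."""
    scores = {}
    for line in text.splitlines():
        name, _, raw = line.strip().partition(" ")
        if not raw:
            continue
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            continue  # not a score line
        if isinstance(value, list):
            # Assumption: list entries are n-gram orders 1..N.
            for n, v in enumerate(value, start=1):
                scores[f"{name}_{n}"] = float(v)
        else:
            scores[name] = float(value)
    return scores


log = """\
Evaluating captions...
Bleu [0.3737476039273303, 0.23526039725915165, 0.16513572981883956, 0.12226500630696996]
METEOR 0.17705993786192956
Rouge 0.359520834691781
CIDEr 1.5234709370648996
"""
scores = parse_caption_scores(log)
for name, value in scores.items():
    print(f"{name}: {100 * value:.1f}")
```

On this scale the run above reads as roughly 17.7 METEOR and 152.3 CIDEr.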
