# Evaluation Guide for TAP-B


## Overview

Three scripts evaluate **TAP** on the ***Instance Segmentation***, ***Instance Classification*** and ***Region Caption*** tasks: `scripts.eval_seg`, `scripts.eval_cls` and `scripts.eval_cap`, respectively.

## Setup

The evaluation needs the datasets laid out as shown below, a TAP checkpoint, and the imports and global arguments set up in the first cell.

```
datasets
|_ coco
|  |_ train2017
|  |  |_ 000000000009.jpg
|  |  |_ ...
|  |_ val2017
|  |  |_ 000000000139.jpg
|  |  |_ ...
|  |_ annotations
|  |  |_ coco_instances_val2017.json
|  |_ results
|  |  |_ coco_instances_val2017_vitdet_h_cascade.json  # Run detectron2 to generate this file.
|_ lvis
|  |_ annotations
|  |  |_ lvis_val_v1.json
|  |_ results
|  |  |_ lvis_val_v1_vitdet_h_cascade.json  # Run detectron2 to generate this file.
|_ vg
|  |_ images
|  |  |_ 1.jpg
|  |  |_ ...
|  |_ annotations
|  |  |_ test.json  # https://datarelease.blob.core.windows.net/grit/VG_preprocessed_annotations/test.json
```
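Before running the cells below, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the TAP scripts: `EXPECTED_PATHS` and `find_missing` are hypothetical names, and the root `../datasets` is assumed from the paths used in the later cells.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_PATHS = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/coco_instances_val2017.json",
    "coco/results/coco_instances_val2017_vitdet_h_cascade.json",
    "lvis/annotations/lvis_val_v1.json",
    "lvis/results/lvis_val_v1_vitdet_h_cascade.json",
    "vg/images",
    "vg/annotations/test.json",
]


def find_missing(datasets_root):
    """Return the expected paths that do not exist under `datasets_root`."""
    root = Path(datasets_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # "../datasets" is an assumption based on the paths in the cells below.
    missing = find_missing("../datasets")
    if missing:
        print("Missing dataset entries:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected dataset entries are present.")
```

The check only tests for existence; it does not validate the JSON contents or the number of images.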

In [1]:
import sys
sys.path.append("..")

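class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Alias the attribute namespace to the dict itself,
        # so `d.key` and `d["key"]` share the same storage.
        self.__dict__ = self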

# Global arguments shared by all evaluation cells.
args = AttrDict(read_every=100, prompt_size=256)
args.model_type = "tap_vit_b"
args.checkpoint = "../models/tap_vit_b_v1_0.pkl"
args.device = [0, 1, 2, 3, 4, 5, 6, 7]
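
The `AttrDict` helper is what lets the later cells write `args.images_dir = ...` and pass the same object to each `main(args)`. A small standalone sketch of that behavior follows; it repeats the class so it runs on its own, and the values are the ones used in this guide.

```python
class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The attribute namespace and the dict share one storage.
        self.__dict__ = self


args = AttrDict(read_every=100, prompt_size=256)

# Attribute assignment creates a regular dict key.
args.model_type = "tap_vit_b"
print(args["model_type"])

# Keys given to the constructor are readable as attributes.
print(args.read_every, args.prompt_size)

# Key assignment is visible through attribute access as well.
args["images_dir"] = "../datasets/coco/val2017"
print(args.images_dir)
```

Because the object is still a plain `dict`, it can be printed or serialized like any other dictionary.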


## Evaluation: Instance Segmentation on COCO

In [2]:
from scripts.eval_seg import main

args.images_dir = "../datasets/coco/val2017"
args.gt_json_file = "../datasets/coco/annotations/coco_instances_val2017.json"
args.det_json_file = "../datasets/coco/results/coco_instances_val2017_vitdet_h_cascade.json"
main(args)


92857 instances in 5000 images.
im_process: 5000/5000 [0.065s + 0.014s] (eta: 0:00:00)
Writing segmentations to /data/workspace/models/tokenize-anything/scripts/../outputs/coco_segmentations.json

Evaluating COCO segmentations...
loading annotations into memory...
Done (t=0.49s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.66s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=26.46s).
Accumulating evaluation results...
DONE (t=3.70s).
Summary:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.711
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.479
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.280
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.499
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets
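
To collect the summary numbers programmatically, the printed lines can be parsed back into records. This is a hedged sketch rather than anything shipped with the evaluation scripts: `parse_summary` is a hypothetical helper, and the pattern is written against the line format shown in this guide (it also accepts the optional `catIds=` field that appears in the LVIS summaries). Lines that are cut off or do not match are skipped.

```python
import re

# Pattern for lines such as:
#  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
# The catIds field is optional (present in LVIS summaries only).
_SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*"
    r"IoU=([\d.:]+)\s*\|\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)"
    r"(?:\s+catIds=\s*(\w+))?\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_summary(log_text):
    """Parse summary lines into a list of dicts; skip non-matching lines."""
    rows = []
    for line in log_text.splitlines():
        match = _SUMMARY_RE.search(line)
        if match is None:
            continue  # e.g. a truncated or unrelated log line
        metric, iou, area, max_dets, cat_ids, value = match.groups()
        rows.append({
            "metric": metric,
            "iou": iou,
            "area": area,
            "max_dets": int(max_dets),
            "cat_ids": cat_ids,  # None when the field is absent
            "value": float(value),
        })
    return rows


sample = """\
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.711
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets
"""
rows = parse_summary(sample)
for row in rows:
    print(row["metric"], row["iou"], row["area"], row["value"])
```

The third sample line is incomplete, so only two records are produced.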

## Evaluation: Instance Segmentation on LVIS

In [3]:
from scripts.eval_seg import main

args.images_dir = "../datasets/coco/val2017"
args.gt_json_file = "../datasets/lvis/annotations/lvis_val_v1.json"
args.det_json_file = "../datasets/lvis/results/lvis_val_v1_vitdet_h_cascade.json"
main(args)


3293288 instances in 19809 images.
im_process: 19809/19809 [0.113s + 0.131s] (eta: 0:00:00)
Writing segmentations to /data/workspace/models/tokenize-anything/scripts/../outputs/lvis_segmentations.json

Evaluating LVIS segmentations...
Summary:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.417
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=300 catIds=all] = 0.607
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=300 catIds=all] = 0.442
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.289
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.545
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.637
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.331
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.427
 Average Precision  (AP) @[ IoU=0.50

## Evaluation: Instance Classification on LVIS

In [4]:
from scripts.eval_cls import main

args.images_dir = "../datasets/coco/val2017"
args.gt_json_file = "../datasets/lvis/annotations/lvis_val_v1.json"
args.concept = "../concepts/lvis_1203.pkl"
args.max_dets = 300
main(args)


244707 instances in 19809 images.
im_process: 19809/19809 [0.051s + 0.021s] (eta: 0:00:00)
Writing detections to /data/workspace/models/tokenize-anything/scripts/../outputs/lvis_detections.json

Evaluating LVIS detections...
Summary:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.564
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=300 catIds=all] = 0.568
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=300 catIds=all] = 0.565
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.422
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.695
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.806
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.556
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.555
 Average Precision  (AP) @[ IoU=0.50:0.95 | ar

## Evaluation: Region Caption on Visual Genome

In [5]:
from scripts.eval_cap import main

args.images_dir = "../datasets/vg/images"
args.gt_json_file = "../datasets/vg/annotations/test.json"
main(args)


232935 instances in 5000 images.
im_process: 5000/5000 [0.162s] (eta: 0:00:00)
Evaluating captions...
Bleu [0.36803494977180123, 0.23047192753065446, 0.16138780380493958, 0.11938476662571457]
METEOR 0.17416393195210034
Rouge 0.354880694034182
CIDEr 1.4921442501120745
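
The caption scores above are printed as raw fractions. When comparing with tables that report them scaled by 100, a short conversion such as the sketch below may be convenient. The helper `as_percentages` is hypothetical, and the code assumes that the four `Bleu` values are ordered BLEU-1 through BLEU-4, which the output itself does not state.

```python
# Scores copied from the output above.
scores = {
    "Bleu": [0.36803494977180123, 0.23047192753065446,
             0.16138780380493958, 0.11938476662571457],
    "METEOR": 0.17416393195210034,
    "Rouge": 0.354880694034182,
    "CIDEr": 1.4921442501120745,
}


def as_percentages(raw_scores, digits=1):
    """Flatten list-valued scores and scale every value by 100."""
    table = {}
    for name, value in raw_scores.items():
        if isinstance(value, (list, tuple)):
            # Assumption: list entries are ordered by n-gram size, starting at 1.
            for n, item in enumerate(value, start=1):
                table[f"{name}@{n}"] = round(item * 100, digits)
        else:
            table[name] = round(value * 100, digits)
    return table


table = as_percentages(scores)
for name, value in table.items():
    print(f"{name:<8} {value:6.1f}")
```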
