# Object Ontology (IS-A)

We aim to find sub-type candidates for objects and to pick the
best candidate when resolving annotation ambiguity in region captions.

Given an object, find the sub-types whose names end with the object name.

e.g. ball --> {snowball, baseball, soccer ball}
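
The suffix rule can be sketched as follows. This is a minimal illustration, not the code that built `object2subtypes`: the function name `find_subtypes` and the toy vocabulary are hypothetical.

```python
from typing import Iterable, List


def find_subtypes(obj: str, vocabulary: Iterable[str]) -> List[str]:
    """Return vocabulary entries that end with `obj`, excluding `obj` itself.

    Hypothetical helper: plain case-insensitive suffix matching.
    """
    obj = obj.lower()
    matches = {
        word for word in vocabulary
        if word.lower().endswith(obj) and word.lower() != obj
    }
    return sorted(matches)


# Toy vocabulary, for illustration only
vocab = ['snowball', 'baseball', 'soccer ball', 'ball', 'balloon', 'bat']
subtypes = find_subtypes('ball', vocab)
print(subtypes)
```

Plain suffix matching also admits words that merely end with the same letters (for `can`, entries such as `pecan` or `toucan`), which is why the candidates still need to be scored against the image or captions.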

#### Approach

For an object (O) and its associated candidate set (C), we disambiguate the sub-type
using the image (I) and dense captions (T).

e.g. I = 'path-to-img'; T = 'car, road, meter, wheel, ...'; O = meter; C = {thermometer, parking meter}

--> parking meter
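
The selection step amounts to scoring every candidate against the context and taking the best one. The sketch below assumes a pluggable scoring function; `disambiguate` and `token_overlap` are hypothetical names, and the token-overlap scorer is only a toy stand-in for a learned similarity model.

```python
from typing import Callable, Dict, List, Tuple


def token_overlap(candidate: str, context: List[str]) -> float:
    """Toy scorer: fraction of candidate tokens that appear in the captions."""
    cand_tokens = set(candidate.lower().split())
    ctx_tokens = {tok for phrase in context for tok in phrase.lower().split()}
    if not cand_tokens:
        return 0.0
    return len(cand_tokens & ctx_tokens) / len(cand_tokens)


def disambiguate(
    candidates: List[str],
    context: List[str],
    score_fn: Callable[[str, List[str]], float] = token_overlap,
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate against the context and return the top one."""
    if not candidates:
        raise ValueError('candidates must not be empty')
    scores = {c: score_fn(c, context) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


captions = ['car', 'road', 'meter', 'wheel']
best, scores = disambiguate(['thermometer', 'parking meter'], captions)
print(best, scores)
```

Swapping `score_fn` for an image-to-text or text-to-text similarity model keeps the selection logic unchanged.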

In [1]:
import requests
from tqdm import tqdm
from typing import List, Dict, Union
from utils import read_json, save_json, sort_dict

In [None]:
object2subtypes = read_json('../data/??????')

In [4]:
from models import Image2TextSimilarity, Text2TextSimilarity

# Models
im2txt = Image2TextSimilarity('cpu')
txt2txt = Text2TextSimilarity('cpu')

In [25]:
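# Fetch the image as a raw byte stream
im_url = 'https://i.ytimg.com/vi/auUaaGHBgok/maxresdefault.jpg'
im = requests.get(im_url, stream=True).raw

# Ambiguous object name and its sub-type candidates
anchor = 'can'
candidates = list(object2subtypes[anchor])

# Score the anchor and the candidates against the image
res = im2txt.inference(im, anchor, candidates, top_k=10)

print(res, '\n\n\n', candidates)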

(0.202778622508049,
 {'metal garbage can': 0.2734043002128601,
  'metal trashcan': 0.2729678153991699,
  'metal trash can': 0.26972121000289917,
  'steel trash can': 0.2647626996040344,
  'bathroom trashcan': 0.2619898319244385,
  'garbage can': 0.25501561164855957,
  'trashcan': 0.24258899688720703,
  'metal can': 0.21338750422000885,
  'tin can': 0.2028762251138687,
  'can': 0.202778622508049,
  'water can': 0.20134201645851135,
  'aluminum can': 0.1989527940750122,
  'soda can': 0.19333428144454956,
  'beer can': 0.19059841334819794,
  'paint can': 0.1849059760570526,
  'drink can': 0.18371596932411194,
  'coca cola can': 0.17185527086257935,
  'bull can': 0.1696702390909195,
  'pepsi can': 0.16818806529045105,
  'coke can': 0.1628076434135437,
  'american': 0.15299029648303986,
  'pecan': 0.15161356329917908,
  'toucan': 0.14703787863254547,
  'pelican': 0.1459970772266388,
  'watering can': 0.1437484323978424,
  'word american': 0.1397479772567749})
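
One possible way to use this result, sketched under the assumption that it is a tuple of (anchor score, candidate-to-score dict) as the printed output suggests: keep only the candidates that match the image better than the bare anchor does. `filter_above_anchor` is a hypothetical helper, and the scores below are a subset copied from the output above.

```python
from typing import Dict, Tuple


def filter_above_anchor(result: Tuple[float, Dict[str, float]]) -> Dict[str, float]:
    """Keep candidates whose score strictly exceeds the anchor's score."""
    anchor_score, candidate_scores = result
    return {
        name: score
        for name, score in candidate_scores.items()
        if score > anchor_score
    }


# Subset of the scores printed above
sample = (
    0.202778622508049,
    {
        'metal garbage can': 0.2734043002128601,
        'tin can': 0.2028762251138687,
        'can': 0.202778622508049,
        'soda can': 0.19333428144454956,
        'pecan': 0.15161356329917908,
    },
)
kept = filter_above_anchor(sample)
print(kept)
```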

In [None]:
txt2txt.inference()

## TODO: Incorporate Region-level information

We can use bounding boxes to tell apart individual instances of a class,
then use features from the cropped region to select the candidate subtype.

### Two-stage approach:

#### Image-level Context for Filtering
Image (anchor) --> Objects in captions (anchor)

#### Region-level Feature for Selection
--> Visual bounding-box features (best)

If the bounding-box area is below 224*224 pixels, enlarge the box height and width about its center.
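
A minimal sketch of this enlargement step. The box layout `(x, y, w, h)` with a top-left origin, the name `expand_bbox`, and the growth rule (raise each side to at least 224 pixels, then shift the box back inside the image) are assumptions; the note above only fixes the area threshold and the centering.

```python
from typing import Tuple

MIN_SIDE = 224  # threshold from the note above


def expand_bbox(
    x: int, y: int, w: int, h: int,
    img_w: int, img_h: int,
    min_side: int = MIN_SIDE,
) -> Tuple[int, int, int, int]:
    """Grow a small (x, y, w, h) box about its center, staying inside the image."""
    if w * h >= min_side * min_side:
        return x, y, w, h  # large enough already

    # Center of the original box
    cx, cy = x + w / 2.0, y + h / 2.0

    # Assumed rule: each side becomes at least min_side, capped at the image size
    new_w = min(max(w, min_side), img_w)
    new_h = min(max(h, min_side), img_h)

    # Re-center, then shift back inside the image if a border is crossed
    new_x = min(max(cx - new_w / 2.0, 0), img_w - new_w)
    new_y = min(max(cy - new_h / 2.0, 0), img_h - new_h)
    return int(round(new_x)), int(round(new_y)), int(new_w), int(new_h)


print(expand_bbox(100, 100, 50, 40, img_w=640, img_h=480))
```

Near an image border the enlarged box is no longer centered on the original one; padding the crop instead of shifting it is an alternative if exact centering matters.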

<br><br><br>

### Visualize Subtype with Image BBoxes

In [None]:
regions = read_json('../VG/regions/r_20k.json')

region2subtypes = read_json('?????')