<a href="https://colab.research.google.com/github/QasimKhan5x/image-search-analysis-dip/blob/main/Metadata%20Creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metadata Creation (2nd Milestone of DIP Project)

This notebook takes an image as input and creates a dictionary of metadata about it. The metadata describes the objects found in the image and, for some object categories, attributes of those objects.

**Sample Metadata Object Structure**
```
[
    34534543, # milvus_id
    {
        "supercategory": "vehicle",
        "category": "car",
        "attributes": {
            "make": "Mercedes-Benz",
            "model": "C-Class Sedan",
            "year": 2012
        },
        "milvus_id": 1
    }
]
```

This metadata is useful for improving the precision of a vector similarity search engine. The process is as follows:

1. Create metadata for all the images in your dataset
2. Store that metadata in a NoSQL database (MongoDB) or JSON file
3. When performing reverse image search, create metadata for the query image
4. Restrict the query on your vector embeddings collection to vectors of images that contain one or more of the objects in the query image's metadata
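
Step 4 can be sketched in plain Python. The sketch assumes the stored metadata has already been loaded into a dictionary keyed by `milvus_id`, where each value is a list of objects with a `category` key. The names `categories_of` and `filter_candidates` and the sample data are hypothetical, and no Milvus or MongoDB API is called here.

```python
def categories_of(objects):
    """Collect the set of category names from a list of metadata objects."""
    return {obj["category"] for obj in objects}


def filter_candidates(query_objects, stored_metadata):
    """Return the milvus_ids whose images share at least one category with the query."""
    wanted = categories_of(query_objects)
    return [
        mid
        for mid, objects in stored_metadata.items()
        if wanted & categories_of(objects)
    ]


# Hypothetical stored metadata: milvus_id -> list of detected objects
stored_metadata = {
    101: [{"supercategory": "vehicle", "category": "car"}],
    102: [{"supercategory": "animal", "category": "cat"}],
    103: [{"supercategory": "person", "category": "person"},
          {"supercategory": "vehicle", "category": "car"}],
}
query_objects = [{"supercategory": "vehicle", "category": "car"}]

candidate_ids = filter_candidates(query_objects, stored_metadata)
print(candidate_ids)
```

The resulting ids could then be used to limit the similarity search to those vectors, so that visually similar images without a shared object are never ranked.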

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Installations, Imports, Model Declaration

In [None]:
!mkdir output
!mkdir input
!mkdir predictions
!wget https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/panoptic/maskformer2_swin_large_IN21k_384_bs16_100ep/model_final_f07440.pkl
!pip install -q 'git+https://github.com/facebookresearch/detectron2.git'
!pip install -q deepface
!git clone https://github.com/QasimKhan5x/Mask2Former
%cd Mask2Former
!pip install -r requirements.txt
%cd ./mask2former/modeling/pixel_decoder/ops
!sh make.sh
%cd /content

In [3]:
import json
import os
import pickle
import subprocess
from pathlib import Path

import cv2
import numpy as np
import requests
import torch
import torchvision.transforms as T
from PIL import Image
from scipy.io import loadmat

# Person Model
from deepface import DeepFace

# Cat Model

In [4]:
# Mask2Former Panoptic Segmentation

preds_dst = '/content/predictions/'
config_file = '/content/Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml'
m2f_weights = '/content/model_final_f07440.pkl'
coco_anns = json.loads(requests.get('https://raw.githubusercontent.com/cocodataset/panopticapi/master/panoptic_coco_categories.json').text)

In [None]:
# Car Model

# stores the annotations as a 2D array in which axis=1 contains single-item arrays
car_anns = loadmat('/content/drive/MyDrive/AttributeDetection/Car/cars_annos.mat')['class_names']
# converts the annotations to a flat list of class-name strings
car_anns = np.concatenate(car_anns.flatten()).tolist()

ckpt = '/content/drive/MyDrive/AttributeDetection/Car/car.pt'
car_model = torch.load(ckpt).eval()

In [None]:
# Cat Model
cat_model = CatModel().eval()

## Panoptic Segmentation 

This section contains code to perform panoptic segmentation with Mask2Former and create an initial metadata object without attributes of objects.

**Notes**

1. An object must occupy at least 1% of the image to be included.

Sample command that runs Mask2Former on an image, stores its output in /content/output, and stores the predictions dictionary in /content/predictions:

In [None]:
'''
%cd /content/Mask2Former/demo
!python demo.py --config-file /content/Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml \
--input /content/input/tv_image05.png \
--preds_dest /content/predictions/tv_image05 \
--output /content/output \
--opts MODEL.WEIGHTS /content/model_final_f07440.pkl
%cd /content
'''

In [5]:
def panoptic_segment(img_path, config_file, weights):
    '''
    Provide the paths to
        1. Image
        2. Configuration file
        3. Weights
    Saves output in output and predictions directories
    '''
    dst_fn = Path(img_path).stem
    dst_fp = os.path.join(preds_dst, dst_fn)
    cmd = ['python', 'demo.py',
           '--config-file', config_file,
           '--input', img_path,
           '--preds_dest', dst_fp,
           '--output', '/content/output',
           '--opts', 'MODEL.WEIGHTS', weights]
    # run demo.py from its own directory; check=True raises if it fails
    subprocess.run(cmd, cwd='/content/Mask2Former/demo',
                   stdout=subprocess.PIPE, check=True)

In [7]:
def get_thresh(img):
    '''Only consider objects whose size > 1% of image size'''
    return img.size[0] * img.size[1] * 0.01

def get_bounding_box(img):
    '''Get the corners of the bounding box around the object pixels (value 255)'''
    # region of interest
    roi = np.argwhere(img == 255)
    # starting point --> top left corner
    y1, x1 = roi[:, 0].min(), roi[:, 1].min()
    # ending point --> bottom right corner
    y2, x2 = roi[:, 0].max(), roi[:, 1].max()
    # cast to plain ints so that OpenCV accepts the points
    return (int(x1), int(y1)), (int(x2), int(y2))

def apply_bounding_box(img):
    '''Draw a bounding box on the image'''
    start, end = get_bounding_box(img)
    rect = cv2.rectangle(img, start, end, (255, 0, 0), 1)
    return rect.astype('uint8')
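
As a small illustration of the helpers above, the following sketch applies the same bounding-box logic to a synthetic binary mask and computes the 1% area threshold for an image size chosen only for the example (500x375). All values are made up.

```python
import numpy as np

# Synthetic 6x8 binary mask: object pixels are 255, background is 0
mask = np.zeros((6, 8), dtype="uint8")
mask[2:5, 3:7] = 255  # rows 2..4, columns 3..6

# Same logic as get_bounding_box: extreme coordinates of the object pixels
roi = np.argwhere(mask == 255)
y1, x1 = roi[:, 0].min(), roi[:, 1].min()
y2, x2 = roi[:, 0].max(), roi[:, 1].max()
top_left, bottom_right = (int(x1), int(y1)), (int(x2), int(y2))

# 1% area threshold for a hypothetical 500x375 image (width x height)
thresh = 500 * 375 * 0.01
object_area = int((mask == 255).sum())

print(top_left, bottom_right, thresh, object_area)
```

Note that the bottom-right corner is the last pixel that belongs to the object, so a slice that should include it has to end at `y2 + 1` and `x2 + 1`.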

In [24]:
def get_metadata(img, labels, instances):
    '''
    For each object instance in the image,
    create a metadata object that contains the
        1. Object supercategory
        2. Object category
        3. Array containing the cropped object
    (3) is temporarily saved so that it can be
    used for attributes prediction
    (1) and (2) are obtained through COCO annotations
    '''
    img_rgb = np.asarray(img)
    thresh = get_thresh(img)
    img_metadata = list()

    for instance in instances:
        # skip small objects and "stuff" segments
        if instance['area'] <= thresh or not instance['isthing']:
            continue

        instance_id = instance['id']
        cat_id = instance['category_id']

        metadata = dict()
        metadata['supercategory'] = coco_anns[cat_id]['supercategory']
        metadata['category'] = coco_anns[cat_id]['name']

        # get region of interest for current instance
        roi = np.where(labels == instance_id, 255, 0).astype('uint8')
        (x1, y1), (x2, y2) = get_bounding_box(roi)
        # crop the bounding box (+1 because slice ends are exclusive)
        metadata['image'] = img_rgb[y1:y2 + 1, x1:x2 + 1].copy()
        img_metadata.append(metadata)
    return img_metadata

def create_base_metadata_from_imgpath(imgpath):
    panoptic_segment(imgpath, config_file, m2f_weights)
    # same stem that panoptic_segment used for the predictions file
    filestem = Path(imgpath).stem
    preds_fp = os.path.join(preds_dst, filestem + ".pkl")
    with open(preds_fp, "rb") as f:
        preds = pickle.load(f)['panoptic_seg']
    img = Image.open(imgpath).convert("RGB")
    # preds[0] is the label map, preds[1] describes each segment
    labels = preds[0].cpu().detach().numpy()
    instances = preds[1]
    metadata = get_metadata(img, labels, instances)
    return metadata
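
To make the filtering of segments concrete, here is a minimal sketch on synthetic data. It assumes the label map and the segment list have the shape that `get_metadata` reads (keys `id`, `category_id`, `isthing`, `area`); the arrays and values are invented for illustration and no model is run.

```python
import numpy as np

# Synthetic 4x6 RGB image and panoptic label map (0 = unlabeled)
img_rgb = np.arange(4 * 6 * 3, dtype="uint8").reshape(4, 6, 3)
labels = np.zeros((4, 6), dtype="int64")
labels[1:3, 2:5] = 1   # a "thing" segment
labels[3, :] = 2       # a "stuff" segment

# Hypothetical segment descriptions using the keys read by get_metadata
instances = [
    {"id": 1, "category_id": 2, "isthing": True, "area": 6},
    {"id": 2, "category_id": 100, "isthing": False, "area": 6},
]

crops = {}
for instance in instances:
    if not instance["isthing"]:
        continue  # stuff segments are skipped, only countable objects are kept
    ys, xs = np.nonzero(labels == instance["id"])
    # +1 because slice ends are exclusive
    crops[instance["id"]] = img_rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

print({seg_id: crop.shape for seg_id, crop in crops.items()})
```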

In [16]:
'''
metadata = create_base_metadata_from_imgpath('/content/input/2008_000880.jpg')
print(len(metadata))
# Display cropped image of object
Image.fromarray(metadata[0]['image'])
'''

/content/Mask2Former/demo
/content


## Image Classification

In [27]:
def get_person_attributes(img):
    '''
    Return attributes if face found
    Else return {}
    '''
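    # DeepFace accepts a file path or a NumPy array in BGR order,
    # so convert the RGB PIL image first
    img = np.ascontiguousarray(np.asarray(img)[:, :, ::-1])
    # try each detector backend until one finds a face
    detectors = ['opencv', 'ssd', 'dlib', 'mtcnn', 'retinaface']
    face_found = False
    for detector in detectors:
        try:
            img = DeepFace.detectFace(img, detector_backend=detector)
            face_found = True
            break
        except Exception:
            pass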

    if not face_found:
        return {}

    demography = DeepFace.analyze(img, ['age', 'gender', 'race', 'emotion'], enforce_detection=False)

    if demography['age'] < 13:
        age_cat = 'child'
    elif demography['age'] < 20:
        age_cat = 'teenager'
    elif demography['age'] < 40:
        age_cat = 'adult'
    elif demography['age'] < 65:
        age_cat = 'middle_aged'
    else:
        age_cat = 'old'

    attributes = {
        'gender': demography["gender"],
        'race': demography["dominant_race"],
        'emotion': demography["dominant_emotion"],
        'age': age_cat
    }

    return attributes
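
The age bucketing in `get_person_attributes` can be checked in isolation. The sketch below restates the same thresholds as a standalone function; `age_category` is a hypothetical helper name that does not exist in the notebook.

```python
def age_category(age):
    """Map a numeric age to the coarse buckets used in get_person_attributes."""
    if age < 13:
        return "child"
    if age < 20:
        return "teenager"
    if age < 40:
        return "adult"
    if age < 65:
        return "middle_aged"
    return "old"


# Boundary values: each upper bound is exclusive
buckets = [age_category(a) for a in (5, 13, 39, 64, 65)]
print(buckets)
```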

In [1]:
def get_cat_attributes(img):
    # not implemented yet; return no attributes
    return {}

In [None]:
def get_car_attributes(img):
    transform = T.Compose([T.Resize((400, 400)),
                           T.ToTensor(),
                           T.Normalize((0.5, 0.5, 0.5),
                                       (0.5, 0.5, 0.5))])

    tensor = transform(img).float().unsqueeze(0)
    # move the input to the same device as the model
    device = next(car_model.parameters()).device
    tensor = tensor.to(device)

    # inference only, so no gradients are needed
    with torch.no_grad():
        output = car_model(tensor)
    _, predicted = torch.max(output, 1)
    label = car_anns[predicted.item()]

    # separate by spaces
    # 1st token -> make
    # 2nd to penultimate -> model
    # last -> year
    # fails if make consists of more than 1 token
    tokens = label.split()

    return {
        'make': tokens[0],
        'model': " ".join(tokens[1:-1]),
        'year': int(tokens[-1])
    }
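
The comment in `get_car_attributes` notes that the split fails when the make has more than one token. One possible workaround, sketched below, is to check the label against a list of known multi-word makes first. `MULTI_WORD_MAKES` and `parse_car_label` are hypothetical names, and the list is an assumption that would need to be completed for the actual label set.

```python
# Hypothetical list of multi-word makes; extend it to match your label set
MULTI_WORD_MAKES = ("Aston Martin", "Land Rover", "AM General")


def parse_car_label(label):
    """Split a 'make model year' label into its three parts."""
    label = label.strip()
    # prefer a known multi-word make, otherwise fall back to the first token
    make = next((m for m in MULTI_WORD_MAKES if label.startswith(m + " ")), None)
    if make is None:
        make = label.split()[0]
    rest = label[len(make):].split()
    return {"make": make, "model": " ".join(rest[:-1]), "year": int(rest[-1])}


parsed = [parse_car_label(s) for s in ("Mercedes-Benz C-Class Sedan 2012",
                                       "Land Rover Range Rover SUV 2012")]
print(parsed)
```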

In [1]:
def add_attributes_to_metadata(metadata):
    '''
    Apply available image classification models on the metadata
    Currently, the following models are used:
        1. Person - ethnicity, age, gender, mood
        2. Cat - breed
        3. Car - make, model, year
    '''
    for instance in metadata:
        img = Image.fromarray(instance['image'])
        if instance['category'] == 'person':
            instance['metadata'] = get_person_attributes(img)
        elif instance['category'] == 'cat':
            instance['metadata'] = get_cat_attributes(img)
        elif instance['category'] == 'car':
            instance['metadata'] = get_car_attributes(img)
        else:
            instance['metadata'] = {}
        # the image is no longer needed
        del instance['image']
    return metadata

## Final Metadata Creation

In [None]:
def create_metadata_from_imgpath(imgpath):
    metadata = create_base_metadata_from_imgpath(imgpath)
    metadata_with_attributes = add_attributes_to_metadata(metadata)
    return metadata_with_attributes

## Update PASCAL VOC 2012

This section creates metadata objects for the PASCAL VOC 2012 dataset, which we use for our project demo.

Since we will be annotating our images with their respective `milvus_id`s as well, a JSON file containing the `milvus_id` for each image name is required.

In [5]:
filepath = '/content/drive/MyDrive/AttributeDetection/mid_filename.json'
with open(filepath) as f:
    imgname2mid = json.load(f)

### Load Dataset from Kaggle

In [2]:
kaggle_creds = '/content/drive/MyDrive/AttributeDetection/kaggle.json'
!pip install -q kaggle
!mkdir ~/.kaggle
!cp $kaggle_creds ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download huanghanchina/pascal-voc-2012
!unzip pascal-voc-2012.zip

In [8]:
IMGS_DIR = '/content/VOC2012/JPEGImages'
images = os.listdir(IMGS_DIR)

In [None]:
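# collect one [milvus_id, object, object, ...] list per image
all_metadata = []
for img in images:
    img_path = os.path.join(IMGS_DIR, img)
    metadata = create_metadata_from_imgpath(img_path)
    mid = imgname2mid[img]
    metadata.insert(0, mid)
    all_metadata.append(metadata)
with open("/content/metadata.json", "w") as f:
    json.dump(all_metadata, f)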
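
Once the metadata has been written, it can be indexed by `milvus_id` for lookup. The sketch below assumes one `[milvus_id, object, ...]` list per image, with attributes stored under the `metadata` key as written by `add_attributes_to_metadata`. The records are invented, and the JSON round trip is done in memory instead of reading `/content/metadata.json`.

```python
import json

# Hypothetical records in the [milvus_id, {object}, ...] layout
records = [
    [34534543, {"supercategory": "vehicle", "category": "car",
                "metadata": {"make": "Mercedes-Benz",
                             "model": "C-Class Sedan",
                             "year": 2012}}],
    [34534544, {"supercategory": "animal", "category": "cat", "metadata": {}}],
]

# Round trip through JSON, as json.dump / json.load would do with a file
loaded = json.loads(json.dumps(records))

# Index by milvus_id: the first element is the id, the rest are the objects
by_mid = {record[0]: record[1:] for record in loaded}
print(by_mid[34534543][0]["metadata"]["make"])
```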