<a href="https://colab.research.google.com/github/QasimKhan5x/image-search-analysis/blob/main/Metadata_Creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Metadata Creation (2nd Milestone of DIP Project)

This notebook takes an image as input and creates a metadata object for it. The metadata lists the objects detected in the image and, for some object types, attributes that describe each object.

**Sample Metadata Object Structure**
```
[
    {
        "milvus_id": 123456,
        "name": "2007_001234.jpg"
    },
    {
        "supercategory": "vehicle",
        "category": "car",
        "metadata": {
            "make": "Mercedes-Benz",
            "model": "C-Class Sedan",
            "year": 2012
        }
    }
]
```

This metadata is useful for improving the precision of a vector similarity search engine. The process is as follows:

1. Create metadata for all the images in your dataset
2. Store that metadata in a NoSQL database (MongoDB) or JSON file
3. When performing reverse image search, create metadata for your image
4. Query your vector embeddings collection to only find vectors that contain one or more of the objects in your metadata
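
Step 4 can be sketched as a plain-Python filter. This is only an illustration under stated assumptions: each stored record follows the sample structure above (first entry identifies the image, later entries describe objects), `object_categories` and `filter_candidates` are hypothetical helper names, and the records are made-up sample data rather than output of this notebook.

```python
def object_categories(record):
    """Return the set of object categories in one metadata record.

    Assumes record[0] holds the image identifiers and every later
    entry is an object with a "category" key.
    """
    return {obj["category"] for obj in record[1:]}


def filter_candidates(query_objects, stored_records):
    """Return milvus_ids of stored images that share at least one category with the query."""
    wanted = {obj["category"] for obj in query_objects}
    return [
        record[0]["milvus_id"]
        for record in stored_records
        if wanted & object_categories(record)
    ]


# Made-up sample data, for illustration only
stored = [
    [{"milvus_id": 1, "name": "a.jpg"}, {"supercategory": "vehicle", "category": "car"}],
    [{"milvus_id": 2, "name": "b.jpg"}, {"supercategory": "animal", "category": "dog"}],
    [{"milvus_id": 3, "name": "c.jpg"}],
]
query = [{"supercategory": "animal", "category": "dog"}]

candidate_ids = filter_candidates(query, stored)
print(candidate_ids)
```

The resulting IDs could then be used to restrict the vector similarity search to those images.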

Run every section through **Final Metadata Creation**. The last section exists only to generate metadata for PASCAL VOC 2012.

**Personal Notes**
- If Mask2Former fails to detect an object, say a person, that person is not registered in the metadata even if **DeepFace** could find them. Possible solutions:
    1. For persons specifically, use both DeepFace and Mask2Former
    2. Fine-tune Mask2Former on a custom dataset (for future use)

## Installations, Imports, Model Declaration

In [1]:
!mkdir output
!mkdir input
!mkdir predictions
# mask2former weights
!wget https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/panoptic/maskformer2_swin_large_IN21k_384_bs16_100ep/model_final_f07440.pkl
# detectron
!pip install -q 'git+https://github.com/facebookresearch/detectron2.git'
!pip install -q deepface gdown -U
!gdown -qq https://drive.google.com/drive/folders/1NYasnQ9aSG9KuFKO0moeN4JCwHfI7avq -O /content/ --folder
!git clone https://github.com/QasimKhan5x/Mask2Former
%cd Mask2Former
!pip install -r requirements.txt
%cd ./mask2former/modeling/pixel_decoder/ops
!sh make.sh
%cd /content

In [2]:
import subprocess
import os
import logging
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import json
import pickle
import requests
from tqdm.notebook import trange, tqdm
from pathlib import Path

import torch
from PIL import Image
import numpy as np
import cv2
import torchvision.transforms as T
from scipy.io import loadmat
import matplotlib.pyplot as plt


# Dog Model
import tensorflow as tf
tf.get_logger().setLevel(3)
from tensorflow.keras.models import load_model,Model
from tensorflow.keras.applications.resnet_v2 import preprocess_input

# Hide GPU from visible devices
tf.config.set_visible_devices([], 'GPU')

# Person Model
from deepface import DeepFace

Directory  /root /.deepface created
Directory  /root /.deepface/weights created


In [3]:
# Mask2Former Panoptic Segmentation

preds_dst = '/content/predictions/'
config_file = '/content/Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml'
m2f_weights = '/content/model_final_f07440.pkl'
coco_anns = json.loads(requests.get('https://raw.githubusercontent.com/cocodataset/panopticapi/master/panoptic_coco_categories.json').text)

In [4]:
# Car Model

# stores the annotations as a 2D array in which axis=1 contains single-item arrays
car_anns = loadmat('/content/AttributeDetection/Car/cars_annos.mat')['class_names']
# converts the annotations to an intelligible list
car_anns = np.concatenate(car_anns.flatten()).tolist()

ckpt = '/content/AttributeDetection/Car/car.pt'
car_model = torch.load(ckpt, map_location='cpu').eval()

In [5]:
# Dog Model

# get labels of dog classes, skipping blank lines
with open("/content/AttributeDetection/labels.txt") as f:
    dog_labels = [line.strip() for line in f if line.strip()]

dog_model = tf.keras.models.load_model('/content/AttributeDetection/DBC.h5')

## Panoptic Segmentation 

This section contains code to perform panoptic segmentation with Mask2Former and create an initial metadata object without attributes of objects.

**Notes**

1. An object must cover more than 1% of the image area to be included. This is a major drawback in several cases and needs to be revised: for example, objects that are inherently small will typically occupy less than 1% of an image.
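
To make the 1% rule concrete, here is the arithmetic for an assumed 500x375 image. The image size and the 40x40 object are hypothetical values chosen for illustration, and `area_threshold` is a stand-in name that mirrors the threshold computation used later in this section.

```python
def area_threshold(width, height, fraction=0.01):
    """Pixel area an object must exceed to be kept."""
    return width * height * fraction


width, height = 500, 375          # hypothetical image size
thresh = area_threshold(width, height)
print(thresh)                     # roughly 1875 pixels

# A 40x40-pixel object covers 1600 pixels, below the threshold,
# so it would be skipped even if it were detected correctly.
small_object_area = 40 * 40
is_kept = small_object_area > thresh
print(is_kept)
```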

Sample command that runs Mask2Former on an image, stores its output in /content/output, and stores the predictions dictionary in /content/predictions:

In [6]:
'''
%cd /content/Mask2Former/demo
!python demo.py --config-file /content/Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml \
--input /content/input/man.jpg \
--preds_dest /content/predictions/demo \
--output /content/output \
--opts MODEL.WEIGHTS /content/model_final_f07440.pkl
%cd /content
'''

In [7]:
def panoptic_segment(img_path, config_file, weights):
    '''
    Run the Mask2Former demo script on one image.
    Provide the paths to
        1. Image
        2. Configuration file
        3. Weights
    Saves the predictions under preds_dst, named after the image stem
    '''
    %cd /content/Mask2Former/demo
    dst_fn = Path(img_path).stem
    dst_fp = os.path.join(preds_dst, dst_fn)
    torch.cuda.empty_cache()
    cmd = f'python demo.py --config-file {config_file} ' \
          f'--input {img_path} ' \
          f'--preds_dest {dst_fp} ' \
          f'--opts MODEL.WEIGHTS {weights}'
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
    out, err = p.communicate()
    %cd /content

In [8]:
def get_thresh(img):
    '''Only consider objects whose size > 1% of image size'''
    return img.size[0] * img.size[1] * 0.01

def get_bounding_box(img):
    '''Get locations of bounding box on an object'''
    # region of interest
    roi = np.argwhere(img == 255)
    # starting point --> top left corner
    y1, x1 = roi[:, 0].min(), roi[:, 1].min()
    # ending point --> bottom right corner
    y2, x2 = roi[:, 0].max(), roi[:, 1].max()
    return (x1, y1), (x2, y2)

def apply_bounding_box(img):
    '''Draw a bounding box on the image'''
    start, end = get_bounding_box(img)
    rect = cv2.rectangle(img, start, end, (255, 0, 0), 1)
    return rect.astype('uint8')
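
As a quick check of the bounding-box logic, the sketch below runs a copy of `get_bounding_box` on a small hand-made mask. The mask and the `int` casts are additions for illustration only.

```python
import numpy as np


def get_bounding_box(mask):
    """Return ((x1, y1), (x2, y2)) of the pixels equal to 255, both corners inclusive."""
    roi = np.argwhere(mask == 255)
    y1, x1 = roi[:, 0].min(), roi[:, 1].min()
    y2, x2 = roi[:, 0].max(), roi[:, 1].max()
    # cast to plain Python ints so the result prints and compares cleanly
    return (int(x1), int(y1)), (int(x2), int(y2))


# Toy mask: object occupies rows 2..4 and columns 3..6
mask = np.zeros((6, 8), dtype="uint8")
mask[2:5, 3:7] = 255

start, end = get_bounding_box(mask)
print(start, end)  # (3, 2) (6, 4)

# Both corners are inclusive, so a slice needs "+ 1" on the end indices
crop = mask[start[1]:end[1] + 1, start[0]:end[0] + 1]
print(crop.shape)  # (3, 4)
```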

In [9]:
def get_metadata(img, labels, instances):
    '''
    For each object instance in the image,
    create a metadata object that contains the
        1. Object supercategory
        2. Object category
        3. Array holding the cropped object
    (3) is saved temporarily so that it can be
    used for attribute prediction
    (1) and (2) are obtained from the COCO annotations
    '''
    img_rgb = np.asarray(img)
    thresh = get_thresh(img)
    img_metadata = list()
    global coco_anns

    for instance in instances:
        if instance['area'] <= thresh or not instance['isthing']:
            continue

        instance_id = instance['id']
        cat_id = instance['category_id']
        
        metadata = dict()
        supercategory = coco_anns[cat_id]['supercategory']
        name = coco_anns[cat_id]['name']
        metadata['supercategory'] = supercategory
        metadata['category'] = name

        # get region of interest for current instance
        roi = np.where(labels == instance_id, 255, 0).astype('uint8')
        (x1, y1), (x2, y2) = get_bounding_box(roi)
        # crop the bounding box; both corners are inclusive, hence the + 1
        crop_rgb = img_rgb[y1:y2 + 1, x1:x2 + 1].copy()
        metadata['image'] = crop_rgb
        img_metadata.append(metadata)
    return img_metadata

def create_base_metadata_from_imgpath(imgpath):
    global config_file
    global m2f_weights
    panoptic_segment(imgpath, config_file, m2f_weights)
    # use the same stem as panoptic_segment so the file names always match
    filestem = Path(imgpath).stem
    preds_fp = os.path.join(preds_dst, filestem + ".pkl")
    with open(preds_fp, "rb") as f:
        preds = pickle.load(f)['panoptic_seg']
    os.remove(preds_fp)
    img = Image.open(imgpath).convert("RGB")
    labels = preds[0].cpu().detach().numpy()
    instances = preds[1]
    metadata = get_metadata(img, labels, instances)
    return metadata
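
The filtering and masking steps in `get_metadata` can be tried on a tiny hand-made prediction. The label map and segment dictionaries below are hypothetical stand-ins for the two parts of `panoptic_seg` (a label map plus a list of segment descriptions); real predictions come from Mask2Former and are much larger.

```python
import numpy as np

# Hypothetical label map: each pixel stores the id of the segment it belongs to
labels = np.array([[1, 1, 2],
                   [1, 1, 2],
                   [3, 3, 3]])

# Hypothetical segment descriptions using the keys read by get_metadata
instances = [
    {"id": 1, "isthing": True, "category_id": 0, "area": 4},
    {"id": 2, "isthing": True, "category_id": 2, "area": 2},
    {"id": 3, "isthing": False, "category_id": 100, "area": 3},
]

thresh = 2  # toy area threshold

# Keep only countable objects ("things") that are larger than the threshold
kept = [seg for seg in instances if seg["isthing"] and seg["area"] > thresh]

# Build a 0/255 mask per kept segment, as done before the bounding-box step
masks = {
    seg["id"]: np.where(labels == seg["id"], 255, 0).astype("uint8")
    for seg in kept
}

kept_ids = [seg["id"] for seg in kept]
print(kept_ids)  # [1]
```

Segment 2 is dropped because its area does not exceed the threshold, and segment 3 is dropped because it is not a "thing".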

In [10]:
# metadata = create_base_metadata_from_imgpath('/content/AttributeDetection/demo.jpeg')
# print(len(metadata))

In [11]:
# Display cropped image of object
# Image.fromarray(metadata[2]['image'])

## Image Classification

In [12]:
def get_person_attributes(img):
    '''
    Return attributes if face found
    Else return {}
    '''
    # try each face detector backend until one finds a face
    detectors = ['opencv', 'ssd', 'dlib', 'mtcnn', 'retinaface']
    face_found = False
    img = np.asarray(img)
    # the crop is RGB; swap the channels to BGR
    inp = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

    for detector in detectors:
        try:
            DeepFace.detectFace(inp, detector_backend=detector)
            face_found = True
            break
        except Exception as e:
            print(e)
            print(f'{detector} failed')

    if not face_found:
        return {}

    demography = DeepFace.analyze(inp, ['age', 'gender', 'race', 'emotion'], enforce_detection=False)

    if demography['age'] < 13:
        age_cat = 'child'
    elif demography['age'] < 20:
        age_cat = 'teenager'
    elif demography['age'] < 40:
        age_cat = 'adult'
    elif demography['age'] < 65:
        age_cat = 'middle_aged'
    else:
        age_cat = 'old'

    attributes = {
        'gender': demography["gender"],
        'race': demography["dominant_race"],
        'emotion': demography["dominant_emotion"],
        'age': age_cat
    }

    return attributes
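
The age bucketing above can also be written as a lookup table. The sketch below is one possible alternative, not the notebook's implementation; `age_category`, `AGE_LIMITS`, and `AGE_LABELS` are hypothetical names, and the boundaries are copied from the `if`/`elif` chain.

```python
from bisect import bisect_right

# Upper bounds (exclusive) of the first four buckets, taken from the chain above
AGE_LIMITS = [13, 20, 40, 65]
AGE_LABELS = ['child', 'teenager', 'adult', 'middle_aged', 'old']


def age_category(age):
    """Map a numeric age to the same category as the if/elif chain."""
    return AGE_LABELS[bisect_right(AGE_LIMITS, age)]


for age in (5, 13, 19, 20, 64, 65):
    print(age, age_category(age))
```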

In [13]:
def get_dog_attributes(img):
    '''
    @input: img (PIL Image) of dog
    @returns: { "breed": "value" }
    '''
    # resize to (224, 224)
    pred_img_array = cv2.resize(np.asarray(img), (224, 224))
    # add a batch dimension -> (B, H, W, C)
    pred_img_array = np.expand_dims(pred_img_array, 0)
    # preprocessing for resnet
    pred_img_array = preprocess_input(pred_img_array)

    # feed the model with the image array for prediction
    pred_val = dog_model.predict(pred_img_array)
    pred_breed = dog_labels[np.argmax(pred_val)]
    return {
        'breed': pred_breed
    }


In [14]:
def get_car_attributes(img):
    transform = T.Compose([T.Resize((400, 400)),
                           T.ToTensor(),
                           T.Normalize((0.5, 0.5, 0.5), 
                                       (0.5, 0.5, 0.5))])
    
    tensor = transform(img).float().unsqueeze(0)
    
    # inference only, so skip gradient tracking
    with torch.no_grad():
        output = car_model(tensor)
    _, predicted = torch.max(output, 1)
    label = car_anns[predicted.item()]

    # separate by spaces
    # 1st token -> make
    # 2nd to penultimate -> model
    # last -> year
    # fails if make consists of more than 1 token
    tokens = label.split()

    return {
        'make': tokens[0],
        'model': " ".join(tokens[1:-1]),
        'year': int(tokens[-1]),
    }
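
The label-splitting rule, and the multi-token make limitation noted in the comments, can be seen in isolation below. This is a sketch under assumptions: `split_car_label` is a hypothetical helper, the labels are illustrative strings in the `<make> <model> <year>` format, and the tuple of multi-word makes is a small hand-picked example rather than a complete list.

```python
def split_car_label(label, multi_word_makes=("Aston Martin", "Land Rover")):
    """Split '<make> <model> <year>' into its parts.

    multi_word_makes is an illustrative, incomplete list of makes that
    span more than one token; any other make is taken to be the first token.
    """
    tokens = label.split()
    year = int(tokens[-1])
    body = " ".join(tokens[:-1])
    for make in multi_word_makes:
        if body.startswith(make + " "):
            return {"make": make, "model": body[len(make) + 1:], "year": year}
    # default rule: first token is the make, the rest is the model
    return {"make": tokens[0], "model": " ".join(tokens[1:-1]), "year": year}


single = split_car_label("Jeep Compass SUV 2012")
multi = split_car_label("Aston Martin V8 Vantage Convertible 2012")
print(single)
print(multi)
```

Without the extra lookup, the second label would be split into the make `Aston` and the model `Martin V8 Vantage Convertible`.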

In [15]:
def add_attributes_to_metadata(metadata):
    '''
    Apply available image classification models on the metadata
    Currently, the following models are used:
        1. Person - ethnicity, age, gender, mood
        2. Dog - breed
        3. Car - make, model, year
    '''
    for instance in metadata:
        img = Image.fromarray(instance['image'])
        if instance['category'] == 'person':
            instance['metadata'] = get_person_attributes(img)
        elif instance['category'] == 'dog':
            instance['metadata'] = get_dog_attributes(img)
        elif instance['category'] == 'car':
            instance['metadata'] = get_car_attributes(img)
        else:
            instance['metadata'] = {}
        # the image is no longer needed
        del instance['image']
    return metadata

## Final Metadata Creation

Use `create_metadata_from_imgpath` to create a metadata object.
The first run is slow because some of the models download their weights before they can be used.

In [16]:
def create_metadata_from_imgpath(imgpath):
    metadata = create_base_metadata_from_imgpath(imgpath)
    metadata_with_attributes = add_attributes_to_metadata(metadata)
    return metadata_with_attributes

### Extract metadata for a single image

In [17]:
path = '/content/AttributeDetection/demo.jpeg'
create_metadata_from_imgpath(path)

[{'category': 'dog',
  'metadata': {'breed': 'leonberg'},
  'supercategory': 'animal'},
 {'category': 'car',
  'metadata': {'make': 'Jeep', 'model': 'Compass SUV', 'year': 2012},
  'supercategory': 'vehicle'},
 {'category': 'person',
  'metadata': {'age': 'adult',
   'emotion': 'neutral',
   'gender': 'Man',
   'race': 'white'},
  'supercategory': 'person'}]

## (Optional) Update PASCAL VOC 2012

This section creates metadata objects for PASCAL VOC 2012, the dataset used in our project demo.

Since we will be annotating our images with their respective `milvus_id`s as well, a JSON file containing the `milvus_id` for each image name is required.
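
A minimal sketch of how that mapping is used, assuming the JSON file maps each image file name to its `milvus_id`. The mapping contents below are hypothetical (they reuse the values from the sample structure at the top of the notebook), and `prepend_image_entry` is an illustrative helper name.

```python
import json

# Hypothetical contents of the mapping file: image name -> milvus_id
imgname2mid = json.loads('{"2007_001234.jpg": 123456}')


def prepend_image_entry(metadata, img_name, mapping):
    """Return a new list with the image-level entry placed before the object entries."""
    return [{"milvus_id": mapping[img_name], "name": img_name}] + metadata


objects = [{"supercategory": "vehicle", "category": "car", "metadata": {}}]
record = prepend_image_entry(objects, "2007_001234.jpg", imgname2mid)
print(json.dumps(record, indent=2))
```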

In [None]:
filepath = '/content/AttributeDetection/mid_filename.json'
with open(filepath) as f:
    imgname2mid = json.load(f)

### Load Dataset from Kaggle

In [None]:
kaggle_creds = '/content/AttributeDetection/kaggle.json'
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp $kaggle_creds ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download huanghanchina/pascal-voc-2012
!unzip -qq pascal-voc-2012.zip

In [None]:
IMGS_DIR = '/content/VOC2012/JPEGImages'
images = os.listdir(IMGS_DIR)

pascal_metadata = list()
start = ...
for i in trange(start, len(images)):
    img = images[i]
    img_path = os.path.join(IMGS_DIR, img)
    try:
        metadata = create_metadata_from_imgpath(img_path)
    except Exception:
        # record the index of the failed image and move on
        with open("/content/drive/MyDrive/AttributeDetection/errors_dawood3.txt", "a") as f:
            f.write(str(i)+"\n")
        continue
    else:
        mid = imgname2mid[img]   
        metadata.insert(0, { "milvus_id": mid, "name": img })
        pascal_metadata.append(metadata)
        with open("/content/drive/MyDrive/AttributeDetection/metadata_dawood3.json", "w") as f:
            json.dump(pascal_metadata, f)