<a href="https://colab.research.google.com/github/KumundzhievMaxim/AppliedDeepLearning/blob/main/Mask_R_CNN/Mask_R_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Description

We are going to build a `Mask R-CNN` based keypoint detector model using `Detectron2`. Detectron2 was written in PyTorch and contains many state-of-the-art obejct detection models with pretrained weights. (Don't worry, you don't have to use any PyTorch specific function, just the methods provided by the Detectron2 package.)

## Agenda
1. To preprocess the Cats Dataset
2. Convert Cats Dataset to COCO format. 
3. Finetune a pretrained keypoint model (which was trained on COCO dataset) to predict cat keypoints. 
4. To see the results, run it on some videos.

### Notes
Notebook for guidance: https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5

Optional: if temporally extend this method. By matching the bounding boxes using `Hungarian algorithm`, you can create a simple tracking method without modifying the network. 
Here a good explanation of how it works: https://towardsdatascience.com/computer-vision-for-tracking-8220759eee85 (It's enough to only implement the Hungarian matching part without using Kalman filter.)

## Imports

In [1]:
# install dependencies: 
!pip install pyyaml==5.1 'pycocotools>=2.0.1'
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html


In [131]:
import torch, torchvision
!gcc --version

import pandas as pd

from sklearn.model_selection import train_test_split

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



## Prepare dataset

- Download the Cats dataset # Just CAT_OO
  - don't resize Images
    - Keep only the left eye, right eye and mouth coordinates.
    - Generate bounding boxes by taking the min and max keypoint coordinates.  
    - Add +10 pixels to the max bounding box coordinates to include the entire head of the cat.

### Download the Cats dataset **Just CAT_OO**

In [7]:
# Download from Drive
!if ! [ -f CAT_00.zip ]; then curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=1wGwNi8t-UKAKs-LQL3dG-D8dzGVPHv2w" > /dev/null; curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=1wGwNi8t-UKAKs-LQL3dG-D8dzGVPHv2w" -o CAT_00.zip; fi

# Check if the file size is correct (~402MB)
!if (( $(stat -c%s CAT_00.zip) < 421896648 )); then rm -rfd CAT_00.zip; fi

# If not, download it from NIPG12
!wget -nc -O CAT_00.zip http://nipg1.inf.elte.hu:8000/CAT_00.zip

!unzip -o CAT_00.zip && rm CAT_00.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100   408    0   408    0     0   5589      0 --:--:-- --:--:-- --:--:--  5589
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  402M    0  402M    0     0   100M      0 --:--:--  0:00:04 --:--:--  107M
File ‘CAT_00.zip’ already there; not retrieving.
Archive:  CAT_00.zip
 extracting: CAT_00/00000001_000.jpg  
  inflating: CAT_00/00000001_000.jpg.cat  
 extracting: CAT_00/00000001_005.jpg  
  inflating: CAT_00/00000001_005.jpg.cat  
 extracting: CAT_00/00000001_008.jpg  
  inflating: CAT_00/00000001_008.jpg.cat  
 extracting: CAT_00/00000001_011.jpg  
  inflating: CAT_00/00000001_011.jpg.cat  
 extracting: CAT_00/00000001_012.jpg  
  inflat

### Prepare dataset

In object detection, we usually use a bounding box to describe the target location. The bounding box is a rectangular box that can be determined by the  𝑥  and  𝑦  axis coordinates in the upper-left corner and the  𝑥  and  𝑦  axis coordinates in the lower-right corner of the rectangle. 

In [103]:
TARGET_PATH = './CAT_00'

def prepare_dataset():
  """
  Notes:
    bbox formed as: ((min_x, min_y), (max_x, max_y)) <- min and max keypoint coordinates. 
  """

  def process_images():
    def grouped(iterable, n):
      "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
      return zip(*[iter(iterable)]*n)

    rows = []
    files_list = sorted(os.listdir(TARGET_PATH))
    for index, file_name in enumerate(files_list):
      if '.cat' not in file_name:
        continue
      
      # processing corresponding annotation file 
      annotation_path = os.path.join(TARGET_PATH, file_name)
      image_path = annotation_path.replace('.cat', '')
      image_content = cv2.imread(image_path)
      
      with open(f'{annotation_path}', 'r') as f:
        landmarks = f.readlines()[0].replace('\n', ' ').split()
        landmarks = [int(s) for s in landmarks][:6] # [9, 175, 160, 239, 162, 199]
    
      pairs = []
      for x, y in grouped(landmarks, 2):
        pairs.append((x, y))

      min_ = min(pairs)
      max_ = max(pairs)
      max_ = [pixel+10 for pixel in max_]

      bounding_box = [min_, max_]  
    
      buffer_row = {
          'image_path': image_path,
          'annotation_path': annotation_path,
          'bbox': bounding_box,
          'keypoints': landmarks,
          'height': image_content.shape[0],
          'width': image_content.shape[1],
          'image_content': image_content
      }
      rows.append(buffer_row)
    return rows

  rows = process_images()
  df = pd.DataFrame(rows)

  return df

In [104]:
df = prepare_dataset()
df

Unnamed: 0,image_path,annotation_path,bbox,keypoints,height,width,image_content
0,./CAT_00/00000001_000.jpg,./CAT_00/00000001_000.jpg.cat,"[(9, 175), [172, 209]]","[9, 175, 160, 239, 162, 199]",500,375,"[[[87, 85, 115], [76, 87, 115], [90, 74, 121],..."
1,./CAT_00/00000001_005.jpg,./CAT_00/00000001_005.jpg.cat,"[(9, 96), [137, 113]]","[9, 96, 96, 153, 127, 103]",375,500,"[[[147, 152, 153], [143, 151, 150], [140, 151,..."
2,./CAT_00/00000001_008.jpg,./CAT_00/00000001_008.jpg.cat,"[(9, 318), [234, 337]]","[9, 318, 222, 340, 224, 327]",375,500,"[[[86, 80, 99], [67, 76, 89], [59, 75, 87], [7..."
3,./CAT_00/00000001_011.jpg,./CAT_00/00000001_011.jpg.cat,"[(9, 167), [195, 201]]","[9, 167, 173, 242, 185, 191]",375,500,"[[[165, 150, 178], [181, 167, 191], [165, 154,..."
4,./CAT_00/00000001_012.jpg,./CAT_00/00000001_012.jpg.cat,"[(9, 115), [132, 189]]","[9, 115, 122, 179, 121, 133]",333,500,"[[[103, 78, 58], [103, 78, 58], [103, 78, 58],..."
...,...,...,...,...,...,...,...
1701,./CAT_00/00000455_015.jpg,./CAT_00/00000455_015.jpg.cat,"[(9, 155), [259, 198]]","[9, 155, 245, 221, 249, 188]",500,355,"[[[233, 225, 196], [253, 245, 215], [250, 245,..."
1702,./CAT_00/00000455_016.jpg,./CAT_00/00000455_016.jpg.cat,"[(9, 113), [148, 174]]","[9, 113, 138, 164, 132, 154]",500,302,"[[[113, 104, 101], [118, 108, 108], [121, 114,..."
1703,./CAT_00/00000455_017.jpg,./CAT_00/00000455_017.jpg.cat,"[(9, 187), [103, 258]]","[9, 187, 93, 248, 89, 219]",500,376,"[[[15, 16, 14], [12, 13, 11], [12, 12, 12], [1..."
1704,./CAT_00/00000455_026.jpg,./CAT_00/00000455_026.jpg.cat,"[(9, 435), [262, 449]]","[9, 435, 203, 520, 252, 439]",768,1024,"[[[190, 152, 152], [233, 202, 201], [248, 224,..."


### Split dataset
Split the dataset into train-test sets (ratio: 80-20), without shuffling, and print the size of each set.

In [106]:
objects = df.shape[0]
train_index = objects - (objects*20) // 100

X_train = df[:train_index]
X_test = df[train_index:]

print(X_train.shape)
print(X_test.shape)

(1365, 7)
(341, 7)


### Convert the datasets to COCO format

The desired format per each image should be formed as:
```
[
{ file_name, height, width, image_id, annotations: [{bbox, bbox_mode,category_id, iscrowd, keypoints}] }
...
]

```

Notes: 
- category_id == 0
- iscrowd == 0

In [128]:
def cast_to_coco_format(df: pd.DataFrame):
  """
  Notes:
    target format: 
    [ 
      { file_name, height, width, image_id, annotations: [{bbox, bbox_mode, category_id, iscrowd, keypoints}] } 
    ]
  """

  rows = []

  for row_index, row in X_train.iterrows():
    buff_dict = {
        'filename': row['image_path'],
        'height': row['height'],
        'width': row['width'],
        'image_id': row_index,
        'annotations': [{
            'bbox': row['bbox'],
            'bbox_mode':  0, # XYXY_ABS = 0 -> (x0, y0, x1, y1) 
            'category_id': 0,
            'iscrowd': 0,
            'keypoints': row['keypoints']
        }]
    }
    rows.append(buff_dict)
  return rows

In [138]:
X_train_coco = cast_to_coco_format(X_train)
X_test_coco = cast_to_coco_format(X_test)

with open('train_dataset.json', 'w') as outfile:
    json.dump(X_train_coco, outfile)

with open('test_dataset.json', 'w') as outfile:
    json.dump(X_test_coco, outfile)    

In [140]:
from detectron2.data.datasets import register_coco_instances

# register_coco_instances("train_dataset", {}, "train_dataset.json", TARGET_PATH)
# register_coco_instances("test_dataset", {}, "test_dataset.json", TARGET_PATH)

MetadataCatalog.get("train_dataset")

Metadata(evaluator_type='coco', image_root='./CAT_00', json_file='train_dataset.json', name='train_dataset')