# Task 3 - Feature Matching using SIFT

Write a function that takes an image from the same training and testing
dataset used in the previous task.

**Main steps:**
1. Extract keypoints and feature descriptors from the Test and Train images using a standard SIFT or SURF feature extraction function from a library.
2. Match features between the images, which gives the raw, noisy matches (correspondences).
3. Decide which geometric transform to use, and reject the outliers with RANSAC.
4. Define a score on the resulting inlier matches and use it to detect the objects (icons) that score highly for a given Test image. A basic score is the number of inlier matches.
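
Step 2 can be sketched as follows. This is a minimal NumPy illustration of nearest-neighbour matching with Lowe's ratio test, written without `cv2.BFMatcher`; the function name `match_descriptors` and its `max_ratio` argument are hypothetical and are not taken from the `task3` module.

```python
import numpy as np


def match_descriptors(query_desc, train_desc, max_ratio=0.7):
    """Match each query descriptor to its nearest train descriptor.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (Lowe's ratio test). Returns (query_index, train_index) pairs.
    """
    query_desc = np.asarray(query_desc, dtype=np.float64)
    train_desc = np.asarray(train_desc, dtype=np.float64)
    if len(query_desc) == 0 or len(train_desc) < 2:
        return []  # the ratio test needs at least two candidates

    # Pairwise squared Euclidean distances: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        (query_desc ** 2).sum(axis=1)[:, None]
        + (train_desc ** 2).sum(axis=1)[None, :]
        - 2.0 * query_desc @ train_desc.T
    )
    dists = np.sqrt(np.maximum(sq_dists, 0.0))  # clip tiny negatives from rounding

    # Indices of the closest and second-closest train descriptor per query row.
    order = np.argsort(dists, axis=1)[:, :2]
    rows = np.arange(len(query_desc))
    best = dists[rows, order[:, 0]]
    second = dists[rows, order[:, 1]]

    keep = best < max_ratio * second
    return [(int(q), int(order[q, 0])) for q in rows[keep]]
```

Lowering `max_ratio` keeps fewer but more distinctive matches, which is the trade-off the tuning section asks to quantify.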

**Output:**

Detect objects in the Test images using SIFT or equivalent features (such as SURF), recognise which class each object belongs to, and identify
its scale and orientation. As in Task 2, for visual demonstration the
function should draw a box around each detected object and indicate its class
label. This box is scaled and rotated according to the object’s scale and orientation. Show example image(s) of the resulting detections in your report. In addition, show example image(s) of the feature-based
matches established between the recognised objects and a Test image, before
and after the outlier refinement step.
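
One possible way to obtain the scaled and rotated box is to project the icon's corners through the estimated transform. The sketch below assumes a 3x3 homography `H` that maps icon coordinates to Test-image coordinates; `project_box` is a hypothetical helper, and reading scale and orientation off the top edge is only one of several reasonable conventions.

```python
import numpy as np


def project_box(H, width, height):
    """Project the corners of a (width x height) icon through homography H.

    Returns the four projected corners plus a scale and an orientation (degrees)
    estimated from the projected top edge.
    """
    corners = np.array(
        [[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float64
    )
    homogeneous = np.hstack([corners, np.ones((4, 1))])
    mapped = homogeneous @ np.asarray(H, dtype=np.float64).T
    mapped = mapped[:, :2] / mapped[:, 2:3]  # back to Cartesian coordinates

    top_edge = mapped[1] - mapped[0]
    scale = np.linalg.norm(top_edge) / width
    # Image y points down, so a positive angle is a clockwise rotation on screen.
    angle_deg = np.degrees(np.arctan2(top_edge[1], top_edge[0]))
    return mapped, scale, angle_deg
```

The returned corners can be passed to any polygon-drawing routine to render the oriented box.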

**Evaluation:**

Evaluate your algorithm on all Test images and report the overall Intersection over Union (IoU), False Positive Rate (FPR), True Positive Rate (TPR) and
Accuracy (ACC), as well as the average runtime. Refer to http://host.robots.ox.ac.uk/pascal/VOC/voc2012/devkit_doc.pdf (section 4.4) for further information about the evaluation metrics.
Show and explain cases where this scheme struggles to perform correctly. Compare the SIFT/SURF results with those of the Task 2 algorithm, e.g.,
does it improve the overall speed or accuracy? By how much? Why?
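
For reference, the metrics above can be sketched as follows, assuming axis-aligned boxes given as `(left, top, right, bottom)` and the textbook definitions of the rates. The helper names `iou` and `detection_rates` are hypothetical; `task3.evaluate_detections` may count true and false positives differently.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def detection_rates(tp, fp, tn, fn):
    """Standard rates computed from confusion-matrix counts."""
    def safe_div(num, den):
        return num / den if den else 0.0

    return {
        "tpr": safe_div(tp, tp + fn),  # true positive rate (recall)
        "fpr": safe_div(fp, fp + tn),  # false positive rate
        "fnr": safe_div(fn, tp + fn),  # false negative rate
        "acc": safe_div(tp + tn, tp + fp + tn + fn),
    }
```

A detection is typically counted as a true positive when its class matches the annotation and its IoU with the ground-truth box exceeds a chosen threshold.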

**Hyperparameter tuning:**

As before, there are some hyperparameters to tune. These include the
number of Octaves and the (within-octave) Scalelevels that SIFT uses to build
the scale-space for keypoint detection, and the MaxRatio parameter of the
matchFeatures function, which rejects weak matches. How are these parameters
set for this task? Show quantitatively why.
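
To make the role of the within-octave scale levels concrete, the sketch below lists the Gaussian blur levels of one octave following Lowe's original formulation, where `s` levels per octave require `s + 3` blurred images and the blur doubles after `s` steps. The function `octave_sigmas` is hypothetical, and library implementations may differ in detail.

```python
def octave_sigmas(sigma0=1.6, n_octave_layers=3):
    """Blur levels of one octave in a SIFT-style scale-space.

    Consecutive levels differ by a factor k = 2 ** (1 / n_octave_layers),
    so the blur doubles after n_octave_layers steps. Three extra images are
    needed so that extrema can be searched across every scale level.
    """
    k = 2.0 ** (1.0 / n_octave_layers)
    return [sigma0 * k ** i for i in range(n_octave_layers + 3)]
```

More levels per octave sample scale more finely, at the cost of more blurring and comparison work per image.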

**Notes:**

For Task 2 and Task 3, you are allowed to use library functions for creating the pyramid and for Gaussian convolution. You are also allowed to use library functions for extracting features, e.g. SIFT features, and math libraries, for instance SVD functions for computing the homography.

You are *not* allowed to use `cv2.matchTemplate` or `cv2.BFMatcher`.
- In other words, the feature-matching functions must be coded by hand.
- You also need to implement RANSAC yourself.
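
A hand-written RANSAC can be sketched along the following lines, using only NumPy's SVD for the homography fit. This is an illustrative outline under simplifying assumptions, not the implementation in `task3`: the names are hypothetical, it runs a fixed number of iterations instead of stopping adaptively from a confidence value, and it omits coordinate normalisation.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: H such that dst ~ H @ src, from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2] if abs(H[2, 2]) > 1e-12 else H


def reprojection_errors(H, src, dst):
    """Euclidean distance between H-projected src points and dst points."""
    homogeneous = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        projected = homogeneous[:, :2] / homogeneous[:, 2:3]
    errors = np.linalg.norm(projected - dst, axis=1)
    return np.nan_to_num(errors, nan=np.inf)  # degenerate projections never count


def ransac_homography(src, dst, reproj_threshold=3.0, max_iters=500, seed=0):
    """Return (H, inlier_mask); H is None if no model has at least 4 inliers."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    if len(src) < 4:
        return None, np.zeros(len(src), dtype=bool)

    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(max_iters):
        # Fit a candidate model to a minimal random sample of 4 correspondences.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        inliers = reprojection_errors(H, src, dst) < reproj_threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    if best_inliers.sum() < 4:
        return None, best_inliers
    # Refit on all inliers of the best candidate for a more stable estimate.
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The number of `True` entries in the returned mask is the basic inlier-count score described in the main steps.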

### Imports & Constants

In [1]:
%load_ext autoreload
%autoreload 2

import json
import task3
import logging
import numpy as np
import pandas as pd
from typing import Dict, Tuple

from pathlib import Path
from tqdm.auto import tqdm
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from skopt.callbacks import Callable
from task3 import ImageDataset, ObjectDetector, Verbosity, TutorialObjectDetector

QUERY_IMG_DIR = Path("IconDataset", "png")
TEST_IMG_DIR = Path("Task3Dataset", "images")

ANNOTATIONS_DIR = Path("Task3Dataset", "annotations")

# Configure basic logging
logging.basicConfig(level=logging.INFO, format='[%(asctime)s]::[%(levelname)s] %(message)s')
logger = logging.getLogger(__name__)

### Run detection pipeline on dataset

In [2]:
def detect_on_dataset(
        test_imgs: ImageDataset,
        query_imgs: ImageDataset,
        sift_hps: Dict = {},
        ransac_hps: Dict = {},
        lowe_threshold: float = 0.7,
        min_match_count: int = 10,
        verbose: Verbosity = Verbosity.MEDIUM
    ) -> Tuple[float, ...]:
    """Run the detector on every test image and return the mean (acc, tpr, fpr, fnr)."""
    # NOTE: `verbose` is currently not forwarded; the detector is always built with verbose=False.
    num_images = len(test_imgs)
    acc_lst, tpr_lst, fpr_lst, fnr_lst = [], [], [], []

    # Pass copies so the shared mutable default dicts can never be modified by the detector.
    detector = ObjectDetector(query_imgs, dict(sift_hps), verbose=False, ransac_hyperparams=dict(ransac_hps))

    # Iterate through each test image and detect objects in it.
    # Compare these detections to the ground-truth annotations.
    for i, (img, img_path) in enumerate(test_imgs):
        annotations_path = ANNOTATIONS_DIR / img_path.with_suffix(".csv").name
        img_annotations = pd.read_csv(annotations_path)

        print(flush=True)
        logger.info(f"{i+1}/{num_images} - Detecting objects in {img_path.stem}")
        detections = detector.detect(img, lowe_threshold, min_match_count, draw=False)

        acc, tpr, fpr, fnr = task3.evaluate_detections(detections, img_annotations)
        acc_lst.append(acc)
        tpr_lst.append(tpr)
        fpr_lst.append(fpr)
        fnr_lst.append(fnr)

    return np.mean(acc_lst), np.mean(tpr_lst), np.mean(fpr_lst), np.mean(fnr_lst)

In [5]:
test_images = ImageDataset(TEST_IMG_DIR, file_ext="png")
query_images = ImageDataset(QUERY_IMG_DIR, file_ext="png")

params = {
    "sift_n_features": 0,
    "sift_n_octave_layers": 10,
    "sift_contrast_threshold": 0.005,
    "sift_edge_threshold": 20.0,
    "sift_sigma": 1.9797752688667505,
    "ransac_reproj_threshold": 1.0,
    "ransac_min_datapoints": 4,
    "ransac_inliers_threshold": 0,
    "ransac_confidence": 0.9326390829908878,
    "lowe_threshold": 0.5,
    "min_match_count": 4,
}

sift_hps = {
    'nfeatures': params['sift_n_features'],
    'nOctaveLayers': params['sift_n_octave_layers'],
    'contrastThreshold': params['sift_contrast_threshold'],
    'edgeThreshold': params['sift_edge_threshold'],
    'sigma': params['sift_sigma'],
}

ransac_hps = {
    'inliers_threshold': params['ransac_inliers_threshold'],
    'min_datapoints': params['ransac_min_datapoints'],
    'reproj_threshold': params['ransac_reproj_threshold'],
    'confidence': params['ransac_confidence']
}

lowe_ratio = params['lowe_threshold']
min_match_count = params['min_match_count']


acc, tpr, fpr, fnr = detect_on_dataset(test_images, query_images, sift_hps, ransac_hps, lowe_ratio, min_match_count)
# acc, tpr, fpr, fnr = detect_on_dataset(test_images, query_images)
print(acc, tpr, fpr, fnr)




[2024-04-24 18:39:26,505]::[INFO] 1/20 - Detecting objects in test_image_16
[2024-04-24 18:39:37,320]::[INFO] 2/20 - Detecting objects in test_image_19
[2024-04-24 18:39:47,701]::[INFO] 3/20 - Detecting objects in test_image_18
[2024-04-24 18:40:00,527]::[INFO] 4/20 - Detecting objects in test_image_12
[2024-04-24 18:40:11,690]::[INFO] 5/20 - Detecting objects in test_image_3
[2024-04-24 18:40:25,916]::[INFO] 6/20 - Detecting objects in test_image_4
[2024-04-24 18:40:37,562]::[INFO] 7/20 - Detecting objects in test_image_13
[2024-04-24 18:40:47,825]::[INFO] 8/20 - Detecting objects in test_image_6
[2024-04-24 18:41:03,542]::[INFO] 9/20 - Detecting objects in test_image_17
[2024-04-24 18:41:11,809]::[INFO] 10/20 - Detecting objects in test_image_2
[2024-04-24 18:41:21,998]::[INFO] 11/20 - Detecting objects in test_image_5
[2024-04-24 18:41:34,508]::[INFO] 12/20 - Detecting objects in test_image_15
[2024-04-24 18:41:47,097]::[INFO] 13/20 - Detecting objects in test_image_9
[2024-04-24 18:41:54,935]::[INFO] 14/20 - Detecting objects in test_image_1
[2024-04-24 18:42:03,599]::[INFO] 15/20 - Detecting objects in test_image_10
[2024-04-24 18:42:19,716]::[INFO] 16/20 - Detecting objects in test_image_14
[2024-04-24 18:42:29,991]::[INFO] 17/20 - Detecting objects in test_image_11
[2024-04-24 18:42:43,557]::[INFO] 18/20 - Detecting objects in test_image_20
[2024-04-24 18:42:57,354]::[INFO] 19/20 - Detecting objects in test_image_7
[2024-04-24 18:43:14,681]::[INFO] 20/20 - Detecting objects in test_image_8


0.5900000000000001 0.9875 0.36666666666666664 0.0125


| Model                                           | acc  | tpr  | fpr  | fnr  |
|-------------------------------------------------|------|------|------|------|
| Mine w/ Manhattan Distance                      | 0.68 | 0.88 | 0.27 | 0.13 |
| Mine w/ Manhattan Distance & Tuned Hyperparams  | 0.76 | 0.89 | 0.17 | 0.11 |
| Mine w/ Euclidean Distance                      | 0.64 | 0.83 | 0.24 | 0.17 |
| Mine w/ Euclidean Distance & Tuned Hyperparams  | 0.75 | 0.86 | 0.15 | 0.14 |
| Mine w/ SSD                                     | 0.66 | 0.94 | 0.35 | 0.01 |
| Tutorial w/ my matcher                          | 0.68 | 0.76 | 0.13 | 0.24 |

### Hyperparameter Optimisation

We use the following objective function to measure performance over a range of hyperparameters for `SIFT`, `RANSAC`, Lowe's ratio test, and the minimum match count. Bayesian optimisation (`gp_minimize`) minimises its objective, so the function returns the negative accuracy.

In [3]:
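# Starting point for the optimiser; the order must match `space` below.
init_params = [
    0,                     # sift_n_features
    3,                     # sift_n_octave_layers
    0.029251667936301906,  # sift_contrast_threshold
    14.902523213631955,    # sift_edge_threshold
    1.7958084069148594,    # sift_sigma
    1.1300029670277771,    # ransac_reproj_threshold
    4,                     # ransac_min_datapoints
    0,                     # ransac_inliers_threshold
    0.9948101962008622,    # ransac_confidence
    0.6189659066006492,    # lowe_threshold
    4,                     # min_match_count
]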

In [4]:
class NumpyEncoder(json.JSONEncoder):
    """Special json encoder for numpy types."""
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)
    

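class LogDisplayProgressCallback(Callable):
    """Advance the progress bar and append the best result so far to a log file."""

    def __init__(self, tqdm_obj, hyperparam_names: list):
        self.tqdm_obj = tqdm_obj
        self.hyperparam_names = hyperparam_names
        self.iteration = 1

    def __call__(self, result):
        self.tqdm_obj.update(1)

        # `result.fun` and `result.x` hold the best objective value and parameters
        # found so far. The objective is negative accuracy, so flip the sign back.
        best_score = -result.fun
        best_params = result.x

        result_data = {
            'iteration': self.iteration,
            'best_score': best_score,
            'parameters': dict(zip(self.hyperparam_names, best_params))
        }

        # Records are appended as comma-separated JSON objects, so the file must be
        # wrapped in [ ] (minus the trailing comma) before it can be parsed as JSON.
        with open('optimisation_log_ssd.json', 'a') as f:
            json.dump(result_data, f, cls=NumpyEncoder)
            f.write(',\n')

        self.iteration += 1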


# Define the hyperparameter space
space = [
    Integer(0, 164, name='sift_n_features'),
    Integer(1, 10, name='sift_n_octave_layers'),
    Real(0.005, 0.2, name='sift_contrast_threshold'),
    Real(0.05, 20, name='sift_edge_threshold'),
    Real(0.1, 5.0, name='sift_sigma'),
    Real(1.0, 20.0, name='ransac_reproj_threshold'),
    Integer(4, 10, name='ransac_min_datapoints'),
    Integer(0, 8, name='ransac_inliers_threshold'),
    Real(0.9, 1, name='ransac_confidence'),
    Real(0.5, 2.0, name='lowe_threshold'),
    Integer(4, 15, name='min_match_count'),
]

@use_named_args(space)
def objective_function(**params):
    sift_hps = {
        'nfeatures': params['sift_n_features'],
        'nOctaveLayers': params['sift_n_octave_layers'],
        'contrastThreshold': params['sift_contrast_threshold'],
        'edgeThreshold': params['sift_edge_threshold'],
        'sigma': params['sift_sigma'],
    }

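    ransac_hps = {
        'inliers_threshold': params['ransac_inliers_threshold'],
        'min_datapoints': params['ransac_min_datapoints'],
        'reproj_threshold': params['ransac_reproj_threshold'],
        # Forward the confidence too; otherwise this search dimension has no effect.
        'confidence': params['ransac_confidence'],
    }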

    acc, _, _, _ = detect_on_dataset(test_images, query_images, sift_hps, ransac_hps,
                                     params['lowe_threshold'],
                                     params['min_match_count'])
    
    # Negative because we minimise in the optimisation procedure.
    return -acc


ITERATIONS = 250

tqdm_o = tqdm(total=ITERATIONS)
callback = LogDisplayProgressCallback(tqdm_o, [param.name for param in space])

test_images = ImageDataset(TEST_IMG_DIR, file_ext="png")
query_images = ImageDataset(QUERY_IMG_DIR, file_ext="png")

result = gp_minimize(
    objective_function,
    dimensions=space,
    n_calls=ITERATIONS,
    callback=[callback],
    x0=[init_params]
)

tqdm_o.close()

  0%|          | 0/250 [00:00<?, ?it/s]
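
The callback appends one JSON object per iteration followed by a comma, so the log file is not a valid JSON document on its own. The sketch below shows one way it could be read back; `load_optimisation_log` and `best_entry` are hypothetical helpers that assume the record layout written by `LogDisplayProgressCallback`.

```python
import json
from pathlib import Path


def load_optimisation_log(path):
    """Parse a log of comma-terminated JSON objects into a list of dicts."""
    text = Path(path).read_text().strip().rstrip(",")
    # Wrapping the comma-separated records in brackets yields a valid JSON array.
    return json.loads(f"[{text}]") if text else []


def best_entry(entries):
    """Return the logged record with the highest best_score."""
    return max(entries, key=lambda entry: entry["best_score"])
```

For example, `best_entry(load_optimisation_log("optimisation_log_ssd.json"))["parameters"]` would give the best hyperparameters recorded in that file.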
