# Deep descriptors baseline

Here we evaluate the descriptors that are readily available, e.g. from kornia or from the authors' GitHub implementations. The final comparison table is at the end of this notebook.

### Disclaimer 1: don't trust this table fully


I haven't (yet!) checked whether all the deep descriptor models trained on Brown were trained with flip and 90-degree rotation augmentation. The code below assumes that they were. If that is not true, the comparison might not be completely fair. I will do my best to check it, but if you know that I have used the wrong weights, please open an issue. Thank you.
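
For readers unfamiliar with the term: flip and 90-degree rotation augmentation means that each training pair is randomly flipped and rotated by a multiple of 90 degrees, with the same transform applied to both patches so that they still match. The dependency-free sketch below illustrates the idea on patches stored as nested lists. The helper names (`rot90`, `hflip`, `flip_rot90`, `augment_pair`) are made up for this illustration and are not taken from any of the evaluated repositories, whose exact augmentation code may differ. On tensors, `torch.rot90` and `torch.flip` do the same job.

```python
import random

def rot90(patch):
    """Rotate a 2D patch (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*patch)][::-1]

def hflip(patch):
    """Mirror a 2D patch left-to-right."""
    return [row[::-1] for row in patch]

def flip_rot90(patch, k, flip):
    """Apply k quarter-turns, then an optional horizontal flip."""
    out = [list(row) for row in patch]
    for _ in range(k % 4):
        out = rot90(out)
    return hflip(out) if flip else out

def augment_pair(anchor, positive, rng=random):
    """Draw one random transform and apply it to both patches of a pair."""
    k = rng.randrange(4)
    flip = rng.random() < 0.5
    return flip_rot90(anchor, k, flip), flip_rot90(positive, k, flip)
```

Four rotations times two flip states give eight possible variants of every patch.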


### Disclaimer 2: this is not a "benchmark"


The intended usage of the package is not to test and report the numbers in a paper. Instead, think of it as a cross-validation tool that helps development. Thus, one CAN tune hyperparameters based on these results instead of doing so on [HPatches](https://github.com/hpatches/hpatches-benchmark). After you have finished tuning, please evaluate your local descriptors on a downstream task such as the [IMC image matching benchmark](https://github.com/vcg-uvic/image-matching-benchmark) or [visual localization](https://www.visuallocalization.net/).


**If you find any mistake, please open an issue.**

# RootSIFT reference

Let's first add RootSIFT, which is not deep-learned but is the gold standard.
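
As a reminder of what RootSIFT does: the SIFT histogram is L1-normalized and then square-rooted element-wise, so that Euclidean distance between the results corresponds to the Hellinger kernel on the original histograms. The plain-Python sketch below shows only that transform. The function name `rootsift` and the `eps` value are illustrative assumptions, and kornia's internal implementation may handle details differently.

```python
import math

def rootsift(desc, eps=1e-10):
    """Map a non-negative SIFT histogram to its RootSIFT form."""
    # L1-normalize; eps only guards against an all-zero descriptor.
    l1 = sum(abs(v) for v in desc) + eps
    # The element-wise square root leaves the vector with unit L2 norm.
    return [math.sqrt(v / l1) for v in desc]

print(rootsift([4.0, 0.0, 12.0]))
```

No separate L2 normalization is needed afterwards, because the squared entries sum to one by construction.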

In [7]:
from IPython.display import clear_output
import torch
import kornia
from brown_phototour_revisited.benchmarking import *

descs_out_dir = 'data/descriptors'
download_dataset_to = 'data/dataset'
results_dir = 'data/mAP'
patch_size = 32

full_results_dict = {}

for patch_size in [32]:
    desc_name = 'Kornia RootSIFT'
    model = kornia.feature.SIFTDescriptor(patch_size, rootsift=True).eval()
    desc_dict = full_evaluation(model,
                                desc_name,
                                path_to_save_dataset = download_dataset_to,
                                path_to_save_descriptors = descs_out_dir,
                                path_to_save_mAP = results_dir,
                                patch_size = patch_size,
                                device = torch.device('cuda:0'),
                                distance = 'euclidean',
                                backend = 'pytorch-cuda')
    full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict

clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
------------------------------------------------------------------------------
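
The table header refers to mean average precision with respect to Lowe's second-nearest-neighbor (SNN) ratio. As a rough illustration of those two ingredients, and explicitly not the package's actual implementation, the sketch below computes the SNN ratio for one query descriptor and the average precision of a set of matches ranked by that ratio. The function names and the exact ranking convention (a lower ratio means a more confident match) are assumptions made for this example.

```python
import math

def snn_ratio(query, candidates):
    """Return (index of the nearest candidate, d1 / d2 distance ratio)."""
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    (d1, idx), (d2, _) = dists[0], dists[1]
    return idx, d1 / (d2 + 1e-12)

def average_precision(is_correct, scores):
    """AP of binary labels ranked by ascending score (lower = more confident)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct match
    return sum(precisions) / max(hits, 1)

idx, ratio = snn_ratio([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
print(idx, ratio)
```

A descriptor scores well under this criterion when its correct matches are not only nearest neighbors but also clearly closer than the runner-up.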


# HardNet and SOSNet, [kornia](https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature) implementation

Kornia provides only the liberty-trained (best) weights, so we have to download the rest.


In [None]:
!wget https://github.com/DagnyT/hardnet/raw/master/pretrained/train_yosemite_with_aug/checkpoint_yosemite_with_aug.pth
!wget https://github.com/DagnyT/hardnet/raw/master/pretrained/train_notredame_with_aug/checkpoint_notredame_with_aug.pth

In [8]:
desc_name = 'HardNet'
# Liberty weights ship with kornia; the other two come from the checkpoints downloaded above.
hardnet_lib = kornia.feature.HardNet(True).eval()

# .eval() puts BatchNorm into inference mode (running statistics).
hardnet_notre = kornia.feature.HardNet(False).eval()
hardnet_notre.load_state_dict(torch.load('checkpoint_notredame_with_aug.pth')['state_dict'])

hardnet_yos = kornia.feature.HardNet(False).eval()
hardnet_yos.load_state_dict(torch.load('checkpoint_yosemite_with_aug.pth')['state_dict'])

# One model per training subset.
models = {'liberty': hardnet_lib,
          'notredame': hardnet_notre,
          'yosemite': hardnet_yos}

desc_dict = full_evaluation(models,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')

full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()

print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
------------------------------------------------------------------------------


In [None]:
!wget https://github.com/yuruntian/SOSNet/raw/master/sosnet-weights/sosnet_32x32_notredame.pth
!wget https://github.com/yuruntian/SOSNet/raw/master/sosnet-weights/sosnet_32x32_yosemite.pth
clear_output()

In [9]:
desc_name = 'SOSNet'
# Liberty weights ship with kornia; the other two come from the files downloaded above.
sosnet_lib = kornia.feature.SOSNet(True).eval()

sosnet_notre = kornia.feature.SOSNet(False).eval()
sosnet_notre.load_state_dict(torch.load('sosnet_32x32_notredame.pth'))

sosnet_yos = kornia.feature.SOSNet(False).eval()
sosnet_yos.load_state_dict(torch.load('sosnet_32x32_yosemite.pth'))

# One model per training subset.
models = {'liberty': sosnet_lib,
          'notredame': sosnet_notre,
          'yosemite': sosnet_yos}


desc_dict = full_evaluation(models,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')
full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
SOSNet 32px        70.03  70.19        62.09  59.68        63.16  61.65
------------------------------------------------------------------------------


# [TFeat](https://github.com/vbalnt/tfeat)

TFeat is a very lightweight yet strong descriptor. We copy the authors' implementation below.

In [10]:
# https://github.com/vbalnt/tfeat/blob/master/tfeat_model.py
import torch
from torch import nn
import torch.nn.functional as F

class TNet(nn.Module):
    """TFeat model definition
    """
    def __init__(self):
        super(TNet, self).__init__()
        self.features = nn.Sequential(
            nn.InstanceNorm2d(1, affine=False),
            nn.Conv2d(1, 32, kernel_size=7),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=6),
            nn.Tanh()
        )
        self.descr = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128),
            nn.Tanh()
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.descr(x)
        return x
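
To see where the `64 * 8 * 8` input size of the linear layer comes from, and how small the model is, the layer arithmetic can be traced by hand. The sketch below does this in plain Python for a 32x32 input patch. The helper `conv_out` is an illustrative name, and the parameter count is derived only from the layer shapes in the class above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
size = conv_out(size, kernel=7)            # Conv2d(1, 32, kernel_size=7)
size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(kernel_size=2, stride=2)
size = conv_out(size, kernel=6)            # Conv2d(32, 64, kernel_size=6)
flat = 64 * size * size                    # input features of the linear layer

# Weights plus biases; InstanceNorm2d(affine=False), Tanh and MaxPool2d have none.
params = (
    (1 * 7 * 7 + 1) * 32       # first convolution
    + (32 * 6 * 6 + 1) * 64    # second convolution
    + (flat + 1) * 128         # linear layer
)
print(size, flat, params)
```

The spatial size goes 32, 26, 13, 8, and the whole network has roughly 0.6M parameters, most of them in the final linear layer.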

In [None]:
!wget https://github.com/vbalnt/tfeat/raw/master/pretrained-models/tfeat-liberty.params
!wget https://github.com/vbalnt/tfeat/raw/master/pretrained-models/tfeat-yosemite.params
!wget https://github.com/vbalnt/tfeat/raw/master/pretrained-models/tfeat-notredame.params

In [11]:
desc_name = 'TFeat'
# Load the three sets of author-provided weights, one per training subset.
tfeat_lib = TNet().eval()
tfeat_lib.load_state_dict(torch.load('tfeat-liberty.params'))

tfeat_notre = TNet().eval()
tfeat_notre.load_state_dict(torch.load('tfeat-notredame.params'))

tfeat_yos = TNet().eval()
tfeat_yos.load_state_dict(torch.load('tfeat-yosemite.params'))

models = {'liberty': tfeat_lib,
          'yosemite': tfeat_yos,
          'notredame': tfeat_notre}
desc_dict = full_evaluation(models,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')
full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
SOSNet 32px        70.03  70.19        62.09  59.68        63.16  61.65
TFeat 32px         65.45  65.77        54.99  54.69        56.55  56.24
------------------------------------------------------------------------------


# Dynamic Soft Margin HardNet

In [None]:
!wget https://github.com/lg-zhang/dynamic-soft-margin-pytorch/raw/master/pretrained/liberty_float/model.state_dict -O dsm_lib.pth
!wget https://github.com/lg-zhang/dynamic-soft-margin-pytorch/raw/master/pretrained/notredame_float/model.state_dict -O dsm_notre.pth
!wget https://github.com/lg-zhang/dynamic-soft-margin-pytorch/raw/master/pretrained/yosemite_float/model.state_dict -O dsm_yos.pth

In [12]:
desc_name = 'SoftMargin'
# Dynamic Soft Margin uses the HardNet architecture, so the weights load into kornia's HardNet.
dsm_lib = kornia.feature.HardNet(False).eval()
dsm_lib.load_state_dict(torch.load('dsm_lib.pth'))

dsm_notre = kornia.feature.HardNet(False).eval()
dsm_notre.load_state_dict(torch.load('dsm_notre.pth'))

dsm_yos = kornia.feature.HardNet(False).eval()
dsm_yos.load_state_dict(torch.load('dsm_yos.pth'))

models = {'liberty': dsm_lib,
          'yosemite': dsm_yos,
          'notredame': dsm_notre}
desc_dict = full_evaluation(models,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')
full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
SOSNet 32px        70.03  70.19        62.09  59.68        63.16  61.65
TFeat 32px         65.45  65.77        54.99  54.69        56.55  56.24
SoftMargin 32px    69.29  69.20        61.82  58.61        62.37  60.63
------------------------------------------------------------------------------


# HardNetPS

HardNetPS is a version of HardNet trained on the [PS dataset](https://github.com/rmitra/PS-Dataset). It does very well on HPatches, but badly on the IMC benchmark.

In [None]:
!wget https://github.com/DagnyT/hardnet/raw/master/pretrained/3rd_party/HardNetPS/HardNetPS.pth

In [13]:
class HardNetPS(nn.Module):
    def __init__(self):
        super(HardNetPS, self).__init__()
        self.features = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1, bias = True),
        nn.BatchNorm2d(32, affine=True),
        nn.ReLU(),
        nn.Conv2d(32, 32, kernel_size=3, padding=1, bias = True),
        nn.BatchNorm2d(32, affine=True),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1, bias = True),
        nn.BatchNorm2d(64, affine=True),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, padding=1, bias = True),
        nn.BatchNorm2d(64, affine=True),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2,padding=1, bias = True),
        nn.BatchNorm2d(128, affine=True),
        nn.ReLU(),
        nn.Conv2d(128, 128, kernel_size=3, padding=1, bias = True),
        nn.BatchNorm2d(128, affine=True),
        nn.ReLU(),
        nn.Conv2d(128, 128, kernel_size=8, bias = True)
    )
    def input_norm(self, x):
        # Per-patch standardization: subtract each patch's mean and divide by its std.
        flat = x.view(x.size(0), -1)
        mp = torch.mean(flat, dim=1).view(-1, 1, 1, 1)
        sp = torch.std(flat, dim=1).view(-1, 1, 1, 1) + 1e-7
        return (x - mp) / sp

    def forward(self, input):
        x_features = self.features(self.input_norm(input))
        x = x_features.view(x_features.size(0), -1)
        return F.normalize(x, p=2, dim=1)


desc_name = 'HardNetPS'
model = HardNetPS().eval()
model.load_state_dict(torch.load('HardNetPS.pth'))
desc_dict = full_evaluation(model,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')
full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
SOSNet 32px        70.03  70.19        62.09  59.68        63.16  61.65
TFeat 32px         65.45  65.77        54.99  54.69        56.55  56.24
SoftMargin 32px    69.29  69.20        61.82  58.61        62.37  60.63
HardNetPS 32px         55.56              49.70               49.12 
------------------------------------------------------------------------------


# R2D2

This is NOT a fair benchmark for R2D2, for two reasons. First, it is a dense descriptor, which outputs a 32x32 descriptor field for a 32x32 patch. We take the central pixel, which is reasonable, but it is not what R2D2 was trained for.
Second, it expects RGB patches, while the Brown dataset is grayscale.
Anyway, let's try!


In [None]:
!wget https://raw.githubusercontent.com/naver/r2d2/master/nets/patchnet.py
!wget https://github.com/naver/r2d2/raw/master/models/r2d2_WASF_N8_big.pt
!wget https://github.com/naver/r2d2/raw/master/models/r2d2_WASF_N16.pt

In [14]:
from patchnet import *

R2D2 = Quad_L2Net_ConfCFS(mchan=6)
weights = torch.load('r2d2_WASF_N8_big.pt')
print(weights['net'])
# The stored keys carry a 'module.' prefix (typically left by nn.DataParallel); strip it.
weights2 = {}
for k, v in weights['state_dict'].items():
    weights2[k.replace('module.', '')] = v
R2D2.load_state_dict(weights2)
R2D2.eval()

Quad_L2Net_ConfCFS(mchan=6)


Quad_L2Net_ConfCFS(
  (ops): ModuleList(
    (0): Conv2d(3, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
    (10): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): Conv2d(96, 192, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
    (13): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=False, track_run
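
The checkpoints used in this notebook come in slightly different layouts: the HardNet files keep the weights under a `'state_dict'` key, the SOSNet and TFeat files are bare state dicts, and the R2D2 file additionally has `'module.'` prefixes on its keys. A small helper can normalize these cases. The sketch below is a hypothetical convenience function (`unwrap_state_dict` is not part of kornia, PyTorch, or the benchmark package) and works on any mapping, so it is shown here with plain dictionaries.

```python
def unwrap_state_dict(checkpoint, prefix="module."):
    """Return a flat state dict from a few common checkpoint layouts."""
    # Some checkpoints nest the weights under 'state_dict'; others are bare.
    state = checkpoint.get("state_dict", checkpoint)
    # Drop the prefix only where it starts a key.
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state.items()
    }

nested = {"net": "SomeNet()", "state_dict": {"module.ops.0.weight": 1.0}}
bare = {"features.0.weight": 2.0}
print(unwrap_state_dict(nested))
print(unwrap_state_dict(bare))
```

Checking `startswith` is slightly stricter than `str.replace`, which would also rewrite a `module.` substring in the middle of a key.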

In [18]:
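class R2D2_Center(torch.nn.Module):
    """Wraps R2D2 and returns the descriptor at the patch center."""
    def __init__(self, r2d2):
        super(R2D2_Center, self).__init__()
        self.r2d2 = r2d2

    def forward(self, x):
        # R2D2 expects RGB input, so repeat the grayscale channel three times.
        orig_out = self.r2d2([x.repeat(1, 3, 1, 1)])
        # Take the descriptor at pixel (15, 15) of the dense 32x32 field.
        return orig_out['descriptors'][0][..., 15, 15]

eval_r2d2 = R2D2_Center(R2D2)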

In [19]:
desc_name = 'R2D2_center_grayscale'
desc_dict = full_evaluation(eval_r2d2,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')
full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
SOSNet 32px        70.03  70.19        62.09  59.68        63.16  61.65
TFeat 32px         65.45  65.77        54.99  54.69        56.55  56.24
SoftMargin 32px    69.29  69.20        61.82  58.61        62.37  60.63
HardNetPS 32px         55.56              49.70               49.12 
R2D2_center_grayscal   61.47              53.18               54.98 
------------------------------------------------------------------------------

Another way to get a patch descriptor is to average the descriptors around the center. Let's try a 2x2 window, since a 32x32 patch doesn't have a single "center" pixel.

In [20]:
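class R2D2_MeanCenter(torch.nn.Module):
    """Wraps R2D2 and returns the mean of the central 2x2 descriptors."""
    def __init__(self, r2d2):
        super(R2D2_MeanCenter, self).__init__()
        self.r2d2 = r2d2

    def forward(self, x):
        # R2D2 expects RGB input, so repeat the grayscale channel three times.
        orig_out = self.r2d2([x.repeat(1, 3, 1, 1)])
        # Average the central 2x2 window, then L2-normalize the result.
        center = orig_out['descriptors'][0][..., 15:17, 15:17]
        return F.normalize(center.mean(dim=(-2, -1)), p=2, dim=1)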

eval_r2d2 = R2D2_MeanCenter(R2D2)
desc_name = 'R2D2_MeanCenter_gray'
desc_dict = full_evaluation(eval_r2d2,
                            desc_name,
                            path_to_save_dataset = download_dataset_to,
                            path_to_save_descriptors = descs_out_dir,
                            path_to_save_mAP = results_dir,
                            patch_size = patch_size,
                            device = torch.device('cuda:0'),
                            distance = 'euclidean',
                            backend = 'pytorch-cuda')
full_results_dict[f'{desc_name} {patch_size}px'] = desc_dict
clear_output()
print_results_table(full_results_dict)

------------------------------------------------------------------------------
Mean Average Precision wrt Lowe SNN ratio criterion on UBC Phototour Revisited
------------------------------------------------------------------------------
trained on       liberty notredame  liberty yosemite  notredame yosemite
tested  on           yosemite           notredame            liberty
------------------------------------------------------------------------------
Kornia RootSIFT 32px   58.24              49.07               49.65 
HardNet 32px       70.64  70.31        61.93  59.56        63.06  61.64
SOSNet 32px        70.03  70.19        62.09  59.68        63.16  61.65
TFeat 32px         65.45  65.77        54.99  54.69        56.55  56.24
SoftMargin 32px    69.29  69.20        61.82  58.61        62.37  60.63
HardNetPS 32px         55.56              49.70               49.12 
R2D2_center_grayscal   61.47              53.18               54.98 
R2D2_MeanCenter_gray   62.73              54.10

A bit better, but not enough. OK, that was an unfair comparison anyway and R2D2 performed quite decently.

If you use the benchmark, please cite it:

    @misc{BrownRevisited2020,
      title={UBC PhotoTour Revisited},
      author={Mishkin, Dmytro},
      year={2020},
      url = {https://github.com/ducha-aiki/brown_phototour_revisited}
    }