# Linear SVR for approximation of the fundamental matrices of image pairs
### to learn the ropes of the evaluation metric and establish our first (definitely bad) baseline model

## The general Task:

- The datasets are large and a baseline model which is not a neural network is definitely going to have a hard time here. We came up with the idea of applying an SVM to solve for the fundamental matrices of the image pairs.
- Due to the sheer size of the data (over 1.4 million pairings), the fact that for each pairing, a set of 2 images per pair (already resized by us to 150*150 px, resulting in 150x150 _px_ x3 _colors_ = 67500 features) and the fact that SVMs tend to become slower the larger the datasets become, we decided to only work with a small subsample of the data

In [1]:
import pandas as pd
import os
import numpy as np
from sklearn.model_selection import train_test_split
import model
import validation

the list "scenes" tells the code, which scenes to load (see *list of all scenes* below)

In [2]:
input_dir = '../../data/train/' # directory of the training data

# list of all scenes: ["british_museum", "brandenburg_gate", "buckingham_palace",
#  "colosseum_exterior", "grand_place_brussels", "lincoln_memorial_statue",
#  "notre_dame_front_facade", "pantheon_exterior", "piazza_san_marco",
#  "sacre_coeur", "sagrada_familia", "st_pauls_cathedral", "st_peters_square",
#  "taj_mahal", "temple_nara_japan", "trevi_fountain"]

scenes=["brandenburg_gate"]



First, the corresponding dataframes, containing either image pairs or processed 150x150 images with their image IDs are loaded:

In [3]:
df, images = model.load_scenes(scenes, input_dir)

loading category 1 of 1: brandenburg_gate
loaded brandenburg_gate successfully


Train-test-Split.

In [4]:
x= df[["image1_id", "image2_id","pair", "building"]]
y= df[["fm1","fm2","fm3","fm4","fm5","fm6","fm7","fm8","fm9"]]

x_train, x_test, y_train, y_test = train_test_split(x,y, train_size= 0.2, shuffle=True, random_state=0)

The training dataset is _only_ n = 500 pairings, although even a scene with only 300 images contains pairing data for (300^2)/2= *61250* image pairs. each of which containing *67500* features. 

Without dimensionality reduction, this otherwise takes a LOT of time.

In [5]:
n = 250
x_train_short = x_train[0:min(n,x_train.shape[0])]
y_train_short = y_train[0:min(n,y_train.shape[0])]
x_test_short = x_test[0:min(n*5,x_test.shape[0])]
y_test_short = y_test[0:min(n*5,y_test.shape[0])]

The Prediction is performed with 9 separately initialized LinearSVR models. these don't communicate with each other, as opposed to models capable of classifying/regressing for multiples targets.

In [6]:
y_pred = model.fit_pred_9xLinSVR(x_train_short,x_test_short,y_train_short, images)

fitting dataset of 250 training entries
then predicting 1250 test entries
------------------------------
inflating 1500 entries with images
training data:
starting inflating 🐡
done inflating 💥
test data:
starting inflating 🐡
done inflating 💥
------------------------------
initialising models... 🤖
------------------------------
fitting model1 for fm1 📐




predicting 🔮
------------------------------
fitting model2 for fm2 📐




predicting 🔮
------------------------------
fitting model3 for fm3 📐
predicting 🔮
------------------------------
fitting model4 for fm4 📐




predicting 🔮
------------------------------
fitting model5 for fm5 📐




predicting 🔮
------------------------------
fitting model6 for fm6 📐
predicting 🔮
------------------------------
fitting model7 for fm7 📐
predicting 🔮
------------------------------
fitting model8 for fm8 📐
predicting 🔮
------------------------------
fitting model9 for fm9 📐
predicting 🔮
------------------------------


for the imported evaluation metrics to properly work, some adjustments to the formatting need to be done:

In [7]:

fund_matrix_list_all = [list((y_pred.iloc[1])[0:9]) for i in range(y_pred.shape[0])]
fund_matrix_list_all = [" ".join([str(num) for num in fundmatrix]) for fundmatrix in fund_matrix_list_all]
sample_id_list_all = [";".join(["phototourism",scene,pair]) for scene, pair in zip(x_test_short.iloc[:,3],x_test_short.iloc[:,2])]

validation metrics are used, as demonstrated in https://www.kaggle.com/code/tmyok1984/imc2022-validation-code/notebook

In [8]:
maa = validation.evaluate(input_dir, sample_id_list_all, fund_matrix_list_all)
print(f'mAA={maa:.05f} (n={len(sample_id_list_all)})')

mAA=0.00536 (n=1250)


as is apparent, the accuracy is abysmal.
The main reasons for this are probably:
- the small sample size, as explained above. Only using a small fraction of the actual data does not deliver an appropriate model, probably introducing bias.
- the model is too simple: although the predictor consists of 9 different SVMs, each fitting for every single pixel color channel, this is far from sophisticated enough to grasp the relative movement of objects within the images. This introduces even more bias.
- The data format here is extremely limiting to the applicability and clearly not appropriate for the dimensionality of the images. Actually using all provided data might take several days to compute, with a very limited chance of success. A proper dimensionality reduction is necessary for handling the image datasets.
- The data need sanitising first. I.e. images with a covisibility below 0.1 are not recommended for training, as stated by the competition hosts. Such image pairs are having insufficient overlap (or none at all). this weakens the model's predictive power even further. Additionally, small thumbnail images provide too little feature information to classify them properly.
