# SVM Classification of Map-Cam Pairs
In this notebook, we train an SVM to decide whether one satellite image contains the other. Each image is represented by features extracted with the pre-trained [MobileNet](https://keras.io/applications/#mobilenet) implementation.

### Load the features

In [43]:
import pickle
import glob
import csv
import numpy as np
from config import CSV_PATH, FEATURE_PATH

with open(FEATURE_PATH, 'rb') as f:
    feature_dict = pickle.load(f)
img_idx, preds = feature_dict['img_idx'], feature_dict['preds']

y = []
X = []

csv_paths = glob.glob(CSV_PATH+'*.csv')
for path in csv_paths:
    with open(path,'r') as f:
        reader = csv.reader(f)
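        for row in reader:
            # Keep only complete rows: columns 1 and 2 identify the two images, column 3 is the 0/1 label
            if len(row) == 4:
                y.append(int(row[3]))
                # One sample is the two images' feature vectors concatenated
                x = np.concatenate((preds[img_idx[row[1]]],preds[img_idx[row[2]]]))
                X.append(x)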
y = np.array(y)
X = np.array(X)

print('Distribution of 0s and 1s :')
print('0\t',1-np.sum(y)/y.shape[0])
print('1\t',np.sum(y)/y.shape[0])

Distribution of 0s and 1s :
0	 0.5
1	 0.5
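
As a minimal sketch of how each sample is assembled, the snippet below concatenates the feature vectors of two images into one pair vector, as the loading cell does. The image names and the tiny 4-dimensional `preds` array are made-up stand-ins for the pickled MobileNet features, and `pair_features` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Hypothetical stand-ins for the pickled feature dictionary: three images,
# each with a 4-dimensional feature vector (the real features are larger).
preds = np.arange(12, dtype=float).reshape(3, 4)
img_idx = {"map_a.png": 0, "cam_a.png": 1, "cam_b.png": 2}

def pair_features(first_name, second_name):
    """Concatenate the feature vectors of two images into one sample."""
    first = preds[img_idx[first_name]]
    second = preds[img_idx[second_name]]
    return np.concatenate((first, second))

x = pair_features("map_a.png", "cam_b.png")
print(x.shape)  # (8,)
```

Because the two vectors are simply concatenated, the order of the pair matters: `pair_features(a, b)` and `pair_features(b, a)` are different inputs to the SVM.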


### Split the dataset into training and test sets

In [3]:
from sklearn.model_selection import train_test_split

# Use one fifth of the dataset as the test set 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123457)

### Train an SVM using Cross-Validated Grid-Search

In [16]:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-4, 1e-3, 2e-3, 3e-3, 1e-2] }]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(SVC(), tuned_parameters, cv=5, scoring='%s_macro' % score, n_jobs=-1)
    clf.fit(X_train, y_train)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    print()

    print("Detailed classification report:")
    print()
    print("The model is trained on the full development set.")
    print("The scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, clf.predict(X_test)
    print(classification_report(y_true, y_pred))
    print()


# Tuning hyper-parameters for precision

Best parameters set found on development set:

{'gamma': 0.002, 'kernel': 'rbf'}

Grid scores on development set:

0.589 (+/-0.053) for {'gamma': 0.0001, 'kernel': 'rbf'}
0.873 (+/-0.031) for {'gamma': 0.001, 'kernel': 'rbf'}
0.882 (+/-0.029) for {'gamma': 0.002, 'kernel': 'rbf'}
0.881 (+/-0.034) for {'gamma': 0.003, 'kernel': 'rbf'}
0.783 (+/-0.034) for {'gamma': 0.01, 'kernel': 'rbf'}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.

              precision    recall  f1-score   support

           0       0.97      0.81      0.88       297
           1       0.84      0.97      0.90       303

   micro avg       0.89      0.89      0.89       600
   macro avg       0.90      0.89      0.89       600
weighted avg       0.90      0.89      0.89       600


# Tuning hyper-parameters for recall

Best parameters set found on development set:

{'gamma': 0.003, 'k

# Modelling Human Recognition
Here, we train an SVM to model how humans judge whether one image contains another.

In [44]:
from config import TABLE_PATH

y_ = []
X_ = []
with open(TABLE_PATH,'r') as f:
    reader = csv.reader(f)
    next(reader, None)
    for row in reader:
        y_.append(int(row[9]))
        x = np.concatenate((preds[img_idx[row[5]]],preds[img_idx[row[6]]]))
        X_.append(x)
y_ = np.array(y_)
X_ = np.array(X_)

print('Distribution of 0s and 1s :')
print('0\t',1-np.sum(y_)/y_.shape[0])
print('1\t',np.sum(y_)/y_.shape[0])

Distribution of 0s and 1s :
0	 0.13
1	 0.87


### Split the dataset into training and test sets

In [37]:
# Use one fifth of the dataset as the test set 
X_train_, X_test_, y_train_, y_test_ = train_test_split(X_, y_, test_size=0.2, random_state=1)
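
With labels this imbalanced, a plain random split can shift the class ratio between the training and test sets. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic data (an assumption: 500 samples with 13% zeros) only to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 500 samples, 13% zeros, 87% ones.
rng = np.random.RandomState(0)
y_demo = np.array([0] * 65 + [1] * 435)
X_demo = rng.rand(500, 8)

# stratify=y_demo asks for the same class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

print("train fraction of 0s:", np.mean(y_tr == 0))
print("test fraction of 0s: ", np.mean(y_te == 0))
```

Switching the real split to a stratified one would change which samples land in the test set, so scores would not be directly comparable with an unstratified run.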

### Train an SVM using Cross-Validated Grid-Search

In [38]:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [2e-3] }]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(SVC(), tuned_parameters, cv=5, scoring='%s_macro' % score, n_jobs=-1)
    clf.fit(X_train_, y_train_)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    print()

    print("Detailed classification report:")
    print()
    print("The model is trained on the full development set.")
    print("The scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test_, clf.predict(X_test_)
    print(classification_report(y_true, y_pred))
    print()


# Tuning hyper-parameters for precision

Best parameters set found on development set:

{'gamma': 0.002, 'kernel': 'rbf'}

Grid scores on development set:

0.957 (+/-0.016) for {'gamma': 0.002, 'kernel': 'rbf'}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the full evaluation set.

              precision    recall  f1-score   support

           0       0.89      0.50      0.64        16
           1       0.91      0.99      0.95        84

   micro avg       0.91      0.91      0.91       100
   macro avg       0.90      0.74      0.79       100
weighted avg       0.91      0.91      0.90       100
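
The `precision_macro` and `recall_macro` scorers weight both classes equally, which is why the macro recall in the report above (0.74) sits well below the accuracy (0.91): the minority class 0 is recovered only half of the time. The sketch below recomputes these figures from a confusion matrix. The counts are an assumption, chosen to be consistent with the rounded report; they are not taken from the notebook.

```python
# Confusion counts inferred from the rounded report (an assumption).
# Outer key: true class. Inner key: predicted class.
confusion = {
    0: {0: 8, 1: 8},    # 16 true class-0 samples
    1: {0: 1, 1: 83},   # 84 true class-1 samples
}

def recall(cls):
    """Fraction of true `cls` samples that were predicted as `cls`."""
    row = confusion[cls]
    return row[cls] / sum(row.values())

def precision(cls):
    """Fraction of samples predicted as `cls` that truly are `cls`."""
    predicted = sum(confusion[true][cls] for true in confusion)
    return confusion[cls][cls] / predicted

# Macro averaging: unweighted mean over classes, ignoring class sizes.
macro_recall = sum(recall(c) for c in confusion) / len(confusion)
macro_precision = sum(precision(c) for c in confusion) / len(confusion)

total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[c][c] for c in confusion) / total

print(f"macro recall    {macro_recall:.2f}")     # 0.74
print(f"macro precision {macro_precision:.2f}")  # 0.90
print(f"accuracy        {accuracy:.2f}")         # 0.91
```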


# Tuning hyper-parameters for recall

Best parameters set found on development set:

{'gamma': 0.002, 'kernel': 'rbf'}

Grid scores on development set:

0.663 (+/-0.132) for {'gamma': 0.002, 'kernel': 'rbf'}

Detailed classification report:

The model is trained on the full development set.
The scores are computed on the f