# Painting style recognition from images

We will evaluate linear regression (without and with feature expansion), kernel regression, and linear and kernel SVMs on the task of painting style recognition from images. For the SVMs, we will make use of scikit-learn.

We will use a subset of the WikiArt dataset of Tan et al., A Deep Convolutional Network for Fine-art Paintings Classification, ICIP 2016. This subset consists of 64x64 images of paintings in 8 different styles (Abstract-Expressionism, Art-Nouveau Modern, Baroque, Color Field Painting, Cubism, Early Renaissance, Expressionism, High Renaissance), with between 1343 and 2782 examples per class. I precomputed Histogram of Oriented Gradients (HOG) features from the images. The data is given in two parts (because of Moodle size limitations). Each part contains:
- X: the feature vector for each image
- Y: the label of each image

Let us first load the data

In [None]:
import numpy as np
import scipy.io as sio # This will allow us to load the data
data = sio.loadmat('wikiart_data/wikiart_data_1.mat')
X1 = data['X1']
Y1 = data['Y1']
data = sio.loadmat('wikiart_data/wikiart_data_2.mat')
X2 = data['X2']
Y2 = data['Y2']
X = np.vstack((X1,X2))
Y = np.vstack((Y1,Y2))
print(X.shape)
print(Y.shape)

Let us visualize the first image of each class

In [None]:
import matplotlib.pyplot as plt
import os

In [None]:
f, ax = plt.subplots(2, 4)
for i in range(2):
    for j in range(4):
        # The sample folders are numbered 1 to 8
        class_dir = os.path.join('wikiart_samples', str(i*4+j+1))
        # Show the first file listed in the folder (the listing order is OS-dependent)
        name = os.listdir(class_dir)
        img = plt.imread(os.path.join(class_dir, name[0]))
        ax[i,j].imshow(img)
plt.show()

We can then split the data into training and test samples. To this end, we will make use of the scikit-learn train_test_split function, and keep roughly one third of the data as test data.

In [None]:
from sklearn.model_selection import train_test_split
Xtrain,Xtest,Ytrain,Ytest = train_test_split(X,Y,test_size=0.33,random_state=1)
Ytrain = Ytrain.squeeze()
Ytest = Ytest.squeeze()
print(Xtrain.shape)
print(Xtest.shape)
print(Ytrain.shape)
print(Ytest.shape)

Let us compute one-hot encodings of the training labels

In [None]:
Ytrain_oh = np.zeros((Ytrain.shape[0],8))
Ytrain_oh[(np.arange(Ytrain_oh.shape[0]),Ytrain.flatten()-1)] = 1
print(Ytrain_oh.shape)
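
The indexing trick used for the one-hot encoding can be easier to follow on a tiny example. The sketch below uses made-up labels in {1, 2, 3} (the names `labels`, `one_hot` and `recovered` are only illustrative) and also shows that taking the argmax and adding 1 recovers the original labels, which is how predicted scores are converted to labels later on.

```python
import numpy as np

# Toy labels in {1, 2, 3}, mimicking the 1-based labels of the dataset
labels = np.array([2, 1, 3, 2])
n_classes = 3

# Row n gets a 1 in column labels[n] - 1 (shift to 0-based column indices)
one_hot = np.zeros((labels.shape[0], n_classes))
one_hot[np.arange(labels.shape[0]), labels - 1] = 1

# argmax + 1 inverts the encoding
recovered = np.argmax(one_hot, axis=1) + 1

print(one_hot)
print(recovered)
```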

## Linear regression

Let us first use linear regression as a linear baseline. The first step is to prepend a constant 1 to each input to account for the bias.

In [None]:
Xbtrain = np.hstack((np.ones((Xtrain.shape[0],1)),Xtrain))
Xbtest = np.hstack((np.ones((Xtest.shape[0],1)),Xtest))

Then, we can compute the optimal parameter matrix W.

In [None]:
M = np.linalg.pinv(Xbtrain)
W = M@Ytrain_oh
print(W.shape)
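
As a sanity check on the pseudo-inverse solution, the following minimal sketch uses random toy data (all names here are illustrative, not part of the exercise) to compare it with NumPy's least-squares solver. When the augmented input matrix has full column rank, both should return the same minimizer of the squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 20 samples, 3 features, 2 output columns
X_toy = rng.normal(size=(20, 3))
Xb_toy = np.hstack((np.ones((20, 1)), X_toy))  # prepend the bias column
Y_toy = rng.normal(size=(20, 2))

# Solution via the pseudo-inverse, as in the cell above
W_pinv = np.linalg.pinv(Xb_toy) @ Y_toy

# Reference solution from the least-squares solver
W_lstsq, *_ = np.linalg.lstsq(Xb_toy, Y_toy, rcond=None)

print(W_pinv.shape)
print(np.allclose(W_pinv, W_lstsq))
```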

From these weights, we can compute the predicted class score vectors, and convert them to labels

In [None]:
Yhat_oh = Xbtest@W
Yhat = np.argmax(Yhat_oh,axis=1)+1

Then, we can compute the confusion matrix and the accuracy

In [None]:
import sklearn.metrics as skm
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()
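
The accuracy formula works because the diagonal of the confusion matrix counts the correctly classified samples (rows are true classes, columns are predicted classes). A small sketch with made-up labels, building the matrix by hand with NumPy rather than scikit-learn, illustrates this:

```python
import numpy as np

# Made-up true and predicted labels for 3 classes (1-based)
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 2, 2, 2, 3, 1])

# Confusion matrix built by hand: entry (i, j) counts samples of class i+1 predicted as class j+1
cm = np.zeros((3, 3), dtype=int)
np.add.at(cm, (y_true - 1, y_pred - 1), 1)

# Trace over total equals the fraction of correct predictions
acc_from_cm = np.diag(cm).sum() / cm.sum()
acc_direct = np.mean(y_true == y_pred)

print(cm)
print(acc_from_cm, acc_direct)
```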

## Feature expansion

We will now look at expanding the features. To this end, we will consider quadratic expansion, but, because of the size of the input, limit ourselves to the squares of individual variables (i.e., we will not consider cross products of the form $x_i^{(j)}x_i^{(k)}$ with $j \neq k$). Note that we will still use the original features in addition to the quadratic ones. Let us first compute the expanded features.
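
To make the expansion concrete, here is a minimal sketch on a toy 2x2 input (the names `X_toy` and `Phi_toy` are illustrative only). It keeps the original features and appends the element-wise squares, so the feature dimension doubles; this is one possible reading of the expansion described above, not necessarily the intended solution.

```python
import numpy as np

# Two toy samples with two features each
X_toy = np.array([[1.0, 2.0],
                  [3.0, -1.0]])

# Keep the original features and append the square of each individual feature
Phi_toy = np.hstack((X_toy, X_toy**2))

print(Phi_toy)
print(Phi_toy.shape)
```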

In [None]:
# TODO: Expand the original features with quadratic ones (define Phitrain and Phitest)
Phibtrain = np.hstack((Phitrain,np.ones((Phitrain.shape[0],1))))
Phibtest = np.hstack((Phitest,np.ones((Phitest.shape[0],1))))
print(Phibtrain.shape)
print(Phibtest.shape)

We can then again apply linear regression to the resulting features

In [None]:
M = np.linalg.pinv(Phibtrain)
W = M@Ytrain_oh
Yhat_oh = Phibtest@W
Yhat = np.argmax(Yhat_oh,axis=1)+1

Then, we can compute the confusion matrix and the accuracy

In [None]:
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()

This is disappointing: with the additional features, the model fits the training data better but does not generalize better to the test data (you can see this by evaluating the trained models on the training data instead of the test data). This is a case of "overfitting", which we will discuss next week. Feel free to experiment with other expansion strategies, such as higher polynomial degrees and sine/cosine functions, to see if you can improve this result.

## Kernel regression

Let us now look at kernel regression for classification. We will try with both a quadratic polynomial kernel and an RBF one. Let us start with the polynomial one. First, we need to compute the kernel matrices (training and test). To this end, make use of the data augmented with the additional 1. Note that the quadratic kernel is not equivalent to our previous quadratic feature expansion, because it implicitly encompasses all pairwise products between the variables.
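
As an illustration on toy data, the sketch below assumes the quadratic kernel is defined as $k(x,y) = (\tilde{x}^\top \tilde{y})^2$, where $\tilde{x}$ is the input augmented with a 1; the exact kernel used in the lecture may differ. It also checks that this is the same as $(1 + x^\top y)^2$ on the original inputs. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # toy "training" inputs
B = rng.normal(size=(2, 3))  # toy "test" inputs

# Augment with a constant 1, as done for linear regression
Ab = np.hstack((np.ones((5, 1)), A))
Bb = np.hstack((np.ones((2, 1)), B))

# Assumed quadratic kernel: squared inner product of the augmented inputs
K_toy = (Ab @ Ab.T) ** 2   # train vs. train, shape (5, 5)
Kt_toy = (Bb @ Ab.T) ** 2  # test vs. train, shape (2, 5)

# Equivalent form on the original inputs
same = np.allclose(K_toy, (1 + A @ A.T) ** 2)

print(K_toy.shape, Kt_toy.shape, same)
```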

In [None]:
# TODO: Compute the training (K) and testing (Kt) kernel matrices for a quadratic kernel
print(K.shape)
print(Kt.shape)

We can then compute the scores for the test data and convert them into labels. Note that, in practice, the kernel matrix is quite large, and is therefore likely to be rank-deficient (or at least badly conditioned) and thus not reliably invertible. To overcome this, we can add a small value, e.g., 1e-3, on its diagonal. This may seem like a heuristic but is in fact justified, as we will discuss next week during the lecture.
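
A minimal sketch of this closed-form prediction on toy data is given here, assuming the usual form $\hat{Y} = K_t (K + \lambda I)^{-1} Y$ with $\lambda$ the small diagonal value. The data, the targets `T` and all names are made up for illustration, and a linear solve is used instead of an explicit inverse as a design choice for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))                 # toy training inputs
B = rng.normal(size=(2, 3))                 # toy test inputs
T = np.eye(3)[rng.integers(0, 3, size=6)]   # toy one-hot targets for 3 classes

# Toy quadratic kernel matrices (train vs. train, test vs. train)
K_toy = (1 + A @ A.T) ** 2
Kt_toy = (1 + B @ A.T) ** 2

# Add a small value on the diagonal before solving
lam = 1e-3
alpha = np.linalg.solve(K_toy + lam * np.eye(K_toy.shape[0]), T)

# Scores for the test samples, then 1-based labels
scores = Kt_toy @ alpha
pred = np.argmax(scores, axis=1) + 1

print(scores.shape, pred)
```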

In [None]:
# TODO: Compute the predictions (Yhat_oh) using the closed-form solution of kernel regression
Yhat = np.argmax(Yhat_oh,axis=1)+1

Compute the confusion matrix and the accuracy

In [None]:
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()

Let us now try with an RBF kernel. Play with the value $\sigma^2$ used in this kernel. First, compute the training and test kernel matrices. Note that the pairwise distances between the samples can be computed using the scikit-learn function sklearn.metrics.pairwise.pairwise_distances.
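
For illustration, the sketch below builds RBF kernel matrices on toy data, assuming the common parameterization $k(x,y) = \exp(-\|x-y\|^2 / (2\sigma^2))$; if the lecture uses a different scaling, adjust the exponent accordingly. The value of `sigma2` and all names are placeholders.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # toy "training" inputs
B = rng.normal(size=(2, 3))  # toy "test" inputs

sigma2 = 1.0  # placeholder value, to be tuned

# Euclidean distances (the default metric) between all pairs of samples
D = pairwise_distances(A)      # train vs. train, shape (5, 5)
Dt = pairwise_distances(B, A)  # test vs. train, shape (2, 5)

# Assumed RBF kernel: exp(-squared distance / (2 * sigma^2))
K_rbf = np.exp(-D**2 / (2 * sigma2))
Kt_rbf = np.exp(-Dt**2 / (2 * sigma2))

print(K_rbf.shape, Kt_rbf.shape)
```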

In [None]:
# TODO: Compute the training and testing kernel matrices for an RBF kernel

Compute the scores for the test data and convert them to labels. Again, when inverting the kernel matrix, add a small value to its diagonal.

In [None]:
# TODO: Compute the predictions (Yhat_oh) using the closed-form solution of kernel regression
Yhat = np.argmax(Yhat_oh,axis=1)+1

Compute the confusion matrix and the accuracy

In [None]:
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()

## Support Vector Machine

We can now look at linear and kernel SVM (in scikit-learn). Recall that, with scikit-learn, you don't need to add the 1 to the input features because this is handled within the SVM implementation. In both cases, evaluate the influence of the hyper-parameter $C$ that balances the margin-related term with the one penalizing large slack variables. Start with the linear case, using LinearSVC, by fitting the classifier to the training data (N.B. Do not worry too much about the warning regarding the number of iterations; training is already long and it seems that we are in fact close to convergence).

In [None]:
from sklearn import svm
clf = svm.LinearSVC(loss='hinge',C=1)
clf.fit(Xtrain, Ytrain)

You can then use the classifier to predict the labels for the test data

In [None]:
Yhat = clf.predict(Xtest)

Then, compute the confusion matrix and the accuracy

In [None]:
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()
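
To study the influence of $C$, one option is a simple loop over a few values. The sketch below does this on synthetic two-class data standing in for the real features; the grid of values, the `max_iter` setting and all names are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Two Gaussian blobs as a stand-in for the real features
X_toy = np.vstack((rng.normal(-1, 1, size=(100, 2)),
                   rng.normal(1, 1, size=(100, 2))))
y_toy = np.repeat([1, 2], 100)
Xtr, Xte, ytr, yte = train_test_split(X_toy, y_toy, test_size=0.33, random_state=1)

# Fit one linear SVM per value of C and record the test accuracy
accs = {}
for C in [0.01, 0.1, 1, 10]:
    clf_toy = svm.LinearSVC(loss='hinge', C=C, max_iter=10000)
    clf_toy.fit(Xtr, ytr)
    accs[C] = np.mean(clf_toy.predict(Xte) == yte)

print(accs)
```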

With linear SVM, we can still use feature expansion. Let's re-use the same quadratically-expanded features as before.

In [None]:
# TODO: Train a linear SVM (stored in clf) with the same expanded features as before

You can then use the classifier to predict the labels for the test data

In [None]:
Yhat = clf.predict(Phitest)

Then, compute the confusion matrix and the accuracy

In [None]:
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()

At least, this time, we can get slightly better results than with the original features.

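And now let us apply kernel SVM with an RBF kernel. To this end, you need to use the SVC class of scikit-learn. Evaluate the influence of the kernel width $\sigma$, which scikit-learn controls through the gamma parameter: its RBF kernel is $\exp(-\gamma\|x-y\|^2)$, so gamma plays the role of $1/(2\sigma^2)$ in the usual Gaussian parameterization (hint: a gamma around 0.1 seems reasonable). Again, first fit the classifier to the training data, and then use it to predict the labels for the test data.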
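
The relation between the kernel width and scikit-learn's gamma can be checked numerically. The sketch below, on synthetic toy data with illustrative names, assumes the Gaussian form $\exp(-\|x-y\|^2/(2\sigma^2))$, converts $\sigma^2$ to gamma, compares against scikit-learn's `rbf_kernel`, and then fits an `SVC` with that gamma.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics.pairwise import pairwise_distances, rbf_kernel

rng = np.random.default_rng(0)

# Two Gaussian blobs as a stand-in for the real features
X_toy = np.vstack((rng.normal(-1, 1, size=(50, 2)),
                   rng.normal(1, 1, size=(50, 2))))
y_toy = np.repeat([1, 2], 50)

# scikit-learn's RBF kernel is exp(-gamma * ||x - y||^2)
sigma2 = 5.0
gamma = 1.0 / (2.0 * sigma2)  # equals 0.1

# Check the correspondence between the two parameterizations
K_sklearn = rbf_kernel(X_toy, gamma=gamma)
K_manual = np.exp(-pairwise_distances(X_toy)**2 / (2 * sigma2))
match = np.allclose(K_sklearn, K_manual)

# Fit a kernel SVM with this gamma and predict on the same toy data
clf_toy = svm.SVC(kernel='rbf', gamma=gamma, C=1)
clf_toy.fit(X_toy, y_toy)
pred_toy = clf_toy.predict(X_toy)

print(match, np.mean(pred_toy == y_toy))
```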

In [None]:
# TODO: Train a kernel SVM (stored in clf) with an RBF kernel

In [None]:
Yhat = clf.predict(Xtest)

Then, compute the confusion matrix and the accuracy

In [None]:
cmat = skm.confusion_matrix(Ytest,Yhat)
cmat

In [None]:
np.diag(cmat).sum()/cmat.sum()