# MODEL SELECTION

We focus on two algorithms to classify our dataset: a support vector machine (SVM) and a neural network, chosen because neither was practiced extensively in class. The goal of this section is to decide which of the two we will keep working on and tune to reach high accuracy on the test dataset.

For computational efficiency, we select a subset of the training set and run cross-validation on that subset. The model with the best cross-validation score is the one we will fine-tune and train on the entire dataset. The candidate models are:

- Linear Kernel SVM
- Polynomial Kernel SVM
- Simple Neural Network
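The selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual pipeline: it uses a small synthetic dataset from `make_classification` instead of the image subset, scikit-learn's `MLPClassifier` as a stand-in for the simple neural network, and illustrative hyperparameters. The names `candidates`, `cv_scores`, and `best_model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training subset (assumption: not the real images).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# The three candidate models; hyperparameters here are illustrative only.
candidates = {
    "linear_svm": SVC(kernel="linear", C=1.0),
    "poly_svm": SVC(kernel="poly", degree=3, C=1.0),
    "simple_nn": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                               random_state=0),
}

# Mean 5-fold cross-validation accuracy for each candidate.
cv_scores = {}
for name, model in candidates.items():
    pipeline = make_pipeline(StandardScaler(), model)
    cv_scores[name] = cross_val_score(pipeline, X, y, cv=5).mean()

# Keep the candidate with the highest mean cross-validation score.
best_model = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print("Selected:", best_model)
```

Wrapping each model in a pipeline with `StandardScaler` means the scaling is refit inside every fold, so no information from the held-out fold leaks into training.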

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import keras
import tensorflow as tf
import h5py

from keras.utils import to_categorical
from keras.datasets import cifar10
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold

# Load the original CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()



Shuffle the training data and keep the first 5,000 samples as the cross-validation subset.

In [14]:
from sklearn.utils import shuffle

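# Shuffle images and labels together so each image keeps its label
train_images_s, train_labels_s = shuffle(train_images, train_labels)

# Keep the first 5,000 shuffled samples as the cross-validation subset
xval_samples = 5000
train_images_xval = train_images_s[:xval_samples]
train_labels_xval = train_labels_s[:xval_samples]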

# LINEAR KERNEL SVM

In [5]:
from AR_functions import linear_svm
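The `AR_functions` module is not shown in this notebook, so the body of `linear_svm` is unknown. Judging only from how it is called in the next cell, it takes a regularization value and train/test splits and returns a train score and a test score. A minimal sketch of a helper with that shape, assuming it wraps scikit-learn's `LinearSVC` and reports accuracy, might look like this. The name `linear_svm_sketch` and the `max_iter` value are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC


def linear_svm_sketch(c, x_train, y_train, x_test, y_test):
    """Hypothetical stand-in for AR_functions.linear_svm.

    Fits a linear SVM with regularization parameter `c` and returns
    (train accuracy, test accuracy). The real helper may differ.
    """
    model = LinearSVC(C=c, max_iter=2000)
    model.fit(x_train, y_train)
    return model.score(x_train, y_train), model.score(x_test, y_test)


# Usage on random data (assumption: illustrative only, not the image subset).
rng = np.random.RandomState(0)
x_demo = rng.rand(60, 8)
y_demo = (x_demo[:, 0] > 0.5).astype(int)
train_acc, test_acc = linear_svm_sketch(1.0, x_demo[:40], y_demo[:40],
                                        x_demo[40:], y_demo[40:])
print(train_acc, test_acc)
```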

In [None]:
# Flatten each 32x32x3 image into a 3072-dimensional vector for the SVM
train_images_xval_svm = train_images_xval.reshape(train_images_xval.shape[0], 32*32*3)
nsplits = 5
kf = KFold(n_splits=nsplits, shuffle=True)

c_values = [0.0001, 0.001, 0.1, 1.0, 10]

acc_train_lin = []
acc_test_lin = []

for c in c_values:
    # Accumulate the fold scores, then average them over the folds
    acc_train = 0
    acc_test = 0
    for train, test in kf.split(train_images_xval_svm):
        x_train, y_train = train_images_xval_svm[train], train_labels_xval[train]
        x_test, y_test = train_images_xval_svm[test], train_labels_xval[test]

        # Labels have shape (n, 1); ravel() flattens them to 1-D
        train_scores, test_scores = linear_svm(c, x_train, y_train.ravel(), x_test, y_test.ravel())

        acc_train = acc_train + train_scores
        acc_test = acc_test + test_scores

    acc_train = acc_train/nsplits
    acc_test = acc_test/nsplits

    acc_train_lin.append(acc_train)
    acc_test_lin.append(acc_test)
    print(acc_test_lin)
        
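# Plot mean train and validation scores against C on a log axis
plt.figure()
plt.plot(c_values, acc_train_lin, label="train")
plt.plot(c_values, acc_test_lin, label="validation")
plt.xscale("log")
plt.xlabel("C")
plt.ylabel("Mean score")
plt.legend()
plt.show()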