# Lesson 0011 - MNIST Support Vector Machines Classification
We assume that the reader is familiar with the preceding lessons.<br>
In this lesson, we attempt to classify the MNIST data, which we introduced in [lesson 0010](https://github.com/Mathhead/Lessons-in-Machine-Learning/blob/master/lesson_0010_mnist_classification_linear_classifier.ipynb), using support vector machines as in lessons [0002](https://github.com/Mathhead/Lessons-in-Machine-Learning/blob/master/lesson_0002_iris_classification_support_vector_machines.ipynb) and [0007](https://github.com/Mathhead/Lessons-in-Machine-Learning/blob/master/lesson_0007_breast_cancer_classification_support_vector_machine.ipynb).<br>
We start by stealing the code from lesson 0010 to prepare the data.<br> Note that the support vector machines expect the class labels as a 1-dimensional vector of integers, whereas in lesson 0010 we encoded them in a one-hot scheme. Since __train_y__ and __test_y__ are already loaded in that 1-dimensional form, we simply skip the one-hot encoding.
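
To make the difference between the two encodings concrete, here is a minimal sketch. The array __one_hot__ and its three labels are made up for illustration and are not part of the lesson's data.

```python
import numpy as np

# Hypothetical one-hot labels for the digits 3, 0 and 9 (10 classes).
one_hot = np.zeros( shape = [ 3, 10 ] )
one_hot[ 0, 3 ] = 1.0
one_hot[ 1, 0 ] = 1.0
one_hot[ 2, 9 ] = 1.0

# The position of the single 1 in each row is the class label,
# which gives the 1-dimensional encoding a support vector machine expects.
labels = np.argmax( one_hot, axis = 1 )

print( labels )
```

Going in this direction is only needed when the labels are already one-hot encoded.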

In [1]:
import tensorflow as tf

tf.set_random_seed( 1234567890 )

print( tf.__version__ )



1.12.0


In [2]:
( train_x, train_y ),( test_x, test_y ) = tf.keras.datasets.mnist.load_data()

In [3]:
import numpy as np

np.random.seed( 1234567890 )

print( np.__version__ )

1.14.3


In [4]:
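# standardize with the mean and standard deviation of the training set only,
# and apply the same transformation to the test set
mu = np.mean( train_x )

sigma = np.std( train_x )

train_x = ( train_x - mu ) / sigma

test_x = ( test_x - mu ) / sigma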
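
As a quick sanity check of this standardization, here is a small sketch on made-up pixel values. The arrays __toy_train__ and __toy_test__ are assumptions for illustration only.

```python
import numpy as np

# Hypothetical pixel intensities standing in for the training and test images.
toy_train = np.array( [ 0.0, 0.0, 128.0, 255.0 ] )
toy_test = np.array( [ 64.0, 255.0 ] )

# The statistics come from the training data only.
toy_mu = np.mean( toy_train )
toy_sigma = np.std( toy_train )

toy_train_std = ( toy_train - toy_mu ) / toy_sigma
toy_test_std = ( toy_test - toy_mu ) / toy_sigma

# The standardized training data has mean 0 and standard deviation 1
# (up to rounding); the test data in general does not.
print( np.mean( toy_train_std ), np.std( toy_train_std ) )
print( np.mean( toy_test_std ), np.std( toy_test_std ) )
```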

In [5]:
train_x_f = np.zeros( shape = [ 60000, 28 * 28 ] )

test_x_f = np.zeros( shape = [ 10000, 28 * 28 ] )



    

for i in range( 60000 ):
    
    dummy = np.array( train_x[ i ] )
    
    train_x_f[ i, : ] = dummy.flatten()
    
    
for i in range( 10000 ):
    
    dummy = np.array( test_x[ i ] )
    
    test_x_f[ i ] = dummy.flatten()
    
    
    
    
train_x = train_x_f

test_x = test_x_f
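
The two loops above can also be replaced by a single __reshape__ call. The following sketch checks on a made-up stack of 5 images (the array __images__ is an assumption, not the MNIST data) that both approaches give the same result.

```python
import numpy as np

# Hypothetical stand-in for the image stack: 5 "images" of 28 x 28 pixels.
images = np.arange( 5 * 28 * 28, dtype = float ).reshape( 5, 28, 28 )

# Loop version, as in the cell above.
flat_loop = np.zeros( shape = [ 5, 28 * 28 ] )

for i in range( 5 ):

    flat_loop[ i, : ] = images[ i ].flatten()

# Vectorized version: -1 lets NumPy infer the number of rows.
flat_reshape = images.reshape( -1, 28 * 28 )

print( flat_reshape.shape )
print( np.array_equal( flat_loop, flat_reshape ) )
```

Both __flatten__ and __reshape__ use row-major order by default, which is why the pixels end up in the same positions.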

Now we steal the code from [lesson 0002](https://github.com/Mathhead/Lessons-in-Machine-Learning/blob/master/lesson_0002_iris_classification_support_vector_machines.ipynb).

In [6]:
import sklearn
from sklearn.svm import SVC

print( sklearn.__version__ )

0.19.1


We use __SVC__ straight out of the box, that is, with its default parameters.
We train the model __svc__, record in __hit__ which test items were predicted correctly, and compute the accuracy of that model in __accuracy__.

In [7]:
svc = SVC()

svc.fit( train_x, train_y )

hit = ( svc.predict( test_x ) == test_y )

accuracy = 0.0

for i in range( 10000 ):
    
    if hit[ i ]:
        
        accuracy = accuracy + 1.0
        
accuracy = accuracy / 100

print( 'The out of the box model reached an accuracy of ' + str( accuracy ) + '%' )

The out of the box model reached an accuracy of 97.92%
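
The counting loop above can also be written with NumPy directly. This is a minimal sketch on a made-up boolean mask; __toy_hit__ is an assumption chosen to mirror the result above and is not the real prediction mask.

```python
import numpy as np

# Hypothetical mask: 10000 test items, of which the first 208 are wrong.
toy_hit = np.ones( 10000, dtype = bool )
toy_hit[ : 208 ] = False

# Number of correct predictions, converted to percent of 10000 items.
toy_accuracy = np.count_nonzero( toy_hit ) / 100.0

# Number of misclassified items.
toy_misses = np.count_nonzero( ~toy_hit )

print( toy_accuracy, toy_misses )
```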


Since the test set contains $10000$ items, an accuracy of $97.92\%$ means that the out of the box model misclassified only $208$ items.<br>
We could perform a grid search like in [lesson 0006](https://github.com/Mathhead/Lessons-in-Machine-Learning/blob/master/lesson_0006_breast_cancer_classification_linear_classifier.ipynb) or [lesson 0010](https://github.com/Mathhead/Lessons-in-Machine-Learning/blob/master/lesson_0010_mnist_classification_linear_classifier.ipynb), but on my machine, learning __svc__ took a little over an hour, so either we let our computers work for days, or the grid will be very sparse.<br>
Therefore, we attempt __boosting__. The idea of __boosting__, as we use it here, is to train $3$ simple models one after another, each concentrating on the data its predecessors struggle with, and then to combine the predictions made by these models. Since training __svc__ took over an hour, and since the algorithm used to train __svc__ is at least [quadratic](https://scikit-learn.org/stable/modules/svm.html) with respect to the number of samples, our approach for boosting will be the following:
- we train a first model, __svc_1__, on $5000$ randomly drawn training items
- we train the second model, __svc_2__, on those training items that were misclassified by __svc_1__, but on at most $5000$ items
- we train the third model, __svc_3__, on those training items where __svc_1__ and __svc_2__ disagree, again on at most $5000$ items
- these models are combined into the predictor __predictor__, which classifies a data item with all three classifiers and returns the majority vote. If all three classifiers disagree, the answer is drawn at random from the answers of __svc_1__, __svc_2__ and __svc_3__.
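
The selection rule "all matching items, but at most $5000$" from the list above can be sketched as a small helper. The function __capped_subset__, the mask __missed__ and the generator __rng__ are hypothetical names introduced for illustration; the cell below implements the same rule with explicit loops.

```python
import numpy as np

def capped_subset( mask, cap, rng ):
    """Return the sorted indexes where mask is True, thinned at random to at most cap."""
    indexes = np.flatnonzero( mask )

    if len( indexes ) > cap:

        indexes = np.sort( rng.choice( indexes, cap, replace = False ) )

    return indexes


rng = np.random.RandomState( 0 )

# Hypothetical stand-in for "misclassified by svc_1": 7 of 20 items.
missed = np.zeros( 20, dtype = bool )
missed[ [ 1, 4, 5, 9, 12, 15, 19 ] ] = True

# Fewer matches than the cap: all of them are kept.
few = capped_subset( missed, 10, rng )

# More matches than the cap: 3 of them are drawn without replacement.
many = capped_subset( missed, 3, rng )

print( few )
print( len( many ) )
```

With such indexes, the training subset would be selected as __train_x[ indexes ]__ and __train_y[ indexes ]__.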

In [8]:
random_integers = np.random.choice( range( 60000 ), 5000, replace = False )

random_integers = np.sort( random_integers )




train_2_1_x = np.zeros( shape = [ 5000, 28 * 28 ] )

train_2_1_y = np.zeros( shape = [ 5000 ] )





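# copy the selected rows; random_integers is sorted, so a single pass suffices

j = 0

for i in range( 60000 ):
    
    if i == random_integers[ j ]:
        
        train_2_1_x[ j, : ] = train_x[ i, : ]
        
        train_2_1_y[ j ] = train_y[ i ]
        
        j = j + 1
        
        # after the last selected row, wrap around to avoid an index error;
        # random_integers[ 0 ] lies behind us, so no further row matches
        if j == 5000:
            
            j = 0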
            
    
    
    
    
svc_1 = SVC()




svc_1.fit( train_2_1_x, train_2_1_y )

hit_1 = ( svc_1.predict( train_x ) == train_y )





j = 0

for i in range( 60000 ):
    
    if not hit_1[ i ]:
        
        j = j + 1
        

        
j = min( j, 5000 )





train_2_2_x = np.zeros( shape = [ j, 28 * 28 ] )

train_2_2_y = np.zeros( shape = [ j ] )





if j < 5000:

    k = 0

    for i in range( 60000 ):
    
        if not hit_1[ i ]:
        
            train_2_2_x[ k, : ] = train_x[ i, : ]
        
            train_2_2_y[ k ] = train_y[ i ]
        
            k = k + 1
            
else:
    
    random_indexes = []
    
    for  i in range( 60000 ):
        
        if not hit_1[ i ]:
            
            random_indexes.append( i )
            
            
    
    random_integers = np.random.choice( random_indexes, 5000, replace = False )
    
    random_integers = np.sort( random_integers )
    
    

    
    j = 0

    for i in range( 60000 ):
    
        if i == random_integers[ j ]:
        
            train_2_2_x[ j, : ] = train_x[ i, : ]
        
            train_2_2_y[ j ] = train_y[ i ]
        
            j = j + 1
        
            if j == 5000:
            
                j = 0
        
   


        
svc_2 = SVC()




svc_2.fit( train_2_2_x, train_2_2_y )

# despite its name, hit_3 marks the items on which svc_1 and svc_2 agree
hit_3 = ( svc_1.predict( train_x ) == svc_2.predict( train_x ) )






j = 0

for i in range( 60000 ):
    
    if not hit_3[ i ]:
        
        j = j + 1
        

        
j = min( j, 5000 )



train_2_3_x = np.zeros( shape = [ j, 28 * 28 ] )

train_2_3_y = np.zeros( shape = [ j ] )


if j < 5000:
    
    k = 0

    for i in range( 60000 ):
    
        if not hit_3[ i ]:
        
            train_2_3_x[ k, : ] = train_x[ i, : ]
        
            train_2_3_y[ k ] = train_y[ i ]
        
            k = k + 1
            
else:
    
    random_indexes = []
    
    for  i in range( 60000 ):
        
        if not hit_3[ i ]:
            
            random_indexes.append( i )
            
            
    
    random_integers = np.random.choice( random_indexes, 5000, replace = False )
    
    random_integers = np.sort( random_integers )
    
    

    
    j = 0

    for i in range( 60000 ):
    
        if i == random_integers[ j ]:
        
            train_2_3_x[ j, : ] = train_x[ i, : ]
        
            train_2_3_y[ j ] = train_y[ i ]
        
            j = j + 1
        
            if j == 5000:
            
                j = 0
                

                

svc_3 = SVC()



svc_3.fit( train_2_3_x, train_2_3_y )



def predictor( data ):
    
    data = data.reshape( 1, 28 * 28 )
    
    pred = [ svc_1.predict( data )[ 0 ], svc_2.predict( data )[ 0 ], svc_3.predict( data )[ 0 ] ]
    
    sorted_pred = np.sort( pred )
    
    if ( sorted_pred[ 0 ] == sorted_pred[ 1 ] ):
        
        return sorted_pred[ 0 ]
    
    elif ( sorted_pred[ 1 ] == sorted_pred[ 2 ] ):
        
        return sorted_pred[ 2 ]
    
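    else:
        
        # all three classifiers disagree: draw one of the three answers at random
        # ( the upper bound of randint is exclusive, so 3 covers the indexes 0, 1 and 2 )
        return sorted_pred[ np.random.randint( 0, 3, 1 )[ 0 ] ]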
    
    


accuracy_predictor = 0.0



for i in range( 10000 ):
    
    if predictor( test_x[ i, ] ) == test_y[ i ]:
        
        accuracy_predictor = accuracy_predictor + 1.0
        

        
        
accuracy_predictor = accuracy_predictor / 100.0




print( 'The boosted predictor achieves an accuracy of ' + str( accuracy_predictor ) + '%' )

The boosted predictor achieves an accuracy of 95.4%
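
The voting rule of __predictor__ can also be applied to whole prediction vectors at once. The following is a sketch under the assumption that ties are drawn uniformly from all three answers; the function __majority_vote__ and the vectors __p1__, __p2__ and __p3__ are made up for illustration.

```python
import numpy as np

def majority_vote( p1, p2, p3, rng ):
    """Combine three label vectors: the majority wins, a three-way split is drawn at random."""
    votes = np.stack( [ p1, p2, p3 ], axis = 1 )

    # Start with a random classifier per item; this stays the answer for three-way splits.
    pick = rng.randint( 0, 3, size = len( votes ) )

    result = votes[ np.arange( len( votes ) ), pick ]

    # Wherever two classifiers agree, their common answer overrides the random pick.
    result = np.where( p1 == p2, p1, result )
    result = np.where( p1 == p3, p1, result )
    result = np.where( p2 == p3, p2, result )

    return result


rng = np.random.RandomState( 0 )

# Hypothetical predictions of three classifiers for four items.
p1 = np.array( [ 1, 2, 3, 4 ] )
p2 = np.array( [ 1, 5, 3, 6 ] )
p3 = np.array( [ 7, 5, 8, 9 ] )

combined = majority_vote( p1, p2, p3, rng )

# The first three items have a majority; the last one is a three-way split.
print( combined )
```

On the real data, the three inputs would be the outputs of __svc_1.predict( test_x )__, __svc_2.predict( test_x )__ and __svc_3.predict( test_x )__, which avoids calling each model once per test item.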


Ok, with $95.4\%$ against $97.92\%$, the boosted __predictor__ is worse than the original support vector machine, which was trained on the complete training set.<br>
Class dismissed.