**SVM Multiclass Classification.**

SVM does not support multiclass classification natively. Two commonly used approaches that extend SVM to multiclass classification are One-vs-One and One-vs-Rest. In this exercise, we would like you to apply multiclass SVM classification to the digits 0-9 from the MNIST dataset.
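As background, the two schemes differ in how many binary problems they create. For $K$ classes, a short derivation (standard definitions, not specific to this notebook) is:

$$
N_{\text{one-vs-one}} = \binom{K}{2} = \frac{K(K-1)}{2}, \qquad N_{\text{one-vs-rest}} = K .
$$

One-vs-One trains one classifier per unordered pair of classes, each on the samples of those two classes only. One-vs-Rest trains one classifier per class, each on all samples, with that class as positive and every other class as negative.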
 
Specifically, we would like you to explore the following: 

1. **[5 scores]** You may randomly select 6000 samples for training and 1000 samples for testing. Ensure that you have chosen the samples evenly from each class. Then, show us the distribution of labels in the selected training and testing samples.

2. **[10 scores]** Let's assume that we choose the RBF kernel for SVM. You may split your training set into a tuning part and a validation part. Please show the following results:   
*   a. Show the accuracy (or loss) curves on the validation set across different model parameters (e.g., C and gamma). 
*   b. Pick the best set of parameters and verify the final performance on the testing dataset.  

3. **[25 scores]** To see the differences between One-vs-One and One-vs-Rest, let's observe the positive and negative supports.  
*   a. For one-vs-one classification, what is the number of binary classifiers and how is it related to the number of classes? 
    - Observe the positive and negative supports of the first separation, the last separation, and anywhere in the middle.
*   b. For one-vs-rest classification, answer the same question about the number of binary classifiers and the number of classes. 
    - Also, observe the positive and negative supports of the first separation, the last separation, and anywhere in between.
    
*   c. Can you tell the differences between the observation in (3.a) and (3.b)? 
    - For each observation, you may plot the mean shapes of the positive and negative supports & the histogram of the labels associated with the positive and negative supports.



---

Note.

To get the full score, you should be able to provide the following plots with reasonable results and **with good explanations**:

1. SVM_1_MNIST_label_distribution.png  **[5 scores]**  
2. SVM_2_ModelSelection.png **[5 scores]** +  your answers **[5 scores]**
3. Two sets of plots: one_vs_one_0/8/x plots **[10 scores]** and one_vs_rest_0/8/x plots **[10 scores]**, plus your answers **[5 scores]**. Examples of the plot file names are as follows: 

  - SVM_3_mean_positive_support_one_vs_one_0.png
  - SVM_3_mean_positive_support_one_vs_one_8.png
  - SVM_3_mean_positive_support_one_vs_one_x.png
  - SVM_3_mean_negative_support_one_vs_one_0.png
  - SVM_3_mean_negative_support_one_vs_one_8.png
  - SVM_3_mean_negative_support_one_vs_one_x.png

  - SVM_3_MNIST_neg_pos_distribution_one_vs_one_0.png
  - SVM_3_MNIST_neg_pos_distribution_one_vs_one_8.png
  - SVM_3_MNIST_neg_pos_distribution_one_vs_one_x.png

  - SVM_3_positive_support_one_vs_one_0.png
  - SVM_3_positive_support_one_vs_one_8.png
  - SVM_3_positive_support_one_vs_one_x.png

  - SVM_3_negative_support_one_vs_one_0.png
  - SVM_3_negative_support_one_vs_one_8.png
  - SVM_3_negative_support_one_vs_one_x.png



In [None]:
from scipy.stats import mode
import numpy as np
#from mnist import MNIST
from time import time
import pandas as pd
import os
import matplotlib.pyplot as plt
import matplotlib 
 
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
 
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC
from sklearn.multiclass import OneVsRestClassifier
import tensorflow as tf

########################################################### 
####################   Q.1 [5 scores]  ####################
########################################################### 

# Load the MNIST dataset

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# <<<< Q.1 Sample the data and reshape it into a 2D array, e.g. seq = np.random.randint(0,60000,6000)
# (note that drawing indices this way does not by itself balance the classes).
# [Hint!] Don't forget to reshape train_images and test_images, e.g., train_images[seq,:,:].reshape(-1,28*28)
seq_train =  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
train_samp  = train_images[seq_train,:,:] 
trlab_samp  = train_labels[seq_train] 

seq_test =  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test_samp   = test_images[seq_test, :,:]
tslab_samp  = test_labels[seq_test]  

# <<<< Q.1 Compute the distribution of the training and testing labels. 
# [Hint] You may use `hist, bins = np.histogram(trlab_samp, range=[0,10])`
# to get the histogram of the training labels.

train_hist, train_bins =  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test_hist, test_bins   =  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

ticks = range(10) 
width = 0.4
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4)) 
ax = axes[0]
ax.bar(ticks,  , width, label='Training') # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
ax.set_xticks(ticks)
ax.set_ylabel('Frequency')
ax.set_xlabel('Label')
ax.set_title('Training Labels')
 
ax = axes[1] 
ax.bar(ticks,  , width, label='Testing') # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
ax.set_xticks(ticks)
ax.set_ylabel('Frequency')
ax.set_xlabel('Label')
ax.set_title('Testing Labels')  
fig.savefig("SVM_1_MNIST_label_distribution.png")
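One possible way to draw the same number of samples from every class is sketched below. It is only an illustration: the helper name `balanced_indices` and the synthetic `demo_labels` array are assumptions standing in for `train_labels`, and other approaches (for example a stratified split) would work equally well.

```python
import numpy as np

def balanced_indices(labels, n_per_class, seed=0):
    """Return shuffled indices holding exactly n_per_class samples of each label."""
    rng = np.random.default_rng(seed)
    picked = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)  # positions of this class
        picked.append(rng.choice(cls_idx, size=n_per_class, replace=False))
    idx = np.concatenate(picked)
    rng.shuffle(idx)  # avoid having the classes in sorted blocks
    return idx

# Synthetic stand-in for train_labels (hypothetical data, 10 classes)
demo_labels = np.random.default_rng(1).integers(0, 10, size=5000)
demo_idx = balanced_indices(demo_labels, n_per_class=60)

# Histogram with one bin per digit, as in the hint above
demo_hist, demo_bins = np.histogram(demo_labels[demo_idx], bins=10, range=[0, 10])
print(demo_hist)
```

With the real data, the returned indices could be used in place of `seq_train` (600 per class) and `seq_test` (100 per class).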

In [None]:
#################################################################
#####################   Q.2 [10 scores] #########################
#################################################################
# 2. Training and model selection

from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import hinge_loss

c_list =   #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<                       
g_list =   #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 

# [Hint] You can also recheck your result with GridSearchCV. For example: 
''' 
gamma_list = [0.01, 0.1, 1]
c_list     = [0.001, 0.1, 1, 10, 100]
param_grid = {'C': c_list, 'gamma':  gamma_list}

# Create a GridSearchCV object with the SVM model and the hyperparameter grid
svc = SVC(kernel='rbf')
grid_search = GridSearchCV(svc, param_grid=param_grid, cv=5, return_train_score=True)

# Fit the GridSearchCV object to the training data
grid_search.fit(train_samp, trlab_samp)

# Get the best hyperparameters found by the cross-validated search
best_params = grid_search.best_params_ 
C = best_params['C']
gamma = best_params['gamma']
'''

sub_train_samp, val_samp, sub_trlab_samp, valab_samp = train_test_split(train_samp, trlab_samp, test_size=0.2)

tuning_ = [] 

for c in c_list:  
  for g in g_list:
    # <<<<<<<<<<<<<<<< Q.2.A  Train the SVM and compute the accuracy or loss on the training and validation data. 
    # [Hint] Train the SVM, i.e., svm = SVC(kernel='rbf', C=xx, gamma=xx), for the selected hyperparameters using svm.fit() ... 
    # and compute the accuracy or loss on the training and validation data. 
     

    # Compute the accuracy (or loss), e.g. accuracy_train = svm.score(sub_train_samp, sub_trlab_samp)...
    # alternatively, you can compute the loss, e.g., using hinge_loss(sub_trlab_samp, svm.decision_function(sub_train_samp))
    

    # Compute the accuracy (or loss), e.g. accuracy_validate = svm.score(val_samp, valab_samp) 
    # alternatively, you can compute the loss, e.g., using hinge_loss(valab_samp, svm.decision_function(val_samp))

    tuning_.append({"C":c, "gamma":g,   'ACC/Loss_tra' : accuracy_train, 'ACC/Loss_val' : accuracy_validate })
    print({"C":c, "gamma":g,    'ACC/Loss_tra' : accuracy_train, 'ACC/Loss_val' : accuracy_validate })

df_tuning = pd.DataFrame(tuning_)
 
training_acc   = df_tuning['ACC/Loss_tra']
validating_acc = df_tuning['ACC/Loss_val']

fig = plt.figure(figsize=(5,5)) 
plt.plot(training_acc, label='training',color='blue', linewidth=2.0)
plt.plot(validating_acc, label='validate',color='red', linewidth=2.0)
plt.grid(which='major', color='#DDDDDD', linewidth=0.8)
plt.grid(which='minor', color='#EEEEEE', linestyle=':', linewidth=0.5)
plt.xlabel("C, gamma parameters") 
plt.legend()
plt.ylabel("Loss [The lower the better]") # <<<< If you report accuracy [the higher the better], change this y-axis label to accuracy; keep it as loss only if you report loss.
fig.savefig("SVM_2_ModelSelection.png")
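The tuning loop above can be sketched end to end on a smaller dataset. This is a minimal illustration under stated assumptions: it uses scikit-learn's bundled 8x8 digits (`load_digits`) instead of MNIST so that it runs quickly, and the grid values, the `X_tr`/`X_val` names, and the `df_demo` table are hypothetical choices, not recommended settings.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small stand-in dataset (assumption: 8x8 digits instead of 28x28 MNIST)
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in load_digits range from 0 to 16
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

rows = []
for c in [0.1, 1, 10]:            # hypothetical grid for C
    for g in [0.001, 0.01, 0.1]:  # hypothetical grid for gamma
        clf = SVC(kernel='rbf', C=c, gamma=g).fit(X_tr, y_tr)
        rows.append({"C": c, "gamma": g,
                     "acc_train": clf.score(X_tr, y_tr),
                     "acc_val": clf.score(X_val, y_val)})

df_demo = pd.DataFrame(rows)
# Pick the setting with the highest validation accuracy
best_row = df_demo.loc[df_demo["acc_val"].idxmax()]
print(best_row)
```

Selecting on validation accuracy rather than training accuracy matters here, because a large gamma can fit the training split almost perfectly while generalizing poorly.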


In [None]:
#################################################################
#####################   Q.3 [25 scores] #########################
#################################################################
# 3. To see the differences between One-vs-One and One-vs-Rest, let's observe the positive and negative supports.  
#     a. For one-vs-one classification, what is the number of binary classifiers and how is it related to the number of classes? 
#         - Observe the positive and negative supports of the first separation, the last separation, and anywhere in the middle.
#     b. For one-vs-rest classification, answer the same question about the number of binary classifiers and the number of classes. 
#         - Also, observe the positive and negative supports of the first separation, the last separation, and anywhere in between.
#     c. Can you tell the differences between the observation in (3.a) and (3.b)? 
#     - For each observation, you may plot the mean shapes of the positive and negative supports & the histogram of 
#       the labels associated with the positive and negative supports.
 
best_c = 1 # Please change the number to the best C you found in the previous step
best_gamma =  1 # Please change the number to the best gamma you found in the previous step

type_svm =  "one_vs_one"  #<<<<<<   Q.3.A,B,C [Please change the type of SVM you want to use. You can choose either "one_vs_one" or "one_vs_rest"]

# Perform the training for SVM classification
if type_svm == "one_vs_one":
  # <<<<<<<< Q.3.A  Train One vs One [SVM model with RBF kernel].  
  # [Hint]: svm = SVC(kernel='rbf', C=XX, gamma=XX) and svm.fit(train_samp, trlab_samp)
  #

elif type_svm == "one_vs_rest":
  # <<<<<< Q.3.B Train One vs Rest [SVM model with RBF kernel].   
  # [Hint]: Wrap the SVC in OneVsRestClassifier from sklearn.multiclass, i.e., ovr_svc = OneVsRestClassifier(svm), then call ovr_svc.fit(train_samp, trlab_samp)
  # 
  


# Visualize the supports (positive and negative supports)

if type_svm == "one_vs_one":
  dual_coef = svm.dual_coef_
  support   = svm.support_

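  # Note: for a multiclass SVC, dual_coef_ has shape (n_classes - 1, n_SV). Its rows hold the
  # coefficients of all one-vs-one classifiers in a compact layout, so the number of rows
  # is not the number of binary classifiers (see the scikit-learn SVC documentation).
  n_classes = len(svm.classes_)
  print("Number of binary classifiers: %d" % (n_classes * (n_classes - 1) // 2))
  print("Number of rows in dual_coef_: %d" % dual_coef.shape[0])
  print("Number of Support Coefficients: %d" % dual_coef.shape[1])
  
  class_i = 0 # <<<<   Q.3.A,C Try the first separation class_i=0, the last separation class_i=8, and anywhere in between (e.g. class_i=3) ...Check against the number of rows in dual_coef_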
  separate_i    = class_i
  pos_support = support[dual_coef[separate_i,:] > 0]
  neg_support = support[dual_coef[separate_i,:] < 0]

elif type_svm == "one_vs_rest":  
   
  class_i = 0  #  <<<<  Q.3.B,C  Try the first separation class_i=0, the last separation class_i=8, and anywhere in between (e.g. class_i=3)...Check against the number of binary classifiers 

  # Get the binary classifiers for each class
  binary_clf = ovr_svc.estimators_[class_i] 
  # get the dual coefficients for class class_i
  dual_coef = binary_clf.dual_coef_
  support   = binary_clf.support_

  print("Number of  binary classifiers: %d" % len(ovr_svc.estimators_))
  print("Number of  Support Coefficients: %d" % dual_coef.shape[1])

  pos_support = support[dual_coef[0,:] > 0]
  neg_support = support[dual_coef[0,:] < 0]


print(f"Number of supports for positive class: {len(pos_support)}")
print(f"Number of supports for negative class: {len(neg_support)}")
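The relation between the number of classes and the number of binary classifiers can be checked empirically. The sketch below is an illustration under assumptions: it uses scikit-learn's bundled `load_digits` data rather than MNIST, arbitrary values for `C` and `gamma`, and hypothetical names (`ovo_demo`, `ovr_demo`). It relies on `decision_function_shape='ovo'`, which makes `SVC.decision_function` return one column per pairwise classifier.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0
n_classes = len(np.unique(y_demo))  # 10 digit classes

# One-vs-one: SVC trains one classifier per pair of classes internally
ovo_demo = SVC(kernel='rbf', C=1, gamma=0.01,
               decision_function_shape='ovo').fit(X_demo, y_demo)
n_ovo = ovo_demo.decision_function(X_demo[:5]).shape[1]

# One-vs-rest: one wrapped binary SVC per class
ovr_demo = OneVsRestClassifier(SVC(kernel='rbf', C=1, gamma=0.01)).fit(X_demo, y_demo)
n_ovr = len(ovr_demo.estimators_)

# dual_coef_ of the multiclass SVC has n_classes - 1 rows, not one row per classifier
print(n_ovo, n_ovr, ovo_demo.dual_coef_.shape[0])  # 45 10 9
```

Each one-vs-rest estimator is an ordinary binary `SVC`, so its `dual_coef_` has a single row, which is why the one-vs-rest branch above indexes `dual_coef[0,:]`.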
 

In [None]:
# Plot the mean shapes of the positive supports
# [Hint] Use `pos_support` to find the positive supports from the training samples, i.e., train_samp, or from svm.support_vectors_
pos_supports     =    #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<  Q.3.A,B,C
label_pos        =    #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<  Q.3.A,B,C

av_train = pos_supports.mean(axis=0).reshape(-1, 28, 28) 
fig = plt.figure(figsize=(5,5))
plt.imshow(av_train.reshape(28, 28), cmap=plt.cm.RdBu)
fig.savefig("SVM_3_mean_positive_support_%s_%d.png" % (type_svm,class_i))

show_support = min(len(pos_support),len(neg_support))  
chosen_sample = np.random.randint(0, show_support,100)

ind = 0
fig = plt.figure(figsize=(24,50))
for i, sample_id in enumerate(chosen_sample):
  l1 = plt.subplot(int(len(chosen_sample)/5), 5, i + 1)  
  sv_image = pos_supports[sample_id,:]
  sv_label = label_pos[sample_id]
  l1.imshow(sv_image.reshape(28, 28), cmap=plt.cm.RdBu)
  l1.set_xticks(())
  l1.set_yticks(())
  l1.set_xlabel('Sep/Class %d : Sample %d, label %s' % (class_i, i, str(sv_label))) 
fig.savefig("SVM_3_positive_support_%s_%d.png" % (type_svm,class_i))


In [None]:
# Plot the mean shapes of the negative supports
# [Hint] Use `neg_support` to find the negative supports from the training samples, i.e., train_samp, or from svm.support_vectors_
neg_supports =    #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<  Q.3.A,B,C
label_neg        =     #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<  Q.3.A,B,C

av_train = neg_supports.mean(axis=0).reshape(-1, 28, 28) 
fig = plt.figure(figsize=(5,5))
plt.imshow(av_train.reshape(28, 28), cmap=plt.cm.RdBu)
fig.savefig("SVM_3_mean_negative_support_%s_%d.png" % (type_svm,class_i))
 
ind = 0
fig = plt.figure(figsize=(24,50))
for i, sample_id in enumerate(chosen_sample):
  l1 = plt.subplot(int(len(chosen_sample)/5), 5, i + 1)  
  sv_image = neg_supports[sample_id,:]
  sv_label = label_neg[sample_id]
  l1.imshow(sv_image.reshape(28, 28), cmap=plt.cm.RdBu)
  l1.set_xticks(())
  l1.set_yticks(())
  l1.set_xlabel('Sep/Class %d : Sample %d, label %s' % (class_i, i, str(sv_label))) 

fig.savefig("SVM_3_negative_support_%s_%d.png" % (type_svm,class_i))

In [None]:
# <<<<<<<<<<<<<  Q3.C  Calculate the histogram of the labels associated with the positive and negative supports 
pos_hist, pos_bins =   #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
neg_hist, neg_bins =   #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

width = 0.8
ticks = np.arange(10)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4)) 
ax = axes[0]
ax.bar(range(10), pos_hist, width )
ax.set_xticks(ticks)
ax.set_ylabel('Frequency')
ax.set_xlabel('Label')
ax.set_title("Positive Supports' Labels")
 
ax = axes[1] 
ax.bar(range(10), neg_hist, width )
ax.set_xticks(ticks)
ax.set_ylabel('Frequency')
ax.set_xlabel('Label')
ax.set_title("Negative Supports' Labels")

# Save before showing so the figure is written to disk while it is still open
fig.savefig("SVM_3_MNIST_neg_pos_distribution_%s_%d.png" % (type_svm,class_i))
plt.show()
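For the one-vs-rest case, the label histograms of the positive and negative supports can be sketched as follows. This is an illustration only: it assumes scikit-learn's bundled `load_digits` data in place of MNIST, arbitrary `C` and `gamma` values, and hypothetical names such as `pos_idx` and `pos_hist_demo`. It uses the fact that `support_` holds indices into the training data, so the labels of the supports can be read directly from the label array.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0
ovr_demo = OneVsRestClassifier(SVC(kernel='rbf', C=1, gamma=0.01)).fit(X_demo, y_demo)

target = 0                          # binary problem "digit 0 vs. the rest"
clf0 = ovr_demo.estimators_[target]
coef = clf0.dual_coef_[0]           # single row for a binary SVC
pos_idx = clf0.support_[coef > 0]   # training indices of positive supports
neg_idx = clf0.support_[coef < 0]   # training indices of negative supports

# One histogram bin per digit label
pos_hist_demo, _ = np.histogram(y_demo[pos_idx], bins=10, range=[0, 10])
neg_hist_demo, _ = np.histogram(y_demo[neg_idx], bins=10, range=[0, 10])

# Mean image of the positive supports (8x8 here, 28x28 for MNIST)
mean_pos = X_demo[pos_idx].mean(axis=0).reshape(8, 8)
print(pos_hist_demo)
print(neg_hist_demo)
```

In this one-vs-rest setting the positive supports all carry the target label, while the negative supports are spread over the remaining digits, which is the kind of contrast question 3.c asks you to describe.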