# General Instructions to students:

1. There are 5 types of cells in this notebook. The cell type will be indicated within the cell.
    1. Markdown cells with problem written in it. (DO NOT TOUCH THESE CELLS) (**Cell type: TextRead**)
    2. Python cells with setup code for further evaluations. (DO NOT TOUCH THESE CELLS) (**Cell type: CodeRead**)
    3. Python code cells with some template code or empty cell. (FILL CODE IN THESE CELLS BASED ON INSTRUCTIONS IN CURRENT AND PREVIOUS CELLS) (**Cell type: CodeWrite**)
    4. Markdown cells where a written reasoning or conclusion is expected. (WRITE SENTENCES IN THESE CELLS) (**Cell type: TextWrite**)
    5. Temporary code cells for convenience and TAs. (YOU MAY DO WHAT YOU WILL WITH THESE CELLS, TAs WILL REPLACE WHATEVER YOU WRITE HERE WITH OFFICIAL EVALUATION CODE) (**Cell type: Convenience**)
    
2. You are not allowed to insert new cells in the submitted notebook.

3. You are not allowed to **import** any extra packages.

4. The code is to be written in Python 3.6 syntax. Latest versions of other packages may be assumed.

5. In CodeWrite cells, the only outputs should be the plots asked for in the question. Nothing else should be output or printed.

6. If TextWrite cells ask you to give accuracy/error/other numbers you can print them on the code cells, but remove the print statements before submitting.

7. Any runtime failures on the submitted notebook as it is will get zero marks.

8. All code must be written by yourself. Copying from other students/material on the web is strictly prohibited. Any violations will result in zero marks.

9. The dataset is given as a .npz file and contains the data as NumPy arrays.

10. All plots must be labelled properly, all tables must have rows and columns named properly.

11. You are allowed to use the numpy library to calculate eigenvalues. All other functions should be written from scratch.

## Rename the file you submit to rollNo_Name.ipynb
### E.g. CS20B001_FullName.ipynb




In [3]:
# Cell type : CodeRead

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
import matplotlib as mpl
import pandas as pd
from sklearn.model_selection import GridSearchCV #for gridsearch CV as I learned it is allowed

Create Directory `../../Data/` and save data files in it.

In [4]:
## MAKE YOUR DIRECTORY STRUCTURE TO CONTAIN DATA FOLDER IN THE FOLLOWING WAY.


dataA = np.load("Data/data1.npz")
dataB = np.load("Data/data2.npz")

print (dataA.files)
print (dataB.files)

## Training data = arr1, arr2 
## Test data = arr3, arr4


['arr1', 'arr2', 'arr3', 'arr4']
['arr1', 'arr2', 'arr3', 'arr4']


# SVM

Write code for learning SVM below. Also write code for choosing **best hyperparameters** for **each kernel** type. You may use `sklearn.svm` for this purpose. _(use a part of training set as validation set)_

Write the code that runs it in the following cell.

In the text cell after that, report the following numbers, obtained by running the appropriate code:

For each classification data set report the best kernel and regularisation parameters for 
* Linear, 
* RBF and 
* Poly kernels.   
Note: Linear has no kernel parameter.   

Report the training and test zero-one error for those hyperparameters.

For the synthetic classification datasets in 2-dimensions, also illustrate the learned classifier for each kernel setting. Do this in the last codeWrite cell for this question.
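
The instruction to use part of the training set as a validation set can be sketched as below. This is a minimal illustration in plain Python, not the notebook's method (the code cells further down use `GridSearchCV` with 5-fold cross-validation instead). The names `holdout_indices`, `grid_candidates`, `select_best` and the `validation_error` callable are hypothetical helpers introduced only for this sketch.

```python
import itertools
import random


def holdout_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle 0..n_samples-1 and hold out the last val_fraction for validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(round(val_fraction * n_samples)))
    return idx[:-n_val], idx[-n_val:]


def grid_candidates(param_grid):
    """Yield one dict per combination of the values listed in param_grid."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def select_best(param_grid, validation_error):
    """Return (params, error) with the lowest validation error.

    validation_error is a caller-supplied function (an assumption of this
    sketch): it takes a params dict, fits a model on the training part and
    returns the zero-one error on the held-out part. Ties keep the first
    candidate encountered.
    """
    best_params, best_err = None, float("inf")
    for params in grid_candidates(param_grid):
        err = validation_error(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

With scikit-learn, `validation_error` could fit `svm.SVC(**params)` on the rows selected by the training indices and return the zero-one error on the rows selected by the validation indices. The test set is left out of the selection entirely.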

In [5]:
# CodeWrite cell
# Write SVM classifier using SKlearn
# write only functions here

def Load_test_train(data): # loads the testing and training datasets 
    # arr1/arr2 hold the training inputs/labels, arr3/arr4 the test inputs/labels
    x_train = data['arr1']
    y_train = data['arr2']
    x_test  = data['arr3']
    y_test  = data['arr4']
    
    return x_train,y_train,x_test,y_test


def zero_one_err(y_true,y_pred):  #function to calculate 0-1 error           
    flag = 0
    for i in range(len(y_true)):
        if(y_true[i] != y_pred[i]):
            flag +=1 #keep a count of wrong predictions
    
    error = flag/len(y_true) #fraction of misclassified points (between 0 and 1)
    
    return error

SVM = svm.SVC() #our support vector machine

def hyp_array(Start,Scale=10,n=5): 
    '''
    Generate a hyperparameter array for grid search.
    Start -> starting value,
    Scale -> scaling factor between successive values,
    n     -> total number of points
    '''
    params = np.zeros(n)
    for i in range(n):
        params[i] = Start*(Scale**i)
    return params



In [7]:
Dataset = {'A':dataA,'B':dataB}
#C_params = hyp_array(0.1)
C_params = [0.1, 1, 10, 100, 1000]
#gamma_params = hyp_array(1,0.1)
gamma_params = [1, 0.1, 0.01, 0.001, 0.0001]
kernel_params = ['linear','poly','rbf']

In [None]:
# CodeWrite cell
# Write code here for doing validation (for kernel_param and regularisation_param)
# on a subset of the training set. 
# Also for generating the numbers that you report below.
best_params = {}
scores = {}
for index in ['A','B']:
    best_params[index] = {}
    scores[index] = {}
    x_train,y_train,x_test,y_test = Load_test_train(Dataset[index])
    
    for kernels in kernel_params:
        scores[index][kernels] = {}
        kernel_label = []
        kernel_label.append(kernels)
        if(kernels == 'linear'):
            param_labels = {'C':C_params,
              'kernel':kernel_label }
        else:
            param_labels = {'C':C_params,
                'gamma': gamma_params,
                  'kernel':kernel_label }
        
        grid = GridSearchCV(SVM,param_labels,verbose = 3,return_train_score = True)
        grid.fit(x_train,y_train)
    
        best_params[index][kernels] = grid.best_params_
        if(kernels == 'linear'):
            C_val = best_params[index][kernels]['C']
            gamma_val = 0
        else:
            C_val = best_params[index][kernels]['C']
            gamma_val = best_params[index][kernels]['gamma']
        for i in range(2):
            scores[index][kernels][str(i)] = {}
            # Scale from the best C each time instead of compounding in place
            C_try = C_val * 5**i
            for j in range(2):
                scores[index][kernels][str(i)][str(j)] = {}
                # Same for gamma: best gamma times 1, then times 2
                gamma_try = gamma_val * 2**j
                if(kernels == 'linear'):
                    SVM_2 = svm.SVC(C =C_try,kernel = kernels)
                else:
                    SVM_2 = svm.SVC(C =C_try,kernel = kernels,gamma = gamma_try)
                SVM_2.fit(x_train,y_train)
                ytrain_pred = SVM_2.predict(x_train)
                ytest_pred  = SVM_2.predict(x_test)
                scores[index][kernels][str(i)][str(j)]['train'] = zero_one_err(y_train,ytrain_pred)
                scores[index][kernels][str(i)][str(j)]['test']  = zero_one_err(y_test,ytest_pred)
        

Fitting 5 folds for each of 5 candidates, totalling 25 fits
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.823, test=0.833), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.825, test=0.820), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.829, test=0.803), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.821, test=0.837), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.828, test=0.823), total=   0.0s
[CV] C=1, kernel=linear ..............................................
[CV]  C=1, kernel=linear, score=(train=0.823, test=0.827), total=   0.0s
[CV] C=1, kernel=linear ..........................................

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s



[CV] C=10, kernel=linear .............................................
[CV]  C=10, kernel=linear, score=(train=0.820, test=0.843), total=   0.0s
[CV] C=10, kernel=linear .............................................
[CV]  C=10, kernel=linear, score=(train=0.828, test=0.830), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.822, test=0.830), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.824, test=0.823), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.829, test=0.803), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.820, test=0.840), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.826, test=0

[Parallel(n_jobs=1)]: Done  25 out of  25 | elapsed:    1.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s


[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.818, test=0.823), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.810, test=0.827), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.820, test=0.803), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.813, test=0.807), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=poly ...................................
[CV]  C=0.1, gamma=0.1, kernel=poly, score=(train=0.528, test=0.530), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=poly ...................................
[CV]  C=0.1, gamma=0.1, kernel=poly, score=(train=0.528, test=0.530), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=poly ...................................
[CV]  C=0.1, gamma=0.1, kernel=poly, score=(train=0.528, test=0.530), total=  

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV]  C=0.1, gamma=0.1, kernel=poly, score=(train=0.529, test=0.527), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=poly ..................................
[CV]  C=0.1, gamma=0.01, kernel=poly, score=(train=0.528, test=0.530), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=poly ..................................
[CV]  C=0.1, gamma=0.01, kernel=poly, score=(train=0.528, test=0.530), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=poly ..................................
[CV]  C=0.1, gamma=0.01, kernel=poly, score=(train=0.528, test=0.530), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=poly ..................................
[CV]  C=0.1, gamma=0.01, kernel=poly, score=(train=0.529, test=0.527), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=poly ..................................
[CV]  C=0.1, gamma=0.01, kernel=poly, score=(train=0.529, test=0.527), total=   0.0s
[CV] C=0.1, gamma=0.001, kernel=poly .................................
[CV]  C=0.1, gamma=0.001, kernel=poly, score=(train=0.528, test=0

[Parallel(n_jobs=1)]: Done 125 out of 125 | elapsed:    4.8s finished


Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.823, test=0.830), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.827, test=0.817), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.832, test=0.803), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.820, test=0.843), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.827, test=0.820), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.828, test=0.833), total=   0.0s
[CV] C=0.1, gamma=0.1, k

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.831, test=0.820), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.828, test=0.820), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.825, test=0.837), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.831, test=0.810), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=rbf ...................................
[CV]  C=0.1, gamma=0.01, kernel=rbf, score=(train=0.767, test=0.773), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=rbf ...................................
[CV]  C=0.1, gamma=0.01, kernel=rbf, score=(train=0.772, test=0.790), total=   0.0s
[CV] C=0.1, gamma=0.01, kernel=rbf ...................................
[CV]  C=0.1, gamma=0.01, kernel=rbf, score=(train=0.775, test=0.770), tota

[Parallel(n_jobs=1)]: Done 125 out of 125 | elapsed:    5.8s finished


Fitting 5 folds for each of 5 candidates, totalling 25 fits
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.863, test=0.880), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.875, test=0.840), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.866, test=0.873), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.863, test=0.883), total=   0.0s
[CV] C=0.1, kernel=linear ............................................
[CV]  C=0.1, kernel=linear, score=(train=0.870, test=0.857), total=   0.0s
[CV] C=1, kernel=linear ..............................................
[CV]  C=1, kernel=linear, score=(train=0.867, test=0.893), total=   0.0s
[CV] C=1, kernel=linear ..........................................

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s



[CV] C=10, kernel=linear .............................................
[CV]  C=10, kernel=linear, score=(train=0.874, test=0.867), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.866, test=0.893), total=   0.1s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.879, test=0.840), total=   0.0s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.871, test=0.873), total=   0.1s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.867, test=0.887), total=   0.1s
[CV] C=100, kernel=linear ............................................
[CV]  C=100, kernel=linear, score=(train=0.874, test=0.867), total=   0.1s
[CV] C=1000, kernel=linear ...........................................
[CV]  C=1000, kernel=linear, score=(train=0.866, test

[Parallel(n_jobs=1)]: Done  25 out of  25 | elapsed:    1.8s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.934, test=0.963), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.945, test=0.917), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.941, test=0.937), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.934, test=0.963), total=   0.0s
[CV] C=0.1, gamma=1, kernel=poly .....................................
[CV]  C=0.1, gamma=1, kernel=poly, score=(train=0.944, test=0.920), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=poly ...................................
[CV]  C=0.1, gamma=0.1, kernel=poly, score=(train=0.737, test=0.737), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=poly ...................................
[CV]  C=0

[Parallel(n_jobs=1)]: Done 125 out of 125 | elapsed:    4.9s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.995, test=0.980), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.993, test=0.993), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.993, test=0.993), total=   0.0s
[CV] C=0.1, gamma=1, kernel=rbf ......................................
[CV]  C=0.1, gamma=1, kernel=rbf, score=(train=0.992, test=0.993), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.852, test=0.870), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.862, test=0.833), total=   0.0s
[CV] C=0.1, gamma=0.1, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=(train=0.858, test=0.863), total=   0.0s
[

In [None]:
#print(best_params)
#print(scores)

TextWrite Cell: Give your observations and the list of hyperparameter choices and train zero-one error  and test zero-one error for all three kernel choices, for all datasets. 

Observations and Inferences:<br>
-> On both datasets the rbf kernel gives the lowest test error, followed by the polynomial kernel and then the linear kernel. This ordering is plausible: the rbf kernel corresponds to an infinite-dimensional feature space, so it can represent far more flexible decision boundaries than the other two.<br>
-> On dataset A the rbf and polynomial kernels improve on the linear kernel only marginally (best test error 0.192 and 0.202 versus 0.208). This suggests that the classes in dataset A overlap, so mapping to a higher-dimensional feature space does not separate them. On dataset B the gain is large (test error about 0.13 for linear, about 0.08 for poly and 0.002 for rbf), which indicates that the classes become almost linearly separable in the higher-dimensional feature space.<br>
-> The rbf kernel reaches zero training error on dataset B. The selected regularisation value is large (C = 100), which penalises margin violations heavily, so the soft-margin SVM behaves almost like a hard-margin one. This is possible because the data is separable in the rbf feature space.<br>
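
For reference, the role of C in the last observation can be read off the textbook soft-margin problem, sketched below. Here $\phi$ is the feature map implied by the kernel and $\xi_i$ are the slack variables. The kernel forms follow scikit-learn's parametrisation, and $d = 3$, $r = 0$ are its defaults, assuming `degree` and `coef0` were not overridden.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0

% As C grows, any nonzero slack becomes expensive, so the solution
% approaches the hard-margin one (all \xi_i = 0) when the data is separable.

k_{\text{linear}}(x,z) = x^{\top}z,\qquad
k_{\text{poly}}(x,z) = \bigl(\gamma\,x^{\top}z + r\bigr)^{d},\qquad
k_{\text{rbf}}(x,z) = \exp\bigl(-\gamma\,\lVert x-z\rVert^{2}\bigr)
```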

Zero-One error:

| Dataset | Kernel Type | C | gamma | train error | test error |
| :-: | :-: | :-: | :-: | :-: | :-: |
| A | linear | 10 | - | 0.177 | 0.208 |
| A | linear | 50 | - | 0.177 | 0.208 |
| A | poly | 10 | 1 | 0.175 | 0.206 |
| A | poly | 10 | 2 | 0.176 | 0.202 |
| A | poly | 50 | 1 | 0.176 | 0.202 |
| A | poly | 50 | 2 | 0.176 | 0.202 |
| A | rbf | 1 | 0.01 | 0.174 | 0.192 |
| A | rbf | 1 | 0.02 | 0.175 | 0.192 |
| A | rbf | 5 | 0.01 | 0.177 | 0.202 |
| A | rbf | 5 | 0.02 | 0.174 | 0.204 |
| B | linear | 10 | - | 0.129 | 0.134 |
| B | linear | 50 | - | 0.129 | 0.132 |
| B | poly | 0.1 | 1 | 0.061 | 0.082 |
| B | poly | 0.1 | 2 | 0.062 | 0.084 |
| B | poly | 0.5 | 1 | 0.06 | 0.082 |
| B | poly | 0.5 | 2 | 0.059 | 0.08 |
| B | rbf | 100 | 1 | 0.0 | 0.002 |
| B | rbf | 100 | 2 | 0.0 | 0.002 |
| B | rbf | 500 | 1 | 0.0 | 0.002 |
| B | rbf | 500 | 2 | 0.0 | 0.002 |




In [None]:
#print(scores)

In [None]:
# Codewrite cell: 
# Generate plots of learned classifier for all three kernel types, on dataA and dataB.
# Plots should give both the learned classifier and the train data. 
# Similar to  Bishop Figure 4.5 (with just two classes here.)
# Total number of plots = 3 * 2 = 6
def plotter(index,C_best,Gamma_best):
    DB = {}
    x_train,y_train,x_test,y_test = Load_test_train(Dataset[index])
    fig, axis = plt.subplots(1, 3)
    fig.set_size_inches(30, 9, forward=True)
    axes = {'linear': axis[0], 'poly': axis[1], 'rbf': axis[2]}
    fig.suptitle('Data' + index, fontsize=25)
    # Evaluation grid over the plotted region (step 0.1 in each coordinate)
    X,Y = np.meshgrid(np.arange(-2.5,2.5,0.1),np.arange(-2.5,2.5,0.1))
    test_instances = np.concatenate( (X.reshape((-1,1)), Y.reshape(-1,1)), axis=1 )
    
    for kernels in ['linear', 'poly', 'rbf']:
        c = C_best[kernels]
        Gamma = Gamma_best[kernels]
        if(kernels == 'linear'):
            SVM = svm.SVC(C = c,kernel = kernels)
        else:
            SVM = svm.SVC(C = c,kernel = kernels, gamma = Gamma)
        
        Classifier     = SVM.fit(x_train,y_train)
        DB_pred = Classifier.decision_function(test_instances)
        DB[kernels] = DB_pred.reshape(X.shape)
        
    
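    region_colors = ['#F8B8B8', '#C5F9B8']  # red: decision value < 0, green: decision value > 0
    for classifier_ind, ax in axes.items():
        # Two filled regions split at the decision boundary (decision_function = 0)
        region = np.where(DB[classifier_ind] > 0, 1, -1)
        ax.contourf(X, Y, region, levels=[-2, 0, 2], colors=region_colors)
        proxy = [plt.Rectangle((0,0),1,1,fc = colour) for colour in region_colors]
        contour_legend = ax.legend(proxy, ["Y = -1", "Y = +1"] , loc = "upper right")
        ax.add_artist(contour_legend)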
        ax.set_ylabel("X2", fontsize=16)
        ax.set_xlabel("X1", fontsize=16) 
        ax.set_title(f"{classifier_ind} Support Vector Classifier", fontsize=16) 
        x_train_pos = x_train[y_train > 0]
        x_train_neg = x_train[y_train < 0]
        ax.scatter(x_train_pos[:,0], x_train_pos[:,1], c='green', marker = 'x', label='Y_train = 1')
        ax.scatter(x_train_neg[:,0], x_train_neg[:,1], c='red', marker = 'x', label='Y_train = -1')
        ax.legend(loc = "lower left")
        ax.axis('equal')
        
C_best = {'A':{'linear':10,'poly':10,'rbf':1},'B':{'linear':50,'poly':0.5,'rbf':100}}
gamma_best = {'A':{'linear':0,'poly':2,'rbf':0.01},'B':{'linear':0,'poly':2,'rbf':1}}

for index in ['A','B']:
    plotter(index,C_best[index],gamma_best[index])
    plt.show()
    

# SVM in Higher Dimension

As in the previous part, write code for choosing the best hyperparameters for each kernel type. You may use sklearn.svm for this purpose. (Use a part of the training set as a validation set.)


Since this data is higher-dimensional, you cannot illustrate the learned classifier. 
So for each classification data set, report the best kernel and regularisation parameters for linear, RBF and Poly kernels. Report the training and test zero-one error for those hyperparameters.
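
When the input dimension grows, a fixed gamma grid may miss the useful range. One possible way to centre the grid, sketched below in plain Python, is the heuristic scikit-learn documents for `gamma='scale'`: 1 / (n_features * X.var()). This is an illustration only, not part of the assignment's required method. `scale_gamma` and `gamma_grid_around` are hypothetical helper names.

```python
def scale_gamma(X):
    """Mirror the gamma='scale' heuristic: 1 / (n_features * variance of all entries).

    X is a sequence of equal-length rows (lists or a 2-D array). The variance
    is the population variance over every entry, as X.var() computes in NumPy.
    """
    n_features = len(X[0])
    values = [float(v) for row in X for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (n_features * var)


def gamma_grid_around(center, decades=2):
    """Log-spaced candidates from center / 10**decades up to center * 10**decades."""
    return [center * 10.0 ** k for k in range(-decades, decades + 1)]
```

For example, `gamma_grid_around(scale_gamma(x_train))` gives five candidates spanning two decades on either side of the heuristic value, which could stand in for a hand-picked `gamma_params` list.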



In [None]:
## MAKE YOUR DIRECTORY STRUCTURE TO CONTAIN DATA FOLDER IN THE FOLLOWING WAY.

dataC = np.load("Data/data3.npz")
print (dataC.files)

## Training data = arr1, arr2 
## Test data = arr3, arr4


In [None]:
# CodeWrite cell
# Write SVM classifier using SKlearn
# write only functions here


In [None]:
# CodeWrite cell
# Write code here for doing validation (for kernel_param and regularisation_param)
# on a subset of the training set. 
# Also for generating the numbers that you report below.

Dataset = {'A':dataA,'B':dataB,'C':dataC}
#C_params = hyp_array(0.1)
C_params = [0.1,10, 1000]
#gamma_params = hyp_array(1,0.1)
gamma_params = [1,0.01, 0.0001]
kernel_params = ['linear','poly','rbf']
best_params = {}
scores = {}

for index in ['C']:
    best_params[index] = {}
    scores[index] = {}
    x_train,y_train,x_test,y_test = Load_test_train(Dataset[index])
    
    for kernels in kernel_params:
        scores[index][kernels] = {}
        kernel_label = []
        kernel_label.append(kernels)
        if(kernels == 'linear'):
            param_labels = {'C':C_params,
              'kernel':kernel_label }
        else:
            param_labels = {'C':C_params,
                'gamma': gamma_params,
                  'kernel':kernel_label }
        
        grid = GridSearchCV(SVM,param_labels,verbose = 3,return_train_score = True)
        grid.fit(x_train,y_train)
    
        best_params[index][kernels] = grid.best_params_
        C_val = best_params[index][kernels]['C']
        if(kernels == 'linear'):
            SVM_2 = svm.SVC(C =C_val,kernel = kernels)
        else:
            gamma_val = best_params[index][kernels]['gamma']
            SVM_2 = svm.SVC(C =C_val,kernel = kernels,gamma = gamma_val)
        # Refit on the full training set for every kernel (linear included)
        # and record the train and test zero-one errors
        SVM_2.fit(x_train,y_train)
        ytrain_pred = SVM_2.predict(x_train)
        ytest_pred  = SVM_2.predict(x_test)
        scores[index][kernels]['train'] = zero_one_err(y_train,ytrain_pred)
        scores[index][kernels]['test']  = zero_one_err(y_test,ytest_pred)
        

TextWrite Cell: Give your observations and the list of hyperparameter choices and train zero-one error and test zero-one error for all three kernel choices, for all datasets.
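
The numbers requested here can be turned into a table in the same layout as the one used for datasets A and B. The sketch below is one way to do that. It assumes `best_params[dataset][kernel]` is the dict returned by `grid.best_params_` and `scores[dataset][kernel]` holds `'train'` and `'test'` errors. The function name `scores_to_markdown` is hypothetical.

```python
def scores_to_markdown(best_params, scores):
    """Render one markdown table row per (dataset, kernel) pair.

    Assumed layout (adapt to the notebook's own dictionaries):
      best_params[dataset][kernel] -> {'C': ..., 'gamma': ...}  (no gamma for linear)
      scores[dataset][kernel]      -> {'train': ..., 'test': ...}
    """
    lines = ["| Dataset | Kernel Type | C | gamma | train error | test error |",
             "| :-: | :-: | :-: | :-: | :-: | :-: |"]
    for dataset in sorted(scores):
        for kernel, errs in scores[dataset].items():
            params = best_params[dataset][kernel]
            gamma = params.get('gamma', '-')  # linear kernel has no gamma
            lines.append(f"| {dataset} | {kernel} | {params['C']} | {gamma} | "
                         f"{errs['train']:.3f} | {errs['test']:.3f} |")
    return "\n".join(lines)
```

The returned string can be printed once while working and pasted into this cell. The print statement should be removed before submission, as the general instructions require.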