#Motivation:

Having compared a range of ML algorithms, we have determined that the most appropriate model for our situation is the SVM.

However, SVMs have many settings that can substantially change their performance and usability, so we will refine our model selection by choosing the most appropriate parameters for our model.

For this, we will take a "Darwinist" approach. We test each parameter separately against a "baseline" SVM (sklearn's default SVC) and keep the best-performing value. In principle, this should give us the "best" model for our dataset. However, because we do not test certain parameters together, interactions between them can be missed; we will point out these ramifications as we go.
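The "Darwinist" approach is what is often called one-factor-at-a-time (OFAT) tuning. The sketch below uses a hypothetical search space that loosely mirrors the parameters evaluated in this section (the candidate values are illustrative). It counts how many configurations OFAT visits compared with an exhaustive grid over the same values, using only the Python standard library.

```python
from itertools import product

# Hypothetical search space, loosely mirroring this notebook's experiments.
# Baseline values follow sklearn's SVC defaults.
baseline = {"kernel": "rbf", "degree": 3, "gamma": "scale", "shrinking": True, "C": 1}
candidates = {
    "kernel": ["rbf", "poly", "sigmoid"],
    "degree": [2, 3, 4],          # only used by the "poly" kernel
    "gamma": ["scale", "auto"],
    "shrinking": [True, False],
    "C": [1, 2, 3, 4, 5, 6, 7],
}


def one_at_a_time(baseline, candidates):
    """Yield the baseline, then every configuration that changes exactly one parameter."""
    yield dict(baseline)
    for name, values in candidates.items():
        for value in values:
            if value != baseline[name]:
                config = dict(baseline)
                config[name] = value
                yield config


ofat_configs = list(one_at_a_time(baseline, candidates))
grid_configs = [dict(zip(candidates, combo)) for combo in product(*candidates.values())]

print(len(ofat_configs), "OFAT configurations")   # 13
print(len(grid_configs), "grid configurations")   # 252
```

In the OFAT list, the degree variants keep kernel="rbf", for which SVC ignores degree. This is one concrete ramification of varying parameters in isolation, and it is why the degree experiments in this notebook use kernel="poly".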

The SVM parameters we are evaluating are:

*  The type of kernel
*  The degree for polynomial kernels
*  The gamma kernel coefficient
*  The shrinking heuristic
*  The strength of the regularization parameter


It is also worth noting that we won't experiment with some parameters. For example, we keep the same tolerance for the stopping criterion (tol = 1e-3, the sklearn default) for every model. This ensures all models are optimized to the same precision and their results remain comparable.
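Since every experiment is compared against the default SVC, it can help to print those defaults explicitly. The following is a minimal check, assuming a recent scikit-learn (0.22 or later, where gamma defaults to "scale").

```python
from sklearn import svm

# Inspect the default hyperparameters of sklearn's SVC (our "baseline" model)
params = svm.SVC().get_params()
for name in ["kernel", "degree", "gamma", "shrinking", "C", "tol"]:
    print(name, "=", params[name])
```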

In [78]:

from sklearn import metrics, svm

# function used to evaluate different SVM models
def evalModel(model, model_name, verbose = False):
  # every model is evaluated against the same test dataset (the global X_test and y_test)

  # use the fitted model passed as a parameter to make the predictions we judge it on
  predicted = model.predict(X_test)

  # metrics used here are: accuracy, recall, precision, ROC/AUC and F1.
  # these are standard classification metrics; reporting all of them gives a fuller picture
  # than accuracy alone, which can look good on imbalanced data even when recall is low.
  # note: ROC/AUC is computed from the hard predicted labels, not from continuous scores.
  accuracy_score = metrics.accuracy_score(y_test, predicted)
  recall_score = metrics.recall_score(y_test, predicted)
  precision_score = metrics.precision_score(y_test, predicted)
  roc_auc_score = metrics.roc_auc_score(y_test, predicted)
  f1_score = metrics.f1_score(y_test, predicted)

  if verbose:
    print("Metrics for model name: " + model_name)
    print("Accuracy score: " + str(accuracy_score))
    print("Recall score: " + str(recall_score))
    print("Precision_score: " + str(precision_score))
    print("ROC/AUC score: " + str(roc_auc_score))
    print("F1 score: " + str(f1_score))
    print("\n")

  # return the metrics as a list so results can be sorted later
  return [model_name, accuracy_score, recall_score, precision_score, roc_auc_score, f1_score]




### Manual Feature

In [None]:
# X_train = ...  (this cell was left unfinished and never executed)

In [5]:
# here we have our "baseline" SVM, as defined by sklearn's defaults.
# let's measure its performance so we can compare the other parameter settings against it:

baseline_SVM = svm.SVC()
baseline_SVM.fit(X_train, y_train)
evalModel(baseline_SVM,"Baseline SVM", True);

Metrics for model name: Baseline SVM
Accuracy score: 0.8301666666666667
Recall score: 0.3333333333333333
Precision_score: 0.7068676716917923
ROC/AUC score: 0.6481833544571186
F1 score: 0.45303274288781537
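evalModel computes ROC/AUC from the hard labels returned by model.predict. For binary 0/1 predictions, the ROC curve has a single interior point, so this score reduces to (TPR + TNR) / 2, which is the balanced accuracy. The sketch below uses small made-up labels and scores (not data from this notebook) to show this. It also contrasts that value with AUC computed from continuous scores. For an SVC, those scores could come from decision_function, e.g. metrics.roc_auc_score(y_test, model.decision_function(X_test)).

```python
import numpy as np
from sklearn import metrics

# Made-up example: true labels, continuous scores (as decision_function would give),
# and the hard labels obtained by thresholding those scores at 0 (as predict would give).
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([-1.2, -0.8, 0.3, -0.5, 1.1, -0.1, -0.9, 0.7, -1.5, -0.4])
y_pred = (y_score > 0).astype(int)

# AUC from hard labels equals balanced accuracy: (TPR + TNR) / 2
auc_from_labels = metrics.roc_auc_score(y_true, y_pred)
balanced_acc = metrics.balanced_accuracy_score(y_true, y_pred)

# AUC from scores uses the full ranking of samples, not a single threshold
auc_from_scores = metrics.roc_auc_score(y_true, y_score)

print(f"AUC (labels): {auc_from_labels:.4f}")    # 0.7619
print(f"Balanced accuracy: {balanced_acc:.4f}")  # 0.7619
print(f"AUC (scores): {auc_from_scores:.4f}")    # 0.9524
```

Apply the same identity to the baseline above (recall 0.333, ROC/AUC 0.648). It implies a true-negative rate of roughly 2 × 0.648 - 0.333 ≈ 0.963.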




#Comparing the different kernels

In [6]:
# our baseline SVM uses a radial basis function (RBF) kernel, so let's test the other non-linear kernels SVC offers

# our data isn't linearly separable, so a linear kernel is unlikely to perform well; we don't test it here

# SVM with a polynomial kernel (default degree = 3)
poly_SVM = svm.SVC(kernel="poly")
poly_SVM.fit(X_train, y_train)
evalModel(poly_SVM,"Polynomial SVM", True)


# SVM with a sigmoid kernel 
sig_SVM = svm.SVC(kernel="sigmoid")
sig_SVM.fit(X_train, y_train)
evalModel(sig_SVM,"Sigmoid SVM", True);

Metrics for model name: Polynomial SVM
Accuracy score: 0.8176666666666667
Recall score: 0.2330173775671406
Precision_score: 0.7057416267942583
ROC/AUC score: 0.6035175607734309
F1 score: 0.3503562945368171


Metrics for model name: Sigmoid SVM
Accuracy score: 0.695
Recall score: 0.3175355450236967
Precision_score: 0.29385964912280704
ROC/AUC score: 0.5567398891151437
F1 score: 0.30523917995444194




Observation: 

The polynomial kernel SVM performs slightly worse than the baseline RBF one on every metric (e.g. accuracy 0.818 vs 0.830, F1 0.350 vs 0.453).

The sigmoid SVM performs worst of the three on every metric except recall, where it beats the polynomial kernel (0.318 vs 0.233).

#Degrees for polynomial kernel
The polynomial kernel came close to the baseline, so let's see whether tuning its degree can improve its performance further.
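For reference, sklearn's polynomial kernel is K(x, x') = (gamma * <x, x'> + coef0) ^ degree, and coef0 defaults to 0 in SVC. The small numpy sketch below uses random illustrative data. It computes the kernel by hand for degrees 2 to 4 and checks the result against sklearn.metrics.pairwise.polynomial_kernel. The standalone polynomial_kernel function defaults to coef0=1, unlike SVC, so coef0 is passed explicitly.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))  # 5 illustrative samples, 4 features

gamma = 1.0 / (X.shape[1] * X.var())  # the value SVC's gamma="scale" would use for this X
coef0 = 0.0                            # SVC's default coef0

# Gram matrices for degrees 2 to 4, computed by hand: (gamma * <x, x'> + coef0) ** degree
manual = {d: (gamma * X @ X.T + coef0) ** d for d in range(2, 5)}
library = {d: polynomial_kernel(X, degree=d, gamma=gamma, coef0=coef0) for d in range(2, 5)}

for d in range(2, 5):
    print(d, np.allclose(manual[d], library[d]))
```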

In [7]:
# our original test with a polynomial kernel used degree 3
# choosing the degree is important: too small a degree can lead to underfitting, and too large a degree to overfitting

# here, we test polynomial kernels of degree 2 to 4.
# degree 2 is the lowest non-linear degree (degree 1 is essentially a linear kernel), and above degree 4
# training times become too long and the risk of overfitting grows

# store the metrics of our test for sorting and concluding
history_result_polynomial_kernel = []

# loop to test our polynomial kernel with different degrees
for k in range(2, 5): 

  poly_SVM = svm.SVC(kernel="poly", degree = k)
  poly_SVM.fit(X_train, y_train)
  history_result_polynomial_kernel.append(evalModel(poly_SVM,"Polynomial SVM with degree:" + str(k)))


# helper function to sort our results by accuracy 
def Sort(array): 
    # sorts a 2D array using the 2nd element (our accuracy) in descending order
    array.sort(key = lambda x: x[1],  reverse=True) 
    return array 

# sort our results
sorted_results = Sort(history_result_polynomial_kernel)

# print our degrees in sorted order
print("Sorted order of polynomial SVMs by accuracy")
for result in sorted_results:
  print(result[0])
  print("Accuracy score: " + str(result[1]))
  print("Recall score: " + str(result[2]))
  print("Precision_score: " + str(result[3]))
  print("ROC/AUC score: " + str(result[4]))
  print("F1 score: " + str(result[5]))

  print("\n")



Sorted order of polynomial SVMs by accuracy
Polynomial SVM with degree:3
Accuracy score: 0.8176666666666667
Recall score: 0.2330173775671406
Precision_score: 0.7057416267942583
ROC/AUC score: 0.6035175607734309
F1 score: 0.3503562945368171


Polynomial SVM with degree:4
Accuracy score: 0.8105
Recall score: 0.2037914691943128
Precision_score: 0.6666666666666666
ROC/AUC score: 0.5882708930255467
F1 score: 0.31215970961887474


Polynomial SVM with degree:2
Accuracy score: 0.7955
Recall score: 0.04265402843601896
Precision_score: 0.782608695652174
ROC/AUC score: 0.5197427303143339
F1 score: 0.08089887640449439




Observation: 

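Degree 3 (the default) gives the best results on every metric except precision. Degree 2 has the highest precision (0.783) but a recall of only 0.043, meaning it very rarely predicts the positive class. Going from degree 3 to degree 4 lowers every metric.

We also know that increasing the degree of the polynomial kernel makes the model more prone to overfitting. Higher degrees would therefore need to be checked for this if we were to move forward with this model.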
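One way to see why higher degrees add capacity is through SVC's default coef0=0. With that setting, the kernel (gamma * <x, x'>) ^ d implicitly works with all monomials of degree exactly d in the n input features, and there are (n + d - 1 choose d) of them. The sketch assumes a hypothetical n = 20 features, not this dataset's actual feature count.

```python
from math import comb

n_features = 20  # hypothetical feature count, for illustration only

# number of distinct degree-d monomials in n_features variables
for degree in range(2, 5):
    print(degree, comb(n_features + degree - 1, degree))
# 2 210
# 3 1540
# 4 8855
```

With coef0 > 0, all lower-degree monomials are included as well, so the count becomes (n + d choose d).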

#Gamma kernel coefficient

In [8]:
# the default kernel coefficient (gamma) for SVC is "scale", i.e. 1 / (n_features * X.var())
# here, we are testing "auto", which uses 1 / n_features
# gamma is used by the rbf, poly and sigmoid kernels


# baseline SVM
auto_baseline_SVM = svm.SVC(gamma = "auto")
auto_baseline_SVM.fit(X_train, y_train)
evalModel(auto_baseline_SVM,"Baseline SVM with auto gamma", True);

# the polynomial kernel with "auto" gamma took too long to train, so this test is left commented out
# auto_poly_SVM = svm.SVC(kernel="poly",gamma = "auto")
# auto_poly_SVM.fit(X_train, y_train)
# evalModel(auto_poly_SVM,"poly SVM  with auto gamma ", True)

# sigmoid kernel SVM
auto_sig_SVM = svm.SVC(kernel="sigmoid", gamma = "auto")
auto_sig_SVM.fit(X_train, y_train)
evalModel(auto_sig_SVM,"sigmoid SVM with auto gamma", True);


Metrics for model name: Baseline SVM with auto gamma
Accuracy score: 0.8301666666666667
Recall score: 0.3333333333333333
Precision_score: 0.7068676716917923
ROC/AUC score: 0.6481833544571186
F1 score: 0.45303274288781537


Metrics for model name: sigmoid SVM with auto gamma
Accuracy score: 0.695
Recall score: 0.3175355450236967
Precision_score: 0.29385964912280704
ROC/AUC score: 0.5567398891151437
F1 score: 0.30523917995444194




Observation:

With gamma set to "auto", the RBF and sigmoid kernels produce results identical to those obtained with the default "scale". This would be expected if the two settings resolve to the same value of gamma. That happens when the overall variance of X_train is 1, for instance if the features have been standardized.

We could not test "auto" gamma with a polynomial kernel, as training took too long.

The sigmoid kernel remains the weakest of the kernels tested.
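The numpy sketch below illustrates the point above. "scale" uses 1 / (n_features * X.var()) and "auto" uses 1 / n_features, so the two coincide whenever the overall variance of the training features is 1. Whether X_train was actually standardized is an assumption, since the preprocessing is not shown in this section. The data here are random and illustrative, with no constant columns.

```python
import numpy as np

rng = np.random.default_rng(42)
# illustrative raw features with very different scales
X_raw = rng.normal(loc=[0, 50, -3], scale=[1, 20, 0.1], size=(1000, 3))

# standardize each column to mean 0 and (population) variance 1, as StandardScaler does
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

n_features = X_std.shape[1]
gamma_auto = 1.0 / n_features
gamma_scale_raw = 1.0 / (n_features * X_raw.var())
gamma_scale_std = 1.0 / (n_features * X_std.var())

print(gamma_auto, gamma_scale_std, gamma_scale_raw)
```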

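#Shrinking parameter
This parameter enables libsvm's shrinking heuristic. The heuristic is meant to shorten training time by temporarily removing from the optimization the variables that appear to have already reached their bounds. It should not change the solution beyond the stopping tolerance.

We are testing it to see whether it impacts the performance of our models significantly.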
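Here is a rough way to check shrinking's speed/accuracy trade-off, sketched on a synthetic dataset from make_classification. Timings depend on the machine and the data, so no numbers are claimed here.

```python
import time

from sklearn import svm
from sklearn.datasets import make_classification

# synthetic, illustrative binary classification data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

fit_times, predictions = {}, {}
for shrinking in (True, False):
    model = svm.SVC(shrinking=shrinking)
    start = time.perf_counter()
    model.fit(X, y)
    fit_times[shrinking] = time.perf_counter() - start
    predictions[shrinking] = model.predict(X)

# fraction of samples on which the two models make the same prediction
agreement = (predictions[True] == predictions[False]).mean()
print(f"fit time with shrinking:    {fit_times[True]:.3f}s")
print(f"fit time without shrinking: {fit_times[False]:.3f}s")
print(f"prediction agreement: {agreement:.3f}")
```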

In [9]:
# test the shrinking parameter (default = True) by turning it off

# baseline rbf kernel SVM
non_shrink_baseline_SVM = svm.SVC(shrinking = False)
non_shrink_baseline_SVM.fit(X_train, y_train)
evalModel(non_shrink_baseline_SVM,"Baseline SVM", True);

# polynomial kernel SVM
non_shrink_poly_SVM = svm.SVC(kernel="poly",shrinking = False)
non_shrink_poly_SVM.fit(X_train, y_train)
evalModel(non_shrink_poly_SVM,"poly SVM", True)

# sigmoid kernel SVM
non_shrink_sig_SVM = svm.SVC(kernel="sigmoid",shrinking = False)
non_shrink_sig_SVM.fit(X_train, y_train)
evalModel(non_shrink_sig_SVM,"sigmoid SVM", True);

Metrics for model name: Baseline SVM
Accuracy score: 0.8301666666666667
Recall score: 0.3333333333333333
Precision_score: 0.7068676716917923
ROC/AUC score: 0.6481833544571186
F1 score: 0.45303274288781537


Metrics for model name: poly SVM
Accuracy score: 0.8175
Recall score: 0.2330173775671406
Precision_score: 0.7040572792362768
ROC/AUC score: 0.6034119418465193
F1 score: 0.3501483679525223


Metrics for model name: sigmoid SVM
Accuracy score: 0.695
Recall score: 0.3175355450236967
Precision_score: 0.29385964912280704
ROC/AUC score: 0.5567398891151437
F1 score: 0.30523917995444194




Observation:

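Disabling shrinking doesn't change the performance of the models significantly. The RBF and sigmoid results are identical, and the polynomial kernel's accuracy only moves from 0.8177 to 0.8175.

Shrinking is intended to reduce training time and has no meaningful effect on the metrics, so we will keep the default (True) for our final model.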

#Regularization parameter
The regularization parameter C is important for avoiding overfitting the model to our dataset. The strength of the regularization is inversely proportional to C, which must be strictly positive. The penalty is a squared L2 penalty.

We will test different values for this parameter and see their impact on model performance.
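The role of C is easiest to see in the soft-margin primal problem that SVC solves, as given in the scikit-learn user guide. Here the labels are y_i ∈ {-1, 1}, φ is the feature map induced by the kernel, and ζ_i are slack variables:

$$
\min_{w,\, b,\, \zeta} \; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \zeta_i
\quad \text{subject to} \quad y_i \left( w^\top \phi(x_i) + b \right) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \quad i = 1, \dots, n
$$

A larger C penalizes margin violations more heavily relative to the squared L2 term $\frac{1}{2} w^\top w$, which weakens the regularization; a smaller C strengthens it.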

In [10]:
# all our previous models used the default regularization parameter C = 1.
# C must be strictly positive; we'll test our baseline model with C from 1 to 7, and we expect accuracy to stop improving after a point.

# store the metrics of our test for sorting and concluding
history_result_reg_param = []

# loop to test our baseline with each value of C
for n in range(1, 8):

  SVM = svm.SVC(C = n)
  SVM.fit(X_train, y_train)
  history_result_reg_param.append(evalModel(SVM,"baseline SVM with regularization parameter C = " + str(n)))


# sort our results
sorted_results_reg = Sort(history_result_reg_param)

# print our C values in sorted order
print("Sorted order of baseline SVMs by accuracy")
for result in sorted_results_reg:
  print(result[0])
  print("Accuracy score: " + str(result[1]))
  print("Recall score: " + str(result[2]))
  print("Precision_score: " + str(result[3]))
  print("ROC/AUC score: " + str(result[4]))
  print("F1 score: " + str(result[5]))

  print("\n")



Sorted order of baseline SVMs by accuracy
baseline SVM with regularization parameter C = 4
Accuracy score: 0.8315
Recall score: 0.33649289099526064
Precision_score: 0.7135678391959799
ROC/AUC score: 0.6501856089957292
F1 score: 0.4573268921095008


baseline SVM with regularization parameter C = 3
Accuracy score: 0.8308333333333333
Recall score: 0.334913112164297
Precision_score: 0.7102177554438861
ROC/AUC score: 0.649184481726424
F1 score: 0.4551798174986581


baseline SVM with regularization parameter C = 5
Accuracy score: 0.8306666666666667
Recall score: 0.3341232227488152
Precision_score: 0.709731543624161
ROC/AUC score: 0.648789537018683
F1 score: 0.4543501611170784


baseline SVM with regularization parameter C = 1
Accuracy score: 0.8301666666666667
Recall score: 0.3333333333333333
Precision_score: 0.7068676716917923
ROC/AUC score: 0.6481833544571186
F1 score: 0.45303274288781537


baseline SVM with regularization parameter C = 2
Accuracy score: 0.83
Recall score: 0.33254344391785

Observation: 

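Weakening the regularization (increasing C) improves the metrics slightly, with the best results at C = 4. C = 5 already scores below both C = 3 and C = 4, suggesting that beyond this point the weaker regularization no longer helps. The trend is not strictly monotonic (C = 2 scores slightly below C = 1), and the differences are small: among the values shown, accuracy ranges only from 0.8300 to 0.8315.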




#Conclusion

Let's take our parameters and evaluate them in the context of the broader SVM model:

*  The type of kernel:
  * Best performance by the baseline RBF kernel
  * Slightly lower performance by the polynomial kernel, on every metric
  * Worst overall performance by the sigmoid kernel

*  The degree for polynomial kernels:
  * Degree 3 (the default) performs best among degrees 2 to 4

*  The gamma kernel coefficient:
  * "auto" gives results identical to the default "scale" for the RBF and sigmoid kernels
  * Not tested with the polynomial kernel because training took too long

*  The shrinking heuristic:
  * No meaningful effect on the metrics, so we keep the default (True)

*  The strength of the regularization parameter:
  * C = 4 gives the best results among the values shown, a small improvement over the default C = 1


In [11]:
# RBF SVM with "auto" gamma (same as the earlier test)
auto_baseline_SVM = svm.SVC(gamma = "auto")
auto_baseline_SVM.fit(X_train, y_train)
evalModel(auto_baseline_SVM,"Baseline SVM with auto gamma", True);

# RBF SVM with "auto" gamma and regularization parameter C = 4
auto_reg_SVM = svm.SVC(gamma = "auto", C=4)
auto_reg_SVM.fit(X_train, y_train)
evalModel(auto_reg_SVM,"Baseline SVM with auto gamma and reg param 4", True);




Metrics for model name: Baseline SVM with auto gamma
Accuracy score: 0.8301666666666667
Recall score: 0.3333333333333333
Precision_score: 0.7068676716917923
ROC/AUC score: 0.6481833544571186
F1 score: 0.45303274288781537


Metrics for model name: Baseline SVM with auto gamma and reg param 4
Accuracy score: 0.8315
Recall score: 0.33649289099526064
Precision_score: 0.7135678391959799
ROC/AUC score: 0.6501856089957292
F1 score: 0.4573268921095008
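A one-at-a-time search cannot reveal interactions between parameters, for example whether C = 4 would also help the polynomial kernel. A joint search is one common alternative. The sketch below uses sklearn's GridSearchCV with cross-validation on a synthetic dataset; the grid values and scoring="f1" are illustrative assumptions. In this notebook, parameters are compared on X_test. Running the search with cross-validation on the training data instead keeps the test set for a final, unbiased evaluation.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# synthetic, illustrative data standing in for the real training set
X, y = make_classification(n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0)

# separate sub-grids so that "degree" is only combined with the polynomial kernel
param_grid = [
    {"kernel": ["rbf"], "C": [1, 4]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [1, 4]},
]

search = GridSearchCV(svm.SVC(), param_grid, scoring="f1", cv=3)
search.fit(X, y)

print("configurations tried:", len(search.cv_results_["params"]))  # 6
print("best parameters:", search.best_params_)
print(f"best mean CV F1: {search.best_score_:.3f}")
```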


