# STEP 4.2: Modeling Support Vector Machine (SVM)

## Description of the methodology
> * Finalize the binning of the target variable
* Create Train and Test datasets
* Rationale about the types of SVM Kernel selected
* Create the SVM pipeline(s)
* Define key parameters
* Run the model(s) on the sub-train data set and test accuracy on the validation data set
* Select the 2 most accurate models based on their hyper-parameters and run them to get the confusion matrix
* Evaluate the impact of cross-validation if we see some overfitting with the standard train/valid approach
* Select the best SVM model candidate and apply it to a larger train/test dataset

## Import libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import scipy.fftpack as sp
%matplotlib inline
import matplotlib.pyplot as plt

import re
from sklearn.preprocessing import Normalizer
import os
from sklearn.linear_model import SGDRegressor
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import HuberRegressor
from scipy.linalg import lstsq # multiple linear regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from scipy.stats import kurtosis
from scipy.stats import skew
from scipy import stats
from sklearn.preprocessing import power_transform
from sklearn.preprocessing import KBinsDiscretizer

from sklearn import datasets, linear_model
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import QuantileTransformer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import ComplementNB, MultinomialNB

from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import Pipeline
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import ParameterGrid

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegressionCV
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.svm import LinearSVC
from sklearn.svm import SVC

from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve
import scikitplot as skplt

import random
from sklearn import ensemble

from sklearn.model_selection import StratifiedShuffleSplit

import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter("ignore")


# Activate Seaborn style
sns.set()

## Load the file for analysis

In [2]:
# Importing the file and creating a dataframe
master_modeling=pd.read_csv("master_modeling.csv",low_memory=False, skipinitialspace=True)#, sep='\t'

In [3]:
# display all columns
pd.set_option('display.max_columns', None)

In [4]:
# remove the Unnamed column
master_modeling.drop('Unnamed: 0', axis=1, inplace=True)
master_modeling.shape

(194484, 351)

In [5]:
# Create a dataframe for the modeling phase (without text and not relevant features)
df_modeling=master_modeling.drop(['Title', 'Post_ID','Snippet'], axis = 1)

In [6]:
df_modeling.shape

(194484, 348)

## Definition of the number of classes for the Target Variable 'ALL_Impact'

> * We will split the variable into 3 classes using the scikit-learn preprocessing class KBinsDiscretizer with the following parameters: number of bins: 3, encode: ordinal and strategy: quantile
* Ordinal encoding has been selected as we are trying to model a hierarchy between low and high tweet impact
* The quantile strategy puts roughly the same number of data points in each class, so the model learns the features of each class equally (avoiding unbalanced classes)
* We may reconsider some of the parameter values depending on the modeling results
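
To make the effect of the quantile strategy concrete, here is a minimal sketch on synthetic data. The exponential scores and the helper `class_counts` are assumptions made for illustration only; they are not the real `ALL_Impact` values. The sketch compares the class sizes obtained with `strategy='quantile'` and `strategy='uniform'` on a skewed variable.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic, right-skewed scores standing in for the real target (assumption)
rng = np.random.default_rng(0)
scores = rng.exponential(scale=20.0, size=(3000, 1))

def class_counts(strategy):
    """Bin the scores into 3 ordinal classes and count the samples per class."""
    est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy=strategy)
    labels = est.fit_transform(scores).ravel().astype(int)
    return np.bincount(labels, minlength=3)

quantile_counts = class_counts('quantile')  # roughly 1000 samples per class
uniform_counts = class_counts('uniform')    # most samples fall in the first class

print('quantile:', quantile_counts)
print('uniform: ', uniform_counts)
```

On continuous data the quantile bins come out almost perfectly balanced. On the real target the classes are only roughly balanced (61777 / 63049 / 69658), which is likely due to many tweets sharing the same score around the bin edges.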

In [7]:
ai_bin=master_modeling[['ALL_Impact']]

In [8]:
# Fit the discretizer and bin the target variable
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='quantile')
est.fit(ai_bin)
new_ai = est.transform(ai_bin)

In [9]:
# Show the edges of the 3 bins
est.bin_edges_[0]

array([ 0., 30., 41., 80.])

In [10]:
new_ai_df=pd.DataFrame(new_ai)

In [11]:
new_ai_df.shape

(194484, 1)

In [12]:
df_modeling['All_impact bin']=new_ai_df

In [13]:
df_modeling['All_impact bin'].value_counts()

2.0    69658
1.0    63049
0.0    61777
Name: All_impact bin, dtype: int64

In [14]:
df_modeling['All_impact bin'].head()

0    0.0
1    0.0
2    0.0
3    2.0
4    2.0
Name: All_impact bin, dtype: float64

In [15]:
# Remove the original ALL_Impact feature along with TW_Hashtags, ALL_Author and TW_Account_Name
df_modeling2=df_modeling.drop(['ALL_Impact','TW_Hashtags','ALL_Author','TW_Account_Name'], axis=1)

In [16]:
# Transform new All Impact feature type into int64
df_modeling2['All_impact bin']=df_modeling2['All_impact bin'].astype(np.int64)

In [17]:
df_modeling2.shape

(194484, 345)

## Create train, validation and test datasets (from the main Train set of data)
> * I am facing a lack of computing resources (laptop with an Intel i7 chip and 16 GB of RAM, no GPU), which implies very long training times, especially when tuning hyper-parameters. As a consequence, I have combined my computing resources with Google Colaboratory in order to tune several parameters in parallel.
* **The overall dataset is divided into 3 buckets:**
* Bucket 1 (train/test): split for training the best selected model (in case more computing resources become available)
* Bucket 2 (train1/valid1): split for training the best model candidate of a given class (no cross-validation)
* Bucket 3 (train2/valid2): split for hyper-parameter tuning leading to the selection of the best model candidate (cross-validation may be considered in some cases)
* We could limit the risk of overfitting by using a cross-validation approach. However, this could be very demanding in computing resources, as it combines hyper-parameter optimization (GridSearch) with a large dataset (194484 rows x 344 variables).
* A compromise would be to use the standard train/test dataset split and leverage cross-validation for the validation phase when selecting the best model.

### Create X and y arrays

In [18]:
# Create an array from df_modeling2 excluding the target variable All impact bin
X=df_modeling2.drop(['All_impact bin'], axis=1)
# X=np.array(X)
X.shape

(194484, 344)

In [19]:
# Create y array for the target variable All impact bin
y=df_modeling2['All_impact bin']
# y=np.array(y)
y.shape

(194484,)

In [20]:
# Convert the type of the input matrix to float (the np.float alias was removed in NumPy 1.24)
X = X.astype(np.float64)

# Create train set
X_tr_main, X_test, y_tr_main, y_test = train_test_split(X,
    y,
    test_size=0.2, random_state=0)

# Create train and validation sets (Bucket 2) for the best model selected for a given class
X_tr_2nd, X_valid1, y_tr_2nd, y_valid1 = train_test_split(
    X_tr_main, y_tr_main, test_size=20000, train_size = 60000, random_state=0)

# Create train and validation sets (Bucket 3) for hyper-parameter tuning and selection of the best model candidate
X_tr_3rd, X_valid2, y_tr_3rd, y_valid2 = train_test_split(
    X_tr_2nd, y_tr_2nd, test_size=1500, train_size = 5000, random_state=0)

print('Train:', X_tr_main.shape, y_tr_main.shape)
print('Test:', X_test.shape, y_test.shape)
print('Train1:', X_tr_2nd.shape, y_tr_2nd.shape)
print('Valid1:', X_valid1.shape, y_valid1.shape)
print('Train2:', X_tr_3rd.shape, y_tr_3rd.shape)
print('Valid2:', X_valid2.shape, y_valid2.shape)

Train: (155587, 344) (155587,)
Test: (38897, 344) (38897,)
Train1: (60000, 344) (60000,)
Valid1: (20000, 344) (20000,)
Train2: (5000, 344) (5000,)
Valid2: (1500, 344) (1500,)


In [21]:
y_valid2.value_counts(normalize=True)

2    0.368667
1    0.318000
0    0.313333
Name: All_impact bin, dtype: float64
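
The class proportions of such a small validation set depend on the random split. As a hedged sketch of one way to keep them aligned with the overall distribution, `train_test_split` accepts a `stratify` argument. The arrays `X_demo` and `y_demo` below are synthetic stand-ins (assumptions), not the notebook's data, and the notebook itself does not stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and 3-class labels (assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6500, 5))
y_demo = rng.choice([0, 1, 2], size=6500, p=[0.32, 0.32, 0.36])

# Same sizes as the tuning bucket: 5 000 train / 1 500 validation
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, train_size=5000, test_size=1500,
    stratify=y_demo, random_state=0)

overall = np.bincount(y_demo) / len(y_demo)
valid = np.bincount(y_va) / len(y_va)
print('overall proportions:   ', overall.round(3))
print('validation proportions:', valid.round(3))
```

With stratification the validation proportions match the overall ones up to rounding, which makes accuracy figures from different splits easier to compare.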

### Create a single file combining the X_tr_2nd and y_tr_2nd train dataframes (features and target variable), which will be used for the final evaluation on the Master Test dataset

In [22]:
X_tr_2nd.shape

(60000, 344)

In [23]:
y_tr_2nd=pd.DataFrame(y_tr_2nd)

In [24]:
y_tr_2nd.head()

| | All_impact bin |
|---|---|
| 19131 | 2 |
| 40331 | 0 |
| 31775 | 1 |
| 167278 | 0 |
| 93271 | 1 |


In [25]:
# Merge X and y variable in one dataframe
svm_train_60k=pd.merge(X_tr_2nd,y_tr_2nd, right_index=True, left_index=True)

In [26]:
# svm_train_60k.to_csv('svm_train_60k.csv')
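
When the merged dataframe is written with `to_csv`, the row index is saved as an unnamed first column. Reading the file back without `index_col=0` produces the `Unnamed: 0` column that had to be dropped at the start of this notebook. The sketch below illustrates this round trip on a toy dataframe; `X_demo`, `y_demo` and the in-memory buffer are assumptions used instead of the real 60 000-row file.

```python
import io
import pandas as pd

# Toy features and target sharing the same (non-contiguous) index
idx = [19131, 40331, 31775]
X_demo = pd.DataFrame({'f1': [0.1, 0.2, 0.3], 'f2': [1.0, 2.0, 3.0]}, index=idx)
y_demo = pd.DataFrame({'All_impact bin': [2, 0, 1]}, index=idx)

# Same index-on-index merge as in the notebook
merged = pd.merge(X_demo, y_demo, left_index=True, right_index=True)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
merged.to_csv(buffer)

buffer.seek(0)
naive = pd.read_csv(buffer)                  # index comes back as 'Unnamed: 0'
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # index restored as the row index

print(list(naive.columns))
print(list(restored.columns))
```

Passing `index_col=0` when reloading keeps the original row labels, so predictions can still be traced back to rows of the master dataset.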

## Rationale about the types of SVM Kernel selected
> * We will first fit an SVC with a linear kernel, tuning C, which will set a baseline. We will then move to an SVC with an RBF kernel, optimizing C and gamma.
* This approach will help to assess the relevance of a linear approach vs a non-linear one, especially as we are dealing with transformed text data
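
As an illustration of why comparing the two kernels is informative, the sketch below fits both on a toy dataset that is deliberately not linearly separable. The dataset (`make_circles`) and the names `X_toy`, `y_toy` and `acc` are assumptions chosen for the demonstration; the outcome says nothing about the tweet data, where the comparison has to be made empirically.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes
X_toy, y_toy = make_circles(n_samples=600, noise=0.1, factor=0.4, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0)

acc = {}
for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel, C=1.0, gamma='scale', random_state=0)
    clf.fit(X_tr, y_tr)
    acc[kernel] = clf.score(X_va, y_va)  # accuracy on held-out data

print(acc)
```

If the linear baseline scores close to the RBF model on the real features, the class boundaries are probably close to linear in the PCA space and the simpler model is a reasonable choice.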

### **Create a SVM with Linear Kernel pipeline**

In [27]:
# Create SVM pipeline
pipe1_svm = Pipeline([
    ('scaler', StandardScaler()), # with standardization StandardScaler()
    ('PCA', PCA(n_components=200)), # 200 components to explain 95% of the variance (see first part of this notebook)
    ('svc_linear', SVC(kernel='linear', random_state=0))  
])
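
The pipeline comment refers to an earlier part of the notebook for the choice of 200 components. As a rough sketch of how such a number can be derived (not necessarily how it was done there), the cumulative explained variance ratio of a full PCA can be inspected. The synthetic matrix `X_demo` and the name `n_95` are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 40 features driven by 10 latent factors plus noise (assumption)
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 10))
mixing = rng.normal(size=(10, 40))
X_demo = latent @ mixing + 0.5 * rng.normal(size=(1000, 40))

# Standardize first, as in the pipeline, then fit a PCA with all components
X_scaled = StandardScaler().fit_transform(X_demo)
pca = PCA().fit(X_scaled)

# Smallest number of components whose cumulative variance reaches 95%
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_95 = int(np.argmax(cumulative >= 0.95)) + 1
print('components needed for 95% of the variance:', n_95)

# Shortcut: a float n_components asks PCA to pick that number itself
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X_scaled)
print('components selected by PCA(n_components=0.95):', pca_95.n_components_)
```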

In [28]:
# Get parameters
# pipe1_svm.get_params()

### Define the grid of parameters

In [29]:
# Grid of parameters
grid1_svm = ParameterGrid({
    'PCA__n_components':[200], # nb of components explaining 95% of the variance
    'svc_linear__C':[0.01, 0.01, 0.1], # range of C defining the model complexity; 0.01 is listed twice, so it is evaluated twice (alternative range: [0.001,0.01,0.1])
    'svc_linear__decision_function_shape':['ovo', 'ovr'] # testing both approaches, as the number of data points per class is rather balanced
})

# Print the number of combinations
print('Number of combinations:', len(grid1_svm))

Number of combinations: 6
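
`ParameterGrid` builds the Cartesian product of the lists without removing duplicates, so the repeated value 0.01 in the C range accounts for two of the six combinations. The short sketch below, which reuses the same grid definition, counts total and distinct combinations; the names `combos` and `unique` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({
    'PCA__n_components': [200],
    'svc_linear__C': [0.01, 0.01, 0.1],  # 0.01 appears twice
    'svc_linear__decision_function_shape': ['ovo', 'ovr'],
})

combos = list(grid)
# A dict is not hashable, so turn each combination into a sorted tuple of items
unique = {tuple(sorted(c.items())) for c in combos}

print('combinations:', len(combos))       # 6
print('distinct combinations:', len(unique))  # 4
```

Repeated combinations can still return slightly different accuracies from one fit to the next, for instance because the PCA step is created without a fixed `random_state`.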


### Run the model on the sub-train data set (5 000 tweets) and test accuracy on the validation data set (1 500 tweets)

In [30]:
#  Save accuracy on train and validation sets
train_scores=[]
valid_scores = []

# Enumerate combinations starting from 1
for i, params_dict in enumerate(grid1_svm, 1):
    # Print progress
    print('Combination {}/{}'.format(
        i, len(grid1_svm) # Total number of combinations
    ))
    
    # Set parameters
    pipe1_svm.set_params(**params_dict)

    # Fit SVM
    pipe1_svm.fit(X_tr_3rd, y_tr_3rd)

    # Save accuracy on train and validation sets
    params_dict['accuracy_train']= pipe1_svm.score(X_tr_3rd, y_tr_3rd)
    params_dict['accuracy_valid'] = pipe1_svm.score(X_valid2, y_valid2)
    
    # Save result
    train_scores.append(params_dict)
    valid_scores.append(params_dict)

print('done')

Combination 1/6
Combination 2/6
Combination 3/6
Combination 4/6
Combination 5/6
Combination 6/6
done


In [31]:
# Create DataFrame with train and validation scores
scores_df = pd.DataFrame(valid_scores)
# Print scores
scores_df.sort_values(by='accuracy_valid', ascending=False)

| | PCA__n_components | svc_linear__C | svc_linear__decision_function_shape | accuracy_train | accuracy_valid |
|---|---|---|---|---|---|
| 5 | 200 | 0.1 | ovr | 0.8978 | 0.876667 |
| 4 | 200 | 0.1 | ovo | 0.9024 | 0.868667 |
| 2 | 200 | 0.01 | ovo | 0.8708 | 0.856667 |
| 1 | 200 | 0.01 | ovr | 0.876 | 0.854667 |
| 3 | 200 | 0.01 | ovr | 0.8796 | 0.854667 |
| 0 | 200 | 0.01 | ovo | 0.8746 | 0.852 |


### Evaluation of the confusion matrix on the Top 2 models (based on accuracy): SVM with Linear Kernel

### 1st model (Most accurate model #4 hyper-parameters; train dataset: 5 000, valid dataset: 1 500)

In [32]:
# Create SVM pipeline
pipe_svm1 = Pipeline([
    ('scaler', StandardScaler()), # with standardization StandardScaler()
    ('PCA', PCA(n_components=200)), # 200 components to explain 95% of the variance (see first part of this notebook)
    ('SVM', SVC(kernel='linear', C = 0.1, decision_function_shape = 'ovo', random_state=0))  
])

In [33]:
# Fit SVM
model_svm1=pipe_svm1.fit(X_tr_3rd, y_tr_3rd)

In [34]:
# Make prediction on X_valid dataset
y_pred_svm1=pipe_svm1.predict(X_valid2)

In [35]:
# Classification report
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_valid2, y_pred_svm1, target_names=target_names))

              precision    recall  f1-score   support

     class 0       0.89      0.91      0.90       470
     class 1       0.81      0.79      0.80       477
     class 2       0.90      0.91      0.90       553

    accuracy                           0.87      1500
   macro avg       0.87      0.87      0.87      1500
weighted avg       0.87      0.87      0.87      1500
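
The cell above prints a classification report, while the section title refers to a confusion matrix. As a small sketch of how the two relate, the example below builds a confusion matrix with `sklearn.metrics.confusion_matrix` and derives per-class recall and precision from it. The label arrays `y_true` and `y_pred` are made-up toy values, not the notebook's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for 3 classes (assumption, for illustration only)
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)

# Recall: correct predictions divided by the number of true samples per class
recall = cm.diagonal() / cm.sum(axis=1)
# Precision: correct predictions divided by the number of predictions per class
precision = cm.diagonal() / cm.sum(axis=0)
print('recall:   ', recall.round(2))
print('precision:', precision.round(2))
```

In the notebook the equivalent call would be `confusion_matrix(y_valid2, y_pred_svm1)`, which shows which pairs of classes are confused (for instance whether class 1 is mostly mistaken for class 0 or class 2).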



### 2nd model: same as Model 1 (Most accurate model #4; train dataset: 60 000, valid dataset: 20 000)

In [36]:
# Create SVM pipeline
pipe_svm1 = Pipeline([
    ('scaler', StandardScaler()), # with standardization StandardScaler()
    ('PCA', PCA(n_components=200)), # 200 components to explain 95% of the variance (see first part of this notebook)
    ('SVM', SVC(kernel='linear', C = 0.1, decision_function_shape = 'ovo', random_state=0))  
])

In [37]:
# Fit SVM (y_tr_2nd was converted to a one-column DataFrame above, so flatten it to 1-D)
model_svm1=pipe_svm1.fit(X_tr_2nd, y_tr_2nd.values.ravel())

In [38]:
# Make prediction on X_valid dataset
y_pred_svm1=pipe_svm1.predict(X_valid1)

In [39]:
# Classification report
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_valid1, y_pred_svm1, target_names=target_names))

              precision    recall  f1-score   support

     class 0       0.95      0.96      0.96      6349
     class 1       0.88      0.87      0.87      6512
     class 2       0.92      0.92      0.92      7139

    accuracy                           0.92     20000
   macro avg       0.92      0.92      0.92     20000
weighted avg       0.92      0.92      0.92     20000



### Save SVM Linear Kernel model for further visualization and selection

In [40]:
svm_Linear_acc=0.90
c1_svm_Linear_f1 = 0.93
c2_svm_Linear_f1 = 0.85
c3_svm_Linear_f1 = 0.93

%store svm_Linear_acc
%store c1_svm_Linear_f1
%store c2_svm_Linear_f1
%store c3_svm_Linear_f1

Stored 'svm_Linear_acc' (float)
Stored 'c1_svm_Linear_f1' (float)
Stored 'c2_svm_Linear_f1' (float)
Stored 'c3_svm_Linear_f1' (float)


### Conclusions
> * The best model candidate for the SVM with Linear Kernel classifier is **model 4**: SVM Linear Kernel, C = 0.1, decision function: ovo
* There was a larger difference between the performances of the model trained on 5 000 and then on 60 000 tweets (respectively 0.86 vs 0.91) at the beginning of the modeling phase. This gap has narrowed over the multiple iterations made to tune the model. The accuracy is now the same and there is a slight improvement in the f1 scores
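
One way to study how performance changes with the amount of training data, sketched here under stated assumptions, is scikit-learn's `learning_curve`. The synthetic dataset from `make_classification`, the reduced pipeline (no PCA step) and the three training sizes are choices made to keep the example fast; they do not reproduce the 5 000 vs 60 000 comparison.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic 3-class problem (assumption, for illustration only)
X_demo, y_demo = make_classification(
    n_samples=1500, n_features=20, n_informative=10,
    n_classes=3, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('SVM', SVC(kernel='linear', C=0.1, random_state=0)),
])

# Fit on 10%, 50% and 100% of each training fold, with 3-fold cross-validation
sizes, train_scores, valid_scores = learning_curve(
    pipe, X_demo, y_demo, train_sizes=[0.1, 0.5, 1.0], cv=3)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1)):
    print(f'{int(n):5d} samples: train={tr:.3f}, valid={va:.3f}')
```

A validation score that keeps rising with the training size suggests that the model benefits from more data, which is the pattern reported between the 5 000 and 60 000 tweet runs.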

### **Create a SVM with RBF Kernel pipeline**

In [41]:
# Create SVM pipeline
pipe2_svm_rbf = Pipeline([
    ('scaler', StandardScaler()), # with standardization StandardScaler()
    ('PCA', PCA(n_components=200)), # 200 components to explain 95% of the variance
    ('SVM_RBF', SVC(kernel='rbf', random_state=0))  
])

In [42]:
# Get parameters
pipe2_svm_rbf.get_params()

{'memory': None,
 'steps': [('scaler',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ('PCA',
   PCA(copy=True, iterated_power='auto', n_components=200, random_state=None,
       svd_solver='auto', tol=0.0, whiten=False)),
  ('SVM_RBF', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
       decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
       kernel='rbf', max_iter=-1, probability=False, random_state=0,
       shrinking=True, tol=0.001, verbose=False))],
 'verbose': False,
 'scaler': StandardScaler(copy=True, with_mean=True, with_std=True),
 'PCA': PCA(copy=True, iterated_power='auto', n_components=200, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 'SVM_RBF': SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
     kernel='rbf', max_iter=-1, probability=False, random_state=0,
     shrinking=True, tol=0.001, verbose=False),
 'scaler__copy': True

### Define the grid of parameters

In [43]:
# Grid of parameters
grid2_svm_rbf = ParameterGrid({
    'PCA__n_components':[200], # nb of components explaining 95% of the variance
    'SVM_RBF__C':[10, 50], # range of C defining the model complexity
    'SVM_RBF__gamma':[0.001, 0.0001,0.00001], # try lower than 0.001
    'SVM_RBF__decision_function_shape':['ovr', 'ovo'], # OnevsRest (ovr)
})

# Print the number of combinations
print('Number of combinations:', len(grid2_svm_rbf))

Number of combinations: 12
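
To give some intuition for the gamma values in this grid, recall that the RBF kernel is $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. The sketch below computes it by hand, checks the result against `sklearn.metrics.pairwise.rbf_kernel`, and shows how the similarity between random points changes with gamma. The random 200-dimensional points (mimicking the 200 PCA components) and the helper `rbf_manual` are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Five random points in 200 dimensions (assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 200))

def rbf_manual(X, gamma):
    """K[i, j] = exp(-gamma * squared Euclidean distance between X[i] and X[j])."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

mean_similarity = {}
for gamma in (0.001, 0.0001, 0.00001):
    K = rbf_manual(X_demo, gamma)
    # The hand-written kernel should agree with scikit-learn's implementation
    assert np.allclose(K, rbf_kernel(X_demo, gamma=gamma))
    off_diagonal = K[~np.eye(len(K), dtype=bool)]
    mean_similarity[gamma] = off_diagonal.mean()
    print(f'gamma={gamma:g}: mean similarity between points = {mean_similarity[gamma]:.3f}')
```

With a smaller gamma every pair of points looks similar, giving a smoother decision function. A larger gamma makes the kernel more local and the model more flexible, which increases the risk of overfitting.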


### Run the model on the sub-train data set (5 000 tweets) and test accuracy on the validation data set (1 500 tweets)

In [44]:
#  Save accuracy on train and validation sets
train_scores=[]
valid_scores = []

# Enumerate combinations starting from 1
for i, params_dict in enumerate(grid2_svm_rbf, 1):
    # Print progress
    print('Combination {}/{}'.format(
        i, len(grid2_svm_rbf) # Total number of combinations
    ))
    
    # Set parameters
    pipe2_svm_rbf.set_params(**params_dict)

    # Fit SVM with RBF kernel
    pipe2_svm_rbf.fit(X_tr_3rd, y_tr_3rd)

    # Save accuracy on train and validation sets
    params_dict['accuracy_train']= pipe2_svm_rbf.score(X_tr_3rd, y_tr_3rd)
    params_dict['accuracy_valid'] = pipe2_svm_rbf.score(X_valid2, y_valid2)
    
    # Save result
    train_scores.append(params_dict)
    valid_scores.append(params_dict)

print('done')

Combination 1/12
Combination 2/12
Combination 3/12
Combination 4/12
Combination 5/12
Combination 6/12
Combination 7/12
Combination 8/12
Combination 9/12
Combination 10/12
Combination 11/12
Combination 12/12
done


In [45]:
# Create DataFrame with train and validation scores
scores_df = pd.DataFrame(valid_scores)
# Print scores
scores_df.sort_values(by='accuracy_valid', ascending=False)

| | PCA__n_components | SVM_RBF__C | SVM_RBF__decision_function_shape | SVM_RBF__gamma | accuracy_train | accuracy_valid |
|---|---|---|---|---|---|---|
| 10 | 200 | 50 | ovo | 0.0001 | 0.8882 | 0.854 |
| 7 | 200 | 50 | ovr | 0.0001 | 0.889 | 0.853333 |
| 0 | 200 | 10 | ovr | 0.001 | 0.961 | 0.842 |
| 3 | 200 | 10 | ovo | 0.001 | 0.9576 | 0.837333 |
| 4 | 200 | 10 | ovo | 0.0001 | 0.8518 | 0.835333 |
| 1 | 200 | 10 | ovr | 0.0001 | 0.8478 | 0.834 |
| 9 | 200 | 50 | ovo | 0.001 | 0.9934 | 0.828667 |
| 8 | 200 | 50 | ovr | 1e-05 | 0.829 | 0.827333 |
| 11 | 200 | 50 | ovo | 1e-05 | 0.832 | 0.822667 |
| 6 | 200 | 50 | ovr | 0.001 | 0.9928 | 0.821333 |
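
The gap between train and validation accuracy is a simple indicator of overfitting. The sketch below recomputes it from the ten rows shown in the score table above (values copied by hand into a new dataframe; the column name `gap` and the shortened column names are introduced here for illustration) and averages it per gamma.

```python
import pandas as pd

# Scores copied from the table above (index = combination number)
scores = pd.DataFrame({
    'C': [50, 50, 10, 10, 10, 10, 50, 50, 50, 50],
    'decision': ['ovo', 'ovr', 'ovr', 'ovo', 'ovo', 'ovr', 'ovo', 'ovr', 'ovo', 'ovr'],
    'gamma': [1e-4, 1e-4, 1e-3, 1e-3, 1e-4, 1e-4, 1e-3, 1e-5, 1e-5, 1e-3],
    'accuracy_train': [0.8882, 0.889, 0.961, 0.9576, 0.8518,
                       0.8478, 0.9934, 0.829, 0.832, 0.9928],
    'accuracy_valid': [0.854, 0.853333, 0.842, 0.837333, 0.835333,
                       0.834, 0.828667, 0.827333, 0.822667, 0.821333],
}, index=[10, 7, 0, 3, 4, 1, 9, 8, 11, 6])

# Difference between train and validation accuracy for each combination
scores['gap'] = scores['accuracy_train'] - scores['accuracy_valid']

# Average gap for each value of gamma
gap_by_gamma = scores.groupby('gamma')['gap'].mean()
print(gap_by_gamma.round(3))
print('largest gap: combination', scores['gap'].idxmax())
```

The gap grows with gamma: the combinations with gamma = 0.001 reach 96% to 99% accuracy on the train set but score lower on the validation set than the gamma = 0.0001 models with C = 50.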


### Evaluation of confusion matrix on most accurate model: SVM with RBF Kernel

In [46]:
# Create SVM pipeline
pipe_svm_rbf1 = Pipeline([
    ('scaler', StandardScaler()), # with standardization StandardScaler()
    ('PCA', PCA(n_components=200)), # 200 components to explain 95% of the variance
    ('SVM_RBF', SVC(kernel='rbf', C=50, decision_function_shape = 'ovo', gamma = 0.0001, random_state=0))  
])

In [47]:
# Fit SVM
model_svm_rbf1=pipe_svm_rbf1.fit(X_tr_3rd, y_tr_3rd)

In [48]:
# Make prediction on X_valid dataset
y_pred_svm_rbf1=pipe_svm_rbf1.predict(X_valid2)

In [49]:
# Classification report
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_valid2, y_pred_svm_rbf1, target_names=target_names))

              precision    recall  f1-score   support

     class 0       0.90      0.89      0.90       470
     class 1       0.79      0.77      0.78       477
     class 2       0.88      0.91      0.89       553

    accuracy                           0.86      1500
   macro avg       0.86      0.86      0.86      1500
weighted avg       0.86      0.86      0.86      1500



### Save SVM RBF Kernel model for further visualization and selection

In [50]:
svm_rbf_acc=0.85
c1_svm_rbf_f1 = 0.89
c2_svm_rbf_f1 = 0.77
c3_svm_rbf_f1 = 0.88

%store svm_rbf_acc
%store c1_svm_rbf_f1
%store c2_svm_rbf_f1
%store c3_svm_rbf_f1

Stored 'svm_rbf_acc' (float)
Stored 'c1_svm_rbf_f1' (float)
Stored 'c2_svm_rbf_f1' (float)
Stored 'c3_svm_rbf_f1' (float)


## Define a cross-validation object with a grid of parameters for the SVC with RBF kernel
> * Even if the SVM model with RBF kernel does not seem to overfit the train dataset, we could evaluate the performance of the models using cross-validation with 5 folds (however, it will be more demanding in terms of computational resources)
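
To give an idea of the extra cost, the sketch below counts the fits required by a 5-fold search over the RBF grid and shows the fold sizes for 5 000 samples. For a classifier and an integer `cv`, `GridSearchCV` uses stratified K-fold splitting, which is mimicked here with `StratifiedKFold`; the random labels `y_demo` and the placeholder `X_demo` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, StratifiedKFold

# Placeholder data with the same number of rows as the tuning train set
rng = np.random.default_rng(0)
y_demo = rng.choice([0, 1, 2], size=5000)
X_demo = np.zeros((5000, 1))

# 5 folds: each model is trained on ~4 000 rows and scored on ~1 000 rows
cv = StratifiedKFold(n_splits=5)
fold_sizes = [(len(tr), len(va)) for tr, va in cv.split(X_demo, y_demo)]
print('fold sizes (train, validation):', fold_sizes)

grid = ParameterGrid({
    'PCA__n_components': [200],
    'SVM_RBF__C': [10, 50],
    'SVM_RBF__gamma': [0.001, 0.0001, 0.00001],
    'SVM_RBF__decision_function_shape': ['ovr', 'ovo'],
})

# One fit per combination and per fold (plus a final refit on all the data)
n_fits = len(grid) * cv.get_n_splits()
print('fits during the search:', n_fits)
```

The manual loop over the same grid needs 12 fits, so the cross-validated search is roughly five times more expensive, in exchange for a mean and a standard deviation of the score for each combination.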

In [51]:
# Create cross-validation object
gridCV_svm_rbf = GridSearchCV(pipe2_svm_rbf, [{
    'PCA__n_components':[200], # nb of components explaining 95% of the variance (previous run with 61 components; 80% explained has been tested)
    'SVM_RBF__C':[10, 50], # range of C defining the model complexity (tested but not good: 1, 0.1)
    'SVM_RBF__gamma':[0.001, 0.0001,0.00001],
    'SVM_RBF__decision_function_shape':['ovr', 'ovo'], # OnevsOne (ovo), OnevsRest (ovr)
}],cv=5)

# Fit estimator
gridCV_svm_rbf.fit(X_tr_3rd, y_tr_3rd)

GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=Pipeline(memory=None,
                                steps=[('scaler',
                                        StandardScaler(copy=True,
                                                       with_mean=True,
                                                       with_std=True)),
                                       ('PCA',
                                        PCA(copy=True, iterated_power='auto',
                                            n_components=200, random_state=None,
                                            svd_solver='auto', tol=0.0,
                                            whiten=False)),
                                       ('SVM_RBF',
                                        SVC(C=50, cache_size=200,
                                            class_weight=None, coef0=0.0,
                                            decision_funct...
                                            max_iter=

In [52]:
# Get the results with "cv_results_"
gridCV_svm_rbf.cv_results_.keys()

dict_keys(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time', 'param_PCA__n_components', 'param_SVM_RBF__C', 'param_SVM_RBF__decision_function_shape', 'param_SVM_RBF__gamma', 'params', 'split0_test_score', 'split1_test_score', 'split2_test_score', 'split3_test_score', 'split4_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score'])

In [53]:
# Collect results in a DataFrame (pd.DataFrame.from_items was removed in pandas 1.0)
df_svm_rbf_cv = pd.DataFrame({
    'PCA': gridCV_svm_rbf.cv_results_['param_PCA__n_components'],
    'C': gridCV_svm_rbf.cv_results_['param_SVM_RBF__C'],
    'Gamma': gridCV_svm_rbf.cv_results_['param_SVM_RBF__gamma'],
    'Decision': gridCV_svm_rbf.cv_results_['param_SVM_RBF__decision_function_shape'],
    'mean_te': gridCV_svm_rbf.cv_results_['mean_test_score'],
    'std_te_score': gridCV_svm_rbf.cv_results_['std_test_score'],
})
# Sort by mean cross-validated accuracy, best first
df_svm_rbf_cv.sort_values(by='mean_te', ascending=False)

| | PCA | C | Gamma | Decision | mean_te | std_te_score |
|---|---|---|---|---|---|---|
| 10 | 200 | 50 | 0.0001 | ovo | 0.826 | 0.018414 |
| 7 | 200 | 50 | 0.0001 | ovr | 0.8228 | 0.010739 |
| 3 | 200 | 10 | 0.001 | ovo | 0.816 | 0.016507 |
| 0 | 200 | 10 | 0.001 | ovr | 0.8148 | 0.019759 |
| 9 | 200 | 50 | 0.001 | ovo | 0.8068 | 0.011877 |
| 6 | 200 | 50 | 0.001 | ovr | 0.803 | 0.019967 |
| 1 | 200 | 10 | 0.0001 | ovr | 0.7996 | 0.013153 |
| 4 | 200 | 10 | 0.0001 | ovo | 0.7964 | 0.017941 |
| 11 | 200 | 50 | 1e-05 | ovo | 0.7874 | 0.016314 |
| 8 | 200 | 50 | 1e-05 | ovr | 0.7862 | 0.017551 |


### Conclusions on SVM RBF with cross-validation
> * The results are not really conclusive: the 2nd best model (with an equivalent set of hyper-parameters) performs worse with cross-validation (mean test accuracy of about 0.826) than with the standard train/valid dataset (accuracy of 0.85; original evaluation prior to the multiple iterations for tuning the original model).
* As a consequence, we will select the standard approach, which provides slightly better results and is less demanding in computing resources.