# Analysis Goal

Perform hyperparameter tuning of an SVM model using hyperopt.

## Hyperparameter Tuning with hyperopt

Hyperparameter tuning is an important step in maximizing the performance of a model. Hyperparameters are settings chosen before training, such as the SVM regularization strength `C` or the kernel type, that control the learning process; unlike model weights, they are not learned from the data. Several Python packages have been developed specifically for this purpose. Scikit-learn provides GridSearchCV and RandomizedSearchCV, two of the more popular options. Outside of scikit-learn, the Optunity, Spearmint and hyperopt packages are all designed for optimization. In this task, we focus on the hyperopt package, which provides algorithms such as the Tree-structured Parzen Estimator (TPE, used here via `tpe.suggest`) that can outperform randomized search.
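For comparison, the randomized-search baseline mentioned above can be sketched with scikit-learn as follows. This is a minimal illustration, not part of the original analysis: the data comes from `make_classification` rather than the dataset used in this task, and the search ranges, `n_iter=10` and `cv=3` are assumptions chosen only to keep the run short.

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data (assumption: stands in for the real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Scaling lives inside the pipeline so every CV fold is scaled on its own training part.
pipe = make_pipeline(StandardScaler(), SVC(class_weight='balanced', random_state=0))

# Distributions to sample from; uniform(loc, scale) covers [loc, loc + scale].
param_distributions = {
    'svc__C': uniform(0.01, 20),
    'svc__kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'svc__gamma': ['scale', 'auto'],
}

search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=10,        # number of random configurations to try
    scoring='f1',
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Randomized search draws every configuration independently, whereas TPE uses the results of earlier trials to decide where to sample next.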


In [12]:
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK
from hyperopt import space_eval

In [15]:
class Model():

    def __init__(self, train_path, test_path, label='Class'):
        train = pd.read_csv(train_path)
        test = pd.read_csv(test_path)
        # Only the first 1000 rows are used to keep experiments fast;
        # remove the :1000 slices for production runs.
        # Features are all columns except the last one (assumed to be the label).
        self.X_train = train.iloc[:1000, :-1]
        self.X_test = test.iloc[:1000, :-1]
        self.y_train = train.iloc[:1000, :][label]
        self.y_test = test.iloc[:1000, :][label]

    def scaler(self):
        """Standardize the features; the scaler is fitted on the training data only."""
        sc = StandardScaler()
        self.scaled_X_train = sc.fit_transform(self.X_train)
        self.scaled_X_test = sc.transform(self.X_test)

    def svm(self, hyper_parameter=None):
        """Fit an SVM with the given hyperparameters and return the test F1 score."""
        # Avoid a mutable default argument.
        if hyper_parameter is None:
            hyper_parameter = {}
        # `svm` here resolves to the sklearn module imported above,
        # not to this method (method names are not local variables).
        clf = svm.SVC(**hyper_parameter, class_weight='balanced',
                      random_state=0)

        clf.fit(self.scaled_X_train, self.y_train)
        self.y_pred = clf.predict(self.scaled_X_test)
        return f1_score(self.y_test, self.y_pred)
    
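# Define the search space for hyperparameter tuning.
space4svm = {
    # SVC requires C > 0; a draw of exactly 0 would raise an error,
    # so a small positive lower bound is the safer choice.
    'C': hp.uniform('C', 0, 20),
    'kernel': hp.choice('kernel', ['linear', 'sigmoid', 'poly', 'rbf']),
    # gamma is used by the 'rbf', 'poly' and 'sigmoid' kernels.
    'gamma': hp.choice('gamma', ['scale', 'auto']),
    # degree is used only by the 'poly' kernel; quniform yields floats (3.0, 4.0, ...).
    'degree': hp.quniform('degree', 3, 10, 1),
    # coef0 is used only by the 'poly' and 'sigmoid' kernels.
    'coef0': hp.uniform('coef0', 0, 10)
}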


def objective(params):
    """
    Train an SVM with the sampled parameters and return the negative F1 score
    as the loss, since fmin minimizes.

    Search-space expressions available in hyperopt:
    hp.choice(label, options): returns one of the options, which should be a list or tuple.
    hp.randint(label, upper): returns a random integer in the range [0, upper).
    hp.uniform(label, low, high): returns a value uniformly between low and high.
    hp.quniform(label, low, high, q): returns round(uniform(low, high) / q) * q,
    i.e. a value quantized to multiples of q. The result is still a float,
    hence the int() cast below.
    hp.normal(label, mu, sigma): returns a real value that is normally distributed
    with mean mu and standard deviation sigma.
    """
    model = Model(
        "data_transformed_annotated_20pct.csv",
        "data_transformed_annotated_10pct.csv")
        #"data_transformed.csv",
        #"data_transformed.csv")
    model.scaler()
    # Copy the sampled parameters and cast degree to int, as SVC expects an integer.
    params = dict(params, degree=int(params['degree']))
    print(params)
    f1 = model.svm(params)
    print(f1)
    # Negate F1 so that minimizing the loss maximizes the score.
    return {'loss': -f1, 'status': STATUS_OK}


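# Initialize the Trials object, which records every evaluation.
trials = Trials()
# Use a fixed seed to get repeatable results.
SEED = 123
# Run the hyperparameter tuning.
# Note: recent hyperopt releases expect a NumPy Generator for rstate,
# i.e. rstate=np.random.default_rng(SEED), instead of RandomState.
best = fmin(fn=objective,
            space=space4svm,
            algo=tpe.suggest,
            max_evals=30,
            trials=trials,
            rstate=np.random.RandomState(SEED))

# For hp.choice parameters, `best` holds the index of the chosen option;
# space_eval maps those indices back to the actual values.
print(best)
print(space_eval(space4svm, best))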


{'C': 13.991786432776268, 'coef0': 2.371507705029048, 'degree': 7, 'gamma': 'scale', 'kernel': 'linear'}               
1.0                                                                                                                    
{'C': 9.269271564690971, 'coef0': 0.8746958685940198, 'degree': 6, 'gamma': 'scale', 'kernel': 'linear'}               
1.0                                                                                                                    
{'C': 0.47346374374090283, 'coef0': 8.364423581948998, 'degree': 8, 'gamma': 'scale', 'kernel': 'rbf'}                 
0.8571428571428571                                                                                                     
{'C': 1.7718619582441852, 'coef0': 1.6216340381955197, 'degree': 8, 'gamma': 'scale', 'kernel': 'poly'}                
1.0                                                                                                                    
{'C': 12.76735262360134, 'coef0': 4.1687
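The `int(...)` cast on `degree` in the objective follows from how `hp.quniform` is defined: it returns `round(uniform(low, high) / q) * q`, which is a float even when `q` is 1. The sketch below imitates that formula in plain Python to show the effect; `quniform_sketch` is a hypothetical helper written for illustration and is not part of hyperopt.

```python
import random


def quniform_sketch(rng, low, high, q):
    """Pure-Python imitation of the hp.quniform formula (illustration only)."""
    return float(round(rng.uniform(low, high) / q) * q)


rng = random.Random(123)

# Same bounds as the 'degree' entry of the search space: low=3, high=10, q=1.
samples = [quniform_sketch(rng, 3, 10, 1) for _ in range(5)]
print(samples)   # whole-valued floats, not ints

# Cast to int before passing the value on as a polynomial degree.
degrees = [int(s) for s in samples]
print(degrees)
```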

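# Conclusion

This task was done in a team of 2 students. The given dataset was analyzed and modeled using an SVM, and its hyperparameters were tuned using hyperopt, an important step in building a learning model. The best parameters found for the SVM model are 'C': 13.991786432776268, 'coef0': 2.371507705029048, 'degree': 7.0, 'gamma': 'scale', 'kernel': 'linear'. Because the selected kernel is linear, only `C` affects the fitted model: `degree`, `coef0` and `gamma` are ignored by scikit-learn's SVC for the linear kernel. These results were obtained on the first 1000 rows of each file only.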
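The best configuration reported above uses the linear kernel, for which scikit-learn's SVC does not use `degree`, `coef0` or `gamma`. A minimal sketch on synthetic data (an assumption, not the dataset from this task) can check this by fitting two linear SVMs that differ only in those three parameters:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Same kernel and C; degree, coef0 and gamma differ (values chosen arbitrarily).
clf_a = SVC(kernel='linear', C=1.0, degree=3, coef0=0.0,
            gamma='scale', random_state=0).fit(X, y)
clf_b = SVC(kernel='linear', C=1.0, degree=7, coef0=2.37,
            gamma='auto', random_state=0).fit(X, y)

# Both models should make identical predictions and learn the same weights.
same_predictions = np.array_equal(clf_a.predict(X), clf_b.predict(X))
same_weights = np.allclose(clf_a.coef_, clf_b.coef_)
print(same_predictions, same_weights)
```

In practice this means that trials sharing `kernel='linear'` and a similar `C` are near-duplicates, so part of the 30-evaluation budget is spent varying parameters that have no effect.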