<h1><center>Support Vector Machine</center></h1>

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.utils import shuffle

The linear SVM minimizes the following cost function:
$$J(w) = \frac{1}{2}\|w\|^2 + C\left[\frac{1}{N}\sum\limits_{i=1}^{N}\max\left(0, 1-y_i (w \cdot x_i + b)\right)\right]$$
The first term of the function corresponds to the margin. The width between the positive and negative hyperplanes is the projection of $x_+ - x_-$ onto the unit normal vector $\frac{w}{\|w\|}$:
$$width = (x_+ - x_-) \cdot \frac{w}{\|w\|}$$
where $x_+$ and $x_-$ lie on the two hyperplanes, whose equations are:
$$y_i (w \cdot x_+ + b) - 1 = 0 \quad\text{and}\quad y_i (w \cdot x_- + b) - 1 = 0 \quad\text{with}\quad y_i = \begin{cases}
  1 & \text{for } x_+ \\
  -1 & \text{for } x_-
\end{cases}$$
Substituting $y_i$ gives $w \cdot x_+ = 1 - b$ and $w \cdot x_- = -1 - b$, hence $w \cdot (x_+ - x_-) = 2$ and we obtain $$width = \frac{2}{\|w\|}$$
We want to maximize this width, which is the same as minimizing $\|w\|$. The trick is to replace $$\min \|w\|$$ by $$\min \frac{1}{2}\|w\|^2$$ which has the same minimizer and a simpler gradient ($w$).
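As a quick numerical check of $width = \frac{2}{\|w\|}$, the sketch below uses hypothetical values $w = (3, 4)$ and $b = -2$ (so $\|w\| = 5$), picks one point on each hyperplane, and projects their difference onto the unit normal. The two points are deliberately not aligned along $w$: the projection removes the component parallel to the hyperplanes.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen so that ||w|| = 5
w = np.array([3.0, 4.0])
b = -2.0

# One point on each hyperplane:
#   w . x_plus  + b = +1   (3*1 + 4*0 - 2 = 1)
#   w . x_minus + b = -1   (3*(-1) + 4*1 - 2 = -1)
x_plus = np.array([1.0, 0.0])
x_minus = np.array([-1.0, 1.0])

# Width = projection of (x_plus - x_minus) onto the unit normal w / ||w||
width = np.dot(x_plus - x_minus, w / np.linalg.norm(w))

print(width, 2 / np.linalg.norm(w))  # both approximately 0.4
```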
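The second term, multiplied by C, is the hinge loss. For each training sample, $\max(0, 1-y_i (w \cdot x_i + b))$ is zero when the sample lies on the correct side of its hyperplane (outside the margin) and grows linearly with how far the sample falls inside the margin or on the wrong side. Minimizing the average of these terms therefore penalizes margin violations and misclassifications. C is a regularization parameter that sets the trade-off between the two terms: a larger C penalizes violations more heavily and results in a narrower margin, while a smaller C tolerates more violations and results in a wider margin. N is the number of training samples (rows of the feature matrix).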
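To make the hinge term concrete, here is a small sketch on hypothetical 1-D data written in the augmented form $X_i = (x_i, 1)$ with $W = (w, b) = (1, 0)$. The helper names `hinge_terms` and `cost` are illustrative, not part of the notebook. Only the sample inside the margin and the misclassified sample contribute, and C simply scales how much those violations weigh against $\frac{1}{2}\|W\|^2$.

```python
import numpy as np

def hinge_terms(W, X, Y):
    # max(0, 1 - y_i * (W . X_i)) for every sample i
    return np.maximum(0, 1 - Y * (X @ W))

def cost(W, X, Y, C):
    # J(W) = 1/2 ||W||^2 + C * mean hinge loss
    return 0.5 * np.dot(W, W) + C * hinge_terms(W, X, Y).mean()

# Hypothetical 1-D samples in augmented form X_i = (x_i, 1); W = (w, b) = (1, 0)
W = np.array([1.0, 0.0])
X = np.array([[2.0, 1.0],    # y=+1, y*(W.X)=2    : outside the margin
              [0.5, 1.0],    # y=+1, y*(W.X)=0.5  : correct side, inside the margin
              [0.5, 1.0],    # y=-1, y*(W.X)=-0.5 : misclassified
              [-3.0, 1.0]])  # y=-1, y*(W.X)=3    : outside the margin
Y = np.array([1.0, 1.0, -1.0, -1.0])

print(hinge_terms(W, X, Y))    # [0.  0.5 1.5 0. ]
print(cost(W, X, Y, C=1.0))    # 1.0
print(cost(W, X, Y, C=100.0))  # 50.5
```

With C = 100 the same two violations dominate the cost, which is why an optimizer run with a large C trades margin width for fewer violations.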

In [2]:
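def compute_cost_function(W, X, Y):
    # hinge distances max(0, 1 - y_i * (W . X_i)) for every sample
    distances = np.maximum(0, 1 - Y * np.dot(X, W))
    # C is the global regularization parameter defined further below
    hinge_loss = C * (np.sum(distances) / X.shape[0])
    # W includes the bias, so the margin term also regularizes b
    return 0.5 * np.dot(W, W) + hinge_loss  # total cost J(W)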

Next, we derive the gradient of the cost function.

To simplify the notation, the bias is folded into the weight vector: with $W = (w, b)$ and $X_i = (x_i, 1)$ we have $W \cdot X_i = w \cdot x_i + b$, so $$\max(0, 1-y_i (w \cdot x_i + b))$$ becomes $$\max(0, 1-y_i (W \cdot X_i))$$
With this simplification, and regularizing the whole vector $W$ as the code does (so the bias $b$ is also penalized), the cost function becomes:

$$J(W) = \frac{1}{2}\|W\|^2 + C\left[\frac{1}{N}\sum\limits_{i=1}^{N}\max(0, 1-y_i (W \cdot X_i))\right]$$
and finally, since the hinge loss is not differentiable at its kink, we use the following (sub)gradient of the cost function:

$$\nabla_W J(W) = \frac{1}{N}\sum\limits_{i=1}^{N}
\begin{cases}
  W & \text{if } \max(0, 1-y_i (W \cdot X_i))=0 \\
  W - C y_i X_i & \text{otherwise}
\end{cases}$$
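One way to gain confidence in this piecewise formula is a finite-difference check: away from the kinks of the hinge loss, central differences of $J(W)$ should match the analytic gradient. The sketch below uses hypothetical random data (with the constant-1 bias column appended) and illustrative helper names; it transcribes the formula literally as a loop.

```python
import numpy as np

def cost(W, X, Y, C):
    # J(W) = 1/2 ||W||^2 + C * (1/N) * sum_i max(0, 1 - y_i * (W . X_i))
    return 0.5 * np.dot(W, W) + C * np.mean(np.maximum(0, 1 - Y * (X @ W)))

def gradient_from_formula(W, X, Y, C):
    # Literal transcription of the piecewise formula, averaged over the N samples
    total = np.zeros_like(W)
    for x_i, y_i in zip(X, Y):
        if max(0, 1 - y_i * np.dot(W, x_i)) == 0:
            total += W
        else:
            total += W - C * y_i * x_i
    return total / len(Y)

# Hypothetical random data, augmented with a constant-1 column for the bias
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(20, 3)), np.ones((20, 1))])
Y = rng.choice([-1.0, 1.0], size=20)
W = rng.normal(size=4)
C = 10.0

# Central finite differences along each coordinate direction
eps = 1e-6
numeric = np.array([
    (cost(W + eps * e, X, Y, C) - cost(W - eps * e, X, Y, C)) / (2 * eps)
    for e in np.eye(len(W))
])
analytic = gradient_from_formula(W, X, Y, C)
print(np.max(np.abs(numeric - analytic)))  # expected to be tiny
```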

In [3]:
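def compute_cost_function_gradient(W, X, Y):
    # accept a single sample (1-D X, scalar Y) as well as a batch (2-D X, 1-D Y)
    Y = np.atleast_1d(Y)
    X = np.atleast_2d(X)
    distances = 1 - (Y * np.dot(X, W))
    grad = np.zeros(len(W))
    for index, value in enumerate(distances):
        if max(0, value) == 0:
            # sample outside the margin: only the regularization term contributes
            dist = W
        else:
            # margin violation: the hinge loss adds -C * y_i * X_i (C is global)
            dist = W - (C * Y[index] * X[index])
        grad += dist
    return grad / len(Y)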
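The loop above can also be written without Python-level iteration. Averaging $W$ over the $N$ samples gives $W$ itself, so only the margin violators add $-\frac{C}{N}\sum y_i X_i$. The sketch below is a hypothetical `batch_gradient` helper (C passed explicitly instead of read from a global), checked against a loop with the same logic as `compute_cost_function_gradient` on random data; it could serve as a drop-in for mini-batch updates.

```python
import numpy as np

def loop_gradient(W, X, Y, C):
    # per-sample loop, same logic as compute_cost_function_gradient with explicit C
    grad = np.zeros(len(W))
    for x_i, y_i in zip(X, Y):
        if max(0, 1 - y_i * np.dot(x_i, W)) == 0:
            grad += W
        else:
            grad += W - C * y_i * x_i
    return grad / len(Y)

def batch_gradient(W, X, Y, C):
    # Hypothetical vectorized version: W - (C / N) * sum over violators of y_i * X_i
    violators = (1 - Y * (X @ W)) > 0
    return W - C * (Y[violators] @ X[violators]) / len(Y)

# Hypothetical random data with the constant-1 bias column appended
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(50, 4)), np.ones((50, 1))])
Y = rng.choice([-1.0, 1.0], size=50)
W = rng.normal(size=5)

print(np.allclose(loop_gradient(W, X, Y, C=10.0), batch_gradient(W, X, Y, C=10.0)))  # True
```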

Now we implement stochastic gradient descent. The goal is to minimize both terms of:
$$J(W) = \frac{1}{2}\|W\|^2 + C\left[\frac{1}{N}\sum\limits_{i=1}^{N}\max(0, 1-y_i (W \cdot X_i))\right]$$
The gradient points in the direction in which J(W) increases, so we move in the opposite direction. Starting from weights initialized to zero, each cycle (epoch) shuffles the training set and, for every sample, subtracts the learning rate multiplied by that sample's gradient from the weights. Repeating this for a fixed number of cycles (here 2048) gives the final weights. To avoid evaluating the full cost at every cycle, the cost is only computed at cycles that are powers of two (1, 2, 4, ..., 2048). This also provides an early-stopping criterion: if the difference between the previous and the new cost is smaller than a fraction (here 0.1%) of the previous cost, the loop stops and the weights are returned immediately.
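The procedure just described can be sketched end to end on a small problem. The function `sgd_linear_svm`, its default hyperparameters and the toy data are all hypothetical; NumPy's `permutation` stands in for `sklearn.utils.shuffle` to keep the block self-contained. It keeps the same structure as the notebook: per-sample updates, cost logged at power-of-two cycles, and the relative-change stopping rule.

```python
import numpy as np

def sgd_linear_svm(X, Y, C=1.0, learning_rate=0.01, max_cycles=64,
                   stop_tolerance=0.001, seed=0):
    """Hypothetical SGD for J(W) = 1/2 ||W||^2 + C * mean hinge loss.

    X is assumed to already contain the constant-1 bias column.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros(X.shape[1])
    previous_cost = float("inf")
    next_log = 1
    for cycle in range(1, max_cycles + 1):
        # one cycle (epoch): visit every sample once, in a fresh random order
        for i in rng.permutation(len(Y)):
            grad = W.copy()                       # gradient of the margin term
            if 1 - Y[i] * np.dot(W, X[i]) > 0:    # sample i violates the margin
                grad -= C * Y[i] * X[i]
            W = W - learning_rate * grad
        # evaluate the full cost only at cycles 1, 2, 4, 8, ...
        if cycle == next_log:
            cost = 0.5 * np.dot(W, W) + C * np.mean(np.maximum(0, 1 - Y * (X @ W)))
            next_log *= 2
            if abs(previous_cost - cost) < stop_tolerance * previous_cost:
                break                             # relative improvement below tolerance
            previous_cost = cost
    return W

# Hypothetical, well-separated 2-D toy data with the bias column appended
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(3.0, 0.5, size=(50, 2)),
                    rng.normal(-3.0, 0.5, size=(50, 2))])
X = np.hstack([points, np.ones((100, 1))])
Y = np.concatenate([np.ones(50), -np.ones(50)])

W = sgd_linear_svm(X, Y)
print("training accuracy:", np.mean(np.sign(X @ W) == Y))
```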

In [4]:
def compute_stochastic_gradient_descent(features, outputs):
    # learning_rate and C are globals defined in the next cell
    max_cycles = 2048        # maximum number of passes (epochs) over the training set
    stop_tolerance = 0.001   # stop when the cost improves by less than 0.1%
    weights = np.zeros(features.shape[1])
    parameter, cycle = 0, 0  # the cost is logged when cycle == 2**parameter
    previous_cost = float("inf")
    while cycle < max_cycles:
        cycle += 1
        X, Y = shuffle(features, outputs)
        for index, value_of_X in enumerate(X):
            gradient = compute_cost_function_gradient(weights, value_of_X, Y[index])
            weights = weights - (learning_rate * gradient)
        if cycle == 2**parameter:
            cost = compute_cost_function(weights, features, outputs)
            print("nb_of_cycles: {} and Cost: {}".format(cycle, cost))
            parameter += 1
            print('cost=', cost)
            if abs(previous_cost - cost) < stop_tolerance * previous_cost:
                cycle = max_cycles   # early stop: leaves the while loop
            previous_cost = cost

    return weights

In [5]:
C = 10000                 # regularization parameter of the hinge-loss term
learning_rate = 0.000001  # step size of the gradient descent
# read and display the dataset
data = pd.read_csv('merge_cyto_periplasm.csv')
print(data)

              id        v1        v2        v3        v4        v5        v6  \
0     A0A2X4V775 -0.009765  0.200064 -0.104441 -0.939137 -0.015482 -0.243657   
1     A0A5M9R2F2 -0.044548  0.192516 -0.147199 -0.894085 -0.009355 -0.330328   
2     A0A4D6Y563 -0.063157  0.066017 -0.203133 -0.920998 -0.067741 -0.103863   
3         P76341  0.036377  0.241014 -0.051348 -0.937244  0.013017 -0.207041   
4         A0Z7F9  0.140453  0.229471 -0.070225 -0.960873 -0.028735 -0.324649   
...          ...       ...       ...       ...       ...       ...       ...   
3995  A0A484GB52 -0.020858  0.087987 -0.106980 -0.970410 -0.041472 -0.208587   
3996  A0A5Q4GKP0  0.045559  0.169432 -0.123162 -0.986888 -0.052270 -0.131644   
3997  A0A2E4GIK9 -0.056719  0.077032 -0.154847 -0.982715 -0.055467 -0.096901   
3998  A0A2M7CQW5  0.015895  0.210303 -0.065212 -0.978872 -0.021216 -0.095097   
3999  A0A368CC27  0.020618  0.109883 -0.113834 -0.971576 -0.032970 -0.167549   

            v7        v8        v9  ...

In [6]:
# convert the existing labels (1 and 2) into +1 and -1
diag_map = {1: 1.0, 2: -1.0}
data['class'] = data['class'].map(diag_map)
print(data)

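`Series.map` with a dictionary turns any value missing from the dictionary into NaN, so a label other than 1 or 2 would silently propagate NaN into training. The sketch below uses hypothetical labels, including an unexpected 3, to show the effect; running `data['class'].isna().sum()` after the mapping is a cheap way to confirm it is 0 for the real data.

```python
import pandas as pd

# Hypothetical labels: 1 and 2 as in the notebook, plus an unexpected 3
labels = pd.Series([1, 2, 2, 1, 3])
mapped = labels.map({1: 1.0, 2: -1.0})

print(mapped.tolist())      # [1.0, -1.0, -1.0, 1.0, nan]
print(mapped.isna().sum())  # 1
```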

In [7]:
# assign labels and features to separate data frames
# (features are the positional columns 1 to 192; column 0 holds the id)
Y = data.loc[:, 'class']
X = data.iloc[:, 1:193]

In [8]:
# normalize every feature to [0, 1] to avoid overflow during training
# (the scaler is fit on the full dataset, before the train/test split)
X_normalized = MinMaxScaler().fit_transform(X.values)
X = pd.DataFrame(X_normalized)
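Because `MinMaxScaler` is fit here on all rows before `train_test_split`, the minimum and maximum of the test rows influence how the training features are scaled, a mild form of test-set leakage. A leak-free ordering is sketched below on hypothetical random features standing in for the 192 feature columns: split first, fit the scaler on the training rows only, then append the constant-1 bias column.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical random features and labels standing in for the real dataset
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 5))
labels = rng.choice([-1.0, 1.0], size=100)

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=42)

scaler = MinMaxScaler().fit(X_tr)     # min/max learned from the training rows only
X_tr_scaled = scaler.transform(X_tr)  # training columns lie in [0, 1]
X_te_scaled = scaler.transform(X_te)  # test values may fall slightly outside [0, 1]

# append the constant-1 bias column after scaling, as the notebook does with 'b'
X_tr_scaled = np.hstack([X_tr_scaled, np.ones((len(X_tr_scaled), 1))])
X_te_scaled = np.hstack([X_te_scaled, np.ones((len(X_te_scaled), 1))])
```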

In [9]:
# append a column 'b' filled with 1 so that X_i = (x_i, 1) and the last weight acts as the bias b
X.insert(loc=len(X.columns), column='b', value=1)
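The constant column is what lets the code use a single dot product $W \cdot X_i$ in place of $w \cdot x_i + b$. A tiny check with hypothetical values of $w$, $b$ and $x$:

```python
import numpy as np

# Hypothetical weights, bias and sample (illustrative values only)
w = np.array([0.5, -1.0, 2.0])
b = 0.25
x = np.array([4.0, 1.0, -0.5])

W = np.append(w, b)        # W = (w, b)
X_aug = np.append(x, 1.0)  # X_i = (x_i, 1), the role of column 'b'

print(np.dot(W, X_aug), np.dot(w, x) + b)  # 0.25 0.25
```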

In [10]:
#split dataset to obtain train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

In [11]:
#train the model
print("training start")
W = compute_stochastic_gradient_descent(X_train.to_numpy(), y_train.to_numpy())
print("training end")
print("weights are: {}".format(W))

training start
nb_of_cycles: 1 and Cost: 799.9733285398822
cost= 799.9733285398822
nb_of_cycles: 2 and Cost: 558.163733566956
cost= 558.163733566956
nb_of_cycles: 4 and Cost: 475.9975838082105
cost= 475.9975838082105
nb_of_cycles: 8 and Cost: 455.8999913014187
cost= 455.8999913014187
nb_of_cycles: 16 and Cost: 378.2663782243924
cost= 378.2663782243924
nb_of_cycles: 32 and Cost: 327.32891585969065
cost= 327.32891585969065
nb_of_cycles: 64 and Cost: 301.23904110024125
cost= 301.23904110024125
nb_of_cycles: 128 and Cost: 322.8423226987466
cost= 322.8423226987466
nb_of_cycles: 256 and Cost: 259.04151276435647
cost= 259.04151276435647
nb_of_cycles: 512 and Cost: 280.1468152150309
cost= 280.1468152150309
nb_of_cycles: 1024 and Cost: 329.6807707259754
cost= 329.6807707259754
nb_of_cycles: 2048 and Cost: 263.7403646072589
cost= 263.7403646072589
training end
weights are: [ 7.50694442e-01  2.77537955e+00 -1.18015851e+00 -9.74146563e-01
  1.60852501e+00 -1.39915874e-01  9.62205857e-01 -2.4747656
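The logged costs above can be fed back into the early-stopping rule to see how it behaved during this run. The sketch copies the printed values; the 0.1% threshold matches the tolerance in `compute_stochastic_gradient_descent`. The smallest relative change between consecutive logs is about 4%, so the rule never fired and all cycles ran. The cost also rose at cycles 128, 512 and 1024, which is consistent with the noise of per-sample updates.

```python
# Costs printed in the training output above, at cycles 1, 2, 4, ..., 2048
costs = [799.9733285398822, 558.163733566956, 475.9975838082105, 455.8999913014187,
         378.2663782243924, 327.32891585969065, 301.23904110024125, 322.8423226987466,
         259.04151276435647, 280.1468152150309, 329.6807707259754, 263.7403646072589]

stop_tolerance = 0.001  # same 0.1% threshold as in the training function
relative_changes = [abs(prev - cur) / prev for prev, cur in zip(costs, costs[1:])]

print(min(relative_changes))                              # about 0.042
print(any(r < stop_tolerance for r in relative_changes))  # False: no early stop
# logged cycles (2**i) at which the cost went up compared to the previous log
print([2 ** i for i in range(1, len(costs)) if costs[i] > costs[i - 1]])  # [128, 512, 1024]
```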

In [12]:
print("apply the model on test set:")
# predicted label = sign(W . X_i); the constant-1 column 'b' makes this include the bias
y_train_predicted = np.sign(np.dot(X_train.to_numpy(), W))  # computed for reference, not reported
y_test_predicted = np.sign(np.dot(X_test.to_numpy(), W))

print("accuracy of svm on test dataset: {}".format(accuracy_score(y_test, y_test_predicted)))
print("precision of svm on test dataset: {}".format(precision_score(y_test, y_test_predicted)))

apply the model on test set:
accuracy of svm on test dataset: 0.9825
precision of svm on test dataset: 0.9811320754716981
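With labels encoded as +1/-1, scikit-learn's `precision_score` and `recall_score` treat +1 (original label 1) as the positive class by default (`pos_label=1`). The notebook imports `recall_score` without using it; the sketch below, on hypothetical labels and predictions, shows what each metric counts and checks them against the formulas $TP/(TP+FP)$ and $TP/(TP+FN)$.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels and predictions in the {+1, -1} encoding used by the notebook
y_true = np.array([1, 1, 1, -1, -1, -1, 1, -1])
y_pred = np.array([1, 1, -1, -1, -1, 1, 1, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))   # 3
fp = np.sum((y_pred == 1) & (y_true == -1))  # 2
fn = np.sum((y_pred == -1) & (y_true == 1))  # 1

print(accuracy_score(y_true, y_pred))                   # 0.625
print(precision_score(y_true, y_pred), tp / (tp + fp))  # 0.6 0.6
print(recall_score(y_true, y_pred), tp / (tp + fn))     # 0.75 0.75
```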
