# Predicting Heart Disease using Support Vector Machines
This code trains a non-linear SVM classifier $(\mathbf w, b)$ such that $\text{sgn}(\mathbf w^T\phi(\mathbf x)+b) = 1$ if a patient with attributes $\mathbf x$ is expected to develop heart disease and $-1$ otherwise. From domain knowledge, a soft slack penalty of $l(\xi_i) = \log\left(1+\exp\left(\xi_i\right)\right)$ is known to be necessary. This gives us the following optimization problem:
$$\min_{\mathbf w \in \mathbb R^d, b\in \mathbb R, \xi \in \mathbb R^n} \frac12 \mathbf w^T\mathbf w + C\displaystyle\sum_{i = 0}^{n-1} l(\xi_i)$$ $$\text{subject to } \xi_i = -y_i(\mathbf w^T\phi(\mathbf x_i) + b)$$
which is a convex non-linear programming problem. We use ```scipy.optimize.minimize``` to solve it iteratively, and compare the results of our kernelized, soft-slack classifier with those produced by ```sklearn.svm.LinearSVC```.
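
As a minimal sketch of the primal objective above, the snippet below evaluates $\frac12 \mathbf w^T\mathbf w + C\sum_i l(\xi_i)$ with $\xi_i = -y_i(\mathbf w^T\phi(\mathbf x_i)+b)$ on toy data. The helper names `softplus` and `primal_objective`, the identity feature map, and the toy values are assumptions made for illustration; `np.logaddexp` is used so that $\log(1+e^{\xi})$ does not overflow for large $\xi$.

```python
import numpy as np

def softplus(xi):
    """Slack penalty l(xi) = log(1 + exp(xi)), computed without overflow."""
    return np.logaddexp(0.0, xi)

def primal_objective(w, b, Phi, y, C):
    """0.5 * w^T w + C * sum_i l(xi_i), with xi_i = -y_i (w^T phi(x_i) + b)."""
    xi = -y * (Phi @ w + b)
    return 0.5 * w @ w + C * np.sum(softplus(xi))

# Toy data (assumed): two points, identity feature map phi(x) = x
Phi = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.zeros(2)

# With w = 0 and b = 0 every xi_i is 0, so each point contributes log(2)
value = primal_objective(w, 0.0, Phi, y, C=1.0)
print(value)
```

Because the penalty is smooth and strictly positive everywhere, every training point contributes to the objective, unlike the hinge loss where well-classified points contribute nothing.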

## Importing necessary modules

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from scipy.optimize import minimize
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.svm import LinearSVC

## Performing dual Lagrangian optimization
To formulate the Lagrangian of the given optimization problem, we introduce $n$ Lagrange multipliers $\alpha_i$, one for each constraint $\xi_i = -y_i(\mathbf w^T\phi(\mathbf x_i)+b)$.

The Lagrangian function is given by:
$\mathcal L(w,b,\xi,\alpha) = \frac12 ||\mathbf w||^2 + C\displaystyle\sum_{i = 0}^{n-1} l(\xi_i) - \displaystyle\sum_{i=0}^{n-1} \alpha_i (\xi_i + y_i(\mathbf w^T\phi(\mathbf x_i) + b))$

where $l(x) = \log(1+e^x)$

### KKT conditions
The corresponding KKT conditions for the above optimization problem are:

#### Stationarity:

$\nabla_{\mathbf w} \mathcal L(\mathbf w,b,\xi,\alpha) = 0 \implies \mathbf w =\displaystyle\sum_{i=0}^{n-1}\alpha_iy_i \phi(\mathbf x_i)$

$\nabla_{b} \mathcal L(\mathbf w,b,\xi,\alpha) = 0 \implies \displaystyle\sum_{i=0}^{n-1}\alpha_iy_i = 0$

$\nabla_{\xi_i} \mathcal L(\mathbf w,b,\xi,\alpha) = 0 \implies Cl'(\xi_i)-\alpha_i = 0\implies \alpha_i = \frac{C}{1+\exp(-\xi_i)} \implies \xi_i = \log\left(\frac{\alpha_i}{C-\alpha_i}\right)$

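#### Primal feasibility:

$\xi_i = -y_i(\mathbf w^T\phi(\mathbf x_i) + b)$ for all $i$

#### Dual feasibility:

The constraints are equalities, so the multipliers carry no sign restriction of their own. The stationarity condition in $\xi_i$ nevertheless gives $\alpha_i = \frac{C}{1+\exp(-\xi_i)}$, hence $0 < \alpha_i < C$ for all $i$.

#### Complementary slackness:

Not applicable: complementary slackness arises only from inequality constraints, and this problem has none.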

Using the KKT conditions, we can eliminate $\mathbf w$, $b$ and $\xi$ from the Lagrangian to get the dual optimization problem.
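
The stationarity condition in $\xi_i$ can be checked numerically. The sketch below, with an assumed value of $C$ and an assumed grid of $\xi$ values, confirms that $\alpha_i = C/(1+\exp(-\xi_i))$ and $\xi_i = \log(\alpha_i/(C-\alpha_i))$ are inverses of each other, and that $l'(\xi) = 1/(1+\exp(-\xi))$ agrees with a central finite difference of $l$.

```python
import numpy as np

C = 2.0                              # assumed value, for illustration only
xi = np.linspace(-5.0, 5.0, 11)      # assumed grid of slack values

# Stationarity in xi_i: alpha_i = C * l'(xi_i) = C / (1 + exp(-xi_i))
alpha = C / (1.0 + np.exp(-xi))

# Inverting the relation recovers xi_i = log(alpha_i / (C - alpha_i))
xi_back = np.log(alpha / (C - alpha))

# Central finite difference of l(t) = log(1 + exp(t))
h = 1e-6
l = lambda t: np.logaddexp(0.0, t)
deriv = (l(xi + h) - l(xi - h)) / (2 * h)

print(np.max(np.abs(xi_back - xi)))
print(np.max(np.abs(C * deriv - alpha)))
```

Since the logistic function takes values strictly between 0 and 1, every multiplier obtained this way lies strictly between 0 and $C$.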

## Dual optimization statement
The dual optimization problem is given by:

$$\displaystyle\max_{\alpha}\mathcal L_D$$
$$\text{subject to } 0\leq\alpha_i\leq C\ \forall i\in\{0,1,2,\dots, n-1\} \text{ and } \displaystyle\sum_{i=0}^{n-1}\alpha_iy_i = 0$$
We obtain $\mathcal L_D$ by substituting the stationarity conditions into the Lagrangian, term by term:

$$\tag 1 \frac12||\mathbf w||^2 = \frac12\left(\displaystyle\sum_{i=0}^{n-1}\alpha_i y_i \phi(x_i)\right)^T\left(\displaystyle\sum_{i=0}^{n-1}\alpha_i y_i \phi(x_i)\right) = \frac12\displaystyle\sum_{i=0}^{n-1}\displaystyle\sum_{j=0}^{n-1}\alpha_i\alpha_jy_iy_jK(x_i,x_j)$$
$$\tag 2 C\displaystyle\sum_{i=0}^{n-1}l(\xi_i) = C\displaystyle\sum_{i=0}^{n-1}\log(1+\exp\log\left(\frac{\alpha_i}{C-\alpha_i}\right)) = C\displaystyle\sum_{i=0}^{n-1}\log\left(\frac{C}{C-\alpha_i}\right)$$
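$$\displaystyle\sum_{i=0}^{n-1}\alpha_i (\xi_i + y_i(\mathbf w^T\phi(\mathbf x_i) + b)) = \displaystyle\sum_{i=0}^{n-1}\alpha_i\xi_i + \mathbf w^T\displaystyle\sum_{i=0}^{n-1}\alpha_iy_i\phi(\mathbf x_i) + b\displaystyle\sum_{i=0}^{n-1}\alpha_iy_i$$
$$\tag 3 = \displaystyle\sum_{i=0}^{n-1}\alpha_i\log\left(\frac{\alpha_i}{C-\alpha_i}\right) + ||\mathbf w||^2$$
The $b$ term vanishes because $\sum_i \alpha_iy_i = 0$, while $\mathbf w^T\sum_i\alpha_iy_i\phi(\mathbf x_i) = \mathbf w^T\mathbf w = ||\mathbf w||^2$, which is twice the quantity in $(1)$. Therefore:
$$\mathcal L_D= (1)+(2)-(3) = -\frac12\displaystyle\sum_{i=0}^{n-1}\displaystyle\sum_{j=0}^{n-1}\alpha_i\alpha_jy_iy_jK(x_i,x_j) + C\displaystyle\sum_{i=0}^{n-1}\log\left(\frac{C}{C-\alpha_i}\right) - \displaystyle\sum_{i=0}^{n-1}\alpha_i\log\left(\frac{\alpha_i}{C-\alpha_i}\right)$$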

Here, $K(x_i, x_j) = \phi(x_i)^T\phi(x_j)$ is the kernel function, which computes the inner product of two inputs in the higher-dimensional feature space without forming $\phi$ explicitly.
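
As a sanity check on the elimination, the sketch below evaluates the Lagrangian directly at a point that satisfies the stationarity conditions and compares it with an expression that depends on $\alpha$ and the kernel matrix only, namely $-\frac12\sum_{i,j}\alpha_i\alpha_jy_iy_jK(x_i,x_j) + C\sum_i\log\frac{C}{C-\alpha_i} - \sum_i\alpha_i\log\frac{\alpha_i}{C-\alpha_i}$. The random toy data, the identity feature map $\phi(x) = x$, and the values of `C` and `b` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, C, b = 6, 3, 1.0, 0.7
X = rng.normal(size=(n, d))                 # identity feature map: phi(x) = x
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

# Multipliers inside (0, C), shifted so that sum_i alpha_i * y_i = 0
alpha = C * rng.uniform(0.3, 0.7, size=n)
alpha -= y * (alpha @ y) / n

# Stationarity conditions for w and xi
w = X.T @ (alpha * y)
xi = np.log(alpha / (C - alpha))

# Lagrangian evaluated at the stationary point (b is arbitrary here)
lagrangian = (0.5 * w @ w + C * np.sum(np.logaddexp(0.0, xi))
              - np.sum(alpha * (xi + y * (X @ w + b))))

# Closed form in terms of alpha and the kernel matrix only
K = X @ X.T
ay = alpha * y
dual = (-0.5 * ay @ K @ ay + C * np.sum(np.log(C / (C - alpha)))
        - np.sum(alpha * np.log(alpha / (C - alpha))))

print(lagrangian, dual)
```

The two values agree for any choice of `b`, which reflects the fact that the intercept drops out of the dual once $\sum_i \alpha_i y_i = 0$ holds.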

In [None]:
def preprocess(data):
    # Separate the features and class 
    X = data.drop('target', axis=1)
    y = data['target']

    # y has labels 0 and 1. Convert them to -1 and 1
    y = 2 * y - 1

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # index of the splits are randomized. Reset the index
    X_train = X_train.reset_index(drop=True)
    X_test = X_test.reset_index(drop=True)
    y_train = y_train.reset_index(drop=True)
    y_test = y_test.reset_index(drop=True)


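    # Normalize the data using the training-set statistics only,
    # so that the test set is scaled exactly like the training set
    mean, std = X_train.mean(), X_train.std()
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std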

    # convert the dataframes to numpy arrays
    X_train = X_train.values
    X_test = X_test.values
    y_train = y_train.values
    y_test = y_test.values

    return X_train, X_test, y_train, y_test

In [None]:
# Define the SVM class
class SVM_classifier:
    # Constructor takes the necessary hyper-parameters like C, the kernel type and the hyper-parameters related to the kernels as input and initialize the module
    def __init__(self, kernel='linear', C=1, degree=3, gamma=0.5):
        self.kernel = kernel
        self.C = C
        self.degree = degree
        self.gamma = gamma

    # Take the train data as input and learn the parameters
    def fit(self, X, y):
        # Define the kernel function
        def kernel(x1, x2):
            if self.kernel == 'linear':
                return np.dot(x1, x2)
            elif self.kernel == 'poly':
                return (np.dot(x1, x2) + 1) ** self.degree
            elif self.kernel == 'rbf':
                # return np.exp(-self.gamma * np.dot(x1 - x2, x1 - x2))
                return np.exp(-self.gamma * (np.linalg.norm(x1 - x2) ** 2))
            
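        # Define the objective function: the negated dual
        # -L_D = 0.5 * sum_ij a_i a_j y_i y_j K_ij - C * sum_i log(C / (C - a_i)) + sum_i a_i log(a_i / (C - a_i)),
        # so that minimizing it maximizes L_D
        def objective(a, K):
            # keep a strictly inside (0, C) so that the logarithms stay finite
            eps = 1e-10 * self.C
            a = np.clip(a, eps, self.C - eps)
            return 0.5 * np.sum(np.outer(a, a) * np.outer(y, y) * K) - self.C * np.sum(np.log(self.C / (self.C - a))) + np.sum(a * np.log(a / (self.C - a)))
        
        # Define the constraint function: sum_i a_i * y_i, which must equal zero
        def zerofun(a):
            return np.dot(a, y)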
        
        # Define the kernel matrix
        n_samples, n_features = X.shape
        K = np.zeros((n_samples, n_samples))
        for i in range(n_samples):
            for j in range(n_samples):
                K[i, j] = kernel(X[i], X[j])

        # use scipy.optimize.minimize to solve the dual problem
        # (not a quadratic program: the objective contains logarithmic terms)
        # define the initial values of the lagrange multipliers
        init_a = np.full(n_samples, self.C/2)

        # define the bounds for the lagrange multipliers
        bounds = [(0, self.C) for _ in range(n_samples)]

        # define the equality constraint sum_i a_i * y_i = 0
        constraints = {'type': 'eq', 'fun': zerofun}

        # minimize the objective function
        res = minimize(objective, init_a, args=(K,), method='SLSQP', bounds=bounds, constraints=constraints)

        # get the lagrange multipliers
        a = res.x

        # print("The lagrange multipliers are: ", a)

        # Get the support vectors
        sv = a > 1e-5
        ind = np.arange(len(a))[sv]

        # save the support vectors
        self.a = a[sv]
        self.sv = X[sv]
        self.sv_y = y[sv]

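        # Get the intercept by averaging over the support vectors only
        # (self.sv_y and ind have one entry per support vector, not per sample)
        self.b = 0
        for n in range(len(self.a)):
            self.b += self.sv_y[n]
            self.b -= np.sum(self.a * self.sv_y * K[ind[n], sv])
        self.b /= len(self.a)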


    # take the test data as input and return the predictions on the data
    def predict(self, X):
        y_pred = np.zeros(len(X))
        for i in range(len(X)):
            s = 0
            for a, sv_y, sv in zip(self.a, self.sv_y, self.sv):
                if self.kernel == 'linear':
                    s += a * sv_y * np.dot(X[i], sv)
                elif self.kernel == 'poly':
                    s += a * sv_y * (np.dot(X[i], sv) + 1) ** self.degree
                elif self.kernel == 'rbf':
                    s += a * sv_y * np.exp(-self.gamma * np.linalg.norm(X[i] - sv) ** 2)
            y_pred[i] = s
        return np.sign(y_pred + self.b)
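
The kernel matrix in `fit` and the decision values in `predict` are built one pair of samples at a time. A minimal sketch of the same computation done on whole arrays is given below; the helper names `rbf_gram` and `poly_gram` are hypothetical and not part of the class above, and the random data is only there to compare against the pairwise loop.

```python
import numpy as np

def rbf_gram(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for all pairs of rows."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    # round-off can make tiny squared distances slightly negative
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def poly_gram(A, B, degree):
    """K[i, j] = (A[i] . B[j] + 1) ** degree for all pairs of rows."""
    return (A @ B.T + 1.0) ** degree

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

K_fast = rbf_gram(X, X, gamma=0.5)
# Reference: the pairwise formula used in the class above
K_loop = np.array([[np.exp(-0.5 * np.linalg.norm(xi - xj) ** 2) for xj in X]
                   for xi in X])
print(np.max(np.abs(K_fast - K_loop)))
```

With such a helper, the decision values for a batch of test points could be written as `rbf_gram(X_test, sv, gamma) @ (a * sv_y) + b`, which avoids the Python-level loops.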
    

In [None]:
# Takes data and the kernel type as input and loops through different values of C and returns the optimal value of C
def get_optimal_C(X, y, kernel):

    # split the data into training and validation data in ratio 90:10
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

    # define the list of values of C
    C_list = [0.01, 0.1, 1, 10, 100]

    opt_c = 0
    max_acc = 0
    opt_deg = 0
    opt_gamma = 0

    acc_vals = []

    # loop through the values of C
    if kernel=='linear':
        for c in C_list:
            # initialize the svm class
            svm = SVM_classifier(kernel=kernel, C=c)

            # fit the model
            svm.fit(X_train, y_train)

            # predict the labels on the test data
            y_pred = svm.predict(X_test)

            # get the classification report
            report = classification_report(y_test, y_pred, output_dict=True)

            # get the accuracy
            acc = report['accuracy']
            acc_vals.append(acc)

            # check if the accuracy is greater than the maximum accuracy
            if acc > max_acc:
                max_acc = acc
                opt_c = c
            
            print("C =", c, 'accuracy =', acc)
        # End for
        
    elif kernel=='poly':
        for degree in [2, 3, 5, 7]:
            for c in C_list:
                # initialize the svm class
                svm = SVM_classifier(kernel=kernel, C=c, degree=degree)

                # fit the model
                svm.fit(X_train, y_train)

                # predict the labels on the test data
                y_pred = svm.predict(X_test)

                # get the classification report
                report = classification_report(y_test, y_pred, output_dict=True)

                # get the accuracy
                acc = report['accuracy']
                acc_vals.append(acc)

                # check if the accuracy is greater than the maximum accuracy
                if acc > max_acc:
                    max_acc = acc
                    opt_c = c
                    opt_deg = degree
                print("C =", c, 'degree =', degree, 'accuracy =', acc)
            # End for
        # End for

    elif kernel=='rbf':
        for gamma in [0.01, 0.1, 0.5, 2]:
            for c in C_list:
                # initialize the svm class
                svm = SVM_classifier(kernel=kernel, C=c, gamma=gamma)

                # fit the model
                svm.fit(X_train, y_train)

                # predict the labels on the test data
                y_pred = svm.predict(X_test)

                # get the classification report
                report = classification_report(y_test, y_pred, output_dict=True)

                # get the accuracy
                acc = report['accuracy']
                acc_vals.append(acc)

                # check if the accuracy is greater than the maximum accuracy
                if acc > max_acc:
                    max_acc = acc
                    opt_c = c
                    opt_gamma = gamma     
                print("C =", c, 'gamma =', gamma, 'accuracy =', acc)
            # End for
        # End for     
    print('Optimal hyperparameters:')
    print("C =", opt_c)
    if opt_deg != 0:
        print("Degree =", opt_deg)
    if opt_gamma != 0:
        print("Gamma =", opt_gamma)

    return opt_c, max_acc, opt_deg, opt_gamma
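
The three branches above differ only in which hyper-parameters they loop over. One possible way to express the search once is sketched below with `itertools.product`. The function `grid_search` and the stand-in scorer `toy_score` are hypothetical; in this notebook the scorer would fit `SVM_classifier` on the training split and return the validation accuracy.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Return (best_params, best_score) over the Cartesian product of `grid`.

    `score_fn` takes the hyper-parameters as keyword arguments and returns a
    score to maximize; ties keep the first combination encountered.
    """
    best_params, best_score = None, float("-inf")
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in scorer for illustration only (not a real accuracy)
def toy_score(C, gamma):
    return 1.0 - abs(C - 10) / 100 - abs(gamma - 0.1)

best_params, best_score = grid_search(
    toy_score, {"gamma": [0.01, 0.1, 0.5, 2], "C": [0.01, 0.1, 1, 10, 100]}
)
print(best_params, best_score)
```

Listing `gamma` before `C` in the grid keeps the same iteration order as the nested loops above, so the strict `>` comparison breaks ties the same way.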
        

## Driver Code

In [None]:
def print_report(X_train, y_train, X_test, y_test, kernel):
    C_opt, acc, degree_opt, gamma_opt = get_optimal_C(X_train, y_train, kernel=kernel)
    svm = SVM_classifier(kernel=kernel, C=C_opt, degree=degree_opt, gamma=gamma_opt)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    out = classification_report(y_test, y_pred, output_dict=True)
    print(pd.DataFrame(out).drop(['accuracy'], axis=1))
    print("accuracy:", out['accuracy'])
    print()
    

In [None]:
df = pd.read_csv('heart.csv')
df.head()

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   52    1   0       125   212    0        1      168      0      1.0      2   2     3       0
1   53    1   0       140   203    1        0      155      1      3.1      0   0     3       0
2   70    1   0       145   174    0        1      125      1      2.6      0   0     3       0
3   61    1   0       148   203    0        1      161      0      0.0      2   1     3       0
4   62    0   0       138   294    1        1      106      0      1.9      1   3     2       0


In [None]:
X_train, X_test, y_train, y_test = preprocess(df)

In [None]:
# Linear kernel
print_report(X_train, y_train, X_test, y_test, kernel='linear')


C = 0.01 accuracy = 0.8536585365853658



In [None]:
# Polynomial kernel
print_report(X_train, y_train, X_test, y_test, kernel='poly')

In [None]:
# RBF kernel
print_report(X_train, y_train, X_test, y_test, kernel='rbf')

C = 0.01 gamma = 0.01 accuracy = 0.5975609756097561
C = 0.1 gamma = 0.01 accuracy = 0.8292682926829268
C = 1 gamma = 0.01 accuracy = 0.8414634146341463
C = 10 gamma = 0.01 accuracy = 0.8780487804878049
C = 100 gamma = 0.01 accuracy = 0.8902439024390244
C = 0.01 gamma = 0.1 accuracy = 0.7073170731707317
C = 0.1 gamma = 0.1 accuracy = 0.8536585365853658
C = 1 gamma = 0.1 accuracy = 0.9024390243902439
C = 10 gamma = 0.1 accuracy = 0.9634146341463414
C = 100 gamma = 0.1 accuracy = 1.0
C = 0.01 gamma = 0.5 accuracy = 0.4878048780487805
C = 0.1 gamma = 0.5 accuracy = 0.7804878048780488
C = 1 gamma = 0.5 accuracy = 1.0
C = 10 gamma = 0.5 accuracy = 1.0
C = 100 gamma = 0.5 accuracy = 1.0
C = 0.01 gamma = 2 accuracy = 0.4878048780487805
C = 0.1 gamma = 2 accuracy = 0.6219512195121951
C = 1 gamma = 2 accuracy = 0.975609756097561
C = 10 gamma = 2 accuracy = 0.975609756097561
C = 100 gamma = 2 accuracy = 0.9634146341463414
Optimal hyperparameters:
C = 100
Gamma = 0.1

             -1      1  macro avg  weighted avg
precision   1.0    1.0        1.0           1.0
recall      1.0    1.0        1.0           1.0
f1-score    1.0    1.0        1.0           1.0
support    98.0  107.0      205.0         205.0
accuracy: 1.0



In [None]:
# LinearSVC module
# initialize the LinearSVC class
max_acc = 0
for C in [0.01, 0.1, 1, 10, 100]:
    clf = LinearSVC(C=C)

    # fit the model
    clf.fit(X_train, y_train)

    # predict the labels on the test data
    y_pred = clf.predict(X_test)

    # convert numpy array to dataframe
    y_pred = pd.DataFrame(y_pred)

    # get the classification report
    out = classification_report(y_test, y_pred, output_dict=True)
    print("C =", C)
    print(pd.DataFrame(out).drop(['accuracy'], axis=1))
    print("accuracy:", out['accuracy'])
    max_acc = max(max_acc, out['accuracy'])
    print()

print('Maximum accuracy by the LinearSVC model:', max_acc)

C = 0.01
                  -1           1   macro avg  weighted avg
precision   0.915663    0.819672    0.867667      0.865560
recall      0.775510    0.934579    0.855045      0.858537
f1-score    0.839779    0.873362    0.856571      0.857308
support    98.000000  107.000000  205.000000    205.000000
accuracy: 0.8585365853658536

C = 0.1
                  -1           1   macro avg  weighted avg
precision   0.916667    0.826446    0.871556      0.869576
recall      0.785714    0.934579    0.860147      0.863415
f1-score    0.846154    0.877193    0.861673      0.862355
support    98.000000  107.000000  205.000000    205.000000
accuracy: 0.8634146341463415

C = 1
                  -1           1   macro avg  weighted avg
precision   0.916667    0.826446    0.871556      0.869576
recall      0.785714    0.934579    0.860147      0.863415
f1-score    0.846154    0.877193    0.861673      0.862355
support    98.000000  107.000000  205.000000    205.000000
accuracy: 0.8634146341463415

C 

