Part A: Linear Support Vector Machine (SVM)
Dataset: You will use the Iris dataset for binary classification. Use petal length and
petal width features of the Iris dataset, and determine whether a sample is Iris Virginica
or not.
URL: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html
1. Implement a Linear_SVC model class for performing binary classification. The
model should implement the batch Gradient Descent (GD) algorithm.
 [40 pts]
a) __init__(self, C=1, max_iter=100, tol=None, learning_rate=‘constant’,
learning_rate_init=0.001, t_0=1, t_1=1000, early_stopping=False,
validation_fraction=0.1, **kwargs)
This method is used to initialize the data members of the class when an object
of class is created. For example, self.C = C
 Arguments:
C : float
It provides the regularization/penalty coefficient.
max_iter : int
Maximum number of iterations. The GD algorithm iterates until
convergence (determined by ‘tol’) or this number of iterations.
tol : float
Tolerance for the optimization.
learning_rate : string (default ‘constant’)
It allows to specify the technique to set the learning rate: constant
learning for all iterations, or varying learning rate given by a learning rate
schedule function.
3
- ‘constant’: a constant learning rate given by ‘learning_rate_init’.
- ‘adaptive’: gradually decreases the learning rate based on a
schedule. Write a function that would be used if learning_rate is set
to ‘adaptive’. When ‘adaptive’ is used, the ‘learning_rate_init’
parameter has no effect as the learning rate varies by a learning rate
schedule function. It uses the ‘t_0’ and ‘t_1’ parameters (see below).
-
 Pseudocode for the “adaptive” learning_rate function:
 Write a function that decreases learning rate gradually during each iteration:
Learning rate = !_#
$!%&'!$()*!_+
 where t_0 and t_1 are two constants that you need to determine empirically.
 Choose constant t_0 and t_1 such that initially the learning rate is large
enough.
learning_rate_init : double
The initial learning rate value if learning_rate is set to ‘constant’. It
controls the step-size in updating the weights. It has no effect is the
‘learning_rate’ is ‘adaptive’.
early_stopping : Boolean, default=False
Whether to use early stopping to terminate training when validation score
is not improving. If set to True, it will automatically set aside a fraction
of training data as validation and terminate training when validation
score is not improving.
validation_fraction : float, default=0.1
The proportion of training data to set aside as validation set for early
stopping. Must be between 0 and 1. Only used if early_stopping is True.
b) fit(self, X, Y):
Implement the batch GD algorithm in the fit method. The weight vector and the
intercept/bias should be denoted by w and b, respectively. Store the cost values
for each iteration so that later you can use it to create a learning curve.
Arguments:
X : ndarray
A numpy array with rows representing data samples and columns
representing features.
Y : ndarray
A 1D numpy array with labels corresponding to each row of the feature
matrix X.
Note: the “fit” method should update the following parameters:
4
 self.intercept_ = np.array([b])
 self.coef_ = np.array([w])
 self.support_vectors_ =
 The “fit” method should display the total number of iterations using a print
statement.
Returns:
self
c)
 predict(self, X)
 Arguments:
X : ndarray
A numpy array containing samples to be used for prediction. Its rows
represent data samples and columns represent features.
Returns:
1D array of predicted class labels for each row in X.
Note: the “predict” method uses the self.coef_[0] and self.intercept_[0] to make
predictions.
Binary Classification using Linear_SVC Classifier
2. Read the Iris data using the sklearn.datasets.load_iris method. Create the data
matrix X by using two features: petal length and petal width. Recode the binary
target such that Iris-Virginica samples are 1, and other samples are 0.
[1 pts]
3. Partition the data into train and test set (80% - 20%). Use the “Partition”
function from your previous assignment.
[2 pts]
4. Model selection via Hyper-parameter tuning: Use the kFold function from
previous assignment to find the optimal values for the following
hyperparameters.
 [5 pts]

C
learning_rate
learning_rate_init (when ‘constant’ learning_rate is used)
5
max_iter
tol
5. Train the model using optimal values for the hyperparameters and evaluate on the
test data. Report the test accuracy and test confusion matrix.
[5 pts]
6. Plot the learning curve.
 [5 pts]
7. Plot the decision boundary and show the support vectors using the
“decision_boundary_support_vectors” function given in:
https://github.com/rhasanbd/Support-Vector-Machine-Classifier-BeginnersSurvival-Kit/blob/master/Support%20Vector%20Machine-1-
Linearly%20Separable%20Data.ipynb
 [12 pts]
Note that if your test accuracy is less than 95% you will lose 10% of the total
obtained points. If your test accuracy is less than 90% you will lose 30% of the
total obtained points.
8. [Extra Credit for 478 and Mandatory for 878] Implement early stopping in
the “fit” method of the Linear_SVC model. You will have to use the following
two parameters of the model: early_stopping and validation_fraction. Also note
that when training the model using early stopping it should generate an early
stopping curve. [10 pts] 

In [1]:
import numpy as np 
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score,confusion_matrix, precision_score, recall_score, f1_score, roc_curve, precision_recall_curve, classification_report


In [2]:
iris = load_iris()
X = iris["data"][:, (2, 3)]  
y = (iris["target"] == 2).astype(np.int)[:,None]

In [3]:
Z = np.hstack([X,y])
np.random.shuffle(Z)
X= Z[:,0:2]
y = Z[:,2]

In [4]:
# Standarize the data
X = (X - np.mean(X))/np.std(X)

def split_trainTest(X,y,t):
    train_size = int((1-t) * X.shape[0])   
    return X[:train_size],X[train_size:],y[:train_size],y[train_size:]

X_train, X_test, y_train, y_test = split_trainTest(X,y,t=0.2)

In [5]:
class Linear_SVC:
    def __init__(self, C=1, max_iter=100, tol=None, learning_rate='constant',learning_rate_init=0.001, t_0=1, t_1=500, early_stopping=False, validation_fraction=0.1,**kwargs):
        self.C = C
        self.w = np.random.rand(2,1)
        self.b = 0
        #self.w = np.reshape(np.array([[1.60381071, 3.05536121]]),(2,1))
        #self.b = -1.13503736      
        self.epochs = max_iter
        self.lr_sc = learning_rate
        self.learning_rate = learning_rate_init
        self.tol = tol 
        self.t_0 = t_0
        self.t_1 = t_1
        self.loss = 1e5
        self.intercept_ = None
        self.coef_ = None
        self.support_vectors_ = None
        self.early_stopping = early_stopping
        self.validation_fraction = validation_fraction
    
    def fit(self,X,y):
        ##ADd early_stopping like Assignment 3
        epch_counter = 0
        
        val_loss = []
        self.validation_score=1e5
        
        if self.early_stopping:
            X, X_valid, y, y_valid = split_trainTest(X, y, self.validation_fraction)
        
        # y was re-assigned into 1 and -1.
        t = (2 * y - 1)[:,None]
        
        
        while epch_counter < self.epochs:
            loss = self.loss
            epch_counter += 1
            validation_score = self.validation_score
            #print(f'Epoch: {epch_counter}')
            #pdb.set_trace()
            w = self.w
            b = self.b
            if(self.lr_sc == "adaptive"):
                self.learning_rate = self.t_0 /(epch_counter + self.t_1)    
        
            idx_sv = ((t * ((X @ w) + b)) < 1).ravel()
            X_sv = X[idx_sv]
            t_sv = t[idx_sv]
            
            self.loss = ((0.5 * (w.T @ w))+ (self.C*(np.sum(1- (t_sv * (X_sv @ w))) - b* np.sum(t_sv)))).item()
            #self.loss = ((0.5 * (w.T @ w))+ (self.C*(np.sum(1- X_sv@w) - b* np.sum(t_sv)))).item()
            #self.loss = ((0.5 * (w.T @ w))+ (self.C*np.sum((np.maximum(0,1-t*(X@w+b)))))).item()
            # print(self.loss)
            dw = w - (self.C * np.sum(t_sv * X_sv))
            db = -self.C * np.sum(t_sv)
            
            self.w = w - self.learning_rate * dw
            self.b = b - self.learning_rate * db
            self.support_vectors_ = X_sv
            ## Assing intercept,Coef and Support vec
            
            if self.early_stopping:
                t_val = (2 * y_valid - 1)[:,None]
                self.validation_score = ((0.5 * (self.w.T @ self.w))+ (self.C * (np.sum(1- (X_valid @ self.w)) - b * np.sum(t_val)))).item()
                val_loss.append(self.validation_score)
                # print("self.validation_score", self.validation_score)
                # print("self.loss", self.loss)
                if (self.validation_score > validation_score):
                    print(f'\nEarly Stopping at : {epch_counter}\n')
                    plt.plot(range(epch_counter), val_loss, "--", color='darkorange', lw = 2)
                    plt.xlabel("epoch")
                    plt.ylabel("Loss : J(w)")
                    break
            
            if (self.tol != None) and (np.abs(self.loss - loss) < self.tol):         
                break
                    
        self.coef_ = np.array([w])
        self.intercept_ = np.array([b])
        
    def predict(self,X):
        return ((X@self.coef_[0] + self.intercept_[0]) >= 1).astype(np.int)
        
        

In [6]:
def sFold(folds,data,labels,model,error_fuction,**model_args):
    if(labels.shape == (labels.shape[0],)):
        labels = np.expand_dims(labels,axis=1)
    dataset = np.concatenate([data,labels],axis=1)
    s_part = s_partition(dataset,folds)
    pred_y = []
    true_y = []
    err_func = []
    for idx,val in enumerate(s_part):
        test_y = val[:,-1]
        #test_y = np.expand_dims(test_y, axis=1)
        test = val[:,:-1]
        train = np.concatenate(np.delete(s_part,idx,0))
        label = train[:,-1]
        train = train[:,:-1]        
        #model.fit(train,label,**model_args) 
        model.fit(train,label)  
        pred = model.predict(test)
        
        pred_y.append(pred)
        true_y.append(test_y[:,None])
        err_func.append(error_fuction(pred,test_y[:,None]))
    avg_error = np.array(err_func).mean()
    result = {'Expected labels':true_y, 'Predicted labels': pred_y,'Average error':avg_error }
    return result

def s_partition(x,s):
    return np.array_split(x,s)

def accuracy(x,y):
    x,y = np.array(x),np.array(y)
    pred = (x == y).astype(np.int)
    return pred.mean()

In [7]:
def findOptimalSVM():
    validation_accuracy = np.empty((4,6,2,5,5))
    learning_rate_init = [0.1 , 0.01, 0.001, 0.0001]
    C = [0.01,0.5 , 1 , 10, 50, 100]
    learning_rate= ['constant', 'adaptive']
    max_iter = [10, 35, 50, 100, 500]
    tol = [None, 0.1, 1, 0.01, 5]
    maxScore = 0
    for i,lri in enumerate(learning_rate_init):
        for j,c in enumerate(C):
            for k,lr in enumerate(learning_rate):
                for l,mi in enumerate(max_iter):
                    for m,tl in enumerate(tol):
                        model_args = {'learning_rate' : lr,'C':c,'learning_rate_init' : lri,'max_iter' : mi,'tol' : tl}
                        lrSvM = Linear_SVC(**model_args) 
                        result = sFold(5,X,y,lrSvM, error_fuction = accuracy,**model_args)
                        validation_accuracy[i,j,k,l,m] = result['Average error']
                        if validation_accuracy[i,j,k,l,m] > maxScore:
                            maxScore = validation_accuracy[i,j,k,l,m]
                            index = [i,j,k,l,m]

    print(index)
    a,b,c,d,e= index
    print('optimal learning_rate_init: ',learning_rate_init[a])
    print('optimal C: ',C[b])
    print('optimal learning_rate: ',learning_rate[c])
    print('optimal max_iter',max_iter[d])
    print('optimal tol',tol[e])
    print('optimal value',validation_accuracy[a,b,c,d,e])
    opt_dic = {'learning_rate' : learning_rate[c],'C':C[b],'learning_rate_init' : learning_rate_init[a],'max_iter' : max_iter[d],'tol' : tol[e]}   
    return opt_dic

In [8]:
opt_mod = findOptimalSVM()

[1, 4, 0, 1, 0]
optimal learning_rate_init:  0.01
optimal C:  50
optimal learning_rate:  constant
optimal max_iter 35
optimal tol None
optimal value 0.9800000000000001


In [None]:
model = Linear_SVC(**opt_mod)

In [None]:
opt_mod

In [None]:
model.fit(X_train,y_train)

In [None]:
def decision_boundary_support_vectors(svm_clf, X):
    
    xmin, xmax = X.min() - 1, X.max() + 1
    w = svm_clf.coef_[0]
    b = svm_clf.intercept_[0]
    

    # At the decision boundary, w1*x1 + w2*x2 + b = 0
    # => x2 = -(b + w1* x1)/w1
    x1 = np.linspace(xmin, xmax, 100)    
    decision_boundary = -(b + w[0]*x1)/w[1]
    shifting_factor_for_margin = 1/w[1]
    upper_margin = decision_boundary + shifting_factor_for_margin
    lower_margin = decision_boundary - shifting_factor_for_margin

    svs = svm_clf.support_vectors_
    plt.scatter(svs[:, 0], svs[:, 1], s=200, facecolors='g', label="Support Vectors")
    plt.plot(x1, decision_boundary, "k-", linewidth=2)
    plt.plot(x1, upper_margin, "k--", linewidth=2)
    plt.plot(x1, lower_margin, "k--", linewidth=2)

In [None]:
plt.figure(figsize=(16,8))
plt.plot(X_train[:, 0][y_train==1], X_train[:, 1][y_train==1], "bo", label="Class 0")
plt.plot(X_train[:, 0][y_train==0], X_train[:, 1][y_train==0], "ro", label="Class 1")

decision_boundary_support_vectors(model, X_train)

plt.xlabel("x1", fontsize=14)
plt.ylabel("x2", fontsize=14)
plt.legend(loc="best", fontsize=14)
plt.title("BGD_linearSVM", fontsize=16)
plt.show()

In [None]:
pred = model.predict(X_train)
print("Train Accuracy: ", accuracy_score(pred, y_train))


y_test_predicted = model.predict(X_test)
print("Test Accuracy: ", accuracy_score(y_test_predicted, y_test))


print("\nTest Confusion Matrix:")
print(confusion_matrix(y_test, y_test_predicted))


# early_stopping

In [None]:
X, y = X_train,y_train
model_early = Linear_SVC(**opt_mod, early_stopping = True, validation_fraction=0.2)
model_early.fit(X,y)

In [None]:
print(model_early.coef_,model_early.intercept_)
support_vectors = model_early.support_vectors_
support_vectors

In [None]:
plt.figure(figsize=(16,8))
plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bo", label="Class 0")
plt.plot(X[:, 0][y==0], X[:, 1][y==0], "ro", label="Class 1")

decision_boundary_support_vectors(model_early, X)

plt.xlabel("x1", fontsize=14)
plt.ylabel("x2", fontsize=14)
plt.legend(loc="best", fontsize=14)
plt.title("BGD_linearSVM", fontsize=16)
plt.show()

In [None]:
pred = model_early.predict(X)
print("Train Accuracy: ", accuracy_score(pred, y))


y_test_predicted = model_early.predict(X_test)
print("Test Accuracy: ", accuracy_score(y_test_predicted, y_test))


print("\nTest Confusion Matrix:")
print(confusion_matrix(y_test, y_test_predicted))