# Homework 5: Logistic Regression and Support Vector Machines

by Natalia Frumkin and Karanraj Chauhan with help from B. Kulis, R. Manzelli, and A. Tsiligkardis

## Problem 1: SVM Toy Example

Given the following two-class data set:

**Class -1: **
A = (1,1)
B = (2,3)

**Class +1: **
C = (2,5)
D = (4,2)

<ol type="a">
  <li>Plot the data.</li>
  <li>Plot the hyperplane described by w = $(3,2)^T, b = -12$</li>
  <li>Calculate the $l_2$ distance of data point C from the hyperplane.</li>
  <li>Determine if the hyperplane linearly separates the data. Explain.</li>
  <li>Calculate the hard margin SVM hyperplane in canonical form.</li>
  <li>Which, if any, data points lie on the SVM hyperplane?</li>
</ol>

## Problem 2: Logistic Regression

<p>In this problem, we will use a logistic regression model to classify emails as "spam" (1) or "non-spam" (0). Recall that the hypothesis/decision rule in a logistic regression model is given by</p>

$$h_\theta(x) = \sigma(\theta^Tx) \\ \text{where } \sigma  \text{ is the sigmoid function}$$

<p>Since logistic regression does not have a closed form solution, we will use gradient descent to obtain the parameters $\theta$. We will use the negative log likelihood loss with L2 regularization as the loss function. Mathematically, the loss function $l(\theta)$ for a given set of parameters $\theta$ will be,</p>

$$l(\theta) = NLL(\theta) + \frac{\lambda}{2}||\theta||^2 \\ \text{where } NLL(\theta) = -\sum_{i=1}^{n} y_i\log(h(x_i)) + (1 - y_i)\log(1 - h(x_i))$$

<p>The good news is, you won't have to worry about these equations for implementing gradient descent (hurray!). However, what you will need is the gradient or the derivative of the loss function. For a given $n$$ x $$d$ matrix $X$ of data, $n$ x $1$ vector of labels (0/1) $y$, and corresponding $n$ x $1$ vector of predictions $\hat{y}$, the loss function gradient is</p>

$$\nabla l(\theta) = (\hat{y} - y)^{T} \cdot X + \lambda \cdot \theta$$

<ol type="a">
    <li>Load the dataset file spambase_data.csv using pandas. The last column in the data is the true labels column i.e. the $y$ vector (1 means spam, 0 means not spam), and the rest of the data is the features matrix i.e. the $X$ matrix. Split the dataset into a train set and a test set. Note: train/test ratio of 0.8/0.2 has been known to work, but you are welcome to try other values.</li>
    <li>Using the loss gradient equation above, implement gradient descent (use only the train set for this) to find the parameters $\theta$ of the logistic regression model. Note: $learning$ $rate = 0.00001$, $\lambda$ = $10$, and $number$ $of$ $steps = 3000$ have been known to give a decent accuracy but you are welcome to try other values, especially for $number$ $of$ $steps$.</li>
    <li>Report the correct classification rate (CCR) of the model on train data and test data. The CCR is defined as $$CCR = \frac{num\_correct\_predictions}{num\_samples}$$</li>   
</ol>

In [174]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [68]:
# read in raw dataset
df=pd.read_csv('./spambase_data.csv')

df1 = df.loc[:, df.columns != '57']

X_train, X_test, y_train, y_test = train_test_split(df1, df.iloc[:,-1], test_size=0.2)

# split into train and test sets

#print(X_train)

In [200]:
# fit logistic regression model
def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))

def logistic_regression(features, target, num_steps, learning_rate):
    
    intercept = np.ones((features.shape[0], 1))
    features = np.hstack((intercept, features))
    weights = np.zeros(features.shape[1])
    for step in range(num_steps):
        #scores = np.dot(weights, features.transpose())
        scores = np.dot(features, weights)
        predictions = sigmoid(scores)

        # Update weights with gradient
        #output_error_signal = target - predictions
        output_error_signal =  predictions - target 
        gradient = np.dot( output_error_signal.T, features) + np.dot(10,weights)
        #gradient = np.dot(features.transpose(),output_error_signal) + np.dot(10,weights)
        weights -= learning_rate * gradient
        
    return weights



In [201]:
# predict on test data and train data and calculate CCR

weights = logistic_regression(X_train, y_train, num_steps = 3000, learning_rate = 0.00001)
weights_test = logistic_regression(X_test, y_test, num_steps = 3000, learning_rate = 0.00001)
#data = np.hstack((np.ones((X_train.shape[0],1)), X_train))
bias = weights[0]
y = range(1, 58)
w_final = [weights[i] for i in y]
w_final = np.asmatrix(w_final)

results = np.round(sigmoid(np.dot(w_final,X_train.transpose()) + bias))
results = results.tolist()
train_score = accuracy_score(y_train, results[0])

bias1 = weights_test[0]
y = range(1, 58)
w_test = [weights_test[i] for i in y]
w_test = np.asmatrix(w_test)

results_test = np.round(sigmoid(np.dot(w_test,X_test.transpose()) + bias1))
results_test = results_test.tolist()
test_score = accuracy_score(y_test, results_test[0])

print('Train CCR: ',train_score,'Test CCR: ', test_score)

  This is separate from the ipykernel package so we can avoid doing imports until


Train CCR:  0.6391304347826087 Test CCR:  0.742671009771987
