# Homework 5: Logistic Regression and Support Vector Machines

by Natalia Frumkin and Karanraj Chauhan with help from B. Kulis, R. Manzelli, and A. Tsiligkardis

## Problem 1: SVM Toy Example

Given the following two-class data set:

**Class -1:**
A = (1,1)
B = (2,3)

**Class +1:**
C = (2,5)
D = (4,2)

<ol type="a">
  <li>Plot the data.</li>
  <li>Plot the hyperplane described by $w = (3,2)^T$, $b = -12$.</li>
  <li>Calculate the $l_2$ distance of data point C from the hyperplane.</li>
  <li>Determine if the hyperplane linearly separates the data. Explain.</li>
  <li>Calculate the hard margin SVM hyperplane in canonical form.</li>
  <li>Which, if any, data points lie on the SVM hyperplane?</li>
</ol>
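For checking hand computations in parts (c) and (d), a minimal NumPy sketch may help. The helper name `signed_distance` and the `points`/`labels` dictionaries are assumptions introduced here for illustration. The helper evaluates $(w^Tx + b)/\|w\|_2$: its absolute value is the $l_2$ distance to the hyperplane, and its sign tells which side the point is on.

```python
import numpy as np

def signed_distance(x, w, b):
    """Signed l2 distance from point x to the hyperplane w^T x + b = 0."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return (w @ x + b) / np.linalg.norm(w)

# Hyperplane from part (b) and the four data points of the problem
w, b = np.array([3.0, 2.0]), -12.0
points = {"A": (1, 1), "B": (2, 3), "C": (2, 5), "D": (4, 2)}
labels = {"A": -1, "B": -1, "C": 1, "D": 1}

for name, x in points.items():
    d = signed_distance(x, w, b)
    print(f"{name}: label {labels[name]:+d}, signed distance {d:.3f}")
```

A point is on the correct side of the hyperplane when its label times its signed distance is strictly positive.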

## Problem 2: Logistic Regression

<p>In this problem, we will use a logistic regression model to classify emails as "spam" (1) or "non-spam" (0). Recall that the hypothesis/decision rule in a logistic regression model is given by</p>

$$h_\theta(x) = \sigma(\theta^Tx) \\ \text{where } \sigma(z) = \frac{1}{1 + e^{-z}} \text{ is the sigmoid function}$$
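As a rough illustration of the hypothesis, the sketch below evaluates the sigmoid through the identity $\sigma(z) = \frac{1}{2}\left(1 + \tanh(z/2)\right)$, which avoids overflow warnings for large $|z|$. The function names `sigmoid` and `hypothesis` are assumptions chosen for this example, not names required by the assignment.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed via tanh for stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

def hypothesis(theta, X):
    """h_theta(x) for every row x of X; theta has one entry per column of X."""
    return sigmoid(X @ theta)

# Toy input (made up for illustration): two samples, two features
X_toy = np.array([[1.0, 2.0], [-1.0, 0.5]])
theta_toy = np.array([0.0, 0.0])
print(hypothesis(theta_toy, X_toy))  # all-zero theta gives probability 0.5 for every sample
```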

<p>Since logistic regression does not have a closed-form solution, we will use gradient descent to obtain the parameters $\theta$. The loss function is the negative log likelihood (NLL) with L2 regularization. Mathematically, the loss $l(\theta)$ for a given set of parameters $\theta$ is</p>

$$l(\theta) = NLL(\theta) + \frac{\lambda}{2}||\theta||^2 \\ \text{where } NLL(\theta) = -\sum_{i=1}^{n} \left[ y_i\log(h_\theta(x_i)) + (1 - y_i)\log(1 - h_\theta(x_i)) \right]$$
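Although the loss is not needed for the gradient descent update itself, evaluating it every few hundred steps is a common way to confirm that training is making progress. The sketch below is one possible implementation; the function name `regularized_nll` and the clipping constant `eps` (used only to keep `log` away from zero) are assumptions added here.

```python
import numpy as np

def regularized_nll(theta, X, y, lam, eps=1e-12):
    """NLL(theta) + (lam / 2) * ||theta||^2 for 0/1 labels y."""
    y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # guard against log(0)
    nll = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return nll + 0.5 * lam * np.sum(theta ** 2)

# Toy data (made up for illustration): 4 samples, 2 features
X_toy = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [0.0, 0.3]])
y_toy = np.array([1.0, 0.0, 1.0, 0.0])

# With theta = 0 every prediction is 0.5, so the loss is n * log(2)
loss_at_zero = regularized_nll(np.zeros(2), X_toy, y_toy, lam=10.0)
print(loss_at_zero)
```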

<p>The good news is that you won't have to work with these equations directly to implement gradient descent (hurray!). What you will need is the gradient of the loss function. For a given $n \times d$ data matrix $X$, $n \times 1$ vector of 0/1 labels $y$, and corresponding $n \times 1$ vector of predictions $\hat{y}$, the loss function gradient is</p>

$$\nabla l(\theta) = X^{T}(\hat{y} - y) + \lambda \theta$$
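One way to gain confidence in a gradient implementation is to compare it against central finite differences of the loss. The sketch below does this on small random data, writing the gradient as the column vector `X.T @ (y_hat - y) + lam * theta`. The names `loss` and `grad`, the random data, and the step size `h` are all assumptions made for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y, lam):
    """Regularized negative log likelihood."""
    y_hat = sigmoid(X @ theta)
    nll = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return nll + 0.5 * lam * (theta @ theta)

def grad(theta, X, y, lam):
    """Analytic gradient: X^T (y_hat - y) + lam * theta."""
    return X.T @ (sigmoid(X @ theta) - y) + lam * theta

# Small synthetic problem, used only to test the formulas
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)
lam = 10.0

# Central differences along each coordinate direction
h = 1e-6
numeric = np.array([
    (loss(theta + h * e, X, y, lam) - loss(theta - h * e, X, y, lam)) / (2 * h)
    for e in np.eye(3)
])
max_abs_diff = np.max(np.abs(numeric - grad(theta, X, y, lam)))
print("max |numeric - analytic| =", max_abs_diff)
```

A difference many orders of magnitude smaller than the gradient entries suggests the analytic gradient matches the loss.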

<ol type="a">
    <li>Load the dataset file <code>spambase_data.csv</code> using pandas, and then split the dataset into a train set and a test set. Note: a train/test ratio of 0.8/0.2 has been known to work, but you are welcome to try other values.</li>
    <li>Using the loss gradient equation above, implement gradient descent (use only the train set for this) to find the parameters $\theta$ of the logistic regression model. Note: a learning rate of $0.00001$, $\lambda = 10$, and $3000$ steps have been known to give decent accuracy, but you are welcome to try other values, especially for the number of steps.</li>
    <li>Report the correct classification rate (CCR) of the model on train data and test data. The CCR is defined as $$\text{CCR} = \frac{\text{number of correct predictions}}{\text{number of samples}}$$</li>
</ol>

In [2]:
import numpy as np
import pandas as pd

In [312]:
# read in raw dataset; header=0 discards the file's own header row and
# applies the column names '0'..'56' plus "class" in its place
col_names = [str(i) for i in range(57)] + ["class"]
spam_data = pd.read_csv("spambase_data.csv", header=0, names=col_names)
# pop removes the label column in place, leaving the 57 feature columns
y = spam_data.pop("class")
X = spam_data


# split into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .2)

X_train = X_train.values
y_train = y_train.values
X_test = X_test.values
y_test = y_test.values


# Augment the training matrix with a column of ones for the bias term
X_aug = np.append(X_train, np.ones((X_train.shape[0], 1)), axis=1)


In [313]:
# fit logistic regression model

#Randomly generate starting w values. Last w value is for bias
w = np.random.rand(58, 1)


steps = 30000
rate = .00001
lam = 10


# labels as a column vector so that y_hat - y_train has shape (n, 1)
y_train = y_train.reshape(-1, 1)

for i in range(steps):
    # X_aug carries the column of ones, so X_aug @ w already includes the bias w[57]
    y_hat = 1 / (1 + np.exp(-(X_aug @ w)))
    # gradient of the regularized loss: X^T (y_hat - y) + lambda * w
    grad = X_aug.T @ (y_hat - y_train) + lam*w
    w = w - rate*grad

In [314]:
# predict on train data and calculate CCR
y_pred = 1 / (1 + np.exp(-(X_aug @ w)))

# threshold at 0.5 and compare with the labels; ravel() makes both arrays 1-D
train_ccr = np.mean((y_pred.ravel() >= 0.5) == (y_train.ravel() > 0))
print("Train CCR: ", train_ccr)


# predict on test data and calculate CCR
# X_test has no column of ones, so the bias w[57] is added explicitly
y_pred2 = 1 / (1 + np.exp(-(X_test @ w[0:57, :] + w[57])))

test_ccr = np.mean((y_pred2.ravel() >= 0.5) == (y_test.ravel() > 0))
print("Test CCR: ", test_ccr)

Train CCR:  0.8152173913043478
Test CCR:  0.7947882736156352
