# Homework 5: Logistic Regression and Support Vector Machines

by Natalia Frumkin and Karanraj Chauhan with help from B. Kulis, R. Manzelli, and A. Tsiligkardis

## Problem 1: SVM Toy Example

Given the following two-class data set:

**Class -1: **
A = (1,1)
B = (2,3)

**Class +1: **
C = (2,5)
D = (4,2)

<ol type="a">
  <li>Plot the data.</li>
  <li>Plot the hyperplane described by w = $(3,2)^T, b = -12$</li>
  <li>Calculate the $l_2$ distance of data point C from the hyperplane.</li>
  <li>Determine if the hyperplane linearly separates the data. Explain.</li>
  <li>Calculate the hard margin SVM hyperplane in canonical form.</li>
  <li>Which, if any, data points lie on the SVM hyperplane?</li>
</ol>

## Problem 2: Logistic Regression

<p>In this problem, we will use a logistic regression model to classify emails as "spam" (1) or "non-spam" (0). Recall that the hypothesis/decision rule in a logistic regression model is given by</p>

$$h_\theta(x) = \sigma(\theta^Tx) \\ \text{where } \sigma  \text{ is the sigmoid function}$$

<p>Since logistic regression does not have a closed form solution, we will use gradient descent to obtain the parameters $\theta$. We will use the negative log likelihood loss with L2 regularization as the loss function. Mathematically, the loss function $l(\theta)$ for a given set of parameters $\theta$ will be,</p>

$$l(\theta) = NLL(\theta) + \frac{\lambda}{2}||\theta||^2 \\ \text{where } NLL(\theta) = -\sum_{i=1}^{n} y_i\log(h(x_i)) + (1 - y_i)\log(1 - h(x_i))$$

<p>The good news is, you won't have to worry about these equations for implementing gradient descent (hurray!). However, what you will need is the gradient or the derivative of the loss function. For a given $n$$ x $$d$ matrix $X$ of data, $n$ x $1$ vector of labels (0/1) $y$, and corresponding $n$ x $1$ vector of predictions $\hat{y}$, the loss function gradient is</p>

$$\nabla l(\theta) = (\hat{y} - y)^{T} \cdot X + \lambda \cdot \theta$$

<ol type="a">
    <li>Load the dataset file spambase_data.csv using pandas, and then split the dataset into a train set and a test set. Note: train/test ratio of 0.8/0.2 has been known to work, but you are welcome to try other values.</li>
    <li>Using the loss gradient equation above, implement gradient descent (use only the train set for this) to find the parameters $\theta$ of the logistic regression model. Note: $learning$ $rate = 0.00001$, $\lambda$ = $10$, and $number$ $of$ $steps = 3000$ have been known to give a decent accuracy but you are welcome to try other values, especially for $number$ $of$ $steps$.</li>
    <li>Report the correct classification rate (CCR) of the model on train data and test data. The CCR is defined as $$CCR = \frac{num\_correct\_predictions}{num\_samples}$$</li>   
</ol>

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split

In [2]:
# read in raw dataset
raw_data = pd.read_csv("spambase_data.csv").values

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(raw_data[:,:-1], raw_data[:,-1], test_size=0.2)

In [3]:
# fit logistic regression model
theta = np.zeros(X_train.shape[1])
num_steps = 3000
sigma = 10 
learn_rate = 0.00001

for i in range(num_steps):
    y_hat = 1/(1 + np.exp(-(np.dot(X_train, theta))))
    gradient = np.dot(X_train.T, (y_hat[0] - y_train)) + (sigma*theta)
    theta -= learn_rate*gradient    

  


In [4]:
# predict on test data and train data and calculate CCR
y_hat_train = 1/(1 + np.exp(-(np.dot(X_train, theta))))
y_hat_test = 1/(1 + np.exp(-(np.dot(X_test, theta))))
train_CCR = 0
test_CCR = 0

for i in range(y_train.shape[0]):
    if((y_train[i] == 1 and y_hat_train[i] == 1) or (y_train[i] != 1 and y_hat_train[i] != 1)):
        train_CCR += 1
train_CCR /= y_train.shape[0]

for i in range(y_test.shape[0]):
    if((y_test[i] == 1 and y_hat_test[i] == 1) or (y_test[i] != 1 and y_hat_test[i] != 1)):
        test_CCR += 1
test_CCR /= y_test.shape[0]

print("Train CCR: ", train_CCR)
print("Test CCR: ", test_CCR)

Train CCR:  0.5195652173913043
Test CCR:  0.5418023887079262


  
  This is separate from the ipykernel package so we can avoid doing imports until
