# CIS 5200: Machine Learning
## Homework 2

In [99]:
import os
import sys

# For autograder only, do not modify this cell.
# True for Google Colab, False for autograder
NOTEBOOK = (os.getenv('IS_AUTOGRADER') is None)
if NOTEBOOK:
    print("[INFO, OK] Google Colab.")
else:
    print("[INFO, OK] Autograder.")
    sys.exit()

[INFO, OK] Google Colab.


## Penngrader setup

In [100]:
# %%capture
!pip install penngrader-client



In [101]:
%%writefile config.yaml
grader_api_url: 'https://23whrwph9h.execute-api.us-east-1.amazonaws.com/default/Grader23'
grader_api_key: 'flfkE736fA6Z8GxMDJe2q8Kfk8UDqjsG3GVqOFOa'

Overwriting config.yaml


In [102]:
from penngrader.grader import PennGrader

# PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY. IF NOT, THE AUTOGRADER WON'T KNOW WHO
# TO ASSIGN POINTS TO IN OUR BACKEND
STUDENT_ID = 72249835 # YOUR PENN-ID GOES HERE AS AN INTEGER #
SECRET = STUDENT_ID

grader = PennGrader('config.yaml', 'CIS5200_FALL_2023_HW2', STUDENT_ID, SECRET)

PennGrader initialized with Student ID: 72249835

Make sure this correct or we will not be able to store your grade


# Dataset: Wine Quality Prediction

Some research on blind wine tasting has suggested that [people cannot taste the difference between ordinary and pricy wine brands](https://phys.org/news/2011-04-expensive-inexpensive-wines.html). Indeed, even experienced tasters may be as consistent as [random numbers](https://www.seattleweekly.com/food/wine-snob-scandal/).

In this problem set, we will train some simple linear models to predict wine quality. We'll be using the data from [this repository](https://archive.ics.uci.edu/ml/datasets/Wine+Quality) for both the classification and regression tasks. The following cells will download and set up the data for you.

In [103]:
%%capture
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality.names

In [104]:
from sklearn.model_selection import train_test_split
import pandas as pd
import torch

red_df = pd.read_csv('winequality-red.csv', delimiter=';')

X = torch.from_numpy(red_df.drop(columns=['quality']).to_numpy())
y = torch.from_numpy(red_df['quality'].to_numpy())

# Split data into train/test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Normalize the data to have zero mean and unit standard deviation,
# and add bias term
mu, sigma = X_train.mean(0), X_train.std(0)
X_train, X_test = [ torch.cat([((x-mu)/sigma).float(), torch.ones(x.size(0),1)], dim=1)
                    for x in [X_train, X_test]]

# Transform labels to {-1,1} for logistic regression
y_binary_train, y_binary_test = [ (torch.sign(y - 5.5)).long()
                                  for y in [y_train, y_test]]
y_regression_train, y_regression_test = [ y.float() for y in [y_train, y_test]]

# 1. Logistic Regression

In this first problem, you will implement a logistic regression classifier to classify good wine (`y=1`) from bad wine (`y=-1`). Your professor has arbitrarily decided that good wine has a score of at least 5.5. The classifier is split into the following components:

1. Loss (3pts) & gradient (3pts) - Given a batch of examples $X$, labels $y$, and weights $w$ for the logistic regression classifier, compute the batched logistic loss and the gradient of the loss *with respect to the model parameters $w$*. Note that this is slightly different from the gradient in Homework 0, which was with respect to the sample $X$.
2. Fit (2pts) - Given a loss function and data, find the weights of an optimal logistic regression model that minimizes the logistic loss
3. Predict (3pts) - Given the weights of a logistic regression model and new data, predict the most likely class

We provide a generic gradient-based optimizer that minimizes the logistic loss function. You can call it with `LogisticOptimizer().optimize(X,y)`; it does not need any parameter adjustment.

Hint: The optimizer minimizes the logistic loss, so the value of this loss should decrease over iterations.
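
One way to sanity-check a hand-derived parameter gradient is to compare it with PyTorch's automatic differentiation. The sketch below is not part of the assignment: `loss_fn` and `grad_fn` are hypothetical stand-ins for your own loss and gradient functions, and the data is random.

```python
import torch

def loss_fn(X, y, w):
    # Per-example logistic loss log(1 + exp(-y * <x, w>)), shape (m,).
    # softplus(z) = log(1 + exp(z)), evaluated in a numerically stable way.
    return torch.nn.functional.softplus(-y * (X @ w))

def grad_fn(X, y, w):
    # Per-example gradient with respect to w, shape (m, d):
    # d/dw log(1 + exp(-y * <x, w>)) = -y * sigmoid(-y * <x, w>) * x
    margin = y * (X @ w)
    return (-y * torch.sigmoid(-margin))[:, None] * X

torch.manual_seed(0)
m, d = 8, 5
X = torch.randn(m, d, dtype=torch.float64)
y = torch.randint(0, 2, (m,)).double() * 2 - 1   # random labels in {-1, 1}
w = torch.randn(d, dtype=torch.float64, requires_grad=True)

# Autograd differentiates the mean loss; compare it with the mean of the
# hand-derived per-example gradients.
loss_fn(X, y, w).mean().backward()
analytic = grad_fn(X, y, w.detach()).mean(0)
max_diff = (w.grad - analytic).abs().max().item()
print(max_diff)
```

If the two gradients disagree by more than floating-point noise, the hand-derived formula (or its sign) is the first place to look.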

In [105]:
class LogisticOptimizer:
    @staticmethod
    def logistic_loss(X, y, w):
        # Given a batch of samples and labels, and the weights of a logistic
        # classifier, compute the batched logistic loss.
        #
        # X := Tensor(float) of size (m,d) --- This is a batch of m examples
        #     of dimension d
        #
        # y := Tensor(int) of size (m,) --- This is a batch of m labels in {-1,1}
        #
        # w := Tensor(float) of size (d,) --- This is the weights of a logistic
        #     classifier.
        #
        # Return := Tensor of size (m,) --- This is the logistic loss for each
        #     example.

        # Fill in the rest
        loss = torch.log(1 + torch.exp(-y * (X @ w)))
        return loss

    @staticmethod
    def logistic_gradient(X, y, w):
        # Given a batch of samples and labels, compute the batched gradient of
        # the logistic loss.
        #
        # X := Tensor(float) of size (m,d) --- This is a batch of m examples
        #     of dimension d
        #
        # y := Tensor(int) of size (m,) --- This is a batch of m labels in {-1,1}
        #
        # w := Tensor(float) of size (d,) --- This is the weights of a logistic
        #     classifier.
        #
        # Return := Tensor of size (m,d) --- This is the logistic gradient for each
        #     example.
        #
        # Hint: A very similar gradient was calculated in Homework 0.
        # However, that was the sample gradient (with respect to X), whereas
        # what we need here is the parameter gradient (with respect to w).

        # Fill in the rest
        gradient =  ((-y * torch.exp(-y * (X @ w))) / (1 + torch.exp(-y * (X @ w))))[:,None] * X
        return gradient

    def optimize(self, X, y, niters=100):
        # Given a dataset of examples and labels, minimizes the logistic loss
        # using standard gradient descent.
        #
        # This optimizer is written for you, and you only need to implement the
        # logistic loss and gradient functions above.
        #
        # X := Tensor(float) of size (m,d) --- This is a batch of m examples
        #     of dimension d
        #
        # y := Tensor(int) of size (m,) --- This is a batch of m labels in {-1,1}
        #
        # Return := Tensor of size(d,) --- This is the fitted weights of a
        #     logistic regression model

        m,d = X.size()
        w = torch.zeros(d)
        print('Optimizing logistic function...')
        for i in range(niters):
            loss = self.logistic_loss(X,y,w).mean()
            grad = self.logistic_gradient(X,y,w).mean(0)
            w -= grad
            if i % 50 == 0:
                print(i, loss.item())
        print('Optimizing done.')
        return w

def logistic_fit(X, y, optimizer=LogisticOptimizer):
    # Given a dataset of examples and labels, fit the weights of the logistic
    # regression classifier using the provided loss function and optimizer
    #
    # X := Tensor(float) of size (m,d) --- This is a batch of m examples
    #     of dimension d
    #
    # y := Tensor(int) of size (m,) --- This is a batch of m labels in {-1,1}
    #
    # Return := Tensor of size (d,) --- This is the fitted weights of the
    #     logistic regression model
    opt = LogisticOptimizer()

    # Fill in the rest
    weight = opt.optimize(X, y)

    return weight

def logistic_predict(X, w):
    # Given a dataset of examples and fitted weights for a logistic regression
    # classifier, predict the class
    #
    # X := Tensor(float) of size(m,d) --- This is a batch of m examples of
    #    dimension d
    #
    # w := Tensor(float) of size (d,) --- This is the fitted weights of the
    #    logistic regression model
    #
    # Return := Tensor of size (m,) --- This is the predicted classes {-1,1}
    #    for each example
    #
    # Hint: Remember that logistic regression expects a label in {-1,1}, and
    # not {0,1}

    # Fill in the rest
    pred = torch.sign(X @ w)

    return pred


In [106]:
# Test your code on the wine dataset!
# How does your solution compare to a random linear classifier?
# Your solution should get around 75% accuracy on the test set.
torch.manual_seed(42)

d = X_train.size(1)
logistic_weights = {
    'zero': torch.zeros(d),
    'random': torch.randn(d),
    'fitted': logistic_fit(X_train, y_binary_train)
}

for k,w in logistic_weights.items():
    yp_binary_train = logistic_predict(X_train, w)
    acc_train = (yp_binary_train == y_binary_train).float().mean()

    print(f'Train accuracy [{k}]: {acc_train.item():.2f}')

    yp_binary_test = logistic_predict(X_test, w)
    acc_test = (yp_binary_test == y_binary_test).float().mean()

    print(f'Test accuracy [{k}]: {acc_test.item():.2f}')

Optimizing logistic function...
0 0.6931473016738892
50 0.518214225769043
Optimizing done.
Train accuracy [zero]: 0.00
Test accuracy [zero]: 0.00
Train accuracy [random]: 0.54
Test accuracy [random]: 0.53
Train accuracy [fitted]: 0.75
Test accuracy [fitted]: 0.75
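
The 0.00 accuracy for the `zero` weights is worth a second look. With all-zero weights every score `X @ w` is exactly 0, and `torch.sign` maps 0 to 0, which matches neither label in {-1, 1}. The sketch below illustrates this and one possible tie-breaking rule. `predict_pm1` is a hypothetical helper, and sending ties to +1 is an arbitrary choice that the autograder may not share.

```python
import torch

def predict_pm1(X, w):
    # Hypothetical variant: scores >= 0 map to +1 and scores < 0 map to -1,
    # so the output is always in {-1, 1}.
    scores = X @ w
    return torch.where(scores >= 0, torch.ones_like(scores), -torch.ones_like(scores))

X = torch.tensor([[1.0, 2.0], [-3.0, 0.5]])
w_zero = torch.zeros(2)

print(torch.sign(X @ w_zero))   # all zeros: neither class
print(predict_pm1(X, w_zero))   # all +1 under the tie-breaking rule above
```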


### Autograder
Be sure you can pass the following four test cases!

In [107]:
grader.grade(test_case_id = 'logistic_loss', answer = LogisticOptimizer.logistic_loss)
grader.grade(test_case_id = 'logistic_gradient', answer = LogisticOptimizer.logistic_gradient)
grader.grade(test_case_id = 'logistic_fit', answer = logistic_fit)
grader.grade(test_case_id = 'logistic_predict', answer = logistic_predict)

Correct! You earned 3/3 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 3/3 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 2/2 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 3/3 points. You are a star!

Your submission has been successfully recorded in the gradebook.


# 2. Linear Regression with Ridge Regression

In this second problem, you'll implement a linear regression model. Similarly to the first problem, implement the following functions:

1. Loss (3pts) - Given a batch of examples $X$ and labels $y$, compute the batched mean squared error loss for a linear model with weights $w$.
2. Fit (4pts) - Given a batch of examples $X$ and labels $y$, find the weights of the optimal linear regression model
3. Predict (3pts) - Given the weights $w$ of a linear regression model and new data $X$, predict the most likely label

This time, you are not given an optimizer for the fitting function since this problem has an analytic solution. Make sure to test your solution with non-zero ridge regression parameters.
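
For reference, a closed-form ridge fit can be sketched as follows. It assumes the objective is the mean squared error plus `ridge_penalty * ||w||^2` (this scaling is an assumption, not something the assignment text states). Under that assumption, setting the gradient to zero gives the linear system $(X^\top X + \lambda m I)\,w = X^\top y$. `ridge_fit` and `ridge_objective` are hypothetical names, and `torch.linalg.solve` is used rather than forming an explicit inverse.

```python
import torch

def ridge_fit(X, y, ridge_penalty=1.0):
    # Solve (X^T X + ridge_penalty * m * I) w = X^T y for w.
    m, d = X.shape
    A = X.T @ X + ridge_penalty * m * torch.eye(d, dtype=X.dtype)
    return torch.linalg.solve(A, X.T @ y)

def ridge_objective(X, y, w, ridge_penalty):
    # Assumed objective: mean squared error plus L2 penalty on the weights.
    return ((X @ w - y) ** 2).mean() + ridge_penalty * (w ** 2).sum()

torch.manual_seed(0)
X = torch.randn(50, 4, dtype=torch.float64)
y = torch.randn(50, dtype=torch.float64)
lam = 0.3

# At the minimizer, the gradient of the objective should vanish.
w = ridge_fit(X, y, lam).requires_grad_(True)
ridge_objective(X, y, w, lam).backward()
grad_norm = w.grad.norm().item()
print(grad_norm)  # expected to be close to zero
```

Checking that the gradient vanishes at the returned weights is a quick way to test a closed-form solution with a non-zero ridge parameter.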

In [108]:
def regression_loss(X, y, w):
    # Given a batch of examples, true labels, and linear regression weights,
    # compute the batch of squared error losses. This is *without* the ridge
    # regression penalty.
    #
    # X := Tensor(float) of size (m,d) --- This is a batch of m examples
    #     of dimension d
    #
    # y := Tensor(float) of size (m,) --- This is a batch of m real-valued labels
    #
    # w := Tensor(float) of size (d,) --- This is the weights of a linear
    #     regression model
    #
    # Return := Tensor of size (m,) --- This is the squared loss for each
    #     example

    # Fill in the rest
    loss = (y - X @ w)**2

    return loss

def regression_fit(X, y, ridge_penalty=1.0):
    # Given a dataset of examples and labels, fit the weights of the linear
    # regression model using the closed-form ridge regression solution
    #
    # X := Tensor(float) of size (m,d) --- This is a batch of m examples
    #     of dimension d
    #
    # y := Tensor(float) of size (m,) --- This is a batch of m real-valued
    #     labels
    #
    # ridge_penalty := float --- This is the parameter for ridge regression
    #
    # Return := Tensor of size (d,) --- This is the fitted weights of the
    #     linear regression model
    #
    # Fill in the rest
    m,d = X.size()
    I = torch.eye(d)
    weight = torch.inverse(X.T @ X + ridge_penalty * m * I) @ X.T @ y
    return weight


def regression_predict(X, w):
    # Given a dataset of examples and fitted weights for a linear regression
    # classifier, predict the label
    #
    # X := Tensor(float) of size(m,d) --- This is a batch of m examples of
    #    dimension d
    #
    # w := Tensor(float) of size (d,) --- This is the fitted weights of the
    #    linear regression model
    #
    # Return := Tensor of size (m,) --- This is the predicted real-valued labels
    #    for each example
    #
    # Fill in the rest
    pred = X @ w

    return pred

In [109]:
# Test your code on the wine dataset!
# How does your solution compare to a random linear model?
# Your solution should get an average squared error of about 8.6 on the test set.
torch.manual_seed(42)

d = X_train.size(1)
regression_weights = {
    'zero': torch.zeros(d),
    'random': torch.randn(d),
    'fitted': regression_fit(X_train, y_regression_train)
}

for k,w in regression_weights.items():
    yp_regression_train = regression_predict(X_train, w)
    squared_loss_train = regression_loss(X_train, y_regression_train, w).mean()

    print(f'Train MSE [{k}]: {squared_loss_train.item():.2f}')

    yp_regression_test = regression_predict(X_test, w)
    squared_loss_test = regression_loss(X_test, y_regression_test, w).mean()

    print(f'Test MSE [{k}]: {squared_loss_test.item():.2f}')

Train MSE [zero]: 32.28
Test MSE [zero]: 32.97
Train MSE [random]: 29.64
Test MSE [random]: 29.55
Train MSE [fitted]: 8.37
Test MSE [fitted]: 8.60


### Autograder
Be sure you can pass the following three test cases!

In [110]:
grader.grade(test_case_id = 'regression_loss', answer = regression_loss)
grader.grade(test_case_id = 'regression_fit', answer = regression_fit)
grader.grade(test_case_id = 'regression_predict', answer = regression_predict)

Correct! You earned 3/3 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 4/4 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 3/3 points. You are a star!

Your submission has been successfully recorded in the gradebook.


# 3. SVM and Gradient Descent (10 pts)

In this problem, you'll implement (soft margin) support vector machines with gradient descent.
+ (2pts) Calculate the objective of the Soft SVM (primal)
+ (2pts) Calculate the gradient of the Soft SVM objective
+ (4pts) Implement a gradient descent optimizer. Your solution needs to converge to an accurate enough answer.
+ (2pts) Make predictions with the Soft SVM

Tips:
- This assignment is more freeform than previous ones. You're allowed to initialize the parameters of the SVM model however you want, as long as your implemented functions return the right values.
- You'll need to play with the values of step size and number of iterations to converge to a good value.
- To debug your optimization, print the objective over iterations. Remember that, for strongly convex problems, theory guarantees convergence at a certain rate as long as the learning rate is small enough. What does this imply about your solution if it is not converging?
- As a sanity check, you can get around 97.5% prediction accuracy and converge to an objective below 0.16.
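
The debugging tip above can be sketched on synthetic data. This is a minimal illustration, not the reference solution: the data, step size, and iteration count are arbitrary assumptions, and the gradient comes from autograd rather than a hand derivation. The objective assumed here is the mean hinge loss plus `l2_reg * ||w||^2`.

```python
import torch

torch.manual_seed(0)
# Two well-separated Gaussian blobs with labels in {-1, 1}
X = torch.cat([torch.randn(50, 2) + 2.0, torch.randn(50, 2) - 2.0])
y = torch.cat([torch.ones(50), -torch.ones(50)])

w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
l2_reg, step_size = 0.1, 0.01

history = []
for it in range(500):
    # Soft-SVM objective: mean hinge loss plus L2 penalty on the weights
    hinge = torch.clamp(1 - y * (X @ w + b), min=0).mean()
    obj = hinge + l2_reg * (w ** 2).sum()
    history.append(obj.item())

    # Plain gradient descent step using autograd (sub)gradients
    grad_w, grad_b = torch.autograd.grad(obj, [w, b])
    with torch.no_grad():
        w -= step_size * grad_w
        b -= step_size * grad_b

    if it % 100 == 0:
        print(it, history[-1])
```

If the printed objective rises or oscillates instead of settling, the step size is likely too large; if it is still falling steadily at the last iteration, more iterations are probably needed.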

In [221]:
class SoftSVM():
    def __init__(self, ndims):
        # Here, we initialize the parameters of your soft-SVM model for binary
        # classification. Don't change the weight and bias variables as the
        # autograder will assume that these exist.
        # ndims := integer -- number of dimensions
        # no return type

        self.weight = torch.zeros(ndims)
        self.bias = torch.zeros(1)
        self.weight.requires_grad = True
        self.bias.requires_grad = True

    def objective(self, X, y, l2_reg):
        # Calculate the objective of your soft-SVM model
        # X := Tensor of size (m,d) -- the input features of m examples with d dimensions
        # y := Tensor of size (m) -- the labels for each example in X
        # l2_reg := float -- L2 regularization penalty
        # Returns a scalar tensor (zero dimensional tensor) -- the loss for the model
        # Fill in the rest
        hinge = torch.mean(torch.max(1 - y * (X @ self.weight + self.bias), torch.zeros(1)))
        reg = l2_reg * torch.norm(self.weight, p=2)**2
        obj = hinge + reg

        return obj


    def gradient(self, X, y, l2_reg):
        # Calculate the gradient of your soft-SVM model
        # X := Tensor of size (m,d) -- the input features of m examples with d dimensions
        # y := Tensor of size (m) -- the labels for each example in X
        # l2_reg := float -- L2 regularization penalty
        # Return Tuple (Tensor, Tensor) -- the tensors correspond to the weight
        # and bias parameters respectively
        # Fill in the rest
        hinge = torch.max(1 - y * (X @ self.weight + self.bias), torch.zeros(1))
        hinge_grad = -y * (hinge > 0).float()
        reg_grad = 2 * l2_reg * self.weight
        weight_grad = (X.T @ hinge_grad) / X.shape[0] + reg_grad
        bias_grad = torch.sum(hinge_grad) / X.shape[0]

        return weight_grad, bias_grad

    def optimize(self, X, y, l2_reg):
        # Optimize the soft-SVM objective with gradient descent
        # X := Tensor of size (m,d) -- the input features of m examples with d dimensions
        # y := Tensor of size (m) -- the labels for each example in X
        # l2_reg := float -- L2 regularization penalty

        # no return type

        # Fill in the rest
        for _ in range(10000):
            obj = self.objective(X, y, l2_reg)
            weight_grad, bias_grad = self.gradient(X, y, l2_reg)
            self.weight.data -= 0.001 * weight_grad
            self.bias.data -= 0.001 * bias_grad


    def predict(self, X):
        # Given an X, make a prediction with the SVM
        # X := Tensor of size (m,d) -- features of m examples with d dimensions
        # Return a tensor of size (m) -- the prediction labels on the dataset X

        # Fill in the rest
        pred = torch.sign(X @ self.weight + self.bias)
        return pred

In [222]:
from sklearn import datasets

#Load dataset
cancer = datasets.load_breast_cancer()
X,y = torch.from_numpy(cancer['data']), torch.from_numpy(cancer['target'])
mu,sigma = X.mean(0,keepdim=True), X.std(0,keepdim=True)
X,y = ((X-mu)/sigma).float(),(y - 0.5).sign() # prepare data
l2_reg = 0.1
print(X.size(), y.size())

# Optimize the soft-SVM with gradient descent
clf = SoftSVM(X.size(1))
clf.optimize(X,y,l2_reg)
print("\nSoft SVM objective: ")
print(clf.objective(X,y,l2_reg).item())
print("\nSoft SVM accuracy: ")
(clf.predict(X) == y).float().mean().item()

torch.Size([569, 30]) torch.Size([569])

Soft SVM objective: 
0.15918760001659393

Soft SVM accuracy: 


0.9753954410552979

### Autograder
Be sure you can pass the following four test cases!

In [223]:
grader.grade(test_case_id = 'SVM_objective', answer = SoftSVM)
grader.grade(test_case_id = 'SVM_gradient', answer = SoftSVM)
grader.grade(test_case_id = 'SVM_optimize', answer = SoftSVM)
grader.grade(test_case_id = 'SVM_predict', answer = SoftSVM)

Correct! You earned 2/2 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 2/2 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 4/4 points. You are a star!

Your submission has been successfully recorded in the gradebook.
Correct! You earned 2/2 points. You are a star!

Your submission has been successfully recorded in the gradebook.


# Submitting to Gradescope
Before submitting to Gradescope, make sure that selecting "Runtime" -> "Restart and run all" completes all cells without errors.

1. Go to the File menu and choose "Download .ipynb" and also "Download .py". Make sure these files are named homework2.ipynb and homework2.py, respectively.
2. Go to Gradescope through the Canvas page and ensure your class is "BAN_CIS-5200-001 202330"
3. Select Homework 2
4. Upload both files (the .ipynb and the .py)
5. PLEASE CHECK THE AUTOGRADER OUTPUT TO ENSURE YOUR SUBMISSION IS PROCESSED CORRECTLY! If this is the case, you should be all set with the programming component of this homework!