# CS182 HW3 Coding [40 points]

In this coding homework, you will complete several models for binary classification and try to find the implicit relationship between them yourself.

**Good luck!**


In [2]:
from sklearn import svm
from scipy import special
import numpy as np

In [3]:
X_train = np.loadtxt('data/X_train.txt')
X_val = np.loadtxt('data/X_val.txt')
X_test = np.loadtxt('data/X_test.txt')
y_train = np.loadtxt('data/y_train.txt')
y_val = np.loadtxt('data/y_val.txt')
y_test = np.loadtxt('data/y_test.txt')

w = np.loadtxt('data/w.txt')
w0 = np.loadtxt('data/w0.txt')

## (a) Simple Perceptron

(1) Activation functions and loss functions are important parts of each neural network, and there are multiple ways of calculating them. 

 **[3 points]** In this question, we ask you to implement the **sigmoid function** and the **binary cross entropy loss function** for binary classification.

In [4]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def BCEloss(y_pred, y):
    return - np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
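
The expressions above can misbehave at extreme inputs: `np.exp(-x)` overflows for large negative `x`, and `np.log(0)` appears when a prediction saturates at exactly 0 or 1. A minimal sketch of a more defensive variant follows. The names `stable_sigmoid` and `stable_bce` and the `eps` value are illustrative assumptions, not part of the assignment; `scipy.special.expit` would be an alternative to the hand-written sigmoid.

```python
import numpy as np

def stable_sigmoid(x):
    # Expects an array; only ever exponentiates non-positive values,
    # so np.exp cannot overflow.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1 + ex)
    return out

def stable_bce(y_pred, y, eps=1e-12):
    # Clip predictions away from 0 and 1 so that log() stays finite.
    # eps is an assumed tolerance, not a value given in the homework.
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

z = np.array([-1000.0, 0.0, 1000.0])   # extreme scores chosen for illustration
p = stable_sigmoid(z)
loss = stable_bce(p, np.array([0.0, 1.0, 1.0]))
```

With these inputs the saturated predictions match their labels, so the loss stays finite and is dominated by the middle sample.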

(2) **[3 points]** In this question, we ask you to implement the **softmax function** and the **cross entropy loss function** for multi-class classification.

In [5]:
def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

def cross_entropy_loss(y, y_pre):
    return -np.mean(np.sum(y * np.log(y_pre), axis=1))
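
As a quick sanity check on the functions above, the sketch below verifies that each softmax row sums to 1 and that two-class softmax on logits `[0, z]` reproduces `sigmoid(z)`, so the cross entropy loss reduces to the binary cross entropy loss. The four functions are repeated so the snippet runs on its own; the sample scores and labels are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def BCEloss(y_pred, y):
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))

def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

def cross_entropy_loss(y, y_pre):
    return -np.mean(np.sum(y * np.log(y_pre), axis=1))

z = np.array([-2.0, 0.5, 3.0])        # made-up scores for three samples
labels = np.array([0.0, 1.0, 1.0])    # made-up binary labels

# Two-class logits: class 0 is fixed at logit 0, class 1 gets the score z.
logits = np.stack([np.zeros_like(z), z], axis=1)
probs = softmax(logits)                           # shape (3, 2)
one_hot = np.stack([1 - labels, labels], axis=1)  # one-hot version of labels

row_sums = probs.sum(axis=1)
same_probs = np.allclose(probs[:, 1], sigmoid(z))
same_loss = np.isclose(cross_entropy_loss(one_hot, probs),
                       BCEloss(sigmoid(z), labels))
```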

(3) **[10 points]** Learning a simple perceptron with **batch GD** (using the given initializations $w^{init}$ and $w^{init}_{0}$) based on the training set ($X_{train}$, $y_{train}$): use the training set and the validation set to obtain a good learning rate (you can set the maximum number of iterations to 50 and try different learning rates in [$10^{-8}$, $10^{-4}$]); output the learned model and evaluate its performance on the test set with the classification accuracy.

In [70]:
import copy
# BGD Implementation
max_iter = 50
learning_rates = [1e-4, 1e-5, 1e-6, 1e-7, 1e-8]

train_accs = []
train_w = []
train_w0 = []
best_lr = None
best_acc = 0

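# Gradient of the BCE loss with respect to the weights w only.
# The bias w0 is used in the forward pass but never updated, so it keeps its initial value.
# x has shape (n_samples, n_features); the sum over samples is scaled by
# 1 / (2 * n_features), which only rescales the effective learning rate.
def grad(x, y, w, w0):
    y_pred = sigmoid(np.dot(x, w) + w0)
    grad_w = np.dot(x.T, (y_pred - y)) * (1 / (2 * x.shape[1]))
    return grad_w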

# iterate and update the weights
def BatchGD(x, y, w, w0, lr, max_iter):
    for i in range(max_iter):
        w -= lr * grad(x, y, w, w0)
    return w

# predict with the updated weights
def predict(x, w, w0):
    y_pred = sigmoid(np.dot(x, w) + w0)
    return y_pred

# classification to binary
def classify(y_pred):
    y_pred_classes = np.round(y_pred)
    return y_pred_classes

for lr in learning_rates:
    w_curr = copy.deepcopy(w).reshape(-1, 1)
    w0_curr = copy.deepcopy(w0).reshape(-1, 1)
    BatchGD(X_train.T, y_train.reshape(-1, 1), w_curr, w0_curr, lr, max_iter)
    y_pred = predict(X_val.T, w_curr, w0_curr)
    y_pred_classes = classify(y_pred.reshape(-1, 1)).reshape(-1)
    acc = np.mean(y_pred_classes == y_val)
    train_accs.append(acc)
    train_w.append(w_curr)
    train_w0.append(w0_curr)

best_acc = max(train_accs)
best_lr = learning_rates[train_accs.index(best_acc)]

print("Best learning rate:", best_lr)
print("Best accuracy on validation:", best_acc)
print("Best Model weights:\n", train_w[train_accs.index(best_acc)])
print("Best Model bias:", train_w0[train_accs.index(best_acc)])


Best learning rate: 0.0001
Best accuracy on validation: 0.97
Best Model weights:
 [[-0.12671047]
 [-0.11962769]
 [-0.11669877]
 [-0.12621546]
 [-0.12178165]
 [-0.12109535]
 [-0.12320885]
 [-0.12065023]
 [-0.12631277]
 [-0.11899523]
 [-0.12389209]
 [-0.11669966]
 [-0.12400644]
 [-0.12341878]
 [-0.12122757]
 [-0.12278914]
 [-0.12500693]
 [-0.11854638]
 [-0.12232647]
 [-0.12065187]]
Best Model bias: [[0.]]


In [71]:
# Evaluation with Sigmoid function
# evaluate on test set
w_eval = train_w[train_accs.index(best_acc)]
w0_eval = train_w0[train_accs.index(best_acc)]
y_pred = predict(X_test.T, w_eval, w0_eval)
y_pred_classes = classify(y_pred.reshape(-1, 1)).reshape(-1)
acc = np.mean(y_pred_classes == y_test)
print("Accuracy on test:", acc)

Accuracy on test: 0.985
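
The `grad` function in the batch GD cell returns only the weight gradient, so `w0` keeps its initial value (the printed bias is 0). For reference, a minimal sketch of batch gradient descent that updates both parameters is given below. It runs on synthetic data, averages the gradient over the number of samples, and uses hypothetical names (`batch_gd_with_bias`, `X_demo`, `y_demo`) and an assumed learning rate; it is an illustration under these assumptions, not the configuration that produced the results above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def batch_gd_with_bias(X, y, w, w0, lr, max_iter):
    # X: (n_samples, n_features), y: (n_samples,), w: (n_features,), w0: float
    n = X.shape[0]
    for _ in range(max_iter):
        err = sigmoid(X @ w + w0) - y     # derivative of mean BCE w.r.t. the scores
        w = w - lr * (X.T @ err) / n      # gradient step on the weights
        w0 = w0 - lr * err.mean()         # gradient step on the bias
    return w, w0

# Synthetic, linearly separable toy data (assumption: not the homework data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] + 1.0 > 0).astype(float)

w_fit, w0_fit = batch_gd_with_bias(X_demo, y_demo, np.zeros(2), 0.0,
                                   lr=0.5, max_iter=200)
acc_demo = np.mean(np.round(sigmoid(X_demo @ w_fit + w0_fit)) == y_demo)
```

Because the toy decision boundary is shifted away from the origin, a model without a trainable bias could not represent it exactly, which is why the bias update matters in this example.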


(4) **[10 points]** Learning a simple perceptron with **SGD** (using the given initializations $w^{init}$ and $w^{init}_{0}$) based on the training set ($X_{train}$, $y_{train}$): use the training set and the validation set to obtain a good learning rate (you can set the maximum number of iterations and try different learning rates); output the learned model and evaluate its performance on the test set with the classification accuracy.

In [338]:
# SGD Implementation
import copy
max_iter = 50
learning_rates = [1e-4, 1e-8]

train_accs = []
train_w = []
train_w0 = []
best_lr = None
best_acc = 0

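# Gradient of the BCE loss for a single sample, with respect to the weights w only.
# x is one feature vector of shape (n_features,); the result has shape (n_features, 1)
# and is scaled by 1/2. The bias w0 is not updated.
def grad(x, y, w, w0):
    y_pred = sigmoid(np.dot(x, w) + w0)
    grad_w = np.dot(x.reshape((x.shape[0], 1)) , (y_pred - y)).reshape((x.shape[0], 1)) * 1 / 2
    return grad_w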

# iterate and update the weights
def StochasticGD(x, y, w, w0, lr, max_iter):
    for i in range(max_iter):
        index = np.random.randint(0, y.shape[0])
        w -= lr * grad(x[index], y[index], w, w0)
    return w

# predict with the updated weights
def predict(x, w, w0):
    y_pred = sigmoid(np.dot(x, w) + w0)
    return y_pred

# classification to binary
def classify(y_pred):
    y_pred_classes = np.round(y_pred)
    return y_pred_classes

for lr in learning_rates:
    w_curr = copy.deepcopy(w).reshape(-1, 1)
    w0_curr = copy.deepcopy(w0).reshape(-1, 1)
    StochasticGD(X_train.T, y_train.reshape(-1, 1), w_curr, w0_curr, lr, max_iter)
    y_pred = predict(X_val.T, w_curr, w0_curr)
    y_pred_classes = classify(y_pred.reshape(-1, 1)).reshape(-1)
    acc = np.mean(y_pred_classes == y_val)
    train_accs.append(acc)
    train_w.append(w_curr)
    train_w0.append(w0_curr)

best_acc = max(train_accs)
best_lr = learning_rates[train_accs.index(best_acc)]

print("Best learning rate:", best_lr)
print("Best accuracy on validation:", best_acc)
print("Best Model weights:\n", train_w[train_accs.index(best_acc)])
print("Best Model bias:", train_w0[train_accs.index(best_acc)])

Best learning rate: 0.0001
Best accuracy on validation: 0.97
Best Model weights:
 [[-0.00041511]
 [-0.00058675]
 [-0.00043264]
 [-0.00053583]
 [-0.00058144]
 [-0.00067721]
 [-0.00079495]
 [-0.00056218]
 [-0.00081143]
 [-0.00064239]
 [-0.00021545]
 [-0.00044725]
 [-0.00042415]
 [-0.00037599]
 [-0.00043494]
 [-0.00060382]
 [-0.00074746]
 [-0.00050512]
 [-0.00037839]
 [-0.00047951]]
Best Model bias: [[0.]]


In [339]:
# Evaluation with Sigmoid function
# evaluate on test set
w_eval = train_w[train_accs.index(best_acc)]
w0_eval = train_w0[train_accs.index(best_acc)]
y_pred = predict(X_test.T, w_eval, w0_eval)
y_pred_classes = classify(y_pred.reshape(-1, 1)).reshape(-1)
acc = np.mean(y_pred_classes == y_test)
print("Accuracy on test:", acc)

Accuracy on test: 0.985
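
`StochasticGD` draws one random sample per iteration, so 50 iterations touch at most 50 training points, and the result changes between runs because no seed is set. A common variant, sketched below on synthetic data, shuffles the data once per epoch and visits every sample. The function name `sgd_epochs`, the seed, the learning rate, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sgd_epochs(X, y, w, w0, lr, n_epochs, seed=0):
    # X: (n_samples, n_features), y: (n_samples,), w: (n_features,), w0: float
    rng = np.random.default_rng(seed)   # fixed seed makes runs repeatable
    w = w.copy()                        # leave the caller's array untouched
    for _ in range(n_epochs):
        for i in rng.permutation(X.shape[0]):   # new random order each epoch
            err = sigmoid(X[i] @ w + w0) - y[i]
            w -= lr * err * X[i]        # single-sample step on the weights
            w0 -= lr * err              # single-sample step on the bias
    return w, w0

# Synthetic toy data (assumption: not the homework data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] - X_demo[:, 1] > 0).astype(float)

w_a, w0_a = sgd_epochs(X_demo, y_demo, np.zeros(2), 0.0, lr=0.1, n_epochs=20)
w_b, w0_b = sgd_epochs(X_demo, y_demo, np.zeros(2), 0.0, lr=0.1, n_epochs=20)
acc_demo = np.mean(np.round(sigmoid(X_demo @ w_a + w0_a)) == y_demo)
```

Calling the function twice with the same seed gives identical weights, which makes learning-rate comparisons on the validation set easier to interpret.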


## (b) SVM

(1) **[10 points]** Use the **`svm`** module of the **`sklearn`** package to perform binary classification. Output the model and evaluate its performance on each dataset using classification accuracy.

In [342]:
# SVM Implementation
# SVC with default hyperparameters (RBF kernel, C=1.0)
clf = svm.SVC()
clf.fit(X_train.T, y_train)
y_train_pred = clf.predict(X_train.T)
y_val_pred = clf.predict(X_val.T)
y_test_pred = clf.predict(X_test.T)
print("Accuracy on train:", np.mean(y_train_pred == y_train))
print("Accuracy on val:", np.mean(y_val_pred == y_val))
print("Accuracy on test:", np.mean(y_test_pred == y_test))

Accuracy on train: 0.9864285714285714
Accuracy on val: 0.98
Accuracy on test: 0.965
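
`svm.SVC()` uses its default RBF kernel, so the fitted model has no explicit weight vector. If a linear kernel is chosen instead, scikit-learn exposes `coef_` and `intercept_`, which can be compared directly with the perceptron's `w` and `w0`. The sketch below shows this on synthetic data; the toy data and the variable names (`X_demo`, `true_w`, `clf_lin`) are assumptions, and results on the homework data may differ.

```python
import numpy as np
from sklearn import svm

# Synthetic, linearly separable toy data (assumption: not the homework data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_demo = (X_demo @ true_w > 0).astype(float)

clf_lin = svm.SVC(kernel="linear")   # linear kernel exposes coef_ and intercept_
clf_lin.fit(X_demo, y_demo)

w_svm = clf_lin.coef_.ravel()        # weight vector, shape (n_features,)
b_svm = clf_lin.intercept_[0]        # scalar bias
acc_demo = np.mean(clf_lin.predict(X_demo) == y_demo)

# Cosine similarity between the SVM direction and the direction that
# generated the labels; the same measure could compare w_svm with perceptron weights.
cos_sim = w_svm @ true_w / (np.linalg.norm(w_svm) * np.linalg.norm(true_w))
```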


## (c) Compare

(1) **[4 points]** Compare the models learned in (a)(3), (a)(4), and (b). Write down your explanation and support it with data.
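
One way to gather the data support is to tabulate the accuracies printed in the earlier cells. The sketch below copies those numbers by hand; the 0.97 reported for both perceptrons is computed on `X_val` in the code, so it is listed as validation accuracy. The dictionary layout and names are assumptions chosen for illustration.

```python
# Accuracies copied from the cell outputs above.
results = {
    "perceptron (batch GD)": {"val": 0.97, "test": 0.985},
    "perceptron (SGD)": {"val": 0.97, "test": 0.985},
    "SVM (SVC, default kernel)": {"val": 0.98, "test": 0.965},
}

for name, acc in results.items():
    print(f"{name:28s} val={acc['val']:.3f}  test={acc['test']:.3f}")

# max() returns the first entry among ties, so this names one of the best models.
best_on_test = max(results, key=lambda name: results[name]["test"])
gap = results[best_on_test]["test"] - results["SVM (SVC, default kernel)"]["test"]
```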