# Logistic Regression Assignment (due 26 November)

In this practical you will learn how to apply logistic regression to the task of distinguishing two digits from the MNIST database: http://yann.lecun.com/exdb/mnist/. The database contains 60000 training images and 10000 test images of handwritten digits. Each image is 28 × 28 pixels. We will use the images in vectorized form: each image becomes a vector of size 784. The code extracting the digits 0 and 1 is provided in the stubs.


In [None]:
%tensorflow_version 2.x

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
print(tf.__version__)


2.7.0


In [None]:
import numpy as np 

In [None]:
def data_preprocess(images, labels):

    # number of examples m  
    m = images.shape[0]
    
    # create vector of ones to concatenate to our data matrix (for intercept terms)
    ones = np.ones(shape=[m, 1])
    images = np.concatenate((ones, images), axis=1)
    
    # to retrieve the images and corresponding labels where the label is either 0 or 1, 
    # we define two logical vectors that can be used to subset our data_matrices
    logical_mask_0 = labels == 0
    logical_mask_1 = labels == 1
    
    images_zeros = images[logical_mask_0]
    labels_zeros = labels[logical_mask_0]
    images_ones = images[logical_mask_1]
    labels_ones = labels[logical_mask_1]
    
    X = np.concatenate((images_zeros, images_ones), axis=0)
    y = np.concatenate((labels_zeros, labels_ones), axis=0)
    
    # shuffle the data and corresponding labels in unison
    def _shuffle_in_unison(a, b):
        assert len(a) == len(b)
        p = np.random.permutation(len(a))
        return a[p], b[p]

    return _shuffle_in_unison(X,y)   
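
The three ideas used in `data_preprocess` (an intercept column of ones, boolean masks for subsetting, and a shared permutation for shuffling) can be tried on a toy array first. The sketch below is only an illustration: the names `toy_images`, `toy_labels`, `X_toy` and `y_toy` are made up for this example, and it uses a single combined mask instead of the two separate masks in the stub.

```python
import numpy as np

# Toy stand-ins for flattened images (4 examples, 3 "pixels") and their labels.
toy_images = np.array([[0.1, 0.2, 0.3],
                       [0.4, 0.5, 0.6],
                       [0.7, 0.8, 0.9],
                       [1.0, 0.0, 0.5]])
toy_labels = np.array([0, 7, 1, 3])

# Prepend a column of ones so that the first weight acts as the intercept.
with_bias = np.concatenate((np.ones((toy_images.shape[0], 1)), toy_images), axis=1)

# A boolean mask keeps only the rows whose label is 0 or 1.
mask = (toy_labels == 0) | (toy_labels == 1)
X_toy, y_toy = with_bias[mask], toy_labels[mask]

# Indexing both arrays with the same permutation keeps rows and labels paired.
rng = np.random.default_rng(0)
p = rng.permutation(len(X_toy))
X_toy, y_toy = X_toy[p], y_toy[p]

print(X_toy.shape, y_toy.shape)
```

Two of the four toy examples survive the mask, and each gains one extra column for the intercept.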

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train, x_test = x_train / 255.0, x_test / 255.0


In [None]:
print(x_train.shape)

(60000, 28, 28)


In [None]:
x_train = x_train.reshape([60000,784])
x_test = x_test.reshape([10000,784])
print(x_train.shape)


(60000, 784)


In [None]:
X,y = data_preprocess(x_train, y_train)
print('shape: ', X.shape)
print('shape: ', y.shape)

shape:  (12665, 785)
shape:  (12665,)


Define the hyperparameters: the learning rate and the number of gradient descent steps


In [None]:
learning_rate = 0.00001
gdc_steps = 100

    

Initialize your parameters W


In [None]:
# number of features n
n = X.shape[1]
# we need to define our model parameters to be learned. we use W (weights) instead of theta this time.

# mean and standard deviation
mu = 0
sigma = 0.01
w = np.random.normal(mu, sigma, n)


In [None]:
print(X.shape, w.shape)

(12665, 785) (785,)


Define the sigmoid function, your code here:


In [None]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
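
For very negative values of `z`, `np.exp(-z)` can overflow and trigger a runtime warning. The following is a minimal sketch of one common way to avoid this, not part of the assignment stub: the name `stable_sigmoid` is an assumption for this example, and it expects a NumPy array as input.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)).
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

z_demo = np.array([-1000.0, -2.0, 0.0, 3.0, 1000.0])
print(stable_sigmoid(z_demo))
```

On moderate inputs this should agree with the plain definition; the two forms differ only in how they behave at the extremes.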

Define the loss function as provided in equation 12 (Logistic regression slides)


In [None]:
def compute_cross_entropy_loss(y, y_hat):
    # summed (unnormalized) binary cross-entropy over all examples
    return -(y @ np.log(y_hat) + (1 - y) @ np.log(1 - y_hat))
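
The binary cross-entropy for labels $y_i \in \{0, 1\}$ and predictions $\hat y_i$ is commonly written as $-\sum_i \left[ y_i \log \hat y_i + (1 - y_i) \log (1 - \hat y_i) \right]$, optionally divided by the number of examples. Equation 12 from the slides is not reproduced here, so the sketch below assumes this standard form. The function name `mean_cross_entropy` and the clipping constant `eps` are assumptions for this example; clipping guards against `log(0)` when a prediction saturates at exactly 0 or 1.

```python
import numpy as np

def mean_cross_entropy(y, y_hat, eps=1e-12):
    # Keep predictions strictly inside (0, 1) so the logarithms stay finite.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    # Average over examples so the value does not grow with the dataset size.
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_demo = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

print(mean_cross_entropy(y_demo, confident_right))
print(mean_cross_entropy(y_demo, confident_wrong))
```

Predictions that agree with the labels should give a clearly smaller loss than predictions that contradict them.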



Start optimization. During training you minimize the loss function, so your loss should decrease in every iteration. You also want to track how many predictions are correct at every iteration. Reminder: an example is assigned to the class digit 1 when your prediction $\hat y$ is greater than or equal to 0.5.

To compare your prediction vector (containing zeros and ones) with the labels (also zeros and ones) you can use the equal function.

Example:
prediction = (1, 0, 1, 1) and the true labels are y = (0, 0, 1, 0).

Testing for equality gives the following result: correct = (0, 1, 1, 0). Your accuracy is $\frac{0+1+1+0}{4} = 0.5$.
Compute the accuracy for both the training set and the test set.
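
The worked example can be reproduced directly in NumPy. This is a small sketch; the variable name `y_true` is chosen for this example only.

```python
import numpy as np

prediction = np.array([1, 0, 1, 1])
y_true = np.array([0, 0, 1, 0])

# Element-wise equality gives True where the prediction matches the label.
correct = np.equal(prediction, y_true).astype(int)

# The mean of the 0/1 vector is the fraction of correct predictions.
accuracy = correct.mean()

print(correct, accuracy)  # [0 1 1 0] 0.5
```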

In [None]:
for step in range(0, gdc_steps):
    print("Performing step " + str(step) + " of gradient descent.")
    # perform the dot product between the weights and the examples
    z = X @ w

    # apply the nonlinearity
    y_hat = sigmoid(z)

    # the loss is normally normalized by 1/m; here we use the unnormalized sum
    loss = compute_cross_entropy_loss(y, y_hat)
    print("Loss at step " + str(step) + ": " + str(loss))

    # compute the error term
    error_term = y_hat - y
    
    # compute the gradient
    gradients = X.T @ error_term

    # update w using the gdc update rule
    w = w - learning_rate * gradients
    
    # compute the predictions (class 1 if y_hat >= 0.5) and cast them to int values
    predictions = (y_hat >= 0.5).astype(int)

    # compute mean accuracy
    accuracy = np.sum(np.equal(predictions, y)) / predictions.shape
    print("Accuracy at step " + str(step) + ": " + str(accuracy))
    


Performing step 0 of gradient descent.
Loss at step 0: 8937.261934070782
Accuracy at step 0: [0.83292538]
Performing step 1 of gradient descent.
Loss at step 1: 6992.11299594477
Accuracy at step 1: [0.99060403]
Performing step 2 of gradient descent.
Loss at step 2: 5565.675332985458
Accuracy at step 2: [0.99557837]
Performing step 3 of gradient descent.
Loss at step 3: 4714.030449118674
Accuracy at step 3: [0.99605211]
Performing step 4 of gradient descent.
Loss at step 4: 4127.733584149572
Accuracy at step 4: [0.9958942]
Performing step 5 of gradient descent.
Loss at step 5: 3694.985544045834
Accuracy at step 5: [0.99613107]
Performing step 6 of gradient descent.
Loss at step 6: 3360.2179435845933
Accuracy at step 6: [0.99613107]
Performing step 7 of gradient descent.
Loss at step 7: 3092.2174909340224
Accuracy at step 7: [0.99628899]
Performing step 8 of gradient descent.
Loss at step 8: 2871.978341926051
Accuracy at step 8: [0.99628899]
Performing step 9 of gradient descent.
Loss at
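
The update in the training loop uses `X.T @ (y_hat - y)`, which is the gradient of the summed cross-entropy loss with respect to the weights. One way to convince yourself of this is a finite-difference check on a small synthetic problem. The sketch below is only an illustration under that assumption: the data is random, and the names `loss_fn`, `analytic_gradient`, `X_demo`, `y_demo` and `w_demo` are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, X, y):
    # Summed binary cross-entropy, matching the unnormalized loss in the loop.
    y_hat = sigmoid(X @ w)
    return -(y @ np.log(y_hat) + (1 - y) @ np.log(1 - y_hat))

def analytic_gradient(w, X, y):
    # Same expression as in the training loop: X^T (y_hat - y).
    return X.T @ (sigmoid(X @ w) - y)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
y_demo = rng.integers(0, 2, size=20).astype(float)
w_demo = rng.normal(scale=0.01, size=5)

# Central differences: perturb one weight at a time by +/- eps.
eps = 1e-6
numeric = np.zeros_like(w_demo)
for j in range(w_demo.size):
    step = np.zeros_like(w_demo)
    step[j] = eps
    numeric[j] = (loss_fn(w_demo + step, X_demo, y_demo)
                  - loss_fn(w_demo - step, X_demo, y_demo)) / (2 * eps)

max_diff = np.max(np.abs(numeric - analytic_gradient(w_demo, X_demo, y_demo)))
print(max_diff)
```

If the analytic gradient is correct, the largest difference should be tiny compared with the gradient entries themselves.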

Evaluate model on test set

In [None]:
print("_______________________________")
print("Starting evaluation of test set")

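# use separate names so the training data X, y are not overwritten
X_test_01, y_test_01 = data_preprocess(x_test, y_test)
z = X_test_01 @ w
y_hat = sigmoid(z)
predictions = (y_hat >= 0.5).astype(np.int32)
accuracy = np.sum(np.equal(predictions, y_test_01)) / predictions.shape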
print("Accuracy of test set: " + str(accuracy))

_______________________________
Starting evaluation of test set
Accuracy of test set: [0.99905437]
