# Logistic Regression Assignment (due 26 November)

In this practical you will learn how to apply logistic regression to the task of predicting two digits from the MNIST database: http://yann.lecun.com/exdb/mnist/. The database contains 60000 train images containing digits and 10000 test images. The images are of size 28 × 28. We will use the images in a vectorized form: a vector of size of 784. The code extracting the digits 0 and 1 is provided in the stubs.


In [2]:
%tensorflow_version 2.x

Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.


In [3]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [4]:
print(tf.__version__)


2.12.0


In [5]:
import numpy as np 

In [6]:
def data_preprocess(images, labels):

    # number of examples m  
    m = images.shape[0]
    
    print(m)
    # create vector of ones to concatenate to our data matrix (for intercept terms)
    ones = np.ones(shape=[m, 1])
    images = np.concatenate((ones, images), axis=1)
    
    # to retrieve the images and corresponding labels where the label is either 0 or 1, 
    # we define two logical vectors that can be used to subset our data_matrices
    logical_mask_0 = labels == 0
    logical_mask_1 = labels == 1
    
    images_zeros = images[logical_mask_0]
    labels_zeros = labels[logical_mask_0]
    images_ones = images[logical_mask_1]
    labels_ones = labels[logical_mask_1]
    
    X = np.concatenate((images_zeros, images_ones), axis=0)
    y = np.concatenate((labels_zeros, labels_ones), axis=0)
    
    # shuffle the data and corresponding labels in unison
    def _shuffle_in_unison(a, b):
        assert len(a) == len(b)
        p = np.random.permutation(len(a))
        print('length ', len(a))
        print(a.shape)
        print(a[p].shape)
        return a[p], b[p]

    return _shuffle_in_unison(X,y)   

In [7]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

In [8]:
print (x_train.shape)

(60000, 28, 28)


In [9]:
x_train = x_train.reshape([60000,784])
x_test = x_test.reshape([10000,784])
print(x_train.shape)

(60000, 784)


In [10]:
X,y = data_preprocess(x_train, y_train)
print('shape: ', X.shape)
print('shape: ', y.shape)

60000
length  12665
(12665, 785)
(12665, 785)
shape:  (12665, 785)
shape:  (12665,)


Define hyperparams: learning rate and gradient descent steps


In [11]:
learning_rate = 0.01;
gdc_steps = 1000

Initialize your parameters W


In [12]:
# number of features n
n = X.shape[1]
# we need to define our model parameters to be learned. we use W (weights) instead of theta this time.
mu, sigma = 0, 0.01 # mean and standard deviation
w = np.random.normal(mu, sigma, n)
print(n)

785


In [13]:
print(X.shape, w.shape)

(12665, 785) (785,)


Define the sigmoid function, your code here:
1/(1 + exp(-z*weights))


In [14]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

#print(np.shape(sigmoid(3)))

Define the loss function as provided in equation 12 (Logistic regression slides)
J(w)) = -log(p(yjX; w)) = - Integral von Xn bis i=1(p(yjXi; w))


In [15]:
def compute_cross_entropy_loss(y, y_hat):
    loss = - (y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

#print(compute_cross_entropy_loss(y, sigmoid(np.dot(w,n))))

Start optimization. During training you minimize the loss function. In every iteration your loss should decrease. You also want to look how many correct predictions you have at every iteration. Reminder: the belonging to class digit 1 is when your prediction, $\hat y$ is greater or equal to 0.5. 

When you test your prediction vector (containing zero and ones) with the labels (also zero and ones) you can use the equal function. 

Example:
prediction = (1, 0, 1, 1) and the true labels are y = (0, 0, 1, 0).

When you test on equality you get following result: correct = (0, 1, 1, 0). Your accuracy is: 0+1+1+0
4 = 0.5.
You compute the accuracy for the training and test.

In [18]:
for step in range(0, gdc_steps):
    print("Performing step " + str(step) + " of gradient descent.")
    # perform the dot product between the weights and the examples
    z = np.dot(X,w)
    print('z', z, z.shape)
    # apply the nonlinearity
    y_hat = sigmoid(z)
    print("y hat: " + str(y_hat))
    # normally normalized with -1/m 
    loss = compute_cross_entropy_loss(y, y_hat)
    print("Loss at step " + str(step) + ": " + str(loss))

    # compute the error term, i.e. the difference between labels and estimated labels y_hat, see equation 24 in the slides
    error_term = y_hat - y

    # compute the gradient. as our data matrix X is currently layed out as X_j_i, we got to transpose it 
    # see derived formula of the gradient calculation
    gradients =  (1/len(y)) * np.dot(np.transpose(X),error_term)
    print(X.shape)
    print(error_term.shape)
    print(gradients.shape)

    # update w using the gdc update rule
    w = w-learning_rate*gradients

    # compute the predictions and cast them to int values
    predictions = np.int32(y_hat >= 0.5)
    print(predictions)
    print(predictions.shape)
    # compute mean accuracy
    accuracy =  np.mean(predictions == y)
    print("Accuracy at step " + str(step) + ": " + str(accuracy))

[1;30;43mDie letzten 5000 Zeilen der Streamingausgabe wurden abgeschnitten.[0m
z [-5.87933686  5.89367306 -4.18064433 ... -6.7999168   6.44693735
 -4.31753963] (2115,)
y hat: [0.00278884 0.99725075 0.01505843 ... 0.00111263 0.99841714 0.01315723]
Loss at step 573: None
(2115, 785)
(2115,)
(785,)
[0 1 0 ... 0 1 0]
(2115,)
Accuracy at step 573: 0.9990543735224586
Performing step 574 of gradient descent.
z [-5.88010396  5.89455765 -4.18120618 ... -6.80082071  6.44782302
 -4.31820321] (2115,)
y hat: [0.00278671 0.99725317 0.0150501  ... 0.00111162 0.99841854 0.01314861]
Loss at step 574: None
(2115, 785)
(2115,)
(785,)
[0 1 0 ... 0 1 0]
(2115,)
Accuracy at step 574: 0.9990543735224586
Performing step 575 of gradient descent.
z [-5.88087075  5.89544159 -4.18176785 ... -6.80172425  6.44870799
 -4.31886669] (2115,)
y hat: [0.00278458 0.99725559 0.01504178 ... 0.00111062 0.99841994 0.01314001]
Loss at step 575: None
(2115, 785)
(2115,)
(785,)
[0 1 0 ... 0 1 0]
(2115,)
Accuracy at step 575: 0

Evaluate model on test set

In [17]:
print("_______________________________")
print("Starting evaluation of test set")

X,y = data_preprocess(x_test, y_test)
z = np.dot(X,w) 
y_hat = sigmoid(z)
predictions = (y_hat>0.5).astype(np.int32)
accuracy = np.mean(predictions==y)
print("Accuracy of test set: " + str(accuracy))

_______________________________
Starting evaluation of test set
10000
length  2115
(2115, 785)
(2115, 785)
Accuracy of test set: 0.9990543735224586
