# Logistic Regression

In this assignment you will implement a logistic regression model for classifying cat vs non-cat images. First, let's import the required packages.

In [2]:
import numpy as np
import h5py # for processing files stored in h5 format
import matplotlib # for plotting
from matplotlib import pyplot as plt # for plotting
from PIL import Image # for image processing

%matplotlib inline 

The data consists of a set of cat and non-cat images along with the corresponding labels. The test labels are used to quantify the performance of the model. The train and test data are stored in h5 format, so we will read them with the h5py package and convert them to NumPy arrays.

In [3]:
def load_train_data():
    # the context manager closes the file once the arrays are copied into memory
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_x = np.array(train_dataset["train_set_x"][:]) # train images
        train_y = np.array(train_dataset["train_set_y"][:]) # train labels
    return train_x, train_y
    
def load_test_data():
    # the context manager closes the file once the arrays are copied into memory
    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_x = np.array(test_dataset["test_set_x"][:]) # test images
        test_y = np.array(test_dataset["test_set_y"][:]) # test labels
    return test_x, test_y

train_x, train_y = load_train_data()
test_x, test_y = load_test_data()

Let's print the shapes of the data.

In [None]:
print(f'train data size: train_x {train_x.shape}, train_y {train_y.shape}')
print(f'test data size: test_x {test_x.shape}, test_y {test_y.shape}')


So there are 209 training images and 50 test images, each with a label. Each image is a 64 x 64 color image. Now let's plot a few randomly picked training images along with their labels.

In [None]:
class_to_name = {1:'cat', 0:'not-a-cat'}
fig = plt.figure(figsize = (12, 12))
rows, columns = 4, 4

for i in range(1, rows*columns+1):
    axes = fig.add_subplot(rows, columns, i)
    image_index = np.random.randint(train_x.shape[0])
    plt.imshow(train_x[image_index])
    axes.set_xticks([])
    axes.set_yticks([])
    plt.xlabel(class_to_name[train_y[image_index]])
    

Currently each training image is 64 x 64 x 3, i.e. train_x has shape (m, 64, 64, 3), where m is the number of training samples. We need train_x to have shape nx x m, where nx is the number of input features, i.e. 64*64*3 = 12288, so that each column holds the pixels of one image. Complete the function flatten below so that it takes an m x h x w x c NumPy array, flattens it to an (h*w*c) x m NumPy array and returns it. Hint: the numpy.reshape method.
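As a small illustration of how reshape orders elements, here is a toy sketch on a made-up 3-D array (not the assignment's flatten). Reshaping straight to (features, samples) does not put each sample in its own column, whereas reshaping to (samples, features) and then transposing does. The array `toy` and its shape are assumptions chosen only for this example.

```python
import numpy as np

# Toy data: m = 2 samples, each a 2 x 3 "image" (hypothetical values).
toy = np.arange(12).reshape(2, 2, 3)
m = toy.shape[0]

# Reshaping directly to (features, m) interleaves values from different samples.
wrong = toy.reshape(-1, m)

# Reshaping to (m, features) keeps each sample contiguous; transposing
# then places one sample per column.
right = toy.reshape(m, -1).T

print(right.shape)                                  # (6, 2)
print(np.array_equal(right[:, 0], toy[0].ravel()))  # True
print(np.array_equal(wrong[:, 0], toy[0].ravel()))  # False
```

A similar column-by-column comparison is one way to sanity-check your own flatten on train_x.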

In [7]:
def flatten(z):
    m, h, w, c = z.shape
    nx = h*w*c
    ###Fill your code below. ###    


In [None]:
# you can test your function by uncommenting the following line.
# print(flatten(train_x).shape) 

Now that the data is ready, you will implement logistic regression. You may refer to the Python-style pseudo-code in the lecture notes for clarity; the code below mostly follows that pseudo-code. You have to complete the forward, backward and sigmoid functions below. You also have to complete a few lines of the train loop in the main function, which trains the model.
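The forward step relies on NumPy broadcasting to add the scalar bias to every entry of the linear term. The sketch below shows that behavior on made-up shapes (nx = 3, m = 4). The names `w_toy`, `a_toy` and `b_toy` are illustrative only, and the sketch deliberately stops before the activation.

```python
import numpy as np

rng = np.random.default_rng(0)
w_toy = rng.standard_normal((3, 1))   # weights, shape nx x 1
a_toy = rng.standard_normal((3, 4))   # inputs, shape nx x m
b_toy = 0.5                           # scalar bias

lin = w_toy.T @ a_toy                 # shape 1 x m
z_toy = lin + b_toy                   # the scalar is broadcast to every entry

print(lin.shape, z_toy.shape)         # (1, 4) (1, 4)
print(np.allclose(z_toy - lin, 0.5))  # True
```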

In [None]:
def forward(a, w, b):
    """
      Forward propagates a through the logistic unit given w and b
      a: input to the logistic unit, of shape nx x m
      w: weight vector of shape nx x 1
      b: bias, a scalar
      
      returns anew: the output of the logistic unit, of shape 1 x m
              cache: a tuple that contains the input a
    """
    
    # fill rhs of following 3 lines. no extra lines of code required.
    z =                 # linear computation; np.dot or np.matmul or the operator @ will be helpful
                        # learn about numpy broadcasting; it will be helpful for adding b to dot product of w and a
        
    anew =             # non-linear activation on z
    cache =           # a tuple that contains input a
    return anew, cache

def backward(da, a, cache):
    """
      Backward propagates da through the logistic unit given a and cache
      da: derivative of the loss with respect to the logistic output; shape is 1 x m
      a: output of the logistic unit, of shape 1 x m
      cache: a tuple that contains the input to the logistic unit, which has shape nx x m
      
      returns dw: derivative of the loss with respect to w; shape is nx x 1
              db: derivative of the loss with respect to b; a scalar
    """
    
    # fill rhs of following 4 lines. no extra lines of code required.
    aprev =                     # extract from cache the i/p to logistic unit
    dz =                        # compute dz using chain rule as product of incoming grad da and local grad 
    dw =                        # compute dw; np.sum will be helpful  
    db =                        # compute db; np.sum will be helpful   
    return dw, db

def sigmoid(z):
    """
      Computes sigmoid of the given np array z
      z: np array of any shape
      
      returns sigmoid of z
    """
    # replace ...... below by correct return value; np.exp will be helpful
    return ......

def main(): # main function to train the model
    # load train data
    a0, y = load_train_data()
    a0 = flatten(a0)
    a0 = a0/255. # normalize the data to [0, 1]
    nx, m = a0.shape    
    
    # set some hyperparameters and epsilon
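    alpha = 0.01    # learning rate
    miter = 500     # number of training iterations
    epsilon = 1e-6  # small constant (e.g. to keep np.log away from 0 in the loss)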
    
    # initialize weight and bias parameters
    w = np.random.randn(nx, 1)*.01
    b = 0
    
    # train loop
    # fill rhs in the body of the for loop
    for i in range(miter):
        a1, cache =             # forward propagation        
        L =                     # compute loss; np.sum or np.mean, np.log will be useful
        da1 =                   # derivative of loss with respect to a1
        dw, db =                # backward propagation
        w -=                    # update w
        b -=                    # update b
        if not i%100: # print loss every 100 iterations
            # np.asscalar was removed from recent NumPy releases; .item() works on any size-1 value
            print(f'Loss at iteration {i}:\t{np.asarray(L).item():.4f}')
        
    return w, b

if __name__ == '__main__':
    w, b = main()    

Now we will test our model on both train and test data.

In [None]:
def predict(x, w, b):
    a = sigmoid( w.T @ x + b)
    predictions = np.zeros_like(a)
    predictions[a > 0.5] = 1
    return predictions

def test_model(x, y, w, b):
    predictions = predict(x, w, b)
    acc = np.mean(predictions == y)
    acc = float(acc) # np.asscalar was removed from recent NumPy releases
    return acc

x, y = load_train_data()
x = flatten(x)
x = x/255. # normalize the data to [0, 1]
print(f'train accuracy: {test_model(x, y, w, b) * 100:.2f}%')

x, y = load_test_data()
x = flatten(x)
x = x/255. # normalize the data to [0, 1]
print(f'test accuracy: {test_model(x, y, w, b) * 100:.2f}%')
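To see how predict and test_model turn probabilities into an accuracy, here is a sketch on made-up values (`probs` and `labels` are hypothetical, not model output). It also shows that comparing a 1 x m prediction array with a length-m label vector broadcasts element-wise, which is the shape combination assumed here for the loaded labels.

```python
import numpy as np

probs = np.array([[0.9, 0.2, 0.6, 0.4]])  # hypothetical sigmoid outputs, shape 1 x m
labels = np.array([1, 0, 0, 0])           # hypothetical labels, shape (m,)

preds = np.zeros_like(probs)
preds[probs > 0.5] = 1                    # threshold at 0.5, as in predict

matches = preds == labels                 # broadcasts to shape 1 x m
acc = float(np.mean(matches))

print(preds)    # [[1. 0. 1. 0.]]
print(acc)      # 0.75
```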

# Questions
1. Can you bring the loss down further? (Hint: try more iterations. If the loss oscillates, try different learning rates.)
2. Write simple Python code to plot the loss against the number of iterations for learning rate alpha = 0.005. State your observations from the plot.
3. What happens to the model if the weights are initialized to zero? Explain your observations.
4. What is the range of probabilities for cat images in the test data? What is the range of probabilities for cat images in the train data? How do they compare?

Note: Answer all questions in the Jupyter notebook itself. Wherever code is required, write and run it in a code cell. For text, write and render it in a markdown cell.