**Copyright: © NexStream Technical Education, LLC**.
All rights reserved

**Gradient Descent and the Perceptron**

In this project, you will integrate gradient descent to a perceptron model with logistic regression (i.e. using the sigmoid activation and binary cross entropy cost function) on a synthetic dataset. Then in a subsequent assignment you will extend what you develop here and apply it to an image dataset.  The code you will develop implements an unvectorized version (Part-1), then a vectorized version (Part-2), and finally you will compare the efficiency between the two versions (Part-3).
The code developed in this unit will form the base for the backpropagation step which will be integrated into your DNN framework.

Please complete the following steps in the Colab Script.  The reference script below provides template code and hints to help with each step.

The following instructions are identified as Steps in the text cells preceding their corresponding code cell. Read through the instructions and write/fill-in the appropriate code in the cells.



##Part A:  Dataset Creation and Preprocessing

In this section, you will set up your drive, create a synthetic dataset, and preprocess the data in preparation for the regression and descent algorithms.  Please follow the steps outlined in the following cells and fill in your code where prompted.

**Step A-1:**
- Mount your Google drive.
- Upload the file tf_image_utils.py from the materials folder provided with this course and copy it to your project directory.
- Import the numpy module as np and the random module.
- Initialize a random seed.  This will be used for checking your outputs.

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
%cd /content/drive/MyDrive/Colab Notebooks/
#See the reference cp command below.  Update this to your own drive path.
#!cp drive/MyDrive/MachineLearning/DNN/tf_image_utils.py .

#!cp '/content/drive/MyDrive/ComputerScience/MachineLearning/Neural Networks/DNN/tf_image_utils.py' .

Mounted at /content/drive
/content/drive/MyDrive/Colab Notebooks


In [2]:
# Imports

import numpy as np
import tensorflow as tf
import tf_image_utils
from tf_image_utils import load_data, preprocess, as_numpy
import tensorflow_datasets as tfds


**Step A-2:** Create a Synthetic Dataset

- Use the following code cells to generate synthetic data and create the train and test datasets.
- Write the function "see_shapes" to print the shapes of the datasets, then call the function to display the shapes.
Your shapes output should be the following:
  - Number of training examples: m_train =  80
  - Number of testing examples: m_test =  20
  - Height/Width of each image: num_px =  24
  - Each image is of size:  24 x 24
  - X_train shape:  (80, 24, 24)
  - Y_train shape:  (1, 80)
  - X_test shape:  (20, 24, 24)
  - Y_test shape:  (1, 20)
- Verify the doctest modules included in the cells run without any errors

In [3]:
# Creates synthetic data.

def create_X(m):
  np.random.seed(3) # do not change - for grading purposes
  data = np.zeros((m, 24, 24))
  for i in range(m):
    for j in range(24):
      data[i][j] = np.random.randint(0, 255, 24)
  return np.array(data)

In [4]:
# Creates synthetic labels.

def create_Y(m, X):
  labels = np.zeros((1, m))
  for i in range(m):
    if (X[i][0][0] % 2 == 0):
      labels[:,i] = 1
    else:
      labels[:,i] = 0
  return labels

In [5]:
def create_dataset(m):
  X = create_X(m)
  Y = create_Y (m, X)
  return X, Y

In [6]:
X_train, Y_train = create_dataset(80)
X_test, Y_test = create_dataset(20)

In [7]:
# function to see the shapes of the data.
def see_shapes(X_train, X_test, Y_train, Y_test):

### BEGIN CODE HERE

  #m_train = None
  #m_test =  None
  #image_size = None
  m_train = X_train.shape[0]
  m_test =  X_test.shape[0]
  image_size = X_train.shape[1]
  return m_train, m_test, image_size

 ### END CODE HERE


In [8]:
# Call the function to see the shapes of your data.

### BEGIN CODE HERE

#m_train, m_test, image_size = None

m_train, m_test, image_size = see_shapes(X_train, X_test, Y_train, Y_test)

### END CODE HERE

print ("Number of training examples: m_train = ", m_train)
print ("Number of testing examples: m_test = ", m_test)
print ("Height/Width of each image: num_px = ", image_size)
print ("Each image is of size: ", image_size, "x", image_size)
print ("X_train shape: ", X_train.shape)
print ("Y_train shape: ", Y_train.shape)
print ("X_test shape: ", X_test.shape)
print ("Y_test shape: ", Y_test.shape)

import doctest
"""
  >>> print(m_train)
  80
  >>> print(m_test)
  20
  >>> print(image_size)
  24
  >>> print (X_train.shape)
  (80, 24, 24)
  >>> print (Y_train.shape)
  (1, 80)
  >>> print (X_test.shape)
  (20, 24, 24)
  >>> print (Y_test.shape)
  (1, 20)
"""

doctest.testmod()


sys.settrace() should not be used when the debugger is being used.
This may cause the debugger to stop working correctly.
If this is needed, please check: 
http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html
to see how to restore the debug tracing back correctly.
Call Location:
  File "/usr/lib/python3.10/doctest.py", line 1501, in run
    sys.settrace(save_trace)



Number of training examples: m_train =  80
Number of testing examples: m_test =  20
Height/Width of each image: num_px =  24
Each image is of size:  24 x 24
X_train shape:  (80, 24, 24)
Y_train shape:  (1, 80)
X_test shape:  (20, 24, 24)
Y_test shape:  (1, 20)


TestResults(failed=0, attempted=7)

**Step A-3 :** Preprocess the datsets - flatten, normalize
- Write the function "flatten" to flatten the dataset then call the function to see the shapes of the flattened data.   A trick to accomplish this in a single line is to set X_train = X_train.reshape(X_train.shape[0], -1).T. What is going on in this line? First, we call the .reshape method on X_train to begin the process of reshaping it. Then, we say that we want the first dimension of our new X_train to be the same as the first dimension of the old X_train. That is, we say that we want the first dimension of X_train to remain “m”, or the number of examples in the set. Then, the “-1” returns the product of all other dimensions. This gives us a matrix of 80 rows by 576 columns. The final step is to transpose the matrix by using the .T method, which switches the number of rows and columns. Do the same for X_test and then run the cell to call the flatten() function.

- You should see the following output:
  - Flattened X_train shape: (576, 80)
  - Y_train shape: (1, 80)
  - Flattened X_test shape: (576, 20)
  - Y_test shape: (1, 20)

- Write the function "normalize" to normalize the dataset then call the function to see the shapes of the flattened data. That is, divide each of the samples by the maximum value corresponding to the size of the pixels (in this case, it'll be 2^8 - 1 or 255).
- Verify the doctest modules run for both the flatten and normalize calls without any errors

In [9]:
# Flatten the dataset.
def flatten(X_train, X_test):

  ### BEGIN CODE HERE ###

  #X_train = None
  #X_test =  None

  X_train = X_train.reshape(X_train.shape[0], -1).T
  X_test = X_test.reshape(X_test.shape[0], -1).T

  return X_train, X_test, X_train.shape, Y_train.shape, X_test.shape, Y_test.shape

  ### END CODE HERE ###


In [10]:
# Call the function to flatten the datasets.

### BEGIN CODE HERE

#X_train, X_test, X_train.shape, Y_train_shape, X_test.shape, Y_test.shape = None

X_train, X_test, X_train_shape, Y_train_shape, X_test_shape, Y_test_shape = flatten(X_train, X_test)

### END CODE HERE

print ("Flattened X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("Flattened X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

import doctest
"""
  >>> print(X_train.shape)
  (576, 80)
  >>> print(Y_train.shape)
  (1, 80)
  >>> print(X_test.shape)
  (576, 20)
  >>> print (Y_test.shape)
  (1, 20)
"""

doctest.testmod()


Flattened X_train shape: (576, 80)
Y_train shape: (1, 80)
Flattened X_test shape: (576, 20)
Y_test shape: (1, 20)


TestResults(failed=0, attempted=4)

In [11]:
# Basic normalization.

def normalize(X_train, X_test):

  ### BEGIN CODE HERE

  #X_train = None
  #X_test = None

  X_train = X_train / 255
  X_test = X_test / 255

  ### END CODE HERE

  return X_train, X_test

In [12]:
# Call your normalization function.

### BEGIN CODE HERE

#X_train, X_test = None

X_train, X_test = normalize(X_train, X_test)

### END CODE HERE

import doctest
"""
  >>> print(round(X_train[0][3], 2))
  0.12
  >>> print(round(Y_train[0][4], 2))
  1.0
  >>> print(round(X_test[0][0], 2))
  0.42
  >>> print (round(Y_test[0][1], 2))
  0.0
"""

doctest.testmod()



TestResults(failed=0, attempted=4)

## Part 1:  Unvectorized Implementation

In this section, you will implement unvectorized gradient descent.  Please follow the steps outlined in the following cells and fill in your code where prompted.

First, take a look at the pseudocode overview. This is the skeleton around which you should build your code. Make sure to refer back to this cell throughout if you get stuck, and for further clarification, refer back to the lecture video on Gradient Descent

An important note is that this is not a completely unvectorized implementation. You will be using vectors for the weights and gradients, but in conjunction with for loops. However, as you will see, this method is much less efficient than the fully vectorized approach you will implement later.


**Pseudocode Overview**

In [None]:
### PSEUDOCODE for unvectorized gradient descent

# Extract number of examples and number of features from X_train
# Initialize parameters
# Initialize list of costs

# Loop over iterations
# Initialize gradients vector
# Initialize cost accumulator
  # Loop over examples
    # Index into dataset to extract a single example's feature vector and label
    # Compute forward propagation steps to find the activation for this example
    # Compute the cost for this example and add it to the cumulative total
    # Calculate the derivative dz for this example

      # Loop over features
        # Calculate the derivative of each weight using its associated feature vector

    # Add the derivative with respect to the bias for that example to a cumulative total

  # Divide the cost and the gradients by the number of examples m to find the average cost and average update amount per weight.
  # Loop over features.
    # Update the weights.
  # Update the bias.

  # Append the cost for the iteration to the list.

# Return the learned parameters and the list of costs.

**Step 1-1** : Initialization and Example Extraction
Let’s begin by defining functions which initialize our parameters and extract a single example from the train set. Each example in our training data has 576 features, which means there will be one weight associated with each. While there are many ways of initializing weights, we are going to keep things simple by simply creating an array of 576 random numbers (multiplied by 0.01 to keep them small) as our weights vector. We can use the built-in numpy function np.random.randn to do this. You will need to pass the number of values (i.e. the number of weights) that you want in your array to randn(). When you can, try to avoid hard coding. For example, instead of hard coding the shape of the weights vector w as (576, 1), use a variable instead for the number of features in X_train. You can, however, just go ahead and set b=0, since this is a simple perceptron model (and so there will only be one bias since there is one node). Return the initialized w and b from this function.

Then call the initialize_parameters function and generate 5 weights.



In [13]:
# Initialize weights and biases.
def initialize_parameters(dim):
  np.random.seed(3) #do not change - for grading purposes

  #w = None
  #b = None
  w = np.random.randn(dim, 1) * 0.01
  b = 0


  return w, b

In [14]:
# Test for correct output.

w, b = initialize_parameters(5)
print ("w: ", w)
print ("b: ", b)


import doctest

"""
  >>> print(np.round(w[0], 3))
  [0.018]
  >>> print(np.round(w[4], 3))
  [-0.003]
  >>> print(np.round(b, 1))
  0
"""

doctest.testmod()




w:  [[ 0.01788628]
 [ 0.0043651 ]
 [ 0.00096497]
 [-0.01863493]
 [-0.00277388]]
b:  0


TestResults(failed=0, attempted=3)

**Step 1-2 :**  Extract an example (i.e. an input sample)
Next, let’s build a function to extract an example from the train set. Refer back to the pseudocode guide – you should see that this function will be called immediately after entering the for loop over the examples. Therefore, in a loop going from i=0 to i=m, we need a way to extract the ith example from the train set. To understand exactly what we are doing, let’s look back at the shapes of the data. After flattening, X_train is a matrix with 576 rows and 80 columns; that is, each column represents an example. So to extract a single example, we need to index the matrix by column. With numpy, we can index multidimensional arrays  by row and column. For example, if we want all the values in the 3rd column, we can index as follows: array[:,3]. Define the function extract() to return the ith column of X_train as x and the ith column of Y_train as y.


In [15]:
# Extract a single example.
def extract(X_train, Y_train, example):


  # Index into the array.
  #x = None
  #y = None

  x = X_train[:, example]
  y = Y_train[:, example]
  x = x.reshape(x.shape[0], 1)


  # Reshape
  x = x.reshape(x.shape[0], 1)
  return x, y

In [16]:
# Test for correct output.

x, y = extract(X_train, Y_train, 0)
print ("x: ", x)
print ("y: ", y)


import doctest

"""
  >>> print(np.round(x[17], 3))
  [0.651]
  >>> print(np.round(x[50], 3))
  [0.31]
  >>> print(np.round(y, 1))
  [1.]
"""

doctest.testmod()


x:  [[0.41568627]
 [0.59607843]
 [0.97647059]
 [0.51372549]
 [0.72156863]
 [0.78431373]
 [0.        ]
 [0.08235294]
 [0.99215686]
 [0.57647059]
 [0.79215686]
 [0.41960784]
 [0.97647059]
 [0.6627451 ]
 [0.54117647]
 [0.58431373]
 [0.46666667]
 [0.65098039]
 [0.87843137]
 [0.58039216]
 [0.6745098 ]
 [0.36470588]
 [0.65490196]
 [0.55686275]
 [0.97254902]
 [0.10196078]
 [0.31764706]
 [0.85490196]
 [0.58823529]
 [0.25882353]
 [0.00784314]
 [0.74901961]
 [0.23529412]
 [0.50588235]
 [0.70196078]
 [0.35294118]
 [0.27058824]
 [0.40784314]
 [0.43137255]
 [0.38039216]
 [0.61568627]
 [0.41568627]
 [0.59607843]
 [0.98431373]
 [0.24313725]
 [0.02745098]
 [0.97254902]
 [0.67058824]
 [0.12941176]
 [0.48235294]
 [0.30980392]
 [0.69019608]
 [0.14509804]
 [0.07843137]
 [0.36862745]
 [0.19215686]
 [0.58431373]
 [0.49803922]
 [0.80784314]
 [0.95686275]
 [0.10980392]
 [0.46666667]
 [0.21176471]
 [0.        ]
 [0.75294118]
 [0.91372549]
 [0.07058824]
 [0.45490196]
 [0.74901961]
 [0.64705882]
 [0.72156863]
 [

TestResults(failed=0, attempted=3)

### Forward Propagation

**Step 1-3** :
Now let’s compute the forward propagation steps. Let’s begin by defining a helper functions to compute activation. We’ll be using the sigmoid activation function for this model. Define a function which return the sigmoid of the input. You can see the formula for the sigmoid function on the slide.

Refer back to the pseudocode guide to remind yourself of where this feedforward function should go. We will be performing the feedforward step for a single example. That means we need to calculate the activation for that single example, which is a column vector of features we have extracted. First, calculate z by multiplying each weight by its associated feature in the example, and then adding the bias. An efficient way to do this is by calculating the dot product of the weights vector and the example feature vector, utilizing numpy (in a preview of the vectorization methods we will use later!).  The dot product of two vectors [a, b, c] and [x, y, z] is defined as (ax + by + cz). Notice that this is exactly what we want to achieve. Numpy’s np.dot() function computes matrix multiplication for 2D arrays. Right now, we have two 2D arrays, both as column vectors: w, of shape (576, 1) and x, of shape (576, 1). How can we use matrix multiplication to return the dot product of these two vectors?

It turns out there is nice relationship between the dot product and matrix multiplication. We can rewrite a.b as a*b.T. That is, the dot product of a and b is the same as a multiplied by b transpose. Recall that transposing a vector means to switch its dimensions (i.e. a 2x3 vector becomes a 3x2 vector). So, let’s go ahead and calculate the dot product of the weights w and the example x by passing w.T and x as arguments to np.dot. Recall from the unit on linear algebra that the order of matrix multiplication is very important. Np.dot(w.T, x) will give you a different result to np.dot(x, w.T). Think carefully about what exactly we want to achieve, and try writing out an example if you are still stuck on which order to use. Don’t forget to add the bias to the result of the dot product! Finally, pass z to sigmoid_activation to compute the activation for this example. Overall, the feedforward function should take in w, b, and x as arguments and return the activation a.


In [17]:
# Helper function for activation.
def sigmoid_activation(z):


  #  activation = 1/(1+e^-z)
  #a =  None

  a = 1 / (1 + np.exp(-z))


  return a

In [18]:
# Test for correct output.

a = sigmoid_activation(0)


import doctest

"""
  >>> print(np.round(sigmoid_activation(0), 2))
  0.5
  >>> print(np.round(sigmoid_activation(.7), 2))
  0.67
  >>> print(np.round(sigmoid_activation(-3), 2))
  0.05
"""

doctest.testmod()

TestResults(failed=0, attempted=3)

In [19]:
def feedforward(w, b, x):

  ### EBGIN CODE HERE

  # Forward propagation.
  #z = None
  #a = None

  #z = np.dot(w, x) + b
  z = (w * x) + b

  a = sigmoid_activation(z)

  ### END CODE HERE

  return a

In [20]:
print(a)

0.5


### Cost Computation

**Step 1-4** :

Next, let’s see how to compute the cost. Because we are implementing unvectorized gradient descent, you will find that it is easier not to write a function to calculate the cost. Nonetheless, let’s see how the cost is calculated. First, notice that we have declared epsilon to be a very small value (1e-8). We need this to prevent division by zero in the cost calculation. Since we are performing binary classification, we will want to use the binary cross-entropy loss function.
$$J=\sum [ylog(a) + (1-y)log(1-a)] $$
We then call np.squeeze on J to give the cost for broadcasting purposes – more on this later when we vectorize!
https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html

In [21]:
# Initializations for test output.
# Do not change - for grading purposes
J = 0
a = np.array([0.839])
y = np.array([1.])

# Calculate cost.
epsilon = 1e-8

#J += None
#cost = None   #call np.squeeze on J

J += -y * np.log(a + epsilon) - (1 - y) * np.log(1 - a + epsilon)

cost = np.squeeze(-J)  #call np.squeeze on J

print(cost)

import doctest

"""
  >>> print(np.round(cost, 3))
  -0.176
"""

doctest.testmod()

-0.17554456059597978


TestResults(failed=0, attempted=1)

### Derivatives and Backpropagation

**Step 1-5** :

Finally, let’s consider the backpropagation and weight update steps. Similar to cost computation, we are mostly going tp avoid explicit functions here, since the backpropagation steps are awkwardly divided across for loops in unvectorized descent. The one function we will define, however, is a simple one which takes in the activation a and the corresponding label y for our chosen example, and calculates the derivative of the cost with respect to z. We won’t go into the details of the derivation here, but it turns out that dz = a – y.

We can calculate the derivative of J with respect to the ith weight dwi by calculating dwi = ith feature of x * dz.

We’ll complete weight update within the algorithm itself.


In [22]:
# Calculate derivative of the cost with respect to z.
def calc_dz(a, y):

  #dz = None

  dz = a - y

  return dz

In [23]:
# Test output.

dz = calc_dz(a, y)


import doctest

"""
  >>> print(np.round(dz, 3))
  [-0.161]
"""

doctest.testmod()

TestResults(failed=0, attempted=1)

You will complete the rest of the backpropagation steps inside the descent algorithm itself. Typically, we would create functions to perform backpropagation for us; however, since we are implementing unvectorized descent, there is no clean way to do this.

### Full Algorithm
**Step 1-6** :

We’re now ready to put everything together. You will want to refer back to the pseudocode guide regularly. First, let’s extract the number of examples and the number of features per example from X_train. We can do this by indexing into the shape of X_train. Refer back to the output of the flatten() function to see the shape of X_train after preprocessing.. Alternatively, make a new cell and run X_train.shape to work out how to index into it. Next, let’s initialize our parameters w and b using the function we built earlier. Remember that we need to pass a variable containing the number of features in the train set to this function – luckily, we have just extracted this! Now go ahead and initialize an empty list to which we can append the cost per iteration (this will be useful to keep track of how the cost changes during training).

We will now enter the first for loop which is for the number of iterations of descent. Inside this loop, we are going to implement one iteration of unvectorized descent. First, initialize the gradients. In code, the convention is to use dw1 for the derivative of the cost with respect to w1 (and so on). Initialize a gradients vector of zeroes using np.zeros as dw.  The ith component in this vector will be the derivative of the cost with respect to the ith weight. Remember to also initialize J, our cost accumulator, to 0.

Now we can enter the for loop over the examples (this should iterate from 0 to the number of examples). The first thing we need to do is extract a single example and its corresponding label. We can do this by calling our extract() function. Next, calculate the activation a by calling the feedforward() function.  Then, calculate the cost for this example using the code you wrote earlier. Don’t forget to include the small epsilon value and to call np.squeeze on J. Next, begin the process of backpropagation by using your calc_dz() function to calculate dz for this example.  Now we need to calculate the gradients with respect to the weights. First, loop over the features. Inside this loop, you will want to accumulate the values of dw1,dw2, and so on. We are accumulating here because, similarly to the cost computation, we are going to repeat this process for each example and thus calculate a cumulative total for all examples. Remember that dw1 = x1*dz. You should be able to use the for loop to iteratively calculate each derivative. Lastly, remember to accumulate db – however, remember to do this outside the for loop over features, but inside the for loop over examples. This is because we only want to calculate db once per example.

Now you will want to exit out of the for loop over examples too. The code you have written inside this loop will calculate the total cost for all examples, as well as the total gradient with respect to each parameter for all examples. That is why you must now divide each accumulator by the number of examples to get the average cost per example and average gradient with respect to each parameter. Lastly, we need to update the weights by looping over them and updating each weight with w = w – learning_rate*dw. Don’t forget to also update the bias!


In [24]:
print(X_train.shape)

(576, 80)


In [25]:
print(Y_train.shape)

(1, 80)


In [None]:
# Skeleton of the unvectorized algorithm.
"""
def unvectorized_descent(X_train, Y_train, num_iterations, learning_rate):

  # Extract number of examples and number of features.
  m = None
  num_features = None

  # Initialize parameters.
  w, b = None

  # Initialize  list of costs.
  costs = None

  for iteration in range(None, None):

    # Initialize cost and gradients.
    J = None
    dw = None
    db = None

    # Iterate over examples.
    for None in range(None, None):

      # Extract a single example.
      x, y = None

      # Compute forward propagation for a single example.
      a = None

      # Compute the cost for a single example.
      epsilon = None
      J += None
      cost = None

      # Backpropagation for a single example.
      # Calculate derivative.
      dz = None

      # Iterate over features.
      for None in range(None):

        # Calculate gradient.
        dw[None] += None

      # Calculate derivative of cost with respect to the bias.
      db += None

    # Divide by m.
    J = None
    dw = None
    db = None

    # Update parameters by looping over weights and derivatives.
    for None in range(None):
      w[None] = None

    b = None


    # Record the costs
    if iteration % 10 == 0:
        costs.append(cost)

    # Print the cost every 100 training iterations
    if iteration == 0:
      print ("Cost after iteration %i: %f" %(iteration, cost))

    elif iteration % 10 == 0:
      print ("Cost after iteration %i: %f" %(iteration, cost))

    # Print the cost every 100 training iterations
    if iteration == num_iterations:
      print ("Cost after iteration %i: %f" %(iteration, cost))

  return w, b, costs

  """

'\ndef unvectorized_descent(X_train, Y_train, num_iterations, learning_rate):\n\n  # Extract number of examples and number of features.\n  m = None\n  num_features = None\n\n  # Initialize parameters.\n  w, b = None\n\n  # Initialize  list of costs.\n  costs = None\n\n  for iteration in range(None, None):\n\n    # Initialize cost and gradients.\n    J = None\n    dw = None\n    db = None\n\n    # Iterate over examples.\n    for None in range(None, None):\n\n      # Extract a single example.\n      x, y = None\n\n      # Compute forward propagation for a single example.\n      a = None\n\n      # Compute the cost for a single example.\n      epsilon = None\n      J += None\n      cost = None\n\n      # Backpropagation for a single example.\n      # Calculate derivative.\n      dz = None\n\n      # Iterate over features.\n      for None in range(None):\n\n        # Calculate gradient.\n        dw[None] += None\n\n      # Calculate derivative of cost with respect to the bias.\n      db +=

In [69]:
# Skeleton of the unvectorized algorithm.

def unvectorized_descent(X_train, Y_train, num_iterations, learning_rate):

  # Extract number of examples and number of features.
  #m = None
  #num_features = None
  m, num_features = X_train.shape[1], X_train.shape[0]

  # Initialize parameters.
  w, b = initialize_parameters(num_features)

  # Initialize  list of costs.
  costs = []

  for iteration in range(0, num_iterations+1):

    # Initialize cost and gradients.
    J = 0
    dw = np.zeros((num_features, 1))
    db = 0


    # Iterate over examples.
    for i in range(0, m):

      # Extract a single example.
      x, y = extract(X_train, Y_train, 0)

      # Compute forward propagation for a single example.
      a = feedforward(w, b, x)

      # Compute the cost for a single example.
      epsilon = 1e-15
      J += -y * np.log(a + epsilon) - (1 - y) * np.log(1 - a + epsilon)
      cost = np.squeeze(J)


      # Backpropagation for a single example.
      # Calculate derivative.
      dz = calc_dz(a, y)

      # Iterate over features.
      for j in range(num_features):
        # Calculate gradient.
        dw[j] += x[j] * dz[0]  # Adjusted here

      # Calculate derivative of cost with respect to the bias.
      db += dz[0]

    # Divide by m.
    J = J/m
    dw = dw/m
    db = db/m


    # Update parameters by looping over weights and derivatives.
    for i in range(num_features):
      w[i] = w[i] - learning_rate * dw[i]

    b = b - learning_rate * db


  # Record the costs
    if iteration % 10 == 0:
        costs.append(cost[0])

    # Print the cost every 100 training iterations
    if iteration % 100 == 0:
      print("Cost after iteration %i: %f" %(iteration, cost[0]))
  return w, b, costs



In [70]:
# Call your function.

#w, b, costs = None

w, b, costs = unvectorized_descent(X_train, Y_train, 1001, 0.1)



import doctest

"""
  >>> print(np.round(costs[0], 3))
  55.748
  >>> print(np.round(costs[59], 3))
  2.667
  >>> print(np.round(w[17], 3))
  [-0.053]
  >>> print(np.round(b, 3))
  [0.014]
"""

doctest.testmod()

Cost after iteration 0: 55.154924
Cost after iteration 100: 7.281169
Cost after iteration 200: 3.629862
Cost after iteration 300: 2.397365
Cost after iteration 400: 1.785042
Cost after iteration 500: 1.420263
Cost after iteration 600: 1.178574
Cost after iteration 700: 1.006830
Cost after iteration 800: 0.878577
Cost after iteration 900: 0.779190
Cost after iteration 1000: 0.699929
**********************************************************************
File "__main__", line 3, in __main__
Failed example:
    print(np.round(costs[0], 3))
Expected:
    55.748
Got:
    55.155
**********************************************************************
File "__main__", line 5, in __main__
Failed example:
    print(np.round(costs[59], 3))
Expected:
    2.667
Got:
    1.199
**********************************************************************
File "__main__", line 7, in __main__
Failed example:
    print(np.round(w[17], 3))
Expected:
    [-0.053]
Got:
    [2.609]
**********************************

TestResults(failed=4, attempted=4)

Now call the unvectorized_descent() function. Pass the X_train, Y_train, number of iterations (set to 1001) and a learning rate of 0.1. You should see the cost decrease as the number of iterations increase.


In [26]:
def unvectorized_descent2(X_train, Y_train, num_iterations, learning_rate):
  # Extract number of examples and number of features.
  m, num_features = X_train.shape[1], X_train.shape[0]

  # Initialize parameters.
  w = np.zeros((num_features, 1))
  b = 0
  #w,b=intialize_parameters(num_features)

  # Initialize list of costs.
  costs = []

  for iteration in range(0, num_iterations+1):

    # Initialize cost and gradients.
    J = 0
    dw = np.zeros((num_features, 1))
    db = 0

    # Iterate over examples.
    for i in range(0, m):

      # Extract a single example.
      x = X_train[:, i].reshape(num_features, 1)
      y = Y_train[:, i]

      # Compute forward propagation for a single example.
      a = 1 / (1 + np.exp(-(np.dot(w.T, x) + b)))

      # Compute the cost for a single example.
      epsilon = 1e-15
      J += -(y * np.log(a + epsilon) + (1 - y) * np.log(1 - a + epsilon))
      cost = J / m

      # Backpropagation for a single example.
      # Calculate derivative.
      dz = a - y

      # Iterate over features.
      for j in range(num_features):
        # Calculate gradient.
        dw[j] += x[j] * dz[0]  # Adjusted here

      # Calculate derivative of cost with respect to the bias.
      db += dz[0]  # Adjusted here

    # Divide by m.
    J = J / m
    dw = dw / m
    db = db / m

    # Update parameters by looping over weights and derivatives.
    for i in range(num_features):
      w[i] = w[i] - learning_rate * dw[i]

    b = b - learning_rate * db

    # Record the costs
    if iteration % 10 == 0:
        costs.append(cost[0])

    # Print the cost every 100 training iterations
    if iteration % 100 == 0:
      print("Cost after iteration %i: %f" %(iteration, cost[0]))

  return w, b, costs


In [27]:
# Call your function.

#w, b, costs = None

w, b, costs = unvectorized_descent2(X_train, Y_train, 1001, 0.1)



import doctest

"""
  >>> print(np.round(costs[0], 3))
  55.748
  >>> print(np.round(costs[59], 3))
  2.667
  >>> print(np.round(w[17], 3))
  [-0.053]
  >>> print(np.round(b, 3))
  [0.014]
"""

doctest.testmod()

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.148567
Cost after iteration 200: 0.088970
Cost after iteration 300: 0.062738
Cost after iteration 400: 0.048217
Cost after iteration 500: 0.039061
Cost after iteration 600: 0.032783
Cost after iteration 700: 0.028221
Cost after iteration 800: 0.024760
Cost after iteration 900: 0.022047
Cost after iteration 1000: 0.019865
**********************************************************************
File "__main__", line 3, in __main__
Failed example:
    print(np.round(costs[0], 3))
Expected:
    55.748
Got:
    [0.693]
**********************************************************************
File "__main__", line 5, in __main__
Failed example:
    print(np.round(costs[59], 3))
Expected:
    2.667
Got:
    [0.033]
**********************************************************************
File "__main__", line 7, in __main__
Failed example:
    print(np.round(w[17], 3))
Expected:
    [-0.053]
Got:
    [-0.041]
*******************************

TestResults(failed=3, attempted=4)

In [None]:
m, num_features = X_train.shape[1], X_train.shape[0]

w, b = initialize_parameters(num_features)

print(w)
print(b)

[[ 1.78862847e-02]
 [ 4.36509851e-03]
 [ 9.64974681e-04]
 [-1.86349270e-02]
 [-2.77388203e-03]
 [-3.54758979e-03]
 [-8.27414815e-04]
 [-6.27000677e-03]
 [-4.38181690e-04]
 [-4.77218030e-03]
 [-1.31386475e-02]
 [ 8.84622380e-03]
 [ 8.81318042e-03]
 [ 1.70957306e-02]
 [ 5.00336422e-04]
 [-4.04677415e-03]
 [-5.45359948e-03]
 [-1.54647732e-02]
 [ 9.82367434e-03]
 [-1.10106763e-02]
 [-1.18504653e-02]
 [-2.05649899e-03]
 [ 1.48614836e-02]
 [ 2.36716267e-03]
 [-1.02378514e-02]
 [-7.12993200e-03]
 [ 6.25244966e-03]
 [-1.60513363e-03]
 [-7.68836350e-03]
 [-2.30030722e-03]
 [ 7.45056266e-03]
 [ 1.97611078e-02]
 [-1.24412333e-02]
 [-6.26416911e-03]
 [-8.03766095e-03]
 [-2.41908317e-02]
 [-9.23792022e-03]
 [-1.02387576e-02]
 [ 1.12397796e-02]
 [-1.31914233e-03]
 [-1.62328545e-02]
 [ 6.46675452e-03]
 [-3.56270759e-03]
 [-1.74314104e-02]
 [-5.96649642e-03]
 [-5.88594380e-03]
 [-8.73882298e-03]
 [ 2.97138154e-04]
 [-2.24825777e-02]
 [-2.67761865e-03]
 [ 1.01318344e-02]
 [ 8.52797841e-03]
 [ 1.1081875

In [71]:
def unvectorized_descent(X_train, Y_train, num_iterations, learning_rate):
  # Extract number of examples and number of features.

  m, num_features = X_train.shape[1], X_train.shape[0]

  # Initialize parameters.

  np.random.seed(3) #do not change - for grading purposes

  #w = None
  #b = None
  w = np.random.randn(num_features, 1) * 0.01
  b = 0

  # Initialize list of costs.
  costs = []

  for iteration in range(0, num_iterations+1):

    # Initialize cost and gradients.
    J = 0

    dw = np.zeros((num_features, 1))


    db = 0

    # Iterate over examples.
    for i in range(0, m):

      # Extract a single example.
      x = X_train[:, i].reshape(num_features, 1)
      y = Y_train[:, i]

      # Compute forward propagation for a single example.
      a = 1 / (1 + np.exp(-(np.dot(w.T, x) + b)))

      # Compute the cost for a single example.
      epsilon = 1e-8
      J += -(y * np.log(a + epsilon) + (1 - y) * np.log(1 - a + epsilon))
      cost = J / m

      # Backpropagation for a single example.
      # Calculate derivative.
      dz = a - y

      # Iterate over features.
      for j in range(num_features):
        # Calculate gradient.
        dw[j] += x[j] * dz[0]  # Adjusted here

      # Calculate derivative of cost with respect to the bias.
      db += dz[0]  # Adjusted here

    # Divide by m.
    J = J / m
    dw = dw / m
    db = db / m

    # Update parameters by looping over weights and derivatives.
    for i in range(num_features):
      w[i] = w[i] - learning_rate * dw[i]

    b = b - learning_rate * db

    # Record the costs
    if iteration % 10 == 0:
        costs.append(cost[0])

    # Print the cost every 100 training iterations
    if iteration % 100 == 0:
      print("Cost after iteration %i: %f" %(iteration, cost[0]))

  return w, b, costs


In [72]:
# Call your function.

#w, b, costs = None

w, b, costs = unvectorized_descent(X_train, Y_train, 1001, 0.1)



import doctest

"""
  >>> print(np.round(costs[0], 3))
  55.748
  >>> print(np.round(costs[59], 3))
  2.667
  >>> print(np.round(w[17], 3))
  [-0.053]
  >>> print(np.round(b, 3))
  [0.014]
"""

doctest.testmod()

Cost after iteration 0: 0.696855
Cost after iteration 100: 0.148962
Cost after iteration 200: 0.089114
Cost after iteration 300: 0.062813
Cost after iteration 400: 0.048263
Cost after iteration 500: 0.039092
Cost after iteration 600: 0.032805
Cost after iteration 700: 0.028238
Cost after iteration 800: 0.024773
Cost after iteration 900: 0.022058
Cost after iteration 1000: 0.019873
**********************************************************************
File "__main__", line 3, in __main__
Failed example:
    print(np.round(costs[0], 3))
Expected:
    55.748
Got:
    [0.697]
**********************************************************************
File "__main__", line 5, in __main__
Failed example:
    print(np.round(costs[59], 3))
Expected:
    2.667
Got:
    [0.033]
**********************************************************************
1 items had failures:
   2 of   4 in __main__
***Test Failed*** 2 failures.


TestResults(failed=2, attempted=4)

In [None]:
costs[59]

array([0.03334324])

In [None]:
costs[0]

array([0.69685521])

### Unvectorized Predictions


**Step 1-7** :

Now let’s build a function to make predictions on the dataset. First, extract the number of examples again and initialize an array of zeroes to the shape (1, number of examples). This will be our array Y_hat, which is the array of our predictions. It is this shape because we will make a single prediction (0 or 1) for each example in the set. Now, loop over the examples in the set, and utilize the extract() function again to extract a single column x. We don’t need y this time, so you can use the blank identifier “_” to “throw away” the value of y when calling extract(), since it returns both x and y. Next, call feedforward() using your optimized parameters to calculate the activation for this example. Index into Y_hat using the multidimensional indexing technique you used in extract() and set the ith element of Y_hat (corresponding to the ith element of the test set), equal to a.

When this for loop completes, you will have an array Y_hat of shape (1, number of examples), where each element is the activation for the corresponding example. The final step is to turn these activations into predicted labels. Recall that the sigmoid function outputs a number between 0 and 1. You will need to decide how close the activation needs to be to 1 for it to count as a predicted label of 1. Since the model is perforing binary classification, you can set a threshold of 0.5, above which all activations are rounded up to 1,a nd below which they are rounded down to 0. There are many ways to do this, including an if/else block, or using np.where() for a single line implementation. Finally, return Y_hat.  You can now go ahead and call your prediction function on both X_train and X_test.

Lastly, you’ll want to write an evaluate() function which calculates the accuracy of your predictions. The general idea is that you should compare your predictions array with the ground truth label array. There are a few ways to do this. See if you can use  vector addition with the np.mean and np.abs functions to see how many predicted labels matched the ground truth labels, and return the accuracy as a percentage.


In [None]:
def unvectorized_prediction(w, b, X):
    m = X.shape[1]
    Y_hat = np.zeros((1, m))

    for i in range(m):
        x = X[:, i].reshape(w.shape[0], 1)
        a = 1 / (1 + np.exp(-(np.dot(w.T, x) + b)))

        # Set the right element of Y_hat to a
        # The right element can be accessed using [:, example]
        Y_hat[:, i] = a

    # Change all elements to 0 or 1 using np.where()
    Y_hat = np.where(Y_hat > 0.5, 1, 0)

    return Y_hat


In [None]:
# Check your prediction function is working properly.


#predictions_train = None
#predictions_test = None


predictions_train = unvectorized_prediction(w, b, X_train)
predictions_test = unvectorized_prediction(w, b, X_test)

#print (predictions_test)


import doctest

"""
  >>> print(predictions_test)
  [[1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 0 1 1 1 0]]
"""

doctest.testmod()

TestResults(failed=0, attempted=1)

### Evaluation

**Step 1-8** :

Finally, call the evaluate() function. You should see that your network performs extremely well on the training data, at around 100%.  Recall that a high train score but lower test score would be a sign of overfitting.  Check your model by evaluating the train and test sets.


In [None]:
def evaluate(w, b, Y_hat, Y):

  #accuracy = None
  accuracy = np.mean(Y_hat == Y) * 100
  return accuracy

  print("Accuracy: {} %".format(accuracy))

In [None]:
# Calculate train accuracy.

#train_accuracy = None

train_accuracy = evaluate(w, b, predictions_train, Y_train)

import doctest

"""
  >>> print('Train accuracy:', train_accuracy)
  Train accuracy: 100.0
"""

doctest.testmod()

TestResults(failed=0, attempted=1)

In [None]:
# Calculate test accuracy.

#test_accuracy = None

test_accuracy = evaluate(w, b, predictions_train, Y_train)

"""
  >>> print('Test accuracy:', test_accuracy)
  Test accuracy: 100.0
"""

doctest.testmod()

TestResults(failed=0, attempted=1)

Congratulations! You have now implemented unvectorized descent to test and classify on a synthetic dataset. As you probably realized, the algorithm was fairly complicated, despite the dataset being simplistic.

In Part 2, we will use vectorization to make our algorithm much more efficient, allowing us to apply it to a real dataset.