In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

## This lab is adapted from programming assignments of the class "Neural Networks and Deep Learning" on Coursera
## https://www.coursera.org/learn/neural-networks-deep-learning

# Lab overview

## Objectives

After finishing this lab, you will be familiar with:
- Some basic functions and operators widely used in machine learning/deep learning and how to implement them using NumPy
- Vector/matrix operations and broadcasting
- Writing efficient code by using vectorization

# Exercise

You only need to fill in the parts marked "(≈ X lines of code)" in the exercises below. Keep in mind that
X is only our estimate of the work required;
your solution can definitely be shorter or longer.


Start by printing "Lab 1 is awesome" in the next cell: assign the string to `word` so that the final `print(word)` displays it.


In [3]:
# (≈ 1 line of code)
# word = 
# YOUR CODE STARTS HERE

# YOUR CODE ENDS HERE

print(word)




**Expected output**:
Lab 1 is awesome

In [2]:
# You need these libraries for following exercises

import math
import numpy as np
import time
from test_lab_1 import *

## Building basic functions with Numpy



### Basic sigmoid

$sigmoid(x) = \frac{1}{1+e^{-x}}$, also known as the logistic function, is widely used both in machine learning (logistic regression) and in deep learning, where it serves as an activation function.

Build a function that returns the sigmoid of a real number x. Use math.exp() for the exponential function.

In [3]:
def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    assert type(x) == int or type(x) == float, 'Input must be integer or real number'
    
    # (≈ 1 line of code)
    # s = 
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return s

In [4]:
print("basic_sigmoid(1) = " + str(basic_sigmoid(1)))

test_basic_sigmoid(basic_sigmoid)

basic_sigmoid(1) = 0.7310585786300049
All test cases are pass.


"math" is a great library for mathematical tasks, but it only handles scalars. In practice, machine learning/deep learning inputs are vectors and matrices, which is why NumPy is the better fit.
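To make the scalar limitation concrete, here is a minimal sketch (the array `values` and the flag `math_accepts_arrays` are illustrative names, not part of the lab). It shows `math.exp` rejecting a multi-element array while `np.exp` works elementwise.

```python
import math
import numpy as np

# Illustrative input: a small vector of three values.
values = np.array([1.0, 2.0, 3.0])

# math.exp expects a single number, so a multi-element array raises TypeError.
try:
    math.exp(values)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False

# np.exp applies the exponential to every element and keeps the input shape.
elementwise = np.exp(values)

print(math_accepts_arrays)
print(elementwise.shape)
```

The same elementwise behavior holds for matrices, which is what the next exercise relies on.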

### Sigmoid

Implement the sigmoid function using NumPy. The input x can be a scalar, a vector, or a matrix.

$$
\text{For } x \in \mathbb{R}^{m \times n} \text{,     } sigmoid(x) = sigmoid
\left(\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}\right)
=
\left(\begin{array}{cccc}
\frac{1}{1+e^{-a_{11}}} & \frac{1}{1+e^{-a_{12}}} & \cdots & \frac{1}{1+e^{-a_{1n}}} \\
\frac{1}{1+e^{-a_{21}}} & \frac{1}{1+e^{-a_{22}}} & \cdots & \frac{1}{1+e^{-a_{2n}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{1+e^{-a_{m1}}} & \frac{1}{1+e^{-a_{m2}}} & \cdots & \frac{1}{1+e^{-a_{mn}}}
\end{array}\right)
$$

In [5]:
def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    assert type(x) == int or type(x) == float or type(x) == np.ndarray, \
            'Input must be integer/ real number/ numpy array'
    
    # (≈ 1 line of code)
    # s = 
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return s

In [6]:
t_x = np.array([1, 2, 3])
print("sigmoid(t_x) = " + str(sigmoid(t_x)))

test_sigmoid(sigmoid)

sigmoid(t_x) = [0.73105858 0.88079708 0.95257413]
All test cases are pass.


### Sigmoid derivative

As you may know, gradients play a key role in the backpropagation algorithm. In this exercise, implement sigmoid_derivative() to compute the derivative of the sigmoid function with respect to its input.

$$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$

Your code may involve two steps:
1. Compute the sigmoid of x. You can use the function from the previous exercise
2. Compute derivative of sigmoid(x): $\sigma'(x) = s(1-s)$
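For readers who want to see where the formula comes from, a short derivation using the chain rule on the definition of the sigmoid is sketched below. It uses the fact that $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$.

$$\begin{align*}
\sigma(x) &= (1 + e^{-x})^{-1} \\
\sigma'(x) &= \frac{e^{-x}}{(1 + e^{-x})^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,(1 - \sigma(x))
\end{align*}$$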

In [7]:
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    assert type(x) == int or type(x) == float or type(x) == np.ndarray, \
            'Input must be integer/ real number/ numpy array'
    
    #(≈ 2 lines of code)
    # s = 
    # ds = 
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return ds

In [8]:
t_x = np.array([1, 2, 3])
print ("sigmoid_derivative(t_x) = " + str(sigmoid_derivative(t_x)))

test_sigmoid_derivative(sigmoid_derivative)

sigmoid_derivative(t_x) = [0.19661193 0.10499359 0.04517666]
All test cases are pass.


### Reshaping arrays

In many machine learning algorithms, especially in computer vision, you need to reshape the input (change its dimensions)
before feeding it to the model. For example, an RGB image represented by a
3-dimensional (3D) array of shape (length, height, depth = 3) might need to be converted to a vector of shape
(length * height * 3, 1) so that the algorithm can "read" it.

Implement the image2vector() function, which converts a 3D array of shape (length, height, 3) to a column vector of shape (length * height * 3, 1). You may find the array attribute "shape" useful.
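As a warm-up on a simpler case, the sketch below reshapes a small 2D array (a hypothetical `grid`, not the lab's 3D image) into a column vector. It reads the dimensions from `shape` rather than hardcoding them; the exercise applies the same idea with one more axis.

```python
import numpy as np

# Hypothetical 2D example: 2 rows and 3 columns holding the values 0..5.
grid = np.arange(6).reshape(2, 3)

# shape is an attribute (no parentheses); unpack it instead of hardcoding sizes.
rows, cols = grid.shape

# Flatten into a column vector with rows * cols entries and a trailing axis of 1.
column = grid.reshape(rows * cols, 1)

print(grid.shape)
print(column.shape)
```

Because the sizes come from `grid.shape`, the same lines keep working if the input dimensions change.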

In [9]:
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    
    assert len(image.shape) == 3, 'Image must be a numpy array of shape (length, height, depth)'
    
    # (≈ 1 line of code)
    # v =
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return v

In [10]:
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
t_image = np.array([[[ 0.67826139,  0.29380381],
                     [ 0.90714982,  0.52835647],
                     [ 0.4215251 ,  0.45017551]],

                   [[ 0.92814219,  0.96677647],
                    [ 0.85304703,  0.52351845],
                    [ 0.19981397,  0.27417313]],

                   [[ 0.60659855,  0.00533165],
                    [ 0.10820313,  0.49978937],
                    [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(t_image)))

test_image2vector(image2vector)

image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
All test cases are pass.


### Normalizing rows

Data normalization is a well-known technique in machine learning. It often helps
gradient descent converge faster (refer to this paper if you want to learn more
https://arxiv.org/abs/1502.03167). Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).

For example, if 
$$x = \begin{bmatrix}
        0 & 6 & 8 \\
        3 & 1 & 2 \\
\end{bmatrix}$$ 
then 
$$\| x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix}
    10 \\
    \sqrt{14} \\
\end{bmatrix}$$
and
$$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{3}{\sqrt{14}} & \frac{1}{\sqrt{14}} & \frac{2}{\sqrt{14}} \\
\end{bmatrix}$$ 

Note that you can divide arrays of different shapes and it works fine: this is called broadcasting. With `keepdims=True`, the norm keeps a trailing axis of length 1 (one entry per row), so it broadcasts correctly against the original x.
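The effect of `keepdims=True` on broadcasting can be seen in a related operation. The sketch below subtracts row means instead of dividing by row norms, so the exercise itself stays open; the variable names are illustrative and the matrix is the example x from above.

```python
import numpy as np

# The example matrix from the text, stored as floats.
x = np.array([[0.0, 6.0, 8.0],
              [3.0, 1.0, 2.0]])

mean_flat = x.mean(axis=1)                 # shape (2,): the row axis is dropped
mean_col = x.mean(axis=1, keepdims=True)   # shape (2, 1): one entry per row

# (2, 3) minus (2, 1): the single column is stretched across all 3 columns.
centered = x - mean_col

# (2, 3) minus (2,): the trailing axes are 3 and 2, so broadcasting fails.
try:
    x - mean_flat
    flat_broadcasts = True
except ValueError:
    flat_broadcasts = False

print(mean_flat.shape, mean_col.shape, centered.shape)
print(flat_broadcasts)
```

The same shape reasoning applies when the per-row quantity is a norm rather than a mean.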

Implement normalize_rows() to normalize the rows of a matrix. Note that after normalizing, each row of x should be 
a vector of unit length (meaning length = 1)

In [11]:
def normalize_rows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    assert len(x.shape) == 2, 'Input must be a numpy array of shape (n,m)'
    
    #(≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    # x_norm =
    # Divide x by its norm.
    # x =
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE

    return x

In [12]:
x = np.array([[0, 3, 4],
              [1, 6, 4]])
print("normalizeRows(x) = " + str(normalize_rows(x)))

test_normalize_rows(normalize_rows)

normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
All test cases are pass.


### Softmax function

Softmax is a normalizing function that is widely used in multiclass classification problems.
Implement the softmax function following the instructions below:

- $\text{for } x \in \mathbb{R}^{1\times n} \text{,     }$

\begin{align*}
 softmax(x) &= softmax\left(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}\right) \\&= \begin{bmatrix}
    \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} 
\end{align*}

- $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$  

\begin{align*}
softmax(x) &= softmax\begin{bmatrix}
            x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
            x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
            \vdots & \vdots & \vdots & \ddots & \vdots \\
            x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
            \end{bmatrix} \\ \\&= 
 \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} \\ \\ &= \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    \vdots  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} 
\end{align*}

In [13]:
def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (m,n).

    Argument:
    x -- A numpy matrix of shape (m,n)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (m,n)
    """
    
    assert len(x.shape) == 2, 'Input must be numpy array of shape (m,n)'
    
    #(≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    # x_exp = ...

    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    # x_sum = ...
    
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    # s = ...
    
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return s

In [14]:
t_x = np.array([[9, 2, 5, 0, 0],
                [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(t_x)))

test_softmax(softmax)

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
All test cases are pass.
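A quick sanity check for any softmax output is that every entry lies strictly between 0 and 1 and that each row sums to 1. The sketch below applies this check to the values printed in the expected output above; the variable names are illustrative, and the tolerance of 1e-6 is an assumption chosen to absorb the rounding of the printed digits.

```python
import numpy as np

# Values copied from the expected output printed above (rounded to 9 digits).
s = np.array([
    [9.80897665e-01, 8.94462891e-04, 1.79657674e-02, 1.21052389e-04, 1.21052389e-04],
    [8.78679856e-01, 1.18916387e-01, 8.01252314e-04, 8.01252314e-04, 8.01252314e-04],
])

# Sum along each row; keepdims keeps the result as a (2, 1) column.
row_sums = s.sum(axis=1, keepdims=True)

rows_sum_to_one = np.allclose(row_sums, 1.0, atol=1e-6)
all_in_unit_interval = bool(np.all((s > 0) & (s < 1)))

print(rows_sum_to_one, all_in_unit_interval)
```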


## Vectorization

Vectorization speeds up code by replacing explicit Python loops with array operations. The following examples
compare the computation time of the dot, outer, and elementwise products, first with loops and then with NumPy.

In [15]:
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

print("Computation Time: ")

print("\n + Dot Product:")
### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0

for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print ("   - Classic implementation: " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("   - Vectorized implementation: " + str(1000 * (toc - tic)) + "ms")

print("\n + Outer Product:")
### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1), len(x2))) # we create a len(x1)*len(x2) matrix with only zeros

for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i] * x2[j]
toc = time.process_time()
print ("   - Classic implementation: " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("   - Vectorized implementation: " + str(1000 * (toc - tic)) + "ms")

print("\n + Elementwise multiplication:")
### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))

for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print ("   - Classic implementation: " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("   - Vectorized implementation: " + str(1000*(toc - tic)) + "ms")

print("\n + General Dot Product:")
### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])

for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j] * x1[j]
toc = time.process_time()
print ("   - Classic implementation: " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
gdot = np.dot(W,x1)
toc = time.process_time()
print ("   - Vectorized implementation: " + str(1000 * (toc - tic)) + "ms")

Computation Time: 

 + Dot Product:
   - Classic implementation: 0.42999999999993044ms
   - Vectorized implementation: 0.3210000000000157ms

 + Outer Product:
   - Classic implementation: 0.5569999999999187ms
   - Vectorized implementation: 0.37399999999998546ms

 + Elementwise multiplication:
   - Classic implementation: 0.2929999999999877ms
   - Vectorized implementation: 0.22199999999994446ms

 + General Dot Product:
   - Classic implementation: 0.5349999999999522ms
   - Vectorized implementation: 1.2159999999999949ms


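As you can see, the vectorized implementations are cleaner and, in most of the runs above, faster. With only 15 elements the timings are likely dominated by call overhead and measurement noise, which is why the vectorized general dot product was actually slower in this run. For bigger data, the difference in running time becomes enormous. Vectorization is a technique that you don't want to forget when dealing with big data, since a computationally suboptimal calculation can become a huge bottleneck in your algorithm and result in a model that takes ages to run.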
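To see the gap on larger inputs, here is a minimal sketch that repeats the dot-product comparison on longer vectors. The size of 200,000 elements, the fixed seed, and the use of `time.perf_counter` (wall-clock time, rather than `time.process_time` as in the cell above) are assumptions made for illustration. Actual timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np

# Assumed size and seed, chosen only so the loop finishes quickly.
rng = np.random.default_rng(0)
n = 200_000
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop: one multiply-add per iteration.
tic = time.perf_counter()
loop_dot = 0.0
for i in range(n):
    loop_dot += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized version: a single call into NumPy's compiled code.
tic = time.perf_counter()
vec_dot = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print(f"loop: {loop_ms:.2f} ms, vectorized: {vec_ms:.2f} ms")
```

Both versions compute the same value up to floating-point rounding; only the running time differs.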

## Implementing L1 and L2 loss functions

Every machine learning algorithm needs a loss function. In supervised learning, the loss function measures the difference between your predicted output ($ \hat{y} $) and the true value ($y$).
Implement the L1 loss function:
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^{m-1}|y^{(i)} - \hat{y}^{(i)}| \end{align*}$$

### L1

In [16]:
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    
    assert type(yhat) == type(y) == np.ndarray, 'Input must be numpy array'
    assert len(yhat) == len(y), 'yhat and y must have same size'
    
    #(≈ 1 line of code)
    # loss = 
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return loss

In [17]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))

test_L1(L1)

L1 = 1.1
All test cases are pass.
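The value 1.1 can be checked by hand from the definition, using the `yhat` and `y` vectors from the cell above:

$$\begin{align*}
L_1(\hat{y}, y) &= |1 - 0.9| + |0 - 0.2| + |0 - 0.1| + |1 - 0.4| + |1 - 0.9| \\
&= 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1
\end{align*}$$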


### L2

Implement the NumPy vectorized version of the L2 loss. You may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.
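The reminder about `np.dot(x, x)` can be verified on a small vector. This is a minimal sketch with an arbitrary example vector; the names `dot_xx` and `sum_sq` are illustrative.

```python
import numpy as np

# Arbitrary example vector: 1^2 + 2^2 + 3^2 = 14.
x = np.array([1.0, 2.0, 3.0])

dot_xx = np.dot(x, x)     # dot product of x with itself
sum_sq = np.sum(x ** 2)   # explicit sum of squared entries

print(dot_xx, sum_sq)
```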

- L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^{m-1}(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$

In [18]:
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    assert type(yhat) == type(y) == np.ndarray, 'Input must be numpy array'
    assert len(yhat) == len(y), 'yhat and y must have same size'
    
    #(≈ 1 line of code)
    # loss = ...
    # YOUR CODE STARTS HERE
    
    # YOUR CODE ENDS HERE
    
    return loss

In [19]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

print("L2 = " + str(L2(yhat, y)))

test_L2(L2)

L2 = 0.43
All test cases are pass.
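As with the L1 loss, the value 0.43 can be checked by hand by squaring each difference between `y` and `yhat` from the cell above:

$$\begin{align*}
L_2(\hat{y}, y) &= (1 - 0.9)^2 + (0 - 0.2)^2 + (0 - 0.1)^2 + (1 - 0.4)^2 + (1 - 0.9)^2 \\
&= 0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43
\end{align*}$$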


# Takeaway

- This lab introduces small but essential "pieces" (NumPy functions, data processing and normalization, L1/L2 loss functions, ...) that help you build a full-fledged machine learning/deep learning model
- Broadcasting and vectorization are techniques that you need to become familiar with
- Avoid hardcoding. In the image2vector exercise, you cannot pass all the test cases if you hardcode the image's dimensions; using "shape" to look up the quantities you need is the best solution