In [1]:
test = "Hello World"
print("test: " + test)

test: Hello World


## Implementing the sigmoid function ##

<img src="images/Sigmoid.png" style="width:500px;height:228px;">
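In case the figure does not render, this is the scalar sigmoid computed by `basic_sigmoid` below. It maps any real $x$ into the open interval $(0, 1)$, with $\sigma(0) = 0.5$:

$$ \sigma(x) = \frac{1}{1+e^{-x}} $$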

In [3]:
import math

def basic_sigmoid(x):
    """ Compute sigmoid of x using math.exp
    Arguments : x -- Scalar
    Return : s -- sigmoid(x)"""
    s = 1/(1+math.exp(-x))
    return s

In [9]:
basic_sigmoid(3)

0.9525741268224334

In [10]:
x = [1, 2, 3]
basic_sigmoid(x)  # fails: -x is not defined for a Python list, and math.exp only accepts scalars; use np.exp instead

TypeError: bad operand type for unary -: 'list'

In [11]:
import numpy as np
x = [1,2,3]
print(np.exp(x))

[  2.71828183   7.3890561   20.08553692]
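The error above comes from `math` working on one scalar at a time. A minimal comparison, for illustration only: a list comprehension can apply `math.exp` element by element, while `np.exp` applies the same function to the whole list in one vectorized call.

```python
import math
import numpy as np

x = [1, 2, 3]

# Element-wise workaround with the math module: one math.exp call per item.
loop_result = [math.exp(v) for v in x]

# Vectorized version: np.exp converts the list to an array and
# applies exp to every element at once.
vec_result = np.exp(x)

print(loop_result)
print(vec_result)
print(np.allclose(loop_result, vec_result))  # True
```

Both give the same numbers; the vectorized form is the one used for the rest of this notebook because it also works on arrays of any shape.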


### Implement sigmoid using numpy ###

$$ \text{For } x \in \mathbb{R}^n \text{,      } sigmoid(x) = 
sigmoid\begin{pmatrix}
    x_1 \\
    x_2 \\
    ... \\
    x_n \\
    \end{pmatrix} = \begin{pmatrix}
        \frac{1}{1+e^{-x_1}} \\
        \frac{1}{1+e^{-x_2}} \\
        ... \\
        \frac{1}{1+e^{-x_n}} \\
\end{pmatrix}\tag{1}$$ 

In [13]:
import numpy as np
def sigmoid(x):
    """ Compute sigmoid of x using np.exp
    Arguments : x --- Scalar or numpy array of any size
    Return : s --- sigmoid(x)"""
    s = 1/(1+np.exp(-x))
    return s

In [14]:
x = np.array([1,2,3])
sigmoid(x)

array([ 0.73105858,  0.88079708,  0.95257413])
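The docstring says `sigmoid` accepts a scalar or an array of any size. A quick sketch that redefines the same formula so it runs on its own, checking a scalar and a 2-D input (the 2-D values are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    # Same formula as the cell above; np.exp works element-wise on any shape.
    return 1 / (1 + np.exp(-x))

scalar_out = sigmoid(0)  # 0.5
matrix_in = np.array([[-1.0, 0.0],
                      [1.0, 2.0]])
matrix_out = sigmoid(matrix_in)

print(scalar_out)
print(matrix_out.shape)  # (2, 2): the output keeps the input's shape
print(matrix_out)
```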

### Implement sigmoid gradient ###
$$ sigmoid\_derivative(x) = \sigma'(x) = \sigma(x)(1-\sigma(x)) $$
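The identity follows from the chain rule applied to $\sigma(x) = (1+e^{-x})^{-1}$, together with $1-\sigma(x) = \frac{e^{-x}}{1+e^{-x}}$:

$$
\begin{aligned}
\sigma'(x) &= -\left(1+e^{-x}\right)^{-2}\cdot\left(-e^{-x}\right)
            = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} \\
           &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
            = \sigma(x)\bigl(1-\sigma(x)\bigr)
\end{aligned}
$$

This is why `sigmoid_derivative` only needs to call `sigmoid` once and reuse the result.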

In [17]:
def sigmoid_derivative(x):
    """Computes the gradient (slope or derivative) of sigmoid function
    
    Arguments : x --- Scalar or numpy array
    Return : ds -- computed gradient """
    s = sigmoid(x) #calculates sigmoid of x
    ds = s*(1-s)
    return ds

In [20]:
x = np.array([1,2, 3])
print("sigmoid(x) = " + str(sigmoid(x)))
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

sigmoid(x) = [ 0.73105858  0.88079708  0.95257413]
sigmoid_derivative(x) = [ 0.19661193  0.10499359  0.04517666]
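One common way to sanity-check an analytic gradient is to compare it against a central finite difference, $\frac{f(x+\epsilon)-f(x-\epsilon)}{2\epsilon}$. The helper `numerical_derivative` and the step `eps=1e-5` below are hypothetical choices for this sketch, not part of the original exercise.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def numerical_derivative(f, x, eps=1e-5):
    # Hypothetical helper: central difference, whose truncation error
    # shrinks roughly like eps**2 for smooth functions such as sigmoid.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.array([1.0, 2.0, 3.0])
analytic = sigmoid_derivative(x)
numeric = numerical_derivative(sigmoid, x)

print(np.max(np.abs(analytic - numeric)))  # should be very small
print(np.allclose(analytic, numeric))      # True
```

If the two disagree by much more than the expected finite-difference error, the analytic formula (or its code) is the first thing to suspect.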


### 1.3 - Reshaping arrays ###

Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 
- X.shape is used to get the shape (dimension) of a matrix/vector X. 
- X.reshape(...) is used to reshape X into some other dimension. 

For example, in computer science, an image is represented by a 3D array of shape $(length, height, depth = 3)$. However, when you feed an image as input to an algorithm, you often convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a column vector.

<img src="images/image2vector_kiank.png" style="width:500px;height:300px;">

**Exercise**: Implement `image2vector()` that takes an input of shape (length, height, 3) and returns a vector of shape (length\*height\*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a matrix of shape (a*b, c), you would do:
``` python
v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
```
- Please don't hardcode the dimensions of image as a constant. Instead look up the quantities you need with `image.shape[0]`, etc. 

In [22]:
def image2vector(image):
    """Argument : image --- numpy array of shape (length, height, depth)
    Returns : v --- vector of shape (length*height*depth,1)"""
    v = image.reshape(image.shape[0]*image.shape[1]*image.shape[2],1)
    return v

In [23]:
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))

image2vector(image) = [[ 0.67826139]
 [ 0.29380381]
 [ 0.90714982]
 [ 0.52835647]
 [ 0.4215251 ]
 [ 0.45017551]
 [ 0.92814219]
 [ 0.96677647]
 [ 0.85304703]
 [ 0.52351845]
 [ 0.19981397]
 [ 0.27417313]
 [ 0.60659855]
 [ 0.00533165]
 [ 0.10820313]
 [ 0.49978937]
 [ 0.34144279]
 [ 0.94630077]]
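`reshape` also accepts `-1` for one dimension and infers it from the total number of elements. The sketch below uses a made-up `np.arange` image (not the one above) to show that `image.reshape(-1, 1)` gives the same shape as `image2vector(image)`, which order the elements come out in, and that the unrolling is reversible.

```python
import numpy as np

# A small (length, height, depth) = (3, 3, 2) array, like the example above.
image = np.arange(18, dtype=float).reshape(3, 3, 2)

# -1 lets NumPy infer that dimension: 3 * 3 * 2 = 18 rows, 1 column.
v = image.reshape(-1, 1)
print(v.shape)  # (18, 1)

# Default (C, row-major) order: the last axis varies fastest, so v[0] and v[1]
# are the two depth values stored at image[0, 0].
print(v[:2, 0])

# Reshaping back with the original shape recovers the image exactly.
restored = v.reshape(image.shape)
print(np.array_equal(restored, image))  # True
```

The same row-major order explains the output above: 0.67826139 and 0.29380381 are the two depth values at position [0][0].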


### 1.4 - Normalizing rows

Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).

For example, if
$$x =
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$
then
$$\| x\| = \texttt{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4}$$
and
$$ x_{\text{normalized}} = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$
Note that $x$ has shape (2, 3) while $\| x\|$ has shape (2, 1), yet the division works: each row of $x$ is divided by its own norm. This is called broadcasting, and it is covered in section 1.5.
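The worked example can be checked directly in NumPy. This sketch also shows why `keepdims=True` matters: without it, the norms have shape (2,) and can no longer be lined up with the rows of `x`.

```python
import numpy as np

x = np.array([[0, 3, 4],
              [2, 6, 4]], dtype=float)

# Row-wise L2 norms; keepdims=True keeps the shape (2, 1) instead of (2,).
norms = np.linalg.norm(x, axis=1, keepdims=True)
print(norms.shape)    # (2, 1)
print(norms.ravel())  # 5 and sqrt(56), roughly 7.483

# (2, 3) / (2, 1): each row of x is divided by its own norm.
x_normalized = x / norms
print(x_normalized)

# Without keepdims the norms have shape (2,), which NumPy aligns with the
# last axis of x (length 3), so the division fails.
flat_norms = np.linalg.norm(x, axis=1)
try:
    x / flat_norms
except ValueError as err:
    print("broadcast error:", err)
```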


**Exercise**: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

In [28]:
def normalizeRows(x):
    """ Normalizes each row of a matrix.
    Arguments : x --- numpy matrix of shape (n, m)
    Return : x_normalized --- matrix of shape (n, m) with each row normalized to unit length"""
    x_normalized = x / np.linalg.norm(x,ord=2, axis = 1, keepdims = True)
    return x_normalized

In [29]:
x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))

normalizeRows(x) = [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]
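Each output row above has length 1. One edge case `normalizeRows` does not handle is an all-zero row: its norm is 0, so the division produces `nan` (NumPy also emits a RuntimeWarning). The function `normalize_rows_safe` below is a hypothetical variant that clamps the norm to a small `eps` so zero rows stay zero; clamping is one common choice, not the only one.

```python
import numpy as np

def normalize_rows_safe(x, eps=1e-12):
    # Hypothetical variant of normalizeRows: a (near-)zero norm is replaced
    # by eps, so an all-zero row stays all zeros instead of becoming nan.
    norms = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

x = np.array([[0, 3, 4],
              [1, 6, 4],
              [0, 0, 0]], dtype=float)
out = normalize_rows_safe(x)

print(np.linalg.norm(out, axis=1))  # 1 for the first two rows, 0 for the zero row
print(np.isnan(out).any())          # False
```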


### 1.5 - Broadcasting and the softmax function ###
A very important concept to understand in numpy is "broadcasting". It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official [broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
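In short, NumPy compares shapes starting from the last dimension; two dimensions are compatible when they are equal or one of them is 1, and a missing leading dimension is treated as 1. A small illustration with made-up arrays:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])           # shape (2, 3)
col = np.array([[10],
                [20]])              # shape (2, 1): stretched across the columns
row = np.array([100, 200, 300])     # shape (3,): treated as (1, 3), stretched down the rows

print(a + col)  # adds 10 to row 0 and 20 to row 1
print(a + row)  # adds [100, 200, 300] to every row
print(a * 2)    # a scalar broadcasts to every element
```

The `(n, 1)` column case is exactly what `normalizeRows` and `softmax` rely on when they divide by a per-row quantity computed with `keepdims=True`.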

In [37]:
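def softmax(x):
    """ Calculates softmax for each row of input x.
    Arguments : x --- matrix of shape (n,m)
    Return : s --- matrix of shape (n,m) where each row is the softmax of the corresponding row of x"""
    x_exp = np.exp(x)  # element-wise exponential, shape (n, m)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)  # row sums, shape (n, 1)
    s = x_exp / x_sum  # broadcasting divides each row by its own sum
    return s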

In [38]:
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(x)))

softmax(x) = [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
    1.21052389e-04]
 [  8.78679856e-01   1.18916387e-01   8.01252314e-04   8.01252314e-04
    8.01252314e-04]]
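For a row $x_i$, softmax computes $\text{softmax}(x_i)_j = \frac{e^{x_{ij}}}{\sum_k e^{x_{ik}}}$, so every output row is positive and sums to 1. With large inputs, `np.exp` overflows to `inf` and `inf / inf` gives `nan`. A widely used remedy, shown here as a sketch under the name `softmax_stable` (not part of the original exercise), subtracts each row's maximum first; the factor $e^{-\max}$ cancels between numerator and denominator, so the result is mathematically unchanged.

```python
import numpy as np

def softmax_stable(x):
    # Shift each row so its largest entry is 0. The common factor e^{-max}
    # cancels in the ratio, and every exponent is now <= 0, so np.exp
    # cannot overflow.
    shifted = x - np.max(x, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
print(softmax_stable(x))              # same values as softmax(x) above
print(softmax_stable(x).sum(axis=1))  # each row sums to 1

big = np.array([[1000.0, 1000.0]])
print(softmax_stable(big))            # [[0.5 0.5]] rather than nan
```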


### 2.1 - Implement the L1 and L2 loss functions

**Exercise**: Implement the numpy vectorized version of the L1 loss. You may find the function np.abs(x) (absolute value of x) useful.

**Reminder**:
- The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$). In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost.
- L1 loss is defined as:
$$L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}|\tag{6}$$

In [45]:
def L1(yhat, y):
    """ Arguments : 
    yhat --- predicted labels (vector of size m)
    y --- true labels (vector of size m)
    
    Return : loss -- value of L1 loss function defined above"""
    
    loss = np.sum(np.abs(yhat - y))
    return loss

In [46]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))

L1 = 1.1
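Working equation (6) by hand for the example above confirms the printed value:

$$L_1 = |1-0.9| + |0-0.2| + |0-0.1| + |1-0.4| + |1-0.9| = 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1$$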


**Exercise**: Implement the numpy vectorized version of the L2 loss. There are several ways of implementing the L2 loss, but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.

- L2 loss is defined as $$L_2(\hat{y},y) = \sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2\tag{7}$$

In [47]:
def L2(yhat, y):
    """ Arguments : 
    yhat --- predicted labels (vector of size m)
    y --- true labels (vector of size m)
    
    Return : loss -- value of L2 loss function defined above"""
    
    diff = yhat - y
    loss = np.dot(diff, diff)  # for 1-D vectors, np.dot already returns the scalar sum of squares
    return loss

In [48]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))

L2 = 0.43
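The "several ways" mentioned in the exercise all compute the same sum of squared differences. By hand, the differences are $-0.1, 0.2, 0.1, -0.6, -0.1$, and their squares sum to $0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43$. A short comparison of three equivalent NumPy expressions:

```python
import numpy as np

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
diff = yhat - y  # [-0.1, 0.2, 0.1, -0.6, -0.1]

loss_dot = np.dot(diff, diff)           # inner product of diff with itself
loss_sum = np.sum(diff ** 2)            # element-wise square, then sum
loss_norm = np.linalg.norm(diff) ** 2   # squared Euclidean norm

print(loss_dot, loss_sum, loss_norm)    # each approximately 0.43
```

Because of floating-point rounding the three results may differ in the last digits, so compare them with `np.isclose` rather than `==`.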
