Exercise: Set test to "Hello World" in the cell below and run it to print "test: Hello World".

In [1]:
test = "Hello World"
print("test: "+test)

test: Hello World


NumPy is the main package for scientific computing in Python. It is maintained by a large community (www.numpy.org).
To refer to a function belonging to a specific package, call it using package_name.function().
If you need more information on a numpy function, look at the official documentation.

In [2]:
# Basic functions with numpy
# Sigmoid function with math.exp(). np.exp() is preferable to math.exp(): we rarely use the "math" library in deep learning,
# because its functions only accept real numbers (scalars), whereas in deep learning we mostly use matrices and vectors.
# This is why numpy is more useful.
import math
def basic_sigmoid(x):
    s = 1/(1+math.exp(-x))
    
    return s
basic_sigmoid(3)

0.9525741268224334

In [3]:
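# Uncommenting the two lines below raises a TypeError: the math-based basic_sigmoid only works on real numbers, not lists
#x = [1,2,3]
#basic_sigmoid(x)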
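A minimal sketch of what the commented-out cell would do, reusing the same basic_sigmoid definition and catching the TypeError instead of letting it stop the notebook. np_sigmoid is a hypothetical name for a numpy-based variant; it converts its input with np.asarray so that a plain list also works.

```python
import math

import numpy as np


def basic_sigmoid(x):
    # Scalar-only sigmoid built on math.exp(), as in the cell above
    return 1 / (1 + math.exp(-x))


def np_sigmoid(x):
    # Hypothetical numpy variant: np.asarray turns a list into an array,
    # so -x and np.exp() both work element-wise
    x = np.asarray(x, dtype=float)
    return 1 / (1 + np.exp(-x))


x = [1, 2, 3]
try:
    basic_sigmoid(x)
    math_version_failed = False
except TypeError as err:
    # For a list, the error is raised by -x (lists do not support unary minus);
    # for a numpy array with several elements it would come from math.exp() instead
    math_version_failed = True
    print("TypeError:", err)

print(np_sigmoid(x))
```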

In [4]:
import numpy as np
x = np.array([1,2,3])
print(np.exp(x))  # result is (exp(1), exp(2), exp(3))

[ 2.71828183  7.3890561  20.08553692]


In [5]:
# If x is a numpy array, an operation such as s = x + 3 or s = 1/x returns s as an array of the same size as x
x = np.array([1,2,3])
print(x+3)

[4 5 6]
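The comment above also mentions s = 1/x. A short sketch of that case and one more scalar-array operation, each returning an array of the same size as x:

```python
import numpy as np

x = np.array([1, 2, 3])

s_add = x + 3   # 3 is added to every element: [4 5 6]
s_inv = 1 / x   # true division gives floats: [1.0, 0.5, 0.333...]
s_sq = x ** 2   # element-wise power: [1 4 9]

print(s_add, s_inv, s_sq)
```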


In [6]:
# Implement the sigmoid function using numpy
# Importing numpy as np means you can access numpy functions by writing np.function() instead of numpy.function()
import numpy as np
def sigmoid(x):
    s = 1/(1+np.exp(-x))
    return s
x = np.array([1,2,3])
sigmoid(x)

array([0.73105858, 0.88079708, 0.95257413])

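Derivation of the sigmoid gradient:

$s=\frac{1}{1+e^{-x}};$

$\frac{1}{s}=1+e^{-x};$

differentiating both sides with respect to $x$: $-\frac{s'}{s^2}=-e^{-x}$, so $\frac{s'}{s^2}=e^{-x}=\frac{1}{s}-1;$

$s'=s^2\left(\frac{1}{s}-1\right)=s(1-s)$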
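A quick way to check the gradient formula s' = s(1 - s), which the sigmoid_derivative cell below implements, is to compare it against a central finite difference. numerical_derivative and the step size eps = 1e-6 are assumptions made for this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    # Analytic gradient: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)


def numerical_derivative(f, x, eps=1e-6):
    # Hypothetical helper: central difference (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x = np.array([1.0, 2.0, 3.0])
analytic = sigmoid_derivative(x)
numeric = numerical_derivative(sigmoid, x)

# Largest disagreement between the two estimates
max_abs_diff = np.max(np.abs(analytic - numeric))
print(analytic, numeric, max_abs_diff)
```

The two estimates agree to many decimal places; the sign-flipped expression s(s - 1) would instead give negative values, even though the sigmoid is increasing everywhere.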

In [7]:
#sigmoid gradient
def sigmoid_derivative(x):
    s = 1/(1+np.exp(-x))
    ds = s*(1-s)
    return ds

x = np.array([1,2,3])
print("sigmoid_derivative(x)="+str(sigmoid_derivative(x)))

sigmoid_derivative(x)=[0.19661193 0.10499359 0.04517666]


Reshaping arrays:
- X.shape is used to get the shape (dimension) of a matrix/vector X.
- X.reshape(...) is used to reshape X into some other dimension.
For example, in computer science, an image is represented by a 3D array of shape $(length,height,depth=3)$. However, when you read an image as the input of an algorithm, you convert it to a vector of shape $(length*height*3,1)$. In other words, you "unroll", or reshape, the 3D array into a single column vector.
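A minimal sketch of the unroll-and-restore round trip, using a small hypothetical array with the same (3, 3, 2) shape as the example image below. It uses reshape(-1, 1), where -1 asks numpy to infer that dimension; this is an alternative to spelling out the product of the three dimensions.

```python
import numpy as np

# Small hypothetical "image" with shape (3, 3, 2), filled with 0..17
image = np.arange(3 * 3 * 2).reshape(3, 3, 2)

# -1 lets numpy infer the first dimension (here 3*3*2 = 18)
v = image.reshape(-1, 1)

# Reshaping back recovers the original array, since no data was changed
restored = v.reshape(image.shape)

print(v.shape)   # (18, 1)
print(v[:4, 0])  # [0 1 2 3]: the last axis varies fastest (C order)
```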

In [8]:
def image2vector(image):
    v = image.reshape((image.shape[0]*image.shape[1]*image.shape[2],1))
    return v

image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = "+str(image2vector(image)))

image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]


Normalizing rows

Normalizing the data often leads to better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing $x$ to $\frac{x}{\|x\|}$ (dividing each row vector of $x$ by its norm).
For example, if
$$
x=\left[\begin{array}{lll}
0 & 3 & 4 \\
2 & 6 & 4
\end{array}\right]
$$
then
$$
\|x\|=\text{np.linalg.norm}(x, \text{axis}=1, \text{keepdims}=\text{True})=\left[\begin{array}{c}
5 \\
\sqrt{56}
\end{array}\right]
$$
and
$$
x_{\text{normalized}}=\frac{x}{\|x\|}=\left[\begin{array}{ccc}
0 & \frac{3}{5} & \frac{4}{5} \\
\frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}}
\end{array}\right]
$$

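In this example, $x$ has shape (2, 3) while $\|x\|$ has shape (2, 1), yet dividing one by the other works: numpy stretches $\|x\|$ across the columns, so each row of $x$ is divided by its own norm. This is called broadcasting.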
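A small sketch checking the broadcasting behaviour on the example matrix from the formulas above (rows [0, 3, 4] and [2, 6, 4]). It prints the two shapes involved, confirms every normalized row has norm 1, and shows what happens without keepdims=True.

```python
import numpy as np

x = np.array([[0, 3, 4],
              [2, 6, 4]], dtype=float)

# keepdims=True keeps the result 2-D, shape (2, 1), so it broadcasts across columns
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
print(x.shape, x_norm.shape)  # (2, 3) (2, 1)

x_normalized = x / x_norm     # each row is divided by its own norm

# Every row of the result now has Euclidean norm 1
print(np.linalg.norm(x_normalized, axis=1))

# Without keepdims the norm has shape (2,), which numpy aligns with the
# 3 columns (trailing dimensions), so the shapes are incompatible
try:
    x / np.linalg.norm(x, axis=1)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```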

In [9]:
def normalizeRows(x):
    x_norm = np.linalg.norm(x,ord = 2, axis = 1,keepdims = True)
    x=x/x_norm

    return x

x = np.array([
    [0,3,4],
    [1,6,4]
])
print("normalizeRows(x) = "+str(normalizeRows(x)))

normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]


Broadcasting and the softmax function
- For $x \in \mathbb{R}^{1 \times n}$, $\operatorname{softmax}(x)=\operatorname{softmax}\left(\left[\begin{array}{llll}x_{1} & x_{2} & \ldots & x_{n}\end{array}\right]\right)=\left[\begin{array}{llll}\frac{e^{x_{1}}}{\sum_{j} e^{x_{j}}} & \frac{e^{x_{2}}}{\sum_{j} e^{x_{j}}} & \cdots & \frac{e^{x_{n}}}{\sum_{j} e^{x_{j}}}\end{array}\right]$

- For a matrix $x \in \mathbb{R}^{m \times n}$, $x_{ij}$ denotes the element in the $i^{th}$ row and $j^{th}$ column of $x$, so we have:
$$
softmax(x) = softmax\begin{bmatrix} 
    x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} 
           = \begin{bmatrix} 
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\ 
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}} \end{bmatrix} 
           = \begin{pmatrix} 
           softmax\text{(first row of x)} \\ 
           softmax\text{(second row of x)} \\
           ... \\ 
           softmax\text{(last row of x)}  \end{pmatrix} 
$$

In [10]:
def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp,axis = 1, keepdims = True)
    s = x_exp/x_sum #x_exp/x_sum works due to python broadcasting.
    return s

x = np.array([
    [9,2,5,0,0],
    [7,5,0,0,0]
])
print("softmax(x) = "+str(softmax(x)))

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
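The softmax cell above calls np.exp(x) directly, which overflows for large inputs (np.exp(1000) is inf, and inf/inf gives nan). A common remedy, not used in the original cell, is to subtract each row's maximum before exponentiating; this leaves the result unchanged because the factor $e^{-c}$ cancels between numerator and denominator. softmax_stable is a hypothetical name for this sketch.

```python
import numpy as np


def softmax_stable(x):
    # Hypothetical variant of softmax(): shift each row so its maximum is 0.
    # e^(x_j - c) / sum_k e^(x_k - c) == e^(x_j) / sum_k e^(x_k), but the
    # shifted exponents are <= 0, so np.exp() cannot overflow.
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)


x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
s = softmax_stable(x)
print(s)

# Large inputs that would make the unshifted version return nan
big = softmax_stable(np.array([[1000.0, 1001.0]]))
print(big)
```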


In deep learning, you deal with very large datasets. Hence, a non-computationally-optimal function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product.

In [11]:
import time
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i]*x2[i]
toc = time.process_time()
print("dot ="+str(dot)+"\n ----- Computation time = "+str(1000*(toc-tic))+"ms")

dot =278
 ----- Computation time = 0.25948100000006136ms


In [12]:
### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2)))

for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]

toc = time.process_time()
print("outer = " + str(outer)+"\n ----- Computation time = " + str(1000*(toc-tic)) + "ms")

outer = [[81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [63. 14. 14. 63.  0. 63. 14. 35.  0.  0. 63. 14. 35.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 ----- Computation time = 0.5271730000000474ms

In [13]:
### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

elementwise multiplication = [81.  4. 10.  0.  0. 63. 10.  0.  0.  0. 81.  4. 25.  0.  0.]
 ----- Computation time = 0.1910570000001055ms


In [14]:
### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) #Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]  # matrix-vector dot product
toc = time.process_time()
print("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

gdot = [16.5916615  26.4456432  12.60284021]
 ----- Computation time = 0.4963190000000228ms


In [25]:
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
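# x1*x2            # raises TypeError: can't multiply sequence by non-int of type 'list'
# 6*x1             # note: multiplying a list by 6 does not multiply each value by 6; it repeats the list six times
# np.dot(6, x1)    # multiplies each value by 6
# np.outer(6, x1)  # note: the result is an outer product, a 2-D array of shape (1, 15)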

dot = 278
 ----- Computation time = 0.15933599999984338ms


array([[54, 12, 30,  0,  0, 42, 30,  0,  0,  0, 54, 12, 30,  0,  0]])
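A short sketch running the commented-out lines above side by side, so the difference between list arithmetic and numpy arithmetic is visible; the variable names are chosen for this example only.

```python
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]

repeated = 6 * x1            # list repetition: six copies, 90 elements
scaled = 6 * np.array(x1)    # numpy array: every element multiplied by 6
scaled_dot = np.dot(6, x1)   # np.dot with a scalar also multiplies element-wise
outer = np.outer(6, x1)      # outer product: 2-D array of shape (1, 15)

try:
    x1 * x1                  # two lists cannot be multiplied
    list_product_failed = False
except TypeError:
    list_product_failed = True

print(len(repeated), scaled[:3], outer.shape)
```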

In [16]:
### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

outer = [[81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [63 14 14 63  0 63 14 35  0  0 63 14 35  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]]
 ----- Computation time = 0.26773300000004774ms


In [17]:
### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

elementwise multiplication = [81  4 10  0  0 63 10  0  0  0 81  4 25  0  0]
 ----- Computation time = 0.10617200000018201ms


In [18]:
### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

gdot = [16.5916615  26.4456432  12.60284021]
 ----- Computation time = 1.543985000000081ms


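As you may have noticed, the vectorized implementation is much cleaner. With only 15 elements, the timings are dominated by call overhead and measurement noise (in the run above, the vectorized general dot product was actually slower than the loop), but for bigger vectors/matrices the vectorized versions become much faster, and the gap keeps growing with size.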
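A minimal sketch of the same comparison at a larger size. The vector length n = 100_000 and the random seed are arbitrary choices, and it uses time.perf_counter() (wall-clock time) rather than the time.process_time() used above; the exact timings depend on the machine, so only the results are compared.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop
tic = time.perf_counter()
loop_dot = 0.0
for i in range(n):
    loop_dot += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized version
tic = time.perf_counter()
vec_dot = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print(f"loop: {loop_ms:.2f} ms, np.dot: {vec_ms:.2f} ms")
```

The two sums can differ in the last few digits because they add the terms in a different order, which is why they are compared with a tolerance rather than for exact equality.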

Note that np.dot() performs a matrix-matrix or matrix-vector multiplication (or a dot product when both inputs are vectors). This is different from np.multiply() and the * operator (which is equivalent to .* in Matlab/Octave), which perform element-wise multiplication.
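A small sketch of the difference on two 2×2 matrices chosen for this example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

matrix_product = np.dot(A, B)    # rows of A times columns of B: [[19 22] [43 50]]
elementwise = np.multiply(A, B)  # entry by entry: [[ 5 12] [21 32]]
star = A * B                     # same as np.multiply for numpy arrays

print(matrix_product)
print(elementwise)
```

For 2-D arrays, A @ B gives the same result as np.dot(A, B).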

- Implement the L1 and L2 loss functions
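For reference, these are the two losses the cells below compute, with $\hat{y}$ for the predictions yhat, $y$ for the true labels y, and the sum running over the $m$ examples:

$$
L_1(\hat{y}, y) = \sum_{i=0}^{m-1} \left| y^{(i)} - \hat{y}^{(i)} \right|
$$

$$
L_2(\hat{y}, y) = \sum_{i=0}^{m-1} \left( y^{(i)} - \hat{y}^{(i)} \right)^2
$$

With yhat = [0.9, 0.2, 0.1, 0.4, 0.9] and y = [1, 0, 0, 1, 1], the differences are 0.1, 0.2, 0.1, 0.6 and 0.1 in absolute value. This gives $L_1 = 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1$ and $L_2 = 0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43$.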

In [19]:
def L1(yhat, y):
    loss = sum(abs(y-yhat))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))

L1 = 1.1


In [20]:
def L2(yhat, y):
    loss = sum((y-yhat)**2)
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))

L2 = 0.43
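The cells above use Python's built-in sum() and abs(), which loop over the array in Python. A minimal vectorized sketch, in the spirit of the previous section, uses np.sum/np.abs and writes L2 as the dot product of the difference vector with itself. L1_vectorized and L2_vectorized are hypothetical names.

```python
import numpy as np


def L1_vectorized(yhat, y):
    # Sum of absolute differences, computed by numpy instead of a Python loop
    return np.sum(np.abs(y - yhat))


def L2_vectorized(yhat, y):
    # Sum of squared differences == dot product of the difference with itself
    diff = y - yhat
    return np.dot(diff, diff)


yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 =", L1_vectorized(yhat, y))
print("L2 =", L2_vectorized(yhat, y))
```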
