# Python and Data Science Fundamentals

Even if you have used Python before, this lab will help you get familiar with the Python knowledge you will need over the next two days.

In this lab you will learn to:

* Use Jupyter notebook
* Use NumPy operations to do matrix and vector math
* Understand vectorization
* Understand broadcasting

## About Jupyter notebook ##

Let's get started!

In [5]:
hello = 'Deep Learning is Fun! \n 深度學習好棒'

In [6]:
print ("我們都知道: " + hello)

我們都知道: Deep Learning is Fun! 
 深度學習好棒


**Expected output**:
我們都知道: Deep Learning is Fun! 
 深度學習好棒

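## 1) Vectorization

In this morning's lecture on logistic regression, we saw that a naive implementation needs for loops to compute $\sigma(W^{T}X+b)$, the loss function $J = \frac{-1}{m}\sum{L(y, \hat{y})}$, and the weight updates. As the amount of data grows, these loops become increasingly expensive and turn into a major bottleneck for the whole system. In this part you will therefore practice replacing Python for loops with vectorized operations to speed things up.

The first cell uses the traditional for-loop approach; the second cell uses NumPy's vectorized approach.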


In [2]:
import time
import numpy as np
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.time()
print ("----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

dot = 278
 ----- Computation time = 0.290155410767ms
----- Computation time = 0.502824783325ms
elementwise multiplication = [ 81.   4.  10.   0.   0.  63.  10.   0.   0.   0.  81.   4.  25.   0.   0.]
 ----- Computation time = 0.157117843628ms
gdot = [ 23.97917293  28.9122999   23.98478889]
 ----- Computation time = 0.220060348511ms


In [12]:
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.time()
dot = None  # please type np.dot(x1, x2)
toc = time.time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.time()
outer = None # please type np.outer(x1,x2)
toc = time.time()
print ("----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.time()
mul = None # please type np.multiply(x1,x2)
toc = time.time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.time()
dot = None # please type np.dot(W,x1)
toc = time.time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

dot = 278
 ----- Computation time = 0.0879764556885ms
----- Computation time = 0.147104263306ms
elementwise multiplication = [81  4 10  0  0 63 10  0  0  0 81  4 25  0  0]
 ----- Computation time = 0.158786773682ms
gdot = [ 29.45186099  14.5491566   19.4357208 ]
 ----- Computation time = 0.128030776978ms
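With only 15 elements, the timings above are dominated by fixed overhead, so the gap between the loop and the vectorized call looks small. The sketch below repeats the dot-product comparison on a much longer vector. The size `n`, the random data, and the use of `time.perf_counter()` (a higher-resolution timer than `time.time()`) are illustrative choices, not part of the lab, and the timings will vary from machine to machine.

```python
import time
import numpy as np

# Illustrative size and data (assumptions, not from the lab).
n = 200_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop: one multiply-add per iteration.
tic = time.perf_counter()
dot_loop = 0.0
for i in range(n):
    dot_loop += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized: a single call into NumPy's compiled routine.
tic = time.perf_counter()
dot_vec = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print("loop       = " + str(dot_loop) + " ----- " + str(loop_ms) + "ms")
print("vectorized = " + str(dot_vec) + " ----- " + str(vec_ms) + "ms")
```

Both versions should agree up to floating-point rounding; only the running time differs.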


### 1.1 Implementing the L1 and L2 loss functions

**Demonstration**: compute the L1 loss with np.abs() and np.sum().


- L1 loss is defined as:
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}$$

In [3]:
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(np.abs(yhat-y))
    ### END CODE HERE ###
    
    return loss

In [22]:
yhat = np.array([.8, 0.12, 0.1, .3, 1.1])
y = np.array([1, 0, 0, 1, 1])
print(None) #please type print("L1 = " + str(L1(yhat,y)))

None


**Expected output**:

<table style="width:20%">

     <tr> 
       <td> **L1** </td> 
       <td> 1.22 </td> 
     </tr>
</table>


**Exercise**: Following the demonstration above, implement the L2 loss function using vectorized NumPy operations.

Hint:

Use np.dot().

Example: if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.

- L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}$$

In [4]:
# GRADED FUNCTION: L2

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = None     # type loss = np.dot(yhat-y, yhat-y)
    ### END CODE HERE ###
    
    return loss

In [5]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))

L2 = 0.43


**Expected output**: 
<table style="width:20%">
     <tr> 
       <td> **L2** </td> 
       <td> 0.43 </td> 
     </tr>
</table>
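As a compact recap, here is a minimal sketch of both losses side by side, using the same test vectors as the cells above. The function names `l1_loss` and `l2_loss` are illustrative and chosen so they do not clash with the graded `L1` and `L2`.

```python
import numpy as np

def l1_loss(yhat, y):
    # Sum of absolute differences between predictions and labels.
    return np.sum(np.abs(yhat - y))

def l2_loss(yhat, y):
    # Sum of squared differences, written as a dot product of the
    # difference vector with itself.
    diff = yhat - y
    return np.dot(diff, diff)

y = np.array([1, 0, 0, 1, 1])
l1_value = l1_loss(np.array([.8, 0.12, 0.1, .3, 1.1]), y)
l2_value = l2_loss(np.array([.9, 0.2, 0.1, .4, .9]), y)

print("L1 = " + str(l1_value))
print("L2 = " + str(l2_value))
```

The printed values may show small floating-point rounding, but they should match the expected tables (1.22 and 0.43) to two decimal places.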

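## 2 - Getting more familiar with NumPy ##

NumPy is one of the most important scientific computing packages in the Python ecosystem. It is a well-maintained package developed by an open community (www.numpy.org).
In this exercise you will learn the following:

* np.exp - computes the exponential
* np.log - computes the logarithm
* np.reshape - changes the shape of a matrix or vector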



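### Exercise 2.1 - Building the sigmoid function with np.exp() ###

Python's built-in math module already provides an exponential function, math.exp. However, it only accepts scalars and does not operate on arrays or matrices. As a first step, follow the in-class demonstration and build sigmoid(x) using math.exp(x).

**Hint**:
$sigmoid(x) = \frac{1}{1+e^{-x}}$ is also known as the logistic function. It is one of the most commonly used activation functions in MLPs (multi-layer perceptrons) and DNNs (deep neural networks).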

<img src="figures/sigmoid.png" style="width:500px;height:228px;">

In [23]:
# Instructor demonstration
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + math.exp(-x))    
    return s

In [24]:
basic_sigmoid(3)

0.9525741268224334

**Expected answer**: 
<table style = "width:40%">
    <tr>
    <td>**basic_sigmoid(3)**</td> 
        <td>0.9525741268224334 </td> 
    </tr>

</table>

### However, the math library only works on single real numbers (scalars), so in practice we rarely use it and use NumPy instead.

In fact, if $ x = (x_1, x_2, ..., x_n)$ is a row vector then $np.exp(x)$ will apply the exponential function to every element of x. The output will thus be: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$

In [6]:
import numpy as np

# example of np.exp
x = np.array([1, 2, 3])
print(np.exp(x)) # result is (exp(1), exp(2), exp(3))

[  2.71828183   7.3890561   20.08553692]


If x is a vector, then a basic operation between x and a scalar, such as $s = x + 5$ or $s = \frac{-1}{x}$, outputs a vector with the same shape as x.


In [25]:
# example of vector operation
x = np.array([1, 2, 3])
print (x * 2)

[2 4 6]
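The two operations named in the text, $s = x + 5$ and $s = \frac{-1}{x}$, behave the same way. The short sketch below (variable names are illustrative) applies each one to the same vector; the scalar is applied to every element, and the result keeps the shape of x.

```python
import numpy as np

x = np.array([1, 2, 3])

s_add = x + 5     # adds 5 to every element
s_inv = -1 / x    # divides -1 by every element; the result is a float array

print(s_add, s_add.shape)
print(s_inv, s_inv.shape)
```

Note that the division turns the integer array into floats, while the addition keeps the integer type.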


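NumPy's official documentation is quite thorough: [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html). 

To look something up directly in the notebook:
type `np.exp?` to display its documentation.

**Exercise**: Implement sigmoid using NumPy instead.

**Hint**: With NumPy data types, x can be a real number, a vector, or a matrix.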
$$ \text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1} $$

In [8]:
import numpy as np 

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """    
    ### START CODE HERE ### (≈ 1 line of code)
    s = None
    ### END CODE HERE ###
    return s

In [29]:
x = np.array([1, 2, 3])
print(sigmoid(x))
# With basic_sigmoid, the sigmoid has to be applied to each element individually
print(str([basic_sigmoid(item) for item in x]))

NameError: name 'sigmoid' is not defined

**Both results should be the same**: 
<table>
    <tr> 
        <td> **sigmoid([1,2,3])**</td> 
        <td> array([ 0.73105858,  0.88079708,  0.95257413]) </td> 
    </tr>
</table> 
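One possible way to fill in the exercise is sketched below. It follows the formula $\frac{1}{1+e^{-x}}$ directly with `np.exp`; the name `sigmoid_np` is illustrative, and the element-by-element version using `math.exp` is included only to check that the two approaches agree.

```python
import math
import numpy as np

def sigmoid_np(x):
    # np.exp works on scalars and on arrays of any shape, so the same
    # expression covers real numbers, vectors, and matrices.
    return 1 / (1 + np.exp(-x))

x = np.array([1, 2, 3])

vectorized = sigmoid_np(x)
# Scalar-only version, applied one element at a time for comparison.
elementwise = np.array([1 / (1 + math.exp(-item)) for item in x])

print(vectorized)
print(elementwise)
```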


### 2.2 - Sigmoid gradient


**Exercise**: Compute the gradient of the sigmoid function with respect to its input x: $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$
You often code this function in two steps:
1. Set s to be the sigmoid of x. You might find your sigmoid(x) function useful.
2. Compute $\sigma'(x) = s(1-s)$

In [30]:
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    s = None # type s = sigmoid(x)
    ds = None  # type ds = s * (1 - s)
    ### END CODE HERE ###
    
    return ds

In [22]:
x = np.array([1, 2, 3])
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

sigmoid_derivative(x) = [ 0.19661193  0.10499359  0.04517666]


**Expected answer**: 


<table>
    <tr> 
        <td> **sigmoid_derivative([1,2,3])**</td> 
        <td> [ 0.19661193  0.10499359  0.04517666] </td> 
    </tr>
</table> 
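The two steps above can be sketched as follows. The name `sigmoid_grad` and the finite-difference check are illustrative additions, not part of the lab; the check simply compares the closed-form gradient with a numerical slope to confirm that $s(1-s)$ is the right expression.

```python
import numpy as np

def sigmoid_grad(x):
    # Step 1: s is the sigmoid of x.
    s = 1 / (1 + np.exp(-x))
    # Step 2: the gradient is s * (1 - s), computed element-wise.
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
analytic = sigmoid_grad(x)

# Central finite difference as a sanity check (h is an arbitrary small step).
h = 1e-5
numeric = (1 / (1 + np.exp(-(x + h))) - 1 / (1 + np.exp(-(x - h)))) / (2 * h)

print("analytic = " + str(analytic))
print("numeric  = " + str(numeric))
```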



### 2.3 - Reshaping arrays ###

Two of the NumPy functions used most often when implementing deep networks are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 
- X.shape is used to get the shape (dimension) of a matrix/vector X. 
- X.reshape(...) is used to reshape X into some other dimension. 

For example, in computer vision a color image is represented by a 3D array of shape $(length, height, depth = 3)$. In practice, however, we usually prefer to convert it into a column vector of shape $(length*height*3, 1)$. In other words, we flatten the 3D array into a 1D vector.

<img src="figures/image2vector_kiank.png" style="width:500px;height:300;">

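**Demonstration**: The `image2vector()` function below takes an array of shape (length, height, 3) and returns a vector of shape (length\*height\*3, 1). For example, to reshape an array v of shape (a, b, c) into an array of shape (a\*b, c), you would write:
``` python
v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
```
- Use image.shape to get the dimension sizes you need, e.g. `image.shape[0]`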

In [31]:
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    v = image.reshape(image.shape[0] *image.shape[1] *image.shape[2], 1)
    
    return v

In [32]:
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))

image2vector(image) = [[ 0.67826139]
 [ 0.29380381]
 [ 0.90714982]
 [ 0.52835647]
 [ 0.4215251 ]
 [ 0.45017551]
 [ 0.92814219]
 [ 0.96677647]
 [ 0.85304703]
 [ 0.52351845]
 [ 0.19981397]
 [ 0.27417313]
 [ 0.60659855]
 [ 0.00533165]
 [ 0.10820313]
 [ 0.49978937]
 [ 0.34144279]
 [ 0.94630077]]


**Expected output**: 


<table style="width:100%">
     <tr> 
       <td> **image2vector(image)** </td> 
       <td> [[ 0.67826139]
 [ 0.29380381]
 [ 0.90714982]
 [ 0.52835647]
 [ 0.4215251 ]
 [ 0.45017551]
 [ 0.92814219]
 [ 0.96677647]
 [ 0.85304703]
 [ 0.52351845]
 [ 0.19981397]
 [ 0.27417313]
 [ 0.60659855]
 [ 0.00533165]
 [ 0.10820313]
 [ 0.49978937]
 [ 0.34144279]
 [ 0.94630077]]</td> 
     </tr>
    
   
</table>
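To see the ordering of the flattened values more easily, the sketch below uses a small integer array in place of an image (an illustrative stand-in, not lab data). It also shows `reshape(-1, 1)`, where NumPy infers the first dimension from the total number of elements; this is an alternative spelling, not the form used in the lab.

```python
import numpy as np

# Stand-in "image" of shape (3, 3, 2) holding the values 0..17.
image = np.arange(18).reshape(3, 3, 2)

# Explicit target shape, as in image2vector() above.
v_explicit = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)

# Let NumPy infer the first dimension from the number of elements.
v_inferred = image.reshape(-1, 1)

print(v_explicit.shape)
print(v_explicit[:4, 0])
```

Elements are read in row-major order: the last axis varies fastest, which is why the output of `image2vector(image)` above lists the two depth values of each pixel one after the other.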

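### 2.4 - Normalizing rows

Normalizing data is an important technique in machine learning and deep learning; it often makes optimization converge faster.
There are many ways to normalize. Here we use one of the most common: $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).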

<img src="figures/normalization.jpeg" style="width:500px;height:300;">

For example, if $$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$and        $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ Note that you can divide arrays of different shapes and it works fine: this is called broadcasting, and it is covered in the Broadcasting section below.


**Exercise**: Implement normalizeRows(x) so that each row of x has unit length, then check that the length of every row vector is 1.

In [35]:
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = None 
    # Divide x by its norm.
    x = x / x_norm
    ### END CODE HERE ###

    return x

In [36]:
x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))

normalizeRows(x) = [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]


**Expected answer**: 

<table style="width:60%">

     <tr> 
       <td> **normalizeRows(x)** </td> 
       <td> [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]</td> 
     </tr>
    
   
</table>

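**Note**:
The norm of each row of x is a single real number, so x_norm has shape (n, 1) while x has shape (n, m). In the code above we nevertheless divide all of x by x_norm. You may notice that the two shapes differ. Why does this work here?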
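A minimal sketch of the row normalization, printing the shapes involved, may help answer that question. The name `normalize_rows` is illustrative. The sketch assumes no row of x is all zeros, since such a row would have norm 0 and cause a division by zero.

```python
import numpy as np

def normalize_rows(x):
    # keepdims=True keeps the result as a column of shape (n, 1)
    # instead of a flat vector of shape (n,).
    x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
    # (n, m) / (n, 1): each row is divided by its own norm.
    return x / x_norm

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])

x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
x_normalized = normalize_rows(x)
row_lengths = np.linalg.norm(x_normalized, axis=1)

print("x.shape = " + str(x.shape) + ", x_norm.shape = " + str(x_norm.shape))
print("row lengths after normalizing = " + str(row_lengths))
```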

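### 2.5 - Broadcasting ###
An important technique in NumPy is broadcasting. It lets us perform operations between arrays of different shapes without manually making their dimensions match. For details, see the official documentation:
[broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).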
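As a small illustration (the arrays and names below are made up for this sketch), NumPy compares shapes from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1, and a dimension of size 1 is stretched to match the other.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])    # shape (3,)
col = np.array([[2.0], [4.0]])        # shape (2, 1)

# (2, 3) + (3,): the row is added to every row of x.
added = x + row

# (2, 3) / (2, 1): each row of x is divided by its own entry in col.
divided = x / col

print(added)
print(divided)
```

The second case is the same pattern used when dividing a matrix by its column of row norms.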

**Exercise**: Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization.

**Instructions**:
- $ \text{for } x \in \mathbb{R}^{1\times n} \text{,     } softmax(x) = softmax(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}) = \begin{bmatrix}
     \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} $ 

- $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$  $$softmax(x) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    ...  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} $$

In [35]:
def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = None  #type np.exp(x)

    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = None # type np.sum(x_exp, axis = 1, keepdims = True).
    
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = None # type x_exp / x_sum

    ### END CODE HERE ###
    
    return s

In [34]:
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(x)))

softmax(x) = [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
    1.21052389e-04]
 [  8.78679856e-01   1.18916387e-01   8.01252314e-04   8.01252314e-04
    8.01252314e-04]]


**Expected Output**:

<table style="width:60%">

     <tr> 
       <td> **softmax(x)** </td> 
       <td> [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
    1.21052389e-04]
 [  8.78679856e-01   1.18916387e-01   8.01252314e-04   8.01252314e-04
    8.01252314e-04]]</td> 
     </tr>
</table>
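For reference, here is a minimal sketch of the row-wise softmax following the three steps in the cell above. The function names are illustrative. The second function, which subtracts each row's maximum before exponentiating, is an addition that is not part of the lab: it gives the same result mathematically but avoids overflow in `np.exp` for large inputs.

```python
import numpy as np

def softmax_rows(x):
    # Step 1: exponentiate every element.
    x_exp = np.exp(x)
    # Step 2: sum each row, keeping shape (n, 1) so it broadcasts.
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Step 3: (n, m) / (n, 1) divides each row by its own sum.
    return x_exp / x_sum

def softmax_rows_stable(x):
    # Optional variant (not from the lab): shifting each row by its
    # maximum leaves the softmax unchanged but keeps exp() from overflowing.
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])

s = softmax_rows(x)
s_stable = softmax_rows_stable(x)

print("softmax(x) = " + str(s))
print("row sums = " + str(np.sum(s, axis=1)))
```

Each row of the result sums to 1, so every row can be read as a probability distribution over the classes.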
