Implementing the sigmoid function with math.exp

In [1]:
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.
    
    Arguments:
    x -- A scalar
    
    Return:
    s -- sigmoid(x)
    
    """
    
    s = 1/(1 + math.exp(-x))
    
    return s

In [2]:
basic_sigmoid(3)

0.9525741268224334

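In practice, the `math` module is rarely used in deep learning, because its functions take real numbers as input, whereas deep learning works mostly with matrices and vectors. This is why numpy is more useful.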

In [3]:
### Calling the math-based function on a list raises an error ###
x = [1, 2, 3]
basic_sigmoid(x)

TypeError: bad operand type for unary -: 'list'

In fact, if $ x = (x_1, x_2, ..., x_n)$ is a row vector then $np.exp(x)$ will apply the exponential function to every element of x. The output will thus be: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$

In [4]:
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))   # result is (exp(1), exp(2), exp(3))

[ 2.71828183  7.3890561  20.08553692]


If x is a vector, an operation such as s = x + 3 or s = 1/x outputs a vector of the same size as x.

In [7]:
x = np.array([1, 2, 3])
print(x + 3)

[4 5 6]
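
The cell above only exercises `x + 3`. As a small sketch of the other operation mentioned, `1/x`, the following (with the illustrative variable name `inv`) checks that the result keeps the shape of x:

```python
import numpy as np

x = np.array([1, 2, 3])
inv = 1 / x          # element-wise reciprocal; the result is a float array
print(inv.shape)     # (3,), the same shape as x
```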


Any time you need more info on a numpy function, we encourage you to look at [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html). 

You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation.

**Exercise**: Implement the sigmoid function using numpy. 

**Instructions**: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices...) are called numpy arrays. You don't need to know more for now.
$$ \text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1} $$

In [11]:
import numpy as np

def sigmoid(x):
    
    """
    Compute the sigmoid of x
    
    Arguments:
    x -- A scalar or numpy array of any size
    
    Return:
    s -- sigmoid(x)
    """
    s = 1/(1+np.exp(-x))
    return s
    
    

In [12]:
x = np.array([1, 2, 3])
sigmoid(x)

array([0.73105858, 0.88079708, 0.95257413])
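
The instructions say x may be a real number, a vector, or a matrix, but the cell above only tries a vector. A minimal sketch of the other two cases, repeating the `sigmoid` definition so it runs on its own (the input values are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    # Same formula as above; np.exp works element-wise on any shape.
    return 1 / (1 + np.exp(-x))

scalar_out = sigmoid(0.0)                                   # scalar input
matrix_out = sigmoid(np.array([[0.0, 1.0], [-1.0, 2.0]]))   # 2x2 matrix input

print(scalar_out)        # 0.5
print(matrix_out.shape)  # (2, 2)
```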

Computing the sigmoid gradient
$$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$

Two steps:
1. Set s to be the sigmoid of x.
2. Compute the derivative $\sigma'(x) = s(1-s)$

In [14]:
def sigmoid_derivative(x):
    """
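    Compute the gradient of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function in a variable and then use it to compute the gradient.
    
    Arguments:
    x -- A scalar or numpy array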
    
    Return:
    ds -- Your computed gradient.
    
    """
    
    s = sigmoid(x)
    ds = s * (1 - s)
    
    return ds
    

In [15]:
x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
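
One way to gain confidence in formula (2) is to compare it against a numerical derivative. The sketch below is not part of the original exercise: it uses a central finite difference with an arbitrarily chosen step `eps`, and repeats both function definitions so it runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # step size chosen for illustration only

# Central difference approximation of the derivative at each element of x.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)  # expected to be very small if formula (2) is implemented correctly
```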


### 1.3 - Reshaping arrays ###

Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 
- X.shape is used to get the shape (dimension) of a matrix/vector X. 
- X.reshape(...) is used to reshape X into some other dimension. 


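**Exercise**: Convert an image into a vector. The input is an image of shape (length, height, 3) and the output is a vector of shape (length*height*3, 1).
For example, to reshape an array v of shape (a, b, c) into an array of shape (a*b, c) you would do: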
``` python
v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
```
- Do not hard-code the image dimensions as constants. Look up the sizes you need with image.shape[0], etc.
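
As a quick check of the reshape rule above, here is a sketch on a toy array (the sizes a=2, b=3, c=4 are made up). It also shows numpy's `-1` placeholder, which asks `reshape` to infer that dimension; that is an alternative to spelling out the product, not something the exercise requires.

```python
import numpy as np

v = np.arange(24).reshape(2, 3, 4)   # toy array with a=2, b=3, c=4

flat = v.reshape((v.shape[0] * v.shape[1], v.shape[2]))  # explicit (a*b, c)
same = v.reshape(-1, v.shape[2])                         # -1 lets numpy infer a*b

print(flat.shape)  # (6, 4)
```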



In [18]:
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    
    return v

In [28]:

image = np.array([[[ 0.67826139, 0.9380381],
                 [0.90714982, 0.52835647],
                 [0.4215251 , 0.45017551]],
                
                [[0.928141219, 0.96677647],
                [0.85304703, 0.52351845],
                [0.19981397, 0.274173113]],
                
                [[0.60659855, 0.00533165],
                [0.10820313, 0.49978937],
                [0.34144279, 0.94630077]]])

print("image2vector(image) = " + str(image2vector(image)))


image2vector(image) = [[0.67826139]
 [0.9380381 ]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814122]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417311]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]


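Normalizing rows
A common technique in machine learning and deep learning is to normalize the data. After normalization, gradient descent converges faster and the network usually performs better.
Here, normalization means dividing each row vector of x by its norm.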


For example, if $$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$and        $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you're going to learn about it in part 5.
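
The worked example in equations (3) to (5) can be reproduced with a short sketch. It uses the same matrix as equation (3); the variable names `norms` and `x_normalized` are illustrative.

```python
import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [2.0, 6.0, 4.0]])

# keepdims=True keeps the result as a (2, 1) column, so the division
# below broadcasts each row norm across that row's three entries.
norms = np.linalg.norm(x, axis=1, keepdims=True)
x_normalized = x / norms

print(norms.shape)         # (2, 1)
print(x_normalized.shape)  # (2, 3)
```

Each row of `x_normalized` should then have norm 1, which is what the exercise asks for.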


**Exercise**: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

In [6]:
def normalizeRows(x):
    """
    Normalize each row of the matrix x to unit length.
    
    Arguments:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The row-normalized numpy matrix
    """
    
    x_norm = np.linalg.norm(x, axis = 1, keepdims = True)
    print("x_norm : " + str(x_norm))
    x = x / x_norm
    
    return x

In [7]:
import numpy as np
x = np.array([
        [0, 3, 4],
        [1, 6, 4]
    ])

print("x.shape" + str(x.shape))
s = normalizeRows(x)
print("s.shape" + str(s.shape))
print("normalizeRows() = " + str(s))



x.shape(2, 3)
x_norm : [[5.        ]
 [7.28010989]]
s.shape(2, 3)
normalizeRows() = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]


Broadcasting and the softmax function


A very important concept to understand in numpy is "broadcasting". It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official [broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
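
To make the idea concrete, here is a minimal sketch of broadcasting with made-up arrays: a (2, 3) matrix combined with a (2, 1) column and with a length-3 row. In both cases numpy stretches the smaller operand to match the larger one.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
col = np.array([[10], [20]])       # shape (2, 1)
row = np.array([100, 200, 300])    # shape (3,)

by_col = a + col   # each row of a gets its own offset
by_row = a + row   # each column of a gets its own offset

print(by_col)  # [[11 12 13]
               #  [24 25 26]]
print(by_row)  # [[101 202 303]
               #  [104 205 306]]
```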


Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes.

**Instructions**:
- $ \text{for } x \in \mathbb{R}^{1\times n} \text{,     } softmax(x) = softmax(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}) = \begin{bmatrix}
     \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} $ 

- $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$  $$softmax(x) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    ...  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} $$

In [9]:
def softmax(x):
    """
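    Compute the softmax of the input x.
    The code should work for a row vector and also for a matrix of shape (n, m).
    
    Arguments:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)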
    
    """
    x_exp = np.exp(x)
    
    x_sum = np.sum(x_exp, axis = 1, keepdims= True)
    
    s = x_exp / x_sum
    
    return s
    
    

In [10]:
x = np.array([
        [9, 2, 5, 0, 0],
        [7, 5, 0, 0, 0]
    ])

print("softmax(x) = " + str(softmax(x)))

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
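
Because softmax divides each row by its own sum, every row of the output should add up to 1. The sketch below checks this on the same input, repeating a condensed `softmax` so it runs on its own; `row_sums` is an illustrative name.

```python
import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    # Row-wise sums kept as an (n, 1) column so the division broadcasts.
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])

s = softmax(x)
row_sums = s.sum(axis=1)
print(row_sums)  # each entry should be 1 up to floating-point rounding
```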


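The loss is used to evaluate the performance of the model. The larger the loss, the further the predictions $\hat{y}$ are from the true values $y$. In deep learning, the model is trained with an optimization algorithm such as gradient descent so as to minimize the cost. The L1 loss is defined as:
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$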

In [11]:
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """

    loss = np.sum(np.abs(y - yhat))
    
    return loss

In [13]:
yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))

L1 = 1.1
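
For comparison, the same L1 loss can be written as an explicit Python loop. This sketch (with illustrative names `loop_loss` and `vector_loss`) only shows that the vectorized expression in `L1` computes the same sum as formula (6), term by term:

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

# Explicit loop: add |y_i - yhat_i| one element at a time.
loop_loss = 0.0
for y_i, yhat_i in zip(y, yhat):
    loop_loss += abs(y_i - yhat_i)

# Vectorized form, as used in L1 above.
vector_loss = np.sum(np.abs(y - yhat))

print(np.isclose(loop_loss, vector_loss))  # True
```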


In [17]:
a= np.array([1,2,3])
b= np.array([2,3,4])
a2 = np.dot(a,b)

print(a2)

20


In [18]:
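def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the L2 loss, i.e. the sum of squared differences
    """
    diff = y - yhat
    loss = np.dot(diff, diff)  # .T is a no-op on a 1-D array, so it is omitted
    return loss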



In [19]:
yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

print("L2 : " + str (L2(yhat,y)))

L2 : 0.43
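
The notebook does not write out the L2 formula. Assuming the usual definition $L_2(\hat{y}, y) = \sum_{i=1}^m (y^{(i)} - \hat{y}^{(i)})^2$, the dot product of the difference vector with itself is exactly that sum of squares. A small sketch comparing the two forms (`via_dot` and `via_sum` are illustrative names):

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

diff = y - yhat
via_dot = np.dot(diff, diff)     # dot product of the difference with itself
via_sum = np.sum(diff ** 2)      # explicit sum of squared differences

print(np.isclose(via_dot, via_sum))  # True
```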



<font color='red'>
**What to remember:**
- Vectorization is very important in deep learning. It provides computational efficiency and clarity.
- You have reviewed the L1 and L2 loss.
- You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc...