# Week3

Goal:
- Master the process of hyperparameter tuning

## Hyperparameter tuning

**Hyperparameter importance ranking (empirical)**

1. Learning rate
2. Momentum $\beta$, batch size, hidden layer size
3. Number of hidden layers, learning rate decay

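**Tuning methods**

1. Early approach: grid search. Its drawback is that hyperparameters differ in importance, so a grid spends many trials on few distinct values of the important ones and the search is inefficient.
2. Recommended: random sampling, which suits the case where you do not know in advance which hyperparameters matter most.
3. Coarse to fine: after a search over a wide range has identified a promising region, sample randomly again inside that smaller region.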
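
The coarse-to-fine idea can be sketched as follows. This is a minimal illustration, not a tuning recipe: `toy_score` is a hypothetical stand-in for "train a model and return its validation score", and the two hyperparameters `h1`, `h2`, the trial count, and the `half_width` of the refined box are all assumptions made for the example.

```python
import numpy as np

def toy_score(h1, h2):
    # Hypothetical validation score that peaks at (0.3, 0.7);
    # in practice this would train and evaluate a model.
    return -((h1 - 0.3) ** 2 + (h2 - 0.7) ** 2)

def random_search(low, high, n_trials, rng):
    # Sample every trial uniformly inside the box [low, high] and keep the best.
    samples = rng.uniform(low, high, size=(n_trials, len(low)))
    scores = np.array([toy_score(h1, h2) for h1, h2 in samples])
    best = int(np.argmax(scores))
    return samples[best], scores[best]

rng = np.random.default_rng(0)

# Coarse pass over the whole range.
low, high = np.array([0.0, 0.0]), np.array([1.0, 1.0])
coarse_best, coarse_score = random_search(low, high, 25, rng)

# Fine pass: shrink the box around the coarse optimum and sample again.
half_width = 0.1
fine_low = np.clip(coarse_best - half_width, low, high)
fine_high = np.clip(coarse_best + half_width, low, high)
fine_best, fine_score = random_search(fine_low, fine_high, 25, rng)

# Keep whichever pass produced the better score.
if fine_score >= coarse_score:
    best_point, best_score = fine_best, fine_score
else:
    best_point, best_score = coarse_best, coarse_score
print(best_point, best_score)
```

Unlike a grid, each of the 25 random trials tries a distinct value of every hyperparameter, which is why random sampling covers the important dimensions better.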

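Some hyperparameters can be sampled directly from a uniform distribution, for example the number of hidden layers. Others cannot: a learning rate between 0.001 and 0.1 should not be sampled uniformly on a linear scale, but on a logarithmic scale. In this example the range is $10^{-3}$ to $10^{-1}$, so sample the exponent uniformly from $[-3, -1]$ and use it as the power of 10.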
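
A minimal NumPy sketch of the two sampling schemes. The layer range 2 to 5 and the sample counts are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform sampling is fine for a parameter such as the number of hidden layers.
n_layers = rng.integers(2, 6, size=5)  # integers in [2, 5]

# For the learning rate, sample the exponent r uniformly in [-3, -1] and use 10**r.
r = rng.uniform(-3, -1, size=1000)
learning_rates = 10.0 ** r

# Fraction of samples in the lower decade [0.001, 0.01).
frac_small = np.mean(learning_rates < 0.01)
print(n_layers, frac_small)
```

With log-scale sampling each decade receives roughly the same number of trials. Sampling uniformly on the linear scale would instead put about 90% of the trials in $[0.01, 0.1]$.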

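**Tuning tips**
1. The best hyperparameters differ across domains and even across datasets, so re-evaluate them regularly.
2. Without enough compute, babysit a single model: watch its performance curve and patiently adjust the learning rate and other settings as it trains.
3. When many models can be trained in parallel, train them with randomly sampled hyperparameters and pick the setting with the best result.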

## Batch Normalization

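Batch Normalization (BN) is a strategy that simplifies hyperparameter tuning: normalization makes the model less sensitive to hyperparameters and less dependent on weight initialization. The intuition is that normalizing the input data speeds up gradient descent, so the inputs to the hidden layers can be normalized in the same way.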

**Procedure**

For the inputs $(z^{[l](1)}, z^{[l](2)}, \dots, z^{[l](m)})$ of hidden layer $l$, compute
$$\begin{aligned}
\mu &= \frac{1}{m} \sum_{i=1}^{m} z^{(i)} \\
\sigma^2 &= \frac{1}{m} \sum_{i=1}^{m} (z^{(i)}-\mu)^2 \\
z_{norm}^{(i)} &= \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}} \\
\tilde{z}^{(i)} &= \gamma z_{norm}^{(i)} + \beta
\end{aligned}$$

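and use $\tilde{z}$ in place of $z$ in the subsequent computation. $\gamma$ and $\beta$ are learnable parameters. The third step gives the inputs zero mean and unit variance, and the linear transform in the fourth step then maps them to the desired distribution, with mean $\beta$ and standard deviation $\gamma$.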
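
The four steps can be sketched in NumPy as follows. This is an illustrative forward pass only, assuming one column per example as in the rest of these notes. The function name `batch_norm_forward`, the value of `eps`, and the example shapes are assumptions.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    # z: (n_units, m), one column per example. gamma, beta: (n_units, 1).
    mu = np.mean(z, axis=1, keepdims=True)                  # step 1: batch mean
    sigma2 = np.mean((z - mu) ** 2, axis=1, keepdims=True)  # step 2: batch variance
    z_norm = (z - mu) / np.sqrt(sigma2 + eps)               # step 3: normalize
    z_tilde = gamma * z_norm + beta                         # step 4: scale and shift
    return z_tilde, z_norm, mu, sigma2

rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=2.0, size=(4, 32))
gamma = np.full((4, 1), 1.5)
beta = np.full((4, 1), 0.5)
z_tilde, z_norm, mu, sigma2 = batch_norm_forward(z, gamma, beta)
print(z_tilde.mean(axis=1), z_tilde.std(axis=1))
```

Each row of `z_tilde` ends up with mean close to `beta` and standard deviation close to `gamma`, whatever the mean and scale of the original `z`.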

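> When batch normalization is added to mini-batch training, $z^{[l]} = w^{[l]}a^{[l-1]}+b^{[l]}$ is computed first, and normalizing $z^{[l]}$ then subtracts the batch mean, which removes $b^{[l]}$. So in practice $b^{[l]}$ has no effect when BN is used.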
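
To see why the bias cancels, write out the batch mean of $z^{[l]}$ over the $m$ examples of a mini-batch. The bias is the same for every example, so it passes through the mean unchanged:

$$\mu = \frac{1}{m}\sum_{i=1}^{m}\left(w^{[l]}a^{[l-1](i)} + b^{[l]}\right) = \frac{1}{m}\sum_{i=1}^{m} w^{[l]}a^{[l-1](i)} + b^{[l]}$$

$$z^{[l](i)} - \mu = w^{[l]}a^{[l-1](i)} - \frac{1}{m}\sum_{j=1}^{m} w^{[l]}a^{[l-1](j)}$$

The difference no longer contains $b^{[l]}$, and $\sigma^2$ is built from these same differences, so $z_{norm}$ and $\tilde{z}$ are independent of $b^{[l]}$. The shift is provided by $\beta$ instead.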

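**Why Batch Normalization works**

1. It brings every dimension to a similar scale (zero mean, unit variance), which speeds up gradient descent
2. It makes the weights of deeper layers more robust to changes in earlier layers
3. It has a slight regularization effect

Explanation of the second point:

When the data distribution changes (covariate shift), the accuracy of the network suffers. Normalization makes the distribution of each hidden layer's inputs more stable and reduces the coupling between earlier and later layers.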

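**Handling BN correctly at test time**

At test time the data does not come in training-sized batches. When the input is a single example rather than a batch, how should $\mu$ and $\sigma^2$ be chosen?

During training, the mini-batches produce $\mu^{\{1\}[l]}, \mu^{\{2\}[l]},\dots, \mu^{\{k\}[l]}$ and the corresponding $\sigma^2$ values. Updating overall $\mu$ and $\sigma^2$ with the recurrence of an **exponentially weighted average** gives estimates of these two statistics for the whole dataset, and those estimates are used at test time.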
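
A minimal NumPy sketch of this bookkeeping, under stated assumptions: the helper names, the averaging coefficient `decay=0.9`, and the synthetic mini-batches are hypothetical and only illustrate the recurrence.

```python
import numpy as np

def update_running_stats(running_mu, running_sigma2, batch_mu, batch_sigma2, decay=0.9):
    # Exponentially weighted average of the per-mini-batch statistics.
    running_mu = decay * running_mu + (1 - decay) * batch_mu
    running_sigma2 = decay * running_sigma2 + (1 - decay) * batch_sigma2
    return running_mu, running_sigma2

def batch_norm_inference(z, running_mu, running_sigma2, gamma, beta, eps=1e-8):
    # Works for a single example: z has shape (n_units, 1).
    z_norm = (z - running_mu) / np.sqrt(running_sigma2 + eps)
    return gamma * z_norm + beta

rng = np.random.default_rng(0)
n_units = 3
running_mu = np.zeros((n_units, 1))
running_sigma2 = np.ones((n_units, 1))

# Training: feed the statistics of each mini-batch into the running averages.
for _ in range(200):
    z_batch = rng.normal(loc=2.0, scale=3.0, size=(n_units, 64))
    batch_mu = z_batch.mean(axis=1, keepdims=True)
    batch_sigma2 = z_batch.var(axis=1, keepdims=True)
    running_mu, running_sigma2 = update_running_stats(
        running_mu, running_sigma2, batch_mu, batch_sigma2)

# Test time: normalize one example with the stored estimates.
gamma = np.ones((n_units, 1))
beta = np.zeros((n_units, 1))
z_single = np.full((n_units, 1), 2.0)
out = batch_norm_inference(z_single, running_mu, running_sigma2, gamma, beta)
print(running_mu.ravel(), running_sigma2.ravel(), out.ravel())
```

The running estimates settle near the true mean 2 and variance 9 of the synthetic data, so a single test example can be normalized without computing any batch statistics.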


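# TensorFlow basics

The core workflow in TensorFlow is to build a computation graph with `tf.constant`, `tf.Variable`, and `tf.placeholder`, then call `session.run` to execute the graph and get the result. Training data enters the network through a `placeholder` and is supplied via the `feed_dict` argument of `run`. The code in this section uses the TensorFlow 1.x graph-and-session API.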

In [None]:
import tensorflow as tf
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

def linearFunc():
    # Every TensorFlow op has to be evaluated with run() to obtain its value
    X = tf.constant(np.random.randn(5,3), name="x")
    W = tf.constant(np.random.randn(4,5), name="w")
    b = tf.constant(np.random.randn(4,1), name="b")
    res = tf.add(tf.matmul(W, X), b) # W X + b, shape (4, 3)
    with tf.Session() as sess:
        result = sess.run(res) # run() returns a NumPy ndarray
    return result

def sigmoid(x):
    # Use a placeholder as the entry point for the data
    a = tf.placeholder(tf.float32, name='a')
    with tf.Session() as sess:
        result = sess.run(tf.sigmoid(a), feed_dict={a: x})
    return result

def cost(logit, label):
    # Compute the sigmoid cross-entropy cost from the logits and the labels
    l = tf.placeholder(tf.float32, name='logit')
    y = tf.placeholder(tf.float32, name='y')
    c = tf.nn.sigmoid_cross_entropy_with_logits(logits=l, labels=y)
    with tf.Session() as sess:
        # the keys of feed_dict are the placeholder objects themselves
        res = sess.run(c, feed_dict={l: logit, y: label})
    return res

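def test_cost():
    # Smoke test: mean cost for random logits and random 0/1 labels
    logits = np.random.randn(5,3)
    labels = np.random.randint(0, 2, logits.shape)
    # Pass raw logits: sigmoid_cross_entropy_with_logits applies the sigmoid itself
    print(np.mean(cost(logits, labels)))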
    

def convert_to_one_hot(labels, C):
    # Convert the labels to one-hot format, shape (C, number of labels)
    with tf.Session() as sess:
        res = sess.run(tf.one_hot(indices=labels, depth=C, axis=0)) # indices may be a plain number or array; it need not be a placeholder
    return res

test_cost()

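# Building a multi-class neural network classifier with TensorFlow

1. Create the data entry points (`placeholder`s)
2. Create the required parameters
3. Build the computation graph from the parameters and the placeholders
4. Add the cost and evaluation ops to the graph
5. Create the optimizer
6. Prepare the data stream (mini-batches)
7. Run the cost and optimizer ops to train
8. Run the evaluation ops to check the training result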

In [None]:
# Build a multi-class classifier on sklearn's digits dataset (first 6 classes only)

X, Y = load_digits(n_class=6, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
X_train, X_test = X_train.T, X_test.T
print(X_train.shape, y_train.shape)

In [None]:
# Visualize one example from the dataset
index = 12
x_show = X_train[:, index].reshape(8, 8)
import matplotlib.pyplot as plt
plt.rcParams['image.cmap'] = 'gray'
plt.imshow(x_show)
print(y_train[index])

In [None]:
# Scale X to [0, 1] (pixel values in the digits data range from 0 to 16)
X_train = X_train/16.
X_test = X_test/16.
# Convert y to one-hot format
y_train = convert_to_one_hot(y_train, 6)
y_test = convert_to_one_hot(y_test, 6)
print(y_train.shape)

In [None]:
def create_placeholder(n_x, n_y):
    X = tf.placeholder(tf.float32, [n_x, None], name='X')
    Y = tf.placeholder(tf.int32, [n_y, None], name='Y')
    return X, Y

def init_parameters(n_x):
    W1 = tf.get_variable('W1', [25, n_x], initializer=tf.contrib.layers.xavier_initializer())
    b1 = tf.get_variable('b1', [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable('W2', [12, 25], initializer=tf.contrib.layers.xavier_initializer())
    b2 = tf.get_variable('b2', [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.get_variable('W3', [6, 12], initializer=tf.contrib.layers.xavier_initializer())
    b3 = tf.get_variable('b3', [6,1], initializer=tf.zeros_initializer())
    
    Params = {'W1': W1,
              'W2': W2,
              'W3': W3,
              'b1': b1,
              'b2': b2,
              'b3': b3}
    return Params

def forward_propagation(X, Params):
    W1 = Params['W1'] # (25, n_x)
    W2 = Params['W2'] # (12, 25)
    W3 = Params['W3'] # (6, 12)
    b1 = Params['b1'] # (25, 1)
    b2 = Params['b2'] # (12, 1)
    b3 = Params['b3'] # (6, 1)
    
    Z1 = tf.add(tf.matmul(W1, X), b1) # (25, size)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2) # (12, size)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3) # (6, size)
    
    return Z3

def compute_cost(logits, labels):
    
    logits = tf.transpose(logits) # (size, 6)
    labels = tf.transpose(labels) # (size, 6)
    
    # softmax_cross_entropy_with_logits does two things: it applies softmax, then computes the cross-entropy
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    return cost

def seperate_batch(x, y, batch_size):
    batch_num = x.shape[1] // batch_size
    batches = []
    for i in range(batch_num):
        batches.append([x[:, (i*batch_size):(i + 1)*batch_size], y[:, i*batch_size:(i+1)*batch_size]])
    if x.shape[1] % batch_size != 0:
        # keep the leftover examples as a final, smaller batch
        batches.append([x[:, batch_num*batch_size:], y[:, batch_num*batch_size:]])
    return batches

def model(x_train, y_train, x_test, y_test, 
                    learning_rate=0.0001, epoch_num=1000, batch_size=32, print_cost=True):
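    '''
    Train a three-layer network (25 -> 12 -> 6 units) and return the
    trained parameters as a dict of NumPy arrays.

    x_train, x_test: shape (64, m); y_train, y_test: one-hot, shape (6, m).
    '''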
    assert x_train.shape[0] == 64
    assert y_train.shape[0] == 6
    n_x, m = x_train.shape
    n_y = y_train.shape[0]
    tf.reset_default_graph()
    
    X, Y = create_placeholder(n_x, n_y)
    params = init_parameters(n_x)
    Z3 = forward_propagation(X, params) # (6, batch_size)
    cost = compute_cost(Z3, Y)
    
    # Backpropagation: running this op computes the gradients and updates the parameters
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    
    init = tf.global_variables_initializer()
    
    with tf.Session() as sess:
        sess.run(init)
        costs = []
        for epoch in range(epoch_num):
            epoch_cost = 0
            mini_batches = seperate_batch(x_train, y_train, batch_size)
            batch_num = len(mini_batches)
            
            for batch in mini_batches:
                (x, y) = batch
                assert x.shape[0] == 64
                assert y.shape[0] == 6
                mini_batch_cost, _ = sess.run([cost, optimizer], feed_dict={X:x, Y:y})
                epoch_cost = epoch_cost + mini_batch_cost / batch_num  # average cost over the epoch
                
            if print_cost == True and epoch % 100 == 0:
                print('Cost after epoch %i : %f' %(epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:      
                costs.append(epoch_cost)
        
        plt.plot(np.squeeze(costs))
        plt.show()
        
        parameters = sess.run(params)
        print("Parameters have been trained.")
        
        # Z3 and Y have shape (6, size), so argmax over axis 0 gives the class index
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        
        # eval() runs the tensor in the default Session, which is convenient for evaluation
        print("Train Accuracy:", accuracy.eval({X:x_train, Y:y_train}))
        print("Test Accuracy:", accuracy.eval({X:x_test, Y:y_test}))
        
        return parameters

In [None]:
param = model(X_train, y_train, X_test, y_test)

<font color='blue'>
    
**What you should remember**:
    
- Tensorflow is a programming framework used in deep learning
- The two main object classes in tensorflow are Tensors and Operators. 
- When you code in tensorflow you have to take the following steps:
    - Create a graph containing Tensors (Variables, Placeholders ...) and Operations (tf.matmul, tf.add, ...)
    - Create a session
    - Initialize the session
    - Run the session to execute the graph
- You can execute the graph multiple times as you've seen in model()
- Backpropagation and optimization are done automatically when running the session on the "optimizer" object.