# Deep Learning Training


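## Fundamental goal: find a function that solves the problem.    Problem -> **f** -> Solution
  
### One way to find this function: neural networks, one of the approaches within machine learning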
<br/>

## Contents
### Neural Networks
### ➡️ Convolutional Neural Networks ⬅️
### Recurrent Neural Networks
### Generative Adversarial Networks
<br/>

## Convolutional Neural Networks

### Model Evaluation and Validation

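#### Training and Test Sets
Training set: used to train the model.  
Test set: used to evaluate how good the model is. Never train on the test set, and guard against the test set leaking into training indirectly (for example, by tuning hyperparameters based on test-set performance).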

In [7]:
# Split a dataset with scikit-learn (25% of the 4 samples go to the test set)
import numpy as np
from sklearn.model_selection import train_test_split

X=np.random.random((4,4))
Y=np.random.randint(2,size=(4,1))

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.25)


In [9]:
X_train

array([[ 0.52752549,  0.11075408,  0.04488648,  0.94831371],
       [ 0.945337  ,  0.57129645,  0.76167556,  0.86129855],
       [ 0.62836435,  0.46990505,  0.04987091,  0.28678808]])

In [10]:
X_test

array([[ 0.89174861,  0.70495976,  0.15363172,  0.56980621]])

In [11]:
Y_train

array([[1],
       [1],
       [0]])

In [12]:
Y_test

array([[1]])

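#### Evaluating Classification
##### Confusion Matrix
Also called a contingency table or an error matrix. It visualizes how well the classifier performs and is used to evaluate classification.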

|     |Diagnosed sick|Diagnosed healthy|
|---|--------------|--------------|
|Actually sick|True Positive |False Negative|
|Actually healthy|False Positive|True Negative |

In [6]:
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
score=accuracy_score(y_true,y_pred) # samples 0 and 3 are right, 1 and 2 are wrong, so the accuracy is 0.5
score

0.5
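Accuracy can also be read off a confusion matrix, where the correct predictions sit on the diagonal. The following is a minimal sketch that uses scikit-learn's `confusion_matrix` on a made-up binary example, where 1 means sick and 0 means healthy. Passing `labels=[1, 0]` orders the rows (actual) and columns (predicted) positive-first, as in the table above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical binary labels: 1 = sick (positive), 0 = healthy (negative)
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]

# Rows = actual class, columns = predicted class, both ordered [1, 0],
# so the layout is [[TP, FN], [FP, TN]] like the table above
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)

tp, fn = cm[0]
fp, tn = cm[1]
# Accuracy = correct predictions (the diagonal) / all predictions
acc = (tp + tn) / cm.sum()
print(acc, accuracy_score(y_true, y_pred))
```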

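#### Evaluating Regression
##### Mean Absolute Error
Its drawback is that the absolute value is not differentiable at zero error, which makes it awkward to use as a loss function for gradient descent.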

In [7]:
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression

X=np.array([1,2,3,4]).reshape((-1,1)) # column vector; -1 lets NumPy infer the size of that dimension
Y=np.array([1,2,3,4]).reshape((-1,1)) # column vector

regression=LinearRegression()
regression.fit(X,Y)

guesses=regression.predict(X)

error=mean_absolute_error(Y,guesses)
error

0.0
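The data above lie exactly on a line, so the fitted model is perfect and the error is 0. That hides what the metric actually computes. The sketch below computes MAE by hand, $\frac{1}{n}\sum_i |y_i - \hat{y}_i|$, on the same small predictions used in the $R^2$ example below, and checks the result against `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([1, 2, 4])
y_pred = np.array([1.3, 2.5, 3.7])

# MAE = (1/n) * sum(|y_i - y_hat_i|)
mae = np.mean(np.abs(y_true - y_pred))
print(mae)                                  # about 0.3667 = (0.3 + 0.5 + 0.3) / 3
print(mean_absolute_error(y_true, y_pred))  # same value

# |e| has a kink at e = 0: its slope jumps from -1 to +1,
# so the derivative is undefined exactly where the error vanishes
```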

##### Mean Squared Error
Differentiable everywhere, so it is well suited as a loss function for gradient descent.

In [8]:
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

X=np.array([1,2,3,4]).reshape((-1,1)) # column vector; -1 lets NumPy infer the size of that dimension
Y=np.array([1,2,3,4]).reshape((-1,1)) # column vector

regression=LinearRegression()
regression.fit(X,Y)

guesses=regression.predict(X)

error=mean_squared_error(Y,guesses)
error

0.0
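Here is the same small example for MSE, $\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, computed by hand and compared with `mean_squared_error`. The last lines show why MSE suits gradient descent. Its derivative with respect to each prediction, $\frac{2}{n}(\hat{y}_i - y_i)$, is smooth and shrinks as the error shrinks. The data are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1, 2, 4])
y_pred = np.array([1.3, 2.5, 3.7])

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse, mean_squared_error(y_true, y_pred))  # both about 0.1433

# Gradient of MSE with respect to each prediction: 2 * (y_hat_i - y_i) / n.
# It is smooth and proportional to the error, unlike the sign-only slope of MAE
grad = 2 * (y_pred - y_true) / len(y_true)
print(grad)
```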

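##### $R^2$ Coefficient of Determination ($R^2$ Score)
Also called goodness of fit.
<img src="r2_score.jpg" width=450 height=450 />
The simplest model, which always predicts the mean $\bar{y}$, is the baseline with the largest error.  
A good model scores close to 1: the smaller its error relative to the simplest model, the closer the ratio term is to 0.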

Formula:  
If $\hat{y}_i$ is the predicted value of the $i$-th sample and $y_i$ is the corresponding true value, then the score $R^2$ estimated over $n_{\text{samples}}$ is defined as

$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}} - 1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_\text{samples} - 1} (y_i - \bar{y})^2}$

where $\bar{y} =  \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} y_i$.

In [9]:
from sklearn.metrics import r2_score
y_true=np.array([1,2,4])
y_pred=np.array([1.3,2.5,3.7])
score=r2_score(y_true,y_pred)
score

0.90785714285714292
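To connect the formula with the result above, this sketch computes the two sums explicitly with NumPy for the same `y_true` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1, 2, 4])
y_pred = np.array([1.3, 2.5, 3.7])

ss_res = np.sum((y_true - y_pred) ** 2)          # the model's squared error: 0.43
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean: 14/3
r2 = 1 - ss_res / ss_tot
print(r2, r2_score(y_true, y_pred))              # both about 0.9079
```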

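##### Model Complexity Graph
Underfitting: the model performs poorly even on the training set. Error due to bias  
* A small model is barely affected by the particular data it sees, so its variance is small. However, it may be unable to capture the true relationship, so all of its fits are offset from the truth (high bias).   

Overfitting: the model performs too well on the training set, to the point of trying to memorize it. Error due to variance
* A large model is more likely to contain weights that match the true relationship, so its bias is lower. However, it is more easily affected by noise in the sampled data, so its variance is larger.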

<img src="error_from_where.jpg" width=450 height=450 />
Red: the line fitted to each sample; blue: the average of all the red lines; black: the true function
<img src="error_from_which.jpg" width=450 height=450 />

Once we know what underfitting and overfitting look like, we can use a validation set to choose how complex the model should be.
<img src="cross_validation.jpg" width=450 height=450 />
<img src="model_complexity_graph.jpg" width=550 height=550 />
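The model complexity graph can be reproduced roughly in numbers with NumPy polynomial fits, where the polynomial degree plays the role of model complexity. The noisy sine data, the 20/20 train/validation split and the degrees tried are all assumptions made for illustration. Training error should fall as the degree rises. Validation error typically falls at first and then rises again once the model starts fitting noise, but the exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy sine curve, split into training and validation halves
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)
idx = rng.permutation(len(x))
train, valid = idx[:20], idx[20:]

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

# Model complexity = polynomial degree
results = {}
for degree in [1, 3, 9]:
    coeffs = np.polyfit(x[train], y[train], degree)
    results[degree] = (mse(coeffs, x[train], y[train]),
                       mse(coeffs, x[valid], y[valid]))
    print(degree, "train MSE %.3f, valid MSE %.3f" % results[degree])
```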

##### K-Fold Cross Validation
<img src="k_fold_cross_validation.jpg" width=450 height=450 />

In [10]:
from sklearn.model_selection import KFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])

kf = KFold(n_splits=2,shuffle=False)  # split the data into 2 folds
for train_index, test_index in kf.split(X):  # each iteration yields the train/test indices
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print("TRAIN:", train_index, "TEST:", test_index)
    print("TRAIN DATA:", X_train.tolist(), "TRAIN LABEL:",y_train.tolist(), "TEST:", X_test.tolist(),"TEST LABEL:",y_test.tolist())

TRAIN: [2 3] TEST: [0 1]
TRAIN DATA: [[1, 2], [3, 4]] TRAIN LABEL: [3, 4] TEST: [[1, 2], [3, 4]] TEST LABEL: [1, 2]
TRAIN: [0 1] TEST: [2 3]
TRAIN DATA: [[1, 2], [3, 4]] TRAIN LABEL: [1, 2] TEST: [[1, 2], [3, 4]] TEST LABEL: [3, 4]
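The loop above only prints the folds. To score a model on every fold, scikit-learn's `cross_val_score` accepts the `KFold` object as `cv`. The sketch below uses hypothetical data lying exactly on $y = 2x + 1$, so each fold's $R^2$ (the default score for regressors) should come out as 1.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on y = 2x + 1
X = np.arange(6).reshape(-1, 1)
y = 2 * X.ravel() + 1

kf = KFold(n_splits=3, shuffle=True, random_state=0)
# Trains on 2 folds and scores on the held-out fold, once per fold
scores = cross_val_score(LinearRegression(), X, y, cv=kf)
print(scores, scores.mean())   # all folds about 1.0
```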


### Text Sentiment Analysis
[Sentiment Analysis with Numpy](https://github.com/udacity/deep-learning/tree/master/sentiment-network): Andrew Trask leads you through building a sentiment analysis model, predicting if some text is positive or negative.

### Intro to TFLearn

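#### New Activation Functions
Sigmoid as the activation function  
Drawback: its derivative is at most 0.25 (at $x = 0$), so the gradient shrinks sharply at every layer it is backpropagated through.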
<img src="derivative_sigmoid.jpg" width=450 height=450 />

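For typical problems:  
Rectified Linear Units (ReLUs) are used as the activation function in place of sigmoid. Their derivative is 1 for positive inputs, so the gradient does not decay.  
Formula:  
$f(x)=\max(x,0)$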
<img src="relu.jpg" width=450 height=450 />
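Drawback:
The learning rate must be kept under control. Otherwise, a large gradient update can push a ReLU neuron's weights into a region where its input is negative for every example. The neuron then always outputs 0, receives zero gradient and stops responding to the data, which means it has effectively died.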
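The NumPy sketch below shows the two derivatives discussed above. It only shows the per-layer factor that backpropagation multiplies together. A real network also multiplies by the weights, so this is an illustration rather than a full gradient computation.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)                 # at most 0.25, reached at x = 0

def relu(x):
    return np.maximum(x, 0)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (the value at exactly 0 is a convention)
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))           # [0. 0. 2.]
print(sigmoid_grad(x))   # every entry <= 0.25
print(relu_grad(x))      # [0. 0. 1.]

# Backprop multiplies one local derivative per layer. Best case over 5 layers:
print(sigmoid_grad(0.0) ** 5)                  # 0.25**5, about 0.001
print(np.prod(relu_grad(np.full(5, 2.0))))     # 1.0 along an active ReLU path
```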

For multi-class problems:  
Softmax is usually used as the activation function of the final (output) layer.
<img src="softmax.jpg" width=450 height=450 />
It converts the raw outputs into probabilities that sum to 1.
Formula:  
$ \sigma(z)_j = \frac {e^{z_j}}{\sum_{k=1}^K e^{z_k}} $ for $ j = 1,...,K $
<img src="softmax_math.jpg" width=450 height=450 />
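A minimal NumPy version of the softmax formula follows. Subtracting the maximum score first is a common numerical-stability trick and is an addition, not part of the formula above. It cancels out in the ratio but keeps `np.exp` from overflowing. The scores are made up.

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) leaves the result unchanged (it cancels in the ratio)
    # but prevents overflow in exp for large scores
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())   # the largest score gets the largest probability; the sum is 1
```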

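#### New Loss Function
##### Categorical Cross-Entropy
One-hot encoding: represents which one of several possible states is the current one  
For example:  
Label $ y = [0,0,0,0,1,0,0,0,0,0] $   
Prediction $ \hat y = [0.047,0.048,0.061,0.07,0.330,0.062,0.001,0.213,0.013,0.150] $

Cross entropy is an important concept in Shannon's information theory. It is mainly used to measure the difference between two probability distributions.
Formula and calculation: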
<img src="cross_entropy_calculation.jpg" width=450 height=450 />

Categorical cross-entropy is usually paired with a softmax output layer.
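Here the formula is applied to the one-hot label and prediction from the example above. Because $y$ is one-hot, every term except the true class drops out, so the loss is just $-\ln 0.330$.

```python
import numpy as np

y = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])             # one-hot label (class 4)
y_hat = np.array([0.047, 0.048, 0.061, 0.07, 0.330,
                  0.062, 0.001, 0.213, 0.013, 0.150])   # predicted probabilities

# D(y_hat, y) = -sum_j y_j * ln(y_hat_j); only the true-class term is non-zero
ce = -np.sum(y * np.log(y_hat))
print(ce)   # -ln(0.330), about 1.109
```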

### Simple Sentiment Analysis Techniques
##### Bag of Words
"the fox jumps over the lazy dog"   
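Split it into key-value pairs, where the key is a word and the value is the number of times it appears: {'the': 2, 'jumps': 1, 'lazy': 1, 'over': 1, 'fox': 1, 'dog': 1}   
The drawback is that the order of the words is lost.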
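Bag of words takes only a few lines with Python's `collections.Counter`. The sketch also checks that word order really is discarded.

```python
from collections import Counter

sentence = "the fox jumps over the lazy dog"
bag = Counter(sentence.split())
print(bag)   # 'the' -> 2, every other word -> 1

# Word order is lost: a reordered sentence gives the identical bag
reordered = Counter("dog lazy the over jumps fox the".split())
print(bag == reordered)  # True
```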
##### Word2vec
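Continuous bag of words (CBOW) and skip-grams  
Skip-gram: trains vector representations of words with a neural network by feeding in one word and predicting the n words around it.   
The trained vectors exhibit linear relationships between words.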
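The data-preparation step of skip-gram can be sketched as generating (center word, context word) training pairs within a window of $n$ words on each side. `skip_gram_pairs` is a hypothetical helper written for illustration. The network that learns word vectors from these pairs is not shown.

```python
def skip_gram_pairs(tokens, window=1):
    """Hypothetical helper: pair each word with its neighbours up to `window` away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))   # (input word, word to predict)
    return pairs

tokens = "the fox jumps over the lazy dog".split()
pairs = skip_gram_pairs(tokens, window=1)
print(pairs[:4])  # [('the', 'fox'), ('fox', 'the'), ('fox', 'jumps'), ('jumps', 'fox')]
```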
<img src="word2vec_matrix.jpg" width=450 height=450 />
<img src="word2vec_linear.jpg" width=450 height=450 />
##### RNN (Recurrent Neural Network)
Well suited to processing sequences such as text and audio

### Intro to TensorFlow

In [13]:
import tensorflow as tf

# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

b'Hello World!'


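#### TensorFlow Basics
##### Tensor
All data in TensorFlow is represented as Tensor objects.

* Constants: tf.constant()
* Placeholders: tf.placeholder(); a placeholder is filled with data when the graph is run
* Variables: tf.Variable()
##### Session
The runtime context in which the graph executes; running a Tensor in a session returns its computed value.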
##### TensorFlow Math
* tf.add()
* tf.subtract()
* tf.multiply()
* tf.divide()
* tf.cast()   e.g. tf.cast(tf.constant(2.0), tf.int32)

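### Building a Neural Network in TensorFlow to Classify Handwritten Digits
#### First Network: A Plain Neural Network

##### Dataset
[MNIST dataset](http://yann.lecun.com/exdb/mnist/): a dataset of handwritten digit images and their labels
<img src="MNIST_Matrix.jpg" width=450 height=450 />

##### Network Structure

Input layer:  
The input is a 28 \* 28 matrix, flattened into a row vector of 784 features.  
The label is a 1 \* 10 one-hot encoded vector, with one entry per digit 0-9.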

In [15]:
n_features = 28*28
n_labels = 10  # digits 0-9
x = tf.placeholder(tf.float32,(1,n_features)) # row vector

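Hidden layer (here a single linear layer that produces the logits):
Linear function: $ y = xW + b $  
Here x is the input, W the weights and b the bias.  
The same function in TensorFlow: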

In [16]:
W = tf.Variable(tf.truncated_normal((n_features, n_labels)))  
b = tf.Variable(tf.zeros(n_labels))  
y = tf.matmul(x,W) + b

Output layer: softmax as the activation function

In [17]:
softmax = tf.nn.softmax(y)

Loss function: cross-entropy  $ D(\hat y , y) = - \sum_j y_j \ln \hat y_j $
<img src="cross_entropy_calculation.jpg" width=300 height=300 />

In [23]:
labels_one_hot = tf.placeholder(tf.float32, shape=(1, n_labels))  # same shape as softmax, so the multiply is element-wise
cross_entropy = tf.multiply(-1.0, tf.reduce_sum(tf.multiply(labels_one_hot, tf.log(softmax))))
cross_entropy

<tf.Tensor 'Mul_8:0' shape=() dtype=float32>
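To make the shapes in the TensorFlow graph above concrete, the sketch below runs the same forward pass (linear layer, softmax, cross-entropy) in NumPy for a single flattened image. The random input, the random weights (standard deviation 0.1) and the true digit 3 are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_labels = 28 * 28, 10

x = rng.random((1, n_features))                   # one flattened image: row vector (1, 784)
W = rng.normal(0, 0.1, (n_features, n_labels))    # weights (784, 10)
b = np.zeros(n_labels)                            # bias (10,), broadcast over rows

logits = x @ W + b                                # y = xW + b -> (1, 10)
e = np.exp(logits - logits.max(axis=1, keepdims=True))
softmax = e / e.sum(axis=1, keepdims=True)        # (1, 10), each row sums to 1

labels_one_hot = np.zeros((1, n_labels))
labels_one_hot[0, 3] = 1                          # hypothetical true digit: 3
cross_entropy = -np.sum(labels_one_hot * np.log(softmax))   # scalar loss
print(logits.shape, softmax.shape, cross_entropy)
```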

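##### Tips for Training Neural Networks

##### Normalized Inputs and Initial Weights
<img src="Mean_Variance_Image.png" width=450 height=450 />
Normalize the inputs (zero mean, equal variance).   
Initialize the weights randomly from a normal distribution with zero mean and a small standard deviation.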
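Both tips can be sketched in NumPy. The random pixel batch stands in for real images. The weight standard deviation of 0.1 is an assumed value, not one prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 8-bit grayscale pixels in [0, 255]
pixels = rng.integers(0, 256, size=(100, 784)).astype(np.float64)

# Normalize inputs to zero mean and unit variance (statistics from the training data)
mean, std = pixels.mean(), pixels.std()
x = (pixels - mean) / std

# Initialize weights from a zero-mean normal distribution with a small (assumed) std
W = rng.normal(loc=0.0, scale=0.1, size=(784, 10))
print(x.mean(), x.std(), W.std())   # about 0, 1 and 0.1
```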

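##### Measuring Training Performance  
Training set  
Validation set  
Test set  
Pitfall: going back to tune parameters because the test-set results look bad effectively trains the model on the test set. Only ever use the test set at the very end.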
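One common way to carve out all three sets with scikit-learn is to call `train_test_split` twice. The 60/20/20 proportions and the toy data are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # hypothetical 100 samples
y = np.arange(100) % 2

# First hold out 20% as the test set, to be used only once at the very end
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder into training (75%) and validation (25%) for tuning
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```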

##### Training Method: Stochastic Gradient Descent (SGD)
<img src="stochastic_gradient_descent.jpg" width=450 height=450 />

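Running gradient descent on all of the data gives a very accurate descent direction. However, the computation is so large that each step is slow, and it may even run out of memory.   
Instead, we randomly sample one example and use its gradient as an approximation.  
This approach scales to large models and large datasets and is very widely used.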

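##### Mini-batch SGD
Each step samples a subset of the data and uses it to approximate the gradient.

[Comparison of SGD variants](http://www.cnblogs.com/richqian/p/4549590.html)   
Batch, mini-batch, stochastic and online gradient descent differ in how the training data is selected.  

| |Batch|Mini-batch|Stochastic|Online|
|--|--|--|--|--|
|Training set|Fixed|Fixed|Fixed|Updated in real time|
|Samples per iteration|Entire training set|Subset of the training set|Single sample|Depends on the algorithm|
|Algorithmic complexity|High|Medium|Low|Low|
|Timeliness|Low|Medium (delta model)|Medium (delta model)|High|
|Convergence|Stable|Fairly stable|Unstable|Unstable|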
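The following is a minimal mini-batch SGD loop in NumPy. It fits a one-variable linear regression to synthetic data from $y = 3x + 2$ plus noise. The data, learning rate, batch size and epoch count are all assumptions chosen for the illustration. Setting `batch_size = 1` turns it into plain SGD, and `batch_size = len(X)` turns it into full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 3x + 2 plus a little noise
X = rng.normal(size=(1000, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, 1000)

w, b = 0.0, 0.0
learning_rate, batch_size, epochs = 0.1, 32, 20

for epoch in range(epochs):
    order = rng.permutation(len(X))                 # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]       # one mini-batch
        xb, yb = X[idx, 0], y[idx]
        err = (w * xb + b) - yb                     # prediction error on the batch
        # The gradient of the batch MSE approximates the full-data gradient
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)   # close to 3 and 2
```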

In [19]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
import math

def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []
    
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
        
    return output_batches

def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('./mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Input Layer: Features and Labels

features = tf.placeholder(tf.float32, shape=[None, n_input]) # (minibatch, n_input)
labels = tf.placeholder(tf.float32, shape=[None, n_classes])  

# Weights & bias
weights = tf.Variable(tf.random_normal(shape=[n_input, n_classes]))  # (n_input, n_classes)
bias = tf.Variable(tf.random_normal(shape=[n_classes]))

# Hidden layer: Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias) 
# (minibatch, n_input)*(n_input, n_classes)=(minibatch, n_classes)
# (minibatch, n_classes) + (n_classes,) broadcasts to (minibatch, n_classes)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)



# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./mnist/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./mnist/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./mnist/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./mnist/t10k-labels-idx1-ubyte.gz
Epoch: 0    - Cost: 12.5     Valid Accuracy: 0.108
Epoch: 1    - Cost: 11.5     Valid Accuracy: 0.123
Epoch: 2    - Cost: 10.9     Valid Accuracy: 0.139
Epoch: 3    - Cost: 10.4     Valid Accuracy: 0.153
Epoch: 4    - Cost: 9.88     Valid Accuracy: 0.168
Epoch: 5    - Cost: 9.45     Valid Accuracy: 0.183
Epoch: 6    - Cost: 9.05     Valid Accuracy: 0.197
Epoch: 7    - Cost: 8.67     Valid Accuracy: 0.212
Epoch: 8    - Cost: 8.32     Valid Accuracy: 0.225
Epoch: 9    - Cost: 7.99     Valid Accuracy: 0.239
Test Accuracy: 0.24040000140666962


##### Epochs
One epoch means the entire training set has passed through the network once.
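Epochs and batch size together determine how many weight updates happen. The quick calculation below applies to the training run above. It assumes that `mnist.train` holds 55,000 images (the usual training split returned by `input_data.read_data_sets`) and uses the batch size of 128.

```python
import math

n_train = 55000   # assumed size of mnist.train
batch_size = 128
epochs = 10

steps_per_epoch = math.ceil(n_train / batch_size)   # the final batch is smaller
print(steps_per_epoch)                              # 430
print(steps_per_epoch * epochs)                     # 4300 weight updates in total
```

The `batches()` helper above keeps the smaller final batch, so it yields 430 batches per epoch. The CNN cell further down loops `mnist.train.num_examples//batch_size` times, which gives 429 full batches.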

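#### Training Tips
During training:
* Decay the learning rate over time.
* To avoid getting stuck in a local optimum, use momentum: keep a running average of past gradients and use it to choose the descent direction.
* Choosing the learning rate: a smaller one is not necessarily slower to reach a good result.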
<img src="learning_rate_tuning.jpg" width=450 height=450>
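* Choosing hyperparameters is something of a black art that relies on experience. The most useful rule of thumb: if results look bad, try lowering the learning rate.
* To reduce the number of hyperparameters to tune, use AdaGrad (a variant of SGD). It adapts each parameter's learning rate from that parameter's past gradients and makes it decay over time, which reduces the need to hand-tune momentum and decay.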
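The first two tips, momentum and learning-rate decay, are sketched below on a hypothetical one-dimensional loss $f(w) = (w - 5)^2$. The hyperparameter values and the exponential decay schedule are assumptions. This is one common form of the momentum update, not the only one.

```python
# Hypothetical 1-D loss f(w) = (w - 5)^2 with gradient 2 * (w - 5)
def grad(w):
    return 2 * (w - 5)

w, velocity = 0.0, 0.0
lr0, beta, decay = 0.1, 0.9, 0.99   # assumed hyperparameters

for step in range(300):
    lr = lr0 * decay ** step               # exponential learning-rate decay
    velocity = beta * velocity + grad(w)   # momentum: decayed running sum of past gradients
    w -= lr * velocity

print(w)   # close to the minimum at 5
```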

Exercise: [Lab:TensorFlow Neural Network](https://github.com/udacity/deep-learning/tree/master/intro-to-tensorflow) (see the ipynb)

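#### Second Network: Convolutional Neural Network

When you know you are working with images, convolutional neural networks are the best tool for the job.

##### Statistical Invariance
The main idea: where an object appears in the image does not matter, as long as it can be found and recognized.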
<img src="image_cat.jpg" width=300 height=300 />

To achieve this in a neural network, we use weight sharing: the cat is detected with the same weights no matter where it appears.

##### Convolutional Neural Network (ConvNet)
##### Overview
Image representation:
<img src="image_representation.jpg" width=300 height=300 />

Scanning the image with a small filter, whose weights stay the same at every position, produces a new image. This operation is called convolution.
<img src="covnet_filter.jpg" width=300 height=300 />
<img src="covnet_convolutions.jpg" width=450 height=450 />
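A CNN stacks convolution layers one after another, shrinking the width and height while increasing the depth. The deeper the layer, the more information it represents.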
<img src="covnet.jpg" width=500 height=500 />
The final output is fed into an ordinary neural network for softmax classification.

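Some terminology:  
patch: the area the filter covers at each scan position
<img src="covnet_patch.jpg" width=450 height=450 />
feature map: the image represented by each depth slice of the output
<img src="covnet_featured_map.jpg" width=450 height=450 />
stride: the step size of the filter as it scans; it determines the size of the output image
<img src="covnet_stride.jpg" width=450 height=450 />
<img src="covnet_stride_2.jpg" width=450 height=450 />
valid padding: the edges are not padded while the filter scans
<img src="covnet_valid_padding.jpg" width=450 height=450 />
same padding: the edges are padded so that, with a stride of 1, the output has the same width and height as the input
<img src="covnet_same_padding.jpg" width=450 height=450 />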
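The output width or height can be computed directly. The first function below uses the general formula $(W - F + 2P)/S + 1$ for input size $W$, filter size $F$, padding $P$ and stride $S$. The second follows the output-size rules that TensorFlow documents for its `'SAME'` and `'VALID'` padding modes. Both helper names are hypothetical.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding=0):
    """General formula: (W - F + 2P) / S + 1, rounded down."""
    return (in_size - filter_size + 2 * padding) // stride + 1

def tf_output_size(in_size, filter_size, stride, padding):
    """Output size under TensorFlow's 'SAME' or 'VALID' padding."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    return math.ceil((in_size - filter_size + 1) / stride)   # 'VALID'

print(conv_output_size(32, 8, 2, padding=1))  # 14, as in the exercise below
print(tf_output_size(28, 5, 1, 'SAME'))       # 28: same width and height
print(tf_output_size(28, 5, 1, 'VALID'))      # 24: edges not padded
print(tf_output_size(10, 5, 2, 'SAME'))       # 5
```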

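##### Intuition  
Each convolution layer recognizes one kind of feature, and higher layers recognize higher-level features built from combinations of them.  
People work the same way: first the most basic features, such as a dog's nose and mouth, then the dog's face, and finally the whole dog.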
<img src="covnet_intuition.jpg" width=450 height=450 />

##### The Core of a CNN: the Filter
<img src="covnet_convolution_detail.jpg" width=450 height=450 />
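The filter scans the whole image one patch-sized area at a time. The same weights are used at every position in the image (weight sharing), and the pixels in each scanned patch are handled by one neuron.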
<img src="covnet_filter_scan.jpg" width=450 height=450 />
<img src="covnet_filter_scan_2.jpg" width=450 height=450 />
<img src="covnet_filter_group.jpg" width=450 height=450 />
Each scan position effectively combines the contents of the window into one neuron.
<img src="covnet_filter_detail.jpg" width=300 height=300 />
The filter's k is the number of features extracted, i.e. the depth of the output.

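Note: if pixels were not grouped with convolution filters, every pixel would need its own connection to a neuron. The network would be so large that it could hardly learn at all.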
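The naive NumPy sketch below shows what one filter does: the same kernel weights are applied to every patch as the filter slides over the image. It assumes a single input channel, one filter (k = 1), no bias and valid padding. Real convolution layers also sum over the input depth and learn k filters at once. Like CNN libraries, the sketch does not flip the kernel.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide one shared kernel over a 2-D image (valid padding, no kernel flip)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # the same weights at every position
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
result = conv2d_single(image, kernel, stride=2)
print(result)   # every entry is -5.0: top-left minus bottom-right of each patch
```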

Exercise  
In a convolutional neural network:  
Setup  
H = height, W = width, D = depth  
* We have an input of shape 32x32x3 (HxWxD)  
* 20 filters of shape 8x8x3 (HxWxD)  
* A stride of 2 for both the height and width (S)  
* Zero padding of size 1 (P)  

Output Layer  
* 14x14x20 (HxWxD)

Q1 How many parameters does the convolutional layer have (without parameter sharing)?  
Q2 How many parameters does the convolution layer have (with parameter sharing)?  
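One way to work out the answers follows. It assumes that each neuron has exactly one bias term, or each filter does when weights are shared.

```python
# Setup from the exercise above
in_depth = 3
filter_h, filter_w, k = 8, 8, 20
out_h, out_w = 14, 14

weights_per_unit = filter_h * filter_w * in_depth + 1   # 8*8*3 weights + 1 bias = 193

# Q1: without sharing, each of the 14*14*20 output neurons has its own weights
no_sharing = weights_per_unit * out_h * out_w * k
# Q2: with sharing, each of the 20 filters reuses one set of weights everywhere
sharing = weights_per_unit * k
print(no_sharing, sharing)   # 756560 3860
```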

##### CNN Architecture
<img src="covnet_full.jpg" width=450 height=450 />


##### Visualizing Convolutional Neural Networks
[Visualizing and Understanding Convolutional Networks](http://www.matthewzeiler.com/pubs/arxive2013/eccv2014.pdf)

<img src="covnet_visualize_layer1.jpg" width=450 height=450 />
<img src="covnet_visualize_layer2.jpg" width=450 height=450 />
<img src="covnet_visualize_layer3.jpg" width=450 height=450 />
<img src="covnet_visualize_layer5.jpg" width=450 height=450 />

##### Implementing a CNN in TensorFlow


In [2]:
import tensorflow as tf
# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

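##### Advanced Convolutional Neural Networks
**Pooling**
<img src="covnet_pooling_stride_2.jpg" width=400 height=400 />
Downsampling directly with a stride of 2 loses a lot of information. Instead, we can convolve with a stride of 1 and then downsample with pooling, which preserves more information.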
<img src="covnet_pooling_max.jpg" width=400 height=400 />
The most common pooling methods:
* Max Pooling: take the largest value in each window
* Average Pooling: take the average of each window
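Both pooling methods are sketched below in NumPy, using a 2x2 window with stride 2 on a made-up 4x4 input. `pool2d` is a hypothetical helper written for illustration.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Hypothetical helper: max or average pooling over a 2-D array, no padding."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]], dtype=float)
max_pooled = pool2d(x, mode="max")   # window maxima: 6, 5, 7, 9
avg_pooled = pool2d(x, mode="avg")   # window means: 3.5, 2, 2.5, 6
print(max_pooled)
print(avg_pooled)
```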

Max pooling in mathematical form:
<img src="covnet_pooling_max_math.jpg" width=400 height=400 />

Uses:
* Reduce the size of the output
* Prevent overfitting. This is a consequence of reducing the output size, which in turn reduces the number of parameters in future layers.

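Benefits:  
* A more accurate model  
Drawbacks:  
* The intermediate stride-1 convolution produces larger outputs, so the model becomes bigger and more expensive to compute
* Pooling adds its own size and stride hyperparameters to tune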
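The parameter savings mentioned under Uses can be checked with the MNIST CNN later in this notebook. Compare the size of its first fully connected layer (`wd1`, 1024 units) with and without the two 2x2 max-pooling layers. The no-pooling case is a hypothetical variant that keeps the stride-1 `'SAME'` convolutions and simply skips the pooling.

```python
fc_units = 1024   # units in the first fully connected layer ('wd1')

with_pooling = 7 * 7 * 64        # two 2x2 max-pools: 28 -> 14 -> 7
without_pooling = 28 * 28 * 64   # hypothetical: stride-1 'SAME' convs keep 28x28

params_with = with_pooling * fc_units
params_without = without_pooling * fc_units
print(params_with)                    # 3211264 weights in 'wd1'
print(params_without)                 # 51380224 weights
print(params_without // params_with)  # 16 times as many
```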

**A Typical CNN with Pooling Layers**
<img src="covnet_pooling_classic_network.jpg" width=450 height=450 />

Recent research shows that pooling layers are being used less and less:
* Recent datasets are so big and complex that we're more concerned about underfitting.
* Dropout is a much better regularizer.
* Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.
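Dropout, the regularizer mentioned above, can be sketched in NumPy. This is the "inverted" form, which to our understanding matches TF1's `tf.nn.dropout`. Each unit is kept with probability `keep_prob`, and the surviving units are scaled by `1/keep_prob` so that the expected activation is unchanged. The all-ones input is a toy assumption.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, scale the rest by 1/keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(100_000)
out = dropout(activations, keep_prob=0.75, rng=rng)   # 0.75 as in the CNN below
print((out > 0).mean())   # fraction of units kept, about 0.75
print(out.mean())         # about 1.0: the expected activation is unchanged
```

At evaluation time no units are dropped, which is why the CNN below feeds `keep_prob: 1.` when measuring loss and accuracy.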

In [4]:
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')
# The ksize and strides parameters are structured as 4-element lists, 
# with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). 
# For both ksize and strides, the batch and channel dimensions are typically set to 1.

### Building a Neural Network in TensorFlow to Classify Handwritten Digits
#### Second Network: Convolutional Neural Network

In [6]:
# Dataset You've seen this section of code from previous lessons. Here we're importing the MNIST dataset and using a convenient TensorFlow function to batch, scale, and One-Hot encode the data.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.00001
epochs = 10
batch_size = 128

# Number of samples to calculate validation and accuracy
# Decrease this if you're running out of memory to calculate accuracy
test_valid_size = 256

# Network Parameters
n_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Dropout, probability to keep units

# Store layers weight & bias
weights = {
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    'out': tf.Variable(tf.random_normal([1024, n_classes]))}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')

def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Layer 2 - 14*14*32 to 7*7*64
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output Layer - class prediction - 1024 to 10
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Model
logits = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples//batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: dropout})

            # Calculate batch loss and accuracy
            loss = sess.run(cost, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: 1.})
            valid_acc = sess.run(accuracy, feed_dict={
                x: mnist.validation.images[:test_valid_size],
                y: mnist.validation.labels[:test_valid_size],
                keep_prob: 1.})

            print('Epoch {:>2}, Batch {:>3} -'
                  'Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(
                epoch + 1,
                batch + 1,
                loss,
                valid_acc))

    # Calculate Test Accuracy
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images[:test_valid_size],
        y: mnist.test.labels[:test_valid_size],
        keep_prob: 1.})
    print('Testing Accuracy: {}'.format(test_acc))

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./t10k-labels-idx1-ubyte.gz
Epoch  1, Batch   1 -Loss: 42945.2773 Validation Accuracy: 0.136719
Epoch  1, Batch   2 -Loss: 38002.0547 Validation Accuracy: 0.125000
Epoch  1, Batch   3 -Loss: 29029.1250 Validation Accuracy: 0.140625
Epoch  1, Batch   4 -Loss: 29378.3789 Validation Accuracy: 0.148438
Epoch  1, Batch   5 -Loss: 29748.5742 Validation Accuracy: 0.156250
Epoch  1, Batch   6 -Loss: 27914.1035 Validation Accuracy: 0.152344
Epoch  1, Batch   7 -Loss: 22891.3438 Validation Accuracy: 0.175781
Epoch  1, Batch   8 -Loss: 20605.0508 Validation Accuracy: 0.187500
Epoch  1, Batch   9 -Loss: 21675.6543 Val

KeyboardInterrupt: 