In [1]:
from keras.utils import np_utils
import numpy as np
np.random.seed(10)

Using TensorFlow backend.


In [2]:
from keras.datasets import mnist
(x_train_image, y_train_label), (x_test_image, y_test_label) = mnist.load_data()

In [3]:
x_Train = x_train_image.reshape(60000, 784).astype('float32')
x_Test = x_test_image.reshape(10000, 784).astype('float32')
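The reshape in the cell above turns each 28x28 image into one row of 784 pixel values. Here is a minimal numpy sketch that shows which column each pixel ends up in. It uses synthetic data, with random pixels standing in for MNIST (an assumption).

```python
import numpy as np

# Synthetic stand-in for MNIST (assumption: random pixels, not real digits)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)

# Flatten each 28x28 image into one 784-element row, as reshape(60000, 784) does
flat = images.reshape(images.shape[0], 28 * 28).astype('float32')

print(flat.shape)  # (3, 784)
# Row-major (C) order: pixel (r, c) of image i lands in column r * 28 + c
print(flat[1, 5 * 28 + 7] == images[1, 5, 7])
```

Writing `reshape(-1, 784)` lets numpy infer the number of rows, so 60000 and 10000 don't need to be hard-coded.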

In [4]:
x_Train_normalize = x_Train / 255
x_Test_normalize = x_Test / 255
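Dividing by 255 maps the 0-255 pixel intensities into the range [0, 1]. The tiny sketch below uses made-up pixel values and follows the same cast-then-divide order as the cells above.

```python
import numpy as np

# Made-up uint8 pixels covering both ends of the 0-255 range
pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)

# Cast to float32 first, then divide by 255 (same order as astype('float32') then / 255)
normalized = pixels.astype('float32') / 255

print(normalized)
```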

In [13]:
y_Train_OneHot = np_utils.to_categorical(y_train_label)
y_Test_OneHot = np_utils.to_categorical(y_test_label)
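`to_categorical` converts each integer label into a one-hot vector. The hypothetical helper `one_hot` below is a small numpy sketch of the equivalent transformation, not the Keras implementation. Like `to_categorical`, it infers the number of classes as the largest label plus one when `num_classes` is not given.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical one-hot encoder for non-negative integer labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the class count as max label + 1
        num_classes = int(labels.max()) + 1
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype='float32')[labels]

sample_labels = [5, 0, 4]
encoded = one_hot(sample_labels, num_classes=10)
print(encoded)
```

For MNIST, the inferred number of classes is 10 because the labels run from 0 to 9.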

<hr>
The steps above are preprocessing; see 1.Mnist_Preprocess for details.<br>
Next, we build the model.
<hr>

In [6]:
from keras.models import Sequential
from keras.layers import Dense

<hr>
Create a linear stack model (Sequential). After that, you add each neural network layer to the model by calling model.add().
<hr>

In [7]:
model = Sequential()
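Conceptually, Sequential is an ordered list of layers in which each layer's output feeds the next. The class `TinySequential` below is a hypothetical, framework-free illustration of that idea only. It is not how Keras implements Sequential, and it does no training.

```python
class TinySequential:
    """Hypothetical illustration of a linear stack of layers."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        # Append a layer (any callable) to the end of the stack
        self.layers.append(layer)

    def __call__(self, x):
        # Pass the input through every layer in the order it was added
        for layer in self.layers:
            x = layer(x)
        return x

toy_model = TinySequential()
toy_model.add(lambda x: x * 2)
toy_model.add(lambda x: x + 1)
print(toy_model(3))  # 7
```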

<hr>
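Input layer => Hidden layer => Output layer<br>
units = 256: the "hidden layer" has 256 neurons<br>
input_dim = 784: the "input layer" has 784 neurons, one per pixel of a flattened 28x28 image<br>
kernel_initializer = 'normal': initialize the weights (kernel) with normally distributed random numbers; the biases keep Dense's default initializer (zeros)<br>
activation = 'relu': use ReLU as the activation function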
<hr>

In [8]:
model.add(Dense(units = 256,
                input_dim = 784,
                kernel_initializer = 'normal',
                activation = 'relu'))
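To make the first Dense layer concrete, here is a minimal numpy sketch of the computation it performs on a batch, h1 = relu(X * W1 + b1). It is an illustration, not Keras code. It makes three assumptions:

- The standard deviation of 0.05 for the normal initializer is an assumption; we believe it matches Keras 2's RandomNormal default.
- The biases start at zero, as with Dense's default bias_initializer.
- The input batch is random rather than real MNIST images.

```python
import numpy as np

rng = np.random.default_rng(10)

# Shapes follow the first Dense layer: 784 inputs -> 256 units
W1 = rng.normal(loc=0.0, scale=0.05, size=(784, 256)).astype('float32')  # assumed stddev 0.05
b1 = np.zeros(256, dtype='float32')  # zero-initialized biases

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0)

X_batch = rng.random((4, 784), dtype=np.float32)  # 4 fake normalized images in [0, 1)
h1 = relu(X_batch @ W1 + b1)
print(h1.shape)  # (4, 256)
```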

<hr>
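units = 10: the "output layer" has 10 neurons, one per digit class (0-9); activation = 'softmax' turns the outputs into class probabilities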
<hr>

In [9]:
model.add(Dense(units = 10,
                kernel_initializer = 'normal',
                activation = 'softmax'))
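The softmax activation turns the 10 output scores into probabilities that are positive and sum to 1. Here is a minimal numpy sketch with made-up scores. Subtracting the row maximum is a standard numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability; the result is unchanged
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1, 0, 0, 0, 0, 0, 0, 0]])  # 10 made-up scores, one per digit
probs = softmax(logits)
print(probs.sum(), probs.argmax())
```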

<hr>
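Hidden layer: 256 neurons in total. The input layer is created together with the hidden layer, so it is not listed separately.<br>
Output layer: 10 neurons in total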
<hr>

In [10]:
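model.summary()  # summary() prints the table itself and returns None, so print() is not needed

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               200960    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________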


<hr>
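Model parameters explained:<br>
Each layer's Param column counts trainable parameters, not hyperparameters. These are the weights and biases of the connections between neurons, and the backpropagation algorithm updates them during training.<br>
Input layer to hidden layer: h1 = relu(X * W1 + b1)<br>
Hidden layer to output layer: y = softmax(h1 * W2 + b2)<br>
So each layer's Param is computed as: Param = (neurons in the previous layer) * (neurons in this layer) + (neurons in this layer)<br>
Therefore<br>
200960 = 784 * 256 + 256<br>
2570   = 256 * 10  + 10<br>
The total number of trainable parameters (Trainable params) is the sum of every layer's Param:<br>
203530 = 200960 + 2570<br>
In general, a larger Trainable params count means a more complex model that takes longer to train.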
<hr>
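You can check the Param formula directly. The helper `dense_params` below is hypothetical; it counts the weights plus biases of a fully connected layer.

```python
def dense_params(n_in, n_out):
    # One weight per (input, unit) pair plus one bias per unit
    return n_in * n_out + n_out

layer_sizes = [784, 256, 10]  # input -> hidden -> output, as built above
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [200960, 2570] 203530
```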

<hr>
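Next, use the compile method to configure the model for training<br>
loss: sets the loss function. For multi-class classification with a softmax output, cross-entropy (here categorical_crossentropy) is the usual choice and tends to train well<br>
optimizer: sets the optimization method used during training. adam is a common choice in deep learning that often converges faster and reaches good accuracy<br>
metrics: sets how the model is evaluated, here by accuracy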
<hr>

In [12]:
model.compile(loss = 'categorical_crossentropy',
              optimizer = 'adam', 
              metrics = ['accuracy'])
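To see what the loss and the metric compute, here is a small numpy sketch of categorical cross-entropy and accuracy. It uses a made-up batch of two samples with three classes (instead of ten, for readability). This is not the Keras implementation. The clipping constant `eps` is an assumption added to avoid log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 to avoid log(0); eps is an assumed value
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Per-sample loss is -sum(y_true * log(y_pred)); average over the batch
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # A prediction is correct when its most probable class matches the one-hot label
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype='float32')
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]], dtype='float32')
print(categorical_crossentropy(y_true, y_pred), accuracy(y_true, y_pred))
```

Only the probability assigned to the true class contributes to the loss. The second sample is confidently wrong, so it dominates the average even though accuracy counts both samples equally.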

In [15]:
train_history = model.fit(x = x_Train_normalize,
                          y = y_Train_OneHot,
                          validation_split = 0.2,  # hold out 20% of the training data for validation
                          epochs = 10,
                          batch_size = 200,
                          verbose = 2)             # print one log line per epoch
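With validation_split = 0.2, Keras (2.x) holds out the last 20% of the samples as validation data, taken before shuffling. That is where the "Train on 48000 samples, validate on 12000 samples" line comes from. Here is a small sketch of that arithmetic. `split_counts` is a hypothetical helper, and its int() truncation reflects our understanding of how Keras computes the split point.

```python
import math

def split_counts(n_samples, validation_split):
    # The first part is used for training; the last fraction is held out for validation
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at

n_train, n_val = split_counts(60000, 0.2)
steps_per_epoch = math.ceil(n_train / 200)  # batch_size = 200; the last batch may be smaller
print(n_train, n_val, steps_per_epoch)  # 48000 12000 240
```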

Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 1s - loss: 0.0262 - acc: 0.9938 - val_loss: 0.0812 - val_acc: 0.9758
Epoch 2/10
 - 1s - loss: 0.0221 - acc: 0.9951 - val_loss: 0.0827 - val_acc: 0.9760
Epoch 3/10
 - 1s - loss: 0.0184 - acc: 0.9958 - val_loss: 0.0784 - val_acc: 0.9768
Epoch 4/10
 - 1s - loss: 0.0156 - acc: 0.9970 - val_loss: 0.0779 - val_acc: 0.9778
Epoch 5/10
 - 1s - loss: 0.0128 - acc: 0.9978 - val_loss: 0.0791 - val_acc: 0.9778
Epoch 6/10
 - 1s - loss: 0.0103 - acc: 0.9985 - val_loss: 0.0767 - val_acc: 0.9791
Epoch 7/10
 - 1s - loss: 0.0089 - acc: 0.9988 - val_loss: 0.0801 - val_acc: 0.9773
Epoch 8/10
 - 1s - loss: 0.0076 - acc: 0.9991 - val_loss: 0.0782 - val_acc: 0.9778
Epoch 9/10
 - 1s - loss: 0.0069 - acc: 0.9990 - val_loss: 0.0820 - val_acc: 0.9782
Epoch 10/10
 - 1s - loss: 0.0054 - acc: 0.9994 - val_loss: 0.0841 - val_acc: 0.9776
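In the log, training loss falls every epoch. Meanwhile val_loss bottoms out at 0.0767 in epoch 6 and then drifts upward, a pattern commonly read as the start of overfitting.

The sketch below copies the logged values into a dict shaped like train_history.history. In this Keras version the keys match the names printed in the log ('acc' and 'val_acc'); newer versions use 'accuracy'. The sketch then finds the best epoch and the final gap between training and validation accuracy.

```python
# Values copied from the training log above
logged = {
    'loss':     [0.0262, 0.0221, 0.0184, 0.0156, 0.0128, 0.0103, 0.0089, 0.0076, 0.0069, 0.0054],
    'acc':      [0.9938, 0.9951, 0.9958, 0.9970, 0.9978, 0.9985, 0.9988, 0.9991, 0.9990, 0.9994],
    'val_loss': [0.0812, 0.0827, 0.0784, 0.0779, 0.0791, 0.0767, 0.0801, 0.0782, 0.0820, 0.0841],
    'val_acc':  [0.9758, 0.9760, 0.9768, 0.9778, 0.9778, 0.9791, 0.9773, 0.9778, 0.9782, 0.9776],
}

# 1-based epoch with the lowest validation loss
best_epoch = min(range(len(logged['val_loss'])), key=lambda i: logged['val_loss'][i]) + 1
# Gap between training and validation accuracy at the last epoch
final_gap = logged['acc'][-1] - logged['val_acc'][-1]
print(best_epoch, round(final_gap, 4))  # 6 0.0218
```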
