# Defining neural networks with Keras

前幾章有提到利用 tensorflow 來建立 linear regression models 和 neural networks 並且示範了 1. high level (既有的 layer 函數) 2. low level (linear algebra) 兩種方法。Tensorflow 的方法都要一層一層的建，而且還要 specify 前一層的名稱，比較麻煩。<br/>
這裡會介紹 Keras 這個 high level 的 API，可以更方便的建立模型並且避免複雜的迴圈過程。不過 high level 就代表著 flexibility 比較低，比較不能根據不同的狀況來隨意改變模型。

---

這一章主要是要將下圖中 (28 * 28) 的四個手勢建模，使得模型可以將新的照片分類。(4 個 output nodes，可用 softmax 這個 function 當作 output layer 的 activation function)

![](Image/Image19.jpg)

## Sequential API

Sequential API 是 Keras 裡面的一個建立 neural networks 的樣板。這個樣板確立了模型擁有 input layer, hidden layers 和 output layer。稱為 sequential 的原因是因為每一個 layer 都是按照順序建立的，因此不用像是用 tensorflow 建模一樣去 specify 前一層是哪一個。

---

### 實際操作

In [1]:
# import keras
from tensorflow import keras

# define a sequential model
model = keras.Sequential()

In [2]:
# add the first layer
model.add(keras.layers.Dense(16, activation = "relu", input_shape = (28 * 28,)))    

input shape 是一個存有資料 dimension 的 tuple，由於我們每一張圖都是 28 * 28 的解析度，而且要 reshape 成 vector (1-dimensional tensor) 才能訓練模型，因此會被 reshape 成 28 * 28 個 element 的 vector。

第一層必須 specify 輸入的 shape，才能讓電腦知道該如何建模。

In [3]:
# add the second layer
model.add(keras.layers.Dense(8, activation = "relu"))

# define the output layer
model.add(keras.layers.Dense(4, activation = "softmax"))

除了第一層以外，其他層不需要指出 input shape，因為 sequential 樣板的關係 (順序)，後面幾層都知道前一層的 shape。

**建立 optimizer 和 指定 loss function**

In [None]:
model.compile('adam', loss = "categorical_crossentropy")

**檢查模型各層的內容**
可以用 model.summary() 來看

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 16)                12560     
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 36        
Total params: 12,732
Trainable params: 12,732
Non-trainable params: 0
_________________________________________________________________


12560 = 28 * 28 * 16 + 16 (input layer 有 28 * 28 個 nodes，第一層有 16 個 nodes，因此總共有 28 * 28 * 16 個 weights。同時，第一層的每個 nodes 都有一個 bias，總共 16 個 bias)

136 = 16 * 8 + 8

36 = 8 * 4 + 4

**此模型已經可以帶入資料訓練了。**

## Functional API

假設我的輸出是來自多個模型，如下圖：

![](Image/Image20.jpg)

則不能用 sequential API 而是用 functional API。

---

### 示範
假設除了 28 * 28 的照片以外，還有 10 個額外的 features 也要加入 training data，因此就有兩大組 input data

In [6]:
# import
import tensorflow as tf

# first input
model1_inputs = tf.keras.Input(shape = (28*28,))

# second input
model2_inputs = tf.keras.Input(shape = (10,))

In [8]:
# define the first layer for the model 1
model1_layer1 = tf.keras.layers.Dense(12, activation = "relu")(model1_inputs)

# define the second layer for the model 1
model1_layer2 = tf.keras.layers.Dense(4, activation = "softmax")(model1_layer1)

In [9]:
# define the first layer for the model 2
model2_layer1 = tf.keras.layers.Dense(8, activation = "relu")(model2_inputs)

# define the second layer for the model 2
model2_layer2 = tf.keras.layers.Dense(4, activation = "softmax")(model2_layer1)

In [10]:
# Combine the two outputs
merged = tf.keras.layers.add([model1_layer2, model2_layer2])

最後，定義一個合併的 functional model，並指定好 inputs 和 outputs

In [11]:
model = tf.keras.Model(inputs = [model1_inputs, model2_inputs], outputs = merged)

# compile the model
model.compile("adam", loss = "categorical_crossentropy")

In [12]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 784)]        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 10)]         0                                            
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 12)           9420        input_1[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 8)            88          input_2[0][0]                    
______________________________________________________________________________________________

**此模型已經可以帶入資料訓練了。**

# Training and validation (evaluation) with Keras

過程： <br/>
1. Load and clean data (前幾章已教過)
2. Define model (前幾章已教過)
3. Train and validate model
4. Evaluate model

## Training model
利用 model.fit(features, labels) 這個函數來訓練

必要的參數： features, labels ，代表 training data 的 independent variables 和 dependent variables

非必要的參數： batch_size, epochs, validation_split

1. Batch_size： 代表了一個 batch 包含幾筆資料。當 batch_size 很大時，要將很多資料同時放入記憶體中，在某些記憶體較小的電腦可能會跑不動。而每個 batch 跑完就會更新 weights 和 bias 等參數，因此當 batch_size 太小時，資料不足使得參數的改變不一定合理。(default = 32)

2. epochs： 代表總共要跑幾次 epochs (epoch 代表完整的跑完一次所有的 batches)。當 epochs 很大時，代表經過較多的訓練 iterations，可能比較接近 global minima，但也代表訓練時間較久。

3. validation_split： 這裡要傳入一個屆於 1~0 的數，代表分給 validation set 的比例。這樣模型訓練時就會將原始資料按照比例分成 training set 和 validation set。在每一個 epoch 後都可以看到模型在 training set 和 validation set 的個別表現 (如果 training set 的 loss 比 validation set 的 loss 還小，則有 overfitting 的現象)

![](Image/Image21.jpg)

當訓練過程中看到 overfitting 的跡象，就要停止訓練，並在模型中加入一些 regularisation 的方法，例如 dropout。

In [None]:
# 訓練
model.fit(FEATURES, LABELS)

# Evaluate (利用一開始就分出來的 testing data)
model.evaluate(TESTING_SET)

## 實際操作

### Load data

In [57]:
# import
import tensorflow as tf
import pandas as pd
import numpy as np

# load data
data = pd.read_csv("Datasets/slmnist.csv", header = None)
print(data.head())

print("=========================================================")

# transform dataframe into numpy array
features_array = np.array(data.drop(labels = 0, axis = 1), dtype = np.float32)
print(features[0:10, :])
print(features.shape)
print(type(features))

# one-hot encoding
labels = pd.get_dummies(data.iloc[:,0], prefix='Class')

# transform labels into numpy array
labels_array = np.array(labels, dtype = np.float32)
print(labels_array[0:10, :])
print(labels_array.shape)
print(type(labels_array))

   0    1    2    3    4    5    6    7    8    9    ...  775  776  777  778  \
0    1  142  143  146  148  149  149  149  150  151  ...    0   15   55   63   
1    0  141  142  144  145  147  149  150  151  152  ...  173  179  179  180   
2    1  156  157  160  162  164  166  169  171  171  ...  181  197  195  193   
3    3   63   26   65   86   97  106  117  123  128  ...  175  179  180  182   
4    1  156  160  164  168  172  175  178  180  182  ...  108  107  106  110   

   779  780  781  782  783  784  
0   37   61   77   65   38   23  
1  181  181  182  182  183  183  
2  193  191  192  198  193  182  
3  183  183  184  185  185  185  
4  111  108  108  102   84   70  

[5 rows x 785 columns]
[[142 143 146 ...  65  38  23]
 [141 142 144 ... 182 183 183]
 [156 157 160 ... 198 193 182]
 ...
 [161 164 166 ... 240 240 240]
 [162 164 167 ... 166 176 170]
 [145 148 150 ... 173 168 159]]
(2000, 784)
<class 'numpy.ndarray'>
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0

### Define model

In [67]:
model = tf.keras.Sequential()

# first layer
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape = (784,)))

# second layer
model.add(tf.keras.layers.Dense(8, activation='relu'))

# output layer
model.add(tf.keras.layers.Dense(4, activation='softmax'))

# define optimizer and loss function
model.compile('adam', loss = 'categorical_crossentropy', metrics = ["accuracy"])

# summarise the model
model.summary()

Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_40 (Dense)             (None, 16)                12560     
_________________________________________________________________
dense_41 (Dense)             (None, 8)                 136       
_________________________________________________________________
dense_42 (Dense)             (None, 4)                 36        
Total params: 12,732
Trainable params: 12,732
Non-trainable params: 0
_________________________________________________________________


### Train and validate model

In [68]:
model.fit(features_array, labels_array, batch_size = 32, epochs = 10, validation_split = 0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x24080dad550>

## 嘗試另一個模型 (更低的 learning rate)

In [69]:
model2 = tf.keras.Sequential()

# first layer
model2.add(tf.keras.layers.Dense(1024, activation='relu', input_shape = (784,)))

# second layer
model2.add(tf.keras.layers.Dense(8, activation='relu'))

# output layer
model2.add(tf.keras.layers.Dense(4, activation='softmax'))

# define optimizer and loss function
model2.compile(optimizer = keras.optimizers.Adam(learning_rate = 0.001), loss = 'categorical_crossentropy', metrics = ["accuracy"])

# summarise the model
model2.summary()

Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_43 (Dense)             (None, 1024)              803840    
_________________________________________________________________
dense_44 (Dense)             (None, 8)                 8200      
_________________________________________________________________
dense_45 (Dense)             (None, 4)                 36        
Total params: 812,076
Trainable params: 812,076
Non-trainable params: 0
_________________________________________________________________


In [71]:
model2.fit(features_array, labels_array, batch_size = 32, epochs = 30, validation_split = 0.5)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x240813a4fa0>

### Evaluation

In [73]:
model1_acc = model.evaluate(features_array, labels_array)
model2_acc = model2.evaluate(features_array, labels_array)

print("Model 1: Accuracy - {}".format(model1_acc[1]))
print("Model 2: Accuracy - {}".format(model2_acc[1]))

Model 1: Accuracy - 0.25
Model 2: Accuracy - 0.25


# Training models with the Estimators API
![](Image/Image22.jpg)

The Estimator API 是一個 high level tensorflow submodule (less flexible)。由於是最頂層的 API，因此不能隨便調整架構，但使用 (deploy) 的時間最短，可快速運用，也可以寫少一點 code。

## Training process

1. Define feature columns (specify shape and type of data)
2. Load and transform data within a function (定義一個函數，此函數要輸出一個 dictionary 物件，包含 features 和 labels)
3. Define custom estimators with different architectures (也可以用 premade estimators)
4. Apply train operation

### 1. Define feature columns

**以前一章節的資料為例**

In [82]:
# import
import tensorflow as tf

# define a numeric feature column
size = tf.feature_column.numeric_column("size")

# define a categorical feature column
rooms = tf.feature_column.categorical_column_with_vocabulary_list("room", ["1","2","3","4","5"])

# merge the two features
features_list = [size, rooms]

**以圖像辨識的資料為例**

In [78]:
# import 
import tensorflow as tf

# define a matrix feature column
image = tf.feature_column.numeric_column("image", shape = (784,))

# transform into list
features_list = [image]

### 2. Load and transform data within a function

**以前一章節的資料為例**

In [80]:
# define a function to load and transform data (只以三筆資料為例)
def input_function():
    # define feature dictionary
    features = {"size": [1340, 1690, 2720], "room": [1, 3, 4]}
    # define labels
    labels = [221900, 538000, 180000]
    return features, labels

### 3. Define custom estimators with different architectures

**以前一章節的資料為例** (Regression model)

In [83]:
# define a deep neural network regression
model0 = tf.estimator.DNNRegressor(feature_columns = features_list, hidden_units = [10, 6, 6, 3])

# train the regression model
model0.train(input_function, steps = 20)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\TANGKU~1\\AppData\\Local\\Temp\\tmpo16ggdvi', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Use Variable.rea

ValueError: Items of feature_columns must be a <class 'tensorflow.python.feature_column.feature_column_v2.DenseColumn'>. You can wrap a categorical column with an embedding_column or indicator_column. Given: VocabularyListCategoricalColumn(key='room', vocabulary_list=('1', '2', '3', '4', '5'), dtype=tf.string, default_value=-1, num_oov_buckets=0)

**以前一章節的資料為例** (Classification model)

In [85]:
# define a deep neural network classifier
model1 = tf.estimator.DNNClassifier(feature_columns = features_list, hidden_units = [32, 16, 8], n_classes = 4)

# train the regression model
model1.train(input_function, steps = 20)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\TANGKU~1\\AppData\\Local\\Temp\\tmpkb192z12', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.


ValueError: Items of feature_columns must be a <class 'tensorflow.python.feature_column.feature_column_v2.DenseColumn'>. You can wrap a categorical column with an embedding_column or indicator_column. Given: VocabularyListCategoricalColumn(key='room', vocabulary_list=('1', '2', '3', '4', '5'), dtype=tf.string, default_value=-1, num_oov_buckets=0)

### 詳細教學
https://www.tensorflow.org/guide/estimators