# Best practices for the real world
---
* Hyperparameter optimization
* Model ensembling
* Mixed-precision training
* Training `Keras` models on multiple `GPU`s or on a single `TPU`

## 13.1 Getting the most out of your models

### 13.1.1 Hyperparameter optimization

#### 1 Using `KerasTuner`

##### [C] 13.1 A `KerasTuner` model-building function

In [32]:
from tensorflow import keras
from tensorflow.keras import layers

import keras_tuner as kt  # the package was renamed from `kerastuner` to `keras_tuner`

In [37]:
def build_model(hp):
    
    # Sample the hidden-layer width from 16 to 64 in steps of 16
    units = hp.Int(name='units', min_value=16, max_value=64, step=16)

    model = keras.Sequential([
        layers.Dense(units, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])

    # Treat the optimizer as a categorical hyperparameter
    optimizer = hp.Choice(name='optimizer', values=['rmsprop', 'adam'])
    
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    return model

##### [C] 13.2 A `KerasTuner` `HyperModel`

In [38]:
class SimpleMLP(kt.HyperModel):
    def __init__(self, num_classes):
        self.num_classes = num_classes

    def build(self, hp):
        units = hp.Int(name="units", min_value=16, max_value=64, step=16)
        model = keras.Sequential([
            layers.Dense(units, activation="relu"),
            layers.Dense(self.num_classes, activation="softmax")
        ])
        
        optimizer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])
        
        model.compile(
            optimizer=optimizer,
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
        
        return model

hypermodel = SimpleMLP(num_classes=10)

In [39]:
tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=100,
    executions_per_trial=2,
    directory="mnist_kt_test",
    overwrite=True,
)

In [40]:
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Int)
{'default': None, 'conditions': [], 'min_value': 16, 'max_value': 64, 'step': 16, 'sampling': 'linear'}
optimizer (Choice)
{'default': 'rmsprop', 'conditions': [], 'values': ['rmsprop', 'adam'], 'ordered': False}


##### Maximizing and minimizing the objective

In [None]:
objective = kt.Objective(
    name='val_accuracy',  # metric name, as it appears in the per-epoch logs
    direction='max'       # optimization direction: 'min' or 'max'
)

tuner = kt.BayesianOptimization(build_model, objective=objective, ...)


In [43]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x_train.reshape((-1, 28 * 28)).astype('float32') / 255
x_test  = x_test.reshape((-1, 28 * 28)).astype('float32') / 255

x_train_full = x_train[:]
y_train_full = y_train[:]

num_val_samples = 10000

x_train, x_val = x_train[:-num_val_samples], x_train[-num_val_samples:]
y_train, y_val = y_train[:-num_val_samples], y_train[-num_val_samples:]

callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)]

tuner.search(x_train, y_train, batch_size=128, 
             epochs=100, 
             validation_data=(x_val,y_val), 
             callbacks=callbacks, 
             verbose=2
            )

Trial 39 Complete [00h 00m 42s]
val_accuracy: 0.9762499928474426

Best val_accuracy So Far: 0.976500004529953
Total elapsed time: 00h 29m 38s

Search: Running Trial #40

Value             |Best Value So Far |Hyperparameter
64                |64                |units
rmsprop           |rmsprop           |optimizer

Epoch 1/100
391/391 - 2s - loss: 0.4182 - accuracy: 0.8879 - val_loss: 0.2487 - val_accuracy: 0.9293 - 2s/epoch - 4ms/step
Epoch 2/100
391/391 - 1s - loss: 0.2221 - accuracy: 0.9372 - val_loss: 0.1796 - val_accuracy: 0.9510 - 1s/epoch - 3ms/step
Epoch 3/100
391/391 - 1s - loss: 0.1731 - accuracy: 0.9504 - val_loss: 0.1603 - val_accuracy: 0.9561 - 866ms/epoch - 2ms/step
Epoch 4/100
391/391 - 1s - loss: 0.1407 - accuracy: 0.9593 - val_loss: 0.1407 - val_accuracy: 0.9603 - 1s/epoch - 3ms/step
Epoch 5/100
391/391 - 1s - loss: 0.1182 - accuracy: 0.9656 - val_loss: 0.1217 - val_accuracy: 0.9659 - 1s/epoch - 3ms/step
Epoch 6/100
391/391 - 1s - loss: 0.1018 - accuracy: 0.9703 - val_l

##### [C] 13.3 Querying the best hyperparameter configurations

In [41]:
top_n = 4
best_hps = tuner.get_best_hyperparameters(top_n)

In [None]:
def get_best_epoch(hp):
    
    model = build_model(hp)
    
    callbacks=[
        keras.callbacks.EarlyStopping(
            monitor="val_loss", mode="min", patience=10)
    ]
    
    history = model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=100,
        batch_size=128,
        callbacks=callbacks)
    
    val_loss_per_epoch = history.history["val_loss"]
    
    best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
    
    print(f"Best epoch: {best_epoch}")
    
    return best_epoch

In [None]:
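def get_best_trained_model(hp):
    
    best_epoch = get_best_epoch(hp)
    
    # Build a fresh model so that training restarts from scratch
    model = build_model(hp)
    
    # Train a little longer (x1.2) because the full training set is larger
    model.fit(
        x_train_full, y_train_full,
        batch_size=128, epochs=int(best_epoch * 1.2))
    
    return model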

best_models = []

for hp in best_hps:
    model = get_best_trained_model(hp)
    model.evaluate(x_test, y_test)
    best_models.append(model)

In [None]:
best_models = tuner.get_best_models(top_n)

#### 2 The art of crafting the search space

#### 3 The future of hyperparameter optimization: automated machine learning

### 13.1.2 Model ensembling
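
One common way to ensemble classifiers is to average the class probabilities they predict, optionally with per-model weights. The sketch below is only an illustration: the helper `ensemble_predictions` and the toy probability arrays are hypothetical and do not come from the notebook. In practice the arrays would come from calling `predict` on several trained models, and the weights would typically be chosen on validation data.

```python
import numpy as np


def ensemble_predictions(predictions, weights=None):
    """Combine per-model class probabilities with a (weighted) average.

    `predictions` is a list of arrays of shape (n_samples, n_classes).
    This helper is hypothetical and only illustrates the idea.
    """
    stacked = np.stack(predictions, axis=0)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(predictions))  # plain average by default
    weights = np.asarray(weights, dtype="float64")
    weights = weights / weights.sum()        # normalize so the weights sum to 1
    # Contract the model axis: result has shape (n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)


# Toy probabilities standing in for the outputs of three models
preds_a = np.array([[0.6, 0.4], [0.2, 0.8]])
preds_b = np.array([[0.4, 0.6], [0.1, 0.9]])
preds_c = np.array([[0.7, 0.3], [0.6, 0.4]])

averaged = ensemble_predictions([preds_a, preds_b, preds_c])
weighted = ensemble_predictions(
    [preds_a, preds_b, preds_c], weights=[0.5, 0.25, 0.25])

# Final class decision of the ensemble
ensemble_labels = averaged.argmax(axis=1)
print(ensemble_labels)
```

Because the weights are normalized, each row of the combined output still sums to 1 and can be read as a probability distribution.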

## 13.2 Speeding up model training

### 13.2.1 Speeding up training on `GPU` with mixed precision

#### 1 Understanding floating-point precision

In [1]:
import tensorflow as tf
import numpy as np

np_array = np.zeros((2, 2))
tf_tensor = tf.convert_to_tensor(np_array)
tf_tensor.dtype

tf.float64

In [2]:
np_array  = np.zeros((2, 2))
tf_tensor = tf.convert_to_tensor(np_array, dtype="float32")
tf_tensor.dtype

tf.float32
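
To make the difference between the precisions more concrete, the sketch below uses NumPy's `np.finfo` to inspect each float type, and then adds a small step to 1.0 in `float16` and in `float32`. The value `1e-4` is an arbitrary choice made for illustration, standing in for a small weight update.

```python
import numpy as np

# Inspect the resolution and range of each floating-point type
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{info.dtype}: bits={info.bits}, eps={info.eps}, max={info.max}")

# A small increment (arbitrary illustrative value)
small_step = 1e-4

sum16 = np.float16(1.0) + np.float16(small_step)
sum32 = np.float32(1.0) + np.float32(small_step)

# Near 1.0, float16 values are spaced about 1e-3 apart, so the step is rounded away
step_lost_in_float16 = bool(sum16 == np.float16(1.0))
# float32 has enough resolution to register the same step
step_kept_in_float32 = bool(sum32 > np.float32(1.0))

print(step_lost_in_float16, step_kept_in_float32)
```

This rounding behavior is one reason mixed-precision setups generally keep variables and their updates in `float32` while running most computations in `float16`.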

#### 2 Mixed-precision training in practice

In [3]:
from tensorflow import keras
keras.mixed_precision.set_global_policy("mixed_float16")

The dtype policy mixed_float16 may run slowly because this machine does not have a GPU. Only Nvidia GPUs with compute capability of at least 7.0 run quickly with mixed_float16.


### 13.2.2 Multi-`GPU` training

#### 1 Getting two or more `GPU`s

#### 2 Single-host, multi-device synchronous training

In [4]:
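# get_compiled_model(), train_dataset, val_dataset and callbacks are assumed to be defined elsewhere
strategy = tf.distribute.MirroredStrategy()  # create a distribution strategy object
print(f'Number of devices: {strategy.num_replicas_in_sync}')

with strategy.scope():  # open the strategy scope; variables must be created inside it
    model = get_compiled_model()
    model.fit(train_dataset,
              epochs=100,
              validation_data=val_dataset,
              callbacks=callbacks
             )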

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
Number of devices:1


### 13.2.3 `TPU` training

#### 1 Using a `TPU` via Google `Colab`

In [None]:
import tensorflow as tf

tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()

print('Device:', tpu.master())

##### [C] 13.4 Building a model in a `TPUStrategy` scope

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

strategy = tf.distribute.TPUStrategy(tpu)
print(f'Number of replicas: {strategy.num_replicas_in_sync}')

#### 2 Leveraging step fusing to improve `TPU` utilization
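
In Keras, step fusing is requested through the `steps_per_execution` argument of `compile()`, which runs several training batches per device execution. The sketch below is a minimal illustration on a small MNIST-shaped model; the architecture and the value `8` are assumptions chosen for the example, not settings taken from the notebook, and the code only compiles the model without connecting to a `TPU`.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small illustrative model (hypothetical architecture)
model = keras.Sequential([
    keras.Input(shape=(28 * 28,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
    # Run 8 batches per execution step; 8 is an arbitrary example value
    steps_per_execution=8,
)
```

Larger values tend to reduce per-step overhead for small models, at the cost of less frequent progress and callback updates, so the right setting is worth tuning per workload.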