# Eager Execution

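Eager Execution is TensorFlow's dynamic-graph mode: an imperative, interactive programming environment.
- No graph building: operations return concrete values immediately. Unlike TensorFlow 1.x, you do not have to build a graph first and then run it with `sess.run()`.
- Natural control flow: Python control flow is used instead of graph control flow, which simplifies the specification of dynamic models.
- Python data structures: works with ordinary Python data structures and is compatible with NumPy.
- Fast iteration: each operation runs as soon as it is called, which makes development and debugging efficient. Raw execution speed can be lower than in graph mode; wrapping a function in `tf.function` can recover graph performance.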

### TensorFlow's default execution mode

In [1]:
```python
import tensorflow as tf
print(tf.__version__)
tf.executing_eagerly()
```

```
2.0.0
True
```
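A minimal sketch, assuming TensorFlow 2.x: `tf.executing_eagerly()` reports `True` at the top level, but inside a function decorated with `tf.function` the Python body runs while the function is being traced into a graph, so the same call reports `False` there. The function names `mode_inside` and `square_sum` are hypothetical, chosen only for illustration.

```python
import tensorflow as tf

print(tf.executing_eagerly())  # True: eager is the default in TF 2.x


@tf.function
def mode_inside():
    # This Python line runs while tf.function traces the graph,
    # so eager execution is switched off at that moment.
    return tf.constant(tf.executing_eagerly())


@tf.function
def square_sum(x):
    # Compiled into a graph on the first call, then reused
    return tf.reduce_sum(x * x)


print(mode_inside().numpy())                              # False
print(square_sum(tf.constant([1.0, 2.0, 3.0])).numpy())   # 14.0
```

The decorated function is still called like ordinary Python and returns an eager tensor, so graph execution can be introduced function by function where speed matters.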

## 1. Eager Execution operations
- The `tf.Tensor.numpy` method returns the tensor's value as a NumPy `ndarray`.

In [2]:
```python
x = tf.constant([[1, 2, 4, 6], [8, 10, 7, 11]])
x = tf.add(x, 2)
print(x)
print("x:", x.numpy())
```

```
tf.Tensor(
[[ 3  4  6  8]
 [10 12  9 13]], shape=(2, 4), dtype=int32)
x: [[ 3  4  6  8]
 [10 12  9 13]]
```
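Eager tensors interoperate with NumPy in both directions. The sketch below assumes TensorFlow 2.x and NumPy and shows `.numpy()`, a NumPy function applied directly to a tensor (the result is an `ndarray`), a TensorFlow op applied to an `ndarray` (the result is a tensor), and the overloaded `+` operator, which behaves like `tf.add`.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[1, 2], [3, 4]])

arr = x.numpy()                                   # tensor -> numpy.ndarray (int32)
scaled = np.multiply(x, 10)                       # NumPy function on a tensor -> ndarray
summed = tf.add(np.ones((2, 2), dtype=np.int32), x)  # TF op on an ndarray -> tf.Tensor
shifted = x + 2                                   # operator overloading, same as tf.add(x, 2)

print(type(arr).__name__, type(scaled).__name__)  # ndarray ndarray
print(summed.numpy())
print(shifted.numpy())
```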


## 2. Dynamic control flow in Eager Execution
- When a model runs eagerly, you can use all of Python's features, such as `for` loops and `if` statements.

In [3]:
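```python
def find_factor(num):
    """Return all positive factors of num using plain Python control flow."""
    result = []
    num = tf.convert_to_tensor(num)
    n = int(num.numpy())          # concrete value of the eager tensor
    for i in range(1, n + 1):     # ordinary Python loop
        if n % i == 0:            # ordinary Python condition
            result.append(i)
    return result
```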

In [4]:
```python
result = find_factor(14)
print("all factors:", result)
```

```
all factors: [1, 2, 7, 14]
```
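Because eager tensors hold concrete values, Python control flow can also branch on the results of TensorFlow ops directly, not only on values extracted with `.numpy()`. A sketch assuming TensorFlow 2.x, where a scalar boolean tensor can be used in `if` and `while`; `collatz_steps` is a hypothetical helper that counts the steps of the Collatz sequence, so the number of loop iterations depends on the data.

```python
import tensorflow as tf


def collatz_steps(n):
    """Count Collatz steps until n reaches 1 (hypothetical example)."""
    n = tf.constant(n)
    steps = 0
    while n > 1:                      # scalar bool tensor -> Python bool
        if tf.equal(n % 2, 0):        # data-dependent branch
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


print(collatz_steps(6))  # 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1: 8 steps
```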


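## 3. Gradient computation in Eager Execution
TensorFlow 1.x uses static graphs, and each static graph contains a forward graph and a backward graph; the backward graph computes the gradients used to train the model. TensorFlow 2.x runs in eager mode by default and computes gradients of functions with respect to variables with `tf.GradientTape()`, which records ("tapes") the operations executed inside its context so that they can be differentiated afterward. `tf.GradientTape()` is the officially recommended approach.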
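- `tf.GradientTape(persistent=False, watch_accessed_variables=True)`
   - `persistent`: defaults to `False`, which means the tape's resources are released as soon as `gradient()` is called, so no further gradient can be computed from it. To call `gradient()` more than once, set `persistent=True`.
   - `watch_accessed_variables`: defaults to `True`; the tape automatically watches the trainable variables accessed inside its context so that they can be differentiated.

- `gradient(target, sources, output_gradients=None, unconnected_gradients=tf.UnconnectedGradients.NONE)`
    - `target`: a list or nested structure of tensors or variables to be differentiated.
    - `sources`: a list or nested structure of tensors or variables; `target` is differentiated with respect to the elements of `sources`.
    - `output_gradients`: a list of gradients, one for each differentiable element of `target`; defaults to `None`.
    - `unconnected_gradients`: determines what is returned when `target` and a source are not connected (no differentiable path between them): `"none"` (the default) or `"zero"`. See `tf.UnconnectedGradients` for details.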
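A minimal sketch of `unconnected_gradients`, assuming TensorFlow 2.x: `y` is a hypothetical variable that takes no part in the computation, so its gradient is `None` by default and a zero tensor with `tf.UnconnectedGradients.ZERO`. The tape is made persistent only so that `gradient()` can be called twice, and it is deleted afterward to free its resources.

```python
import tensorflow as tf

x = tf.Variable(2.0)
y = tf.Variable(5.0)  # hypothetical source that is NOT used to compute z

# persistent=True only so that gradient() can be called twice below
with tf.GradientTape(persistent=True) as tape:
    z = x * x  # z depends on x only

# Default (UnconnectedGradients.NONE): unconnected sources get None
grads_none = tape.gradient(z, [x, y])
# UnconnectedGradients.ZERO: unconnected sources get a zero tensor instead
grads_zero = tape.gradient(z, [x, y],
                           unconnected_gradients=tf.UnconnectedGradients.ZERO)
del tape  # release the persistent tape

print(grads_none[0].numpy(), grads_none[1])   # dz/dx = 2x = 4.0, and None for y
print(grads_zero[1].numpy())                  # 0.0 for y
```

`"zero"` is convenient when the gradients are passed straight to an optimizer that cannot handle `None` entries.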

### Example 1: compute the gradient of y = 3x^2 at x = 3

In [5]:
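```python
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)          # x is a constant, so it must be watched explicitly
    y = 3 * x * x
dy_dx = g.gradient(y, x)  # dy/dx = 6x = 18.0 at x = 3
print("gradient:", dy_dx)
```

```
gradient: tf.Tensor(18.0, shape=(), dtype=float32)
```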


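- `watch` adds the constant `x` to the tape; by default `tf.GradientTape()` only watches trainable variables created with `tf.Variable`. Setting `watch_accessed_variables=False` does not make constants differentiable: it turns off automatic watching of variables, so that only tensors passed to `watch()` are tracked.
- The constant must be a floating-point type; if it is an integer, the gradient is `None`.
- When training a network you usually do not need to call `watch` explicitly: with the default settings, `GradientTape` watches the trainable variables.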
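The notes above can be checked with a short sketch (assuming TensorFlow 2.x; the variable names are illustrative): a `tf.Variable` is watched automatically, `watch_accessed_variables=False` turns that off, a float constant needs an explicit `watch()`, and an integer source yields `None` (TensorFlow may also log a dtype warning when an integer tensor is watched).

```python
import tensorflow as tf

w = tf.Variable(3.0)     # trainable variable
c = tf.constant(3.0)     # float constant
k = tf.constant(3)       # int32 constant

# 1) Variables are watched automatically
with tf.GradientTape() as tape:
    y = 3 * w * w
grad_w = tape.gradient(y, w)           # 6 * 3 = 18.0

# 2) watch_accessed_variables=False: w is used but not recorded
with tf.GradientTape(watch_accessed_variables=False) as tape:
    y = 3 * w * w
grad_unwatched = tape.gradient(y, w)   # None

# 3) Constants must be watched explicitly
with tf.GradientTape() as tape:
    tape.watch(c)
    y = 3 * c * c
grad_c = tape.gradient(y, c)           # 18.0

# 4) Integer tensors are not differentiable
with tf.GradientTape() as tape:
    tape.watch(k)                      # may log a dtype warning
    y = 3 * k * k
grad_k = tape.gradient(y, k)           # None

print(grad_w, grad_unwatched, grad_c, grad_k)
```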

### Example 2: compute the second derivative of y = 3x^3 at x = 3

- With `persistent` left at its default value (`False`):

In [6]:
```python
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y = 3 * x * x * x
    dy_dx = g.gradient(y, x)
    # A non-persistent tape is released after the first gradient() call,
    # so this second call fails
    dy2_dx2 = g.gradient(dy_dx, x)
print("dy_dx: ", dy_dx)
print("dy2_dx2: ", dy2_dx2)
```

```
RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.
```

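- With `persistent` set to `True`, `gradient()` can be called repeatedly. Because `dy_dx` is computed inside the tape's context, the tape also records that computation, so it can be differentiated a second time: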

In [7]:
```python
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = 3 * x * x * x
    dy_dx = g.gradient(y, x)        # 9x^2 = 81.0, recorded by the tape
    dy2_dx2 = g.gradient(dy_dx, x)  # 18x = 54.0
del g  # release the persistent tape's resources
print("dy_dx: ", dy_dx)
print("dy2_dx2: ", dy2_dx2)
```

```
dy_dx:  tf.Tensor(81.0, shape=(), dtype=float32)
dy2_dx2:  tf.Tensor(54.0, shape=(), dtype=float32)
```


- Method 2: nest two `tf.GradientTape()` contexts, both with the default `persistent=False`:

In [8]:
```python
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    with tf.GradientTape() as gg:
        gg.watch(x)
        y = 3 * x * x * x
    dy_dx = gg.gradient(y, x)     # Will compute to 81.0 (recorded by the outer tape)
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 54.0
print("dy_dx: ", dy_dx)
print("d2y_dx2", d2y_dx2)
```

```
dy_dx:  tf.Tensor(81.0, shape=(), dtype=float32)
d2y_dx2 tf.Tensor(54.0, shape=(), dtype=float32)
```
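The values printed by both methods can be checked by differentiating y = 3x^3 by hand and evaluating at x = 3:

$$
y = 3x^3, \qquad \frac{dy}{dx} = 9x^2 \Big|_{x=3} = 81, \qquad \frac{d^2y}{dx^2} = 18x \Big|_{x=3} = 54
$$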


### Example 3: optimization, solving for the parameters w and b

In [9]:
```python
class SimLayers(tf.keras.Model):
    def __init__(self):
        # Call the parent constructor first so Keras can track the variables
        super(SimLayers, self).__init__()
        self.w = tf.Variable(tf.random.normal((6, 1)))  # weights: 6 inputs -> 1 output
        self.b = tf.Variable(tf.random.normal([1]))     # one bias per output unit

    def call(self, inputs):
        output = tf.add(tf.matmul(inputs, self.w), self.b)
        output = tf.keras.activations.sigmoid(output)
        return output
```

In [10]:
```python
x = tf.random.normal(shape=(10, 6))
y = tf.random.normal([10, 1])  # shaped (10, 1) to match the model output
# Add noise to x
x = tf.multiply(x, 2.5) + tf.random.normal(shape=(10, 6))
print("input x shape: ", x.shape)
# Build the model
model = SimLayers()
# Optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
# Training loop
for i in range(500):
    # Record the forward pass on the tape
    with tf.GradientTape() as tape:
        # Mean squared error loss
        loss = model(x) - y
        losses = tf.reduce_mean(tf.square(loss))
    # Compute gradients outside the tape context, then update w and b
    grad = tape.gradient(losses, [model.w, model.b])
    optimizer.apply_gradients(zip(grad, [model.w, model.b]))
    if (i + 1) % 10 == 0:
        print("step: {} loss: {}".format(i + 1, losses))

print("optimized weights: {}".format(model.w.numpy()))
print("optimized bias: {}".format(model.b.numpy()))
```

```
input x shape:  (10, 6)
step: 10 loss: 1.0718059539794922
step: 20 loss: 1.0698118209838867
step: 30 loss: 1.0679030418395996
step: 40 loss: 1.0660481452941895
step: 50 loss: 1.0642211437225342
step: 60 loss: 1.0623986721038818
step: 70 loss: 1.0605601072311401
step: 80 loss: 1.05868661403656
step: 90 loss: 1.0567595958709717
step: 100 loss: 1.0547617673873901
step: 110 loss: 1.0526756048202515
step: 120 loss: 1.0504834651947021
step: 130 loss: 1.0481678247451782
step: 140 loss: 1.0457106828689575
step: 150 loss: 1.043094277381897
step: 160 loss: 1.0403008460998535
step: 170 loss: 1.0373129844665527
step: 180 loss: 1.0341147184371948
step: 190 loss: 1.0306910276412964
step: 200 loss: 1.0270297527313232
step: 210 loss: 1.0231202840805054
step: 220 loss: 1.018954873085022
step: 230 loss: 1.0145275592803955
step: 240 loss: 1.009832739830017
step: 250 loss: 1.0048632621765137
step: 260 loss: 0.9996078610420227
step: 270 loss: 0.9940468072891235
step: 280 loss: 0.9881490468978882
step: 290 
```
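Because the data in Example 3 are random, the fitted values cannot be checked against a known answer. As a sketch under stated assumptions, the same tape-based loop can be run on hypothetical noise-free data generated from y = 2x + 1, where w and b should converge to 2 and 1. It uses plain gradient descent via `assign_sub` instead of the Keras optimizer; for SGD without momentum this is the same update rule, w ← w − learning_rate · ∂loss/∂w.

```python
import tensorflow as tf

# Hypothetical noise-free data generated from y = 2x + 1
x = tf.constant([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * x + 1.0

w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.05

for step in range(1000):
    with tf.GradientTape() as tape:
        pred = w * x + b
        loss = tf.reduce_mean(tf.square(pred - y))  # mean squared error
    dw, db = tape.gradient(loss, [w, b])
    # Plain gradient descent, the same rule as SGD without momentum
    w.assign_sub(learning_rate * dw)
    b.assign_sub(learning_rate * db)

print("w:", w.numpy(), "b:", b.numpy())  # close to 2.0 and 1.0
```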

### Example 4: GPU vs. CPU performance comparison

In [11]:
```python
# Input images: 28x28x1
def CNNModel():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(64, 3, kernel_initializer='he_normal', strides=1, activation='relu',
                                     padding='same', name="conv1"))
    model.add(tf.keras.layers.MaxPool2D((2, 2), strides=2, padding='valid', name="pool1"))
    model.add(tf.keras.layers.Conv2D(128, 3, kernel_initializer="he_normal", strides=1, activation="relu",
                                     padding="same", name="conv2"))
    model.add(tf.keras.layers.MaxPool2D((2, 2), strides=2, padding="valid", name="pool2"))
    model.add(tf.keras.layers.Conv2D(128, 3, kernel_initializer="he_normal", strides=1, activation="relu",
                                     padding="same", name="conv3"))
    model.add(tf.keras.layers.MaxPool2D((2, 2), strides=2, padding="valid", name="pool3"))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"))
    model.add(tf.keras.layers.Dense(10, activation='softmax', kernel_initializer='he_normal'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
```
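With `padding='same'` each convolution keeps the spatial size, and each 2x2 max-pool with stride 2 and `padding='valid'` halves it, rounding down: 28 → 14 → 7 → 3. The `Flatten` layer therefore outputs 3 × 3 × 128 = 1152 features. The sketch below, assuming TensorFlow 2.x, rebuilds the same layer stack as a plain list (the initializers are omitted because they do not affect shapes) and passes one dummy image through it to confirm each output shape.

```python
import tensorflow as tf

layers = tf.keras.layers
# Same layer stack as CNNModel(); initializers omitted since they do not change shapes
stack = [
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPool2D((2, 2), strides=2, padding="valid"),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPool2D((2, 2), strides=2, padding="valid"),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPool2D((2, 2), strides=2, padding="valid"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
]

x = tf.zeros((1, 28, 28, 1))  # one dummy 28x28 grayscale image
shapes = []
for layer in stack:
    x = layer(x)                  # eager call builds the layer and runs it
    shapes.append(tuple(x.shape))

for layer, shape in zip(stack, shapes):
    print(type(layer).__name__, shape)
```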

In [12]:
```python
import time

def train_gpu():
    (train_data, train_label), (test_data, test_label) = tf.keras.datasets.mnist.load_data()
    # Convert to float32 and scale pixel values to [0, 1]
    train_data = train_data.astype('float32') / 255
    test_data = test_data.astype("float32") / 255
    # One-hot encode the labels
    train_label = tf.keras.utils.to_categorical(train_label, 10)
    train_label = tf.convert_to_tensor(train_label)
    test_label = tf.keras.utils.to_categorical(test_label, 10)
    test_label = tf.convert_to_tensor(test_label)
    # Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
    train_data = tf.expand_dims(train_data, -1)
    test_data = tf.expand_dims(test_data, -1)
    print("train data shape", train_data.shape)
    model = CNNModel()
    my_callbacks = [tf.keras.callbacks.ModelCheckpoint('./logs/cnn_model.h5', verbose=1),
                    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2, mode='max')]
    # Train the model on the GPU and time it
    with tf.device("/gpu:0"):
        start_time = time.time()
        model.fit(train_data, train_label, batch_size=64, verbose=1, epochs=3, callbacks=my_callbacks,
                  validation_data=(test_data, test_label))
        diff = time.time() - start_time
    return diff
```

In [13]:
```python
def train_cpu():
    (train_data, train_label), (test_data, test_label) = tf.keras.datasets.mnist.load_data()
    # Convert to float32 and scale pixel values to [0, 1]
    train_data = train_data.astype('float32') / 255
    test_data = test_data.astype("float32") / 255
    # Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
    train_data = tf.expand_dims(train_data, -1)
    test_data = tf.expand_dims(test_data, -1)
    print("train data shape", train_data.shape)
    # One-hot encode the labels
    train_label = tf.one_hot(train_label, 10)
    test_label = tf.one_hot(test_label, 10)
    model = CNNModel()
    my_callbacks = [tf.keras.callbacks.ModelCheckpoint('./logs/cnn_model.h5', verbose=1),
                    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2, mode='max')]
    # Train the model on the CPU and time it
    with tf.device("/cpu:0"):
        start_time = time.time()
        model.fit(train_data, train_label, batch_size=64, verbose=1, epochs=3, callbacks=my_callbacks,
                  validation_data=(test_data, test_label))
        diff = time.time() - start_time
    return diff
```

In [14]:
```python
gpu_time = train_gpu()
cpu_time = train_cpu()
print("gpu training takes time: ", gpu_time)
print("cpu training takes time: ", cpu_time)
```

```
train data shape (60000, 28, 28, 1)
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: saving model to ./logs/cnn_model.h5
Epoch 2/3
Epoch 00002: saving model to ./logs/cnn_model.h5
Epoch 3/3
Epoch 00003: saving model to ./logs/cnn_model.h5
train data shape (60000, 28, 28, 1)
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 00001: saving model to ./logs/cnn_model.h5
Epoch 2/3
Epoch 00002: saving model to ./logs/cnn_model.h5
Epoch 3/3
Epoch 00003: saving model to ./logs/cnn_model.h5
gpu training takes time:  27.836952447891235
cpu training takes time:  184.38331365585327
```
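Two points help when reproducing this comparison: check whether a GPU is actually visible before pinning work to `/gpu:0`, and remember that GPU kernels run asynchronously, so a timing should end with an operation that forces the result back to the host (here `.numpy()`). A sketch assuming TensorFlow 2.x; `time_matmul` is a hypothetical helper, and the matrix size and repeat count are arbitrary choices.

```python
import time

import tensorflow as tf


def time_matmul(device_name, n=512, repeats=10):
    """Time `repeats` n x n matrix products on one device (hypothetical helper)."""
    with tf.device(device_name):
        a = tf.random.normal((n, n))
        tf.matmul(a, a).numpy()       # warm-up run, excluded from the timing
        start = time.time()
        for _ in range(repeats):
            c = tf.matmul(a, a)
        c.numpy()                     # wait for any asynchronous work to finish
        elapsed = time.time() - start
    return elapsed, c.device


gpus = tf.config.experimental.list_physical_devices("GPU")
cpu_time, cpu_device = time_matmul("/cpu:0")
print("CPU:", cpu_time, cpu_device)
if gpus:
    gpu_time, gpu_device = time_matmul("/gpu:0")
    print("GPU:", gpu_time, gpu_device)
else:
    print("No GPU visible to TensorFlow")
```

Printing `c.device` confirms where the last product was actually placed, which guards against silently timing the CPU twice on a machine without a usable GPU.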
