This section introduces automatic differentiation, a key technique for optimizing machine learning models.
#### Setup

In [1]:
from __future__ import absolute_import,division,print_function,unicode_literals
import tensorflow as tf
tf.enable_eager_execution()

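#### Gradient tapes
TensorFlow provides the `tf.GradientTape` API for automatic differentiation, that is, computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside the context of a `tf.GradientTape` onto a "tape". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of the "recorded" computation using reverse-mode differentiation.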

For example:

In [5]:
x = tf.ones((2,2))

with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y,y)
    
# Derivative of z with respect to the original input tensor x 
dz_dx = t.gradient(z,x)
for i in [0,1]:
    for j in [0,1]:
        assert dz_dx[i][j].numpy()== 8.0
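
To make the "tape" idea more concrete, the following is a toy reverse-mode sketch in pure Python. It is not how `tf.GradientTape` is implemented; the names `ToyTape` and `Value` are hypothetical and exist only for this illustration. The sketch assumes scalar values and supports only addition and multiplication. Each operation appends its inputs and local derivatives to a list, and `gradient` walks that list backwards while applying the chain rule.

```python
# Toy reverse-mode automatic differentiation (illustrative only).
# ToyTape and Value are hypothetical names, not part of TensorFlow.

class Value:
    """A scalar that remembers the tape it belongs to."""

    def __init__(self, data, tape):
        self.data = float(data)
        self.tape = tape

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return self.tape.record(self.data + other.data,
                                [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return self.tape.record(self.data * other.data,
                                [(self, other.data), (other, self.data)])


class ToyTape:
    def __init__(self):
        # Each entry is (output, [(input, local_derivative), ...]),
        # stored in the order the operations were executed.
        self.entries = []

    def watch(self, data):
        """Wrap a plain number so operations on it get recorded."""
        return Value(data, self)

    def record(self, data, inputs):
        out = Value(data, self)
        self.entries.append((out, inputs))
        return out

    def gradient(self, target, source):
        """Return d(target)/d(source) by replaying the tape in reverse."""
        grads = {id(target): 1.0}
        for out, inputs in reversed(self.entries):
            upstream = grads.get(id(out), 0.0)
            for inp, local in inputs:
                grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local
        return grads.get(id(source), 0.0)


# Mirror the example above: four inputs equal to 1, y = sum(x), z = y * y.
tape = ToyTape()
xs = [tape.watch(1.0) for _ in range(4)]
y = xs[0] + xs[1] + xs[2] + xs[3]
z = y * y

dz_dx = [tape.gradient(z, x) for x in xs]
dz_dy = tape.gradient(z, y)
```

Unlike a default `tf.GradientTape`, this toy tape never releases its recorded entries, so `gradient` can be called on it repeatedly.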

You can also request gradients of the output with respect to intermediate values computed inside a "recorded" `tf.GradientTape` context.

In [6]:
x = tf.ones((2,2))

with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y,y)
    
# Use the tape to compute the derivative of z with respect to the 
# intermediate value y
dz_dy = t.gradient(z,y)
assert dz_dy.numpy() == 8.0
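
The value 8.0 asserted in the two examples above follows from the chain rule. As a short worked derivation, with $x$ the 2×2 tensor of ones:

```latex
y = \sum_{i,j} x_{ij} = 4, \qquad z = y^2

\frac{\partial z}{\partial y} = 2y = 8, \qquad
\frac{\partial z}{\partial x_{ij}}
  = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x_{ij}}
  = 2y \cdot 1 = 8
```

Every entry of `dz_dx` is therefore 8.0, and so is `dz_dy`.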

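By default, the resources held by a GradientTape are released as soon as the `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a `persistent` gradient tape. This allows multiple calls to the `gradient()` method. Resources are released when the tape object is garbage collected. For example: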

In [7]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x*x
    z = y*y
    
dz_dx = t.gradient(z,x) # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y,x) # 6.0 (2*x at x = 3)
del t # Drop the reference to the tape
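
The two values noted in the comments above can be sanity-checked without TensorFlow. The sketch below uses central finite differences in pure Python; the helper `central_difference` and the step size `h` are assumptions made for this illustration and are not part of the tutorial's API.

```python
# Numerically check the gradients of y = x*x and z = y*y at x = 3.
# central_difference is a hypothetical helper for this sketch only.

def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def y_of_x(x):
    return x * x


def z_of_x(x):
    y = y_of_x(x)
    return y * y


dz_dx_numeric = central_difference(z_of_x, 3.0)  # close to 4 * 3**3 = 108
dy_dx_numeric = central_difference(y_of_x, 3.0)  # close to 2 * 3 = 6
```

Finite differences are only approximate, so the comparison against the analytic values should use a tolerance rather than exact equality.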

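#### Recording control flow
Because a tape records operations as they are executed, Python control flow (for example, `if` and `while` statements) is handled naturally.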

In [8]:
def f(x,y):
    output = 1.0
    for i in range(y):
        if i>1 and i<5:
            output = tf.multiply(output,x)
    return output

def grad(x,y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x,y)
    return t.gradient(out,x)
x = tf.convert_to_tensor(2.0)

assert grad(x,6).numpy() == 12.0
assert grad(x,5).numpy() == 12.0
assert grad(x,4).numpy() == 4.0
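
To see why these assertions hold, note that the loop in `f` multiplies by `x` only for `i` in {2, 3, 4}, so the recorded computation is `x**n`, where `n` is the number of such iterations that actually ran. The following pure-Python sketch computes the expected gradient analytically; `active_multiplications` and `expected_grad` are hypothetical helper names introduced for this illustration.

```python
# Analytic check of the control-flow example, without TensorFlow.
# Both helpers are hypothetical names used only in this sketch.

def active_multiplications(y):
    """Count loop iterations i in range(y) that satisfy 1 < i < 5."""
    return sum(1 for i in range(y) if 1 < i < 5)


def expected_grad(x, y):
    """Derivative of x**n with respect to x, where n depends on y."""
    n = active_multiplications(y)
    if n == 0:
        # The output is the constant 1.0, so the analytic derivative is zero.
        return 0.0
    return n * x ** (n - 1)


grad_6 = expected_grad(2.0, 6)  # n = 3, so 3 * x**2
grad_5 = expected_grad(2.0, 5)  # n = 3, so 3 * x**2
grad_4 = expected_grad(2.0, 4)  # n = 2, so 2 * x
```

The tape only sees the multiplications that were executed for a given `y`, which is why `y = 5` and `y = 6` give the same gradient while `y = 4` gives a different one.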

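#### Higher-order gradients
`GradientTape` records the operations inside the context manager for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API also works for higher-order gradients. For example: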

In [9]:
x = tf.Variable(1.0) # Create a Tensorflow variable initialized to 1.0

with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x*x*x
    # Compute the gradient inside the 't' context manager,
    # which means the gradient computation is differentiable as well
    dy_dx = t2.gradient(y,x)
d2y_dx2 = t.gradient(dy_dx,x)

assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0

