[CS 20: TensorFlow for Deep Learning Research:](https://docs.google.com/presentation/d/1e1gE2JJXipWm1UJgor_y8pHcM8L8oMaCVtvQvZUBlQY/edit#slide=id.g2f115d1cc0_0_421)

### Eager Execution
- Motivation:  
    - TensorFlow today: Construct a graph and execute it.    
    This is declarative programming. Its benefits include performance and easy translation to other platforms; drawbacks include that declarative programming is non-Pythonic and difficult to debug.  
    - What if you could execute operations directly?   
    Eager execution offers just that: it is an imperative front-end to TensorFlow.    
- Key advantages: Eager execution …    
    - is compatible with Python debugging tools  
        - pdb.set_trace() to your heart's content!  
    - provides immediate error reporting  
    - permits use of Python data structures  
    - enables you to use and differentiate through Python control flow


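### Setup and Basic Usage
To start eager execution, add ```tf.enable_eager_execution()``` to the beginning of the program or console session. Do not add this call to other modules that the program calls.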

In [1]:
# from __future__ import absolute_import, division, print_function

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution() # Call this at program start-up
print(tf.__version__)

1.11.0-dev20180817


Now you can run TensorFlow operations, and the results are returned immediately:

In [2]:
tfe.executing_eagerly()

True

In [3]:
x = [[2.]] # No need for placeholders!
m = tf.matmul(x,x)
print(m)   # No sessions!

tf.Tensor([[4.]], shape=(1, 1), dtype=float32)


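Enabling eager execution changes how TensorFlow operations behave: they now evaluate immediately and return their values to Python. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there is no computational graph to build and run later in a session, it is easy to inspect results with print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.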

### TensorFlow Today: Declarative (Graphs)

**Advantages: Graphs are ...**  
Optimizable  
- automatic buffer reuse  
- constant folding  
- inter-op parallelism  
- automatic trade-off between compute and memory  

Deployable  
- the graph is an intermediate representation for models  

Rewritable  
- experiment with automatic device placement or quantization


**Disadvantages: But graphs are also ...**  
Difficult to debug  
- errors are reported long after graph construction  
- execution cannot be debugged with pdb or print statements  

Un-Pythonic  
- writing a TensorFlow program is an exercise in metaprogramming  
- control flow (e.g., tf.while_loop) differs from Python  
- can't easily mix graph construction with custom data structures

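TensorFlow's drawbacks: it is hard to debug and does not interoperate with NumPy. I ran into this before with a tf.bool tensor, where tf.cond() was required.  
With eager execution, you no longer need to worry about:  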
- placeholders  
- sessions  
- control dependencies  
- "lazy loading"  
- {name, variable, op} scopes

#### "Lazy Loading"

In [None]:
x = tf.random_uniform([2,2])
with tf.Session() as sess:
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            print(sess.run(x[i,j]))

For example, the code here is what one might quickly hack up in the middle of their program to analyze the Tensor x.  
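It's easy to miss, but each iteration of the loop adds operations to the in-memory representation of the graph.  
My question: does every iteration really add to the graph and take up memory? Is it because x is random here? During neural network training, variables are kept and the graph is not saved again. If x were wrapped in a Variable, would this still happen? As far as I can tell it would: `x[i,j]` creates a new slice operation on every iteration no matter what x is, so a Variable would only make the printed values consistent.

In this particular case, each call to session.run also re-executes the random_uniform operation, so the snippet doesn't print a consistent snapshot of the tensor; every printed value comes from a different x.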
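
To make the graph-growth point concrete, here is a toy sketch in plain Python. It is not TensorFlow: `ToyGraph` and `ToyNode` are hypothetical stand-ins invented for this note, assuming only that indexing a graph tensor records a new op and that every run re-evaluates the random source.

```python
import random


class ToyGraph:
    """Hypothetical stand-in for a TF1 graph: just a list of recorded op names."""

    def __init__(self):
        self.ops = []

    def add(self, name, fn):
        # Recording an op grows the graph; nothing is computed yet.
        self.ops.append(name)
        return ToyNode(self, fn)


class ToyNode:
    """Hypothetical symbolic handle: holds a recipe, not a value."""

    def __init__(self, graph, fn):
        self.graph = graph
        self.fn = fn

    def __getitem__(self, idx):
        i, j = idx
        # Indexing does not read a value; it records a new slice op.
        return self.graph.add("slice[%d,%d]" % (i, j), lambda: self.fn()[i][j])

    def run(self):
        # Each run re-evaluates the whole recipe, including the random draw.
        return self.fn()


graph = ToyGraph()
x = graph.add(
    "random_uniform",
    lambda: [[random.random() for _ in range(2)] for _ in range(2)],
)
ops_before = len(graph.ops)

values = [x[i, j].run() for i in range(2) for j in range(2)]

print("ops before loop:", ops_before)
print("ops after loop: ", len(graph.ops))
print("values (each from a fresh random draw):", values)
```

In real graph code, the usual remedy is to call `sess.run(x)` once and index the resulting NumPy array, which adds no ops and gives one consistent snapshot.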

Restart the Jupyter kernel

In [None]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution() # Call this at program start-up

In [None]:
x = tf.random_uniform([2,2])
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        print(x[i,j])

#### Tensors Act Like NumPy Arrays

NumPy functions can be applied to tensors.

In [None]:
import numpy as np
x = tf.constant([1.0,2.0,3.0])

assert type(x.numpy()) == np.ndarray
squared = np.square(x)  # Tensors are compatible with NumPy functions

In [None]:
for i in squared:
    print(i)

#### Gradients

Automatic differentiation is built into eager execution.

Under the hood ...  
- Operations are recorded on a tape  
- The tape is played back to compute gradients  
    - This is reverse-mode differentiation (backpropagation).
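
The tape idea in the list above can be sketched in a few lines of plain Python. This is only an illustration of record-then-replay differentiation, not how TensorFlow implements it; the `Tape` and `Value` classes and the `backward` function are hypothetical names made up for this note.

```python
class Tape:
    """Hypothetical minimal tape: one record per executed operation."""

    def __init__(self):
        # Each record is (output, [(input, local_gradient), ...]).
        self.records = []


class Value:
    """A number that records the operations applied to it on a tape."""

    def __init__(self, data, tape):
        self.data = data
        self.tape = tape
        self.grad = 0.0

    def __mul__(self, other):
        out = Value(self.data * other.data, self.tape)
        # d(out)/d(self) = other.data and d(out)/d(other) = self.data
        self.tape.records.append(
            (out, [(self, other.data), (other, self.data)])
        )
        return out

    def __sub__(self, other):
        out = Value(self.data - other.data, self.tape)
        self.tape.records.append((out, [(self, 1.0), (other, -1.0)]))
        return out


def backward(output, tape):
    """Play the tape back in reverse, accumulating gradients (chain rule)."""
    output.grad = 1.0
    for out, inputs in reversed(tape.records):
        for inp, local_grad in inputs:
            inp.grad += out.grad * local_grad


tape = Tape()
x = Value(3.0, tape)
y = x * x            # the forward pass records one multiply on the tape
backward(y, tape)    # the reverse pass fills in x.grad
print(y.data, x.grad)
```

Because records are appended in execution order, walking them in reverse visits every output before its inputs, which is what reverse-mode differentiation needs.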


In [None]:
def square(x):
    return x**2

grad = tfe.gradients_function(square) # Differentiate w.r.t. input of square

In [None]:
print(square(3.))
print(grad(3.))

In [None]:
x = tfe.Variable(2.0)
def loss(y):
    return (y-x**2)**2

grad = tfe.implicit_gradients(loss)
print(loss(7.))

In [None]:
print(grad(7.))

In [None]:
# No need to initialize variables anymore?
print(x)
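
As a sanity check on the `loss` example above, the gradient with respect to the variable x can be worked out by hand and compared with a finite difference. The sketch below is plain Python with hypothetical helper names (`loss_fn`, `numerical_grad_x`). It assumes x = 2.0 and y = 7.0 as in the cells above. As far as I understand, `tfe.implicit_gradients` returns (gradient, variable) pairs, so the gradient it reports for x should match the analytic value here.

```python
def loss_fn(y, x):
    # Same formula as loss(y) above, with the variable x passed explicitly.
    return (y - x ** 2) ** 2


def numerical_grad_x(y, x, h=1e-5):
    # Central finite difference with respect to x.
    return (loss_fn(y, x + h) - loss_fn(y, x - h)) / (2.0 * h)


x0, y0 = 2.0, 7.0

# Chain rule: d/dx (y - x^2)^2 = 2 * (y - x^2) * (-2x) = -4x * (y - x^2)
analytic = -4.0 * x0 * (y0 - x0 ** 2)
numeric = numerical_grad_x(y0, x0)

print("loss:    ", loss_fn(y0, x0))
print("analytic:", analytic)
print("numeric: ", numeric)
```

With these inputs the loss is 9.0 and the analytic gradient is -24.0; the finite-difference value should agree to several decimal places.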

APIs for computing gradients work even when eager execution is not enabled  
- tfe.gradients_function()  
- tfe.value_and_gradients_function()  
- tfe.implicit_gradients()  
- tfe.implicit_value_and_gradients()


#### A collection of operations
TensorFlow = Operation Kernels + Execution  
- Graph construction: Execute compositions of operations with Sessions  
- Eager execution: Execute compositions with Python 


Most TF APIs work whether or not eager execution is enabled.  
But once eager execution is enabled:  
- prefer **tfe.Variable** under eager execution (compatible with graph construction)  
- manage your own variable storage; variable collections are not supported!  
- use **tf.contrib.summary**  
- use **tfe.Iterator** to iterate over datasets under eager execution  
- prefer object-oriented layers (e.g., tf.layers.Dense) 
    - functional layers (e.g., tf.layers.dense) only work if wrapped in **tfe.make_template**  
- prefer **tfe.py_func** over tf.py_func

### Example: see linear_eager.py
This section looks at some APIs I don't normally use. Oddly, in PyCharm I can't jump to the source of anything under tfe with Ctrl+B, yet the code runs without problems.

In [1]:
from __future__ import absolute_import   # use absolute imports
from __future__ import division
from __future__ import print_function

import argparse
import sys

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()
print(tfe.executing_eagerly())

True


In [2]:
true_w = [[-2.0],[4.0],[1.0]]  # list
true_b = [0.5]
noise_level = 0.01

# Training constants
batch_size = 64
learning_rate = 0.1

print("True w: %s" % true_w)
print("True b: %s\n" % true_b)

True w: [[-2.0], [4.0], [1.0]]
True b: [0.5]



#### Lists can be used directly as tensors

In [3]:
tf.shape(true_w)[0]

<tf.Tensor: id=6, shape=(), dtype=int32, numpy=3>

In [4]:
import numpy as np
a = [[1],[2]]
b = [[1,2,3]]
tf.matmul(a, b)
print(type(a), tf.shape(a))

<class 'list'> tf.Tensor([2 1], shape=(2,), dtype=int32)


#### Building the data

In [5]:
def synthetic_dataset(w,b, noise_level, batch_size, num_batches):
    """tf.data.Dataset that yields synthetic data for linear regression."""
    return synthetic_dataset_helper(w, b,
                                    tf.shape(w)[0], noise_level, batch_size,
                                    num_batches)

In [6]:
def synthetic_dataset_helper(w, b, num_features, noise_level, batch_size,
                                num_batches):
    """
    w is a matrix with shape [N, M].
    b is a vector with shape [M].
    So:
    - generate x's as vectors with shape [batch_size, N]
    - y = tf.matmul(x, w) + b + noise
    """
    def batch(_):
        x = tf.random_normal([batch_size, num_features])  # [64, 3]
        y = tf.matmul(x, w) + b + noise_level * tf.random_normal([]) # [64, 1]

        return x, y

    with tf.device("/device:GPU:0"):
        return tf.data.Dataset.range(num_batches).map(batch)

```tf.data.Dataset``` deserves a closer look; with it, placeholders no longer seem necessary.

#### Iterating over the data with ```tfe.Iterator()```

In [7]:
dataset = synthetic_dataset(true_w, true_b, noise_level, batch_size, 20)

In [8]:
dataset

<MapDataset shapes: ((64, ?), (64, 1)), types: (tf.float32, tf.float32)>

In [9]:
tfe.Iterator(dataset)

<tensorflow.contrib.eager.python.datasets.Iterator at 0x7f2a237e4cf8>

In [None]:
x, y = tfe.Iterator(dataset).next()
x.shape, y.shape

#### Model

In [10]:
class LinearModel(tf.keras.Model):
    """A tensorflow linear regression model"""

    def __init__(self):
        super(LinearModel, self).__init__()
        self._hidden_layer = tf.layers.Dense(1)

    def call(self, xs, ys):
        """Invoke the linear model and return its loss.

        :param xs: input features, as a tensor of size [batch_size, ndims].
        :param ys: targets, as a tensor of size [batch_size, 1].
        :return: the mean squared error between predictions and ys, as a scalar tensor
        """
        logits = self._hidden_layer(xs)
        # The loss is the mean squared error
        return tf.reduce_mean(tf.square(tf.subtract(logits, ys)))

#### Keras layers can be called like this

In [None]:
tf.layers.Dense(1)(x).shape

#### Computing gradients and loss

In [13]:
mse = lambda xs, ys: model(xs, ys)
loss_and_grads = tfe.implicit_value_and_gradients(mse)
loss_and_grads

<function tensorflow.python.eager.backprop.implicit_val_and_grad.<locals>.grad_fn(*args, **kwds)>

In [14]:
type(loss_and_grads)

function

#### Defining the model and optimizer

In [15]:
model = LinearModel()
device = "gpu:0" if tfe.num_gpus() else "cpu:0"
print("Using device: %s" % device)
with tf.device(device):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)

Using device: gpu:0


#### Running mini-batch gradient descent

In [16]:
for i,(xs, ys) in enumerate(tfe.Iterator(dataset)):
    loss, grads = loss_and_grads(xs, ys)
    print("Iteration {}: loss = {}".format(i, loss.numpy()))
    optimizer.apply_gradients(grads)

print("\nAfter training: w=%s" % model.variables[0].numpy())
print("\nAfter training: b=%s" % model.variables[1].numpy())

Iteration 0: loss = 43.0392951965332
Iteration 1: loss = 16.165969848632812
Iteration 2: loss = 10.76777172088623
Iteration 3: loss = 7.971755504608154
Iteration 4: loss = 3.6797871589660645
Iteration 5: loss = 2.895902156829834
Iteration 6: loss = 1.5528433322906494
Iteration 7: loss = 0.6692441701889038
Iteration 8: loss = 0.5839046239852905
Iteration 9: loss = 0.24569039046764374
Iteration 10: loss = 0.2409191131591797
Iteration 11: loss = 0.16683286428451538
Iteration 12: loss = 0.11612279713153839
Iteration 13: loss = 0.0448516346514225
Iteration 14: loss = 0.04058341309428215
Iteration 15: loss = 0.022977981716394424
Iteration 16: loss = 0.02016042359173298
Iteration 17: loss = 0.011417818255722523
Iteration 18: loss = 0.007630700711160898
Iteration 19: loss = 0.003977485932409763

After training: w=[[-1.993047  ]
 [ 3.9545689 ]
 [ 0.97655946]]

After training: b=[0.5055817]
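
For comparison, the same training loop can be sketched without TensorFlow, which makes the update rule explicit. This is a plain-Python approximation under stated assumptions, not a port of linear_eager.py: the helper names (`make_batch`, `train`) are hypothetical, the noise is drawn per example rather than once per batch, the weights start at zero, and it runs 200 batches instead of 20 so that convergence does not depend on the random seed.

```python
import random

random.seed(0)

TRUE_W = [-2.0, 4.0, 1.0]
TRUE_B = 0.5
NOISE_LEVEL = 0.01
BATCH_SIZE = 64
LEARNING_RATE = 0.1


def make_batch():
    """Draw one synthetic batch: y = x . w + b + noise."""
    xs = [[random.gauss(0.0, 1.0) for _ in TRUE_W] for _ in range(BATCH_SIZE)]
    ys = [
        sum(w * x for w, x in zip(TRUE_W, row))
        + TRUE_B
        + NOISE_LEVEL * random.gauss(0.0, 1.0)
        for row in xs
    ]
    return xs, ys


def train(num_batches):
    """Mini-batch gradient descent on the mean squared error."""
    w = [0.0] * len(TRUE_W)
    b = 0.0
    for _ in range(num_batches):
        xs, ys = make_batch()
        # Prediction error for every example in the batch.
        errors = [
            sum(wi * xi for wi, xi in zip(w, row)) + b - y
            for row, y in zip(xs, ys)
        ]
        # Gradients of mean(error^2) with respect to w and b.
        grad_w = [
            2.0 * sum(e * row[k] for e, row in zip(errors, xs)) / BATCH_SIZE
            for k in range(len(w))
        ]
        grad_b = 2.0 * sum(errors) / BATCH_SIZE
        # Plain gradient descent step.
        w = [wi - LEARNING_RATE * g for wi, g in zip(w, grad_w)]
        b -= LEARNING_RATE * grad_b
    return w, b


w, b = train(200)
print("After training: w=%s" % w)
print("After training: b=%s" % b)
```

The learned values should land close to the true w and b, mirroring the TensorFlow output above.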
