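# Introduction to TensorFlow

TensorFlow is a powerful open-source software library for numerical computation developed by the Google Brain team, and it is particularly well suited to training large-scale machine learning models. The basic principle is simple: you define a computation graph in Python, and TensorFlow then runs that graph using efficient C++ code.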

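Advantages:
- Runs on most platforms, including Windows, Linux, macOS, iOS, and Android
- Provides a simple API compatible with Scikit-Learn: `TF.Learn`
- Provides a simple API that simplifies building, training, and evaluating neural network models: `TF-slim`
- Its main Python API is very flexible and can be used to build very complex computation graphs
- Includes efficient C++ implementations of many machine learning operations
- Provides several advanced optimization algorithms for searching for the optimal parameters, along with a mechanism for computing gradients automatically: `autodiff`
- Provides the visualization tool TensorBoard
- Supports cloud services for running computation graphs
- A growing open-source community: https://www.tensorflow.org/ or https://github.com/jtoy/awesome-tensorflow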

----

# 0. Setup

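Installation guide: see the [official website](https://www.tensorflow.org/install/)

This section covers installing with Anaconda. Right after installing, I opened Jupyter Notebook and tried `import tensorflow as tf`, but the module could not be found, and I was stuck on this for quite a while.

After following the steps on the official website, a new entry named tensorflow appears under Environments in Anaconda. The installation actually creates a separate virtual environment (`Virtualenv`). This environment is not shared with base, so any other libraries you need, such as `scikit-learn`, `matplotlib`, `ipython`, and `jupyter notebook`, must be installed into it yourself. Anaconda Navigator makes this easy, and after that the notebook works as usual.

Also, installing tensorflow directly into base is not recommended; the version available there seems to be too old.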

In [3]:
import os
import tensorflow as tf
import numpy as np

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

In [4]:
tf.__version__

'1.8.0'

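# 1. Creating and Running a Computation Graph

A TensorFlow program can be divided into two parts:
- Building the computation graph (construction phase)
- Running the computation graph (execution phase)

Creating the computation graph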

In [2]:
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

Run the graph in a session. The session decides which devices (CPU or GPU) the operations are placed on, and it also holds all the variable values.

In [3]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
result

42

In [88]:
sess.close()

Some simpler ways to run the graph. Note that `with tf.Session() as sess:` temporarily sets `sess` as the default session, that is, only inside the `with` block.

In [4]:
# 1. 
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
result

42

In [5]:
# 2.
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()  # initialize all variables
    result = f.eval()

result

42

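`tf.InteractiveSession()` sets itself as the default session as soon as it is created, so no `with` block is needed, but you have to close it manually.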

In [6]:
# 3. 
sess = tf.InteractiveSession()
init.run()
result = f.eval()
result

42

In [7]:
sess.close()

# 2. Managing Graphs

Nodes you create are automatically added to the default graph

In [8]:
reset_graph()

x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

When managing multiple graphs, you can use a `with` block to temporarily make one of them the default graph

In [9]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    ...

In [10]:
x2.graph is graph

True

In [11]:
x2.graph is tf.get_default_graph()

False

Resetting the default graph

In [12]:
tf.reset_default_graph()

# 3. Lifecycle of a Node Value

In [13]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


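When you call `eval()`, TensorFlow automatically determines which nodes the target depends on and evaluates them first. Note, however, that these node values are **not reused** across runs: in the code above, `w` and `x` are computed twice, once for `y` and once for `z`.

Between graph runs, all node values are dropped except for variables (`tf.Variable`). Variables are maintained by the session: a variable's life starts when its initializer is run and ends when the session is closed.

To reuse node values, evaluate the nodes together in a single run.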

In [14]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15
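
To make the difference between separate runs and a single run concrete, here is a minimal pure-Python sketch. It is only a toy stand-in for a computation graph, not TensorFlow itself: the `Node` class, the `run` helper, and the evaluation counter are all hypothetical names introduced for illustration.

```python
# A toy, pure-Python stand-in for a computation graph (not TensorFlow itself).
class Node:
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs
        self.n_evals = 0  # how many times this node has been computed

    def evaluate(self, cache):
        # `cache` lives for one run only, so values are shared within a run
        # but never carried over to the next run.
        if self not in cache:
            args = [node.evaluate(cache) for node in self.inputs]
            self.n_evals += 1
            cache[self] = self.fn(*args)
        return cache[self]


def run(fetches):
    """Evaluate all fetches in a single run that shares one cache."""
    cache = {}
    return [node.evaluate(cache) for node in fetches]


w = Node(lambda: 3)
x = Node(lambda a: a + 2, w)
y = Node(lambda a: a + 5, x)
z = Node(lambda a: a * 3, x)

# Two separate runs: x is computed once per run, so twice in total.
y_val = run([y])[0]
z_val = run([z])[0]
evals_separate = x.n_evals  # 2

# One combined run: x is computed only once more.
y_val2, z_val2 = run([y, z])
evals_combined = x.n_evals - evals_separate  # 1

print(y_val, z_val, evals_separate, evals_combined)
```

The per-run cache plays the role of the node values that exist only during one graph run, which is why fetching `y` and `z` together avoids the repeated work.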


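# 4. Linear Regression

TensorFlow operations (ops) can take any number of inputs and produce any number of outputs. The inputs and outputs are multidimensional arrays called tensors. In the Python API, tensor values are returned as NumPy ndarrays.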

## 4.1 Normal Equation

One advantage of evaluating the Normal Equation with TensorFlow is that it can use the GPU, if one is available, to run faster.

In [15]:
import numpy as np
from sklearn.datasets import fetch_california_housing

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

In [16]:
theta_value

array([[ -3.71851807e+01],
       [  4.36337471e-01],
       [  9.39523336e-03],
       [ -1.07113101e-01],
       [  6.44792199e-01],
       [ -4.03380000e-06],
       [ -3.78137082e-03],
       [ -4.23484027e-01],
       [ -4.37219113e-01]], dtype=float32)
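
As a sanity check of the formula used above, theta = (X^T X)^-1 X^T y, the following NumPy sketch applies it to a tiny synthetic dataset. The data (a noise-free line with intercept 4 and slope 3) is an assumption made up for illustration, so the recovered parameters should be close to those two values.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 4 + 3 * x, no noise
x = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
y = 4.0 + 3.0 * x

# Add the bias column, as done for the housing data above
X_b = np.c_[np.ones((len(x), 1)), x]

# Normal Equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta.ravel())
```

The matrix operations map one-to-one onto the `tf.transpose`, `tf.matmul`, and `tf.matrix_inverse` calls in the cell above.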

## 4.2 Gradient Descent

### Computing the Gradients Manually

In [17]:
from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
scaled_housing_data = std_scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [18]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE = ", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

Epoch 0 MSE =  9.16154
Epoch 100 MSE =  0.714501
Epoch 200 MSE =  0.566705
Epoch 300 MSE =  0.555572
Epoch 400 MSE =  0.548811
Epoch 500 MSE =  0.543636
Epoch 600 MSE =  0.539629
Epoch 700 MSE =  0.536509
Epoch 800 MSE =  0.534068
Epoch 900 MSE =  0.532147


In [19]:
best_theta

array([[ 2.06855226],
       [ 0.88740271],
       [ 0.14401656],
       [-0.34770885],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.66145283],
       [-0.63752782]], dtype=float32)
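
The hand-written gradient `2/m * X^T (X theta - y)` is the derivative of the MSE with respect to `theta`. One way to convince yourself of this is a finite-difference check. The NumPy sketch below uses small random data (an assumption for illustration, not the housing dataset) and compares the analytic gradient with a numerical estimate.

```python
import numpy as np

rng = np.random.RandomState(42)
m, n = 20, 3                                  # hypothetical sizes
X = np.c_[np.ones((m, 1)), rng.randn(m, n)]   # features plus bias column
y = rng.randn(m, 1)
theta = rng.randn(n + 1, 1)


def mse(theta):
    error = X @ theta - y
    return float(np.mean(error ** 2))


# Analytic gradient, same expression as in the cell above
analytic = 2 / m * X.T @ (X @ theta - y)

# Numerical gradient by central differences, one parameter at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[i] = eps
    numeric[i] = (mse(theta + step) - mse(theta - step)) / (2 * eps)

max_diff = float(np.max(np.abs(analytic - numeric)))
print(max_diff)
```

If the two gradients agree up to rounding error, the manual expression is consistent with the cost function.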

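### Using TensorFlow's autodiff

autodiff computes the gradients automatically, so you do not have to derive the expression by hand, which is very useful for complex cost functions.

Replace

```python
gradients = 2/m * tf.matmul(tf.transpose(X), error)
```

with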

```python
gradients = tf.gradients(mse, [theta])[0]
```

In [20]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 9.16154
Epoch 100 MSE = 0.714501
Epoch 200 MSE = 0.566705
Epoch 300 MSE = 0.555572
Epoch 400 MSE = 0.548811
Epoch 500 MSE = 0.543636
Epoch 600 MSE = 0.539629
Epoch 700 MSE = 0.536509
Epoch 800 MSE = 0.534068
Epoch 900 MSE = 0.532147
Best theta:
[[ 2.06855249]
 [ 0.88740271]
 [ 0.14401658]
 [-0.34770882]
 [ 0.36178368]
 [ 0.00393811]
 [-0.04269556]
 [-0.66145277]
 [-0.6375277 ]]


![autodiff](./images/ch09/autodiff.png)
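
To give a feel for what automatic differentiation does, here is a minimal pure-Python sketch of reverse-mode autodiff for scalars. It is a toy written under simplifying assumptions (only `+` and `*`, scalar values); the `Var` class and `backward` function are hypothetical names, and this is not how TensorFlow is implemented internally.

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    # Collect nodes so that inputs come before the nodes that use them,
    # then walk the list backwards: each node's gradient is complete
    # before it is propagated to its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad


x, y = Var(3.0), Var(4.0)
f = x * x * y + y + 2  # same function as in section 1
backward(f)
print(f.value, x.grad, y.grad)
```

A single backward pass yields the partial derivatives with respect to every input at once (here 2xy = 24 and x^2 + 1 = 10), which is what makes this approach attractive for cost functions with many parameters.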

### Using an Optimizer

Replace `gradients = ...` and `training_op = ...` with:

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
```
or
```python
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = optimizer.minimize(mse)
```
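
For intuition about what the momentum variant changes, the sketch below implements the commonly described momentum update in plain Python: a velocity term accumulates past gradients, and the parameter moves along the velocity. The function name, the 1-D cost, and the starting point are assumptions for illustration; TensorFlow's optimizer may differ in details.

```python
def momentum_descent(grad_fn, w0, learning_rate=0.01, momentum=0.9, n_steps=1000):
    """Minimize a 1-D function given its gradient, using a momentum update."""
    w, velocity = w0, 0.0
    for _ in range(n_steps):
        # Accumulate a decaying sum of past gradients
        velocity = momentum * velocity + grad_fn(w)
        # Step against the accumulated direction
        w = w - learning_rate * velocity
    return w


# Hypothetical 1-D cost: f(w) = (w - 3)^2, so the gradient is 2 * (w - 3)
w_final = momentum_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

With `momentum=0.0` the loop reduces to plain gradient descent, which mirrors the relationship between the two optimizers shown above.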

In [21]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 9.16154
Epoch 100 MSE = 0.714501
Epoch 200 MSE = 0.566705
Epoch 300 MSE = 0.555572
Epoch 400 MSE = 0.548811
Epoch 500 MSE = 0.543636
Epoch 600 MSE = 0.539629
Epoch 700 MSE = 0.536509
Epoch 800 MSE = 0.534068
Epoch 900 MSE = 0.532147
Best theta:
[[ 2.06855249]
 [ 0.88740271]
 [ 0.14401658]
 [-0.34770882]
 [ 0.36178368]
 [ 0.00393811]
 [-0.04269556]
 [-0.66145277]
 [-0.6375277 ]]


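# 5. Feeding Data

This section introduces a new kind of node, the placeholder node, which is used to feed data into TensorFlow.

- It performs no computation of its own; it simply outputs the data you supply at runtime
- If no data is supplied at runtime, an exception is raised

Created with: `tf.placeholder()`

- The data type must be specified
- The shape can optionally be constrained
  - `None` means any size along that dimension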

In [22]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
    
print(B_val_1)

[[ 6.  7.  8.]]


In [23]:
print(B_val_2)

[[  9.  10.  11.]
 [ 12.  13.  14.]]


## Mini-batch Gradient Descent

In [24]:
n_epochs = 1000
learning_rate = 0.01

In [25]:
reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

In [26]:
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [27]:
n_epochs = 10

In [28]:
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In [29]:
def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

In [30]:
best_theta

array([[ 2.07033372],
       [ 0.86371452],
       [ 0.12255151],
       [-0.31211874],
       [ 0.38510373],
       [ 0.00434168],
       [-0.01232954],
       [-0.83376896],
       [-0.80304712]], dtype=float32)
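
Note that `fetch_batch` above draws its indices with `np.random.randint`, that is, with replacement, so within one epoch some samples may be drawn several times and others not at all. A common alternative is to shuffle the indices once per epoch and slice them into batches. The pure-Python sketch below shows that variant; the helper name and the sample count are assumptions for illustration, and this is not the method used in the cells above.

```python
import math
import random


def iterate_minibatches(n_samples, batch_size, seed):
    """Yield lists of indices that together cover every sample exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


m, batch_size = 1050, 100  # hypothetical dataset size
batches = list(iterate_minibatches(m, batch_size, seed=0))
n_batches = math.ceil(m / batch_size)  # same formula as in the cell above

print(len(batches), len(batches[-1]))
```

The last batch is smaller whenever `m` is not a multiple of `batch_size`, which is why the placeholders above use `None` for the first dimension.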

# 6. Saving Models

Add a Saver node at the end of the construction phase

```python
saver = tf.train.Saver()
```

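In the execution phase, call its `save()` method whenever you want to save. By default it saves all variables. You can also save only the variables you choose, for example: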

```python
saver = tf.train.Saver({"weights": theta})
```

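A good use of the Saver node is to write checkpoints during training, so that the state can easily be restored if the program is interrupted unexpectedly.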

In [31]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:  # checkpoint every 100 epochs
            print("Epoch", epoch, "MSE =", mse.eval())
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

Epoch 0 MSE = 9.16154
Epoch 100 MSE = 0.714501
Epoch 200 MSE = 0.566705
Epoch 300 MSE = 0.555572
Epoch 400 MSE = 0.548811
Epoch 500 MSE = 0.543636
Epoch 600 MSE = 0.539629
Epoch 700 MSE = 0.536509
Epoch 800 MSE = 0.534068
Epoch 900 MSE = 0.532147


To restore the session, simply replace `sess.run(init)` with

```python
saver.restore(sess, "/tmp/my_model.ckpt")
```

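# 7. Visualizing the Graph and Training Curves with TensorBoard

We need to write the graph definition and the training statistics to a local log directory; TensorBoard reads from this directory and visualizes the data. Note: each run of the program should write to a different log directory, otherwise TensorBoard would merge the statistics from different runs, so we include a timestamp in the directory name to keep it unique.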

In [None]:
reset_graph()

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

In [33]:
# construction phase
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [34]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

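Insert the code above at the end of the construction phase.

- The first line creates a node that evaluates the MSE and writes it to a `summary` (a `summary` is a TensorBoard-compatible binary log string)
- The second line creates a `FileWriter` that writes `summary` strings to log files under the `logdir` directory. The second argument is the graph you want to visualize; the graph definition is written to a binary log file called an events file.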

In [35]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                # write a summary every 10 mini-batches
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

In [36]:
file_writer.close()

In [37]:
best_theta

array([[ 2.07033372],
       [ 0.86371452],
       [ 0.12255151],
       [-0.31211874],
       [ 0.38510373],
       [ 0.00434168],
       [-0.01232954],
       [-0.83376896],
       [-0.80304712]], dtype=float32)

## Starting TensorBoard

```shell
cd $ML_PATH
# activate the tensorflow environment
source ~/tensorflow/bin/activate
tensorboard --logdir tf_logs/
```

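# 8. Name Scopes

When working with more complex models such as neural networks, the graph can contain thousands of nodes. To make it easier to visualize, you can create name scopes to group related nodes. For example, put `error` and `mse` in a name scope called `loss`: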

In [38]:
with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

In [39]:
print(error.op.name)

loss/sub


In [40]:
print(mse.op.name)

loss/mse


# 9. Modularity

Rectified linear units (ReLU)

$$
h_{w, b}(X) = \max(X \cdot w + b, 0)
$$
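
As a quick numerical illustration of the formula, the NumPy sketch below evaluates one ReLU unit on two rows of input. The weights, bias, and inputs are made-up values chosen so the result is easy to verify by hand.

```python
import numpy as np


def relu_unit(X, w, b):
    """Compute max(X . w + b, 0) element-wise for every row of X."""
    return np.maximum(X @ w + b, 0.0)


X = np.array([[1.0, 2.0, 3.0],
              [-1.0, -2.0, -3.0]])
w = np.array([[0.5], [0.5], [0.5]])  # hypothetical weights
b = 1.0                              # hypothetical bias

out = relu_unit(X, w, b)
print(out)
```

The first row gives 0.5 * 6 + 1 = 4, while the second gives -3 + 1 = -2, which is clipped to 0.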

Non-modular version

In [41]:
reset_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z2, 0., name="relu2")

output = tf.add(relu1, relu2, name="output")

Modular version, using a function to build each ReLU

In [42]:
reset_graph()

def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

In [45]:
file_writer = tf.summary.FileWriter("tf_logs/relu", tf.get_default_graph())

Name scopes make the graph better organized. When names are repeated, TensorFlow automatically appends suffixes such as "_1", "_2", and so on.

In [46]:
reset_graph()

def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")
file_writer = tf.summary.FileWriter("tf_logs/relu_name_scope", tf.get_default_graph())
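
The automatic suffixing can be pictured with a small pure-Python sketch. This is only a toy model of the behavior described above; the `NameRegistry` class is a hypothetical name, and TensorFlow's real naming logic handles more cases than this.

```python
class NameRegistry:
    """Toy model of how a graph could keep node names unique (hypothetical)."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        # The first use keeps the plain name; later uses get _1, _2, ...
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)


registry = NameRegistry()
names = [registry.unique_name("relu") for _ in range(3)]
print(names)  # ['relu', 'relu_1', 'relu_2']
```

In the cell above, the five calls to `relu(X)` each open a name scope called "relu", so in TensorBoard they appear as separate groups with such suffixes.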

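# 10. Sharing Variables

Method 1: create the variable first and pass it to the function as a parameter

When there are many shared variables, you can:
- Store all shared variables in a dict and pass it to the function
- Create a class for each module to handle the shared variables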

In [75]:
def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, threshold, name="relu")
    
threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")

Method 2: set the shared variable as an attribute of the function

In [76]:
def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, relu.threshold, name="relu")

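Method 3: use the sharing mechanism provided by TensorFlow

The core idea is to use the `tf.get_variable()` function: it creates the shared variable if it does not exist yet, or reuses it if it already exists. Which of the two happens is controlled by the `reuse` setting of the current `variable_scope()`.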

In [78]:
reset_graph()
# create a variable named "relu/threshold"
# raises an exception if "relu/threshold" already exists
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), 
                               initializer=tf.constant_initializer(0.0))

In [79]:
# fetch the existing "relu/threshold" variable
# raises an exception if it has not been created yet
with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")
    
# alternative
with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")

**Note: only variables created with `get_variable()` can be reused this way**
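
The create-or-reuse rules can be summarized with a small pure-Python sketch. It is a toy model of the semantics described above, written under the assumption that a variable is identified by "scope/name"; the `VariableStore` class is a hypothetical name and not TensorFlow's implementation.

```python
class VariableStore:
    """Toy model of get_variable semantics (not TensorFlow's implementation)."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, initial_value=None, reuse=False):
        full_name = "{}/{}".format(scope, name)
        if reuse:
            # Reuse mode: the variable must already exist
            if full_name not in self._vars:
                raise ValueError(full_name + " does not exist")
            return self._vars[full_name]
        # Creation mode: the variable must not exist yet
        if full_name in self._vars:
            raise ValueError(full_name + " already exists")
        self._vars[full_name] = {"name": full_name, "value": initial_value}
        return self._vars[full_name]


store = VariableStore()
created = store.get_variable("relu", "threshold", initial_value=0.0)
reused = store.get_variable("relu", "threshold", reuse=True)
print(created is reused)  # True: both names refer to the same object

try:
    store.get_variable("relu", "threshold", initial_value=0.0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Variables created outside the store (the analogue of calling `tf.Variable` directly) are never registered, which is why they cannot be fetched again by name.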

Applying this to the relu function above

In [81]:
reset_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")  # reuse existing variable
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, threshold, name="relu")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
# create the variable
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

In [83]:
file_writer = tf.summary.FileWriter("tf_logs/relu_with_shared_threshold", tf.get_default_graph())
file_writer.close()

Moving the definition of threshold into the function

In [84]:
reset_graph()

def relu(X):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, threshold, name="relu")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = []
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name="output")

In [85]:
file_writer = tf.summary.FileWriter("tf_logs/relu_with_threshold_defined_inside", tf.get_default_graph())
file_writer.close()

For the differences between `name_scope` and `variable_scope`, see: [csdn](https://blog.csdn.net/jerr__y/article/details/60877873)

# Supplementary Material

## variable_scope

Apart from variable sharing, `tf.variable_scope` behaves like an ordinary name scope.

reset_graph()

with tf.variable_scope("my_scope"):
    x0 = tf.get_variable("x", shape=(), initializer=tf.constant_initializer(0.))
    x1 = tf.Variable(0., name="x")
    x2 = tf.Variable(0., name="x")

with tf.variable_scope("my_scope", reuse=True):
    x3 = tf.get_variable("x")
    x4 = tf.Variable(0., name="x")

with tf.variable_scope("", default_name="", reuse=True):
    x5 = tf.get_variable("my_scope/x")

print("x0:", x0.op.name)
print("x1:", x1.op.name)
print("x2:", x2.op.name)
print("x3:", x3.op.name)
print("x4:", x4.op.name)
print("x5:", x5.op.name)
print(x0 is x3 and x3 is x5)

## Strings

In [10]:
reset_graph()

text = np.array("Do you want some café?".split())
text_tensor = tf.constant(text)

with tf.Session() as sess:
    print(text_tensor.eval())

[b'Do' b'you' b'want' b'some' b'caf\xc3\xa9?']


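# Exercises

1. Compared with executing computations directly, what are the advantages and drawbacks of TensorFlow's approach of building a computation graph first and then running it?

> Advantages:
>  - Gradients are computed automatically
>  - Operations can run in parallel
>  - Easy to run across devices
>  - Easy to visualize (TensorBoard)
>
> Drawbacks:
>  - Steep learning curve
>  - Step-by-step debugging is harder

2. Are `a_val = a.eval(session=sess)` and `a_val = sess.run(a)` equivalent?

> Yes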
> ```python
> a = tf.constant(2)
>
>sess = tf.Session()
>a_val_1 = a.eval(session=sess)
>a_val_2 = sess.run(a)
>
>print(a_val_1, a_val_2)
>```
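> Output: 2 2

3. Are `a_val, b_val = a.eval(session=sess), b.eval(session=sess)` and `a_val, b_val = sess.run([a, b])` equivalent?

> No.
> The former runs the graph twice, while the latter runs it only once.
> If any of the operations has side effects, the effects will differ. If there are none, both statements return the same values.

4. Can two graphs be run in the same session?

> No. The two graphs must first be merged into a single graph.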

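5. Suppose a graph `g` contains a variable `w`, and you start two threads that each create their own session to run the graph. Do the two sessions share the variable `w`, or does each have its own copy?

> It depends.
>
> - In local TensorFlow: each session has its own copy; every session maintains its own variable `w`
> - In distributed TensorFlow: variables are stored in containers managed by the cluster, so if both sessions connect to the same cluster and use the same container, `w` is shared

6. When is a variable initialized, and when is it destroyed?

> In general, a variable is initialized when its `initializer` is run and destroyed when its session is closed.
>
> In distributed TensorFlow, however, variables live in containers on the cluster, so closing a session does not destroy them; to destroy a variable you must clear its container.

7. What is the difference between a placeholder and a variable?

>- variable: like a variable in ordinary programming. It must be initialized, and its value can be assigned and read.
>- placeholder:
>  - Has no value of its own
>  - Must be fed data through `feed_dict` before it can be used; otherwise an exception is raised
>  - Typically used to feed in training and test data

8. How can you change the value of a variable during the execution phase?

> Add an assign node in the construction phase, for example: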

In [9]:
import tensorflow as tf

x = tf.Variable(tf.random_uniform(shape=(), minval=0.0, maxval=1.0))
x_new_val = tf.placeholder(shape=(), dtype=tf.float32)
x_assign = tf.assign(x, x_new_val)

with tf.Session():
    x.initializer.run()
    print(x.eval())
    x_assign.eval(feed_dict={x_new_val: 5.0})
    print(x.eval())

0.79410625
5.0


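9. Suppose a cost function has 10 variables. How many times does each of reverse-mode autodiff, forward-mode autodiff, and symbolic differentiation need to traverse the graph to compute the gradients?

> To be added

10. Implement Logistic Regression with Mini-batch Gradient Descent in TensorFlow and run it on the moons dataset (chapter 5).

> To do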
