<span style="font-size: 24px; font-weight: bold;">Chapter 9 · Up and Running with TensorFlow</span>

Main goals:
1. Get familiar with TensorFlow
2. Use TensorFlow to build simple machine learning models

# Introduction to TensorFlow

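The basic idea is that you define a graph of computations in Python, and TensorFlow then runs that graph efficiently. TensorFlow can break the graph into several chunks and run them in parallel across multiple CPUs or GPUs.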

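Highlights:
- Runs on many platforms: Windows, Linux, macOS, iOS, and Android
- `tensorflow.contrib.learn` (TF.Learn): a Scikit-Learn-compatible API for training various kinds of neural networks
- `tensorflow.contrib.slim` (TF-slim): simplifies building, training, and evaluating neural networks
- High-level APIs built on top of TensorFlow, such as **Keras** or **Pretty Tensor**
- Flexibility: the main Python API lets you build all sorts of computations
- Efficiency: many ML operations have efficient C++ implementations, and a C++ API lets you define your own high-performance operations
- Optimization nodes that search for the parameters that minimize a cost function. TensorFlow automatically computes the gradients of the functions you define. This is called automatic differentiation (autodiff)
- Visualization tool: TensorBoard
- A cloud service for running TensorFlow graphs
- A strong community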

Comparison of TensorFlow with some other deep learning frameworks:

![tensorflow-and-others](../images/09-tensorflow-and-others.png)

# Tutorial

## Creating your first graph and running it in a session

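```python
# TensorFlow 1.x API. With TensorFlow 2.x, use
# `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`
import tensorflow as tf

x = tf.Variable(3., name="x")
y = tf.Variable(4., name="y")
f = x * x * y + y + 2
```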

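The code above does not perform any computation. It only builds a computation graph, and even the variables are not initialized yet. To evaluate the graph, open a TensorFlow session, use it to initialize the variables, and then evaluate `f`: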

```python
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
print(result)
```

```text
42.0
```

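The computation happens only inside the session. Building `f` computed nothing. The toy pure-Python sketch below imitates that split between building and running. The `Node`, `Const` and `Op` classes are hypothetical and do not reflect how TensorFlow is implemented. The arithmetic operators only record graph nodes, and nothing is computed until `run()` walks the graph.

```python
class Node:
    """A graph node: combining nodes records an operation but computes nothing."""

    def __add__(self, other):
        return Op("add", self, _wrap(other))

    def __radd__(self, other):
        return Op("add", _wrap(other), self)

    def __mul__(self, other):
        return Op("mul", self, _wrap(other))

    def __rmul__(self, other):
        return Op("mul", _wrap(other), self)


class Const(Node):
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value


class Op(Node):
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right

    def run(self):
        # Evaluation happens only here, recursively through the inputs
        a, b = self.left.run(), self.right.run()
        return a + b if self.kind == "add" else a * b


def _wrap(value):
    # Turn plain Python numbers into constant nodes
    return value if isinstance(value, Node) else Const(value)


x = Const(3.0)
y = Const(4.0)
f = x * x * y + y + 2      # builds a graph; no arithmetic yet
print(type(f).__name__)    # Op
print(f.run())             # 42.0
```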

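Inside the `with` block, `sess` is set as the default session, and the session is closed automatically at the end of the block. Calling `x.initializer.run()` is equivalent to calling `tf.get_default_session().run(x.initializer)`.

Likewise, `f.eval()` is equivalent to calling `tf.get_default_session().run(f)`.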

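Instead of initializing every variable manually, you can use `global_variables_initializer()`. It does not perform the initialization immediately. It creates a node in the graph that initializes all the variables when that node is run.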

```python
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    result = f.eval()
print(result)
```

```text
42.0
```


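In `Jupyter` or a `Python Shell`, you may prefer an `InteractiveSession`. The only difference from a regular `Session` is that an `InteractiveSession` sets itself as the default session automatically when it is created, so you don't need a `with` block. You do need to close the session manually when you are done.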

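```python
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
init.run()  # already runs in sess's context; I still prefer `with`, since it converts cleanly to a .py script
result = f.eval()
print(result)
sess.close()  # an InteractiveSession must be closed manually
```

```text
42.0
```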


## Managing graphs

Any node you create is added to the default graph automatically:

```python
x1 = tf.Variable(1)
print(x1.graph is tf.get_default_graph())
```

```text
True
```

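To manage several independent graphs, create a new `Graph` and make it the default graph temporarily inside a `with` block: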

```python
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())
```

```text
True
False
```


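In Jupyter (or a Python shell) you often run the same commands more than once, so the default graph can fill up with duplicate nodes. One fix is to restart the kernel. Another is to reset the default graph with `tf.reset_default_graph()`.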

## Lifecycle of a node value

```python
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3
with tf.Session() as sess:
    print(y.eval())
    print(z.eval())
```

```text
10
15
```


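To evaluate `y`, TensorFlow sees that `y` depends on `x`, which depends on `w`. It therefore evaluates `w`, then `x`, then `y`. To evaluate `z`, it does not reuse the values of `w` and `x` from the previous run. It evaluates them again. All node values are dropped between graph runs. The exception is variable values, which the session keeps across graph runs.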

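To evaluate `y` and `z` efficiently, without evaluating `w` and `x` twice, ask TensorFlow to evaluate both in a single graph run: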

```python
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)
```

```text
10
15
```


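A small pure-Python sketch can mimic the difference between two separate evaluations and one `sess.run([y, z])` call. The `graph` dictionary and `run()` helper are hypothetical and are not TensorFlow code. Each run computes node values from scratch, but within a single run a shared node such as `x` is computed only once. The counter records how many times each node is actually computed.

```python
from collections import Counter

calls = Counter()  # how many times each node was actually computed

# Each node: name -> (function of its input values, list of input names)
graph = {
    "w": (lambda: 3, []),
    "x": (lambda w: w + 2, ["w"]),
    "y": (lambda x: x + 5, ["x"]),
    "z": (lambda x: x * 3, ["x"]),
}


def run(fetches):
    """Evaluate the requested nodes in one 'graph run'.

    Values are cached only for the duration of this call and then discarded,
    mirroring how non-variable node values are dropped between runs.
    """
    cache = {}

    def evaluate(name):
        if name not in cache:
            fn, inputs = graph[name]
            cache[name] = fn(*(evaluate(i) for i in inputs))
            calls[name] += 1
        return cache[name]

    return [evaluate(name) for name in fetches]


print(run(["y"]), run(["z"]))  # [10] [15]  (x computed twice)
print(calls["x"])              # 2
calls.clear()
print(run(["y", "z"]))         # [10, 15]   (x computed once)
print(calls["x"])              # 1
```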
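In single-process TensorFlow:
1. Multiple sessions do not share any state, even if they reuse the same graph. Each session has its own copy of every variable.

In distributed TensorFlow:
1. Variable state is stored on the servers, not in the sessions.
2. Multiple sessions can therefore share the same variables.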

# Linear Regression with TensorFlow

```python
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
# Add a bias input feature x0 = 1 to every instance
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
# Normal Equation: theta = (X^T X)^-1 X^T y
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)
with tf.Session() as sess:
    theta_val = theta.eval()
print(theta_val)
```

```text
[[-3.7185181e+01]
 [ 4.3633747e-01]
 [ 9.3952334e-03]
 [-1.0711310e-01]
 [ 6.4479220e-01]
 [-4.0338000e-06]
 [-3.7813708e-03]
 [-4.2348403e-01]
 [-4.3721911e-01]]
```


[Solving for $\theta$ with least squares (the Normal Equation)](https://blog.csdn.net/akon_wang_hkbu/article/details/77503725)

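The TensorFlow cell above evaluates the Normal Equation $\hat{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$. The pure-Python sketch below checks the formula on a tiny made-up dataset generated from $y = 2 + 3x$. The data is hypothetical and is not the housing data. With one feature plus the bias column, $\mathbf{X}^T \mathbf{X}$ is a 2×2 matrix, so it can be inverted by hand.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data from y = 2 + 3x, with a bias column of 1s prepended
xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]
y = [[2.0 + 3.0 * x] for x in xs]

XT = transpose(X)
theta = matmul(matmul(inverse_2x2(matmul(XT, X)), XT), y)
print(theta)  # approximately [[2.0], [3.0]]
```

Because the data lies exactly on a line, the recovered $\hat{\theta}$ equals the true intercept and slope, up to floating-point rounding.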
## Implementing gradient descent

### Manually computing the gradients

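- `random_uniform()` creates a node in the graph that generates a tensor of random values, given its shape and value range (much like NumPy's `rand()` function).
- `assign()` creates a node that assigns a new value to a variable. Here it implements the batch gradient descent step $\theta^{(\text{next step})} = \theta - \eta \nabla_\theta \text{MSE}(\theta)$, with $\nabla_\theta \text{MSE}(\theta) = \frac{2}{m} \mathbf{X}^T (\mathbf{X} \theta - \mathbf{y})$.
- The main loop runs the training step over and over (`epochs` times). Every 100 epochs it prints the current mean squared error (MSE).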

```python
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
# Scale only the features, then add the bias column of 1s.
# Scaling the bias column too would turn it into all zeros.
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2. / m * tf.matmul(tf.transpose(X), error)  # gradient of the MSE
training_op = tf.assign(theta, theta - learning_rate * gradients)  # one GD step

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
print(best_theta)
```

The run shown below was produced with `scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)`, which scales the bias column as well. `StandardScaler` maps that constant column of 1s to all zeros. As a result, $\theta_0$ never receives a gradient (it keeps its random initial value, 0.98 here) and the MSE plateaus around 4.8. Scaling only the features and adding the bias column afterwards, as in the code above, avoids this problem.

```text
Epoch 0 MSE = 8.138382
Epoch 100 MSE = 5.1309924
Epoch 200 MSE = 4.993041
Epoch 300 MSE = 4.9375887
Epoch 400 MSE = 4.9000688
Epoch 500 MSE = 4.8731575
Epoch 600 MSE = 4.853745
Epoch 700 MSE = 4.839729
Epoch 800 MSE = 4.829614
Epoch 900 MSE = 4.8223033
[[ 0.98489666]
 [ 0.77840763]
 [ 0.15502639]
 [-0.08296237]
 [ 0.11699436]
 [ 0.00935718]
 [-0.04130271]
 [-0.6835345 ]
 [-0.64407814]]
```

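The update performed by `training_op` can be reproduced in pure Python. The sketch below uses hypothetical toy data on the line $y = 2 + 3x$, and no feature scaling is needed at this size. It applies $\theta \leftarrow \theta - \eta \cdot \frac{2}{m} \mathbf{X}^T (\mathbf{X}\theta - \mathbf{y})$, the same expression as the `gradients` and `training_op` lines above, for a fixed number of epochs.

```python
def batch_gradient_descent(X, y, learning_rate=0.1, n_epochs=1000):
    m, n = len(X), len(X[0])
    theta = [0.0] * n  # start from zeros (not random) so the result is reproducible
    for _ in range(n_epochs):
        # error = X . theta - y
        errors = [sum(xi * ti for xi, ti in zip(row, theta)) - yi
                  for row, yi in zip(X, y)]
        # gradients = 2/m * X^T . error
        gradients = [2.0 / m * sum(row[j] * e for row, e in zip(X, errors))
                     for j in range(n)]
        theta = [t - learning_rate * g for t, g in zip(theta, gradients)]
    return theta


xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]          # bias column + one feature
y = [2.0 + 3.0 * x for x in xs]
theta = batch_gradient_descent(X, y)
print(theta)  # close to [2.0, 3.0]
```

Every step uses the whole training set, which is what makes this *batch* gradient descent.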

### Using autodiff

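The previous code works, but you have to derive the gradients from the cost function by hand. With autodiff, TensorFlow computes them for you. Simply replace the `gradients = ...` line with: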

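```python
gradients = tf.gradients(mse, [theta])[0]
```

`tf.gradients()` takes an op (here `mse`) and a list of variables. It creates a list of ops, one per variable, that compute the gradients of the op with respect to each variable, and `[0]` selects the gradient with respect to `theta`. TensorFlow uses reverse-mode autodiff, which is efficient and accurate when there are many inputs and few outputs, as is often the case in neural networks.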

![tensorflow-autodiff](../images/09-tensorflow-autodiff.png)
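To show what reverse-mode autodiff does, here is a minimal pure-Python sketch. The `Var` class is hypothetical and much simpler than TensorFlow's implementation. The forward pass computes values and records the local derivative of each operation. The backward pass then applies the chain rule from the output back to every input in a single sweep. The example differentiates $f(x, y) = x^2 y + y + 2$ from the start of the chapter.

```python
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent Var, local derivative d(self)/d(parent)) pairs
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    __rmul__ = __mul__

    def backward(self):
        # Topological order: every node appears after the nodes it depends on
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse sweep: push each node's gradient to its parents (chain rule)
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad


x = Var(3.0)
y = Var(4.0)
f = x * x * y + y + 2
f.backward()
print(f.value, x.grad, y.grad)  # 42.0 24.0 10.0
```

The results match the analytic derivatives $\partial f / \partial x = 2xy = 24$ and $\partial f / \partial y = x^2 + 1 = 10$. One backward pass gives the gradient with respect to every input. This is why reverse mode suits cost functions, which have many inputs and a single output.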

### Using an optimizer

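TensorFlow provides several ready-to-use optimizers, including a gradient descent optimizer. Simply replace the preceding `gradients = ...` and `training_op = ...` lines with: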

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
```

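Using a different optimizer only requires changing the optimizer line. For example, a momentum optimizer often converges much faster than plain gradient descent:

```python
momentum_optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = momentum_optimizer.minimize(mse)
```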

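The TensorFlow 1.x documentation describes the `MomentumOptimizer` update as `accumulation = momentum * accumulation + gradient; variable -= learning_rate * accumulation`. The pure-Python sketch below applies that rule and plain gradient descent to the hypothetical toy cost $J(\theta) = (\theta - 3)^2$. Both use the same learning rate and number of steps, which shows how accumulating past gradients speeds up progress.

```python
def grad(theta):
    # Gradient of the toy cost J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, learning_rate, n_steps):
    for _ in range(n_steps):
        theta -= learning_rate * grad(theta)
    return theta


def momentum_descent(theta, learning_rate, momentum, n_steps):
    accumulation = 0.0
    for _ in range(n_steps):
        accumulation = momentum * accumulation + grad(theta)
        theta -= learning_rate * accumulation
    return theta


theta_gd = gradient_descent(0.0, learning_rate=0.01, n_steps=100)
theta_mom = momentum_descent(0.0, learning_rate=0.01, momentum=0.9, n_steps=100)
print(abs(theta_gd - 3.0), abs(theta_mom - 3.0))  # momentum ends much closer to 3
```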
## Feeding data to the training algorithm

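**Mini-batch gradient descent**

Mini-batch gradient descent replaces `X` and `y` with the next mini-batch at every iteration. The simplest way to do this is with placeholder nodes.

A placeholder does not perform any computation. It just outputs the data you tell it to output at runtime, which you pass through the `feed_dict` argument of `eval()` or `run()`. Using `None` for a dimension of the shape means "any size", so `A` below accepts any number of rows of 3 values.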

```python
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
print(B_val_1)
print(B_val_2)
```

```text
[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]
```

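A placeholder gets its value only from the data fed at run time. The pure-Python sketch below mirrors the `B = A + 5` example using hypothetical `Placeholder` and `AddConst` classes, which are not TensorFlow's API. The graph is built once, and each run supplies a different value through a `feed_dict`-style dictionary. Running without feeding the placeholder is an error, as it is in TensorFlow.

```python
class Placeholder:
    def __init__(self, name):
        self.name = name

    def run(self, feed_dict):
        if self not in feed_dict:
            raise ValueError("You must feed a value for placeholder " + repr(self.name))
        return feed_dict[self]


class AddConst:
    """Node computing input + constant, element-wise over a list of rows."""

    def __init__(self, input_node, constant):
        self.input_node, self.constant = input_node, constant

    def run(self, feed_dict):
        rows = self.input_node.run(feed_dict)
        return [[v + self.constant for v in row] for row in rows]


A = Placeholder("A")
B = AddConst(A, 5)
print(B.run({A: [[1, 2, 3]]}))             # [[6, 7, 8]]
print(B.run({A: [[4, 5, 6], [7, 8, 9]]}))  # [[9, 10, 11], [12, 13, 14]]
```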

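Implementing mini-batch gradient descent: first redefine `X` and `y` as placeholders, and set the batch size and the number of batches per epoch: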

```python
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

batch_size = 100
n_batches = int(np.ceil(m * 1. / batch_size))
```

```python
# Outline of the training code that needs to change
# (the complete, runnable version is in the next cell)
# def fetch_batch(epoch, batch_index, batch_size):
#     ...  # load the next mini-batch
#     return X_batch, y_batch
#
# with tf.Session() as sess:
#     sess.run(init)
#     for epoch in range(n_epochs):
#         for batch_index in range(n_batches):
#             X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
#             sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
#     best_theta = theta.eval()
```

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
print("Dataset: {} rows, {} columns".format(m, n))
# Scale the features first, then add the bias column of 1s
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # reproducible batches
    indices = np.random.randint(m, size=batch_size)  # random rows, with replacement
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()
print(best_theta)
```

```text
Dataset: 20640 rows, 8 columns
[[ 2.0714476 ]
 [ 0.8462012 ]
 [ 0.11558535]
 [-0.26835832]
 [ 0.32982782]
 [ 0.00608358]
 [ 0.07052915]
 [-0.87988573]
 [-0.8634251 ]]
```

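The `fetch_batch()` function above draws `batch_size` random row indices (with replacement). It seeds the generator with `epoch * n_batches + batch_index`, so every run sees the same batches. The pure-Python sketch below follows the same scheme with the standard `random` module. It runs on hypothetical toy data on the line $y = 2 + 3x$ and takes one gradient step per mini-batch. It illustrates mini-batch gradient descent and is not a drop-in replacement for the TensorFlow version.

```python
import math
import random

xs = [i / 10 for i in range(50)]                 # hypothetical toy data
X = [[1.0, x] for x in xs]                       # bias column + one feature
y = [2.0 + 3.0 * x for x in xs]
m = len(X)

batch_size = 8
n_batches = math.ceil(m / batch_size)            # 7 batches per epoch
learning_rate = 0.02
n_epochs = 500


def fetch_batch(epoch, batch_index, batch_size):
    # Same seeding idea as the TensorFlow example: reproducible batches
    rng = random.Random(epoch * n_batches + batch_index)
    indices = [rng.randrange(m) for _ in range(batch_size)]  # with replacement
    return [X[i] for i in indices], [y[i] for i in indices]


theta = [0.0, 0.0]
for epoch in range(n_epochs):
    for batch_index in range(n_batches):
        X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
        errors = [row[0] * theta[0] + row[1] * theta[1] - yi
                  for row, yi in zip(X_batch, y_batch)]
        gradients = [2.0 / batch_size * sum(row[j] * e for row, e in zip(X_batch, errors))
                     for j in range(2)]
        theta = [t - learning_rate * g for t, g in zip(theta, gradients)]
print(theta)  # close to [2.0, 3.0]
```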

## Saving and restoring models

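Create a `Saver` node at the end of the construction phase, after all the variable nodes have been created. Then, in the execution phase, call its `save()` method whenever you want to save the model, passing it the session and the path of the checkpoint file.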

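```python
saver = tf.train.Saver()  # create after all variable nodes are defined
# Save (in the execution phase, e.g. at the end of training):
# save_path = saver.save(sess, "/tmp/my_model_final.ckpt")
# Restore (at the start of the execution phase, instead of running init):
# saver.restore(sess, "/tmp/my_model_final.ckpt")
# Specify which variables to save or restore, and the names to use:
# saver = tf.train.Saver({"weights": theta})
```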

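## Using TensorBoard

The code below writes the graph definition and the training MSE (every 10 mini-batches) to a log directory whose name includes a timestamp, so each run gets its own directory. The lines marked `# add` are new. Afterwards, start TensorBoard with `tensorboard --logdir /tmp` and open `http://localhost:6006` in a browser.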

```python
from datetime import datetime  # add

tf.reset_default_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
print("Dataset: {} rows, {} columns".format(m, n))
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")  # add
root_logdir = r"/tmp"                              # add
logdir = "{}/run-{}/".format(root_logdir, now)     # add: one log directory per run
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()
mse_summary = tf.summary.scalar('MSE', mse)                          # add
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())  # add

batch_size = 100
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})  # add
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)                          # add
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()
    file_writer.close()
print(best_theta)
```

```text
Dataset: 20640 rows, 8 columns
[[ 2.0714476 ]
 [ 0.8462012 ]
 [ 0.11558535]
 [-0.26835832]
 [ 0.32982782]
 [ 0.00608358]
 [ 0.07052915]
 [-0.87988573]
 [-0.8634251 ]]
```


# Name scopes

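With more complex models such as neural networks, the graph can easily become cluttered with thousands of nodes. To avoid this, create name scopes that group related nodes. For example, put the `error` and `mse` ops inside a name scope called `"loss"`. The name of every op defined inside the scope is then prefixed with `loss/`: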

```python
with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")
```

```python
print(error.op.name)
print(mse.op.name)
```

```text
loss/sub
loss/mse
```


# Modularity

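ReLU (rectified linear unit): it computes a linear function of its inputs and outputs the result if it is positive, and 0 otherwise:

$$h_{\mathbf{w}, b}(\mathbf{X}) = \max(\mathbf{X} \cdot \mathbf{w} + b, 0)$$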

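The formula can be checked with a tiny pure-Python version of a single ReLU unit, together with the sum of two ReLUs. The weights and inputs here are made-up values and do not come from the chapter.

```python
def relu_unit(X, w, b):
    """h_{w,b}(X) = max(X . w + b, 0), applied to each row (instance) of X."""
    return [max(sum(xi * wi for xi, wi in zip(row, w)) + b, 0.0) for row in X]


X = [[1.0, 2.0, 3.0],
     [-1.0, -2.0, -3.0]]
relu1 = relu_unit(X, w=[0.5, 0.5, 0.5], b=0.0)    # [3.0, 0.0]
relu2 = relu_unit(X, w=[-1.0, 0.0, 1.0], b=1.0)   # [3.0, 0.0]
output = [a + b for a, b in zip(relu1, relu2)]
print(output)  # [6.0, 0.0]
```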
Suppose you want a graph that adds the outputs of two ReLUs. The following code does the job, but it is quite repetitive:

```python
tf.reset_default_graph()
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")
z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")
relu1 = tf.maximum(z1, 0, name="relu1")
relu2 = tf.maximum(z2, 0, name="relu2")
output = tf.add(relu1, relu2, name="output")
```

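Never write repetitive code like this: it is hard to maintain and error-prone. TensorFlow lets you stay DRY (Don't Repeat Yourself). Simply write a function that builds a ReLU. The code below then creates five ReLUs and outputs their sum. `add_n()` creates an operation that computes the sum of a list of tensors.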

```python
def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0, name="relu")
```

```python
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for _ in range(5)]
output = tf.add_n(relus, name="output")
```

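A name scope makes the graph much clearer, because all the nodes of each ReLU are grouped under their own scope. When you create a node, TensorFlow checks whether its name already exists. If it does, TensorFlow appends an underscore followed by an index to make the name unique, so the five ReLUs end up in scopes named `relu`, `relu_1`, ..., `relu_4`.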

```python
def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, 0, name="relu")
```

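A small pure-Python name registry can sketch how names such as `loss/mse` and `relu_1/z` come about. `NameRegistry` is a hypothetical model of the behaviour described above, not TensorFlow's code. It prefixes names with the current scope path and appends `_1`, `_2`, ... to names that are already taken.

```python
from contextlib import contextmanager


class NameRegistry:
    def __init__(self):
        self.used = set()
        self.scopes = []

    def unique(self, name):
        # Append _1, _2, ... if the full name is already taken
        candidate, i = name, 0
        while candidate in self.used:
            i += 1
            candidate = "{}_{}".format(name, i)
        self.used.add(candidate)
        return candidate

    def node_name(self, name):
        prefix = "/".join(self.scopes)
        return self.unique(prefix + "/" + name if prefix else name)

    @contextmanager
    def name_scope(self, scope):
        prefix = "/".join(self.scopes)
        full = self.unique(prefix + "/" + scope if prefix else scope)
        # Keep only the last path component so nested scopes join correctly
        self.scopes.append(full.split("/")[-1])
        try:
            yield
        finally:
            self.scopes.pop()


g = NameRegistry()
names = []
for _ in range(3):
    with g.name_scope("relu"):
        names.append(g.node_name("z"))
print(names)  # ['relu/z', 'relu_1/z', 'relu_2/z']
```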
# Sharing variables

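To share a variable between several components of the graph, use `get_variable()`. It creates the shared variable if it does not exist yet, or reuses it if it does. The `reuse` attribute of the current `variable_scope()` controls whether it creates or reuses:

```python
with tf.variable_scope("relu", reuse=False) as scope:
    # scope.reuse_variables()  # would switch this scope to reuse=True
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
```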

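For the first call, `reuse` must be False (the default for a new top-level scope). `get_variable()` then creates the variable, using the initializer to set its initial value, and raises an exception if the variable already exists. For every later call, set `reuse=True`, either with `tf.variable_scope("relu", reuse=True)` or by calling `scope.reuse_variables()` inside the scope. `get_variable()` then fetches the existing `relu/threshold` variable, and raises an exception if it does not exist.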
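The create-or-reuse rules can be summarized with a pure-Python registry. `VariableStore` is hypothetical and only models the rules described above, not TensorFlow's implementation. With `reuse=False` the variable must not exist yet. With `reuse=True` it must already exist.

```python
class VariableStore:
    def __init__(self):
        self.variables = {}

    def get_variable(self, scope, name, reuse, initial_value=0.0):
        full_name = scope + "/" + name
        if reuse:
            # Reuse mode: the variable must already exist
            if full_name not in self.variables:
                raise ValueError("Variable {} does not exist".format(full_name))
            return self.variables[full_name]
        # Creation mode: the variable must not exist yet
        if full_name in self.variables:
            raise ValueError("Variable {} already exists".format(full_name))
        self.variables[full_name] = {"name": full_name, "value": initial_value}
        return self.variables[full_name]


store = VariableStore()
threshold = store.get_variable("relu", "threshold", reuse=False)   # creates it
same = store.get_variable("relu", "threshold", reuse=True)         # fetches it
print(same is threshold)  # True
```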