# 线性回归进阶

## 2.1 自动计算梯度  
  
|方法         |精确度         |是否支持任意代码      |备注               |   
|----------  |:----------:  |:----------:       |-----:             |    
|数值微分      |低            |是                 |实现琐碎            |  
|符号微分      |高            |否                 |会构建一个完全不同的图 |
|前向自动微分   |高            |是                 |基于二元树          |
|反向自动微分   |高            |是                 |由Tensorflow实现    |

<center>表2-1.自动计算梯度的主要方法</center>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TensorFlow是利用反向自动微分来实现自动计算梯度。

导入相关包

In [12]:
import tensorflow as tf
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
import datetime

下载并整理数据

In [13]:
housing = fetch_california_housing()
m,n = housing.data.shape # 获取数据的行列数
housing_data_plus_bias = np.c_[np.ones((m,1)),housing.data] # 为数据添加偏差项，即添加y=ax+b中的b

In [14]:
# 数据预处理
scaler = StandardScaler().fit(housing_data_plus_bias)
scaled_housing_data_plus_bias = scaler.transform(housing_data_plus_bias)

In [11]:
# 创建计算图（一）
# 数据转换为常量
# 设置各种参数
n_epochs = 1000
global_learning_rate = 0.01
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name="y") # 数据标签
XT = tf.transpose(X)
theta = tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0),name="theta")     # 参数
y_pred =  tf.matmul(X, theta, name="prediction")                          # 预测值
error = y_pred-y                                                          # 误差
mse = tf.reduce_mean(tf.square(error), name="mse")                        # 均方误差(成本函数)
gradient = tf.gradients(mse, [theta])[0]                                  # 使用反向自动微分计算梯度
training_op = tf.assign(theta, theta-global_learning_rate*gradient)       # 训练

## 2.2 使用优化器 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;在TensorFLow中不仅可以自动求得梯度，还可以调用各种优化器对参数$\theta$进行优化。调用优化器时，只需在构建计算图步骤时更改对`training_op=...`赋值的语句即可，详情如下：

In [15]:
# 创建计算图（二）
n_epochs = 1000
global_learning_rate = 0.01
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name="y") # 数据标签
XT = tf.transpose(X)
theta = tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0),name="theta")     # 参数
y_pred =  tf.matmul(X, theta, name="prediction")                          # 预测值
error = y_pred-y                                                          # 误差
mse = tf.reduce_mean(tf.square(error), name="mse")                        # 均方误差(成本函数)
# gradient = tf.gradients(mse, [theta])[0]                                  # 使用反向自动微分计算梯度
# 调用特定的优化器对参数进行优化
## 定义优化器(梯度下降)
# optimizer = tf.train.GradientDescentOptimizer(learning_rate = global_learning_rate)
## 定义优化器（动量）
optimizer = tf.train.MomentumOptimizer(learning_rate = global_learning_rate, momentum = 0.9)
training_op = optimizer.minimize(mse)       

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;创建会话，运行计算图，获得并观察结果

In [16]:
init = tf.global_variables_initializer()                                  # 添加初始化节点

starttime = datetime.datetime.now()
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):                                       # 逐步训练
        if epoch%100==0:
            print("Epoch:", epoch, "MSE=", mse.eval())                    # 每一步均方误差
        sess.run(training_op)                                             # 执行每一步训练，更新梯度
        
    best_theta = theta.eval()                                             # 训练完毕，返回最佳参数
    print("The best theta is", best_theta)
endtime = datetime.datetime.now()
print("The running time:", (endtime - starttime))

Epoch: 0 MSE= 10.322025
Epoch: 100 MSE= 4.806012
Epoch: 200 MSE= 4.803412
Epoch: 300 MSE= 4.803272
Epoch: 400 MSE= 4.8032565
Epoch: 500 MSE= 4.8032546
Epoch: 600 MSE= 4.8032537
Epoch: 700 MSE= 4.8032537
Epoch: 800 MSE= 4.8032537
Epoch: 900 MSE= 4.8032537
The best theta is [[ 0.18474627]
 [ 0.82961535]
 [ 0.118751  ]
 [-0.26551932]
 [ 0.30569002]
 [-0.00450321]
 [-0.03932611]
 [-0.8998956 ]
 [-0.87055033]]
The running time: 0:00:00.346600


## 2.3 给训练算法提供数据
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;要实现最小批量梯度下降算法，需要每次训练时用小批量替换输入数据X和y。可以添加一个占位符节点执行该替换操作。它不进行任何计算，只在运行时输出需要输出的值。 

创建计算图（三），定义占位符节点，设置各种参数

In [9]:
X = tf.placeholder(tf.float32, shape=(None, n+1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
n_epochs = 1000
batch_size = 100
n_batches= int(np.ceil(m/batch_size))
global_learning_rate = 0.01
XT = tf.transpose(X)
theta = tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0),name="theta")     # * 参数 seed=42
y_pred =  tf.matmul(X, theta, name="prediction")                          # 预测值
error = y_pred-y                                                          # 误差
mse = tf.reduce_mean(tf.square(error), name="mse")                        # 均方误差(成本函数)
# 手工计算梯度
## 使用反向自动微分计算梯度
# * gradient = tf.gradients(mse, [theta])[0]                                  
# 调用特定的优化器对参数进行优化
## 定义优化器(梯度下降)
optimizer = tf.train.GradientDescentOptimizer(learning_rate = global_learning_rate)
## * 定义优化器（动量）由于使用批量梯度下降算法，所以不可以使用动量优化器，否则会报错
# optimizer = tf.train.MomentumOptimizer(learning_rate = global_learning_rate, momentum = 0.9)
training_op = optimizer.minimize(mse)       

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;创建会话，运行计算图，获得并观察结果

In [10]:
init = tf.global_variables_initializer()                                  # 添加初始化节点

def fetch_batch(epoch, batch_index, batch_size):
    # ？？？？
    np.random.seed(epoch * n_batches + batch_index) 
    indices = np.random.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices] 
    y_batch = housing.target.reshape(-1, 1)[indices] 
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X:X_batch, y:y_batch})
    best_theta = theta.eval()
    print("The best theta is", best_theta)

The best theta is [[ 0.4020946 ]
 [ 0.8377844 ]
 [ 0.10645497]
 [-0.25947902]
 [ 0.29196444]
 [ 0.00181689]
 [ 0.2128084 ]
 [-0.89034677]
 [-0.85242176]]
