# Stochastic Gradient Descent in scikit-learn
See our own implementation: [code](playML/LinearRegression.py)

A more principled approach: visit every sample once per pass, so that the information in all samples is taken into account.


`np.random.permutation(m)` returns the integers 0 to m-1 in random order; indexing with it shuffles the samples.
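
As a small illustration of the shuffling step, the sketch below reorders a toy `X` and `y` with the same permutation. The toy arrays and the seed are assumptions made for this example only; they are not part of the original notebook.

```python
import numpy as np

np.random.seed(0)                     # fixed seed so the run is repeatable
m = 5
X = np.arange(m * 2).reshape(m, 2)    # toy features: row i is [2*i, 2*i + 1]
y = np.arange(m) * 10                 # toy labels: label of row i is 10*i

indexes = np.random.permutation(m)    # the integers 0..m-1 in random order
X_new = X[indexes, :]                 # rows reordered
y_new = y[indexes]                    # labels reordered the same way

# Every index appears exactly once, and each row still has its own label.
print(sorted(indexes.tolist()))
print(np.all(X_new[:, 0] * 5 == y_new))
```

Because `X` and `y` are indexed with the same `indexes` array, the pairing between a sample and its label survives the shuffle.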

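### The actual implementation is fairly involved

Each update of $\theta$ still uses the gradient term of a single sample; this premise does not change.

Improvement: within each pass, every update of $\theta$ uses a different sample, until every sample has been used once. One pass therefore updates $\theta$ $m$ times.

This takes the information in all `samples` into account.

The iteration count now means how many passes to make over the whole `sample` set.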
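
The per-sample update can be written out as follows. This is a sketch inferred from the `dJ_sgd` and `learning_rate` helpers inside `fit_sgd`; the symbols $\eta$, $t$, and $k$ are notation introduced here, not names from the original code.

$$
\nabla_\theta \left( X_b^{(i)} \theta - y^{(i)} \right)^2 = 2 \left( X_b^{(i)} \theta - y^{(i)} \right) X_b^{(i)}
$$

$$
\theta \leftarrow \theta - \eta(t) \cdot 2 \left( X_b^{(i)} \theta - y^{(i)} \right) X_b^{(i)}, \qquad \eta(t) = \frac{t_0}{t + t_1}, \qquad t = k \cdot m + i
$$

Here $k$ is the index of the current pass (`i_iter`) and $i$ is the position inside the shuffled pass, so the learning rate keeps decaying across passes instead of restarting at each one.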


In [2]:
import numpy as np
import matplotlib.pyplot as plt

In [3]:
m = 10000

x = np.random.normal(size=m)
X = x.reshape(-1, 1)
y = 4.*x + 3. + np.random.normal(0, 3, size=m)

In [3]:
from playML.LinearRegression import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit_sgd(X, y, n_iters=2)

LinearRegression()

In [4]:
lin_reg.coef_

array([4.03882587])

In [5]:
lin_reg.intercept_

3.0337469167769564

In [None]:
# Stochastic gradient descent
def fit_sgd(self, X_train, y_train, n_iters=50, t0=5, t1=50):
    """Train the Linear Regression model on X_train, y_train using stochastic gradient descent."""
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must be equal to the size of y_train"
    assert n_iters >= 1

    def dJ_sgd(theta, X_b_i, y_i):
        return X_b_i * (X_b_i.dot(theta) - y_i) * 2.

    def sgd(X_b, y, initial_theta, n_iters=5, t0=5, t1=50):

        def learning_rate(t):
            return t0 / (t + t1)

        theta = initial_theta
        m = len(X_b)

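        # Principled approach: each pass visits every sample exactly once, so all the information is used
        for i_iter in range(n_iters):
            indexes = np.random.permutation(m) # random ordering of the sample indices
            X_b_new = X_b[indexes,:] # samples in shuffled order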
            y_new = y[indexes]
            for i in range(m):
                gradient = dJ_sgd(theta, X_b_new[i], y_new[i])
                theta = theta - learning_rate(i_iter * m + i) * gradient

        return theta

    X_b = np.hstack([np.ones((len(X_train), 1)), X_train]) # prepend a column of ones for the intercept
    initial_theta = np.random.randn(X_b.shape[1])
    self._theta = sgd(X_b, y_train, initial_theta, n_iters, t0, t1)

    self.intercept_ = self._theta[0]
    self.coef_ = self._theta[1:]

    return self

<br><br>

## Using our own SGD on real data (Boston housing prices)

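With only a few passes, the result depends heavily on the random sample order.

With more passes, the result becomes more stable.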
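
To see this effect in isolation, the sketch below repeats the same SGD loop with several seeds for a small and a larger number of passes and prints the spread of the fitted slope. `sgd_fit` is a hypothetical standalone function written for this example (it mirrors the logic of `fit_sgd` but is not the `playML` code), and the synthetic data stands in for the housing data.

```python
import numpy as np

def sgd_fit(X_b, y, n_iters, t0=5, t1=50, seed=0):
    """Hypothetical standalone SGD for linear regression; returns theta."""
    rng = np.random.RandomState(seed)
    m = len(X_b)
    theta = rng.randn(X_b.shape[1])            # random starting point
    for i_iter in range(n_iters):
        indexes = rng.permutation(m)           # new sample order for each pass
        for i, idx in enumerate(indexes):
            x_i, y_i = X_b[idx], y[idx]
            gradient = 2.0 * x_i * (x_i.dot(theta) - y_i)
            # decaying learning rate t0 / (t + t1), with t counted across passes
            theta = theta - t0 / (i_iter * m + i + t1) * gradient
    return theta

# Synthetic data (an assumption for this sketch): y = 4x + 3 + noise
data_rng = np.random.RandomState(666)
m = 2000
x = data_rng.normal(size=m)
y = 4.0 * x + 3.0 + data_rng.normal(0, 3, size=m)
X_b = np.column_stack([np.ones(m), x])         # column of ones for the intercept

for n_iters in (1, 20):
    slopes = [sgd_fit(X_b, y, n_iters, seed=s)[1] for s in range(5)]
    print(n_iters, np.round(slopes, 3), "spread:", np.round(np.std(slopes), 4))

theta_many = sgd_fit(X_b, y, n_iters=20, seed=0)
```

One would expect the spread across seeds to be smaller for the larger number of passes, since the learning rate has decayed further by the end of training.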

In [4]:
from sklearn import datasets

boston = datasets.load_boston()  # load_boston was removed in scikit-learn 1.2; this cell needs an older release
X = boston.data
y = boston.target

X = X[y<50.0]
y = y[y<50.0]

In [5]:
from playML.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)

In [6]:
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()

standardScaler.fit(X_train)
X_train_standard = standardScaler.transform(X_train)
X_test_standard = standardScaler.transform(X_test)

In [13]:
from playML.LinearRegression import LinearRegression

lin_reg = LinearRegression()
%time lin_reg.fit_sgd(X_train_standard, y_train, n_iters=2)
lin_reg.score(X_test_standard, y_test)

CPU times: user 10.7 ms, sys: 1.31 ms, total: 12 ms
Wall time: 12.5 ms


0.7538905076039513

In [17]:
from playML.LinearRegression import LinearRegression

lin_reg = LinearRegression()
%time lin_reg.fit_sgd(X_train_standard, y_train, n_iters=50)
lin_reg.score(X_test_standard, y_test)

CPU times: user 141 ms, sys: 15.1 ms, total: 156 ms
Wall time: 145 ms


0.8106749214200866

In [22]:
from playML.LinearRegression import LinearRegression

lin_reg = LinearRegression()
%time lin_reg.fit_sgd(X_train_standard, y_train, n_iters=100)
lin_reg.score(X_test_standard, y_test)

CPU times: user 245 ms, sys: 7.8 ms, total: 253 ms
Wall time: 252 ms


0.813282692448816

<br><br>

## SGD in scikit-learn

It includes many optimizations, and its implementation is more complex.

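Parameter `n_iter`: the number of passes over the samples; the default is 5 in the scikit-learn release used here. Later releases replaced `n_iter` with `max_iter`.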

Source code: [linear_model/stochastic_gradient.py](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/stochastic_gradient.py)

In [24]:
# Only handles linear models
from sklearn.linear_model import SGDRegressor

In [29]:
sgd_reg = SGDRegressor()
%time sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)

CPU times: user 3.04 ms, sys: 4.13 ms, total: 7.16 ms
Wall time: 4.69 ms




0.805097826144864

In [25]:
sgd_reg = SGDRegressor(n_iter=100)
%time sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)

CPU times: user 7.98 ms, sys: 3.44 ms, total: 11.4 ms
Wall time: 44.7 ms




0.812996392314848
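
The cells above pass `n_iter`, which matches the release the notebook was run on. For more recent scikit-learn releases, a minimal sketch of the same idea uses `max_iter` together with the stopping tolerance `tol`. The synthetic data, the seed, and the chosen parameter values are assumptions for this example, so the printed score is not comparable with the recorded outputs above.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): y = 4x + 3 + noise
rng = np.random.RandomState(666)
m = 1000
X = rng.normal(size=(m, 1))
y = 4.0 * X[:, 0] + 3.0 + rng.normal(0, 1, size=m)

# SGD is sensitive to feature scale, so standardize first
X_standard = StandardScaler().fit_transform(X)

# max_iter caps the number of passes; training may stop earlier once
# the loss improves by less than tol
sgd_reg = SGDRegressor(max_iter=100, tol=1e-3, random_state=666)
sgd_reg.fit(X_standard, y)

r2 = sgd_reg.score(X_standard, y)   # R^2 on the training data
print(r2)
```

With `tol` set, `max_iter` is an upper bound rather than an exact count; the number of passes actually performed is available afterwards as `sgd_reg.n_iter_`.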