# Ridge Regression

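Ridge regression is linear regression with L2 regularization. It differs from ordinary linear regression only in that an L2 penalty term is added to the loss function.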

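The Ridge regression loss function is:

$$
    J(\theta) = \frac{1}{2}{(X\theta-Y)^T}(X\theta-Y) + \frac{1}{2}\alpha{\parallel \theta \parallel_2^2}
$$

Here $\alpha$ is a constant coefficient (the regularization strength) that needs to be tuned, and $\parallel \theta \parallel_2$ is the L2 norm of $\theta$.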

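The algorithm needs a suitable value for the hyperparameter $\alpha$; for that $\alpha$, it then finds the $\theta$ that minimizes $J(\theta)$. The minimization can generally be solved with gradient descent or with least squares. scikit-learn uses the least-squares approach.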
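To make the least-squares route concrete: setting the gradient of $J(\theta)$ to zero gives $(X^TX + \alpha I)\theta = X^TY$. The following is a minimal NumPy sketch of that closed form on synthetic data. The helper name `ridge_closed_form` and the variables `X_demo`, `y_demo` are made up for illustration, the sketch has no intercept term, and it is not a description of how scikit-learn solves the problem internally.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: solve (X^T X + alpha * I) theta = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    # Solving the linear system is more stable than inverting A explicitly.
    return np.linalg.solve(A, X.T @ y)

# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_theta + 0.1 * rng.normal(size=50)

theta_ridge = ridge_closed_form(X_demo, y_demo, alpha=1.0)
theta_ols = ridge_closed_form(X_demo, y_demo, alpha=0.0)  # alpha = 0 is plain least squares

print(theta_ridge)
print(theta_ols)
```

With `alpha=0.0` the same function reduces to ordinary least squares, which makes it easy to compare the two solutions side by side.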

In [1]:
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

import matplotlib.pyplot as plt
plt.style.use("ggplot")
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['font.sans-serif'] = ['SimHei']
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
data = pd.read_excel('Folds5x2_pp.xlsx', sheet_name=0)

In [3]:
data.info()
data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9568 entries, 0 to 9567
Data columns (total 5 columns):
AT    9568 non-null float64
V     9568 non-null float64
AP    9568 non-null float64
RH    9568 non-null float64
PE    9568 non-null float64
dtypes: float64(5)
memory usage: 373.8 KB


|   | AT | V | AP | RH | PE |
|---|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2 | 5.11 | 39.4 | 1012.16 | 92.14 | 488.56 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4 | 10.82 | 37.5 | 1009.23 | 96.62 | 473.9 |


In [4]:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(data.drop('PE', axis=1), data['PE'], test_size=0.3, random_state=0)

In [5]:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [6]:
ridge.intercept_
ridge.coef_

448.55315321721673

array([-1.96429059, -0.24011045,  0.06801806, -0.15644825])

Here $\alpha = 1$ is only an assumption. The optimal $\alpha$ is not known in advance and has to be chosen from a set of candidate values.

scikit-learn provides an API, `RidgeCV`, that selects the best $\alpha$ by cross-validation.

In [7]:
from sklearn.linear_model import RidgeCV
ridgecv = RidgeCV(alphas=[0.01, 0.1, 0.5, 1, 3, 5, 7, 10, 20, 100])
ridgecv.fit(X_train, y_train)
ridgecv.alpha_

RidgeCV(alphas=[0.01, 0.1, 0.5, 1, 3, 5, 7, 10, 20, 100], cv=None,
    fit_intercept=True, gcv_mode=None, normalize=False, scoring=None,
    store_cv_values=False)

7.0
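To show what cross-validated selection of $\alpha$ amounts to, here is a rough manual version using `cross_val_score` with 5-fold splitting on synthetic data. This is a swapped-in illustration, not what the cell above runs: with `cv=None`, `RidgeCV` uses an efficient form of leave-one-out cross-validation. The names `X_demo`, `y_demo`, and `mean_mse` are assumptions for this sketch.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

alphas = [0.01, 0.1, 0.5, 1, 3, 5, 7, 10, 20, 100]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated MSE for each candidate alpha.
mean_mse = {}
for alpha in alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X_demo, y_demo,
                             cv=cv, scoring="neg_mean_squared_error")
    mean_mse[alpha] = -scores.mean()  # the scorer returns negated MSE

# The candidate with the lowest average validation error wins.
best_alpha = min(mean_mse, key=mean_mse.get)
print(best_alpha)
```

The selected value depends on the data and on the fold scheme, so it need not match what `RidgeCV` picks on the same candidates.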

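<font color=#CC0066><strong>The Ridge loss function shows that the larger $\alpha$ is, the more heavily the regularization term penalizes the coefficients, so the regression coefficients $\theta$ become smaller and eventually approach 0. Conversely, the smaller $\alpha$ is, the weaker the regularization term, and the closer $\theta$ gets to the ordinary linear regression coefficients.</strong></font>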
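The shrinkage effect can be checked numerically. The sketch below fits `Ridge` on synthetic data for a range of $\alpha$ values and records the L2 norm of the coefficients; the data and the variable names (`X_demo`, `y_demo`, `coef_norms`) are assumptions made for this illustration and are unrelated to the power plant dataset used above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.3, size=100)

alphas = [1e-6, 1, 100, 1e4, 1e8]

# L2 norm of the fitted coefficients for each alpha.
coef_norms = [np.linalg.norm(Ridge(alpha=a).fit(X_demo, y_demo).coef_)
              for a in alphas]

# Reference points: ordinary least squares and a nearly unregularized Ridge.
ols_coef = LinearRegression().fit(X_demo, y_demo).coef_
tiny_alpha_coef = Ridge(alpha=1e-6).fit(X_demo, y_demo).coef_

for a, norm in zip(alphas, coef_norms):
    print(f"alpha={a:g}  ||theta||_2={norm:.6f}")
```

The norm falls as $\alpha$ grows, and for the smallest $\alpha$ the coefficients are close to those of `LinearRegression`.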