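## [Key Points of the Assignment]
Use the Lasso and Ridge models in sklearn to train on various datasets. Make sure you understand the **data types** that are fed into the model for training, and learn what each of the model's parameters means.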

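There are many kinds of machine-learning models, but the data they are trained on mostly follows a fixed format. Make sure you understand that format, so that when you pick up a new model you can start training with it right away!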
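The sketch below shows the expected input format using made-up NumPy arrays rather than a real dataset; the variable names are only illustrative. scikit-learn estimators take a 2-D feature matrix `X` of shape `(n_samples, n_features)`. For regression they also take a 1-D target `y` of shape `(n_samples,)`. In the Boston data used below, `boston.data` is `(506, 13)` and `boston.target` is `(506,)`.

```python
import numpy as np

# Hypothetical toy data standing in for boston.data / boston.target: 6 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # 2-D: (n_samples, n_features)
y = X @ np.array([1.0, -2.0, 0.5])   # 1-D: (n_samples,)
print(X.shape, y.shape)              # (6, 3) (6,)

# A single feature must still be 2-D: reshape(-1, 1) turns (6,) into (6, 1)
x_single = X[:, 0]
X_single = x_single.reshape(-1, 1)
print(x_single.shape, X_single.shape)  # (6,) (6, 1)
```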

## Practice Time
Try other datasets from sklearn datasets (boston, ...) to train your own linear regression model. Add appropriate regularization and observe how training behaves.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
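# Load the Boston housing dataset
# (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2; this cell needs an older version)
boston = datasets.load_boston()

# Split into training / test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.1, random_state=4)

# Build a linear regression model
regr = linear_model.LinearRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Feed the test data into the model to get predictions
y_pred = regr.predict(x_test)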
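Conceptually, `LinearRegression` chooses the intercept and coefficients that minimize the sum of squared residuals. The sketch below reproduces that idea with `np.linalg.lstsq` on a matrix with a column of ones prepended for the intercept. `ols_fit` is a hypothetical helper, and scikit-learn's internal solver may differ in detail.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit with an intercept; returns (intercept, coef)."""
    # Prepend a column of ones so the first fitted parameter is the intercept
    A = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params[0], params[1:]


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.0])  # noise-free toy target

intercept, coef = ols_fit(X, y)
y_pred = intercept + X @ coef  # same role as regr.predict(x_test)
```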

In [3]:
y_pred

array([11.46030778, 26.80269335, 17.43478925, 17.5563101 , 37.39156424,
       25.07675556, 31.05825852, 20.30845531, 19.66757374, 22.82655375,
       28.47083056, 28.53331605, 18.72883256, 33.11375161, 21.34282974,
       15.20554693, 21.57309275, 10.92841589, 11.69603405, 13.54311508,
        5.07126801, 17.40464043, 20.69379268, 22.72981238, 16.4634139 ,
       20.42666271, 17.53377349, 14.22644356, 21.56292745, 17.33136115,
       14.28888479, 23.92829804, 34.31523522, 22.03799035, 17.47895779,
       20.20386005, 30.70896335, 35.21599528, 24.07063567, 24.51445184,
       36.77425366, 33.15265201, 19.67545976, 31.93505104, 33.55222906,
       25.59147737, 40.59239607, 17.99555017, 19.92780188, 23.65319423,
       33.48950986])

In [4]:
print(regr.coef_)

[-1.25856659e-01  4.84257396e-02  1.84085281e-02  3.08509569e+00
 -1.73277018e+01  3.61674713e+00  2.19181853e-03 -1.49361132e+00
  3.19979200e-01 -1.27294649e-02 -9.27469086e-01  9.50912468e-03
 -5.33592471e-01]


In [5]:
# Gap between predicted and actual values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 17.04
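`mean_squared_error` is the average of the squared residuals, $\frac{1}{n}\sum_i (y_i - \hat y_i)^2$. The hand-rolled version below makes the formula concrete; the `mse` helper is hypothetical. An MSE of 17.04 corresponds to a root-mean-squared error of about $\sqrt{17.04} \approx 4.13$, in the same units as the target.

```python
import numpy as np


def mse(y_true, y_pred):
    # Mean of squared residuals: (1/n) * sum((y_i - yhat_i)^2)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


# Residuals 0.5, 0.0, -2.0 -> squares 0.25, 0.0, 4.0 -> mean 4.25 / 3
print(mse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```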


In [6]:
# Build a Lasso regression model
lasso = linear_model.Lasso(alpha=1.0)

# Fit the model on the training data
lasso.fit(x_train, y_train)

# Feed the test data into the model to get predictions
y_pred2 = lasso.predict(x_test)

In [7]:
y_pred2

array([10.98845397, 26.26733735, 18.84027042, 14.03396535, 33.9161886 ,
       24.21323798, 31.36554042, 18.9196332 , 15.69351869, 23.38789944,
       29.09906859, 28.1805469 , 21.22869297, 30.95079178, 22.09159482,
       14.50252492, 23.73770771,  8.12869234, 13.23494213, 16.37186703,
        8.45343784, 22.13333462, 21.00862599, 21.97599589, 19.37901832,
       20.3699175 , 14.80291144, 15.53809506, 20.14113844, 16.56301773,
       12.81548494, 27.07977642, 32.05780762, 22.78035352, 18.77018175,
       16.7032285 , 29.23804465, 31.21772803, 26.72122522, 24.94116917,
       33.73331724, 32.4775655 , 20.13647962, 30.71577076, 28.2870268 ,
       26.4729775 , 36.89948701, 17.97236918, 20.82882567, 23.50832122,
       32.80209832])

In [8]:
# Print each feature's coefficient: several of them (INDUS, CHAS, NOX) are now 0, so Lasso Regression does perform feature selection
lasso.coef_

array([-0.07256057,  0.04967103, -0.        ,  0.        , -0.        ,
        0.80886056,  0.02328171, -0.68444051,  0.26862528, -0.01526566,
       -0.71692899,  0.00828412, -0.77123108])
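Why does Lasso produce exact zeros? The sketch below assumes the objective scikit-learn documents for `Lasso`, $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$, and solves it with plain cyclic coordinate descent. Each coordinate update passes through the soft-thresholding operator. That operator returns exactly 0 whenever the scaled inner product of feature $j$ with the partial residual is at most $\alpha$ in absolute value. `soft_threshold` and `lasso_cd` are illustrative helpers, not scikit-learn's code. They use a fixed number of sweeps instead of a convergence test.

```python
import numpy as np


def soft_threshold(r, a):
    # S(r, a) = sign(r) * max(|r| - a, 0); exactly 0 whenever |r| <= a
    return np.sign(r) * max(abs(r) - a, 0.0)


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    # Center X and y so the unpenalized intercept b can be recovered at the end
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    z = (Xc ** 2).sum(axis=0) / n  # per-feature scale (1/n) * ||x_j||^2
    w = np.zeros(p)
    for _ in range(n_iter):  # fixed number of sweeps, no convergence check
        for j in range(p):
            # Partial residual: current residual with feature j's contribution added back
            r_j = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / z[j]
    b = y_mean - X_mean @ w
    return b, w


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + 3.0 * X[:, 0]  # toy target: only feature 0 matters

b_ols, w_ols = lasso_cd(X, y, alpha=0.0)  # alpha = 0 reduces to least squares
b_l1, w_l1 = lasso_cd(X, y, alpha=0.5)
print(w_ols)  # close to [3, 0, 0]
print(w_l1)   # feature 0 shrunk below 3; features 1 and 2 exactly 0
```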

In [9]:
# Gap between predicted and actual values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred2))

Mean squared error: 23.24


With Lasso at $\alpha = 1.0$, the MSE actually increases (23.24, compared with 17.04 for plain linear regression).
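For reference, scikit-learn documents the following objectives for the two penalized models used in this notebook (intercept omitted). Lasso minimizes

$$\min_w \; \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$$

and Ridge minimizes

$$\min_w \; \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$$

The Lasso loss is divided by $2n$ and the Ridge loss is not. As a result, the same numeric $\alpha$, such as a value in the 0.1 to 2.0 grids used here, does not mean the same regularization strength for the two models.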

In [10]:
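# Build a Lasso regression model with a weaker penalty
lasso2 = linear_model.Lasso(alpha=0.5)

# Fit the model on the training data
lasso2.fit(x_train, y_train)

# Feed the test data into the model to get predictions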
y_pred3 = lasso2.predict(x_test)

print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred3))

Mean squared error: 19.66


With Lasso at $\alpha = 0.5$, the MSE is still larger than without regularization (19.66 vs. 17.04), but smaller than at $\alpha = 1.0$ (23.24).

Try many different $\alpha$ values at once:

In [11]:
MSEs = []
coefs = []
for i in range(1, 21):
    # Build a Lasso regression model for this alpha
    l = linear_model.Lasso(alpha=0.1 * i)
    # Fit the model on the training data
    l.fit(x_train, y_train)
    # Feed the test data into the model to get predictions
    y_pred4 = l.predict(x_test)
    MSE = mean_squared_error(y_test, y_pred4)
    MSEs.append(MSE)
    coefs.append(l.coef_)
    print(f"Mean squared error for alpha =  {0.1 * i:.1f}: {MSE:.2f}")

Mean squared error for alpha =  0.1: 18.19
Mean squared error for alpha =  0.2: 18.42
Mean squared error for alpha =  0.3: 18.77
Mean squared error for alpha =  0.4: 19.17
Mean squared error for alpha =  0.5: 19.66
Mean squared error for alpha =  0.6: 20.22
Mean squared error for alpha =  0.7: 20.85
Mean squared error for alpha =  0.8: 21.57
Mean squared error for alpha =  0.9: 22.37
Mean squared error for alpha =  1.0: 23.24
Mean squared error for alpha =  1.1: 24.19
Mean squared error for alpha =  1.2: 25.22
Mean squared error for alpha =  1.3: 26.01
Mean squared error for alpha =  1.4: 26.29
Mean squared error for alpha =  1.5: 26.60
Mean squared error for alpha =  1.6: 26.94
Mean squared error for alpha =  1.7: 27.29
Mean squared error for alpha =  1.8: 27.68
Mean squared error for alpha =  1.9: 28.08
Mean squared error for alpha =  2.0: 28.51


In [12]:
min_ = min(MSEs)
min_index = MSEs.index(min_)
print(f"when alpha = {0.1 * (min_index + 1):.1f}, there is the minimal value of MSE, {min_:.2f}.")

when alpha = 0.1, there is the minimal value of MSE, 18.19.


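With $\alpha$ increasing in steps of 0.1, the smallest MSE occurs at $\alpha = 0.1$, the lowest value tried. Even that MSE is larger than without Lasso (18.19 vs. 17.04). Because the MSE rises monotonically over the whole grid, the best $\alpha$ for this split may lie below 0.1.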
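The minimum sits at the edge of the grid, so a wider, log-spaced grid is a common next step. The small sketch below flags when the best value lies on a grid boundary; `best_alpha` is a hypothetical helper, and the example reuses the first three Lasso MSEs printed above. Choosing $\alpha$ on the test set also leaks test information into model selection. Cross-validation, for example scikit-learn's `LassoCV`, avoids that.

```python
import numpy as np


def best_alpha(alphas, errors):
    """Return (alpha with the lowest error, whether it sits at either end of the grid)."""
    i = int(np.argmin(errors))
    return alphas[i], i in (0, len(alphas) - 1)


# Log-spaced candidate grid 0.001, 0.01, 0.1, 1, 10 (an illustrative range)
alphas = np.logspace(-3, 1, 5)

# First three Lasso MSEs from the loop above (alpha = 0.1, 0.2, 0.3)
print(best_alpha([0.1, 0.2, 0.3], [18.19, 18.42, 18.77]))  # (0.1, True)
```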

The corresponding Lasso coefficients at $\alpha = 0.1$:

In [13]:
print(coefs[min_index])

[-0.11567831  0.05152311 -0.03346275  1.2230427  -0.          3.53216363
 -0.00922692 -1.19460642  0.28775344 -0.01473748 -0.75732817  0.01037228
 -0.58007751]


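- Compare (coefficient without Lasso - coefficient with Lasso) / (coefficient without Lasso):

Close to 0: Lasso left the coefficient almost unchanged.

Close to 1: Lasso (almost) eliminated the coefficient.

Other values: the coefficient moved to a different value. A ratio above 1 means its sign flipped, and a negative ratio means its magnitude grew.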
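The ratio rule above can be encoded directly. `relative_change` and `describe` are hypothetical helpers, and the 0.05 tolerance is an arbitrary choice. The sketch uses a few OLS and $\alpha = 0.1$ Lasso coefficients copied from the outputs above.

```python
import numpy as np


def relative_change(coef_ols, coef_reg):
    # (w_ols - w_reg) / w_ols, feature by feature
    coef_ols = np.asarray(coef_ols, dtype=float)
    coef_reg = np.asarray(coef_reg, dtype=float)
    return (coef_ols - coef_reg) / coef_ols


def describe(ratio, tol=0.05):
    # ratio = 1 - w_reg / w_ols, so each range maps to one kind of change
    if abs(ratio) < tol:
        return "almost unchanged"
    if abs(ratio - 1) < tol:
        return "driven to (almost) zero"
    if ratio > 1:
        return "sign flipped"
    if ratio < 0:
        return "magnitude grew"
    return "shrunk toward zero"


names = ["NOX", "INDUS", "RM", "ZN"]
ols = [-17.3277018, 0.0184085281, 3.61674713, 0.0484257396]
lasso_01 = [-0.0, -0.03346275, 3.53216363, 0.05152311]
ratios = relative_change(ols, lasso_01)
for name, ratio in zip(names, ratios):
    print(f"{name:6}: {ratio:+.5f}  {describe(ratio)}")
```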

In [14]:
for fld, val in zip(boston.feature_names, (regr.coef_ - coefs[min_index])/regr.coef_):
    print(f"{fld:10}: {val:+.5f}")

CRIM      : +0.08087
ZN        : -0.06396
INDUS     : +2.81779
CHAS      : +0.60356
NOX       : +1.00000
RM        : +0.02339
AGE       : +5.20971
DIS       : +0.20019
RAD       : +0.10071
TAX       : -0.15775
PTRATIO   : +0.18345
B         : -0.09077
LSTAT     : -0.08712


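- RM changes the least (+0.02339)
- NOX is the coefficient driven to 0 (ratio exactly 1)
- INDUS (+2.81779) and AGE (+5.20971) have ratios above 1, so their signs flipped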
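NOX had the largest OLS coefficient in magnitude (about -17.33), yet it is the one Lasso removes. One plausible factor is scale. The L1 penalty acts on raw coefficient sizes, so a feature measured in small units needs a large coefficient and is penalized heavily. The sketch below uses synthetic data and hypothetical helpers (`ols_coef`, `standardize`). It shows that expressing a feature in units 10x smaller multiplies its OLS coefficient by 10, and that z-scoring removes the effect. In scikit-learn, `StandardScaler` performs the same standardization.

```python
import numpy as np


def ols_coef(X, y):
    # Least squares with an intercept column; returns only the slope coefficients
    A = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params[1:]


def standardize(X):
    # z-score each column: mean 0, standard deviation 1
    return (X - X.mean(axis=0)) / X.std(axis=0)


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, 2.0])  # both features equally important

X_small = X.copy()
X_small[:, 0] /= 10  # same information, feature 0 expressed in units 10x smaller

print(ols_coef(X, y))        # approximately [2, 2]
print(ols_coef(X_small, y))  # approximately [20, 2]: a bigger L1 target for feature 0
```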

Switch to Ridge:

In [15]:
MSE2s = []
coef2s = []
for i in range(1, 21):
    # Build a Ridge regression model for this alpha
    r = linear_model.Ridge(alpha=0.1 * i)
    # Fit the model on the training data
    r.fit(x_train, y_train)
    # Feed the test data into the model to get predictions
    y_pred5 = r.predict(x_test)
    MSE = mean_squared_error(y_test, y_pred5)
    MSE2s.append(MSE)
    coef2s.append(r.coef_)
    #print(f"Mean squared error for alpha =  {0.1 * i:.1f}: {MSE:.2f}")

min2_ = min(MSE2s)
min_index2 = MSE2s.index(min2_)
# Use the Ridge index (min_index2), not the Lasso one
print(f"when alpha = {0.1 * (min_index2 + 1):.1f}, there is the minimal value of MSE, {min2_:.2f}.")

when alpha = 0.1, there is the minimal value of MSE, 17.07.


In [16]:
# Print the Ridge coefficients (coef2s), not the Lasso ones (coefs)
print(coef2s[min_index2])


In [17]:
for fld, val in zip(boston.feature_names, (regr.coef_ - coef2s[min_index2])/regr.coef_):
    print(f"{fld:10}: {val:+.5f}")

CRIM      : +0.00438
ZN        : -0.00353
INDUS     : +0.26398
CHAS      : +0.00825
NOX       : +0.06763
RM        : -0.00269
AGE       : +0.49444
DIS       : +0.01117
RAD       : +0.00917
TAX       : -0.00592
PTRATIO   : +0.01321
B         : -0.00608
LSTAT     : -0.00277


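None of the coefficients becomes 0: Ridge shrinks coefficients but does not set any of them exactly to zero.
RM is still the coefficient that changes the least.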
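Ridge has a closed-form solution, which helps explain this observation. Adding $\alpha I$ to $X^\top X$ shrinks every coefficient smoothly, but nothing in the formula clips a coefficient to exactly 0 the way the L1 penalty can. The minimal NumPy sketch below assumes the objective $\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$, with an unpenalized intercept handled by centering. `ridge_fit` is a hypothetical helper, not scikit-learn's solver.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Minimize ||y - Xw - b||^2 + alpha*||w||^2 with an unpenalized intercept b."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean  # centering keeps the intercept out of the penalty
    p = X.shape[1]
    # Regularized normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return b, w


rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = X @ np.array([3.0, 0.0, -1.0]) + 0.1 * rng.normal(size=60)

# Coefficients shrink as alpha grows, but none lands exactly on 0
coefs_by_alpha = {a: ridge_fit(X, y, a)[1] for a in (0.0, 1.0, 10.0)}
for a, w in coefs_by_alpha.items():
    print(a, w)
```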

The RM coefficient (plain linear regression):

In [18]:
regr.coef_[boston.feature_names == "RM"]

array([3.61674713])