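## [Assignment Focus]
Use the Lasso and Ridge models in Sklearn to train on various datasets. Make sure you understand the **data format** of what is fed into the model for training, and also understand what each of the model's parameters means.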

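There are many kinds of machine learning models, but the training data they take usually follows a fixed format. Make sure you understand that format, so that when you apply a new model you can get up to speed and start training quickly!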
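To make that format concrete, here is a minimal sketch. It uses synthetic data from `make_regression` as a stand-in for a real dataset, which is an assumption for illustration. scikit-learn regressors take a 2-D feature matrix `X` of shape `(n_samples, n_features)` and a 1-D target `y` of shape `(n_samples,)`, and `Lasso` and `Ridge` share the same `fit`/`predict` interface.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic stand-in data: X is 2-D (n_samples, n_features), y is 1-D (n_samples,)
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
print(X.shape, X.dtype)  # (100, 5) float64
print(y.shape, y.dtype)  # (100,) float64

# Both estimators accept exactly the same (X, y) layout
for model in (Lasso(alpha=1.0), Ridge(alpha=1.0)):
    model.fit(X, y)
    # predict() returns one value per row of the input matrix
    print(type(model).__name__, model.predict(X[:3]).shape)

# A single sample must still be 2-D: reshape a 1-D row to (1, n_features)
one_row = X[0].reshape(1, -1)
print(one_row.shape)  # (1, 5)
```

Swapping `Lasso` for `Ridge` (or for `LinearRegression`) requires no change to how `X` and `y` are prepared; only the constructor arguments differ.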

## Exercise
Try other datasets from sklearn datasets (boston, ...) to train your own linear regression model, and add appropriate regularization to observe how training behaves.
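One possible approach to the exercise, sketched under assumptions: `load_boston` is no longer shipped with scikit-learn 1.2 and later, so this uses the bundled `load_diabetes` dataset instead. It lets `RidgeCV` and `LassoCV` choose `alpha` by cross-validation over a log-spaced grid. The grid range is an arbitrary choice, not a recommendation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# load_diabetes ships with scikit-learn, so no download is needed
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Candidate regularization strengths, log-spaced from 1e-3 to 1e2
alphas = np.logspace(-3, 2, 20)

# Each *CV model picks alpha using only the training set
ridge_cv = RidgeCV(alphas=alphas).fit(x_train, y_train)
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(x_train, y_train)

for name, model in (("Ridge", ridge_cv), ("Lasso", lasso_cv)):
    mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: alpha={model.alpha_:.4g}, test MSE={mse:.2f}")
```

The selected value is stored in `alpha_`. Choosing `alpha` on the training data and then scoring once on the held-out test set keeps the test MSE an honest estimate.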

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
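# Load the Boston dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2
boston = datasets.load_boston()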

# Split into training/test sets (80% / 20%)
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)
print(np.shape(x_train))
print(np.shape(x_test))

# Build a linear regression model
regr = linear_model.LinearRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Feed the test data into the model to get predictions
y_pred = regr.predict(x_test)

(404, 13)
(102, 13)
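Under the hood, a fitted linear model's `predict` is just a dot product with the learned coefficients plus the intercept. The sketch below checks this. It uses `load_diabetes` in place of Boston, an assumption made so the code runs on current scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

regr = LinearRegression().fit(x_train, y_train)

# A fitted linear model predicts y_hat = x @ coef_ + intercept_
manual_pred = x_test @ regr.coef_ + regr.intercept_
print(np.allclose(manual_pred, regr.predict(x_test)))  # True

# One coefficient per feature column
print(regr.coef_.shape)  # (10,)
```

This is why `coef_` has exactly one entry per feature column: for Boston that is 13, matching the `(404, 13)` shape printed above.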


In [3]:
y_pred

array([12.07495986, 26.9894969 , 17.58803353, 18.15584511, 36.92091659,
       25.43267386, 31.09256932, 19.72549907, 19.66103377, 22.96358632,
       28.38841214, 28.48925986, 18.99690357, 32.41097504, 21.52350275,
       15.25945122, 21.23364112, 11.6220597 , 11.37109662, 13.63515584,
        5.62431971, 17.35323315, 20.80951594, 22.51311312, 16.39055556,
       20.32352451, 17.88994185, 14.23445109, 21.1187098 , 17.50765806,
       14.54295525, 23.63289896, 34.32419647, 22.23027161, 16.82396516,
       20.16274383, 30.67665825, 35.61882904, 23.50372003, 24.66451121,
       36.91269871, 32.33290254, 19.11785719, 32.19546605, 33.42795148,
       25.52705821, 40.63477427, 18.21762788, 19.34587461, 23.80167377,
       33.42122982, 26.1451108 , 18.10363121, 28.19906437, 13.37486655,
       23.34019279, 24.44952678, 33.54973856, 16.71263275, 36.56402224,
       15.69684554, 18.55447039, 32.14543203, 15.49568061, 39.02363234,
       27.38174402, 31.96333419, 10.09436162, 19.13214621, 21.73

In [4]:
boston.target

array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
       18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
       15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
       13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
       21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
       35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
       19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
       20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
       23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
       33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
       21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
       20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
       23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
       15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21

In [3]:
print(regr.coef_)

[-1.15966452e-01  4.71249231e-02  8.25980146e-03  3.23404531e+00
 -1.66865890e+01  3.88410651e+00 -1.08974442e-02 -1.54129540e+00
  2.93208309e-01 -1.34059383e-02 -9.06296429e-01  8.80823439e-03
 -4.57723846e-01]


In [4]:
# Gap between predicted and actual values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 25.42
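MSE is the average squared residual. `r2_score` is imported in the first cell but never used; it reports the fraction of the target's variance explained by the model. The sketch below computes both by hand on small hypothetical arrays (not values from the Boston run) and compares them with the library functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small hypothetical arrays, not taken from the Boston results above
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)
print(mse_manual, mean_squared_error(y_true, y_hat))  # 0.375 0.375

# R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot, r2_score(y_true, y_hat))
```

Unlike MSE, R² does not depend on the units of the target, which makes it easier to compare across datasets.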


### LASSO

In [5]:
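# Load the Boston dataset
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older version
boston = datasets.load_boston()

# Split into training/test sets (same split as the linear regression above)
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)
print(np.shape(x_train))
print(np.shape(x_test))

# Build a Lasso model (linear regression with an L1 penalty)
lasso = linear_model.Lasso(alpha=1.0)

# Fit the model on the training data
lasso.fit(x_train, y_train)

# Feed the test data into the model to get predictions
y_pred = lasso.predict(x_test)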

(404, 13)
(102, 13)


In [6]:
# Print the coefficient for each feature
lasso.coef_

array([-0.06494981,  0.04581458, -0.        ,  0.        , -0.        ,
        1.18140024,  0.01109101, -0.73695809,  0.23350042, -0.01551065,
       -0.69270805,  0.00763157, -0.6927848 ])
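The exact zeros in `lasso.coef_` come from the L1 penalty. For reference, the objective that scikit-learn documents for `Lasso` is shown below. Here $n$ is the number of training samples, $p$ the number of features, and the intercept $b$ is not penalized:

```latex
\min_{w,\,b} \; \frac{1}{2n} \lVert y - Xw - b\mathbf{1} \rVert_2^2 + \alpha \lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{j=1}^{p} \lvert w_j \rvert
```

Because $\lvert w_j \rvert$ has a kink at $0$, once $\alpha$ is large enough relative to a feature's correlation with the residual, $w_j = 0$ becomes the optimum. That is why raising `alpha` zeroes out more coefficients.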

In [7]:
# Gap between predicted and actual values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 28.95


### LASSO (alpha=2.0)

In [8]:
lasso2 = linear_model.Lasso(alpha=2.0)
lasso2.fit(x_train, y_train)
y_pred = lasso2.predict(x_test)
print(lasso2.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

[-0.0181519   0.03043393 -0.          0.         -0.          0.
  0.03717309 -0.12778153  0.1407538  -0.01207991 -0.54243977  0.00603438
 -0.77311473]
Mean squared error: 34.09


### LASSO (alpha=5.0)

In [9]:
lasso5 = linear_model.Lasso(alpha=5.0)
lasso5.fit(x_train, y_train)
y_pred = lasso5.predict(x_test)
print(lasso5.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

[-0.          0.03027739 -0.          0.          0.          0.
  0.02872114 -0.          0.         -0.00951653 -0.          0.00494745
 -0.70024171]
Mean squared error: 41.58
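The three LASSO runs show more coefficients reaching exactly zero as `alpha` grows, while test MSE rises. The sketch below reproduces that trend in a controlled setting. It assumes synthetic data in which only 3 of 10 features actually drive the target, and the alpha values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Hypothetical data: 10 features, only 3 of which are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]
nonzero_counts = []
for alpha in alphas:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    # Lasso sets coefficients to exactly 0.0, so a plain non-zero count works
    nonzero_counts.append(int(np.count_nonzero(coef)))
    print(f"alpha={alpha:>7}: {nonzero_counts[-1]} non-zero coefficients")
```

At a large enough `alpha`, every coefficient is zero and the model predicts a constant (the intercept). This is the extreme end of the MSE increase seen above.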


### Ridge

In [10]:
ridge = linear_model.Ridge(alpha=1.0)
ridge.fit(x_train, y_train)
y_pred = ridge.predict(x_test)
print(ridge.coef_)
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

[-1.12499445e-01  4.79562332e-02 -2.40438147e-02  2.96199458e+00
 -9.33966118e+00  3.93079015e+00 -1.73821202e-02 -1.43347691e+00
  2.75239392e-01 -1.38920708e-02 -8.31116943e-01  9.15637729e-03
 -4.66460539e-01]
Mean squared error: 25.74
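Ridge uses a squared L2 penalty, so it shrinks coefficients toward zero without setting them exactly to zero. Unlike Lasso, it has a closed-form solution. According to the scikit-learn documentation, `Ridge` minimizes $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$ with no $1/(2n)$ factor, so its `alpha` is not on the same scale as Lasso's. The sketch below, on assumed synthetic data, solves the centered normal equations directly and compares the result with `Ridge`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=4, noise=2.0, random_state=1)
alpha = 1.0

# Center X and y so the intercept is not penalized
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# Closed form: w = (Xc^T Xc + alpha * I)^(-1) Xc^T yc, solved without an explicit inverse
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - x_mean @ w

model = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, model.coef_), np.isclose(b, model.intercept_))  # True True
```

The `alpha * I` term keeps the system well-conditioned even when features are strongly correlated. This is one reason Ridge changes the large Boston coefficients (such as the one that drops from about -16.7 to about -9.3) without zeroing any of them.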


### Ridge (alpha=2.0)

In [11]:
ridge2 = linear_model.Ridge(alpha=2.0)
ridge2.fit(x_train, y_train)
y_pred = ridge2.predict(x_test)
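# Note: this prints ridge.coef_ (the alpha=1.0 model), so the coefficients
# below repeat those of In [10]; print ridge2.coef_ to inspect this model
print(ridge.coef_)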
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

[-1.12499445e-01  4.79562332e-02 -2.40438147e-02  2.96199458e+00
 -9.33966118e+00  3.93079015e+00 -1.73821202e-02 -1.43347691e+00
  2.75239392e-01 -1.38920708e-02 -8.31116943e-01  9.15637729e-03
 -4.66460539e-01]
Mean squared error: 25.93


### Ridge (alpha=5.0)

In [12]:
ridge5 = linear_model.Ridge(alpha=5.0)
ridge5.fit(x_train, y_train)
y_pred = ridge5.predict(x_test)
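# Note: this prints ridge.coef_ (the alpha=1.0 model), so the coefficients
# below repeat those of In [10]; print ridge5.coef_ to inspect this model
print(ridge.coef_)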
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

[-1.12499445e-01  4.79562332e-02 -2.40438147e-02  2.96199458e+00
 -9.33966118e+00  3.93079015e+00 -1.73821202e-02 -1.43347691e+00
  2.75239392e-01 -1.38920708e-02 -8.31116943e-01  9.15637729e-03
 -4.66460539e-01]
Mean squared error: 26.15
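The three Ridge printouts are identical because the `alpha=2.0` and `alpha=5.0` cells print `ridge.coef_`. To see shrinkage, each model's own `coef_` has to be read. The sketch below does this in a loop on assumed synthetic data. It tracks the L2 norm of the coefficients, which should fall as `alpha` increases, while the number of exact zeros stays at 0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Hypothetical data standing in for Boston
X, y = make_regression(n_samples=100, n_features=6, noise=10.0, random_state=2)

norms = []
for alpha in (1.0, 2.0, 5.0, 50.0):
    model = Ridge(alpha=alpha).fit(X, y)
    # Read coef_ from the model fitted in this iteration, not an earlier one
    norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha}: ||w||_2 = {norms[-1]:.3f}, "
          f"zeros = {np.sum(model.coef_ == 0)}")
```

Reusing a single variable name inside the loop avoids the stale-reference mistake altogether, because `model` always points at the most recently fitted estimator.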
