## [作業重點]
使用 Sklearn 中的 Lasso, Ridge 模型，來訓練各種資料集，務必了解送進去模型訓練的**資料型態**為何，也請了解模型中各項參數的意義。

機器學習的模型非常多種，但要訓練的資料多半有固定的格式，確保你了解訓練資料的格式為何，這樣在應用新模型時，就能夠最快的上手開始訓練！

## 練習時間
試著使用 sklearn datasets 的其他資料集 (boston, ...)，來訓練自己的線性迴歸模型，並加上適當的正則話來觀察訓練情形。

In [4]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [5]:
# 讀取糖尿病資料集
boston = datasets.load_boston()

# 切分訓練集/測試集
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# 建立一個線性回歸模型
regr = linear_model.LinearRegression()

# 將訓練資料丟進去模型訓練
regr.fit(x_train, y_train)

# 將測試資料丟進模型得到預測結果
y_pred = regr.predict(x_test)

In [6]:
y_pred

array([ 12.07495986,  26.9894969 ,  17.58803353,  18.15584511,
        36.92091659,  25.43267386,  31.09256932,  19.72549907,
        19.66103377,  22.96358632,  28.38841214,  28.48925986,
        18.99690357,  32.41097504,  21.52350275,  15.25945122,
        21.23364112,  11.6220597 ,  11.37109662,  13.63515584,
         5.62431971,  17.35323315,  20.80951594,  22.51311312,
        16.39055556,  20.32352451,  17.88994185,  14.23445109,
        21.1187098 ,  17.50765806,  14.54295525,  23.63289896,
        34.32419647,  22.23027161,  16.82396516,  20.16274383,
        30.67665825,  35.61882904,  23.50372003,  24.66451121,
        36.91269871,  32.33290254,  19.11785719,  32.19546605,
        33.42795148,  25.52705821,  40.63477427,  18.21762788,
        19.34587461,  23.80167377,  33.42122982,  26.1451108 ,
        18.10363121,  28.19906437,  13.37486655,  23.34019279,
        24.44952678,  33.54973856,  16.71263275,  36.56402224,
        15.69684554,  18.55447039,  32.14543203,  15.49

In [7]:
print(regr.coef_)

[ -1.15966452e-01   4.71249231e-02   8.25980146e-03   3.23404531e+00
  -1.66865890e+01   3.88410651e+00  -1.08974442e-02  -1.54129540e+00
   2.93208309e-01  -1.34059383e-02  -9.06296429e-01   8.80823439e-03
  -4.57723846e-01]


In [8]:
# 預測值與實際值的差距，使用 MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 25.42


In [9]:
# 建立一個線性回歸模型
lasso = linear_model.Lasso(alpha=0.1)

# 將訓練資料丟進去模型訓練
lasso.fit(x_train, y_train)

# 將測試資料丟進模型得到預測結果
y_pred = lasso.predict(x_test)

In [10]:
y_pred

array([ 11.35769564,  26.63065774,  17.07212795,  14.88066872,
        36.41257162,  24.9585628 ,  31.94678858,  18.71968836,
        18.03333259,  24.31205853,  29.37517697,  28.21096667,
        19.31608322,  29.77081668,  22.02956911,  15.80101535,
        21.40010518,  11.55888929,  10.03696639,  14.21676695,
         5.93013576,  20.67875375,  20.28901268,  22.045776  ,
        16.91359062,  20.01181348,  14.60702376,  14.47106462,
        19.94873173,  16.80168678,  14.47686108,  23.94606474,
        35.12159987,  22.18768349,  17.36984591,  19.82812892,
        30.64690326,  35.83418403,  24.01776652,  24.25497155,
        36.65259504,  31.76859732,  19.93445419,  31.94878121,
        30.55626307,  24.85315173,  40.25718892,  17.35967841,
        20.58594129,  23.65915748,  33.33055041,  25.46122166,
        18.25223929,  27.45084254,  13.61083007,  22.98211928,
        24.36098849,  33.24773708,  17.77844029,  34.20142858,
        16.18855141,  20.46046193,  31.34454514,  14.83

In [11]:
print(lasso.coef_)

[-0.10618872  0.04886351 -0.04536655  1.14953069 -0.          3.82353877
 -0.02089779 -1.23590613  0.26008876 -0.01517094 -0.74673362  0.00963864
 -0.49877104]


In [12]:
# 預測值與實際值的差距，使用 MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 26.45


In [22]:
ridge = linear_model.Ridge(alpha=0.1)

# 將訓練資料丟進去模型訓練
ridge.fit(x_train, y_train)

# 將測試資料丟進模型得到預測結果
y_pred = regr.predict(x_test)

In [23]:
y_pred

array([ 12.07495986,  26.9894969 ,  17.58803353,  18.15584511,
        36.92091659,  25.43267386,  31.09256932,  19.72549907,
        19.66103377,  22.96358632,  28.38841214,  28.48925986,
        18.99690357,  32.41097504,  21.52350275,  15.25945122,
        21.23364112,  11.6220597 ,  11.37109662,  13.63515584,
         5.62431971,  17.35323315,  20.80951594,  22.51311312,
        16.39055556,  20.32352451,  17.88994185,  14.23445109,
        21.1187098 ,  17.50765806,  14.54295525,  23.63289896,
        34.32419647,  22.23027161,  16.82396516,  20.16274383,
        30.67665825,  35.61882904,  23.50372003,  24.66451121,
        36.91269871,  32.33290254,  19.11785719,  32.19546605,
        33.42795148,  25.52705821,  40.63477427,  18.21762788,
        19.34587461,  23.80167377,  33.42122982,  26.1451108 ,
        18.10363121,  28.19906437,  13.37486655,  23.34019279,
        24.44952678,  33.54973856,  16.71263275,  36.56402224,
        15.69684554,  18.55447039,  32.14543203,  15.49

In [24]:
print(ridge.coef_)

[ -1.15381303e-01   4.72528249e-02   2.87371589e-03   3.19642306e+00
  -1.54713824e+01   3.89388927e+00  -1.19943742e-02  -1.52347878e+00
   2.90133016e-01  -1.34816989e-02  -8.93679905e-01   8.86599187e-03
  -4.58983115e-01]


In [25]:
# 預測值與實際值的差距，使用 MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 25.42
