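## [Key points]
Use the Lasso and Ridge models in sklearn to train on several datasets. Make sure you understand the **data format** passed to the model for training, and what each model parameter means.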

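There are many kinds of machine learning models, but the training data they accept usually follows a fixed format. Once you understand that format, you can start training a new model as quickly as possible.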
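A minimal sketch of that format. scikit-learn estimators take a 2-D feature matrix `X` of shape `(n_samples, n_features)` and a 1-D target `y` of shape `(n_samples,)`. The built-in diabetes regression dataset is used here only as an assumed stand-in. A single feature column sliced out of `X` is 1-D and has to be reshaped into a column before fitting.

```python
import numpy as np
from sklearn import datasets, linear_model

# Load a small built-in regression dataset directly as (X, y) arrays.
X, y = datasets.load_diabetes(return_X_y=True)
print(X.shape, X.dtype)  # 2-D: one row per sample, one column per feature
print(y.shape, y.dtype)  # 1-D: one target value per sample

# X[:, 2] is 1-D; estimators expect 2-D input, so reshape it to (n_samples, 1).
x_single = X[:, 2].reshape(-1, 1)
model = linear_model.LinearRegression().fit(x_single, y)
print(model.coef_.shape)  # one coefficient per feature column
```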

## Practice
Try training your own linear regression model on other sklearn datasets (boston, ...), add suitable regularization, and observe how the training results change.

In [3]:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

wine = datasets.load_wine()                    # multiclass classification
boston = datasets.load_boston()                # regression (removed in scikit-learn 1.2; requires an older release)
breast_cancer = datasets.load_breast_cancer()  # binary classification
```
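Because `load_boston` is not available in recent scikit-learn releases, the sketch below assumes the diabetes dataset as a regression stand-in and prints what distinguishes the three task types: the targets of the classification datasets are a few integer labels, while the regression target is continuous.

```python
import numpy as np
from sklearn import datasets

wine = datasets.load_wine()                    # multiclass classification
breast_cancer = datasets.load_breast_cancer()  # binary classification
diabetes = datasets.load_diabetes()            # regression (assumed stand-in for Boston)

for name, ds in [("wine", wine), ("breast_cancer", breast_cancer), ("diabetes", diabetes)]:
    n_unique = len(np.unique(ds.target))
    print(f"{name}: X {ds.data.shape}, y dtype {ds.target.dtype}, {n_unique} distinct target values")
```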

In [4]:
```python
boston
```

{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
         4.9800e+00],
        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
         9.1400e+00],
        [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
         4.0300e+00],
        ...,
        [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         5.6400e+00],
        [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
         6.4800e+00],
        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
        18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
        15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
        13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
        21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
        35.4, 24.7, 3

In [35]:
```python
# Input features
X = boston.data
# Ground-truth targets
Y = boston.target
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=1)
```
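A quick check of the split sizes, using dummy arrays with the Boston dataset's shape (506 samples, 13 features) so that it runs without `load_boston`. A float `test_size` is rounded up, so `ceil(0.1 * 506) = 51` rows go to the test set, which is why each model in this notebook produces 51 test predictions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy arrays with the same shape as the Boston data: 506 samples, 13 features.
X = np.zeros((506, 13))
Y = np.arange(506, dtype=float)

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=1)
print(x_train.shape, x_test.shape)
```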

In [13]:
```python
# Ordinary linear regression model
regr = linear_model.LinearRegression()
# Fit the model on the training data
regr.fit(x_train, y_train)
# Predict on the test data
y_pred = regr.predict(x_test)
```
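`LinearRegression` is ordinary least squares with no penalty. As a sanity check, and assuming the diabetes dataset as a stand-in, the sketch below solves the same least-squares problem with `np.linalg.lstsq` after prepending a column of ones for the intercept, and compares the results.

```python
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)
regr = linear_model.LinearRegression().fit(X, y)

# Minimize ||A w - y||^2 directly; the first column of ones models the intercept.
A = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(w[0], regr.intercept_), np.allclose(w[1:], regr.coef_))
```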

In [9]:
y_pred

array([33.04785477, 27.94755558, 17.94404892, 21.23007949, 18.38639243,
       19.85798354, 32.51067614, 18.05728098, 24.76894794, 27.19236393,
       27.05894522, 28.67961485, 20.99869523, 26.25235172, 23.32788903,
       20.29182697, 17.78543326, 39.2352323 , 29.88469502,  8.39894175,
       20.55439258, 15.54745482, 25.2799218 , 25.02514543, 30.92984888,
       10.29052289, 13.86888688, 16.29300327, 36.84734437, 14.51958637,
       20.4684512 , 13.51942531, 43.49684586, 18.06319021, 21.54262781,
       20.82480875, 17.68911611, 27.55526371,  8.7590835 , 19.73132303,
       24.23351532, 21.31236948, 29.69206642, 16.41225643, 19.34869984,
       14.70998095, 39.69433382, 18.05760645, 25.17471583, 20.09718192,
       25.51195207])

In [10]:
```python
# Show the fitted linear regression coefficients
print('Coefficients:', regr.coef_)
```

Coefficients: [-1.16259926e-01  5.56061815e-02  2.42066570e-03  2.58498034e+00
 -1.91474547e+01  3.54362973e+00 -5.86296897e-04 -1.59321860e+00
  3.16227116e-01 -1.20965602e-02 -9.20798781e-01  8.75217941e-03
 -5.18191990e-01]
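Each entry of `coef_` is the weight of the corresponding column of `X`. For Boston the column order is CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, so the largest coefficient above (about -19.15) belongs to NOX. Coefficients are in each feature's own units, so their magnitudes are not directly comparable. The sketch below pairs coefficients with `feature_names`, using the diabetes dataset as an assumed stand-in.

```python
from sklearn import datasets, linear_model

data = datasets.load_diabetes()
regr = linear_model.LinearRegression().fit(data.data, data.target)

# coef_[i] is the weight of column i, so zip it with the column names.
for name, coef in zip(data.feature_names, regr.coef_):
    print(f"{name:>4}: {coef:10.3f}")
```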


In [16]:
```python
# Evaluate the test-set predictions with mean squared error (MSE)
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))
```

Mean squared error: 20.54
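MSE is the average squared difference between true and predicted values; `r2_score`, imported above but not used, reports the fraction of target variance explained. A minimal sketch on hypothetical values, computing both by hand and checking them against `sklearn.metrics`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

mse_manual = np.mean((y_true - y_hat) ** 2)
r2_manual = 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse_manual, mean_squared_error(y_true, y_hat))  # both 0.375
print(r2_manual, r2_score(y_true, y_hat))
```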


In [48]:
```python
# Lasso regression (L1 regularization)
lasso = linear_model.Lasso(alpha=1)
lasso.fit(x_train, y_train)
y_pred2 = lasso.predict(x_test)
```
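For reference, the objective that scikit-learn's `Lasso` minimizes, per its documentation, is shown below, with $n$ samples, $p$ features, and the intercept $b$ left unpenalized. The $\lVert w \rVert_1$ term is the L1 penalty, and `alpha` sets its strength.

$$
\min_{w,\,b}\ \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^\top w - b\bigr)^2 + \alpha \lVert w \rVert_1,
\qquad
\lVert w \rVert_1 = \sum_{j=1}^{p} \lvert w_j \rvert
$$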

In [49]:
```python
y_pred2
```

array([31.36363205, 28.47780965, 17.86777604, 22.79473517, 23.86890003,
       21.26004486, 32.0652693 , 20.426788  , 21.57207616, 26.53895959,
       25.25482539, 29.41215899, 21.26211571, 27.1887026 , 24.26066346,
       22.24552207, 14.32331167, 32.77540922, 30.34744329, 12.23685361,
       23.18262435, 20.73843867, 25.8158912 , 26.16600219, 31.47990487,
        9.77981989, 14.13981376, 19.34296279, 34.81629662, 15.94418681,
       24.40268607, 15.96211358, 38.75070259, 19.24210613, 22.09669796,
       20.94130639, 18.07608209, 27.6402915 , 12.58106073, 19.30476983,
       23.80206187, 24.82985538, 28.99845556, 12.97963107, 17.92769958,
       11.96437244, 34.58420853, 19.17032588, 26.02249112, 20.14036964,
       22.81827806])

In [50]:
```python
print('Coefficients:', lasso.coef_)
```

Coefficients: [-0.07225543  0.05742599 -0.01352393  0.         -0.          0.6272048
  0.01816239 -0.73447453  0.26600335 -0.01483617 -0.69300437  0.00783093
 -0.75703835]
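Two of the Lasso coefficients above (the 4th and 5th, CHAS and NOX in Boston's column order) are exactly zero. This is the feature-selection effect of the L1 penalty; an L2 penalty only shrinks coefficients. The sketch below assumes the diabetes dataset and illustrative `alpha` values. With `alpha=10` every Lasso coefficient becomes zero, while Ridge with the same `alpha` keeps all ten.

```python
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)

# Weak vs. strong L1 penalty; alpha values are chosen only for illustration.
lasso_weak = linear_model.Lasso(alpha=0.01).fit(X, y)
lasso_strong = linear_model.Lasso(alpha=10).fit(X, y)
ridge = linear_model.Ridge(alpha=10).fit(X, y)

print("Lasso alpha=0.01 zero coefs:", np.sum(lasso_weak.coef_ == 0))
print("Lasso alpha=10   zero coefs:", np.sum(lasso_strong.coef_ == 0))
print("Ridge alpha=10   zero coefs:", np.sum(ridge.coef_ == 0))
```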


In [51]:
```python
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred2))
```

Mean squared error: 28.90
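Here Lasso's test MSE (28.90) is worse than plain linear regression (20.54). One likely contributor is that the penalty treats all coefficients equally, while the Boston features are on very different scales. For example, NOX is a fraction below 1 and TAX is in the hundreds, so `alpha=1` pushes small-scale features toward zero. A common remedy is to standardize the features and choose `alpha` by cross-validation. The sketch below uses synthetic data whose column scales are hypothetical, chosen to mimic mismatched units.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; rescaling columns mimics features measured in very different units.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X = X * np.array([0.01, 0.1, 1, 1, 10, 100, 1, 0.5])

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# The scaler is fit on training data only (inside the pipeline);
# LassoCV then picks alpha by 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(x_train, y_train)

lasso = model[-1]
print("chosen alpha:", lasso.alpha_)
print("test MSE:", mean_squared_error(y_test, model.predict(x_test)))
```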


In [52]:
```python
# Ridge regression (L2 regularization)
ridge = linear_model.Ridge(alpha=1)
ridge.fit(x_train, y_train)
y_pred3 = ridge.predict(x_test)
```
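scikit-learn's `Ridge` minimizes $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$. With `fit_intercept=True` the data are centered first, so the intercept is not penalized, and the coefficients have the closed form $w = (X_c^\top X_c + \alpha I)^{-1} X_c^\top y_c$. The sketch below checks this on the diabetes dataset, which is an assumed stand-in.

```python
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)
alpha = 1.0
ridge = linear_model.Ridge(alpha=alpha).fit(X, y)

# Center X and y so the intercept is left out of the penalty.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Closed-form ridge solution: (Xc^T Xc + alpha * I) w = Xc^T yc
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w

print(np.allclose(w, ridge.coef_), np.allclose(b, ridge.intercept_))
```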

In [53]:
```python
y_pred3
```

array([32.86464752, 28.10626971, 17.45623416, 21.11800656, 19.17758305,
       20.01572019, 32.4810883 , 18.22065581, 24.11031756, 27.08470725,
       26.33546377, 29.15924236, 20.84767308, 25.88275506, 23.18368878,
       20.07340457, 17.68891725, 38.97468525, 29.56218779,  9.75189755,
       20.62096963, 16.94764449, 25.29704826, 25.30703878, 30.7313962 ,
       10.14098907, 13.89534664, 16.46314981, 37.32513346, 14.60960033,
       20.90886234, 13.27271741, 43.24404331, 18.63138664, 21.54595121,
       21.26523253, 18.19725133, 27.83068912,  8.40249912, 19.31127889,
       24.29509558, 22.13675144, 29.3963501 , 15.60335833, 18.88017926,
       15.05527361, 39.16924227, 18.21210661, 24.69044085, 19.88566721,
       25.37055541])

In [54]:
```python
print('Coefficients:', ridge.coef_)
```

Coefficients: [-1.12540170e-01  5.64127893e-02 -3.14006541e-02  2.41014421e+00
 -1.11071049e+01  3.59024094e+00 -7.84448926e-03 -1.47495891e+00
  2.96065569e-01 -1.26953934e-02 -8.35465703e-01  9.10491350e-03
 -5.27383041e-01]
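Compared with the unregularized fit, Ridge shrinks coefficients without zeroing them. For example, the NOX coefficient moves from about -19.15 to about -11.11. The sketch below assumes the diabetes dataset and shows that the overall L2 norm of the coefficient vector drops as `alpha` grows.

```python
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)

ols = linear_model.LinearRegression().fit(X, y)
norms = {"OLS": np.linalg.norm(ols.coef_)}
for alpha in [0.1, 1.0, 10.0]:
    coef = linear_model.Ridge(alpha=alpha).fit(X, y).coef_
    norms[f"Ridge alpha={alpha}"] = np.linalg.norm(coef)

for name, norm in norms.items():
    print(f"{name:>16}: ||w||_2 = {norm:.2f}")
```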


In [56]:
```python
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred3))
```

Mean squared error: 20.14
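The three runs above each use a single `alpha=1`. To observe how regularization strength affects results, as the exercise suggests, a small sweep helps. The sketch below assumes the diabetes dataset and an illustrative grid of `alpha` values. Picking `alpha` by test-set MSE leaks test information, so `RidgeCV`/`LassoCV` or a separate validation split is the fairer choice in practice.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

# Test MSE for each (model, alpha) pair; the alpha grid is illustrative.
results = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    for name, cls in [("Lasso", linear_model.Lasso), ("Ridge", linear_model.Ridge)]:
        model = cls(alpha=alpha).fit(x_train, y_train)
        results[(name, alpha)] = mean_squared_error(y_test, model.predict(x_test))

for (name, alpha), mse in sorted(results.items()):
    print(f"{name:5s} alpha={alpha:<5}: test MSE = {mse:.2f}")
```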
