## [作業重點]
使用 Sklearn 中的 Lasso, Ridge 模型，來訓練各種資料集，務必了解送進去模型訓練的**資料型態**為何，也請了解模型中各項參數的意義。

機器學習的模型非常多種，但要訓練的資料多半有固定的格式，確保你了解訓練資料的格式為何，這樣在應用新模型時，就能夠最快的上手開始訓練！

## 練習時間
試著使用 sklearn datasets 的其他資料集 (boston, ...)，來訓練自己的線性迴歸模型，並加上適當的正則化來觀察訓練情形。

## Boston

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

### Linear Regression

In [2]:
# 讀取糖尿病資料集
boston = datasets.load_boston()

# 切分訓練集/測試集
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# 建立一個線性回歸模型
regr = linear_model.LinearRegression()

# 將訓練資料丟進去模型訓練
regr.fit(x_train, y_train)

# 將測試資料丟進模型得到預測結果
y_pred = regr.predict(x_test)

In [3]:
print(regr.coef_)

[-1.15966452e-01  4.71249231e-02  8.25980146e-03  3.23404531e+00
 -1.66865890e+01  3.88410651e+00 -1.08974442e-02 -1.54129540e+00
  2.93208309e-01 -1.34059383e-02 -9.06296429e-01  8.80823439e-03
 -4.57723846e-01]


In [4]:
# 預測值與實際值的差距，使用 MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 25.42


### LASSO

In [5]:
alpha = [0.1, 0.3, 0.5, 0.7, 1.0, 2.0]

In [6]:
# 切分訓練集/測試集
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# 建立一個線性回歸模型

for aph in alpha:
    print("alpha為：", aph)
    lasso = linear_model.Lasso(alpha = aph)

    # 將訓練資料丟進去模型訓練
    lasso.fit(x_train, y_train)

    # 將測試資料丟進模型得到預測結果
    y_pred = lasso.predict(x_test)

    # 印出各特徵對應的係數，可以看到許多係數都變成 0，Lasso Regression 的確可以做特徵選取
    print(lasso.coef_ )

    # 預測值與實際值的差距，使用 MSE

    print("Mean squared error: %.2f"
          % mean_squared_error(y_test, y_pred), "\n")

alpha為： 0.1
[-0.10618872  0.04886351 -0.04536655  1.14953069 -0.          3.82353877
 -0.02089779 -1.23590613  0.26008876 -0.01517094 -0.74673362  0.00963864
 -0.49877104]
Mean squared error: 26.45 

alpha為： 0.3
[-0.09855422  0.04870073 -0.02312395  0.         -0.          3.26381653
 -0.01153515 -1.11060846  0.26323033 -0.01580522 -0.74800739  0.00933465
 -0.54713872]
Mean squared error: 26.65 

alpha為： 0.5
[-0.08860117  0.04829133 -0.01107435  0.         -0.          2.66101769
 -0.00307949 -0.98440282  0.25664031 -0.01593271 -0.73252329  0.00884426
 -0.59210164]
Mean squared error: 26.94 

alpha為： 0.7
[-7.91297148e-02  4.72549632e-02 -0.00000000e+00  0.00000000e+00
 -0.00000000e+00  2.08599802e+00  2.49156109e-04 -8.94686247e-01
  2.48898278e-01 -1.59368404e-02 -7.17088164e-01  8.39764158e-03
 -6.29632147e-01]
Mean squared error: 27.59 

alpha為： 1.0
[-0.06494981  0.04581458 -0.          0.         -0.          1.18140024
  0.01109101 -0.73695809  0.23350042 -0.01551065 -0.69270805  

### Ridge

In [7]:
# 切分訓練集/測試集
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

for aph in alpha:
    # 建立一個線性回歸模型
    print("alpha為：", aph)
    ridge = linear_model.Ridge(alpha = aph)

    # 將訓練資料丟進去模型訓練
    ridge.fit(x_train, y_train)

    # 將測試資料丟進模型得到預測結果
    y_pred = regr.predict(x_test)

    # 印出 Ridge 的參數，可以很明顯看到比起 Linear Regression，參數的數值都明顯小了許多
    print(ridge.coef_)

    # 預測值與實際值的差距，使用 MSE

    print("Mean squared error: %.2f"
          % mean_squared_error(y_test, y_pred), "\n")

alpha為： 0.1
[-1.15381303e-01  4.72528249e-02  2.87371589e-03  3.19642306e+00
 -1.54713824e+01  3.89388927e+00 -1.19943742e-02 -1.52347878e+00
  2.90133016e-01 -1.34816989e-02 -8.93679905e-01  8.86599187e-03
 -4.58983115e-01]
Mean squared error: 25.42 

alpha為： 0.3
[-1.14440833e-01  4.74655954e-02 -5.82205210e-03  3.13107153e+00
 -1.35035586e+01  3.90852838e+00 -1.37563477e-02 -1.49461840e+00
  2.85214450e-01 -1.36071503e-02 -8.73357569e-01  8.95943721e-03
 -4.61131388e-01]
Mean squared error: 25.42 

alpha為： 0.5
[-1.13720313e-01  4.76370805e-02 -1.25294762e-02  3.07531514e+00
 -1.19789169e+01  3.91845004e+00 -1.51046851e-02 -1.47224633e+00
  2.81475089e-01 -1.37075697e-02 -8.57737917e-01  9.03172343e-03
 -4.62924119e-01]
Mean squared error: 25.42 

alpha為： 0.7
[-1.13152721e-01  4.77797521e-02 -1.78531666e-02  3.02636342e+00
 -1.07629157e+01  3.92508896e+00 -1.61650971e-02 -1.45439235e+00
  2.78555770e-01 -1.37904976e-02 -8.45391201e-01  9.08925806e-03
 -4.64468379e-01]
Mean squared err