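## [Assignment highlights]
Use scikit-learn's Lasso and Ridge models to train on various datasets. Make sure you understand the **data format** that is fed into the model for training, and what each of the model's parameters means.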

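There are many kinds of machine learning models, but their training data mostly follows a fixed format. Make sure you understand that format, so that you can start training quickly whenever you pick up a new model!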
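As a concrete reference for that format: scikit-learn regressors take a 2-D feature array `X` of shape `(n_samples, n_features)` and a 1-D target `y` of shape `(n_samples,)`. The sketch below uses synthetic data from `make_regression` (an illustrative assumption, not one of this notebook's datasets) to show these shapes, and how a single feature has to be reshaped into a column before fitting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 100 samples, 5 features (illustrative only)
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
print(X.shape, y.shape)  # (100, 5) (100,)

# Different linear models share the same fit/predict interface and input format
for model in (Ridge(alpha=1.0), Lasso(alpha=1.0)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_.shape, model.predict(X).shape)

# A single feature must still be 2-D: reshape (n_samples,) into (n_samples, 1)
x_single = X[:, 0]
X_single = x_single.reshape(-1, 1)
print(x_single.shape, X_single.shape)  # (100,) (100, 1)
single_model = Ridge(alpha=1.0).fit(X_single, y)
```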

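## Practice time
Try training your own linear regression models on other datasets from sklearn datasets (boston, ...), add suitable regularization, and observe how training behaves.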
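`load_boston` was removed in scikit-learn 1.2, so on a recent install one option for this exercise is the bundled diabetes dataset. The sketch below assumes `load_diabetes` and an arbitrary, untuned `alpha=0.1`; it fits the same three kinds of model used in the cells below and reports each test MSE.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# The diabetes data ships with scikit-learn, so no download is needed
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# alpha=0.1 is an arbitrary illustrative choice, not a tuned value
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=0.1),
    "Lasso": Lasso(alpha=0.1),
}

test_mse = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(x_test))
    print("%s: MSE = %.2f, non-zero coefficients = %d"
          % (name, test_mse[name], (model.coef_ != 0).sum()))
```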

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
```

# Trying a plain linear regression model:

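```python
# Load the dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2
boston = datasets.load_boston()

# Split into training and test sets (80/20, fixed seed for reproducibility)
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Create a plain linear regression model
regr = linear_model.LinearRegression()

# Fit the model on the training data
regr.fit(x_train, y_train)

# Predict on the test data
y_pred1 = regr.predict(x_test)
```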

```python
print(regr.coef_)
```

[-1.15966452e-01  4.71249231e-02  8.25980146e-03  3.23404531e+00
 -1.66865890e+01  3.88410651e+00 -1.08974442e-02 -1.54129540e+00
  2.93208309e-01 -1.34059383e-02 -9.06296429e-01  8.80823439e-03
 -4.57723846e-01]


```python
# Gap between predictions and true values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred1))
```

Mean squared error: 25.42
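MSE is simply the mean of the squared differences between true values and predictions. The small made-up example below (the arrays are illustrative, not from the Boston data) computes it by hand and checks it against `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Small made-up example: true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# MSE = mean of squared differences: (0.25 + 0.25 + 0 + 1) / 4
manual_mse = np.mean((y_true - y_hat) ** 2)
print(manual_mse, mean_squared_error(y_true, y_hat))  # 0.375 0.375
```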




This is the baseline. Next, let's see how Ridge and Lasso compare.



# Trying Lasso:

```python
# Load the dataset
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Create a Lasso model; alpha sets the strength of the L1 penalty
lasso = linear_model.Lasso(alpha=1.0)

# Fit the model on the training data
lasso.fit(x_train, y_train)

# Predict on the test data
y_pred2 = lasso.predict(x_test)
```

```python
lasso.coef_
```

array([-0.06494981,  0.04581458, -0.        ,  0.        , -0.        ,
        1.18140024,  0.01109101, -0.73695809,  0.23350042, -0.01551065,
       -0.69270805,  0.00763157, -0.6927848 ])

```python
# Gap between predictions and true values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred2))
```

Mean squared error: 28.95


```python
# Load the dataset
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Candidate alpha values; alpha=0 means no penalty (scikit-learn warns
# about using Lasso with alpha=0 and recommends LinearRegression instead)
seq = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
lasso_mse = []

# Fit one Lasso model per alpha and record its test MSE
for alpha in seq:
    lasso_test = linear_model.Lasso(alpha=alpha)
    lasso_test.fit(x_train, y_train)
    y_pred3 = lasso_test.predict(x_test)
    lasso_mse.append(mean_squared_error(y_test, y_pred3))
```



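With alpha = 1.0, three of the 13 coefficients (the 3rd to 5th features: INDUS, CHAS and NOX) are driven to exactly 0, so Lasso Regression does perform feature selection. However, the test MSE rises from 25.42 to 28.95.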
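To see which features Lasso keeps, check which entries of `coef_` are exactly zero. The sketch below assumes synthetic data from `make_regression` in which only 3 of 10 features influence the target, with hypothetical feature names `f0` to `f9`; with `alpha=1.0` the uninformative features are expected to be dropped.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which are informative
X, y, true_coef = make_regression(n_samples=200, n_features=10, n_informative=3,
                                  noise=5.0, coef=True, random_state=0)
feature_names = np.array(["f%d" % i for i in range(X.shape[1])])  # hypothetical names

lasso = Lasso(alpha=1.0).fit(X, y)

# Coefficients that are exactly 0 correspond to features Lasso has dropped
kept = feature_names[lasso.coef_ != 0]
dropped = feature_names[lasso.coef_ == 0]
print("truly informative:", feature_names[true_coef != 0])
print("kept by Lasso:    ", kept)
print("dropped by Lasso: ", dropped)
```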

# Trying Ridge:

```python
# Load the dataset
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Create a Ridge model; alpha sets the strength of the L2 penalty
ridge = linear_model.Ridge(alpha=1.0)

# Fit the model on the training data
ridge.fit(x_train, y_train)

# Predict on the test data
y_pred4 = ridge.predict(x_test)
```

```python
print(ridge.coef_)
```

[-1.12499445e-01  4.79562332e-02 -2.40438147e-02  2.96199458e+00
 -9.33966118e+00  3.93079015e+00 -1.73821202e-02 -1.43347691e+00
  2.75239392e-01 -1.38920708e-02 -8.31116943e-01  9.15637729e-03
 -4.66460539e-01]


```python
# Gap between predictions and true values, measured with MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred4))
```

Mean squared error: 25.74


```python
# Load the dataset
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

# Candidate alpha values (alpha=0 means no penalty)
seq = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ridge_mse = []

# Fit one Ridge model per alpha and record its test MSE
for alpha in seq:
    ridge_test = linear_model.Ridge(alpha=alpha)
    ridge_test.fit(x_train, y_train)
    y_pred5 = ridge_test.predict(x_test)
    ridge_mse.append(mean_squared_error(y_test, y_pred5))
```


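Compared with plain linear regression, Ridge mainly shrinks the largest coefficients: the 5th (NOX) goes from about -16.69 to -9.34 and the 4th (CHAS) from 3.23 to 2.96, while most of the others change only slightly and a few even grow a little in magnitude. The test MSE again ends up higher than the baseline (25.74 vs. 25.42).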



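Neither Lasso nor Ridge beats plain linear regression on this train/test split. Adding a regularization term to the objective keeps the model from becoming too complex, which also limits how closely it can fit the data.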
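The effect of the penalty can be checked directly on the training set. The sketch below (synthetic data from `make_regression`, an assumption) fits Ridge over increasing `alpha` values: the L2 norm of the coefficients shrinks while the training MSE grows. For an exactly solved penalized problem like Ridge this trade-off is monotone, since a larger `alpha` always buys smaller coefficients at the cost of a worse training fit.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic data (illustrative assumption)
X, y = make_regression(n_samples=100, n_features=8, noise=10.0, random_state=1)

alphas = [0.01, 0.1, 1, 10, 100, 1000]
coef_norms, train_mse = [], []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(np.linalg.norm(ridge.coef_))             # size of the coefficients
    train_mse.append(mean_squared_error(y, ridge.predict(X)))  # fit on the training data
    print("alpha=%-7g ||coef||=%9.3f  train MSE=%10.3f"
          % (alpha, coef_norms[-1], train_mse[-1]))
```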

So if there is no sign of overfitting, there is no need to start with strong regularization.
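The cells above compare `alpha` values by their test-set MSE, which lets the test set influence model selection. A more common pattern is to choose `alpha` by cross-validation on the training set only, and to standardize features first, because L1 and L2 penalties treat every coefficient alike and are therefore sensitive to feature scale. The sketch below assumes the bundled diabetes dataset and an arbitrary log-spaced `alpha` grid; `RidgeCV` and `LassoCV` are scikit-learn's cross-validated variants of the two estimators.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

alphas = np.logspace(-3, 3, 13)  # illustrative grid from 0.001 to 1000

# The scaler is fitted on the training split only, so the test set never influences it
ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
lasso_cv = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))

for name, model in [("Ridge", ridge_cv), ("Lasso", lasso_cv)]:
    model.fit(x_train, y_train)
    chosen_alpha = model[-1].alpha_  # alpha selected by cross-validation
    mse = mean_squared_error(y_test, model.predict(x_test))
    print("%s: chosen alpha = %g, test MSE = %.2f" % (name, chosen_alpha, mse))
```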


# Supplement:

Using Elastic Net

Elastic Net covers both Ridge and Lasso: its objective combines both penalty terms.
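For reference, these are the objective functions given in the scikit-learn documentation for the three estimators, with $n$ the number of training samples, $w$ the coefficient vector and $\rho$ the `l1_ratio` parameter:

$$
\begin{aligned}
\text{Lasso:}\quad & \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{Ridge:}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Elastic Net:}\quad & \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\rho \lVert w \rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w \rVert_2^2
\end{aligned}
$$

Setting $\rho = 1$ turns the Elastic Net objective into the Lasso objective. Dividing the Ridge objective by $2n$ shows that Ridge's `alpha` acts like an L2 weight of $\alpha/(2n)$ on the Lasso's loss scale; with about 400 training samples here, `alpha` between 0 and 1 is therefore a mild penalty for Ridge, which may be one reason its test MSE changes so little over that range.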

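Although Lasso performs variable selection, a side effect of its penalty is that when the coefficients of two highly correlated variables are pushed toward 0, one of them may become exactly 0 while the other stays in the model. This one-in, one-out treatment is also not very systematic. The Ridge penalty handles this somewhat better: it shrinks the coefficients of highly correlated variables together, in a systematic way. The advantage of Elastic Net is therefore that it combines the effective regularization of the Ridge penalty with the variable-selection ability of the Lasso penalty.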
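The correlated-feature behaviour described above can be seen in the extreme case of two identical columns. In this sketch (synthetic data, an assumption), the target depends on one feature that appears twice in `X`. Any split of the weight between the copies gives the same Lasso objective, and here the solver ends with essentially all of it on one copy; Elastic Net's L2 part makes the objective strictly convex, so it splits the weight evenly. The values `alpha=1.0` and `l1_ratio=0.5` are arbitrary.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.5, size=200)

# Two perfectly correlated features: the same column twice
X = np.column_stack([x, x])

lasso = Lasso(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

print("Lasso coefficients:      ", lasso.coef_)  # weight concentrated on one copy
print("Elastic Net coefficients:", enet.coef_)   # weight shared by both copies
```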

https://www.jamleecute.com/regularized-regression-ridge-lasso-elastic/

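As with Ridge and Lasso, Elastic Net is tuned through its parameters; the extra one is a mixing weight between 0 and 1.

In R's glmnet this mixing weight is called alpha: alpha = 0.5 weights the Ridge and Lasso penalties equally, alpha → 0 puts more weight on the Ridge penalty, and alpha → 1 puts more weight on the Lasso penalty. In scikit-learn's `ElasticNet` the mixing weight is `l1_ratio` instead (`l1_ratio=1` is a pure L1, i.e. Lasso, penalty and `l1_ratio=0` a pure L2 penalty), while `alpha` sets the overall penalty strength.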
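In scikit-learn terms, the mixing described above is controlled by `l1_ratio`. The sketch below assumes the bundled diabetes dataset, standardized features and a fixed, arbitrary `alpha=0.1`, and varies only `l1_ratio`; it also checks that `l1_ratio=1.0` gives the same coefficients as `Lasso` with the same `alpha`.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Standardize using statistics from the training split only
scaler = StandardScaler().fit(x_train)
x_train_s, x_test_s = scaler.transform(x_train), scaler.transform(x_test)

enet_results = {}
for l1_ratio in [0.1, 0.5, 0.9, 1.0]:
    enet = ElasticNet(alpha=0.1, l1_ratio=l1_ratio).fit(x_train_s, y_train)
    mse = mean_squared_error(y_test, enet.predict(x_test_s))
    enet_results[l1_ratio] = enet
    print("l1_ratio=%.1f: test MSE = %.2f, zero coefficients = %d"
          % (l1_ratio, mse, (enet.coef_ == 0).sum()))

# l1_ratio=1.0 is a pure L1 penalty, i.e. the Lasso objective
lasso = Lasso(alpha=0.1).fit(x_train_s, y_train)
print(np.allclose(enet_results[1.0].coef_, lasso.coef_))
```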


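```python
# Build a tuning grid over alpha
from pandas import DataFrame

# Load the dataset
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=4)

seq = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

ElasticNet_mse = []

# Note: this loop fits linear_model.Lasso, not linear_model.ElasticNet, so the
# MSE values printed below, and the "ElasticNet" column of the comparison
# table, are identical to the Lasso results. A true Elastic Net fit would use
# linear_model.ElasticNet(alpha=alpha, l1_ratio=...) instead.
for alpha in seq:
    lasso_test = linear_model.Lasso(alpha=alpha)
    lasso_test.fit(x_train, y_train)
    y_pred6 = lasso_test.predict(x_test)
    ElasticNet_mse.append(mean_squared_error(y_test, y_pred6))

    print("Mean squared error: %.2f"
          % mean_squared_error(y_test, y_pred6))
    print("Using alpha:%.2f" % alpha)
    print("============================")
```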

Mean squared error: 25.42
Using alpha:0.00
Mean squared error: 26.45
Using alpha:0.10
Mean squared error: 26.60
Using alpha:0.20
Mean squared error: 26.65
Using alpha:0.30
Mean squared error: 26.76
Using alpha:0.40
Mean squared error: 26.94
Using alpha:0.50
Mean squared error: 27.22
Using alpha:0.60
Mean squared error: 27.59
Using alpha:0.70
Mean squared error: 27.98
Using alpha:0.80
Mean squared error: 28.43
Using alpha:0.90
Mean squared error: 28.95
Using alpha:1.00




# Overall comparison:

```python
from pandas import DataFrame

seq = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Collect the test MSE of every model and alpha into one table
df_comparison = DataFrame(seq, columns=["alpha"])
df_comparison["Ridge"] = ridge_mse
df_comparison["Lasso"] = lasso_mse
df_comparison["ElasticNet"] = ElasticNet_mse

df_comparison
```

| alpha | Ridge (test MSE) | Lasso (test MSE) | ElasticNet (test MSE) |
|---|---|---|---|
| 0.0 | 25.419587 | 25.419587 | 25.419587 |
| 0.1 | 25.455212 | 26.452889 | 26.452889 |
| 0.2 | 25.491958 | 26.603395 | 26.603395 |
| 0.3 | 25.528482 | 26.645595 | 26.645595 |
| 0.4 | 25.564014 | 26.759413 | 26.759413 |
| 0.5 | 25.598134 | 26.944839 | 26.944839 |
| 0.6 | 25.630629 | 27.217349 | 27.217349 |
| 0.7 | 25.661416 | 27.589858 | 27.589858 |
| 0.8 | 25.69049 | 27.977013 | 27.977013 |
| 0.9 | 25.71789 | 28.43048 | 28.43048 |
