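## [Assignment Focus]
Use the Lasso and Ridge models in scikit-learn to train on a variety of datasets. Make sure you understand the **data format** that is fed into the model for training, and what each of the model's parameters means.

There are many kinds of machine learning models, but the training data they expect mostly follows a fixed format. Make sure you understand that format, so that you can start training quickly whenever you pick up a new model.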
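
As a minimal sketch of that format, assuming a tiny made-up dataset (the arrays `x`, `X`, and `y` below are illustrative and not part of the exercise): scikit-learn estimators expect the features as a 2-D array of shape `(n_samples, n_features)` and the target as a 1-D array of length `n_samples`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: a single feature, with y = 2x + 1 exactly
x = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,), 1-D
y = 2.0 * x + 1.0                    # shape (4,), 1-D target is fine

# A 1-D feature array is rejected; the features must be 2-D
try:
    LinearRegression().fit(x, y)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Reshape to (n_samples, n_features) before fitting
X = x.reshape(-1, 1)                 # shape (4, 1)
model = LinearRegression().fit(X, y)

print("X shape:", X.shape, "| y shape:", y.shape)
print("1-D input rejected:", rejected_1d)
print("coef:", model.coef_, "| intercept:", model.intercept_)
```

The same shape convention applies to `Lasso` and `Ridge`, which is why a DataFrame of feature columns can be passed in directly in the cells that follow.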

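## Practice Time
Try other datasets from sklearn datasets (boston, ...) to train your own linear regression model, then add suitable regularization and observe how training behaves.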
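
One possible way to approach the exercise is sketched below. It assumes the bundled diabetes dataset (`load_diabetes`) as the "other dataset", and the value `alpha=0.1` is an arbitrary starting point rather than a tuned choice. The `results` dictionary is a hypothetical helper for collecting scores.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: the diabetes dataset stands in for "another dataset"
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Same train/evaluate loop for the unregularized and regularized models
models = {
    "OLS": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),   # alpha chosen arbitrarily for illustration
    "Ridge": Ridge(alpha=0.1),   # alpha chosen arbitrarily for illustration
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    print(f"{name}: MSE={results[name]['mse']:.2f}, R2={results[name]['r2']:.3f}")
```

Changing `alpha` and rerunning the loop is a simple way to see how the strength of the penalty affects the test scores.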

In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings

warnings.simplefilter('ignore')

# Datasets
from sklearn import datasets

# Preprocessing
from sklearn.model_selection import train_test_split

# Model
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# Evaluation
from sklearn.metrics import mean_squared_error, r2_score

In [11]:
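# Load the Boston housing dataset (a regression problem); boston is a dict-like Bunch object
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston = datasets.load_boston()
print(f"Keys in boston: {list(boston.keys())}")

# Convert to a DataFrame for easier inspection
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
display(boston_df.head())

# Use all features in the dataset
X = boston_df  # X must be 2-D: (n_samples, n_features)
y = boston.target
print("X shape: ", X.shape)
print("y shape: ", y.shape)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)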

Keys in boston: ['data', 'target', 'feature_names', 'DESCR', 'filename']


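| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |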


X shape:  (506, 13)
y shape:  (506,)


### OLS

In [17]:
# Create an OLS model
lin_reg = LinearRegression()

# Fit the model on the training data
lin_reg.fit(X_train, y_train)

# Predict on the test data
y_pred = lin_reg.predict(X_test)
print('y_pred:\n', y_pred)

# Print the OLS regression coefficients and intercept
print('\n Features\n', boston.feature_names)
print('\n Coeff.\n', lin_reg.coef_)
print('\n Intercept\n', lin_reg.intercept_)

# Evaluate the regression with MSE and R squared
print('\n Mean squared error\n', mean_squared_error(y_test, y_pred))
print('\n R square\n' , r2_score(y_test, y_pred))

y_pred:
 [25.01207787 23.70673643 29.0945173  12.31397471 21.61347812 19.13354868
 20.81580439 21.37329011 18.38618961 19.34579424  5.25912036 16.65767507
 17.52569896  5.77456709 39.90010353 32.4334732  22.86945378 36.53576421
 30.95591345 23.0906515  24.91430476 24.08929781 20.54441681 30.23258421
 22.3642316   8.72252642 17.58062573 17.65060042 36.10230383 20.91252213
 18.77553493 18.18471313 19.85999794 23.90528147 28.93272041 19.23050276
 12.01526727 24.24058855 17.68050031 16.09113614 26.38479882 21.06267915
 22.32605647 15.61632473 22.9796011  25.12377027 20.21458841 22.45911017
  9.8519346  24.41614999 20.21336125]

 Features
 ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']

 Coeff.
 [-1.14644795e-01  3.62004052e-02  6.53873262e-03  2.19924733e+00
 -1.59109961e+01  4.26798929e+00 -1.01602089e-02 -1.34698690e+00
  2.71154731e-01 -1.16326045e-02 -1.01714981e+00  9.81293722e-03
 -4.43797298e-01]

 Intercept
 33.17002936336296

 Mean squared e

### Lasso

In [30]:
# Create a Lasso model
lasso_reg = Lasso(alpha=1)

# Fit the model on the training data
lasso_reg.fit(X_train, y_train)

# Predict on the test data
y_pred = lasso_reg.predict(X_test)
print('y_pred:\n', y_pred)

# Print the Lasso regression coefficients and intercept
print('\n Features\n', boston.feature_names)
print('\n Coeff.\n', lasso_reg.coef_)
print('\n Intercept\n', lasso_reg.intercept_)

# Evaluate the regression with MSE and R squared
print('\n Mean squared error\n', mean_squared_error(y_test, y_pred))
print('\n R square\n' , r2_score(y_test, y_pred))

y_pred:
 [24.4230389  23.89746374 27.32400364 14.73342388 20.70850456 22.4384992
 20.98552337 24.0959632  21.1734491  19.55162532  8.75228638 13.26798164
 17.65547845  8.24169988 35.22528701 31.26174819 21.87098095 34.71826526
 30.14285192 24.64731542 25.75568492 25.22150528 19.84704073 29.37328869
 23.94883154 15.7663962  19.54459648 22.20949787 31.58945142 19.73857903
 17.50092776 19.52821382 22.54420552 24.50025177 28.27424269 19.53046596
 10.92181361 23.71749778 15.58307134 12.93098679 26.21114575 21.40738414
 22.6751459  16.15951612 23.92697881 25.8821277  20.71907374 23.65929269
 13.23116263 23.42057847 21.55312032]

 Features
 ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']

 Coeff.
 [-0.06174164  0.03936912 -0.          0.         -0.          1.3255211
  0.00960452 -0.6067475   0.24067056 -0.01521031 -0.82319207  0.00835679
 -0.67854148]

 Intercept
 40.366054712827406

 Mean squared error
 43.683254678618525

 R square
 0.492197244697209
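
In the Lasso output above, the coefficients for INDUS, CHAS, and NOX are exactly zero, which is the feature-selection effect of the L1 penalty. The sketch below illustrates how that effect grows with `alpha`. It assumes synthetic data from `make_regression` rather than the Boston data, and the list of `alphas` is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Assumption: synthetic data with 10 features, only 3 of them informative
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

alphas = [0.01, 1.0, 10.0, 1e4]   # illustrative values, not tuned
nonzero_counts = []
for alpha in alphas:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    # Count how many coefficients survive the L1 penalty
    nonzero_counts.append(int(np.count_nonzero(lasso.coef_)))
    print(f"alpha={alpha:g}: {nonzero_counts[-1]} non-zero coefficients")
```

With a very large `alpha` the penalty dominates and every coefficient is driven to zero, leaving only the intercept.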

### Ridge

In [32]:
# Create a Ridge model
ridge_reg = Ridge(alpha=1)

# Fit the model on the training data
ridge_reg.fit(X_train, y_train)

# Predict on the test data
y_pred = ridge_reg.predict(X_test)
print('y_pred:\n', y_pred)

# Print the Ridge regression coefficients and intercept
print('\n Features\n', boston.feature_names)
print('\n Coeff.\n', ridge_reg.coef_)
print('\n Intercept\n', ridge_reg.intercept_)

# Evaluate the regression with MSE and R squared
print('\n Mean squared error\n', mean_squared_error(y_test, y_pred))
print('\n R square\n' , r2_score(y_test, y_pred))

y_pred:
 [25.0505207  23.32706418 28.75654424 12.41413685 21.04732972 19.5950973
 20.55574879 21.73862921 18.52662782 19.14839334  5.33615227 15.91853591
 17.92947743  5.79113964 39.54025911 32.36167078 22.22692471 36.25612401
 31.07394122 23.24953639 25.03030276 23.67012213 20.35536729 30.08959666
 22.25416291  8.22649033 17.74967827 18.94567532 35.78161554 20.5575629
 18.3371061  18.58023834 19.64105601 23.47140428 28.64699692 19.84415933
 12.04574706 24.12601348 17.0000725  15.35772894 25.97061613 20.92951523
 22.30965732 14.96252676 23.27904915 24.88850504 19.55941074 23.77219553
 11.04620036 24.37672849 21.55107692]

 Features
 ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']

 Coeff.
 [-0.11061719  0.03736061 -0.02260947  2.1574538  -8.94164181  4.29598809
 -0.01659573 -1.25273893  0.25153116 -0.01214523 -0.94555225  0.01011837
 -0.45123209]

 Intercept
 28.567541866603328

 Mean squared error
 42.76314460180224

 R square
 0.5028932066997358
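
Unlike the Lasso result, none of the Ridge coefficients above is exactly zero; the L2 penalty shrinks coefficients toward zero without removing them. A minimal sketch of that shrinkage follows, again assuming synthetic `make_regression` data and an arbitrary list of `alphas` rather than the Boston setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Assumption: synthetic regression data stands in for a real dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

alphas = [0.01, 1.0, 100.0, 10000.0]   # illustrative values, not tuned
coef_norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the coefficient vector measures overall shrinkage
    coef_norms.append(float(np.linalg.norm(ridge.coef_)))
    print(f"alpha={alpha:g}: ||coef||_2 = {coef_norms[-1]:.3f}")
```

The norm should fall as `alpha` rises, which is the behavior to look for when choosing how strongly to regularize.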