## Exercise
Try another dataset from sklearn datasets (boston, ...) to train your own linear regression model, then add suitable regularization and observe how it affects training.

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [10]:
# Load dataset (load_boston was removed in scikit-learn 1.2, so this cell needs an older version)
boston = datasets.load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df.head()

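| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |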


In [15]:
# Hold out 25% of the rows for testing (no random_state is set, so the split differs on every run)
train_x, test_x, train_y, test_y = train_test_split(df, boston.target, test_size=0.25)

In [28]:
# Linear Regression
LR = linear_model.LinearRegression()
LR.fit(train_x, train_y)
pred_y = LR.predict(test_x)

# Result
print(LR.coef_)
print('MSE = %.2f' % mean_squared_error(test_y, pred_y))

[-9.16901690e-02  5.20080366e-02  4.05858543e-02  3.29612016e+00
 -1.52723961e+01  3.52164172e+00  6.92532992e-03 -1.38681596e+00
  2.88621230e-01 -1.31673960e-02 -7.10345578e-01  1.02425728e-02
 -5.99086191e-01]
MSE = 22.22


In [29]:
# LASSO
lasso = linear_model.Lasso(alpha=1.0)
lasso.fit(train_x, train_y)
pred_y = lasso.predict(test_x)

# Result
print(lasso.coef_)
print('MSE = %.2f' % mean_squared_error(test_y, pred_y))

[-0.05988842  0.0554556  -0.          0.         -0.          0.57251756
  0.03058593 -0.67817208  0.25228142 -0.01574046 -0.51334058  0.00810912
 -0.82538236]
MSE = 30.64


In [31]:
# Ridge
ridge = linear_model.Ridge(alpha=1.0)
ridge.fit(train_x, train_y)
pred_y = ridge.predict(test_x)

# Result
print(ridge.coef_)
print('MSE = %.2f' % mean_squared_error(test_y, pred_y))

[-8.67375332e-02  5.30533624e-02  1.00943928e-02  3.23763752e+00
 -7.83166023e+00  3.55444484e+00  5.11226592e-04 -1.27989688e+00
  2.68108094e-01 -1.36263541e-02 -6.38479650e-01  1.06532004e-02
 -6.10186408e-01]
MSE = 22.87
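
The effect of the regularization strength is easier to see when `alpha` is swept over several values on a fixed split. The following is a minimal sketch under assumptions, not part of the original notebook: it uses `load_diabetes` as a stand-in dataset (since `load_boston` is unavailable in scikit-learn 1.2 and later), and the `alphas` grid and `random_state=0` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: diabetes data stands in for the Boston data used above.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0  # fixed seed so runs are comparable
)

alphas = [0.001, 0.01, 0.1, 1.0, 10.0]  # arbitrary grid for illustration
results = []  # (model name, alpha, test MSE, number of zero coefficients)

for name, Model in (("Lasso", Lasso), ("Ridge", Ridge)):
    for alpha in alphas:
        model = Model(alpha=alpha, max_iter=10000)
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        n_zero = int(np.sum(model.coef_ == 0))
        results.append((name, alpha, mse, n_zero))
        print(f"{name:5s} alpha={alpha:<6g} MSE={mse:9.2f} zero coefs={n_zero}")
```

Comparing the rows shows how the test MSE and the number of coefficients driven to exactly zero change with `alpha`, which is more informative than a single run at `alpha=1.0`.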


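# Answer
* Regularizing with LASSO or Ridge reduces model complexity, but here it gives a worse test MSE than plain linear regression (22.22): 30.64 for LASSO and 22.87 for Ridge, both at `alpha=1.0`.
* Next, the features are rescaled to [0, 1] with `MinMaxScaler` (min-max normalization, not standardization) before fitting ordinary linear regression, and the test MSE drops to 18.94. Note that the train/test split is redrawn without a fixed `random_state`, and ordinary least squares predictions do not change when each feature is linearly rescaled, so this improvement most likely reflects the different split rather than the scaling itself.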

In [43]:
from sklearn.preprocessing import MinMaxScaler

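# Rescale each feature to [0, 1]; the scaler is fit on all rows, including those that later land in the test set
df_temp = MinMaxScaler().fit_transform(df)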

train_x, test_x, train_y, test_y = train_test_split(df_temp, boston.target, test_size = 0.25)

# Linear Regression
LR = linear_model.LinearRegression()
LR.fit(train_x, train_y)
pred_y = LR.predict(test_x)

# Result
print(LR.coef_)
print('MSE = %.2f' % mean_squared_error(test_y, pred_y))

[-10.58026422   4.0421942    1.45206651   1.85264878 -10.13064789
  17.42560833   0.44990265 -15.67440336   7.34297171  -6.99276082
 -10.16574938   3.02457898 -18.89645439]
MSE = 18.94
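
To separate the effect of scaling from the effect of a different random split, the comparison can be repeated on one fixed split. The sketch below is an illustration under assumptions rather than the notebook's own method: it uses `load_diabetes` as a stand-in for the Boston data, fixes `random_state=0`, and wraps `MinMaxScaler` in a pipeline so the scaler is fit on the training rows only. The helper `evaluate` is a hypothetical name introduced here.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: diabetes data stands in for the Boston data used above.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0  # same split for every model
)

def evaluate(model):
    """Fit on the training split and return the test MSE."""
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

# Ordinary least squares, with and without min-max scaling
mse_ols_raw = evaluate(LinearRegression())
mse_ols_scaled = evaluate(make_pipeline(MinMaxScaler(), LinearRegression()))

# Ridge, with and without min-max scaling
mse_ridge_raw = evaluate(Ridge(alpha=1.0))
mse_ridge_scaled = evaluate(make_pipeline(MinMaxScaler(), Ridge(alpha=1.0)))

print(f"OLS   raw={mse_ols_raw:.2f}  scaled={mse_ols_scaled:.2f}")
print(f"Ridge raw={mse_ridge_raw:.2f}  scaled={mse_ridge_scaled:.2f}")
```

On a fixed split the two ordinary least squares scores should agree up to floating-point error, while the Ridge scores differ, because the penalty acts on coefficient size and therefore depends on the feature scale.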
