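## [Assignment Focus]
Use the Lasso and Ridge models in scikit-learn to train on a variety of datasets. Make sure you understand the **data format** that is fed to the model for training, and what each model parameter means.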

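There are many kinds of machine learning models, but the training data usually follows a fixed format. Once you understand that format, you can get started with a new model quickly.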
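As a minimal sketch of that format (toy data made up for illustration, not one of the assignment's datasets): scikit-learn estimators expect the features `X` as a 2-D array of shape `(n_samples, n_features)` and the target `y` with one entry per sample. The names `x_1d`, `X`, and `rejected_1d` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical toy data: 5 samples, 1 feature, y = 2 * x + 1
x_1d = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x_1d + 1.0

# Even a single feature has to be a 2-D column: (n_samples, 1)
X = x_1d.reshape(-1, 1)
print(X.shape, y.shape)

model = Ridge(alpha=0.1)
model.fit(X, y)
print(model.coef_.shape)  # one coefficient per feature

# Passing the 1-D array as X is rejected with a ValueError
try:
    Ridge(alpha=0.1).fit(x_1d, y)
    rejected_1d = False
except ValueError:
    rejected_1d = True
print(rejected_1d)
```

The same rule applies to a pandas DataFrame: rows are samples and columns are features.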

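## Practice Time
Try other datasets from sklearn datasets (boston, ...) to train your own linear regression model, then add suitable regularization and observe how training changes.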
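One way to start the exercise on a different dataset is sketched below. It assumes `load_diabetes`, a regression dataset bundled with scikit-learn, as the stand-in; the choice of dataset and `alpha=0.1` are illustrative assumptions, not part of the assignment.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Bundled regression dataset: 442 samples, 10 numeric features
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=4
)

# alpha=0.1 is an arbitrary starting point, not a tuned value
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=0.1),
}

# Fit each model on the training split and score it on the test split
test_mse = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: test MSE = {test_mse[name]:.2f}")
```

Changing `alpha` and rerunning the loop is a quick way to see how the regularization strength affects the test error.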

In [1]:
# Import everything we need
import os
import copy
import time
import math
import numpy             as np
import pandas            as pd
import seaborn           as sns
import datetime          as dt
import warnings
import matplotlib.pyplot as plt
from scipy                   import stats
from itertools               import compress
from sklearn.metrics         import roc_curve,mean_squared_error,r2_score,accuracy_score,precision_score,recall_score,fbeta_score
from sklearn.ensemble        import GradientBoostingRegressor,GradientBoostingClassifier,RandomForestClassifier
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
from sklearn.datasets        import load_boston, load_wine
from sklearn.linear_model    import LogisticRegression,LinearRegression,Lasso,Ridge
from sklearn.preprocessing   import LabelEncoder, MinMaxScaler, StandardScaler,OneHotEncoder
from sklearn.model_selection import cross_val_score,train_test_split
from IPython.display         import YouTubeVideo

# Short aliases for the longer names
MSE  = mean_squared_error
ACC  = accuracy_score
MME  = MinMaxScaler()
LE   = LabelEncoder()
LR   = LogisticRegression()
LIR  = LinearRegression()
GBR  = GradientBoostingRegressor()
GBC  = GradientBoostingClassifier()
RFC  = RandomForestClassifier()
OHE  = OneHotEncoder()

# Some required settings
warnings.filterwarnings('ignore')
%matplotlib inline

# Set the path to the data folder and name it data_folder
data_folder = 'C:/Users/Ynitsed/Documents/GitHub/2nd-ML100Days/data'

In [2]:
# Load the Boston housing data
t001 = load_boston()
# Convert the features (X) to a DataFrame for inspection
train_X_t1 = pd.DataFrame(t001.data, columns=t001.feature_names)
print(train_X_t1.shape)
print(train_X_t1.head())
# Convert the target (Y) to a DataFrame for inspection
train_Y_t1 = pd.DataFrame({"target": t001.target})
print(train_Y_t1.shape)
print(train_Y_t1.head())

(506, 13)
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0   
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0   
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0   
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0   
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0   

   PTRATIO       B  LSTAT  
0     15.3  396.90   4.98  
1     17.8  396.90   9.14  
2     17.8  392.83   4.03  
3     18.7  394.63   2.94  
4     18.7  396.90   5.33  
(506, 1)
   target
0    24.0
1    21.6
2    34.7
3    33.4
4    36.2


In [3]:
# Split into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(train_X_t1, train_Y_t1, test_size=0.1, random_state=4)
# Inspect the result of the split
print(x_train.shape)
print(x_train.head())
print(y_train.shape)
print(y_train.head())
print(x_test.shape)
print(x_test.head())
print(y_test.shape)
print(y_test.head())

(455, 13)
        CRIM   ZN  INDUS  CHAS    NOX     RM    AGE     DIS   RAD    TAX  \
169  2.44953  0.0  19.58   0.0  0.605  6.402   95.2  2.2625   5.0  403.0   
402  9.59571  0.0  18.10   0.0  0.693  6.404  100.0  1.6390  24.0  666.0   
295  0.12932  0.0  13.92   0.0  0.437  6.678   31.1  5.9604   4.0  289.0   
134  0.97617  0.0  21.89   0.0  0.624  5.757   98.4  2.3460   4.0  437.0   
117  0.15098  0.0  10.01   0.0  0.547  6.021   82.6  2.7474   6.0  432.0   

     PTRATIO       B  LSTAT  
169     14.7  330.04  11.32  
402     20.2  376.11  20.31  
295     16.0  396.90   6.27  
134     21.2  262.76  17.31  
117     17.8  394.51  10.30  
(455, 1)
     target
169    22.3
402    12.1
295    28.6
134    15.6
117    19.2
(51, 13)
        CRIM    ZN  INDUS  CHAS    NOX     RM    AGE     DIS  RAD    TAX  \
8    0.21124  12.5   7.87   0.0  0.524  5.631  100.0  6.0821  5.0  311.0   
289  0.04297  52.5   5.32   0.0  0.405  6.565   22.9  7.3172  6.0  293.0   
68   0.13554  12.5   6.07   0.0  0.

In [4]:
# LASSO
# After fit(), the LAS object holds the complete fitted regression model
LAS  = Lasso(alpha=0.1)
LAS.fit(x_train, y_train)
# Print the coefficients, the intercept, and R^2 on the training set
print(LAS.coef_)
print(LAS.intercept_)
print(LAS.score(x_train, y_train))
# Feed x_test into the fitted model to get the predictions y_pred
y_pred = LAS.predict(x_test)
print(y_pred.shape)
print(pd.DataFrame(y_pred).head())
# How far are the predictions y_pred from the actual y_test?
print("Mean squared error: %.2f"% MSE(y_test, y_pred))

[-0.11567831  0.05152311 -0.03346275  1.2230427  -0.          3.53216363
 -0.00922692 -1.19460642  0.28775344 -0.01473748 -0.75732817  0.01037228
 -0.58007751]
[26.51430651]
0.7213721091124857
(51,)
           0
0  10.564299
1  26.448298
2  16.872990
3  14.403291
4  36.824084
Mean squared error: 18.19
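
In the output above, one Lasso coefficient is exactly `-0.`, which is the L1 penalty dropping a feature. The sketch below illustrates the effect on synthetic data from `make_regression` (an assumption made for illustration, not the Boston data): as `alpha` grows, more coefficients become exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually drive y
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# Count exactly-zero coefficients for a small, medium, and very large alpha
zero_counts = {}
for alpha in [0.01, 1.0, 1e4]:
    model = Lasso(alpha=alpha).fit(X, y)
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {zero_counts[alpha]} of 10 coefficients are zero")
```

With a large enough `alpha`, every coefficient is zero and the model predicts only the intercept.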


In [5]:
# RIDGE
# After fit(), the RIG object holds the complete fitted regression model
RIG  = Ridge(alpha=0.1)
RIG.fit(x_train, y_train)
# Print the coefficients, the intercept, and R^2 on the training set
print(RIG.coef_)
print(RIG.intercept_)
print(RIG.score(x_train, y_train))
# Feed x_test into the fitted model to get the predictions y_pred
y_pred = RIG.predict(x_test)
print(y_pred.shape)
print(pd.DataFrame(y_pred).head())
# How far are the predictions y_pred from the actual y_test?
print("Mean squared error: %.2f"% MSE(y_test, y_pred))

[[-1.25305527e-01  4.85968956e-02  1.35490467e-02  3.05965839e+00
  -1.61558736e+01  3.62646355e+00  1.10808781e-03 -1.47693015e+00
   3.17043498e-01 -1.28048402e-02 -9.15220623e-01  9.56698213e-03
  -5.35070934e-01]]
[36.24753622]
0.7343810440447924
(51, 1)
           0
0  11.393837
1  26.774051
2  17.388518
3  17.434389
4  37.356057
Mean squared error: 17.07
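
Unlike Lasso, the L2 penalty in Ridge shrinks coefficients toward zero without setting them exactly to zero. A minimal sketch on synthetic `make_regression` data (an illustrative assumption, not the Boston data) tracks the size of the coefficient vector as `alpha` increases:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression problem with 5 features
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Record the L2 norm of the coefficients for increasing alpha
coef_norms = []
for alpha in [0.1, 10.0, 1000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha}: ||coef||_2 = {coef_norms[-1]:.3f}")
```

A small value such as `alpha=0.1` changes the fit only slightly, so the coefficients stay close to those of an unregularized fit.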


In [6]:
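# LIR: ordinary linear regression, no regularization
# After fit(), the LIR object holds the complete fitted regression model
LIR.fit(x_train, y_train)
# Print the coefficients, the intercept, and R^2 on the training set
print(LIR.coef_)
print(LIR.intercept_)
print(LIR.score(x_train, y_train))
# Feed x_test into the fitted model to get the predictions y_pred
y_pred = LIR.predict(x_test)
print(y_pred.shape)
print(pd.DataFrame(y_pred).head())
# How far are the predictions y_pred from the actual y_test?
print("Mean squared error: %.2f"% MSE(y_test, y_pred))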

[[-1.25856659e-01  4.84257396e-02  1.84085281e-02  3.08509569e+00
  -1.73277018e+01  3.61674713e+00  2.19181853e-03 -1.49361132e+00
   3.19979200e-01 -1.27294649e-02 -9.27469086e-01  9.50912468e-03
  -5.33592471e-01]]
[37.06602854]
0.734430389314123
(51, 1)
           0
0  11.460308
1  26.802693
2  17.434789
3  17.556310
4  37.391564
Mean squared error: 17.04
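
The `(51, 1)` prediction shape and the doubly nested `coef_` printed for Ridge and LinearRegression follow from passing `y_train` as a one-column DataFrame, that is, a 2-D target. The sketch below reproduces this with NumPy arrays on made-up data; the names `y_col`, `y_flat`, `m2d`, and `m1d` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 20 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y_flat = 2.0 * X[:, 0] + rng.normal(size=20)  # 1-D target, shape (20,)
y_col = y_flat.reshape(-1, 1)                 # 2-D target, shape (20, 1)

# 2-D target: coef_ and predictions keep the extra dimension
m2d = LinearRegression().fit(X, y_col)
print(m2d.coef_.shape, m2d.predict(X).shape)

# 1-D target: coef_ and predictions are flat
m1d = LinearRegression().fit(X, y_flat)
print(m1d.coef_.shape, m1d.predict(X).shape)
```

The fitted values are the same either way. Only the array shapes differ, so selecting the target column as a 1-D Series (for example `train_Y_t1["target"]`) would give flat outputs.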
