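## Practice time
Try other datasets from sklearn datasets (wine, boston, ...) to train your own linear regression model.

### HINT: Check the label type to determine whether the dataset's target calls for classification or regression, then train with the appropriate model!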
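One way to act on the hint is to look at the dtype and the number of distinct values of the target array before choosing a model. The sketch below is only a heuristic: the helper `describe_target` and its threshold of 20 distinct values are assumptions made for illustration, not part of scikit-learn. It uses the bundled `load_wine` and `load_diabetes` datasets.

```python
import numpy as np
from sklearn import datasets


def describe_target(y, max_classes=20):
    """Hypothetical helper: summarize a label array to help pick a model family.

    Assumption: integer labels with at most `max_classes` distinct values
    are treated as classification; anything else as regression.
    """
    y = np.asarray(y)
    n_unique = int(np.unique(y).size)
    looks_categorical = np.issubdtype(y.dtype, np.integer) and n_unique <= max_classes
    return {
        "dtype": str(y.dtype),
        "n_unique": n_unique,
        "task": "classification" if looks_categorical else "regression",
    }


wine_info = describe_target(datasets.load_wine().target)
diabetes_info = describe_target(datasets.load_diabetes().target)

print("wine:", wine_info)
print("diabetes:", diabetes_info)
```

A heuristic like this can be wrong (for example, integer-coded counts or ratings), so it is worth reading the dataset description (`DESCR`) as well.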

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

### Linear regression

In [2]:
# load the Boston house-prices dataset
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older version
boston = datasets.load_boston()
print(boston.data.shape)
print(boston.feature_names)

(506, 13)
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']


In [3]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.1, random_state=42)
# Create a linear regression model
regr = linear_model.LinearRegression()
# Fit the model on the training data
regr.fit(x_train, y_train)
# Predict on the test data
y_pred = regr.predict(x_test)

# Inspect the fitted coefficients of the regression model
print(f'Coefficients: {regr.coef_}')
# Gap between predicted and actual values, measured with MSE
print(f'Mean squared error: {mean_squared_error(y_test, y_pred): .2f}')
# Gap between predicted and actual values, measured with the r2 score
print(f'r2_score: {r2_score(y_test, y_pred): .2f}')

Coefficients: [-1.19886262e-01  3.99134691e-02  2.12938504e-02  2.77565167e+00
 -1.85854960e+01  3.75579160e+00  4.57076424e-03 -1.47064595e+00
  3.11878023e-01 -1.18109903e-02 -9.47556337e-01  1.03287982e-02
 -5.50096256e-01]
Mean squared error:  15.00
r2_score:  0.76
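Since `load_boston` is no longer available from scikit-learn 1.2 onward, the same workflow can be sketched on another bundled regression dataset. The example below assumes `load_diabetes` as a stand-in (a substitution made here, not the dataset used above) and also recomputes MSE and R² by hand to show what the two metrics measure. The numbers it prints will differ from the Boston results.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes replaces load_boston as the regression dataset
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.1, random_state=42
)

# Same steps as the cell above: fit on the training split, predict on the test split
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)

# MSE: mean of the squared residuals
residuals = y_test - y_pred
mse_manual = np.mean(residuals ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares around the test mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"MSE  manual: {mse_manual:.2f}  sklearn: {mean_squared_error(y_test, y_pred):.2f}")
print(f"R^2  manual: {r2_manual:.2f}  sklearn: {r2_score(y_test, y_pred):.2f}")
```

MSE is expressed in squared target units, so it is not comparable across datasets, whereas R² is unitless and equals 1 for a perfect fit.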


### Logistic regression

In [4]:
# load breast cancer wisconsin dataset
breast_cancer = datasets.load_breast_cancer()
print(breast_cancer.data.shape)
print(breast_cancer.feature_names)

(569, 30)
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


#### use LogisticRegression

In [5]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.1, random_state=42)
# Create the model
logreg = linear_model.LogisticRegression()
# Train the model
logreg.fit(x_train, y_train)
# Predict on the test set
y_pred = logreg.predict(x_test)

# Accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9824561403508771
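The breast cancer features span very different scales (areas in the hundreds, smoothness values well below one), and depending on the scikit-learn version and solver, an unscaled fit may emit a convergence warning. A common remedy, sketched below as an optional variation rather than something the notebook does, is to standardize the features inside a pipeline. The name `scaled_logreg` and the choice of `max_iter=1000` are assumptions for this example.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

breast_cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, test_size=0.1, random_state=42
)

# The scaler is fitted on the training split only, then applied to the test split
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logreg.fit(x_train, y_train)

scaled_acc = accuracy_score(y_test, scaled_logreg.predict(x_test))
print("Accuracy (standardized features):", scaled_acc)
```

With `test_size=0.1` the test split has only 57 samples, so a single misclassified sample changes the accuracy by about 0.018.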




#### use LogisticRegressionCV

In [6]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.1, random_state=42)
# Create the model; candidate C values are powers of two from 2**-5 to 2**15
C = [2**i for i in range(-5, 16, 2)]
logreg = linear_model.LogisticRegressionCV(Cs=C, cv=5, penalty='l2', solver='liblinear', random_state=42)
# Train the model (5-fold cross-validation selects the best C)
logreg.fit(x_train, y_train)
print(f'The best parameters of linear are: {logreg.C_}')
# Predict on the test set
y_pred = logreg.predict(x_test)

# Accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

The best parameters of linear are: [2048.]
Accuracy:  0.9824561403508771
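To read the selected value in context, it helps to list the candidate grid explicitly. The short sketch below rebuilds the same `C` list and shows where 2048 sits in it. The names `selected_C` and `selected_exponent` are introduced here for illustration only.

```python
import math

# Same grid as in the cell above: exponents -5, -3, ..., 15
C = [2**i for i in range(-5, 16, 2)]

for c in C:
    print(f"2**{int(math.log2(c)):>3d} = {c}")

# Value reported by the cross-validation run above
selected_C = 2048.0
selected_exponent = int(math.log2(selected_C))
print(f"selected C = 2**{selected_exponent}, grid position {C.index(selected_C) + 1} of {len(C)}")
```

In scikit-learn, `C` is the inverse of the regularization strength, so a large selected value such as 2048 means the cross-validation favored weak L2 regularization on this split.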


### After setting the parameters shared by LogisticRegressionCV and LogisticRegression to the same values, both models perform equally well
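A minimal sketch of that comparison is shown below, assuming "shared parameters" means the solver, the L2 penalty, `random_state`, and the `C` chosen by cross-validation. The variable names are illustrative. The `penalty` argument is left at its default, which is L2, and `np.ravel` is used so the code does not depend on the exact shape of `C_`.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

breast_cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, test_size=0.1, random_state=42
)

# Cross-validated model: picks C from the grid using 5-fold CV on the training split
C_grid = [2**i for i in range(-5, 16, 2)]
logreg_cv = linear_model.LogisticRegressionCV(
    Cs=C_grid, cv=5, solver='liblinear', random_state=42
)
logreg_cv.fit(x_train, y_train)
best_C = float(np.ravel(logreg_cv.C_)[0])

# Plain model with the shared parameters aligned to the CV model
logreg_plain = linear_model.LogisticRegression(
    C=best_C, solver='liblinear', random_state=42
)
logreg_plain.fit(x_train, y_train)

pred_cv = logreg_cv.predict(x_test)
pred_plain = logreg_plain.predict(x_test)
acc_cv = accuracy_score(y_test, pred_cv)
acc_plain = accuracy_score(y_test, pred_plain)
agreement = float(np.mean(pred_cv == pred_plain))

print(f"best C: {best_C}")
print(f"Accuracy (CV): {acc_cv}  Accuracy (plain): {acc_plain}")
print(f"Fraction of identical test predictions: {agreement}")
```

Comparing the predictions directly, not only the two accuracy values, helps distinguish "the same model" from "two models that happen to make the same number of mistakes".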