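## [Assignment Focus]
Use the linear models in scikit-learn (Sklearn) to train on various datasets. Make sure you understand the **data types** that are fed into the model for training, and understand what each of the model's parameters means.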
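As a minimal illustration of the data types `fit` expects, using synthetic data rather than one of the assignment's datasets: `X` must be a 2-D array of shape `(n_samples, n_features)` and `y` a 1-D array of length `n_samples`. A single feature therefore has to be reshaped into a column first. The variable names below are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic single-feature data following y = 2x + 1 exactly (no noise)
x = np.arange(10, dtype=float)   # shape (10,): 1-D, not a valid X for fit()
y = 2.0 * x + 1.0                # shape (10,): a 1-D target is what fit() expects

# Passing the 1-D feature array directly is rejected with a ValueError
try:
    LinearRegression().fit(x, y)
    rejected_1d = False
except ValueError:
    rejected_1d = True
print("1-D X rejected:", rejected_1d)

X = x.reshape(-1, 1)             # shape (10, 1): one column per feature
print(X.shape, y.shape)          # (10, 1) (10,)

model = LinearRegression()       # default fit_intercept=True also learns intercept_
model.fit(X, y)
print(model.coef_, model.intercept_)  # coef_ holds one weight per feature
```

`coef_` and `intercept_` are the learned parameters; here they should come out close to 2 and 1.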

## Assignment
Try using other datasets from sklearn `datasets` (wine, boston, ...) to train your own linear regression model.

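### HINT: Pay attention to the type of the label. Determine whether the dataset's target calls for classification or regression, then train with the appropriate model!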
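One way to act on this hint is scikit-learn's `sklearn.utils.multiclass.type_of_target`, which labels a target array as `'continuous'`, `'binary'`, `'multiclass'`, and so on. The `pick_task` wrapper below is a hypothetical helper, and the rule it applies is an assumption: integer-valued regression targets (for example counts) would be reported as `'multiclass'`, so always confirm against the dataset's description.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

def pick_task(y):
    """Hypothetical helper: map sklearn's target-type string to a task name."""
    kind = type_of_target(y)
    # 'continuous' (or 'continuous-multioutput') means non-integer floats;
    # 'binary' / 'multiclass' and the other types mean discrete labels.
    return "regression" if kind.startswith("continuous") else "classification"

print(type_of_target(np.array([24.0, 21.6, 34.7])))  # continuous
print(type_of_target(np.array([0, 0, 1, 2])))        # multiclass
print(type_of_target(np.array([0, 1, 1, 0])))        # binary
print(pick_task(np.array([24.0, 21.6, 34.7])))       # regression
```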

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

import warnings
# Silences *all* warnings for the rest of the notebook, e.g. deprecation
# warnings and LogisticRegression's ConvergenceWarning.
warnings.filterwarnings('ignore')
```

## Boston

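```python
# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on scikit-learn < 1.2.
boston = datasets.load_boston()

print('target y : %s \n' % boston.target[0:5])
print('shape:\n', boston.data.shape)
print('\nfeature name:\n', boston.feature_names)

X = boston.data    # 2-D float array, shape (n_samples, n_features)
y = boston.target  # 1-D continuous target (median home value) -> regression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)
# For a regressor, cross_val_score's default scoring is R^2 (reg.score)
print(f'\n c.v. score : {cross_val_score(reg, X_train, y_train, cv=5).mean()}\n')

y_pred = reg.predict(X_test)
print('Coefficients: %s \n' % reg.coef_)  # one weight per feature, in feature_names order
mse = mean_squared_error(y_test, y_pred)
print("MSE: %f" % mse)
```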

```
target y : [24.  21.6 34.7 33.4 36.2] 

shape:
 (506, 13)

feature name:
 ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']

 c.v. score : 0.731164762811346

Coefficients: [-9.87931696e-02  4.75027102e-02  6.69491841e-02  1.26954150e+00
 -1.54697747e+01  4.31968412e+00 -9.80167937e-04 -1.36597953e+00
  2.84521838e-01 -1.27533606e-02 -9.13487599e-01  7.22553507e-03
 -5.43790245e-01] 

MSE: 28.192486
```
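To make the regression output above easier to interpret, here is a sketch that uses `load_diabetes` as a stand-in regression dataset (an assumption on our part, chosen because it ships with scikit-learn and `load_boston` is gone in 1.2+). It checks three things: the default `cross_val_score` for a regressor equals R², `predict` is exactly `X @ coef_ + intercept_`, and ordinary least squares via `np.linalg.lstsq` recovers the same parameters. It also reports the test-set R² that the imported but unused `r2_score` could provide.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Stand-in regression dataset, bundled with scikit-learn (no download needed)
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

reg = linear_model.LinearRegression().fit(X_train, y_train)

# 1) For a regressor the default scoring is R^2, so both means agree
#    (cv=5 gives the same unshuffled KFold splits in both calls).
cv_default = cross_val_score(reg, X_train, y_train, cv=5).mean()
cv_r2 = cross_val_score(reg, X_train, y_train, cv=5, scoring="r2").mean()
print("CV R^2 (default scoring):", cv_default)

# 2) predict() evaluates the linear model X @ coef_ + intercept_
y_pred = reg.predict(X_test)
manual = X_test @ reg.coef_ + reg.intercept_
print("predict == X @ coef_ + intercept_:", np.allclose(y_pred, manual))

# 3) Plain least squares with a leading column of ones for the intercept
A = np.column_stack([np.ones(len(X_train)), X_train])
theta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
print("lstsq intercept matches:", np.isclose(theta[0], reg.intercept_))
print("lstsq coefficients match:", np.allclose(theta[1:], reg.coef_))

# 4) Test-set R^2 as a scale-free complement to MSE
test_r2 = r2_score(y_test, y_pred)
print("Test R^2:", test_r2)
```

Unlike MSE, R² does not depend on the units of the target, which is why it is the default score for regressors.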


## Wine

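```python
wine = datasets.load_wine()

print("target y : %s \n" % wine.target[0:5])
print("shape : \n", wine.data.shape)
print('\nfeature name:\n', wine.feature_names)
print('\ntarget names:\n', wine.target_names)

X = wine.data
y = wine.target  # integer class labels 0, 1, 2 -> classification, not regression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
# LogisticRegression is a linear *classifier* despite its name. With unscaled
# features and the default max_iter=100, the lbfgs solver may stop before
# converging; the ConvergenceWarning is hidden by the filter set in the first cell.
logreg = linear_model.LogisticRegression()
logreg.fit(X_train, y_train)
# For a classifier, cross_val_score's default scoring is accuracy
print(f'\n c.v. score : {cross_val_score(logreg, X_train, y_train, cv=5).mean()}\n')

y_pred = logreg.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy : ", acc)
```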

```
target y : [0 0 0 0 0] 

shape : 
 (178, 13)

feature name:
 ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

target names:
 ['class_0' 'class_1' 'class_2']

 c.v. score : 0.9365641025641025

Accuracy :  0.9444444444444444
```
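The wine features span very different scales (for example `proline` is in the hundreds to thousands, `hue` is around 1), which can keep `LogisticRegression`'s default solver from converging. A common remedy, swapped in here and not the original notebook's approach, is to standardize the features inside a `Pipeline`. This sketch reuses the same split and turns `ConvergenceWarning` into an error so non-convergence cannot pass silently.

```python
import warnings
from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

# StandardScaler learns mean/std on the training split only, then the
# classifier is fit on the standardized features.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=100))

with warnings.catch_warnings():
    # Raise instead of silently ignoring a ConvergenceWarning
    warnings.simplefilter("error", ConvergenceWarning)
    pipe.fit(X_train, y_train)

acc_scaled = accuracy_score(y_test, pipe.predict(X_test))
print("Accuracy (scaled):", acc_scaled)
print("lbfgs iterations:", pipe[-1].n_iter_)
```

Putting the scaler inside the pipeline also means `cross_val_score(pipe, ...)` refits the scaler on each training fold, so no test-fold statistics leak into training.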


## Breast_Cancer

```python
breast = datasets.load_breast_cancer()

print('target y : %s \n' % breast.target[0:5])
print('shape:\n', breast.data.shape)
print('\nfeature name:\n', breast.feature_names)
print('\ntarget names:\n', breast.target_names)

X = breast.data
y = breast.target  # binary labels: 0 = malignant, 1 = benign -> classification

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
# Same caveat as for wine: unscaled features may trigger a (hidden) ConvergenceWarning
logistic_reg = linear_model.LogisticRegression()
logistic_reg.fit(X_train, y_train)
print(f'\n c.v. score : {cross_val_score(logistic_reg, X_train, y_train, cv=5).mean()}\n')

y_pred = logistic_reg.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
```

```
target y : [0 0 0 0 0] 

shape:
 (569, 30)

feature name:
 ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

target names:
 ['malignant' 'benign']

 c.v. score : 0.9419871794871796

Accuracy:  0.9824561403508771
```
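Accuracy alone does not show which kind of mistake the classifier makes, and in this dataset label `0` is `malignant`, so missed malignant cases matter most. A possible follow-up, sketched here with a standardized pipeline as an assumption (not the notebook's exact model), is a confusion matrix plus a per-class report on the same split.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=123)

clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows = true class, columns = predicted class, in label order [0, 1],
# i.e. [malignant, benign] according to data.target_names.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_test, y_pred, target_names=list(data.target_names)))

# Recall on the malignant class: share of true malignant cases that were caught
malignant_recall = cm[0, 0] / cm[0].sum()
print("Malignant recall:", malignant_recall)
```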
