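## [Key Points]
Use the linear regression models in sklearn to train on a variety of datasets. Make sure you understand the **data type** of what is fed into the model for training, and the meaning of each of the model's parameters.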
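As a minimal sketch of what checking the data type and the parameters can look like, the snippet below inspects the arrays of one built-in dataset and lists the constructor parameters of `LinearRegression`. The choice of `load_diabetes` here is only an assumption for illustration; any of the bundled datasets could be used the same way.

```python
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Both inputs are NumPy arrays: X is 2-D (samples, features), y is 1-D (samples,)
print(type(X), X.dtype, X.shape)
print(type(y), y.dtype, y.shape)

model = linear_model.LinearRegression()
# get_params() returns the constructor parameters and their current values
params = model.get_params()
print(params)

model.fit(X, y)
# After fitting there is one coefficient per feature, plus a single intercept
print(model.coef_.shape, model.intercept_)
```

Parameters such as `fit_intercept` are set before training, while attributes ending in an underscore (`coef_`, `intercept_`) only exist after `fit` has been called.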

## Assignment
Try other datasets from sklearn datasets (wine, boston, ...) to train your own linear regression model.

### HINT: Check the type of the label to determine whether the dataset's task is classification or regression, then train with the appropriate model!
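One rough way to follow this hint is to look at the dtype and the number of distinct values of each dataset's target. The helper `guess_task` below is hypothetical (it is not part of sklearn) and relies on a simple assumption: integer labels mean classification, floating-point labels mean regression. This holds for the bundled datasets used here, but it is only a heuristic; the dataset description (`DESCR`) remains the authoritative source.

```python
import numpy as np
from sklearn import datasets

def guess_task(target):
    """Hypothetical helper: guess the task from the label array's dtype."""
    target = np.asarray(target)
    if np.issubdtype(target.dtype, np.integer):
        return 'classification'
    return 'regression'

loaders = {
    'diabetes': datasets.load_diabetes,
    'iris': datasets.load_iris,
    'wine': datasets.load_wine,
    'breast_cancer': datasets.load_breast_cancer,
}

tasks = {}
for name, loader in loaders.items():
    target = loader().target
    tasks[name] = guess_task(target)
    # Print the dtype and the number of distinct label values for a sanity check
    print(name, target.dtype, len(np.unique(target)), tasks[name])
```

A regression target stored as integers, or class labels stored as floats, would fool this check, so it is best treated as a first look rather than a decision rule.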

In [1]:
from sklearn import datasets, linear_model, metrics
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

import numpy as np

import matplotlib.pyplot as plt

In [2]:
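def data(dataset, is_regression):
    """Split a sklearn dataset, fit a linear model, and print test metrics."""
    print('data shape:', dataset.data.shape)
    print('target shape:', dataset.target.shape)

    # Hold out 10% of the samples for testing; fixed seed for reproducibility
    X = dataset.data
    x_train, x_test, y_train, y_test = train_test_split(X, dataset.target, test_size=0.1, random_state=4)
    print('x_train', x_train[0])
    print('y_train', y_train[0])
    print('x_test', x_test[0])
    print('y_test', y_test[0])

    if is_regression:
        # Continuous target: ordinary least-squares linear regression
        model = linear_model.LinearRegression()
    else:
        # Discrete target: multinomial logistic regression with an L2 penalty
        # (the multi_class argument is deprecated in recent scikit-learn releases)
        model = linear_model.LogisticRegression(penalty='l2', solver='newton-cg', multi_class='multinomial')

    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)

    if is_regression:
        print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
    else:
        # r2_score treats the class labels as numbers, so it is not a standard
        # classification metric; accuracy is the meaningful figure here
        print("r2_score: %.2f" % r2_score(y_test, y_pred))
        print('accuracy_score: %.2f' % accuracy_score(y_test, y_pred))
    print()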

In [3]:
diabetes = datasets.load_diabetes()
data(diabetes, True)

data shape: (442, 10)
target shape: (442,)
x_train [-0.04547248 -0.04464164 -0.04824063 -0.01944209 -0.00019301 -0.01603186
  0.06704829 -0.03949338 -0.02479119  0.01963284]
y_train 111.0
x_test [-0.04183994 -0.04464164 -0.04931844 -0.03665645 -0.00707277 -0.02260797
  0.08545648 -0.03949338 -0.06648815  0.00720652]
y_test 128.0
Mean squared error: 2840.79
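The mean squared error is expressed in squared units of the target, which makes it hard to read on its own; its square root (RMSE) is in the same units as the target, and an MSE of 2840.79 corresponds to an RMSE of about 53.3. The sketch below repeats the same split and model as `data(diabetes, True)` and additionally reports RMSE and R²; computing these two extra metrics is an addition for illustration, not part of the original function.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
# Same split settings as the data() helper: 10% test set, random_state=4
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.1, random_state=4)

model = linear_model.LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)            # same units as the target
r2 = r2_score(y_test, y_pred)  # fraction of variance explained, at most 1

print("MSE:  %.2f" % mse)
print("RMSE: %.2f" % rmse)
print("R^2:  %.2f" % r2)
```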



In [4]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston = datasets.load_boston()
data(boston, True)

data shape: (506, 13)
target shape: (506,)
x_train [  2.44953   0.       19.58      0.        0.605     6.402    95.2
   2.2625    5.      403.       14.7     330.04     11.32   ]
y_train 22.3
x_test [2.1124e-01 1.2500e+01 7.8700e+00 0.0000e+00 5.2400e-01 5.6310e+00
 1.0000e+02 6.0821e+00 5.0000e+00 3.1100e+02 1.5200e+01 3.8663e+02
 2.9930e+01]
y_test 16.5
Mean squared error: 17.04



In [5]:
breast_cancer = datasets.load_breast_cancer()
# The target is a binary 0/1 label, so this is a classification task
data(breast_cancer, False)



In [6]:
iris = datasets.load_iris()
data(iris, False)

data shape: (150, 4)
target shape: (150,)
x_train [4.9 3.1 1.5 0.2]
y_train 0
x_test [6.4 2.8 5.6 2.1]
y_test 2
r2_score: 0.92
accuracy_score: 0.93
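With a 10% test split, iris leaves only 15 test samples, so a single accuracy figure hides which classes were confused. A confusion matrix makes that visible. The sketch below is an illustration under stated assumptions: it reuses the same split settings, but fits `LogisticRegression` with its default solver and a raised `max_iter` rather than the `newton-cg` configuration used in `data()`, so its numbers may differ slightly from the output above.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Same split settings as the data() helper: 10% test set, random_state=4
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1, random_state=4)

# Assumption: default solver with a higher iteration cap to avoid convergence warnings
model = linear_model.LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
acc = accuracy_score(y_test, y_pred)

print(cm)
print("accuracy: %.2f" % acc)
```

The diagonal of the matrix counts correct predictions, so accuracy equals the diagonal sum divided by the total number of test samples.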



In [7]:
wine = datasets.load_wine()
data(wine, False)

data shape: (178, 13)
target shape: (178,)
x_train [1.229e+01 2.830e+00 2.220e+00 1.800e+01 8.800e+01 2.450e+00 2.250e+00
 2.500e-01 1.990e+00 2.150e+00 1.150e+00 3.300e+00 2.900e+02]
y_train 1
x_test [1.296e+01 3.450e+00 2.350e+00 1.850e+01 1.060e+02 1.390e+00 7.000e-01
 4.000e-01 9.400e-01 5.280e+00 6.800e-01 1.750e+00 6.750e+02]
y_test 2
r2_score: 1.00
accuracy_score: 1.00

