# 作業三: 分類器的原理與評比實驗
## 資料來源: 來自 Yale Face 38 人的人臉影像共 2410 張，每張大小 192×168
## 目標: 
## 計畫執行這篇講義描述的分類器比較，即採用三種分類器分別對三組資料進行分類學習與測試。其中分類器包括： 
## 1.多元羅吉斯回歸 2.支援向量機 3.神經網路
## 影像資料處理: 
## 1.原始資料 2.進行PCA主成分分析
## 分類方法: 
- ## Logistic Regression
- ## SVM
- ## Neural Network
### 姓名: 鄭欣莉
### 學號: 410877039

## 標準化後原始資料 + 羅吉斯迴歸

- solver = 'lbfgs'

In [1]:
import numpy as np
import pandas as pd 
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import scipy.io 

D = scipy.io.loadmat('allFaces.mat')
X = D['faces'].T
Y = np.ndarray.flatten(D['nfaces']) #轉換成[64個0, 62個1, .....]
y = []
for i in range(len(Y)): #len(Y)
    for j in range(Y[i]):
        y.append(i)
y = np.array(y)
m = int(D['m']) #168
n = int(D['n']) #192
n_persons = int(D['person']) #38人
test_size = 0.30 #多少比較好
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)
#標準化
scalar = StandardScaler()
X_train_ = scalar.fit_transform(X_train)
X_test_ = scalar.fit_transform(X_test)

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

def clf_LR(solver):
    opts = dict(tol = 1e-6, max_iter = int(1e6), verbose = 1)
    clf_LR = LogisticRegression(solver = solver, **opts)
    clf_LR.fit(X_train_,y_train)
    y_pred = clf_LR.predict(X_test_)
    print(f"{clf_LR.score(X_test_, y_test):.2%}\n")
    print(classification_report(y_test,y_pred))

In [3]:
clf_LR(solver = 'lbfgs')

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed: 45.0min finished


95.44%

              precision    recall  f1-score   support

           0       0.95      0.75      0.84        28
           1       0.95      1.00      0.97        19
           2       0.93      1.00      0.96        25
           3       0.89      1.00      0.94        17
           4       1.00      1.00      1.00        19
           5       1.00      1.00      1.00        14
           6       1.00      1.00      1.00        23
           7       0.94      1.00      0.97        17
           8       0.85      1.00      0.92        17
           9       0.90      0.95      0.93        20
          10       1.00      0.92      0.96        24
          11       1.00      1.00      1.00        19
          12       0.95      1.00      0.97        18
          13       1.00      0.79      0.88        19
          14       0.94      0.94      0.94        17
          15       1.00      0.89      0.94         9
          16       1.00      1.00      1.00        18
          17       

- solver = 'liblinear'

In [4]:
clf_LR(solver = 'liblinear')

[LibLinear]97.93%

              precision    recall  f1-score   support

           0       1.00      0.89      0.94        28
           1       0.95      0.95      0.95        19
           2       0.93      1.00      0.96        25
           3       1.00      1.00      1.00        17
           4       1.00      1.00      1.00        19
           5       1.00      1.00      1.00        14
           6       1.00      1.00      1.00        23
           7       1.00      1.00      1.00        17
           8       0.89      1.00      0.94        17
           9       1.00      0.95      0.97        20
          10       1.00      0.96      0.98        24
          11       1.00      1.00      1.00        19
          12       0.95      1.00      0.97        18
          13       1.00      0.84      0.91        19
          14       1.00      1.00      1.00        17
          15       1.00      1.00      1.00         9
          16       1.00      1.00      1.00        18
        

- solver = 'newton-cg'

## 結論：
### 1. Classification Report 的衡量指標個別代表的意思:
- ### 精確率(Precision) 為預測為真的樣本有幾個預測正確
- ### 召回率(Recall) 為事實為真的樣本中有幾個是預測正確的
- ### F1 score 為精確率和召回率的加權平均數
### 2. 以不同solver的第一筆資料為例:
- ### solver = 'lbfgs' 預測為真且預測正確的比例為95%，solver = 'liblinear'為100%
- ### solver = 'lbfgs 在事實為真的樣本中預測正確率只有75%，solver = 'liblinear'為89%
- ### solver = 'lbfgs'精確率和召回率的調和平均為84%，solver = 'liblinear'為94%
### 3. 標準化過後的AT&T資料，solver = 'lbfgs' 跟 'newton-cg' 表現均差不多，準確率均在90%以上，solver = 'liblinear' 準確率最高98%，對於38筆資料來說表現絕對不差。
### 4. solver = 'lbfgs' 需要跑的時間比較少

## PCA主成分分析(成分比例採0.95) + 羅吉斯迴歸

- solver = 'lbfgs'

In [5]:
from sklearn.decomposition import PCA

def PCA_LR(n_components, solver):
    pca = PCA(n_components = n_components).fit(X_train_)
    Z_train = pca.transform(X_train_)
    Z_test = pca.transform(X_test_)
    opts = dict(tol = 1e-6, max_iter = int(1e6), verbose = 1)
    clf_PCA = LogisticRegression(solver = solver, **opts)
    clf_PCA.fit(Z_train, y_train)
    y_pred = clf_PCA.predict(Z_test)
    print(f"{clf_PCA.score(Z_test, y_test):.2%}\n")
    print(classification_report(y_test,y_pred))

In [6]:
PCA_LR(n_components = 0.9, solver = 'lbfgs')

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


87.14%

              precision    recall  f1-score   support

           0       1.00      0.64      0.78        28
           1       1.00      0.89      0.94        19
           2       0.92      0.88      0.90        25
           3       0.94      1.00      0.97        17
           4       0.95      1.00      0.97        19
           5       0.82      1.00      0.90        14
           6       1.00      0.87      0.93        23
           7       0.76      0.76      0.76        17
           8       0.79      0.88      0.83        17
           9       0.94      0.80      0.86        20
          10       1.00      0.75      0.86        24
          11       0.81      0.89      0.85        19
          12       1.00      0.83      0.91        18
          13       1.00      0.84      0.91        19
          14       0.79      0.88      0.83        17
          15       0.89      0.89      0.89         9
          16       0.85      0.94      0.89        18
          17       

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   30.0s finished


- solver = 'liblinear'

In [7]:
PCA_LR(n_components = 0.9, solver = 'liblinear')

[LibLinear]79.39%

              precision    recall  f1-score   support

           0       0.94      0.61      0.74        28
           1       0.60      0.79      0.68        19
           2       0.95      0.84      0.89        25
           3       0.73      0.94      0.82        17
           4       0.90      0.95      0.92        19
           5       0.68      0.93      0.79        14
           6       0.87      0.57      0.68        23
           7       0.50      0.65      0.56        17
           8       0.88      0.88      0.88        17
           9       1.00      0.70      0.82        20
          10       0.95      0.75      0.84        24
          11       0.84      0.84      0.84        19
          12       0.94      0.94      0.94        18
          13       1.00      0.74      0.85        19
          14       0.87      0.76      0.81        17
          15       0.50      0.78      0.61         9
          16       0.84      0.89      0.86        18
        

In [8]:
PCA_LR(n_components = 0.95, solver = 'lbfgs')

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


94.05%

              precision    recall  f1-score   support

           0       1.00      0.71      0.83        28
           1       1.00      0.95      0.97        19
           2       1.00      1.00      1.00        25
           3       0.85      1.00      0.92        17
           4       1.00      1.00      1.00        19
           5       0.88      1.00      0.93        14
           6       0.96      1.00      0.98        23
           7       0.89      1.00      0.94        17
           8       0.94      1.00      0.97        17
           9       0.85      0.85      0.85        20
          10       1.00      0.88      0.93        24
          11       1.00      1.00      1.00        19
          12       1.00      0.89      0.94        18
          13       1.00      0.79      0.88        19
          14       0.89      0.94      0.91        17
          15       0.89      0.89      0.89         9
          16       1.00      1.00      1.00        18
          17       

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   18.7s finished


## 結論:
### PCA的AT&T資料，當成分比例採<=0.9時，準確率均在90%以下，solver = 'lbfgs' 的準確率為87%，solver = 'liblinear' 準確率為79%，因此表示無法在降低維度的同時保留重要資訊；當成分比例採0.95時，準確率94%，所以下面主成分比例將採0.95。 

## 標準化後原始資料+SVC

### one vs one

- kernel = 'linear'

In [9]:
from sklearn.svm import SVC,LinearSVC

def clf_SVC(C, opts, clf_svm):
    clf_svm.fit(X_train_,y_train)
    predictions = clf_svm.predict(X_test_)
    print(classification_report(y_test, predictions))

In [10]:
C = 1
opts = dict(C = C, decision_function_shape = 'ovo', \
             tol = 1e-6, max_iter = int(1e6))
clf_svm = SVC(kernel = 'linear', **opts)
clf_SVC(C = C, opts = opts, clf_svm = clf_svm)

              precision    recall  f1-score   support

           0       1.00      0.75      0.86        28
           1       0.81      0.89      0.85        19
           2       0.89      0.96      0.92        25
           3       0.74      1.00      0.85        17
           4       0.73      1.00      0.84        19
           5       0.88      1.00      0.93        14
           6       0.96      1.00      0.98        23
           7       0.68      1.00      0.81        17
           8       0.88      0.88      0.88        17
           9       0.71      0.85      0.77        20
          10       1.00      0.88      0.93        24
          11       0.95      1.00      0.97        19
          12       0.88      0.83      0.86        18
          13       1.00      0.89      0.94        19
          14       0.94      0.94      0.94        17
          15       1.00      0.89      0.94         9
          16       1.00      1.00      1.00        18
          17       1.00    

- kernel = 'sigmoid'

In [11]:
C = 1
opts = dict(C = C, decision_function_shape = 'ovo', \
             tol = 1e-6, max_iter = int(1e6))
clf_svm = SVC(kernel = 'sigmoid', **opts)
clf_SVC(C = C, opts = opts, clf_svm = clf_svm)

              precision    recall  f1-score   support

           0       0.71      0.18      0.29        28
           1       0.38      0.32      0.34        19
           2       1.00      0.08      0.15        25
           3       0.59      0.59      0.59        17
           4       0.79      0.58      0.67        19
           5       0.80      0.29      0.42        14
           6       0.58      0.30      0.40        23
           7       0.50      0.82      0.62        17
           8       0.75      0.88      0.81        17
           9       0.15      0.20      0.17        20
          10       0.85      0.46      0.59        24
          11       1.00      0.42      0.59        19
          12       0.03      0.44      0.06        18
          13       0.50      0.16      0.24        19
          14       0.50      0.06      0.11        17
          15       0.10      0.11      0.11         9
          16       0.67      0.22      0.33        18
          17       1.00    

## 結論:
### 在決策函數中選擇一對一(one vs one)的模式，可以看到當 kernel 選擇 'linear' 準確率有91%，kernel 選擇 'sigmoid' 準確率卻只有36%，表現得非常不好

## PCA主成分分析(成分比例採0.95) + SVM

### one vs one

- kernel = 'linear'

In [12]:
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95).fit(X_train_)
Z_train = pca.transform(X_train_) #降維
Z_test = pca.transform(X_test_)
C = 1
opts = dict(C=C,tol = 1e-6,max_iter = int(1e6))
clf_svm = SVC(kernel = 'linear',**opts)
clf_svm.fit(Z_train,y_train)
predictions = clf_svm.predict(Z_test)
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.88      0.75      0.81        28
           1       0.80      0.84      0.82        19
           2       0.91      0.84      0.87        25
           3       0.81      1.00      0.89        17
           4       0.76      1.00      0.86        19
           5       0.88      1.00      0.93        14
           6       0.79      0.96      0.86        23
           7       0.71      0.88      0.79        17
           8       0.88      0.88      0.88        17
           9       0.81      0.85      0.83        20
          10       1.00      0.88      0.93        24
          11       0.90      1.00      0.95        19
          12       1.00      0.83      0.91        18
          13       0.88      0.79      0.83        19
          14       0.88      0.88      0.88        17
          15       0.82      1.00      0.90         9
          16       0.90      1.00      0.95        18
          17       0.83    

### one vs the rest

In [13]:
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95).fit(X_train_)
Z_train = pca.transform(X_train_) #降維
Z_test = pca.transform(X_test_)
C = 1
opts = dict(C=C,tol = 1e-6,max_iter = int(1e6))
clf_svm = SVC(kernel = 'linear',**opts)
clf_svm.fit(Z_train,y_train)
predictions = clf_svm.predict(Z_test)
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.88      0.75      0.81        28
           1       0.80      0.84      0.82        19
           2       0.91      0.84      0.87        25
           3       0.81      1.00      0.89        17
           4       0.76      1.00      0.86        19
           5       0.88      1.00      0.93        14
           6       0.79      0.96      0.86        23
           7       0.71      0.88      0.79        17
           8       0.88      0.88      0.88        17
           9       0.81      0.85      0.83        20
          10       1.00      0.88      0.93        24
          11       0.90      1.00      0.95        19
          12       1.00      0.83      0.91        18
          13       0.88      0.79      0.83        19
          14       0.88      0.88      0.88        17
          15       0.82      1.00      0.90         9
          16       0.90      1.00      0.95        18
          17       0.83    

## 結論:
### 當成分比例採0.95時，不管選擇一對一或一對其他準確率皆為89%，原資料的準確率表現較好，但是成分比例採0.95時，所需要跑的時間減少了很多，因此或許準確率89%是很不錯的

## 標準化後原始資料 + 神經網路NN

- hidden layers = (30,)

In [14]:
from sklearn.neural_network import MLPClassifier

hidden_layers = (30,)
activation = 'logistic'
opts = dict(hidden_layer_sizes = hidden_layers,verbose = True,activation = activation,tol = 1e-6,max_iter = int(1e6))
solver = 'adam'
clf_MLP = MLPClassifier(solver = solver, **opts)
clf_MLP.fit(X_train_,y_train)
predictions = clf_MLP.predict(X_test_)
print(classification_report(y_test,predictions))

Iteration 1, loss = 3.67443798
Iteration 2, loss = 3.58665463
Iteration 3, loss = 3.53058939
Iteration 4, loss = 3.47661425
Iteration 5, loss = 3.42735088
Iteration 6, loss = 3.37680664
Iteration 7, loss = 3.32126342
Iteration 8, loss = 3.26326692
Iteration 9, loss = 3.19550576
Iteration 10, loss = 3.10940056
Iteration 11, loss = 3.03183927
Iteration 12, loss = 2.94673218
Iteration 13, loss = 2.87777310
Iteration 14, loss = 2.79979739
Iteration 15, loss = 2.69814833
Iteration 16, loss = 2.61701581
Iteration 17, loss = 2.52570641
Iteration 18, loss = 2.44533189
Iteration 19, loss = 2.37073854
Iteration 20, loss = 2.28154694
Iteration 21, loss = 2.20986532
Iteration 22, loss = 2.12091287
Iteration 23, loss = 2.06549788
Iteration 24, loss = 1.98986842
Iteration 25, loss = 1.92020395
Iteration 26, loss = 1.84137532
Iteration 27, loss = 1.74861831
Iteration 28, loss = 1.67951273
Iteration 29, loss = 1.64100552
Iteration 30, loss = 1.58924052
Iteration 31, loss = 1.52528314
Iteration 32, los

- hidden layers = (512,)

In [15]:
from sklearn.neural_network import MLPClassifier

hidden_layers = (512,)
activation = 'logistic'
opts = dict(hidden_layer_sizes = hidden_layers,verbose = True,activation = activation,tol = 1e-6,max_iter = int(1e6))
solver = 'adam'
clf_MLP = MLPClassifier(solver = solver, **opts)
clf_MLP.fit(X_train_,y_train)
predictions = clf_MLP.predict(X_test_)
print(classification_report(y_test,predictions))

Iteration 1, loss = 3.62200575
Iteration 2, loss = 2.93244924
Iteration 3, loss = 2.56313388
Iteration 4, loss = 2.24841519
Iteration 5, loss = 1.90345348
Iteration 6, loss = 1.59988835
Iteration 7, loss = 1.32108053
Iteration 8, loss = 1.09378386
Iteration 9, loss = 0.92433359
Iteration 10, loss = 0.78460455
Iteration 11, loss = 0.67130396
Iteration 12, loss = 0.56298065
Iteration 13, loss = 0.49679017
Iteration 14, loss = 0.43873536
Iteration 15, loss = 0.39398754
Iteration 16, loss = 0.37499996
Iteration 17, loss = 0.35666856
Iteration 18, loss = 0.32242074
Iteration 19, loss = 0.30268779
Iteration 20, loss = 0.27134169
Iteration 21, loss = 0.25116113
Iteration 22, loss = 0.23589284
Iteration 23, loss = 0.22829532
Iteration 24, loss = 0.21889326
Iteration 25, loss = 0.20618449
Iteration 26, loss = 0.19285980
Iteration 27, loss = 0.17468511
Iteration 28, loss = 0.16629536
Iteration 29, loss = 0.17334362
Iteration 30, loss = 0.17942741
Iteration 31, loss = 0.17119307
Iteration 32, los

## 結論:
### 1. 因為前兩組資料選擇不同的activation 跟 solver ，效果並沒有差太多，因此只對於hidden layers = (30,) 跟 hidden layers = (512,) 做比較
### 2. 因 hidden layers = (512,) 跟 hidden layers = (60,60,60) 或 hidden layers = (512,512,512)，差別並沒有太大，只用一層512個神經元就可以達到跟三層60個神經元或是三層512個神經元的效果，所以只比較 hidden layers = (30,) 跟 hidden layers = (512,) 
### 3. hidden layers = (30,) 的準確率為93% ， hidden layers = (512,) 的準確率為94%，表現均不錯 

## PCA主成分分析(成分比例採0.95) + 神經網路NN

In [16]:
from sklearn.decomposition import PCA

hidden_layers = (30,)
activation = 'logistic'
opts = dict(hidden_layer_sizes = hidden_layers,verbose = True,activation = activation,tol = 1e-6,max_iter = int(1e6))
solver = 'adam'
clf_MLP = MLPClassifier(solver = solver, **opts)
clf_MLP.fit(Z_train,y_train)
predictions = clf_MLP.predict(Z_test)
print(classification_report(y_test,predictions))

Iteration 1, loss = 3.68186379
Iteration 2, loss = 3.62970329
Iteration 3, loss = 3.58747715
Iteration 4, loss = 3.54822854
Iteration 5, loss = 3.51050639
Iteration 6, loss = 3.47079332
Iteration 7, loss = 3.43047724
Iteration 8, loss = 3.38711651
Iteration 9, loss = 3.34105788
Iteration 10, loss = 3.29267801
Iteration 11, loss = 3.24205058
Iteration 12, loss = 3.18776879
Iteration 13, loss = 3.13178065
Iteration 14, loss = 3.07436468
Iteration 15, loss = 3.01507833
Iteration 16, loss = 2.95194607
Iteration 17, loss = 2.88618106
Iteration 18, loss = 2.81776393
Iteration 19, loss = 2.74835676
Iteration 20, loss = 2.67912425
Iteration 21, loss = 2.61037599
Iteration 22, loss = 2.54210000
Iteration 23, loss = 2.47496461
Iteration 24, loss = 2.40804952
Iteration 25, loss = 2.34195478
Iteration 26, loss = 2.27624321
Iteration 27, loss = 2.21211917
Iteration 28, loss = 2.14857630
Iteration 29, loss = 2.08589285
Iteration 30, loss = 2.02532318
Iteration 31, loss = 1.96610706
Iteration 32, los

訓練資料的classification report

In [17]:
pred = clf_MLP.predict(Z_train)
print(classification_report(y_train,pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        36
           1       1.00      1.00      1.00        43
           2       1.00      1.00      1.00        39
           3       1.00      1.00      1.00        47
           4       1.00      1.00      1.00        43
           5       1.00      1.00      1.00        50
           6       1.00      1.00      1.00        41
           7       1.00      1.00      1.00        47
           8       1.00      1.00      1.00        47
           9       1.00      1.00      1.00        44
          10       1.00      1.00      1.00        36
          11       1.00      1.00      1.00        40
          12       1.00      1.00      1.00        42
          13       1.00      1.00      1.00        44
          14       1.00      1.00      1.00        45
          15       1.00      1.00      1.00        54
          16       1.00      1.00      1.00        45
          17       1.00    

## 結論:
### 1. 當成分比例採0.95時，且只用 hidden layers = (30,) 準確率就能達到92%，因此不需要將hidden layers = (512,)，有可能會過度擬合，如果擴展到過多的神經元，其實資料無法提供足夠多的細節，也因此造成了許多無用、重複的數據。
### 2. 訓練資料的準確率已達100%