<h3>使用資料: face_data</h3>

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegressionCV
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.neural_network import MLPClassifier

<h3>函式說明:</h3>
將資料 X 和標籤 以70%、30%分成訓練集和測試集，並且進行標準化。<br>
利用主成分分析（PCA）將訓練集和測試集轉換為二維空間，並將轉換後的訓練集和測試集存儲在變數 Z_train 和 Z_test 中。<br>
利用邏輯回歸、支持向量機和多層感知機三種分類器，針對原始資料和 PCA 轉換後的資料分別進行評估，計算測試資料的準確度並將其存儲在相應的變數中。<br>
最後函數返回所有評估結果的陣列，包括原始資料的邏輯回歸、支持向量機和多層感知機的準確度，PCA 轉換後的資料的邏輯回歸、支持向量機和多層感知機的準確度。


In [21]:
def  classifier (X,y,dataname,h):
    data=dataname
    # Split data into training and testing data 7:3
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30) 
    # Standardize data
    scaler = StandardScaler()
    X_train_ = scaler.fit_transform(X_train)
    X_test_ = scaler.fit_transform(X_test)

#PCA
    pca = PCA(n_components = 2).fit(X_train_)
    Z_train = pca.transform(X_train_)
    Z_test = pca.transform(X_test_)

#LogisticRegression
    LR=np.zeros(3)
    LR_CV=np.zeros(3)
    LR_pca=np.zeros(3)
    LR_CV_pca=np.zeros(3)

    Cs = np.logspace(-5, 5, 20)
    opts = dict(tol = 1e-6, max_iter = int(1e6), verbose=1)
    solver = ['lbfgs','liblinear','newton-cg'] 
    for s in range(3):
        clf_original = LogisticRegression(solver = solver[s], **opts)
        clf_original.fit(X_train_, y_train)
        y_pred = clf_original.predict(X_test_)
        # 測 試 資 料 之 準 確 率 回 報
        LR[s]= accuracy_score(y_test, y_pred)

        clf_PCA = LogisticRegression(solver = solver[s], **opts)
        clf_PCA.fit(Z_train, y_train)
        LR_pca[s]=clf_PCA.score(Z_test, y_test)

 # SVM
    SVM=np.zeros((2,3))
    SVM_pca=np.zeros((2,3))
    C = 1 # SVM regularization parameter
    opts = [dict(C = C, tol = 1e-6, max_iter = int(1e6)),dict(C = C, decision_function_shape = 'ovo', tol = 1e-6, max_iter = int(1e6))]
    
    for i in range(2):
        for j in range(3):
            clf_svm = [SVC(kernel="linear", **opts[i]),\
            SVC(kernel="rbf", gamma=0.2, **opts[i]),\
             SVC(kernel="poly", degree=3, gamma="auto", **opts[i])]
             #LinearSVC(**opts[i]) ]

            clf_svm[j].fit(X_train, y_train)
            predictions = clf_svm[j].predict(X_test)
            SVM[i][j]= accuracy_score(y_test, predictions)
            
            clf_svm[j].fit(Z_train, y_train) #pca
            predictions = clf_svm[j].predict(Z_test)
            SVM_pca[i][j]= accuracy_score(y_test, predictions)

#MLPClassifier
    MLP= np.zeros((2,2))
    MLP_pca = np.zeros((2,2))
    hidden_layers = (h,)
    activation = ['logistic','relu']
    solver = ['adam','sgd']
    for i in range(2):
        for j in range(2):
            opts = dict(hidden_layer_sizes = hidden_layers, verbose = True, \
            activation = activation[i], tol = 1e-6, max_iter = int(1e6))
            clf_MLP = MLPClassifier(solver = solver[j], **opts)
            clf_MLP.fit(X_train, y_train)
            predictions_mlp = clf_MLP.predict(X_test)
            MLP[i][j]= accuracy_score(y_test, predictions_mlp)

            clf_MLP.fit(Z_train, y_train) #pca
            predictions_mlp = clf_MLP.predict(Z_test)
            MLP_pca[i][j]= accuracy_score(y_test, predictions_mlp)

    return data,LR,SVM,MLP,LR_pca,SVM_pca,MLP_pca

In [14]:
import scipy.io
D = scipy.io.loadmat('allFaces.mat')
X = D['faces'] # 32256 x 2410, each column represents an image
num = np.ndarray.flatten(D['nfaces']) #每位人臉的張數
#每張影像的大小與人數
m = int(D['m']) # 168
n = int(D['n']) # 192
n_persons = int(D['person']) # 38

將y加入label

In [19]:
y= []
for i in range(len(num)):
    for j in range(num[i]):
        y.append(i)

從中隨機選取了2410個影像，作為機器學習模型的輸入資料

In [23]:
indices = np.random.permutation(X.shape[1])
_X = X[indices[:2410], :]
y=np.array(y)
_y = y[indices[:2410]]
print(indices)

[2292  378 1824 ... 1121 1605 1865]


In [25]:
data,LR,SVM,MLP,LR_pca,SVM_pca,MLP_pca =classifier(_X,_y,"allFace.csv",30)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   54.8s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.9s finished


[LibLinear][LibLinear]

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   56.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s finished


Iteration 1, loss = 3.71031890
Iteration 2, loss = 3.68453823
Iteration 3, loss = 3.66762072
Iteration 4, loss = 3.65604378
Iteration 5, loss = 3.64828677
Iteration 6, loss = 3.64200657
Iteration 7, loss = 3.63864245
Iteration 8, loss = 3.63713875
Iteration 9, loss = 3.63616718
Iteration 10, loss = 3.63932296
Iteration 11, loss = 3.63787916
Iteration 12, loss = 3.63686447
Iteration 13, loss = 3.63631425
Iteration 14, loss = 3.63816405
Iteration 15, loss = 3.63938431
Iteration 16, loss = 3.63805600
Iteration 17, loss = 3.63694957
Iteration 18, loss = 3.63621825
Iteration 19, loss = 3.63598689
Iteration 20, loss = 3.63537714
Iteration 21, loss = 3.63521245
Iteration 22, loss = 3.63536257
Iteration 23, loss = 3.63544277
Iteration 24, loss = 3.63526832
Iteration 25, loss = 3.63525232
Iteration 26, loss = 3.63522201
Iteration 27, loss = 3.63562696
Iteration 28, loss = 3.63536492
Iteration 29, loss = 3.63510992
Iteration 30, loss = 3.63512970
Iteration 31, loss = 3.63537891
Iteration 32, los

In [26]:
print("dataname = ",data,"\n","LR = ",LR,"\n","SVM= ",SVM,"\n","MLP= ",MLP,\
      "\n--------------\n",\
      "LR_pca= ",LR_pca,"\n","SVM_pca= ",SVM_pca,"\n","MLP_pca= ",MLP_pca)

dataname =  allFace.csv 
 LR =  [0.54633472 0.47994467 0.54633472] 
 SVM=  [[0.96957123 0.01798064 0.96403873]
 [0.96957123 0.01798064 0.96403873]] 
 MLP=  [[0.02074689 0.07883817]
 [0.01798064 0.01798064]] 
--------------
 LR_pca=  [0.08713693 0.07607192 0.0857538 ] 
 SVM_pca=  [[0.10511757 0.22544952 0.11341632]
 [0.10511757 0.22544952 0.11341632]] 
 MLP_pca=  [[0.18810512 0.08990318]
 [0.17427386 0.06639004]]


<h3>實驗結果觀察</h3>
根據以上實驗結果, 只有SVM能夠達到較高的正確率,就連羅集思迴歸都只有0.5的正確率,且經過pca的資料集正確率相當低,因此對於此資料集,非常不建議進行pca,且只建議使用SVM
