<a href="https://colab.research.google.com/github/Panh2/DimensionalityReduction/blob/master/7403.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install the required packages

In [1]:
!pip install scikit-learn scipy


Download the CIFAR-10 dataset. It is high-dimensional: each sample has 3072 features (32 × 32 pixels × 3 color channels).

In [13]:
!wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
!tar -xzvf cifar-10-python.tar.gz

--2023-04-10 13:24:05--  https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Resolving www.cs.toronto.edu (www.cs.toronto.edu)... 128.100.3.30
Connecting to www.cs.toronto.edu (www.cs.toronto.edu)|128.100.3.30|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 170498071 (163M) [application/x-gzip]
Saving to: ‘cifar-10-python.tar.gz’


2023-04-10 13:24:07 (75.0 MB/s) - ‘cifar-10-python.tar.gz’ saved [170498071/170498071]

cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1
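Each CIFAR-10 sample is stored as one flat row of 3072 uint8 values. The sketch below shows how such a row maps back to a 32 × 32 RGB image. It uses a random stand-in row rather than real data and assumes the layout documented on the CIFAR-10 page: 1024 red values, then 1024 green, then 1024 blue, each in row-major order.

```python
import numpy as np

# Random stand-in for one row of CIFAR-10 `data` (NOT a real image).
row = np.random.default_rng(0).integers(0, 256, size=3072, dtype=np.uint8)

# 3072 features = 32 x 32 pixels x 3 color channels.
assert row.size == 32 * 32 * 3

# Reshape channel-first (3, 32, 32), then move channels last -> (height, width, channel).
image = row.reshape(3, 32, 32).transpose(1, 2, 0)
print(image.shape)  # (32, 32, 3)
```

The classifiers later in the notebook work on the flat 3072-dimensional rows directly. The reshape is only needed for visualization.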


The tar command above extracts the dataset into cifar-10-batches-py/.

Import the required packages

In [14]:
import numpy as np
import os
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix


Define a function that loads the dataset

In [15]:
import pickle
import numpy as np
import os

def load_cifar10(data_folder):
    # Training data: only data_batch_1 is loaded (10,000 of the 50,000 training images)
    with open(os.path.join(data_folder, 'data_batch_1'), 'rb') as f:
        train_dict = pickle.load(f, encoding='bytes')
        X_train = train_dict[b'data']
        y_train = np.array(train_dict[b'labels'])

    # Test data (10,000 images)
    with open(os.path.join(data_folder, 'test_batch'), 'rb') as f:
        test_dict = pickle.load(f, encoding='bytes')
        X_test = test_dict[b'data']
        y_test = np.array(test_dict[b'labels'])

    # Scale pixel values from [0, 255] to [0, 1]
    X_train = X_train / 255.0
    X_test = X_test / 255.0

    return X_train, y_train, X_test, y_test
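The loader above reads only the first of the five training batches. If the full 50,000-image training set is wanted, a minimal sketch could stack all five. The helper name `load_cifar10_all_batches` is hypothetical, and the code assumes the standard file names `data_batch_1` … `data_batch_5` produced by extracting the archive.

```python
import os
import pickle
import numpy as np

def load_cifar10_all_batches(data_folder):
    """Hypothetical helper: load and stack all five CIFAR-10 training batches."""
    xs, ys = [], []
    for i in range(1, 6):
        # Assumed file naming: data_batch_1 ... data_batch_5
        with open(os.path.join(data_folder, f'data_batch_{i}'), 'rb') as f:
            batch = pickle.load(f, encoding='bytes')
        xs.append(batch[b'data'])
        ys.append(np.array(batch[b'labels']))
    # Concatenate along the sample axis and scale pixels to [0, 1], like load_cifar10
    X_train = np.concatenate(xs, axis=0) / 255.0
    y_train = np.concatenate(ys, axis=0)
    return X_train, y_train
```

For example, `X_train, y_train = load_cifar10_all_batches('/content/cifar-10-batches-py')` would yield a 50,000 × 3072 training matrix. Expect the PCA, LDA, and covariance steps to take correspondingly longer.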

Implement the minimum Mahalanobis distance classifier

In [16]:
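def mahalanobis_distance(x, mean, inv_cov):
    # d(x) = sqrt((x - mean)^T inv_cov (x - mean))
    diff = x - mean
    return np.sqrt(np.dot(np.dot(diff, inv_cov), diff.T))

def min_mahalanobis_classifier(x, class_means, inv_cov_matrices):
    # Distance from x to each class, using that class's own mean and inverse covariance
    distances = [mahalanobis_distance(x, mean, inv_cov) for mean, inv_cov in zip(class_means, inv_cov_matrices)]
    # Predict the class with the smallest distance
    return np.argmin(distances)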
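The classifier assigns x to the class k that minimizes d_k(x) = sqrt((x − μ_k)ᵀ Σ_k⁻¹ (x − μ_k)), one sample at a time. Below is a minimal vectorized sketch that computes the same distances for a whole array of samples at once with `np.einsum`, checked against SciPy's `scipy.spatial.distance.mahalanobis` on random data. The function names are illustrative, not part of the original notebook.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def batch_mahalanobis(X, mean, inv_cov):
    """Mahalanobis distance from every row of X to one class mean."""
    diff = X - mean                                    # (n_samples, n_features)
    # Row-wise quadratic form diff_i^T inv_cov diff_i, without a Python loop
    sq = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))                # clip tiny negative round-off

def batch_min_mahalanobis_classifier(X, class_means, inv_cov_matrices):
    """Vectorized counterpart of min_mahalanobis_classifier for a 2-D array X."""
    dists = np.stack([batch_mahalanobis(X, m, ic)
                      for m, ic in zip(class_means, inv_cov_matrices)], axis=1)
    return dists.argmin(axis=1)                        # one predicted label per row

# Sanity check against SciPy on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
mean = rng.normal(size=3)
A = rng.normal(size=(3, 3))
inv_cov = np.linalg.inv(A @ A.T + np.eye(3))           # symmetric positive definite
print(np.allclose(batch_mahalanobis(X, mean, inv_cov),
                  [mahalanobis(x, mean, inv_cov) for x in X]))  # True
```

Calling `batch_min_mahalanobis_classifier(X_test_pca, class_means_pca, inv_cov_matrices_pca)` should reproduce the per-sample list comprehension while avoiding 10,000 separate Python calls.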
    

Load the dataset

In [18]:
data_folder = '/content/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_cifar10(data_folder)


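Reduce the dimensionality in two independent ways so the results can be compared: PCA projects the 3072-dimensional data onto 10 principal components, and LDA projects the same raw data onto 9 discriminant components (LDA yields at most C − 1 = 9 components for C = 10 classes).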

In [23]:
# PCA
pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# LDA
lda = LDA(n_components=9)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)
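Another option is to chain the two methods: fit LDA on the PCA scores rather than on the raw pixels, for example 50 PCA components followed by 9 LDA components. Below is a minimal sketch using scikit-learn's `Pipeline`. It runs on synthetic `make_classification` data standing in for CIFAR-10, so no results here reflect the real images.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import Pipeline

# Synthetic 10-class data (a stand-in, NOT CIFAR-10)
X, y = make_classification(n_samples=1000, n_features=100, n_informative=20,
                           n_classes=10, n_clusters_per_class=1, random_state=0)

# PCA 100 -> 50 dims, then LDA 50 -> 9 dims.
# LDA allows at most min(n_classes - 1, n_features) = 9 components here.
pipe = Pipeline([('pca', PCA(n_components=50)),
                 ('lda', LDA(n_components=9))])
Z = pipe.fit_transform(X, y)
print(Z.shape)  # (1000, 9)
```

On the notebook's data, this would be `pipe.fit_transform(X_train, y_train)` followed by `pipe.transform(X_test)`. Running PCA first also keeps LDA's within-class scatter estimate to 50 × 50 instead of 3072 × 3072.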


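Compute each class's mean and the inverse of its covariance matrix (via the pseudo-inverse np.linalg.pinv), which the Mahalanobis distance requires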

In [24]:
class_means_pca = []
class_means_lda = []
inv_cov_matrices_pca = []
inv_cov_matrices_lda = []

for i in range(10):
    class_data_pca = X_train_pca[y_train == i]
    class_mean_pca = np.mean(class_data_pca, axis=0)
    class_means_pca.append(class_mean_pca)

    cov_matrix_pca = np.cov(class_data_pca.T)
    inv_cov_matrix_pca = np.linalg.pinv(cov_matrix_pca)
    inv_cov_matrices_pca.append(inv_cov_matrix_pca)

for i in range(10):
    class_data_lda = X_train_lda[y_train == i]
    class_mean_lda = np.mean(class_data_lda, axis=0)
    class_means_lda.append(class_mean_lda)

    cov_matrix_lda = np.cov(class_data_lda.T)
    inv_cov_matrix_lda = np.linalg.pinv(cov_matrix_lda)
    inv_cov_matrices_lda.append(inv_cov_matrix_lda)
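The two loops above perform the same computation on different feature spaces. Below is a minimal sketch that factors them into one function. The name `class_statistics` is hypothetical, and the function assumes integer labels 0 … n_classes − 1, with samples in the rows of X.

```python
import numpy as np

def class_statistics(X, y, n_classes=10):
    """Hypothetical helper: per-class means and pseudo-inverse covariance matrices."""
    means, inv_covs = [], []
    for k in range(n_classes):
        Xk = X[y == k]                                 # samples of class k
        means.append(Xk.mean(axis=0))
        # rowvar=False treats columns as variables, same as np.cov(Xk.T)
        inv_covs.append(np.linalg.pinv(np.cov(Xk, rowvar=False)))
    return means, inv_covs
```

With this helper, the cell could read `class_means_pca, inv_cov_matrices_pca = class_statistics(X_train_pca, y_train)`, with the same call for the LDA features.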

Classify the test data in each reduced space with the minimum Mahalanobis distance classifier and compute the accuracy

In [25]:
y_pred_pca = [min_mahalanobis_classifier(x, class_means_pca, inv_cov_matrices_pca) for x in X_test_pca]
y_pred_lda = [min_mahalanobis_classifier(x, class_means_lda, inv_cov_matrices_lda) for x in X_test_lda]
accuracy_pca = accuracy_score(y_test, y_pred_pca)
accuracy_lda = accuracy_score(y_test, y_pred_lda)
print(f"PCA classification accuracy: {accuracy_pca}")
print(f"LDA classification accuracy: {accuracy_lda}")

PCA classification accuracy: 0.3274
LDA classification accuracy: 0.2583
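The notebook imports `confusion_matrix` but never uses it. Below is a minimal sketch of one way it could break overall accuracy down by class. It uses toy labels, not CIFAR-10 predictions, and the helper name `per_class_accuracy` is illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class predicted correctly (per-class recall)."""
    cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
    return cm.diagonal() / cm.sum(axis=1)   # correct counts / class sizes

# Toy labels (not results from this notebook)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(per_class_accuracy(y_true, y_pred))   # [0.5 1.  0.5]
print(accuracy_score(y_true, y_pred))       # 0.6666666666666666
```

Applied here, `per_class_accuracy(y_test, y_pred_pca)` and `per_class_accuracy(y_test, y_pred_lda)` would show which of the 10 CIFAR-10 classes each reduced space separates well.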
