**实验目标：**

通过本实验，你将深入了解和实践说话人识别技术，并掌握利用声音特征进行有效说话人识别的基本方法，了解不同特征和模型对识别准确率的影响。

实验的核心目标是使用TIMIT数据集来训练一个说话人识别系统，涵盖数据预处理、特征提取、模型训练和评估等关键步骤。


**实验方法：**

**1. 数据预处理和划分(可选)：**
  - 为了方便大家，我们提供了划分好的TIMIT数据集结构，当然你也可以根据训练结果自行划分该原数据集。
  - 原数据集下载地址：https://drive.google.com/file/d/180mSIiXN9RVDV2Xn1xcWNkMRm5J5MjN4/view?usp=sharing
  - 我们排除了SA的两个方言句子，并在剩余的8个句子中选取了SX的5个句子和SI的1个句子作为训练集，SI的另外2个句子作为测试集。
  
**2. 特征提取：**
  - 学习并实现包括但不限于MFCC特征等特征的提取，探索声音信号的频率和时间特性。
  - 鼓励尝试和比较其他特征提取方法，例如LPCC或声谱图特征，以理解不同特征对识别性能的影响。
  
**3. 模型选择和训练：**
  - 探索并选择适合的分类器和模型进行说话人识别，如GMM、Softmax分类器或深度学习模型。
  - 实现模型训练流程，使用训练集数据训练模型。
  
**4. 评估和分析：**
  - 使用准确率作为主要的评价指标在测试集上评估模型性能。
  - 对比不同特征和模型的性能，分析其对说话人识别准确率的影响。
  - 可视化不同模型的识别结果和错误率，讨论可能的改进方法。

**实验要求：**
  - 1.选择并实现至少一种特征的提取，并鼓励尝试其他特征提取方法。
  - 2.选择并实现至少一种分类器或模型进行说话人识别，并使用准确率评估指标评估其性能。
  - 3.通过实验对比、分析和可视化，撰写详细的实验报告，包括实验目的、实验方法、结果分析和结论。
  - 4.实验报告应以清晰、逻辑性强的形式呈现，图表和结果应清楚明了。

## 1. 实验准备

In [1]:
## 导入必要的库
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

# 可以根据需要导入其他库，比如librosa用于音频处理

import os
import librosa
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# !pip install hmmlearn
from hmmlearn import hmm
from sklearn.metrics import accuracy_score

# !pip install --user tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam


## 2. 数据预处理(加载数据集)

In [2]:
TrainDir = "Dataset\TRAIN"
TestDir = "Dataset\TEST"
## 请在这里写代码加载我们划分好的TIMIT训练集和测试集

# 定义函数加载数据集
def load_dataset(data_dir, le = None):
    data = []
    labels = []
    print(os.path.exists(data_dir))
    for speaker_dir in os.listdir(data_dir):
        speaker_path = os.path.join(data_dir, speaker_dir)
        for speaker_id in os.listdir(speaker_path):
            speaker_id_path = os.path.join(speaker_path, speaker_id)
            for file_name in os.listdir(speaker_id_path):
                file_path = os.path.join(speaker_id_path, file_name)
                # 在这里你可以处理每个音频文件，例如提取特征并将特征作为数据，将说话者ID作为标签
                # 这里暂时只返回文件路径，具体特征提取需要在后续完成
                data.append(file_path)
                labels.append(speaker_id)
    if le is None:
        le = LabelEncoder()
        le.fit(labels)
    y = le.transform(labels) # back: inverse_transform

    return data, y, le

# 加载训练集和测试集数据
train_data, train_labels, le = load_dataset(TrainDir)
test_data, test_labels, _ = load_dataset(TestDir, le)
print(train_labels)
print(len(set(train_labels)))
for l in train_labels:
    if l == len(set(train_labels)):
        print(l)
# 打印训练集和测试集大小
print("训练集大小:", len(train_data))
print("测试集大小:", len(test_data))

True
True
[ 16  16  16 ... 422 422 422]
462
训练集大小: 3234
测试集大小: 924


## 3. 特征提取

In [3]:
## 请编写或使用库函数提取MFCC等音频特征

num_mfcc=20
max_length = 100
input_channels = 5
# 定义函数提取MFCC特征
def extract_mfcc(file_path, num_mfcc=20, max_length = 100, input_channels = 5): # (default num_mfcc: 20)
    # 加载音频文件
    audio, sr = librosa.load(file_path, sr=None)
    # 提取MFCC特征
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=num_mfcc)
    # mfccs = np.mean(mfccs.T, axis=0)  # 对时间轴上的MFCC进行平均

    features = []
    # 遍历每个MFCC特征序列
    for index, feature_sequence in enumerate(mfccs):
        processed_features = []
        # 如果特征序列的长度大于最大长度，则切片成多段特征
        if len(feature_sequence) > max_length:
            num_slices = len(feature_sequence) // max_length
            remainder = len(feature_sequence) % max_length
            if remainder != 0:
                num_slices += 1
            for i in range(num_slices):
                start_idx = i * max_length
                end_idx = min((i + 1) * max_length, len(feature_sequence))
                sliced_feature = feature_sequence[start_idx:end_idx]
                # 如果切片长度不足最大长度，则用零填充
                if len(sliced_feature) < max_length:
                    padded_feature = np.zeros(max_length - len(sliced_feature))
                    sliced_feature = np.concatenate((sliced_feature, padded_feature), axis=0)
                processed_features.append(sliced_feature)
        # 如果特征序列的长度小于最大长度，则用零填充
        elif len(feature_sequence) < max_length:
            padded_feature = np.zeros(max_length - len(feature_sequence))
            padded_sequence = np.concatenate((feature_sequence, padded_feature), axis=0)
            processed_features.append(padded_sequence)
        # 如果特征序列的长度等于最大长度，则直接添加
        else:
            processed_features.append(feature_sequence)
        processed_features = np.array(processed_features)
        features.append(processed_features.T)
    # 将列表转换为数组
    features = np.array(features)
    
    # 初始化一个数组来存储处理后的特征
    processed_features = np.empty((0, max_length, input_channels))

    # 遍历每个MFCC特征序列
    for feature_sequence in features:
        # 进行通道变换
        if feature_sequence.shape[1] != input_channels:
            if feature_sequence.shape[1] < input_channels:
                # 如果MFCC特征通道数少于期望的通道数，则在末尾填充零
                padded_feature = np.zeros((feature_sequence.shape[0], input_channels - feature_sequence.shape[1]))
                feature_sequence = np.concatenate((feature_sequence, padded_feature), axis=1)
            else:
                # 如果MFCC特征通道数多于期望的通道数，则保留前input_channels个通道
                feature_sequence = feature_sequence[:, :input_channels]
        # 将特征序列添加到数组中
        processed_features = np.append(processed_features, [feature_sequence], axis=0)
    
    return processed_features # mfccs

# 测试MFCC特征提取
sample_mfcc = extract_mfcc(train_data[0])
print("MFCC特征形状:", sample_mfcc.shape)

# 加载训练集和测试集数据
train_mfcc = [extract_mfcc(file_path) for file_path in train_data]
test_mfcc = [extract_mfcc(file_path) for file_path in test_data]

print(train_mfcc[0].shape, train_mfcc[30].shape)
# LPCC

# 声谱图特征


MFCC特征形状: (20, 100, 5)
(20, 100, 5) (20, 100, 5)


## 4. 模型选择和训练

In [None]:
## 在这部分，你可以选择不同的分类器和模型如GMM模型来进行实验


In [24]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
# 定义CNN模型
class SimpleCNN(nn.Module):
    def __init__(self, input_channels, num_classes):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=input_channels, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(3*5*25, 128)  # 根据输入尺寸计算
        self.fc2 = nn.Linear(128, num_classes)  # 有num_classes个类别
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool1(x)
        # print(x.shape) # [32, 3, 10, 50]
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool2(x)
        # print(x.shape) # [32, 3, 5, 25]
        x = x.reshape(x.shape[0], -1)  # 展平多维卷积层输出
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return F.softmax(x)
# 实例化模型
num_classes = len(set(train_labels))
model = SimpleCNN(input_channels, num_classes)
# 定义损失函数和优化器
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print(type(train_mfcc), type(train_labels), torch.tensor(train_labels))
# 划分训练集和测试集
X_train, X_test, y_train, y_test = train_test_split(torch.tensor(np.array(train_mfcc)).float().permute(0, 3, 1, 2), torch.tensor(train_labels), test_size=0.2, random_state=42)
print(X_train.shape[0])

batch_size = 32

# 训练模型
for epoch in range(10):  # 进行10个训练周期
    for i in range(0, X_train.shape[0], batch_size):
        # print(i)
        # 前向传播
        try:
            x = X_train[i:i+batch_size]
            y = y_train[i:i+batch_size]
            # print(x.shape, y.shape)
        except:
            continue
        outputs = model(x)
        loss = criterion(outputs, y.long())  # 假设的目标
        # 反向传播和优化
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i//batch_size+1)%20 == 0:
            print(f'Epoch [{epoch+1}/10], Batch [{i//batch_size+1}/{X_train.shape[0]//batch_size}], Loss: {loss.item():.4f}')
        

<class 'list'> <class 'numpy.ndarray'> tensor([ 16,  16,  16,  ..., 422, 422, 422], dtype=torch.int32)
2587


  return F.softmax(x)


Epoch [1/10], Batch [20/80], Loss: 6.1370
Epoch [1/10], Batch [40/80], Loss: 6.1363
Epoch [1/10], Batch [60/80], Loss: 6.1356
Epoch [1/10], Batch [80/80], Loss: 6.1356
Epoch [2/10], Batch [20/80], Loss: 6.1354
Epoch [2/10], Batch [40/80], Loss: 6.1353
Epoch [2/10], Batch [60/80], Loss: 6.1361
Epoch [2/10], Batch [80/80], Loss: 6.1341
Epoch [3/10], Batch [20/80], Loss: 6.1341
Epoch [3/10], Batch [40/80], Loss: 6.1366
Epoch [3/10], Batch [60/80], Loss: 6.1391
Epoch [3/10], Batch [80/80], Loss: 6.1392
Epoch [4/10], Batch [20/80], Loss: 6.1382
Epoch [4/10], Batch [40/80], Loss: 6.1391
Epoch [4/10], Batch [60/80], Loss: 6.1391
Epoch [4/10], Batch [80/80], Loss: 6.1393
Epoch [5/10], Batch [20/80], Loss: 6.1392
Epoch [5/10], Batch [40/80], Loss: 6.1391
Epoch [5/10], Batch [60/80], Loss: 6.1391
Epoch [5/10], Batch [80/80], Loss: 6.1392
Epoch [6/10], Batch [20/80], Loss: 6.1380
Epoch [6/10], Batch [40/80], Loss: 6.1387
Epoch [6/10], Batch [60/80], Loss: 6.1389
Epoch [6/10], Batch [80/80], Loss:

In [4]:
from tensorflow.keras.utils import to_categorical

print(len(train_mfcc),len(train_labels))
num_classes = len(set(train_labels))

# 创建一个顺序模型
model = Sequential()

# 添加卷积层和池化层
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(num_mfcc, max_length, input_channels)))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# 添加扁平层
model.add(Flatten())

# 添加全连接层
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))  # 对于多分类问题，最后一层使用softmax激活函数，并且输出单元数量等于类别数量

3234 3234


  super().__init__(


In [7]:
# 编译模型
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# 划分训练集和测试集
X_train, X_test, y_train, y_test = train_test_split(np.array(train_mfcc), to_categorical(np.array(train_labels), num_classes), test_size=0.2, random_state=42)
print(X_train.shape)
# 训练模型
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))
# 评估模型
loss, accuracy = model.evaluate(np.array(test_mfcc), to_categorical(np.array(test_labels), num_classes))
print(f"Test accuracy: {accuracy:.2f}")
# 使用模型进行预测
# predictions = model.predict(X_test)

(2587, 20, 100, 5)
Epoch 1/50
[1m81/81[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 144ms/step - accuracy: 0.0040 - loss: 6.1359 - val_accuracy: 0.0000e+00 - val_loss: 6.3331
Epoch 2/50
[1m81/81[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 138ms/step - accuracy: 0.0050 - loss: 6.1379 - val_accuracy: 0.0000e+00 - val_loss: 6.3323
Epoch 3/50
[1m81/81[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 175ms/step - accuracy: 0.0027 - loss: 6.1395 - val_accuracy: 0.0000e+00 - val_loss: 6.3315
Epoch 4/50
[1m81/81[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 142ms/step - accuracy: 0.0015 - loss: 6.1350 - val_accuracy: 0.0000e+00 - val_loss: 6.3307
Epoch 5/50
[1m81/81[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 197ms/step - accuracy: 0.0053 - loss: 6.1293 - val_accuracy: 0.0000e+00 - val_loss: 6.3302
Epoch 6/50
[1m81/81[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 172ms/step - accuracy: 0.0034 - loss: 6.1390 - val_accuracy: 0.0000e+0

## 5. 评价指标(准确率Accuracy)

In [5]:
## 请编写代码或使用库函数accuracy_score计算测试集上的准确率Accuracy

## 5. 评价指标(准确率Accuracy)

# 定义函数计算预测结果
def predict(models, features):
    predictions = []
    for feature in features:
        max_score = float("-inf")
        predicted_label = None
        for label, model in models.items():
            score = model.score(feature)
            if score > max_score:
                max_score = score
                predicted_label = label
        predictions.append(predicted_label)
    return predictions

# 进行预测
test_predictions = predict(trained_models, test_mfcc)

# 计算准确率
accuracy = accuracy_score(test_labels, test_predictions)
print("测试集准确率:", accuracy)

##  6. 分析和可视化

In [6]:
## 请使用matplotlib等可视化库对你的实验结果进行可视化分析。
## 包括但不限于准确率的对比、错误分类的分析、特征的影响等。


## 7. 结果讨论
讨论你的模型性能，尝试解释为什么某些模型比其他模型表现好，以及可能的改进方法。

## 8. 保存模型（可选）
如果需要，可以在这里添加代码保存你的模型。