# 机器学习 实验3

**基于全链接网络的手写数字体识别**

- 数据：MNIST data set
- 本题目考察如何设计并实现一个简单的图像分类器，设置本题目的目的如下：
1. 理解基本的图像识别流程的方法（预处理、训练、预测等阶段）
2. 实现一个全连接神经网络分类器
3. 理解不同的分类器之间的区别，以及使用不同的更新方法优化神经网络

- 课后作业：
1. 完成测试集上的测试过程

- 附加题： 
1. 尝试使用不同的损失函数和正则化方法，观察并分析其对实验结果的影响 
2. 尝试使用不同的优化算法，观察并分析其对训练过程和实验结果的影响， (如batch GD, online GD, mini-batch GD, SGD, 或其它的优化算法，如Momentum, Adsgrad, Adam, Admax)
3. 增加训练的epoch，并绘制loss变化的图像
4. 更改网络的结构，查看对训练过程额和最终结果有何影响

- 补充：MINST是一个手写数字数据集，包括了若干手写数字体及其对应的数字，共60000个训练样本，10000个测试样本。每个手写数字被表示为一个28*28的向量。  

## 1 准备数据+数据预处理

In [2]:
import torch
from torchvision import transforms
from torchvision.datasets import mnist  # 导入内置的 mnist 数据
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

In [3]:
transform = transforms.Compose([
    transforms.ToTensor(),   # 将图像转换为PyTorch的张量（tensor）格式，这是神经网络模型所需的数据格式
    transforms.Normalize([0.5], [0.5])   # 如果不缩放，像素值范围较大可能会导致梯度爆炸或梯度消失等问题，从而影响模型的训练效果
])

# 使用mnist.MNIST()函数创建训练集、测试集两个数据集对象
train_set = mnist.MNIST('./data copy', train=True, transform=transform, download=True)
test_set = mnist.MNIST('./data copy', train=False, transform=transform, download=True)

### 1.1 了解数据

In [4]:
a_data, a_label = train_set[0]
a_data.shape

torch.Size([1, 28, 28])

### 1.2 创建数据加载器（DataLoader）对象

用于在训练和测试神经网络模型时加载数据

In [5]:
from torch.utils.data import* # 导入所有需要的类与函数

train_data = DataLoader(train_set, batch_size=64,shuffle=True)
test_data = DataLoader(test_set, batch_size=128,shuffle=False)


## 2 neural network structure

In [9]:
from torch import nn

class FNN(nn.Module):
    def __init__(self):
        super(FNN,self).__init__()

        self.layer1 = nn.Sequential(
            nn.Linear(784,400),
            nn.ReLU()
        )

        self.layer2 = nn.Sequential(
            nn.Linear(400,200),
            nn.ReLU()
        )

        self.layer3 = nn.Sequential(
            nn.Linear(200,100),
            nn.ReLU()
        )

        self.layer4 = nn.Sequential(
            nn.Linear(100,10)
        )

    def forward(self,x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        output = self.layer4(x)

        return output

## 3 训练网络model

In [10]:
net = FNN()
net.parameters

<bound method Module.parameters of FNN(
  (layer1): Sequential(
    (0): Linear(in_features=784, out_features=400, bias=True)
    (1): ReLU()
  )
  (layer2): Sequential(
    (0): Linear(in_features=400, out_features=200, bias=True)
    (1): ReLU()
  )
  (layer3): Sequential(
    (0): Linear(in_features=200, out_features=100, bias=True)
    (1): ReLU()
  )
  (layer4): Sequential(
    (0): Linear(in_features=100, out_features=10, bias=True)
  )
)>

In [11]:
def train_model(net,train_data,loss_func,optimizer,num_epochs):
    train_losses = []
    train_accuracies = []

    for epoch in range(num_epochs):
        train_loss = 0
        correct = 0
        total = 0

        for image,label in train_data:
            image = image.view(image.size(0),-1)

            out = net(image)
            loss = loss_func(out,label)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()

            # get predicted results
            _,predicted = torch.max(out.data,1)
            total += label.size(0)
            correct += (predicted == label).sum().item()

        avg_loss = train_loss / len(train_data)
        accuracy = correct / total

        train_losses.append(avg_loss)
        train_accuracies.append(accuracy)

        print(f'epoch:{epoch}, Train loss：{avg_loss:.6f},Train Accuracy: {accuracy:.6f}')

    return train_losses, train_accuracies

In [12]:
import torch
loss_cross = nn.CrossEntropyLoss()

optimizer_sgd = torch.optim.SGD(net.parameters(),weight_decay=1e-3,lr=1e-3)


In [14]:
losses_sgd, accs_sgd = train_model(net,train_data,loss_cross,optimizer_sgd,num_epochs=10)


epoch:0, Train loss：1.270624,Train Accuracy: 0.661267
epoch:1, Train loss：0.988138,Train Accuracy: 0.742067
epoch:2, Train loss：0.804196,Train Accuracy: 0.779083
epoch:3, Train loss：0.688883,Train Accuracy: 0.802833
epoch:4, Train loss：0.613475,Train Accuracy: 0.820283
epoch:5, Train loss：0.559467,Train Accuracy: 0.836083
epoch:6, Train loss：0.517604,Train Accuracy: 0.849917
epoch:7, Train loss：0.483484,Train Accuracy: 0.860817
epoch:8, Train loss：0.454978,Train Accuracy: 0.868900
epoch:9, Train loss：0.431787,Train Accuracy: 0.876283


## 4 尝试不同正则化、优化方法，并分析结果

- 使用SGD随机梯度下降算法，尝试不同的L2正则化的参数lambda
- 使用adam优化算法，正则化参数

In [15]:

# 定义全连接神经网络
class FullyConnectedNN(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, activation=nn.ReLU()):
        """
        初始化全连接神经网络
        
        参数:
            input_size (int): 输入特征的数量
            hidden_sizes (list): 包含了每个隐藏层的神经元数量的列表
            output_size (int): 输出的类别数量
            activation (torch.nn.Module, optional): 激活函数，默认为ReLU
        """
        super(FullyConnectedNN, self).__init__()
        
        # 步骤1: 创建输入层到第一个隐藏层的线性变换
        self.input_layer = nn.Linear(input_size, hidden_sizes[0])
        
        # 步骤2: 创建隐藏层到隐藏层之间的线性变换和激活函数
        self.hidden_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_sizes[i], hidden_sizes[i+1]),
                activation
            )
            for i in range(len(hidden_sizes) - 1)
        ])
        
        # 步骤3: 创建最后一个隐藏层到输出层的线性变换
        self.output_layer = nn.Linear(hidden_sizes[-1], output_size)
    
    def forward(self, x):
        """
        定义前向传播过程
        
        参数:
            x (torch.Tensor): 输入数据张量，形状为 (batch_size, input_size)
        
        返回:
            torch.Tensor: 输出数据张量，形状为 (batch_size, output_size)
        """
        # 步骤4: 输入数据通过输入层进行线性变换
        x = self.input_layer(x)
        
        # 步骤5: 数据通过每个隐藏层的线性变换和激活函数进行传递
        for hidden_layer in self.hidden_layers:
            x = hidden_layer(x)
        
        # 步骤6: 数据通过输出层的线性变换得到最终输出
        x = self.output_layer(x)
        return x


In [16]:
# 用不同优化算法、正则化参数训练

# 定义训练函数
def train(model, train_loader, test_loader, optimizer, criterion, num_epochs=5):
    train_losses = []
    test_losses = []
    train_accuracies = []
    test_accuracies = []

    for epoch in range(num_epochs):
        model.train()
        train_loss = 0.0
        correct_train = 0
        total_train = 0

        for images, labels in train_loader:
            images = images.view(images.size(0), -1)  # 展平图像为一维张量
            optimizer.zero_grad()  # 梯度清零
            outputs = model(images)  # 前向传播
            loss = criterion(outputs, labels)  # 计算损失
            loss.backward()  # 反向传播
            optimizer.step()  # 参数更新

            train_loss += loss.item()
            _, predicted_train = torch.max(outputs.data, 1)
            total_train += labels.size(0)
            correct_train += (predicted_train == labels).sum().item()

        train_losses.append(train_loss / len(train_loader))
        train_accuracy = correct_train / total_train
        train_accuracies.append(train_accuracy)

        # Evaluation on test set
        model.eval()
        test_loss = 0.0
        correct_test = 0
        total_test = 0

        with torch.no_grad():
            for images, labels in test_loader:
                images = images.view(images.size(0), -1)  # 展平图像为一维张量
                outputs = model(images)
                loss = criterion(outputs, labels)
                test_loss += loss.item()
                _, predicted_test = torch.max(outputs.data, 1)
                total_test += labels.size(0)
                correct_test += (predicted_test == labels).sum().item()

        # 计算平均测试损失和准确率
        test_losses.append(test_loss / len(test_loader))
        test_accuracy = correct_test / total_test
        test_accuracies.append(test_accuracy)

        print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_losses[-1]:.4f}, Train Acc: {train_accuracy:.4f}, Test Loss: {test_losses[-1]:.4f}, Test Acc: {test_accuracy:.4f}")

    return train_losses, test_losses, train_accuracies, test_accuracies


### 4.1 可视化不同优化算法、正则化参数的训练效果


In [17]:


# 数据预处理和加载
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

set1 = mnist.MNIST('./data', train=True, transform=transform, download=True)
set2 = mnist.MNIST('./data', train=False, transform=transform, download=True)
loader1 = DataLoader(set1, batch_size=64, shuffle=True)
loader2 = DataLoader(set2, batch_size=128, shuffle=False)


# 评估
def evaluate(model, test_loader, criterion):
    eval_loss = 0
    correct = 0
    total = 0
    model.eval()

    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(images.size(0), -1)
            outputs = model(images)
            loss = criterion(outputs, labels)
            eval_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    eval_loss /= len(test_loader)
    eval_acc = correct / total

    print('Eval Loss: {:.6f}, Eval Acc: {:.6f}'.format(eval_loss, eval_acc))

# 训练和评估
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
num_epochs = 10

# 使用SGD优化算法，无正则化
model_sgd = FullyConnectedNN(input_size, hidden_sizes, output_size)
optimizer_sgd = optim.SGD(model_sgd.parameters(), lr=0.01)
criterion_sgd = nn.CrossEntropyLoss()
print("Training with SGD optimizer:")
train_losses_sgd, test_losses_sgd, train_accuracies_sgd, test_accuracies_sgd = train(model_sgd, loader1, loader2, optimizer_sgd, criterion_sgd, num_epochs)
evaluate(model_sgd, loader2, criterion_sgd)

# 使用Adam优化算法，设置L2正则化参数为0.001
model_adam = FullyConnectedNN(input_size, hidden_sizes, output_size)
optimizer_adam = optim.Adam(model_adam.parameters(), lr=0.001, weight_decay=0.001)
criterion_adam = nn.CrossEntropyLoss()
print("Training with Adam optimizer:")
train_losses_adam, test_losses_adam, train_accuracies_adam, test_accuracies_adam = train(model_adam, loader1, loader2, optimizer_adam, criterion_adam, num_epochs)
evaluate(model_adam, loader2, criterion_adam)

# 绘制损失曲线
import matplotlib.pyplot as plt
%matplotlib inline
# 绘制损失曲线
def plot_losses(train_losses, test_losses, title):
    plt.plot(train_losses, label='Train Loss')
    plt.plot(test_losses, label='Test Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title(title)
    plt.legend()
    plt.show()

# 绘制训练和测试损失曲线
plot_losses(train_losses_sgd, test_losses_sgd, 'SGD Optimizer Loss')
plot_losses(train_losses_adam, test_losses_adam, 'Adam Optimizer Loss')

Training with SGD optimizer:
Epoch 1/10, Train Loss: 0.9087, Train Acc: 0.7598, Test Loss: 0.4053, Test Acc: 0.8864
Epoch 2/10, Train Loss: 0.3738, Train Acc: 0.8929, Test Loss: 0.3260, Test Acc: 0.9030
Epoch 3/10, Train Loss: 0.3230, Train Acc: 0.9054, Test Loss: 0.2985, Test Acc: 0.9120
Epoch 4/10, Train Loss: 0.2984, Train Acc: 0.9126, Test Loss: 0.2826, Test Acc: 0.9151
Epoch 5/10, Train Loss: 0.2798, Train Acc: 0.9182, Test Loss: 0.2668, Test Acc: 0.9222
Epoch 6/10, Train Loss: 0.2640, Train Acc: 0.9223, Test Loss: 0.2582, Test Acc: 0.9219
Epoch 7/10, Train Loss: 0.2502, Train Acc: 0.9271, Test Loss: 0.2522, Test Acc: 0.9256
Epoch 8/10, Train Loss: 0.2368, Train Acc: 0.9311, Test Loss: 0.2329, Test Acc: 0.9302
Epoch 9/10, Train Loss: 0.2245, Train Acc: 0.9346, Test Loss: 0.2140, Test Acc: 0.9368
Epoch 10/10, Train Loss: 0.2132, Train Acc: 0.9375, Test Loss: 0.2119, Test Acc: 0.9387
Eval Loss: 0.211888, Eval Acc: 0.938700
Training with Adam optimizer:
Epoch 1/10, Train Loss: 0.3825

<Figure size 640x480 with 1 Axes>

<Figure size 640x480 with 1 Axes>

## 5 测试集上的准确率

同学们将测试过程补充在这里

In [12]:
def evaluate(model, test_loader, criterion):
    eval_loss = 0
    correct = 0
    total = 0
    model.eval()  # 将模型改为预测模式，不启用 BatchNormalization 和 Dropout

    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(images.size(0), -1)  # 将图片展平为一维向量

            # 前向传播计算loss
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # 计算测试集上的总eval_loss
            eval_loss += loss.item()
            
            # 得到预测结果
            _, predicted = torch.max(outputs.data, 1)
            
            # 计算预测正确的图片数量
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    eval_loss /= len(test_loader)
    eval_acc = correct / total

    print('Eval Loss: {:.6f}, Eval Acc: {:.6f}'.format(eval_loss, eval_acc))

# 在训练完成后单独进行模型评估
evaluate(model_sgd, loader2, criterion_sgd)
evaluate(model_adam, loader2, criterion_adam)






Eval Loss: 0.264236, Eval Acc: 0.923500
Eval Loss: 0.135656, Eval Acc: 0.958500
