# The LeNet Model
LeNet consists of two parts: a **convolutional block** and a **fully connected block**. We describe each in turn below.

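## Convolutional Block
In the convolutional block, every convolutional layer uses a `5×5` window and applies the `sigmoid` activation function to its output.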

The number of output channels (`channels`) is 6 for the first convolutional layer and 16 for the second.

Each max pooling layer in the convolutional block uses a `2×2` window with a stride of `2`.

The output of the convolutional block has shape `(batch size, channels, height, width)`.
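The shape arithmetic for the convolutional block can be traced by hand. The following is a minimal sketch in plain Python, assuming a single-channel `28×28` input (the Fashion-MNIST image size), no padding, and a convolution stride of 1. The helper names `conv_out` and `lenet_conv_block_shape` are hypothetical and are not part of `d2lzh_pytorch`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_conv_block_shape(batch_size, height, width):
    """Trace (batch, channels, height, width) through conv -> pool, twice."""
    channels = 1  # assumption: grayscale input
    for out_channels in (6, 16):
        # 5x5 convolution, stride 1, no padding
        height, width = conv_out(height, 5), conv_out(width, 5)
        channels = out_channels
        # 2x2 max pooling, stride 2
        height, width = conv_out(height, 2, stride=2), conv_out(width, 2, stride=2)
    return (batch_size, channels, height, width)


shape = lenet_conv_block_shape(256, 28, 28)
print(shape)  # (256, 16, 4, 4)
```

Under these assumptions the spatial size goes 28 → 24 → 12 → 8 → 4, so each sample leaves the block as a `16×4×4` feature map.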


## Fully Connected Block
The fully connected block flattens each sample in the mini-batch output by the convolutional block.

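The input to the fully connected block therefore becomes two-dimensional: the first dimension indexes the samples in the mini-batch, and the second is the flattened vector representation of each sample, whose length is the product of the channels, height, and width.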
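As a small illustration of the flattening step, the sketch below reshapes a dummy tensor with `view`. The batch size of 256 and the `16×4×4` feature map are assumptions chosen to match a `28×28` input; the variable names are illustrative only.

```python
import torch

# Dummy convolutional output: (batch size, channels, height, width)
features = torch.zeros(256, 16, 4, 4)

# Keep the batch dimension and merge channels, height and width into one axis
flat = features.view(features.shape[0], -1)

print(flat.shape)  # torch.Size([256, 256])
```

The second dimension is `16 * 4 * 4 = 256`, which is the input size the first fully connected layer has to accept.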

The block has `3` fully connected layers with `120`, `84`, and `10` outputs, respectively.
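To get a feel for the size of the network, the learnable parameters can be counted from the layer sizes alone. This is a rough sketch in plain Python, assuming a single-channel input, a flattened feature length of `16 * 4 * 4 = 256`, and a bias term in every layer. The helper names `conv_params` and `linear_params` are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel convolutional layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


conv_total = conv_params(1, 6, 5) + conv_params(6, 16, 5)
fc_total = (linear_params(16 * 4 * 4, 120)
            + linear_params(120, 84)
            + linear_params(84, 10))
total = conv_total + fc_total

print(conv_total, fc_total, total)  # 2572 41854 44426
```

Most of the parameters sit in the fully connected block, with the first fully connected layer alone accounting for 30,840 of them.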

In [1]:
import torch
import time
from torch import nn,optim
from tqdm import tqdm

import sys
sys.path.append("..") 
import d2lzh_pytorch as d2l
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")



In [2]:
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet,self).__init__()
        
        self.conv = nn.Sequential(
            nn.Conv2d(1,6,5),# in_channels, out_channels, kernel_size
            nn.Sigmoid(),
            nn.MaxPool2d(2,2),# kernel_size, stride
            nn.Conv2d(6,16,5),
            nn.Sigmoid(),
            nn.MaxPool2d(2,2)
        )
        
        self.fc = nn.Sequential(
            nn.Linear(16*4*4,120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, 10))
        
    def forward(self, img):
        feature= self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output

In [3]:
net = LeNet()
print(net)

LeNet(
  (conv): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Sigmoid()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): Sigmoid()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=120, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)


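## Loading the Data and Training the Model
Next we experiment with the LeNet model. As before, we use Fashion-MNIST as the training dataset.

Because convolutional neural networks are more computationally expensive than multilayer perceptrons, we recommend using a GPU to speed up computation. We therefore modify the `evaluate_accuracy` function to support GPU computation.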
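The core of GPU-aware evaluation is to move each batch to the device that holds the model before comparing predictions with labels. The following is a minimal sketch of that pattern, not the function used in this notebook. The tiny `nn.Linear` model and the random data are stand-ins chosen only for illustration, and the code falls back to the CPU when no GPU is available.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model and data, used only to show the device handling
net = nn.Linear(4, 3).to(device)
X = torch.randn(8, 4)          # created on the CPU, like a DataLoader batch
y = torch.randint(0, 3, (8,))

net.eval()                     # evaluation mode (disables dropout, if any)
with torch.no_grad():          # no gradients are needed for evaluation
    y_hat = net(X.to(device))  # inputs must live on the same device as the model
    correct = (y_hat.argmax(dim=1) == y.to(device)).float().sum().item()
net.train()                    # switch back to training mode

accuracy = correct / y.shape[0]
print(accuracy)
```

Calling `.item()` copies the scalar result back to a Python float, so the running total can be accumulated on the host regardless of where the model lives.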

In [4]:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

In [5]:
def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, nn.Module):
        # If no device is given, use the device that holds net's parameters
        device = list(net.parameters())[0].device
    acc_sum, n = 0.0, 0 # number of correct predictions, number of samples
    with torch.no_grad():
        for X,y in tqdm(data_iter):
            if isinstance(net, torch.nn.Module):
                net.eval() # evaluation mode, this turns off dropout
                acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                net.train() # switch back to training mode
            else: # custom model, not used after Section 3.13; GPU is not considered
                if('is_training' in net.__code__.co_varnames): # if net has an is_training parameter
                    # set is_training to False
                    acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item() 
                else:
                    acc_sum += (net(X).argmax(dim=1) == y).float().sum().item() 
            n += y.shape[0]
    return acc_sum / n

def train_ch5(net,train_iter,test_iter,batch_size,optimizer,device,num_epochs):
    net = net.to(device)
    print("training on ", device)
    loss = torch.nn.CrossEntropyLoss() # loss(input, target): input has shape (N, C), target has shape (N,) with 0 <= target[i] <= C-1
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X,y in tqdm(train_iter):
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat,y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum+=l.cpu().item()
            train_acc_sum+= (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

In [6]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

training on  cuda

epoch 1, loss 1.8416, train acc 0.330, test acc 0.580, time 11.2 sec
epoch 2, loss 0.9649, train acc 0.624, test acc 0.671, time 8.3 sec
epoch 3, loss 0.8083, train acc 0.701, test acc 0.712, time 8.1 sec
epoch 4, loss 0.7171, train acc 0.733, test acc 0.739, time 8.0 sec
epoch 5, loss 0.6614, train acc 0.750, test acc 0.756, time 8.0 sec



