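# Background

Before deep learning, the dominant approach was the support vector machine (SVM), used with a wide variety of kernels.

A kernel function measures similarity: how related two points are in a high-dimensional feature space.

Kernel methods work by transforming the input space into one where the data takes the shape we want.

The learning problem is then cast as a convex optimization problem and solved.

Kernel methods are also backed by an elegant body of theorems.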
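To make "a kernel measures how related two points are" concrete, here is a minimal sketch of one common kernel, the Gaussian (RBF) kernel. The notes do not say which kernel they have in mind, so the choice of RBF, the helper name `rbf_kernel`, and the value of `gamma` are all assumptions made for illustration.

```python
import torch

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    # torch.cdist returns pairwise Euclidean distances; square them for the RBF form
    sq_dists = torch.cdist(X, Y) ** 2
    return torch.exp(-gamma * sq_dists)

# Two nearby points and one far away (made-up coordinates)
points = torch.tensor([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])
K = rbf_kernel(points, points, gamma=0.5)
print(K)
```

Nearby points get a similarity close to 1 and distant points a similarity close to 0, which is the sense in which the kernel judges relatedness.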

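## Feature engineering

Feature engineering mattered most in computer vision, where a lot of effort went into extracting hand-designed features such as keypoint descriptors. Feature extraction was considered the important part, and the model itself mattered comparatively little.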
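As a rough illustration of what "hand-designed" means, the sketch below applies a fixed Sobel filter to a toy image. Sobel is a much simpler stand-in for the keypoint descriptors mentioned above, and the 6x6 image is made up. The point is only that the filter weights are chosen by a person rather than learned from data.

```python
import torch
import torch.nn.functional as F

# Hand-designed Sobel kernel that responds to horizontal intensity changes
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# Toy 6x6 image: dark left half, bright right half (one vertical edge)
image = torch.zeros(1, 1, 6, 6)
image[..., 3:] = 1.0

edges = F.conv2d(image, sobel_x)  # no padding, so the output is 4x4
print(edges[0, 0])                # every row is [0., 4., 4., 0.]
```

The response is nonzero only around the edge. A convolutional network uses the same operation, but learns the kernel values during training.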

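## The origins of deep learning

The amount of available data kept growing, and both compute and RAM grew many times over.

ImageNet was an important turning point.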

# AlexNet

A deeper and larger LeNet.

Main improvements:

- Dropout
- ReLU
- Max pooling
- Data augmentation (very important)
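The first three items in the list can be tried in isolation. The sketch below uses toy tensors (made-up values, not real activations) to show what each layer does to its input.

```python
import torch
from torch import nn

x = torch.tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])

# ReLU: negatives become 0, positives pass through unchanged
relu_out = nn.ReLU()(x)

# Dropout: active only in training mode; surviving entries are scaled by 1 / (1 - p)
dropout = nn.Dropout(p=0.5)
ones = torch.ones(1, 1000)
dropout.train()
train_out = dropout(ones)  # each entry is either 0.0 or 2.0
dropout.eval()
eval_out = dropout(ones)   # identity at inference time

# Max pooling: 3x3 window with stride 2 keeps the largest value in each window
feature_map = torch.arange(25.0).reshape(1, 1, 5, 5)
pooled = nn.MaxPool2d(kernel_size=3, stride=2)(feature_map)

print(relu_out)       # tensor([[0., 0., 0., 1., 2.]])
print(pooled[0, 0])   # tensor([[12., 14.], [22., 24.]])
```

Because dropout behaves differently in the two modes, a network should be switched with `net.train()` and `net.eval()` before training and evaluation respectively.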

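What changed:

We no longer think about hand-crafted feature extraction. Instead, the network is trained end to end, directly from raw pixels to labels.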



![alexnet](./img/alexnet.png)

# Code

In [2]:
import torch
from torch import nn


In [11]:
# Note: we use the Fashion-MNIST dataset here, so the number of input channels is 1

net = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 10)
    )

In [12]:
# Printing each layer's output shape makes it easy to spot the layer whose dimensions were computed incorrectly
X = torch.rand(size=(1,1,224,224), dtype=torch.float32)

for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape: \t\t', X.shape)

Conv2d output shape: 		 torch.Size([1, 96, 54, 54])
ReLU output shape: 		 torch.Size([1, 96, 54, 54])
MaxPool2d output shape: 		 torch.Size([1, 96, 26, 26])
Conv2d output shape: 		 torch.Size([1, 256, 26, 26])
ReLU output shape: 		 torch.Size([1, 256, 26, 26])
MaxPool2d output shape: 		 torch.Size([1, 256, 12, 12])
Conv2d output shape: 		 torch.Size([1, 384, 12, 12])
ReLU output shape: 		 torch.Size([1, 384, 12, 12])
Conv2d output shape: 		 torch.Size([1, 384, 12, 12])
ReLU output shape: 		 torch.Size([1, 384, 12, 12])
Conv2d output shape: 		 torch.Size([1, 256, 12, 12])
ReLU output shape: 		 torch.Size([1, 256, 12, 12])
MaxPool2d output shape: 		 torch.Size([1, 256, 5, 5])
Flatten output shape: 		 torch.Size([1, 6400])
Linear output shape: 		 torch.Size([1, 4096])
ReLU output shape: 		 torch.Size([1, 4096])
Dropout output shape: 		 torch.Size([1, 4096])
Linear output shape: 		 torch.Size([1, 4096])
ReLU output shape: 		 torch.Size([1, 4096])
Dropout output shape: 		 torch.Size([1, 4096])
Linear output shape: 		 torch.Size([1, 10])

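In essence, the network keeps increasing the number of channels while shrinking the spatial resolution, then flattens the resulting feature map into one long vector and feeds it to the fully connected layers.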
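The shapes printed above can be checked by hand with the standard output-size formula for convolution and pooling layers, `floor((size + 2 * padding - kernel) / stride) + 1`. The helper name `out_size` below is my own. The sketch traces the 224-pixel input through the layers and recovers the 6400 input features of the first `nn.Linear`.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

s = 224
s = out_size(s, kernel=11, stride=4, padding=1)  # conv1 -> 54
s = out_size(s, kernel=3, stride=2)              # pool1 -> 26
s = out_size(s, kernel=5, padding=2)             # conv2 -> 26
s = out_size(s, kernel=3, stride=2)              # pool2 -> 12
s = out_size(s, kernel=3, padding=1)             # conv3, conv4, conv5 all keep 12
s = out_size(s, kernel=3, stride=2)              # pool3 -> 5

flat_features = 256 * s * s  # channels * height * width after the last pooling layer
print(s, flat_features)      # 5 6400
```

If the input resolution changes, rerunning this arithmetic gives the new value for the first argument of `nn.Linear(6400, 4096)`.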

In [15]:
import torchvision
from torchvision import transforms
from torch.utils import data

batch_size = 256
def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    # If resize is given, resize first and then convert to a tensor
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    # download=True fetches the dataset into ./data only if it is not already there
    mnist_train = torchvision.datasets.FashionMNIST(root='./data', train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root='./data', train=False, transform=trans, download=True)
    # Shuffle only the training set; evaluation does not need a random order
    return (data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=8),
            data.DataLoader(mnist_test, batch_size, shuffle=False, num_workers=8))
  
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)
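The notes list data augmentation as a very important ingredient of AlexNet, but the loader here only resizes the images and converts them to tensors. As a minimal sketch of the idea, the hypothetical helper `augment_batch` below applies a random horizontal flip and a random shifted crop to a batch using plain tensor operations. The padding of 4 pixels and the 28x28 toy batch are arbitrary choices, and one crop offset is shared by the whole batch to keep the code short.

```python
import torch
import torch.nn.functional as F

def augment_batch(X, pad=4):
    """Randomly flip and shift a batch of images shaped (N, C, H, W)."""
    n, _, h, w = X.shape
    # Random horizontal flip: mirror roughly half of the images along the width axis
    flip = torch.rand(n) < 0.5
    X = torch.where(flip[:, None, None, None], X.flip(dims=[3]), X)
    # Random crop: zero-pad the border, then cut an H x W window at a random offset
    padded = F.pad(X, (pad, pad, pad, pad))
    top = torch.randint(0, 2 * pad + 1, (1,)).item()
    left = torch.randint(0, 2 * pad + 1, (1,)).item()
    return padded[:, :, top:top + h, left:left + w]

batch = torch.rand(8, 1, 28, 28)   # stand-in for a batch of Fashion-MNIST images
augmented = augment_batch(batch)
print(augmented.shape)             # torch.Size([8, 1, 28, 28])
```

In a real pipeline the same effect is usually obtained per image by adding `transforms.RandomHorizontalFlip()` and `transforms.RandomCrop(...)` to the training transform list only, leaving the test transform unchanged.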

In [16]:
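def train(net, train_iter, test_iter, num_epochs, lr):
    # Note: test_iter is accepted but not used yet; no evaluation is performed here
    def init_weights(m):
        # Xavier initialization for every linear and convolutional layer
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            nn.init.xavier_uniform_(m.weight)

    net.apply(init_weights)
    # Use a GPU when one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    net.to(device)
    print('begin training')

    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        net.train()
        total_loss, num_batches = 0.0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            y_hat = net(X)
            L = loss(y_hat, y)
            L.backward()
            optimizer.step()
            total_loss += L.item()
            num_batches += 1

        # Report the mean training loss over the epoch, not just the last batch
        print(f'epoch {epoch + 1}, loss: {total_loss / num_batches:.4f}')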
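The training function takes `test_iter` but never reads it, so no accuracy is reported. A minimal sketch of an accuracy helper that could fill that gap is given below. The name `evaluate_accuracy` and the toy two-class "network" used to exercise it are assumptions for illustration, and the helper assumes the model and the batches already live on the same device.

```python
import torch
from torch import nn

def evaluate_accuracy(net, data_iter):
    """Fraction of correctly classified samples over an iterable of (X, y) batches."""
    net.eval()  # disable dropout while evaluating
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            y_hat = net(X)
            correct += (y_hat.argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Toy check: a linear layer set to the identity, so it predicts the index of the larger input
toy_net = nn.Linear(2, 2)
with torch.no_grad():
    toy_net.weight.copy_(torch.eye(2))
    toy_net.bias.zero_()

X = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
y = torch.tensor([0, 1, 1, 1])
print(evaluate_accuracy(toy_net, [(X, y)]))  # 0.75
```

Calling `evaluate_accuracy(net, test_iter)` at the end of each epoch would give a test accuracy figure per epoch. Remember to call `net.train()` again before the next epoch, since the helper leaves the model in evaluation mode.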

In [17]:
# Far too slow: I don't have a GPU, so I gave up here
# On Mu Li's machine the final accuracy was 0.88

lr = 0.05
num_epochs = 10
train(net, train_iter, test_iter, num_epochs, lr)

begin training


KeyboardInterrupt: 
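When no GPU is available, one way to at least confirm that the training loop runs is to feed it a small slice of the data. The sketch below is only a suggestion and not part of the original notebook: `make_smoke_loader` is a hypothetical helper, and a `TensorDataset` of random tensors stands in for Fashion-MNIST so that the example runs without any download.

```python
import torch
from torch.utils import data

def make_smoke_loader(dataset, num_samples=512, batch_size=64):
    """DataLoader over the first `num_samples` items of `dataset` (hypothetical helper)."""
    num_samples = min(num_samples, len(dataset))
    subset = data.Subset(dataset, range(num_samples))
    return data.DataLoader(subset, batch_size=batch_size, shuffle=True)

# Stand-in dataset of random images and labels (not real Fashion-MNIST data)
fake = data.TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))
loader = make_smoke_loader(fake, num_samples=40, batch_size=16)
print(len(loader.dataset), len(loader))  # 40 3
```

A run on such a subset says nothing about final accuracy. It only checks that shapes, loss, and optimizer steps work before committing to a long run.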