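# Image classification with an RNN
Earlier we saw that RNNs are especially well suited to sequential data. Can an RNN also be used for image classification, the way a CNN is? Below we use the MNIST handwritten digit dataset to show how an RNN can classify images. This is not a mainstream approach; we present it here only as an illustration.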

A handwritten digit image is 28 × 28 pixels. We can treat it as a sequence of length 28 in which each step has 28 features, as shown below:

![](https://ws4.sinaimg.cn/large/006tKfTcly1fmu7d0byfkj30n60djdg5.jpg)
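In tensor terms, this is just a reshape. The sketch below uses a random dummy batch (an assumption, so that it runs without downloading MNIST) and the same `permute(2, 0, 1)` that the model in this section applies, so each time step is one 28-pixel column of the image:

```python
import torch

# A dummy batch shaped like MNIST images: (batch, channel, height, width)
images = torch.randn(4, 1, 28, 28)

# Drop the channel dimension -> (batch, 28, 28)
seq = images.squeeze(1)

# nn.LSTM expects (seq_len, batch, feature) by default, so move the width
# dimension to the front: time step t holds column t of every image
seq = seq.permute(2, 0, 1)

print(seq.shape)  # torch.Size([28, 4, 28])
```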

That settles the input sequence; what about the output? The RNN produces an output at every step, but for classification we only need to keep one of them. The last output is the natural choice, because by then the network has seen the whole sequence, as shown below:

![](https://ws3.sinaimg.cn/large/006tKfTcly1fmu7fpqri0j30c407yjr8.jpg)
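As a small sketch of "keep only the last output", the snippet below runs a random sequence (an assumption, standing in for real images) through `nn.LSTM` and slices off the final time step. For a unidirectional LSTM this slice matches the final hidden state of the top layer, `h_n[-1]`:

```python
import torch
from torch import nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=28, hidden_size=100, num_layers=2)
x = torch.randn(28, 4, 28)  # (seq_len, batch, feature)

# out holds the top layer's output at every time step;
# h_n holds the final hidden state of each layer
out, (h_n, c_n) = lstm(x)

last = out[-1]  # output at the last time step: (batch, hidden_size)

print(out.shape)   # torch.Size([28, 4, 100])
print(last.shape)  # torch.Size([4, 100])

# The last output equals the final hidden state of the last layer
same = torch.allclose(last, h_n[-1])
```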

Let's go straight to an example.


In [1]:
import sys
sys.path.append('..')

import torch
from torch.autograd import Variable
from torch import nn
from torch.utils.data import DataLoader

from torchvision import transforms as tfs
from torchvision.datasets import MNIST

In [2]:
# Define data
data_tf = tfs.Compose([
    tfs.ToTensor(),
    tfs.Normalize([0.5], [0.5]) # normalization
])

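train_set = MNIST('./data', train=True, transform=data_tf, download=True)  # download=True fetches the data if it is not already in ./data
test_set = MNIST('./data', train=False, transform=data_tf, download=True)

train_data = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)
test_data = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)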

In [3]:
# Define model
class rnn_classify(nn.Module):
    def __init__(self, in_feature=28, hidden_feature=100, num_class=10, num_layers=2):
        super(rnn_classify, self).__init__()
        self.rnn = nn.LSTM(in_feature, hidden_feature, num_layers)  # two-layer LSTM by default
        self.classifier = nn.Linear(hidden_feature, num_class)  # fully connected layer applied to the last RNN output to get the class scores

    def forward(self, x):
        '''
        x has size (batch, 1, 28, 28), so we convert it to the input form the RNN expects, i.e. (28, batch, 28)
        '''
        x = x.squeeze(1)  # drop the channel dimension: (batch, 1, 28, 28) -> (batch, 28, 28)
        x = x.permute(2, 0, 1)  # move the last dimension to the front: (28, batch, 28)
        out, _ = self.rnn(x)  # with the default hidden state, out is (28, batch, hidden_feature)
        out = out[-1, :, :]  # take the last step of the sequence: (batch, hidden_feature)
        out = self.classifier(out)  # get the classification result
        return out
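
One subtlety in the channel-removal step: `squeeze()` called without an argument removes every dimension of size 1, so a batch holding a single image would lose its batch dimension as well. Passing the dimension explicitly avoids this. A quick check on a dummy tensor:

```python
import torch

single = torch.randn(1, 1, 28, 28)  # a batch containing exactly one image

# Without an argument, both size-1 dimensions disappear
print(single.squeeze().shape)   # torch.Size([28, 28])

# Squeezing only dimension 1 keeps the batch dimension
print(single.squeeze(1).shape)  # torch.Size([1, 28, 28])
```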

In [4]:
net = rnn_classify()
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adadelta(net.parameters(), 1e-1)

In [5]:
# Start training
from utils import train
train(net, train_data, test_data, 10, optimizer, criterion)

Epoch 0. Train Loss: 1.858605, Train Acc: 0.318347, Valid Loss: 1.147508, Valid Acc: 0.578125, Time 00:00:09
Epoch 1. Train Loss: 0.503072, Train Acc: 0.848514, Valid Loss: 0.300552, Valid Acc: 0.912579, Time 00:00:09
Epoch 2. Train Loss: 0.224762, Train Acc: 0.934785, Valid Loss: 0.176321, Valid Acc: 0.946499, Time 00:00:09
Epoch 3. Train Loss: 0.157010, Train Acc: 0.953392, Valid Loss: 0.155280, Valid Acc: 0.954015, Time 00:00:09
Epoch 4. Train Loss: 0.125926, Train Acc: 0.962137, Valid Loss: 0.105295, Valid Acc: 0.969640, Time 00:00:09
Epoch 5. Train Loss: 0.104938, Train Acc: 0.968450, Valid Loss: 0.091477, Valid Acc: 0.972805, Time 00:00:10
Epoch 6. Train Loss: 0.089124, Train Acc: 0.973481, Valid Loss: 0.104799, Valid Acc: 0.969343, Time 00:00:09
Epoch 7. Train Loss: 0.077920, Train Acc: 0.976679, Valid Loss: 0.084242, Valid Acc: 0.976661, Time 00:00:10
Epoch 8. Train Loss: 0.070259, Train Acc: 0.978795, Valid Loss: 0.078536, Valid Acc: 0.977749, Time 00:00:09
Epoch 9. Train Loss
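
The `train` helper imported from `utils` is not shown in this section. As a rough idea of what one epoch of such a helper might do, here is a minimal sketch assuming a standard PyTorch training loop. The names `TinyRNN` and `run_epoch` are hypothetical, and random tensors stand in for MNIST so the block runs without a download; it is not the actual `utils.train` implementation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyRNN(nn.Module):
    # Hypothetical, smaller stand-in for the classifier defined earlier
    def __init__(self, in_feature=28, hidden_feature=32, num_class=10):
        super().__init__()
        self.rnn = nn.LSTM(in_feature, hidden_feature)
        self.classifier = nn.Linear(hidden_feature, num_class)

    def forward(self, x):
        x = x.squeeze(1).permute(2, 0, 1)  # (batch, 1, 28, 28) -> (28, batch, 28)
        out, _ = self.rnn(x)
        return self.classifier(out[-1])    # classify from the last time step


def run_epoch(net, loader, optimizer, criterion):
    # Hypothetical helper: one pass over the data, returns mean loss and accuracy
    net.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        optimizer.zero_grad()
        logits = net(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Random tensors shaped like MNIST, used only to exercise the loop
fake_images = torch.randn(64, 1, 28, 28)
fake_labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=16)

net = TinyRNN()
optimizer = torch.optim.Adadelta(net.parameters(), lr=1e-1)
avg_loss, acc = run_epoch(net, loader, optimizer, nn.CrossEntropyLoss())
print(f'Train Loss: {avg_loss:.6f}, Train Acc: {acc:.6f}')
```

On random labels the accuracy is meaningless; the point is only the order of operations inside the loop.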

As we can see, after 10 epochs of training the RNN reaches roughly 98% accuracy on the simple MNIST dataset, so an RNN can handle simple image classification, although this is not where it is strongest. Next we turn to a scenario that suits RNNs much better: time series prediction.
