PyTorch Transfer Learning: Advanced Tutorial
====
This section covers two important fine-tuning approaches in transfer learning:
----
1. Freeze parameters <br>
2. Load a subset of parameters

<h1>1. Freeze parameters</h1>

For details, see http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html#convnet-as-fixed-feature-extractor<br>

Assume ``model`` is our full model:<br>
```python
for param in model.parameters():
    param.requires_grad = False
```
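<br>Setting ``requires_grad`` to ``False`` turns off gradient computation for these parameters, so they are not updated during training.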
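A minimal, self-contained sketch of this pattern, assuming a small hypothetical ``nn.Sequential`` in place of a real pretrained network: every existing parameter is frozen, the last layer is replaced by a newly constructed (and therefore trainable) layer, and only the trainable parameters are handed to the optimizer. After one training step the frozen weights are unchanged and have no gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained model
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Freeze every existing parameter
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer; a newly constructed layer has requires_grad=True by default
model[2] = nn.Linear(16, 3)

# Pass only the parameters that are still trainable to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

frozen_before = model[0].weight.detach().clone()
head_before = model[2].weight.detach().clone()

# One training step on random data
x = torch.randn(5, 8)
y = torch.randint(0, 3, (5,))
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(model[0].weight.grad)                         # None: no gradient for a frozen weight
print(torch.equal(model[0].weight, frozen_before))  # True: the frozen layer did not change
print(torch.equal(model[2].weight, head_before))    # the new head was updated
```

Filtering on ``p.requires_grad`` keeps the optimizer from tracking parameters that never receive gradients.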

<h1>2. Load a subset of parameters</h1>

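This is the main topic of this section; we will work through an example to illustrate it.<br>
(Since freezing parameters is already covered in the official PyTorch tutorial, we skip that step here.)<br>
First, let's look at how to load all of the parameters.

The example below uses a bidirectional LSTM (BiRNN) followed by a fully connected (FC) layer to classify MNIST digits. If the code is hard to follow, you can skip it and go straight to the parameter-loading part that follows.<br>
The code automatically detects whether a GPU is available and uses it if so.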

```python
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms

# Hyper Parameters
sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 2
num_classes = 10
batch_size = 100
num_epochs = 2
learning_rate = 0.003

# MNIST Dataset
train_dataset = dsets.MNIST(root='../data/',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='../data/',
                           train=False,
                           transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

# Use the GPU if one is available, otherwise fall back to the CPU
GPU_FLAG = torch.cuda.is_available()
device = torch.device('cuda' if GPU_FLAG else 'cpu')


# BiRNN Model (Many-to-One)
class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size * 2, num_classes)  # 2 for bidirection

    def forward(self, x):
        # Set initial states on the same device as the input (2 for bidirection)
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)

        # Forward propagate RNN
        out, _ = self.lstm(x, (h0, c0))

        # Decode hidden state of last time step
        out = self.fc(out[:, -1, :])
        return out


rnn = BiRNN(input_size, hidden_size, num_layers, num_classes).to(device)

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)

# Train the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Treat each 28x28 image as a sequence of 28 rows
        images = images.view(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)

        # Forward + Backward + Optimize
        optimizer.zero_grad()
        outputs = rnn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch + 1, num_epochs, i + 1, len(train_dataset) // batch_size, loss.item()))

# Test the Model (no gradients needed for evaluation)
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images = images.view(-1, sequence_length, input_size).to(device)
        outputs = rnn(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted.cpu() == labels).sum().item()

print('Test Accuracy of the model on the 10000 test images: %d %%' % (100 * correct / total))

# Save the Model
torch.save(rnn.state_dict(), 'rnn.pkl')
```

```
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
Epoch [1/2], Step [100/600], Loss: 0.5616
Epoch [1/2], Step [200/600], Loss: 0.3109
Epoch [1/2], Step [300/600], Loss: 0.1398
Epoch [1/2], Step [400/600], Loss: 0.0776
Epoch [1/2], Step [500/600], Loss: 0.0810
Epoch [1/2], Step [600/600], Loss: 0.0532
Epoch [2/2], Step [100/600], Loss: 0.2020
Epoch [2/2], Step [200/600], Loss: 0.1019
Epoch [2/2], Step [300/600], Loss: 0.0305
Epoch [2/2], Step [400/600], Loss: 0.0510
Epoch [2/2], Step [500/600], Loss: 0.0842
Epoch [2/2], Step [600/600], Loss: 0.1936
Test Accuracy of the model on the 10000 test images: 97 %
```


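At this point the local directory contains a file named rnn.pkl that holds the model's parameters.<br>
Save the model parameters: ``torch.save(rnn.state_dict(), 'rnn.pkl')``<br>
Load the model parameters (``rnn`` must first be constructed with the same architecture): ``rnn.load_state_dict(torch.load('rnn.pkl'))``<br>
For a more detailed walkthrough, see: https://www.aiboy.pub/2017/06/05/How_To_Save_And_Restore_Model/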
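A small round-trip sketch of the two calls above, using a hypothetical ``make_model`` helper and a temporary file instead of rnn.pkl: the state_dict saved from one model is loaded into a freshly constructed model with the same architecture, after which both models produce identical outputs.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)


def make_model():
    # Hypothetical small network standing in for BiRNN
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


src = make_model()
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')

# Save only the parameter tensors (the state_dict), not the Python class
torch.save(src.state_dict(), path)

# Build a new model with the same architecture, then copy the saved tensors into it
dst = make_model()
dst.load_state_dict(torch.load(path))

x = torch.randn(3, 4)
print(torch.equal(src(x), dst(x)))  # True
```

Because only tensors are saved, the code that defines the model class has to be available when loading.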

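A PyTorch parameter file stores a state_dict: an ordered dictionary (OrderedDict) that maps each parameter's name, built from the network's attribute names, to its tensor. Loading matches parameters by these names.<br>
If the model's layers do not exactly match the entries in the .pkl file, loading fails with an error.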
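To see the names that loading relies on, the sketch below (with hypothetical ``Net`` and ``RenamedNet`` classes) prints the keys of a state_dict and then tries to load it into a model whose last layer has a different attribute name. With the default ``strict=True`` this raises a ``RuntimeError``; with ``strict=False`` the matching entries are loaded and the mismatched names are reported instead.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


class Net(nn.Module):
    # Hypothetical model whose last layer is stored in the attribute `fc`
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.fc = nn.Linear(8, 2)


class RenamedNet(nn.Module):
    # Same layer shapes, but the last layer is stored in `fc1` instead
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.fc1 = nn.Linear(8, 2)


saved = Net().state_dict()
print(isinstance(saved, OrderedDict))  # True
print(list(saved.keys()))  # ['body.weight', 'body.bias', 'fc.weight', 'fc.bias']

model = RenamedNet()

# Default strict=True: any missing or unexpected key raises RuntimeError
raised = False
try:
    model.load_state_dict(saved)
except RuntimeError as err:
    raised = True
    print('strict load failed:', err)

# strict=False: load the keys that match and report the rest
result = model.load_state_dict(saved, strict=False)
print('missing:', result.missing_keys)        # fc1.weight and fc1.bias
print('unexpected:', result.unexpected_keys)  # fc.weight and fc.bias
```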

```python
# BiRNN Model (Many-to-One), identical except that the last layer is named fc1 instead of fc
class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        self.fc1 = nn.Linear(hidden_size * 2, num_classes)  # 2 for bidirection

    def forward(self, x):
        # Set initial states on the same device as the input (2 for bidirection)
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)

        # Forward propagate RNN
        out, _ = self.lstm(x, (h0, c0))

        # Decode hidden state of last time step
        out = self.fc1(out[:, -1, :])
        return out


rnn = BiRNN(input_size, hidden_size, num_layers, num_classes)
# Raises RuntimeError: rnn.pkl contains fc.weight/fc.bias, but this model expects fc1.weight/fc1.bias
rnn.load_state_dict(torch.load('rnn.pkl'))
```

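In this code the last layer is renamed from ``fc`` to ``fc1``, so the model's keys no longer match those stored in rnn.pkl, and ``load_state_dict`` raises an error.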
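One way to reuse the old parameters despite the rename is to rewrite the keys before loading. A sketch, again with hypothetical ``Net``/``RenamedNet`` classes standing in for the two BiRNN versions: keys that start with ``fc.`` are renamed to start with ``fc1.``, after which strict loading succeeds.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    # Hypothetical stand-in for the original model: last layer named `fc`
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.fc = nn.Linear(8, 2)


class RenamedNet(nn.Module):
    # Hypothetical stand-in for the modified model: last layer renamed to `fc1`
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.fc1 = nn.Linear(8, 2)


saved = Net().state_dict()

# Rewrite keys with the old prefix 'fc.' to the new prefix 'fc1.'; keep all other keys
remapped = {}
for key, value in saved.items():
    if key.startswith('fc.'):
        key = 'fc1.' + key[len('fc.'):]
    remapped[key] = value

model = RenamedNet()
model.load_state_dict(remapped)  # strict loading succeeds because every key now matches
print(list(remapped.keys()))  # ['body.weight', 'body.bias', 'fc1.weight', 'fc1.bias']
```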

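``rnn.load_state_dict(torch.load('rnn.pkl'))``
<br>is roughly equivalent to the loop below (``load_state_dict`` additionally checks for missing keys, unexpected keys, and shape mismatches):
```python
state_dict = torch.load('rnn.pkl')
own_state = rnn.state_dict()  # these tensors share storage with rnn's parameters
for name, param in state_dict.items():
    own_state[name].copy_(param)  # overwrite the model's parameter in place
```
<br>
Because the loop copies parameters one name at a time, the same pattern lets you load any subset of them, for example by skipping names the current model does not have.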
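A sketch of selective loading in this style, assuming hypothetical ``OldNet`` and ``NewNet`` models that share a ``body`` layer but whose ``fc`` heads have different output sizes (10 classes versus 3): only entries whose name exists in the new model and whose shape matches are copied; the rest are skipped.

```python
import torch
import torch.nn as nn


class OldNet(nn.Module):
    # Hypothetical "pretrained" model with a 10-class head
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.fc = nn.Linear(8, 10)


class NewNet(nn.Module):
    # Hypothetical new model: same body, but a 3-class head with the same name
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 8)
        self.fc = nn.Linear(8, 3)


pretrained = OldNet().state_dict()
model = NewNet()
own_state = model.state_dict()  # tensors share storage with model's parameters

loaded, skipped = [], []
for name, param in pretrained.items():
    # Copy only parameters whose name exists in the new model and whose shape matches
    if name in own_state and own_state[name].shape == param.shape:
        own_state[name].copy_(param)
        loaded.append(name)
    else:
        skipped.append(name)

print('loaded:', loaded)    # ['body.weight', 'body.bias']
print('skipped:', skipped)  # ['fc.weight', 'fc.bias']
```

A common alternative is to build the filtered dictionary first, merge it into ``model.state_dict()`` with ``dict.update``, and pass the result to ``load_state_dict``; the effect is the same.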
