<a href="https://colab.research.google.com/github/Bingle-labake/deeplearn/blob/master/transferlearning/6_%E8%BF%81%E7%A7%BB%E5%AD%A6%E4%B9%A0_%E8%8A%B1%E6%9C%B5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

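## Transfer Learning
Most of the time, you don't need to train an entire convolutional network yourself. Modern ConvNets trained on large datasets such as ImageNet take weeks to train on multiple GPUs.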



> Instead, most people use a pretrained network either as a fixed feature extractor or as an initial network to fine-tune.



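In this notebook, you'll use [VGGNet](https://arxiv.org/pdf/1409.1556.pdf), trained on the [ImageNet dataset](http://www.image-net.org/), as a feature extractor. Below is a diagram of the VGGNet architecture: a series of convolutional and max-pooling layers, followed by three fully connected layers that classify an image into one of the 1000 ImageNet classes.
![VGG-16 architecture diagram](https://viewuf1qztkwjj.udacity-student-workspaces.com/notebooks/notebook_ims/vgg_16_architecture.png)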

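VGGNet is appealing because it is simple and performs well; it placed second in the classification task of the 2014 ImageNet competition (ILSVRC). The idea here is to keep all the convolutional layers but replace the final fully connected layer with our own classifier. That way we can use VGGNet as a fixed feature extractor for our images and train a simple classifier on top of it.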




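*   Use all but the last fully connected layer as a fixed feature extractor.
*   Define a new final classification layer and apply it to our task!

To learn more about transfer learning, see the [CS231n Stanford course notes](http://cs231n.github.io/transfer-learning/).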



---



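## Flower Images
We'll use VGGNet to classify images of flowers. We start by importing the usual resources and then check whether we can train the model on a GPU.

### Download Data
You can find the flower data zip file in the resources section of this course and download it to your local environment. For this notebook, the data has already been downloaded and placed in the directory `flower_photos/`.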

In [0]:
import os
import numpy as np
import torch

import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

CUDA is available!  Training on GPU ...


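### Load and Transform the Data
PyTorch's ImageFolder class makes it easy to load data from a directory. For example, the training images are all stored under directory paths like these: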

root/class_1/xxx.png

root/class_1/xxy.png

root/class_1/xxz.png

root/class_2/123.png

root/class_2/nsdf3.png

root/class_2/asd932_.png

Here the training root folder is `flower_photos/train/`, and the classes are the names of the flower types.

In [0]:
# define training and test data directories
data_dir = 'flower_photos/'
train_dir = os.path.join(data_dir, 'train/')
test_dir = os.path.join(data_dir, 'test/')

# classes are folders in each directory with these names
classes = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
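
The hard-coded `classes` list above lines up with the labels because ImageFolder assigns integer labels to class subfolders in sorted (alphabetical) order. The sketch below re-creates that rule with the standard library only, on a throwaway directory tree. The helper name `find_class_indices` is hypothetical and is not part of torchvision.

```python
import os
import tempfile

def find_class_indices(root):
    """Return (classes, class_to_idx) for the subfolders of `root`, sorted by name."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx

# Build a temporary tree shaped like root/class_x/, created in a scrambled order
with tempfile.TemporaryDirectory() as root:
    for name in ['tulips', 'daisy', 'roses', 'sunflowers', 'dandelion']:
        os.makedirs(os.path.join(root, name))
    found_classes, class_to_idx = find_class_indices(root)

print(found_classes)
print(class_to_idx)
```

If your folder names differ from the list above, reading the names from the dataset object (its `classes` attribute) is safer than typing them by hand.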

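### Transforming the Data
When we perform transfer learning, we have to shape our input data into the shape the pretrained model expects. VGG16 expects square 224x224-pixel images as input, so we resize each flower image to fit.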

In [0]:
# load and transform data using ImageFolder

# VGG-16 Takes 224x224 images as input, so we resize all of them
data_transform = transforms.Compose([transforms.RandomResizedCrop(224), 
                                      transforms.ToTensor()])

train_data = datasets.ImageFolder(train_dir, transform=data_transform)
test_data = datasets.ImageFolder(test_dir, transform=data_transform)

# print out some data stats
print('Num training images: ', len(train_data))
print('Num test images: ', len(test_data))

### DataLoaders and Data Visualization

In [0]:
# define dataloader parameters
batch_size = 20
num_workers=0

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
                                           num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
                                          num_workers=num_workers, shuffle=True)
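
To see what a loader configured this way yields, here is a minimal sketch on random tensors instead of the flower images (the names `fake_images`, `fake_labels`, and `toy_loader` are made up for illustration). It shows that the final batch can be smaller than `batch_size` when the dataset size is not a multiple of it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 50 fake "images" (3x8x8 to keep things light) with labels in 0..4
fake_images = torch.randn(50, 3, 8, 8)
fake_labels = torch.randint(0, 5, (50,))
toy_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                        batch_size=20, shuffle=True)

# Record how many examples each batch actually contains
batch_sizes = [data.size(0) for data, target in toy_loader]
print(batch_sizes)
```

Any loop that indexes into a batch should therefore use the batch's actual length, for example `data.size(0)`, and not assume it equals `batch_size`.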

In [0]:
# Visualize some sample data

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)  # iterators no longer provide a .next() method
images = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    # add_subplot needs integer grid sizes, hence // instead of /
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    plt.imshow(np.transpose(images[idx], (1, 2, 0)))
    ax.set_title(classes[labels[idx]])

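### Define the Model
To define a model for training, we'll follow these steps:

1. Load in a pretrained VGG16 model
2. "Freeze" all the parameters, so the net acts as a fixed feature extractor
3. Remove the last layer
4. Replace the last layer with a linear classifier of our own

#### Freezing simply means that the parameters in the pretrained model will not change during training.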

In [0]:
# Load the pretrained model from pytorch
vgg16 = models.vgg16(pretrained=True)

# print out the model structure
print(vgg16)

In [0]:
print(vgg16.classifier[6].in_features) 
print(vgg16.classifier[6].out_features) 

In [0]:
# Freeze training for all "features" layers
for param in vgg16.features.parameters():
    param.requires_grad = False
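
To make the effect of `requires_grad = False` concrete, the sketch below uses a tiny stand-in network (the class `ToyNet` is hypothetical and far smaller than VGG16) with the same `features` / `classifier` split. After a backward pass, the frozen parameters have no gradient at all, while the classifier parameters do.

```python
import torch
from torch import nn

torch.manual_seed(0)

class ToyNet(nn.Module):
    """Tiny stand-in with a `features` part and a `classifier` part."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
        self.classifier = nn.Linear(16, 5)

    def forward(self, x):
        return self.classifier(self.features(x))

toy = ToyNet()

# Freeze the feature extractor, exactly as done for vgg16.features
for param in toy.features.parameters():
    param.requires_grad = False

# One forward/backward pass on random inputs
loss = toy(torch.randn(4, 8)).sum()
loss.backward()

frozen_grads = [p.grad for p in toy.features.parameters()]
head_grads = [p.grad for p in toy.classifier.parameters()]
n_trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)

print(frozen_grads)   # gradients of frozen parameters are never populated
print(n_trainable)    # only the classifier's weights and biases are counted
```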

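### Final Classifier Layer
Once you have the pretrained feature extractor, you just need to modify and/or add to the final, fully connected classifier layers. In this case, we suggest that you replace the last layer in the vgg classifier group of layers.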



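> This layer should take as input the number of features produced by the portion of the network that you are not changing, and produce the appropriate number of outputs for the flower classification task.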



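You can access any layer in a pretrained network by name and (sometimes) number. For example, vgg16.classifier[6] is the layer at index 6 (the seventh and last layer, since indexing starts at 0) in the layer group named "classifier".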

### TODO: Replace the last fully connected layer with one that produces the appropriate number of class scores.

In [0]:
## TODO: add a last linear layer  that maps n_inputs -> 5 flower classes
## new layers automatically have requires_grad = True
n_inputs = vgg16.classifier[6].in_features
last_layer = torch.nn.Linear(n_inputs, len(classes))
vgg16.classifier[6] = last_layer

# after completing your model, if GPU is available, move the model to GPU
if train_on_gpu:
    vgg16.cuda()
    
print(vgg16.classifier)
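
The same replacement can be tried without loading VGG16. The sketch below builds a small stand-in `nn.Sequential` with the same seven-module layout as `vgg16.classifier` (Linear, ReLU, Dropout, Linear, ReLU, Dropout, Linear), but with shrunken layer sizes that are assumptions chosen only to keep the example light. Swapping the module at index 6 changes the number of output scores from 1000 to 5.

```python
import torch
from torch import nn

# Stand-in classifier: same module layout as vgg16.classifier, much smaller sizes
classifier = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(), nn.Dropout(),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(),
    nn.Linear(64, 1000),
)

# Read the input width of the last layer, then replace it with a 5-class layer
n_inputs = classifier[6].in_features
classifier[6] = nn.Linear(n_inputs, 5)

# A batch of 20 feature vectors now produces 20 rows of 5 class scores
scores = classifier(torch.randn(20, 32))
print(scores.shape)
```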

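### Specify the [Loss Function](https://pytorch.org/docs/stable/nn.html#loss-functions) and [Optimizer](https://pytorch.org/docs/stable/optim.html)
Below we use cross-entropy loss and stochastic gradient descent with a small learning rate. Note that the optimizer accepts as input only the trainable parameters, vgg16.classifier.parameters().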

In [0]:
import torch.optim as optim

# specify loss function (categorical cross-entropy)
criterion = torch.nn.CrossEntropyLoss()

# specify optimizer (stochastic gradient descent) and learning rate = 0.001
optimizer = optim.SGD(vgg16.classifier.parameters(), lr=0.001)
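
As a quick check of what this loss computes, the sketch below (on made-up `logits` and `targets`) shows that `CrossEntropyLoss` takes raw class scores and integer class indices, and that its value matches the mean negative log-softmax score at the target index. That is why the model's final layer needs no softmax of its own.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)            # raw scores for 4 images and 5 classes
targets = torch.tensor([0, 2, 4, 1])  # integer class indices, one per image

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Same quantity by hand: mean negative log-softmax at each target index
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(4), targets].mean()

print(loss.item(), manual_loss.item())
```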

### Training
Now we'll train the network.



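> Exercise: So far we've been providing the training code for you. Here, you get a bit more of a challenge: write the code to train the network yourself. If you need help, you can look at my solution.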



In [0]:
# number of epochs to train the model
n_epochs = 2

## TODO complete epoch and training batch loops
## These loops should update the classifier-weights of this model
## And track (and print out) the training loss over time
for epoch in range(1, n_epochs+1):

    # keep track of the running training loss
    train_loss = 0.0
    
    ###################
    # train the model #
    ###################
    # make sure the model is in training mode (this enables dropout in the classifier)
    vgg16.train()
    for batch_i, (data, target) in enumerate(train_loader):
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = vgg16(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss 
        train_loss += loss.item()
        
        if batch_i % 20 == 19:    # print training loss every specified number of mini-batches
            print('Epoch %d, Batch %d loss: %.16f' %
                  (epoch, batch_i + 1, train_loss / 20))
            train_loss = 0.0
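
One way to sanity-check a loop like this is to confirm that only the classifier weights move. The sketch below does so on a toy problem: `features` and `head` are small stand-ins for the frozen convolutional layers and the new classifier, and the data is random, so all names and sizes here are assumptions for illustration only.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

features = nn.Sequential(nn.Linear(8, 16), nn.ReLU())  # stand-in for the frozen layers
head = nn.Linear(16, 5)                                 # stand-in for the new classifier
for p in features.parameters():
    p.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(head.parameters(), lr=0.1)  # only the head is optimized

data = torch.randn(20, 8)
target = torch.randint(0, 5, (20,))

# Snapshot the weights so we can compare after training
features_before = features[0].weight.clone()
head_before = head.weight.clone()

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(head(features(data)), target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
print(torch.equal(features[0].weight, features_before))  # frozen weights unchanged?
print(torch.equal(head.weight, head_before))              # head weights unchanged?
```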




---

### Testing
Below you can see the test accuracy for each flower class.

In [0]:
# track test loss 
# over 5 flower classes
test_loss = 0.0
class_correct = list(0. for i in range(5))
class_total = list(0. for i in range(5))

vgg16.eval() # eval mode

# iterate over test data; gradients are not needed for evaluation
with torch.no_grad():
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = vgg16(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update test loss
        test_loss += loss.item()*data.size(0)
        # convert output scores to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label (moved to the CPU for numpy)
        correct = pred.eq(target.view_as(pred)).cpu().numpy()
        # calculate test accuracy for each object class
        # use the actual batch length: the last batch may be smaller than batch_size
        for i in range(target.size(0)):
            label = target[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1

# calculate avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(5):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no test examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
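
The per-class bookkeeping can also be done without an inner Python loop. The sketch below is one possible vectorized variant, not the notebook's own method. The helper `per_class_counts` is a hypothetical name, and the predictions and labels are made up. It works for a batch of any size, including a short final batch.

```python
import torch

def per_class_counts(preds, targets, num_classes):
    """Count correct predictions and total examples per class for one batch."""
    # Labels of the correctly predicted examples, histogrammed per class
    correct = torch.bincount(targets[preds == targets], minlength=num_classes)
    # All labels in the batch, histogrammed per class
    total = torch.bincount(targets, minlength=num_classes)
    return correct, total

# A made-up batch of 7 predictions over 5 classes
preds = torch.tensor([0, 1, 1, 4, 2, 2, 0])
targets = torch.tensor([0, 1, 2, 4, 2, 3, 0])

correct, total = per_class_counts(preds, targets, num_classes=5)
print(correct.tolist())
print(total.tolist())
```

Summing these counts across batches gives the same per-class totals that the loop above accumulates in `class_correct` and `class_total`.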

### Visualize Sample Test Results

In [0]:
# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# move model inputs to cuda, if GPU available; keep `images` on the CPU for plotting
inputs = images.cuda() if train_on_gpu else images

# get sample outputs
with torch.no_grad():
    output = vgg16(inputs)
# convert output scores to predicted class
_, preds_tensor = torch.max(output, 1)
preds = preds_tensor.cpu().numpy()

# plot the images in the batch, along with predicted and true labels
images = images.numpy() # convert images to numpy for display
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    plt.imshow(np.transpose(images[idx], (1, 2, 0)))
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx].item() else "red"))