# Hands-on Deep Learning Experiments

## Task Overview: Semantic Segmentation

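Semantic segmentation is one of the key problems in computer vision today. It assigns a class label to every pixel of an image, and at a high level it paves the way toward complete scene understanding. Scene understanding is a core computer vision problem, and its importance keeps growing as more and more applications depend on knowledge inferred from images, such as self-driving cars, human-computer interaction, and virtual reality. With the spread of deep learning in recent years, many semantic segmentation problems are now solved with deep architectures, most commonly convolutional neural networks, which outperform other approaches by a large margin in both accuracy and efficiency.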

![1](./Photos/seg1.jpeg)


## 神经网络

### Encoder-Decoder

![1](./Photos/ed.jpeg)
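
The figure shows the general encoder-decoder layout: an encoder that downsamples the image into compact feature maps, and a decoder that upsamples them back to the input resolution so that every pixel gets a prediction. Below is a minimal PyTorch sketch of that idea, assuming two pooling levels and nearest-neighbour upsampling (as in the notebook code further down). `TinyEncoderDecoder` and its layer widths are hypothetical choices for illustration, not the network in the figure.

```python
import torch
import torch.nn as nn


class TinyEncoderDecoder(nn.Module):
    """Hypothetical minimal encoder-decoder for per-pixel classification.

    The encoder halves the spatial size twice while widening the channels;
    the decoder upsamples back to the input size, and a 1x1 conv maps the
    features to one score per class for every pixel.
    """

    def __init__(self, in_channel=3, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channel, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # H/2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # H/4
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2),  # back to H/2
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2),  # back to H
            nn.Conv2d(32, num_classes, kernel_size=1),  # per-pixel class scores
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


net = TinyEncoderDecoder(in_channel=3, num_classes=2)
logits = net(torch.randn(1, 3, 64, 64))
label_map = logits.argmax(dim=1)  # (1, 64, 64) predicted class per pixel
```

The output keeps the input's height and width with one channel per class, and `argmax` over the channel dimension gives the label map. In this sketch the input height and width must be divisible by 4.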

### UNet

![2](./Photos/unet.jpeg)
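
UNet adds skip connections to the encoder-decoder layout: each decoder stage concatenates the upsampled features with the encoder features of the same resolution, so spatial detail lost by pooling can be recovered. The sketch below is a hypothetical three-level `TinyUNet`. It uses padded 3x3 convolutions and nearest-neighbour upsampling like the notebook's `DoubleConvLayers` and `Up` helpers, whereas the original UNet paper uses unpadded convolutions and 2x2 up-convolutions.

```python
import torch
import torch.nn as nn


def double_conv(in_channel, out_channel):
    # (3x3 conv -> BatchNorm -> ReLU) x 2; padding=1 keeps H and W unchanged
    return nn.Sequential(
        nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channel),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channel, out_channel, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channel),
        nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    """Hypothetical 3-level UNet with concatenation skip connections."""

    def __init__(self, in_channel=3, num_classes=2, base=32):
        super().__init__()
        self.enc1 = double_conv(in_channel, base)         # H
        self.enc2 = double_conv(base, base * 2)           # H/2
        self.bottom = double_conv(base * 2, base * 4)     # H/4
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2)
        # Decoder input channels = upsampled channels + skip channels
        self.dec2 = double_conv(base * 4 + base * 2, base * 2)
        self.dec1 = double_conv(base * 2 + base, base)
        self.head = nn.Conv2d(base, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottom(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up(b), e2], dim=1))   # skip from e2
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))  # skip from e1
        return self.head(d1)


unet = TinyUNet(in_channel=3, num_classes=2).eval()
with torch.no_grad():
    out = unet(torch.randn(1, 3, 64, 64))
```

Concatenation (rather than addition) is why each decoder block's input width is the sum of the two feature maps' channels.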

### UNet++

![3](./Photos/unetpp.jpeg)

```python
import torch
import numpy as np
import matplotlib.pyplot as plt

import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import Dataset, DataLoader, random_split
```

```python
class unetpp(nn.Module):
    """Simplified UNet++ in which every node has 64 channels.

    conv_ij is node X^{i,j}: i is the depth (each level halves the
    resolution) and j is the node's position along the nested skip pathway.
    Each decoder node concatenates all earlier nodes at its depth with the
    upsampled node from one level below. The 1x1 heads on X^{0,1}, X^{0,2}
    and X^{0,3} are averaged (deep supervision).
    """

    def __init__(self, in_channel, out_channel):
        super().__init__()
        # Encoder backbone: X^{0,0} ... X^{3,0}
        self.conv_00 = DoubleConvLayers(in_channel, 64)
        self.conv_10 = DoubleConvLayers(64, 64)
        self.conv_20 = DoubleConvLayers(64, 64)
        self.conv_30 = DoubleConvLayers(64, 64)
        # Nested decoder nodes: input channels = 64 x number of concatenated maps
        self.conv_01 = DoubleConvLayers(128, 64)
        self.conv_11 = DoubleConvLayers(128, 64)
        self.conv_21 = DoubleConvLayers(128, 64)
        self.conv_02 = DoubleConvLayers(192, 64)
        self.conv_12 = DoubleConvLayers(192, 64)
        self.conv_03 = DoubleConvLayers(256, 64)
        # 2x2 max pooling halves height and width
        self.down_00 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.down_10 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.down_20 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        # 2x upsampling (nn.Upsample defaults to nearest-neighbour)
        self.up_00 = nn.Upsample(scale_factor=2)
        self.up_10 = nn.Upsample(scale_factor=2)
        self.up_20 = nn.Upsample(scale_factor=2)
        self.up_01 = nn.Upsample(scale_factor=2)
        self.up_11 = nn.Upsample(scale_factor=2)
        self.up_02 = nn.Upsample(scale_factor=2)
        # 1x1 heads mapping 64 features to per-pixel class scores
        self.conv_l1 = nn.Conv2d(64, out_channel, kernel_size=1, padding=0)
        self.conv_l2 = nn.Conv2d(64, out_channel, kernel_size=1, padding=0)
        self.conv_l3 = nn.Conv2d(64, out_channel, kernel_size=1, padding=0)

    def forward(self, x):
        # Encoder: H -> H/2 -> H/4 -> H/8, so H and W must be divisible by 8
        x_00 = self.conv_00(x)
        down_00 = self.down_00(x_00)
        x_10 = self.conv_10(down_00)
        down_10 = self.down_10(x_10)
        x_20 = self.conv_20(down_10)
        down_20 = self.down_20(x_20)
        x_30 = self.conv_30(down_20)

        # Column j = 1: fuse each encoder node with the upsampled node below it
        up_00 = self.up_00(x_10)
        up_10 = self.up_10(x_20)
        up_20 = self.up_20(x_30)

        x_01 = self.conv_01(torch.cat([x_00, up_00], dim=1))
        x_11 = self.conv_11(torch.cat([x_10, up_10], dim=1))
        x_21 = self.conv_21(torch.cat([x_20, up_20], dim=1))

        # Column j = 2
        up_01 = self.up_01(x_11)
        up_11 = self.up_11(x_21)

        x_02 = self.conv_02(torch.cat([x_00, x_01, up_01], dim=1))
        x_12 = self.conv_12(torch.cat([x_10, x_11, up_11], dim=1))

        # Column j = 3
        up_02 = self.up_02(x_12)

        x_03 = self.conv_03(torch.cat([x_00, x_01, x_02, up_02], dim=1))

        # Deep supervision: average the logits of the three heads (no softmax;
        # CrossEntropyLoss applies log-softmax itself)
        l1 = self.conv_l1(x_01)
        l2 = self.conv_l2(x_02)
        l3 = self.conv_l3(x_03)

        return (l1 + l2 + l3) / 3


class DoubleConvLayers(nn.Module):
    # (3x3 conv -> BatchNorm -> ReLU) x 2; padding=1 keeps H and W unchanged
    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.ConvLayers = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channel, out_channel, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.ConvLayers(x)


# Down and Up are plain-UNet building blocks; unetpp above does not use them.
class Down(nn.Module):
    # Max pooling (halves H and W) followed by a double convolution
    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.MaxPoolConv = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            DoubleConvLayers(in_channel, out_channel)
        )

    def forward(self, x):
        return self.MaxPoolConv(x)


class Up(nn.Module):
    # Upsample x1 by 2, concatenate it with the skip feature x2, then double-convolve
    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.UpSample = nn.Upsample(scale_factor=2)
        self.ConvLayers = DoubleConvLayers(in_channel, out_channel)

    def forward(self, x1, x2):
        x1 = self.UpSample(x1)
        return self.ConvLayers(torch.cat([x1, x2], dim=1))
```

```python
import os

# Note on VOC masks: they are palette images whose pixel values are class
# indices (0 = background, 1-20 = object classes) plus 255 on object borders
# and "difficult" regions. They must be resized with nearest-neighbour
# interpolation and must not go through ToTensor, which divides by 255:
# after .long() every class index would become 0 and only the 255 border
# pixels would survive as 1, which makes the masks look like edge outlines.
class train():

    def __init__(self):

        self.batch_size = 1  # number of samples per training step
        self.num_workers = 1  # worker processes used for data loading
        self.shuffle = True  # shuffle the training set every epoch
        self.num_classes = 21  # VOC: background + 20 object classes

        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # train on GPU if available, else CPU
        self.net = unetpp(in_channel=3, out_channel=self.num_classes)
        self.net.to(self.device)

        # Images: bilinear resize, then a float tensor in [0, 1]
        self.tf = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor()
        ])
        # Masks: nearest resize keeps class indices intact; PILToTensor keeps the raw uint8 values
        self.target_tf = transforms.Compose([
            transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.NEAREST),
            transforms.PILToTensor()
        ])

        # VOC 2007
        train_dataset = datasets.VOCSegmentation(root='./Dataset', year='2007', image_set='train', download=True, transform=self.tf, target_transform=self.target_tf)
        valid_dataset = datasets.VOCSegmentation(root='./Dataset', year='2007', image_set='val', download=True, transform=self.tf, target_transform=self.target_tf)

        self.train_dataset_loader = DataLoader(train_dataset, shuffle=self.shuffle, batch_size=self.batch_size, num_workers=self.num_workers)
        self.valid_dataset_loader = DataLoader(valid_dataset, shuffle=False, batch_size=1, num_workers=1)

        # Create the optimizer once so Adam's running statistics persist across epochs
        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=0.001)
        # 255 marks border/difficult pixels in VOC; exclude them from the loss
        self.loss_func = nn.CrossEntropyLoss(ignore_index=255)

    def save(self):
        os.makedirs('./Model', exist_ok=True)
        torch.save(self.net.state_dict(), './Model/clf_network.pth')

    def load(self):
        self.net.load_state_dict(torch.load('./Model/clf_network.pth', map_location='cpu'))
        self.net.to(self.device)

    def train_net(self):

        self.net.train()
        epoch_n, epoch_loss = 0, 0.0

        for img, true_target in self.train_dataset_loader:
            img = img.to(self.device).float()
            # (N, 1, H, W) uint8 -> (N, H, W) int64, the target shape CrossEntropyLoss expects
            true_target = true_target.to(self.device).squeeze(1).long()
            fake_target = self.net(img)  # (N, num_classes, H, W) logits
            loss = self.loss_func(fake_target, true_target)

            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

            # loss is the batch mean, so weight it by the batch size
            epoch_n += int(img.shape[0])
            epoch_loss += loss.item() * int(img.shape[0])

        epoch_loss = epoch_loss / epoch_n

        print("**********")
        print("task in train")
        print("epoch_n = " + str(epoch_n))
        print("epoch_loss = " + str(epoch_loss))
        print("**********")

    def valid_net(self):

        self.net.eval()
        with torch.no_grad():
            for img, true_target in self.valid_dataset_loader:
                img = img.to(self.device).float()
                fake_target = self.net(img)

                true_target = true_target.cpu().numpy()[0, 0]  # (H, W) class indices
                fake_target = fake_target.cpu().numpy()[0].argmax(axis=0)  # (H, W) predicted classes
                true_target = np.where(true_target == 255, 0, true_target)  # hide ignored border pixels

                # Ground truth (left) and prediction (right) on the same colour scale
                plt.figure()
                plt.subplot(1, 2, 1)
                plt.imshow(true_target, cmap='gray', vmin=0, vmax=self.num_classes - 1)
                plt.subplot(1, 2, 2)
                plt.imshow(fake_target, cmap='gray', vmin=0, vmax=self.num_classes - 1)
                plt.show()
                input('Continue')
```
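
`valid_net` only displays predictions. Segmentation quality is usually summarised with per-class intersection over union (IoU) and its mean over classes (mIoU). A minimal NumPy sketch, assuming label maps that hold class indices in `0..num_classes-1` and VOC's convention of 255 for pixels to ignore; `confusion_matrix` and `mean_iou` are hypothetical helper names.

```python
import numpy as np


def confusion_matrix(true_mask, pred_mask, num_classes, ignore_index=255):
    """num_classes x num_classes matrix: rows = ground truth, columns = prediction.

    Pixels whose ground truth equals ignore_index are skipped.
    """
    valid = true_mask != ignore_index
    t = true_mask[valid].astype(np.int64)
    p = pred_mask[valid].astype(np.int64)
    # Encode each (true, pred) pair as one integer and count occurrences
    idx = t * num_classes + p
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """IoU_c = TP / (TP + FP + FN); classes absent from both masks are skipped."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as c, but truly another class
    fn = conf.sum(axis=1) - tp  # truly c, but predicted as another class
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou)), iou


true_mask = np.array([[0, 0, 1], [1, 1, 255]])
pred_mask = np.array([[0, 1, 1], [1, 0, 0]])
cm = confusion_matrix(true_mask, pred_mask, num_classes=2)
miou, per_class = mean_iou(cm)
```

Rather than averaging per-image scores, one could sum the per-image confusion matrices over the whole validation loader and call `mean_iou` once at the end.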

### Case Study: Cardiac Ultrasound Image Segmentation

![1](./Photos/ultrasound.jpg)

## Task Overview: Object Detection

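Object detection is the task of finding all objects of interest in an image and determining their classes and locations; it is one of the core problems in computer vision. Because objects of different kinds vary in appearance, shape, and pose, and images are further affected by lighting, occlusion, and other factors during capture, object detection has long been one of the most challenging problems in the field.

![1](./Photos/detach1.jpeg)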
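
A detection result is a class label plus a bounding box, and how well a predicted box locates an object is commonly measured by its intersection over union (IoU) with the ground-truth box; PASCAL VOC, for example, counts a detection as correct when IoU is at least 0.5. A minimal sketch, assuming boxes given as corner coordinates `(x1, y1, x2, y2)` with `x2 > x1` and `y2 > y1`; `box_iou` is a hypothetical helper name.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = both areas minus the doubly counted intersection
    return inter / (area_a + area_b - inter)


overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1 / 7
```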

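### Core Problems
1. Classification: which category does the image (or a region of it) belong to?
2. Localization: a target can appear anywhere in the image.
3. Size: targets come in many different sizes.
4. Shape: targets can have many different shapes.

![2](./Photos/detach2.jpeg)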

### R-CNN

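Inspired by AlexNet, the authors of R-CNN set out to transfer CNN image classification to object detection on PASCAL VOC. R-CNN works on region proposals, which is where the name comes from: Regions with CNN features. Because labelled detection data is scarce, R-CNN pre-trains AlexNet on ImageNet and then fine-tunes it on the detection data.

The R-CNN detection pipeline:
- Given an image, extract about 2,000 independent candidate regions (region proposals); the original paper uses selective search for this.
- Warp each region proposal to the input size of the pre-trained AlexNet and extract a fixed-length, 4096-dimensional feature vector.
- Train one SVM classifier per object class to decide whether the region contains an object of that class.
- Train a regressor to refine the object's location within the region proposal: for each class, a linear regression model predicts offsets that move the proposal box closer to the ground-truth box.

![1](./Photos/rcnn.jpeg)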
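
The box regressor in the last step does not output a box directly. In the R-CNN paper it predicts four offsets relative to the proposal P, with boxes written as (center x, center y, width, height): t_x = (G_x - P_x) / P_w, t_y = (G_y - P_y) / P_h, t_w = log(G_w / P_w), t_h = log(G_h / P_h), where G is the ground-truth box. A minimal sketch of encoding these targets and applying predicted offsets; `encode` and `decode` are hypothetical helper names, and the regression model itself is omitted.

```python
import math


def encode(proposal, gt):
    """Regression targets (t_x, t_y, t_w, t_h) that move proposal P onto box G.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal, t):
    """Apply offsets t (e.g. the regressor's prediction) to proposal P."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


P = (50.0, 50.0, 20.0, 40.0)
G = (55.0, 40.0, 30.0, 40.0)
targets = encode(P, G)      # (0.25, -0.25, log(1.5), 0.0)
refined = decode(P, targets)  # recovers G (up to float rounding)
```

Normalising the shifts by the proposal size and using log-ratios for the scale makes the targets independent of the proposal's absolute size.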

### YOLO

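YOLO divides the input image into an S × S grid of cells, and each cell is responsible for detecting the objects whose center falls inside it. Each predicted box is described by (x, y, w, h, c), where c is the confidence score.

![1](./Photos/yolo1.jpeg)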
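
The sketch below shows how one ground-truth box could be turned into a YOLO training target, assuming the YOLOv1 conventions (S = 7 on a 448 × 448 input): the responsible cell is the one containing the box center, (x, y) is the center's offset inside that cell, and (w, h) are normalised by the image width and height. The confidence c is not computed here; in the YOLOv1 paper its target is Pr(Object) × IoU between the predicted and ground-truth boxes. `yolo_target` is a hypothetical helper name.

```python
def yolo_target(box, img_w, img_h, S=7):
    """Return (row, col, (x, y, w, h)) for one ground-truth box.

    box is (x1, y1, x2, y2) in pixels. x and y lie in [0, 1) relative to the
    responsible cell; w and h are fractions of the image size.
    """
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w  # box center, normalised to [0, 1]
    cy = (y1 + y2) / 2 / img_h
    # Clamp so a center exactly on the right/bottom edge stays in the grid
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    x = cx * S - col  # offset of the center within its cell
    y = cy * S - row
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return row, col, (x, y, w, h)


row, col, (x, y, w, h) = yolo_target((100, 200, 200, 300), img_w=448, img_h=448)
```

With 64-pixel cells, the center (150, 250) falls in row 3, column 2 of the 7 × 7 grid.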

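YOLO uses a convolutional network to extract features and fully connected layers to produce the predictions. The architecture is inspired by GoogLeNet and has 24 convolutional layers followed by 2 fully connected layers. Instead of GoogLeNet's inception modules, it mostly uses 1x1 convolutions to reduce the number of channels, each followed by a 3x3 convolution. All convolutional and fully connected layers use the Leaky ReLU activation, except the last layer, which uses a linear activation.

![2](./Photos/yolo2.jpeg)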
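
The layer pattern described above can be sketched in PyTorch at toy scale: a 1x1 reduction convolution followed by a 3x3 convolution, Leaky ReLU with slope 0.1 (the value used in the YOLOv1 paper) after every layer except the last, and two fully connected layers whose output is reshaped to S × S × (B·5 + C). With S = 7, B = 2 boxes per cell and C = 20 classes (PASCAL VOC) that is 7 × 7 × 30. `TinyYoloHead` is hypothetical and far smaller than the real 24-layer network, and the adaptive pooling to S × S is a simplification added here so that any input size works.

```python
import torch
import torch.nn as nn


def reduce_then_conv(in_channel, mid_channel, out_channel):
    # 1x1 conv compresses channels, then a 3x3 conv; Leaky ReLU (slope 0.1) after each
    return nn.Sequential(
        nn.Conv2d(in_channel, mid_channel, kernel_size=1),
        nn.LeakyReLU(0.1),
        nn.Conv2d(mid_channel, out_channel, kernel_size=3, padding=1),
        nn.LeakyReLU(0.1),
    )


class TinyYoloHead(nn.Module):
    """Hypothetical, heavily shrunk YOLOv1-style network."""

    def __init__(self, S=7, B=2, C=20):
        super().__init__()
        self.S, self.B, self.C = S, B, C
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),
            reduce_then_conv(32, 16, 64),
            nn.AdaptiveAvgPool2d((S, S)),  # simplification: force an S x S feature map
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * S * S, 256),
            nn.LeakyReLU(0.1),
            nn.Linear(256, S * S * (B * 5 + C)),  # last layer: linear activation
        )

    def forward(self, x):
        out = self.fc(self.features(x))
        # One (B*5 + C)-vector per grid cell: B boxes of (x, y, w, h, c) plus C class scores
        return out.view(-1, self.S, self.S, self.B * 5 + self.C)


pred = TinyYoloHead()(torch.randn(2, 3, 64, 64))
```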