### Tensors
* Tensor creation

<img src="pic/pic1.png" width = "50%" />

* Tensor operations
* Tensor broadcasting
* Indexing and slicing
  * Index `0` refers to the first element and index `-1` to the last element
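
A minimal sketch of the indexing rule above, using a small 3x4 tensor chosen only for illustration. The variable names are assumptions and do not come from the notes.

```python
import torch

x = torch.arange(12).view(3, 4)   # rows: [0..3], [4..7], [8..11]

first_row = x[0]        # index 0 selects the first row
last_row = x[-1]        # index -1 selects the last row
last_col = x[:, -1]     # every row, last column
block = x[0:2, 1:3]     # rows 0-1, columns 1-2 (the end index is exclusive)

print(first_row, last_row, last_col, block, sep="\n")
```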

In [13]:
import torch
x = torch.rand(4, 3)
print("tensor1:\n", x)
x = torch.randn(4, 3)
print("tensor2:\n", x)
x = torch.zeros(4, 3, dtype=torch.long)
print("tensor3:\n", x)
x = torch.ones(4, 3, dtype=torch.float32)
print("tensor4:\n", x)
x = torch.tensor([1, 2, 3])
print("tensor5:\n", x)
x = torch.arange(0, 10, 2)
print("tensor6:\n", x)
x = torch.linspace(0, 10, 6)
print("tensor7:\n", x, x.shape, x.size())

tensor1:
 tensor([[0.4464, 0.3749, 0.5388],
        [0.9854, 0.3688, 0.3479],
        [0.9173, 0.3776, 0.4577],
        [0.8208, 0.8751, 0.3079]])
tensor2:
 tensor([[-0.1097,  1.3128, -0.3790],
        [-0.1051,  0.9571,  0.9689],
        [ 0.7867, -0.6162, -0.6264],
        [-0.6189, -0.8200, -0.9993]])
tensor3:
 tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
tensor4:
 tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor5:
 tensor([1, 2, 3])
tensor6:
 tensor([0, 2, 4, 6, 8])
tensor7:
 tensor([ 0.,  2.,  4.,  6.,  8., 10.]) torch.Size([6]) torch.Size([6])


In [4]:
import torch
x = torch.rand(2, 3)
y = torch.rand(2, 3)

print("x, y:\n", x, "\n", y)

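# add_() works in place and overwrites y, so every later line sees the updated y
print("add operation:\n", x+y, "\n", torch.add(x, y), "\n", y.add_(x))

# mul_() also works in place and overwrites y again
print("Hadamard product:\n", x*y, "\n", torch.mul(x, y), "\n", y.mul_(x))

# Matrix multiplication: @, torch.mm and torch.matmul agree for 2-D tensors
print("dot product:\n", x@y.t(), "\n", torch.mm(x, y.t()), "\n", torch.matmul(x, y.t()))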

print("norm:\n", x.norm(2), "\n", torch.norm(x, 2))

print("index operation:\n", x[:, 1], "\n", x[0, :])

# Tensor.view() returns a new tensor that shares memory with the source tensor, so changing one also changes the other
print("dimension operation:\n", x.view(3, 2), "\n", x.view(6, 1), "\n", x.view(1, 6), "\n", x.view(6))

x, y:
 tensor([[0.1218, 0.9291, 0.9322],
        [0.8839, 0.8989, 0.9763]]) 
 tensor([[0.9876, 0.6483, 0.6951],
        [0.6778, 0.6812, 0.9287]])
add operation:
 tensor([[1.1094, 1.5774, 1.6273],
        [1.5617, 1.5800, 1.9050]]) 
 tensor([[1.1094, 1.5774, 1.6273],
        [1.5617, 1.5800, 1.9050]]) 
 tensor([[1.1094, 1.5774, 1.6273],
        [1.5617, 1.5800, 1.9050]])
Hadamard product:
 tensor([[0.1351, 1.4657, 1.5170],
        [1.3804, 1.4202, 1.8598]]) 
 tensor([[0.1351, 1.4657, 1.5170],
        [1.3804, 1.4202, 1.8598]]) 
 tensor([[0.1351, 1.4657, 1.5170],
        [1.3804, 1.4202, 1.8598]])
dot product:
 tensor([[2.7925, 3.2215],
        [2.9179, 4.3124]]) 
 tensor([[2.7925, 3.2215],
        [2.9179, 4.3124]]) 
 tensor([[2.7925, 3.2215],
        [2.9179, 4.3124]])
norm:
 tensor(2.0711) 
 tensor(2.0711)
index operation:
 tensor([0.9291, 0.8989]) 
 tensor([0.1218, 0.9291, 0.9322])
dimension operation:
 tensor([[0.1218, 0.9291],
        [0.9322, 0.8839],
        [0.8989, 0.9763]]) 
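
The memory sharing of `view` can be checked directly. The sketch below is only an illustration (the tensor and the variable names are assumptions); it also shows `clone()` as one way to get an independent copy before reshaping.

```python
import torch

x = torch.zeros(2, 3)
v = x.view(6)            # same storage, different shape
v[0] = 1.0               # also changes x[0, 0]

c = x.clone().view(6)    # clone() copies the data first
c[1] = 5.0               # x is not affected by this write

shares_memory = v.data_ptr() == x.data_ptr()
print(x, shares_memory)
```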


In [19]:
import torch
x = torch.arange(1, 3).view(1, 2)
print("x:\n", x)
y = torch.arange(1, 4).view(3, 1)
print("y:\n", y)
print("x+y:\n", x+y)

x:
 tensor([[1, 2]])
y:
 tensor([[1],
        [2],
        [3]])
x+y:
 tensor([[2, 3],
        [3, 4],
        [4, 5]])
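
Broadcasting in the cell above can be read as "expand both operands to a common shape, then add elementwise". The following sketch makes that explicit under the same shapes; `torch.broadcast_shapes` is available in reasonably recent PyTorch releases.

```python
import torch

x = torch.arange(1, 3).view(1, 2)   # shape (1, 2)
y = torch.arange(1, 4).view(3, 1)   # shape (3, 1)

# The common shape both operands are expanded to
out_shape = torch.broadcast_shapes(x.shape, y.shape)

explicit = x.expand(3, 2) + y.expand(3, 2)   # expansion written out by hand
implicit = x + y                             # expansion done by broadcasting

print(out_shape, torch.equal(explicit, implicit))
```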


### Saving and loading tensors and models

In [3]:
import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
y = torch.zeros(4)
torch.save([x, y], "tensor.pt")
x2, y2 = torch.load("tensor.pt")
print("x2, y2:\n", x2, y2)

x2, y2:
 tensor([0, 1, 2, 3]) tensor([0., 0., 0., 0.])


In [4]:
my_dict = {"x": x, "y": y}
torch.save(my_dict, "tensor_dict.pt")
my_dict_ = torch.load("tensor_dict.pt")
print("my_dict_:\n", my_dict_)

my_dict_:
 {'x': tensor([0, 1, 2, 3]), 'y': tensor([0., 0., 0., 0.])}


In [6]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.act = nn.ReLU()
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.output(self.act(self.hidden(x)))
    
net = MLP()
X = torch.randn(2, 20)
Y = net(X)
print("Y:\n", Y)

torch.save(net.state_dict(), "mlp.params")
clone = MLP()
clone.load_state_dict(torch.load("mlp.params"))
print("clone:\n", clone.eval()(X) == Y)

Y:
 tensor([[-0.3254, -0.2811, -0.2944, -0.0904,  0.0741,  0.0741,  0.2947, -0.0412,
         -0.4655, -0.4603],
        [ 0.1803,  0.2825,  0.2915, -0.3536,  0.2567,  0.0936,  0.0678,  0.1204,
         -0.0874, -0.0599]], grad_fn=<AddmmBackward0>)
clone:
 tensor([[True, True, True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True]])


### 自动求导(autograd)
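`torch.Tensor` is the core of autograd. If a tensor's `.requires_grad` attribute is `True` (set it with `requires_grad=True` at creation or by calling `.requires_grad_(True)`), autograd tracks every operation on that tensor. After the computation, calling `.backward()` computes all gradients automatically, and the gradient for the tensor is accumulated into its `.grad` attribute. Because gradients accumulate, they must be zeroed before each new backward pass.

1. Example: differentiate $y=2x^Tx$ with respect to the column vector $x$ (the expected gradient is $4x$)
2. Detaching part of the computation
3. `backward` can only be called without arguments on a scalar, so a non-scalar output must first be reduced to a scalar. Summing is a common choice: it gives every element a weight of 1 and therefore leaves the per-element gradients unchanged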
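
Why the gradients need to be zeroed can be seen in a short sketch: calling `backward()` twice without resetting adds the second gradient to the first. The variable names here are illustrative assumptions.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

y = 2 * torch.dot(x, x)
y.backward()
first = x.grad.clone()            # equals 4 * x

y = 2 * torch.dot(x, x)           # rebuild the graph, then backpropagate again
y.backward()
accumulated = x.grad.clone()      # the second gradient was added to the first

x.grad.zero_()                    # reset before the next backward pass
x.sum().backward()
fresh = x.grad.clone()            # gradient of sum(x) is all ones

print(first, accumulated, fresh, sep="\n")
```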

In [10]:
import torch
x = torch.arange(4.0)
print("x:\n", x)
x.requires_grad_(True)
print("x.grad:\n", x.grad)
y = 2*torch.dot(x, x)
print("y:\n", y)
y.backward()
print("x.grad:\n", x.grad)
print("test grad:\n", x.grad == 4*x)

x:
 tensor([0., 1., 2., 3.])
x.grad:
 None
y:
 tensor(28., grad_fn=<MulBackward0>)
x.grad:
 tensor([ 0.,  4.,  8., 12.])
test grad:
 tensor([True, True, True, True])


In [36]:
import torch
x = torch.arange(4.0, requires_grad=True)
print("x:\n", x)
y = x * x
u = y.detach() # u is treated as a constant, so no gradient flows back through it
z = u * x
z.sum().backward()
print("test:\n", x.grad == u)

x.grad.zero_()
y.sum().backward()
# Equivalent: y.backward(torch.ones(len(x)))
print("x.grad:\n", x.grad)
print("test:\n", x.grad == 2 * x)

x:
 tensor([0., 1., 2., 3.], requires_grad=True)
test:
 tensor([True, True, True, True])
x.grad:
 tensor([0., 2., 4., 6.])
test:
 tensor([True, True, True, True])


### Basic deep learning setup
* Common packages
* Common hyperparameters
    * batch size
    * learning rate
    * max epochs
* GPU configuration
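
As a sketch of the setup listed above, hyperparameters can be collected in one place and the device chosen at run time so the same script also works without a GPU. The concrete values below are illustrative assumptions, not recommendations from these notes.

```python
import torch

# Illustrative hyperparameter values (assumptions)
batch_size = 16
lr = 1e-4
max_epochs = 100

# Use the first GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.zeros(2, 3).to(device)   # tensors and models are moved with .to(device)
print(device, x.device)
```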

In [None]:
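import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # expose only GPUs 0 and 1; set this before CUDA is initialized
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torch.optim as optim  # aliased as optim so the name optimizer stays free for optimizer instances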

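### Loading data
Data loading is mainly done with `Dataset` + `DataLoader`: the `Dataset` defines the data format and the transforms applied to it, and the `DataLoader` iterates over it to read the data batch by batch.
A custom `Dataset` mainly implements three methods:
* `__init__`: receives external arguments and defines the sample set
* `__getitem__`: reads one sample at a time, optionally applies transforms, and returns the data needed for training/validation
* `__len__`: returns the number of samples in the dataset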
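
The three methods can be tried without any files on disk. The sketch below uses a hypothetical in-memory dataset (`SquaresDataset` is an illustrative name, not part of PyTorch) and lets a `DataLoader` batch it.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        # Define the sample set from an external argument
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        # Number of samples
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, target) pair
        return self.x[idx], self.x[idx] ** 2

dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# 10 samples with batch_size=4 give batches of 4, 4 and 2
batch_sizes = [len(xb) for xb, yb in loader]
print(batch_sizes)
```

Passing `drop_last=True` to the `DataLoader` would discard the final, smaller batch.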

In [None]:
import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

# Custom Dataset
class MyDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        """
        Args:
            annotations_file(string): Path to the csv file with annotations.
            img_dir(string): Directory with all the images.
            transform(callable, optional): Optional transform to be applied on a sample.
            target_transform(callable, optional): Optional transform to be applied on the target.
        """
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        """
        Args:
            idx(int): Index
        """
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
    

import torch
from torchvision import datasets
from torch.utils.data import DataLoader

data_transform = None  # placeholder: supply a torchvision transform pipeline here
train_path = ""        # placeholder: root folder of the training images
val_path = ""          # placeholder: root folder of the validation images
# ImageFolder takes the dataset location through its `root` argument
train_data = datasets.ImageFolder(root=train_path, transform=data_transform)
val_data = datasets.ImageFolder(root=val_path, transform=data_transform)
# Load the data with DataLoader
batch_size = 32
train_loader = DataLoader(train_data, batch_size=batch_size, num_workers=4, shuffle=True, drop_last=True)
val_loader = DataLoader(val_data, batch_size=batch_size, num_workers=4, shuffle=False)

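### Building models
* In PyTorch, a neural network is usually a model built on the `torch.nn.Module` class. Only the `forward` function needs to be defined; the `backward` function, which computes the derivatives, is generated automatically by `autograd`
* Common layers in neural networks
    * Layers without model parameters
    * Layers with model parameters
    * 2-D convolution layers
    * Pooling layers
* A typical training procedure for a neural network
    1. Define a neural network with learnable parameters (also called weights)
    2. Iterate over a dataset of inputs
    3. Process the input through the network
    4. Compute the loss (how far the output is from the correct answer)
    5. Propagate the gradients back to the network's parameters
    6. Update the network's weights, typically with a simple rule: `weight = weight - learning_rate * gradient`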
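
The six steps above can be sketched on a toy regression problem. The data, model size, learning rate and step count below are assumptions chosen only so the example runs quickly; the update rule is applied by hand instead of through an optimizer.

```python
import torch

torch.manual_seed(0)

# Toy data following y = 3x + 1 (an assumption for illustration)
X = torch.randn(100, 1)
y = 3 * X + 1

model = torch.nn.Linear(1, 1)          # step 1: a network with learnable parameters
lr = 0.1
losses = []

for step in range(200):                # step 2: iterate over the data
    pred = model(X)                    # step 3: forward pass
    loss = ((pred - y) ** 2).mean()    # step 4: compute the loss
    model.zero_grad()
    loss.backward()                    # step 5: backpropagate the gradients
    with torch.no_grad():              # step 6: weight = weight - lr * gradient
        for p in model.parameters():
            p -= lr * p.grad
    losses.append(loss.item())

print(model.weight.item(), model.bias.item(), losses[-1])
```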

In [23]:
import torch
from torch import nn

class MLP(nn.Module):
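  # Declare the layers that hold model parameters; here, two fully connected layers
  def __init__(self, **kwargs):
    # Call the constructor of the parent class nn.Module for the necessary initialization; any extra keyword arguments are passed on to it
    super(MLP, self).__init__(**kwargs)
    self.hidden = nn.Linear(784, 256)
    self.activate = nn.ReLU()
    self.output = nn.Linear(256,10)
    
  # Define the forward computation: how to compute the model output from the input x
  # No backward function needs to be written; autograd generates it automatically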
  def forward(self, x):
    out = self.activate(self.hidden(x))
    return self.output(out)  

X = torch.rand(2, 784)
net = MLP()
print(net)
print(net(X))

MLP(
  (hidden): Linear(in_features=784, out_features=256, bias=True)
  (activate): ReLU()
  (output): Linear(in_features=256, out_features=10, bias=True)
)
tensor([[-0.2646,  0.1430, -0.0876, -0.2220,  0.0971,  0.1070, -0.0418,  0.1346,
          0.0478,  0.2612],
        [-0.3111,  0.1876, -0.1022, -0.1682,  0.0499, -0.0197, -0.0717,  0.1531,
          0.0171,  0.2583]], grad_fn=<AddmmBackward0>)


In [25]:
# A custom layer without model parameters
import torch
from torch import nn   
class MyLayer(nn.Module):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)
    def forward(self, x):
        return x - x.mean()
    
layer = MyLayer()
print(layer(torch.tensor([1, 2, 3, 4, 5], dtype=torch.float)))

tensor([-2., -1.,  0.,  1.,  2.])


In [31]:
# Custom layers with model parameters
import torch
from torch import nn

class MyListDense(nn.Module):
    def __init__(self):
        super(MyListDense, self).__init__()
        # Parameter is a subclass of Tensor and is added to the model's parameter list automatically
        self.params = nn.ParameterList([nn.Parameter(torch.randn(4, 4)) for i in range(3)])
        self.params.append(nn.Parameter(torch.randn(4, 1)))

    def forward(self, x):
        for i in range(len(self.params)):
            x = torch.mm(x, self.params[i])
        return x
net = MyListDense()
print(net)

class MyDictDense(nn.Module):
    def __init__(self):
        super(MyDictDense, self).__init__()
        self.params = nn.ParameterDict({
                'linear1': nn.Parameter(torch.randn(4, 4)),
                'linear2': nn.Parameter(torch.randn(4, 1))
        })
        self.params.update({'linear3': nn.Parameter(torch.randn(4, 2))}) # add a new entry

    def forward(self, x, choice='linear1'):
        return torch.mm(x, self.params[choice])

net = MyDictDense()
print(net)

MyListDense(
  (params): ParameterList(
      (0): Parameter containing: [torch.float32 of size 4x4]
      (1): Parameter containing: [torch.float32 of size 4x4]
      (2): Parameter containing: [torch.float32 of size 4x4]
      (3): Parameter containing: [torch.float32 of size 4x1]
  )
)
MyDictDense(
  (params): ParameterDict(
      (linear1): Parameter containing: [torch.FloatTensor of size 4x4]
      (linear2): Parameter containing: [torch.FloatTensor of size 4x1]
      (linear3): Parameter containing: [torch.FloatTensor of size 4x2]
  )
)


In [None]:
# 2-D convolution layer: cross-correlates the input with the kernel and adds a scalar bias to produce the output
import torch
from torch import nn

# Convolution operation (2-D cross-correlation)
def corr2d(X, K): 
    h, w = K.shape
    X, K = X.float(), K.float()
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

# 2-D convolution layer; its model parameters are the kernel and a scalar bias
class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias
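
As a sanity check, the hand-written cross-correlation can be compared with PyTorch's built-in `torch.nn.functional.conv2d`, which also computes a cross-correlation. The sketch repeats `corr2d` so it runs on its own; the input and kernel values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def corr2d(X, K):
    """2-D cross-correlation of a single-channel input X with kernel K."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

X = torch.arange(9.0).view(3, 3)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

manual = corr2d(X, K)
# conv2d expects (batch, channels, height, width) inputs and (out, in, kh, kw) kernels
builtin = F.conv2d(X.view(1, 1, 3, 3), K.view(1, 1, 2, 2))[0, 0]

print(manual)
print(torch.allclose(manual, builtin))
```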

In [32]:
# Pooling layer: computes the output from the elements in a fixed-shape window of the input, one window at a time
import torch
from torch import nn

# Implementation of the forward computation
def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = X[i: i + p_h, j: j + p_w].max()
            elif mode == 'avg':
                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
    return Y

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=torch.float)
pool2d(X, (2, 2))

tensor([[4., 5.],
        [7., 8.]])
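
The same result should come out of the built-in pooling functions, as long as the stride is set to 1: the hand-written `pool2d` slides the window one element at a time, while the built-in functions default to a stride equal to the kernel size. A small sketch under that assumption:

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=torch.float)
batched = X.view(1, 1, 3, 3)   # built-in pooling expects (batch, channels, height, width)

max_out = F.max_pool2d(batched, kernel_size=2, stride=1)[0, 0]
avg_out = F.avg_pool2d(batched, kernel_size=2, stride=1)[0, 0]

print(max_out)
print(avg_out)
```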

### Model initialization
`torch.nn.init` provides the following initialization functions:
1. `torch.nn.init.uniform_(tensor, a=0.0, b=1.0)`
2. `torch.nn.init.normal_(tensor, mean=0.0, std=1.0)`
3. `torch.nn.init.constant_(tensor, val)`
4. `torch.nn.init.ones_(tensor)`
5. `torch.nn.init.zeros_(tensor)`
6. `torch.nn.init.eye_(tensor)`
7. `torch.nn.init.dirac_(tensor, groups=1)`
8. `torch.nn.init.xavier_uniform_(tensor, gain=1.0)`
9. `torch.nn.init.xavier_normal_(tensor, gain=1.0)`
10. `torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')`
11. `torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')`
12. `torch.nn.init.orthogonal_(tensor, gain=1)`
13. `torch.nn.init.sparse_(tensor, sparsity, std=0.01)`

All of the functions above modify the input tensor in place.

14. `torch.nn.init.calculate_gain(nonlinearity, param=None)`
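
`calculate_gain` does not initialize anything itself; it returns a recommended scaling factor for a given nonlinearity, which can then be passed to an initializer. A minimal sketch, assuming a `Linear(64, 32)` layer followed by ReLU (the layer size is an illustrative choice):

```python
import math
import torch
from torch import nn

layer = nn.Linear(64, 32)

gain = nn.init.calculate_gain('relu')            # recommended gain for ReLU
nn.init.xavier_uniform_(layer.weight, gain=gain)
nn.init.zeros_(layer.bias)

# Xavier uniform samples from U(-bound, bound) with
# bound = gain * sqrt(6 / (fan_in + fan_out))
bound = gain * math.sqrt(6.0 / (64 + 32))
print(gain, bound, layer.weight.abs().max().item())
```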

<img src="pic/pic2.png" width = "50%" />

In [39]:
import torch
from torch import nn

conv = nn.Conv2d(1, 3, 3)
print("test type:\n", isinstance(conv, nn.Conv2d))
print("weight:\n", conv.weight.data)
torch.nn.init.kaiming_normal_(conv.weight.data)
print("weight:\n", conv.weight.data)

# e.g.
# Wrap the initialization in a function
def initialize_weights(model):
    for m in model.modules():
        # Check whether the module is a Conv2d
        if isinstance(m, nn.Conv2d):
            torch.nn.init.zeros_(m.weight.data)
            # Check whether the layer has a bias
            if m.bias is not None:
                torch.nn.init.constant_(m.bias.data, 0.3)
        elif isinstance(m, nn.Linear):
            torch.nn.init.normal_(m.weight.data, 0.1)  # the second positional argument is the mean
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias.data)
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()  # the in-place Tensor method is zero_(), not zeros_()

print("\n-------MLP-------\n")
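# Model definition
class MLP(nn.Module):
  # Declare the layers that hold model parameters; here, a convolution layer and a fully connected layer
  def __init__(self, **kwargs):
    # Call the constructor of the parent class nn.Module for the necessary initialization; any extra keyword arguments are passed on to it
    super(MLP, self).__init__(**kwargs)
    self.hidden = nn.Conv2d(1,1,3)
    self.act = nn.ReLU()
    self.output = nn.Linear(10,1)
    
  # Define the forward computation: how to compute the model output from the input x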
  def forward(self, x):
    o = self.act(self.hidden(x))
    return self.output(o)

mlp = MLP()
print(mlp.hidden.weight.data)
print("\n-------初始化-------\n")

mlp.apply(initialize_weights)
# or: initialize_weights(mlp)
print(mlp.hidden.weight.data)

test type:
 True
weight:
 tensor([[[[ 0.2422,  0.1846,  0.1458],
          [-0.1550, -0.2580, -0.1129],
          [ 0.3068,  0.2992, -0.0534]]],


        [[[ 0.2793, -0.1693, -0.1054],
          [ 0.0284, -0.1615, -0.1221],
          [ 0.1241, -0.2343,  0.2210]]],


        [[[-0.2649,  0.0559,  0.1013],
          [-0.0170,  0.0444,  0.1168],
          [-0.0975,  0.1835,  0.0817]]]])
weight:
 tensor([[[[ 0.3414,  0.9009,  0.2158],
          [ 0.0270,  0.3259, -0.1581],
          [-0.4717,  0.1819, -0.0927]]],


        [[[ 0.9349,  0.0406, -0.0252],
          [ 0.2589,  0.6081,  0.5480],
          [-0.1269, -0.1519,  0.9163]]],


        [[[ 0.1861,  0.2320,  0.5892],
          [ 0.2482, -0.3617, -0.0351],
          [-0.0309, -0.0220, -0.2693]]]])

-------MLP-------

tensor([[[[ 0.1814,  0.2181, -0.0305],
          [-0.3044,  0.1807,  0.1593],
          [ 0.2541, -0.2394,  0.1537]]]])

-------初始化-------

tensor([[[[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]]]])


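### Loss functions
1. Binary cross-entropy loss `torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')`
    * `weight`: a manual rescaling weight applied to the loss of each batch element
    * `size_average`: deprecated in favor of `reduction`; `True` returns the mean of the `loss`, otherwise the sum of the per-sample `loss`
    * `reduce`: deprecated in favor of `reduction`; `True` returns a scalar
2. Cross-entropy loss `torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')`
    * `ignore_index`: a target value whose samples are ignored when computing the loss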

    <img src="pic/pic3.png" width = "75%" />

3. `L1` loss `torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')`, $l_n=|x_n-y_n|$
    * `reduction` selects the reduction mode: `none` returns the loss for each element; `sum` adds up all elements; `mean` returns the average as a scalar
4. `MSE` loss `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`, $l_n=(x_n-y_n)^2$
5. Smooth `L1` loss `torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)`
    
    <img src="pic/pic4.png" width = "75%" />

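6. Negative log-likelihood loss for a Poisson-distributed target `torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')`
    * `log_input`: whether the input is given in log form
    * `full`: whether to compute the full loss, that is, to add the Stirling approximation term
    * `eps`: a small correction term that avoids evaluating `log(0)` when `log_input=False`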

    <img src="pic/pic5.png" width = "65%" />

7. `KL` divergence `torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)`

    <img src="pic/pic6.png" width = "65%" />

8. `MarginRankingLoss` `torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')`, $l(x_1,x_2,y)=\max(0,-y\times(x_1-x_2)+\text{margin})$
    * `margin`: the margin, i.e. the required difference between $x_1$ and $x_2$
9. Multi-label margin loss `torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')`

    <img src="pic/pic7.png" width = "80%" />

10. Binary classification loss `torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')`

    <img src="pic/pic8.png" width = "80%" />

11. Multi-class hinge loss `torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')`

    <img src="pic/pic9.png" width = "80%" />

12. Triplet loss `torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')`
    * Triplet: <entity 1, relation, entity 2>. In practice it can also be written as `<anchor, positive examples, negative examples>`; the goal is for the `anchor` to be closer to the `positive examples` and farther from the `negative examples`

    <img src="pic/pic10.png" width = "65%" />

13. `HingeEmbeddingLoss` `torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')`

    <img src="pic/pic11.png" width = "80%" />    

14. Cosine similarity loss `torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')`
15. `CTC` loss `torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)`
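
Most of the losses above share the `reduction` argument. The sketch below shows its three modes on `L1Loss` with hand-picked values (the tensors are illustrative assumptions) so the results can be checked by hand.

```python
import torch
from torch import nn

pred = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.5, 2.0, 2.0])

per_element = nn.L1Loss(reduction='none')(pred, target)  # one loss value per element
total = nn.L1Loss(reduction='sum')(pred, target)         # sum of the per-element losses
mean = nn.L1Loss(reduction='mean')(pred, target)         # sum divided by the number of elements

print(per_element, total, mean)
```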

In [41]:
import torch 
from torch import nn

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
print('BCELoss result:', output)

BCELoss result: tensor(0.7857, grad_fn=<BinaryCrossEntropyBackward0>)


In [1]:
import torch
from torch import nn

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
print('CrossEntropyLoss result:', output)

CrossEntropyLoss result: tensor(1.4290, grad_fn=<NllLossBackward0>)


In [2]:
import torch
from torch import nn

loss = nn.L1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print('L1Loss result:', output)

L1Loss result: tensor(1.0457, grad_fn=<MeanBackward0>)


In [3]:
import torch
from torch import nn

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print('MSELoss result:', output)

MSELoss result: tensor(1.7451, grad_fn=<MseLossBackward0>)


In [4]:
import torch
from torch import nn

loss = nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print('SmoothL1Loss result:', output)

SmoothL1Loss result: tensor(0.7535, grad_fn=<SmoothL1LossBackward0>)


In [11]:
import torch
from torch import nn

loss = nn.PoissonNLLLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
print('PoissonNLLLoss result:', output)

PoissonNLLLoss result: tensor(1.3255, grad_fn=<MeanBackward0>)


In [15]:
import torch 
from torch import nn

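# No backward pass is run in this example
# KLDivLoss expects `input` to hold log-probabilities; plain probabilities are passed here, which is why the result is negative
loss = nn.KLDivLoss()
input = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)
output = loss(input, target)
print('KLDivLoss result:', output)

KLDivLoss result: tensor(-0.3335)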
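
`torch.nn.KLDivLoss` expects its `input` in log space, and a true KL divergence is never negative. The sketch below reuses the same two distributions, passes `probs.log()` as the input and uses `reduction='batchmean'`, which divides the summed loss by the batch size; the variable names are illustrative assumptions.

```python
import torch
from torch import nn

probs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])        # predicted distribution
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])     # reference distribution

loss = nn.KLDivLoss(reduction='batchmean')
output = loss(probs.log(), target)      # input must be log-probabilities

# The same quantity written out: sum of target * (log target - log probs), per batch row
manual = (target * (target.log() - probs.log())).sum() / probs.shape[0]

print(output, manual)
```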


In [16]:
import torch
from torch import nn

loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()
output = loss(input1, input2, target)
output.backward()
print('MarginRankingLoss result:', output)

MarginRankingLoss result: tensor(0.8786, grad_fn=<MeanBackward0>)


In [19]:
import torch
from torch import nn

loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.9, 0.2, 0.4, 0.8]])
# For target y, only labels 3 and 0 are considered; everything after the first -1 is ignored
y = torch.LongTensor([[3, 0, -1, 1]])  # the true classes are class 3 and class 0
output = loss(x, y)

print('MultiLabelMarginLoss result:', output)

MultiLabelMarginLoss result: tensor(0.4500)


In [22]:
import torch
from torch import nn

loss = nn.SoftMarginLoss()
input = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[1.0, 0.0], [0.0, 1.0]], dtype=torch.float)  # note: SoftMarginLoss is defined for targets of 1 or -1
output = loss(input, target)
print('SoftMarginLoss result:', output)

SoftMarginLoss result: tensor(0.6037)


In [23]:
import torch 
from torch import nn

loss = nn.MultiMarginLoss()
input = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([1, 0], dtype=torch.long)
output = loss(input, target)
print('MultiMarginLoss result:', output)

MultiMarginLoss result: tensor(0.4000)


In [26]:
import torch
from torch import nn

loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = loss(anchor, positive, negative)
output.backward()
print('TripletMarginLoss result:', output)

TripletMarginLoss result: tensor(1.0236, grad_fn=<MeanBackward0>)


In [28]:
import torch
from torch import nn

loss = nn.HingeEmbeddingLoss()
input = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])
output = loss(input, target)
print('HingeEmbeddingLoss result:', output)

HingeEmbeddingLoss result: tensor(0.7667)


In [32]:
import torch
from torch import nn

loss = nn.CosineEmbeddingLoss()
input1 = torch.randn(100, 128, requires_grad=True)
input2 = torch.randn(100, 128, requires_grad=True)
target = torch.randn(100).sign()
output = loss(input1, input2, target)
output.backward()
print('CosineEmbeddingLoss result:', output)

CosineEmbeddingLoss result: tensor(0.5096, grad_fn=<MeanBackward0>)


### Training and validating the model
#### Training
```python
def train(epoch):
    # Switch to training mode
    model.train()
    train_loss = 0
    # Iterate over all batches in the DataLoader
    for data, label in train_loader:
        data, label = data.cuda(), label.cuda()
        # Reset the gradients held by the optimizer
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()*data.size(0)
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))
```
#### Validation
```python
def val(epoch):
    # Switch to evaluation mode
    model.eval()
    val_loss = 0
    running_accu = 0  # number of correct predictions so far
    # No gradients are computed during validation
    with torch.no_grad():
        for data, label in val_loader:
            data, label = data.cuda(), label.cuda()
            output = model(data)
            preds = torch.argmax(output, 1)
            loss = criterion(output, label)
            val_loss += loss.item()*data.size(0)
            running_accu += torch.sum(preds == label).item()
    val_loss = val_loss/len(val_loader.dataset)
    val_accu = running_accu/len(val_loader.dataset)
    print('Epoch: {} \tValidation Loss: {:.6f}, Accuracy: {:.6f}'.format(epoch, val_loss, val_accu))
```
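
The two functions above rely on a `model`, `criterion`, `optimizer` and data loaders defined elsewhere, and on a GPU being present. The following is a self-contained sketch of the same pattern on synthetic data; the dataset, the linear model, the learning rate and the `run_epoch` helper are all assumptions made for illustration, and the device falls back to the CPU.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic two-class data: the label is 1 when the features sum to a positive value
X = torch.randn(256, 4)
y = (X.sum(dim=1) > 0).long()
train_loader = DataLoader(TensorDataset(X[:192], y[:192]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[192:], y[192:]), batch_size=32)

model = nn.Linear(4, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

def run_epoch(loader, training):
    """Run one pass over loader; update the weights only when training is True."""
    model.train(training)                      # train mode or eval mode
    total_loss, correct = 0.0, 0
    with torch.set_grad_enabled(training):     # no gradient tracking during validation
        for data, label in loader:
            data, label = data.to(device), label.to(device)
            output = model(data)
            loss = criterion(output, label)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * data.size(0)
            correct += (output.argmax(dim=1) == label).sum().item()
    n = len(loader.dataset)
    return total_loss / n, correct / n

first_val_loss, _ = run_epoch(val_loader, training=False)
for epoch in range(20):
    train_loss, train_acc = run_epoch(train_loader, training=True)
val_loss, val_acc = run_epoch(val_loader, training=False)
print('Validation loss: {:.6f}, accuracy: {:.4f}'.format(val_loss, val_acc))
```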

### PyTorch optimizers

<img src="pic/pic12.png" width = "75%" />

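An `Optimizer` has three attributes:
* `defaults`: stores the optimizer's hyperparameters (a dict)
* `state`: cached state for the parameters, such as momentum buffers
* `param_groups`: the parameter groups it manages (a list whose elements are dicts)

An `Optimizer` provides these methods:
* `zero_grad()`: clears the gradients of the managed parameters
* `step()`: performs one update step on the parameters
* `add_param_group()`: adds a parameter group
* `load_state_dict()`: loads a state dict (commonly used to resume training from a checkpoint)
* `state_dict()`: returns the optimizer's current state as a dict

Typical use of an `Optimizer` within each `epoch`: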
```python
optimizer = torch.optim.SGD(net.parameters(), lr=1e-5)
for epoch in range(EPOCH):
    ...
    optimizer.zero_grad()  # zero the gradients
    loss = ...             # compute the loss
    loss.backward()        # backpropagation
    optimizer.step()       # update the parameters
```
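
The attributes and methods listed above can be explored on a small model. The sketch below is an illustration under assumed names and values: it gives the second layer its own learning rate through `add_param_group`, takes one step, and restores the optimizer state into a fresh optimizer as one would when resuming training.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Start with the first layer only, then add the last layer with a smaller learning rate
optimizer = torch.optim.SGD(net[0].parameters(), lr=0.1, momentum=0.9)
optimizer.add_param_group({"params": list(net[2].parameters()), "lr": 0.01})
lrs = [group["lr"] for group in optimizer.param_groups]

# One update step so that the momentum buffers in optimizer.state are populated
loss = net(torch.randn(16, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save the optimizer state and load it into a new optimizer with the same group layout
state = optimizer.state_dict()
resumed = torch.optim.SGD(
    [{"params": list(net[0].parameters())}, {"params": list(net[2].parameters())}],
    lr=0.1, momentum=0.9,
)
resumed.load_state_dict(state)
resumed_lrs = [group["lr"] for group in resumed.param_groups]

print(optimizer.defaults["lr"], lrs, resumed_lrs, list(state.keys()))
```

In a real checkpoint the optimizer state dict is usually saved together with the model's `state_dict()` via `torch.save`, so that both can be restored before training continues.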