In [1]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output

# requires_grad=True tells autograd to track operations on these tensors
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
# Linear model
z = torch.matmul(x, w)+b
# Loss function
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

You can set `requires_grad` when creating a tensor, or enable it later with `x.requires_grad_(True)`. Both have the same effect.
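A minimal sketch of the two equivalent options. The tensor names `w1`, `w2`, `w3` are illustrative only. `requires_grad_()` modifies the tensor in place (note the trailing underscore), returns the tensor itself, and defaults to `True`.

```python
import torch

# Option 1: request gradient tracking when the tensor is created
w1 = torch.randn(5, 3, requires_grad=True)

# Option 2: create the tensor first, then enable tracking in place
w2 = torch.randn(5, 3)
w2.requires_grad_(True)

# requires_grad_() returns the same tensor, so it can also be chained
w3 = torch.randn(5, 3).requires_grad_()

print(w1.requires_grad, w2.requires_grad, w3.requires_grad)  # True True True
```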

In [2]:
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x0000022B42EEA1F0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x0000022B42EEA5E0>
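Each `grad_fn` is a node in the backward graph. Leaf tensors created directly by the user, such as `w` and `b`, have no `grad_fn`. Every node links to the nodes that produced its inputs through `next_functions`, so you can walk the graph by hand. This is a small sketch for inspection only. The exact node class names (for example `BinaryCrossEntropyWithLogitsBackward` vs. `BinaryCrossEntropyWithLogitsBackward0`) can differ between PyTorch versions, so treat printed names as version-dependent.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# Leaf tensors created by the user have no grad_fn
print(w.grad_fn, b.grad_fn)  # None None

# The add node links back to the node for matmul(x, w) and to an
# AccumulateGrad node, which writes the gradient into b.grad
for fn, _ in z.grad_fn.next_functions:
    print(fn)
```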


## 1. Computing Gradients

In [3]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0196, 0.2885, 0.0530],
        [0.0196, 0.2885, 0.0530],
        [0.0196, 0.2885, 0.0530],
        [0.0196, 0.2885, 0.0530],
        [0.0196, 0.2885, 0.0530]])
tensor([0.0196, 0.2885, 0.0530])
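The printed values can be checked by hand. With the default mean reduction over the N = 3 outputs, the derivative of `binary_cross_entropy_with_logits` with respect to the logits is (sigmoid(z) - y) / N. Since z = x·w + b, `b.grad` equals that derivative and `w.grad` is its outer product with `x`. Because `x` is all ones, every row of `w.grad` equals `b.grad`, which is exactly the pattern above. This sketch uses a fixed seed, so its numbers differ from the output above.

```python
import torch

torch.manual_seed(0)
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# dloss/dz for mean-reduced BCE with logits: (sigmoid(z) - y) / N
dz = (torch.sigmoid(z.detach()) - y) / z.numel()

# Chain rule through z = x @ w + b
expected_b_grad = dz
expected_w_grad = torch.outer(x, dz)

print(torch.allclose(b.grad, expected_b_grad))  # True
print(torch.allclose(w.grad, expected_w_grad))  # True
```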


## 2. Disabling Gradient Tracking

In [4]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

# Operations inside a torch.no_grad() block are not tracked
with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False
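`torch.no_grad()` can also be applied as a function decorator, which disables tracking for everything the function computes. A minimal sketch, with a hypothetical `predict` function. The parameters themselves still have `requires_grad=True`; only the results computed inside the function are untracked.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

@torch.no_grad()
def predict(inputs):
    # Hypothetical helper: runs the linear model without building a graph
    return torch.matmul(inputs, w) + b

z = predict(x)
print(z.requires_grad, w.requires_grad)  # False True
```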


In [5]:
# Calling detach() on a tensor also works
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
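`detach()` returns a tensor with the same values (and the same underlying storage) but cut off from the graph, so autograd treats it as a constant. The sketch below compares the gradient of the same expression with and without `detach()`; the values of `w` are chosen only for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Without detach: loss = sum(2w * w) = sum(2 w^2), so dloss/dw = 4w
loss = ((w * 2) * w).sum()
loss.backward()
grad_attached = w.grad.clone()
w.grad = None  # clear the accumulated gradient before the next pass

# With detach: c = 2w is a constant, loss = sum(c * w), so dloss/dw = c = 2w
z = w * 2
z_det = z.detach()
loss = (z_det * w).sum()
loss.backward()
grad_detached = w.grad.clone()

print(grad_attached)  # tensor([ 4.,  8., 12.])
print(grad_detached)  # tensor([2., 4., 6.])
```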


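There are two common situations in which you may want to disable gradient tracking:

1. To freeze some of a model's parameters so that they are not updated, which is common when finetuning a pretrained network.

2. At inference time, when no backward pass is needed, for example during validation and testing.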
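A minimal sketch of the first case. The small `nn.Sequential` model is a hypothetical stand-in for a pretrained network. The first layer is frozen with `requires_grad_(False)`, and only the remaining trainable parameters are passed to the optimizer, so after a training step the frozen layer has no gradient and unchanged weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in for a pretrained network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer: autograd will not compute gradients for it
for p in model[0].parameters():
    p.requires_grad_(False)

# Optimize only the parameters that are still trainable
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

frozen_before = model[0].weight.detach().clone()
head_before = model[2].weight.detach().clone()

x = torch.randn(16, 4)
target = torch.randn(16, 2)
loss = nn.functional.mse_loss(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(model[0].weight.grad)                        # None
print(torch.equal(model[0].weight, frozen_before))  # True
```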

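Computation graphs in PyTorch are dynamic: the graph is recreated from scratch, and after each `.backward()` call autograd starts building a new graph on the next forward pass. This is what allows a model to contain ordinary Python control flow such as `if` statements and loops.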
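A small sketch of data-dependent control flow. The function `f` is hypothetical: the number of loop iterations and the branch taken depend on the input values, so each call records a differently shaped graph, and `backward()` still differentiates whatever path was actually executed.

```python
import torch

def f(x):
    # Keep doubling until the norm reaches 10; the iteration count depends on x
    y = x
    while y.norm() < 10:
        y = y * 2
    # The branch also depends on the data
    if y.sum() > 0:
        return y.sum()
    return -y.sum()

# Four doublings: y = 16x, output = 16 * (0.5 + 1.0) = 24, gradient = 16
x1 = torch.tensor([0.5, 1.0], requires_grad=True)
out1 = f(x1)
out1.backward()

# One doubling and the negative branch: output = -2 * (-3 - 4) = 14, gradient = -2
x2 = torch.tensor([-3.0, -4.0], requires_grad=True)
out2 = f(x2)
out2.backward()

print(out1.item(), x1.grad)  # 24.0 tensor([16., 16.])
print(out2.item(), x2.grad)  # 14.0 tensor([-2., -2.])
```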

## 3. Optional Reading: Tensor Gradients and Jacobian Products

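In many cases we have a scalar loss function and need to compute its gradient with respect to some parameters. However, there are cases where the output function is an arbitrary tensor. In that case PyTorch computes a so-called Jacobian product rather than the actual gradient: calling `backward(v)` with a tensor `v` of the same shape as the output computes $v^T \cdot J$, where $J$ is the Jacobian of the output with respect to the input.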
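Written out for a vector function $\vec{y} = f(\vec{x})$ with $\vec{x} \in \mathbb{R}^n$ and $\vec{y} \in \mathbb{R}^m$, the Jacobian and the product that `backward(v)` returns are:

$$
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
\texttt{backward}(v) \;\Rightarrow\; x.\mathrm{grad} \mathrel{+}= v^{T} J .
$$

For an element-wise function such as $y_k = (x_k + 1)^2$ (the `(inp+1).pow(2)` example below, with the 5×5 tensors flattened), each $y_k$ depends only on $x_k$, so $J$ is diagonal and, with $v = \mathbf{1}$:

$$
J = \operatorname{diag}\big(2(x_k + 1)\big), \qquad (v^{T} J)_k = 2(x_k + 1).
$$

With `inp = torch.eye(5)` this gives $2(1+1) = 4$ on the diagonal and $2(0+1) = 2$ elsewhere.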

In [6]:
inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2)

In [7]:
out

tensor([[4., 1., 1., 1., 1.],
        [1., 4., 1., 1., 1.],
        [1., 1., 4., 1., 1.],
        [1., 1., 1., 4., 1.],
        [1., 1., 1., 1., 4.]], grad_fn=<PowBackward0>)

In [8]:
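# Earlier we called .backward() with no argument. That is equivalent to
# backward(torch.tensor(1.0)), which is the right choice for computing the
# gradient of a scalar-valued function, such as a neural network's scalar loss.
# Since `out` is not a scalar, we pass a tensor v with the same shape as `out`.
# retain_graph=True keeps the graph so backward() can be called on it again.

out.backward(torch.ones_like(out), retain_graph=True)
print("First call\n", inp.grad)
# Note: the gradient is not zeroed before this second backward pass,
# so the new gradient is added to the existing one (gradients accumulate)
out.backward(torch.ones_like(out), retain_graph=True)
print("\nSecond call\n", inp.grad)

# Reset the accumulated gradient before computing it again
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)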

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])
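To confirm that `backward(v)` really computes $v^T J$, this sketch builds the full Jacobian with `torch.autograd.functional.jacobian` (shape `out.shape + inp.shape`, here 5×5×5×5), flattens it to 25×25, and multiplies by `v` by hand. With `v` all ones, the product also equals the gradient of `out.sum()`.

```python
import torch
from torch.autograd.functional import jacobian

inp = torch.eye(5, requires_grad=True)
out = (inp + 1).pow(2)
v = torch.ones_like(out)

# Vector-Jacobian product computed by autograd
out.backward(v)
vjp_autograd = inp.grad.clone()

# Full Jacobian, flattened so rows index outputs and columns index inputs
J = jacobian(lambda t: (t + 1).pow(2), inp.detach())
vjp_manual = (v.reshape(1, -1) @ J.reshape(25, 25)).reshape(5, 5)

# With v = ones, v^T J equals the gradient of the summed output
inp2 = torch.eye(5, requires_grad=True)
(inp2 + 1).pow(2).sum().backward()

print(torch.allclose(vjp_autograd, vjp_manual))  # True
print(torch.allclose(vjp_autograd, inp2.grad))   # True
```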
