# 07 Automatic Differentiation
date: 2023-07-16

![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716152025420.png) 

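X and W are two vectors. Let z be the squared difference between their inner product ⟨X, W⟩ and the scalar y, that is, z = (⟨X, W⟩ - y)^2. The goal is to compute the derivative of z.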
![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716152400193.png)

![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716152509019.png)
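The example above can be checked numerically. The sketch below assumes the derivative is taken with respect to W, in which case the chain rule gives dz/dW = 2(⟨X, W⟩ - y)X. The helper names (`dot`, `z`, `grad_w`, `numeric_grad_w`) and the sample values are illustrative and do not come from the original notes.

```python
# Sketch: compare the chain-rule gradient of z = (<X, W> - y)^2 with respect to W
# against central finite differences. All names and values here are assumptions.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def z(X, W, y):
    """z = (<X, W> - y)^2"""
    return (dot(X, W) - y) ** 2

def grad_w(X, W, y):
    """Analytic gradient via the chain rule: 2 * (<X, W> - y) * X."""
    residual = dot(X, W) - y
    return [2 * residual * xi for xi in X]

def numeric_grad_w(X, W, y, eps=1e-6):
    """Central finite differences, perturbing one component of W at a time."""
    grads = []
    for i in range(len(W)):
        plus = W[:i] + [W[i] + eps] + W[i + 1:]
        minus = W[:i] + [W[i] - eps] + W[i + 1:]
        grads.append((z(X, plus, y) - z(X, minus, y)) / (2 * eps))
    return grads

X = [1.0, 2.0, 3.0]
W = [0.5, -1.0, 2.0]
y = 1.0

analytic = grad_w(X, W, y)
numeric = numeric_grad_w(X, W, y)
print(analytic)  # [7.0, 14.0, 21.0]
print(numeric)   # approximately the same values
```

Here ⟨X, W⟩ = 4.5, so the residual is 3.5 and every component of the gradient is 7 times the matching component of X.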

## Automatic Differentiation

### Computational Graph

![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716152700381.png)

![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716152814450.png)

![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716152904574.png)
![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716153229915.png)
**In the backward pass, the intermediate results stored during the first forward pass are reused, so the gradients can be computed directly**
![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716153303090.png)
![](https://raw.githubusercontent.com/SisyphusTang/Picture-bed/master/image-20230716153406544.png)
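To make the forward/backward split concrete, here is a minimal hand-written sketch for the graph a = ⟨X, W⟩, b = a - y, z = b^2. It is not how PyTorch implements autograd; the `forward`/`backward` functions and the `cache` dictionary are hypothetical names chosen for illustration.

```python
# Sketch of reverse-mode differentiation on the graph
#   a = <X, W>,  b = a - y,  z = b ** 2
# Function names and the cache layout are illustrative assumptions.

def forward(X, W, y):
    """Evaluate the graph front to back and keep every intermediate value."""
    a = sum(xi * wi for xi, wi in zip(X, W))
    b = a - y
    z_value = b ** 2
    cache = {"X": X, "a": a, "b": b}
    return z_value, cache

def backward(cache):
    """Walk the graph back to front, reusing the cached intermediates."""
    dz_db = 2 * cache["b"]                     # from z = b^2
    dz_da = dz_db * 1.0                        # from b = a - y
    dz_dW = [dz_da * xi for xi in cache["X"]]  # from a = <X, W>
    return dz_dW

z_value, cache = forward([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 1.0)
grad = backward(cache)
print(z_value, grad)  # 12.25 [7.0, 14.0, 21.0]
```

The backward pass never recomputes `a` or `b`; it only reads them from the cache, which is the trade of memory for speed described above.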

In [2]:
import torch
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

Before computing the gradient of y with respect to x, we need a place to store that gradient.

In [3]:
x.requires_grad_(True)  # equivalent to x = torch.arange(4.0, requires_grad=True)
x.grad  # None by default

In [4]:
y = 2 * torch.dot(x,x)
y

tensor(28., grad_fn=<MulBackward0>)

**Call the backward function to automatically compute the gradient of y with respect to each component of x**

In [5]:
y.backward()
x.grad  # gradient of y with respect to x

tensor([ 0.,  4.,  8., 12.])

In [6]:
x.grad == 4 * x

tensor([True, True, True, True])
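The comparison with `4 * x` follows from differentiating y component by component. A short derivation, using the same x and y as in the cells above:

$$
y = 2\,\mathbf{x}^\top \mathbf{x} = 2\sum_{i} x_i^2
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x_i} = 4x_i,
\qquad
\nabla_{\mathbf{x}}\, y = 4\mathbf{x}.
$$

With x = [0, 1, 2, 3] this gives [0, 4, 8, 12], matching the value of `x.grad`.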

**By default PyTorch accumulates gradients, so the previous values must be cleared first**

In [7]:
x.grad.zero_()
y = x.sum()
y.backward()
x.grad  # the partial derivative with respect to each of the four components is 1, hence four ones

tensor([1., 1., 1., 1.])
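The effect of accumulation is easiest to see by skipping the reset once. The following is a small sketch that assumes a fresh `x`; the variable names `first`, `accumulated`, and `after_reset` are illustrative.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# First backward pass: the gradient of sum(x) is a vector of ones.
x.sum().backward()
first = x.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added on top.
x.sum().backward()
accumulated = x.grad.clone()

# Zero in place, then run again: only the latest gradient remains.
x.grad.zero_()
x.sum().backward()
after_reset = x.grad.clone()

print(first)        # tensor([1., 1., 1., 1.])
print(accumulated)  # tensor([2., 2., 2., 2.])
print(after_reset)  # tensor([1., 1., 1., 1.])
```

`clone()` is used because `x.grad` is updated in place, so a plain reference would change along with it.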

In [8]:
A = torch.arange(24).reshape(2,3,4)
A

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

In [9]:
t = A.sum(axis = 0,keepdims = True)
t

tensor([[[12, 14, 16, 18],
         [20, 22, 24, 26],
         [28, 30, 32, 34]]])

In [10]:
A = torch.arange(24).reshape(3,8)
B = torch.ones(3,8)
A,B

(tensor([[ 0,  1,  2,  3,  4,  5,  6,  7],
         [ 8,  9, 10, 11, 12, 13, 14, 15],
         [16, 17, 18, 19, 20, 21, 22, 23]]),
 tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1.]]))

In [11]:
print(A*B)
print(torch.sum(A*B))

tensor([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11., 12., 13., 14., 15.],
        [16., 17., 18., 19., 20., 21., 22., 23.]])
tensor(276.)


In [13]:
x.grad.zero_()
y = x*x
y

tensor([0., 1., 4., 9.], grad_fn=<MulBackward0>)
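Here y is a vector rather than a scalar, and `backward()` without arguments only works on scalar outputs. One common way to proceed, sketched below under the assumption that the goal is the gradient of the sum of y's components, is to reduce y first or to pass a vector of ones as the upstream gradient. The names `grad_via_sum` and `grad_via_ones` are illustrative.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x                      # vector output, one entry per component of x

# Reduce y to a scalar, then backpropagate.
y.sum().backward()
grad_via_sum = x.grad.clone()

# Equivalent: pass a vector of ones as the upstream gradient.
x.grad.zero_()
y = x * x                      # rebuild the graph; the first one was freed
y.backward(torch.ones_like(y))
grad_via_ones = x.grad.clone()

print(grad_via_sum)   # tensor([0., 2., 4., 6.])
print(grad_via_ones)  # tensor([0., 2., 4., 6.])
```

Both routes give 2x, since each y_i = x_i^2 depends only on x_i.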