In [1]:
import sys
import torch
from torch import nn


# Contents

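## 1 In-place operations
- An in-place operation cannot be applied to a leaf tensor with requires_grad == True.
- A tensor whose value is needed during the backward pass (leaf node / variable / parameter) must not be modified in place either.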


In [20]:
w = torch.FloatTensor(10)  # legacy constructor: 10 uninitialized float32 values (they happen to be zeros here)
w.requires_grad = True

In [21]:
w

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

In [22]:
w.data

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
w.normal_()  # in-place op on a leaf tensor that requires grad

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

In [24]:
w.data.normal_()  # allowed: .data is not tracked by autograd

tensor([ 0.9103,  0.5313,  0.3694,  1.2054,  0.6227, -0.6204, -0.2041,  1.2064,
        -0.6975,  0.3144])

In [25]:
w.data

tensor([ 0.9103,  0.5313,  0.3694,  1.2054,  0.6227, -0.6204, -0.2041,  1.2064,
        -0.6975,  0.3144])
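The cells above use `w.data` to get around the leaf restriction. A commonly used alternative, which the original notes do not cover, is to run the in-place op inside `torch.no_grad()`. The following is a minimal sketch of that assumption; the variable name `raised` is only illustrative.

```python
import torch

w = torch.zeros(10, requires_grad=True)

# Directly modifying a leaf that requires grad is rejected by autograd.
try:
    w.normal_()
    raised = False
except RuntimeError:
    raised = True

# Inside torch.no_grad() the same in-place op is permitted,
# and w stays a leaf that requires grad afterwards.
with torch.no_grad():
    w.normal_()

print(raised, w.is_leaf, w.requires_grad)
```

This is the usual pattern for re-initializing parameters by hand without recording the write in the graph.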


In [42]:
x = torch.tensor([1.,3.])
w1 = torch.tensor([[2.],[1.]],requires_grad=True)
w2 = torch.tensor([9.],requires_grad=True)
w2.is_leaf

True

x, w1 --> d
d, w2 --> f

In [44]:
d = torch.matmul(x, w1)
f = torch.matmul(d, w2)
d[:] = 4  # in-place write to d after it was used to compute f
# d is the value autograd needs in order to compute df/dw2
f.backward()

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]], which is output 0 of struct torch::autograd::CopySlices, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

`Changing the actual value of an intermediate tensor used in computing the loss raises an error in backward()`
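One way to avoid this error, sketched here as an assumption rather than as the notebook's own fix, is to write into a clone so that the tensor saved for the backward pass is left untouched. The names `d_copy`, `version_before` and `version_after` are illustrative; `_version` is the tensor's internal in-place version counter.

```python
import torch

x = torch.tensor([1., 3.])
w1 = torch.tensor([[2.], [1.]], requires_grad=True)
w2 = torch.tensor([9.], requires_grad=True)

d = torch.matmul(x, w1)   # d = 1*2 + 3*1 = 5
f = torch.matmul(d, w2)   # f = 5 * 9 = 45

version_before = d._version
d_copy = d.clone()        # separate storage and separate version counter
d_copy[:] = 4             # in-place write hits the copy only
version_after = d._version

f.backward()              # succeeds: d was not modified
print(w2.grad)            # df/dw2 = d
print(w1.grad)            # df/dw1[i] = x[i] * w2
```

Because `d` keeps its original value, the gradients are computed from the values that were actually used in the forward pass.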

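## 2 .data vs .detach()

- detach():
    + returns a new tensor, detached from the current graph
    + the result never requires gradient
- Both x.data and x.detach() return a tensor that
    + shares its data with x
    + carries none of x's computation history
    + has requires_grad=False
- An in-place change made through x.data goes unnoticed by autograd, so backward() does not raise; the same change made through x.detach() makes backward() raise an error (gradient-safe)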
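The difference in the last bullet can be observed through the tensor's in-place version counter. The sketch below assumes the internal `_version` attribute is an acceptable way to inspect it; the names `out1`, `out2`, `v_data` and `v_detach` are illustrative.

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)

# In-place write through .data: out1's version counter is not bumped,
# so autograd cannot tell that the saved output changed.
out1 = a.sigmoid()
out1.data.zero_()
v_data = out1._version

# In-place write through .detach(): the detached tensor shares the
# version counter with out2, so the change is visible to autograd.
out2 = a.sigmoid()
out2.detach().zero_()
v_detach = out2._version

# Both views alias the same memory as the tensor they came from.
shares_data = out1.data.data_ptr() == out1.data_ptr()
shares_detach = out2.detach().data_ptr() == out2.data_ptr()

print(v_data, v_detach, shares_data, shares_detach)
```

During backward(), autograd compares the current version with the one recorded when the tensor was saved, which is why only the `.detach()` path ends in an error.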

In [61]:
a = torch.tensor([1.,2.,3.], requires_grad=True)
out = a.sigmoid()
out

tensor([0.7311, 0.8808, 0.9526], grad_fn=<SigmoidBackward0>)

In [62]:
c = out.data
out,c

(tensor([0.7311, 0.8808, 0.9526], grad_fn=<SigmoidBackward0>),
 tensor([0.7311, 0.8808, 0.9526]))

In [63]:
a.requires_grad, c.requires_grad, out.requires_grad # .data exposes the values only, without the autograd history

(True, False, True)

In [64]:
c.zero_()
out, c # .data shares the same memory, so c is effectively an alias of out

(tensor([0., 0., 0.], grad_fn=<SigmoidBackward0>), tensor([0., 0., 0.]))

In [68]:
a = torch.tensor([1.,2.,3.], requires_grad=True)
out = a.sigmoid()
out.sum().backward()

In [69]:
a = torch.tensor([1.,2.,3.], requires_grad=True)
out = a.sigmoid()
c = out.data
c.zero_() # out is now all zeros as well
out.sum().backward() # the node's value has been changed, yet no error is raised here
# so .data does not guarantee the safety of in-place operations

In [72]:
a = torch.tensor([1.,2.,3.], requires_grad=True)
out = a.sigmoid()
c = out.detach() # .detach() also shares the same memory
c.zero_() # out is now all zeros as well
out, c, a.requires_grad, out.requires_grad, c.requires_grad # c holds the values only, without the autograd history

(tensor([0., 0., 0.], grad_fn=<SigmoidBackward0>),
 tensor([0., 0., 0.]),
 True,
 True,
 False)

In [70]:
a = torch.tensor([1.,2.,3.], requires_grad=True)
out = a.sigmoid()
c = out.detach()
c.zero_() # out is now all zeros as well
out.sum().backward() # the node's value has been changed, and this time an error is raised

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]], which is output 0 of SigmoidBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

In [57]:
a.grad  # gradient left by the .data run: all zeros instead of out * (1 - out)

tensor([0., 0., 0.])
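To put the outcomes side by side, the following sketch reruns the sigmoid experiment under four variants. The helper `sigmoid_grad` and the `detach().clone()` variant are assumptions added for illustration and are not part of the original notebook.

```python
import torch

def sigmoid_grad(modify):
    """Run a.sigmoid(), apply `modify` to the output, and return a.grad."""
    a = torch.tensor([1., 2., 3.], requires_grad=True)
    out = a.sigmoid()
    modify(out)
    out.sum().backward()
    return a.grad

# Reference: no modification, gradient is out * (1 - out).
correct = sigmoid_grad(lambda out: None)

# .data: backward() runs, but uses the zeroed output, so the gradient is wrong.
silent = sigmoid_grad(lambda out: out.data.zero_())

# detach().clone(): the write goes to a private copy, the gradient stays correct.
safe = sigmoid_grad(lambda out: out.detach().clone().zero_())

# .detach(): the write is detected and backward() raises.
try:
    sigmoid_grad(lambda out: out.detach().zero_())
    detach_raised = False
except RuntimeError:
    detach_raised = True

print(correct, silent, safe, detach_raised)
```

If a scratch copy of an intermediate result is needed, cloning the detached tensor keeps both the forward values and the gradients intact.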