In [1]:
import torch

# There are two situations where in-place modifications cannot be done to tensors.

## Leaf node that requires grad

In [21]:
x = torch.tensor([1,2,3,4], dtype=torch.float32)
w = torch.tensor([1,2,3,4], requires_grad=True, dtype=torch.float32)
print(x,w)

tensor([1., 2., 3., 4.]) tensor([1., 2., 3., 4.], requires_grad=True)


In [22]:
y = torch.dot(x,w)
print(y)

tensor(30., grad_fn=<DotBackward>)


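Here `x.requires_grad` is `False`, so autograd does not track gradients for `x`. It has no gradient history (it is detached from the graph) and it is a leaf node. Modifying $\mathbf{x}$ in place therefore raises no error. Note, however, that `x` is saved to compute the gradient of `w`, so calling `y.backward()` after this modification would fail for the reason covered in the second situation below.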

In [13]:
x.normal_(0,0.01)
print(x, x.is_leaf)

tensor([ 0.0020, -0.0118, -0.0089, -0.0055]) True


The value of `y` stays unchanged, because `y` was computed before the modification and holds its own result.

In [12]:
print(y)

tensor(30., grad_fn=<DotBackward>)


But when it comes to $\mathbf{w}$, things will be different.

In [14]:
print(w, w.is_leaf)

tensor([1., 2., 3., 4.], requires_grad=True) True


In [15]:
w.normal_(0, 0.01)

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

As we can see, PyTorch does not allow an in-place modification of a tensor that is a leaf node and requires grad. We can bypass this `RuntimeError` by modifying the tensor through its `data` attribute.

In [17]:
w.data.normal_(0,0.01)
print(w)
print(y)

tensor([-0.0162,  0.0024, -0.0060,  0.0114], requires_grad=True)
tensor(30., grad_fn=<DotBackward>)
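As an alternative to going through `data`, the in-place update can be wrapped in `torch.no_grad()`, which is how parameter updates and re-initialisations are usually written. The snippet below is a minimal sketch that rebuilds `w` with the same values as above; it is not part of the original notebook run.

```python
import torch

w = torch.tensor([1., 2., 3., 4.], requires_grad=True)

# Inside no_grad(), autograd does not record the operation,
# so the leaf tensor can be modified in place without an error.
with torch.no_grad():
    w.normal_(0, 0.01)

# w is still a leaf node that requires grad after the update.
print(w.is_leaf, w.requires_grad)
```

Unlike a write through `data`, this keeps the modification visible to autograd's bookkeeping for any graph that already uses `w`.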


## Variables needed for gradient computation

In [23]:
x = torch.tensor([[1., 2.]])
w1 = torch.tensor([[2.], [1.]], requires_grad = True)
w2 = torch.tensor([3.], requires_grad = True)
d = torch.matmul(x, w1)
f = torch.matmul(d, w2)
print(d,f)

tensor([[4.]], grad_fn=<MmBackward>) tensor([12.], grad_fn=<MvBackward>)


To save memory, PyTorch frees the computational graph after each call to `backward()`. If we want to call `backward()` again on the same graph, we need to pass `retain_graph=True` to the first call.

In [24]:
f.backward(retain_graph=True)
print(w1.grad)

tensor([[3.],
        [6.]])
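The printed gradient can be checked by hand. Writing the entries of `w1` as $w1_1$ and $w1_2$ (notation introduced here only for this check), and using the values $x = [1, 2]$ and $w2 = 3$ from the cell above:

$$
f = (x \, w1) \, w2 = (1 \cdot w1_1 + 2 \cdot w1_2) \cdot 3,
\qquad
\frac{\partial f}{\partial w1} =
\begin{bmatrix} 1 \cdot 3 \\ 2 \cdot 3 \end{bmatrix} =
\begin{bmatrix} 3 \\ 6 \end{bmatrix}
$$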


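If we modify $d$ in place and then try to backpropagate from $f$, we get a `RuntimeError`. By the chain rule, $\frac{\partial f}{\partial w1}=\frac{\partial f}{\partial d} \frac{\partial d}{\partial w1}$, so the backward pass goes through $d$. More importantly, since $f = d \cdot w2$, the gradient $\frac{\partial f}{\partial w2}$ is $d$ itself, so autograd saved $d$ during the forward pass and needs its original value to calculate the grad.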

In [25]:
d[:] = 1

In [26]:
f.backward(retain_graph=True)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of MmBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
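The "version" mentioned in the error message is a counter that autograd keeps for each tensor and increments on every tracked in-place operation. The sketch below rebuilds the same graph, reads that counter through the internal `_version` attribute, and shows one way to avoid the error: modify a copy instead of `d`. The names `d_copy`, `version_before`, and `version_after` are introduced only for this illustration.

```python
import torch

x = torch.tensor([[1., 2.]])
w1 = torch.tensor([[2.], [1.]], requires_grad=True)
w2 = torch.tensor([3.], requires_grad=True)
d = torch.matmul(x, w1)
f = torch.matmul(d, w2)

version_before = d._version      # version autograd recorded when it saved d

# Work on an independent copy; d and its version counter are untouched.
d_copy = d.detach().clone()
d_copy[:] = 1

version_after = d._version

f.backward()                     # succeeds because d was never modified
print(version_before, version_after)
print(w1.grad, w2.grad)
```

Here `w2.grad` equals the original value of `d`, which is exactly the quantity that an in-place change to `d` would have corrupted.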

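Note that modifying the `data` attribute of tensor `d` afterwards does not make this error go away: the version counter of `d` was already incremented by `d[:] = 1`, so `backward()` still fails. Changes made only through `data` are not tracked by autograd at all, so they do not raise this error by themselves, but they can silently produce incorrect gradients.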

In [27]:
d.data.normal_(0,0.01)

tensor([[-0.0120]])

In [28]:
f.backward()

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of MmBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
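To separate the effect of `data` from the earlier `d[:] = 1`, the sketch below rebuilds the graph from scratch for each case. It compares a write through `data`, which autograd does not track, with the same write through `detach()`, which shares the version counter with `d`. The helper `build` and the names `grad_via_data` and `raised` are hypothetical and exist only for this illustration.

```python
import torch

def build():
    """Rebuild the small graph from this section and return d, f, w2."""
    x = torch.tensor([[1., 2.]])
    w1 = torch.tensor([[2.], [1.]], requires_grad=True)
    w2 = torch.tensor([3.], requires_grad=True)
    d = torch.matmul(x, w1)
    f = torch.matmul(d, w2)
    return d, f, w2

# Case 1: write through .data on a fresh graph.
d, f, w2 = build()
d.data.fill_(1.0)                 # untracked: the version of d is not bumped
f.backward()                      # no error, but uses the overwritten d
grad_via_data = w2.grad.clone()   # reflects d == 1, not the original d == 4

# Case 2: the same write through .detach() is tracked.
d, f, w2 = build()
d.detach().fill_(1.0)             # shares the version counter with d
try:
    f.backward()
    raised = False
except RuntimeError:
    raised = True

print(grad_via_data, raised)
```

In the first case the backward pass completes with a gradient computed from the overwritten value, which is why `detach()` is generally the safer choice when a tensor in the graph has to be touched.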