In [1]:
import torch

In [2]:
x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.rand(size=(5, 3), requires_grad=True)  # a 5x3 weight matrix, so the linear map takes 5 inputs to 3 outputs
b = torch.rand(3, requires_grad=True)  # bias

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

**Note:** You can set the value of `requires_grad` when creating a tensor, or later by using the `x.requires_grad_(True)` method.
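Here is a minimal sketch of both options from the note. The tensor names `a` and `c` are illustrative only.

```python
import torch

# Option 1: request gradient tracking when the tensor is created
a = torch.rand(3, requires_grad=True)

# Option 2: create the tensor first, then enable tracking in place.
# requires_grad_ (trailing underscore) modifies the tensor and returns it.
c = torch.rand(3)
print(c.requires_grad)  # False
c.requires_grad_(True)
print(c.requires_grad)  # True
```

Both `a` and `c` are leaf tensors, so `backward()` will populate their `.grad` attributes.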

In [3]:
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x0000028822C1B580>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x0000028822C1BCA0>
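Each `grad_fn` points to the nodes that produced its inputs through `next_functions`. Walking these links reconstructs the backward graph. The sketch below assumes the same setup as the cells above. The helper `walk` is hypothetical and exists only for illustration. Leaf tensors that require gradients end in `AccumulateGrad` nodes, which write into `.grad`. The intermediate node names for `matmul` may differ between PyTorch versions.

```python
import torch

# Same setup as the cells above (random values differ from the notebook run)
x = torch.ones(5)
y = torch.zeros(3)
w = torch.rand(5, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)


def walk(fn, depth=0, names=None):
    """Print the backward graph rooted at `fn` and return the node names."""
    if names is None:
        names = []
    if fn is None:  # an input that does not require grad (e.g. x or y)
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    # next_functions holds (node, input_index) pairs, one per input of fn
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names


names = walk(loss.grad_fn)
```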


In [4]:
print(f'current grad without performing loss.backward() is w\'s {w.grad}, bias\'s {b.grad}')

current grad without performing loss.backward() is w's None, bias's None


In [5]:
loss.backward()
print("current grad after performing loss.backward()")
print(w.grad)
print(b.grad)

current grad after performing loss.backward()
tensor([[0.3214, 0.2906, 0.3239],
        [0.3214, 0.2906, 0.3239],
        [0.3214, 0.2906, 0.3239],
        [0.3214, 0.2906, 0.3239],
        [0.3214, 0.2906, 0.3239]])
tensor([0.3214, 0.2906, 0.3239])
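These gradients can be checked by hand. With the default `reduction='mean'`, the loss averages over the 3 outputs, so $\partial L/\partial z = (\sigma(z) - y)/3$. Because $z = xw + b$, we get $\partial L/\partial w_{jk} = x_j \, \partial L/\partial z_k$ and $\partial L/\partial b = \partial L/\partial z$. Since `x` is all ones, every row of `w.grad` equals `b.grad`, which matches the output above. The following sketch compares these formulas with autograd's result. It uses a fixed seed, so its numbers differ from the notebook run.

```python
import torch

torch.manual_seed(0)  # arbitrary seed; values differ from the notebook's run
x = torch.ones(5)
y = torch.zeros(3)
w = torch.rand(5, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradients for the mean-reduced BCE-with-logits loss
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()  # dL/dz
    grad_w = torch.outer(x, dz)              # dL/dw[j, k] = x[j] * dL/dz[k]
    grad_b = dz                              # dL/db = dL/dz

print(torch.allclose(w.grad, grad_w))  # True
print(torch.allclose(b.grad, grad_b))  # True
```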


In [6]:
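# Operations on tensors with requires_grad=True are tracked by default
z = torch.matmul(x, w) + b
print(z.requires_grad)

# Inside torch.no_grad(), no computation graph is recorded
with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

# torch.enable_grad() turns tracking back on (e.g. inside a no_grad region)
with torch.enable_grad():
    z = x @ w + b
print(z.requires_grad)

# detach() returns a tensor that shares the same data but is cut off from the graph
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)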

True
False
True
False


There are several reasons you might want to disable gradient tracking:
  - To mark some parameters in your neural network as **frozen parameters**. This is
    a very common scenario when fine-tuning a pre-trained network.
  - To **speed up computations** when you are only doing a forward pass, because computations on tensors that do
    not track gradients are more efficient.
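Both cases can be sketched with a small model. The `nn.Sequential` network below is a hypothetical stand-in for a pre-trained network. Freezing is done by turning off `requires_grad` on the chosen parameters, and inference runs under `torch.no_grad()`.

```python
import torch
from torch import nn

# Hypothetical small model standing in for a pre-trained network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer: autograd will not compute gradients for it
for p in model[0].parameters():
    p.requires_grad_(False)

x = torch.randn(10, 4)
loss = model(x).sum()
loss.backward()

print(model[0].weight.grad)        # None: the frozen layer gets no gradient
print(model[2].weight.grad.shape)  # torch.Size([2, 8]): still trainable

# Forward pass only (e.g. inference): no graph is recorded at all
with torch.no_grad():
    out = model(x)
print(out.requires_grad)  # False
```

In a training loop, a common choice is to pass only the parameters that still require gradients to the optimizer.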

## Optional reading: Tensor gradients and Jacobian products

In many cases, we have a scalar loss function, and we need to compute
the gradient with respect to some parameters. However, there are cases
when the output function is an arbitrary tensor. In this case, PyTorch
allows you to compute a so-called **Jacobian product** rather than the actual
gradient.

For a vector function $\vec{y}=f(\vec{x})$, where
$\vec{x}=\langle x_1,\dots,x_n\rangle$ and
$\vec{y}=\langle y_1,\dots,y_m\rangle$, the gradient of
$\vec{y}$ with respect to $\vec{x}$ is given by the **Jacobian
matrix** $J$, whose element $J_{ij}$ contains $\frac{\partial y_{i}}{\partial x_{j}}$.
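Written out in full, the Jacobian is the $m \times n$ matrix

$$
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
$$

Row $i$ collects the partial derivatives of output $y_i$, and column $j$ collects the derivatives with respect to input $x_j$.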

Instead of computing the Jacobian matrix itself, PyTorch allows you to
compute the **Jacobian product** $v^T\cdot J$ for a given vector
$v=(v_1 \dots v_m)$. This is achieved by calling `backward` with
$v$ as an argument. Because $v$ has one entry per output element, its shape must match
the shape of the tensor on which `backward` is called (the output $\vec{y}$). In the
example below the input and output happen to have the same shape:


In [20]:
inp = torch.eye(5, requires_grad=True)
print('Current inp matrix is:')
print(inp)

Current inp matrix is:
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]], requires_grad=True)


In [21]:
out = (inp+1).pow(2)
print('Output is')
print(out)

Output is
tensor([[4., 1., 1., 1., 1.],
        [1., 4., 1., 1., 1.],
        [1., 1., 4., 1., 1.],
        [1., 1., 1., 4., 1.],
        [1., 1., 1., 1., 4.]], grad_fn=<PowBackward0>)


In [22]:
out.backward(torch.ones_like(inp), retain_graph=True)
print("First call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])
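The second call doubles the values because `backward` adds into `.grad` instead of overwriting it. That is why `inp.grad.zero_()` restores the first result. To check that `backward(v)` computes $v^T\cdot J$ rather than $J$ itself, the sketch below compares it with the full Jacobian from `torch.autograd.functional.jacobian`. For a 5x5 input and a 5x5 output, that Jacobian has shape `(5, 5, 5, 5)`. The function name `f` is illustrative only.

```python
import torch
from torch.autograd.functional import jacobian

inp = torch.eye(5)


def f(t):
    return (t + 1).pow(2)


# Full Jacobian: shape is out.shape + inp.shape = (5, 5, 5, 5)
J = jacobian(f, inp)
print(J.shape)

# Vector-Jacobian product with v = ones: contract v over the output dims
v = torch.ones(5, 5)
vjp_from_J = torch.einsum('ij,ijkl->kl', v, J)

# The same product via backward, without ever materializing J
x = inp.clone().requires_grad_(True)
f(x).backward(v)
print(torch.allclose(vjp_from_J, x.grad))  # True
```

Both results equal $2(\text{inp}+1)$, which is the "First call" output above. Materializing `J` is only practical for small tensors. That cost is the reason `backward` works with the product instead.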


> **Note:** Previously we called the `backward()` function without parameters. This is equivalent to calling `backward(torch.tensor(1.0))`, which is a useful way to compute the gradients of a scalar-valued function, such as the loss during neural network training.
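The equivalence in the note can be checked with a small scalar function. In this sketch, $L = \sum_i w_i^2$ is an arbitrary example whose gradient is $2w$. The final lines show that calling `backward()` without an argument on a non-scalar output raises an error.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar-valued function: L = sum(w^2), so dL/dw = 2w
loss = (w ** 2).sum()
loss.backward(retain_graph=True)   # keep the graph for a second backward
g_default = w.grad.clone()

w.grad.zero_()  # gradients accumulate, so reset before the second call
loss.backward(torch.tensor(1.0))
print(torch.equal(g_default, w.grad))  # True
print(w.grad)  # tensor([2., 4., 6.])

# For a non-scalar output, the implicit argument is not allowed
try:
    (w ** 2).backward()
except RuntimeError as e:
    print("RuntimeError:", e)
```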
