## Autograd

Now that we know how to calculate a loss, how do we use it to perform backpropagation? Torch provides a module, `autograd`, for automatically calculating the gradients of tensors. We can use it to calculate the gradient of the loss with respect to each of our parameters. Autograd works by keeping track of the operations performed on tensors, then going backwards through those operations, calculating gradients along the way. To make sure PyTorch keeps track of operations on a tensor and calculates its gradients, you need to set `requires_grad=True` on that tensor. You can do this at creation with the `requires_grad` keyword, or at any time with `x.requires_grad_(True)`.
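As a minimal sketch of the two ways to switch tracking on (the tensor names `a`, `b`, and `c` are only illustrative):

```python
import torch

# Option 1: request gradient tracking when the tensor is created.
a = torch.ones(3, requires_grad=True)

# Option 2: switch tracking on later, in place (note the trailing underscore).
b = torch.ones(3)
b.requires_grad_(True)

# Results computed from tracked tensors are tracked too and record a grad_fn.
c = (a * b).sum()
print(a.requires_grad, b.requires_grad, c.requires_grad)
print(c.grad_fn is not None)
```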

You can turn off gradients for a block of code with the `torch.no_grad()` context manager:
```python
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
```

Also, you can turn on or off gradients altogether with `torch.set_grad_enabled(True|False)`.
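A short sketch of both ways `torch.set_grad_enabled` can be used, as a context manager and as a plain function call (the names `y_off` and `y_on` are illustrative):

```python
import torch

x = torch.ones(2, requires_grad=True)

# Used as a context manager, the setting applies only inside the block.
with torch.set_grad_enabled(False):
    y_off = x * 2

# Called as a function, it changes the setting until it is changed again.
torch.set_grad_enabled(True)
y_on = x * 2

print(y_off.requires_grad, y_on.requires_grad)
```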

The gradients are computed with respect to some variable `z` with `z.backward()`. This does a backward pass through the operations that created `z`.

In [1]:
import torch
from torch import nn
import torch.nn.functional as F

In [2]:
x = torch.randn(2,2, requires_grad=True)
print(x)

tensor([[ 9.4531e-01,  1.1717e-03],
        [-1.3052e+00, -1.1816e+00]], requires_grad=True)


In [3]:
y = x**2
print(y)

tensor([[8.9361e-01, 1.3729e-06],
        [1.7035e+00, 1.3961e+00]], grad_fn=<PowBackward0>)


In [4]:
## grad_fn shows the function that generated this variable
print(y.grad_fn)

<PowBackward0 object at 0x7f2808fd93d0>


In [5]:
z = y.mean()
print(z)

tensor(0.9983, grad_fn=<MeanBackward0>)


In [6]:
print(x.grad)

None


To calculate the gradients, call the `.backward()` method on a tensor, `z` for example. This computes the gradient of `z` with respect to `x`:

$$
\frac{\partial z}{\partial x} = \frac{\partial}{\partial x}\left[\frac{1}{n}\sum_{i=1}^n x_i^2\right] = \frac{2x}{n} = \frac{x}{2} \quad \text{for } n = 4
$$

In [7]:
z.backward()
print(x.grad)
print(x/2)

tensor([[ 4.7266e-01,  5.8586e-04],
        [-6.5258e-01, -5.9078e-01]])
tensor([[ 4.7266e-01,  5.8586e-04],
        [-6.5258e-01, -5.9078e-01]], grad_fn=<DivBackward0>)


In [8]:
z.item()

0.9982856512069702

In [9]:
# y is a non-leaf tensor, so autograd does not keep its .grad by default
y.grad

In [10]:
y_grads= torch.autograd.grad(z,y)

In [11]:
y_grads

(tensor([[0.2500, 0.2500],
         [0.2500, 0.2500]]),)
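
Each entry of `y` contributes `1/n` to the mean, so with `n = 4` every entry of the gradient is 0.25. If you would rather have `backward()` populate `y.grad` directly, one option is `retain_grad()`. The following is a minimal sketch on a fresh random tensor, not a rerun of the cells above:

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = x ** 2
y.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor
z = y.mean()
z.backward()

# dz/dy_i = 1/n, with n = 4 elements
print(y.grad)
# dz/dx_i = 2 * x_i / n = x_i / 2
print(torch.allclose(x.grad, x.detach() / 2))
```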

### Copying module parameters:

In [12]:
import numpy as np
x= np.array([0.05, 0.06, 0.01, 0.04, 0.2, 0.057, 0.084, 0.145, 0.014, 0.022], dtype=np.float32)
y= np.array([1], dtype=np.float32)
x=torch.as_tensor(x)
y=torch.as_tensor(y)
print(x)
print(y)

tensor([0.0500, 0.0600, 0.0100, 0.0400, 0.2000, 0.0570, 0.0840, 0.1450, 0.0140,
        0.0220])
tensor([1.])


### Example for cloning a module's parameters using load_state_dict()

In [13]:
net = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2 = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2.load_state_dict(net.state_dict())

<All keys matched successfully>

### Example for cloning a module's parameters using torch.clone()

In [None]:
net = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2 = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2[0].weight= torch.nn.Parameter(net[0].weight.clone())
net2[2].weight= torch.nn.Parameter(net[2].weight.clone())

### Example for cloning a module's parameters using deepcopy()

In [None]:
import copy
net = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2 = copy.deepcopy(net)
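
Both `load_state_dict()` and `deepcopy()` should give `net2` its own copy of the weights rather than a reference to the weights of `net`. A quick way to check this, sketched here with a hypothetical `make_net` helper that rebuilds the same architecture:

```python
import copy
import torch
from torch import nn

def make_net():
    # Hypothetical helper: same architecture as in the cells above.
    return nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(),
                         nn.Linear(4, 1, bias=False), nn.Sigmoid())

net = make_net()

net_a = make_net()
net_a.load_state_dict(net.state_dict())  # copy values into existing parameters
net_b = copy.deepcopy(net)               # duplicate the whole module

for name, other in [("load_state_dict", net_a), ("deepcopy", net_b)]:
    same_values = torch.equal(net[0].weight, other[0].weight)
    shared_memory = net[0].weight.data_ptr() == other[0].weight.data_ptr()
    print(name, same_values, shared_memory)
```

Equal values with different memory addresses indicate an independent copy.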

### Example for cloning a module's parameters using detach().clone()

In [None]:
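net = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2 = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(), nn.Linear(4, 1, bias=False), nn.Sigmoid())
# detach() drops the autograd history before the copy is made
net2[0].weight= torch.nn.Parameter(net[0].weight.detach().clone())
net2[2].weight= torch.nn.Parameter(net[2].weight.detach().clone())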
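
The difference between `clone()` and `detach().clone()` shows up on the intermediate tensor: a plain `clone()` is still attached to the graph of the source, while `detach().clone()` is not. Once either result is wrapped in `nn.Parameter`, it becomes a fresh leaf tensor. A small sketch, with `w` standing in for a layer weight:

```python
import torch
from torch import nn

w = torch.ones(2, requires_grad=True)

tracked = w.clone()             # still part of w's autograd graph
untracked = w.detach().clone()  # a plain copy with no history

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn is None)

# Wrapping in nn.Parameter yields a new leaf that requires gradients.
p1 = nn.Parameter(tracked)
p2 = nn.Parameter(untracked)
print(p1.is_leaf, p2.is_leaf)
```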

### Now test each method

In [14]:
print(net)
print(net2)

Sequential(
  (0): Linear(in_features=10, out_features=4, bias=False)
  (1): ReLU()
  (2): Linear(in_features=4, out_features=1, bias=False)
  (3): Sigmoid()
)
Sequential(
  (0): Linear(in_features=10, out_features=4, bias=False)
  (1): ReLU()
  (2): Linear(in_features=4, out_features=1, bias=False)
  (3): Sigmoid()
)


In [15]:
print('Net parameters \n', list(net.parameters()))
print('Net2 parameters \n', list(net2.parameters()))

Net parameters 
 [Parameter containing:
tensor([[-0.0969, -0.0174, -0.0342,  0.1916,  0.0039,  0.1663, -0.2375, -0.1561,
          0.0982, -0.1324],
        [ 0.1330,  0.1270, -0.0508, -0.2828, -0.1075, -0.0015,  0.1396,  0.2473,
         -0.3151,  0.1716],
        [-0.2688, -0.3004,  0.0702, -0.2984, -0.2828,  0.0428,  0.0161, -0.2365,
          0.2222, -0.2657],
        [-0.2590,  0.1048,  0.1980,  0.2172,  0.0037, -0.0745, -0.2623,  0.0855,
         -0.0913,  0.0862]], requires_grad=True), Parameter containing:
tensor([[ 0.2993,  0.0315,  0.3062, -0.2269]], requires_grad=True)]
Net2 parameters 
 [Parameter containing:
tensor([[-0.0969, -0.0174, -0.0342,  0.1916,  0.0039,  0.1663, -0.2375, -0.1561,
          0.0982, -0.1324],
        [ 0.1330,  0.1270, -0.0508, -0.2828, -0.1075, -0.0015,  0.1396,  0.2473,
         -0.3151,  0.1716],
        [-0.2688, -0.3004,  0.0702, -0.2984, -0.2828,  0.0428,  0.0161, -0.2365,
          0.2222, -0.2657],
        [-0.2590,  0.1048,  0.1980,  0.2172,

In [16]:
# net is a Sequential container: it holds no parameters directly (they live in its child modules)
print('Net direct parameters \n', net._parameters)

Net direct parameters 
 OrderedDict()


In [19]:
print('Net first linear layer weights \n', net[0].weight)
print('Net2 first linear layer weights \n',net2[0].weight)
print('\n')
print('Net second linear layer weights \n', net[2].weight)
print('Net2 second linear layer weights \n', net2[2].weight)

Net first linear layer weights 
 Linear(in_features=10, out_features=4, bias=False)
Net2 first linear layer weights 
 Parameter containing:
tensor([[-0.0969, -0.0174, -0.0342,  0.1916,  0.0039,  0.1663, -0.2375, -0.1561,
          0.0982, -0.1324],
        [ 0.1330,  0.1270, -0.0508, -0.2828, -0.1075, -0.0015,  0.1396,  0.2473,
         -0.3151,  0.1716],
        [-0.2688, -0.3004,  0.0702, -0.2984, -0.2828,  0.0428,  0.0161, -0.2365,
          0.2222, -0.2657],
        [-0.2590,  0.1048,  0.1980,  0.2172,  0.0037, -0.0745, -0.2623,  0.0855,
         -0.0913,  0.0862]], requires_grad=True)


Net second linear layer weights 
 Parameter containing:
tensor([[ 0.2993,  0.0315,  0.3062, -0.2269]], requires_grad=True)
Net2 second linear layer weights 
 Parameter containing:
tensor([[ 0.2993,  0.0315,  0.3062, -0.2269]], requires_grad=True)


In [None]:
print('Net first linear layer weights gradients \n', net[0].weight.grad)
print('Net2 first linear layer weights gradients \n',net2[0].weight.grad)
print('\n')
print('Net second linear layer weights gradients \n', net[2].weight.grad)
print('Net2 second linear layer weights gradients \n', net2[2].weight.grad)

In [None]:
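# One gradient update step is performed on net (net2 is left untouched for comparison).
# net already ends in nn.Sigmoid, so use BCELoss rather than BCEWithLogitsLoss,
# which would apply a second sigmoid to the output.
loss = nn.BCELoss()
opt = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)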

In [None]:
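opt.zero_grad()          # clear any gradients left over from earlier steps
error = loss(net(x), y)  # forward pass through net only
error.backward()         # fills .grad for net's parameters
opt.step()               # updates net's weights; net2 is not touched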

In [None]:
error

In [None]:
print('Net first linear layer weights \n', net[0].weight)
print('Net2 first linear layer weights \n',net2[0].weight)
print('\n')
print('Net second linear layer weights \n', net[2].weight)
print('Net2 second linear layer weights \n', net2[2].weight)

In [None]:
print('Net first linear layer weights gradients \n', net[0].weight.grad)
print('Net2 first linear layer weights gradients \n',net2[0].weight.grad)
print('\n')
print('Net second linear layer weights gradients \n', net[2].weight.grad)
print('Net2 second linear layer weights gradients \n', net2[2].weight.grad)
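
The comparison above can also be made programmatically instead of by eye. The sketch below is self-contained and makes a few assumptions: it uses `deepcopy()` as the cloning method, a random input in place of the `x` defined earlier, and `nn.BCELoss` because the network already ends in a sigmoid. It checks that a step on `net` leaves `net2` unchanged and without gradients:

```python
import copy
import torch
from torch import nn

net = nn.Sequential(nn.Linear(10, 4, bias=False), nn.ReLU(),
                    nn.Linear(4, 1, bias=False), nn.Sigmoid())
net2 = copy.deepcopy(net)

# Snapshot of net2's weights before the update, for comparison afterwards.
before = [p.detach().clone() for p in net2.parameters()]

x = torch.rand(10)   # stand-in input with 10 features
y = torch.ones(1)    # target label

loss = nn.BCELoss()
opt = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

opt.zero_grad()
loss(net(x), y).backward()
opt.step()

net2_unchanged = all(torch.equal(p, b) for p, b in zip(net2.parameters(), before))
net2_has_no_grads = all(p.grad is None for p in net2.parameters())
net_has_grads = all(p.grad is not None for p in net.parameters())
print(net2_unchanged, net2_has_no_grads, net_has_grads)
```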