In [1]:
import torch


In PyTorch, requires_grad is a Tensor attribute that tells autograd whether to track operations on the tensor so that gradients with respect to it can be computed during the backward pass. When you call .backward(), PyTorch computes gradients for the tensors in the graph that have requires_grad=True. For leaf tensors (those you create directly, such as model weights), these gradients are accumulated in the tensor's .grad attribute.
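As a minimal sketch of this behaviour, the snippet below builds one tracked tensor and one untracked tensor and checks what ends up in .grad. The names w, c and loss are illustrative only and do not appear elsewhere in this notebook.

```python
import torch

# A leaf tensor that autograd should track.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# A plain tensor: requires_grad defaults to False, so it is not tracked.
c = torch.tensor([4.0, 5.0, 6.0])

loss = (w * c).sum()   # scalar result, so backward() needs no argument
loss.backward()

print(w.grad)          # d(loss)/dw, which equals c for this loss
print(c.grad)          # None, because c was never tracked
```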

Why is requires_grad used?

Training Models: In deep learning, you often adjust the weights of your model to minimize the loss function. To update the weights using optimization algorithms like gradient descent, you need to compute the gradient of the loss function with respect to each weight. By setting requires_grad=True on the model's weights, PyTorch will automatically compute these gradients for you during backpropagation.

Freezing Layers: In transfer learning or fine-tuning, you might want to freeze certain layers of a pre-trained model (i.e., not update them during training). You can do this by setting requires_grad=False for the parameters of those layers.

Memory Efficiency: If you don't need gradients for a specific tensor (e.g., input data or when performing inference), setting requires_grad=False saves memory and computation time.

requires_grad is used to automatically compute the gradients needed for optimizing model parameters.
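The freezing case can be sketched as follows. The two-layer model here is hypothetical and stands in for a pre-trained network; only the requires_grad_ call and the parameter filtering are the point.

```python
import torch
from torch import nn

# Hypothetical small model, used only to illustrate freezing.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer: its parameters will receive no gradients.
for p in model[0].parameters():
    p.requires_grad_(False)

out = model(torch.randn(3, 4)).sum()
out.backward()

frozen_grads = [p.grad for p in model[0].parameters()]
trainable_grads = [p.grad for p in model[2].parameters()]
print(frozen_grads)                             # no gradient was stored for frozen parameters
print([g.shape for g in trainable_grads])

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)
```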

In [2]:
x = torch.randn(4, requires_grad=True)
print(x)

tensor([ 0.1049,  0.8008, -0.7523,  0.0189], requires_grad=True)


### 1. Add gradient

In [6]:
# y = x * 2
# print(y)

# y = x * 2
# z = y * y * 3
# print(z)

# z = x * x * 2
# z = z.mean()
# z = z.median()
# print(z)


# z = x * x * 2
# z = z.mean()
# z.backward()        ## dz/dx
# print(x.grad)       ## x.grad is populated only if requires_grad=True; backward() without arguments works only on a scalar output



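z = x * x * 2
v = torch.tensor([0.1, 0.3, 0.01, 1.0], dtype=torch.float32)
z.backward(v)        ## z is a vector, so pass v to get the vector-Jacobian product v^T (dz/dx)
print(x.grad)        ## x.grad accumulates across backward() calls; the output below appears to
                     ## include the gradient from an earlier z.mean().backward() run of this cell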

tensor([ 0.1469,  1.7618, -0.7824,  0.0947])


Tensor Creation: We create a tensor x with requires_grad=True. This tells PyTorch to keep track of operations on x and to compute its gradient when we call .backward().

Operation: We compute z = x * x * 2. PyTorch builds a computation graph and records the operations.

Backward Pass: z is a vector, not a scalar, so we call z.backward(v) with a tensor v of the same shape as z. This computes the vector-Jacobian product of v with dz/dx. Calling z.backward() without an argument works only when z is a scalar, for example after z.mean().

Gradients: After the backward pass, x.grad holds the result. Each element is v[i] * 4 * x[i], added to whatever x.grad already contained from earlier backward calls.
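To check the vector case by hand, here is a small sketch that compares x.grad against the analytic value v * 4 * x on a fresh tensor. It also shows that passing v to backward() gives the same result as reducing to a scalar with (z * v).sum(). The names expected and first_grad are introduced here for illustration.

```python
import torch

x = torch.randn(4, requires_grad=True)
v = torch.tensor([0.1, 0.3, 0.01, 1.0])

z = x * x * 2                  # z_i = 2 * x_i**2, so dz_i/dx_i = 4 * x_i
z.backward(v)                  # vector-Jacobian product with v

expected = v * 4 * x.detach()  # analytic result, computed outside the graph
first_grad = x.grad.clone()
print(torch.allclose(first_grad, expected))

# Same gradient via an explicit scalar: sum_i v_i * z_i
x.grad.zero_()                 # clear the accumulated gradient first
z2 = x * x * 2
(z2 * v).sum().backward()
print(torch.allclose(x.grad, expected))
```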

### 2. Remove gradient

In [17]:
y = torch.rand(4, requires_grad=True)
print(y)

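# Option 1: turn off tracking in place on y itself
# y.requires_grad_(False)
# print(y)

# Option 2: get a new tensor that is detached from the graph
# z = y.detach()
# print(z)

# Option 3: disable tracking for every operation inside the block
with torch.no_grad():
    z = y + 2
    print(z)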

tensor([0.6976, 0.0641, 0.3627, 0.2815], requires_grad=True)
tensor([2.6976, 2.0641, 2.3627, 2.2815])
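The three options in this cell can be compared side by side. The sketch below uses the illustrative names d, n and a for the three results and prints the requires_grad flag of each.

```python
import torch

y = torch.rand(4, requires_grad=True)

# 1. detach(): a new tensor that shares y's data but is cut from the graph.
d = y.detach()

# 2. no_grad(): results computed inside the block are not tracked.
with torch.no_grad():
    n = y + 2

# 3. requires_grad_(False): flips the flag in place on a leaf tensor.
a = torch.rand(4, requires_grad=True)
a.requires_grad_(False)

print(d.requires_grad, n.requires_grad, a.requires_grad)
print(y.requires_grad)                  # y itself is still tracked
print(d.data_ptr() == y.data_ptr())     # detach() does not copy the data
```

Because detach() shares storage with the original tensor, an in-place change to d would also change y; call d.clone() first if an independent copy is needed.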


### Training

In [21]:
weights = torch.ones(3, requires_grad=True)

for epoch in range(5):
    model_output = (weights*1.8).sum()
    print("model sum", model_output)

    model_output.backward()

    print("model gradient", weights.grad)

    # reset the accumulated gradient before the next iteration
    weights.grad.zero_()

model sum tensor(5.4000, grad_fn=<SumBackward0>)
model gradient tensor([1.8000, 1.8000, 1.8000])
model sum tensor(5.4000, grad_fn=<SumBackward0>)
model gradient tensor([1.8000, 1.8000, 1.8000])
model sum tensor(5.4000, grad_fn=<SumBackward0>)
model gradient tensor([1.8000, 1.8000, 1.8000])
model sum tensor(5.4000, grad_fn=<SumBackward0>)
model gradient tensor([1.8000, 1.8000, 1.8000])
model sum tensor(5.4000, grad_fn=<SumBackward0>)
model gradient tensor([1.8000, 1.8000, 1.8000])
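The loop above computes gradients but never changes the weights, which is why every iteration prints the same values. A minimal sketch of a manual gradient descent step follows. The learning rate lr = 0.1 is an assumption chosen for illustration, and the second part shows why zeroing the gradient matters.

```python
import torch

weights = torch.ones(3, requires_grad=True)
lr = 0.1  # assumed learning rate, not taken from the notebook

for epoch in range(3):
    loss = (weights * 1.8).sum()
    loss.backward()

    # Update outside the graph so the step itself is not tracked.
    with torch.no_grad():
        weights -= lr * weights.grad

    # Clear the gradient so the next backward() starts from zero.
    weights.grad.zero_()

print(weights)

# Without zero_(), gradients from successive backward() calls add up.
w = torch.ones(3, requires_grad=True)
for _ in range(2):
    (w * 1.8).sum().backward()
print(w.grad)   # twice the single-step gradient
```

In practice an optimizer such as torch.optim.SGD performs the update, and its zero_grad() method replaces the manual weights.grad.zero_() call.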
