<img src="https://drive.google.com/uc?id=1EOzaynFEkpMjE-PWPqllAYWRScW-2YQ0">

<br>

<hr>

<br>

<center>

<h1> Avisek Gupta </h1>

<h1> Indian Statistical Institute, Kolkata </h1>

</center>

<br>

<hr>

<br>

<br>

<hr>

<br>

<center>

<h1> Tutorial 2: Tensor Gradients and Optimization </h1>

</center>

<br>

<hr>

<br>


In [1]:
import torch

<br>

<hr>

<br>

<h2> 1. Computing Tensor Gradients : </h2>

<h2> (i) Every Tensor has a flag "requires_grad" that allows fine-grained exclusion of subgraphs from gradient computation. </h2>

<h2> (ii) By default, user-created tensors have requires_grad=False, so no gradients are calculated for them. </h2>

<br>

<hr>

<br>


In [2]:
x = torch.ones(2, 2)
print(x)
print(x.requires_grad)

x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.requires_grad)


tensor([[1., 1.],
        [1., 1.]])
False
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
True
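The flag can also be changed on a tensor that already exists. As a minimal sketch (the variable names `a` and `b` are illustrative and not part of the tutorial), `requires_grad_()` switches the flag in place:

```python
import torch

# Tensors created by factory functions do not track gradients by default
a = torch.ones(2, 2)
print(a.requires_grad)

# requires_grad_() switches the flag in place and returns the same tensor
b = torch.ones(2, 2).requires_grad_()
print(b.requires_grad)

# Passing False switches tracking off again (b is a leaf tensor, so this is allowed)
b.requires_grad_(False)
print(b.requires_grad)
```

The three prints should show False, True and False in that order.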


<br>

<hr>

<br>

<h2> (iii) The backward() function calculates the gradients. </h2>

<br>

<hr>

<br>


In [3]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

y = x + 2
print(y)

z = 3 * (y ** 2)
print(z)

out = z.mean()
print(out)

out.backward()  # equivalent to out.backward(torch.tensor(1.)) since out contains a single scalar

print(x.grad)


tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
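The value 4.5 can be checked by hand. The code above computes $y_i = x_i + 2$, $z_i = 3 y_i^2$, and takes the mean over the four entries, so:

$$
\text{out} = \frac{1}{4}\sum_{i=1}^{4} 3\,(x_i + 2)^2
\quad\Longrightarrow\quad
\frac{\partial\,\text{out}}{\partial x_i} = \frac{3}{2}\,(x_i + 2),
\qquad
\left.\frac{\partial\,\text{out}}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5
$$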


<br>

<hr>

<br>

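<h2> (iv) The .grad attribute of the intermediate (non-leaf) tensors y and z is None after backward(): their gradients are calculated during backpropagation but not retained. </h2>

<h2> (v) The output of an operation does NOT require gradients only if NONE of its input tensors require gradients. If any input requires gradients, so does the output. </h2>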

<br>

<hr>

<br>


In [4]:
print('y:', y)
print('y.grad:', y.grad)
print('y.requires_grad:', y.requires_grad)
print('y.grad_fn:', y.grad_fn)

print('z:', z)
print('z.grad:', z.grad)
print('z.requires_grad:', z.requires_grad)
print('z.grad_fn:', z.grad_fn)


y: tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
y.grad: None
y.requires_grad: True
y.grad_fn: <AddBackward0 object at 0x7f0d90ac2e10>
z: tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
z.grad: None
z.requires_grad: True
z.grad_fn: <MulBackward0 object at 0x7f0d90ad51d0>
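If the gradients of intermediate tensors are needed, autograd can be asked to keep them. The sketch below repeats the computation from the earlier cell and calls `retain_grad()` on `y` and `z` before `backward()`. The tensors `a` and `b` are illustrative names added here to show rule (v).

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = 3 * (y ** 2)

# Ask autograd to keep the gradients of these non-leaf tensors
y.retain_grad()
z.retain_grad()

out = z.mean()
out.backward()

print(y.grad)  # d(out)/dy = 6 * y / 4, which is 4.5 at y = 3
print(z.grad)  # d(out)/dz = 1 / 4 for each of the four entries

# Rule (v): one input that requires gradients is enough
a = torch.ones(2)                       # requires_grad=False
b = torch.ones(2, requires_grad=True)
print((a + a).requires_grad)  # False: no input requires gradients
print((a + b).requires_grad)  # True: b requires gradients
```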


  


<br>

<hr>

<br>

<h2> 2. Stop automatic gradient tracking on Tensors with "requires_grad=True" by either: </h2>
    
<h2> (i) Wrapping the code block in "with torch.no_grad():", or </h2>
    
<h2> (ii) Using .detach() to get a new Tensor with the same content that does not require gradients </h2>

<br>

<hr>

<br>


In [5]:
x = torch.randn(3, requires_grad=True)

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
    

True
True
False


In [6]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())


True
False
tensor(True)
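One detail worth keeping in mind: the tensor returned by `.detach()` shares its memory with the original, so an in-place write to one is visible in the other. A small sketch (the names `w` and the values written are chosen here for illustration); adding `.clone()` gives an independent copy:

```python
import torch

x = torch.ones(3, requires_grad=True)

y = x.detach()          # shares memory with x
y[0] = 5.0              # in-place write through the detached tensor
print(x)                # the first entry of x has changed as well

w = x.detach().clone()  # independent copy of the data
w[1] = -1.0
print(x)                # x is unaffected by the write to w
```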


<br>

<hr>

<br>

<h2> 3. Simple gradient descent using only backward() to compute the gradients </h2>

<br>

<hr>

<br>

In [7]:
# First, let's code a single step of gradient descent

x = torch.randn(1, requires_grad=True)
print('x:', x)

y = (x - 4) ** 2
y.backward()  # populates x.grad with dy/dx = 2 * (x - 4)

learning_rate = 0.1

print('x.grad:', x.grad)
print('x - learning_rate * x.grad:', x - learning_rate * x.grad)
with torch.no_grad():
    # The update is not tracked, so the new x does not require gradients
    x = x - learning_rate * x.grad
print('x:', x)
x.requires_grad = True  # re-enable tracking for the next step
print('x:', x)


x: tensor([0.2121], requires_grad=True)
x.grad: tensor([-7.5758])
x - learning_rate * x.grad: tensor([0.9697], grad_fn=<SubBackward0>)
x: tensor([0.9697])
x: tensor([0.9697], requires_grad=True)
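The step above rebinds x to a new tensor, so the next call to backward() starts with an empty x.grad. If the same tensor is reused instead, backward() adds to the existing x.grad rather than overwriting it. A minimal sketch of this accumulation, using a fixed starting value of 1.0 chosen here for illustration:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

y = (x - 4) ** 2
y.backward()
first_grad = x.grad.clone()
print(first_grad)     # dy/dx = 2 * (1 - 4) = -6

y = (x - 4) ** 2
y.backward()
second_grad = x.grad.clone()
print(second_grad)    # -12: the second gradient was added to the first

x.grad.zero_()        # reset the accumulated gradient in place
print(x.grad)
```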


In [8]:
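# Gradient descent until convergence

x = torch.randn(1, requires_grad=True)
print('Initial x:', x)

max_iter = 100
learning_rate = 0.1
eps = 1e-6

for _ in range(max_iter):
    y = (x - 4) ** 2
    y.backward()
    with torch.no_grad():
        prev_x = x
        # Rebinding x creates a new tensor, so prev_x still holds the old value
        x = x - learning_rate * x.grad
        # Stop once the update step is smaller than eps
        if torch.norm(prev_x - x) < eps:
            break
    x.requires_grad = True  # the new x is a fresh leaf tensor with x.grad = None

print('Soln x:', x)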


Initial x: tensor([0.7421], requires_grad=True)
Soln x: tensor([4.0000])
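An alternative to rebinding x in every iteration is to update it in place and reset its gradient. This is a sketch of that variant, not the tutorial's own code; the fixed starting value 0.0 and the name `step` are assumptions made so the run is reproducible. With learning_rate = 0.1, each update multiplies the distance to the minimum at 4 by 1 - 2 * 0.1 = 0.8, so the loop stops well within max_iter.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)

max_iter = 100
learning_rate = 0.1
eps = 1e-6

for _ in range(max_iter):
    y = (x - 4) ** 2
    y.backward()
    with torch.no_grad():
        step = learning_rate * x.grad
        x -= step        # in-place update: x stays a leaf that requires gradients
        x.grad.zero_()   # clear the gradient so it does not accumulate
    # Stop once the update step is smaller than eps
    if torch.norm(step) < eps:
        break

print('Soln x:', x)
```

Because x is never replaced, it keeps requires_grad=True throughout and there is no need to set the flag again after each step.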


<br>

<hr>

<br>

<h2> Recap: </h2>

<h2> 1. Computing Tensor Gradients : </h2>

<h2> (i) Every Tensor has a flag "requires_grad" that allows fine-grained exclusion of subgraphs from gradient computation. </h2>

<h2> (ii) By default, user-created tensors have requires_grad=False, so no gradients are calculated for them. </h2>

<h2> (iii) The backward() function calculates the gradients. </h2>

<h2> (iv) The .grad attribute of the intermediate (non-leaf) tensors y and z is None after backward(): their gradients are calculated during backpropagation but not retained. </h2>

<h2> (v) The output of an operation does NOT require gradients only if NONE of its input tensors require gradients. If any input requires gradients, so does the output. </h2>

<h2> 2. Stop automatic gradient tracking on Tensors with "requires_grad=True" by either: </h2>
    
<h2> (i) Wrapping the code block in "with torch.no_grad():", or </h2>
    
<h2> (ii) Using .detach() to get a new Tensor with the same content that does not require gradients </h2>

<h2> 3. Simple gradient descent using only backward() to compute the gradients </h2>

<br>

<hr>

<br>
