In [51]:
import torch
import numpy as np

# Gradient

`autograd` is PyTorch's built-in automatic differentiation module.

It records the operations applied to tensors, and when it's time to calculate a derivative, it computes it quickly from that record.

In [52]:
x = torch.tensor([2.], requires_grad=True)
x

tensor([2.], requires_grad=True)
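As a minimal sketch of what "tracking" means here: a tensor created by the user is a leaf with no `grad_fn`, while a result computed from it carries a `grad_fn` node that autograd uses later. The variable names are only illustrative.

```python
import torch

x = torch.tensor([2.], requires_grad=True)
y = x ** 2  # autograd records this operation

print(x.is_leaf, x.grad_fn)  # leaf tensor created by the user: no grad_fn
print(y.requires_grad)       # the result is tracked as well
print(y.grad_fn)             # backward node recorded for the power operation
```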

# Example gradient

We calculate the gradient (derivative) of the function:

`z = x^2+x`


`dz/dx = 2x+1`

`x = 2 -> gradient = 2*2+1 = 5`

In [53]:
x = torch.tensor([2.], requires_grad=True)
y=x**2
z=y+x
z.backward()
print("x.grad = ", x.grad)

x.grad =  tensor([5.])


## Same way, shorter

In [54]:
x = torch.tensor([2.], requires_grad=True)
z = x**2 + x
z.backward()  # Calculate gradient
print(x.grad)  # Get result

tensor([5.])
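One way to sanity-check the autograd result is a central finite difference. This is only a sketch: the helper `f`, the step size `h`, and the use of `float64` are choices made for this illustration, not part of the original notebook.

```python
import torch

def f(x):
    # Same function as above: z = x^2 + x
    return x ** 2 + x

x = torch.tensor([2.], dtype=torch.float64, requires_grad=True)
f(x).backward()  # autograd gradient ends up in x.grad

h = 1e-6
x0 = x.detach()  # plain values, no tracking needed for the numeric estimate
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)

print(x.grad, numeric)  # both should be close to 2*2 + 1 = 5
```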


# Example 1 - X is a vector (2 or more values)

## Partial derivatives

`y = x_0 * x_1`

Here PyTorch takes the partial derivative with respect to each element of `x`, with `x = (2, 3)`:

* First gradient = `dy/dx_0`

  * We get: `dy/dx_0 = d(x_0 * x_1)/dx_0`

  * The result is `x_1`, which is `3`, so the first value is `3`.

* Second gradient = `dy/dx_1`

  * We get: `dy/dx_1 = d(x_0 * x_1)/dx_1`

  * The result is `x_0`, which is `2`, so the second value is `2`.

* We get: `(3, 2)`




In [55]:
x = torch.tensor([2.,3.],requires_grad=True)
y = x[0]*x[1]
y.backward()
print(x.grad)

tensor([3., 2.])
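The same partial derivatives can also be requested with `torch.autograd.grad`, which returns the gradient as a tuple instead of storing it in `x.grad`. A small sketch of that alternative:

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)
y = x[0] * x[1]

# Returns one gradient per input; x.grad is left untouched
(grad_x,) = torch.autograd.grad(y, x)

print(grad_x)  # (dy/dx_0, dy/dx_1) = (x_1, x_0)
print(x.grad)  # still None, nothing was accumulated
```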


# Example 2 - Two tensors in parallel

`z = (x_0+x_1)^2 + (y_0+y_1)^3 + x_0 + y_1`

Here we have both `x` and `y` variables.

We calculate the gradient of `x` from the two partial derivatives with respect to `x_0` and `x_1`, so the gradient of `x` is a vector of two values.

Same for `y`.

In [56]:
x = torch.tensor([2.,3.],requires_grad=True)
y = torch.tensor([4.,5.],requires_grad=True)
w=x.sum()**2
z=w+y.sum()**3+x[0]+y[1]
z.backward()
print(x.grad,y.grad,sep='\n')

tensor([11., 10.])
tensor([243., 244.])


### Notice
**In `torch.tensor([2., 3.])`, the numbers `2` and `3` are the VALUES of `x_0` and `x_1`, not coefficients as in `[2x_0, 3x_1]`.**
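The gradients printed in Example 2 can be checked by hand for the function as coded in the cell, where the `y` term is cubed: `dz/dx_0 = 2(x_0+x_1) + 1`, `dz/dx_1 = 2(x_0+x_1)`, `dz/dy_0 = 3(y_0+y_1)^2`, `dz/dy_1 = 3(y_0+y_1)^2 + 1`. The sketch below compares these hand-derived formulas with autograd; the names `sx`, `sy`, `expected_x`, and `expected_y` are introduced only for this check.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)
y = torch.tensor([4., 5.], requires_grad=True)
z = x.sum() ** 2 + y.sum() ** 3 + x[0] + y[1]
z.backward()

sx = x.detach().sum()  # x_0 + x_1 = 5
sy = y.detach().sum()  # y_0 + y_1 = 9

# Hand-derived partial derivatives
expected_x = torch.stack([2 * sx + 1, 2 * sx])
expected_y = torch.stack([3 * sy ** 2, 3 * sy ** 2 + 1])

print(x.grad, expected_x, sep='\n')
print(y.grad, expected_y, sep='\n')
```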

# .detach() - Don't calculate gradient

Calling `w.detach()` returns a tensor `s` that is cut off from the graph. It tells PyTorch "don't calculate the gradient through `s`", so `s` counts as a fixed number.

`z = s + (y_0 + y_1)^3 + x_0 + y_1`

In [57]:
x = torch.tensor([2.,3.],requires_grad=True)
y = torch.tensor([4.,5.],requires_grad=True)
w=x.sum()**2
s=w.detach()
z=s+y.sum()**3+x[0]+y[1]
z.backward()
print(x.grad,y.grad,sep='\n')

tensor([1., 0.])
tensor([243., 244.])
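A short sketch of what distinguishes the detached tensor from the original: it holds the same value but has no `grad_fn` and does not require a gradient, which is why the `(x_0+x_1)^2` term contributes nothing to `x.grad` above.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)
w = x.sum() ** 2
s = w.detach()

print(w.requires_grad, w.grad_fn)  # tracked: True and a backward node
print(s.requires_grad, s.grad_fn)  # detached: False and None
print(w.item(), s.item())          # same value, (2 + 3)^2 = 25
```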


# Example 3 - read from pdf

# Example 4 - read from pdf

# Question 1 - we can't backward again - `retain_graph`

We can't call `backward()` a second time on the same graph; read: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#computational-graph

PyTorch records ALL of the computations in a graph when `requires_grad` equals True.

When we call `backward()`, PyTorch frees the intermediate values it saved during the forward pass, to save memory (listen to professor, lecture 2, 1:48:30).

**Because the saved forward values are freed during `backward()`, everything is thrown out except the result of the calculation.**

## What does `retain_graph` do?

It keeps the intermediate values stored in the graph (the gray boxes in the URL above, the backward functions such as `PowBackward` and `MulBackward`), so the graph can be traversed again.

Without `retain_graph`, PyTorch keeps only the results (the gradients in `.grad`), not the intermediate values needed to run `backward()` again.

In [66]:
x = torch.tensor([2.], requires_grad=True)
y=x**2
z=y+x
z.backward()
print(x.grad)
z.backward()
print(x.grad)

tensor([5.])


RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

## Question 1 - Solution - use `retain_graph`

In [67]:
x = torch.tensor([2.], requires_grad=True)
y=x**2
z=y+x
z.backward(retain_graph=True)
print(x.grad)
z.backward()
print(x.grad)

tensor([5.])
tensor([10.])
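An alternative to `retain_graph=True` is to rebuild the graph by running the forward computation again before each `backward()`. The sketch below assumes recomputing `z` is acceptable; the list `grads` exists only to record what `x.grad` held after each pass.

```python
import torch

x = torch.tensor([2.], requires_grad=True)
grads = []

for _ in range(2):
    z = x ** 2 + x  # a fresh graph is built on every forward pass
    z.backward()    # no retain_graph needed, each graph is used once
    grads.append(x.grad.clone())

print(grads)  # the gradient still accumulates: 5, then 10
```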


## Problem - gradients accumulate across `backward()` calls - PyTorch doesn't reset the gradient

In [69]:
x=torch.tensor(2., requires_grad=True)
y=x**2
y.retain_grad()
z=y+x
z.backward(retain_graph=True)
print(x.grad, y.grad)
z.backward()  # The new gradient is added to the previous one!
print(x.grad, y.grad)

tensor(5.) tensor(1.)
tensor(10.) tensor(2.)


## Solution - reset gradient

In [72]:
x=torch.tensor(2., requires_grad=True)
y=x**2
y.retain_grad()
z=y+x
z.backward(retain_graph=True)
print(x.grad, y.grad)
x.grad = None  # Reset gradient of x only (y.grad is not reset and keeps accumulating)
z.backward() 
print(x.grad, y.grad)

tensor(5.) tensor(1.)
tensor(5.) tensor(2.)
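Another common way to reset a leaf gradient is to zero it in place with `x.grad.zero_()`. A minimal sketch that repeats `backward()` three times on a retained graph; the list `results` is only there to record each value.

```python
import torch

x = torch.tensor(2., requires_grad=True)
z = x ** 2 + x
results = []

for _ in range(3):
    if x.grad is not None:
        x.grad.zero_()             # clear the accumulated gradient in place
    z.backward(retain_graph=True)  # keep the graph for the next iteration
    results.append(x.grad.item())

print(results)  # every pass gives the same gradient, 2*2 + 1
```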


# Derivative Chain Rule

![](./img/Untitled.png)
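As a sketch of the chain rule on the running example `y = x^2`, `z = y + x`: the total derivative is `dz/dx = (dz/dy) * (dy/dx) + 1`, where the `+ 1` comes from the direct `x` term in `z`. The code below asks autograd for each factor separately with `torch.autograd.grad`; the variable names are chosen for this illustration.

```python
import torch

x = torch.tensor(2., requires_grad=True)
y = x ** 2
z = y + x

# Each factor of the chain rule, computed separately
(dz_dy,) = torch.autograd.grad(z, y, retain_graph=True)  # derivative of z w.r.t. y
(dy_dx,) = torch.autograd.grad(y, x, retain_graph=True)  # derivative of y w.r.t. x
(dz_dx,) = torch.autograd.grad(z, x)                     # total derivative

print(dz_dy, dy_dx, dz_dx)
print(dz_dy * dy_dx + 1)  # chain rule plus the direct x term
```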

# Question 2

## Problem - gradient of an untracked tensor

In [74]:
x=torch.tensor(2., requires_grad=True)
y=x**2
z=y+x
z.backward()
print(x.grad)

t=x.grad
x.grad = None
t.backward()  # Fails: t was computed without a graph, so it has no grad_fn

tensor(5.)


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

## Solution - use `create_graph`

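With `create_graph=True`, the gradient computation itself is recorded, so `x.grad` gets a `grad_fn` and can be differentiated again.

Ignore the warning printed below.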

In [75]:
x=torch.tensor(2., requires_grad=True)
y=x**2
z=y+x
z.backward(create_graph=True)  # Here
print(x.grad)

t=x.grad
x.grad = None
t.backward()


tensor(5., grad_fn=<CopyBackwards>)


  Variable._execution_engine.run_backward(
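The effect of `create_graph` can also be seen with `torch.autograd.grad`. In this sketch the first gradient is computed twice, once without and once with `create_graph=True`, and only the second version can be differentiated again to give the second derivative of `z = x^2 + x`. The names `g_plain`, `g_graph`, and `second` are illustrative.

```python
import torch

x = torch.tensor(2., requires_grad=True)
z = x ** 2 + x

# Without create_graph the gradient is a plain number, cut off from any graph
(g_plain,) = torch.autograd.grad(z, x, retain_graph=True)

# With create_graph the gradient 2x + 1 is itself tracked
(g_graph,) = torch.autograd.grad(z, x, create_graph=True)

print(g_plain.requires_grad, g_graph.requires_grad)

# Differentiate the tracked gradient: d(2x + 1)/dx
(second,) = torch.autograd.grad(g_graph, x)
print(second)
```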
