# Automatic Differentiation


Once we start building models, we may need to create new operations or functions for new types of blocks in a neural network and handle gradient propagation for them ourselves. Computing gradients directly is also useful when we modify a model and need full control over how it is trained.

## tf.GradientTape

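To compute gradients, we use the `tf.GradientTape` API. Operations executed inside a `with tf.GradientTape() as grad:` block are recorded on the tape, and `grad.gradient(target, sources)` then computes the gradient of `target` with respect to `sources`.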

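```python
import tensorflow as tf
```

```python
x = tf.constant([2.0])

# persistent=False and watch_accessed_variables=True are the defaults, shown here explicitly
with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    f = x ** 2

# x is a constant tensor, so the tape did not record anything for it
print('The df/dx where f=(x)^2: \n', grad.gradient(f, x))
```

```
The df/dx where f=(x)^2: 
 None
```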


As we can see, when `x` is a constant the gradient is not 0 but `None`. The tape does not watch constant tensors unless told to, so it has no recorded path from `f` back to `x`.
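If a zero gradient is more convenient than `None` for a source the target does not depend on, `GradientTape.gradient` accepts an `unconnected_gradients` argument. A minimal sketch, assuming TensorFlow 2.x; the variable name `dfdx` is just for illustration:

```python
import tensorflow as tf

x = tf.constant([2.0])

with tf.GradientTape() as grad:
    # x is a constant and is not watched, so f has no recorded path back to x
    f = x ** 2

# Ask for a zero tensor shaped like x instead of None for the unconnected source
dfdx = grad.gradient(f, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
print(dfdx)
```

The default, `tf.UnconnectedGradients.NONE`, is what produced `None` above.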

```python
x = tf.Variable([2.0])

with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    # x is a trainable tf.Variable, so the tape watches it automatically
    f = x ** 2

print('The df/dx where f=(x)^2: \n', grad.gradient(f, x))
```

```
The df/dx where f=(x)^2: 
 tf.Tensor([4.], shape=(1,), dtype=float32)
```
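This automatic watching applies to trainable variables only. A small sketch, assuming TensorFlow 2.x, showing that a variable created with `trainable=False` behaves like the constant above:

```python
import tensorflow as tf

x = tf.Variable([2.0], trainable=False)

with tf.GradientTape() as grad:
    f = x ** 2

# Non-trainable variables are not watched automatically, so the gradient is None
df_dx = grad.gradient(f, x)
print(df_dx)
```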


To differentiate with respect to a constant tensor, we can ask the tape to track it explicitly with the `.watch()` method:

```python
x = tf.constant([2.0])

with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    # Explicitly record operations on the constant x
    grad.watch(x)
    f = x ** 2

print('The df/dx where f=(x)^2: \n', grad.gradient(f, x))
```

```
The df/dx where f=(x)^2: 
 tf.Tensor([4.], shape=(1,), dtype=float32)
```


```python
x = tf.Variable([2.0])

with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    # Redundant for a trainable variable (it is already watched), but harmless
    grad.watch(x)
    f = x ** 2

print('The df/dx where f=(x)^2: \n', grad.gradient(f, x))
```

```
The df/dx where f=(x)^2: 
 tf.Tensor([4.], shape=(1,), dtype=float32)
```
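Calling `.watch()` on a variable becomes necessary when automatic tracking is switched off with `watch_accessed_variables=False`, for example to limit which variables a tape records. A minimal sketch, assuming TensorFlow 2.x; the names `g_unwatched` and `g_watched` are illustrative:

```python
import tensorflow as tf

x = tf.Variable([2.0])

# Automatic watching disabled: even trainable variables are ignored
with tf.GradientTape(watch_accessed_variables=False) as grad:
    f = x ** 2
g_unwatched = grad.gradient(f, x)  # None, because x was never watched

# Same setting, but x is watched explicitly
with tf.GradientTape(watch_accessed_variables=False) as grad:
    grad.watch(x)
    f = x ** 2
g_watched = grad.gradient(f, x)  # 2 * x = [4.]

print(g_unwatched)
print(g_watched)
```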


## Multi-Variable Function

```python
w = tf.Variable([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=tf.float32, name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
# Use a float32 tensor so the input dtype matches w
x = tf.constant([[1, 2, 3]], dtype=tf.float32)

with tf.GradientTape(persistent=False) as grad:
    y = x @ w + b
    loss = tf.reduce_mean(y ** 2)

# Compute gradients after leaving the context, so the gradient
# computation itself is not recorded on the tape
[dl_dw, dl_db] = grad.gradient(loss, [w, b])

print('Y: ', y)
print('Loss: ', loss)
print('d(Loss)/dw: ', dl_dw)
print('d(Loss)/db: ', dl_db)
```

```
Y:  tf.Tensor([[22. 28.]], shape=(1, 2), dtype=float32)
Loss:  tf.Tensor(634.0, shape=(), dtype=float32)
d(Loss)/dw:  tf.Tensor(
[[22. 28.]
 [44. 56.]
 [66. 84.]], shape=(3, 2), dtype=float32)
d(Loss)/db:  tf.Tensor([22. 28.], shape=(2,), dtype=float32)
```
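These values can be checked by hand. With $N$ elements in `y`, $\text{loss} = \frac{1}{N}\sum_j y_j^2$, so $\partial \text{loss}/\partial y = 2y/N$; the chain rule through $y = xw + b$ then gives $\partial \text{loss}/\partial w = x^\top (2y/N)$ and $\partial \text{loss}/\partial b$ as the sum of $2y/N$ over batch rows. A sketch of that check, assuming TensorFlow 2.x; `manual_dw` and `manual_db` are illustrative names:

```python
import tensorflow as tf

w = tf.Variable([[1., 2.], [3., 4.], [5., 6.]])
b = tf.Variable(tf.zeros(2))
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as grad:
    y = x @ w + b
    loss = tf.reduce_mean(y ** 2)
dl_dw, dl_db = grad.gradient(loss, [w, b])

# Hand-derived gradients: loss = mean(y**2), so dloss/dy = 2 * y / N
n = tf.cast(tf.size(y), tf.float32)
dl_dy = 2.0 * y / n
manual_dw = tf.transpose(x) @ dl_dy       # chain rule through y = x @ w
manual_db = tf.reduce_sum(dl_dy, axis=0)  # b is broadcast over the batch rows

print(manual_dw)
print(manual_db)
```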


## Persistent Gradient

By default, the tape releases the resources it holds as soon as `grad.gradient()` is called, so `gradient()` can be called only once per tape. If we need gradients of several functions of the same variable from one recorded computation, we create the tape with `persistent=True`, which keeps the recorded operations available across multiple `gradient()` calls.

**Note: When using a persistent tape, call `del grad` once you are done so that the garbage collector can release the resources held by the tape.**

```python
x = tf.Variable([2.0])

with tf.GradientTape(persistent=False, watch_accessed_variables=True) as grad:
    f = x ** 2
    h = x ** 3

print('The df/dx where f=(x)^2: \n', grad.gradient(f, x))
# The second call fails because the tape is not persistent
print('The dh/dx where h=(x)^3: \n', grad.gradient(h, x))
```

```
The df/dx where f=(x)^2: 
 tf.Tensor([4.], shape=(1,), dtype=float32)


RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.
```
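A minimal sketch of the same computation with a persistent tape, assuming TensorFlow 2.x: both gradients come from one recorded pass, and the tape is deleted afterwards as the note above recommends. The names `df_dx` and `dh_dx` are illustrative:

```python
import tensorflow as tf

x = tf.Variable([2.0])

# persistent=True keeps the recorded operations after the first gradient() call
with tf.GradientTape(persistent=True) as grad:
    f = x ** 2
    h = x ** 3

df_dx = grad.gradient(f, x)  # 2 * x    -> [4.]
dh_dx = grad.gradient(h, x)  # 3 * x**2 -> [12.]
print('The df/dx where f=(x)^2: \n', df_dx)
print('The dh/dx where h=(x)^3: \n', dh_dx)

# Release the tape's resources once it is no longer needed
del grad
```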