# Gradient Tape Basics

In this notebook, we'll get familiar with `tf.GradientTape`, TensorFlow's built-in API for automatic differentiation.

## Imports

In [None]:
import tensorflow as tf

## Basics of Gradient Tape

Let's explore how we can use [tf.GradientTape()](https://www.tensorflow.org/api_docs/python/tf/GradientTape) to perform automatic differentiation.

In [None]:
# Define a 2x2 array of 1's
x = tf.ones((2,2))

with tf.GradientTape() as t:
    # Ask the tape to track x. x is a constant tensor, so it is not watched automatically
    t.watch(x)

    # Define y as the sum of the elements in x
    y = tf.reduce_sum(x)

    # Let z be the square of y
    z = tf.square(y)

# Get the derivative of z with respect to the input tensor x.
# y = 4, and dz/dx_ij = 2 * y * dy/dx_ij = 2 * 4 * 1 = 8 for every element
dz_dx = t.gradient(z, x)

# Print our result
print(dz_dx)

tf.Tensor(
[[8. 8.]
 [8. 8.]], shape=(2, 2), dtype=float32)
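To see why `watch` is needed, here is a minimal sketch comparing a `tf.constant`, which the tape ignores unless it is watched, with a trainable `tf.Variable`, which the tape watches automatically. The names `c`, `v`, `grad_c`, and `grad_v` are just for illustration.

```python
import tensorflow as tf

c = tf.constant(3.0)  # constants are not watched unless we call watch()
v = tf.Variable(3.0)  # trainable variables are watched automatically

with tf.GradientTape() as t1:
    y_c = c * c
grad_c = t1.gradient(y_c, c)

with tf.GradientTape() as t2:
    y_v = v * v
grad_v = t2.gradient(y_v, v)

print(grad_c)  # None, because c was never watched
print(grad_v)  # 2 * v = 6.0
```

This is why the nested examples later in the notebook can use `tf.Variable` without calling `watch()`.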


### Gradient tape expiration

By default, a `GradientTape` is not persistent (`persistent=False`): the resources it holds are released as soon as `gradient()` is called, so it can compute only one set of gradients. If you need several gradients from the same recorded computation, this default will not suffice. To see this in action, set up a gradient tape and calculate a gradient. Notice how the tape 'expires' after this calculation.

In [None]:
x = tf.constant(3.0)

# Notice that persistent is False by default
with tf.GradientTape() as t:
    t.watch(x)

    # y = x^2
    y = x * x

    # z = y^2
    z = y * y

# z = x^4, so dz/dx = 4 * x^3, which is 108.0 at x = 3
dz_dx = t.gradient(z, x)
print(dz_dx)

tf.Tensor(108.0, shape=(), dtype=float32)


#### Gradient tape has expired

Observe the result when attempting to calculate another gradient after the gradient tape has already been used once.

In [None]:
# Try to compute dy/dx after the gradient tape has expired
try:
    dy_dx = t.gradient(y, x)  # would be 2 * x = 6.0, but the tape has expired
    print(dy_dx)
except RuntimeError as e:
    print("The error message is:")
    print(e)

The error message is:
A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)
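A non-persistent tape still allows one `gradient()` call that returns several gradients at once, as long as they all come from a single target. The minimal sketch below, using illustrative tensors `x1` and `x2`, passes a list of sources and gets back one gradient per source.

```python
import tensorflow as tf

x1 = tf.constant(3.0)
x2 = tf.constant(2.0)

with tf.GradientTape() as t:
    t.watch([x1, x2])  # watch() accepts a list of tensors
    z = x1 * x1 * x2   # z = x1^2 * x2

# A single call with a list of sources returns a list of gradients
dz_dx1, dz_dx2 = t.gradient(z, [x1, x2])
print(dz_dx1)  # 2 * x1 * x2 = 12.0
print(dz_dx2)  # x1^2 = 9.0
```

This does not help when you need gradients of different targets (such as `y` and `z` above) in separate calls; that is what `persistent=True` is for.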


### Make the gradient tape persistent

To ensure that the gradient tape can be used multiple times, set `persistent=True`.

In [None]:
x = tf.constant(3.0)

# Set persistent=True so that we can reuse the tape
with tf.GradientTape(persistent=True) as t:
    t.watch(x)

    # y = x^2
    y = x * x

    # z = y^2
    z = y * y

# z = x^4, so dz/dx = 4 * x^3, which is 108.0 at x = 3
dz_dx = t.gradient(z, x)

print(dz_dx)

tf.Tensor(108.0, shape=(), dtype=float32)


#### Reuse the tape!

Let's calculate a second gradient using this persistent tape.

In [None]:
# We can still compute dy/dx because of the persistent flag.
dy_dx = t.gradient(y, x)
print(dy_dx)

tf.Tensor(6.0, shape=(), dtype=float32)


As we can see, it still works. A persistent tape keeps its resources until the tape object is garbage collected, so delete the reference `t` once it is no longer needed.

In [None]:
# Delete the reference to the tape
del t
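Because a persistent tape can be queried repeatedly, we can also check the chain rule on the example above: `dz/dx` should equal `dz/dy * dy/dx`. This minimal sketch relies on the fact that a tape can return gradients with respect to intermediate tensors (here `y`) computed inside its context.

```python
import tensorflow as tf

x = tf.constant(3.0)

with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x * x  # y = x^2
    z = y * y  # z = y^2

dz_dy = t.gradient(z, y)  # 2 * y = 18.0 (y is an intermediate tensor)
dy_dx = t.gradient(y, x)  # 2 * x = 6.0
dz_dx = t.gradient(z, x)  # 4 * x^3 = 108.0
del t  # release the persistent tape

print(dz_dy * dy_dx)  # 108.0, matching dz_dx
print(dz_dx)
```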

### Nested Gradient Tapes
Now, let's compute a higher-order derivative by nesting `GradientTape`s:

#### Proper Indentation for the First Gradient Calculation
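Ensure that the first gradient calculation of `dy_dx` occurs inside the outer `with` block. The outer tape can only differentiate operations it recorded, so `dy_dx` must be computed while `tape_2` is still recording. (Here `x` is a trainable `tf.Variable`, which tapes watch automatically, so no `watch()` call is needed.)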

In [None]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

    # The first gradient calculation must occur at least within the outer with block
    dy_dx = tape_1.gradient(y, x)
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)


The first gradient calculation can also be placed inside the inner `with` block.

In [None]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        # The first gradient calculation can also be within the inner with block
        dy_dx = tape_1.gradient(y, x)
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
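The same nesting pattern extends to higher orders. As a minimal sketch, a third tape can differentiate `d2y_dx2` once more; each gradient call happens while the next outer tape is still recording. For y = x^3 at x = 1 we expect 3, 6, and 6.

```python
import tensorflow as tf

x = tf.Variable(1.0)

with tf.GradientTape() as tape_3:
    with tf.GradientTape() as tape_2:
        with tf.GradientTape() as tape_1:
            y = x * x * x  # y = x^3
        # Computed while tape_2 and tape_3 are recording
        dy_dx = tape_1.gradient(y, x)    # 3 * x^2 = 3.0
    # Computed while tape_3 is recording
    d2y_dx2 = tape_2.gradient(dy_dx, x)  # 6 * x = 6.0
d3y_dx3 = tape_3.gradient(d2y_dx2, x)    # 6.0

print(dy_dx, d2y_dx2, d3y_dx3)
```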


#### Where not to indent the first gradient calculation
If the first gradient calculation is OUTSIDE the outer `with` block, `tape_2` has already stopped recording, so it never sees the operations that compute `dy_dx` and cannot differentiate through them.

In [None]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

# The first gradient call is outside the outer with block, so tape_2 is no longer recording
dy_dx = tape_1.gradient(y, x)

# tape_2 has no record of how dy_dx was computed, so the gradient output will be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
None


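Notice that `d2y_dx2` is now `None`. This is not because `tape_2` was used up; it simply has no recorded connection between `dy_dx` and `x`. Another thing to note is that setting `persistent=True` for both gradient tapes won't resolve this issue: persistence only lets you call `gradient()` more than once, it does not make a tape record operations performed after its `with` block has exited.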

In [None]:
x = tf.Variable(1.0)

# Setting persistent=True still won't work
with tf.GradientTape(persistent=True) as tape_2:
    # Setting persistent=True still won't work
    with tf.GradientTape(persistent=True) as tape_1:
        y = x * x * x

# The first gradient call is still outside the outer with block, so tape_2 does not record it
dy_dx = tape_1.gradient(y, x)

# The output will be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
None
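To see what `tape_2` actually recorded, the minimal sketch below makes only `tape_2` persistent so it can be queried twice. It did record `y = x^3`, which was computed inside its `with` block, but it has nothing linking `dy_dx` to `x`.

```python
import tensorflow as tf

x = tf.Variable(1.0)

with tf.GradientTape(persistent=True) as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x  # recorded by both tapes

# No tape is recording at this point
dy_dx = tape_1.gradient(y, x)  # 3.0

d2y_dx2 = tape_2.gradient(dy_dx, x)  # None: tape_2 never saw how dy_dx was computed
dy_dx_again = tape_2.gradient(y, x)  # 3.0: tape_2 did record y = x^3
del tape_2

print(d2y_dx2)
print(dy_dx_again)
```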


#### Proper Indentation for the Second Gradient Calculation

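The second gradient calculation, `d2y_dx2`, must come after `dy_dx` has been computed. It can sit at the same indentation level as the `dy_dx` calculation or at any shallower level, including outside both `with` blocks.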

In [None]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)

        # This is acceptable
        d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)


It is also acceptable to compute `d2y_dx2` inside the outer `with` block but outside the inner one.

In [None]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)

    # This is also acceptable
    d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)


It is also acceptable to compute `d2y_dx2` after exiting both `with` blocks.

In [None]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)

# This is also acceptable
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
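These placements can be wrapped in a small helper. The function `second_derivative` below is a hypothetical illustration, not a TensorFlow API. Because it may receive a constant tensor, both tapes call `watch()` explicitly.

```python
import tensorflow as tf

def second_derivative(f, x):
    """Hypothetical helper: returns (f'(x), f''(x)) for a scalar tensor x."""
    with tf.GradientTape() as outer:
        outer.watch(x)  # needed because x may be a constant
        with tf.GradientTape() as inner:
            inner.watch(x)
            y = f(x)
        # Computed while `outer` is still recording
        dy_dx = inner.gradient(y, x)
    d2y_dx2 = outer.gradient(dy_dx, x)
    return dy_dx, d2y_dx2

d1, d2 = second_derivative(lambda x: x * x * x, tf.constant(2.0))
print(d1)  # 3 * x^2 = 12.0
print(d2)  # 6 * x = 12.0
```

Here `outer.gradient` is called after the outer block exits, which corresponds to the last acceptable placement shown above.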
