In [2]:
import tensorflow as tf
import numpy as np

To run autograd we use `tf.GradientTape` as a context manager. The root of the compute graph can be either a plain `tf.Tensor` or a `tf.Variable`. A trainable `tf.Variable` is watched by the tape automatically. A plain `tf.Tensor` is not, so we have to tell the gradient tape explicitly, with `t.watch`, that we want gradients with respect to it.

In [3]:
x = tf.ones(4)  # float32; note tf.constant(np.ones(4)) would give float64 instead
with tf.GradientTape() as t:
    t.watch(x)  # x is a plain tensor, so ask the tape to track it
    y = x + 2
    y2 = y ** 2
    z = 3 * y2

In [4]:
dz_dx = t.gradient(z, x)
dz_dx

<tf.Tensor: id=32, shape=(4,), dtype=float32, numpy=array([18., 18., 18., 18.], dtype=float32)>
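
The value 18 can be checked by hand with the chain rule, using the same names as the code. All operations are elementwise and every element of `x` is 1, so each element of the gradient is:

$$
z = 3y^2,\quad y = x + 2
\quad\Longrightarrow\quad
\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = 6y \cdot 1 = 6(x + 2) = 18 \quad \text{at } x = 1.
$$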

In [5]:
x = tf.Variable(tf.ones(4))  # No need to call t.watch(x) because x is a Variable
with tf.GradientTape() as t:
    y = x + 2
    y2 = y ** 2
    z = 3 * y2

In [5]:
dz_dx = t.gradient(z, x)
dz_dx

<tf.Tensor: id=87, shape=(4,), dtype=float32, numpy=array([18., 18., 18., 18.], dtype=float32)>
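
To see why `t.watch` matters for a plain tensor, here is a minimal sketch that runs the same computation twice, once without watching `x` and once with. The names `unwatched_grad` and `watched_grad` are only for illustration. With the default settings of `t.gradient`, the unwatched case returns `None` rather than raising an error.

```python
import tensorflow as tf

x = tf.ones(4)  # plain tensor, not a tf.Variable

# Without t.watch(x) the tape does not record operations on x.
with tf.GradientTape() as t:
    z = 3 * (x + 2) ** 2
unwatched_grad = t.gradient(z, x)  # no recorded path from x to z

# With t.watch(x) the same computation is recorded.
with tf.GradientTape() as t:
    t.watch(x)
    z = 3 * (x + 2) ** 2
watched_grad = t.gradient(z, x)

print(unwatched_grad)  # None
print(watched_grad)    # a tensor of four 18s
```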

We can calculate the gradients between any two nodes in the graph, not just between the output and the root. For example, we can calculate $\frac{dz}{dy}$ or $\frac{dy}{dx}$.

In [6]:
x = tf.ones(4)
with tf.GradientTape() as t:
    t.watch(x)
    y = x + 2
    y2 = y ** 2
    z = 3 * y2

In [6]:
dz_dy = t.gradient(z, y)
dz_dy

<tf.Tensor: id=87, shape=(4,), dtype=float32, numpy=array([18., 18., 18., 18.], dtype=float32)>

In [7]:
x = tf.ones(4)
with tf.GradientTape() as t:
    t.watch(x)
    y = x + 2
    y2 = y ** 2
    z = 3 * y2

dy_dx = t.gradient(y, x)
dy_dx

<tf.Tensor: id=113, shape=(4,), dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>

But by default a gradient tape is single-use: once `t.gradient` has been called on it, it cannot be used again.

In [8]:
x = tf.ones(4)
with tf.GradientTape() as t:
    t.watch(x)
    y = x + 2
    y2 = y ** 2
    z = 3 * y2
    
dy_dx = t.gradient(y, x)
print(dy_dx)

try:
    dz_dx = t.gradient(z, x)
except RuntimeError as re:
    print(re)

tf.Tensor([1. 1. 1. 1.], shape=(4,), dtype=float32)
GradientTape.gradient can only be called once on non-persistent tapes.
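
When the gradients needed all share the same target, one way to stay within the single call is to pass a list of sources to `t.gradient`. The sketch below asks for $\frac{dz}{dx}$ and $\frac{dz}{dy}$ together. It does not help when the targets differ, as with `y` and `z` above.

```python
import tensorflow as tf

x = tf.ones(4)
with tf.GradientTape() as t:
    t.watch(x)
    y = x + 2
    z = 3 * y ** 2

# One call, two sources: the result is a list in the same order as the sources.
dz_dx, dz_dy = t.gradient(z, [x, y])
print(dz_dx)
print(dz_dy)
```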


If needed, we can create a reusable gradient tape by passing `persistent=True`. The gradients themselves do not change between calls, i.e., once I have calculated $\frac{dy}{dx}$, calculating it again is going to give me the same value.

In [9]:
x = tf.ones(4)
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x + 2
    y2 = y ** 2
    z = 3 * y2
    
dy_dx = t.gradient(y, x)
print(dy_dx)

dz_dx = t.gradient(z, x)
print(dz_dx)

dy_dx = t.gradient(y, x)
print(dy_dx)

tf.Tensor([1. 1. 1. 1.], shape=(4,), dtype=float32)
tf.Tensor([18. 18. 18. 18.], shape=(4,), dtype=float32)
tf.Tensor([1. 1. 1. 1.], shape=(4,), dtype=float32)


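To calculate second-order derivatives we have to nest two gradient tapes. The inner tape computes the first derivative, and the outer tape, which records that computation, differentiates it again.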

In [11]:
x = tf.ones(4)
with tf.GradientTape() as t0:
    t0.watch(x)
    with tf.GradientTape() as t1:
        t1.watch(x)
        y = x * x * x
    dy_dx = t1.gradient(y, x)

In [12]:
d2y_dx2 = t0.gradient(dy_dx, x)
d2y_dx2

<tf.Tensor: id=210, shape=(4,), dtype=float32, numpy=array([6., 6., 6., 6.], dtype=float32)>

In [13]:
dy_dx

<tf.Tensor: id=195, shape=(4,), dtype=float32, numpy=array([3., 3., 3., 3.], dtype=float32)>
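
Both results match what we get by differentiating by hand, with every element of `x` equal to 1:

$$
y = x^3
\quad\Longrightarrow\quad
\frac{dy}{dx} = 3x^2 = 3,
\qquad
\frac{d^2y}{dx^2} = 6x = 6 \quad \text{at } x = 1.
$$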

Use the `tf.function` decorator for better performance. For development this does not seem to be needed. I'll look at this later when I get to the stage where I am deploying my models in prod or a prod-like environment.

In [4]:
@tf.function
def square(v):
    return v*v

In [5]:
x = tf.Variable(1.)
with tf.GradientTape() as t:
    y = square(x)

In [6]:
t.gradient(y, x)

<tf.Tensor: id=32, shape=(), dtype=float32, numpy=2.0>
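
As a quick sanity check, the sketch below compares the gradient through a plain Python function with the gradient through its `tf.function`-decorated twin. The names `square_eager` and `square_graph` are made up for this example. It only checks that the results agree and says nothing about performance.

```python
import tensorflow as tf

def square_eager(v):
    # Runs op by op in eager mode.
    return v * v

@tf.function
def square_graph(v):
    # Traced into a graph on first call.
    return v * v

x = tf.Variable(3.0)

with tf.GradientTape() as t:
    y_eager = square_eager(x)
grad_eager = t.gradient(y_eager, x)

with tf.GradientTape() as t:
    y_graph = square_graph(x)
grad_graph = t.gradient(y_graph, x)

# d(x^2)/dx = 2x, so both should be 6.0 at x = 3.
print(grad_eager.numpy(), grad_graph.numpy())
```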