# Taking derivatives with TensorFlow

In this short notebook, we'll look at some examples of how to take unusual derivatives with TensorFlow. Some of this is useful in the "real" notebooks, and some of it is irrelevant to the AI4NP school but may be useful to you elsewhere.

Here's an example taken from TensorFlow's Gradient Tape examples:

In [12]:
import tensorflow as tf
import numpy

In [4]:
x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x)
print(dy_dx)


tf.Tensor(6.0, shape=(), dtype=float32)


NOTE: We had to mark explicitly, with `g.watch(x)`, that we wanted a derivative with respect to x! That is because, by default, a gradient tape only tracks trainable tf.Variable objects.

NN weights are typically tf.Variables, and most users don't want the gradient with respect to the input. In our applications, though, we absolutely want that.
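As a minimal sketch of that situation (not part of the original notebook), the block below takes gradients with respect to both a weight and an input in one pass. The names `w` and `x` and their values are hypothetical stand-ins for a network weight and a network input.

```python
import tensorflow as tf

w = tf.Variable([[1.0], [2.0]])   # stand-in for a NN weight (hypothetical values)
x = tf.constant([[3.0, 4.0]])     # stand-in for a network input

with tf.GradientTape() as g:
    g.watch(x)                    # inputs are plain tensors, so they must be watched
    y = tf.reduce_sum(tf.matmul(x, w))

# Passing a list of sources returns one gradient per source.
dy_dw, dy_dx = g.gradient(y, [w, x])
print(dy_dw.numpy())  # gradient with respect to the weight, same shape as w
print(dy_dx.numpy())  # gradient with respect to the input, same shape as x
```

The weight needs no `watch` call because it is a trainable tf.Variable; only the input does.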

In [5]:
x = tf.constant(3.0)
with tf.GradientTape() as g:
  y = x * x
dy_dx = g.gradient(y, x)
print(dy_dx)

None


Without watching `x`, the gradient is `None`.

In [6]:
x = tf.Variable(3.0, trainable=True)
with tf.GradientTape() as g:
  y = x * x
dy_dx = g.gradient(y, x)
print(dy_dx)

tf.Tensor(6.0, shape=(), dtype=float32)


If x is a tf.Variable, we're all good again: trainable variables are watched automatically.
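The automatic watching applies to trainable variables only. The sketch below, which is an addition and not from the original notebook, assumes a variable created with `trainable=False` and shows that it behaves like a constant until it is watched explicitly.

```python
import tensorflow as tf

# A variable created with trainable=False is not watched automatically.
x = tf.Variable(3.0, trainable=False)

with tf.GradientTape() as g:
    y = x * x
grad_unwatched = g.gradient(y, x)   # the tape never recorded x

with tf.GradientTape() as g:
    g.watch(x)                      # explicit watch, just as for tf.constant
    y = x * x
grad_watched = g.gradient(y, x)

print(grad_unwatched)
print(grad_watched)
```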

# Multiple Tapes

This example is from [Advanced Automatic Differentiation](https://www.tensorflow.org/guide/advanced_autodiff).

In [7]:
x0 = tf.constant(0.0)
x1 = tf.constant(0.0)

with tf.GradientTape() as tape0, tf.GradientTape() as tape1:
  tape0.watch(x0)
  tape1.watch(x1)

  y0 = tf.math.sin(x0)
  y1 = tf.nn.sigmoid(x1)

  y = y0 + y1

  ys = tf.reduce_sum(y)


In [8]:
tape0.gradient(ys, x0).numpy()   # cos(x0) => 1.0


1.0

In [9]:
tape1.gradient(ys, x1).numpy()   # sigmoid(x1)*(1-sigmoid(x1)) => 0.25


0.25
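The two tapes above record side by side. Tapes can also be nested, which is one way to obtain a second derivative: the outer tape records the gradient computation performed by the inner one. The following is a minimal sketch under that assumption, using a cubic chosen only for illustration (it is not part of the original notebook).

```python
import tensorflow as tf

x = tf.Variable(1.0)  # trainable variables are watched automatically

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = x * x * x
    # Taking the first derivative inside the outer tape's context
    # lets the outer tape record that computation as well.
    dy_dx = inner.gradient(y, x)      # analytically 3 * x**2
d2y_dx2 = outer.gradient(dy_dx, x)    # analytically 6 * x

print(dy_dx.numpy(), d2y_dx2.numpy())
```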

# TF Functions

Are functions wrapped with `tf.function` really faster? Let's find out.

In [39]:
def poorly_written_function(x, y):
    x = x + 3
    x = tf.exp(x)
    x = x - 7
    y = y**2
    y = tf.math.tanh(y)
    y = y -1
    
    return x + y

In [21]:
x = tf.convert_to_tensor(numpy.arange(0, 100., 0.01))
y = tf.convert_to_tensor(numpy.arange(-50.,50, 0.01))

In [22]:
z = poorly_written_function(x, y)

In [23]:
import timeit

t = timeit.Timer(lambda: poorly_written_function(x,y))  
print(t.timeit(5))

0.005386841017752886


In [25]:
def better_written_function(x,y):
    # Same arithmetic as poorly_written_function, with fewer separate steps
    x = tf.exp(x + 3.) - 7.
    y = tf.math.tanh(y**2) - 1.

    return x + y

In [26]:
import timeit

t = timeit.Timer(lambda: better_written_function(x,y))  
print(t.timeit(5))

0.004837962798774242


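Things went a little faster here, most likely because fewer separate operations are dispatched from Python. With only 5 repetitions the difference is small, so treat it as indicative rather than conclusive.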

In [31]:
traced_function = tf.function(poorly_written_function)
traced_function(x,y) # the very first call does the trace and graph compilation

<tf.Tensor: shape=(10000,), dtype=float64, numpy=
array([1.30855369e+01, 1.32873999e+01, 1.34912917e+01, ...,
       5.23965632e+44, 5.29231574e+44, 5.34550440e+44])>

In [32]:
import timeit

t = timeit.Timer(lambda: traced_function(x,y))  
print(t.timeit(5))

0.0027769929729402065


Even faster this time!
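Timings over 5 calls are noisy, so a somewhat more careful comparison may be worthwhile. The sketch below is an addition with hypothetical names (`eager_function`, `traced`): it checks that the eager and traced versions agree numerically, keeps the tracing call outside the timed region, and takes the minimum over several batches. The actual numbers depend on the machine, so none are quoted here.

```python
import timeit

import numpy
import tensorflow as tf

def eager_function(x, y):
    # Same arithmetic as poorly_written_function, written compactly.
    return tf.exp(x + 3.0) - 7.0 + tf.math.tanh(y**2) - 1.0

traced = tf.function(eager_function)

x = tf.convert_to_tensor(numpy.arange(0, 100.0, 0.01))
y = tf.convert_to_tensor(numpy.arange(-50.0, 50, 0.01))

traced(x, y)  # warm-up call: tracing happens here, outside the timed region

# Both versions should agree numerically before their speed is compared.
same = numpy.allclose(eager_function(x, y).numpy(), traced(x, y).numpy())

# 5 batches of 100 calls each; the minimum is the least noisy estimate.
eager_time = min(timeit.repeat(lambda: eager_function(x, y), number=100, repeat=5))
traced_time = min(timeit.repeat(lambda: traced(x, y), number=100, repeat=5))

print(same, eager_time, traced_time)
```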

Even for this (useless) function, we can compute derivatives:

In [36]:
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    tape.watch(y)
    z = traced_function(x,y)


In [37]:
dzdx = tape.gradient(z, x)
print(dzdx)

tf.Tensor(
[2.00855369e+01 2.02873999e+01 2.04912917e+01 ... 5.23965632e+44
 5.29231574e+44 5.34550440e+44], shape=(10000,), dtype=float64)


In [38]:
dzdy = tape.gradient(z, y)
print(dzdy)

tf.Tensor([0. 0. 0. ... 0. 0. 0.], shape=(10000,), dtype=float64)


Is it right?  Well:

$z = \tanh(y^2) - 1 + e^{x+3} - 7$

$\frac{dz}{dx} = \frac{d}{dx} e^{x+3} = e^3 \frac{d}{dx} e^x = e^3 e^x$

In [43]:
numpy.exp(3.) * tf.exp(x) - dzdx

<tf.Tensor: shape=(10000,), dtype=float64, numpy=
array([0.00000000e+00, 3.55271368e-15, 0.00000000e+00, ...,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00])>

Looks pretty good! The differences are at the level of floating-point round-off.
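The same kind of check can be made for the y derivative, where the chain rule gives $\frac{dz}{dy} = 2y\,(1 - \tanh^2(y^2))$. The sketch below is an addition with a hypothetical helper `z_of` that restates the formula for $z$. It also suggests why the printed `dzdy` shows only zeros: the printed entries sit near $|y| = 50$, where tanh is saturated, while the gradient is nonzero closer to $y = 0$.

```python
import numpy
import tensorflow as tf

def z_of(x, y):
    # Same formula as in the text: z = tanh(y^2) - 1 + exp(x + 3) - 7
    return tf.math.tanh(y**2) - 1.0 + tf.exp(x + 3.0) - 7.0

x = tf.convert_to_tensor(numpy.arange(0, 100.0, 0.01))
y = tf.convert_to_tensor(numpy.arange(-50.0, 50, 0.01))

with tf.GradientTape() as tape:
    tape.watch([x, y])
    z = z_of(x, y)

# One call with a list of sources returns one gradient per source,
# so a non-persistent tape is enough here.
dzdx, dzdy = tape.gradient(z, [x, y])

# Analytic result: dz/dy = 2 * y * (1 - tanh(y^2)^2)
expected = 2.0 * y * (1.0 - tf.math.tanh(y**2) ** 2)
max_abs_diff = tf.reduce_max(tf.abs(dzdy - expected)).numpy()
largest_gradient = tf.reduce_max(tf.abs(dzdy)).numpy()

print(max_abs_diff, largest_gradient)
```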