In [1]:
import tensorflow as tf
tf.__version__

'2.0.0'

In [2]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

In [11]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)

with tf.GradientTape() as tape:
    z = f(w1, w2)
    

In [12]:
gradients = tape.gradient(z, [w1, w2])

    We first define two variables, w1 and w2, then create a tf.GradientTape context
    that automatically records every operation involving a variable. Finally, we ask
    the tape to compute the gradients of the result z with respect to both variables, [w1, w2].

In [14]:
gradients

[<tf.Tensor: id=114, shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: id=106, shape=(), dtype=float32, numpy=10.0>]
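
    The two values can be checked by hand: for z = 3*w1**2 + 2*w1*w2, the partial
    derivatives are dz/dw1 = 6*w1 + 2*w2 and dz/dw2 = 2*w1. The sketch below is plain
    Python with no TensorFlow; the helper names dz_dw1_exact and dz_dw2_exact and the
    step size eps are illustrative choices, not part of the notebook. It compares the
    analytic derivatives with a central finite-difference estimate at w1 = 5, w2 = 3.

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

# Analytic partial derivatives of f (hypothetical helper names).
def dz_dw1_exact(w1, w2):
    return 6 * w1 + 2 * w2

def dz_dw2_exact(w1, w2):
    return 2 * w1

w1, w2 = 5.0, 3.0
eps = 1e-6  # assumed step size for the numerical estimate

# Central differences: perturb one input at a time and measure the change in f.
numeric_dw1 = (f(w1 + eps, w2) - f(w1 - eps, w2)) / (2 * eps)
numeric_dw2 = (f(w1, w2 + eps) - f(w1, w2 - eps)) / (2 * eps)

print(dz_dw1_exact(w1, w2), dz_dw2_exact(w1, w2))  # 36.0 10.0
print(numeric_dw1, numeric_dw2)                    # approximately 36 and 10
```

    Note that the finite-difference check needs two extra evaluations of f per input,
    whereas the tape produces both gradients from one recorded forward pass.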

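The gradient() method goes through the recorded computations only
once (in reverse order), no matter how many variables there are. This is reverse-mode
automatic differentiation: a single backward pass yields the gradients for all variables,
so the cost grows with the number of recorded operations rather than with the number of
variables.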
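
To make the single reverse pass concrete, here is a minimal sketch of reverse-mode
differentiation in plain Python. It is an illustration only and not how TensorFlow
implements GradientTape; the Node class and the backward() function are hypothetical
names introduced for this example. Each Node stores its value and the local derivative
with respect to each parent, and backward() walks the recorded graph once in reverse.

```python
class Node:
    """A scalar that remembers which nodes it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        other = _wrap(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = _wrap(other)
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, exponent):
        # Only plain numeric exponents are supported in this sketch.
        local = exponent * self.value ** (exponent - 1)
        return Node(self.value ** exponent, ((self, local),))


def _wrap(x):
    return x if isinstance(x, Node) else Node(float(x))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad with one reverse sweep."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before the nodes that use them

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # visit each recorded node exactly once
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule


def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2


w1, w2 = Node(5.0), Node(3.0)
z = f(w1, w2)
backward(z)
print(z.value, w1.grad, w2.grad)  # 105.0 36.0 10.0
```

Both gradients come out of the same sweep: adding more input variables adds no extra
passes, only more nodes to visit.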

In [15]:
with tf.GradientTape() as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)

In [17]:
dz_dw1

<tf.Tensor: id=138, shape=(), dtype=float32, numpy=36.0>

In [18]:
dz_dw2 = tape.gradient(z, w2)  # raises RuntimeError: the tape was released by the first gradient() call

RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.

In [19]:
# Make the tape persistent so that gradient() can be called more than once


with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)


In [20]:
dz_dw1 = tape.gradient(z, w1)
dz_dw2 = tape.gradient(z, w2)
del tape  # release the resources held by the persistent tape once you are done

In [21]:
dz_dw1, dz_dw2

(<tf.Tensor: id=162, shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: id=167, shape=(), dtype=float32, numpy=10.0>)


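    By default, the tape tracks only operations involving variables, so if you try to compute the gradient of z
    with respect to anything other than a Variable, the result will be None. To track other tensors, such as
    constants, call tape.watch() on them inside the context, as the next cell does.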

In [22]:
c1, c2 = tf.constant(5.), tf.constant(3.)

with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)
    

In [23]:
gradients = tape.gradient(z, [c1, c2])

In [24]:
gradients

[<tf.Tensor: id=197, shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: id=189, shape=(), dtype=float32, numpy=10.0>]
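
    For contrast, the sketch below repeats the computation without calling tape.watch().
    It assumes TensorFlow 2.x is installed and redefines f so the cell is self-contained;
    the name unwatched is an illustrative choice. Because the tape never recorded the
    constants, there is no path from z back to them and each gradient is None.

```python
import tensorflow as tf

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

c1, c2 = tf.constant(5.), tf.constant(3.)

# No tape.watch() calls here: constants are not tracked by default.
with tf.GradientTape() as tape:
    z = f(c1, c2)

unwatched = tape.gradient(z, [c1, c2])
print(unwatched)  # [None, None]
```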