<h1><b>Automatic Differentiation and Gradients</b></h1>
<br>Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks.
<br>This notebook will show you ways to compute gradients with TensorFlow.

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

TensorFlow provides the <code>tf.GradientTape</code> API for automatic differentiation, that is, computing the gradient of a computation with respect to some inputs, usually <code>tf.Variable</code>s. TensorFlow "records" the relevant operations executed inside the context of a <code>tf.GradientTape</code> onto a "tape", and then uses that tape to compute the gradients of the recorded computation using reverse-mode differentiation.
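The tape only records operations on tensors it is watching. Trainable <code>tf.Variable</code>s are watched automatically, but a plain tensor such as a <code>tf.constant</code> has to be registered explicitly with <code>tape.watch()</code>. A minimal sketch of the difference:

```python
import tensorflow as tf

x = tf.constant(3.0)  # a plain tensor, not a tf.Variable

# Without watch(): the tape does not track x, so there is no gradient to return.
with tf.GradientTape() as tape:
    y = x ** 2
grad_unwatched = tape.gradient(y, x)
print(grad_unwatched)  # None

# With watch(): operations on x are recorded, so the gradient can be computed.
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2
grad_watched = tape.gradient(y, x)
print(grad_watched.numpy())  # 6.0
```

A non-persistent tape can answer only one <code>gradient()</code> call, which is why each case above uses its own tape.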

In [8]:
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2

Use <code>GradientTape.gradient(target, sources)</code> to calculate the gradient of some target (often a loss) with respect to some sources (often the model's variables):

In [9]:
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())

6.0
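As a sanity check, the derivative of y = x**2 is 2x, which equals 6 at x = 3. A sketch comparing the tape's result with a central finite-difference estimate; the step size <code>h</code> is an arbitrary choice for illustration:

```python
import tensorflow as tf

# Gradient from the tape, as in the cells above.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
autodiff = tape.gradient(y, x).numpy()

# Central finite difference: (f(x + h) - f(x - h)) / (2h), with f(v) = v**2.
h = 1e-3
f = lambda v: v ** 2
numeric = (f(3.0 + h) - f(3.0 - h)) / (2 * h)

print(autodiff, round(numeric, 6))  # 6.0 6.0
```

Finite differences are only approximate in general (they happen to be exact for a quadratic), which is why automatic differentiation is preferred for actual training.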


The above example uses scalars, but <code>tf.GradientTape</code> works just as easily on tensors of any shape:

In [10]:
w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]
print(w)
print(b)

# persistent=True allows tape.gradient() to be called more than once on this tape.
with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y**2)


<tf.Variable 'w:0' shape=(3, 2) dtype=float32, numpy=
array([[-0.19012986, -0.64281493],
       [ 1.9111147 , -0.10302108],
       [ 1.3378382 ,  1.6591442 ]], dtype=float32)>
<tf.Variable 'b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>
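The cell above records the computation but does not yet ask for any gradients. A sketch of the follow-up step, re-creating the same variables so it runs on its own. Passing a list of sources returns a list of gradients in the same order, and because the tape is persistent it can be queried more than once (here also for the intermediate <code>y</code>). The closing assertions compare against gradients derived by hand for loss = mean(y**2):

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y ** 2)

# A list of sources yields a list of gradients, one per source, same order.
dl_dw, dl_db = tape.gradient(loss, [w, b])
# The persistent tape can be queried again, here for an intermediate tensor.
dl_dy = tape.gradient(loss, y)
del tape  # release the resources held by the persistent tape

print(dl_dw.shape, dl_db.shape)  # (3, 2) (2,)

# Hand-derived check: loss averages 2 squared entries, so dloss/dy = y,
# dloss/db = y (batch of one), and dloss/dw = x^T @ y.
y_np = y.numpy()
np.testing.assert_allclose(dl_dy.numpy(), y_np, rtol=1e-5, atol=1e-5)
np.testing.assert_allclose(dl_db.numpy(), y_np[0], rtol=1e-5, atol=1e-5)
np.testing.assert_allclose(dl_dw.numpy(), np.array(x).T @ y_np, rtol=1e-5, atol=1e-5)
```

Each gradient has the same shape as its source, which is what makes it possible to update a variable directly with its gradient.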


<h1><b>Gradients with respect to a model</b></h1>
<br>
It's common to collect <code>tf.Variable</code>s into a <code>tf.Module</code> or one of its subclasses (<code>tf.keras.layers.Layer</code>, <code>tf.keras.Model</code>) for checkpointing and exporting.
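A minimal sketch of how a <code>tf.Module</code> subclass tracks the variables assigned to it as attributes. <code>TinyLinear</code> and its attribute names are hypothetical, chosen only for illustration:

```python
import tensorflow as tf

class TinyLinear(tf.Module):
    """Hypothetical minimal module computing y = x @ w + b."""

    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        # Variables assigned as attributes are tracked by tf.Module automatically.
        self.w = tf.Variable(tf.random.normal((in_features, out_features)), name='w')
        self.b = tf.Variable(tf.zeros(out_features), name='b')
        # A non-trainable variable is tracked, but excluded from trainable_variables.
        self.steps = tf.Variable(0, trainable=False, name='steps')

    def __call__(self, x):
        return x @ self.w + self.b

model = TinyLinear(3, 2)
print([v.name for v in model.variables])            # all tracked variables
print([v.name for v in model.trainable_variables])  # only w and b
```

Because tracking follows attributes, nested modules contribute their variables too, so a single <code>trainable_variables</code> call can cover a whole model.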

<br>
In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of <code>tf.Module</code> aggregate their variables in the <code>Module.trainable_variables</code> property, you can calculate these gradients in a few lines of code:

In [12]:
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1.,2.,3.,]])

with tf.GradientTape() as tape:
    y = layer(x)
    loss = tf.reduce_mean(y**2)

grad = tape.gradient(loss, layer.trainable_variables)

In [13]:
for var, g in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {g.shape}')

dense_1/kernel:0, shape: (3, 2)
dense_1/bias:0, shape: (2,)
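Because <code>tape.gradient</code> returns the gradients in the same order as the sources, the <code>zip</code> above pairs each variable with its own gradient, and a training step relies on the same pairing. The sketch below applies plain gradient descent by hand. To stay independent of the installed Keras version, it replaces the <code>Dense</code> layer with two plain variables computing relu(x @ kernel + bias); the learning rate, seed, and step count are arbitrary assumptions:

```python
import tensorflow as tf

tf.random.set_seed(0)
# Plain variables standing in for Dense(2, activation='relu') (an assumption).
kernel = tf.Variable(tf.random.normal((3, 2)), name='kernel')
bias = tf.Variable(tf.zeros(2), name='bias')
x = tf.constant([[1., 2., 3.]])
learning_rate = 0.01  # hypothetical step size

def compute_loss():
    y = tf.nn.relu(x @ kernel + bias)
    return tf.reduce_mean(y ** 2)

losses = []
for step in range(5):
    with tf.GradientTape() as tape:
        loss = compute_loss()
    grads = tape.gradient(loss, [kernel, bias])
    # One gradient-descent update per variable: var <- var - learning_rate * grad.
    for var, g in zip([kernel, bias], grads):
        var.assign_sub(learning_rate * g)
    losses.append(float(loss))

print(losses)  # non-increasing; stays at 0.0 if both ReLU units start inactive
```

In practice, a Keras optimizer's <code>apply_gradients</code> method performs this kind of update from the same (gradient, variable) pairs.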
