<a href="https://colab.research.google.com/github/Ayikanying-ux/Getting_started_-with_deep_learning/blob/main/automatic_differentiation_and_gradients.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automatic Differentiation and Gradients
Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

### Computing gradients
To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.
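
The record-then-replay idea can be sketched in plain Python. This is only a toy illustration for scalars, not how TensorFlow is implemented; every name in it (`ToyTape`, `Node`, `mul`, `add`, `toy_gradient`) is hypothetical.

```python
# Toy reverse-mode autodiff for scalars. Hypothetical names throughout;
# this is an illustrative sketch, not TensorFlow's implementation.

class ToyTape:
    def __init__(self):
        # Each entry: (output node, [(input node, d(output)/d(input)), ...]),
        # appended in the order the operations run (the forward pass).
        self.ops = []

class Node:
    def __init__(self, value):
        self.value = value

def mul(tape, a, b):
    out = Node(a.value * b.value)
    tape.ops.append((out, [(a, b.value), (b, a.value)]))
    return out

def add(tape, a, b):
    out = Node(a.value + b.value)
    tape.ops.append((out, [(a, 1.0), (b, 1.0)]))
    return out

def toy_gradient(tape, target, source):
    # Backward pass: walk the recorded operations in reverse order and
    # accumulate the chain rule contributions for every input.
    grads = {id(target): 1.0}
    for out, inputs in reversed(tape.ops):
        upstream = grads.get(id(out), 0.0)
        for inp, local_grad in inputs:
            grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local_grad
    return grads.get(id(source), 0.0)

demo_tape = ToyTape()
x_node = Node(3.0)
y_node = mul(demo_tape, x_node, x_node)   # y = x**2
z_node = add(demo_tape, y_node, x_node)   # z = x**2 + x
dz_dx = toy_gradient(demo_tape, z_node, x_node)
print(dz_dx)  # 2*3 + 1 = 7.0
```

The forward pass only appends to a list; all derivative work happens later, when the list is walked backwards from the target.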

### Gradient tapes
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.

In [3]:
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
  y = x**2



---
Once you've recorded some operations, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some source (often the model's variables).

In [4]:
dy_dx = tape.gradient(y, x)
dy_dx.numpy()

6.0
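
As a quick sanity check, the result above agrees with the derivative worked out by hand:

$$
y = x^2 \quad\Rightarrow\quad \frac{dy}{dx} = 2x, \qquad \left.\frac{dy}{dx}\right|_{x=3} = 6
$$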



---
The above example uses scalars, but tf.GradientTape works as easily on any tensor:


In [7]:
w = tf.Variable(tf.random.normal((3, 2)), name="w")
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

with tf.GradientTape(persistent=True) as tape:
  y = x @ w + b
  loss = tf.reduce_mean(y**2)



---
To get the gradient of loss with respect to both variables, you can pass both as sources to the gradient method. The tape is flexible about how sources are passed and will accept any nested combination of lists or dictionaries and return the gradient structured the same way (see tf.nest). Because the tape above was created with persistent=True, the gradient method can be called on it more than once, as the next cells do.


In [8]:
[dl_dw, dl_db] = tape.gradient(loss, [w, b])

In [9]:
print(w.shape)
print(dl_dw.shape)

(3, 2)
(3, 2)
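
The gradient with respect to each source has the same shape as the source. The sketch below reproduces this in NumPy, assuming the same computation as the cell above (`y = x @ w + b`, `loss = mean(y**2)`), and checks the hand-derived gradients against finite differences. The `_np` names and the `numeric_grad` helper are made up for this illustration, and the weights are a fresh random draw, so the numbers will not match the TensorFlow cells.

```python
import numpy as np

rng = np.random.default_rng(0)
w_np = rng.normal(size=(3, 2))
b_np = np.zeros(2)
x_np = np.array([[1.0, 2.0, 3.0]])

def loss_np():
    y = x_np @ w_np + b_np
    return np.mean(y ** 2)

# Hand-derived gradients via the chain rule:
# dl/dy = 2*y / N, dl/dw = x^T @ dl/dy, dl/db = sum of dl/dy over the batch.
y_np = x_np @ w_np + b_np
dl_dy = 2.0 * y_np / y_np.size
dl_dw_np = x_np.T @ dl_dy          # same shape as w_np: (3, 2)
dl_db_np = dl_dy.sum(axis=0)       # same shape as b_np: (2,)

def numeric_grad(f, arr, eps=1e-6):
    """Central finite differences of f() with respect to arr (perturbed in place)."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        original = arr[idx]
        arr[idx] = original + eps
        up = f()
        arr[idx] = original - eps
        down = f()
        arr[idx] = original        # restore the entry
        grad[idx] = (up - down) / (2 * eps)
    return grad

num_dw = numeric_grad(loss_np, w_np)
num_db = numeric_grad(loss_np, b_np)

print(dl_dw_np.shape, dl_db_np.shape)
print(np.allclose(dl_dw_np, num_dw, atol=1e-5), np.allclose(dl_db_np, num_db, atol=1e-5))
```

Note that with two output elements, `dl_dy` reduces to `y` itself, so the gradient with respect to the bias is simply the layer output.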


In [10]:
my_vars={
    'w': w,
    'b': b
}
grad = tape.gradient(loss, my_vars)
grad['b']

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([3.93863  , 1.7535148], dtype=float32)>

### Gradient with respect to a model
- It's common to collect ```tf.Variables``` into a ```tf.Module``` or one of its subclasses (```layers.Layer```, ```keras.Model```) for checkpointing and exporting.

- In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of ```tf.Module``` aggregate their variables in the ```Module.trainable_variables``` property, you can calculate these gradients in a few lines of code:

In [12]:
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
  # Forward pass
  y = layer(x)
  loss = tf.reduce_mean(y**2)

# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)

In [13]:
for var, g in zip(layer.trainable_variables, grad):
  print(f'{var.name}, shape: {g.shape}')

dense_1/kernel:0, shape: (3, 2)
dense_1/bias:0, shape: (2,)
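
The same pattern should work for a hand-written ```tf.Module```, not only for Keras layers. The following is a minimal sketch: the `Affine` class and the names `affine`, `inputs`, and `module_tape` are hypothetical and exist only for this illustration.

```python
import tensorflow as tf

class Affine(tf.Module):  # hypothetical module, for illustration only
    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        # Variables assigned as attributes are tracked by tf.Module.
        self.w = tf.Variable(tf.random.normal((in_features, out_features)), name="w")
        self.b = tf.Variable(tf.zeros(out_features), name="b")

    def __call__(self, x):
        return x @ self.w + self.b

affine = Affine(3, 2)
inputs = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as module_tape:
    # Forward pass through the custom module
    module_loss = tf.reduce_mean(affine(inputs) ** 2)

# One gradient per trainable variable, in the same order
module_grads = module_tape.gradient(module_loss, affine.trainable_variables)

for var, g in zip(affine.trainable_variables, module_grads):
    print(f'{var.name}, shape: {g.shape}')
```

Each gradient has the same shape as the variable it belongs to, so the pairs from `zip` can be handed directly to an optimizer.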


### Controlling what the tape watches
The default behaviour is to record all operations after accessing a trainable ```tf.Variable```.

The reasons for this are:
- The tape needs to know which operations to record in the forward pass to calculate the gradients in the backward pass.
- The tape holds references to intermediate outputs, so you don't want to record unnecessary operations.
- The most common use case involves calculating the gradient of a loss with respect to all of a model's trainable variables.


In [14]:
# A trainable variable
x0 = tf.Variable(3.0, name='x0')

# Not trainable
x1 = tf.Variable(3.0, name="x1", trainable=False)

# Not a Variable: A variable + tensor returns a tensor
x2 = tf.Variable(2.0, name='x2')+1.0

# Not a Variable
x3 = tf.constant(3.0, name='x3')

with tf.GradientTape() as tape:
  y=(x0**2)+(x1**2)+(x2**2)

grad = tape.gradient(y, [x0, x1, x2, x3])

for g in grad:
  print(g)

tf.Tensor(6.0, shape=(), dtype=float32)
None
None
None




---
You can list the variables being watched by the tape using the GradientTape.watched_variables method:


In [15]:
[var.name for var in tape.watched_variables()]

['x0:0']



---
* ```tf.GradientTape``` provides hooks that give the user control over what is or is not watched.
* To record gradients with respect to a ```tf.Tensor```, you need to call ```GradientTape.watch(x)```:

In [16]:
x=tf.constant(3.0)
with tf.GradientTape() as tape:
  tape.watch(x)
  y = x**2

dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())

6.0




---
Conversely, to disable the default behavior of watching all ```tf.Variables```, set ```watch_accessed_variables=False``` when creating the gradient tape. This calculation uses two variables, but only connects the gradient for one of the variables: