From https://www.tensorflow.org/guide/basics

TensorFlow (TF) is an end-to-end platform for machine learning. It supports the following:

1. Multidimensional-array based numeric computation (similar to NumPy)
2. GPU and distributed processing
3. Automatic differentiation
4. Model construction, training, and export
5. And more

---

Some important points about TF and Keras:

* TensorFlow was developed by the Google Brain team for internal use at Google and was released as open source in November 2015.
* Keras is an API (application programming interface) for deep learning. As an API, Keras defines a specific interface for writing deep learning code.
* Keras is written in Python.
* Keras has historically supported multiple backends (libraries that are responsible for doing the actual calculations): TensorFlow, CNTK, and Theano.
* Keras focuses on easy and fast prototyping through user-friendliness, modularity, and extensibility.
* Although TF can be used as a backend for Keras, it is recommended to use `tf.keras`, which is the implementation of Keras in TF.
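
As a rough illustration of what using `tf.keras` looks like, the sketch below builds a one-layer model and runs a forward pass. The layer size and the input batch are arbitrary choices made for this example only.

```python
import tensorflow as tf

# A tiny tf.keras model: one dense layer mapping 3 inputs to 1 output.
# The sizes are illustrative assumptions, not taken from the guide.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])

x = tf.ones((2, 3))   # a batch of 2 examples with 3 features each
y = model(x)          # forward pass through the model
print(y.shape)        # (2, 1)
```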





---

**Load essential libraries**

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
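# Matplotlib 3.6 renamed the seaborn styles; fall back to the old name on older versions
style = 'seaborn-v0_8-whitegrid'
plt.style.use(style if style in plt.style.available else 'seaborn-whitegrid')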
%matplotlib inline

import tensorflow as tf

---

**Check TensorFlow version**

---

In [None]:
tf.__version__

---

Introduction to tensors from https://www.tensorflow.org/guide/tensor

---

In [None]:
# Some code examples here
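
A minimal sketch of what such examples could look like, using a few basic tensor operations. The values are arbitrary and chosen for illustration; they are not taken from the guide.

```python
import tensorflow as tf

# Tensors are immutable multidimensional arrays with a fixed dtype and shape.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # a 2x2 float32 tensor
b = tf.ones((2, 2))                         # a 2x2 tensor of ones

print(a.shape, a.dtype)           # shape and dtype of the tensor
print((a + b).numpy())            # elementwise addition
print((a @ b).numpy())            # matrix multiplication
print(tf.reduce_sum(a).numpy())   # sum of all elements: 10.0
print(a[0, :].numpy())            # NumPy-style indexing: the first row
```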

---

Introduction to variables from https://www.tensorflow.org/guide/variable

---

In [None]:
# Some code examples here
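
A minimal sketch of what such examples could look like. Unlike a tensor, a `tf.Variable` holds state that can be updated in place. The values and the name `v` are illustrative.

```python
import tensorflow as tf

# A variable wraps a tensor whose value can be changed in place.
v = tf.Variable([1.0, 2.0], name='v')
print(v.shape, v.dtype)

v.assign([3.0, 4.0])       # overwrite the value (shape and dtype must match)
v.assign_add([1.0, 1.0])   # in-place addition
print(v.numpy())           # [4. 5.]

# Arithmetic on a variable returns a plain tensor, not a new variable.
s = v + 1.0
print(isinstance(s, tf.Variable))   # False
```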

---

Automatic differentiation using TF (https://www.tensorflow.org/guide/autodiff)

Example: calculate the sensitivity of $L(w) = 4w+w^3$ w.r.t. the input $w$ at $w=1.$

Sensitivity $\nabla_wL = 4+3w^2,$ which at $w=1$ is equal to $4+3\times1^2=7.$
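
The analytic value can be sanity-checked numerically without TF. The sketch below uses a central finite difference; the helper `central_difference` and the step size `h` are hypothetical names and choices made for this illustration.

```python
def L(w):
    """The function from the example: L(w) = 4w + w^3."""
    return 4 * w + w ** 3

def central_difference(f, w, h=1e-5):
    """Approximate f'(w) as (f(w + h) - f(w - h)) / (2h)."""
    return (f(w + h) - f(w - h)) / (2 * h)

approx = central_difference(L, 1.0)   # numerical estimate at w = 1
exact = 4 + 3 * 1.0 ** 2              # analytic value: 7.0
print(approx, exact)
```

The two numbers should agree to several decimal places, which is a quick way to catch a mistake in a hand-derived gradient.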

---

In [None]:
w = tf.Variable(1.0)

with tf.GradientTape() as g:
    L = 4*w + w**3

gradL_w = g.gradient(L, w)
print('gradient of L w.r.t. w at w = 1 is', gradL_w.numpy())

---

Example: calculate the sensitivity of $L(w_1,w_2) = w_1+w_2^2$ w.r.t. the inputs $w_1, w_2$ at $w_1=1, w_2=2.$

Setting $\mathbf{w} = \begin{bmatrix}w_1\\w_2\end{bmatrix},$ sensitivity $\nabla_\mathbf{w}L= \begin{bmatrix}\nabla_{w_1}(w_1+w_2^2)\\\nabla_{w_2}(w_1+w_2^2)\end{bmatrix} = \begin{bmatrix}1\\2w_2\end{bmatrix},$

which at $w_1=1,w_2=2$ is equal to $\begin{bmatrix}1\\4\end{bmatrix}.$

---

In [None]:
w1 = tf.Variable(1.0)
w2 = tf.Variable(2.0)

with tf.GradientTape() as g:
    L = w1 + w2**2

gradL_w1, gradL_w2 = g.gradient(L, [w1, w2])
print('gradL_w1 = ', gradL_w1.numpy(), '; gradL_w2 = ', gradL_w2.numpy())

---

In TF, we can control which input is considered an independent variable versus a constant value.

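`g.gradient` returns `None` for any source that the tape is not watching (by default, only trainable `tf.Variable`s are watched) or that is not used to compute the target.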
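
One way to see which variables a tape is tracking is its `watched_variables()` method. The sketch below uses illustrative values; only the trainable variable accessed inside the tape should be listed.

```python
import tensorflow as tf

w1 = tf.Variable(1.0, name='w1')                     # trainable: watched automatically
c1 = tf.constant(-2.0)                               # a constant tensor: not watched
c2 = tf.Variable(-1.0, name='c2', trainable=False)   # non-trainable: not watched

with tf.GradientTape() as g:
    L = (w1 + c1)**2 + c2**3

# Names of the variables the tape recorded as differentiable sources
watched = [v.name for v in g.watched_variables()]
print(watched)
```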

---

In [None]:
# Independent variable
w1 = tf.Variable(1.0, name = 'w1')

# A tf.constant is not a variable
c1 = tf.constant(-2.0, name = 'c1')

# Constant because we specify trainable = False
c2 = tf.Variable(-1.0, name = 'c2', trainable = False)

# variable + tensor returns a tensor. So c3 is not a tf.Variable.
c3 = tf.Variable(1.0, name = 'c3') + 1.0

# A variable that is not used to compute L
alpha = tf.Variable(0., name = 'alpha')

with tf.GradientTape() as g:
    L = (w1 + c1)**2 + (c2**3) + 4*c3

grad = g.gradient(L, [w1, c1, c2, c3, alpha])

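# Only w1 is a watched source connected to L: dL/dw1 = 2*(w1 + c1) = -2.
# The remaining four entries are None.
for dw in grad:
    print(dw)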

---

A `tf.Tensor` can be treated as an independent variable by calling the tape's `watch` method on it.

For example, consider calculating the sensitivity of $L(w) = w^4$ at $w=-3.$

The sensitivity is $\nabla_wL = 4w^3,$ which at $w=-3$ is equal to $4\times(-3)^3 = -108.$

---

In [None]:
w = tf.constant(-3.0)

with tf.GradientTape() as g:
    g.watch(w)
    L = w**4

print(g.gradient(L, w))
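
The opposite kind of control is also possible: automatic watching of variables can be switched off with `watch_accessed_variables=False`, so that only explicitly watched sources receive gradients. A minimal sketch, reusing the function $L(w_1,w_2) = w_1+w_2^2$ with illustrative values:

```python
import tensorflow as tf

w1 = tf.Variable(1.0)
w2 = tf.Variable(2.0)

# With automatic watching disabled, only w2 is tracked by the tape.
with tf.GradientTape(watch_accessed_variables=False) as g:
    g.watch(w2)
    L = w1 + w2**2

grad_w1, grad_w2 = g.gradient(L, [w1, w2])
print(grad_w1)           # None, because w1 was not watched
print(grad_w2.numpy())   # 2*w2 = 4.0
```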

---

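The source can also be a variable with several elements. The gradient then has the same shape as the variable and holds the sensitivity w.r.t. each element.

For example, consider calculating the sensitivity of $L(w) = w[0]^2+w[1]^2$ at $w[0] = 1, w[1] = -3.$ The sensitivity is $\nabla_wL = \begin{bmatrix}2w[0]\\2w[1]\end{bmatrix},$ which at this point is equal to $\begin{bmatrix}2\\-6\end{bmatrix}.$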

---

In [None]:
w = tf.Variable([1, -3.0])
with tf.GradientTape() as g:
    L = tf.math.reduce_sum(w**2) # this means L = w[0]^2 + w[1]^2

print(w.numpy())
print(L.numpy()) # w[0]^2 + w[1]^2 = 1 + 9 = 10
print(g.gradient(L, w))  # (2w[0], 2w[1]) = (2,-6)

---

When `g.gradient` is called with a non-scalar target (for example, a list of tensors), it returns the sum of the sensitivities of the individual components of the target.

---

In [None]:
w = tf.Variable(-1.)
with tf.GradientTape() as g:
    L = [2*w, w**4]

print([L[i].numpy() for i in range(2)]) # [-2,1]
print(g.gradient(L, w))  # 2 + 4w^3 = 2 - 4 = -2
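
To see that this value is a sum, the two components can be differentiated on their own tapes. A minimal sketch (the names `g1`, `g2`, `L1`, and `L2` are illustrative):

```python
import tensorflow as tf

w = tf.Variable(-1.0)

# Record each component of the target on a separate tape.
with tf.GradientTape() as g1:
    L1 = 2 * w
with tf.GradientTape() as g2:
    L2 = w**4

grad1 = g1.gradient(L1, w)   # d(2w)/dw  = 2
grad2 = g2.gradient(L2, w)   # d(w^4)/dw = 4w^3 = -4 at w = -1
print(grad1.numpy(), grad2.numpy(), (grad1 + grad2).numpy())   # 2.0 -4.0 -2.0
```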

---

By default, the resources held by a `tf.GradientTape` are released as soon as `g.gradient` is called, which saves memory. To call `g.gradient` more than once on the same tape, for example to differentiate different stages of a chain of functions, the tape must be created with the option `persistent=True`.
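
A short sketch of what happens without that option: the first call to `g.gradient` succeeds, and a second call on the same non-persistent tape raises a `RuntimeError`. The values and the flag `second_call_failed` are illustrative.

```python
import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as g:   # persistent defaults to False
    y = w**2
    z = y**2                   # z = w^4

print(g.gradient(z, w).numpy())   # dz/dw = 4*w^3 = 108.0

second_call_failed = False
try:
    g.gradient(y, w)              # second call on a non-persistent tape
except RuntimeError:
    second_call_failed = True
print('second call failed:', second_call_failed)
```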

---

In [None]:
w1 = tf.Variable([1, -3.0])
with tf.GradientTape(persistent = True) as g:
    w2 = 2*w1
    L = w2**2

print(w1.numpy())
print(w2.numpy())
print(g.gradient(w2, w1))  # [2, 2]
print(g.gradient(L, w2))  # 2*w2 = [4, -12]
del g # release resources

---

We can easily calculate sensitivities of functions and plot them.

---

In [None]:
w = tf.linspace(-10.0, 10.0, 129) # A tf.Tensor, not a tf.Variable

with tf.GradientTape() as g:
    g.watch(w)
    L = tf.math.tanh(w)

gradL_w = g.gradient(L, w)

plt.plot(w, L, label = 'L')
plt.plot(w, gradL_w, label = 'gradL_w')
plt.legend()
plt.xlabel('w');
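
Because each element of `L` depends only on the matching element of `w`, the summed gradient returned by `g.gradient` is the elementwise derivative of tanh. As a sanity check, it can be compared with the closed form $1 - \tanh^2(w)$. This is a sketch, and the tolerance `atol` is an assumption chosen to allow for float32 rounding.

```python
import numpy as np
import tensorflow as tf

w = tf.linspace(-10.0, 10.0, 129)   # a tf.Tensor, so it must be watched

with tf.GradientTape() as g:
    g.watch(w)
    L = tf.math.tanh(w)

gradL_w = g.gradient(L, w)

# Closed-form derivative of tanh, evaluated with NumPy
analytic = 1.0 - np.tanh(w.numpy())**2
matches = np.allclose(gradL_w.numpy(), analytic, atol=1e-5)
print(matches)
```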