
# Tutorial IST597 (Spring 2022): Intro to Eager Execution

# Enabling Eager Execution 
Eager execution is enabled by default in TensorFlow 2.0 and above. On TensorFlow 1.x releases from 1.7 onward (for example, on a server with an outdated TF version), enable it by calling `tf.enable_eager_execution()` once at program startup.

In [1]:
import tensorflow as tf

# tf.enable_eager_execution() # Only use if TF < 2.0

Check if eager execution is enabled or not

In [2]:
print(tf.executing_eagerly())
print(tf.__version__)

True
2.7.0
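
If the same notebook may run on both old and new installations, the version check can be automated. The following is a minimal sketch, assuming any TF 1.x installation is 1.7 or later; the helper name `ensure_eager` is hypothetical and not part of TensorFlow.

```python
import tensorflow as tf

def ensure_eager():
    """Enable eager execution on TF 1.x if needed; report whether it is active."""
    major = int(tf.__version__.split(".")[0])
    # tf.enable_eager_execution only exists in the TF 1.x namespace (assumed 1.7+).
    if major < 2 and hasattr(tf, "enable_eager_execution"):
        tf.enable_eager_execution()
    return tf.executing_eagerly()

eager_on = ensure_eager()
print("eager execution:", eager_on)
```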

# Executing tf Ops Eagerly 
Eager execution is more Pythonic: operations run immediately and return concrete values, so we can inspect the output directly.
There is no need for a `Session` or `sess.run(operation)`.

In [3]:
x = [[2.]]
m = tf.square(x)
print(m)

tf.Tensor([[4.]], shape=(1, 1), dtype=float32)


You can also call `.numpy()` to retrieve the value of a tensor as a NumPy array (useful for people who are familiar with PyTorch or NumPy).

In [4]:
m.numpy()

array([[4.]], dtype=float32)
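
The conversion also works in the other direction. As a small sketch (the array values are made up for illustration), TensorFlow ops accept NumPy arrays directly, and `tf.convert_to_tensor` makes the conversion explicit:

```python
import numpy as np
import tensorflow as tf

arr = np.array([[1.0, 2.0]], dtype=np.float32)

t = tf.convert_to_tensor(arr)      # NumPy array -> tf.Tensor
squared = tf.square(arr)           # ops also accept NumPy input directly
back = squared.numpy()             # tf.Tensor -> NumPy array

print(type(t).__name__, back)
```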

Compute an operation involving two tensors. Here `tf.matmul` computes the matrix product, not the element-wise product.

In [5]:
a = tf.constant([[1, 2],
                 [3, 4]])

b = tf.constant([[2, 1],
                 [3, 4]])

ab = tf.matmul(a, b)

print('a * b = \n', ab.numpy())

a * b = 
 [[ 8  9]
 [18 19]]
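
To make the distinction between the matrix product and the element-wise product concrete, here is a short sketch that reuses the same two matrices. The variable names `matrix_product` and `elementwise` are introduced only for this illustration.

```python
import tensorflow as tf

a = tf.constant([[1, 2],
                 [3, 4]])
b = tf.constant([[2, 1],
                 [3, 4]])

matrix_product = a @ b        # same as tf.matmul(a, b)
elementwise = a * b           # same as tf.multiply(a, b)

print('matmul:\n', matrix_product.numpy())
print('element-wise:\n', elementwise.numpy())
```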


# Constants and Variables [Try to understand the difference between the two]


*   `tf.constant` creates a constant tensor populated with the values given as its argument. The values are immutable.
*   `tf.Variable` encapsulates a mutable tensor whose value can be changed later using `assign`.
(From the official TensorFlow documentation.)


Create a constant tensor 

In [6]:
a = tf.constant([[2,3]])
print(a)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)


As discussed, a constant tensor is immutable, so we cannot assign a new value to it. Let's see an example:

In [7]:
try:
  a.assign([[3,4]])
except AttributeError:
  # Constant tensors have no assign method, so the call fails
  print('Exception raised trying to change immutable tensor ')

Exception raised trying to change immutable tensor 


On the other hand, variables are mutable and can be assigned a new value.

In [8]:
v = tf.Variable(5.)

print('previous value v =', v.numpy())
v.assign(2.)
print('Current value v =', v.numpy())

previous value v = 5.0
Current value v = 2.0


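Increment or decrement the value of a tensor. `tf.math.add` and `tf.math.subtract` return a new tensor and leave `v` unchanged, while `assign_add` and `assign_sub` in the cell after that update the variable in place (which is why its decrement prints 2.0 rather than 1.0).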

In [9]:
v.assign(2.)
print('value     : ', v.numpy())
print('increment : ', tf.math.add(v, 1).numpy())
print('decrement : ', tf.math.subtract(v, 1).numpy())

value     :  2.0
increment :  3.0
decrement :  1.0


In [10]:
v2 = tf.Variable(15.)
v2.assign(2.)
print('value     : ', v2.numpy())
print('increment : ', v2.assign_add(1.).numpy())
print('decrement : ', v2.assign_sub(1.).numpy())

value     :  2.0
increment :  3.0
decrement :  2.0
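
The difference between the two cells above can be checked directly. This is a minimal sketch using a fresh variable; the intermediate names (`result`, `after_add`, `after_assign_add`) are only for illustration.

```python
import tensorflow as tf

v = tf.Variable(2.0)

result = tf.math.add(v, 1.0)     # returns a new tensor
after_add = v.numpy()            # v itself is still 2.0

v.assign_add(1.0)                # updates v in place
after_assign_add = v.numpy()     # v is now 3.0

print(result.numpy(), after_add, after_assign_add)
```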


As with NumPy arrays, you can query a variable's metadata, such as its name, dtype, shape, and the device on which it is placed.

In [11]:
print('name  : ', v.name)
print('type  : ', v.dtype)
print('shape : ', v.shape)
print('device: ', v.device)

name  :  Variable:0
type  :  <dtype: 'float32'>
shape :  ()
device:  /job:localhost/replica:0/task:0/device:GPU:0
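
The device string depends on the hardware available to the runtime. As a hedged sketch, assuming a standard TensorFlow 2.x install, you can list the visible GPUs and pin a variable to the CPU explicitly. The name `v_cpu` is introduced here for illustration.

```python
import tensorflow as tf

# Lists GPUs TensorFlow can see; the list is empty on a CPU-only machine.
gpus = tf.config.list_physical_devices('GPU')
print('GPUs visible to TensorFlow:', len(gpus))

# Variables created inside this context are placed on the CPU.
with tf.device('/CPU:0'):
    v_cpu = tf.Variable(5.)

print('device: ', v_cpu.device)
```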


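# Gradient Evaluation [Important Concept]

Gradient evaluation is central to training deep learning models, because optimization relies on the gradients of the loss with respect to the model parameters. `tf.GradientTape()` records the operations executed inside its context and then uses automatic differentiation to compute exact gradients (rather than numerical approximations) of any differentiable function built from TensorFlow operations.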

In [12]:
w = tf.Variable(2.0)

#watch the gradient of the loss operation
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(f'The gradient of w^2 at {w.numpy()} is {grad.numpy()}')

The gradient of w^2 at 2.0 is 4.0
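
The tape tracks trainable variables automatically, but constants have to be watched explicitly. The sketch below uses made-up values and requests gradients for several inputs at once. For `loss = (w*x + b)^2` the expected gradients are `2(wx+b)x`, `2(wx+b)`, and `2(wx+b)w`.

```python
import tensorflow as tf

w = tf.Variable(3.0)
b = tf.Variable(1.0)
x = tf.constant(2.0)

with tf.GradientTape() as tape:
    tape.watch(x)            # constants are not tracked unless watched
    y = w * x + b            # y = 7
    loss = y * y             # loss = 49

# Passing a list of sources returns a list of gradients in the same order.
dw, db, dx = tape.gradient(loss, [w, b, x])
print(dw.numpy(), db.numpy(), dx.numpy())
```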


We can also use the tape to differentiate a function that we define ourselves. In this example we evaluate the gradient of the sigmoid function

$$f(x) = \frac{1}{1+e^{-x}}$$

Note that 

$$f'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = f(x)(1-f(x)) $$

In [13]:
w = tf.Variable(2.0)
z = 1/(1 + tf.exp(-2.0))
print(z) # Print value of your function
@tf.function
def sigmoid(x):
  return 1/(1 + tf.exp(-x))
print(tf.math.sigmoid(2.0)) # Check with inbuilt function
with tf.GradientTape(persistent=True) as tape:
  sigmoid_value = sigmoid(w)
grad_sigmoid = tape.gradient(sigmoid_value, w)
print('The gradient of the sigmoid function at 2.0 is ', grad_sigmoid.numpy())

tf.Tensor(0.880797, shape=(), dtype=float32)
tf.Tensor(0.880797, shape=(), dtype=float32)
The gradient of the sigmoid function at 2.0 is  0.104993574
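
As a sanity check, the tape's result can be compared with the closed form $f(x)(1-f(x))$ given above. This is a minimal sketch using the built-in `tf.math.sigmoid`; the names `autodiff_grad` and `analytic_grad` are only for illustration.

```python
import tensorflow as tf

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = tf.math.sigmoid(x)

autodiff_grad = tape.gradient(y, x)   # gradient from automatic differentiation
analytic_grad = y * (1 - y)           # closed form f(x)(1 - f(x))

print(autodiff_grad.numpy(), analytic_grad.numpy())
```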


You can also compute higher-order derivatives by nesting gradient tapes. For instance, 

$$f(x) = \log(x), \quad f'(x) = \frac{1}{x}, \quad f''(x) = \frac{-1}{x^2}, \quad f'''(x) = \frac{2}{x^3}$$

In [14]:
x = tf.Variable(1.0)
@tf.function
def log(x):
  return tf.math.log(x)
with tf.GradientTape(persistent=True) as tape3:
  with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape(persistent=True) as tape1:
      dx = log(x)
    dx_log = tape1.gradient(dx, x)
  dx2_log = tape2.gradient(dx_log, x )
dx3_log = tape3.gradient(dx2_log, x)

print('The first  derivative of log at x = 1 is ', dx_log.numpy())
print('The second derivative of log at x = 1 is ', dx2_log.numpy())
print('The third  derivative of log at x = 1 is ', dx3_log.numpy())

The first  derivative of log at x = 1 is  1.0
The second derivative of log at x = 1 is  -1.0
The third  derivative of log at x = 1 is  2.0
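
The same nesting pattern works without `persistent=True` when each tape is used only once. The sketch below is a smaller illustration on $f(x) = x^3$, chosen here because $f'(x) = 3x^2$ and $f''(x) = 6x$ are easy to verify by hand.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = x * x * x
    # Computed inside the outer tape so that it is recorded there.
    dy_dx = inner.gradient(y, x)      # 3x^2
d2y_dx2 = outer.gradient(dy_dx, x)    # 6x

print(dy_dx.numpy(), d2y_dx2.numpy())
```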


# Custom Gradients

Sometimes the automatically derived gradient is not what we want, especially when it is numerically unstable. Consider the following function and its gradient 

$$f(x) = \log(1+e^x)$$

The gradient is 

$$f'(x) = \frac{e^x}{1+e^x}$$

Note that the true gradient approaches 1 for large $x$, but in float32 $e^x$ overflows to infinity, so the automatically computed gradient becomes `nan`.

In [15]:
x = tf.Variable(1.0)
x1 = tf.Variable(100.0)
@tf.function
def logexp(x):
  return tf.math.log(1 + tf.exp(x))
with tf.GradientTape(persistent=True) as tape:
  grad_value = logexp(x)
  grad_value2 = logexp(x1)
grad_logexp = tape.gradient(grad_value, x)
grad_logexp2 = tape.gradient(grad_value2, x1)
print('The gradient at x = 1  is ', grad_logexp.numpy())

print('The gradient at x1 = 100 is ', grad_logexp2.numpy())

The gradient at x = 1  is  0.7310586
The gradient at x1 = 100 is  nan
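
The `nan` can be traced by hand. The sketch below assumes the tape applies the chain rule to `log` and `exp` separately, which is roughly how `logexp` is differentiated. In float32, `exp(100)` overflows, and the product of the two partial derivatives becomes `0 * inf`.

```python
import tensorflow as tf

e = tf.exp(tf.constant(100.0))      # overflows float32 to inf
d_log = 1.0 / (1.0 + e)             # derivative of log(u) at u = 1 + e, gives 0
chain = d_log * e                   # chain rule: 0 * inf, which is nan

print(e.numpy(), d_log.numpy(), chain.numpy())
```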


We can re-evaluate the gradient by overriding the gradient of the function with a numerically stable form. The gradient can be rewritten as 

$$f'(x) = \frac{1+e^x - 1}{1+e^x} = 1 - \frac{1}{1 + e^{x}}$$

In [16]:
x = tf.Variable(1.0)
x1 = tf.Variable(100.0)
@tf.custom_gradient
def logexp_stable(x):
  e = tf.exp(x)
  # dy is the upstream gradient; multiplying by it gives the vector-Jacobian product for upstream vectors other than the vector of ones.
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad
with tf.GradientTape(persistent=True) as tape:
  grad_value = logexp_stable(x)
  grad_value2 = logexp_stable(x1)
grad_logexp_stable = tape.gradient(grad_value, x)
grad_logexp_stable1 = tape.gradient(grad_value2, x1)

print('The gradient at x = 1 is ', grad_logexp_stable.numpy()) 
print('The gradient at x1 = 100 is ', grad_logexp_stable1.numpy()) 

The gradient at x = 1 is  0.7310586
The gradient at x1 = 100 is  1.0
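
Note that the custom gradient only fixes the backward pass: the forward value `tf.math.log(1 + e)` still overflows to `inf` at $x = 100$. As an alternative sketch (a swapped-in technique, not the method used above), TensorFlow's built-in `tf.math.softplus` computes the same function $\log(1+e^x)$ with a stable value and a stable gradient.

```python
import tensorflow as tf

x1 = tf.Variable(100.0)

with tf.GradientTape() as tape:
    y = tf.math.softplus(x1)         # log(1 + exp(x)), computed stably
grad_softplus = tape.gradient(y, x1)

print('softplus(100) =', y.numpy())
print('gradient at x1 = 100 is', grad_softplus.numpy())
```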
