Prerequisites:
* python3


# Differentiate functions

TensorFlow offers an **automatic differentiation mechanism** for differentiating functions. We can use `tf.GradientTape()` to compute the gradient of a multivariable function, $L(w,b)=\frac{1}{2}\|Xw+b-y\|^2$, at $w=(1,2)^T, b=2$.

In [8]:
import tensorflow as tf
tf.enable_eager_execution()

X = tf.constant([[1., 2.], [3., 4.]])
y = tf.constant([[1.], [2.]])
# Initialize variable 'w' to [[1.], [2.]] with a float32 dtype.
# Initialize variable 'b' to [2.] with a float32 dtype.
w = tf.get_variable('w', shape=[2, 1], initializer=tf.constant_initializer([[1.], [2.]]))
b = tf.get_variable('b', shape=[1], initializer=tf.constant_initializer([2.]))
with tf.GradientTape() as tape:
    L = 0.5 * tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))
w_grad, b_grad = tape.gradient(L, [w, b])        # Differentiate L with respect to w and b.
print([L.numpy(), w_grad.numpy(), b_grad.numpy()])

[78.5, array([[39.],
       [56.]], dtype=float32), array([17.], dtype=float32)]
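
The values printed above can be cross-checked by hand. For the loss computed in the cell above, `0.5 * tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))`, the gradients are $\partial L/\partial w = X^T(Xw+b-y)$ and $\partial L/\partial b = \sum_i (Xw+b-y)_i$. The following NumPy sketch is not part of the original notebook; the helper `loss` and the step size `eps` are illustrative choices. It evaluates the closed-form gradients and compares them with central finite differences.

```python
import numpy as np

X = np.array([[1., 2.], [3., 4.]])
y = np.array([[1.], [2.]])
w = np.array([[1.], [2.]])
b = 2.0


def loss(w, b):
    """0.5 * sum of squared residuals, mirroring the TensorFlow cell."""
    residual = X @ w + b - y
    return 0.5 * float(np.sum(residual ** 2))


# Closed-form gradients of the loss.
residual = X @ w + b - y          # shape (2, 1)
w_grad = X.T @ residual           # dL/dw = X^T (Xw + b - y)
b_grad = float(residual.sum())    # dL/db = sum of the residuals

# Central finite differences as an independent numerical check.
eps = 1e-6
w_grad_fd = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i, 0] = eps
    w_grad_fd[i, 0] = (loss(w + step, b) - loss(w - step, b)) / (2 * eps)
b_grad_fd = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)

print(loss(w, b), w_grad.ravel(), b_grad)   # 78.5 [39. 56.] 17.0
```

The closed-form values agree with the `tape.gradient` output shown above, and the finite-difference estimates should match them up to rounding error.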


# Gradient descent
## NumPy version

In [13]:
import numpy as np
a, b = 0, 0
X = np.asarray([[1., 2.], [3., 4.]])
y = np.asarray([[1.], [2.]])
num_epoch = 10000
learning_rate = 1e-3
for e in range(num_epoch):
    # Calculate the gradient of the loss function with respect to arguments (model parameters) manually.
    y_pred = a * X + b
    grad_a, grad_b = (y_pred - y).dot(X), (y_pred - y).sum()

    # Update parameters.
    a, b = a - learning_rate * grad_a, b - learning_rate * grad_b

print(a, b)

[[  -296.79894979  -4676.98496935]
 [-10408.63468162   1139.76860426]] 7168.217266327967
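
The values printed above are very large, which suggests that the updates did not settle for this 2-D `X`. One likely cause is that `a * X + b` is an elementwise product, so after the first update `a` becomes a 2x2 array rather than a weight vector. A minimal sketch of a shape-consistent variant follows, assuming a linear model `X @ w + b` and the same `0.5 * sum of squares` loss used in the differentiation example. The names `w`, `loss`, `initial_loss`, and `final_loss` are introduced here for illustration and do not appear in the original cell.

```python
import numpy as np

X = np.asarray([[1., 2.], [3., 4.]])
y = np.asarray([[1.], [2.]])

w = np.zeros((2, 1))   # one weight per input column (assumed model)
b = 0.0
num_epoch = 10000
learning_rate = 1e-3


def loss(w, b):
    """0.5 * sum of squared residuals for the model X @ w + b."""
    return 0.5 * float(np.sum((X @ w + b - y) ** 2))


initial_loss = loss(w, b)
for e in range(num_epoch):
    # Gradients derived by hand for the loss above.
    residual = X @ w + b - y
    grad_w = X.T @ residual
    grad_b = residual.sum()

    # Update parameters.
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
final_loss = loss(w, b)

print(w.ravel(), b)
print(initial_loss, final_loss)
```

In this variant the loss should decrease from its initial value toward zero, since two data points can be fitted exactly by three parameters.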


## TensorFlow version

As the NumPy version shows, without framework support we have to derive the gradients by hand and update the parameters manually.

TensorFlow provides GPU acceleration, automatic differentiation, and built-in optimizers to address these pain points.

In [20]:
X = tf.constant([[1., 2., 3.], [4., 5., 6.]])
y = tf.constant([[1.], [2.]])

W = tf.get_variable('W', dtype=tf.float32, shape=[], initializer=tf.zeros_initializer)
b = tf.get_variable('b', dtype=tf.float32, shape=[], initializer=tf.zeros_initializer)
variables = [W, b]

num_epoch = 10000
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-3)
for e in range(num_epoch):
    # Use tf.GradientTape() to record the operations needed to differentiate the loss function.
    with tf.GradientTape() as tape:
        y_pred = W * X + b
        loss = tf.reduce_mean(tf.square(y_pred - y))
    # Calculate the gradients of the loss function with respect to each argument (model parameter) automatically.
    grads = tape.gradient(loss, variables)
    # Update the parameters automatically based on the gradients.
    optimizer.apply_gradients(grads_and_vars=zip(grads, variables))

print(W.numpy(), b.numpy())

0.26025155 0.5866921
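
To make explicit what `tape.gradient` and `apply_gradients` do in the loop above, here is a minimal NumPy sketch with the gradients of the mean squared error written out by hand. It assumes that plain gradient descent updates each variable as `var -= learning_rate * grad`. The comparison with `np.polyfit` is an addition for illustration and is not part of the original notebook.

```python
import numpy as np

X = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[1.], [2.]])

W, b = 0.0, 0.0
num_epoch = 10000
learning_rate = 1e-3

for e in range(num_epoch):
    # y broadcasts from shape (2, 1) to (2, 3), as in the TensorFlow cell.
    error = W * X + b - y
    # Gradients of mean(error ** 2) with respect to the scalars W and b.
    grad_W = np.mean(2.0 * error * X)
    grad_b = np.mean(2.0 * error)
    # Plain gradient descent step.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

# Least-squares line through the same six (x, y) pairs, for comparison.
targets = np.broadcast_to(y, X.shape).ravel()
slope, intercept = np.polyfit(X.ravel(), targets, 1)

print(W, b)
print(slope, intercept)
```

The hand-written loop should land close to the values printed by the TensorFlow cell. Both are near, but not yet at, the least-squares line, which indicates that 10000 steps at this learning rate have not fully converged.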


A cleaner version using Python OOP

In [22]:
X = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = tf.constant([[10.0], [20.0]])


class Linear(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(units=1, kernel_initializer=tf.zeros_initializer(),
            bias_initializer=tf.zeros_initializer())

    def call(self, inputs):
        # 'inputs' avoids shadowing the Python built-in input().
        output = self.dense(inputs)
        return output


# The structure of the following code is similar to the previous example.
model = Linear()
num_epoch = 10000
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
for i in range(num_epoch):
    with tf.GradientTape() as tape:
        y_pred = model(X)      # Call the model.
        loss = tf.reduce_mean(tf.square(y_pred - y))
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
print(model.variables)

[<tf.Variable 'linear_1/dense_1/kernel:0' shape=(3, 1) dtype=float32, numpy=
array([[4.8735565e-06],
       [1.1111153e+00],
       [2.2222164e+00]], dtype=float32)>, <tf.Variable 'linear_1/dense_1/bias:0' shape=(1,) dtype=float32, numpy=array([1.1111081], dtype=float32)>]
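
One way to read the result above: the model has four parameters (three kernel weights and one bias) but only two data points, so infinitely many parameter vectors fit the data exactly. Gradient descent started from zeros on a least-squares loss is known to converge to the minimum-norm solution, which can be computed with the pseudoinverse. The sketch below takes that view; `X_aug` and `theta` are names introduced here for illustration.

```python
import numpy as np

X = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[10.], [20.]])

# Append a column of ones so the bias is treated as a fourth weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Minimum-norm solution of X_aug @ theta = y via the pseudoinverse.
theta = np.linalg.pinv(X_aug) @ y
kernel, bias = theta[:3], theta[3]

print(kernel.ravel(), bias)
```

The result should be approximately `[0, 1.111, 2.222]` for the kernel and `1.111` for the bias, in line with the trained `Dense` layer.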


## TensorFlow version 2

0.4002787 0.49918035


In [6]:
y_grad

<tf.Tensor: id=18, shape=(1,), dtype=float32, numpy=array([6.], dtype=float32)>