# Image classification example: cats vs dogs

Learn how to use Keras and TensorFlow

In [None]:
import tensorflow as tf
from tensorflow import keras

Learn about:

- Tensors, variables, and gradients in TensorFlow
- Creating layers by subclassing the Layer class
- Writing low-level training loops
- Tracking losses created by layers via the add_loss() method
- Tracking metrics in a low-level training loop
- Speeding up execution with a compiled tf.function
- Executing layers in training or inference mode
- The Keras Functional API

## Tensors

TensorFlow is an infrastructure layer for differentiable programming. At its heart, it's a framework for manipulating N-dimensional arrays (tensors), much like NumPy.

However, there are three key differences between NumPy and TensorFlow:

1. TensorFlow can leverage hardware accelerators such as GPUs and TPUs.
2. TensorFlow can automatically compute the gradient of arbitrary differentiable tensor expressions.
3. TensorFlow computation can be distributed across many devices on a single machine, and across a large number of machines (potentially with multiple devices each).

In [None]:
constant_tensor = tf.constant([[1, 0], [0, 1]])
constant_tensor

You can get the tensor's values as a NumPy ndarray.

In [None]:
constant_tensor.numpy()

Its dtype and shape are available as attributes, the same way as for NumPy arrays.

In [None]:
print(f'DType: {constant_tensor.dtype}')
print(f'Shape: {constant_tensor.shape}')

You can create constant tensors filled with ones or zeros.

In [None]:
tf.ones(shape=(3, 2))


In [None]:
tf.zeros(shape=(2, 3))

You can create a random tensor drawn from a uniform distribution.

In [None]:
tf.random.uniform(shape=(3, 1), minval=0., maxval=1.)

You can also create a random tensor drawn from a normal distribution.

In [None]:
tf.random.normal(shape=(1, 3), mean=0., stddev=1.)

## Variables

Variables are special tensors used to store mutable state (such as the weights of a neural network). You create a variable using some initial value.

In [None]:
gaussian_tensor = tf.random.normal(shape=(1, 10))
variable = tf.Variable(gaussian_tensor)
print(variable)

To update a variable, use the <code>assign</code>, <code>assign_add</code>, and <code>assign_sub</code> methods. Keep in mind that, as with arrays and matrices, the shape of the new value must be compatible with the shape of the variable.

In [None]:
# assign new tensor to variable
new_gaussian_tensor = tf.random.normal(shape=(1, 10))
variable.assign(new_gaussian_tensor)
# verify that the variable now holds the assigned values
for i in range(new_gaussian_tensor.shape[0]):
    for j in range(new_gaussian_tensor.shape[1]):
        assert variable[i, j] == new_gaussian_tensor[i, j]

# add tensor to variable
added_uniform_tensor = tf.random.uniform(shape=(1, 10))
variable.assign_add(added_uniform_tensor)
# verify the assign + add operation is true
for i in range(new_gaussian_tensor.shape[0]):
    for j in range(new_gaussian_tensor.shape[1]):
        assert variable[i, j] == new_gaussian_tensor[i, j] + added_uniform_tensor[i, j]

# subtract tensor from variable
subbed_uniform_tensor = tf.random.uniform(shape=(1, 10))
variable.assign_sub(subbed_uniform_tensor)
# verify the assign + add + sub operation is true
for i in range(new_gaussian_tensor.shape[0]):
    for j in range(new_gaussian_tensor.shape[1]):
        assert variable[i, j] == new_gaussian_tensor[i, j] + added_uniform_tensor[i, j] - subbed_uniform_tensor[i, j]
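
The shape requirement can be seen by attempting an update with a mismatched tensor. This is a minimal sketch with made-up shapes; the exact exception type raised on a mismatch is an assumption that may vary between TensorFlow versions, so both likely candidates are caught.

```python
import tensorflow as tf

variable = tf.Variable(tf.zeros(shape=(1, 10)))

# Same shape as the variable: the update succeeds
variable.assign_add(tf.ones(shape=(1, 10)))

# Different shape: the assignment is rejected and the variable is left unchanged
mismatch_rejected = False
try:
    variable.assign(tf.ones(shape=(2, 10)))
except (ValueError, tf.errors.InvalidArgumentError) as error:
    mismatch_rejected = True
    print("Rejected:", type(error).__name__)

print(variable.shape)
```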

## Doing math in TensorFlow

If you've used NumPy, doing math in TensorFlow will look very familiar. The main difference is that your TensorFlow code can run on GPU and TPU.

In [None]:
a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

# sum
c = a + b
# subtraction
d = a - b
# element-wise multiplication
e = a * b
# division
f = a / b
# square
g = tf.square(c)
# exponential
h = tf.exp(c)

c, d, e, f, g, h

## Gradients

Here's another big difference from NumPy: you can automatically retrieve the gradient of any differentiable expression. Just open a GradientTape, start "watching" a tensor via the <code>watch</code> method, and compose a differentiable expression using this tensor as input.

In [None]:
with tf.GradientTape() as tape:
    # start recording the history of operations applied to a
    tape.watch(a)  
    # do operations
    c = tf.sqrt(tf.square(a) + tf.square(b))  
    # what's the gradient of c with respect to a?
    dc_da = tape.gradient(c, a)
    print(dc_da)
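
For this expression the derivative has a closed form, dc/da = a / sqrt(a² + b²), so the tape's result can be checked by hand. The sketch below uses fixed example inputs (chosen here for illustration, not taken from the cells above) and compares the two.

```python
import tensorflow as tf

# Illustrative fixed inputs, chosen so the result is easy to verify by hand
a = tf.constant([[3.0, 6.0]])
b = tf.constant([[4.0, 8.0]])

with tf.GradientTape() as tape:
    tape.watch(a)  # constant tensors are not watched automatically
    c = tf.sqrt(tf.square(a) + tf.square(b))  # [[5., 10.]]

# Gradient obtained by automatic differentiation
dc_da = tape.gradient(c, a)

# Closed-form derivative: a / sqrt(a^2 + b^2) = a / c
analytic = a / c

max_abs_diff = float(tf.reduce_max(tf.abs(dc_da - analytic)))
print(dc_da.numpy(), analytic.numpy(), max_abs_diff)
```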

By default, <b>variables are watched automatically</b>, so you don't need to manually watch them.

In [None]:
a = tf.Variable(a)
with tf.GradientTape() as tape:
    c = tf.sqrt(tf.square(a) + tf.square(b))
    dc_da = tape.gradient(c, a)
    print(dc_da)

Note that you can compute higher-order derivatives by nesting tapes.

In [None]:
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as tape:
        c = tf.sqrt(tf.square(a) + tf.square(b))
        dc_da = tape.gradient(c, a)
    d2c_da2 = outer_tape.gradient(dc_da, a)
    print(d2c_da2)
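
A scalar case makes the nesting easier to follow. As a small sketch with an illustrative function, take y = x³, whose first derivative is 3x² and whose second derivative is 6x.

```python
import tensorflow as tf

x = tf.Variable(3.0)  # variables are watched automatically

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x * x * x  # y = x^3
    # The first derivative is computed inside the outer tape so that it is recorded
    dy_dx = inner_tape.gradient(y, x)  # 3 * x^2
# The outer tape differentiates the first derivative
d2y_dx2 = outer_tape.gradient(dy_dx, x)  # 6 * x

first = float(dy_dx)
second = float(d2y_dx2)
print(first, second)
```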

## Keras layers

While TensorFlow is an infrastructure layer for differentiable programming, dealing with tensors, variables, and gradients, Keras is a user interface for deep learning, dealing with layers, models, optimizers, loss functions, metrics, and more. Keras serves as the high-level API for TensorFlow: Keras is what makes TensorFlow simple and productive.

The <code>Layer</code> class is the fundamental abstraction in Keras. A layer encapsulates a state (weights) and some computation (defined in the call method).

In [None]:
class Linear(keras.layers.Layer):
    """Linear layer of the type:
        y = w.x + b"""

    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units), dtype="float32"), trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(initial_value=b_init(shape=(units,), dtype="float32"), trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

You would use a Layer instance much like a Python function.

In [None]:
# instantiate layer
linear_layer = Linear(units=4, input_dim=2)

# the layer can be called as a function
y = linear_layer(tf.ones((2, 2)))
y

In [None]:
# assert the shape of the layer
assert y.shape == (2, 4)

The weight variables (created in <code>__init__</code>) are automatically tracked under the weights property.

In [None]:
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

In [None]:
linear_layer.weights

You have many built-in layers available, from Dense to Conv2D to LSTM to fancier ones like Conv3DTranspose or ConvLSTM2D. Be smart about reusing built-in functionality.
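
As a quick illustration, the built-in <code>Dense</code> layer can stand in for the hand-written <code>Linear</code> layer. This is a sketch with an arbitrary input shape; note that <code>Dense</code> uses its own default initializers, so the values will differ from those of <code>Linear</code>.

```python
import tensorflow as tf
from tensorflow import keras

# Built-in fully connected layer with 4 output units
dense_layer = keras.layers.Dense(units=4)

# The weights are created on the first call, from the input shape
y = dense_layer(tf.ones((2, 2)))

output_shape = tuple(y.shape)
weight_shapes = [tuple(w.shape) for w in dense_layer.weights]
print(output_shape, weight_shapes)
```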

## Layer weight creation

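The <code>self.add_weight()</code> method gives you a shortcut for creating weights. In the version below, the weights are created in a separate <code>build</code> method, which is called lazily with the shape of the first inputs seen by the layer, so you no longer need to pass <code>input_dim</code> to the constructor.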

In [None]:
class Linear(keras.layers.Layer):
    """Linear layer of the type:
        y = w.x + b"""

    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,), initializer="random_normal", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

In [None]:
# initialise
linear_layer = Linear(units=4)

# this will also call "build" and create weights
y = linear_layer(tf.ones(shape=(2, 2)))
y
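
To see the lazy creation in action, you can count the weights before and after the first call. The sketch below repeats the <code>Linear</code> class so that it runs on its own; the input width of 3 is an arbitrary choice.

```python
import tensorflow as tf
from tensorflow import keras


class Linear(keras.layers.Layer):
    """Same layer as above: y = x.w + b, with weights created in build()."""

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,), initializer="random_normal", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = Linear(units=4)
weights_before = len(layer.weights)  # no weights exist yet

# The first call triggers build() with the input shape (2, 3)
y = layer(tf.ones(shape=(2, 3)))
weights_after = len(layer.weights)
w_shape = tuple(layer.w.shape)

print(weights_before, weights_after, w_shape)
```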

## Layer gradients

You can automatically retrieve the gradients of the weights of a layer by calling it inside a GradientTape. Using these gradients, you can update the weights of the layer, either manually, or using an optimizer object. Of course, you can modify the gradients before using them, if you need to.

In [None]:
# prepare a dataset
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(tensors=(x_train.reshape(60000, 784).astype("float32")/255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# instantiate our linear layer (defined above) with 10 units
linear_layer = Linear(10)
# instantiate a logistic loss function that expects integer targets
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# instantiate an optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# iterate over the batches of the dataset
for step, (x, y) in enumerate(dataset):
    # open a GradientTape
    with tf.GradientTape() as tape:
        # forward pass
        logits = linear_layer(x)
        # loss value for this batch
        loss = loss_fn(y, logits)

    # get gradients of the loss wrt the weights
    gradients = tape.gradient(loss, linear_layer.trainable_weights)
    # update the weights of our linear layer
    optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

    # logging
    if step % 100 == 0:
        print("Step:", step, "Loss:", float(loss))
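
One common way to modify gradients before applying them is to clip their global norm. The sketch below is an assumption about how you might do this, not part of the loop above: it uses small synthetic data instead of MNIST, a built-in <code>Dense</code> layer, and an arbitrary threshold <code>clip_norm</code>.

```python
import tensorflow as tf
from tensorflow import keras

# Synthetic stand-in data (illustrative): 64 samples, 8 features, 3 classes
x = tf.random.normal(shape=(64, 8))
y = tf.random.uniform(shape=(64,), maxval=3, dtype=tf.int32)

layer = keras.layers.Dense(3)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

with tf.GradientTape() as tape:
    logits = layer(x)
    loss = loss_fn(y, logits)

gradients = tape.gradient(loss, layer.trainable_weights)

# Rescale the gradients so that their combined norm is at most clip_norm
clip_norm = 0.1  # arbitrary threshold for illustration
clipped_gradients, norm_before = tf.clip_by_global_norm(gradients, clip_norm)
norm_after = float(tf.linalg.global_norm(clipped_gradients))

# Apply the modified gradients instead of the raw ones
optimizer.apply_gradients(zip(clipped_gradients, layer.trainable_weights))
print("Norm before:", float(norm_before), "Norm after:", norm_after)
```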


## Trainable and non-trainable weights

Weights created by layers can be either trainable or non-trainable. They're exposed in trainable_weights and non_trainable_weights respectively. Here's a layer with a non-trainable weight.

In [None]:
class ComputeSum(keras.layers.Layer):
    """Returns the sum of the inputs."""

    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        # create a non-trainable weight
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)), trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total

In [None]:
# initialise
my_sum = ComputeSum(2)
x = tf.ones((2, 2))

y = my_sum(x)
print(y.numpy())  # [2. 2.]

y = my_sum(x)
print(y.numpy())  # [4. 4.]

In [None]:
# assert weights
assert my_sum.weights == [my_sum.total]
assert my_sum.non_trainable_weights == [my_sum.total]
assert my_sum.trainable_weights == []
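
The same two properties also reflect a layer's <code>trainable</code> attribute. As a small sketch using the built-in <code>Dense</code> layer, setting <code>trainable = False</code> moves its weights from <code>trainable_weights</code> to <code>non_trainable_weights</code>.

```python
import tensorflow as tf
from tensorflow import keras

layer = keras.layers.Dense(4)
layer(tf.ones((1, 2)))  # first call creates the kernel and the bias

trainable_before = len(layer.trainable_weights)

# Freeze the layer: its weights are now reported as non-trainable
layer.trainable = False
trainable_after = len(layer.trainable_weights)
non_trainable_after = len(layer.non_trainable_weights)

print(trainable_before, trainable_after, non_trainable_after)
```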

## Layers that own layers

Layers can be recursively nested to create bigger computation blocks. Each layer will track the weights of its sublayers (both trainable and non-trainable).

In [None]:
# let's reuse the Linear class with a `build` method that we defined above.
class MLP(keras.layers.Layer):
    """Simple stack of Linear layers."""

    def __init__(self):
        super(MLP, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(10)

    # stacking of the 3 layers
    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)

In [None]:
# initialise
mlp = MLP()
# the first call to the `mlp` object will create the weights
y = mlp(tf.ones(shape=(3, 64)))
y

In [None]:
# weights are recursively tracked
assert len(mlp.weights) == 6

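Note that our manually created MLP above is structurally equivalent to the following built-in option (the default weight initializers of <code>Dense</code> differ from the ones used in <code>Linear</code>).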

In [None]:
mlp = keras.Sequential([
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(10),
])
mlp
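
As with the hand-written version, the weights of the <code>Sequential</code> model are created on the first call. The sketch below rebuilds the model so that it runs on its own and checks the output shape and the number of tracked weights, using the same input shape as before.

```python
import tensorflow as tf
from tensorflow import keras

mlp = keras.Sequential([
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(10),
])

# The first call creates the weights of all three Dense layers
y = mlp(tf.ones(shape=(3, 64)))

output_shape = tuple(y.shape)
num_weights = len(mlp.weights)  # one kernel and one bias per layer
print(output_shape, num_weights)
```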