# Chapter 12: Custom Models and Training with TensorFlow

In [6]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

## 12.1 A Quick Tour of TensorFlow

- Similar to NumPy but with GPU support.
- Supports distributed computing.
- Includes a just-in-time (JIT) compiler that allows it to optimize computations for speed and memory usage.
- Computation graphs can be exported to a portable format.
- Implements autodiff and provides some excellent optimizers.
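
As a small illustration of the autodiff bullet, the sketch below differentiates a toy function with `tf.GradientTape`. The function `f` and the starting value `w = 5.0` are made up for this example and are not from the chapter.

```python
import tensorflow as tf

def f(w):
    # Toy function chosen for illustration: f(w) = 3w^2 + 2w
    return 3.0 * w ** 2 + 2.0 * w

w = tf.Variable(5.0)
with tf.GradientTape() as tape:
    z = f(w)  # operations involving the variable are recorded on the tape

grad = tape.gradient(z, w)  # should match the analytic derivative 6w + 2
print(float(grad))
```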

## 12.2 Using TensorFlow like NumPy

**TensorFlow** - Its API revolves around **tensors**, which flow from operation to operation.

**Tensor** - Very similar to NumPy `ndarray`: it is usually a multidimensional array, but can also hold a scalar.

### 12.2.1 Tensors and Operations

Create a tensor with `tf.constant()`.

In [None]:
tf.constant([[1., 2., 3.], [4., 5., 6.]]) # matrix

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [None]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
t.shape

TensorShape([2, 3])

In [None]:
t.dtype

tf.float32

In [None]:
# Indexing similar to NumPy
t[:, 1:]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

In [None]:
t[..., 1, tf.newaxis] # ... = Access all unspecified elements

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

In [None]:
t + 10

# Python calls t.__add__(10)
# Which calls tf.add(t, 10)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>

In [None]:
tf.square(t)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [None]:
t @ tf.transpose(t)

# TensorFlow creates a new tensor object for transpose
# Cannot do NumPy's t.T

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

> #### Keras' Low-Level API

> The Keras API has its own low-level API, located in `keras.backend`. In `tf.keras`, these functions generally just call the corresponding TensorFlow operations. But if you want to write code that will be portable to other Keras implementations, you should use these Keras functions.

In [None]:
from tensorflow import keras

In [None]:
K = keras.backend
K.square(K.transpose(t)) + 10

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[11., 26.],
       [14., 35.],
       [19., 46.]], dtype=float32)>

### 12.2.2 Tensors and NumPy

You can create a tensor from a NumPy array, and vice versa. You can even apply TensorFlow operations to NumPy arrays and NumPy operations to tensors.

In [None]:
a = np.array([2., 4., 5.])
tf.constant(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

In [None]:
t.numpy() # or np.array(t)

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

In [None]:
tf.square(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

In [None]:
np.square(t)

array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)
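
One practical consequence, visible in the `float64` outputs above: NumPy defaults to 64-bit floats, whereas TensorFlow defaults to 32-bit. A minimal sketch of requesting the dtype explicitly when building a tensor from an array (the names `t64` and `t32` are illustrative):

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])               # NumPy default: float64
t64 = tf.constant(a)                     # dtype is inherited from the array
t32 = tf.constant(a, dtype=tf.float32)   # request 32-bit floats explicitly

print(t64.dtype, t32.dtype)
```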

### 12.2.3 Type Conversions

Type conversions can significantly hurt performance. To avoid this, TensorFlow does not perform any type conversions automatically; it just raises an exception if you try to execute an operation on tensors with incompatible types.

In [None]:
tf.constant(2.) + tf.constant(40) # Cannot add float and integer tensors

InvalidArgumentError: ignored

In [None]:
tf.constant(2.) + tf.constant(40., dtype=tf.float64) # Cannot add 32-bit float and 64-bit float tensors

InvalidArgumentError: ignored

In [None]:
t2 = tf.constant(40., dtype=tf.float64)
tf.constant(2.0) + tf.cast(t2, tf.float32) # Use tf.cast() to convert types

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>

### 12.2.4 Variables

`tf.Tensor` values are immutable: you cannot modify them.

That makes them unsuitable for storing neural network weights, which backpropagation needs to tweak.

For anything that must change over time, use `tf.Variable` instead.

In [None]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

A `tf.Variable` acts much like a `tf.Tensor` but it can also be modified in place using the `assign()` method.

In [None]:
v.assign(2 * v) # Mutates v

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [None]:
v[0, 1].assign(42)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [None]:
v[:, 2].assign([0., 1.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>

In [None]:
# Assign/update specific indices with specific values
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>
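
Plain Python item assignment is not supported on variables, which is why the cells above go through `assign()` and its relatives. A short sketch of what happens otherwise (the flag name `item_assignment_failed` is illustrative, and the exact error text may differ between TensorFlow versions):

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

try:
    v[0, 1] = 42.0               # direct item assignment is rejected
    item_assignment_failed = False
except TypeError as ex:
    item_assignment_failed = True
    print(ex)

v[0, 1].assign(42.0)             # supported: assign through the slice
v.assign_add(tf.ones_like(v))    # in-place increment of every element
print(v.numpy())
```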

### 12.2.5 Other Data Structures

**Sparse tensors** (`tf.SparseTensor`): Efficiently represent tensors containing mostly 0s.
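
A minimal sketch of building a sparse tensor and converting it to a dense one; the indices and values are arbitrary sample data.

```python
import tensorflow as tf

# Three non-zero entries in a 3x4 matrix, listed in row-major order,
# which many sparse ops expect.
s = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]],
                    values=[1., 2., 3.],
                    dense_shape=[3, 4])

dense = tf.sparse.to_dense(s)   # materialize the zeros explicitly
print(dense.numpy())
```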

**Tensor arrays** (`tf.TensorArray`): Lists of tensors. All tensors contained must have the same shape and data type.
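
A minimal sketch of filling a tensor array and stacking it into a single tensor; the values are arbitrary sample data.

```python
import tensorflow as tf

array = tf.TensorArray(dtype=tf.float32, size=3)
# write() returns the updated array, so the result must be reassigned
array = array.write(0, tf.constant([1., 2.]))
array = array.write(1, tf.constant([3., 10.]))
array = array.write(2, tf.constant([5., 7.]))

stacked = array.stack()   # one regular tensor built from the three entries
print(stacked.numpy())
```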

**Ragged tensors** (`tf.RaggedTensor`): Represent lists of tensors that share the same rank and data type but may differ in size along one or more (ragged) dimensions.
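
A minimal sketch of a ragged tensor whose rows have different lengths, and of padding it into a regular tensor; the values are arbitrary sample data.

```python
import tensorflow as tf

r = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(r.shape)                   # the ragged dimension has no fixed size
print(r.row_lengths().numpy())   # length of each row

padded = r.to_tensor()           # pad short rows with zeros
print(padded.numpy())
```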

**String tensors**: Regular tensors of type `tf.string`.
- These represent byte strings, not Unicode strings.
- `tf.string` is atomic, meaning that its length does not appear in the tensor's shape.
- Once you convert it to a Unicode tensor (a `tf.int32` tensor holding Unicode code points), the length appears in the shape.
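
The bullets above can be checked with a short sketch; the sample string is arbitrary.

```python
import tensorflow as tf

s = tf.constant("café")                               # stored as UTF-8 bytes
print(s.shape)                                        # scalar: the length is not part of the shape

n_bytes = tf.strings.length(s)                        # counts bytes
n_chars = tf.strings.length(s, unit="UTF8_CHAR")      # counts Unicode characters

codepoints = tf.strings.unicode_decode(s, "UTF-8")    # int32 code points
print(codepoints.shape)                               # the length now appears in the shape
```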

**Sets**: Represented as regular tensors (or sparse tensors).
- `tf.constant([[1, 2], [3, 4]])` represents the two sets {1, 2} and {3, 4}: each set is a vector along the tensor's last axis.
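
A minimal sketch of set operations from the `tf.sets` package, assuming one set per row; the values are arbitrary sample data.

```python
import tensorflow as tf

a = tf.constant([[1, 5, 9]])
b = tf.constant([[5, 6, 9, 11]])

# tf.sets operations work row by row along the last axis and return sparse tensors
union = tf.sparse.to_dense(tf.sets.union(a, b))
common = tf.sparse.to_dense(tf.sets.intersection(a, b))
print(union.numpy(), common.numpy())
```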

**Queues**: Store tensors across multiple steps. They are found in the `tf.queue` package.
- First In, First Out queues (`FIFOQueue`)
- Queues that can prioritize some items (`PriorityQueue`)
- Queues that shuffle their items (`RandomShuffleQueue`)
- Queues that batch items of different shapes by padding (`PaddingFIFOQueue`)
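
A minimal sketch of a FIFO queue holding two-component records; the record contents are arbitrary sample data.

```python
import tensorflow as tf

# A FIFO queue holding up to 3 records of (int32 scalar, string scalar)
q = tf.queue.FIFOQueue(capacity=3, dtypes=[tf.int32, tf.string], shapes=[(), ()])
q.enqueue([10, b"windy"])
q.enqueue([15, b"sunny"])
size_before = int(q.size().numpy())

first = q.dequeue()   # records come out in insertion order
print(size_before, first)
```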

## 12.3 Customizing Models and Training Algorithms

### 12.3.1 Custom Loss Functions

Let's imagine implementing the Huber loss.
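
For reference, the Huber loss on an error $e = y_{\text{true}} - y_{\text{pred}}$ with threshold $\delta$ is usually written as follows. This is the standard textbook form rather than something stated in these notes; $\delta = 1$ corresponds to the first implementation in this section.

$$
L_\delta(e) =
\begin{cases}
\frac{1}{2} e^2 & \text{if } |e| < \delta \\
\delta\,|e| - \frac{1}{2}\delta^2 & \text{otherwise}
\end{cases}
$$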

> Note: Always try to use a vectorized implementation for better performance. To benefit from TensorFlow's graph features, use only TensorFlow operations.

In [4]:
# FROM TEXTBOOK NOTEBOOK

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target.reshape(-1, 1), random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)

Downloading Cal. housing from https://ndownloader.figshare.com/files/5976036 to /root/scikit_learn_data


In [3]:
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

In [7]:
# FROM TEXTBOOK NOTEBOOK

input_shape = X_train.shape[1:]

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=input_shape),
    keras.layers.Dense(1),
])

In [8]:
model.compile(loss=huber_fn, optimizer="nadam")
# From textbook notebook
model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7fe97c116b50>

### 12.3.2 Saving and Loading Models That Contain Custom Components

When you load a model containing custom objects, you need to map the names to the objects.

In [9]:
# From textbook notebook
model.save("my_model_with_a_custom_loss.h5")

model = keras.models.load_model("my_model_with_a_custom_loss.h5",
                                custom_objects={"huber_fn": huber_fn})

In [11]:
# Function that creates a configured loss function
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer="nadam")

When you save the model, the `threshold` will not be saved. This means that you will have to specify the `threshold` value when loading the model.

In [12]:
# From textbook notebook
model.save("my_model_with_a_custom_loss_threshold_2.h5")

model = keras.models.load_model("my_model_with_a_custom_loss_threshold_2.h5",
                                custom_objects={"huber_fn": create_huber(2.0)})

You can avoid having to respecify the `threshold` value at load time by subclassing `keras.losses.Loss` and implementing its `get_config()` method.

In [13]:
class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

Code explanation:

1. Constructor (`__init__`) accepts `**kwargs` and passes them to the parent constructor (`super().__init__`), which handles standard hyperparameters.
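    - Note: in the signature, `**kwargs` collects any extra keyword arguments into a dictionary; in the `super().__init__(**kwargs)` call, `**` unpacks that dictionary back into keyword arguments.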

2. The `call()` method takes the labels and predictions, computes all the instance losses, and returns them.
    - Same logic as the `huber_fn` returned by `create_huber()` above, with `self.threshold` in place of `threshold`.

3. The `get_config()` method returns a dictionary mapping each hyperparameter name to its value.
    - First calls the parent class's `get_config()` method (`super().get_config()`).
    - Then adds the new hyperparameters to this dictionary.
    - Note: `**base_config` unpacks the dictionary.
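
The sketch below repeats the class so that it runs on its own and shows roughly what happens at load time: Keras rebuilds the loss from its saved config, imitated here by calling `from_config()` directly. The sample labels and predictions are made up.

```python
import tensorflow as tf
from tensorflow import keras

class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

loss = HuberLoss(2.0)
config = loss.get_config()                 # dictionary that includes the threshold
restored = HuberLoss.from_config(config)   # equivalent to HuberLoss(**config)

y_true = tf.constant([[0.], [0.]])
y_pred = tf.constant([[1.], [4.]])
loss_value = float(restored(y_true, y_pred))   # mean of the two instance losses
print(restored.threshold, loss_value)
```

Because the threshold travels inside the config, nothing beyond the class itself has to be supplied when the model is loaded.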

In [14]:
model.compile(loss=HuberLoss(2.), optimizer="nadam")

In [15]:
# From textbook notebook
model.save("my_model_with_a_custom_loss_class.h5")

model = keras.models.load_model("my_model_with_a_custom_loss_class.h5",
                                custom_objects={"HuberLoss": HuberLoss})

### 12.3.3 Custom Activation Functions, Initializers, Regularizers, and Constraints

### 12.3.4 Custom Metrics

### 12.3.5 Custom Layers

### 12.3.6 Custom Models

### 12.3.7 Losses and Metrics Based on Model Internals

### 12.3.8 Computing Gradients Using Autodiff

### 12.3.9 Custom Training Loops

## 12.4 TensorFlow Functions and Graphs

### 12.4.1 AutoGraph and Tracing

### 12.4.2 TF Function Rules