In [90]:
import tensorflow as tf
import numpy as np

# A Quick Tour of TensorFlow

Here's a summary of what TensorFlow has to offer:

1. Its core is very similar to NumPy, but with GPU support
2. It supports distributed computing (across multiple devices and servers)
3. It includes a kind of just-in-time (JIT) compiler that allows it to optimize computations for speed and memory usage. It works by extracting the *computation graph* from a Python function, then optimizing it (e.g. by pruning unused nodes), and finally running it efficiently (e.g. by automatically running independent operations in parallel).
4. Computation graphs can be exported to a portable format, so you can train a TensorFlow model in one environment (e.g. using Python on Linux) and run it in another (e.g. using Java on an Android device).
5. It implements autodiff (see Chapter 10 and Appendix D) and provides some excellent optimizers, such as RMSProp and Nadam (see Chapter 11), so you can easily minimize all sorts of loss functions.

TensorFlow offers many more features built on top of these core features: the most important is of course tf.keras, but it also has data loading and preprocessing ops, image processing ops, signal processing ops, and more. 

As you may know, GPUs can dramatically speed up computations by splitting them into many smaller chunks and running them in parallel across many GPU threads. TPUs are even faster: they are custom ASIC chips built specifically for Deep Learning operations.

There's even more to the TensorFlow library:
1. TensorBoard - for visualization
2. TensorFlow Extended (TFX) - a set of libraries built by Google to productionize TensorFlow projects. It includes tools for data validation, preprocessing, model analysis, and serving.
3. TensorFlow Hub - provides a way to easily download and reuse pretrained neural networks. You can also get many neural network architectures, some of them pretrained, in TensorFlow's *model garden*.
4. TensorFlow Resources - contains TensorFlow-based projects. You will find hundreds of TensorFlow projects on GitHub, so it is often easy to find existing code for whatever you are trying to do.

More and more ML papers are released along with their implementations, and sometimes even with pretrained models. Check out https://paperswithcode.com/ to easily find them.

## Using TensorFlow like NumPy

TensorFlow's API revolves around tensors. A tensor is usually a multidimensional array, but it can also hold a scalar. Let's see how to create and manipulate them.

### Tensors and Operations

In [77]:
tf.constant([[1., 2., 3.], [4., 5., 6]]), tf.constant(42)

(<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
 array([[1., 2., 3.],
        [4., 5., 6.]], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=int32, numpy=42>)

In [79]:
# Just like an ndarray, a tf.Tensor has a shape and a data type
t = tf.constant([[1., 2., 3.], [4., 5., 6]])
t.shape, t.dtype

(TensorShape([2, 3]), tf.float32)

In [81]:
# Indexing works much like in Numpy
t[:, 1:]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

In [83]:
t[..., 1, tf.newaxis]

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

In [85]:
# More importantly, all sorts of tensor operations are available
t + 10, tf.square(t), t @ tf.transpose(t)

(<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
 array([[11., 12., 13.],
        [14., 15., 16.]], dtype=float32)>,
 <tf.Tensor: shape=(2, 3), dtype=float32, numpy=
 array([[ 1.,  4.,  9.],
        [16., 25., 36.]], dtype=float32)>,
 <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
 array([[14., 32.],
        [32., 77.]], dtype=float32)>)

You will find all the basic math operations you need (tf.add(), tf.multiply(), tf.square(), tf.exp(), tf.sqrt(), etc.) and most operations that you can find in NumPy (e.g. tf.reshape(), tf.squeeze(), tf.tile()). Some functions have a different name than in NumPy; for instance, tf.reduce_mean(), tf.reduce_sum(), tf.reduce_max(), and tf.math.log() are the equivalents of np.mean(), np.sum(), np.max(), and np.log().

When the name differs, there is often a good reason for it. For example, in TensorFlow you must write tf.transpose(t); you cannot just write t.T like in NumPy. The reason is that the tf.transpose() function does not do exactly the same thing as NumPy's T attribute: in TensorFlow, a new tensor is created with its own copy of the transposed data, while in NumPy, t.T is just a transposed view of the same data.

Similarly, the tf.reduce_sum() operation is named this way because its GPU kernel (i.e. GPU implementation) uses a reduce algorithm that does not guarantee the order in which the elements are added: because 32-bit floats have limited precision, the result may change ever so slightly every time you call this operation.
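The order dependence described above is a property of floating-point arithmetic in general rather than of TensorFlow specifically. As a minimal sketch in plain Python (which uses 64-bit floats, so the effect would only be more visible at 32 bits), grouping the same additions differently can change the result:

```python
# Floating-point addition is not associative: the grouping of the
# additions (which a parallel reduce does not fix) can change the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False

# A more dramatic case: a small term is lost when it is added to a huge one first.
big, small = 1e16, 1.0
lost = (big + small) - big   # the 1.0 is rounded away
kept = (big - big) + small   # the 1.0 survives
print(lost, kept)            # 0.0 1.0
```

The numbers here are chosen purely for illustration; in a real reduction the discrepancy is usually confined to the last few digits.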

### Tensors and NumPy

In [94]:
# Tensors play nice with Numpy
a = np.array([2., 4., 5.])
tf.constant(a), t.numpy()

(<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>,
 array([[1., 2., 3.],
        [4., 5., 6.]], dtype=float32))

In [96]:
tf.square(a), np.square(t)

(<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>,
 array([[ 1.,  4.,  9.],
        [16., 25., 36.]], dtype=float32))

Notice that NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit. This is because 32-bit precision is generally more than enough for neural networks, plus it runs faster and uses less RAM. So when you create a tensor from a NumPy array, make sure to set dtype=tf.float32 (e.g. tf.constant(a, dtype=tf.float32)).
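To get a feel for the trade-off between 32-bit and 64-bit floats, here is a small sketch using only Python's standard struct module (no TensorFlow or NumPy involved). The value 0.1 is an arbitrary choice for illustration:

```python
import struct

value = 0.1

# Size in bytes of one 32-bit float and one 64-bit float.
size32 = struct.calcsize("<f")
size64 = struct.calcsize("<d")
print(size32, size64)  # 4 8

# Round-trip the value through a 32-bit float to see how much precision is lost.
as_float32 = struct.unpack("<f", struct.pack("<f", value))[0]
rounding_error = abs(as_float32 - value)
print(rounding_error < 1e-8)  # True
```

Halving the storage per value costs a rounding error on the order of 1e-9 for this value, which gives some intuition for why 32 bits is usually considered sufficient.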

### Type Conversions

Type conversions can significantly hurt performance, and they can easily go unnoticed when they are done automatically. To avoid this, TensorFlow does not perform any type conversions automatically: it just raises an exception if you try to execute an operation on tensors with incompatible types. This may be a bit annoying at first, but remember that it's for a good cause! And of course you can use tf.cast() when you really need to convert types.

In [103]:
# Example of an exception: adding a float32 tensor and an int32 tensor
# tf.constant(2.) + tf.constant(40)
# InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2]

In [101]:
# Example casting variables as the correct type
t2 = tf.constant(40., dtype=tf.float64)
tf.constant(2.0) + tf.cast(t2, tf.float32)

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>

## Variables

The tf.Tensor values we've seen so far are immutable: you cannot modify them. For mutable tf.Tensor values we need tf.Variable. A tf.Variable acts much like a tf.Tensor: you can perform the same operations with it, it plays nicely with NumPy as well, and it is just as picky with types. But it can also be modified in place using the assign() method (or assign_add() or assign_sub(), which increment or decrement the variable by the given value).

In practice you will rarely have to create variables manually, since Keras provides an add_weight() method that will take care of it for you, as we will see. Moreover, model parameters will generally be updated directly by the optimizers, so you will rarely need to update variables manually.

In [106]:
# Variable examples
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [110]:
v.assign(2 * v)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [112]:
v[0, 1].assign(42)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [114]:
v[:, 2].assign([0., 1.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>

In [116]:
v.scatter_nd_update(
    indices=[[0, 0], [1, 2]],
    updates=[100., 200.]
)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>

## Other Data Structures

TensorFlow supports several other data structures, including the following:

1. Sparse tensors (tf.SparseTensor)
> Efficiently represent tensors containing mostly zeros. The tf.sparse package contains operations for sparse tensors.

2. Tensor arrays (tf.TensorArray)
> Are lists of tensors. They have a fixed size by default but can optionally be made dynamic. All tensors they contain must have the same shape and data type.

3. Ragged tensors (tf.RaggedTensor)
> Represent static lists of lists of tensors, where every tensor has the same shape and data type. The tf.ragged package contains operations for ragged tensors.

4. String tensors
> Are regular tensors of type tf.string. These represent byte strings, not Unicode strings, so if you create a string tensor using a Unicode string (e.g., a regular Python 3 string like "café"), then it will get encoded to UTF-8 automatically. Alternatively, you can represent Unicode strings using tensors of type tf.int32, where each item represents a Unicode code point (e.g., [99, 97, 102, 233]). The tf.strings package (with an s) contains ops for byte strings and Unicode strings (and to convert one into the other). It's important to note that a tf.string is atomic, meaning that its length does not appear in the tensor's shape. Once you convert it to a Unicode tensor (i.e. a tensor of type tf.int32 holding Unicode code points), the length appears in the shape.

5. Sets
> Are represented as regular tensors (or sparse tensors). For example, tf.constant([[1, 2], [3, 4]]) represents the two sets {1, 2} and {3, 4}. More generally, each set is represented by a vector in the tensor's last axis. You can manipulate sets using operations from the tf.sets package.

6. Queues
> Store tensors across multiple steps. TensorFlow offers various kinds of queues, all in the tf.queue package:
>    1. Simple First In, First Out (FIFO) queues (FIFOQueue)
>    2. Queues that can prioritize some items (PriorityQueue)
>    3. Queues that shuffle their items (RandomShuffleQueue)
>    4. Queues that batch items of different shapes by padding (PaddingFIFOQueue)
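To make the distinction between byte strings and code points in the string tensors item concrete, here is a sketch in plain Python. It only mirrors the two representations and does not call TensorFlow; the word used is the one whose code points are listed above:

```python
text = "café"

# Byte-string view: the UTF-8 encoding of the text.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)       # b'caf\xc3\xa9'
print(len(utf8_bytes))  # 5, because "é" takes two bytes in UTF-8

# Code-point view: one integer per character.
code_points = [ord(ch) for ch in text]
print(code_points)      # [99, 97, 102, 233]
print(len(code_points)) # 4

# Converting the code points back recovers the original text.
round_trip = "".join(chr(cp) for cp in code_points)
print(round_trip == text)  # True
```

The byte string has five elements but only four characters, which is why the length only becomes meaningful once the string is decoded into code points.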

# Customizing Models and Training Algorithms

## Custom Loss Functions

Suppose you want to train a regression model, but your training set is a bit noisy. Of course, you start by trying to clean up your dataset by removing or fixing the outliers, but that turns out to be insufficient; the dataset is still noisy. Which loss function should you use? The mean squared error might penalize large errors too much and cause your model to be imprecise. The mean absolute error would not penalize outliers as much, but training might take a while to converge, and the trained model might not be very precise. This is probably a good time to use the Huber loss (introduced in Chapter 10) instead of the good old MSE. The Huber loss is not currently part of the official Keras API, but it is available in tf.keras (just use an instance of the keras.losses.Huber class). But let's pretend it's not there: implementing it is easy as pie! Just create a function that takes the labels and predictions as arguments, and use TensorFlow operations to compute every instance's loss:

In [140]:
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

# For best performance, always vectorize your implementation, as in this example.
# Moreover, to benefit from TensorFlow's graph features, use only TensorFlow operations.

# To use with a model
# model.compile(loss=huber_fn, optimizer='nadam')
# model.fit(X_train, y_train, [...])
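As a sanity check of the piecewise definition used in huber_fn, the following plain-Python sketch evaluates the same formula on single error values. The helper name huber_scalar is hypothetical, and it is deliberately not vectorized because it is only meant for checking the arithmetic by hand:

```python
def huber_scalar(error):
    """Huber loss for one error value, with the threshold fixed at 1."""
    if abs(error) < 1:
        return error ** 2 / 2   # quadratic for small errors
    return abs(error) - 0.5     # linear for large errors

print(huber_scalar(0.5))   # 0.125
print(huber_scalar(3.0))   # 2.5
print(3.0 ** 2 / 2)        # 4.5, the half squared error for comparison
```

For an error of 3, the Huber loss is 2.5 where half the squared error would be 4.5, which illustrates how large errors are penalized less severely than with the MSE.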

It is also preferable to return a tensor containing one loss per instance, rather than returning the mean loss. This way, Keras can apply class weights or sample weights when requested (Chapter 10). Now you can use this loss when you compile the Keras model, then train your model, and that's it! But what happens to this custom loss when you save the model?

## Saving and Loading Models That Contain Custom Components

Saving a model containing a custom loss function works fine, as Keras saves the name of the function. When you load a model containing custom objects, you need to map the names to the objects, as shown below.

In [144]:
# model = tf.keras.models.load_model('model_with_custom_loss.h5', custom_objects={'huber_fn': huber_fn})

With the current implementation, any error between -1 and 1 is considered "small". But what if you want a different threshold? You can solve this by creating a subclass of the keras.losses.Loss class, and then implementing its get_config() method, as shown below:

In [147]:
class HuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
        
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'threshold': self.threshold}

The Keras API currently only specifies how to use subclassing to define layers, models, callbacks, and regularizers. If you build other components (such as losses, metrics, initializers, or constraints) using subclassing, they may not be portable to other Keras implementations. It's likely that the Keras API will be updated to specify subclassing for all these components as well.

That said, let's walk through the code above:

1. The constructor accepts `**kwargs` and passes them to the parent constructor, which handles standard hyperparameters: the name of the loss and the reduction algorithm to use to aggregate the individual instance losses. By default, it is 'sum_over_batch_size', which means that the loss will be the sum of the instance losses, weighted by the sample weights, if any, and divided by the batch size (not by the sum of weights, so this is *not* the weighted mean). It would not be a good idea to use a weighted mean: if you did, then two instances with the same weight but in different batches would have a different impact on training, depending on the total weight of each batch. Other possible values are 'sum' and 'none'.

2. The call() method takes the labels and predictions, computes all the instance losses, and returns them.

3. The get_config() method returns a dictionary mapping each hyperparameter name to its value. It first calls the parent class's get_config() method, then adds the new hyperparameters to this dictionary.
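The difference between the 'sum_over_batch_size' reduction and a weighted mean, described in the first point above, can be checked with a little arithmetic. This is a plain-Python sketch with made-up losses and sample weights; it does not call Keras:

```python
# Hypothetical per-instance losses and sample weights for one batch.
instance_losses = [1.0, 2.0, 3.0, 4.0]
sample_weights = [1.0, 1.0, 3.0, 3.0]

weighted = [w * l for w, l in zip(sample_weights, instance_losses)]

# 'sum_over_batch_size': weighted sum divided by the number of instances.
sum_over_batch_size = sum(weighted) / len(instance_losses)

# Weighted mean: weighted sum divided by the sum of the weights.
weighted_mean = sum(weighted) / sum(sample_weights)

# 'sum': just the weighted sum.
plain_sum = sum(weighted)

print(sum_over_batch_size)  # 6.0
print(weighted_mean)        # 3.0
print(plain_sum)            # 24.0
```

With these numbers the two averages differ by a factor of two, because the weights sum to 8 while the batch holds only 4 instances.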

You can then use any instance of this class when you compile the model. When you save the model, the threshold will be saved along with it; and when you load the model, you just need to map the class name to the class itself, as shown below.

In [150]:
# model.compile(loss=HuberLoss(threshold=2.), optimizer='nadam')
# model = tf.keras.models.load_model('my_model_with_custom_loss_class.h5', custom_objects={'HuberLoss': HuberLoss})

## Custom Activation Functions, Initializers, Regularizers, and Constraints

Most Keras functionalities, such as losses, regularizers, constraints, initializers, metrics, activation functions, layers, and even full models, can be customized in very much the same way. Most of the time, you will just need to write a simple function with the appropriate inputs and outputs. Some examples below:

In [155]:
def my_softplus(z):
    ' Return value is just tf.nn.softplus(z)'
    return tf.math.log(tf.exp(z) + 1.0)

In [157]:
def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

In [159]:
def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

In [161]:
def my_positive_weights(weights):
    ' Return value is just tf.nn.relu(weights)'
    return tf.where(weights < 0., tf.zeros_like(weights), weights)

In [163]:
layer = tf.keras.layers.Dense(
    units=30,
    activation=my_softplus,
    kernel_initializer=my_glorot_initializer,
    kernel_regularizer=my_l1_regularizer,
    kernel_constraint=my_positive_weights
)

The activation function will be applied to the output of this Dense layer, and its result will be passed on to the next layer. The layer's weights will be initialized using the value returned by the initializer. At each training step the weights will be passed to the regularization function to compute the regularization loss, which will be added to the main loss to get the final loss used for training. Finally, the constraint function will be called after each training step, and the layer's weights will be replaced by the constrained weights.

If a function has hyperparameters that need to be saved along with the model, then you will want to subclass the appropriate class. Note that you must implement the call() method for losses, layers (including activation functions), and models, or the \_\_call\_\_() method for regularizers, initializers, and constraints. For metrics, things are a bit different.

In [168]:
class MyL1Regularizer(tf.keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor
        
    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))
    
    def get_config(self):
        return {'factor': self.factor}
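The reason get_config() matters is that the saved configuration is what allows the object to be rebuilt at load time. The sketch below shows that round trip in plain Python. The class L1RegularizerSketch is a hypothetical stand-in rather than a Keras class, and the assumption that rebuilding amounts to calling the constructor with the saved config as keyword arguments is a simplification:

```python
class L1RegularizerSketch:
    """Framework-free stand-in that mirrors the structure of MyL1Regularizer."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, weights):
        # Sum of absolute values, scaled by the factor.
        return sum(abs(self.factor * w) for w in weights)

    def get_config(self):
        return {"factor": self.factor}

    @classmethod
    def from_config(cls, config):
        # Assumed reconstruction rule: pass the saved config to the constructor.
        return cls(**config)


original = L1RegularizerSketch(factor=0.01)
config = original.get_config()
restored = L1RegularizerSketch.from_config(config)

weights = [1.0, -2.0, 3.0]
print(config)                                  # {'factor': 0.01}
print(restored(weights) == original(weights))  # True
```

Because the config dictionary's keys match the constructor's argument names, the restored object behaves exactly like the original.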

## Custom Metrics

Losses and metrics are conceptually not the same thing: losses are used by Gradient Descent to _train_ a model, so they must be differentiable and their gradients should not be 0 everywhere. In contrast, metrics are used to _evaluate_ a model: they must be more easily interpretable, and they can be non-differentiable or have 0 gradients everywhere. 

That said, in most cases, defining a custom metric function is exactly the same as defining a custom loss function. In fact, we could even use the Huber loss function we created earlier as a metric; it would work just fine. 

For each batch during training, Keras will compute this metric and keep track of its mean since the beginning of the epoch. Most of the time, this is exactly what you want. But not always! Consider a binary classifier's precision, for example. In this case, what we need is an object that can keep track of the number of true positives and the number of false positives and that can compute their ratio when requested. This is precisely what the tf.keras.metrics.Precision class does. This is called a _streaming metric_ (or _stateful metric_), as it is gradually updated, batch after batch.

If you need to create such a streaming metric, create a subclass of the tf.keras.metrics.Metric class. Here is a simple example that keeps track of the total Huber loss and the number of instances seen so far.
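To see why averaging per-batch precision is not what you want, consider this plain-Python sketch with hypothetical counts for two batches (the numbers are made up for illustration):

```python
# Hypothetical (true positives, false positives) for two batches.
batches = [(4, 1), (0, 3)]

# Naive approach: compute the precision of each batch, then average.
batch_precisions = [tp / (tp + fp) for tp, fp in batches]
mean_of_batch_precisions = sum(batch_precisions) / len(batch_precisions)

# Streaming approach: accumulate the counts, then compute the ratio once.
total_tp = sum(tp for tp, _ in batches)
total_fp = sum(fp for _, fp in batches)
streaming_precision = total_tp / (total_tp + total_fp)

print(batch_precisions)          # [0.8, 0.0]
print(mean_of_batch_precisions)  # 0.4
print(streaming_precision)       # 0.5
```

Out of 8 positive predictions overall, 4 were correct, so the true precision is 50%, not the 40% obtained by averaging the two batch precisions.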

In [None]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

In [174]:
# Example using a Loss function as a metric
# model.compile(loss='mse', optimizer='nadam', metrics=[create_huber(2.0)])

In [176]:
class HuberMetric(tf.keras.metrics.Metric):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs) # handles base args (e.g. dtype)
        self.threshold = threshold # stored so that get_config() can save it
        self.huber_fn = create_huber(threshold)
        self.total = self.add_weight(name='total', initializer='zeros')
        self.count = self.add_weight(name='count', initializer='zeros')
        
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))
        
    def result(self):
        return self.total / self.count
    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, 'threshold': self.threshold}

Let's walk through the class above:

1. The constructor uses the add_weight() method to create the variables needed to keep track of the metric's state over multiple batches - in this case, the sum of all Huber losses (total) and the number of instances seen so far (count). You could just create variables manually if you preferred. Keras tracks any tf.Variable that is set as an attribute (and more generally, any 'trackable' object, such as layers or models).

2. The update_state() method is called when you use an instance of this class as a function. It updates the variables, given the labels and predictions for one batch (and sample weights, but in this case we ignore them).

3. The result() method computes and returns the final result, in this case the mean Huber metric over all instances. When you use the metric as a function, the update_state() method gets called first, then the result() method is called, and its output is returned.

4. We also implement the get_config() method to ensure the threshold gets saved along with the model.

5. The default implementation of the reset_states() method resets all variables to 0.0 (but you can override this if needed). In more recent Keras versions, this method is named reset_state().
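The interplay between update_state(), result(), and resetting described in the walkthrough above can be sketched without Keras. The class StreamingMeanSketch below is a hypothetical plain-Python analogue that tracks a running mean instead of a Huber loss, and its \_\_call\_\_ method reflects the assumption stated above that calling a metric updates its state and then returns the result:

```python
class StreamingMeanSketch:
    """Plain-Python stand-in for a streaming metric (not the Keras class)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0.0

    def update_state(self, values):
        # Accumulate the sum and the number of values seen so far.
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        return self.total / self.count

    def reset_states(self):
        self.total = 0.0
        self.count = 0.0

    def __call__(self, values):
        # Using the metric as a function: update first, then report.
        self.update_state(values)
        return self.result()


metric = StreamingMeanSketch()
print(metric([1.0, 3.0]))  # 2.0, the mean of the first batch
print(metric([5.0]))       # 3.0, the mean of all three values so far
metric.reset_states()
print(metric([10.0]))      # 10.0, since the state was cleared
```

The second call returns 3.0 rather than 5.0 because the state persists across batches until it is explicitly reset.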

Keras will take care of variable persistence seamlessly; no action is required. Now that we have built a streaming metric, building a custom layer will seem like a walk in the park!

## Custom Layers

## Custom Models

# Losses and Metrics Based on Model Internals

## Computing Gradients Using Autodiff

## Custom Training Loops

# TensorFlow Functions and Graphs

## AutoGraph and Tracing

## TF Function Rules

# Exercises

1. **How would you describe TensorFlow in a short sentence? What are its main features? Can you name other popular Deep Learning libraries?**

My Answer:

Book Answer:

2. **Is TensorFlow a drop-in replacement for NumPy? What are the main differences between the two?**

My Answer:

Book Answer:

3. **Do you get the same result with tf.range(10) and tf.constant(np.arange(10))?**

My Answer:

Book Answer:

4. **Can you name six other data structures available in TensorFlow, beyond regular tensors?**

My Answer:

Book Answer:

5. **A custom loss function can be defined by writing a function or by subclassing the keras.losses.Loss class. When would you use each option?**

My Answer:

Book Answer:

6. **Similarly, a custom metric can be defined in a function or a subclass of keras.metrics.Metric. When would you use each option?**

My Answer:

Book Answer:

7. **When should you create a custom layer versus a custom model?**

My Answer:

Book Answer:

8. **What are some use cases that require writing your own custom training loop?**

My Answer:

Book Answer:

9. **Can custom Keras components contain arbitrary Python code, or must they be convertible to TF Functions?**

My Answer:

Book Answer:

10. **What are the main rules to respect if you want a function to be convertible to a TF Function?**

My Answer:

Book Answer:

11. **When would you need to create a dynamic Keras model? How do you do that? Why not make all your models dynamic?**

My Answer:

Book Answer:

12. **Implement a custom layer that performs *Layer Normalization* (we will use this type of layer in Chapter 15):**

    1. The build() method should define two trainable weights $\alpha$ and $\beta$, both of shape input_shape[-1:] and data type tf.float32. $\alpha$ should be initialized with 1s and $\beta$ with 0s.

    2. The call() method should compute the mean $\mu$ and standard deviation $\sigma$ of each instance's features. For this, you can use tf.nn.moments(inputs, axes=-1, keepdims=True), which returns the mean $\mu$ and the variance $\sigma^{2}$ of all instances (compute the square root of the variance to get the standard deviation). Then the function should compute and return $\alpha \otimes \frac{X - \mu}{\sigma + \epsilon} + \beta$, where $\otimes$ represents itemwise multiplication and $\epsilon$ is a smoothing term (a small constant to avoid division by zero, e.g. 0.001).

    3. Ensure that your custom layer produces the same (or very nearly the same) output as the keras.layers.LayerNormalization layer.

13. **Train a model using a custom training loop to tackle the Fashion MNIST dataset (see Chapter 10).**

    1. Display the epoch, iteration, mean training loss, and mean accuracy over each epoch (updated at each iteration), as well as the validation loss and accuracy at the end of each epoch.
    
    2. Try using a different optimizer with a different learning rate for the upper layers and the lower layers.