# Chapter 12 - Custom Models and Training with TensorFlow

## TensorFlow's Architecture

![tf_architecture](./images/ch12_tensorflows_architecture.png)

## Using TensorFlow like NumPy


In [5]:
import tensorflow as tf
import numpy as np
from tensorflow import keras



### Tensors and Operations

In [6]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]]) # matrix

In [8]:
t

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [9]:
t.shape

TensorShape([2, 3])

In [10]:
t.dtype

tf.float32

In [12]:
t[:, 0]

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 4.], dtype=float32)>

In [11]:
tf.constant(42)

<tf.Tensor: shape=(), dtype=int32, numpy=42>

### Tensors and NumPy

In [15]:
a = np.array([2., 4., 5.])
tf.constant(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

In [16]:
t.numpy() # or np.array(t)

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

In [18]:
tf.square(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

In [19]:
np.square(t)

array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)

Notice that NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit. This is because 32-bit precision is generally more than enough for neural networks, plus it runs faster and uses less RAM. So when you create a tensor from a NumPy array, make sure to set `dtype=tf.float32`.
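The memory half of that claim is easy to check with NumPy alone. The sketch below is only illustrative (the array size and variable names are arbitrary choices, not taken from the text): it compares the footprint of the same values stored as 64-bit and as 32-bit floats.

```python
import numpy as np

# One million values, created with NumPy's default float64 dtype.
a64 = np.ones(1_000_000)
# The same values down-cast to float32, the float type TensorFlow defaults to.
a32 = a64.astype(np.float32)

print(a64.dtype, a64.nbytes)  # 8 bytes per element
print(a32.dtype, a32.nbytes)  # 4 bytes per element, so half the memory
```

Passing `dtype=tf.float32` to `tf.constant()` applies the same down-cast at the moment the tensor is created.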

### Type Conversions

Type conversions can significantly hurt performance, and they can easily go unnoticed when they are done automatically. To avoid this, TensorFlow does not perform any type conversions automatically: it just raises an exception if you try to execute an operation on tensors with incompatible types. For example, you cannot add a float tensor and an integer tensor, and you cannot even add a 32-bit float and a 64-bit float:

In [22]:
# tf.constant(2.) + tf.constant(40)

In [23]:
# tf.constant(2.) + tf.constant(40., dtype=tf.float64)

To perform a type conversion, use `tf.cast()`:

In [24]:
t2 = tf.constant(40., dtype=tf.float64)

In [25]:
tf.constant(2.0) + tf.cast(t2, tf.float32)

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>
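The two commented-out cells above can be run safely by catching the exception. This is a minimal sketch assuming eager execution in TensorFlow 2.x; the exact exception class may differ between versions, which is why two classes are caught, and the flag name `mixed_add_failed` is just an illustrative choice.

```python
import tensorflow as tf

t32 = tf.constant(2.0)                     # float32 by default
t64 = tf.constant(40.0, dtype=tf.float64)

try:
    t32 + t64                              # mixed float32 / float64: not allowed
    mixed_add_failed = False
except (tf.errors.InvalidArgumentError, TypeError) as ex:
    mixed_add_failed = True
    print(type(ex).__name__)

total = t32 + tf.cast(t64, tf.float32)     # explicit cast makes the types match
print(total.numpy())
```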

### Variables

Tensors created with `tf.constant()` are immutable, so we cannot use regular tensors to implement the weights of a neural network, which must be updated during training. For that we need `tf.Variable`, which behaves much like a regular tensor but can also be modified in place with its `assign()` method.

In [28]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [31]:
# constant
t[0, 0] = 2

TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment
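By contrast, a `tf.Variable` can be updated in place through its `assign()` method, either as a whole or on an individual cell or slice. The sketch below is a small illustration of this; the values assigned are arbitrary.

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

v.assign(2 * v)            # replace every value
v[0, 1].assign(42.)        # replace a single cell
v[:, 2].assign([0., 1.])   # replace a slice (here, the last column)

print(v.numpy())
```

Direct item assignment (`v[0, 1] = 42`) is not supported on variables either; the update always goes through `assign()` or one of its variants.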

## Customizing Models and Training Algorithms

### Custom Loss Functions

Suppose we want to implement a custom Huber loss function. (The Huber loss is not currently part of the official Keras API, but it is available in tf.keras: just use an instance of the `keras.losses.Huber` class. Let's pretend it does not exist.)

We must simply create a function that receives the labels and the predictions as arguments:

In [32]:
# Use tensorflow operation to benefit from TensorFlow's graph features
def huber_fn(y_true, y_pred):
    
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    
    return tf.where(is_small_error, squared_loss, linear_loss)

It is also preferable to return a tensor containing one loss per instance, rather
than returning the mean loss. This way, Keras can apply class weights or sample
weights when requested (see Chapter 10).
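To see what "one loss per instance" looks like, here is a NumPy stand-in for the same formula. It is only a sketch for checking values by hand (`huber_np` is a hypothetical helper, not part of Keras or TensorFlow).

```python
import numpy as np

def huber_np(y_true, y_pred):
    """NumPy version of the Huber formula: returns one loss per instance."""
    error = y_true - y_pred
    is_small_error = np.abs(error) < 1
    squared_loss = np.square(error) / 2   # used when |error| < 1
    linear_loss = np.abs(error) - 0.5     # used otherwise
    return np.where(is_small_error, squared_loss, linear_loss)

y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([0.5, 1.0, 4.0])        # errors: 0.5, 0.0, -3.0
losses = huber_np(y_true, y_pred)

print(losses)          # per-instance losses: 0.125, 0.0 and 2.5
print(losses.mean())   # what a plain mean reduction would report
```

The TensorFlow `huber_fn()` defined above computes the same per-instance values, returned as a tensor.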

To use this custom loss function, just pass it to the loss argument in the `compile` method.

```python
model.compile(loss=huber_fn, optimizer="nadam")
model.fit(X_train, y_train, [...])
```

### Saving and Loading Models That Contain Custom Components

Once the model is saved, to load it with a custom loss function, for example, we need to provide a dictionary that maps the function name to the actual function.

```python
model = keras.models.load_model("my_model_with_a_custom_loss.h5", custom_objects={"huber_fn": huber_fn})
```

Note that the current implementation considers any error between -1 and 1 as small. If we wish to turn this threshold into a parameter, we need to create a function that outputs the custom loss function with the desired threshold:

In [33]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

```python
model.compile(loss=create_huber(2.0), optimizer="nadam")
```

However, when we save the model, the threshold is not saved. We must specify it again when loading the model:

```python
model = keras.models.load_model("my_model_with_a_custom_loss.h5", custom_objects={"huber_fn": create_huber(2.0)})
```

*Note that the name to use is `huber_fn`, which is the name of the function we gave Keras, not the name of the function that created it.*
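The naming point can be checked in plain Python, without Keras. In the sketch below, `create_scaled_loss` is a hypothetical stand-in for `create_huber()` (its body is deliberately trivial): every function it returns carries the same name, whatever threshold it was built with.

```python
def create_scaled_loss(threshold=1.0):
    # Hypothetical stand-in for create_huber(): only the name of the
    # inner function matters for this illustration.
    def huber_fn(error):
        return min(abs(error), threshold)
    return huber_fn

loss_a = create_scaled_loss(1.0)
loss_b = create_scaled_loss(2.0)

print(loss_a.__name__, loss_b.__name__)   # the same name twice
print(loss_a(5.0), loss_b(5.0))           # yet the two functions behave differently
```

Because only the name is stored, nothing in the saved model records which threshold was used.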

To solve this problem, we can create a subclass of the `keras.losses.Loss` class and implement its `get_config()` method:

In [41]:
class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    
    
    def call(self, y_true, y_pred):
        
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        
        return tf.where(is_small_error, squared_loss, linear_loss)
    
    
    def get_config(self):
        base_config = super().get_config()
        
        return {**base_config, "threshold": self.threshold}

- The constructor accepts \*\*kwargs and passes them to the parent constructor, which handles standard hyperparameters: the name of the loss and the `reduction` algorithm to use to aggregate the individual instance losses. By default, it is `"sum_over_batch_size"`, which means that the loss will be the sum of the instance losses, weighted by the sample weights, if any, and divided by the batch size (not by the sum of weights, so this is not the weighted mean). Other possible values are `"sum"` and `None`.

- The `call()` method takes the labels and predictions, computes all the instance losses, and returns them.

- The `get_config()` method returns a dictionary mapping each hyperparameter name to its value. It first calls the parent class’s `get_config()` method, then adds the new hyperparameters to this dictionary (note that the convenient {\*\*x} syntax was added in Python 3.5).
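The difference between `"sum_over_batch_size"` and a true weighted mean, described in the first bullet above, is easiest to see with numbers. This is a NumPy sketch of the arithmetic only, with made-up losses and weights; it does not call Keras.

```python
import numpy as np

instance_losses = np.array([1.0, 2.0, 3.0, 4.0])
sample_weights = np.array([1.0, 1.0, 0.0, 3.0])

weighted = instance_losses * sample_weights            # [1, 2, 0, 12]

sum_reduction = weighted.sum()                         # "sum"
sum_over_batch_size = weighted.sum() / len(weighted)   # divide by the batch size (4)
weighted_mean = weighted.sum() / sample_weights.sum()  # divide by the sum of weights (5)

print(sum_reduction, sum_over_batch_size, weighted_mean)
```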

Then, we can use any instance of this class when compiling the model. And when saving the model, the threshold will be saved along with it. To load the model, we just need to map the class name to the class itself:

```python
# compiling the model
model.compile(loss=HuberLoss(2.), optimizer="nadam")
```

```python
# loading the model
model = keras.models.load_model("my_model_with_a_custom_loss_class.h5", custom_objects={"HuberLoss": HuberLoss})
``` 
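The reason `get_config()` is enough to restore the threshold is that the saved dictionary can be fed back to the constructor. The plain-Python sketch below mimics that round trip; `MiniLoss` and `MiniHuber` are hypothetical stand-ins written for this illustration, and the real Keras machinery is more involved.

```python
import json

class MiniLoss:
    """Hypothetical stand-in for a Keras base class (not the real API)."""

    def __init__(self, name=None):
        self.name = name

    def get_config(self):
        return {"name": self.name}

    @classmethod
    def from_config(cls, config):
        return cls(**config)   # rebuild the object from its hyperparameters

class MiniHuber(MiniLoss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)

    def get_config(self):
        return {**super().get_config(), "threshold": self.threshold}

# "Save": serialize the configuration. "Load": rebuild the object from it.
saved = json.dumps(MiniHuber(2.0, name="my_huber").get_config())
restored = MiniHuber.from_config(json.loads(saved))

print(saved)
print(restored.threshold, restored.name)
```

This is also why the constructor must accept every key that `get_config()` returns.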

### Custom Activation Functions, Initializers, Regularizers, and Constraints

Most Keras functionalities, such as losses, regularizers, constraints, initializers, metrics, activation functions, layers, and even full models, can be customized in very much the same way. Most of the time, you will just need to write a simple function with the appropriate inputs and outputs.

Here are examples of a custom activation function (equivalent to `keras.activations.softplus()` or `tf.nn.softplus()`), a custom Glorot initializer (equivalent to `keras.initializers.glorot_normal()`), a custom $\ell_1$ regularizer (equivalent to `keras.regularizers.l1(0.01)`), and a custom constraint that ensures weights are all positive (equivalent to `keras.constraints.nonneg()` or `tf.nn.relu()`):



```python
def my_softplus(z): # return value is just tf.nn.softplus(z)
    return tf.math.log(tf.exp(z) + 1.0)

def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

def my_positive_weights(weights): # return value is just tf.nn.relu(weights)
    return tf.where(weights < 0., tf.zeros_like(weights), weights)
```

The arguments depend on the type of the custom function. These custom functions could be used as follows:

```python
layer = keras.layers.Dense(30, activation=my_softplus,
                           kernel_initializer=my_glorot_initializer,
                           kernel_regularizer=my_l1_regularizer,
                           kernel_constraint=my_positive_weights)
```
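The equivalences claimed above can be sanity-checked on a few values. The sketch below uses NumPy instead of TensorFlow so that it runs anywhere; the input arrays are arbitrary.

```python
import numpy as np

z = np.array([-2.0, 0.0, 3.0])
softplus_naive = np.log(np.exp(z) + 1.0)   # same formula as my_softplus()
softplus_ref = np.logaddexp(0.0, z)        # log(exp(0) + exp(z)), i.e. softplus

w = np.array([-0.5, 0.0, 1.5])
positive_where = np.where(w < 0.0, np.zeros_like(w), w)   # as in my_positive_weights()
positive_relu = np.maximum(w, 0.0)                        # ReLU

l1_penalty = np.sum(np.abs(0.01 * w))      # as in my_l1_regularizer()

print(np.allclose(softplus_naive, softplus_ref))
print(np.array_equal(positive_where, positive_relu))
print(l1_penalty)
```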

If a function has hyperparameters that need to be saved along with the model, then you must subclass the appropriate class, just as we did before. Examples of such classes are `keras.regularizers.Regularizer`, `keras.constraints.Constraint`, `keras.initializers.Initializer`, and `keras.layers.Layer` (for any layer, including activation functions). You must implement the `call()` method for losses, layers (including activation functions), and models, or the `__call__()` method for regularizers, initializers, and constraints. For metrics, things are a bit different, as we will see now.

### Custom Metrics

In most cases, defining a custom metric function is exactly the same as defining a custom loss function. In fact, we could even use the Huber loss function we created earlier as a metric. It would work just fine (and persistence would also work the same way, in this case only saving the name of the function, `"huber_fn"`):

```python
model.compile(loss="mse", optimizer="nadam", metrics=[create_huber(2.0)])
```

For each batch during training, Keras will compute this metric and keep track of its mean since the beginning of the epoch. However, this is not always what we want. For example, averaging a classifier's per-batch precision does not give its overall precision, because batches can contain different numbers of positive predictions. What we need is an object that keeps track of the number of true positives and false positives and computes the precision from them when requested. This is what the `keras.metrics.Precision` class does:

```python
>>> precision = keras.metrics.Precision()
>>> precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])
<tf.Tensor: id=581729, shape=(), dtype=float32, numpy=0.8>
>>> precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])
<tf.Tensor: id=581780, shape=(), dtype=float32, numpy=0.5>
```

Note that in this example, the precision metric is used as a function that receives the labels and the predictions (it could also receive sample weights). Each call processes one batch and returns the overall precision so far: 0.8 after the first batch, then 0.5 after the second (the precision of the second batch alone is 0). This is called a *streaming metric* (or *stateful metric*), as it is gradually updated, batch after batch.

At any point, we can call the `result()` method to get the current value of the metric. We can also inspect its variables through the `variables` attribute, and reset them to 0 with the `reset_states()` method:

```python
>>> precision.result()
<tf.Tensor: id=581794, shape=(), dtype=float32, numpy=0.5>
>>> precision.variables
[<tf.Variable 'true_positives:0' [...] numpy=array([4.], dtype=float32)>,
<tf.Variable 'false_positives:0' [...] numpy=array([4.], dtype=float32)>]
>>> precision.reset_states() # both variables get reset to 0.0
```
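To make the difference between a streaming metric and a per-batch average concrete, here is a plain-Python sketch that replays the two batches from the example above. `StreamingPrecision` is a hypothetical stand-in written for this illustration, not the Keras class.

```python
class StreamingPrecision:
    """Plain-Python stand-in for a streaming precision metric."""

    def __init__(self):
        self.true_positives = 0
        self.false_positives = 0

    def update(self, y_true, y_pred):
        for label, pred in zip(y_true, y_pred):
            if pred == 1 and label == 1:
                self.true_positives += 1
            elif pred == 1 and label == 0:
                self.false_positives += 1
        return self.result()

    def result(self):
        predicted_positives = self.true_positives + self.false_positives
        if predicted_positives == 0:
            return 0.0
        return self.true_positives / predicted_positives

batches = [
    ([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1]),
    ([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0]),
]

streaming = StreamingPrecision()
per_batch = []
for y_true, y_pred in batches:
    fresh = StreamingPrecision()        # precision of this batch alone
    per_batch.append(fresh.update(y_true, y_pred))
    streaming.update(y_true, y_pred)    # running totals across batches

mean_of_batches = sum(per_batch) / len(per_batch)
print(per_batch)            # [0.8, 0.0]
print(mean_of_batches)      # 0.4: the mean of the per-batch precisions
print(streaming.result())   # 0.5: 4 true positives out of 8 positive predictions
```

The mean of the per-batch values (0.4) differs from the overall precision (0.5), which is exactly why precision has to be tracked as a streaming metric.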

To create a streaming metric, create a subclass of the `keras.metrics.Metric` class.

```python
class HuberMetric(keras.metrics.Metric):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs) # handles base args (e.g., dtype)
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        self.total = self.add_weight("total", initializer="zeros")
        self.count = self.add_weight("count", initializer="zeros")
        
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))

    def result(self):
        return self.total / self.count
    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}
```

Walking through the code: 

- The constructor uses `add_weight()` to create the variables needed to keep track of the metric's state over multiple batches: here, the sum of all Huber losses (`total`) and the number of instances seen so far (`count`). We could also create `tf.Variable` objects directly, since Keras keeps track of any `tf.Variable` that is set as an attribute.

- The `update_state()` method is called when we use an instance of this class as a function. It updates the variables given the labels and predictions for one batch. We are ignoring `sample_weight` for now.

- The `result()` method computes and returns the current value of the metric, here the mean Huber loss over all instances seen so far. When we use an instance of this class as a function, `update_state()` is called first, then `result()`, and its output is returned.

- The default implementation of the `reset_states()` method resets all variables to 0.0 (but you can override it if needed).

### Custom Layers

To build a custom stateful layer (i.e., a layer with weights), we need to subclass the `keras.layers.Layer` class.

```python

class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)
    
    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape) # must be at the end
        
    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)

    def compute_output_shape(self, batch_input_shape):
        return tf.TensorShape(batch_input_shape.as_list()[:-1] + [self.units])

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation": keras.activations.serialize(self.activation)}
```

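- The constructor takes the layer's hyperparameters as arguments (here, `units` and `activation`) and also accepts `**kwargs`, which it passes to the parent constructor: this takes care of standard arguments such as `input_shape`, `trainable`, and `name`.

- The `build()` method creates the layer's variables by calling `add_weight()` for each weight. It is called the first time the layer is used, because only then does Keras know the shape of the layer's inputs: we need the number of neurons in the previous layer (the last dimension of the input shape) to create the connection weights matrix (i.e., the "kernel"). At the end, we call `super().build()` to tell Keras that the layer is built.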

- The `call()` method performs the desired operations.

- The `compute_output_shape()` method simply returns the shape of this layer’s outputs.
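The shapes involved in `build()`, `call()`, and `compute_output_shape()` can be traced with NumPy. This is a sketch of the arithmetic only, with arbitrary sizes and random values; it does not use Keras, and no activation is applied.

```python
import numpy as np

rng = np.random.default_rng(42)

batch_size, n_inputs, units = 4, 3, 5
X = rng.normal(size=(batch_size, n_inputs))

# What build() creates once the input shape is known:
kernel = rng.normal(size=(n_inputs, units))   # shape [batch_input_shape[-1], units]
bias = np.zeros(units)                        # shape [units]

# What call() computes before the activation:
outputs = X @ kernel + bias

# What compute_output_shape() returns: the input shape with its last
# dimension replaced by the number of units.
output_shape = X.shape[:-1] + (units,)

print(outputs.shape, output_shape)
```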

### Custom Models

We already saw in Chapter 10 how to use the Subclassing API. If you also want to be able to save the model using the `save()` method and load it using the `keras.models.load_model()` function, you must implement the `get_config()` method, just like we did before.

The `Model` class is a subclass of the `Layer` class, so models can be defined and used exactly like layers. But a model has some extra functionality, such as its `compile()`, `fit()`, `evaluate()`, `predict()`, and `save()` methods. So why not define every layer as a model? Distinguishing the internal components of a model (layers) from the model itself (the object we train) keeps the code more explicit and cleaner.

With that, you can naturally and concisely build almost any model that you find in a paper, using the Sequential API, the Functional API, the Subclassing API, or even a mix of these. “Almost” any model? Yes, there are still a few things that we need to look at: first, how to define losses or metrics based on model internals, and second, how to build a custom training loop.

### Losses and Metrics Based on Model Internals

The custom losses and metrics we defined earlier were all based on the labels
and the predictions (and optionally sample weights). There will be times when
you want to define losses based on other parts of your model, such as the
weights or activations of its hidden layers. This may be useful for regularization
purposes or to monitor some internal aspect of your model.

**To define a custom loss based on model internals, compute it based on any part
of the model you want, then pass the result to the `add_loss()` method**. For
example, let’s build a custom regression MLP model composed of a stack of five
hidden layers plus an output layer. This custom model will also have an auxiliary
output on top of the upper hidden layer. The loss associated to this auxiliary
output will be called the *reconstruction loss* (see Chapter 17): it is the mean
squared difference between the reconstruction and the inputs. By adding this
reconstruction loss to the main loss, we will encourage the model to preserve as
much information as possible through the hidden layers—even information that
is not directly useful for the regression task itself. In practice, this loss
sometimes improves generalization (it is a regularization loss). Here is the code
for this custom model with a custom reconstruction loss:

```python
class ReconstructingRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
        
    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        super().build(batch_input_shape)
        
    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        return self.out(Z)
```

- The constructor creates a DNN with five hidden layers and one dense output layer.

- The `build()` method creates an extra dense layer which will be used to reconstruct the inputs of the model. It must be created here because its number of units must be equal to the number of inputs, and this number is unknown before the `build()` method is called.

- The `call()` method processes the inputs through all five hidden layers, then passes the result through the reconstruction layer, which produces the reconstruction.

- Then the `call()` method computes the reconstruction loss (the mean squared difference between the reconstruction and the inputs), and adds it to the model’s list of losses using the `add_loss()` method.

- Finally, the `call()` method passes the output of the hidden layers to the output layer and returns its output.

## Computing Gradients Using Autodiff

In Chapter 10 (Appendix 10) we discussed *automatic differentiation* (autodiff) while introducing the tool that made training deep neural networks possible: the backpropagation algorithm. There are various autodiff techniques, each with its pros and cons. The one used by backpropagation is called reverse-mode autodiff.

To understand how autodiff works, consider the following function:

In [1]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

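From calculus we know that the partial derivatives of $f$ are $\partial f / \partial w_1 = 6 w_1 + 2 w_2$ and $\partial f / \partial w_2 = 2 w_1$, so the gradient at $(w_1, w_2) = (5, 3)$ is $(36, 10)$. When the derivatives cannot be worked out by hand, we can instead approximate each partial derivative by measuring how much the function's output changes when we tweak the corresponding parameter: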

In [2]:
w1, w2 = 5, 3
eps = 1e-6
(f(w1 + eps, w2) - f(w1, w2)) / eps

36.000003007075065

In [3]:
(f(w1, w2 + eps) - f(w1, w2)) / eps

10.000000003174137

This works, but the result is only an approximation, and we have to call $f$ at least once per parameter, which is intractable for large neural networks. Instead, we should use autodiff:

In [7]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)
    
gradients = tape.gradient(z, [w1, w2])

In [9]:
gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]
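For comparison, the cost of the finite-difference approach can be made explicit by counting function calls. The helper below is a hypothetical sketch in plain Python (`numerical_gradient` is not a TensorFlow function): it needs one baseline call plus one call per parameter, whereas the tape evaluates the function only once.

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

def numerical_gradient(func, params, eps=1e-6):
    """Forward-difference approximation; returns (gradient, number of calls)."""
    base = func(*params)                  # one baseline evaluation
    n_calls = 1
    grads = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps                 # tweak a single parameter
        grads.append((func(*shifted) - base) / eps)
        n_calls += 1
    return grads, n_calls

grads, n_calls = numerical_gradient(f, [5.0, 3.0])
print(grads)     # approximately [36.0, 10.0]
print(n_calls)   # 3 calls for 2 parameters
```

With millions of parameters, this would mean millions of forward passes for a single gradient.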

## Custom Training Loops

Sometimes we may need to implement our own training loop. Let's see how we can do it.

In [20]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target.reshape(-1, 1), random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)

In [21]:
l2_reg = keras.regularizers.l2(0.05)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="elu", kernel_initializer="he_normal",
    kernel_regularizer=l2_reg),
    keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

In [22]:
# sample random batches
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]
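Note that `random_batch()` draws indices with `np.random.randint()`, that is, with replacement, so within one epoch some instances may be seen several times and others not at all. If every instance should be visited exactly once per epoch, one possible alternative (a sketch, not the approach used in these notes; `shuffled_batches` is a hypothetical helper) is to shuffle the indices once and slice them:

```python
import numpy as np

def shuffled_batches(X, y, batch_size=32, seed=None):
    """Yield batches that cover every instance exactly once per epoch."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # shuffle all indices once
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # the last batch may be smaller
        yield X[idx], y[idx]

X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10)
seen = np.concatenate(
    [y_batch for _, y_batch in shuffled_batches(X_demo, y_demo, batch_size=4, seed=0)])

print(sorted(seen.tolist()))   # every label from 0 to 9 appears exactly once
```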

In [23]:
# print status bar
def print_status_bar(iteration, total, loss, metrics=None):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                          for m in [loss] + (metrics or [])])
    
    end = "" if iteration < total else "\n"
        
    print("\r{}/{} - ".format(iteration, total) + metrics, end=end)

In [24]:
# Define hyperparameters and choose the optimizer
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate=0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]


Now we are ready to build the custom loop:

In [25]:
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
            
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        mean_loss(loss)
        
        for metric in metrics:
            metric(y_batch, y_pred)
            
        print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Epoch 1/5
11610/11610 - mean: 1.4295 - mean_absolute_error: 0.5820
Epoch 2/5
11610/11610 - mean: 0.6504 - mean_absolute_error: 0.5209
Epoch 3/5
11610/11610 - mean: 0.6789 - mean_absolute_error: 0.5335
Epoch 4/5
11610/11610 - mean: 0.6382 - mean_absolute_error: 0.5182
Epoch 5/5
11610/11610 - mean: 0.6535 - mean_absolute_error: 0.5260


- First we create two nested loops: one for the epochs and one for the batches within each epoch.
- Then we sample a random batch from the training set.
- Inside the `tf.GradientTape()` block, we make a prediction for one batch (using the model as a function), and we compute the loss: it is equal to the main loss plus the other losses (in this model, there is one regularization loss per layer). Since the `mean_squared_error()` function returns one loss per instance, we compute the mean over the batch using `tf.reduce_mean()` (if you wanted to apply different weights to each instance, this is where you would do it). The regularization losses are already reduced to a single scalar each, so we just need to sum them (using `tf.add_n()`, which sums multiple tensors of the same shape and data type).
- Next, we ask the tape to compute the gradient of the loss with regard to each trainable variable (not all variables!), and we pass the gradients to the optimizer to perform a Gradient Descent step.
- Then we update the mean loss and the metrics (over the current epoch), and we display the status bar.
- At the end of each epoch, we display the status bar again to make it look complete and to print a line feed, and we reset the states of the mean loss and the metrics.

This training loop does not clip the gradients. If you set the optimizer's `clipnorm` or `clipvalue` hyperparameter, the optimizer will take care of clipping for you. If you want to apply any other transformation to the gradients, simply do so before calling the `apply_gradients()` method.

If you add weight constraints to your model (e.g., by setting `kernel_constraint` or `bias_constraint` when creating a layer), you should update the training loop to apply these constraints just after `apply_gradients()`:

```python
for variable in model.variables:
    if variable.constraint is not None:
        variable.assign(variable.constraint(variable))
```
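The order of operations (compute gradients, transform them, apply them, then enforce constraints) can be seen in a stripped-down loop. The sketch below makes several assumptions for the sake of being self-contained: it uses only core TensorFlow operations, a single weight `w`, a toy dataset generated by `y = 2 * x`, a hand-written gradient descent step in place of the optimizer's `apply_gradients()`, and `tf.nn.relu` as a stand-in for a non-negativity constraint.

```python
import tensorflow as tf

X = tf.constant([[1.0], [2.0], [3.0]])
y = tf.constant([[2.0], [4.0], [6.0]])   # toy data generated by y = 2 * x

w = tf.Variable([[0.0]])
learning_rate = 0.1

for step in range(50):
    with tf.GradientTape() as tape:
        y_pred = X @ w
        loss = tf.reduce_mean(tf.square(y - y_pred))
    gradient = tape.gradient(loss, w)
    gradient = tf.clip_by_norm(gradient, 5.0)   # transform the gradient first
    w.assign_sub(learning_rate * gradient)      # then take the descent step
    w.assign(tf.nn.relu(w))                     # then enforce the constraint

print(w.numpy())   # close to [[2.]]
```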

## Exercises

1. **How would you describe TensorFlow in a short sentence? What are its main features? Can you name other popular Deep Learning libraries?**

TensorFlow is an open-source library for numerical computation, particularly well suited and fine-tuned for large-scale Machine Learning. Its core is similar to NumPy, but it also features GPU support, support for distributed computing, computation graph analysis and optimization capabilities (with a portable graph format that allows you to train a TensorFlow model in one environment and run it in another), an optimization API based on reverse-mode autodiff, and several powerful APIs such as tf.keras, tf.data, tf.image, tf.signal, and more. Other popular Deep Learning libraries include PyTorch, MXNet, Microsoft Cognitive Toolkit, Theano, Caffe2, and Chainer.

2. **Is TensorFlow a drop-in replacement for NumPy? What are the main differences between the two?**

No. Even though TensorFlow offers similar operations, there are some significant differences from NumPy. The function names do not always match, and some functions behave differently. Also, NumPy arrays are mutable, while TensorFlow tensors are not (use a `tf.Variable` when you need a mutable object).

3. **Do you get the same result with `tf.range(10)` and `tf.constant(np.arange(10))`?**

In [6]:
import tensorflow as tf
import numpy as np

In [7]:
tf.range(10)

<tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>

In [8]:
tf.constant(np.arange(10))

<tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>

The answer is no. The values are the same, but the data types differ: TensorFlow defaults to 32-bit types (`int32` here), while NumPy defaults to 64-bit types (`int64` in the output above).

4. **Can you name six other data structures available in TensorFlow, beyond regular tensors?**

- *Sparse Tensors* (`tf.SparseTensor`): Efficiently represent tensors containing mostly zeros.
- *Tensor Arrays* (`tf.TensorArray`): Lists of tensors. 
- *Ragged Tensors* (`tf.RaggedTensor`): Represent lists of tensors that all have the same rank and data type but may have different sizes.
- *String Tensors*: Regular tensors of type `tf.string`.
- *Sets*: Represented as regular tensors (or sparse tensors).
- *Queues*: Store tensors across multiple steps.

5. **A custom loss function can be defined by writing a function or by subclassing the `keras.losses.Loss` class. When would you use each option?**

We can simply write a function when the custom loss has no hyperparameters that need to be saved with the model. If at least one hyperparameter must be persisted, we should subclass `keras.losses.Loss` and implement its `get_config()` method.

6. **Similarly, a custom metric can be defined in a function or a subclass of `keras.metrics.Metric`. When would you use each option?**

If the metric can be computed independently for each batch and then averaged, we can just implement it as a function. However, if computing the metric over a whole epoch is not equivalent to computing the mean of the metric over all batches in that epoch (as with precision), then we must subclass `keras.metrics.Metric` so that the metric can keep track of its state across batches (a streaming metric).

7. **When should you create a custom layer versus a custom model?**

A custom model subclasses `keras.models.Model`, which is itself a subclass of `keras.layers.Layer`. A model can therefore be viewed as a layer with a few more methods. For the sake of clarity, though, we should treat them as different things: subclass `Layer` for the internal, reusable components of a model, and subclass `Model` for the object that we will train, evaluate, and save.

8. **What are some use cases that require writing your own custom training loop?**

- You may want more control over the training process.
- You may want to use more than one optimizer during training. 

9. **Can custom Keras components contain arbitrary Python code, or must they be convertible to TF Functions?**

They can. However, arbitrary Python code prevents TensorFlow from converting the component to a TF Function, so it hurts performance and disables TensorFlow's graph optimizations. The recommendation is to stick to TF operations inside the body of the function. If you absolutely need to include arbitrary Python code in a custom component, you can either wrap it in a `tf.py_function()` operation (but this will reduce performance and limit your model's portability) or set `dynamic=True` when creating the custom layer or model (or set `run_eagerly=True` when calling the model's `compile()` method).

10. **What are the main rules to respect if you want a function to be convertible to a TF Function?**

- Use TensorFlow operations whenever possible.
- Do not rely on Python side effects such as logging or updating a Python counter, because they only run during tracing, which typically happens only on the first call to the function.
- If there is no alternative to using arbitrary Python code in a function, wrap it with `tf.py_function()`.
- Functions called inside other functions must follow the same rules.
- If the function creates a TensorFlow variable, it must do so upon the very first call, and only then, or else you will get an exception.
- The source code of the function must be available to TensorFlow.
- TensorFlow will only capture `for` loops that iterate over a tensor or a dataset.
- Prefer vectorized implementations whenever you can.
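The rule about side effects can be observed directly. The sketch below assumes TensorFlow 2.x with default `tf.function` settings; `trace_log` and `add_one` are illustrative names. Appending to a Python list is a Python side effect, so it runs only when the function is traced, not on every call.

```python
import tensorflow as tf

trace_log = []

@tf.function
def add_one(x):
    trace_log.append("traced")   # Python side effect: runs only while tracing
    return x + 1

add_one(tf.constant(1))
add_one(tf.constant(2))          # same dtype and shape: the existing graph is reused
n_after_ints = len(trace_log)

add_one(tf.constant(1.0))        # new dtype: TensorFlow traces the function again
n_after_float = len(trace_log)

print(n_after_ints, n_after_float)
```

Three calls leave only two entries in the log: one per input signature rather than one per call.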

11. **When would you need to create a dynamic Keras model? How do you do that? Why not make all your models dynamic?**

Creating a dynamic Keras model can be useful for debugging, as it will not compile any custom component to a TF Function, and you can use any Python debugger to debug your code. It can also be useful if you want to include arbitrary Python code in your model (or in your training code), including calls to external libraries. To make a model dynamic, you must set `dynamic=True` when creating it. Alternatively, you can set `run_eagerly=True` when calling the model's `compile()` method. Making a model dynamic prevents Keras from using any of TensorFlow's graph features, so it will slow down training and inference, and you will not have the possibility to export the computation graph, which will limit your model's portability.

Exercises **12** and **13** are not implemented, refer to [https://github.com/ageron/handson-ml2](https://github.com/ageron/handson-ml2).