# Chapter 12: Custom Models and Training with TensorFlow Exercises

## 1.

> How would you describe TensorFlow in a short sentence?

> What are its main features?

> Can you name other popular Deep Learning libraries?

TensorFlow is a powerful library for numerical computation, particularly well suited and fine-tuned for large-scale Machine Learning.

Its main features are:
- Its core is very similar to NumPy, but with GPU support.
- It supports distributed computing.
- It includes a just-in-time (JIT) compiler that optimizes computations for speed and memory usage.
- Its computation graphs can be exported to a portable format.
- It implements autodiff and provides some excellent optimizers.

Other popular Deep Learning libraries include PyTorch, Microsoft Cognitive Toolkit (CNTK), and Theano.

## 2.

> Is TensorFlow a drop-in replacement for NumPy?

> What are the main differences between the two?

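Although TensorFlow shares many similarities with NumPy, it is not a drop-in replacement.

- Not all functions share the same name (e.g., `tf.reduce_sum()` versus `np.sum()`).
- Some functions behave differently: `tf.transpose()` creates a transposed copy of a tensor, whereas NumPy's `T` attribute returns a transposed view without copying any data.
- Tensors are immutable, whereas ndarrays are mutable. Use a `tf.Variable` when you need mutable state.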
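
The differences listed above can be illustrated with a minimal sketch. The array values and variable names are arbitrary choices for illustration, and the snippet assumes TensorFlow 2.x running in eager mode.

```python
import numpy as np
import tensorflow as tf

# NumPy: .T returns a view, so writing through it also changes the original.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
a_t = a.T
a_t[0, 1] = 99.0  # this is the same memory cell as a[1, 0]

# TensorFlow: tf.transpose() returns a new tensor with its own data.
t = tf.constant([[1.0, 2.0], [3.0, 4.0]])
t_t = tf.transpose(t)

# Tensors are immutable: item assignment is rejected.
try:
    t[0, 0] = 42.0
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Mutable state lives in a tf.Variable, modified with assign().
v = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
v[0, 0].assign(42.0)

# Function names differ as well: np.sum() versus tf.reduce_sum().
total_np = np.sum(a)
total_tf = tf.reduce_sum(t)
```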

## 3.

> Do you get the same result with `tf.range(10)` and `tf.constant(np.arange(10))`?

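Both return a one-dimensional tensor containing the integers 0 to 9, but the data types differ: `tf.range(10)` produces 32-bit integers, while `tf.constant(np.arange(10))` produces 64-bit integers because NumPy defaults to 64-bit integers on most platforms.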
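
A quick way to check this is to inspect the `dtype` of each tensor. This is a minimal sketch; the variable names are illustrative, and the NumPy side follows whatever default integer type NumPy uses on the machine running it.

```python
import numpy as np
import tensorflow as tf

r = tf.range(10)
c = tf.constant(np.arange(10))

print(r.dtype)  # 32-bit integers
print(c.dtype)  # follows NumPy's default integer type (64-bit on most platforms)

# The values match once both tensors are cast to a common dtype.
same_values = bool(
    tf.reduce_all(tf.equal(tf.cast(r, tf.int64), tf.cast(c, tf.int64))).numpy()
)
```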

## 4.

> Can you name six other data structures available in TensorFlow, beyond regular tensors?

- Sparse tensors
- Tensor arrays
- Ragged tensors
- String tensors
- Sets
- Queues
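
The sketch below creates one small instance of each structure listed above. All values are made up for illustration. Note that string tensors and sets are represented as regular tensors (sets are manipulated through the `tf.sets` functions).

```python
import tensorflow as tf

# Sparse tensor: stores only the non-zero entries.
sparse = tf.SparseTensor(indices=[[0, 1], [2, 3]],
                         values=[1.0, 2.0],
                         dense_shape=[3, 4])
dense = tf.sparse.to_dense(sparse)

# Tensor array: a list of tensors sharing the same shape and dtype.
array = tf.TensorArray(dtype=tf.float32, size=3)
array = array.write(0, tf.constant([1.0, 2.0]))
array = array.write(1, tf.constant([3.0, 10.0]))
array = array.write(2, tf.constant([5.0, 7.0]))
stacked = array.stack()

# Ragged tensor: rows may have different lengths.
ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])

# String tensor: a regular tensor of type tf.string.
strings = tf.constant(["café", "coffee"])
lengths = tf.strings.length(strings, unit="UTF8_CHAR")

# Sets: regular tensors, each row being one set.
set_a = tf.constant([[2, 3, 5, 7]])
set_b = tf.constant([[4, 5, 6, 7]])
union = tf.sparse.to_dense(tf.sets.union(set_a, set_b))

# Queue: first-in, first-out storage of tensors.
queue = tf.queue.FIFOQueue(capacity=3, dtypes=[tf.int32], shapes=[()])
queue.enqueue([10])
queue.enqueue([20])
first = queue.dequeue()
```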

## 5.

> A custom loss function can be defined by writing a function or by subclassing the `keras.losses.Loss` class.

> When would you use each option?

Writing a plain function is the simplest option, and it is sufficient when the loss has no hyperparameters or when you do not need to save them with the model. A custom loss is worth writing when the built-in losses, such as mean squared error (MSE) or mean absolute error (MAE), do not work well for your model.

Subclass `keras.losses.Loss` when the loss has hyperparameters that should be saved with the model. You store them in the constructor and return them from the `get_config()` method.
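
To make the contrast concrete, here is a minimal sketch of both options using the Huber loss as the example. The names `huber_fn` and `HuberLoss` and the `threshold` hyperparameter are illustrative choices, not something the exercise prescribes.

```python
import tensorflow as tf

def huber_fn(y_true, y_pred):
    """Huber loss with a fixed threshold of 1.0: no hyperparameter to save."""
    error = y_true - y_pred
    is_small = tf.abs(error) < 1.0
    squared = tf.square(error) / 2
    linear = tf.abs(error) - 0.5
    return tf.where(is_small, squared, linear)

class HuberLoss(tf.keras.losses.Loss):
    """Huber loss whose threshold is returned by get_config()."""

    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small = tf.abs(error) < self.threshold
        squared = tf.square(error) / 2
        linear = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small, squared, linear)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}
```

Either one can be passed to `compile()`, for example `loss=huber_fn` or `loss=HuberLoss(2.0)`. Only the subclass carries its threshold in its config.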

## 6.

> Similarly, a custom metric can be defined in a function or a subclass of `keras.metrics.Metric`.

> When would you use each option?

Defining a custom metric as a function is the simplest option: Keras calls it for each batch and keeps track of the mean over each epoch.

Subclass `keras.metrics.Metric` when you want to save the metric's hyperparameters with the model, or when you need a streaming metric. A streaming metric is updated batch after batch, which matters when the mean of the per-batch values would not equal the value computed over the whole epoch (precision is a typical example).
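
As an illustration of a streaming metric, here is a minimal sketch of a precision metric for binary classification. The class name `StreamingPrecision` and its `threshold` parameter are hypothetical, and `sample_weight` is ignored to keep the example short.

```python
import tensorflow as tf

class StreamingPrecision(tf.keras.metrics.Metric):
    """Accumulates counts across batches instead of averaging batch results."""

    def __init__(self, threshold=0.5, name="streaming_precision", **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold
        self.true_positives = self.add_weight(name="tp", initializer="zeros")
        self.predicted_positives = self.add_weight(name="pp", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # sample_weight is ignored in this sketch.
        y_true = tf.cast(y_true, tf.float32)
        predicted = tf.cast(y_pred >= self.threshold, tf.float32)
        self.true_positives.assign_add(tf.reduce_sum(y_true * predicted))
        self.predicted_positives.assign_add(tf.reduce_sum(predicted))

    def result(self):
        return tf.math.divide_no_nan(self.true_positives,
                                     self.predicted_positives)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

metric = StreamingPrecision()
# Batch 1: two positive predictions, one correct (batch precision 0.5).
metric.update_state(tf.constant([1.0, 0.0]), tf.constant([0.9, 0.8]))
# Batch 2: one positive prediction, correct (batch precision 1.0).
metric.update_state(tf.constant([1.0, 0.0, 0.0, 0.0]),
                    tf.constant([0.7, 0.1, 0.2, 0.3]))
overall = float(metric.result())
```

Here the mean of the two batch precisions would be 0.75, whereas the streaming result is 2 correct out of 3 positive predictions.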

## 7.

> When should you create a custom layer versus a custom model?

Create custom layers for the internal components of the model (i.e., layers or reusable blocks of layers).

Create a custom model for the model itself (i.e., the object you will train).

## 8.

> What are some use cases that require writing your own custom training loop?

- When you want to use multiple optimizers, since the `fit()` method only uses the single optimizer specified when the model was compiled.
- When you want full control over the training procedure and prefer to have every step written out explicitly.

## 9.

> Can custom Keras components contain arbitrary Python code, or must they be convertible to TF Functions?

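Custom Keras components should be convertible to TF Functions, so they should stick to TensorFlow operations as much as possible and follow the TF Function rules. If you really need arbitrary Python code in a custom component, either wrap it in a `tf.py_function()` operation (at the cost of performance and portability) or make the component dynamic by setting `dynamic=True` when creating the layer or model, or `run_eagerly=True` when calling the model's `compile()` method.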
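
When a piece of arbitrary Python code is unavoidable, one option is to wrap it in `tf.py_function()` so it can still be called from a TF Function. The sketch below assumes a hypothetical helper, `numpy_log1p`, standing in for the non-TensorFlow code. Keep in mind that this reduces performance and makes the graph less portable.

```python
import numpy as np
import tensorflow as tf

def numpy_log1p(x):
    # Arbitrary Python/NumPy code: x arrives here as an eager tensor.
    return np.log1p(x.numpy()).astype(np.float32)

@tf.function
def log1p_via_py_function(x):
    # Wrap the Python code so the graph calls back into the interpreter.
    y = tf.py_function(func=numpy_log1p, inp=[x], Tout=tf.float32)
    y.set_shape(x.shape)  # py_function does not infer the static shape
    return y

result = log1p_via_py_function(tf.constant([0.0, 1.0, 3.0]))
```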

## 10.

> What are the main rules to respect if you want a function to be convertible to a TF Function?

- Code from external libraries (including NumPy and the Python standard library) runs only while the function is being traced, so it will not be part of the TensorFlow graph. The function should rely on TensorFlow constructs only (tensors, operations, variables, datasets, etc.).
- Other Python functions or TF Functions can be called, but they must follow the same rules.
- If the function creates a TensorFlow variable, it must do so upon the very first call, and only then, or else an exception will be raised.
- The source code of the Python function must be available to TensorFlow, or else the graph generation process will fail or have limited functionality.
- TensorFlow only captures `for` loops that iterate over a tensor or a dataset, so use `for i in tf.range(n)` rather than `for i in range(n)` if the loop should be part of the graph.
- Prefer a vectorized implementation over loops whenever possible.
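
Two of these rules are easy to see in a small experiment. In this sketch (function names are illustrative), a NumPy call inside a TF Function is evaluated only once, at tracing time, while a loop over `tf.range()` is captured as a real loop in the graph.

```python
import numpy as np
import tensorflow as tf

@tf.function
def np_random():
    # np.random.rand() runs only during tracing; its value is then
    # stored in the graph as a constant.
    return tf.constant(np.random.rand(), dtype=tf.float32)

@tf.function
def tf_random():
    # A TensorFlow operation: part of the graph, evaluated on every call.
    return tf.random.uniform([])

@tf.function
def sum_squares(n):
    total = tf.constant(0)
    for i in tf.range(n):  # iterating over a tensor: captured in the graph
        total += i * i
    return total

first, second = np_random(), np_random()  # same value twice
result = sum_squares(tf.constant(5))      # 0 + 1 + 4 + 9 + 16
```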

## 11.

> When would you need to create a dynamic Keras model?

> How do you do that?

> Why not make all your models dynamic?

Creating a dynamic Keras model is useful for:
- Debugging since you can run the model in a Python debugger
- Including arbitrary Python code that you don't want to be converted
- Including code from external libraries like NumPy

To create a dynamic Keras model, set `dynamic=True` when creating a custom layer or model or set `run_eagerly=True` when calling the model's `compile()` method.

It's not a good idea to make all models dynamic: a dynamic model cannot use any of TensorFlow's graph features, so training and inference will be slower, and you cannot export the computation graph, which limits the model's portability.

## 12.

> Implement a custom layer that performs *Layer Normalization*:

> a. The `build()` method should define two trainable weights $\alpha$ and $\beta$, both of shape `input_shape[-1:]` and data type `tf.float32`. $\alpha$ should be initialized with 1s, and $\beta$ with 0s.

> b. The `call()` method should compute the mean $\mu$ and standard deviation $\sigma$ of each instance's features.

>> - For this, you can use `tf.nn.moments(inputs, axes=-1, keepdims=True)`, which returns the mean $\mu$ and the variance $\sigma^2$ of all instances (compute the square root of the variance to get the standard deviation).

>> - Then the function should compute and return $\alpha \otimes (\mathbf{X} - \mu)/(\sigma + \epsilon) + \beta$, where $\otimes$ represents itemwise multiplication (`*`) and $\epsilon$ is a smoothing term (a small constant to avoid division by zero, e.g., 0.001).

> c. Ensure that your custom layer produces the same (or very nearly the same) output as the `keras.layers.LayerNormalization` layer.
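
One possible solution is sketched below. The class name `MyLayerNormalization`, the `eps` argument, and the random test data are illustrative choices. The string `"float32"` is used for the weights' data type, which is equivalent to `tf.float32`. The Keras layer adds $\epsilon$ to the variance inside the square root rather than to the standard deviation, so the two outputs are expected to be very close but not identical.

```python
import numpy as np
import tensorflow as tf

class MyLayerNormalization(tf.keras.layers.Layer):
    def __init__(self, eps=0.001, **kwargs):
        super().__init__(**kwargs)
        self.eps = eps

    def build(self, input_shape):
        # a. Two trainable weights, one value per feature.
        self.alpha = self.add_weight(name="alpha", shape=input_shape[-1:],
                                     initializer="ones", dtype="float32")
        self.beta = self.add_weight(name="beta", shape=input_shape[-1:],
                                    initializer="zeros", dtype="float32")

    def call(self, X):
        # b. Mean and variance over each instance's features.
        mean, variance = tf.nn.moments(X, axes=-1, keepdims=True)
        std = tf.sqrt(variance)
        return self.alpha * (X - mean) / (std + self.eps) + self.beta

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "eps": self.eps}

# c. Compare with the built-in layer on made-up data.
rng = np.random.default_rng(42)
X = tf.constant(rng.normal(3.0, 10.0, size=(32, 20)).astype(np.float32))

custom_out = MyLayerNormalization()(X)
keras_out = tf.keras.layers.LayerNormalization()(X)
max_diff = float(tf.reduce_max(tf.abs(custom_out - keras_out)))
print("Largest absolute difference:", max_diff)
```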

## 13.

> Train a model using a custom training loop to tackle the Fashion MNIST dataset.

> a. Display the epoch, iteration, mean training loss, and mean accuracy over each epoch (updated at each iteration), as well as the validation loss and accuracy at the end of each epoch.

> b. Try using a different optimizer with a different learning rate for the upper layers and the lower layers.
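
A possible starting point is sketched below. The architecture, the split into `lower` and `upper` blocks, the choice of SGD and Nadam, and both learning rates are assumptions made for illustration. Regularization losses and weight constraints are left out to keep the loop short, and the validation set is evaluated in a single forward pass.

```python
import numpy as np
import tensorflow as tf

def build_model():
    """Small MLP split into a lower and an upper block (sizes are arbitrary)."""
    lower = tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
    ])
    upper = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    return tf.keras.Sequential([lower, upper]), lower, upper

def train(model, lower, upper, X_train, y_train, X_valid, y_valid,
          n_epochs=5, batch_size=32):
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    # b. One optimizer per block, each with its own learning rate.
    lower_opt = tf.keras.optimizers.SGD(learning_rate=1e-4)
    upper_opt = tf.keras.optimizers.Nadam(learning_rate=1e-3)
    n_steps = len(X_train) // batch_size
    history = []
    for epoch in range(1, n_epochs + 1):
        # Fresh metric objects each epoch, so no reset is needed.
        mean_loss = tf.keras.metrics.Mean()
        accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
        order = np.random.permutation(len(X_train))
        for step in range(1, n_steps + 1):
            idx = order[(step - 1) * batch_size: step * batch_size]
            X_batch, y_batch = X_train[idx], y_train[idx]
            with tf.GradientTape() as tape:
                y_pred = model(X_batch, training=True)
                loss = loss_fn(y_batch, y_pred)
            # Variables exist only after the first forward pass.
            lower_vars = lower.trainable_variables
            upper_vars = upper.trainable_variables
            grads = tape.gradient(loss, lower_vars + upper_vars)
            n_lower = len(lower_vars)
            lower_opt.apply_gradients(zip(grads[:n_lower], lower_vars))
            upper_opt.apply_gradients(zip(grads[n_lower:], upper_vars))
            mean_loss.update_state(loss)
            accuracy.update_state(y_batch, y_pred)
            # a. Running means, refreshed at each iteration.
            print(f"\rEpoch {epoch}/{n_epochs} - step {step}/{n_steps}"
                  f" - loss: {float(mean_loss.result()):.4f}"
                  f" - accuracy: {float(accuracy.result()):.4f}", end="")
        # a. Validation loss and accuracy at the end of the epoch.
        y_valid_pred = model(X_valid, training=False)
        val_loss = float(loss_fn(y_valid, y_valid_pred))
        val_accuracy = float(
            np.mean(np.argmax(y_valid_pred.numpy(), axis=1) == y_valid))
        print(f" - val_loss: {val_loss:.4f} - val_accuracy: {val_accuracy:.4f}")
        history.append({"loss": float(mean_loss.result()),
                        "accuracy": float(accuracy.result()),
                        "val_loss": val_loss,
                        "val_accuracy": val_accuracy})
    return history
```

To run it on Fashion MNIST, load the data with `tf.keras.datasets.fashion_mnist.load_data()`, scale the pixel values to the 0 to 1 range as `float32`, hold out part of the training set for validation, and then call `train(*build_model(), X_train, y_train, X_valid, y_valid)`.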