# Exercises

1. How would you describe TensorFlow in a short sentence? What are its main features? Can you name other popular deep learning libraries?
2. Is TensorFlow a drop-in replacement for NumPy? What are the main differences between the two?
3. Do you get the same result with `tf.range(10)` & `tf.constant(np.arange(10))`?
4. Can you name six other data structures available in TensorFlow, beyond regular tensors?
5. A custom loss function can be defined by writing a function or by subclassing the `keras.losses.Loss` class. When would you use each option?
6. Similarly, a custom metric can be defined in a function or a subclass of `keras.metrics.Metric`. When would you use each option?
7. When should you create a custom layer versus a custom model?
8. What are some use cases that require writing your own custom training loop?
9. Can custom Keras components contain arbitrary Python code, or must they be convertible to TF Functions?
10. What are the main rules to respect if you want a function to be convertible to a TF Function?
11. When would you need to create a dynamic Keras model? How do you do that? Why not make all your models dynamic?
12. Implement a custom layer that performs *Layer Normalisation*:
   * The `build()` method should define two trainable weights $\alpha$ & $\beta$, both of shape `input_shape[-1:]` & data type `tf.float32`. $\alpha$ should be initialised with 1s & $\beta$ with 0s.
   * The `call()` method should compute the mean $\mu$ & standard deviation $\sigma$ of each instance's features. For this, you can use `tf.nn.moments(inputs, axes = -1, keepdims = True)`, which returns the mean $\mu$ & the variance $\sigma^2$ of each instance (compute the square root of the variance to get the standard deviation). Then the function should compute & return $\alpha \otimes (X - \mu)/(\sigma + \varepsilon) + \beta$, where $\otimes$ represents element-wise multiplication ($*$) & $\varepsilon$ is a smoothing term (a small constant to avoid division by zero, e.g., 0.001).
   * Ensure that your custom layer produces the same (or very nearly the same) output as the `keras.layers.LayerNormalization` layer.
13. Train a model using a custom training loop to tackle the Fashion MNIST dataset.
   * Display the epoch, iteration, mean training loss, & mean accuracy over each epoch (updated at each iteration), as well as the validation loss & accuracy at the end of each epoch.
   * Try using a different optimiser with a different learning rate for the upper layers & the lower layers.

---

1. TensorFlow is a powerful open-source library for large-scale machine learning & numerical computation. It is also at the center of an ecosystem of libraries for data validation, visualisation & preprocessing, backed by a dedicated team of developers & a large community that contributes to improving it. Its core features include a NumPy-like core with GPU support, distributed computing (across multiple devices & servers), computation-graph optimisation (for speed & memory usage), exportable computation graphs (you can train a TensorFlow model in one environment, e.g., Python on Linux, & run it in another, e.g., Java on Android), reverse-mode autodiff, & a range of optimisers (such as RMSProp & Nadam). Other popular deep learning libraries include PyTorch & MXNet.
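2. No. TensorFlow can perform many of the same math operations as NumPy, but it is not a drop-in replacement. Some function names differ (e.g., `tf.reduce_sum()` vs `np.sum()`), TensorFlow tensors are immutable (you need a `tf.Variable` for mutable state) whereas NumPy arrays can be modified in place, & TensorFlow uses 32-bit floats by default whereas NumPy uses 64-bit floats. The most noticeable difference is type strictness: NumPy converts types automatically, so you can add a float array to an integer array, but TensorFlow performs no automatic type conversion & raises an exception if you execute an operation on tensors with incompatible types (e.g., a `float32` tensor plus an `int32` tensor). This is deliberate: when training large neural networks, silent type conversions can significantly hurt performance & easily go unnoticed, so TensorFlow makes you cast explicitly with `tf.cast()`.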
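A minimal sketch of these differences, assuming TensorFlow 2.x running eagerly (the variable names are illustrative only): NumPy upcasts mixed types & allows in-place updates, while TensorFlow rejects the mixed-type addition & only lets you mutate a `tf.Variable`.

```python
import numpy as np
import tensorflow as tf

# NumPy silently upcasts the integer operand: the result is float64
np_sum = np.array([40]) + np.array([2.0])

# TensorFlow refuses to add an int32 tensor to a float32 tensor
try:
    tf.constant([2.0]) + tf.constant([40])
    mixed_add_failed = False
except (tf.errors.InvalidArgumentError, TypeError):
    mixed_add_failed = True

# An explicit cast makes the operation legal
tf_sum = tf.constant([2.0]) + tf.cast(tf.constant([40]), tf.float32)

# Default floating-point precision differs
np_float_dtype = np.array([1.0]).dtype
tf_float_dtype = tf.constant([1.0]).dtype

# NumPy arrays are mutable...
array = np.zeros(3)
array[0] = 1.0

# ...but TensorFlow tensors are not
tensor = tf.zeros(3)
try:
    tensor[0] = 1.0
    tensor_mutated = True
except TypeError:
    tensor_mutated = False

# tf.Variable is the mutable alternative, updated through assign()
variable = tf.Variable(tf.zeros(3))
variable[0].assign(1.0)
```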

# 3.

In [None]:
import tensorflow as tf

tf.range(10)

In [None]:
import numpy as np

tf.constant(np.arange(10))

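Not quite. Both tensors contain the integers 0 to 9, but `tf.range(10)` returns an `int32` tensor, whereas `tf.constant(np.arange(10))` inherits NumPy's default integer type, which is `int64` on most platforms, so the second tensor is usually `int64`.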
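Comparing values & data types side by side makes the distinction explicit. This sketch assumes TensorFlow 2.x; since NumPy's default integer type is platform dependent, it reads the expected dtype from `np.arange` rather than hard-coding it.

```python
import numpy as np
import tensorflow as tf

from_tf = tf.range(10)                 # TensorFlow's default integer type: int32
from_np = tf.constant(np.arange(10))   # keeps NumPy's default integer type

# Same values once both are cast to a common dtype
same_values = bool(tf.reduce_all(from_tf == tf.cast(from_np, from_tf.dtype)))
```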

4. Besides regular tensors, TensorFlow supports several other data structures: *sparse tensors* (`tf.SparseTensor`, which efficiently represent tensors containing mostly 0s), *tensor arrays* (`tf.TensorArray`, lists of tensors that have a fixed size by default; all the tensors they contain must have the same shape & data type), *ragged tensors* (`tf.RaggedTensor`, static lists of lists of tensors with the same data type, where the inner lists may have different lengths), *string tensors* (regular tensors of type `tf.string`), *sets* (regular or sparse tensors in which each row is a set; e.g., `tf.constant([[1, 2], [3, 4]])` contains the two sets {1, 2} & {3, 4}), & *queues* (`tf.queue`, which store tensors across multiple steps).
5. Writing a plain Python function is the simplest option & is enough when your loss has no hyperparameters. A model containing such a loss can be saved & loaded, but any hyperparameter baked into the function (e.g., a threshold captured by a wrapper function) is not saved with the model, so you must set it manually every time you load the model. To save the hyperparameters along with the model, subclass `keras.losses.Loss` & implement its `__init__()`, `call()`, & `get_config()` methods. When you load such a model, you just map the class name to the class, as you would map a function name to a function, but Keras restores the hyperparameter values from the saved config instead of requiring you to specify them.
6. As with loss functions, subclassing `keras.metrics.Metric` saves your metric's hyperparameters along with the model. More importantly, some metrics, such as precision, cannot simply be averaged over batches: the precision over a whole epoch is generally not the mean of the per-batch precisions. Such metrics must be implemented as *streaming* (stateful) metrics that accumulate their state across batches, which requires subclassing `keras.metrics.Metric` & implementing `update_state()` & `result()`. If your metric can be averaged over batches & you don't need to save its hyperparameters, a simple Python function will suffice.
7. Create a custom layer when you need a new type of layer or a reusable block of layers, which is especially useful when your model's architecture is very repetitive (e.g., the same block stacked several times). A custom layer without weights can simply be wrapped in a `keras.layers.Lambda` layer; a layer with weights should subclass `keras.layers.Layer`. Create a custom model for the object you actually train & save: models have extra methods such as `fit()`, `evaluate()`, `predict()`, & `save()`. Subclass `keras.models.Model` when the overall architecture cannot be built with the Sequential or Functional APIs.
8. You might write a custom training loop if you want full control over the training process, if you want to understand exactly what happens during training, or if you want to use different optimisers for different parts of your neural network, as in the Wide & Deep paper, which uses one optimiser for the wide path & another for the deep path. Most of the time, though, you should avoid writing custom training loops because they are error-prone: many details (computing gradients, applying weight constraints, updating & resetting metrics, etc.) must be right for the loop to work properly.
9. Custom Keras components (losses, metrics, layers, models, etc.) should be convertible to TF Functions: unless told otherwise, Keras automatically converts them for more efficient execution, so they should stick to TensorFlow operations as much as possible & respect the rules of TF Functions (see the next answer). If you absolutely need arbitrary Python code in a custom component, you can either wrap it in a `tf.py_function()` operation or make the model dynamic (see answer 11). Both options hinder efficiency & portability: TensorFlow cannot perform graph optimisations on that code, & the resulting graph can only run on platforms where Python is available.
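10. (1) Use TensorFlow operations as much as you can: calls to other libraries, including NumPy or even the standard library, run only while the function is being traced & are not part of the graph. For arbitrary Python code that must run on every call, wrap it in a `tf.py_function()` operation. (2) Any other Python function or TF Function you call must follow the same rules, since TensorFlow captures its operations too. (3) Create variables outside of the TF Function (e.g., in the `build()` method of a custom layer); a TF Function may only create variables on its very first call. To assign a new value to a variable, call its `assign()` method instead of using the `=` operator. (4) The source code of your Python function must be available to TensorFlow. (5) Make sure your loops iterate over a tensor or a dataset (use `for i in tf.range(x)` rather than `for i in range(x)`); otherwise the loop is not captured in the graph as a loop but simply unrolled during tracing. (6) Prefer vectorised implementations for efficiency.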
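11. A dynamic model is useful for debugging, because Keras does not convert its components to TF Functions, so you can use any Python debugger & inspect tensor values directly. It is also useful if your model (or training code) must run arbitrary Python code, including calls to external libraries. To create a dynamic Keras model, set `dynamic = True` when creating the model; alternatively, set `run_eagerly = True` when calling the model's `compile()` method. You should not make all your models dynamic: Keras then cannot use any of TensorFlow's graph features, which slows down training & inference, & you cannot export the computation graph, which limits your model's portability.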
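The data structures listed in answer 4 can be sketched as follows, assuming TensorFlow 2.x in eager mode. The values are arbitrary illustrations.

```python
import tensorflow as tf

# Sparse tensor: stores only the non-zero values & their indices
sparse = tf.SparseTensor(indices=[[0, 1], [1, 0]], values=[1.0, 2.0],
                         dense_shape=[3, 4])
dense = tf.sparse.to_dense(sparse)  # 3x4 tensor, mostly zeros

# Tensor array: fixed-size list of tensors with identical shape & dtype
array = tf.TensorArray(dtype=tf.float32, size=3)
for index in range(3):
    # write() returns a new TensorArray, so reassign it
    array = array.write(index, tf.fill([2], float(index)))
stacked = array.stack()  # shape (3, 2)

# Ragged tensor: inner lists may have different lengths
ragged = tf.ragged.constant([[1, 2, 3], [4], []])

# String tensor: a regular tensor of dtype tf.string
words = tf.constant(["café", "coffee"])
n_chars = tf.strings.length(words, unit="UTF8_CHAR")  # count characters, not bytes

# Sets: each row of a regular tensor is treated as one set
a = tf.constant([[1, 5, 9]])
b = tf.constant([[5, 6, 9, 11]])
union = tf.sparse.to_dense(tf.sets.union(a, b))  # tf.sets ops return sparse tensors

# Queue: tensors stay in the queue across calls
queue = tf.queue.FIFOQueue(capacity=3, dtypes=tf.int32, shapes=[()])
queue.enqueue(10)
queue.enqueue(20)
first_out = queue.dequeue()  # first in, first out
```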
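For answer 5, here is a sketch of both options, using the Huber loss with a `threshold` hyperparameter as an illustrative example (the loss choice & the names `create_huber` & `HuberLoss` are assumptions, not part of the answer above). When loading a saved model, the function version would need something like `custom_objects={"huber_fn": create_huber(2.0)}` with the threshold re-specified by hand, while the subclass version only needs `custom_objects={"HuberLoss": HuberLoss}`.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

def create_huber(threshold=1.0):
    # Function-based loss: threshold lives in the closure, not in any saved config
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

class HuberLoss(keras.losses.Loss):
    # Subclass-based loss: get_config() records threshold so it is saved with the model
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)

    def call(self, y_true, y_pred):
        return create_huber(self.threshold)(y_true, y_pred)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

y_true = tf.constant([[0.0], [0.0]])
y_pred = tf.constant([[1.0], [3.0]])

per_sample = create_huber(2.0)(y_true, y_pred)  # squared if |error| < 2, linear otherwise
loss = HuberLoss(threshold=2.0)
batch_loss = loss(y_true, y_pred)               # averaged over the batch
restored = HuberLoss.from_config(loss.get_config())  # threshold comes back from the config
```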
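To illustrate answer 6, the following sketch implements a hypothetical streaming binary precision metric (`StreamingPrecision` is an illustrative name; Keras already ships `keras.metrics.Precision`). It shows why averaging per-batch precisions gives a different answer from accumulating true & predicted positives across batches.

```python
import tensorflow as tf
from tensorflow import keras

class StreamingPrecision(keras.metrics.Metric):
    """Hypothetical binary precision metric accumulated across batches."""
    def __init__(self, threshold=0.5, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold
        # State variables that persist between batches
        self.true_positives = self.add_weight(name="tp", initializer="zeros")
        self.predicted_positives = self.add_weight(name="pp", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(tf.convert_to_tensor(y_pred) >= self.threshold, tf.float32)
        self.true_positives.assign_add(tf.reduce_sum(y_true * y_pred))
        self.predicted_positives.assign_add(tf.reduce_sum(y_pred))

    def result(self):
        # Small constant avoids division by zero before any positive prediction
        return self.true_positives / (self.predicted_positives + 1e-7)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

batches = [([1, 0], [0.9, 0.8]), ([1, 1], [0.7, 0.2])]

streaming_metric = StreamingPrecision()
per_batch = []
for y_true, y_pred in batches:
    streaming_metric.update_state(y_true, y_pred)
    # A fresh metric per batch gives that batch's precision alone
    per_batch.append(float(StreamingPrecision()(y_true, y_pred)))

streaming = float(streaming_metric.result())  # 2 true positives / 3 predicted positives
averaged = sum(per_batch) / len(per_batch)    # mean of 1/2 and 1/1
```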
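The layer-versus-model split from answer 7 can be sketched as follows. The residual architecture & the names `ResidualBlock` & `ResidualRegressor` are illustrative assumptions: the block is a reusable custom layer, while the model is the object you would train & save.

```python
import numpy as np
from tensorflow import keras

class ResidualBlock(keras.layers.Layer):
    """Hypothetical reusable block: Dense layers wrapped in a skip connection."""
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(n_neurons, activation="relu")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z  # skip connection

class ResidualRegressor(keras.models.Model):
    """The object you train & save: inherits fit(), evaluate(), predict(), save()."""
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(30, activation="relu")
        self.block = ResidualBlock(2, 30)
        self.output_layer = keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(3):  # the same block (shared weights) is applied 3 times
            Z = self.block(Z)
        return self.output_layer(Z)

model = ResidualRegressor(1)
predictions = model(np.random.rand(4, 8).astype(np.float32))
# The shared block's weights are counted once: 270 + 2 * 930 + 31
n_params = sum(int(np.prod(tuple(v.shape))) for v in model.trainable_variables)
```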
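Answer 9's `tf.py_function()` escape hatch might look like this inside a TF Function. NumPy's `log1p` merely stands in for arbitrary Python code (`tf.math.log1p` would be the idiomatic choice), & the function names are hypothetical.

```python
import numpy as np
import tensorflow as tf

def numpy_log1p(x):
    # Stand-in for arbitrary Python code; x arrives as an eager tensor
    return np.log1p(x.numpy())

@tf.function
def log_transform(x):
    # tf.py_function runs numpy_log1p eagerly on every call, outside the graph
    y = tf.py_function(func=numpy_log1p, inp=[x], Tout=tf.float32)
    y.set_shape(x.shape)  # py_function loses static shape information
    return y

result = log_transform(tf.constant([0.0, 1.0, 3.0]))
```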
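Several of the rules in answer 10 show up in this small sketch, assuming TensorFlow 2.x (`sum_of_squares`, `trace_log` & `counter` are hypothetical names). The Python side effect runs only while tracing, the variable is created outside the function & updated with `assign_add()`, & the loop over `tf.range()` becomes part of the graph.

```python
import tensorflow as tf

trace_log = []            # Python list: only touched while tracing
counter = tf.Variable(0)  # rule 3: variable created outside the TF Function

@tf.function
def sum_of_squares(n):
    trace_log.append("traced")  # Python side effect: runs at trace time only
    counter.assign_add(1)       # rule 3: assign_add(), not "=", runs on every call
    total = tf.constant(0)
    for i in tf.range(n):       # rule 5: a loop over a tensor is captured in the graph
        total += i * i
    return total

first = sum_of_squares(tf.constant(4))   # 0 + 1 + 4 + 9
second = sum_of_squares(tf.constant(5))  # same input signature: the graph is reused
```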
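The `compile()` switch from answer 11 is a one-argument change. This minimal sketch uses random illustrative data & a single Dense layer; with `run_eagerly=True`, Keras runs each training step eagerly instead of wrapping it in a TF Function.

```python
import numpy as np
from tensorflow import keras

X = np.random.rand(64, 8).astype(np.float32)
y = np.random.rand(64, 1).astype(np.float32)

model = keras.Sequential([keras.Input(shape=(8,)), keras.layers.Dense(1)])
# run_eagerly=True: easier to debug with a Python debugger, but slower
model.compile(loss="mse", optimizer="sgd", run_eagerly=True)
history = model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```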

# 12.

In [1]:
import tensorflow as tf
import keras

class LayerNormalisation(keras.layers.Layer):
    def __init__(self, eps = 0.001, **kwargs):
        super().__init__(**kwargs)
        self.eps = eps # smoothing term to avoid division by zero
    def build(self, input_shape):
        # one scale (alpha) & one offset (beta) per feature
        self.alpha = self.add_weight(name = "alpha", shape = input_shape[-1:],
                                     initializer = "ones", dtype = tf.float32)
        self.beta = self.add_weight(name = "beta", shape = input_shape[-1:],
                                    initializer = "zeros", dtype = tf.float32)
        super().build(input_shape) # must be called at the end
    def call(self, inputs):
        # per-instance mean & variance over the feature axis
        mean, var = tf.nn.moments(inputs, axes = -1, keepdims = True)
        # eps inside the square root, as in keras.layers.LayerNormalization
        return self.alpha * (inputs - mean) / tf.sqrt(var + self.eps) + self.beta
    def compute_output_shape(self, input_shape):
        return input_shape # normalisation does not change the shape
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "eps": self.eps}

In [2]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import numpy as np

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target.reshape(-1, 1))
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train)

X = X_train.astype(np.float32)

custom_layernorm = LayerNormalisation()
keras_layernorm = keras.layers.LayerNormalization()

tf.reduce_mean(keras.losses.mean_absolute_error(keras_layernorm(X), 
                                                custom_layernorm(X)))


<tf.Tensor: shape=(), dtype=float32, numpy=4.229783e-08>
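The exercise's formula adds $\varepsilon$ to $\sigma$, whereas the layer above (like `keras.layers.LayerNormalization`, whose default epsilon is 0.001) adds it to the variance inside the square root. This NumPy sketch, with a hypothetical helper `layer_norm_numpy` & toy data, compares both placements against the Keras layer; the gap only matters when an instance's variance is tiny.

```python
import numpy as np
from tensorflow import keras

def layer_norm_numpy(X, alpha, beta, eps=0.001, eps_inside_sqrt=True):
    mu = X.mean(axis=-1, keepdims=True)  # per-instance mean
    var = X.var(axis=-1, keepdims=True)  # per-instance (biased) variance
    if eps_inside_sqrt:
        denom = np.sqrt(var + eps)       # placement used by Keras
    else:
        denom = np.sqrt(var) + eps       # placement in the exercise's formula
    return alpha * (X - mu) / denom + beta

X = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]], dtype=np.float32)
alpha = np.ones(3, dtype=np.float32)
beta = np.zeros(3, dtype=np.float32)

inside = layer_norm_numpy(X, alpha, beta)
outside = layer_norm_numpy(X, alpha, beta, eps_inside_sqrt=False)
keras_out = keras.layers.LayerNormalization()(X).numpy()
```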

# 13.

In [3]:
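import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.datasets import fashion_mnist

(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
# scale pixel intensities to [0, 1]; labels stay integer class ids (0-9)
X_train_full = X_train_full.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0
# hold out the first 6,000 training images for validation
X_val, y_val = X_train_full[:6000], y_train_full[:6000]
X_train, y_train = X_train_full[6000:], y_train_full[6000:]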

In [4]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (28, 28)),
    keras.layers.Dense(100, activation = "relu"), # ReLU rather than SELU, which trained too slowly here
    keras.layers.Dense(10, activation = "softmax")
])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 100)               78500     
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
Total params: 79,510
Trainable params: 79,510
Non-trainable params: 0
_________________________________________________________________


In [5]:
n_epochs = 1
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate = 0.01)
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

In [6]:
def random_batch(X, y, batch_size = 32):
    num_id = np.random.randint(len(X), size = batch_size)
    return X[num_id], y[num_id]

In [7]:
from tqdm import trange
import numpy as np
from collections import OrderedDict

with trange(1, n_epochs + 1, desc="All epochs") as epochs:
    for epoch in epochs:
        with trange(1, n_steps + 1, desc="Epoch {}/{}".format(epoch, n_epochs)) as steps:
            for step in steps:
                X_batch, y_batch = random_batch(X_train, y_train)
                with tf.GradientTape() as tape:
                    y_pred = model(X_batch)
                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                    loss = tf.add_n([main_loss] + model.losses)
                gradients = tape.gradient(loss, model.trainable_variables)
                optimizer.apply_gradients(zip(gradients, model.trainable_variables))
                for variable in model.variables:
                    if variable.constraint is not None:
                        variable.assign(variable.constraint(variable))                    
                status = OrderedDict()
                mean_loss(loss)
                status["loss"] = mean_loss.result().numpy()
                for metric in metrics:
                    metric(y_batch, y_pred)
                    status[metric.name] = metric.result().numpy()
                steps.set_postfix(status)
            y_pred = model(X_val)
            status["val_loss"] = np.mean(loss_fn(y_val, y_pred))
            status["val_accuracy"] = np.mean(keras.metrics.sparse_categorical_accuracy(
                tf.constant(y_val, dtype = np.float32), y_pred))
            steps.set_postfix(status)
        for metric in [mean_loss] + metrics:
            metric.reset_state()


In [8]:
keras.backend.clear_session()

lower_layers = keras.models.Sequential([
    keras.layers.Flatten(input_shape = (28, 28)),
    keras.layers.Dense(50, activation = "relu")
])

upper_layers = keras.models.Sequential([
    keras.layers.Dense(10, activation = "softmax")
])

model = keras.models.Sequential([lower_layers, upper_layers])

lower_optimiser = keras.optimizers.SGD(learning_rate = 0.1)
upper_optimiser = keras.optimizers.Nadam(learning_rate = 0.01)

In [9]:
n_epochs = 1
batch_size = 32
n_steps = len(X_train) // batch_size
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

In [10]:
with trange(1, n_epochs + 1, desc="All epochs") as epochs:
    for epoch in epochs:
        with trange(1, n_steps + 1, desc="Epoch {}/{}".format(epoch, n_epochs)) as steps:
            for step in steps:
                X_batch, y_batch = random_batch(X_train, y_train)
                with tf.GradientTape(persistent=True) as tape:
                    y_pred = model(X_batch)
                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                    loss = tf.add_n([main_loss] + model.losses)
                for layers, optimizer in ((lower_layers, lower_optimiser),
                                          (upper_layers, upper_optimiser)):
                    gradients = tape.gradient(loss, layers.trainable_variables)
                    optimizer.apply_gradients(zip(gradients, layers.trainable_variables))
                del tape
                for variable in model.variables:
                    if variable.constraint is not None:
                        variable.assign(variable.constraint(variable))                    
                status = OrderedDict()
                mean_loss(loss)
                status["loss"] = mean_loss.result().numpy()
                for metric in metrics:
                    metric(y_batch, y_pred)
                    status[metric.name] = metric.result().numpy()
                steps.set_postfix(status)
            y_pred = model(X_val)
            status["val_loss"] = np.mean(loss_fn(y_val, y_pred))
            status["val_accuracy"] = np.mean(keras.metrics.sparse_categorical_accuracy(
                tf.constant(y_val, dtype=np.float32), y_pred))
            steps.set_postfix(status)
        for metric in [mean_loss] + metrics:
            metric.reset_state()
