# Custom Models and Training with TensorFlow

Up until now, we’ve used only TensorFlow’s high-level API, tf.keras, but it already got us pretty far: we built various neural network architectures, including regression and classification nets, Wide & Deep nets, and self-normalizing nets, using all sorts of techniques, such as Batch Normalization, dropout, and learning rate schedules. In fact, 95% of the use cases you will encounter will not require anything other than tf.keras and tf.data.

But now it’s time to dive deeper into TensorFlow and take a look at its lower-level Python API. This will be useful when you need extra control to write custom loss functions, custom metrics, layers, models, initializers, regularizers, weight constraints, and more. 

You may even need to fully control the training loop itself, for example to apply special transformations or constraints to the gradients (beyond just clipping them) or to use multiple optimizers for different parts of the network.

## A Quick Tour of TensorFlow

### So what does TensorFlow offer? Here’s a summary:

#### 1) Its core is very similar to NumPy, but with GPU support.

#### 2) It supports distributed computing (across multiple devices and servers).

#### Important point
#### 3) It includes a kind of just-in-time (JIT) compiler that allows it to optimize computations for speed and memory usage. It works by extracting the computation graph from a Python function, then optimizing it (e.g., by pruning unused nodes), and finally running it efficiently (e.g., by automatically running independent operations in parallel).


#### Important point

#### 4) Computation graphs can be exported to a portable format, so you can train a TensorFlow model in one environment (e.g., using Python on Linux) and run it in another (e.g., using Java on an Android device).

#### 5) It implements autodiff and provides some excellent optimizers, such as RMSProp and Nadam, so you can easily minimize all sorts of loss functions.
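To make points 3 and 5 concrete, here is a minimal sketch that combines graph extraction with autodiff. It assumes TensorFlow 2.x running in eager mode; the function name `cube` is only an illustration.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph on first call
def cube(x):
    return x ** 3

x = tf.Variable(2.0)

# Autodiff: record the operations on a tape, then ask for d(cube)/dx.
with tf.GradientTape() as tape:
    y = cube(x)

grad = tape.gradient(y, x)
print(float(y))     # 8.0
print(float(grad))  # 12.0, since d(x^3)/dx = 3 * x^2
```

The decorated function can still be called like ordinary Python; the graph is built behind the scenes.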


### TensorFlow APIs

TensorFlow offers many more features built on top of these core features: the most important is of course tf.keras, but it also has data loading and preprocessing ops (tf.data, tf.io, etc.), image processing ops (tf.image), signal processing ops (tf.signal), and more.

##### Important: see Figure 12-1 (TensorFlow’s Python API) for the list of all the APIs offered by TensorFlow.

### TensorFlow foundational structure

At the lowest level, each TensorFlow operation (op for short) is implemented using highly efficient C++ code. Many operations have multiple implementations called kernels: each kernel is dedicated to a specific device type, such as CPUs, GPUs, or even TPUs (tensor processing units). As you may know, GPUs can dramatically speed up computations by splitting them into many smaller chunks and running them in parallel across many GPU threads.

##### TPUs are even faster: they are custom ASIC chips built specifically for Deep Learning operations

##### To learn more about TPUs and how they work, check out https://homl.info/tpus.

TensorFlow’s architecture is shown in Figure 12-2. Most of the time your code will use the high-level APIs (especially tf.keras and tf.data); but when you need more flexibility, you will use the lower-level Python API, handling tensors directly. Note that APIs for other languages are also available. In any case, TensorFlow’s execution engine will take care of running the operations efficiently, even across multiple devices and machines if you tell it to.


### TensorFlow cross-device compatibility

TensorFlow runs not only on Windows, Linux, and macOS, but also on mobile devices (using TensorFlow Lite), including both iOS and Android. If you do not want to use the Python API, there are C++, Java, Go, and Swift APIs. There is even a JavaScript implementation called TensorFlow.js that makes it possible to run your models directly in your browser.

### TensorFlow Ecosystem

TensorFlow is at the center of an extensive ecosystem of libraries. First, there’s TensorBoard for visualization.

Next, there’s TensorFlow Extended (TFX), which is a set of libraries built by Google to productionize TensorFlow projects: it includes tools for data validation, preprocessing, model analysis, and serving with TF Serving.

Google’s TensorFlow Hub provides a way to easily download and reuse pretrained neural networks. You can also get many neural network architectures, some of them pre-trained, in TensorFlow’s model garden.

More and more ML papers are released along with their implementations, and sometimes even with pretrained models. Check out https://paperswithcode.com/ to easily find them.

## Using TensorFlow like NumPy

TensorFlow’s API revolves around tensors, which flow from operation to operation, hence the name TensorFlow. A tensor is very similar to a NumPy ndarray: it is usually a multidimensional array, but it can also hold a scalar (a simple value, such as 42). These tensors will be important when we create custom cost functions, custom metrics, custom layers, and more, so let’s see how to create and manipulate them.

In [1]:
import tensorflow as tf
import numpy as np




### Tensors and Operations

You can create a tensor with tf.constant(). For example, here is a tensor representing a matrix with two rows and three columns of floats:

In [2]:
sample_tensor = tf.constant([[1., 2., 3.], [4., 5., 6.]])

In [3]:
sample_tensor

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [4]:
scalar_tensor = tf.constant(42)

In [5]:
scalar_tensor

<tf.Tensor: shape=(), dtype=int32, numpy=42>

##### Just like an ndarray, a tf.Tensor has a shape and a data type (dtype):

In [6]:
sample_tensor.shape

TensorShape([2, 3])

In [7]:
sample_tensor.dtype

tf.float32

##### Indexing works much like in NumPy:

In [8]:
sample_tensor[:, 1:]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

#### Tensor operations

##### Addition

In [9]:
sample_tensor + 10


<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>

In [10]:
sample_tensor

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

##### squaring

In [11]:
tf.square(sample_tensor)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [12]:
sample_tensor

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

##### Matrix Multiplication tf.matmul() 

In [13]:
sample_tensor @ tf.transpose(sample_tensor) # The @ operator was added in Python 3.5, for matrix multiplication: it is equivalent to calling the tf.matmul() function.

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

### TensorFlow operation list

1) You will find all the basic math operations you need (tf.add(), tf.multiply(), tf.square(), tf.exp(), tf.sqrt(), etc.) and most operations that you can find in NumPy (e.g., tf.reshape(), tf.squeeze(), tf.tile())

2) Some functions have a different name than in NumPy; for instance, tf.reduce_mean(), tf.reduce_sum(), tf.reduce_max(), and tf.math.log() are the equivalent of np.mean(), np.sum(), np.max() and np.log().

#### Important point
3) When the name differs, there is often a good reason for it. For example, in TensorFlow you must write tf.transpose(t); you cannot just write t.T like in NumPy. The reason is that the tf.transpose() function does not do exactly the same thing as NumPy’s T attribute: in TensorFlow, a new tensor is created with its own copy of the transposed data, while in NumPy, t.T is just a transposed view on the same data. Similarly, the tf.reduce_sum() operation is named this way because its GPU kernel (i.e., GPU implementation) uses a reduce algorithm that does not guarantee the order in which the elements are added: because 32-bit floats have limited precision, the result may change ever so slightly every time you call this operation. The same is true of tf.reduce_mean() (but of course tf.reduce_max() is deterministic).
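The naming differences described above can be checked with a small sketch. The array values are arbitrary, and `np.shares_memory` is used here only to show that NumPy's `.T` is a view.

```python
import numpy as np
import tensorflow as tf

a = np.array([[1., 2., 3.], [4., 5., 6.]], dtype=np.float32)
t = tf.constant(a)

# NumPy: .T is a view on the same buffer, so no data is copied.
shares = np.shares_memory(a, a.T)

# TensorFlow: tf.transpose() returns a new tensor of shape (3, 2).
t_transposed = tf.transpose(t)

# Reduction ops mirror np.mean / np.sum / np.max under a different name.
mean_tf = tf.reduce_mean(t).numpy()
mean_np = a.mean()
print(shares, t_transposed.shape, mean_tf, mean_np)
```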

## Tensors and NumPy

Tensors play nice with NumPy: you can create a tensor from a NumPy array, and vice versa. You can even apply TensorFlow operations to NumPy arrays and NumPy operations to tensors:

#### Arrays to Tensor

In [14]:
a = np.array([2., 4., 5.])
a

array([2., 4., 5.])

In [15]:
tensor_a = tf.constant(a)
tensor_a

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

In [16]:
tensor_a[1]

# tensor_a[1] = 3  # uncommenting this raises a TypeError: tensors do not support item assignment

<tf.Tensor: shape=(), dtype=float64, numpy=4.0>

#### Tensors to Array

In [17]:
sample_tensor_to_numpy = sample_tensor.numpy()
print(sample_tensor_to_numpy,'converting using TF')

print('\n\n\n')
# The above part can also be done as follows
sample_numpy_array = np.array(sample_tensor)
print(sample_numpy_array,'converting using numpy')

[[1. 2. 3.]
 [4. 5. 6.]] converting using TF




[[1. 2. 3.]
 [4. 5. 6.]] converting using numpy
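The claim that TensorFlow operations accept NumPy arrays and NumPy operations accept tensors can be sketched as follows. The variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

squared_by_tf = tf.square(a)   # TensorFlow op on a NumPy array -> tf.Tensor (float64)
squared_by_np = np.square(t)   # NumPy op on a tensor -> np.ndarray (float32)

print(type(squared_by_tf).__name__, squared_by_tf.dtype)
print(type(squared_by_np).__name__, squared_by_np.dtype)
```

Note that the result type follows the library whose function you called, not the type of the input.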


### Important warning
NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit. This is because 32-bit precision is generally more than enough for neural networks, plus it runs faster and uses less RAM. So when you create a tensor from a NumPy array, make sure to set dtype=tf.float32.
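A short sketch of the warning above: without an explicit dtype, the tensor inherits the array's 64-bit type.

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])               # NumPy default: float64
t64 = tf.constant(a)                     # inherits float64 from the array
t32 = tf.constant(a, dtype=tf.float32)   # explicit 32-bit tensor

print(t64.dtype, t32.dtype)
```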

### Type Conversions

Type conversions can significantly hurt performance, and they can easily go unnoticed when they are done automatically. To avoid this, TensorFlow does not perform any type conversions automatically: it just raises an exception if you try to execute an operation on tensors with incompatible types.

For example, you cannot add a float tensor and an integer tensor, and you cannot even add a 32-bit float and a 64-bit float:

In [18]:
# tf.constant(2.) + tf.constant(40)

In [19]:
# tf.constant(2.,dtype=tf.float32) + tf.constant(40., dtype=tf.float64)
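The two commented-out cells above fail when run. The sketch below catches the error instead of letting it stop the notebook. The helper name `try_add` is hypothetical, and because the exact exception class may vary between TensorFlow versions, both likely candidates are caught (an assumption, not documented behavior).

```python
import tensorflow as tf

def try_add(x, y):
    """Return the error class name if x + y is rejected, else None."""
    try:
        x + y
        return None
    except (tf.errors.InvalidArgumentError, TypeError) as err:
        return type(err).__name__

float_plus_int = try_add(tf.constant(2.), tf.constant(40))
f32_plus_f64 = try_add(tf.constant(2., dtype=tf.float32),
                       tf.constant(40., dtype=tf.float64))
same_types = try_add(tf.constant(2.), tf.constant(40.))

print(float_plus_int, f32_plus_f64, same_types)
```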

#### This may be a bit annoying at first, but remember that it’s for a good cause! And of course you can use tf.cast() when you really need to convert types:

In [20]:
t2 = tf.constant(40., dtype=tf.float64)

In [21]:
tf.constant(2.0) + tf.cast(t2, tf.float32)

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>

### Variables

The tf.Tensor values we’ve seen so far are immutable: you cannot modify them. This means that we cannot use regular tensors to implement weights in a neural network, since they need to be tweaked by backpropagation. Plus, other parameters may also need to change over time (e.g., a momentum optimizer keeps track of past gradients).

#### What we need is a tf.Variable:

In [22]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

A tf.Variable acts much like a tf.Tensor: you can perform the same operations with it, it plays nicely with NumPy as well, and it is just as picky with types. But it can also be modified in place using the assign() method (or assign_add() or assign_sub(), which increment or decrement the variable by the given value). You can also modify individual cells (or slices), by using the cell’s (or slice’s) assign() method (direct item assignment will not work) or by using the scatter_update() or scatter_nd_update() methods:

#### Assign

In [23]:
v.assign(2 * v)


<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [24]:
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [25]:
v[0, 1].assign(42)
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [26]:
v[:, 2].assign([0., 1.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>

#### Scatter and update

In [27]:
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>
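The remaining points from the description of tf.Variable (assign_add(), assign_sub(), and the fact that direct item assignment does not work) can be sketched like this, assuming eager execution:

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

v.assign_add(tf.ones_like(v))   # every cell incremented by 1, in place
v.assign_sub(tf.ones_like(v))   # and decremented back to the original values

try:
    v[0, 1] = 42.               # direct item assignment is not supported
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

v[0, 1].assign(42.)             # the supported way to update one cell
print(item_assignment_failed, v.numpy())
```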

### Other Data Structures

##### Sparse tensors (tf.SparseTensor)

Efficiently represent tensors containing mostly zeros. The tf.sparse package contains operations for sparse tensors.

##### Tensor arrays (tf.TensorArray)

Are lists of tensors. They have a fixed size by default but can optionally be made dynamic. All tensors they contain must have the same shape and data type.

##### Ragged tensors (tf.RaggedTensor)

Represent lists of tensors that all have the same rank and data type but may vary in size along one or more (ragged) dimensions. The tf.ragged package contains operations for ragged tensors.

##### String tensors
Are regular tensors of type tf.string. These represent byte strings, not Unicode strings, so if you create a string tensor using a Unicode string (e.g., a regular Python 3 string like "café"), then it will get encoded to UTF-8 automatically (e.g., b"caf\xc3\xa9"). Alternatively, you can represent Unicode strings using tensors of type tf.int32, where each item represents a Unicode code point (e.g., [99, 97, 102, 233]). The tf.strings package (with an s) contains ops for byte strings and Unicode strings (and to convert one into the other). It’s important to note that a tf.string is atomic, meaning that its length does not appear in the tensor’s shape. Once you convert it to a Unicode tensor (i.e., a tensor of type tf.int32 holding Unicode code points), the length appears in the shape.

##### Sets
Are represented as regular tensors (or sparse tensors). For example, tf.constant([[1, 2], [3, 4]]) represents the two sets {1, 2} and {3, 4}. More generally, each set is represented by a vector in the tensor’s last axis. You can manipulate sets using operations from the tf.sets package.


##### Queues
Store tensors across multiple steps. TensorFlow offers various kinds of queues: simple First In, First Out (FIFO) queues (FIFOQueue), queues that can prioritize some items (PriorityQueue), shuffle their items (RandomShuffleQueue), and batch items of different shapes by padding (PaddingFIFOQueue). These classes are all in the tf.queue package.
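A few of the structures listed above can be tried out in a short sketch. The values are arbitrary, and only sparse, ragged, and string tensors are shown.

```python
import tensorflow as tf

# Sparse tensor: only the non-zero cells are stored.
s = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]],
                    values=[1., 2., 3.],
                    dense_shape=[3, 4])
dense = tf.sparse.to_dense(s)

# Ragged tensor: rows of different lengths along the ragged dimension.
r = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_lengths = r.row_lengths().numpy().tolist()

# String tensor: a Unicode Python string is stored as UTF-8 bytes.
cafe = tf.constant("café")
n_bytes = int(tf.strings.length(cafe))                     # length in bytes
n_chars = int(tf.strings.length(cafe, unit="UTF8_CHAR"))   # length in code points
code_points = tf.strings.unicode_decode(cafe, "UTF-8").numpy().tolist()

print(dense.numpy())
print(row_lengths, cafe.shape, n_bytes, n_chars, code_points)
```

The string tensor's shape is `()`: its length only shows up once it is decoded into code points.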

## Customizing Models and Training Algorithms

### Custom Loss Functions

Let’s start by creating a custom loss function, which is a simple and common use case.

#### Custom Loss Function Scenario

Suppose you want to train a regression model, but your training set is a bit noisy. Of course, you start by trying to clean up your dataset by removing or fixing the outliers, but that turns out to be insufficient; the dataset is still noisy. Which loss function should you use? 

The mean squared error might penalize large errors too much and cause your model to be imprecise. The mean absolute error would not penalize outliers as much, but training might take a while to converge, and the trained model might not be very precise.

This is probably a good time to use the Huber loss (introduced in Chapter 10) instead of the good old MSE.

The Huber loss is available in Keras (just use an instance of the keras.losses.Huber class). But let’s pretend it’s not there:

Just create a function that takes the labels and predictions as arguments, and use TensorFlow operations to compute every instance’s loss:

In [28]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import keras
housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target.reshape(-1, 1), random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)

#### Creating the Custom Loss Function

In [29]:
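def huber_fn(y_true, y_pred):
    # Debug print: during fit() this shows symbolic tensors, because Keras
    # traces the function into a graph rather than running it eagerly.
    print(y_true,y_pred)
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1          # |error| below the threshold of 1
    squared_loss = tf.square(error) / 2         # quadratic branch for small errors
    linear_loss = tf.abs(error) - 0.5           # linear branch for large errors
    # Debug print ("dekhle" is Hindi for "take a look").
    print(tf.where(is_small_error, squared_loss, linear_loss),' dekhle')
    # One loss value per instance, not the mean.
    return tf.where(is_small_error, squared_loss, linear_loss)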

#### Important Note about the Custom Loss Function
It is also preferable to return a tensor containing one loss per instance, rather than returning the mean loss. This way, Keras can apply class weights or sample weights when requested (see the "Important Note about Training" block in the Chapter 10 Jupyter notebook).
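To see what "one loss per instance" looks like, the same formula can be called eagerly on a few hand-picked values. The name `huber_per_instance` is hypothetical: it is a copy of the loss above without the debug prints.

```python
import tensorflow as tf

def huber_per_instance(y_true, y_pred):  # hypothetical name, same formula as above
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.5], [2.0], [0.0]])

per_instance = huber_per_instance(y_true, y_pred)   # shape (3, 1): one loss per instance
mean_loss = tf.reduce_mean(per_instance)            # plain mean, with no sample weights

print(per_instance.numpy().ravel())  # errors -0.5, 0 and 3 give 0.125, 0.0 and 2.5
print(float(mean_loss))
```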

In [30]:
X_train.shape[1:]

(8,)

#### Training a network with the custom loss function

In [31]:
input_shape = X_train.shape[1:]

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=input_shape),
    keras.layers.Dense(1),
])




In [32]:
model.compile(loss=huber_fn, optimizer="nadam", metrics=["mae"])




In [33]:
model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))


Epoch 1/2
Tensor("IteratorGetNext:1", shape=(None, 1), dtype=float32) Tensor("sequential/dense_1/BiasAdd:0", shape=(None, 1), dtype=float32)
Tensor("huber_fn/SelectV2:0", shape=(None, 1), dtype=float32)  dekhle


Tensor("IteratorGetNext:1", shape=(None, 1), dtype=float32) Tensor("sequential/dense_1/BiasAdd:0", shape=(None, 1), dtype=float32)
Tensor("huber_fn/SelectV2:0", shape=(None, 1), dtype=float32)  dekhle
Tensor("huber_fn/SelectV2:0", shape=(None, 1), dtype=float32)  dekhle
Epoch 2/2


<keras.src.callbacks.History at 0x275c910a1d0>

#### Working of the model
For each batch during training, Keras will call the huber_fn() function to compute the loss and use it to perform a Gradient Descent step. Moreover, it will keep track of the total loss since the beginning of the epoch, and it will display the mean loss.

### Saving and Loading Models That Contain Custom Components

Now, in the previous section you defined a custom loss function. But what happens to this custom loss when you save the model?

Saving a model containing a custom loss function works fine, as Keras saves the name of the function. Whenever you load it, you’ll need to provide a dictionary that maps the function name to the actual function. More generally, when you load a model containing custom objects, you need to map the names to the objects:


In [34]:
model.save("my_model_with_a_custom_loss.keras")

In [35]:
model = keras.models.load_model("my_model_with_a_custom_loss.keras", custom_objects={"huber_fn": huber_fn})

#### Customized Threshold

With the current implementation, any error between –1 and 1 is considered “small.” But what if you want a different threshold? One solution is to create a function that creates a configured loss function:

In [36]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn


In [37]:
model.compile(loss=create_huber(2.0), optimizer="nadam")

In [38]:
model.save('my_model_with_a_custom_loss_threshold_2.keras')

#### Warning for the customized threshold

Unfortunately, when you save the model, the threshold will not be saved. This means that you will have to specify the threshold value when loading the model. Note that the name to use is "huber_fn", which is the name of the function you gave Keras, not the name of the function that created it: although you pass create_huber(2.0) when you compile the model, what Keras actually receives (and what computes the loss) is the inner huber_fn() function, so that is the name to map:


In [39]:
model = keras.models.load_model("my_model_with_a_custom_loss_threshold_2.keras",
 custom_objects={"huber_fn": create_huber(2.0)})
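Why the key is "huber_fn" rather than "create_huber" can be seen in plain Python. The sketch below is a simplified stand-in that works on a single float error instead of tensors, and it assumes the name Keras records is the function's `__name__`.

```python
def create_huber(threshold=1.0):
    def huber_fn(error):  # simplified: takes a single float error
        if abs(error) < threshold:
            return error ** 2 / 2
        return threshold * abs(error) - threshold ** 2 / 2
    return huber_fn

loss_fn = create_huber(2.0)
print(loss_fn.__name__)   # huber_fn
print(loss_fn(3.0))       # 4.0, i.e. 2 * 3 - 2**2 / 2
```

The factory's name never reaches the returned function, and the threshold lives only in the closure, which is why it is lost on save.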



### Implementing the Keras Subclassing API to solve the problem of the custom threshold not being saved

You can solve this by creating a subclass of the keras.losses.Loss class, and then implementing its get_config() method:

In [40]:
class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

#### Understanding the above code

1) The constructor accepts **kwargs and passes them to the parent constructor, which handles standard hyperparameters: the name of the loss and the reduction algorithm to use to aggregate the individual instance losses. By default, it is "sum_over_batch_size", which means that the loss will be the sum of the instance losses, weighted by the sample weights, if any, and divided by the batch size. Note that it is divided by the batch size, not by the sum of weights, so this is not the weighted mean. A weighted mean would not be a good idea: two instances with the same weight but in different batches would then have a different impact on training, depending on the total weight of each batch. Other possible values are "sum" and "none".

2) The call() method takes the labels and predictions, computes all the instance losses, and returns them.


3) The get_config() method returns a dictionary mapping each hyperparameter name to its value. It first calls the parent class’s get_config() method, then adds the new hyperparameters to this dictionary (here, the threshold). Keras calls get_config() only when you save, serialize, or clone the model: it is how Keras remembers the settings of your custom components, such as the loss. As for {**base_config, "threshold": self.threshold}: the ** unpacks a dictionary, much like **kwargs, but instead of passing the entries to a function it copies them into a new dictionary literal, to which the threshold entry is added (base_config itself is left unchanged). The example below shows this:

In [41]:
def upd_dict(dictio):
    
    return {**dictio,'stof':321}

dd = {'aaa':11,'ddd':234}

update_dict = upd_dict(dictio=dd)
print(update_dict,' first one \n\n\n')

print({**update_dict,'my_day':111})

{'aaa': 11, 'ddd': 234, 'stof': 321}  first one 



{'aaa': 11, 'ddd': 234, 'stof': 321, 'my_day': 111}
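Going back to point 1 in the list above, the difference between "sum_over_batch_size" and a weighted mean is easy to check by hand. This is plain arithmetic on made-up numbers, following the description in the text; it does not call Keras.

```python
instance_losses = [1.0, 2.0, 3.0, 4.0]
sample_weights = [1.0, 1.0, 3.0, 3.0]

# Each instance loss is scaled by its sample weight: [1.0, 2.0, 9.0, 12.0]
weighted = [l * w for l, w in zip(instance_losses, sample_weights)]

sum_over_batch_size = sum(weighted) / len(instance_losses)   # 24 / 4 = 6.0
weighted_mean = sum(weighted) / sum(sample_weights)          # 24 / 8 = 3.0
plain_sum = sum(weighted)                                    # 24.0, the "sum" reduction

print(sum_over_batch_size, weighted_mean, plain_sum)
```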


#### Compiling the model

You can then use any instance of this class when you compile the model:

In [42]:
model.compile(loss=HuberLoss(2.), optimizer="nadam")

In [43]:
model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))


Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x275c9797cd0>

#### Saving this model

When you save the model, the threshold will be saved along with it; and when you load the model, you just need to map the class name to the class itself:

In [44]:
model.save('my_model_with_a_custom_loss_threshold_and_subclassedAPI.keras')

#### Loading the model

In [45]:
model = keras.models.load_model("my_model_with_a_custom_loss_threshold_and_subclassedAPI.keras", custom_objects={"HuberLoss": HuberLoss})

#### Working of the Saving and Loading of the model
When you save a model, Keras calls the loss instance’s get_config() method and saves the config as JSON. When you load the model, it calls the from_config() class method on the HuberLoss class: this method is implemented by the base class (Loss) and creates an instance of the class, passing **config to the constructor.
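The round trip described above can be imitated in plain Python. `ToyLoss` is a hypothetical stand-in, not a Keras class. It only mirrors the get_config() / from_config() pattern, and the Keras internals are not reproduced here.

```python
import json

class ToyLoss:  # hypothetical stand-in, not a Keras class
    def __init__(self, threshold=1.0, name=None):
        self.threshold = threshold
        self.name = name

    def get_config(self):
        return {"name": self.name, "threshold": self.threshold}

    @classmethod
    def from_config(cls, config):
        # Same idea as the base Loss class: pass **config to the constructor.
        return cls(**config)

original = ToyLoss(threshold=2.0, name="huber")
saved = json.dumps(original.get_config())          # written at save time
restored = ToyLoss.from_config(json.loads(saved))  # rebuilt at load time

print(saved)
print(restored.threshold, restored.name)
```

This is also why every constructor argument you want restored has to appear in the config dictionary under the same name.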

### Custom Activation Functions, Initializers, Regularizers, and Constraints

Most Keras functionalities, such as losses, regularizers, constraints, initializers, metrics, activation functions, layers, and even full models, can be customized in very much the same way. Most of the time, you will just need to write a simple function with the appropriate inputs and outputs. Here are examples of a custom activation function (equivalent to keras.activations.softplus() or tf.nn.softplus()), a custom Glorot initializer (equivalent to keras.initializers.glorot_normal()), a custom ℓ1 regularizer (equivalent to keras.regularizers.l1(0.01)), and a custom constraint that ensures weights are all positive (equivalent to keras.constraints.nonneg() or tf.nn.relu()):

In [46]:
def my_softplus(z): # return value is just tf.nn.softplus(z)
    return tf.math.log(tf.exp(z) + 1.0)


def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)


def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))


def my_positive_weights(weights): # return value is just tf.nn.relu(weights)
    return tf.where(weights < 0., tf.zeros_like(weights), weights)

As you can see, the arguments depend on the type of custom function. These custom functions can then be used normally; for example:

In [47]:
layer = keras.layers.Dense(30, activation=my_softplus,
                           kernel_initializer=my_glorot_initializer,
                           kernel_regularizer=my_l1_regularizer,
                           kernel_constraint=my_positive_weights)

#### Working of the layer with Customized parameters

1) The activation function will be applied to the output of this Dense layer, and its result will be passed on to the next layer. 

2) The layer’s weights will be initialized using the value returned by the initializer.

#### Important point
3) At each training step the weights will be passed to the regularization function to compute the regularization loss, which will be added to the main loss to get the final loss used for training.

4) Finally, the constraint function (here, my_positive_weights()) will be called after each training step, and the layer’s weights will be replaced by the constrained weights.
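What the regularizer and the constraint return in steps 3 and 4 can be checked on a tiny hand-made weight matrix. The functions are repeated so the sketch is self-contained, and the weight values are arbitrary.

```python
import tensorflow as tf

def my_softplus(z):
    return tf.math.log(tf.exp(z) + 1.0)

def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

def my_positive_weights(weights):
    return tf.where(weights < 0., tf.zeros_like(weights), weights)

weights = tf.constant([[-1., 2.], [3., -4.]])

reg_loss = my_l1_regularizer(weights)        # 0.01 * (1 + 2 + 3 + 4), added to the main loss
constrained = my_positive_weights(weights)   # negatives replaced by 0 after the step
# The custom activation should match the built-in one up to rounding.
softplus_gap = tf.reduce_max(tf.abs(my_softplus(weights) - tf.nn.softplus(weights)))

print(float(reg_loss), constrained.numpy(), float(softplus_gap))
```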

#### Training the Custom model

In [48]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)


In [49]:
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=input_shape),
    keras.layers.Dense(1, activation=my_softplus,
                       kernel_regularizer=my_l1_regularizer,
                       kernel_constraint=my_positive_weights,
                       kernel_initializer=my_glorot_initializer),
])


In [50]:
model.compile(loss="mse", optimizer="nadam", metrics=["mae"])

In [51]:
model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))



Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x275cc46c8d0>

#### Saving and Loading the Custom Model

In [52]:
model.save("my_model_with_many_custom_parts.keras")



In [53]:
model = keras.models.load_model(
    "my_model_with_many_custom_parts.keras",
    custom_objects={
       "my_l1_regularizer": my_l1_regularizer,
       "my_positive_weights": my_positive_weights,
       "my_glorot_initializer": my_glorot_initializer,
       "my_softplus": my_softplus,
    })

#### Implementing Subclassing API for custom functions that have hyperparameters that need to be saved separately

If a function has hyperparameters that need to be saved along with the model, then you will want to subclass the appropriate class, such as keras.regularizers.Regularizer, keras.constraints.Constraint, keras.initializers.Initializer, or keras.layers.Layer (for any layer, including activation functions). 

##### Implementing Subclassing API for Custom Regularizer

Much like we did for the custom loss, here is a simple class for ℓ1 regularization that saves its factor hyperparameter.

#### Important note about the Regularizer subclass
This time we do not need to call the parent constructor or the parent’s get_config() method, as they are not defined by the parent class:

In [54]:
class MyL1Regularizer(keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))
    def get_config(self):
        return {"factor": self.factor}
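A quick check that the factor survives a config round trip is sketched below. It assumes `import keras` works alongside TensorFlow, as elsewhere in this notebook. The class is repeated so the block runs on its own, and the weight values are arbitrary.

```python
import keras
import tensorflow as tf

class MyL1Regularizer(keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))
    def get_config(self):
        return {"factor": self.factor}

reg = MyL1Regularizer(0.01)
config = reg.get_config()             # {'factor': 0.01}
clone = MyL1Regularizer(**config)     # rebuild from the saved hyperparameters
penalty = clone(tf.constant([[-1., 2.], [3., -4.]]))

print(config, float(penalty))
```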


In [55]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)


In [56]:
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=input_shape),
    keras.layers.Dense(1, activation=my_softplus,
                       kernel_regularizer=MyL1Regularizer(0.01),
                       kernel_constraint=my_positive_weights,
                       kernel_initializer=my_glorot_initializer),
])

In [57]:
model.compile(loss="mse", optimizer="nadam", metrics=["mae"])



In [58]:
model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x275cc6d2350>

In [59]:
model.save("my_model_with_many_custom_parts_and_subclassingAPI.keras")

In [60]:
model = keras.models.load_model(
    "my_model_with_many_custom_parts_and_subclassingAPI.keras",
    custom_objects={
       "MyL1Regularizer": MyL1Regularizer,
       "my_positive_weights": my_positive_weights,
       "my_glorot_initializer": my_glorot_initializer,
       "my_softplus": my_softplus,
    })


## Custom Metrics

Losses and metrics are conceptually not the same thing:

#### Losses

Losses (e.g., cross entropy) are used by Gradient Descent to train a model, so they must be differentiable (at least where they are evaluated), and their gradients should not be 0 everywhere. Plus, it’s OK if they are not easily interpretable by humans.

#### Metrics

Metrics (e.g., accuracy) are used to evaluate a model: they must be more easily interpretable, and they can be non-differentiable or have 0 gradients everywhere.


#### Metrics and Losses Overlap

That said, in most cases, defining a custom metric function is exactly the same as defining a custom loss function. In fact, we could even use the Huber loss function we created earlier as a metric; it would work just fine (and persistence would also work the same way, in this case only saving the name of the function, "huber_fn"):

In [61]:
model.compile(loss="mse", optimizer="nadam", metrics=[create_huber(2.0)])

For each batch during training, Keras will compute this metric and keep track of its mean since the beginning of the epoch. Most of the time, this is exactly what you want, but not always. The next subsection walks through a scenario that highlights the difference.

#### Metrics and Losses: the difference

Consider a binary classifier’s precision, for example. As we saw in Chapter 3, precision is the number of true positives divided by the number of positive predictions (including both true positives and false positives).

Suppose the model made five positive predictions in the first batch, four of which were correct: that’s 80% precision.

Then suppose the model made three positive predictions in the second batch, but they were all incorrect: that’s 0% precision for the second batch.

If you just compute the mean of these two precisions, you get 40%.

But wait a second—that’s not the model’s precision over these two batches! Indeed, there were a total of four true positives (4 + 0) out of eight positive predictions (5 + 3), so the overall precision is 50%, not 40%.

What we need is an object that can keep track of the number of true positives and the number of false positives and that can compute their ratio when requested.
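The arithmetic above can be checked in a few lines of plain Python. This is only a sketch of the two-batch example from the text; the variable names are illustrative and nothing here uses Keras.

```python
# Each tuple is (true positives, positive predictions) for one batch,
# using the numbers from the two-batch example in the text.
batches = [(4, 5), (0, 3)]

# Naive approach: average the per-batch precisions.
batch_precisions = [tp / positives for tp, positives in batches]
mean_of_batch_precisions = sum(batch_precisions) / len(batch_precisions)

# Correct approach: accumulate the counts first, then divide once.
total_tp = sum(tp for tp, _ in batches)
total_positives = sum(positives for _, positives in batches)
overall_precision = total_tp / total_positives

print(mean_of_batch_precisions)  # 0.4
print(overall_precision)         # 0.5
```

The two numbers differ because the batches contain different numbers of positive predictions, so a plain average weights them incorrectly.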

##### This is precisely what the keras.metrics.Precision class does:

In [62]:
precision = keras.metrics.Precision()

In [63]:
precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])


<tf.Tensor: shape=(), dtype=float32, numpy=0.8>

In [64]:
precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])

<tf.Tensor: shape=(), dtype=float32, numpy=0.5>

#### How the Precision Object Works

In this example, we created a Precision object, then we used it like a function, passing it the labels and predictions for the first batch, then for the second batch (note that we could also have passed sample weights).

We used the same number of true and false positives as in the example we just discussed.

#### Important point
After the first batch, it returns a precision of 80%; then after the second batch, it returns 50% (which is the overall precision so far, not the second batch’s precision).

This is called a streaming metric (or stateful metric), as it is gradually updated, batch after batch.
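To make the idea of a streaming metric concrete, here is a minimal plain-Python stand-in. It is not how keras.metrics.Precision is implemented; the class name StreamingPrecision and its internals are assumptions made for illustration. It only mimics the observable behaviour: state accumulates across calls, and each call returns the precision so far.

```python
class StreamingPrecision:
    """Toy streaming precision (illustrative stand-in, not the Keras class)."""

    def __init__(self):
        self.true_positives = 0
        self.false_positives = 0

    def update_state(self, y_true, y_pred):
        # Only positive predictions matter for precision.
        for label, pred in zip(y_true, y_pred):
            if pred == 1:
                if label == 1:
                    self.true_positives += 1
                else:
                    self.false_positives += 1

    def result(self):
        positives = self.true_positives + self.false_positives
        return self.true_positives / positives if positives else 0.0

    def __call__(self, y_true, y_pred):
        # Using the object as a function updates the state, then reports.
        self.update_state(y_true, y_pred)
        return self.result()


precision = StreamingPrecision()
first = precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])
second = precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])
print(first, second)  # 0.8 0.5
```

The second value is 0.5 rather than 0.0 because the counters from the first batch are still in place when the second batch arrives.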

#### Precision Attributes

At any point, we can call the result() method to get the current value of the metric. We can also look at its variables (tracking the number of true and false positives) by using the variables attribute, and we can reset these variables using the reset_states() method (newer Keras releases spell it reset_state() and deprecate reset_states()):

In [65]:
precision.result()

<tf.Tensor: shape=(), dtype=float32, numpy=0.5>

In [66]:
precision.variables


[<tf.Variable 'true_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>,
 <tf.Variable 'false_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>]

In [67]:
precision.reset_states() # both variables get reset to 0.0

In [68]:
precision.variables

[<tf.Variable 'true_positives:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>,
 <tf.Variable 'false_positives:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

### Creating a Streaming Metric using Subclassing

If you need to create such a streaming metric, create a subclass of the keras.metrics.Metric class.

Here is a simple example that keeps track of the total Huber loss and the number of instances seen so far. When asked for the result, it returns the ratio, which is simply the mean Huber loss:


In [69]:
class HuberMetric(keras.metrics.Metric):
    
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs) # handles base args (e.g., dtype)
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        # Pass the name by keyword so the call does not depend on argument order.
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))

    
    def result(self):
        return self.total / self.count

    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

#### How the Code Works

1) The constructor uses the add_weight() method to create the variables which are self.total and self.count needed to keep track of the metric’s state over multiple batches—in this case, the sum of all Huber losses (total) and the number of instances seen so far (count). You could just create variables manually if you preferred. Keras tracks any tf.Variable that is set as an attribute (and more generally, any “trackable” object, such as layers or models).

2) The update_state() method is called when you use an instance of this class as a function (as we did with the Precision object). During training, Keras also calls it on every batch for any metric instance passed to compile(), as in model.compile(loss=create_huber(2.0), optimizer="nadam", metrics=[HuberMetric(2.0)]). It updates the variables (total and count), given the labels and predictions for one batch (and sample weights, but in this case we ignore them).

3) The result() method computes and returns the final result, in this case the mean Huber metric over all instances. When you use the metric as a function, the update_state() method gets called first, then the result() method is called, and its output is returned.

4) We also implement the get_config() method to ensure the threshold gets saved along with the model.
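The total/count bookkeeping described above can be traced by hand with a plain-Python analogue. This sketch assumes the standard Huber definition (quadratic below the threshold, linear above it) for the helper huber(); the class StreamingHuberMean is hypothetical and only mirrors the structure of HuberMetric without any TensorFlow variables.

```python
def huber(error, threshold=1.0):
    """Huber loss for one error value (assumed standard definition)."""
    abs_error = abs(error)
    if abs_error < threshold:
        return 0.5 * error ** 2
    return threshold * abs_error - 0.5 * threshold ** 2


class StreamingHuberMean:
    """Plain-Python analogue of HuberMetric: running total and count."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.total = 0.0
        self.count = 0

    def update_state(self, y_true, y_pred):
        for t, p in zip(y_true, y_pred):
            self.total += huber(t - p, self.threshold)
            self.count += 1

    def result(self):
        return self.total / self.count

    def __call__(self, y_true, y_pred):
        # Same order as described above: update the state, then report.
        self.update_state(y_true, y_pred)
        return self.result()


m = StreamingHuberMean(2.0)
after_first = m([2.0], [10.0])              # total = 14.0, count = 1
after_second = m([0.0, 5.0], [1.0, 9.25])   # total = 21.0, count = 3
print(after_first, after_second)            # 14.0 7.0
```

After the second call the result is 21 / 3 = 7.0, the mean over all three instances seen so far, not the mean of the second batch alone.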

#### Training a model with this custom metric

In [70]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)


In [71]:
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="selu", kernel_initializer="lecun_normal",
                       input_shape=input_shape),
    keras.layers.Dense(1),
])


In [72]:
model.compile(loss=create_huber(2.0), optimizer="nadam", metrics=[HuberMetric(2.0)])


In [73]:
model.fit(X_train_scaled.astype(np.float32), y_train.astype(np.float32), epochs=2)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x275cd770b90>

In [74]:
model.save("my_model_with_a_custom_metric.keras")

In [75]:
model = keras.models.load_model("my_model_with_a_custom_metric.keras",
                                custom_objects={"huber_fn": create_huber(2.0),
                                                "HuberMetric": HuberMetric})

#### Custom Metrics conclusion
When you define a metric using a simple function, Keras automatically calls it for each batch, and it keeps track of the mean during each epoch, just like we did manually. So the only benefit of our HuberMetric class is that the threshold will be saved. But of course, some metrics, like precision, cannot simply be averaged over batches: in those cases, there’s no other option than to implement a streaming metric.


## Custom Layers

#### Custom Layer Scenario

You may occasionally want to build an architecture that contains an exotic layer for which TensorFlow does not provide a default implementation. In this case, you will need to create a custom layer. Or you may simply want to build a very repetitive architecture, containing identical blocks of layers repeated many times, and it would be convenient to treat each block of layers as a single layer. For example, if the model is a sequence of layers A, B, C, A, B, C, A, B, C, then you might want to define a custom layer D containing layers A, B, C, so your model would then simply be D, D, D. Let’s see how to build custom layers.

#### Layers without Weights

First, some layers have no weights, such as keras.layers.Flatten or keras.layers.ReLU. If you want to create a custom layer without any weights, the simplest option is to write a function and wrap it in a keras.layers.Lambda layer. For example, the following layer will apply the exponential function to its inputs:

In [76]:
exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

This custom layer can then be used like any other layer, using the Sequential API, the Functional API, or the Subclassing API. You can also use it as an activation function (or you could use activation=tf.exp, activation=keras.activations.exponential, or simply activation="exponential"). The exponential layer is sometimes used in the output layer of a regression model when the values to predict have very different scales (e.g., 0.001, 10., 1,000.).


In [77]:
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=input_shape),
    keras.layers.Dense(1),
    exponential_layer
])
model.compile(loss="mse", optimizer="sgd")
model.fit(X_train_scaled, y_train, epochs=5,
          validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


0.37407541275024414

### Custom Layer with Weights

To build a custom stateful layer (i.e., a layer with weights), you need to create a subclass of the keras.layers.Layer class. For example, the following class implements a simplified version of the Dense layer:

In [78]:
class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)

    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape) # must be at the end

    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)

    def compute_output_shape(self, batch_input_shape):
        return tf.TensorShape(batch_input_shape.as_list()[:-1] + [self.units])

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation": keras.activations.serialize(self.activation)}

#### How the Code Works
In the code, units means the number of neurons.

#### The Constructor
1) The constructor takes all the hyperparameters as arguments (in this example, units and activation), and importantly it also takes a **kwargs argument. It calls the parent constructor, passing it the kwargs: this takes care of standard arguments such as input_shape, trainable, and name. Then it saves the hyperparameters as attributes, converting the activation argument to the appropriate activation function using the keras.activations.get() function (it accepts functions, standard strings like "relu" or "selu", or simply None). This mirrors how you configure a built-in layer: in keras.layers.Dense(30, activation="relu", input_shape=input_shape), units and activation are handled by the layer's own constructor, while input_shape travels through **kwargs to the parent constructor.

#### The build() method
The build() method’s role is to create the layer’s variables by calling the add_weight() method for each weight. The build() method is called the first time the layer is used. At that point, Keras will know the shape of this layer’s inputs, and it will pass it to the build() method, which is often necessary to create some of the weights. For example, we need to know the number of neurons in the previous layer in order to create the connection weights matrix (i.e., the "kernel"): this corresponds to the size of the last dimension of the inputs. At the end of the build() method (and only at the end), you must call the parent’s build() method: this tells Keras that the layer is built (it just sets self.built=True).

#### The call() method
The call() method performs the desired operations. In this case, we compute the matrix multiplication of the inputs X and the layer’s kernel, we add the bias vector, and we apply the activation function to the result, and this gives us the output of the layer.

#### The compute_output_shape() method
The compute_output_shape() method simply returns the shape of this layer’s outputs. In this case, it is the same shape as the inputs, except the last dimension is replaced with the number of neurons in the layer. Note that in tf.keras, shapes are instances of the tf.TensorShape class, which you can convert to Python lists using as_list().

For example, suppose this is the second layer of a network and it has 100 units, the first layer has 300 units, and the batch size is 500. The second layer receives inputs of shape (500, 300), so its kernel has shape (300, 100): each of its 100 neurons is connected to all 300 outputs of the first layer. The call() method therefore produces outputs of shape (500, 100), and that is the shape passed on to the next layer.
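The shape bookkeeping in this example can be written out with plain Python lists. The two helper functions below are hypothetical names introduced for illustration; they reproduce the same indexing that build() and compute_output_shape() use in MyDense, without involving tf.TensorShape.

```python
def dense_kernel_shape(batch_input_shape, units):
    """Kernel shape used in build(): [number of input features, units]."""
    return [batch_input_shape[-1], units]


def dense_output_shape(batch_input_shape, units):
    """Output shape: the input shape with its last dimension replaced by units."""
    return list(batch_input_shape[:-1]) + [units]


batch_input_shape = (500, 300)  # batch size 500, 300 units in the previous layer
kernel_shape = dense_kernel_shape(batch_input_shape, units=100)
output_shape = dense_output_shape(batch_input_shape, units=100)
print(kernel_shape)  # [300, 100]
print(output_shape)  # [500, 100]
```

Only the last dimension changes; any leading dimensions (here just the batch dimension) are carried through untouched.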

#### The get_config() method
The get_config() method is just like in the previous custom classes. Note that we save the activation function’s full configuration by calling keras.activations.serialize().

#### You can now use a MyDense layer just like any other layer!

#### Training a model with Custom Layer

In [79]:
model = keras.models.Sequential([
    MyDense(30, activation="relu", input_shape=input_shape),
    MyDense(1)
])

In [80]:
model.compile(loss="mse", optimizer="nadam")
model.fit(X_train_scaled, y_train, epochs=2,
          validation_data=(X_valid_scaled, y_valid))
model.evaluate(X_test_scaled, y_test)


Epoch 1/2
Epoch 2/2


0.4902033507823944

In [81]:
model.save("my_model_with_a_custom_layer.keras")

In [82]:
model = keras.models.load_model("my_model_with_a_custom_layer.keras",
                                custom_objects={"MyDense": MyDense})


### Custom Layers with Multiple Inputs and Outputs

To create a layer with multiple inputs (e.g., Concatenate), the argument to the call() method should be a tuple containing all the inputs, and similarly the argument to the compute_output_shape() method should be a tuple containing each input’s batch shape.

To create a layer with multiple outputs, the call() method should return the list of outputs, and compute_output_shape() should return the list of batch output shapes (one per output).

For example, the following toy layer takes two inputs and returns two outputs:

In [83]:
class MyMultiLayer(keras.layers.Layer):
    def call(self, X):
        X1, X2 = X
        print("X1.shape: ", X1.shape ," X2.shape: ", X2.shape) # Debugging of custom layer
        return X1 + X2, X1 * X2

    def compute_output_shape(self, batch_input_shape):
        batch_input_shape1, batch_input_shape2 = batch_input_shape
        return [batch_input_shape1, batch_input_shape2]

#### Note on the above code
This toy layer has no weights and no hyperparameters, so it needs neither an __init__() nor a build() method, and call() does not compute inputs times weights plus bias. Its outputs have the same shapes as its inputs (assuming both inputs share the same shape), so compute_output_shape() returns the input shapes unchanged instead of replacing the last dimension with self.units.

#### Important point
This layer may now be used like any other layer, but of course only using the Functional and Subclassing APIs, not the Sequential API (which only accepts layers with one input and one output).
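As a quick check of the tuple-in, multiple-outputs-out convention, the sketch below calls a trimmed copy of the layer directly on two small tensors. The debugging print and compute_output_shape() are left out to keep it short, and calling the layer eagerly like this is a choice made for illustration rather than the only way to use it.

```python
import tensorflow as tf
from tensorflow import keras


class MyMultiLayer(keras.layers.Layer):
    """Trimmed copy of the toy layer: two inputs in, two outputs out."""

    def call(self, X):
        X1, X2 = X  # the single argument is a tuple holding both inputs
        return X1 + X2, X1 * X2


layer = MyMultiLayer()
x1 = tf.constant([[1., 2.]])
x2 = tf.constant([[3., 4.]])

# Both inputs are passed together as one tuple argument.
summed, multiplied = layer((x1, x2))
print(summed.numpy())      # element-wise sum of the two inputs
print(multiplied.numpy())  # element-wise product of the two inputs
```

With the Functional API the call looks the same, except that the tuple holds symbolic keras.layers.Input tensors instead of concrete values.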

### Custom Layer with different behaviour during Training and Testing


If your layer needs to have a different behavior during training and during testing (e.g., if it uses Dropout or BatchNormalization layers), then you must add a training argument to the call() method and use this argument to decide what to do.

For example, let’s create a layer that adds Gaussian noise during training (for regularization) but does nothing during testing (Keras has a layer that does the same thing, keras.layers.GaussianNoise):


In [84]:
class MyGaussianNoise(keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev
    def call(self, X, training=None):
        if training:
            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)
            return X + noise
        else:
            return X
    def compute_output_shape(self, batch_input_shape):
        return batch_input_shape

#### Note on the above code
Adding noise does not change the shape of the data, so compute_output_shape() returns the input shape unchanged. The layer has no weights either, so there is no build() method and no self.units to account for; the constructor only stores the stddev hyperparameter.
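The effect of the training argument can be seen by calling the layer directly with both values of the flag. The sketch below repeats a compact version of the MyGaussianNoise class so that it runs on its own; the input tensor and stddev value are arbitrary choices for illustration.

```python
import tensorflow as tf
from tensorflow import keras


class MyGaussianNoise(keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, X, training=None):
        if training:
            # Noise is only added when the caller signals training mode.
            return X + tf.random.normal(tf.shape(X), stddev=self.stddev)
        return X


layer = MyGaussianNoise(stddev=1.0)
X = tf.ones((2, 3))

same = layer(X, training=False)   # inference: inputs pass through unchanged
noisy = layer(X, training=True)   # training: random noise is added

print(same.numpy())
print(noisy.numpy())
```

During fit() Keras passes training=True for you, and during evaluate() and predict() it passes training=False, so the flag rarely needs to be set by hand.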

## Custom Models

It’s straightforward: subclass the keras.Model class, create layers and variables in the constructor, and implement the call() method to do whatever you want the model to do. 

#### Scenario Custom Model

Suppose you want to build the following model: the inputs go through a first dense layer, then through a residual block composed of two dense layers and an addition operation (a residual block adds its inputs to its outputs), then through this same residual block three more times, then through a second residual block, and the final result goes through a dense output layer.

Note that this model does not make much sense; it’s just an example to illustrate the fact that you can easily build any kind of model you want, even one that contains loops and skip connections.

### Implementing the Custom Residual Layer for the Custom Model

To implement this model, it is best to first create a ResidualBlock layer, since we are going to create a couple of identical blocks (and we might want to reuse it in another model):

In [85]:
class ResidualBlock(keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(n_neurons, activation="elu", kernel_initializer="he_normal") for _ in range(n_layers)]
 
    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z


#### Explanation of the Code Block

1) This layer is a bit special since it contains other layers. Keras handles this transparently: it automatically detects that the hidden attribute contains trackable objects (layers in this case).

2) As a result, the variables of these inner layers (their kernels and biases) are automatically added to this layer’s list of variables.

3) The output of the block is inputs + Z because, as mentioned above, a residual block adds its inputs to its outputs.

### Defining the Custom model

Next, let’s use the Subclassing API to define the model itself:

In [86]:
class ResidualRegressor(keras.models.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(30, activation="elu",
                                          kernel_initializer="he_normal")
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)
        Z = self.block2(Z)
        return self.out(Z)



Note the loop for _ in range(1 + 3): Z = self.block1(Z) in the call() method. It implements the scenario described earlier: the inputs pass through the first residual block once, then through this same block three more times, so block1 (with the same weights) is applied four times in total.

#### Explanation of the code block

We create the layers in the constructor and use them in the call() method. This model can then be used like any other model (compile it, fit it, evaluate it, and use it to make predictions). If you also want to be able to save the model using the save() method and load it using the keras.models.load_model() function, you must implement the get_config() method (as we did earlier) in both the ResidualBlock class and the ResidualRegressor class. Alternatively, you can save and load the weights using the save_weights() and load_weights() methods.

### Notes and Discussion about the Model Class

The Model class is a subclass of the Layer class, so models can be defined and used exactly like layers. But a model has some extra functionalities, including of course its compile(), fit(), evaluate(), and predict() methods (and a few variants), plus the get_layer() method (which can return any of the model’s layers by name or by index) and the save() method (and support for keras.models.load_model() and keras.models.clone_model()).

### Interesting Take on Model Class and Layer Class

If models provide more functionality than layers, why not just define every layer as a model? Well, technically you could, but it is usually cleaner to distinguish the internal components of your model (i.e., layers or reusable blocks of layers) from the model itself (i.e., the object you will train). This is why ResidualBlock in the example above subclasses the Layer class, while ResidualRegressor subclasses the Model class.

## Losses and Metrics Based on Model Internals

The custom losses and metrics we defined earlier were all based on the labels and the predictions (and optionally sample weights). There will be times when you want to define losses based on other parts of your model, such as the weights or activations of its hidden layers. This may be useful for regularization purposes or to monitor some internal aspect of your model.

To define a custom loss based on model internals:

1) Compute it based on any part of the model you want.

2) Pass the result to the add_loss() method.

### Scenario for Losses on Model Internals

For example, let’s build a custom regression MLP model composed of a stack of five hidden layers plus an output layer. This custom model will also have an auxiliary output on top of the upper hidden layer. The loss associated with this auxiliary output is called the reconstruction loss: it is the mean squared difference between the reconstruction and the inputs.

By adding this reconstruction loss to the main loss, we will encourage the model to preserve as much information as possible through the hidden layers—even information that is not directly useful for the regression task itself. In practice, this loss sometimes improves generalization (it is a regularization loss). 

Here is the code for this custom model with a custom reconstruction loss:

In [87]:
class ReconstructingRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
        

    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        # super().build(batch_input_shape)

    def call(self, inputs, training=None):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        return self.out(Z)

#### Let's walk through the code

1) The constructor creates the DNN with five dense hidden layers and one dense output layer.

2) The build() method creates an extra dense layer which will be used to reconstruct the inputs of the model. It must be created here because its number of units must be equal to the number of inputs, and this number is unknown before the build() method is called.

3) The call() method processes the inputs through all five hidden layers, then passes the result through the reconstruction layer, which produces the reconstruction.

4) The call() method then computes the reconstruction loss (the mean squared difference between the reconstruction and the inputs) and adds it to the model’s list of losses using the add_loss() method. Notice that we scale down the reconstruction loss by multiplying it by 0.05 (this is a hyperparameter you can tune). This ensures that the reconstruction loss does not dominate the main loss.

5) Finally, the call() method passes the output of the hidden layers to the output layer and returns its output.

##### Note about the add_loss method
You can also call add_loss() on any layer inside the model, as the model recursively gathers losses from all of its layers.
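To see how the scaled reconstruction loss combines with the main loss, here is the arithmetic on a tiny hand-made batch in plain Python. All the numbers (inputs, reconstruction, main_loss) are hypothetical values chosen so the result is easy to verify; only the 0.05 factor comes from the model above.

```python
def mean_squared_difference(a, b):
    """Mean of the squared element-wise differences over a batch (list of rows)."""
    squared = [(x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)]
    return sum(squared) / len(squared)


inputs = [[1.0, 2.0], [3.0, 4.0]]           # hypothetical batch of 2 instances
reconstruction = [[1.0, 1.0], [3.0, 6.0]]   # hypothetical reconstruction output
main_loss = 0.5                             # hypothetical MSE on the labels

recon_loss = mean_squared_difference(reconstruction, inputs)  # (0 + 1 + 0 + 4) / 4
total_loss = main_loss + 0.05 * recon_loss  # what training actually minimizes

print(recon_loss)  # 1.25
print(total_loss)  # approximately 0.5625
```

With the 0.05 factor, a reconstruction error of 1.25 contributes only about 0.06 to the total, which is how the scaling keeps it from dominating the main loss.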

#### Training the model

In [88]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

In [89]:
model = ReconstructingRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_train_scaled, y_train, epochs=2)
y_pred = model.predict(X_test_scaled)

Epoch 1/2
Epoch 2/2


### For Metrics on Model Internals

Similarly, you can add a custom metric based on model internals by computing it in any way you want, as long as the result is the output of a metric object.

For example, you can create a keras.metrics.Mean object in the constructor, then call it in the call() method, passing it the recon_loss, and finally add it to the model by calling the model’s add_metric() method. This way, when you train the model, Keras will display both the mean loss over each epoch (the loss is the sum of the main loss plus 0.05 times the reconstruction loss) and the mean reconstruction error (the metric) over each epoch. Both will go down during training:

In [90]:
class ReconstructingRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
        # Added for the metric; not present in the loss-only version above.
        self.reconstruction_mean = keras.metrics.Mean(name="reconstruction_error")

    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        # super().build(batch_input_shape)

    def call(self, inputs, training=None):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)

        
        # Added for the metric; not present in the loss-only version above.
        if training:
            result = self.reconstruction_mean(recon_loss)
            self.add_metric(result)

        
        return self.out(Z)

In [91]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

In [92]:
model = ReconstructingRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_train_scaled, y_train, epochs=2)
y_pred = model.predict(X_test_scaled)

Epoch 1/2
Epoch 2/2


With this version of the model, the training output also reports the reconstruction_error metric, which was not there before.

## Computing Gradients Using Autodiff

#### First let's have a look at partial derivative

In [93]:
def part_dev_func(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

If you know calculus, you can analytically find that the partial derivative of this function with regard to w1 is 6 * w1 + 2 * w2.

You can also find that its partial derivative with regard to w2 is 2 * w1.

Let's approximate each partial derivative numerically by measuring how much the output of part_dev_func changes when we tweak the corresponding parameter by a tiny amount, eps:

In [94]:
w1, w2 = 5, 3

In [95]:
eps = 1e-6

In [96]:
(part_dev_func(w1 + eps, w2) - part_dev_func(w1, w2)) / eps

36.000003007075065

In [97]:
(part_dev_func(w1, w2 + eps) - part_dev_func(w1, w2)) / eps

10.000000003174137

As you can see, at the point (w1, w2) = (5, 3), these partial derivatives are equal to 36 and 10, respectively, so the gradient vector at this point is (36, 10).

### Using Autodiff TensorFlow style

In the example above, both approaches worked well. But if this were a neural network, the function would be much more complex, typically with tens of thousands of parameters, and finding the partial derivatives analytically by hand would be an almost impossible task.

The numerical approach has its own problems: it is just an approximation, and importantly you need to call the function at least once per parameter (not twice, since part_dev_func(w1, w2) can be computed just once and reused). Each partial derivative needs one extra evaluation with that parameter shifted by eps, so n parameters require n + 1 function calls, which is intractable for a large neural network.
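The cost of the finite-difference approach can be made explicit by counting function calls. The sketch below is a generic helper written for illustration (the names f, numerical_gradient, and counted are not from the notebook); it evaluates the base point once and then once more per parameter.

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2


def numerical_gradient(func, params, eps=1e-6):
    """Forward-difference gradient; also reports how many times func ran."""
    calls = 0

    def counted(*args):
        nonlocal calls
        calls += 1
        return func(*args)

    base = counted(*params)  # computed once and reused for every parameter
    grads = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps    # tweak only parameter i
        grads.append((counted(*shifted) - base) / eps)
    return grads, calls


grads, calls = numerical_gradient(f, [5, 3])
print(grads)  # close to [36, 10], up to approximation error
print(calls)  # 3: one base evaluation plus one per parameter
```

For two parameters that is three calls; for a network with tens of thousands of parameters it would be tens of thousands of forward passes per gradient.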

TensorFlow makes autodiff pretty simple:

In [98]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)

In [99]:
with tf.GradientTape() as tape:
    z = part_dev_func(w1,w2)

In [100]:
z # The output will be 105 which will be the result if you plug in 5 and 3 in w1 and w2 in the equation 3 * w1 ** 2 + 2 * w1 * w2

<tf.Tensor: shape=(), dtype=float32, numpy=105.0>

In [101]:
gradients = tape.gradient(z, [w1, w2])
gradients
# 1)  Here the output for the first case is 36 which comes after you partially differentiate the equation 3 * w1 ** 2 + 2 * w1 * w2 w.r.t to w1 and then you get 6 * w1 + 2 * w2, and then you plug in the values of w1 and w2.

# 2)  Here the output for the second case is 10 which comes after you partially differentiate the equation 3 * w1 ** 2 + 2 * w1 * w2 w.r.t to w2 and then you get  2 * w1, and then you plug in the values of w1 and w2.

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

#### Working of the above code
We first define two variables w1 and w2, then we create a tf.GradientTape context that will automatically record every operation that involves a variable, and finally we ask this tape to compute the gradients of the result z with regard to both variables [w1, w2].

#### The Gradient Tape
Perfect! Not only is the result accurate (the precision is only limited by floating-point errors), but the gradient() method goes through the recorded computations only once (in reverse order), no matter how many variables there are, so it is incredibly efficient.

#### Warning Gradient Tape
To save memory, only put the strict minimum inside the tf.GradientTape() block. Alternatively, pause recording by creating a with tape.stop_recording() block inside the tf.GradientTape() block.

#### Important warning about the tape
The tape is automatically erased immediately after you call its gradient() method, so you will get an exception if you try to call gradient() twice.
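The failure mode is easy to reproduce. The sketch below redefines the same function under the name f so that it runs on its own, and wraps the second gradient() call in a try block to show the exception instead of stopping the notebook.

```python
import tensorflow as tf


def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2


w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:  # non-persistent by default
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)    # first call works: tensor 36.0

try:
    dz_dw2 = tape.gradient(z, w2)  # second call on the same tape
    second_call_failed = False
except RuntimeError as error:
    second_call_failed = True
    print("RuntimeError:", error)
```

Asking for both gradients in a single call, tape.gradient(z, [w1, w2]), avoids the problem without needing a persistent tape.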

#### Persisting the TAPE

If you need to call gradient() more than once, you must make the tape persistent and delete it each time you are done with it to free resources (if the tape goes out of scope, for example when the function that used it returns, Python’s garbage collector will delete it for you):

In [102]:
with tf.GradientTape(persistent=True) as tape:
    z = part_dev_func(w1,w2)

In [103]:
dz_dw1 = tape.gradient(z, w1) # => tensor 36.0
dz_dw2 = tape.gradient(z, w2)

In [104]:
dz_dw1

<tf.Tensor: shape=(), dtype=float32, numpy=36.0>

In [105]:
dz_dw2

<tf.Tensor: shape=(), dtype=float32, numpy=10.0>

In [106]:
del tape

#### Tape with Constants and watching

By default, the tape will only track operations involving variables, so if you try to compute the gradient of z with regard to anything other than a variable, the result will be None:

In [107]:
c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    z = part_dev_func(c1, c2)


In [108]:
gradients = tape.gradient(z, [c1, c2])
gradients

[None, None]

However, you can force the tape to watch any tensors you like, to record every operation that involves them. You can then compute gradients with regard to these tensors, as if they were variables:

In [109]:
with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = part_dev_func(c1, c2)

In [110]:
gradients = tape.gradient(z, [c1, c2])
gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

#### Uses of Tapes with constant and watching

This can be useful in some cases, like if you want to implement a regularization loss that penalizes activations that vary a lot when the inputs vary little: the loss will be based on the gradient of the activations with regard to the inputs. Since the inputs are not variables, you would need to tell the tape to watch them.
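The following is only a rough sketch of that idea: it watches a batch of inputs and computes how strongly a layer's activations respond to them. The layer, the data, and the penalty definition (mean squared input gradient) are all assumptions made for illustration, not a prescribed regularizer.

```python
import tensorflow as tf

tf.random.set_seed(42)
layer = tf.keras.layers.Dense(4, activation="tanh")   # hypothetical layer
X = tf.random.normal([8, 3])                          # hypothetical input batch

with tf.GradientTape() as tape:
    tape.watch(X)                  # X is a plain tensor, so it must be watched
    activations = layer(X)

# Gradient of the summed activations with regard to every input feature
input_gradients = tape.gradient(activations, X)

# One possible penalty: large when activations vary a lot with the inputs
penalty = tf.reduce_mean(tf.square(input_gradients))
print(penalty)
```

To actually train with such a penalty, this computation would itself have to happen inside an outer tape that records it with regard to the model's variables.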

### Reverse Mode Auto-Diff

1) Most of the time a gradient tape is used to compute the gradients of a single value (usually the loss, since Gradient Descent is performed with regard to the loss function) with regard to a set of values (usually the model parameters, such as the w1 and w2 variables defined earlier).

2) This is where reverse-mode autodiff shines, as it just needs to do one forward pass and one reverse pass to get all the gradients at once.

#### Scenario if you compute the gradient of a vector

1) If you try to compute the gradients of a vector, for example a vector containing multiple losses, then TensorFlow will compute the gradients of the vector’s sum.

#### Getting the individual gradients of each loss

1) So if you ever need to get the individual gradients (e.g., the gradients of each loss with regard to each model parameter), you must call the tape’s jacobian() method: it will perform reverse-mode autodiff once for each loss in the vector (all in parallel by default).

2) It is even possible to compute second-order partial derivatives (the Hessians, i.e., the partial derivatives of the partial derivatives).
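The points above can be sketched with the same toy function used earlier. The vector of three "losses" is made up for illustration (the function evaluated at shifted values of w2), and the nested-tape pattern for second-order derivatives is one common way to do it, not the only one.

```python
import tensorflow as tf

w1, w2 = tf.Variable(5.), tf.Variable(3.)

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

# A vector of three hypothetical "losses"
with tf.GradientTape(persistent=True) as tape:
    zs = tf.stack([f(w1, w2), f(w1, w2 + 2.), f(w1, w2 + 5.)])

summed_grads = tape.gradient(zs, [w1, w2])   # gradients of the sum of zs
jacobians = tape.jacobian(zs, [w1, w2])      # one gradient per element of zs
del tape

# Second-order derivatives: differentiate the first-order gradients
with tf.GradientTape(persistent=True) as hessian_tape:
    with tf.GradientTape() as jacobian_tape:
        z = f(w1, w2)
    first_order = jacobian_tape.gradient(z, [w1, w2])
hessians = [hessian_tape.gradient(g, [w1, w2]) for g in first_order]
del hessian_tape

print(summed_grads)
print(jacobians)
print(hessians)
```

With w1 = 5 and w2 = 3, the per-loss gradients with regard to w1 are 36, 40 and 46, and gradient() returns their sum, 122.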

#### Stopping gradients from backpropagating

In some cases you may want to stop gradients from backpropagating through some part of your neural network. 

To do this, you must use the tf.stop_gradient() function. The function returns its inputs during the forward pass (like tf.identity()), but it does not let gradients through during backpropagation (it acts like a constant).

In [111]:
def f(w1, w2):
    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)


In [112]:
with tf.GradientTape() as tape:
    z = f(w1, w2) 
z # same result as without stop_gradient()

<tf.Tensor: shape=(), dtype=float32, numpy=105.0>

In [113]:
gradients = tape.gradient(z, [w1, w2]) 
gradients # => returns [tensor 30., None]

[<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, None]

## Custom Gradients for specific issues

You may occasionally run into some numerical issues when computing gradients. For example, if you compute the gradients of the my_softplus() function for large inputs, the result will be NaN:

In [114]:
x = tf.Variable([100.])

In [115]:
with tf.GradientTape() as tape:
    z = my_softplus(x)

In [116]:
tape.gradient(z, x)

<tf.Tensor: shape=(1,), dtype=float32, numpy=array([nan], dtype=float32)>

This is because computing the gradients of this function using autodiff leads to some numerical difficulties: due to floating-point precision errors, autodiff ends up computing infinity divided by infinity (which returns NaN).

Fortunately, we can analytically find that the derivative of the softplus function is just 1 / (1 + 1 / exp(x)), which is numerically stable.

Next, we can tell TensorFlow to use this stable function when computing the gradients of the my_softplus() function by decorating it with @tf.custom_gradient and making it return both its normal output and the function that computes the derivatives:

In [117]:
@tf.custom_gradient
def my_better_softplus(z):
    exp = tf.exp(z)
    def my_softplus_gradients(grad):
        return grad / (1 + 1 / exp)
    return tf.math.log(exp + 1), my_softplus_gradients

Note that we are using grad / (1 + 1 / exp) instead of 1 / (1 + 1 / exp(x)) because my_softplus_gradients will receive as input the gradients that were backpropagated so far to the softplus function, and according to the chain rule, we should multiply them with this function’s gradients.
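As a quick check, the sketch below recomputes the gradient at x = 100 with the custom gradient in place. It also illustrates one possible workaround for the forward value, which still overflows because of the exponential: my_safe_softplus() is a hypothetical helper (not part of the original code) that returns the input itself above an arbitrary threshold of 30, where softplus(z) is approximately z.

```python
import tensorflow as tf

@tf.custom_gradient
def my_better_softplus(z):
    exp = tf.exp(z)
    def my_softplus_gradients(grad):
        return grad / (1 + 1 / exp)
    return tf.math.log(exp + 1), my_softplus_gradients

def my_safe_softplus(z, threshold=30.):
    # Hypothetical helper: for large z, softplus(z) is approximately z
    z = tf.convert_to_tensor(z)
    return tf.where(z > threshold, z, my_better_softplus(z))

x = tf.Variable([100.])
with tf.GradientTape(persistent=True) as tape:
    z_better = my_better_softplus(x)   # gradient is fine, output overflows
    z_safe = my_safe_softplus(x)       # both output and gradient are finite
grad_better = tape.gradient(z_better, x)
grad_safe = tape.gradient(z_safe, x)
del tape

print(z_better, grad_better)
print(z_safe, grad_safe)
```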

## Custom Training Loops

In some rare cases, the fit() method may not be flexible enough for what you need to do. For example, the Wide & Deep paper we discussed in Chapter 10 uses two different optimizers: one for the wide path and the other for the deep path. Since the fit() method only uses one optimizer (the one that we specify when compiling the model), implementing this paper requires writing your own custom loop.
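A single training step with two optimizers could look roughly like the sketch below. Everything in it is assumed for illustration: the random data, the tiny wide and deep parts, and the choice of SGD and Adam (the optimizers used in the paper are not reproduced here).

```python
import tensorflow as tf

tf.random.set_seed(42)
X = tf.random.normal([32, 5])    # hypothetical inputs
y = tf.random.normal([32, 1])    # hypothetical targets

wide = tf.keras.layers.Dense(1)                       # hypothetical wide path
deep = tf.keras.Sequential([                          # hypothetical deep path
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
wide_optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
deep_optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

with tf.GradientTape() as tape:
    y_pred = wide(X) + deep(X)
    loss = tf.reduce_mean(tf.square(y - y_pred))

wide_vars = wide.trainable_variables
deep_vars = deep.trainable_variables
wide_kernel_before = wide_vars[0].numpy().copy()
deep_kernel_before = deep_vars[0].numpy().copy()

# One backward pass, then each optimizer updates only its own variables
gradients = tape.gradient(loss, wide_vars + deep_vars)
wide_optimizer.apply_gradients(zip(gradients[:len(wide_vars)], wide_vars))
deep_optimizer.apply_gradients(zip(gradients[len(wide_vars):], deep_vars))
```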

#### Important Note about Custom Training Loops
You may also like to write custom training loops simply to feel more confident that they do precisely what you intend them to do (perhaps you are unsure about some details of the fit() method). It can sometimes feel safer to make everything explicit. However, remember that writing a custom training loop will make your code longer, more error-prone, and harder to maintain.

### Building a Custom Training Loop

1) To build a custom training loop, we start by building a simple model:

In [118]:
l2_reg = keras.regularizers.l2(0.05)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="elu", kernel_initializer="he_normal",
                       kernel_regularizer=l2_reg),
    keras.layers.Dense(1, kernel_regularizer=l2_reg)
])
# No need to compile it since we will do the training manually

2) Next, let’s create a tiny function that will randomly sample a batch of instances from the training set.

In [119]:
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

3) Let’s also define a function that will display the training status, including the number of steps, the total number of steps, the mean loss since the start of the epoch (we will use the Mean metric to compute it), and other metrics:

In [120]:
def print_status_bar(iteration, total, loss, metrics=None):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                         for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{}/{} - ".format(iteration, total) + metrics,
          end=end)

#### Explanation of the function
This code relies on string formatting: {:.4f} formats a float with four digits after the decimal point, and using \r (carriage return) along with end="" ensures that the status bar always gets printed on the same line. In the notebook, the print_status_bar() function includes a progress bar.

#### Creating the custom training loop

1) First, we need to define some hyperparameters and choose the optimizer, the loss function, and the metrics (just the MAE in this example)

In [121]:
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate=0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

2) And now we are ready to build the custom loop!

In [122]:
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            y_pred = model(X_batch)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        for variable in model.variables:
            if variable.constraint is not None:
                variable.assign(variable.constraint(variable))
        mean_loss(loss)
        for metric in metrics:
            metric(y_batch, y_pred)
        print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Epoch 1/5
11610/11610 - mean: 1.4810 - mean_absolute_error: 0.5859
Epoch 2/5
11610/11610 - mean: 0.6747 - mean_absolute_error: 0.5257
Epoch 3/5
11610/11610 - mean: 0.6298 - mean_absolute_error: 0.5167
Epoch 4/5
11610/11610 - mean: 0.6391 - mean_absolute_error: 0.5188
Epoch 5/5
11610/11610 - mean: 0.6456 - mean_absolute_error: 0.5231


#### Code walk through.

1) We create two nested loops: one for the epochs, the other for the batches within an epoch.

2) Then we sample a random batch from the training set.

3) Inside the tf.GradientTape() block, we make a prediction for one batch (using the model as a function).

4) Then we compute the loss: it is equal to the main loss plus the other losses (in this model, there is one regularization loss per layer, the l2 regularizer).

5) Since the mean_squared_error() function returns one loss per instance, we compute the mean over the batch using tf.reduce_mean() (if you wanted to apply different weights to each instance, for example bigger weights for underrepresented classes, this is where you would do it).

6) The regularization losses are already reduced to a single scalar each when we access model.losses, so we just need to sum them (using tf.add_n(), which sums multiple tensors of the same shape and data type).

7) Next, we ask the tape to compute the gradient of the loss with regard to each trainable variable (not all variables!), and we apply them to the optimizer to perform a Gradient Descent step.

8) Then we update the mean loss and the metrics (here, the mean absolute error). The loss and a metric may measure similar quantities, but the metric objects are streaming: they keep a running value over all the batches seen so far in the epoch. We then display the status bar with the current mean loss and metric values.

9) At the end of each epoch, we display the status bar again to make it look complete and to print a line feed, and we reset the states of the mean loss and the metrics.

#### To avoid exploding gradients
If you set the optimizer’s clipnorm or clipvalue hyperparameter, the optimizer will clip the gradients for you. If you want to apply any other transformation to the gradients, simply do so before calling the apply_gradients() method.
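As an illustration of transforming gradients by hand, the sketch below clips them by their global norm between tape.gradient() and apply_gradients(). The model, the data, and the clip_norm value of 1.0 are arbitrary assumptions; any other transformation would go in the same place.

```python
import tensorflow as tf

tf.random.set_seed(42)
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])   # hypothetical model
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
X = tf.random.normal([16, 4])                             # hypothetical data
y = 100. * tf.random.normal([16, 1])

with tf.GradientTape() as tape:
    y_pred = model(X, training=True)
    loss = tf.reduce_mean(tf.square(y - y_pred))

gradients = tape.gradient(loss, model.trainable_variables)

# Transform the gradients before handing them to the optimizer
clipped, norm_before = tf.clip_by_global_norm(gradients, clip_norm=1.0)
norm_after = tf.linalg.global_norm(clipped)

optimizer.apply_gradients(zip(clipped, model.trainable_variables))
print(float(norm_before), float(norm_after))
```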

#### Weights and Bias Constraints
If you add weight constraints to your model (e.g., by setting kernel_constraint or bias_constraint when creating a layer), the training loop must apply these constraints just after apply_gradients(), which is what the following lines of the loop above do:

In [123]:
for variable in model.variables:
    if variable.constraint is not None:
        variable.assign(variable.constraint(variable))

#### Model behaviour during Training and Testing

Most importantly, this training loop does not handle layers that behave differently during training and testing (e.g., BatchNormalization or Dropout). To handle these, you need to call the model with training=True and make sure it propagates this to every layer that needs it.
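To see why the training argument matters, here is a small sketch using a standalone Dropout layer (the rate and input are arbitrary choices). In a custom loop, the same flag would be passed when calling the model, for example model(X_batch, training=True).

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones([1, 1000])

out_training = dropout(x, training=True)     # roughly half the units are zeroed
out_inference = dropout(x, training=False)   # the layer acts as the identity

n_dropped = int(tf.reduce_sum(tf.cast(out_training == 0., tf.int32)))
print("units dropped during training:", n_dropped)
print("inference output unchanged:", bool(tf.reduce_all(out_inference == 1.)))
```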

#### Conclusion Custom Training
As you can see, there are quite a lot of things you need to get right, and it’s easy to make a mistake. But on the bright side, you get full control, so it’s your call.

## TensorFlow Functions and Graphs

In TensorFlow 1, graphs were unavoidable (as were the complexities that came with them) because they were a central part of TensorFlow’s API. In TensorFlow 2, they are still there, but not as central, and they’re much (much!) simpler to use. To show just how simple, let’s start with a trivial function that computes the cube of its input:

In [124]:
def cube(x):
    return x ** 3

We can obviously call this function with a Python value, such as an int or a float, or we can call it with a tensor:

In [125]:
cube(2)

8

In [126]:
cube(tf.constant(2.0))

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>

Now, let’s use tf.function() to convert this Python function to a TensorFlow Function:

In [127]:
tf_cube = tf.function(cube)
tf_cube

<tensorflow.python.eager.polymorphic_function.polymorphic_function.Function at 0x275cd93b110>

This TF Function can then be used exactly like the original Python function, and it will return the same result (but as tensors):

In [128]:
tf_cube(323723)

<tf.Tensor: shape=(), dtype=int64, numpy=33925063503334067>

In [129]:
tf_cube(tf.constant(242323.42312310))

<tf.Tensor: shape=(), dtype=float32, numpy=1.4229387e+16>

Under the hood, tf.function() analyzed the computations performed by the cube() function and generated an equivalent computation graph.

#### Alternatively, we could have used tf.function as a decorator; this is actually more common: 

When you add the @tf.function decorator, TensorFlow converts the Python function into a TF Function, which returns its output as tensors.

In [130]:
@tf.function
def tf_cube(x):
    return x ** 3

In [131]:
tf_cube(23)

<tf.Tensor: shape=(), dtype=int32, numpy=12167>

### TensorFlow Computation Graph

1) TensorFlow optimizes the computation graph, pruning unused nodes, simplifying expressions (e.g., 1 + 2 would get replaced with 3), and more.

2) Once the optimized graph is ready, the TF Function efficiently executes the operations in the graph, in the appropriate order (and in parallel when it can).

3) As a result, a TF Function will usually run much faster than the original Python function, especially if it performs complex computations.

#### Pro-Tip
Where applicable, when you want to boost a Python function, just transform it into a TF Function.

#### Moreover, when you write a custom loss function, a custom metric, a custom layer, or any other custom function and you use it in a Keras model (as we did throughout this chapter), Keras automatically converts your function into a TF Function; there is no need to use tf.function().
For example, recall the custom Huber loss function implemented earlier: when you passed it to the model's compile() method, Keras automatically converted the plain Python function into a TF Function.


### Important warning: TF Functions and graph generation

1) By default, a TF Function generates a new graph for every unique set of input shapes and data types and caches it for subsequent calls. For example, if you call tf_cube(tf.constant(10)), a graph will be generated for int32 tensors of shape []. Then if you call tf_cube(tf.constant(20)), the same graph will be reused. But if you then call tf_cube(tf.constant([10, 20])), a new graph will be generated for int32 tensors of shape [2]. This is how TF Functions handle polymorphism (i.e., varying argument types and shapes). However, this is only true for tensor arguments: if you pass numerical Python values to a TF Function, a new graph will be generated for every distinct value: for example, calling tf_cube(10) and tf_cube(20) will generate two graphs.




2) If you call a TF Function many times with different numerical Python values, then many graphs will be generated, slowing down your program and using up a lot of RAM (you must delete the TF Function to release it). Python values should be reserved for arguments that will have few unique values, such as hyperparameters like the number of neurons per layer. This allows TensorFlow to better optimize each variant of your model.
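One way to observe this caching is to count how often the Python body runs, since it only runs when a new graph is traced. The trace_log list and the three counters below are hypothetical names introduced for this sketch.

```python
import tensorflow as tf

trace_log = []

@tf.function
def tf_cube(x):
    trace_log.append(type(x).__name__)   # Python side effect: runs only while tracing
    return x ** 3

tf_cube(tf.constant(10))
tf_cube(tf.constant(20))             # same dtype and shape: the cached graph is reused
traces_after_scalars = len(trace_log)

tf_cube(tf.constant([10, 20]))       # new shape: a new graph is generated
traces_after_vector = len(trace_log)

tf_cube(10)
tf_cube(20)                          # distinct Python values: one new graph each
traces_after_python_ints = len(trace_log)

print(traces_after_scalars, traces_after_vector, traces_after_python_ints)
```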

## AutoGraph and Tracing

### Tracing: https://www.tensorflow.org/guide/function#tracing

#### In a nutshell, when TensorFlow generates the graph, it traces the original Python code: it runs the function once with symbolic tensors and records only the TensorFlow operations, while ordinary Python operations run during tracing but do not become part of the graph (tf.Graph). The TF Function Rules section explains why f(tf.constant(2.)) and f(tf.constant(3.)) return the same random number.

So how does TensorFlow generate graphs?

1) It starts by analyzing the Python function’s source code to capture all the control flow statements, such as for loops, while loops, and if statements, as well as break, continue, and return statements. This first step is called __AutoGraph__.

2) The reason TensorFlow has to analyze the source code is that Python does not provide any other way to capture control flow statements: it offers magic methods like __ add __() and __ mul __() to capture operators like + and *, but there are no __ while __() or __ if __() magic methods.

3) After analyzing the function’s code, AutoGraph outputs an upgraded version of that function in which all the control flow statements are replaced by the appropriate TensorFlow operations, such as tf.while_loop() for loops and tf.cond() for if statements.


#### Below, we'll see an example of AutoGraph converting a regular Python function into a TF Function

##### First we create a normal Python function and add the @tf.function decorator

In [133]:
@tf.function
def sum_squares(n):
    s = 0
    for i in tf.range(n+1):
        s += i ** 2
    return s
        

##### Now we'll see the code that Autograph generates

In [135]:
print(tf.autograph.to_code(sum_squares.python_function))

def tf__sum_squares(n):
    with ag__.FunctionScope('sum_squares', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()
        s = 0

        def get_state():
            return (s,)

        def set_state(vars_):
            nonlocal s
            s, = vars_

        def loop_body(itr):
            nonlocal s
            i = itr
            s = ag__.ld(s)
            s += i ** 2
        i = ag__.Undefined('i')
        ag__.for_stmt(ag__.converted_call(ag__.ld(tf).range, (ag__.ld(n) + 1,), None, fscope), None, loop_body, get_state, set_state, ('s',), {'iterate_names': 'i'})
        try:
            do_return = True
            retval_ = ag__.ld(s)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)



##### Now we'll mention the steps that were taken by AutoGraph to generate this code

1) AutoGraph analyzes the source code of the sum_squares() Python function, and it generates the tf__sum_squares() function.

2) In this function, the for loop is replaced by the definition of the loop_body() function (containing the body of the original for loop), followed by a call to the for_stmt() function.

3) This call will build the appropriate tf.while_loop() operation in the computation graph.

4) Next, TensorFlow calls this “upgraded” function, but instead of passing the argument, it passes a symbolic tensor, that is, a tensor without any actual value, only a name, a data type, and a shape.

4.1) For example, if you call sum_squares(tf.constant(10)), then the tf__sum_squares() function will be called with a symbolic tensor of type int32 and shape [].

4.2) The function will run in graph mode, meaning that each TensorFlow operation will add a node in the graph to represent itself and its output tensor(s) (as opposed to the regular mode, called eager execution, or eager mode). Once the graph is built, calling the TF Function executes it and returns a tensor with a concrete value, such as the <tf.Tensor: shape=(), dtype=int32, numpy=12167> returned by tf_cube(23) earlier.
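The symbolic tensor and graph mode described in steps 4 to 4.2 can be observed with a small sketch. It uses the simpler tf_cube() function rather than sum_squares(), and the trace_info list is a hypothetical name used to capture what the function sees while it is being traced.

```python
import tensorflow as tf

trace_info = []

@tf.function
def tf_cube(x):
    # This Python code runs only while the function is being traced
    trace_info.append((tf.executing_eagerly(), x.dtype, x.shape))
    return x ** 3

# Trace the function for an int32 scalar and inspect the resulting graph
concrete = tf_cube.get_concrete_function(tf.constant(10))
op_types = [op.type for op in concrete.graph.get_operations()]

result = tf_cube(tf.constant(10))    # executes the cached graph

print(trace_info)    # what the function saw during tracing
print(op_types)      # node types recorded in the graph
print(result)        # a concrete tensor is returned to the caller
```

During tracing, tf.executing_eagerly() reports False and the argument only carries a dtype and a shape; the power operation shows up as a node in the graph.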

## TF Function Rules

Most of the time, converting a Python function that performs TensorFlow operations into a TF Function is trivial: decorate it with @tf.function or let Keras take care of it for you. However, there are a few rules to respect:

1) If you call any external library, including NumPy or even the standard library, this call will run only during tracing; it will not be part of the graph. Indeed, a TensorFlow graph can only include TensorFlow constructs (tensors, operations, variables, datasets, and so on). So, make sure you use tf.reduce_sum() instead of np.sum(), tf.sort() instead of the built-in sorted() function, and so on (unless you really want the code to run only during tracing). This has a few additional implications:

1.i) If you define a TF Function f(x) that just returns np.random.rand(), a random number will only be generated when the function is traced, so f(tf.constant(2.)) and f(tf.constant(3.)) will return the same random number: the function was already traced for f(tf.constant(2.)), and since both arguments have the same data type and shape, the rules of tracing (https://www.tensorflow.org/guide/function#rules_of_tracing) say that the cached graph is reused rather than traced again. But f(tf.constant([2., 3.])) will return a different one, because the new input shape triggers a new trace. If you replace np.random.rand() with tf.random.uniform([]), then a new random number will be generated upon every call, since the operation will be part of the graph.

1.ii) If your non-TensorFlow code has side effects (such as logging something or updating a Python counter), then you should not expect those side effects to occur every time you call the TF Function, as they will only occur when the function is traced.

1.iii) You can wrap arbitrary Python code in a tf.py_function() operation, but doing so will hinder performance, as TensorFlow will not be able to do any graph optimization on this code. It will also reduce portability, as the graph will only run on platforms where Python is available (and where the right libraries are installed).

2) You can call other Python functions or TF Functions inside your TF Function, but they should follow the same rules, as TensorFlow will capture their operations in the computation graph. Note that these other functions do not need to be decorated with @tf.function.

3) If the function creates a TensorFlow variable (or any other stateful TensorFlow object, such as a dataset or a queue), it must do so upon the very first call, and only then, or else you will get an exception. It is usually preferable to create variables outside of the TF Function (e.g., in the build() method of a custom layer). If you want to assign a new value to the variable, make sure you call its assign() method, instead of using the = operator.

4) The source code of your Python function should be available to TensorFlow. If the source code is unavailable (for example, if you define your function in the Python shell, which does not give access to the source code, or if you deploy only the compiled *.pyc Python files to production), then the graph generation process will fail or have limited functionality.

5) TensorFlow will only capture for loops that iterate over a tensor or a dataset. So make sure you use for i in tf.range(x) rather than for i in range(x), or else the loop will not be captured in the graph. Instead, it will run during tracing. (A scenario where you may want the loop to run during tracing could be if the for loop is meant to build the graph, for example to create each layer in a neural network.)

6) As always, for performance reasons, you should prefer a vectorized implementation whenever you can, rather than using loops.
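Rule 1.i can be checked with a short sketch. The function names f_numpy and f_tf are hypothetical; the first bakes a NumPy random number into the graph at tracing time, while the second uses a TensorFlow random operation that runs on every call.

```python
import numpy as np
import tensorflow as tf

@tf.function
def f_numpy(x):
    return np.random.rand()        # runs only during tracing; stored as a constant

@tf.function
def f_tf(x):
    return tf.random.uniform([])   # a graph operation; runs on every call

a = f_numpy(tf.constant(2.))
b = f_numpy(tf.constant(3.))          # same dtype and shape: cached graph, same number
c = f_numpy(tf.constant([2., 3.]))    # new shape: retraced, so a new number is drawn
d = f_tf(tf.constant(2.))
e = f_tf(tf.constant(3.))             # same graph, but a fresh number each call

print(float(a), float(b), float(c))
print(float(d), float(e))
```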