**Building complex models using the functional API:**
- One example of a non-sequential neural net is a Wide & Deep neural network, introduced in a 2016 paper.
- It learns both deep patterns (by passing the data through a stack of hidden layers) and simple rules (by connecting the inputs directly to the output layer through a short, wide path).
- We now build such a neural net and use it to tackle the California housing problem.

In [1]:
import tensorflow as tf

In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [19]:
X = pd.read_csv(r"C:\Users\blais\Documents\ML\deep_learning\housing_x.csv")
Y = pd.read_csv(r"C:\Users\blais\Documents\ML\deep_learning\housing_y.csv")

In [20]:
X = X.iloc[:, 1:]
Y = Y.iloc[:,1:]

In [21]:
X_train,X_test,y_train,y_test = train_test_split(X,Y,test_size=0.2)
X_train,X_valid,y_train,y_valid = train_test_split(X_train, y_train, test_size = 0.1)

In [22]:
scaler = StandardScaler()
y_train_scaled = scaler.fit_transform(y_train)

In [23]:
y_valid_scaled, y_test_scaled = scaler.transform(y_valid), scaler.transform(y_test)

In [24]:
X_train, X_valid, y_train, y_valid = X_train.values, X_valid.values, y_train_scaled.reshape(y_train_scaled.shape[0],), y_valid_scaled.reshape(y_valid_scaled.shape[0],)

In [25]:
X_test, y_test = X_test.values, y_test_scaled.reshape(y_test_scaled.shape[0],)

In [26]:
tf.keras.utils.set_random_seed(42)

Get the data in:

In [27]:
# build a wide and deep neural net

normalization_layer = tf.keras.layers.Normalization()
hidden_layer1 = tf.keras.layers.Dense(30, activation="relu")
hidden_layer2 = tf.keras.layers.Dense(30, activation="relu")
concat_layer = tf.keras.layers.Concatenate()
output_layer = tf.keras.layers.Dense(1)

input_ = tf.keras.layers.Input(shape = X_train.shape[1:])
normalized = normalization_layer(input_)
hidden1 = hidden_layer1(normalized)
hidden2 = hidden_layer2(hidden1)
concat = concat_layer([normalized, hidden2])
output = output_layer(concat)

model = tf.keras.Model(inputs=[input_], outputs=[output])

In [28]:
model.summary()

In [29]:
optimizer = tf.keras.optimizers.Adam(learning_rate = 1.0e-3)

model.compile(loss = "mse", optimizer=optimizer, metrics=["RootMeanSquaredError"])

normalization_layer.adapt(X_train)

In [30]:
history = model.fit(X_train, y_train_scaled, epochs=20, validation_data = (X_valid, y_valid_scaled))

Epoch 1/20
372/372 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - RootMeanSquaredError: 1.0456 - loss: 1.1225 - val_RootMeanSquaredError: 0.5206 - val_loss: 0.2710
Epoch 2/20
372/372 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - RootMeanSquaredError: 0.5921 - loss: 0.3508 - val_RootMeanSquaredError: 0.4763 - val_loss: 0.2269
Epoch 3/20
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - RootMeanSquaredError: 0.5240 - loss: 0.2748 - val_RootMeanSquaredError: 0.4621 - val_loss: 0.2136
Epoch 4/20
372/372 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - RootMeanSquaredError: 0.5010 - loss: 0.2512 - val_RootMeanSquaredError: 0.4558 - val_loss: 0.2078
Epoch 5/20
372/372 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - RootMeanSquaredError: 0.4901 - loss: 0.2403 - val_RootMeanSquaredError: 0.4519 - val_loss: 0.2042
Epoch 6/20
...

But what if you want to send a subset of the features through the wide path and a different subset (possibly overlapping) through the deep path? In this case, one solution is to use multiple inputs. For example, suppose we want to send 13 features through the wide path (features 0 to 12) and 22 features through the deep path (features 2 to 23):

In [31]:
X_train.shape

(11888, 24)

In [36]:
input_wide = tf.keras.layers.Input(shape=[13]) # features 0 to 12 
input_deep = tf.keras.layers.Input(shape=[22]) # features 2 to 23
norm_layer_deep = tf.keras.layers.Normalization()
norm_layer_wide = tf.keras.layers.Normalization()

norm_wide = norm_layer_wide(input_wide)
norm_deep = norm_layer_deep(input_deep)

hidden1 = tf.keras.layers.Dense(30, activation="relu")(norm_deep)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)
concat = tf.keras.layers.concatenate([norm_wide, hidden2])
output = tf.keras.layers.Dense(1)(concat)

model = tf.keras.Model(inputs=[input_wide, input_deep], outputs=[output])

In [37]:
optimizer = tf.keras.optimizers.Adam(learning_rate=1.0e-3)
model.compile(loss="mse",optimizer=optimizer,metrics=["RootMeanSquaredError"])

X_train_wide, X_train_deep = X_train[:,:13], X_train[:,2:]
X_valid_wide, X_valid_deep = X_valid[:,:13], X_valid[:,2:]
X_test_wide, X_test_deep = X_test[:,:13], X_test[:,2:]
X_new_wide, X_new_deep = X_test_wide[:3], X_test_deep[:3]

In [38]:
norm_layer_wide.adapt(X_train_wide)
norm_layer_deep.adapt(X_train_deep)

In [39]:
history = model.fit((X_train_wide, X_train_deep), y_train_scaled, epochs=35,
                    validation_data = ((X_valid_wide, X_valid_deep), y_valid_scaled))

Epoch 1/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - RootMeanSquaredError: 0.8715 - loss: 0.7883 - val_RootMeanSquaredError: 0.5024 - val_loss: 0.2524
Epoch 2/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - RootMeanSquaredError: 0.5531 - loss: 0.3062 - val_RootMeanSquaredError: 0.4702 - val_loss: 0.2211
Epoch 3/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - RootMeanSquaredError: 0.5209 - loss: 0.2715 - val_RootMeanSquaredError: 0.4603 - val_loss: 0.2118
Epoch 4/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - RootMeanSquaredError: 0.5062 - loss: 0.2564 - val_RootMeanSquaredError: 0.4502 - val_loss: 0.2026
Epoch 5/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - RootMeanSquaredError: 0.5030 - loss: 0.2531 - val_RootMeanSquaredError: 0.4520 - val_loss: 0.2043
Epoch 6/35
...

Instead of passing a tuple (X_train_wide, X_train_deep), you can pass a dictionary {"input_wide": X_train_wide, "input_deep": X_train_deep}, provided you set name="input_wide" and name="input_deep" when creating the inputs. This is highly recommended when there are many inputs: it clarifies the code and avoids getting the order wrong.
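A minimal sketch of the dictionary approach, assuming random stand-in data with the same feature counts as above (13 wide, 22 deep). The variable names `X_wide`, `X_deep`, and `named_model` are made up for this illustration, and the normalization layers are left out to keep it short:

```python
import numpy as np
import tensorflow as tf

# Random stand-in data (assumption: same feature counts as the notebook).
rng = np.random.default_rng(42)
X_wide = rng.normal(size=(64, 13)).astype("float32")
X_deep = rng.normal(size=(64, 22)).astype("float32")
y = rng.normal(size=(64, 1)).astype("float32")

# Naming the inputs lets us refer to them by key instead of by position.
input_wide = tf.keras.layers.Input(shape=[13], name="input_wide")
input_deep = tf.keras.layers.Input(shape=[22], name="input_deep")

hidden1 = tf.keras.layers.Dense(30, activation="relu")(input_deep)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)
concat = tf.keras.layers.concatenate([input_wide, hidden2])
output = tf.keras.layers.Dense(1)(concat)

named_model = tf.keras.Model(inputs=[input_wide, input_deep], outputs=[output])
named_model.compile(loss="mse", optimizer="adam")

# The dictionary keys must match the Input names; their order does not matter.
history = named_model.fit(
    {"input_deep": X_deep, "input_wide": X_wide}, y, epochs=1, verbose=0
)
y_pred = named_model.predict(
    {"input_wide": X_wide[:3], "input_deep": X_deep[:3]}, verbose=0
)
```

Depending on the Keras version, passing a dictionary to a model declared with a list of inputs may emit a warning about the input structure; the keys are still matched by name.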

There also are many use cases in which you may want to have multiple outputs:
- Adding an extra output is quite easy - we just connect it to the appropriate layer and add it to the model's list of outputs.

In [50]:
# building a network with 2 inputs and 2 outputs

input_wide = tf.keras.layers.Input(shape=[13])
input_deep = tf.keras.layers.Input(shape=[22])

normalization_wide = tf.keras.layers.Normalization()
normalization_deep = tf.keras.layers.Normalization()

norm_wide = normalization_wide(input_wide)

norm_deep = normalization_deep(input_deep)
hidden1 = tf.keras.layers.Dense(30, activation="relu")(norm_deep)
hidden2 = tf.keras.layers.Dense(30, activation="relu")(hidden1)

concat = tf.keras.layers.concatenate([norm_wide,hidden2])

output = tf.keras.layers.Dense(1, name="main_out")(concat)

aux_output = tf.keras.layers.Dense(1, name="aux_out")(hidden2)

model = tf.keras.Model(inputs=[input_wide, input_deep], outputs=[output, aux_output])

Each output needs its own loss function, so when compiling the model, pass a list of losses. Passing a single loss means it is applied to all outputs. By default, Keras computes all the losses and adds them up to get the final loss used for training. Since we care much more about the main output than the auxiliary output, we give the main output's loss a much greater weight. It is possible to set all the loss weights when compiling the model:

In [51]:
optimizer = tf.keras.optimizers.Adam(learning_rate=1.0e-3)
model.compile(loss=("mse","mse"), loss_weights=(0.9, 0.1), optimizer=optimizer, metrics=["RootMeanSquaredError","RootMeanSquaredError"])
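To make the weighting concrete, the sketch below reproduces the arithmetic in plain Python: the total loss is the sum of each output's loss multiplied by its weight. The helper `weighted_total_loss` and the two loss values are made up for illustration; they are not part of Keras and do not come from a real training run.

```python
def weighted_total_loss(losses, loss_weights):
    """Combine per-output losses into one scalar: sum of weight * loss."""
    if len(losses) != len(loss_weights):
        raise ValueError("need exactly one weight per output loss")
    return sum(w * l for w, l in zip(loss_weights, losses))


# Made-up per-output MSE values, for illustration only.
main_loss, aux_loss = 0.20, 0.30
total = weighted_total_loss([main_loss, aux_loss], [0.9, 0.1])
print(f"{total:.4f}")  # 0.2100
```

With weights of 0.9 and 0.1, the auxiliary output contributes only 0.03 of the 0.21 total, so gradients are dominated by the main output.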

Now, when training the model, we need to provide labels for each output. In this example, the main output and the auxiliary output should predict the same thing, so they use the same labels; the auxiliary output is there only for regularization.

In [52]:
normalization_wide.adapt(X_train_wide)
normalization_deep.adapt(X_train_deep)

In [53]:
history = model.fit(
    (X_train_wide, X_train_deep),(y_train_scaled, y_train_scaled), epochs=35, 
    validation_data = ((X_valid_wide, X_valid_deep),(y_valid_scaled,y_valid_scaled))
)

Epoch 1/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - aux_out_RootMeanSquaredError: 0.8988 - aux_out_loss: 0.8147 - loss: 0.9319 - main_out_RootMeanSquaredError: 0.9600 - main_out_loss: 0.9449 - val_aux_out_RootMeanSquaredError: 0.5352 - val_aux_out_loss: 0.2846 - val_loss: 0.2853 - val_main_out_RootMeanSquaredError: 0.5340 - val_main_out_loss: 0.2825
Epoch 2/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - aux_out_RootMeanSquaredError: 0.6088 - aux_out_loss: 0.3709 - loss: 0.3715 - main_out_RootMeanSquaredError: 0.6087 - main_out_loss: 0.3716 - val_aux_out_RootMeanSquaredError: 0.5005 - val_aux_out_loss: 0.2486 - val_loss: 0.2425 - val_main_out_RootMeanSquaredError: 0.4915 - val_main_out_loss: 0.2399
Epoch 3/35
372/372 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - aux_out_RootMeanSquaredError: 0.5609 - aux_out_loss: 0.3147 - loss: 0.3175 - main_out_RootMeanSquaredError: 0.5631 - main_out_loss: 0.31...

When we evaluate the model, Keras returns the weighted sum of the losses, as well as the individual losses and metrics

In [49]:
eval_results = model.evaluate((X_test_wide, X_test_deep),(y_test_scaled, y_test_scaled))

104/104 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - dense_15_RootMeanSquaredError: 0.4410 - dense_15_loss: 0.1948 - dense_16_RootMeanSquaredError: 0.4463 - dense_16_loss: 0.1995 - loss: 0.1953


The predict() method returns predictions for each output as well:

In [55]:
y_pred_main, y_pred_aux = model.predict((X_new_wide, X_new_deep))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step


In [56]:
model.output_names

ListWrapper(['main_out', 'aux_out'])

The predict() method returns a tuple, and it does not have a return_dict argument to get a dictionary instead. However - we can create one using model.output_names:

In [57]:
y_pred_tuple = model.predict((X_new_wide, X_new_deep))
y_pred = dict(zip(model.output_names, y_pred_tuple))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step


In [58]:
y_pred

{'main_out': array([[ 0.2391272],
        [-1.0580845],
        [-0.7071378]], dtype=float32),
 'aux_out': array([[ 0.23888652],
        [-1.1380088 ],
        [-0.6938181 ]], dtype=float32)}

As this example shows, you can build all sorts of architectures with the functional API.

**Using the Subclassing API to build dynamic models:**
- Both the sequential API and the functional API work well, but the models they produce are static. Some models involve loops, varying shapes, conditional branching, and other dynamic behaviour. For such cases, or if you simply prefer a more imperative programming style, the subclassing API is for you.
- With this approach, you subclass the Model class, create the layers you need in the constructor, and then use them to perform the computations you want in the call() method. For example, creating an instance of the following WideAndDeepModel class gives us a model equivalent to the one we built with the functional API:

In [59]:
class WideAndDeepModel(tf.keras.Model):
    def __init__(self, units=30, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.norm_layer_wide = tf.keras.layers.Normalization()
        self.norm_layer_deep = tf.keras.layers.Normalization()
        self.hidden1 = tf.keras.layers.Dense(units, activation=activation)
        self.hidden2 = tf.keras.layers.Dense(units, activation=activation)
        self.main_output = tf.keras.layers.Dense(1)
        self.aux_output = tf.keras.layers.Dense(1)
    
    def call(self, inputs):
        input_wide, input_deep = inputs
        norm_wide = self.norm_layer_wide(input_wide)
        norm_deep = self.norm_layer_deep(input_deep)
        hidden1 = self.hidden1(norm_deep)
        hidden2 = self.hidden2(hidden1)
        concat = tf.keras.layers.concatenate([norm_wide, hidden2])
        output = self.main_output(concat)
        aux_output = self.aux_output(hidden2)
        return output, aux_output

In [60]:
model = WideAndDeepModel(30, activation="relu", name="my_cool_model")

This model looks like the previous one, except that we separate the creation of the layers in the constructor from their usage in the call() method. We also don't need to create the Input objects: we can use the inputs argument of the call() method.

Now that we have a model instance, we can compile it, adapt its normalization layers (e.g. using model.norm_layer_wide.adapt(), and model.norm_layer_deep.adapt()), and fit it, evaluate it, and use it to make predictions, exactly like we did with the functional API.
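The steps just listed can be sketched end to end as follows. This is only an illustration under assumptions: the data is random (`X_wide`, `X_deep`, `y` are stand-ins with the notebook's feature counts), the class is repeated so the block runs on its own, and metrics are omitted for brevity.

```python
import numpy as np
import tensorflow as tf


class WideAndDeepModel(tf.keras.Model):
    def __init__(self, units=30, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.norm_layer_wide = tf.keras.layers.Normalization()
        self.norm_layer_deep = tf.keras.layers.Normalization()
        self.hidden1 = tf.keras.layers.Dense(units, activation=activation)
        self.hidden2 = tf.keras.layers.Dense(units, activation=activation)
        self.main_output = tf.keras.layers.Dense(1)
        self.aux_output = tf.keras.layers.Dense(1)

    def call(self, inputs):
        input_wide, input_deep = inputs
        norm_wide = self.norm_layer_wide(input_wide)
        norm_deep = self.norm_layer_deep(input_deep)
        hidden2 = self.hidden2(self.hidden1(norm_deep))
        concat = tf.keras.layers.concatenate([norm_wide, hidden2])
        return self.main_output(concat), self.aux_output(hidden2)


# Random stand-in data (assumption: 13 wide and 22 deep features).
rng = np.random.default_rng(0)
X_wide = rng.normal(size=(128, 13)).astype("float32")
X_deep = rng.normal(size=(128, 22)).astype("float32")
y = rng.normal(size=(128, 1)).astype("float32")

model = WideAndDeepModel(30, activation="relu", name="my_cool_model")
model.compile(
    loss=("mse", "mse"),
    loss_weights=(0.9, 0.1),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1.0e-3),
)

# Adapt the normalization layers before training, as with the functional API.
model.norm_layer_wide.adapt(X_wide)
model.norm_layer_deep.adapt(X_deep)

history = model.fit((X_wide, X_deep), (y, y), epochs=2, verbose=0)
y_pred_main, y_pred_aux = model.predict((X_wide[:3], X_deep[:3]), verbose=0)
```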


**Saving and Restoring a Model:**
- Saving a trained Keras model is as simple as it gets:

In [62]:
model.save("../data/my_keras_model.keras")



The cell above uses the single-file .keras format. When you set save_format="tf" instead, Keras saves the model using TensorFlow's SavedModel format: a directory with the given name containing several files and subdirectories. In particular:
- saved_model.pb contains the model's architecture and logic in the form of a serialized graph
- keras_metadata.pb contains extra information needed by Keras
- the variables subdirectory contains all the parameter values (including the connection weights, the biases, the normalization statistics, and the optimizer's parameters), possibly split across multiple files

Saving just the weights is faster and uses less disk space than saving the whole model, so it is perfect for saving quick checkpoints during training. If you're training a big model, save checkpoints regularly.
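The sketch below contrasts the two approaches on a tiny stand-in model: saving and restoring the whole model with save() and load_model(), and saving only the weights with save_weights() and load_weights(). The model, the temporary directory, and the file names are assumptions made for this illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import tensorflow as tf

# A tiny stand-in model (hypothetical; any Keras model works the same way).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[4]),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

X = np.random.default_rng(0).normal(size=(8, 4)).astype("float32")
save_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo

# Whole model: architecture, weights, and optimizer state in one .keras file.
model_path = str(save_dir / "my_keras_model.keras")
model.save(model_path)
restored = tf.keras.models.load_model(model_path)

# Weights only: the architecture must be rebuilt in code before loading.
weights_path = str(save_dir / "my_model.weights.h5")
model.save_weights(weights_path)
clone = tf.keras.models.clone_model(model)  # same architecture, fresh weights
clone.load_weights(weights_path)

reference = model.predict(X, verbose=0)
same_full = np.allclose(reference, restored.predict(X, verbose=0))
same_weights = np.allclose(reference, clone.predict(X, verbose=0))
print(same_full, same_weights)
```

Both restored models should reproduce the original predictions; the difference is that the weights-only route needs the code that builds the architecture.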

**Using Callbacks:**
- The fit() method accepts a callbacks argument that lets you specify a list of objects that Keras will call before and after training, before and after each epoch, and even before and after processing each batch. For example, the ModelCheckpoint callback saves checkpoints of your model at regular intervals during training, by default at the end of each epoch.

In [65]:
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("../data/my_checkpoints/checkpoint.weights.h5", save_weights_only=True)

In [None]:
history = model.fit([...], callbacks=[checkpoint_cb])

If you use a validation set during training, you can set save_best_only=True when creating the ModelCheckpoint. In this case, it will only save your model when its performance on the validation set is the best so far. This way, you don't need to worry about training for too long and overfitting the training set: simply restore the last saved model after training, and it will be the best model on the validation set.


Another way is to use the EarlyStopping callback. It interrupts training when it measures no progress on the validation set for a number of epochs (defined by the patience argument), and if you set restore_best_weights=True, it rolls back to the best model at the end of training. You can combine both callbacks: checkpoints protect you in case your computer crashes, and early stopping interrupts training when there is no more progress, which avoids wasting time and resources and reduces overfitting.

In [None]:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=10,
                                                    restore_best_weights=True)

In [None]:
history = model.fit([...], callbacks=[checkpoint_cb, early_stopping_cb])
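For a version that runs on its own, here is a small sketch combining both callbacks on synthetic data. The dataset, the tiny model, the checkpoint path, and the shortened patience of 3 are all assumptions chosen to keep the run fast; they are not the notebook's settings.

```python
import tempfile
from pathlib import Path

import numpy as np
import tensorflow as tf

# Synthetic regression data (assumption): target is a noisy sum of the features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(200, 1))).astype("float32")
X_train, y_train, X_valid, y_valid = X[:160], y[:160], X[160:], y[160:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[4]),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

ckpt_path = Path(tempfile.mkdtemp()) / "best.weights.h5"

# save_best_only=True: overwrite the checkpoint only when val_loss improves.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    str(ckpt_path), save_weights_only=True, save_best_only=True
)
# Stop after 3 epochs without improvement and roll back to the best weights.
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    patience=3, restore_best_weights=True
)

history = model.fit(
    X_train, y_train, epochs=20,
    validation_data=(X_valid, y_valid),
    callbacks=[checkpoint_cb, early_stopping_cb],
    verbose=0,
)
epochs_run = len(history.history["loss"])
print(f"trained for {epochs_run} epochs; checkpoint saved: {ckpt_path.exists()}")
```

Both callbacks monitor val_loss by default, which is why validation_data must be passed to fit().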

Check the tf.keras.callbacks package for other callbacks. For extra control, you can easily write your own custom callbacks. For example, the following custom callback displays the ratio between the validation loss and the training loss during training (e.g. to detect overfitting):

In [66]:
class PrintValTrainRatioCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        ratio = logs['val_loss']/logs['loss']
        print(f"Epoch = {epoch}, val/train={ratio:.2f}")

As you might expect, you can implement on_train_begin(), on_train_end(), on_epoch_begin(), on_epoch_end(), on_batch_begin(), and on_batch_end(). Callbacks can also be used during evaluation and prediction, should you ever need them (e.g. for debugging). For evaluation, implement on_test_begin(), on_test_end(), on_test_batch_begin(), or on_test_batch_end(), which are called by evaluate(). For prediction, implement on_predict_begin(), on_predict_end(), on_predict_batch_begin(), or on_predict_batch_end(), which are called by predict().
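As an illustration of several of these hooks working together, the sketch below defines a hypothetical TrainingTimerCallback (not part of Keras) that records the wall-clock time of each epoch and of the whole run. The tiny model and random data are assumptions that only serve to drive the callback.

```python
import time

import numpy as np
import tensorflow as tf


class TrainingTimerCallback(tf.keras.callbacks.Callback):
    """Hypothetical callback: times each epoch and the whole training run."""

    def on_train_begin(self, logs=None):
        self.epoch_times = []
        self._train_start = time.perf_counter()

    def on_epoch_begin(self, epoch, logs=None):
        self._epoch_start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.perf_counter() - self._epoch_start)

    def on_train_end(self, logs=None):
        self.total_time = time.perf_counter() - self._train_start
        print(f"{len(self.epoch_times)} epochs in {self.total_time:.2f}s")


# Random stand-in data and a tiny model, just to exercise the callback.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
y = rng.normal(size=(64, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[4]),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

timer_cb = TrainingTimerCallback()
model.fit(X, y, epochs=3, callbacks=[timer_cb], verbose=0)
```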

**Using TensorBoard for Visualization:**
- TensorBoard is a great interactive visualization tool. You can use it to view learning curves during training, compare curves and metrics between multiple runs, visualize the computation graph, analyze training statistics, view images generated by your model, visualize complex multidimensional data projected down to 3D and automatically clustered for you, and profile your network.

Run the command below:

In [67]:
%pip install -q -U tensorboard-plugin-profile

Note: you may need to restart the kernel to use updated packages.


To use TensorBoard, modify your program so that it outputs the data you want to visualize to special binary logfiles called event files. Each binary data record is called a summary.
Configure TensorBoard to monitor the root log directory, and configure the program to write to a different subdirectory every time it runs. This way, the same TensorBoard server instance lets you visualize and compare data from multiple runs of your program without getting everything mixed up.
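To see what an event file is in practice, here is a small sketch that writes a few scalar summaries by hand with the tf.summary API and then lists the files created for that run. The temporary root directory, the run_demo subdirectory name, and the demo_loss values are made up for the illustration.

```python
import tempfile
from pathlib import Path

import tensorflow as tf

# Throwaway root log directory and a hypothetical subdirectory for one run.
root_logdir = Path(tempfile.mkdtemp()) / "my_logs"
run_logdir = root_logdir / "run_demo"

writer = tf.summary.create_file_writer(str(run_logdir))
with writer.as_default():
    for step in range(1, 6):
        # Each call appends one summary record to the run's event file.
        tf.summary.scalar("demo_loss", 1.0 / step, step=step)
writer.close()  # flushes pending summaries to disk

event_files = sorted(p.name for p in run_logdir.iterdir())
print(event_files)
```

Writing a second run to another subdirectory of the same root is what lets one TensorBoard server compare the two.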

- Let's name the root log directory my_logs and define a little function that generates the path of the log subdirectory based on the current date and time, so that it is different at every run:

In [69]:
from pathlib import Path
from time import strftime

In [72]:
def get_run_logdir(root_logdir="my_logs"):
    return Path(root_logdir)/strftime("run_%Y_%m_%d_%H_%M_%S")

In [73]:
run_logdir = get_run_logdir()

In [75]:
run_logdir

WindowsPath('my_logs/run_2025_07_20_09_32_35')

Keras provides a convenient TensorBoard() callback that takes care of creating the log directory for you (along with its parent directories if needed), and creates event files and writes summaries to them during training.

In [None]:
tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir, profile_batch=(100,200))
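A self-contained sketch of passing this callback to fit() is shown below. It rests on several assumptions: the data is random, the model is a tiny stand-in, the logs go to a temporary directory rather than my_logs, and profile_batch=0 is used to switch profiling off so the example does not depend on the profiler plugin.

```python
import tempfile
from pathlib import Path
from time import strftime

import numpy as np
import tensorflow as tf


def get_run_logdir(root_logdir="my_logs"):
    return Path(root_logdir) / strftime("run_%Y_%m_%d_%H_%M_%S")


# Throwaway root directory for the demo instead of ./my_logs.
root_logdir = Path(tempfile.mkdtemp()) / "my_logs"
run_logdir = get_run_logdir(root_logdir)

# Random stand-in data and a tiny model.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 4)).astype("float32")
y = rng.normal(size=(128, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[4]),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

# profile_batch=0 disables profiling (assumption made to keep the sketch light).
tensorboard_cb = tf.keras.callbacks.TensorBoard(str(run_logdir), profile_batch=0)

history = model.fit(
    X[:96], y[:96], epochs=3,
    validation_data=(X[96:], y[96:]),
    callbacks=[tensorboard_cb],
    verbose=0,
)

# The callback writes training and validation summaries to separate subdirectories.
logged = sorted(p.name for p in run_logdir.iterdir())
print(logged)
```

To browse the results, start the server on the root log directory, for example with `tensorboard --logdir=./my_logs` in a terminal, or with `%load_ext tensorboard` followed by `%tensorboard --logdir=./my_logs` in a notebook.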