What if you want to send different subsets of input features through the wide and deep paths? We will send 5 features through the wide path (features 0 to 4) and 6 features through the deep path (features 2 to 7). Note that 3 features (features 2, 3, and 4) will go through both.
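To make the overlap concrete, here is a minimal NumPy sketch. The toy array `X_toy` is an assumption for illustration only: each row simply holds the column indices 0 to 7, so the slices show directly which features end up in each path.

```python
import numpy as np

# Toy data (hypothetical): 2 samples, 8 features, each value is its column index
X_toy = np.tile(np.arange(8), (2, 1))

X_wide = X_toy[:, :5]   # features 0 to 4 -> wide path
X_deep = X_toy[:, 2:]   # features 2 to 7 -> deep path

wide_cols = X_wide[0].tolist()
deep_cols = X_deep[0].tolist()
shared = sorted(set(wide_cols) & set(deep_cols))

print("wide:", wide_cols)
print("deep:", deep_cols)
print("both:", shared)
```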

=> This model is a wide & deep neural network:
some features go through a wide path, some through a deep path.
The output of the deep path is concatenated with the wide input, and a single output layer makes the final prediction.
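The data flow described above can be sketched in plain NumPy. This is only an illustration of the shapes involved, not the Keras implementation: the weights are random placeholders (untrained), the helper `dense` is hypothetical, and the layer sizes (two hidden layers of 30 units) follow the model built in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, w, b, activation=None):
    """Fully connected layer: x @ w + b, optionally followed by ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if activation == "relu" else z

batch = 4
x_wide = rng.normal(size=(batch, 5))   # wide input: 5 features
x_deep = rng.normal(size=(batch, 6))   # deep input: 6 features

# Random placeholder weights (assumption: untrained, for shape checking only)
w1, b1 = rng.normal(size=(6, 30)), np.zeros(30)
w2, b2 = rng.normal(size=(30, 30)), np.zeros(30)
w_out, b_out = rng.normal(size=(5 + 30, 1)), np.zeros(1)

hidden1 = dense(x_deep, w1, b1, "relu")              # deep path, layer 1
hidden2 = dense(hidden1, w2, b2, "relu")             # deep path, layer 2
concat = np.concatenate([x_wide, hidden2], axis=1)   # wide input joins deep output
output = dense(concat, w_out, b_out)                 # single linear output unit

print(concat.shape, output.shape)
```

The wide input skips the hidden layers entirely, so the output layer sees the 5 raw wide features next to the 30 learned deep features.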



In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing

In [2]:
# Download California Housing data
housing = fetch_california_housing()
X, y = housing.data, housing.target

In [3]:
# Split data into training, validation, and testing
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, test_size=0.2, random_state=42)

In [4]:
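# Standardize the features (zero mean, unit variance) so they are on a comparable scale.
# The scaler is fit on the training set only and then reused for validation and test.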
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

# Split data into wide and deep inputs
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:] # 5 features for Wide, 6 features for Deep
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
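As a rough sketch of what the scaling step does, the same computation can be written in plain NumPy. The toy arrays below are assumptions for illustration. The key point is that the mean and standard deviation come from the training data only and are then applied unchanged to the other splits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): features centered near 50 with spread near 10
X_train_toy = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
X_valid_toy = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# "fit": learn per-feature statistics from the training set only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)

# "transform": apply the training statistics to every split
X_train_scaled = (X_train_toy - mean) / std
X_valid_scaled = (X_valid_toy - mean) / std

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

The scaled validation set will be close to zero mean and unit variance, but not exactly, because it was scaled with the training statistics.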

In [5]:
# Build a Wide & Deep model with the functional API
input_A = keras.layers.Input(shape=[5], name="wide_input")  # features 0 to 4
input_B = keras.layers.Input(shape=[6], name="deep_input")  # features 2 to 7
# input_A and input_B overlap on features 2, 3, and 4

# Deep path: two hidden layers of 30 ReLU units
hidden1 = keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)

# Merge the wide input with the deep path's output, then predict a single value
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name="output")(concat)

model = keras.models.Model(inputs=[input_A, input_B], outputs=[output])

In [6]:
# Compile the model
model.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1e-3))
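To illustrate what the `mse` loss and an SGD step with `learning_rate=1e-3` compute, here is a minimal NumPy sketch on a toy linear model. The data, the `true_w` vector, and the `mse` helper are assumptions for illustration; Keras applies the same idea to all the weights of the network, one mini-batch at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch (hypothetical): 32 samples, 3 features, exact linear targets
X_batch = rng.normal(size=(32, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_batch = X_batch @ true_w

w = np.zeros(3)        # initial weights
learning_rate = 1e-3   # same value as in the compile call above

def mse(weights):
    """Mean squared error of the linear model on the toy batch."""
    return np.mean((X_batch @ weights - y_batch) ** 2)

loss_before = mse(w)

# Gradient of the MSE with respect to w, followed by one SGD update
grad = (2.0 / len(X_batch)) * (X_batch.T @ (X_batch @ w - y_batch))
w = w - learning_rate * grad

loss_after = mse(w)
print(loss_before, loss_after)
```

With such a small learning rate, each step lowers the loss only slightly, which is consistent with training needing many steps per epoch.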

In [7]:
# Training the model
history = model.fit((X_train_A, X_train_B), y_train, epochs=20,
                    validation_data=((X_valid_A, X_valid_B), y_valid))

Epoch 1/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - loss: 3.2694 - val_loss: 1.1872
Epoch 2/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.8485 - val_loss: 0.7354
Epoch 3/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - loss: 0.6656 - val_loss: 0.6551
Epoch 4/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - loss: 0.6059 - val_loss: 0.6176
Epoch 5/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - loss: 0.5702 - val_loss: 0.5810
Epoch 6/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.5401 - val_loss: 0.5522
Epoch 7/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.5280 - val_loss: 0.5339
Epoch 8/20
413/413 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.5021 - val_loss: 0.5211
Epoch 9/20
413/413 ━━━━━━━━

In [8]:
# Evaluate the model on the test set (returns the MSE, since that is the compiled loss)
mse_test = model.evaluate((X_test_A, X_test_B), y_test)
print(f"Mean Squared Error on Test Set: {mse_test}")

129/129 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.4549
Mean Squared Error on Test Set: 0.45564085245132446
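An MSE is easier to interpret after taking its square root. The sketch below converts the test MSE printed above into an RMSE. It relies on the fact that the California Housing target is expressed in units of $100,000; the variable names are only illustrative.

```python
import math

mse_test = 0.45564085245132446   # test MSE printed above
rmse = math.sqrt(mse_test)       # same units as the target
rmse_dollars = rmse * 100_000    # target is in units of $100,000

print(f"RMSE: {rmse:.3f} (about ${rmse_dollars:,.0f})")
```

So the typical prediction error is roughly 0.675 target units, or on the order of $67,500 per district.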


In [9]:
# Make predictions for the first three test instances
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]
y_pred = model.predict((X_new_A, X_new_B))
print("Predictions:", y_pred)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step
Predictions: [[0.5517744]
 [1.9098816]
 [2.9944687]]


Example

T5 (Text-to-Text Transfer Transformer) is a model that reformulates NLP tasks as text-to-text problems. The professor linked T5 to the Wide & Deep network by analogy: in both, the model is given more than one kind of information and combines it to produce a single prediction.

The Wide Path captures simple patterns and general features.
The Deep Path extracts complex representations using deep neural networks.
In T5, a flag (a task prefix) is added to the input text to specify which task the model should perform. This is loosely similar to Wide & Deep networks, where data is processed through different paths before merging for better accuracy. The analogy is loose: T5 has no separate wide and deep paths, and the task prefix travels through the same network as the rest of the text.
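The task flag can be illustrated with a few lines of plain Python. This sketch only builds the input strings; it does not load or run T5. The helper `add_task_prefix` is hypothetical, while the prefixes follow the "task: text" convention used by T5.

```python
def add_task_prefix(task: str, text: str) -> str:
    """Prepend a T5-style task prefix so the model knows which task to perform."""
    return f"{task}: {text}"

examples = [
    add_task_prefix("translate English to German", "That is good."),
    add_task_prefix("summarize", "The wide path memorizes simple patterns while the deep path generalizes."),
]

for example in examples:
    print(example)
```

The same sentence can therefore be routed to a different task just by changing its prefix, without changing the model.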











---


Saving and loading models

In [11]:
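# Save the whole model (architecture, weights, and optimizer state) in the Keras format
model.save('checkpoint.keras')
# Load it back and check that it reproduces the earlier predictions
new_model = tf.keras.models.load_model('checkpoint.keras')
new_model.summary()
new_model.predict((X_new_A, X_new_B))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step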


array([[0.5517744],
       [1.9098816],
       [2.9944687]], dtype=float32)