# Introducing the Keras Functional API and Feature Engineering

**Learning Objectives**
1. Understand the Keras Functional API for flexible model building.
2. Implement Wide & Deep models (memorization + generalization).
3. Use Keras preprocessing layers (`Discretization`, `HashedCrossing`, `Embedding`).
4. Apply custom transformations with `Lambda` layers.

### Introduction

In the last notebook, we learned about the Keras Sequential API. The [Keras Functional API](https://www.tensorflow.org/guide/keras#functional_api) provides an alternative way of building models that is more flexible. With the Functional API, we can build models with more complex topologies, multiple input or output layers, shared layers or non-sequential data flows (e.g. residual layers).
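
As a minimal sketch of the style (not the model built later in this notebook), the snippet below wires two scalar inputs into a single output. The feature names `feature_a` and `feature_b` and the layer sizes are made up for illustration; it assumes Keras 3 with a working backend is installed.

```python
import keras
from keras import layers

# Two named scalar inputs (hypothetical feature names for this sketch).
a = keras.Input(shape=(1,), name="feature_a")
b = keras.Input(shape=(1,), name="feature_b")

# Layers are called like functions on tensors, so branches can merge freely.
merged = layers.Concatenate(name="merged")([a, b])
hidden = layers.Dense(4, activation="relu", name="hidden")(merged)
prediction = layers.Dense(1, name="prediction")(hidden)

# The model is defined by naming its entry and exit tensors.
model = keras.Model(inputs=[a, b], outputs=prediction)
print(model.count_params())
```

A Sequential model could not express this directly, because it accepts only one input tensor.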

In this notebook, we'll first use what we learned about preprocessing layers to build a Wide & Deep model, and then apply additional feature engineering using `Lambda` layers.

### Wide & Deep Models
Recall that Wide & Deep models combine two ways of learning, memorization and generalization, by training a wide linear model jointly with a deep neural network. You can have a look at the original research paper here: [Wide & Deep Learning for Recommender Systems](https://arxiv.org/abs/1606.07792).

<img src='assets/wide_deep.png' width='80%'>
<sup>(image: https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)</sup>

The wide part of the model handles memorization. Here we train a linear model on a wide set of crossed features so that it learns how specific feature combinations correlate with the label. The deep part of the model handles generalization: features are represented as embedding vectors, and the embeddings are learned during training. While both approaches can work well alone, Wide & Deep models excel by combining the two.

## Setup
Start by importing the necessary libraries for this lab.

In [None]:
import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import datetime
import shutil

import keras
import numpy as np
import pandas as pd
import tensorflow as tf
from keras import Model
from keras.callbacks import TensorBoard
from keras.layers import (
    CategoryEncoding,
    Concatenate,
    Dense,
    Discretization,
    Embedding,
    Flatten,
    HashedCrossing,
    Input,
    Lambda,
)
from matplotlib import pyplot as plt

In [None]:
%matplotlib inline

### Load raw data 

We will use the taxifare dataset, using the CSV files that we created in the first notebook of this sequence. Those files have been saved into `../data`.

In [None]:
!ls -l ../data/*.csv

### Use tf.data to read the CSV files

We wrote these functions for reading data from the CSV files above in the [previous notebook](2_dataset_api.ipynb).

The `tf.data` API efficiently loads and preprocesses data. 
- `parse_csv`: Parses a CSV row into features and a label. Features are returned as a tuple for Functional API compatibility with multiple inputs.
- `create_dataset`: Builds a `tf.data.Dataset` from CSV files, including mapping `parse_csv`, repeating, shuffling (for training), and batching.

In [None]:
def parse_csv(row):
    ds = tf.strings.split(row, ",")
    # Label: fare_amount
    label = tf.strings.to_number(ds[0])
    # Feature: pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude
    feature = tf.strings.to_number(ds[2:6])  # use some features only
    # Passing feature in tuple so that we can handle them separately.
    return (feature[0], feature[1], feature[2], feature[3]), label


def create_dataset(pattern, batch_size, mode="eval"):
    ds = tf.data.TextLineDataset(pattern)
    ds = ds.map(parse_csv).repeat()
    if mode == "train":
        # shuffle() returns a new dataset, so the result must be reassigned
        ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds
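
To make the column selection in `parse_csv` concrete, here is a plain-Python sketch of the same slicing on a single row. The row values are made up for illustration; the column order is the one used by this dataset's CSV files.

```python
# A made-up row in the dataset's column order:
# fare_amount, pickup_datetime, pickup_longitude, pickup_latitude,
# dropoff_longitude, dropoff_latitude, passenger_count, key
row = "11.3,2011-01-28 20:42:59 UTC,-73.999022,40.739146,-73.990369,40.717866,1,0"

fields = row.split(",")
label = float(fields[0])  # fare_amount
features = tuple(float(v) for v in fields[2:6])  # the four coordinates

print(features, label)
```

Returning the features as a tuple of four separate values is what lets each one feed its own `Input` layer later on.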

## Build a Wide & Deep Model with the Keras Functional API

We'll construct a Wide & Deep model, which has two parts:
1.  **Wide Path:** For memorizing specific feature interactions (often using categorical/crossed features).
2.  **Deep Path:** For generalizing patterns (often using numerical features/embeddings through a DNN).

The outputs of the two paths are then combined for prediction. The Functional API can express this two-branch structure, which a Sequential model cannot.

### Define Input Layers

Using the Functional API, each input feature gets its own [`keras.Input` layer](https://keras.io/api/layers/core_layers/input/). These define the entry points for data.
We create `Input` objects for our four coordinate features, specifying their `name`, `shape` (scalar), and `dtype`.

In [None]:
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
]

inputs = {
    colname: Input(name=colname, shape=(1,), dtype="float32")
    for colname in INPUT_COLS
}

### Define Preprocessing Logic

Next, we define preprocessing for our features.
1.  **Bucketization:** `Discretization` layer converts continuous latitude/longitude inputs into categorical features by assigning them to predefined buckets (`latbuckets`, `lonbuckets`).
2.  **Feature Crossing:** `HashedCrossing` layer combines these bucketized features to create interaction features (e.g., pickup location, dropoff location, and pickup-dropoff interaction). This is useful for the wide part of the model.

With the Functional API, you can define each connection independently, which allows for more intricate model architectures.

In [None]:
dnn_hidden_units = [32, 8]
NBUCKETS = 16

# Define Bucketization boundaries
latbuckets = np.linspace(start=40.5, stop=41.0, num=NBUCKETS).tolist()
lonbuckets = np.linspace(start=-74.2, stop=-73.7, num=NBUCKETS).tolist()

# Bucketization with Discretization layer
plon = Discretization(lonbuckets, name="plon_bkt")(inputs["pickup_longitude"])
plat = Discretization(latbuckets, name="plat_bkt")(inputs["pickup_latitude"])
dlon = Discretization(lonbuckets, name="dlon_bkt")(inputs["dropoff_longitude"])
dlat = Discretization(latbuckets, name="dlat_bkt")(inputs["dropoff_latitude"])

# Feature Cross with HashedCrossing layer
p_fc = HashedCrossing(num_bins=(NBUCKETS + 1) ** 2, name="p_fc")((plon, plat))
d_fc = HashedCrossing(num_bins=(NBUCKETS + 1) ** 2, name="d_fc")((dlon, dlat))
pd_fc = HashedCrossing(num_bins=(NBUCKETS + 1) ** 4, name="pd_fc")((p_fc, d_fc))
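
The following plain-Python sketch illustrates what the two preprocessing steps compute, and why the bin counts use `NBUCKETS + 1`: 16 boundaries split the number line into 17 intervals. It assumes half-open intervals of the form `[lower, upper)`, and `toy_cross` uses CRC32 purely as a stand-in hash; the real `HashedCrossing` layer uses its own hashing, so the actual bin ids will differ.

```python
import bisect
import zlib

NBUCKETS = 16
start, stop = 40.5, 41.0
# Same evenly spaced boundaries as np.linspace(start, stop, NBUCKETS).
boundaries = [
    start + i * (stop - start) / (NBUCKETS - 1) for i in range(NBUCKETS)
]


def bucketize(value):
    # Index of the interval that contains value; 0 is below the first
    # boundary and NBUCKETS is at or above the last one.
    return bisect.bisect_right(boundaries, value)


def toy_cross(a, b, num_bins):
    # Stand-in hash for illustration only, not the hash Keras uses.
    return zlib.crc32(f"{a}_X_{b}".encode()) % num_bins


print(bucketize(40.0), bucketize(40.75), bucketize(41.5))  # 0 8 16

crossed = toy_cross(3, 8, (NBUCKETS + 1) ** 2)
print(0 <= crossed < (NBUCKETS + 1) ** 2)  # True
```

Because the cross is hashed into a fixed number of bins, two different bucket pairs can collide in the same bin; sizing `num_bins` to the number of possible combinations keeps collisions rare but does not rule them out.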

### Build the Deep Path

The deep path handles generalization:
1.  The `pd_fc` crossed feature is embedded using an `Embedding` layer to create dense vector representations.
2.  These embeddings are concatenated with the original numerical input coordinates.
3.  The result is passed through a stack of `Dense` layers with ReLU activation.

In [None]:
# Embedding with Embedding layer
pd_embed = Embedding(
    input_dim=(NBUCKETS + 1) ** 4, output_dim=10, name="pd_embed"
)(pd_fc)

# Concatenate and define inputs for deep network
deep = Concatenate(name="deep_input")(
    [
        inputs["pickup_longitude"],
        inputs["pickup_latitude"],
        inputs["dropoff_longitude"],
        inputs["dropoff_latitude"],
        Flatten(name="flatten_embedding")(pd_embed),
    ]
)

# Add hidden Dense layers
for i, num_nodes in enumerate(dnn_hidden_units, start=1):
    deep = Dense(num_nodes, activation="relu", name=f"hidden_{i}")(deep)
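
The shapes in the deep path can be traced with a small NumPy sketch. This only mimics the lookup an embedding performs; the table size of 100 and the random values are made-up stand-ins for the learned weights and real inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, vocab_size, embed_dim = 3, 100, 10  # small, made-up sizes

table = rng.normal(size=(vocab_size, embed_dim))  # stand-in for learned weights
ids = np.array([[7], [42], [7]])  # crossed-feature ids, shape (batch, 1)
coords = rng.normal(size=(batch_size, 4))  # stand-in for the four coordinates

embedded = table[ids]  # one vector per id, shape (3, 1, 10)
flat = embedded.reshape(batch_size, -1)  # drop the length-1 axis, shape (3, 10)
deep_input = np.concatenate([coords, flat], axis=1)  # shape (3, 14)

print(embedded.shape, flat.shape, deep_input.shape)
```

The extra length-1 axis produced by the lookup is why the notebook applies `Flatten` before concatenating the embedding with the raw coordinates.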

### Build the Wide Path

The wide path handles memorization:
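1.  The crossed features (`p_fc`, `d_fc`, `pd_fc`) are one-hot encoded using the `CategoryEncoding` layer. (The layer's default `output_mode` is `"multi_hot"`, which is equivalent to one-hot here because each example carries a single index per feature.)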
2.  These one-hot encoded features are then concatenated to form the input for the wide part of the model.

In [None]:
# Onehot Encoding with CategoryEncoding layer
p_onehot = CategoryEncoding(num_tokens=(NBUCKETS + 1) ** 2, name="p_onehot")(
    p_fc
)
d_onehot = CategoryEncoding(num_tokens=(NBUCKETS + 1) ** 2, name="d_onehot")(
    d_fc
)
pd_onehot = CategoryEncoding(num_tokens=(NBUCKETS + 1) ** 4, name="pd_onehot")(
    pd_fc
)

# Concatenate and define inputs for wide network
wide = Concatenate(name="wide_input")([p_onehot, d_onehot, pd_onehot])
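
To get a feel for how wide the wide input is, the sketch below one-hot encodes a single index in plain Python and adds up the three encoding widths. The `one_hot` helper is hypothetical and only mirrors the idea of the encoding.

```python
NBUCKETS = 16


def one_hot(index, num_tokens):
    # Vector of zeros with a single 1.0 at the given index.
    vec = [0.0] * num_tokens
    vec[index] = 1.0
    return vec


p_width = (NBUCKETS + 1) ** 2  # pickup cross (same width for dropoff)
pd_width = (NBUCKETS + 1) ** 4  # pickup-dropoff cross
wide_width = p_width + p_width + pd_width

print(one_hot(2, 5))  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(p_width, pd_width, wide_width)  # 289 83521 84099
```

Almost all of the wide input comes from the pickup-dropoff cross, which is the feature that lets the linear part memorize specific routes.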

### Combine Wide and Deep Paths

The outputs of the `deep` and `wide` paths are concatenated. This combined tensor is then fed into a final `Dense` layer with one unit (and no activation for regression) to produce the prediction.

In [None]:
# Concatenate wide & deep networks
concat = Concatenate(name="concatenate")([deep, wide])

# Define the final output layer
output = Dense(1, activation=None, name="output")(concat)

Then, we'll define our custom RMSE evaluation metric and build our wide and deep model.

In [None]:
def rmse(y_true, y_pred):
    # y_pred has shape (batch, 1); take column 0 to match y_true's (batch,)
    squared_error = keras.ops.square(y_pred[:, 0] - y_true)
    return keras.ops.sqrt(keras.ops.mean(squared_error))
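
The `y_pred[:, 0]` indexing in `rmse` matters because of broadcasting. The NumPy sketch below uses made-up fares and assumes labels arrive with shape `(batch,)` while predictions have shape `(batch, 1)`.

```python
import numpy as np

y_true = np.array([10.0, 12.0, 8.0])  # shape (3,)
y_pred = np.array([[11.0], [10.0], [8.0]])  # shape (3, 1)

# Without indexing, broadcasting silently builds a (3, 3) matrix of
# every prediction minus every label.
print((y_pred - y_true).shape)  # (3, 3)

errors = y_pred[:, 0] - y_true  # element-wise errors: 1, -2, 0
rmse_value = np.sqrt(np.mean(np.square(errors)))
print(rmse_value)  # sqrt(5 / 3), about 1.291
```

Skipping the indexing would not raise an error; it would average over all nine pairings and report a misleading metric.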

### Instantiate and Compile the Model

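In the Functional API, `keras.Model` defines the model from its `inputs` (here, the list of `Input` layers taken from our dictionary) and its `outputs` (the final `Dense` layer). The inputs are passed as a list in `INPUT_COLS` order so that they line up with the feature tuple returned by `parse_csv`.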

In [None]:
model = Model(inputs=list(inputs.values()), outputs=output)

model.compile(optimizer="adam", loss="mse", metrics=[rmse], run_eagerly=True)

`tf.keras.utils.plot_model` renders a diagram of the model's layer graph, which is a quick way to check how the wide and deep paths connect.

In [None]:
tf.keras.utils.plot_model(model, show_shapes=False, rankdir="LR")

### Train the Wide & Deep Model

Next, we'll set up our training variables, create our datasets for training and validation, and train our model.

(We refer you to the blog post [ML Design Pattern #3: Virtual Epochs](https://medium.com/google-cloud/ml-design-pattern-3-virtual-epochs-f842296de730) for further details on why we express the training in terms of `NUM_TRAIN_EXAMPLES` and `NUM_EVALS` and why, in this training code, the number of epochs is really equal to the number of evaluations we perform.)

In [None]:
BATCH_SIZE = 64
NUM_TRAIN_EXAMPLES = 10000 * 10  # training dataset will repeat, wrap around
NUM_EVALS = 10  # how many times to evaluate
NUM_EVAL_EXAMPLES = 1000  # enough to get a reasonable sample

trainds = create_dataset(
    pattern="../data/taxi-train.csv", batch_size=BATCH_SIZE, mode="train"
)

evalds = create_dataset(
    pattern="../data/taxi-valid.csv", batch_size=BATCH_SIZE, mode="eval"
).take(NUM_EVAL_EXAMPLES // BATCH_SIZE)

In [None]:
%%time
steps_per_epoch = NUM_TRAIN_EXAMPLES // (BATCH_SIZE * NUM_EVALS)

OUTDIR = "./taxi_trained"
shutil.rmtree(path=OUTDIR, ignore_errors=True)  # start fresh each time

history = model.fit(
    x=trainds,
    steps_per_epoch=steps_per_epoch,
    epochs=NUM_EVALS,
    validation_data=evalds,
    callbacks=[TensorBoard(OUTDIR)],
)
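
The virtual-epoch arithmetic behind `steps_per_epoch` can be checked by hand with the constants defined above:

```python
BATCH_SIZE = 64
NUM_TRAIN_EXAMPLES = 10000 * 10
NUM_EVALS = 10
NUM_EVAL_EXAMPLES = 1000

# Each "epoch" is a fixed slice of the training budget, followed by one evaluation.
steps_per_epoch = NUM_TRAIN_EXAMPLES // (BATCH_SIZE * NUM_EVALS)
examples_per_epoch = steps_per_epoch * BATCH_SIZE
total_examples_seen = examples_per_epoch * NUM_EVALS
eval_batches = NUM_EVAL_EXAMPLES // BATCH_SIZE

print(steps_per_epoch)  # 156
print(examples_per_epoch)  # 9984
print(total_examples_seen)  # 99840
print(eval_batches)  # 15
```

Because of integer division, slightly fewer than `NUM_TRAIN_EXAMPLES` examples are processed in total, and each evaluation covers 15 batches (960 examples) rather than the full 1000.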

Just as before, we can examine the history to see how the RMSE changes through training on the training set and validation set. 

In [None]:
RMSE_COLS = ["rmse", "val_rmse"]

pd.DataFrame(history.history)[RMSE_COLS].plot()

---
## Improve Model Performance with Custom Feature Engineering

Next, we'll try to improve performance by adding more feature engineering:
1.  **Normalization:** Applied to coordinates before distance calculation and other processing.
2.  **Euclidean Distance:** Calculated using a `Lambda` layer.

For simplicity, we'll build a DNN model here, but these techniques apply to Wide & Deep models too.

### Setup Feature Normalization with `Normalization` Layer

The `keras.layers.Normalization` layer standardizes features by scaling them to have zero mean and unit variance.

Because the layer is stateful (it stores a mean and a variance), we need to either:
1. Precompute the statistics and pass them when instantiating the layer:
```python
keras.layers.Normalization(mean=..., variance=...)
```
2. Or compute them from data with the `adapt()` method.

Here we take the second approach.

We first load data to compute these statistics. Here we retrieve the latitude and longitude columns.

In [None]:
CSV_COLUMNS = [
    "fare_amount",
    "pickup_datetime",
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
    "key",
]

df = pd.read_csv("../data/taxi-train.csv", names=CSV_COLUMNS)
lat_values = pd.concat(
    [df["pickup_latitude"], df["dropoff_latitude"]], ignore_index=True
).to_numpy()
lon_values = pd.concat(
    [df["pickup_longitude"], df["dropoff_longitude"]], ignore_index=True
).to_numpy()

Then, we instantiate `Normalization` layers (`lat_scaler`, `lon_scaler`) and use their `adapt()` method on the loaded latitude and longitude values to learn the mean and variance.
These adapted layers can then be used in the model to apply the learned normalization.

In [None]:
lat_scaler = keras.layers.Normalization(axis=None)
lon_scaler = keras.layers.Normalization(axis=None)

lat_scaler.adapt(lat_values)
lon_scaler.adapt(lon_values)

print("Computed statistics for latitude:")
print(f"mean: {lat_scaler.mean}, variance: {lat_scaler.variance}")
print("+++++")
print("Computed statistics for longitude:")
print(f"mean: {lon_scaler.mean}, variance: {lon_scaler.variance}")
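
As a rough sketch of what the adapted layer does with these statistics, the plain-Python example below standardizes three made-up latitudes. It assumes the population variance is used and ignores the small epsilon a real implementation may add for numerical stability.

```python
import math
import statistics

values = [40.70, 40.75, 40.80]  # made-up latitudes

mean = statistics.fmean(values)
variance = statistics.pvariance(values, mu=mean)  # population variance

# Standardize: subtract the mean, divide by the standard deviation.
scaled = [(v - mean) / math.sqrt(variance) for v in values]
print(mean, variance, scaled)
```

After scaling, the values have zero mean and unit variance, which is why the bucket boundaries later in this notebook are expressed in standard deviations rather than degrees.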

### Define Input Layers

In [None]:
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
]

# input layer is all float
inputs = {
    colname: Input(name=colname, shape=(1,), dtype="float32")
    for colname in INPUT_COLS
}

### Custom Feature: Euclidean Distance with a `Lambda` Layer

The `euclidean` function calculates straight-line distance. We'll use a [`keras.layers.Lambda` layer](https://keras.io/api/layers/core_layers/lambda/) later to wrap this function, allowing its direct integration into our Keras model for feature engineering. This keeps preprocessing bundled with the model.

In [None]:
def euclidean(params):
    lon1, lat1, lon2, lat2 = params
    londiff = lon2 - lon1
    latdiff = lat2 - lat1
    return tf.sqrt(londiff * londiff + latdiff * latdiff)
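
The same formula on plain Python floats gives a quick sanity check. `euclidean_scalar` is a hypothetical helper for this sketch, not part of the notebook's model.

```python
import math


def euclidean_scalar(lon1, lat1, lon2, lat2):
    # Straight-line distance between two points in the plane.
    return math.hypot(lon2 - lon1, lat2 - lat1)


print(euclidean_scalar(0.0, 0.0, 3.0, 4.0))  # 5.0
```

Note that in the model the inputs are normalized coordinates, so the resulting distance is in units of standard deviations rather than kilometers or degrees.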

### Define Preprocessing, Normalization, and Lambda Layer Integration

Applying the feature engineering steps:
1. **Bucket Boundaries:** Now adjusted for normalized data (range `[-5, 5]`).
2. **Normalization:** Raw coordinates are scaled using the adapted `lon_scaler` and `lat_scaler`.
3. **Lambda Layer:** `euclidean` function calculates distance on these *normalized* coordinates.
4. **Discretization:** Normalized coordinates are bucketized.
5. **Feature Crossing & Embedding:** Applied to the (now normalized and discretized) features.
6. **Concatenate:** The `euclidean_distance` and the final `pd_embed` are combined to be fed into the DNN.

In [None]:
NBUCKETS = 16

latbuckets = np.linspace(start=-5, stop=5, num=NBUCKETS).tolist()
lonbuckets = np.linspace(start=-5, stop=5, num=NBUCKETS).tolist()

# Normalize longitude
scaled_plon = lon_scaler(inputs["pickup_longitude"])
scaled_dlon = lon_scaler(inputs["dropoff_longitude"])

# Normalize latitude
scaled_plat = lat_scaler(inputs["pickup_latitude"])
scaled_dlat = lat_scaler(inputs["dropoff_latitude"])

# Lambda layer for the custom euclidean function
euclidean_distance = Lambda(euclidean, name="euclidean")(
    [scaled_plon, scaled_plat, scaled_dlon, scaled_dlat]
)

# Discretization
plon = Discretization(lonbuckets, name="plon_bkt")(scaled_plon)
plat = Discretization(latbuckets, name="plat_bkt")(scaled_plat)
dlon = Discretization(lonbuckets, name="dlon_bkt")(scaled_dlon)
dlat = Discretization(latbuckets, name="dlat_bkt")(scaled_dlat)


# Feature Cross with HashedCrossing layer
p_fc = HashedCrossing(num_bins=(NBUCKETS + 1) ** 2, name="p_fc")((plon, plat))
d_fc = HashedCrossing(num_bins=(NBUCKETS + 1) ** 2, name="d_fc")((dlon, dlat))
pd_fc = HashedCrossing(num_bins=(NBUCKETS + 1) ** 4, name="pd_fc")((p_fc, d_fc))

# Embedding with Embedding layer
pd_embed = Flatten()(
    Embedding(input_dim=(NBUCKETS + 1) ** 4, output_dim=10, name="pd_embed")(
        pd_fc
    )
)

deep = Concatenate()([euclidean_distance, pd_embed])

### Define the DNN Layers

The concatenated `euclidean_distance` and `pd_embed` tensor is passed through `Dense` layers.

In [None]:
dnn_hidden_units = [32, 8]

# Add hidden Dense layers
for i, num_nodes in enumerate(dnn_hidden_units, start=1):
    deep = Dense(num_nodes, activation="relu", name=f"hidden_{i}")(deep)

# final output is a linear activation because this is regression
output = Dense(1, activation="linear", name="fare")(deep)

### Instantiate and Compile the Engineered Model

Define the Keras Model with the original inputs and the final engineered output.

In [None]:
model = keras.Model(inputs=list(inputs.values()), outputs=output)

# Compile model
model.compile(optimizer="adam", loss="mse", metrics=[rmse], run_eagerly=True)

Let's see how our model architecture has changed now.

In [None]:
tf.keras.utils.plot_model(model, show_shapes=False, rankdir="LR")

### Train the Engineered Model

Train the new model using the same setup as before.

In [None]:
BATCH_SIZE = 64
NUM_TRAIN_EXAMPLES = 10000 * 10  # training dataset will repeat, wrap around
NUM_EVALS = 10  # how many times to evaluate
NUM_EVAL_EXAMPLES = 1000

In [None]:
trainds = create_dataset(
    pattern="../data/taxi-train.csv", batch_size=BATCH_SIZE, mode="train"
)

evalds = create_dataset(
    pattern="../data/taxi-valid.csv", batch_size=BATCH_SIZE, mode="eval"
).take(NUM_EVAL_EXAMPLES // BATCH_SIZE)

steps_per_epoch = NUM_TRAIN_EXAMPLES // (BATCH_SIZE * NUM_EVALS)

history = model.fit(
    trainds,
    validation_data=evalds,
    epochs=NUM_EVALS,
    steps_per_epoch=steps_per_epoch,
)

Plot the RMSE to compare its performance against the first model.

In [None]:
RMSE_COLS = ["rmse", "val_rmse"]

pd.DataFrame(history.history)[RMSE_COLS].plot()

Copyright 2025 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.