# Advanced Feature Engineering in Keras 

## Learning Objectives

1. Process temporal feature columns in Keras.
2. Use Keras layers to perform feature engineering on geolocation features.
3. Create bucketized features, feature crosses, and embeddings to combine geospatial and temporal features.
 

## Introduction 

In this notebook, we use Keras to build a taxi fare prediction model and use feature engineering to improve the fare amount prediction for NYC taxi cab rides.

Each learning objective will correspond to a __#TODO__ in the [student lab notebook](../labs/4_keras_adv_feat_eng-lab.ipynb) -- try to complete that notebook first before reviewing this solution notebook.

## Set up environment variables and load necessary libraries 
We will start by importing the necessary libraries for this lab.

In [None]:
# Run the chown command to change the ownership of the repository
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst

In [None]:
# You can use any Python source file as a module by executing an import statement in some other Python source file.
# The import statement combines two operations; it searches for the named module, then it binds the results of that search
# to a name in the local scope.
import datetime
import logging
import os

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import layers, models

# set TF error log verbosity
logging.getLogger("tensorflow").setLevel(logging.ERROR)

print(tf.version.VERSION)

## Load taxifare dataset

The Taxi Fare dataset for this lab has 106,545 rows and has been pre-processed and split for use in this lab. Note that this is the same dataset used in the BigQuery feature engineering labs. The fare_amount is the target, the continuous value we’ll train a model to predict.

First, let's download the .csv data by copying it from a Cloud Storage bucket.

In [None]:
# `os.makedirs()` creates any missing directories in the specified path.
if not os.path.isdir("../data"):
    os.makedirs("../data")

In [None]:
# The `gsutil cp` command allows you to copy data between the bucket and current directory.
!gsutil cp gs://cloud-training/mlongcp/v3.0_MLonGC/toy_data/taxi-train_toy.csv ../data
!gsutil cp gs://cloud-training/mlongcp/v3.0_MLonGC/toy_data/taxi-valid_toy.csv ../data

Let's check that the files were copied correctly and look like we expect them to.

In [None]:
# `ls` shows the working directory's contents.
# The `-l` flag lists all files with permissions and details.
!ls -l ../data/*.csv

In [None]:
# By default `head` returns the first ten lines of each file.
!head ../data/*.csv

## Create an input pipeline 

Typically, you will use a two-step process to build the pipeline. Step one is to define the columns of data: which column we're predicting and the default values. Step two is to define two functions: one that defines the features and label you want to use, and one that loads the training data. Also, note that pickup_datetime is a string, and we will need to handle this in our feature engineered model.


In [None]:
CSV_COLUMNS = [
    "fare_amount",
    "pickup_datetime",
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
    "key",
]
LABEL_COLUMN = "fare_amount"
STRING_COLS = ["pickup_datetime"]
NUMERIC_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]
DEFAULTS = [[0.0], ["na"], [0.0], [0.0], [0.0], [0.0], [0.0], ["na"]]
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

In [None]:
# A function to define features and labels
def features_and_labels(row_data):
    for unwanted_col in ["key"]:
        row_data.pop(unwanted_col)
    label = row_data.pop(LABEL_COLUMN)
    return row_data, label


# A utility method to create a tf.data dataset from CSV files matching a pattern
def load_dataset(pattern, batch_size=1, mode="eval"):
    dataset = tf.data.experimental.make_csv_dataset(
        pattern, batch_size, CSV_COLUMNS, DEFAULTS
    )
    dataset = dataset.map(features_and_labels)  # features, label
    if mode == "train":
        dataset = dataset.shuffle(1000).repeat()
        # prefetch one batch so that data loading overlaps with training
        # (tf.data.AUTOTUNE could be passed instead to tune this automatically)
        dataset = dataset.prefetch(1)
    if mode == "preprocess":
        # For preprocessing steps that require calling the adapt function
        dataset = tf.data.experimental.make_csv_dataset(
            pattern, batch_size, CSV_COLUMNS, DEFAULTS, num_epochs=1
        )
        dataset = dataset.map(features_and_labels)
    return dataset
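
To make the behaviour of `features_and_labels` concrete, here is a minimal sketch that applies the same logic to a plain Python dictionary. The row values, including the fare of 8.5, are hypothetical; in the real pipeline `make_csv_dataset` supplies batches of tensors rather than Python scalars.

```python
LABEL_COLUMN = "fare_amount"


def features_and_labels(row_data):
    # Same logic as the notebook helper: drop "key", then split off the label.
    for unwanted_col in ["key"]:
        row_data.pop(unwanted_col)
    label = row_data.pop(LABEL_COLUMN)
    return row_data, label


# Hypothetical single row, keyed by the CSV column names.
example_row = {
    "fare_amount": 8.5,
    "pickup_datetime": "2010-02-08 09:17:00 UTC",
    "pickup_longitude": -73.982683,
    "pickup_latitude": 40.742104,
    "dropoff_longitude": -73.983766,
    "dropoff_latitude": 40.755174,
    "passenger_count": 3.0,
    "key": "unused",
}

# Pass a copy because the helper mutates its argument with pop().
features, label = features_and_labels(dict(example_row))
print(sorted(features))
print(label)
```

Note that the helper removes entries in place, which is why the sketch passes a copy of the row.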

## Create a Baseline DNN Model in Keras

Now let's build the Deep Neural Network (DNN) model in Keras using the functional API. Unlike the sequential API, the functional API requires us to specify the input layers explicitly and connect them to the hidden layers. Note that this baseline model is trained on the raw numeric features with no feature engineering. It gives us a reference point against which we can measure the effect of the engineered features later in the notebook.

In [None]:
# Build a simple Keras DNN using its Functional API
def rmse(y_true, y_pred):  # Root mean square error
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))


def build_dnn_model():
    # input layer
    inputs = {
        colname: layers.Input(name=colname, shape=(1,), dtype="float32")
        for colname in NUMERIC_COLS
    }

    # Keras layer which receives the input
    dnn_inputs = layers.concatenate(
        list(inputs.values()), name="concatenate_inputs"
    )

    # two hidden layers of [32, 8], just like the BQML DNN
    h1 = layers.Dense(32, activation="relu", name="h1")(dnn_inputs)
    h2 = layers.Dense(8, activation="relu", name="h2")(h1)

    # final output is a linear activation because this is regression
    output = layers.Dense(1, activation="linear", name="fare")(h2)
    model = models.Model(inputs, output)

    # compile model
    model.compile(optimizer="adam", loss="mse", metrics=[rmse, "mse"])

    return model
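
As a quick check on the `rmse` metric defined above, the following sketch computes the same quantity in plain Python on two hypothetical predictions. The values are made up for illustration and are not taken from the dataset.

```python
import math


def rmse_plain(y_true, y_pred):
    # Square each error, average the squares, then take the square root.
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical fares: errors are +3 and -4, so MSE = (9 + 16) / 2 = 12.5.
y_true = [10.0, 12.0]
y_pred = [13.0, 8.0]
value = rmse_plain(y_true, y_pred)
print(round(value, 4))  # 3.5355
```

Because RMSE is the square root of MSE, it is expressed in the same units as the fare, which makes it easier to interpret than the raw loss.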

We'll build our DNN model and inspect the model architecture.

In [None]:
model = build_dnn_model()

# We can visualize the DNN using the Keras `plot_model` utility.
tf.keras.utils.plot_model(model, "dnn_model.png", show_shapes=False, rankdir="LR")

## Train the model

To train the model, simply call [model.fit()](https://keras.io/models/model/#fit).  Note that we should really use many more NUM_TRAIN_EXAMPLES (i.e. a larger dataset). We shouldn't make assumptions about the quality of the model based on training/evaluating it on a small sample of the full data.

We start by setting up the environment variables for training, creating the input pipeline datasets, and then train our baseline DNN model.

In [None]:
TRAIN_BATCH_SIZE = 32
NUM_TRAIN_EXAMPLES = 59621 * 5
NUM_EVALS = 5
NUM_EVAL_EXAMPLES = 14906
steps_per_epoch = NUM_TRAIN_EXAMPLES // (TRAIN_BATCH_SIZE * NUM_EVALS)
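
The arithmetic behind `steps_per_epoch` can be worked through directly. This sketch only repeats the constants from the cell above; reading 59621 as the number of rows in one pass over the training data is an assumption inferred from the constant.

```python
TRAIN_BATCH_SIZE = 32
NUM_TRAIN_EXAMPLES = 59621 * 5  # total examples to show the model
NUM_EVALS = 5  # used as the number of epochs in model.fit

steps_per_epoch = NUM_TRAIN_EXAMPLES // (TRAIN_BATCH_SIZE * NUM_EVALS)
examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

print(NUM_TRAIN_EXAMPLES)  # 298105
print(steps_per_epoch)  # 1863
print(examples_per_epoch)  # 59616
```

Each of the five "epochs" therefore covers roughly 59,616 examples, and validation runs once after each one.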

In [None]:
# `load_dataset` method is used to load the dataset.
trainds = load_dataset("../data/taxi-train_toy*", TRAIN_BATCH_SIZE, "train")
evalds = load_dataset("../data/taxi-valid_toy*", 1000, "eval").take(
    NUM_EVAL_EXAMPLES // 1000
)
pretrainds = load_dataset("../data/taxi-train_toy*", TRAIN_BATCH_SIZE, "preprocess")


# `Fit` trains the model for a fixed number of epochs
history = model.fit(
    trainds, validation_data=evalds, epochs=NUM_EVALS, steps_per_epoch=steps_per_epoch
)

### Visualize the model loss curve

Next, we will use matplotlib to draw the model's loss curves for training and validation. The line plots show the loss and the mean squared error over the training epochs for both the training (blue) and validation (orange) sets.

In [None]:
# A helper that plots the training and validation curve for each metric.
def plot_curves(history, metrics):
    nrows = 1
    ncols = 2
    fig = plt.figure(figsize=(10, 5))

    for idx, key in enumerate(metrics):
        fig.add_subplot(nrows, ncols, idx + 1)
        plt.plot(history.history[key])
        plt.plot(history.history["val_{}".format(key)])
        plt.title("model {}".format(key))
        plt.ylabel(key)
        plt.xlabel("epoch")
        plt.legend(["train", "validation"], loc="upper left");

In [None]:
plot_curves(history, ["loss", "mse"])

### Predict with the model locally

To predict with Keras, you simply call [model.predict()](https://keras.io/models/model/#predict) and pass in the cab ride you want to predict the fare amount for.  Note the predicted fare for this geolocation and pickup_datetime so that we can compare it with the feature engineered model later.

In [None]:
# Use the model to do prediction with `model.predict()`.
model.predict(
    {
        "pickup_longitude": tf.convert_to_tensor([-73.982683]),
        "pickup_latitude": tf.convert_to_tensor([40.742104]),
        "dropoff_longitude": tf.convert_to_tensor([-73.983766]),
        "dropoff_latitude": tf.convert_to_tensor([40.755174]),
        "passenger_count": tf.convert_to_tensor([3.0]),
        "pickup_datetime": tf.convert_to_tensor(
            ["2010-02-08 09:17:00 UTC"], dtype=tf.string
        ),
    },
    steps=1,
)

## Improve Model Performance Using Feature Engineering 

We now improve our model's performance by engineering three types of features: temporal, categorical, and geolocation.

### Temporal Feature Columns

We incorporate the temporal feature pickup_datetime.  As noted earlier, pickup_datetime is a string and we will need to handle this within the model.  First, you will include pickup_datetime as a feature, and then you will need to modify the model to handle our string feature. There are several alternatives for extracting the temporal information from the raw data.  One is to wrap Python functions in Lambda layers; another is to use native TensorFlow functions to extract this information.  The latter method is implemented in this solution.

In [None]:
# TODO 1a
def parse_datetime(s):
    if type(s) is not str:
        s = s.numpy().decode("utf-8")
    return datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S %Z")


# TODO 1b
def get_dayofweek(s):
    ts = parse_datetime(s)
    # weekday() returns Monday=0, but DAYS starts on Sunday, so shift by one
    return DAYS[(ts.weekday() + 1) % 7]


# TODO 1c
@tf.function
def dayofweek(ts_in):
    return tf.map_fn(
        lambda s: tf.py_function(get_dayofweek, inp=[s], Tout=tf.string), ts_in
    )
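
The day of week and hour of day can also be derived with integer arithmetic on the number of seconds since 1970. The sketch below does this with the standard library only, using the example pickup time from the prediction cell; it illustrates the arithmetic and is not the TensorFlow implementation. Note that `datetime.weekday()` counts from Monday = 0, whereas `DAYS` starts on Sunday.

```python
import calendar
import datetime

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

timestamp = "2010-02-08 09:17:00 UTC"
parsed = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S %Z")

# Seconds since 1970-01-01 00:00:00, treating the parsed value as UTC.
seconds_since_1970 = calendar.timegm(parsed.timetuple())

hour_of_day = (seconds_since_1970 // 3600) % 24
days_since_1970 = seconds_since_1970 // (3600 * 24)

# 1970-01-01 was a Thursday, which is index 4 when Sunday is index 0.
day_of_week = (days_since_1970 + 4) % 7

print(hour_of_day)  # 9
print(DAYS[day_of_week])  # Mon
```

Shifting `parsed.weekday()` by one, modulo 7, gives the same Sunday-based index.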

### Geolocation/Coordinate Feature Columns

The pick-up/drop-off longitude and latitude data are crucial to predicting the fare amount, as fare amounts in NYC taxis are largely determined by the distance traveled. As such, we give the model the Euclidean distance between the pick-up and drop-off points as a feature.

Recall that latitude and longitude allow us to specify any location on Earth using a set of coordinates. In our training dataset, we restricted our data points to only pickups and drop-offs within NYC. New York City has an approximate longitude range of -74.05 to -73.75 and a latitude range of 40.63 to 40.85.

#### Computing Euclidean distance
The dataset contains information regarding the pickup and drop off coordinates. However, there is no information regarding the distance between the pickup and drop off points. Therefore, we create a new feature that calculates the distance between each pair of pickup and drop off points. We can do this using the Euclidean Distance, which is the straight-line distance between any two coordinate points. This distance is computed in the transform function below.  Note that lon1, lon2, etc., below represent tensors, and the math operations are overloaded.  As such, the key to defining new features is to understand how tensors are manipulated symbolically.

In [None]:
def euclidean(params):
    lon1, lat1, lon2, lat2 = params
    londiff = lon2 - lon1
    latdiff = lat2 - lat1
    return tf.sqrt(londiff * londiff + latdiff * latdiff)
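
For intuition, here is the same computation in plain Python for the example ride used in the prediction cells. This is only a sketch with scalar floats; the notebook's `euclidean` function operates on tensors.

```python
import math


def euclidean_plain(lon1, lat1, lon2, lat2):
    # Straight-line distance in coordinate space (degrees, not miles or km).
    londiff = lon2 - lon1
    latdiff = lat2 - lat1
    return math.sqrt(londiff * londiff + latdiff * latdiff)


distance = euclidean_plain(-73.982683, 40.742104, -73.983766, 40.755174)
print(round(distance, 4))  # 0.0131
```

The result is expressed in degrees of latitude and longitude, so it is a proxy for trip length rather than a physical distance.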

#### Scaling latitude and longitude

It is very important for numerical variables to get scaled before they are "fed" into the neural network. Here we use min-max scaling (also called normalization) on the geolocation features.  Later in our model, you will see that these values are shifted and rescaled so that they end up ranging from 0 to 1.

First, we create a function named 'scale_longitude', where we pass in all the longitudinal values and add 78 to each value.  Note that our scaling range for longitude runs from -78 to -70, so -78 is the minimum longitudinal value and the delta between -78 and -70 is 8.  We add 78 to each longitudinal value and then divide by 8 to return a scaled value. The scale_longitude function can be included in the transform function below because these mathematical operators are overloaded by TensorFlow.

In [None]:
def scale_longitude(lon_column):
    return (lon_column + 78) / 8.0

Next, we create a function named 'scale_latitude', where we pass in all the latitudinal values and subtract 37 from each value.  Note that our scaling range for latitude runs from 37 to 45, so 37 is the minimum latitudinal value and the delta between 37 and 45 is 8.  We subtract 37 from each latitudinal value and then divide by 8 to return a scaled value.

In [None]:
def scale_latitude(lat_column):
    return (lat_column - 37) / 8.0
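
A short worked example shows what the two scaling formulas do at the ends of their ranges and for the example pickup point. The functions are repeated here with plain floats so that the snippet runs on its own.

```python
def scale_longitude(lon):
    return (lon + 78) / 8.0


def scale_latitude(lat):
    return (lat - 37) / 8.0


# The ends of each range map to 0 and 1.
print(scale_longitude(-78), scale_longitude(-70))  # 0.0 1.0
print(scale_latitude(37), scale_latitude(45))  # 0.0 1.0

# The example pickup point lands near the middle of both ranges.
pickup_lon_scaled = scale_longitude(-73.982683)
pickup_lat_scaled = scale_latitude(40.742104)
print(round(pickup_lon_scaled, 4), round(pickup_lat_scaled, 4))  # 0.5022 0.4678
```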

### Normalizing features

Often there is some amount of preprocessing that can be done ahead of time to scale the data.  In this case, create_normalizer is learning the distribution of the passenger_count feature so that at training and inference time, it can scale passenger_count using the mean and variance learned at pre-processing time.

In [None]:
def create_normalizer(dataset, feature_name):
    normalizer = tf.keras.layers.Normalization(
        axis=None, name=f"{feature_name}_normalizer"
    )
    feature_ds = dataset.map(lambda X, y: X[feature_name])
    normalizer.adapt(feature_ds)
    return normalizer


passenger_count_normalizer = create_normalizer(pretrainds, "passenger_count")
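
Conceptually, the adapted normalizer subtracts the learned mean and divides by the learned standard deviation. The sketch below reproduces that idea in plain Python on three hypothetical passenger counts; the Keras layer's exact numerical details (for example, how it guards against a zero variance) may differ.

```python
import math

# Hypothetical passenger counts standing in for the adapt() data.
passenger_counts = [1.0, 2.0, 3.0]

# "Adapt": learn the mean and (population) variance from the data.
mean = sum(passenger_counts) / len(passenger_counts)
variance = sum((x - mean) ** 2 for x in passenger_counts) / len(passenger_counts)

# "Call": shift by the mean and scale by the standard deviation.
normalized = [(x - mean) / math.sqrt(variance) for x in passenger_counts]

print(mean)  # 2.0
print([round(v, 4) for v in normalized])  # [-1.2247, 0.0, 1.2247]
```

After this transformation the values have zero mean and unit variance, which is the property the model relies on at training and inference time.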

### Putting it all together
We now put these pieces together in a function called transform.  The transform function takes our numerical and string column features as inputs to the model, scales the geolocation features, and then computes the Euclidean distance as a transformed variable from the geolocation features (the distance is computed inline rather than by calling the "euclidean" function above). We then bucketize the latitude and longitude features so that we can represent geographical concepts at a coarser level than longitude and latitude.  Next, we create feature crosses to represent higher-level concepts such as origin and destination.  Building upon this, we create a feature representing the trip itself, which includes the origin and destination.

Since a trip at 3am is different from one at 8am on a Monday in New York, we also feed the model information about time.  We then combine the trip and time information in an embedding and feed it to the model so that it can learn efficient representations of these higher-level and higher-dimensional concepts.  Note that this leads to very sparse features, which require us to have a lot of data so that we can adhere to the 'Rule of five'.
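
The bucketizing step can be sketched in plain Python with `bisect`. This is an illustration under the assumption that a value equal to a boundary falls into the upper bucket; the bucket boundaries are built the same way as in the transform function, using ten buckets.

```python
import bisect

NBUCKETS = 10

# Nine interior boundaries (0.1, 0.2, ..., 0.9) define ten buckets.
lat_lon_buckets = [bin_edge / NBUCKETS for bin_edge in range(1, NBUCKETS)]


def bucketize(scaled_value):
    # Index of the bucket that the scaled coordinate falls into (0 to 9).
    return bisect.bisect_right(lat_lon_buckets, scaled_value)


# Scaled coordinates of the example pickup point.
pickup_lon_scaled = (-73.982683 + 78) / 8.0
pickup_lat_scaled = (40.742104 - 37) / 8.0

print(len(lat_lon_buckets))  # 9
print(bucketize(pickup_lon_scaled))  # 5
print(bucketize(pickup_lat_scaled))  # 4
```

Nearby points share a bucket index, which is what lets the model treat a grid cell as a single categorical location.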

In [None]:
def transform(inputs, numeric_cols, string_cols, nbuckets):

    # We are going to return both features, which we enriched, and embeddings, which the model will enrich.
    transformed = {}
    embeddings = {}

    # Here, inputs['passenger_count'] is a symbolic placeholder for the tensors that will be
    # flowing through this transformation. We are using our normalizer, which we adapted
    # before training began.

    transformed["passenger_count"] = passenger_count_normalizer(
        inputs["passenger_count"]
    )

    # Note how we could have used the scale_longitude function we defined above.  The two
    # statements below are equivalent.
    for lon_col in ["pickup_longitude", "dropoff_longitude"]:
        # transformed[lon_col] = scale_longitude(inputs[lon_col])
        transformed[lon_col] = (inputs[lon_col] + 78) / 8.0

    for lat_col in ["pickup_latitude", "dropoff_latitude"]:
        transformed[lat_col] = (inputs[lat_col] - 37) / 8.0

    # Here we compute the euclidean distance between the origin of the trip and the destination
    position_difference = tf.square(
        inputs["dropoff_longitude"] - inputs["pickup_longitude"]
    )
    position_difference += tf.square(
        inputs["dropoff_latitude"] - inputs["pickup_latitude"]
    )
    transformed["euclidean"] = tf.sqrt(position_difference)

    # We are taking a continuous variable and making it discrete (categorical) with the intention
    # of "crossing" it with another feature
    lat_lon_buckets = [bin_edge / nbuckets for bin_edge in range(1, nbuckets)]
    discretization_layer = tf.keras.layers.Discretization(
        bin_boundaries=lat_lon_buckets, output_mode="int"
    )

    # Here discretization_layer is putting the data in ten buckets.  Note that it did not need to
    # be adapted, since we are defining the bucket boundaries in this case.  You can
    # have it learn the bucket boundaries if you so choose. Note that we are not yet one-hot
    # encoding this feature (output_mode='int') as we will compute a hash of this feature, which
    # we can only do with integers and strings

    bucketed_pickup_longitude_intermediary = discretization_layer(
        transformed["pickup_longitude"]
    )
    bucketed_pickup_latitude_intermediary = discretization_layer(
        transformed["pickup_latitude"]
    )
    bucketed_dropoff_longitude_intermediary = discretization_layer(
        transformed["dropoff_longitude"]
    )
    bucketed_dropoff_latitude_intermediary = discretization_layer(
        transformed["dropoff_latitude"]
    )

    # We are storing float versions of this in our dictionary, which will be returned by our function and
    # connected via functional api to the model

    transformed["bucketed_pickup_longitude"] = tf.cast(
        bucketed_pickup_longitude_intermediary, tf.float32
    )
    transformed["bucketed_pickup_latitude"] = tf.cast(
        bucketed_pickup_latitude_intermediary, tf.float32
    )
    transformed["bucketed_dropoff_longitude"] = tf.cast(
        bucketed_dropoff_longitude_intermediary, tf.float32
    )
    transformed["bucketed_dropoff_latitude"] = tf.cast(
        bucketed_dropoff_latitude_intermediary, tf.float32
    )

    # Below we are computing higher level features that represent locations.  Note how we now
    # use one hot encoding.  This is because this particular feature will not contribute to any
    # future hashing operations.  We create another version of this feature below which will
    # contribute to even higher-level representations.

    hash_pickup_crossing_layer = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="one_hot",
            num_bins=nbuckets**2,
            name="hash_pickup_crossing_layer",
        )
    )
    transformed["pickup_location"] = hash_pickup_crossing_layer(
        (bucketed_pickup_longitude_intermediary, bucketed_pickup_latitude_intermediary)
    )
    hash_dropoff_crossing_layer = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="one_hot",
            num_bins=nbuckets**2,
            name="hash_dropoff_crossing_layer",
        )
    )
    transformed["dropoff_location"] = hash_dropoff_crossing_layer(
        (
            bucketed_dropoff_longitude_intermediary,
            bucketed_dropoff_latitude_intermediary,
        )
    )

    # We compute int representations of origin and destination
    hash_pickup_crossing_layer_intermediary = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="int",
            num_bins=nbuckets**2,
            name="hash_pickup_crossing_intermediary",
        )
    )
    hashed_pickup_intermediary = hash_pickup_crossing_layer_intermediary(
        (bucketed_pickup_longitude_intermediary, bucketed_pickup_latitude_intermediary)
    )
    hash_dropoff_crossing_layer_intermediary = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="int",
            num_bins=nbuckets**2,
            name="hash_dropoff_crossing_intermediary",
        )
    )
    hashed_dropoff_intermediary = hash_dropoff_crossing_layer_intermediary(
        (
            bucketed_dropoff_longitude_intermediary,
            bucketed_dropoff_latitude_intermediary,
        )
    )

    # We cross origin and destination to derive a feature that represents the trip.  Remember
    # how we partitioned the city into a 10x10 grid? Theoretically, you can have 100*100
    # trip combinations.  It is likely that not all of those are possible in real life, therefore
    # we use only 1,000 buckets below instead of 10,000.
    hash_trip_crossing_layer = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="one_hot",
            num_bins=nbuckets**3,
            name="hash_trip_crossing_layer",
        )
    )
    transformed["hashed_trip"] = hash_trip_crossing_layer(
        (hashed_pickup_intermediary, hashed_dropoff_intermediary)
    )

    # Here we create an embedding, which our model will use to bring trips with similar characteristics
    # together.
    trip_locations_embedding_layer = tf.keras.layers.Embedding(
        input_dim=nbuckets**3,
        output_dim=int(nbuckets**1.5),
        name="trip_locations_embedding_layer",
    )
    embeddings["trip_locations_embedding"] = trip_locations_embedding_layer(
        transformed["hashed_trip"]
    )

    # Now that we have dealt with space, let's give some time to temporal features.  We use a tensorflow addon function
    # to get the number of seconds since the beginning of the 70s and compute other temporal features.

    seconds_since_1970 = tfa.text.parse_time(
        inputs["pickup_datetime"], "%Y-%m-%d %H:%M:%S %Z", output_unit="SECOND"
    )
    seconds_since_1970 = tf.cast(seconds_since_1970, tf.float32)
    hours_since_1970 = seconds_since_1970 / 3600.0
    hours_since_1970 = tf.floor(hours_since_1970)
    hour_of_day_intermediary = hours_since_1970 % 24

    # Feeding our model hour of day allows it to spot cyclical patterns over the course of the day.
    transformed["hour_of_day"] = hour_of_day_intermediary
    hour_of_day_intermediary = tf.cast(hour_of_day_intermediary, tf.int32)
    days_since_1970 = seconds_since_1970 / (3600 * 24)
    days_since_1970 = tf.floor(days_since_1970)

    # January 1st 1970 was a Thursday, so we make it so that when it is
    # zero days since Jan 1st, 1970, we return a 4.
    day_of_week_intermediary = (days_since_1970 + 4) % 7
    transformed["day_of_week"] = day_of_week_intermediary
    day_of_week_intermediary = tf.cast(day_of_week_intermediary, tf.int32)
    hashed_crossing_layer = tf.keras.layers.experimental.preprocessing.HashedCrossing(
        num_bins=24 * 7, output_mode="one_hot"
    )
    hashed_crossing_layer_intermediary = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            num_bins=24 * 7, output_mode="int", name="hashed_hour_of_day_of_week_layer"
        )
    )

    # Here we send the model a signal about what hour of which day it is.  This is why we created
    # 24*7 buckets above
    transformed["hour_of_day_of_week"] = hashed_crossing_layer(
        (hour_of_day_intermediary, day_of_week_intermediary)
    )
    hour_of_day_of_week_intermediary = hashed_crossing_layer_intermediary(
        (hour_of_day_intermediary, day_of_week_intermediary)
    )

    # Now it is time to combine our geographical features with our temporal ones.  We create an
    # intermediary representation of our trip as an int, so that we can cross it with the temporal
    # feature that represents the hour and day of week together.
    hash_trip_crossing_layer_intermediary = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="int", num_bins=nbuckets**3
        )
    )
    hashed_trip_intermediary = hash_trip_crossing_layer_intermediary(
        (hashed_pickup_intermediary, hashed_dropoff_intermediary)
    )

    hash_trip_and_time_layer = (
        tf.keras.layers.experimental.preprocessing.HashedCrossing(
            output_mode="one_hot",
            num_bins=(nbuckets**3) * 4,
            name="hash_trip_and_time_layer",
        )
    )
    transformed["hashed_trip_and_time"] = hash_trip_and_time_layer(
        (hashed_trip_intermediary, hour_of_day_of_week_intermediary)
    )
    trip_embedding_layer = tf.keras.layers.Embedding(
        input_dim=(nbuckets**3) * 4,
        output_dim=int(nbuckets**1.5),
        name="trip_embedding_layer",
    )

    # We create an embedding that asks the model to figure out good representations of trips taking both time
    # and space into account
    embeddings["trip_embedding"] = trip_embedding_layer(
        transformed["hashed_trip_and_time"]
    )

    return transformed, embeddings
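
To see where the bin counts and embedding sizes in the transform function come from, the sketch below works through the arithmetic for ten buckets. The `cross_without_hashing` helper is hypothetical and only illustrates the idea of a feature cross as a combined index; the Keras crossing layer hashes the combination into `num_bins` instead, so its actual bin ids will differ.

```python
NBUCKETS = 10


def cross_without_hashing(lon_bucket, lat_bucket, nbuckets=NBUCKETS):
    # Hypothetical collision-free cross: one id per (lon, lat) grid cell.
    return lon_bucket * nbuckets + lat_bucket


# Grid cell for longitude bucket 5 and latitude bucket 4.
pickup_cell = cross_without_hashing(5, 4)
print(pickup_cell)  # 54

location_bins = NBUCKETS**2  # cells in the 10x10 grid
possible_trips = location_bins**2  # every origin paired with every destination
trip_bins = NBUCKETS**3  # bins actually used for the trip cross
hour_day_bins = 24 * 7  # one bin per hour of each weekday
trip_and_time_bins = trip_bins * 4  # bins used for the trip-and-time cross
embedding_dim = int(NBUCKETS**1.5)  # output size of both embeddings

print(location_bins, possible_trips, trip_bins)  # 100 10000 1000
print(hour_day_bins, trip_and_time_bins, embedding_dim)  # 168 4000 31
```

Since 1,000 trip bins crossed with 168 time slots would give 168,000 combinations, hashing them into 4,000 bins means that some distinct combinations necessarily share a bin.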

Next, we'll create our DNN model now with the engineered features. We'll set `NBUCKETS = 10` to specify 10 buckets when bucketizing the latitude and longitude.

In [None]:
NBUCKETS = 10


def build_dnn_model():
    # input layer is all float except for pickup_datetime which is a string
    inputs = {
        colname: layers.Input(name=colname, shape=(1,), dtype="float32")
        for colname in NUMERIC_COLS
    }
    inputs.update(
        {
            colname: tf.keras.layers.Input(name=colname, shape=(1,), dtype="string")
            for colname in STRING_COLS
        }
    )

    # transforms
    transformed, embeddings = transform(
        inputs, numeric_cols=NUMERIC_COLS, string_cols=STRING_COLS, nbuckets=NBUCKETS
    )

    dnn_tabular_inputs = tf.keras.layers.Concatenate()(transformed.values())
    trip_locations_embedding = embeddings["trip_locations_embedding"]
    trip_embedding = embeddings["trip_embedding"]

    # two hidden layers of [32, 8], just like the BQML DNN
    # Our model now takes two different types of inputs: features which
    # we have engineered (transformed dictionary) and representations
    # we want the model to engineer (embeddings).  We use different
    # activations for the embeddings as those are the ones supported for
    # GPU acceleration.

    ht1 = layers.Dense(32, activation="relu", name="ht1")(dnn_tabular_inputs)
    ht2 = layers.Dense(8, activation="relu", name="ht2")(ht1)
    et1 = layers.LSTM(32, activation="tanh", name="et1")(trip_locations_embedding)
    et2 = layers.LSTM(32, activation="tanh", name="et2")(trip_embedding)
    merge_layer = layers.concatenate([ht2, et1, et2])
    ht3 = layers.Dense(16)(merge_layer)

    # final output is a linear activation because this is regression
    output = layers.Dense(1, activation="linear", name="fare")(ht3)
    model = tf.keras.Model(inputs, output)

    # Compile model
    model.compile(optimizer="adam", loss="mse", metrics=[rmse, "mse"])
    return model

In [None]:
model = build_dnn_model()

Let's see how our model architecture has changed now.

In [None]:
# We can visualize the DNN using the Keras `plot_model` utility.
# The plot below represents all of the tensor operations that go into
# making a prediction, beginning with the raw input features, preprocessing, feature engineering
# and then the prediction itself.
tf.keras.utils.plot_model(
    model, "dnn_model_engineered.png", show_shapes=False, rankdir="LR"
)

In [None]:
# `load_dataset` method is used to load the dataset.
trainds = load_dataset("../data/taxi-train_toy*", TRAIN_BATCH_SIZE, "train")
evalds = load_dataset("../data/taxi-valid_toy*", 1000, "eval").take(
    NUM_EVAL_EXAMPLES // 1000
)

# `Fit` trains the model for a fixed number of epochs.  Note that because we have
# created a lot of sparsity, the performance of the model is heavily dependent on
# the amount of data that we have for training.  As such, this model overfits on
# a small amount of data, as it is complex enough to be able to memorize patterns
# over a small amount of data.

history = model.fit(
    trainds,
    validation_data=evalds,
    epochs=NUM_EVALS,
    steps_per_epoch=steps_per_epoch,
)

As before, let's visualize the model's loss curves.

In [None]:
plot_curves(history, ["loss", "mse"])

Let's make a prediction with this new model with engineered features on the example we had above.

In [None]:
# Use the model to do prediction with `model.predict()`.
# Since our entire pipeline is integrated into the model, the path
# that this prediction takes is the same as the training process.
# Note that the data format is also the same: a dictionary of tensors!

model.predict(
    {
        "pickup_longitude": tf.convert_to_tensor([-73.982683]),
        "pickup_latitude": tf.convert_to_tensor([40.742104]),
        "dropoff_longitude": tf.convert_to_tensor([-73.983766]),
        "dropoff_latitude": tf.convert_to_tensor([40.755174]),
        "passenger_count": tf.convert_to_tensor([3.0]),
        "pickup_datetime": tf.convert_to_tensor(
            ["2010-02-08 09:17:00 UTC"], dtype=tf.string
        ),
    },
    steps=1,
)

Below we summarize our training results comparing our baseline model with our model with engineered features.

| Model              | Taxi Fare | Description                             |
|--------------------|-----------|-----------------------------------------|
| Baseline           | 12.29     | Baseline model - no feature engineering |
| Feature Engineered | 7.28      | Feature engineered model                |

Copyright 2021 Google Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.