# Introducing the Keras Functional API

**Learning Objectives**
  - Understand embeddings and how to create them with the feature column API
  - Understand Deep and Wide models and when to use them
  - Understand the Keras functional API and how to build a deep and wide model with it
  - Learn how to train a Keras model at scale on GCP

## Introduction

In the last notebook, we learned about the Keras Sequential API. The [Keras Functional API](https://www.tensorflow.org/guide/keras#functional_api) provides a more flexible alternative for building models. With the Functional API, we can build models with more complex topologies: multiple inputs or outputs, shared layers, or non-sequential data flows (e.g. residual connections).
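
As a minimal sketch of what "non-sequential" means here, the toy model below takes two inputs and sends only one of them through a hidden layer before merging them. The layer sizes and the names `input_a`, `input_b` and `toy_model` are made up for illustration and are not part of the taxifare model.

```python
import numpy as np
import tensorflow as tf

# Two separate inputs: a topology that a Sequential model cannot express.
input_a = tf.keras.layers.Input(shape=(3,), name="input_a")
input_b = tf.keras.layers.Input(shape=(2,), name="input_b")

# Only input_a passes through a hidden layer; input_b skips it.
hidden = tf.keras.layers.Dense(8, activation="relu")(input_a)
merged = tf.keras.layers.concatenate([hidden, input_b])
output = tf.keras.layers.Dense(1)(merged)

# A functional model is defined by its input and output tensors.
toy_model = tf.keras.Model(inputs=[input_a, input_b], outputs=output)

# Forward pass on a batch of 4 zero-valued examples.
batch_a = np.zeros((4, 3), dtype="float32")
batch_b = np.zeros((4, 2), dtype="float32")
predictions = toy_model([batch_a, batch_b])
print(predictions.shape)
```

Each layer is called on the tensor produced by the previous one, so the graph of layers is built explicitly rather than implied by list order.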

In this notebook we'll use what we learned about feature columns to build a Wide & Deep model. Recall that the idea behind Wide & Deep models is to combine learning through memorization with learning through generalization by training a wide linear model and a deep neural network together.

<img src='assets/wide_deep.png' width='80%'>
<sup>(image: https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)</sup>

The Wide part of the model is associated with memorization. Here, we train a linear model on a wide set of crossed features and learn how those feature combinations correlate with the label. The Deep part of the model is associated with generalization: features are represented as embedding vectors, and the embeddings themselves are learned during training. While either approach can work well alone, Wide & Deep models excel by combining the two.

Once we have trained our model locally, we will see how to train it at scale on GCP using AI Platform.

In [1]:
# Ensure that the TensorFlow 2.0 nightly preview is installed.
!pip3 freeze | grep tf-nightly-2.0-preview || pip3 install tf-nightly-2.0-preview

tf-nightly-2.0-preview==2.0.0.dev20190919


Start by importing the necessary libraries for this lab.

In [14]:
import datetime
import os
import shutil

import numpy as np
import pandas as pd
import tensorflow as tf

%matplotlib inline
from matplotlib import pyplot as plt
from tensorflow import keras

from tensorflow import feature_column as fc

print(tf.__version__)

2.0.0-dev20190919


## Load raw data 

We will use the taxifare dataset, using the CSV files that we created in the first notebook of this sequence. Those files have been saved into `../data`.

In [4]:
!ls -l ../data/*.csv

-rw-r--r--  1 munn  primarygroup  123590 Sep 19 18:08 ../data/taxi-test.csv
-rw-r--r--  1 munn  primarygroup  579055 Sep 19 18:08 ../data/taxi-train.csv
-rw-r--r--  1 munn  primarygroup  123114 Sep 19 18:08 ../data/taxi-valid.csv


## Use tf.data to read the CSV files

We wrote these functions for reading data from the CSV files above in the [previous notebook](2_dataset_api.ipynb).

In [5]:
CSV_COLUMNS = [
    'fare_amount',
    'pickup_datetime',
    'pickup_longitude',
    'pickup_latitude',
    'dropoff_longitude',
    'dropoff_latitude',
    'passenger_count',
    'key'
]
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [[0.0], ['na'], [0.0], [0.0], [0.0], [0.0], [0.0], ['na']]
UNWANTED_COLS = ['pickup_datetime', 'key']


def features_and_labels(row_data):
    label = row_data.pop(LABEL_COLUMN)
    features = row_data
    
    for unwanted_col in UNWANTED_COLS:
        features.pop(unwanted_col)

    return features, label


def create_dataset(pattern, batch_size=1, mode=tf.estimator.ModeKeys.EVAL):
    dataset = tf.data.experimental.make_csv_dataset(
        pattern, batch_size, CSV_COLUMNS, DEFAULTS)

    dataset = dataset.map(features_and_labels)

    if mode == tf.estimator.ModeKeys.TRAIN:
        dataset = dataset.shuffle(buffer_size=1000).repeat()

    # Prefetch one batch so that data loading overlaps with training
    # (tf.data.experimental.AUTOTUNE would let tf.data pick the buffer size)
    dataset = dataset.prefetch(1)
    return dataset

## Feature columns for Wide and Deep model

For the Wide columns, we will create feature columns of crossed features. To do this, we'll create a collection of TensorFlow feature columns to pass to the `tf.feature_column.crossed_column` constructor. The Deep columns will consist of numeric columns and any embedding columns we want to create.

In [15]:
# 1. One hot encode dayofweek and hourofday
fc_dayofweek = fc.categorical_column_with_identity(
    key="dayofweek", num_buckets=7)
fc_hourofday = fc.categorical_column_with_identity(
    key="hourofday", num_buckets=24)

# 2. Bucketize latitudes and longitudes
NBUCKETS = 16
latbuckets = np.linspace(start=38.0, stop=42.0, num=NBUCKETS).tolist()
lonbuckets = np.linspace(start=-76.0, stop=-72.0, num=NBUCKETS).tolist()
# Latitude columns use the latitude boundaries, longitude columns the
# longitude boundaries
fc_bucketized_plat = fc.bucketized_column(
    source_column=fc.numeric_column(key="pickuplat"), boundaries=latbuckets)
fc_bucketized_plon = fc.bucketized_column(
    source_column=fc.numeric_column(key="pickuplon"), boundaries=lonbuckets)
fc_bucketized_dlat = fc.bucketized_column(
    source_column=fc.numeric_column(key="dropofflat"), boundaries=latbuckets)
fc_bucketized_dlon = fc.bucketized_column(
    source_column=fc.numeric_column(key="dropofflon"), boundaries=lonbuckets)

# 3. Cross features: day x hour, pickup and dropoff location grids,
#    and pickup-dropoff pairs
fc_crossed_day_hr = fc.crossed_column(
    keys=[fc_dayofweek, fc_hourofday], hash_bucket_size=24 * 7)
fc_crossed_dloc = fc.crossed_column(
    keys=[fc_bucketized_dlat, fc_bucketized_dlon],
    hash_bucket_size=NBUCKETS * NBUCKETS)
fc_crossed_ploc = fc.crossed_column(
    keys=[fc_bucketized_plat, fc_bucketized_plon],
    hash_bucket_size=NBUCKETS * NBUCKETS)
fc_crossed_pd_pair = fc.crossed_column(
    keys=[fc_crossed_dloc, fc_crossed_ploc],
    hash_bucket_size=NBUCKETS**4)
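
To make bucketizing and crossing concrete, here is a small NumPy sketch of the idea, not of TensorFlow's implementation. The helper names `bucketize` and `cross` are hypothetical, and Python's built-in `hash` stands in for the fingerprint function TensorFlow uses internally. Note that 16 boundaries produce 17 buckets, because values below the first boundary and above the last one each get their own bucket.

```python
import numpy as np

NBUCKETS = 16
lat_boundaries = np.linspace(start=38.0, stop=42.0, num=NBUCKETS)
lon_boundaries = np.linspace(start=-76.0, stop=-72.0, num=NBUCKETS)


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    # np.digitize returns i such that boundaries[i-1] <= value < boundaries[i],
    # so 16 boundaries give 17 possible indices (0 through 16).
    return int(np.digitize(value, boundaries))


def cross(index_a, index_b, hash_bucket_size):
    """Toy feature cross: hash the pair of indices into a fixed range."""
    # Assumption: Python's hash() is used purely for illustration.
    return hash((index_a, index_b)) % hash_bucket_size


# An illustrative pickup point in Manhattan.
lat_bucket = bucketize(40.75, lat_boundaries)
lon_bucket = bucketize(-73.98, lon_boundaries)
location_id = cross(lat_bucket, lon_bucket, NBUCKETS * NBUCKETS)
print(lat_bucket, lon_bucket, location_id)
```

Because the cross is hashed into a fixed number of buckets, two different grid cells can collide on the same id. A larger `hash_bucket_size` makes collisions less likely at the cost of more parameters.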

We also add our engineered features that we used previously.

In [16]:
def add_engineered_features(features):
    # Subtract one since our days of week are 1-7 instead of 0-6
    features["dayofweek"] = features["dayofweek"] - 1

    # Compute Euclidean distance
    features["latdiff"] = features["pickuplat"] - features["dropofflat"]
    features["londiff"] = features["pickuplon"] - features["dropofflon"]
    features["euclidean_dist"] = tf.sqrt(
        x=features["latdiff"]**2 + features["londiff"]**2)

    return features
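
As a quick sanity check of the distance arithmetic, the same computation can be sketched in plain Python for a single ride. The function name `engineered_features_for_row` and the coordinates are illustrative only. The result is measured in degrees of latitude and longitude, not in kilometers.

```python
import math


def engineered_features_for_row(pickuplat, pickuplon, dropofflat, dropofflon):
    """Plain-Python version of the distance features for one ride."""
    latdiff = pickuplat - dropofflat
    londiff = pickuplon - dropofflon
    return {
        "latdiff": latdiff,
        "londiff": londiff,
        "euclidean_dist": math.sqrt(latdiff**2 + londiff**2),
    }


# Made-up coordinates: 0.03 degrees apart in latitude, 0.04 in longitude.
row = engineered_features_for_row(40.78, -73.97, 40.75, -73.93)
print(row)
```

The differences keep their sign, so the model can still tell which direction the ride went, while `euclidean_dist` summarizes how far it went.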

### Gather list of feature columns

Next we gather the lists of wide and deep feature columns we'll pass to our Wide & Deep model in TensorFlow. Recall that wide columns are sparse and have a linear relationship with the output, while deep columns are dense (continuous) and can have a complex relationship with the output. For the wide columns, we collect the crossed feature columns built from our bucketized columns along with the sparse categorical columns. For the deep columns, we collect embedding columns and numeric columns.

In [20]:
wide_columns = [
    # Feature crosses
    fc_crossed_day_hr, fc_crossed_dloc,
    fc_crossed_ploc, fc_crossed_pd_pair,

    # Sparse columns
    fc_dayofweek, fc_hourofday
]

deep_columns = [
    # Embedding_column to "group" together ...
    fc.embedding_column(categorical_column=fc_crossed_pd_pair, dimension=10),
    fc.embedding_column(categorical_column=fc_crossed_day_hr, dimension=10),

    # Numeric columns
    fc.numeric_column(key="pickuplat"),
    fc.numeric_column(key="pickuplon"),
    fc.numeric_column(key="dropofflon"),
    fc.numeric_column(key="dropofflat"),
    fc.numeric_column(key="latdiff"),
    fc.numeric_column(key="londiff"),
    fc.numeric_column(key="euclidean_dist"),

    fc.indicator_column(categorical_column=fc_crossed_day_hr),
]
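
The difference between an indicator column and an embedding column can be sketched in NumPy. An indicator column produces a one-hot vector as wide as the number of buckets, while an embedding column looks up a short dense row in a weight table. The random table below stands in for weights that would be learned during training, and `bucket_id = 42` is an arbitrary example value.

```python
import numpy as np

num_buckets = 24 * 7  # same size as the day-hour cross
dimension = 10        # same embedding dimension as above

# Stand-in for the trainable embedding weights.
rng = np.random.default_rng(seed=0)
embedding_table = rng.normal(size=(num_buckets, dimension))

bucket_id = 42

# Indicator column: a one-hot vector of length num_buckets.
one_hot = np.zeros(num_buckets)
one_hot[bucket_id] = 1.0

# Embedding column: a row lookup, equivalent to one_hot @ embedding_table.
embedded = embedding_table[bucket_id]
lookup_matches_matmul = np.allclose(one_hot @ embedding_table, embedded)

print(one_hot.shape, embedded.shape, lookup_matches_matmul)
```

The embedding compresses 168 sparse dimensions into 10 dense ones, which is what lets the deep part of the model place similar day-hour combinations close to each other.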

## Build a Wide and Deep model in Keras

To build a wide-and-deep network, we connect the sparse (i.e. wide) features directly to the output node, but pass the dense (i.e. deep) features through a set of fully connected layers. Here's how that model architecture looks using the Functional API.

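In [21]:
def build_model(dnn_hidden_units):
    # One Keras Input per feature key; the keys and dtypes are inferred
    # from the feature columns defined above.
    int_keys = ["dayofweek", "hourofday"]
    float_keys = ["pickuplat", "pickuplon", "dropofflat", "dropofflon",
                  "latdiff", "londiff", "euclidean_dist"]
    inputs = {k: tf.keras.layers.Input(name=k, shape=(), dtype="int32")
              for k in int_keys}
    inputs.update({k: tf.keras.layers.Input(name=k, shape=(), dtype="float32")
                   for k in float_keys})

    # Deep path: dense features through fully connected layers.
    deep = tf.keras.layers.DenseFeatures(deep_columns)(inputs)
    for numnodes in dnn_hidden_units:
        deep = tf.keras.layers.Dense(numnodes, activation='relu')(deep)

    # Wide path: DenseFeatures only accepts dense columns, so wrap each
    # categorical column in an indicator column.
    wide = tf.keras.layers.DenseFeatures(
        [fc.indicator_column(col) for col in wide_columns])(inputs)

    # fare_amount is a continuous label, so use a linear output and MSE loss.
    both = tf.keras.layers.concatenate([deep, wide])
    output = tf.keras.layers.Dense(1, activation='linear')(both)
    model = tf.keras.Model(inputs, output)
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    return model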