# Training a model with `traffic_last_5min` feature


## Introduction

In this notebook, we'll train a taxi fare prediction model, this time with an additional feature: `traffic_last_5min`.

In [None]:
try:
    from google.cloud import aiplatform

except ImportError:
    !pip3 install -U google-cloud-aiplatform --user

    print("Please restart the kernel and re-run the notebook.")

If the above command resulted in an installation, please restart the notebook kernel and re-run the notebook.

In [None]:
import os
import shutil

import pandas as pd
import tensorflow as tf

from datetime import datetime
from matplotlib import pyplot as plt
from tensorflow import keras

from google.cloud import aiplatform
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, DenseFeatures
from tensorflow.keras.callbacks import TensorBoard

print(tf.__version__)
%matplotlib inline

In [None]:
PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'cloud-training-demos' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

In [None]:
# For Bash Code
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION

In [None]:
%%bash
gcloud config set project $PROJECT
gcloud config set ai/region $REGION

## Load raw data

In [None]:
!ls -l ../data/taxi-traffic*

In [None]:
!head ../data/taxi-traffic*

## Use tf.data to read the CSV files

These functions for reading data from the CSV files are similar to what we used in the Introduction to TensorFlow module. Note that here we have an additional feature, `traffic_last_5min`.

In [None]:
CSV_COLUMNS = [
    'fare_amount',
    'dayofweek',
    'hourofday',
    'pickup_longitude',
    'pickup_latitude',
    'dropoff_longitude',
    'dropoff_latitude',
    'traffic_last_5min'
]
LABEL_COLUMN = 'fare_amount'
DEFAULTS = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]


def features_and_labels(row_data):
    label = row_data.pop(LABEL_COLUMN)
    features = row_data

    return features, label


def create_dataset(pattern, batch_size=1, mode=tf.estimator.ModeKeys.EVAL):
    dataset = tf.data.experimental.make_csv_dataset(
        pattern, batch_size, CSV_COLUMNS, DEFAULTS)

    dataset = dataset.map(features_and_labels)

    if mode == tf.estimator.ModeKeys.TRAIN:
        dataset = dataset.shuffle(buffer_size=1000).repeat()

    # Prefetch one batch so that input loading overlaps with training.
    # (A buffer size of 1 is a fixed value, not AUTOTUNE.)
    dataset = dataset.prefetch(1)
    return dataset
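
To make the label/feature split in `features_and_labels` concrete, here is a minimal plain-Python sketch that does the same thing for a single CSV row. It uses the standard `csv` module instead of `tf.data`, and the sample row is made up for illustration; it is not taken from the lab data.

```python
import csv
import io

CSV_COLUMNS = [
    "fare_amount",
    "dayofweek",
    "hourofday",
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "traffic_last_5min",
]
LABEL_COLUMN = "fare_amount"

# One made-up row, for illustration only (not real lab data).
sample = "12.5,3,17,-73.98,40.74,-73.99,40.75,114\n"

# Name each value using CSV_COLUMNS and convert it to float,
# mirroring the all-float DEFAULTS used in the notebook.
reader = csv.DictReader(io.StringIO(sample), fieldnames=CSV_COLUMNS)
row = {name: float(value) for name, value in next(reader).items()}

# Same split as features_and_labels: pop the label, keep the rest.
label = row.pop(LABEL_COLUMN)
features = row

print(label)
print(sorted(features))
```

After the `pop`, the remaining seven keys match `INPUT_COLS`, which is what the model's input layer expects.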

In [None]:
INPUT_COLS = [
    'dayofweek',
    'hourofday',
    'pickup_longitude',
    'pickup_latitude',
    'dropoff_longitude',
    'dropoff_latitude',
    'traffic_last_5min'
]

# Create input layer of feature columns
feature_columns = {
    colname: tf.feature_column.numeric_column(colname)
    for colname in INPUT_COLS
    }

## Build a simple keras DNN model

In [None]:
# Build a keras DNN model using Sequential API
def build_model(dnn_hidden_units):
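    # Sequential expects a list of layers; DenseFeatures turns the dict of
    # input tensors into a single dense tensor.
    model = Sequential(
        [DenseFeatures(feature_columns=list(feature_columns.values()))])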
    
    for num_nodes in dnn_hidden_units:
        model.add(Dense(units=num_nodes, activation="relu"))
    
    model.add(Dense(units=1, activation="linear"))    
    
    # Create a custom evaluation metric
    def rmse(y_true, y_pred):
        return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

    # Compile the keras model
    model.compile(optimizer="adam", loss="mse", metrics=[rmse, "mse"])
    
    return model
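
The custom `rmse` metric above is the square root of the mean squared error. As a small worked example, the following plain-Python sketch computes the same quantity without TensorFlow. The fare values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up fares in dollars, for illustration only.
y_true = [10.0, 20.0]
y_pred = [11.0, 13.0]

# Errors are +1 and -7, their squares are 1 and 49, and the mean is 25.
print(rmse(y_true, y_pred))  # 5.0
```

Because RMSE is in the same units as the label, a value of 5.0 here reads as "off by about five dollars", which is easier to interpret than the raw MSE of 25.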

Next, we call `build_model` to create the model. Here we'll have two hidden layers (32 and 8 units) before the final output layer, and we'll train with the same parameters we used before.

In [None]:
HIDDEN_UNITS = [32, 8]

model = build_model(dnn_hidden_units=HIDDEN_UNITS)
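
To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. It assumes that each of the seven numeric feature columns contributes exactly one input value and that every `Dense` layer has a bias term (the Keras default). The helper `dense_param_count` is a hypothetical name introduced here, not part of the lab code.

```python
def dense_param_count(num_inputs, hidden_units, num_outputs=1):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [num_inputs, *hidden_units, num_outputs]
    # Each layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


NUM_INPUTS = 7  # one value per entry in INPUT_COLS
HIDDEN_UNITS = [32, 8]

# 7*32 + 32 = 256, 32*8 + 8 = 264, 8*1 + 1 = 9
print(dense_param_count(NUM_INPUTS, HIDDEN_UNITS))  # 529
```

Once the model has been built by a first call to `fit` or `predict`, `model.summary()` should report a matching total.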

In [None]:
BATCH_SIZE = 1000
NUM_TRAIN_EXAMPLES = 10000 * 6  # training dataset will repeat, wrap around
NUM_EVALS = 60  # how many times to evaluate
NUM_EVAL_EXAMPLES = 10000  # enough to get a reasonable sample

trainds = create_dataset(
    pattern='../data/taxi-traffic-train*',
    batch_size=BATCH_SIZE,
    mode=tf.estimator.ModeKeys.TRAIN)

evalds = create_dataset(
    pattern='../data/taxi-traffic-valid*',
    batch_size=BATCH_SIZE,
    mode=tf.estimator.ModeKeys.EVAL).take(NUM_EVAL_EXAMPLES // BATCH_SIZE)

In [None]:
%%time
steps_per_epoch = NUM_TRAIN_EXAMPLES // (BATCH_SIZE * NUM_EVALS)

LOGDIR = "./taxi_trained"
history = model.fit(x=trainds,
                    steps_per_epoch=steps_per_epoch,
                    epochs=NUM_EVALS,
                    validation_data=evalds,
                    callbacks=[TensorBoard(LOGDIR)])
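
The training constants interact in a way that is easy to misread, so here is the arithmetic spelled out as a short sketch. It only restates the values defined above: each Keras "epoch" is really a short slice of training followed by one validation pass.

```python
BATCH_SIZE = 1000
NUM_TRAIN_EXAMPLES = 10000 * 6
NUM_EVALS = 60
NUM_EVAL_EXAMPLES = 10000

# Training steps between two consecutive evaluations.
steps_per_epoch = NUM_TRAIN_EXAMPLES // (BATCH_SIZE * NUM_EVALS)

# Examples consumed per epoch and over the whole run.
examples_per_epoch = steps_per_epoch * BATCH_SIZE
total_examples_seen = examples_per_epoch * NUM_EVALS

# Number of validation batches kept by .take(...).
eval_batches = NUM_EVAL_EXAMPLES // BATCH_SIZE

print(steps_per_epoch, examples_per_epoch, total_examples_seen, eval_batches)
# 1 1000 60000 10
```

With these settings the model is validated after every single batch of 1,000 examples, which gives a fine-grained learning curve at the cost of spending much of the run on evaluation.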

In [None]:
RMSE_COLS = ['rmse', 'val_rmse']

pd.DataFrame(history.history)[RMSE_COLS].plot()

In [None]:
model.predict(x={"dayofweek": tf.convert_to_tensor([6]),
                 "hourofday": tf.convert_to_tensor([17]),
                 "pickup_longitude": tf.convert_to_tensor([-73.982683]),
                 "pickup_latitude": tf.convert_to_tensor([40.742104]),
                 "dropoff_longitude": tf.convert_to_tensor([-73.983766]),
                 "dropoff_latitude": tf.convert_to_tensor([40.755174]),
                 "traffic_last_5min": tf.convert_to_tensor([114])},
              steps=1)

## Export and deploy model

In [None]:
OUTPUT_DIR = "./export/savedmodel"
shutil.rmtree(OUTPUT_DIR, ignore_errors=True)
EXPORT_PATH = os.path.join(OUTPUT_DIR,
                           datetime.now().strftime("%Y%m%d%H%M%S"))
tf.saved_model.save(model, EXPORT_PATH)  # with default serving function
os.environ['EXPORT_PATH'] = EXPORT_PATH

Note that the last `gcloud` call below, which deploys the model, can take a few minutes, and you might not see the earlier `echo` outputs while that job is still running. To check that your notebook has not stalled and that the model is actually being deployed, view your models in the console at https://console.cloud.google.com/vertex-ai/models and click on your model; you should see your endpoint listed with an "in progress" icon next to it.

In [None]:
%%bash
TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)
MODEL_NAME=taxifare_$TIMESTAMP
ENDPOINT_NAME=taxifare_endpoint_$TIMESTAMP
IMAGE_URI="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest"
ARTIFACT_DIRECTORY=gs://${BUCKET}/${MODEL_NAME}/
echo $ARTIFACT_DIRECTORY

gcloud storage cp --recursive ${EXPORT_PATH}/* ${ARTIFACT_DIRECTORY}

# Model
gcloud ai models upload \
    --region=$REGION \
    --display-name=$MODEL_NAME \
    --container-image-uri=$IMAGE_URI \
    --artifact-uri=$ARTIFACT_DIRECTORY

MODEL_ID=$(gcloud ai models list \
    --region=$REGION \
    --filter=display_name="$MODEL_NAME" | cut -d" " -f1 | head -n2 | tail -n1)

echo "MODEL_NAME=${MODEL_NAME}"
echo "MODEL_ID=${MODEL_ID}"

# Endpoint
gcloud ai endpoints create \
  --region=$REGION \
  --display-name=$ENDPOINT_NAME

ENDPOINT_ID=$(gcloud ai endpoints list \
  --region=$REGION \
  --filter=display_name="$ENDPOINT_NAME" | cut -d" " -f1 | head -n2 | tail -n1)

echo "ENDPOINT_NAME=${ENDPOINT_NAME}"
echo "ENDPOINT_ID=${ENDPOINT_ID}"

# Deployment
DEPLOYED_MODEL_NAME=${MODEL_NAME}_deployment
MACHINE_TYPE=n1-standard-2
MIN_REPLICA_COUNT=1
MAX_REPLICA_COUNT=3

gcloud ai endpoints deploy-model $ENDPOINT_ID \
  --region=$REGION \
  --model=$MODEL_ID \
  --display-name=$DEPLOYED_MODEL_NAME \
  --machine-type=$MACHINE_TYPE \
  --min-replica-count=$MIN_REPLICA_COUNT \
  --max-replica-count=$MAX_REPLICA_COUNT \
  --traffic-split=0=100

Take note of the `ENDPOINT_ID` printed above, as you will need it in the next lab.

The same model deployment can also be initiated from the Vertex AI Python SDK. In that case we do not need to create the Endpoint ourselves (although we could); it is created implicitly during the `model.deploy()` call.
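
A minimal sketch of that SDK route is shown here. It mirrors the settings of the `gcloud` cell (container image, machine type, replica counts), but the function name `upload_and_deploy` and its parameters are assumptions made for illustration, not code from the original lab. The SDK is imported inside the function so that defining it does not itself require the package or any credentials.

```python
def upload_and_deploy(
    project,
    region,
    bucket,
    model_name,
    artifact_uri,
    image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest",
):
    """Upload a SavedModel to Vertex AI and deploy it to a new endpoint."""
    # Imported lazily: only needed when the function is actually called.
    from google.cloud import aiplatform

    aiplatform.init(
        project=project,
        location=region,
        staging_bucket=f"gs://{bucket}",
    )

    # Register the exported SavedModel as a Vertex AI Model resource.
    uploaded_model = aiplatform.Model.upload(
        display_name=model_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=image_uri,
    )

    # No endpoint is passed in, so deploy() creates one implicitly.
    endpoint = uploaded_model.deploy(
        deployed_model_display_name=f"{model_name}_deployment",
        machine_type="n1-standard-2",
        min_replica_count=1,
        max_replica_count=3,
        traffic_percentage=100,
    )
    return endpoint
```

Calling it with the notebook's `PROJECT`, `REGION`, and `BUCKET`, plus the Cloud Storage directory that holds the exported SavedModel, returns an `Endpoint` object. Like the `gcloud` deployment, the call can take several minutes to finish.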

Copyright 2021 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License