# Create TensorFlow Deep Neural Network Model

**Learning Objective**
- Create a DNN model using the high-level Estimator API 

## Introduction

We'll begin by modeling our data using a Deep Neural Network. To achieve this we will use the high-level Estimator API in TensorFlow. Have a look at the various models available through the Estimator API in [the documentation here](https://www.tensorflow.org/api_docs/python/tf/estimator).

Start by setting the environment variables related to your project.

In [1]:
PROJECT = "qwiklabs-gcp-636667ae83e902b6"  # Replace with your PROJECT
BUCKET =  "qwiklabs-gcp-636667ae83e902b6_al"  # Replace with your BUCKET
REGION = "us-east1"            # Choose an available region for AI Platform  
TFVERSION = "1.13"                # TF version for AI Platform

In [2]:
import errno
import math
import numpy as np
import os
import shutil
import tensorflow as tf

os.environ["BUCKET"] = BUCKET
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = REGION
os.environ["TFVERSION"] = TFVERSION

In [3]:
%%bash
if ! gsutil ls | grep -q gs://${BUCKET}/; then
    gsutil mb -l ${REGION} gs://${BUCKET}
fi

In [4]:
%%bash
ls *.csv

babyweight_test.csv
babyweight_train.csv
babyweight_valid.csv


## Create TensorFlow model using TensorFlow's Estimator API ##

We'll begin by writing an input function to read the data and define the csv column names and label column. We'll also set the default csv column values and set the number of training steps.

In [5]:
# Ensure that we have Tensorflow 1.13 installed.
!pip3 freeze | grep tensorflow==1.13.1 || pip3 install tensorflow==1.13.1

tensorflow==1.13.1


In [6]:
print(tf.__version__)

1.13.1


#### **Exercise 1**

To begin creating our TensorFlow model, we need to set up variables that determine the csv column values, the label column and the key column. Fill in the TODOs below to set these variables. Note, `CSV_COLUMNS` should be a list and `LABEL_COLUMN` should be a string. It is important to get the column names in the same order as they appear in the csv train/eval/test sets. If necessary, look back at the previous notebooks to see how these csv files were created and confirm the ordering.

We also need to set `CSV_DEFAULTS` for each of the CSV column values we prescribe. This will also be a list, with one entry per column whose type depends on the data type of that column's values. Have a look back at the previous examples to ensure you have the proper formatting.

In [7]:
%%bash
head -3 babyweight_train.csv

weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,plurality,gestation_weeks,had_ultrasound
6.6689834255,Unknown,30,2.0,2.0,False,True,2.0,Single(1),39.0,False
6.75055446244,True,22,2.0,9.0,False,False,1.0,Single(1),40.0,True


In [8]:
# Determine CSV, label, and key columns
CSV_COLUMNS = 'weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,plurality,gestation_weeks,had_ultrasound'.split(',')
LABEL_COLUMN = 'weight_pounds'

# Set default values for each CSV column
CSV_DEFAULTS = [[0.0], ['Unknown'], [0.0], ['0.0'], ['0.0'], ['False'], ['True'], [1.0], ['Single(1)'], [0.0], ['False']]
TRAIN_STEPS = 1000
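
As a quick sanity check on the ordering requirement, the sketch below compares a column list against the CSV header and reports the Python type implied by each default. It uses only the standard library; the helper name `check_schema` and the constants `HEADER`, `COLUMNS` and `DEFAULTS` are hypothetical and not part of the notebook.

```python
# Hypothetical helper (not part of the original notebook): verify that the
# column names match the CSV header and that every column has a default.
HEADER = ("weight_pounds,is_male,mother_age,mother_race,father_race,"
          "cigarette_use,mother_married,ever_born,plurality,"
          "gestation_weeks,had_ultrasound")
COLUMNS = HEADER.split(",")
DEFAULTS = [[0.0], ["Unknown"], [0.0], ["0.0"], ["0.0"], ["False"],
            ["True"], [1.0], ["Single(1)"], [0.0], ["False"]]


def check_schema(header_line, columns, defaults):
    """Return {column: type name implied by its default}; raise on mismatch."""
    header = header_line.strip().split(",")
    if header != columns:
        raise ValueError("column order differs from the CSV header")
    if len(columns) != len(defaults):
        raise ValueError("need exactly one default per column")
    return {col: type(d[0]).__name__ for col, d in zip(columns, defaults)}


schema = check_schema(HEADER, COLUMNS, DEFAULTS)
print(schema["weight_pounds"], schema["mother_race"])  # float str
```

A float default means the field is read as a number, while a string default keeps the raw text, which is why `mother_race` stays a string even though its values look numeric.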

### Create the input function

Now we are ready to create an input function using the Dataset API.

#### **Exercise 2**

In the code below you are asked to complete the TODOs to create the input function for our model. Look back at the previous examples we have completed if you need a hint as to how to complete the missing fields below. 

In the first block of TODOs, your `decode_csv` function should return a dictionary called `features` and a value `label`.

In the next TODO, use `tf.gfile.Glob` to create a list of files that match the given `filename_pattern`. Have a look at the documentation for `tf.gfile.Glob` if you get stuck.

In the next TODO, use `tf.data.TextLineDataset` to read the text files and apply the `decode_csv` function you created above to parse each row.

In the next TODO you are asked to set up the dataset depending on whether you are in `TRAIN` mode or not. (**Hint**: Use `tf.estimator.ModeKeys.TRAIN`). When in `TRAIN` mode, set the appropriate number of epochs and shuffle the data accordingly. When not in `TRAIN` mode, you will use a different number of epochs and there is no need to shuffle the data. 

Finally, in the last TODO, collect the operations you set up above to produce the final `dataset` we'll use to feed data into our model. 

Have a look at the examples we did in the previous notebooks if you need inspiration.

In [9]:
def decode_csv(line_of_text):
    fields = tf.decode_csv(records = line_of_text, record_defaults = CSV_DEFAULTS)
    features = dict(zip(CSV_COLUMNS, fields))
    features['mother_race'] = tf.cast(features['mother_race'], 'string')
    features['father_race'] = tf.cast(features['father_race'], 'string')
    features['plurality'] = tf.cast(features['plurality'], 'string')
    label = features.pop(LABEL_COLUMN) # remove label from features and store
    return features, label
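
To make the parsing step concrete, here is a plain-Python analogue of `decode_csv` that mirrors its logic in simplified form. It does not use TensorFlow, it works on a reduced set of four columns, and `parse_line` is a hypothetical name introduced only for this illustration.

```python
# Plain-Python analogue of decode_csv, for illustration only.
# COLUMNS/DEFAULTS are a reduced, hypothetical subset of the real schema.
COLUMNS = ["weight_pounds", "is_male", "mother_age", "plurality"]
DEFAULTS = [[0.0], ["Unknown"], [0.0], ["Single(1)"]]
LABEL = "weight_pounds"


def parse_line(line):
    """Split one CSV line, apply defaults, and separate the label."""
    fields = []
    for raw, default in zip(line.split(","), DEFAULTS):
        if raw == "":
            value = default[0]             # empty field: fall back to default
        elif isinstance(default[0], float):
            value = float(raw)             # float default: parse as float
        else:
            value = raw                    # string default: keep the text
        fields.append(value)
    features = dict(zip(COLUMNS, fields))
    label = features.pop(LABEL)            # remove the label from the features
    return features, label


features, label = parse_line("6.75055446244,True,22,Single(1)")
print(label)      # 6.75055446244
print(features)   # {'is_male': 'True', 'mother_age': 22.0, 'plurality': 'Single(1)'}
```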

In [10]:
# Create an input function reading a file using the Dataset API
# Then provide the results to the Estimator API
def read_dataset(filename_pattern, mode, batch_size = 512):
    def _input_fn():
    
        # Create list of files that match pattern.  Does support internal wildcarding e.g. "babyweight*.csv"
        file_list = tf.gfile.Glob(filename_pattern)

        # Create dataset from file list
        dataset = tf.data.TextLineDataset(filenames = file_list).skip(count = 1)
        dataset = dataset.map(map_func = decode_csv)

        # In training mode, shuffle the dataset and repeat indefinitely
        if mode == tf.estimator.ModeKeys.TRAIN:
            dataset = dataset.shuffle(buffer_size = 10 * batch_size)
            num_epochs = None 
        else:
            num_epochs = 1 

        dataset = dataset.repeat(count = num_epochs).batch(batch_size = batch_size)

        # This will now return batches of features, label
        return dataset
    return _input_fn
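
To build intuition for how the mode changes the pipeline, here is a plain-Python sketch of the shuffle, repeat and batch steps. It is only an analogy: `tf.data` shuffles through a bounded buffer, whereas this sketch shuffles each epoch in full. The function name `batches` and its `seed` parameter are hypothetical.

```python
import itertools
import random


def batches(records, training, batch_size, seed=0):
    """Yield lists of records, loosely mimicking shuffle/repeat/batch."""
    rng = random.Random(seed)

    def stream():
        # Training repeats indefinitely; evaluation makes a single pass.
        epochs = itertools.count() if training else range(1)
        for _ in epochs:
            epoch = list(records)
            if training:
                rng.shuffle(epoch)    # full shuffle; tf.data uses a bounded buffer
            for record in epoch:
                yield record

    it = stream()
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


data = list(range(5))
eval_batches = list(batches(data, training=False, batch_size=2))
print(eval_batches)  # [[0, 1], [2, 3], [4]]

# Training never ends on its own, so take only the first four batches.
train_batches = list(itertools.islice(batches(data, True, 2), 4))
print(len(train_batches))  # 4
```

Because the training stream never ends, something else has to stop it. In this notebook that role is played by the `max_steps` argument of the training specification.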

### Create the feature columns

Next, we define the feature columns.

#### **Exercise 3**

There are different ways to set up the feature columns for our model. 

In the first TODO below, you are asked to create a function `get_categorical` which takes a feature name and its potential values and returns an indicator `tf.feature_column` based on a categorical with vocabulary list column. Look back at the documentation for `tf.feature_column.indicator_column` to ensure you call the arguments correctly.

In the next TODO, you are asked to complete the code to create a function called `get_cols`. It takes the numeric column names, the categorical column names and the vocabulary dictionary as arguments, and should return a list of all the `tf.feature_column`s you intend to use for your model. **Hint**: use the `get_categorical` function you created above to make your code easier to read.

In [11]:
def get_categorical(name, values):
    return tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_vocabulary_list(key=name, vocabulary_list=values))

def get_cols(num_cols, cat_cols, cat_vocab):
    all_cols = []
    for col in num_cols:
        all_cols.append(tf.feature_column.numeric_column(key = col))
    for col in cat_cols:
        all_cols.append(get_categorical(col, cat_vocab[col]))
    return all_cols
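
An indicator column over a vocabulary list is essentially a one-hot encoding. The sketch below shows the idea in plain Python; `one_hot` is a hypothetical helper, and the all-zero result for an unknown value reflects our understanding of the default arguments used here rather than something this notebook verifies.

```python
def one_hot(value, vocabulary):
    """Return a 0/1 list with a 1 at the position of value in vocabulary."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]


vocab = ["True", "False", "Unknown"]
print(one_hot("False", vocab))    # [0.0, 1.0, 0.0]
print(one_hot("Missing", vocab))  # [0.0, 0.0, 0.0]
```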

In [12]:
CSV_COLUMNS

['weight_pounds',
 'is_male',
 'mother_age',
 'mother_race',
 'father_race',
 'cigarette_use',
 'mother_married',
 'ever_born',
 'plurality',
 'gestation_weeks',
 'had_ultrasound']

In [13]:
# in this toy dataset "had_ultrasound" is a meaningless placeholder, don't try to use it

In [14]:
num_cols = ['mother_age', 'ever_born', 'gestation_weeks']
cat_cols = ['is_male', 'mother_race', 'father_race', 'cigarette_use', 'mother_married', 'plurality']

In [15]:
cat_vocab = {
    'is_male': ['True', 'False', 'Unknown'],
    'cigarette_use': ['True', 'False'],
    'mother_married': ['True', 'False'],
    'mother_race': ['1.0', '7.0', '2.0', '0.0', '3.0', '18.0', '28.0', '5.0',
                    '48.0', '4.0', '68.0', '9.0', '78.0', '6.0', '38.0', '58.0'],
    'father_race': ['1.0', '7.0', '2.0', '0.0', '3.0', '18.0', '28.0', '5.0',
                    '48.0', '4.0', '68.0', '9.0', '78.0', '6.0', '38.0', '58.0'],
    'plurality': ['Single(1)', 'Twins(2)', 'Multiple(2+)', 'Triplets(3)',
                  'Quintuplets(5)', 'Quadruplets(4)'],
}
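
One way such a vocabulary could be gathered is by scanning the training CSV for distinct values. The following is a minimal sketch under that assumption; the notebook does not say how the lists were actually produced, and `collect_vocab` is a hypothetical helper. It reads from an in-memory sample so that it runs without the data files.

```python
import csv
import io

# Two rows copied from the `head` output earlier in this notebook.
SAMPLE = """weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,plurality,gestation_weeks,had_ultrasound
6.6689834255,Unknown,30,2.0,2.0,False,True,2.0,Single(1),39.0,False
6.75055446244,True,22,2.0,9.0,False,False,1.0,Single(1),40.0,True
"""


def collect_vocab(csv_file, columns):
    """Return {column: sorted list of distinct string values}."""
    seen = {col: set() for col in columns}
    for row in csv.DictReader(csv_file):
        for col in columns:
            seen[col].add(row[col])
    return {col: sorted(values) for col, values in seen.items()}


vocab = collect_vocab(io.StringIO(SAMPLE), ["is_male", "father_race"])
print(vocab)  # {'is_male': ['True', 'Unknown'], 'father_race': ['2.0', '9.0']}
```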

### Create the Serving Input function 

To predict with the TensorFlow model, we also need a serving input function. This will allow us to serve predictions later using the predetermined inputs. We will want all the inputs from our user.

#### **Exercise 4**
In the TODOs below, create the `feature_placeholders` dictionary by setting up the placeholders for each of the features we will use in our model. Look at the documentation for `tf.placeholder` to make sure you provide all the necessary arguments. You'll need to create placeholders for the features
- `is_male`
- `mother_age`
- `plurality`
- `gestation_weeks`
- `key`

You'll also need to create the features dictionary to pass to the `tf.estimator.export.ServingInputReceiver` function. The `features` dictionary will reference the `feature_placeholders` dict you created above. Remember to expand the dimensions of the tensors you'll include in the `features` dictionary to accommodate the batched data we'll send to the model for predictions later.

In [16]:
def serving_input_fn(cat_cols, num_cols):
    num_placeholders = {col: tf.placeholder(dtype=tf.float32, shape=[None], name=col) for col in num_cols}     
    cat_placeholders = {col: tf.placeholder(dtype=tf.string, shape=[None], name=col) for col in cat_cols}
    
    feature_placeholders = {**num_placeholders, **cat_placeholders}
    
    features = {
        key: tf.expand_dims(input = tensor, axis = -1)
        for key, tensor in feature_placeholders.items()
    }
    
    return tf.estimator.export.ServingInputReceiver(features = features, receiver_tensors = feature_placeholders)
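
The effect of expanding the last dimension can be pictured without TensorFlow. In this sketch, each feature starts as a flat list of length `batch` and ends up as a list of single-element lists, the plain-Python counterpart of going from shape `[batch]` to `[batch, 1]`. The instances reuse the two sample rows shown earlier; `to_batched_features` is a hypothetical helper.

```python
num_cols = ["mother_age", "ever_born", "gestation_weeks"]
cat_cols = ["is_male", "mother_race", "father_race",
            "cigarette_use", "mother_married", "plurality"]

# Illustrative instances: floats for numeric features, strings for categorical.
instances = [
    {"mother_age": 30.0, "ever_born": 2.0, "gestation_weeks": 39.0,
     "is_male": "Unknown", "mother_race": "2.0", "father_race": "2.0",
     "cigarette_use": "False", "mother_married": "True",
     "plurality": "Single(1)"},
    {"mother_age": 22.0, "ever_born": 1.0, "gestation_weeks": 40.0,
     "is_male": "True", "mother_race": "2.0", "father_race": "9.0",
     "cigarette_use": "False", "mother_married": "False",
     "plurality": "Single(1)"},
]


def to_batched_features(instances, feature_names):
    """Stack instances column-wise, then add a trailing axis of size 1."""
    # One flat list per feature: the analogue of shape [batch].
    columns = {name: [inst[name] for inst in instances]
               for name in feature_names}
    # Wrap every value in its own list: the analogue of shape [batch, 1].
    return {name: [[v] for v in values] for name, values in columns.items()}


features = to_batched_features(instances, num_cols + cat_cols)
print(features["mother_age"])  # [[30.0], [22.0]]
print(features["is_male"])     # [['Unknown'], ['True']]
```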

### Create the model and run training and evaluation

Lastly, we'll create the estimator to train and evaluate. In the cell below, we'll set up a `DNNRegressor` estimator and the train and evaluation operations. 

#### **Exercise 5**

In the cell below, complete the TODOs to create our model for training. 
- First you must create your estimator using `tf.estimator.DNNRegressor`. 
- Next, complete the code to set up your `tf.estimator.TrainSpec`, selecting the appropriate input function and dataset to use to read data to your function during training. 
- Next, set up your `exporter` and `tf.estimator.EvalSpec`.
- Finally, pass the variables you created above to call `tf.estimator.train_and_evaluate`

Be sure to check the documentation for these TensorFlow operations to make sure you set things up correctly.

In [19]:
def train_and_evaluate_dnn(train_data, eval_data, output_dir, num_cols, cat_cols, cat_vocab):
    EVAL_INTERVAL = 300
    run_config = tf.estimator.RunConfig(
        save_checkpoints_secs = EVAL_INTERVAL,
        tf_random_seed=42,
        keep_checkpoint_max = 3)

    estimator = tf.estimator.DNNRegressor(model_dir=output_dir,
                                         feature_columns = get_cols(num_cols, cat_cols, cat_vocab),
                                         hidden_units = [64,32],
                                         config=run_config)
    
    train_spec = tf.estimator.TrainSpec(input_fn = read_dataset(train_data, mode = tf.estimator.ModeKeys.TRAIN),
        max_steps = TRAIN_STEPS)
    
    # serving_input_receiver_fn must be a callable, so defer the call with a lambda
    exporter = tf.estimator.BestExporter(name = "exporter", serving_input_receiver_fn = lambda: serving_input_fn(cat_cols, num_cols))
    eval_spec = tf.estimator.EvalSpec(input_fn = read_dataset(eval_data, mode=tf.estimator.ModeKeys.EVAL), exporters = exporter)

    train_exists = os.path.isfile(train_data)
    eval_exists = os.path.isfile(eval_data)
    
    if not train_exists:
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), train_data)
        
    if not eval_exists:
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), eval_data)
        
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
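
To get a feel for the size of the network defined by `hidden_units = [64,32]`, we can count its trainable parameters by hand. This is a rough sketch that assumes plain fully connected layers with biases, a single output unit, and an input width equal to the three numeric features plus the one-hot width of each categorical vocabulary. `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(input_width, hidden_units, output_width=1):
    """Count weights and biases of a fully connected network."""
    total = 0
    previous = input_width
    for width in list(hidden_units) + [output_width]:
        total += previous * width + width   # weight matrix plus bias vector
        previous = width
    return total


numeric_width = 3                       # mother_age, ever_born, gestation_weeks
vocab_sizes = [3, 16, 16, 2, 2, 6]      # is_male, mother_race, father_race,
                                        # cigarette_use, mother_married, plurality
input_width = numeric_width + sum(vocab_sizes)
print(input_width)                                  # 48
print(dense_param_count(input_width, [64, 32]))     # 5249
```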

Finally, we train the model!

In [20]:
# Run the model
shutil.rmtree(path = "babyweight_trained_dnn", ignore_errors = True) # start fresh each time
train_and_evaluate_dnn("babyweight_train.csv", "babyweight_valid.csv", "babyweight_trained_dnn", num_cols, cat_cols,cat_vocab)

INFO:tensorflow:Using config: {'_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_protocol': None, '_keep_checkpoint_max': 3, '_save_checkpoints_secs': 300, '_log_step_count_steps': 100, '_save_checkpoints_steps': None, '_num_ps_replicas': 0, '_save_summary_steps': 100, '_tf_random_seed': 42, '_train_distribute': None, '_device_fn': None, '_experimental_distribute': None, '_model_dir': 'babyweight_trained_dnn', '_global_id_in_cluster': 0, '_evaluation_master': '', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa4f6c5f240>, '_master': '', '_is_chief': True, '_service': None, '_eval_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_task_type': 'worker', '_keep_checkpoint_every_n_hours': 10000}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The

Look at the results of your training job above. What `average_loss` did you get for the final eval step? For a regressor this value is the mean squared error, so its square root gives the RMSE.
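
The relationship between the reported loss and the RMSE can be checked on a tiny example. The labels and predictions below are made up for illustration and are not results from the model.

```python
import math


def mse(labels, predictions):
    """Mean squared error over paired labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


labels = [7.0, 6.5, 8.0]        # made-up weights in pounds
predictions = [6.0, 7.5, 8.0]   # made-up model outputs
average_loss = mse(labels, predictions)
rmse = math.sqrt(average_loss)
print(round(average_loss, 3))   # 0.667
print(round(rmse, 3))           # 0.816
```

Unlike the mean squared error, the RMSE is expressed in the same unit as the label, here pounds.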

Copyright 2017-2018 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License

In [23]:
def train_and_evaluate_gbt(train_data, eval_data, output_dir, num_cols, cat_cols, cat_vocab):
    EVAL_INTERVAL = 300
    run_config = tf.estimator.RunConfig(
        save_checkpoints_secs = EVAL_INTERVAL,
        tf_random_seed=42,
        keep_checkpoint_max = 3)

    estimator = tf.estimator.BoostedTreesRegressor(model_dir=output_dir,
                                                   n_batches_per_layer = 1,
                                         feature_columns = get_cols(num_cols, cat_cols, cat_vocab),
                                         n_trees=50,
                                         max_depth=6,   
                                         learning_rate=0.05,          
                                         config=run_config)
    
    train_spec = tf.estimator.TrainSpec(input_fn = read_dataset(train_data, mode = tf.estimator.ModeKeys.TRAIN),
        max_steps = TRAIN_STEPS)
    
    # serving_input_receiver_fn must be a callable, so defer the call with a lambda
    exporter = tf.estimator.BestExporter(name = "exporter", serving_input_receiver_fn = lambda: serving_input_fn(cat_cols, num_cols))
    eval_spec = tf.estimator.EvalSpec(input_fn = read_dataset(eval_data, mode=tf.estimator.ModeKeys.EVAL), exporters = exporter)

    train_exists = os.path.isfile(train_data)
    eval_exists = os.path.isfile(eval_data)
    
    if not train_exists:
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), train_data)
        
    if not eval_exists:
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), eval_data)
                  
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)

In [24]:
shutil.rmtree(path = "babyweight_trained_gbt", ignore_errors = True) # start fresh each time
train_and_evaluate_gbt("babyweight_train.csv", "babyweight_valid.csv", "babyweight_trained_gbt", num_cols, cat_cols,cat_vocab)

INFO:tensorflow:Using config: {'_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_protocol': None, '_keep_checkpoint_max': 3, '_save_checkpoints_secs': 300, '_log_step_count_steps': 100, '_save_checkpoints_steps': None, '_num_ps_replicas': 0, '_save_summary_steps': 100, '_tf_random_seed': 42, '_train_distribute': None, '_device_fn': None, '_experimental_distribute': None, '_model_dir': 'babyweight_trained_gbt', '_global_id_in_cluster': 0, '_evaluation_master': '', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa4c9789400>, '_master': '', '_is_chief': True, '_service': None, '_eval_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_task_type': 'worker', '_keep_checkpoint_every_n_hours': 10000}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The

In [27]:
import math

In [30]:
print("Final RMSE for DNN: %s" % round(math.sqrt(1.12), 2))
print("Final RMSE for GBT: %s" % round(math.sqrt(1.05), 2))

Final RMSE for DNN: 1.06
Final RMSE for GBT: 1.02
