<a href="https://colab.research.google.com/github/AjeetSingh02/Notebooks/blob/master/TFestimatorClassificationSaveLoad.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Overview**

This end-to-end walkthrough trains a logistic regression model with the `tf.estimator` API, exports it as a SavedModel, and then reloads it under a different variable name to make predictions.

First things first: **Import libraries**

In [1]:
import os
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf
from IPython.display import clear_output

**Load the dataset**

We will use the Titanic dataset with the (rather morbid) goal of predicting passenger survival, given characteristics such as gender, age, class, etc.

In [2]:
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

This is what the dataset looks like. The `survived` column is the target, and the rest are features.

<table>
  <tr>
    <th>Column Name</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>survived</td>
    <td>Whether the passenger survived (1) or not (0)</td>
  </tr>
  <tr>
    <td>sex</td>
    <td>Gender of the passenger</td>
  </tr>
  <tr>
    <td>age</td>
    <td>Age of the passenger</td>
  </tr>
  <tr>
    <td>n_siblings_spouses</td>
    <td>Number of siblings and spouses aboard</td>
  </tr>
  <tr>
    <td>parch</td>
    <td>Number of parents and children aboard</td>
  </tr>
  <tr>
    <td>fare</td>
    <td>Fare the passenger paid</td>
  </tr>
  <tr>
    <td>class</td>
    <td>Passenger's class on the ship</td>
  </tr>
  <tr>
    <td>deck</td>
    <td>Deck the passenger was on</td>
  </tr>
  <tr>
    <td>embark_town</td>
    <td>Town the passenger embarked from</td>
  </tr>
  <tr>
    <td>alone</td>
    <td>Whether the passenger was alone</td>
  </tr>
</table>

In [3]:
dftrain.head()

   survived     sex   age  n_siblings_spouses  parch     fare  class     deck  embark_town  alone
0         0    male  22.0                   1      0     7.25  Third  unknown  Southampton      n
1         1  female  38.0                   1      0  71.2833  First        C    Cherbourg      n
2         1  female  26.0                   0      0    7.925  Third  unknown  Southampton      y
3         1  female  35.0                   1      0     53.1  First        C  Southampton      n
4         0    male  28.0                   0      0   8.4583  Third  unknown   Queenstown      y


**Feature Engineering for the Model**

Estimators use a system called feature columns to describe how the model should interpret each of the raw input features. An Estimator expects a vector of numeric inputs, and feature columns describe how the model should convert each feature.

In [4]:
LABEL = "survived"
feature_columns = []
NUMERIC_COLUMNS = ['age', 'fare']
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck', 
                       'embark_town', 'alone']

for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = dftrain[feature_name].unique()
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
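The feature-column loop above builds each categorical vocabulary from the training data. As a rough, framework-free illustration of what a vocabulary-list column means for the model, the sketch below (hypothetical helpers `build_vocabulary` and `one_hot`, not TensorFlow code) assigns each known value an index. With TensorFlow's default settings an unseen value is treated as missing; the sketch models that as an all-zero vector.

```python
# Hypothetical, framework-free illustration of a vocabulary-list categorical
# column: each known value gets an index; unknown values map to all zeros.

def build_vocabulary(values):
    """Unique values in order of first appearance (like pandas' unique())."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen


def one_hot(value, vocabulary):
    """One-hot vector for value; out-of-vocabulary values give all zeros."""
    vector = [0] * len(vocabulary)
    if value in vocabulary:
        vector[vocabulary.index(value)] = 1
    return vector


# Example on the 'class' values of the five training rows shown above.
vocab = build_vocabulary(["Third", "First", "Third", "First", "Third"])
print(vocab)                      # ['Third', 'First']
print(one_hot("First", vocab))    # [0, 1]
print(one_hot("Second", vocab))   # [0, 0]  (not seen in this sample)
```

Because the vocabulary comes only from `dftrain`, a category that appears only in the evaluation data would contribute nothing to the prediction.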

The two functions below feed data to the model for training and for prediction, respectively.

In [5]:
def make_train_input_fn(df, num_epochs):
    # Shuffled batches of 128 rows, repeated for num_epochs passes over df.
    return tf.compat.v1.estimator.inputs.pandas_input_fn(
        x=df,
        y=df[LABEL],
        batch_size=128,
        num_epochs=num_epochs,
        shuffle=True,
        queue_capacity=1000,
    )

In [6]:
def make_prediction_input_fn(df):
    # One ordered pass over df without labels (num_epochs defaults to 1).
    return tf.compat.v1.estimator.inputs.pandas_input_fn(
        x=df,
        y=None,
        batch_size=128,
        shuffle=False,
        queue_capacity=1000,
    )
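To make the `batch_size`, `num_epochs` and `shuffle` arguments concrete, here is a simplified plain-Python sketch of the batching idea (hypothetical helper `batch_rows`). It is a simplification: as we understand it, the real input function shuffles through a bounded queue (`queue_capacity`) rather than shuffling each whole epoch, but the counting works the same way, with the final batch allowed to be smaller.

```python
import random


def batch_rows(rows, batch_size, num_epochs, shuffle, seed=None):
    """Yield lists of rows: repeat the data num_epochs times, optionally
    shuffling each epoch, and cut the stream into batch_size chunks.
    The last batch may be smaller than batch_size."""
    rng = random.Random(seed)
    stream = []
    for _ in range(num_epochs):
        epoch = list(rows)
        if shuffle:
            rng.shuffle(epoch)
        stream.extend(epoch)
    for start in range(0, len(stream), batch_size):
        yield stream[start:start + batch_size]


# 10 rows, batch size 4, 3 epochs -> 30 rows -> seven batches of 4, then 2.
sizes = [len(b) for b in batch_rows(range(10), 4, 3, shuffle=True, seed=0)]
print(sizes)  # [4, 4, 4, 4, 4, 4, 4, 2]
```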

**Train the model**

In [7]:
# Instantiate the pre-made estimator
model = tf.estimator.LinearClassifier(feature_columns)

# Train the model
model.train(make_train_input_fn(dftrain, num_epochs=10))

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp2nsigeyn', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


<tensorflow_estimator.python.estimator.canned.linear.LinearClassifierV2 at 0x7fea60184b70>

**Predict on evaluation dataset**

In [8]:
predDicts = list(model.predict(make_prediction_input_fn(dfeval)))

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp2nsigeyn/model.ckpt-49
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
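The log above restores `model.ckpt-49`; the number in a checkpoint name is the global step, which for this estimator advances once per training batch. A quick sanity check, assuming the training CSV has 627 rows (a figure not printed in this notebook), reproduces that step count from the training settings:

```python
import math

# Assumption: the Titanic training CSV used above has 627 rows.
num_rows = 627
num_epochs = 10
batch_size = 128

# The final, partial batch still counts as one training step.
steps = math.ceil(num_rows * num_epochs / batch_size)
print(steps)  # 49
```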


Let's look at the first 10 predictions.

In [9]:
preds = []
for pred in predDicts[:10]:
    preds.append(np.argmax(pred["probabilities"]))
preds

[0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
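Each prediction dictionary's `"probabilities"` entry holds one probability per class, here [P(did not survive), P(survived)], so `np.argmax` picks the more likely class. The plain-Python sketch below (hypothetical helper `to_class_ids`, no NumPy) shows the same idea and why, with two classes, it amounts to thresholding P(survived) at 0.5. The estimator's prediction dictionaries should also carry a `"class_ids"` entry with the predicted class directly.

```python
def to_class_ids(probability_rows):
    """Index of the largest probability in each row (argmax).
    Ties resolve to the first index, as with np.argmax."""
    return [max(range(len(p)), key=p.__getitem__) for p in probability_rows]


# Hypothetical probabilities for three passengers: [P(died), P(survived)].
probs = [[0.9, 0.1], [0.3, 0.7], [0.45, 0.55]]
print(to_class_ids(probs))  # [0, 1, 1]

# With two classes, argmax matches a 0.5 threshold on P(survived).
print([int(p[1] > 0.5) for p in probs])  # [0, 1, 1]
```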

**Save the model**

In [10]:
inputFn = \
tf.estimator.export.build_parsing_serving_input_receiver_fn(
    tf.feature_column.make_parse_example_spec(feature_columns)
)

OUTDIR = 'modelDir'
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time

modelBasePath = os.path.join(OUTDIR, "model")
modelPath = model.export_saved_model(modelBasePath, inputFn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tmp2nsigeyn/model.ckpt-49
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: modelDir/model/temp-1595116057/saved_model.pb


**Reload the model**

We load the export into a variable with a different name (`importedModel`) to make sure we are not accidentally reusing the in-memory estimator.

In [11]:
savedModelPath = modelPath
importedModel = tf.saved_model.load(savedModelPath)
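Here `modelPath` is still in memory from the export cell. `export_saved_model` writes each export to a subdirectory of `modelDir/model` named with a Unix timestamp (the log shows the temporary `temp-1595116057` name used while writing), so in a fresh session the path has to be found on disk. A minimal sketch, using a hypothetical helper `latest_export`, picks the newest numeric subdirectory:

```python
import os


def latest_export(export_base):
    """Return the newest timestamped SavedModel directory under export_base.

    Export directories are named with a Unix timestamp; in-progress exports
    use a 'temp-' prefix and are skipped because their names are not digits.
    """
    candidates = [
        name for name in os.listdir(export_base)
        if name.isdigit() and os.path.isdir(os.path.join(export_base, name))
    ]
    if not candidates:
        raise FileNotFoundError(f"no SavedModel exports in {export_base!r}")
    return os.path.join(export_base, max(candidates, key=int))


# Usage in a fresh session (assumes the export cell above has been run):
# savedModelPath = latest_export(os.path.join("modelDir", "model"))
# importedModel = tf.saved_model.load(savedModelPath)
```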

**Predict using imported model**

To predict on an unseen dataset with the loaded estimator model, follow these steps:


<ol>

<li>Loop over every row of the dataset.

<ol>
<li>Create a tf.train.Example() object. This object carries the row's data to the model for prediction.</li>
<li>Loop over all the columns and, based on each column's dtype, add its value to the example using the matching list type: bytes_list, float_list or int64_list. More information about these types: https://www.tensorflow.org/tutorials/load_data/tfrecord</li>
<li>Predict using this example and the imported model. The serialized example plays the same role as a single row passed to a scikit-learn model for prediction.</li>
</ol>
</li>
</ol>
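Step 2 is essentially a type dispatch. The sketch below is a plain-Python stand-in (hypothetical helper `row_to_feature_dict`, no TensorFlow) that mirrors the structure of a `tf.train.Example`: feature name, then list type, then values. Strings become UTF-8 bytes, floats go to `float_list`, integers to `int64_list`. Unlike the notebook's loop, which silently skips other types, this sketch raises an error for them; that choice is ours.

```python
def row_to_feature_dict(row):
    """Map each column value to the tf.train.Example list type it would use.

    Plain-Python stand-in: the result mirrors the Example's structure
    (feature name -> {list type: [values]}) without needing TensorFlow.
    """
    features = {}
    for name, value in row.items():
        if isinstance(value, str):
            features[name] = {"bytes_list": [value.encode("utf-8")]}
        elif isinstance(value, float):
            features[name] = {"float_list": [value]}
        elif isinstance(value, int) and not isinstance(value, bool):
            features[name] = {"int64_list": [value]}
        else:
            raise TypeError(f"unsupported type for {name!r}: {type(value)}")
    return features


# A few columns from the first training row shown earlier.
row = {"sex": "male", "age": 22.0, "n_siblings_spouses": 1, "fare": 7.25}
print(row_to_feature_dict(row))
```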

The implementation follows.

In [12]:
def predict(dfeval, importedModel):
    """Run the imported model's 'predict' signature on every row of dfeval."""
    predict_fn = importedModel.signatures["predict"]
    predictions = []
    for _, row in dfeval.iterrows():
        # One tf.train.Example per row; it carries every feature of that row.
        example = tf.train.Example()
        for colName, dtype in dfeval.dtypes.items():
            value = row[colName]
            if pd.api.types.is_string_dtype(dtype):
                # Strings (categorical features) are stored as UTF-8 bytes.
                example.features.feature[colName].bytes_list.value.append(
                    str(value).encode("utf-8"))
            elif pd.api.types.is_float_dtype(dtype):
                example.features.feature[colName].float_list.value.append(
                    float(value))
            elif pd.api.types.is_integer_dtype(dtype):
                example.features.feature[colName].int64_list.value.append(
                    int(value))

        # The parsing serving input receiver expects serialized Examples
        # under the input key "examples".
        predictions.append(
            predict_fn(examples=tf.constant([example.SerializeToString()]))
        )

    return predictions
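Calling the signature once per row is simple but slow for large datasets. The receiver built with `build_parsing_serving_input_receiver_fn` accepts a batch of serialized examples (its default batch size is `None`), so the rows could instead be sent in chunks. A minimal sketch of the chunking step, with a hypothetical helper `chunked`; the TensorFlow call is shown only as a comment, and each returned tensor would then hold one row per example in the chunk:

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical usage with the objects from the cell above:
#   serialized = [...]  # one example.SerializeToString() per row
#   for batch in chunked(serialized, 128):
#       out = importedModel.signatures["predict"](examples=tf.constant(batch))

print([len(c) for c in chunked(list(range(10)), 4)])  # [4, 4, 2]
```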

In [13]:
# Drop the label column: here we pass the raw rows ourselves instead of an
# input function, and at prediction time the model should only see features.

dfeval.drop(columns=["survived"], inplace=True)

In [14]:
predictions = predict(dfeval, importedModel)

Let's look at the first 10 predictions.

In [15]:
newPreds = []
for pred in predictions[:10]:
    # For a regression model, use 'predictions' instead of
    # 'probabilities'.
    newPreds.append(np.argmax(pred["probabilities"]))
newPreds

[0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
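The reloaded model's first 10 predictions match the original estimator's, which is what we would expect if the export and reload preserved the trained weights. A small sketch (hypothetical helper `agreement`) turns that visual check into a number; in the notebook one could call it as `agreement(preds, newPreds)`, or extend both lists to every evaluation row:

```python
def agreement(preds_a, preds_b):
    """Fraction of positions where two prediction lists match."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        return 1.0
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)


# The two 10-prediction lists printed above.
original = [0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
reloaded = [0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
print(agreement(original, reloaded))  # 1.0
```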