## A simple Iris classifier using pre-built TensorFlow Estimators

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
tf.VERSION

'1.13.1'

In [2]:
# Constants
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [3]:
# Load datasets
training_set = pd.read_csv("iris_training.csv", names=CSV_COLUMN_NAMES, header=0)
test_set = pd.read_csv("iris_test.csv", names=CSV_COLUMN_NAMES, header=0)

In [4]:
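# pop() removes 'Species' from each frame in place and returns it as the label series,
# so train_x and test_x keep only the four feature columns.
train_x, train_y = training_set, training_set.pop('Species')
test_x, test_y = test_set, test_set.pop('Species')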

In [5]:
train_x.head()

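|   | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|-------------|------------|-------------|------------|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 |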


In [15]:
train_y.head()

0    2
1    1
2    2
3    0
4    0
Name: Species, dtype: int64
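
The labels are integer class indices. The `SPECIES` constant is defined earlier but never used in this notebook, so the mapping below is an assumption: label `i` is taken to mean `SPECIES[i]`. Under that assumption, the five labels shown above decode as follows.

```python
# Assumption: label i corresponds to SPECIES[i] (the notebook does not state this).
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# The first five training labels, copied from the train_y.head() output.
first_labels = [2, 1, 2, 0, 0]

# Translate each integer label into its species name.
decoded = [SPECIES[label] for label in first_labels]
print(decoded)
```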

### Simple Estimator Workflow
* Create one or more input functions.
* Define the model's feature columns.
* Instantiate an Estimator, specifying the feature columns and various hyperparameters.
* Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.

#### Input Function

You must create input functions to supply data for training, evaluation, and prediction.

An input function returns either a `tf.data.Dataset` object that yields the following two-element tuple or, as in this notebook, the tuple itself:

* features - A Python dictionary in which:
    * Each key is the name of a feature.
    * Each value is an array containing all of that feature's values.
* label - An array containing the values of the label for every example.


In [9]:
# Input function simple example:

# def input_evaluation_set():
#     features = {'SepalLength': np.array([6.4, 5.0]),
#                 'SepalWidth':  np.array([2.8, 2.3]),
#                 'PetalLength': np.array([5.6, 3.3]),
#                 'PetalWidth':  np.array([2.2, 1.0])}
#     labels = np.array([2, 1])
#     return features, labels
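
The contract described above (one array per feature, all as long as the label array) is easy to check before handing data to an Estimator. The helper below, `check_features_and_labels`, is a hypothetical name introduced for illustration; it is a minimal sketch that uses only NumPy and the two example rows from the commented-out function.

```python
import numpy as np

def check_features_and_labels(features, labels):
    """Hypothetical helper: confirm every feature array matches the label count."""
    n = len(labels)
    for name, values in features.items():
        if len(values) != n:
            raise ValueError(
                "feature {!r} has {} values, expected {}".format(name, len(values), n))
    return n

# The same two examples used in the commented-out input function.
features = {'SepalLength': np.array([6.4, 5.0]),
            'SepalWidth':  np.array([2.8, 2.3]),
            'PetalLength': np.array([5.6, 3.3]),
            'PetalWidth':  np.array([2.2, 1.0])}
labels = np.array([2, 1])

n_examples = check_features_and_labels(features, labels)
print(n_examples)
```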

In [36]:
def input_fn(feature_frame, label_series, col_list):
    """Return (features, labels) with the whole data set as a single batch."""
    # The last entry of col_list is the label column ('Species'), so skip it.
    features = {name: feature_frame[name].values for name in col_list[:-1]}
    labels = label_series.values
    return features, labels

#### Feature columns

A **feature column** is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The `tf.feature_column` module provides many options for representing data to the model.

For Iris, the four raw features are numeric values, so we'll build a list of feature columns that tells the Estimator model to represent each of them as a 32-bit floating-point value.

In [7]:
# Feature columns describe how the model should use each input feature.
# train_x.keys() yields the column names; the list comprehension builds one numeric column per name.
feature_columns = [tf.feature_column.numeric_column(key=key) for key in train_x.keys()]
feature_columns

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
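
As a rough mental model only (this is plain NumPy, not what TensorFlow does internally), four scalar numeric columns amount to casting each named feature array to `float32` and treating it as one column of a dense input matrix. The helper name `to_dense_float32` and the column order chosen here are illustrative assumptions.

```python
import numpy as np

def to_dense_float32(features, keys):
    """Illustrative only: stack named feature arrays into an (examples, features) matrix."""
    columns = [np.asarray(features[key], dtype=np.float32) for key in keys]
    return np.stack(columns, axis=1)

# Two example rows, keyed by feature name as in the features dictionary.
features = {'SepalLength': [6.4, 5.0],
            'SepalWidth':  [2.8, 2.3],
            'PetalLength': [5.6, 3.3],
            'PetalWidth':  [2.2, 1.0]}
keys = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']

dense = to_dense_float32(features, keys)
print(dense.shape, dense.dtype)
```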

#### Instantiate Estimator as a Linear Classifier

In [8]:
# Build classifier using estimator
classifier = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    n_classes=3,
    model_dir="/tmp/iris_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/iris_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000027553C42780>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


#### Train the classifier

In [41]:
classifier.train(
    # Wrap the call in a lambda so train() receives a callable, not the call's return value.
    input_fn=lambda: input_fn(train_x, train_y, CSV_COLUMN_NAMES),
    steps=1000)

INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/iris_model\model.ckpt.
INFO:tensorflow:loss = 131.83344, step = 1
INFO:tensorflow:global_step/sec: 432.572
INFO:tensorflow:loss = 37.13907, step = 101 (0.232 sec)
INFO:tensorflow:global_step/sec: 1323.46
INFO:tensorflow:loss = 27.859365, step = 201 (0.075 sec)
INFO:tensorflow:global_step/sec: 955.967
INFO:tensorflow:loss = 23.044888, step = 301 (0.107 sec)
INFO:tensorflow:global_step/sec: 954.361
INFO:tensorflow:loss = 20.058022, step = 401 (0.109 sec)
INFO:tensorflow:global_step/sec: 867.745
INFO:tensorflow:loss = 18.008251, step = 501 (0.109 sec)
INFO:tensorflow:global_step/sec: 899.654
INFO:tensorflow:loss = 16.505016, step = 601 (0.114 sec)
INFO:tensor

<tensorflow_estimator.python.estimator.canned.linear.LinearClassifier at 0x27553c421d0>
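
The Estimator calls `input_fn` itself, so `train` needs a callable rather than an already-computed result; that is why the call above wraps `input_fn(...)` in a lambda. `functools.partial` from the standard library is another way to bind the arguments. The sketch below uses a stand-in function, `make_batch` (a hypothetical name), instead of the notebook's `input_fn`, so it runs without TensorFlow or the CSV files.

```python
from functools import partial

def make_batch(rows, labels):
    """Stand-in for an input function that needs arguments."""
    return {"rows": rows}, labels

rows, labels = [[6.4, 2.8], [5.0, 2.3]], [2, 1]

# Both objects are zero-argument callables; nothing runs until they are called.
via_lambda = lambda: make_batch(rows, labels)
via_partial = partial(make_batch, rows, labels)

# Calling either one produces the same (features, labels) tuple.
same = via_lambda() == via_partial()
print(same)
```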

#### Evaluate the classifier

In [44]:
accuracy_score = classifier.evaluate(input_fn=lambda:input_fn(test_x,test_y,CSV_COLUMN_NAMES),
                                     steps=100)['accuracy']
print('Accuracy: {}'.format(accuracy_score))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-04-14T21:05:26Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/iris_model\model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [80/100]
INFO:tensorflow:Evaluation [90/100]
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Finished evaluation at 2019-04-14-21:05:27
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.96666664, average_loss = 0.12096447, global_step = 1000, loss = 3.628934
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: /tmp/iris_model\model.ckpt-1000
Accuracy: 0.96

**96.7%** - Not bad. Not bad at all.
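
The logged numbers can be cross-checked with a little arithmetic. The sketch below assumes that `loss` is summed over a batch while `average_loss` is per example; under that reading the evaluation batch holds 30 examples, and the accuracy corresponds to 29 of them classified correctly. The last check rests on two further assumptions that the notebook does not state: that the training file has 120 rows, and that the model starts from uniform predictions over the three classes.

```python
import math

# Values copied from the evaluation log above.
accuracy = 0.96666664
average_loss = 0.12096447
loss = 3.628934

# If loss is a per-batch sum and average_loss a per-example mean,
# their ratio is the number of examples per batch.
examples_per_batch = loss / average_loss
correct = accuracy * round(examples_per_batch)

# Training log: loss at step 1.
initial_loss = 131.83344
# Assumption: 120 training rows and uniform initial predictions,
# giving a summed cross-entropy of 120 * ln(3).
uniform_loss_120 = 120 * math.log(3)

print(round(examples_per_batch), round(correct), round(uniform_loss_120, 3))
```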