# Estimator example
In this example, I walk through how to use the TensorFlow Estimators API (`tf.estimator`) to train and evaluate a classifier.

## Overview
- Classifier: [DNN](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier)
- Dataset: [IRIS](https://en.wikipedia.org/wiki/Iris_flower_data_set)
- Final test accuracy: ~96.7%

### Resources
- [Estimators documentation](https://www.tensorflow.org/api_docs/python/tf/estimator)
- [DNN Classifier Documentation](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier)
- [TensorFlow GitHub repository](https://github.com/tensorflow/tensorflow)
- Note: this notebook was inspired by the official TensorFlow Estimators quickstart guide, which can be found [here](https://www.tensorflow.org/get_started/estimator)

In [1]:
# set the C++ log level *before* importing TensorFlow so that it takes effect:
# '2' hides INFO and WARNING messages but still shows errors ('3' would hide errors too).
# Python-side "INFO:tensorflow" logs are controlled separately (tf.logging).
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# common packages
import tensorflow as tf
import numpy as np
import sys

# displaying images
from matplotlib.pyplot import imshow
%matplotlib inline

# download data
from six.moves.urllib.request import urlopen


# Helper to make the output consistent
SEED = 42
def reset_graph(seed=SEED):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

# Version information
print("Python: {}".format(sys.version_info[:]))
print('TensorFlow: {}'.format(tf.__version__))

# Check if using GPU
if not tf.test.gpu_device_name():
    print('No GPU found')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

Python: (3, 5, 4, 'final', 0)
TensorFlow: 1.4.0
Default GPU Device: /device:GPU:0
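`reset_graph()` reseeds NumPy's global generator, so random draws repeat from one run to the next. The following is a minimal NumPy-only sketch of that guarantee. The `draws` helper is hypothetical and exists only for illustration.

```python
import numpy as np

SEED = 42

def draws(seed, n=5):
    """Hypothetical helper: reseed NumPy's global generator, return n uniform samples."""
    np.random.seed(seed)
    return np.random.rand(n)

first = draws(SEED)
second = draws(SEED)
# the same seed reproduces exactly the same samples
print(np.array_equal(first, second))
```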


## Obtain and load data

In [2]:
## Download data paths
ROOT_DATA = "../ROOT_DATA/"
DATA_DIR = "IRIS"

IRIS_TRAINING_PATH = os.path.join(ROOT_DATA, DATA_DIR, "iris_training.csv")
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST_PATH = os.path.join(ROOT_DATA, DATA_DIR, "iris_test.csv")
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

In [3]:
# download data (skip files that already exist locally)
# create the target directory first, otherwise open() fails on a fresh checkout
os.makedirs(os.path.join(ROOT_DATA, DATA_DIR), exist_ok=True)

for path, url in [(IRIS_TRAINING_PATH, IRIS_TRAINING_URL),
                  (IRIS_TEST_PATH, IRIS_TEST_URL)]:
    if not os.path.exists(path):
        raw = urlopen(url).read()
        with open(path, "wb") as f:
            f.write(raw)
        print(path, "path written")
    else:
        print(path, "path exists")

../ROOT_DATA/IRIS/iris_training.csv path exists
../ROOT_DATA/IRIS/iris_test.csv path exists
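The download logic follows a simple "fetch only if missing" pattern. Below is a minimal sketch of that pattern as a reusable function. The name `fetch_if_missing` is hypothetical. Its `fetch` argument is assumed to be any callable that returns the file's bytes (in the notebook, something like `lambda: urlopen(url).read()`). Passing `fetch` in lets the logic run without network access. The sketch also writes to a temporary name and then renames, so an interrupted download never leaves a half-written CSV behind.

```python
import os
import tempfile

def fetch_if_missing(path, fetch):
    """Hypothetical helper: write fetch() to path unless the file already exists.

    Returns True if the file was written, False if it was already present.
    """
    if os.path.exists(path):
        return False
    # create the parent directory on first use
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    data = fetch()
    # write to a temporary name, then atomically rename into place
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)
    return True

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "IRIS", "iris_test.csv")
    print(fetch_if_missing(target, lambda: b"placeholder bytes"))  # True
    print(fetch_if_missing(target, lambda: b"never written"))      # False
```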


In [4]:
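# Load dataset
# (each call returns a Dataset namedtuple with .data and .target arrays;
# the builtin int replaces the np.int alias, which newer NumPy releases removed)
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING_PATH,
    target_dtype=int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST_PATH,
    target_dtype=int,
    features_dtype=np.float32)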

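`load_csv_with_header` lives in `tf.contrib`, which later TensorFlow releases dropped. Each IRIS file starts with a header row whose first two fields are the number of samples and the number of features (`120,4,...` for the training file). Every following row holds the feature values with the integer label in the last column.

Below is a rough NumPy-only equivalent that assumes this layout. `Dataset` and `load_csv_with_header_np` are hypothetical stand-ins, not the contrib implementation, and the two sample rows are illustrative only.

```python
import csv
import io
from collections import namedtuple

import numpy as np

# hypothetical stand-in for the (data, target) pair used in the notebook
Dataset = namedtuple("Dataset", ["data", "target"])

def load_csv_with_header_np(lines, target_dtype=np.int64,
                            features_dtype=np.float32):
    """Hypothetical loader: header 'n_samples,n_features,...', then rows of
    n_features values followed by an integer class label."""
    reader = csv.reader(lines)
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    data = np.zeros((n_samples, n_features), dtype=features_dtype)
    target = np.zeros((n_samples,), dtype=target_dtype)
    for i, row in enumerate(reader):
        # last column is the label, the rest are the features
        target[i] = int(row[-1])
        data[i] = [float(v) for v in row[:-1]]
    return Dataset(data=data, target=target)

sample = io.StringIO(
    "2,4,setosa,versicolor,virginica\n"
    "6.4,2.8,5.6,2.2,2\n"
    "5.0,2.3,3.3,1.0,1\n"
)
ds = load_csv_with_header_np(sample)
print(ds.data.shape, ds.target)
```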
## Classifier

### Dataset information

In [5]:
# print some dataset information
print("training shape:", training_set.data.shape)
print("test shape:", test_set.data.shape)

# quick check: both splits must have the same number of features
assert training_set.data.shape[1] == test_set.data.shape[1], \
    "features don't match ({} vs {})".format(training_set.data.shape[1],
                                             test_set.data.shape[1])
NUM_FEATURES = training_set.data.shape[1]
print("num features = {}".format(NUM_FEATURES))

print("training target information: {} targets, {} classes".format(
    len(training_set.target), len(set(training_set.target))))
print("test target information: {} targets, {} classes".format(
    len(test_set.target), len(set(test_set.target))))

training shape: (120, 4)
test shape: (30, 4)
num features = 4
training target information: 120 targets, 3 classes
test target information: 30 targets, 3 classes
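The cell above counts only the distinct labels in each split. Checking how many examples fall into each class is also worthwhile, because the test set has only 30 examples. Below is a quick sketch using NumPy's `np.unique(..., return_counts=True)`. The `class_counts` helper and the toy labels are hypothetical; in the notebook you would pass `training_set.target` or `test_set.target`.

```python
import numpy as np

def class_counts(labels):
    """Hypothetical helper: map each class id to its number of examples."""
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): int(n) for c, n in zip(classes, counts)}

toy_labels = np.array([0, 2, 1, 1, 0, 2, 2])
print(class_counts(toy_labels))  # {0: 2, 1: 2, 2: 3}
```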


In [6]:
# create the feature column: one numeric feature "x" holding NUM_FEATURES
# values per example (4 for IRIS); the input functions must use the same "x" key
feature_columns = [tf.feature_column.numeric_column("x", shape=[NUM_FEATURES])]
if len(set(training_set.target)) == len(set(test_set.target)):
    NUM_CLASSES = len(set(training_set.target))
    print("Number of classes = {}".format(NUM_CLASSES))
else:
    print("number of classes in the training and test sets doesn't match")
    NUM_CLASSES = len(set(training_set.target))
    print("WARNING: num classes has been set to {} to match the training set".format(NUM_CLASSES))

Number of classes = 3
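Conceptually, a numeric column looks up its key in the features dict, casts the values to float32, and treats them as a `[batch_size, NUM_FEATURES]` block. The NumPy sketch below mimics only that lookup-and-reshape step. `numeric_lookup` is a hypothetical helper, not TensorFlow's implementation.

```python
import numpy as np

def numeric_lookup(features, key, shape):
    """Hypothetical helper: fetch features[key] as float32, shaped [batch] + shape."""
    if key not in features:
        # a mismatched key between the column and the input data fails here
        raise KeyError("feature column expects key {!r}, got {}".format(
            key, sorted(features)))
    values = np.asarray(features[key], dtype=np.float32)
    return values.reshape([-1] + list(shape))

batch = {"x": np.arange(8).reshape(2, 4)}
print(numeric_lookup(batch, "x", [4]).shape)  # (2, 4)
```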


### Build Basic DNN Classifier

In [7]:
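# fully connected network:
# [input: NUM_FEATURES] -> 20 -> 16 -> 12 -> 8 -> [logits: NUM_CLASSES]
# Note: the estimator builds its own graph, so reset_graph() does not seed it
# ('_tf_random_seed': None in the log below); pass
# config=tf.estimator.RunConfig(tf_random_seed=SEED) for reproducible runs.
# Re-running with the same model_dir resumes from the checkpoints saved there.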
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[20, 16, 12, 8],
                                        n_classes=NUM_CLASSES,
                                        model_dir="./saver/iris_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f11a6654940>, '_master': '', '_tf_random_seed': None, '_is_chief': True, '_task_id': 0, '_num_worker_replicas': 1, '_service': None, '_num_ps_replicas': 0, '_task_type': 'worker', '_model_dir': './saver/iris_model', '_session_config': None, '_keep_checkpoint_max': 5, '_save_summary_steps': 100}
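The `hidden_units` list fixes the shape of the network. The following is a minimal NumPy sketch of the forward pass and its trainable-parameter count. It assumes `DNNClassifier`'s default ReLU activation in the hidden layers and a final linear layer that produces one logit per class, followed by softmax. The weights are random placeholders, not the trained model's, and all function names are hypothetical.

```python
import numpy as np

def layer_sizes(num_features, hidden_units, num_classes):
    """Hypothetical helper: widths of every layer, input to logits."""
    return [num_features] + list(hidden_units) + [num_classes]

def param_count(sizes):
    """Weights plus biases of every fully connected layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def forward(x, sizes, rng):
    """Random-weight forward pass: ReLU hidden layers, linear logits, softmax."""
    h = x
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        w = rng.normal(scale=0.1, size=(n_in, n_out))
        b = np.zeros(n_out)
        h = h @ w + b
        if i < len(sizes) - 2:  # no activation on the output (logit) layer
            h = np.maximum(h, 0.0)
    # softmax (shifted for numerical stability) turns logits into probabilities
    z = h - h.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

sizes = layer_sizes(4, [20, 16, 12, 8], 3)
print(param_count(sizes))  # 771
probs = forward(np.ones((5, 4)), sizes, np.random.default_rng(0))
print(probs.shape)  # (5, 3)
```

Under these assumptions, the model trained here has only 771 weights and biases.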


In [8]:
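# create the training input function
# x: dict mapping the feature key "x" (same key as the feature column) to the features
# y: array of integer class labels
# num_epochs=None cycles through the data indefinitely; `steps` in train() stops it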
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(training_set.data)},
    y=np.array(training_set.target),
    num_epochs=None,
    shuffle=True)
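A simplified way to picture what `numpy_input_fn` produces is a stream of `(features, labels)` batches, reshuffled on every pass when `shuffle=True`. The pure-NumPy generator below is an analogy, not TensorFlow's queue-based implementation. The name `batches` is hypothetical, and it assumes `numpy_input_fn`'s default `batch_size` of 128, which the notebook does not override. An endless stream has to be capped from outside, which is the role `steps` plays in `train`. A single ordered pass, as used for evaluation, visits every example exactly once.

```python
import itertools
import numpy as np

def batches(x, y, batch_size=128, num_epochs=None, shuffle=True, seed=0):
    """Hypothetical analogy: yield (x_batch, y_batch) pairs.

    num_epochs=None repeats forever; with a finite num_epochs a smaller
    final batch is yielded when the data runs out.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)

    def index_stream():
        for _ in epochs:
            order = rng.permutation(n) if shuffle else np.arange(n)
            yield from order

    stream = index_stream()
    while True:
        idx = np.fromiter(itertools.islice(stream, batch_size), dtype=np.int64)
        if idx.size == 0:
            return
        yield x[idx], y[idx]

x = np.arange(240, dtype=np.float32).reshape(120, 2)
y = np.arange(120)

# training-style stream: endless, so cap it the way train(steps=...) does
train_steps = list(itertools.islice(batches(x, y, num_epochs=None), 3))
print([len(b[1]) for b in train_steps])  # [128, 128, 128]

# evaluation-style stream: one ordered pass over the data
eval_batches = list(batches(x, y, num_epochs=1, shuffle=False))
print([len(b[1]) for b in eval_batches])  # [120]
```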

### Train our classifier

In [9]:
classifier.train(input_fn=train_input_fn, steps=2500)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./saver/iris_model/model.ckpt.
INFO:tensorflow:loss = 273.882, step = 1
INFO:tensorflow:global_step/sec: 419.817
INFO:tensorflow:loss = 23.5532, step = 101 (0.240 sec)
INFO:tensorflow:global_step/sec: 478.717
INFO:tensorflow:loss = 6.82936, step = 201 (0.210 sec)
INFO:tensorflow:global_step/sec: 448.616
INFO:tensorflow:loss = 8.22148, step = 301 (0.221 sec)
INFO:tensorflow:global_step/sec: 471.132
INFO:tensorflow:loss = 8.47243, step = 401 (0.212 sec)
INFO:tensorflow:global_step/sec: 509.801
INFO:tensorflow:loss = 4.56142, step = 501 (0.197 sec)
INFO:tensorflow:global_step/sec: 340.667
INFO:tensorflow:loss = 5.29664, step = 601 (0.293 sec)
INFO:tensorflow:global_step/sec: 370.546
INFO:tensorflow:loss = 5.60795, step = 701 (0.270 sec)
INFO:tensorflow:global_step/sec: 471.172
INFO:tensorflow:loss = 7.73368, step = 801 (0.212 sec)
INFO:tensorflow:global_step/sec: 474.105
INFO:tensorflow:loss = 9.7728

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x7f11a6654518>
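Training here is budgeted in steps (batches), not epochs, so converting between the two is useful. Assuming `numpy_input_fn`'s default `batch_size` of 128, 2500 steps over 120 training examples works out to roughly 2667 passes over the data. The helper name `epochs_seen` is hypothetical.

```python
def epochs_seen(steps, batch_size, num_examples):
    """Hypothetical helper: passes over the training set implied by a step count."""
    return steps * batch_size / num_examples

print(round(epochs_seen(2500, 128, 120), 1))  # 2666.7
```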

### Evaluate our classifier

IMPORTANT: note that `num_epochs` is set to 1 here. When we evaluate the classifier on the test data, we want each example to pass through it exactly once. With `num_epochs=1`, `test_input_fn` iterates over the data once and then raises an `OutOfRangeError`, which signals the end of the input and stops the evaluation. `shuffle=False` keeps the test examples in their original order.

In [10]:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(test_set.data)},
    y=np.array(test_set.target),
    num_epochs=1,
    shuffle=False)

In [11]:
test_metrics = classifier.evaluate(input_fn=test_input_fn)

INFO:tensorflow:Starting evaluation at 2017-11-06-16:13:12
INFO:tensorflow:Restoring parameters from ./saver/iris_model/model.ckpt-2500
INFO:tensorflow:Finished evaluation at 2017-11-06-16:13:12
INFO:tensorflow:Saving dict for global step 2500: accuracy = 0.966667, average_loss = 0.0544802, global_step = 2500, loss = 1.63441
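These evaluation numbers agree with simple arithmetic on the 30 test examples:

- An accuracy of 0.966667 means 29 of 30 predictions were correct.
- `loss` equals `average_loss` times 30. This suggests `loss` is the per-example cross-entropy summed over the single evaluation batch, while `average_loss` is its mean.

Below is a minimal NumPy sketch of the three quantities computed from predicted probabilities. The `eval_metrics` helper and the toy values are hypothetical.

```python
import numpy as np

def eval_metrics(probs, labels):
    """Hypothetical helper: accuracy, mean and summed cross-entropy for one batch."""
    labels = np.asarray(labels)
    # cross-entropy of each example = -log(probability of its true class)
    per_example = -np.log(probs[np.arange(len(labels)), labels])
    return {
        "accuracy": float(np.mean(np.argmax(probs, axis=1) == labels)),
        "average_loss": float(per_example.mean()),
        "loss": float(per_example.sum()),
    }

toy_probs = np.array([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.5, 0.3]])
toy_labels = [0, 1, 2]  # the third example is misclassified
m = eval_metrics(toy_probs, toy_labels)
print(m["accuracy"])                # 2 of 3 correct
print(m["loss"] / len(toy_labels))  # equals m["average_loss"]

# the same relationships applied to the logged evaluation numbers
print(29 / 30)          # about 0.966667
print(0.0544802 * 30)   # about 1.634406, vs. logged loss = 1.63441
```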


In [12]:
print("Test accuracy: {:.4f}%".format(test_metrics["accuracy"]*100))

Test accuracy: 96.6667%
